In Spark, how do I check whether two columns are equal when their type is Array or Map?

  •   linuxchild · Nov 24, 2017 · 2849 views

    The schema is as follows:

     |-- list: array (nullable = true)
     |    |-- element: map (containsNull = true)
     |    |    |-- key: string
     |    |    |-- value: array (valueContainsNull = true)
     |    |    |    |-- element: struct (containsNull = true)
     |    |    |    |    |-- Date: integer (nullable = true)
     |    |    |    |    |-- Name: string (nullable = true)
     |-- list2: array (nullable = true)
     |    |-- element: map (containsNull = true)
     |    |    |-- key: string
     |    |    |-- value: array (valueContainsNull = true)
     |    |    |    |-- element: struct (containsNull = true)
     |    |    |    |    |-- Date: integer (nullable = true)
     |    |    |    |    |-- Name: string (nullable = true)
    

    I want to keep only the rows where list and list2 are equal. How should I do the comparison?

    Using filter($"list" === $"list2") fails with this error:

    org.apache.spark.sql.AnalysisException: cannot resolve '(`list` = `list2`)' due to data type mismatch: Cannot use map type in EqualTo, but the actual input type is array<map<string,array<struct<Date:int,Name:string>>>>.;;
    

    That's all, thanks~

    4 replies    2017-11-24 15:55:35 +08:00
        1
    linuxchild  
    OP
       Nov 24, 2017
    Has nobody dealt with this?
        2
    MasterC  
       Nov 24, 2017
    Write your own function that walks through the elements and compares them.
        3
    czheo  
       Nov 24, 2017
    Use a UDF.
        4
    linuxchild  
    OP
       Nov 24, 2017
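    @MasterC Yeah, I just got it working. It looks like writing my own UDF is the only option.

    @czheo Yes, that really does seem to be the only way. Thanks.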
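
    The thread does not show the UDF that was written, so the following is only a minimal sketch of what such a UDF might look like, not the OP's actual code. It assumes the schema from the question and relies on how Spark hands nested values to a Scala UDF: arrays arrive as Seq, maps as Map, and structs as Row, all of which compare structurally with ==. The names NestedEquality, Nested, sameNested, sameNestedUdf, and keepEqual are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.udf

object NestedEquality {
  // Hypothetical alias for array<map<string, array<struct<Date:int, Name:string>>>>
  // as seen from inside a Scala UDF (array -> Seq, map -> Map, struct -> Row).
  type Nested = Seq[Map[String, Seq[Row]]]

  // Structural comparison: Seq compares element by element in order,
  // Map compares key/value pairs regardless of order, Row compares its fields.
  // Scala's == is null-safe, so two null columns compare as equal here,
  // which differs from SQL semantics, where NULL = NULL yields NULL.
  def sameNested(a: Nested, b: Nested): Boolean = a == b

  // Wrap the plain function so it can be applied to columns.
  val sameNestedUdf = udf(sameNested _)

  // Keep only the rows whose "list" and "list2" columns hold equal values.
  // The column names are taken from the schema in the question.
  def keepEqual(df: DataFrame): DataFrame =
    df.filter(sameNestedUdf(df("list"), df("list2")))
}
```

    With a DataFrame df that has the schema above, NestedEquality.keepEqual(df) would stand in for the failing filter($"list" === $"list2"). Note that array order matters in this comparison, so two lists holding the same maps in a different order count as unequal; if that is not the intended meaning, the function body would need to sort or otherwise normalize the values first.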