When using udf, it can be hard to see what is going on, because udf() returns a function (a UserDefinedFunction), not a value.
The benefit of udf() is that you can write a code block inside it just as you would declare a function, and then use your custom function when calling withColumn.
This may help advanced programmers, but once you want to customize the function with your own parameters, partial application comes into play.
Let's say we have to calculate an average, but with some detailed condition passed in as a parameter.
Commonly, withColumn is used like this:
df.withColumn("NEW_COLUMN", foo(col("COLUMN")))
where foo is defined as follows (x is some Double already in scope):
def foo = udf((value: Double) => value + x)
Looking closely, foo is called with a Column parameter. If udf returned a plain value, this would not be possible. Instead, udf returns a UserDefinedFunction, and you can apply that UserDefinedFunction to columns as if it were a normal function.
Then how about declaring a function like the one below?
def getSth(str: String) = udf((value: Double) => s"$value$str")
Since getSth returns a function, we have to call getSth first, and then call the resulting UserDefinedFunction on the column.
df.withColumn("NEW_COLUMN", getSth("aa")(col("COLUMN")))
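To make the two-step call concrete, here is a minimal end-to-end sketch. It assumes Spark SQL is on the classpath and runs against a local SparkSession; the object name, the app name, and the sample data are made up for illustration, and the string is built with `toString` so the concatenation is explicit.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

object UdfTwoStepSketch {
  // Calling getSth builds a UserDefinedFunction that has captured `str`.
  def getSth(str: String): UserDefinedFunction =
    udf((value: Double) => value.toString + str)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-two-step-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with a single Double column.
    val df = Seq(1.0, 2.5).toDF("COLUMN")

    // Step 1: getSth("aa") returns a UserDefinedFunction.
    val appendAa: UserDefinedFunction = getSth("aa")
    // Step 2: applying it to a Column yields a new Column expression.
    val stepByStep = df.withColumn("NEW_COLUMN", appendAa(col("COLUMN")))

    // The same thing written as one expression.
    val oneLine = df.withColumn("NEW_COLUMN", getSth("aa")(col("COLUMN")))

    stepByStep.show()
    oneLine.show()
    spark.stop()
  }
}
```

Splitting the call into `appendAa` and then `appendAa(col(...))` shows that nothing special happens in the one-line form: it is just two ordinary function applications in a row.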
In functional programming languages, currying comes up all the time because of cases like this, where a method returns a function.
The flow is hard to follow at first, but it will come to you eventually.
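The same pattern can be tried without Spark at all. The sketch below uses plain Scala and hypothetical names (`addSuffix`, `addSuffixCurried`) to show a method that returns a function next to the equivalent method with two parameter lists.

```scala
object CurryingSketch {
  // Style 1: a method that returns a function, the same shape as getSth.
  def addSuffix(suffix: String): Double => String =
    value => value.toString + suffix

  // Style 2: the same logic written with two parameter lists (curried form).
  def addSuffixCurried(suffix: String)(value: Double): String =
    value.toString + suffix

  def main(args: Array[String]): Unit = {
    // First call fixes the suffix and gives back a function.
    val withAa: Double => String = addSuffix("aa")
    println(withAa(1.5))            // prints 1.5aa

    // Both calls chained in one expression.
    println(addSuffix("aa")(1.5))   // prints 1.5aa

    // Supplying only the first parameter list also yields a function.
    val partially: Double => String = addSuffixCurried("aa")
    println(partially(1.5))         // prints 1.5aa
  }
}
```

Either style gives you a function that remembers its first argument, which is what getSth does when it hands back a UserDefinedFunction.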