Custom parameter when using udf

늘근이 2018. 5. 19. 09:54

When using udf, it can be hard to follow what is happening, because udf() returns a function (a UserDefinedFunction), not a value.

The benefit of building a function with udf() is that you can write the code block inside udf() just as if you were declaring an ordinary function, and then use the resulting function-typed value, together with your own custom parameter, when calling withColumn.

This is convenient for experienced programmers, but once you want to customize the function with a parameter of your own, a function that returns a function (partial application) comes into play.

Let's say we have to calculate an average, but with some detailed condition controlled by a parameter.

Commonly, withColumn is used like this:


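import org.apache.spark.sql.functions.{col, udf}

// x is assumed to be a Double already defined in scope
def foo = udf( (value: Double) => value + x )

df.withColumn("NEW_COLUMN", foo(col("COLUMN")))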


Looking closely, foo is called with a Column argument. That would not be possible if udf returned a plain value. udf returns a UserDefinedFunction, and a UserDefinedFunction can be applied to Columns, so you can call it as if it were a normal function.
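To make the types visible, here is a minimal sketch with explicit annotations. It assumes Spark SQL is on the classpath (for example in spark-shell); the value of x and the names in UdfTypes are only illustrative.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

object UdfTypes {
  // Hypothetical value; the original snippet leaves x undefined.
  val x: Double = 1.0

  // udf(...) wraps the Scala lambda and returns a UserDefinedFunction.
  val foo: UserDefinedFunction = udf((value: Double) => value + x)

  // Applying the UserDefinedFunction to a Column builds a new Column expression.
  // foo(col("COLUMN")) is shorthand for foo.apply(col("COLUMN")).
  val expr: Column = foo(col("COLUMN"))
}
```

Nothing is computed at this point. expr is only a description of the computation, which Spark evaluates once it is used in something like withColumn and an action runs.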


Then how about declaring a function like the one below?

def getSth(str: String) = udf( (value: Double) => value + str )

Since getSth returns a function, we have to call getSth first with our custom parameter, and then call the returned UserDefinedFunction with the column.

df.withColumn("NEW_COLUMN", getSth("aa")(col("COLUMN")) )
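Putting the pieces together, the following is a rough end-to-end sketch rather than a definitive implementation. It assumes a local SparkSession; the object name CustomUdfParam, the app name, and the sample data are made up for illustration. String interpolation is used instead of value + str because adding a number and a String is deprecated in newer Scala versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

object CustomUdfParam {
  // Takes the custom parameter and returns a UDF that closes over it.
  def getSth(str: String): UserDefinedFunction =
    udf((value: Double) => s"$value$str") // appends str to the Double's string form

  def main(args: Array[String]): Unit = {
    // Local session for experimenting; settings here are assumptions.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-custom-param")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data with a single Double column.
    val df = Seq(1.0, 2.5).toDF("COLUMN")

    // First call: getSth("aa") yields a UserDefinedFunction.
    // Second call: that function is applied to the Column.
    val result = df.withColumn("NEW_COLUMN", getSth("aa")(col("COLUMN")))
    result.show()

    spark.stop()
  }
}
```

The two calls can also be split, e.g. val addAa = getSth("aa") followed by addAa(col("COLUMN")), which makes it easier to see which step supplies the custom parameter and which supplies the column.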


In functional programming languages, currying is essential precisely because of cases like this, where a method returns a function that is then called with further arguments.
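The same call shape can be tried without Spark at all. This is a plain Scala sketch; addSuffix and addSuffixCurried are hypothetical names that mirror getSth, with an ordinary function standing in for the UserDefinedFunction.

```scala
object CurryingSketch {
  // A method that returns a function, like getSth returning a UDF.
  def addSuffix(suffix: String): Double => String =
    value => s"$value$suffix"

  // The same logic written with two parameter lists (curried form).
  def addSuffixCurried(suffix: String)(value: Double): String =
    s"$value$suffix"

  def main(args: Array[String]): Unit = {
    // Supply the custom parameter first, keep the resulting function.
    val withAa: Double => String = addSuffix("aa")
    println(withAa(1.5))                 // 1.5aa

    // Or chain both calls, just like getSth("aa")(col("COLUMN")).
    println(addSuffix("aa")(1.5))        // 1.5aa
    println(addSuffixCurried("aa")(1.5)) // 1.5aa
  }
}
```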

The flow is hard to follow at first, but it will come to you in the end. Alas..