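The requirement is to reverse an explode operation on a Spark DataFrame: take rows that each hold a single string value and collect them back into one array column per group.

The unit-test snippet below shows one way to do this: group by the remaining columns and aggregate the exploded column with collect_list.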

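import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.collect_list
import org.apache.spark.sql.types.{StringType, StructType}

test("Reverse-explode operation") {
  import spark.implicits._

  // Exploded input: one row per (name, Color, language)
  val arrayData = Seq(
    Row("James", "Blue", "Java"),
    Row("James", "Blue", "Spark"))

  val arraySchema = new StructType()
    .add("name", StringType)
    .add("Color", StringType)
    .add("knownLanguages", StringType)

  val df = spark.createDataFrame(spark.sparkContext.parallelize(arrayData), arraySchema)
  df.printSchema()
  df.show(false)

  // Group on the non-exploded columns and collect the values back into an array.
  // "Color" matches the schema exactly, so this also works with case-sensitive resolution.
  df.groupBy("name", "Color")
    .agg(collect_list("knownLanguages").alias("knownLanguages"))
    .show(false)
}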
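
To check that the aggregation really undoes explode, here is a minimal round-trip sketch. It assumes a local SparkSession; the object name ReverseExplodeRoundTrip and the app name are hypothetical and chosen only for illustration. It starts from an array column, explodes it, and then regroups with collect_list.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, explode}

// Hypothetical standalone object, used only to illustrate the round trip.
object ReverseExplodeRoundTrip {
  def main(args: Array[String]): Unit = {
    // Assumption: a local, single-threaded session is enough for this sketch.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("reverse-explode-round-trip")
      .getOrCreate()
    import spark.implicits._

    // One row per person, with the languages already held in an array column.
    val nested = Seq(("James", "Blue", Seq("Java", "Spark")))
      .toDF("name", "color", "knownLanguages")

    // explode: one output row per array element.
    val exploded = nested.withColumn("knownLanguages", explode(col("knownLanguages")))
    exploded.show(false)

    // Reverse: group on the remaining columns and collect the values back.
    val regrouped = exploded
      .groupBy("name", "color")
      .agg(collect_list("knownLanguages").alias("knownLanguages"))
    regrouped.printSchema()
    regrouped.show(false)

    spark.stop()
  }
}
```

Two caveats apply to this approach. collect_list keeps duplicates but does not guarantee the order of the collected elements after a shuffle, so the rebuilt array may not match the original ordering; collect_set can be used instead when duplicates should be dropped. Also, explode discards rows whose array is null or empty, so those rows cannot be recovered by regrouping.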

Hope this helps with reverse-explode use cases!

satabdi ray

Data engineer by profession; loves writing, sharing, and learning!