Use Case #3 on Spark-Scala

satabdi ray
2 min read · Sep 28, 2021

Use case: The source table lives in PostgreSQL (or any JDBC-compatible database) and holds data in multidimensional array columns. The requirement is to read that data, apply a transformation, and write the result out as CSV.

The Spark SQL JDBC connector cannot read multidimensional arrays into a Spark DataFrame. Instead of the nested arrays, the resulting DataFrame holds an opaque hash value for each array, as shown below.

This is the table structure and data available as input.
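The original screenshot of the table isn't reproduced here, so the sketch below recreates an equivalent input over plain JDBC. The connection URL, credentials, table name (students), and columns (id, name, scores) are hypothetical stand-ins; the essential part is the two-dimensional integer[][] column:

```scala
import java.sql.DriverManager

// Hypothetical connection details; substitute your own database,
// user, and password.
val conn = DriverManager.getConnection(
  "jdbc:postgresql://localhost:5432/testdb", "postgres", "password")
val stmt = conn.createStatement()

// students.scores is a multidimensional array column (integer[][]).
stmt.execute(
  """CREATE TABLE IF NOT EXISTS students (
    |  id     integer,
    |  name   text,
    |  scores integer[][]
    |)""".stripMargin)

stmt.execute(
  "INSERT INTO students VALUES " +
  "(1, 'abc', '{{90,85},{70,95}}'), " +
  "(2, 'xyz', '{{60,75},{80,65}}')")

stmt.close()
conn.close()
```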

When you try to read it as a Spark DataFrame through JDBC, the data does not come back correctly: the connector has limitations around multidimensional arrays, and supplying a user-defined schema does not help either. The naive read sketched below illustrates the problem.
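As an illustration, a naive read through the Spark JDBC connector might look like this, reusing the hypothetical connection details from above; the behaviour on the array column is what this article describes (an opaque value rather than the nested data):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("jdbc-array-read")
  .master("local[*]")
  .getOrCreate()

// Naive read through the Spark SQL JDBC connector.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/testdb")
  .option("dbtable", "students")
  .option("user", "postgres")
  .option("password", "password")
  .load()

// The integer[][] column does not survive the trip: instead of the
// nested arrays, the DataFrame holds unusable hash-like values, and
// passing a user-defined schema does not fix it.
df.show(false)
```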

Solution: Read the data directly over JDBC, without going through Spark's connector. Go through the code snippet below and execute each step: the sample data is ingested from PostgreSQL over a plain JDBC connection, converted into a Spark DataFrame, and then any transformation can be applied (an explode operation, for example).
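The original gist isn't embedded here, so what follows is a minimal sketch of that approach, reusing the hypothetical students table and connection details from above: fetch rows over plain JDBC, unwrap each java.sql.Array into nested Scala collections, build a DataFrame from them, explode the outer array, and write the result as CSV.

```scala
import java.sql.DriverManager
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode
import scala.collection.mutable.ListBuffer

val spark = SparkSession.builder()
  .appName("jdbc-multidim-array")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Step 1: query the table directly over JDBC, bypassing the
// Spark connector entirely.
val conn = DriverManager.getConnection(
  "jdbc:postgresql://localhost:5432/testdb", "postgres", "password")
val rs = conn.createStatement()
  .executeQuery("SELECT id, name, scores FROM students")

// Step 2: materialise each row, unwrapping the java.sql.Array.
// For an integer[][] column the PostgreSQL driver returns a
// nested Java array (Integer[][]).
val rows = ListBuffer[(Int, String, Seq[Seq[Int]])]()
while (rs.next()) {
  val scores = rs.getArray("scores").getArray
    .asInstanceOf[Array[Array[Integer]]]
    .map(_.toSeq.map(_.intValue)).toSeq
  rows += ((rs.getInt("id"), rs.getString("name"), scores))
}
conn.close()

// Step 3: turn the collected rows into a Spark DataFrame.
val df = rows.toSeq.toDF("id", "name", "scores")

// Step 4: apply a transformation, e.g. explode the outer array so
// each inner array becomes its own row.
val exploded = df
  .withColumn("score_row", explode($"scores"))
  .drop("scores")

// Step 5: CSV has no native array representation, so stringify the
// inner array before writing the output.
exploded
  .withColumn("score_row", $"score_row".cast("string"))
  .write.option("header", "true").mode("overwrite")
  .csv("/tmp/students_csv")
```

Because CSV cannot hold array columns directly, the exploded inner array is cast to a plain string on the way out; concat_ws or a custom formatter would work just as well.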

Hope this helps resolve this kind of use case!

