Use Case #1 on Spark Scala

satabdi ray
1 min read · May 13, 2021

Problem Statement: The source is DB2 and the sink is MongoDB: you read data from DB2 and write it to MongoDB.
But while writing, an issue arises because of a type mismatch.

The DataFrame is created by reading from DB2, and it has a few columns with the float or double datatype, which is not supported by MongoDB (as it stores BSON types).
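For context, the read from DB2 can be done over JDBC. Below is a minimal sketch, assuming the IBM DB2 JDBC driver is on the classpath; the host, database, table, and credentials are hypothetical placeholders, not from the original setup.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Db2ToMongo").getOrCreate()

// Read the source table from DB2 over JDBC (all connection details are placeholders)
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:db2://db2-host:50000/SAMPLEDB")
  .option("driver", "com.ibm.db2.jcc.DB2Driver")
  .option("dbtable", "MYSCHEMA.MYTABLE")
  .option("user", "db2user")
  .option("password", "db2password")
  .load()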

Solution:
1. First, filter out the columns in the DataFrame that have a float/double datatype.
2. On that filtered result, apply a type conversion operation as shown in the code snippet below.

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, FloatType, StringType}

// Show the DataFrame read from DB2
df.show()

// Collect the names of all float/double columns that cause the type mismatch
val floatColumns = df.schema.fields
  .filter(f => f.dataType == FloatType || f.dataType == DoubleType)
  .map(_.name)

// Cast those columns to string, leaving all other columns untouched
val doubleToStringDf = df.select(
  df.columns.map { c =>
    if (floatColumns.contains(c)) col(c).cast(StringType)
    else col(c)
  }: _*
)

// Check the converted DataFrame and its schema
doubleToStringDf.show()
doubleToStringDf.printSchema()

// Then apply the write to the sink
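The final write step might look like the sketch below, assuming the MongoDB Spark Connector is used; the URI, database, and collection names are hypothetical placeholders.

// Write the converted DataFrame to MongoDB (connection details are placeholders)
doubleToStringDf.write
  .format("mongo")
  .mode("append")
  .option("uri", "mongodb://mongo-host:27017")
  .option("database", "mydb")
  .option("collection", "mycollection")
  .save()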

This way the issue can be resolved. Try it out, and I hope it helps!

Happy Learning!

