Hive connection (Remote Server) to Spark-Scala (as sink/source) through JDBC (non-Kerberos)

satabdi ray
May 28, 2021

If you need to connect a Spark application (written in Scala and running on a separate AWS EMR cluster) to Hive installed on a remote server, the code snippets below (Test.scala) show how to read from that Hive server and write a DataFrame back to it, provided you have been given the required connection details.

First, try a test connection from a DB client (or a small JDBC program, as shown below) using the connection details you were given. You should be able to connect to Hive before wiring anything into Spark.
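
For example, a bare JDBC connectivity check could look like the sketch below. The host name, port, database, user and password are placeholders, and I am assuming the Apache Hive driver class (org.apache.hive.jdbc.HiveDriver) that comes with the hive-jdbc dependency added later in build.sbt; if you use the Cloudera HiveJDBC42 driver instead, take the driver class name from its documentation.

import java.sql.DriverManager

object ConnectionTest {
  def main(args: Array[String]): Unit = {
    // Register the Apache Hive JDBC driver (org.apache.hive:hive-jdbc)
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Placeholder connection details -- replace with the values you were given
    val url  = "jdbc:hive2://your-hive-host:10000/default"
    val conn = DriverManager.getConnection(url, "your_user", "your_password")

    // A trivial query just to prove the connection works
    val rs = conn.createStatement().executeQuery("SELECT 1")
    while (rs.next()) println(s"Connected, got: ${rs.getInt(1)}")

    conn.close()
  }
}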

  1. First, create a sample table in Hive using SQL so that you have something to read from and write to while testing (a minimal sketch follows after this list).
  2. Download HiveJDBC42.jar and add the 2 files below into IntelliJ/File/Project Settings/Modules/Dependencies/+ icon for attaching files, then Apply and OK. Build your application and execute the test case.

i. HiveJDBC42.jar & ii. HiveJDBC42-EULA.txt
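
For step 1, you can create the sample table either from your DB client or over the same JDBC connection. Here is a minimal sketch; the employee table, its columns and the connection details are made up for illustration.

import java.sql.DriverManager

object CreateSampleTable {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection(
      "jdbc:hive2://your-hive-host:10000/default", "your_user", "your_password")
    val stmt = conn.createStatement()

    // Hypothetical sample table used by the read/write steps below
    stmt.execute(
      "CREATE TABLE IF NOT EXISTS employee (id INT, name STRING, salary DOUBLE)")
    stmt.execute(
      "INSERT INTO employee VALUES (1, 'alice', 1000.0), (2, 'bob', 2000.0)")

    stmt.close()
    conn.close()
  }
}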

3. Next, read a table from Hive into a DataFrame in your Spark application using a function like the one below.
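
Here is a sketch of such a read function, assuming the sample employee table and the placeholder connection details from above; it goes through spark.read.format("jdbc") with the Hive driver.

import org.apache.spark.sql.{DataFrame, SparkSession}

object HiveJdbc {
  // Placeholder connection details for the remote Hive server
  val url = "jdbc:hive2://your-hive-host:10000/default"

  // Read a Hive table into a Spark DataFrame over JDBC
  def readFromHive(spark: SparkSession, table: String): DataFrame =
    spark.read
      .format("jdbc")
      .option("driver", "org.apache.hive.jdbc.HiveDriver")
      .option("url", url)
      .option("dbtable", table)
      .option("user", "your_user")
      .option("password", "your_password")
      .load()
}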

4. After reading, you can run a query on the DataFrame and write the result back to a new table in Hive as a DataFrame.
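
The write side follows the same pattern. This sketch continues the hypothetical HiveJdbc object above and appends a DataFrame to a Hive table:

  // Append a Spark DataFrame to a Hive table over JDBC
  def writeToHive(df: DataFrame, table: String): Unit =
    df.write
      .format("jdbc")
      .option("driver", "org.apache.hive.jdbc.HiveDriver")
      .option("url", url)
      .option("dbtable", table)
      .option("user", "your_user")
      .option("password", "your_password")
      .mode("append")
      .save()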

5. You also have to register a custom JDBC dialect for Hive, like the one below.
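
This is a sketch of what such a dialect typically looks like (not necessarily the exact one from my project): it tells Spark to quote identifiers with backticks the way Hive expects, and maps Spark's StringType to Hive's STRING type.

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types.{DataType, StringType}

object HiveDialect extends JdbcDialect {
  // Apply this dialect to jdbc:hive2:// URLs
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:hive2")

  // Hive quotes identifiers with backticks, not double quotes
  override def quoteIdentifier(colName: String): String =
    s"`$colName`"

  // Map Spark's StringType to Hive's STRING (the generic dialect would emit TEXT)
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Some(JdbcType("STRING", java.sql.Types.VARCHAR))
    case _          => None
  }
}

Register it once, before any JDBC read or write, with JdbcDialects.registerDialect(HiveDialect).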

6. And in build.sbt, add the library below (use a hive-jdbc version that matches the Hive installation you are connecting to).

libraryDependencies += "org.apache.hive" % "hive-jdbc" % "3.1.2"

7. If you use JDBC for the connection, there is no need to call enableHiveSupport on the SparkSession.
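
So a plain SparkSession is enough. Putting the pieces together, a Test.scala main could look like the sketch below; the table names and helper methods are the hypothetical ones from the earlier snippets.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.jdbc.JdbcDialects

object Test {
  def main(args: Array[String]): Unit = {
    // No .enableHiveSupport() here -- Hive is reached purely over JDBC
    val spark = SparkSession.builder().appName("HiveJdbcTest").getOrCreate()

    // Register the custom Hive dialect before any JDBC read/write
    JdbcDialects.registerDialect(HiveDialect)

    // Read the sample table, run a small query, write the result to a new table
    val employees   = HiveJdbc.readFromHive(spark, "employee")
    val highEarners = employees.filter(employees("salary") > 1500.0)
    HiveJdbc.writeToHive(highEarners, "high_earners")

    spark.stop()
  }
}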

This post is meant to help you establish a quick connection test from your local environment to Hive installed on a remote server. Hope this helps!
