PROBLEM DESCRIPTION


Infoworks systems are configured to connect directly to a single Hiveserver host on port 10000, so ingestion and pipeline jobs depend on that Hiveserver being up. HDP clusters offer the option of high availability (HA) for Hive through Zookeeper. In this setup, if one Hiveserver is down, Zookeeper routes connections to another one that is running. Infoworks supports Hive connections via Zookeeper.


STEPS TO CONFIGURE INFOWORKS WITH ZOOKEEPER FOR HIVESERVER2:


Infoworks must be configured to connect to Hive via the Zookeeper URL instead of the Hiveserver2 URL.


The Zookeeper URL will be in the format below:


jdbc:hive2://<ZOOKEEPER-QUORUM>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2


Sample URL:


jdbc:hive2://<zk0>:2181,<zk1>:2181,<zk2>:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2


NOTE: <zk0> through <zk2> are placeholders for the Zookeeper hosts.
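
As an illustration of the format above, the following minimal Python sketch assembles a Zookeeper discovery URL from a list of Zookeeper hosts. The function name and its parameters are hypothetical and are not part of Infoworks or Hive. The defaults (port 2181, database "default", namespace "hiveserver2") are taken from the sample URL and may differ on your cluster.

```python
def build_zookeeper_hive_url(zk_hosts, port=2181, database="default",
                             namespace="hiveserver2"):
    """Return a Hive JDBC URL that uses Zookeeper service discovery.

    Hypothetical helper for illustration only; defaults mirror the
    sample URL in this article and are assumptions about your cluster.
    """
    if not zk_hosts:
        raise ValueError("at least one Zookeeper host is required")
    # The quorum is a comma-separated list of host:port pairs.
    quorum = ",".join(f"{host}:{port}" for host in zk_hosts)
    return (
        f"jdbc:hive2://{quorum}/{database}"
        f";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )


if __name__ == "__main__":
    print(build_zookeeper_hive_url(["zk0", "zk1", "zk2"]))
```

Replace the host names with the actual Zookeeper hosts of the cluster before using the resulting URL.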


In the sample URL, the portion hive2://<zk0>:2181,<zk1>:2181,<zk2>:2181 is the Zookeeper quorum, and the portion after the first semicolon (serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2) is the extra information. These two parts must be supplied to Infoworks separately so that the URL is assembled properly. Set the following properties on the Admin page:


hiveConnectionExtraProperties=serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2


hive=hive2://<zk0>:2181,<zk1>:2181,<zk2>:2181
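
The split described above can be sketched in Python as follows. This is only an illustration of how a full Zookeeper JDBC URL maps onto the two property values. The function name is hypothetical, and the sketch makes no claim about how Infoworks itself assembles the URL internally.

```python
def split_hive_zookeeper_url(jdbc_url):
    """Split a Zookeeper-based Hive JDBC URL into the two property values.

    Hypothetical helper for illustration only. Returns a dict keyed by
    the property names used on the Admin page.
    """
    jdbc_prefix = "jdbc:"
    scheme = "hive2://"
    if not jdbc_url.startswith(jdbc_prefix + scheme):
        raise ValueError("expected a URL starting with jdbc:hive2://")

    # Drop the leading "jdbc:" and split at the first semicolon:
    # everything after it is the extra connection information.
    base, _, extra = jdbc_url[len(jdbc_prefix):].partition(";")

    # Keep only the quorum (host:port list), dropping "/<database>".
    quorum = base[len(scheme):].split("/", 1)[0]

    return {
        "hive": scheme + quorum,
        "hiveConnectionExtraProperties": extra,
    }


if __name__ == "__main__":
    url = ("jdbc:hive2://zk0:2181,zk1:2181,zk2:2181/default"
           ";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2")
    for name, value in split_hive_zookeeper_url(url).items():
        print(f"{name}={value}")
```

The printed lines have the same shape as the two property settings shown above, with zk0 through zk2 standing in for the real Zookeeper hosts.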


These settings allow IWX to build the correct URL. Please see the screenshot below.


IWX versions: 2.4.x and 2.5.x