How to Investigate a Query Time Increase in Impala

In one of my projects we have a Kibana dashboard with all the charts we've built that show us interesting data on the Impala queries from the last 14 days.

One of the charts in the dashboard shows the 75th, 90th and 95th percentiles of the queries' duration. Thanks to this chart, a few weeks ago we noticed a sudden jump in query duration over the previous 2-3 days. We have another chart showing the number of exceptions per hour, and we saw a correlated jump in that chart too.
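For readers who want to reproduce the percentile view outside Kibana, here is a minimal sketch in Python. The function name and the sample durations are made up for illustration; the dashboard itself computes these values in Kibana, not with this code.

```python
import statistics

def duration_percentiles(durations_ms, points=(75, 90, 95)):
    """Return {percentile: value} for a list of query durations."""
    # quantiles(n=100) returns the 99 cut points p1..p99 of the data.
    cuts = statistics.quantiles(durations_ms, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in points}

# Hypothetical durations in milliseconds, not real measurements.
sample_durations = list(range(1, 101))
percentiles = duration_percentiles(sample_durations)
```

Comparing these values for the last 2-3 days against the days before them is what makes a jump like the one described above visible.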

At that moment we knew we had a problem. Now it was time for a little investigation.

Diagnosing The Problem

We examined the most common exceptions among all the ones we got in those 2-3 days and we found something interesting.

The main exception was "backend impala daemon is over its memory limit". You get that exception when a query needs a certain Impala daemon for its execution but that specific daemon is at 100% memory usage. By the way, this exception doesn't tell you which daemon it is.

The next most common exceptions were "unreachable impalad(s): X, Y, Z", which you get when the statestore's health check to certain daemons is negative. In those exceptions you can see which daemons are unreachable. We noticed that the same 3 daemons appeared in those exceptions over and over again.
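Spotting the recurring daemons by eye works, but it can also be counted. The sketch below assumes the exception text contains the phrase "unreachable impalad(s):" followed by a comma-separated host list, as quoted above; the function name and host names are hypothetical.

```python
import re
from collections import Counter

UNREACHABLE = re.compile(r"unreachable impalad\(s\):\s*(.+)", re.IGNORECASE)

def count_unreachable(messages):
    """Count how often each daemon is named in 'unreachable impalad(s)' errors."""
    counts = Counter()
    for message in messages:
        match = UNREACHABLE.search(message)
        if not match:
            continue  # some other exception, e.g. a memory limit error
        hosts = [h.strip() for h in match.group(1).split(",") if h.strip()]
        counts.update(hosts)
    return counts

# Hypothetical messages for illustration only.
example_messages = [
    "Cancelled due to unreachable impalad(s): node-a:22000, node-b:22000",
    "Cancelled due to unreachable impalad(s): node-a:22000",
    "backend impala daemon is over its memory limit",
]
example_counts = count_unreachable(example_messages)
```

A call to `example_counts.most_common(3)` then lists the daemons that show up most often.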

Then we checked those daemons in Cloudera Manager and we saw that their memory usage was almost 100%. The first thing we did was reset the 3 daemons. It didn't work; their memory usage quickly jumped to 100% again.

What could be the problem? We decided to analyze the queries from the last 7 days to see if maybe there was a difference between the last 2-3 days and the days before them.

Analyzing The Queries

That's an interesting process. First of all, I need to say that most of our Impala queries are not ones that an analyst writes and submits. Most of the queries are generated by BI tools or automatic alert systems. It means that we can easily check if there is something different by looking at the queries' templates.

So that's what we did. We extracted the templates of the queries from the last 7 days and performed a simple group-by count. The point was to see what the most common templates were in the past 2-3 days compared to the days before them.

And just as we suspected, we found a query template that appeared about 10,000 times in the past 3 days, compared to 150 times in the 4 days before them.
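The template extraction and group-by count can be sketched roughly as follows. This is not the exact code we ran: the normalization rules (replace string and numeric literals with `?`), the function names and the `min_ratio` threshold are assumptions made for the example.

```python
import re
from collections import Counter

def to_template(sql):
    """Reduce a query to its template by masking literals."""
    masked = re.sub(r"'(?:[^']|'')*'", "?", sql)         # string literals
    masked = re.sub(r"\b\d+(?:\.\d+)?\b", "?", masked)   # numeric literals
    return re.sub(r"\s+", " ", masked).strip().lower()

def template_spikes(recent_queries, baseline_queries, min_ratio=10.0):
    """Return (template, recent_count, baseline_count) for templates that spiked."""
    recent = Counter(to_template(q) for q in recent_queries)
    baseline = Counter(to_template(q) for q in baseline_queries)
    spikes = []
    for template, count in recent.most_common():
        before = baseline.get(template, 0)
        # max(before, 1) keeps brand-new templates from dividing by zero.
        if count >= min_ratio * max(before, 1):
            spikes.append((template, count, before))
    return spikes
```

Feeding it the queries from the last 2-3 days as `recent_queries` and the earlier days as `baseline_queries` surfaces templates whose frequency jumped, which is how a 150-to-10,000 change stands out.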

Then we asked ourselves, what does this query template have to do with the 3 Impala daemons that keep reaching 100% memory usage?

The Hotspotting

We looked at the query template and we saw a very long query with a lot of LIKE operators and CONCAT()s. It looked like this:

[Image: query template]

That's a really inefficient way of using the LIKE operator, and that's kind of a heavy query, but still it doesn't explain the 3 daemons issue.

And then we checked the table in the query and we saw something weird. The table size was about 100 MB, less than the size of an HDFS block.

We had an idea what caused the memory explosion in those 3 daemons.

Impala leverages data locality, so we guessed that the 3 replicas of the table's single HDFS block were stored on the exact same 3 daemons.

So with a simple hadoop fsck {path} -files -blocks -locations we found the locations of the block replicas, and it confirmed our assumption.
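If the fsck output is long, the replica hosts can be pulled out with a few lines of Python. This is only a sketch: it assumes each block line contains `blk_` followed by a bracketed list of `ip:port` addresses, which is what recent Hadoop 2.x versions print, and the exact layout may differ in your version. The sample text, path and addresses are invented.

```python
import re
from collections import Counter

REPLICA_ADDR = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):\d+")

def replica_hosts(fsck_output):
    """Count block replicas per datanode IP in fsck -files -blocks -locations output."""
    hosts = Counter()
    for line in fsck_output.splitlines():
        if "blk_" not in line:
            continue  # not a block line
        # Only inspect the bracketed location list after the block id.
        _, _, locations = line.partition("[")
        hosts.update(REPLICA_ADDR.findall(locations))
    return hosts

# Invented sample, shaped like the assumed output format.
SAMPLE = """\
/hypothetical/path/small_table/part-0 104857600 bytes, 1 block(s):  OK
0. BP-1-10.0.0.1-1:blk_1073741825_1001 len=104857600 Live_repl=3 [DatanodeInfoWithStorage[10.0.0.11:50010,DS-a,DISK], DatanodeInfoWithStorage[10.0.0.12:50010,DS-b,DISK], DatanodeInfoWithStorage[10.0.0.13:50010,DS-c,DISK]]
"""
sample_hosts = replica_hosts(SAMPLE)
```

If the hosts returned here match the daemons that keep running out of memory, the locality guess is confirmed.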

Thousands of queries (with the template described above) were executed only on those 3 Impala daemons, to leverage data locality, and caused the memory usage explosion. That's hotspotting.
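The arithmetic behind the hotspot is simple enough to sketch. The model below is a deliberate simplification: it assumes every query scans the table on a daemon that holds a local replica, spreads queries evenly across those daemons, and ignores coordinator work. The 128 MB block size is the common HDFS default, and the 30-daemon cluster size is a made-up number.

```python
import math

def locality_load(num_queries, table_bytes, block_bytes, replication, num_daemons):
    """Estimate how many queries each replica-holding daemon receives."""
    blocks = max(1, math.ceil(table_bytes / block_bytes))
    # Upper bound on distinct daemons that can hold a local replica.
    local_daemons = min(num_daemons, blocks * replication)
    return {
        "blocks": blocks,
        "local_daemons": local_daemons,
        "queries_per_local_daemon": num_queries / local_daemons,
        "queries_per_daemon_if_spread": num_queries / num_daemons,
    }

MB = 1024 ** 2
estimate = locality_load(
    num_queries=10_000,
    table_bytes=100 * MB,   # table size from the incident
    block_bytes=128 * MB,   # assumed HDFS block size
    replication=3,
    num_daemons=30,         # hypothetical cluster size
)
```

With these inputs the table fits in one block, so only 3 daemons are local to it and each takes roughly a third of the 10,000 queries, about ten times what an even spread over 30 daemons would give.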

The Solution

We moved that table to an RDBMS and that solved the problem. We could also have increased the replication factor of this file, but we thought it's bad practice because, in our opinion, Hadoop is not meant for such small tables.

Conclusions and Improvements

We had 3 conclusions/improvements from that incident:

  • We created a new chart in Cloudera Manager that shows us the memory usage per Impala daemon and we placed it in the Impala dashboard. That way we can identify daemons with relatively high memory usage and diagnose the problem earlier
  • Analyzing the queries in order to investigate a problem can give you a really good clue about what's going on
  • Small and frequently-queried tables shouldn't be stored in HDFS. It'll cause hotspotting. Don't get me wrong, we have many small tables but they're not queried that frequently (10k queries in 2-3 days). And if you choose to store them in HDFS make sure the replication factor is high enough

Original post can be found here.

Siddharth Garg
Software Development Engineer