How to do Indexing in MongoDB with Elastic Search? Part 2

This is the second part of our article on indexing in MongoDB with Elastic Search. This time we look at Elastic Search itself.

ElasticSearch

I just want to note that this post is only a tiny, simple example of what you can achieve with Elastic Search. There are books written on it, so I don't want you to think Elastic Search is useful only for implementing autocomplete inputs. I just find it an easy-to-understand example of how Elastic Search can help with complex searches that MongoDB cannot provide.

The secondary purpose of the post is to show how you can import your existing MongoDB documents into full-text indexed documents in ElasticSearch. Again, the autocomplete example is small enough to be explained in one post on this topic. If you find the text indexing world interesting, please go ahead and read more about ElasticSearch (ES from now on) and the huge set of features it has.

I'm not going to explain here how to install ES, since the process is quite simple. Since ES is built on Java, just make sure you have Java installed and the JAVA_HOME variable set. Once you have ES installed, this is the overall process we'll follow:

  • Create the index for our documents.
  • Import our MongoDB collection into ES with a tool called mongo-connector.
  • Migrate the index created by mongo-connector in ES to the index we created in step 1.
  • Try out our new index and see how documents keep being indexed while we keep mongo-connector running.


Creating the ES index

So how do we create an index that performs better than the built-in MongoDB text index? What do we need to configure in ES? We'll have to define what ES calls the analysis chain. Simply put, this is the pipeline that each document we insert into the index goes through in order to be indexed.

An analysis chain is formed by analyzers. Analyzers are filters that take the document, analyze and modify it, and pass it on to the next one. For example, there might be an analyzer to remove the so-called stop words, which are very common words that do not provide any useful information for indexing, like "the" or "and".

Analyzers are composed of three functions: character filter, tokenizer and token filter. The first one is in charge of cleaning up the string before it's tokenized, for example by stripping HTML tags. The second one is responsible for splitting it into terms, for example by splitting the string on spaces. The last one's job is to modify terms to suit the purpose of the index, for example by removing stop words or lowercasing all the terms.
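The three stages can be mimicked in a few lines of plain Python. This is only a rough sketch of the idea, not how ES implements it: the function names and the tiny stop word list are made up for illustration.

```python
import re

# Tiny illustrative stop word list (an assumption, not the list ES ships with).
STOP_WORDS = {"the", "and", "a", "of"}


def char_filter(text: str) -> str:
    """Character filter: strip HTML tags before the string is tokenized."""
    return re.sub(r"<[^>]+>", "", text)


def tokenizer(text: str) -> list[str]:
    """Tokenizer: split the cleaned string into terms on whitespace."""
    return text.split()


def token_filter(tokens: list[str]) -> list[str]:
    """Token filter: lowercase every term, then drop the stop words."""
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in STOP_WORDS]


def analyze(text: str) -> list[str]:
    """Run the three stages in order, as an analysis chain would."""
    return token_filter(tokenizer(char_filter(text)))


print(analyze("<p>The Blueberry and the <b>Muffin</b></p>"))
# ['blueberry', 'muffin']
```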

ES provides different analyzers which serve as a starting point for creating custom analyzers that better suit the needs of a given index. One of the alternatives provided by ES is called the edge_ngrams analyzer. To understand what edge n-grams are, we first need to understand what n-grams are. As the n-gram Wikipedia page points out:

An n-gram is a contiguous sequence of n items from a given sequence of text or speech

So let's say you have the word blueberry, then the 1-grams or unigrams will be:

mongodb_8.JPG


Increasing n by 1, we get the bigrams of blueberry:

mongodb_9.JPG


And I guess you know how to build the list of trigrams and 4-grams and so on.
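For any n, the list can be generated with a short helper. A minimal sketch at character level; the function name ngrams is just an illustrative choice.

```python
def ngrams(word: str, n: int) -> list[str]:
    """Return every contiguous run of n characters in word, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(ngrams("blueberry", 1))
# ['b', 'l', 'u', 'e', 'b', 'e', 'r', 'r', 'y']
print(ngrams("blueberry", 2))
# ['bl', 'lu', 'ue', 'eb', 'be', 'er', 'rr', 'ry']
print(ngrams("blueberry", 3))
# ['blu', 'lue', 'ueb', 'ebe', 'ber', 'err', 'rry']
```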

Now we can see what edge n-grams are. According to the ES documentation:

Edge n-grams are anchored to the beginning of the word

Which means that for blueberry, the edge n-grams will be:

mongodb_10.JPG
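In other words, the edge n-grams of a word are its prefixes. A small sketch in Python; the parameter names min_gram and max_gram mirror the ES setting names, while the function itself is only an illustration.

```python
def edge_ngrams(word: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    """Return the prefixes of word whose length lies between min_gram and max_gram."""
    longest = min(len(word), max_gram)
    return [word[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("blueberry"))
# ['b', 'bl', 'blu', 'blue', 'blueb', 'bluebe', 'blueber', 'blueberr', 'blueberry']
```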


See where we are going with this? If you have the word blueberry indexed with its edge n-grams, you can easily create an autocomplete search module. If a user types b, it will match; if the user types bl, it will match; if the user types bla, it won't match anymore and the autocomplete option will disappear.
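To make the matching behaviour concrete, here is a toy autocomplete built on an in-memory dictionary instead of ES. The titles and the helper names build_index and suggest are invented for the example.

```python
from collections import defaultdict


def build_index(titles):
    """Map every prefix (edge n-gram) of every word to the titles containing it."""
    index = defaultdict(set)
    for title in titles:
        for word in title.lower().split():
            for n in range(1, len(word) + 1):
                index[word[:n]].add(title)
    return index


def suggest(index, typed):
    """Return the titles whose words start with what the user typed so far."""
    return sorted(index.get(typed.lower(), set()))


index = build_index(["Blueberry muffins", "Blue cheese"])
print(suggest(index, "b"))      # ['Blue cheese', 'Blueberry muffins']
print(suggest(index, "bl"))     # ['Blue cheese', 'Blueberry muffins']
print(suggest(index, "blueb"))  # ['Blueberry muffins']
print(suggest(index, "bla"))    # [] (no indexed word starts with "bla")
```

The lookup is a plain dictionary hit on the typed string, which is why indexing the prefixes up front makes suggestions cheap at query time.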

So this edge n-gram thing should definitely be part of our index, and this is how we'll define it:

mongodb_11.JPG


So with this JSON object we're defining a token filter (filter) called autocomplete_filter. And we're saying that it will be an edge_ngram filter which will produce from 3-grams up to 20-grams. The reason I used 3 as the minimum is that for very big databases, having unigrams would slow down performance a lot, since lots of documents would match the search. That's why many websites that have an autocomplete function ask users to type at least three characters before they suggest alternatives.

Now that we have our token filter defined, we need to define our custom analyzer:

mongodb_12.JPG


Here we define a custom analyzer called autocomplete. We tell ES that it will be a custom analyzer that uses the standard tokenizer, and we set two filtering steps: lowercase (which is self-explanatory) and, after that, our custom autocomplete_filter.
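Put together, the token filter and the analyzer described above could be written as a single settings body. The sketch below builds it as a Python dict, assuming the usual ES settings.analysis layout; the names and values come from the text, but the exact JSON used in the original screenshots may differ.

```python
import json

# Assumed layout: filter and analyzer both live under settings.analysis.
analysis_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 3,    # shortest prefix that gets indexed
                    "max_gram": 20,   # longest prefix that gets indexed
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Filters run in order: lowercase first, then edge n-grams.
                    "filter": ["lowercase", "autocomplete_filter"],
                }
            },
        }
    }
}

print(json.dumps(analysis_settings, indent=2))
```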

Now that we have defined the filter and the analyzer, let's create the index. Grab a console and execute the following curl command:

mongodb_13.JPG

The fulltext_opt in the endpoint URL tells ES to create a new index with that name. The reason I chose that name is that our MongoDB collection is named fulltext, and when we import it into ES for the first time a fulltext index will be created automatically. We'll later move all the documents from fulltext to the optimized fulltext_opt index.
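For readers who prefer Python over curl, roughly the same request can be prepared with the standard library. This is a sketch under assumptions: the address http://localhost:9200 is the ES default and is not stated in the post, and build_create_index_request is a hypothetical helper. The request is only built here; the commented lines show how it would be sent to a running node.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: default address of a local ES node


def build_create_index_request(index_name: str, body: dict) -> urllib.request.Request:
    """Prepare a PUT request that creates index_name with the given settings body."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Same filter and analyzer as described in the text (assumed JSON layout).
body = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {"type": "edge_ngram", "min_gram": 3, "max_gram": 20}
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"],
                }
            },
        }
    }
}

request = build_create_index_request("fulltext_opt", body)
print(request.get_method(), request.full_url)
# PUT http://localhost:9200/fulltext_opt

# To send it against a running ES node:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```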

The last thing we have to do in our fulltext_opt index is create the mappings. Mappings are just groups of documents. We'll create a mapping called articles and define the properties title and content on it:

mongodb_14.JPG


You can see that we used our autocomplete analyzer for the title property only. Since we're supposedly using this for an autocomplete function, it makes no sense to index the article content with it (unless you'd like to suggest article content to the user, which would be weird).
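As an illustration, a mapping of that shape could look like the Python dict below. This is an assumption-laden sketch rather than the exact body from the screenshot: the field type "string" matches the older ES releases that still had named mappings, while newer releases use "text" and have dropped named mapping types altogether.

```python
import json

# Assumed shape of the "articles" mapping; only title uses the custom analyzer.
articles_mapping = {
    "articles": {
        "properties": {
            "title": {"type": "string", "analyzer": "autocomplete"},
            "content": {"type": "string"},  # indexed with the default analyzer
        }
    }
}

print(json.dumps(articles_mapping, indent=2))
```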

The acknowledged: true response means our index was successfully created and the mappings added.

This is how you can create the index.

Original post can be found here.


Siddharth Garg
Software Development Engineer
