How to load data from GCS to BigQuery?

Using Cloud Functions or Cloud Composer to move data from GCS to native BigQuery tables is a viable approach. We will focus on the usage of Cloud Functions here.

In our project, once the delta is available in GCS, the data needs to be loaded to BigQuery. This part depends on how you implement your use case. It depends on whether you need to transform the data before loading it to BigQuery and whether you have an append-only data model or want to run DML operations on it.

The range of options goes from using external BigQuery tables (so no more loading of data to BigQuery native storage), over using load jobs run through a Cloud Function, using an orchestration tool like Cloud Composer, or leveraging heavy-weight distributed compute frameworks like Spark or Beam/Dataflow.

Generally speaking, it makes sense to use the last option only if you need to perform transformations on data before loading it into BigQuery. Otherwise, using Cloud Functions or Cloud Composer to move data from GCS to native BigQuery tables is a viable approach as well. We will focus on the usage of Cloud Functions here.

A Cloud Function is a serverless construct, i.e. you don't need to manage any infrastructure for it. Instead, when the triggering event is fired (in our case a new file arrives in a GCS bucket) it will run a piece of code. This piece of code can be a simple BigQuery load job that will load the contents of the related file from GCS to a BigQuery table in append-only mode.

If you want to replace truncate & load scripts because eventually you want to have a copy of a relational database table, then the DML functionality offered by BigQuery can be used out of the box. To do this, you would first load the delta file to a staging table in BigQuery and then run a Merge operation between the target table and the staging table. The lifecycle of such a Cloud Function would look as follows:

[Figure: delta file BigQuery]

It's important to point out that the Merge operation works as an Upsert, i.e. when there is no DML operation column in the record in the delta file that says what kind of DML operation has to happen with a record (like there is, for example, in a CDC stream), then only updates and inserts will be executed on the BigQuery target table. If you need to delete records, you will need to provide information on the type of DML operation as part of the delta file on a per-record basis.
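To make the difference between a plain Upsert and a delete-aware Merge concrete, here is a minimal sketch that only builds the MERGE statement as a string. The table names, the key column `id`, the operation column `op` and its value 'D' for deletes are all assumptions made for illustration; a real delta file may encode the operation type differently.

```python
def build_merge_sql(target, staging, key, columns, op_column=None):
    """Return a BigQuery MERGE statement that upserts staging rows into target.

    If op_column is given (hypothetical convention: value 'D' marks a delete),
    matching target rows are deleted and delete records are never inserted.
    Without op_column the statement is a plain upsert.
    """
    all_columns = [key] + list(columns)
    assignments = ", ".join(f"{c} = S.{c}" for c in columns)
    insert_cols = ", ".join(all_columns)
    insert_vals = ", ".join(f"S.{c}" for c in all_columns)

    lines = [
        f"MERGE `{target}` T",
        f"USING `{staging}` S",
        f"ON T.{key} = S.{key}",
    ]
    if op_column:
        # Must come before the generic WHEN MATCHED clause to take precedence.
        lines.append(f"WHEN MATCHED AND S.{op_column} = 'D' THEN DELETE")
    lines.append(f"WHEN MATCHED THEN UPDATE SET {assignments}")
    not_deleted = f" AND S.{op_column} != 'D'" if op_column else ""
    lines.append(
        f"WHEN NOT MATCHED{not_deleted} THEN "
        f"INSERT ({insert_cols}) VALUES ({insert_vals})"
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_merge_sql("my_project.my_dataset.customers",
                          "my_project.my_dataset.customers_staging",
                          "id", ["name", "city"], op_column="op"))
```

Leaving out `op_column` gives the Upsert behaviour described above: only updates and inserts reach the target table.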

However, Cloud Functions can hit limits if you want to run DML operations and need to ensure ordering on data. The issue you could face is that data arrives in multiple files at the same time and these files contain data for the same id. Cloud Functions kick off immediately when their triggering event happens.

If the file with a record for an id with a later point in time gets processed before a file with a record for the same id for an earlier point in time, you will end up with corrupted data which is out of order. If your ingestion mechanism to GCS can generate multiple files containing records for the same id at the same time, you should rather use sequential processing of the files on a time-based schedule, e.g. using Cloud Composer.

For example, you could run the code shown below using the Python operator and achieve DML functionality without data inconsistencies using sequential processing. If, on the other hand, your ingestion mechanism can ensure that files will contain only one record per id in a batch load (e.g. when you have daily or hourly batch loads of deltas), then you can use Cloud Functions with DML safely without worrying about data inconsistencies.
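The sequential processing described above can be sketched as follows. The sketch assumes, purely for illustration, that every file name carries a sortable timestamp such as `20240101T120000`; the regular expression and the `process_file` callback are hypothetical and would be replaced by your own naming convention and load logic.

```python
import re

# Hypothetical naming convention: <table>_<YYYYMMDD>T<HHMMSS>.csv
TIMESTAMP = re.compile(r"(\d{8}T\d{6})")


def in_arrival_order(blob_names):
    """Sort file names by the timestamp embedded in them, oldest first."""
    def timestamp_of(name):
        match = TIMESTAMP.search(name)
        if match is None:
            raise ValueError(f"no timestamp in file name: {name}")
        return match.group(1)

    return sorted(blob_names, key=timestamp_of)


def process_sequentially(blob_names, process_file):
    """Call process_file for each file, one at a time, oldest first."""
    processed = []
    for name in in_arrival_order(blob_names):
        process_file(name)  # e.g. load to staging, then merge into the target
        processed.append(name)
    return processed


if __name__ == "__main__":
    files = ["orders_20240102T000000.csv", "orders_20240101T120000.csv"]
    process_sequentially(files, print)
```

A function like `process_sequentially` could serve as the callable of a Python operator task that runs on a schedule, so that no two files for the same id are merged concurrently.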

Finally, it's important to point out that a Cloud Function triggered through a GCS event only guarantees at-least-once processing. This means that in some rare cases, it is possible that a function will fire twice for the same file being written to GCS. If your data load process is idempotent (i.e. you use Upserts/DMLs) then this is not a problem. If you load data in append-only mode, then it is best to use a SQL query based transformation inside BigQuery in order to provide consistent data and remove potential duplicates.
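One possible shape of such a deduplicating query is sketched below; it is an illustration, not necessarily the transformation used in the project. It assumes a record version is identified by a set of columns (here the hypothetical `id` and `updated_at`) and keeps exactly one row per combination, so a file loaded twice does not show up twice downstream.

```python
def build_dedup_sql(table, version_columns):
    """Return a query that keeps one row per combination of version_columns.

    version_columns (hypothetical, e.g. ["id", "updated_at"]) must identify a
    single record version; rows repeated by a double load share these values.
    """
    partition = ", ".join(version_columns)
    return (
        "SELECT * EXCEPT(row_num)\n"
        "FROM (\n"
        f"  SELECT *, ROW_NUMBER() OVER (PARTITION BY {partition}) AS row_num\n"
        f"  FROM `{table}`\n"
        ")\n"
        "WHERE row_num = 1"
    )


if __name__ == "__main__":
    print(build_dedup_sql("my_project.my_dataset.events", ["id", "updated_at"]))
```

The resulting statement could back a view or a scheduled query on top of the append-only table, leaving the raw loaded data untouched.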

Code example of the Cloud Function for loading a delta to BigQuery

Here is an example of what a Cloud Function can look like that loads a delta file to BigQuery in append mode:

[Figure: Cloud Function for loading a delta to BigQuery 1]
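As a rough sketch of such a function, assuming the `google-cloud-bigquery` client library, a CSV delta file with a header row, and schema autodetection: the table id is a hypothetical placeholder, and the function name `load_delta` is chosen here for illustration only.

```python
def gcs_uri(event):
    """Build the gs:// URI of the file that triggered the function."""
    return f"gs://{event['bucket']}/{event['name']}"


def load_delta(event, context, table_id="my_project.my_dataset.my_table"):
    """Entry point for a GCS-triggered function: append the new file to a table.

    table_id is a hypothetical placeholder; CSV format, header row and schema
    autodetection are assumptions about the delta file.
    """
    # Imported here so gcs_uri() stays usable without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        # Append mode: existing rows in the table are kept.
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = client.load_table_from_uri(
        gcs_uri(event), table_id, job_config=job_config
    )
    load_job.result()  # wait for the load job; raises if it failed
    return load_job.output_rows
```

The `event` dictionary is the payload of the GCS trigger; only its `bucket` and `name` fields are used here.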

It is also possible to extend the Cloud Function code to support DML functionality by adding the following function. In this case the file would be first loaded to the staging table in Truncate mode before the processMerge function will be called:

[Figure: Cloud Function code to support DML functionality 2]
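A minimal sketch of this two-step flow, again assuming the `google-cloud-bigquery` client library and a CSV delta file. The name `process_merge` mirrors the processMerge function mentioned above, but its signature here is an assumption; `merge_sql` stands for whatever MERGE statement you run between your target and staging tables.

```python
def process_merge(client, merge_sql):
    """Run a MERGE statement and wait until it has finished."""
    query_job = client.query(merge_sql)
    query_job.result()  # raises if the statement failed
    return query_job


def load_staging_then_merge(client, uri, staging_table, merge_sql):
    """Overwrite the staging table with the delta file, then merge it."""
    from google.cloud import bigquery  # needed only for the load job config

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,  # assumption: CSV deltas
        skip_leading_rows=1,
        autodetect=True,
        # Truncate mode: the staging table only ever holds the latest delta.
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    client.load_table_from_uri(uri, staging_table, job_config=job_config).result()
    return process_merge(client, merge_sql)
```

Passing the client in as a parameter keeps both functions easy to exercise with a stand-in object, without touching a real BigQuery project.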

To increase code flexibility we can also avoid hardcoding the table names. For that we extract them from the GCS file URI. This requires the GCS files to follow a specific naming convention; in this case they have the term prefix in the filename right before the table name:
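The exact file naming used in the project is not shown here, so the following sketch assumes a hypothetical layout in which the literal token `prefix_` is followed by the table name and then the file extension, e.g. `gs://my-bucket/deltas/20240101_prefix_customers.csv`. Adjust the pattern to your own convention.

```python
import re

# Hypothetical convention: ...prefix_<table name>.<extension>
TABLE_IN_URI = re.compile(r"prefix_(?P<table>\w+)\.\w+$")


def table_name_from_uri(uri):
    """Extract the table name that follows 'prefix_' in a GCS file URI."""
    match = TABLE_IN_URI.search(uri)
    if match is None:
        raise ValueError(f"URI does not follow the naming convention: {uri}")
    return match.group("table")


if __name__ == "__main__":
    print(table_name_from_uri("gs://my-bucket/deltas/20240101_prefix_customers.csv"))
```

Failing loudly on files that do not match the convention avoids silently loading data into the wrong table.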


Original post can be found here.

Siddharth Garg
Software Development Engineer
