Can We Use HDFS as Back-up Storage?

Have you ever thought of using something highly available for backup storage? I recently started to think about how I could implement a self-hosted, scalable, reliable backend infrastructure. Between 15 years of photos, my music, my family's computer backups and many important files, I have about 30TB of data I don't want to lose. With malware around, backup is the biggest problem of the digital age. I manage large infrastructure at work, and backups have been giving me nightmares for more than a decade. The more machines and data you get, the less "let's spawn a few servers and run rsync to back up all the stuff" works.

  • You need an almost infinite space.
  • You quickly become I/O bound as you run parallel backups on tens of servers.
  • Restoration is extremely slow if you need to restore multiple backups hosted on the same server.
  • It's easy to lose track of where you back up what, unless you start adding CNAMEs like backup.server.xxx.
  • Losing a backup server means you lose all your backups at once.
  • Adding multiple huge backup servers is damn expensive.
  • Schrödinger's backups: the condition of any backup is unknown until a restore is attempted.

While working on the problem, I first thought about moving my backups to Amazon S3 / Glacier or OVH Public Cloud Object Storage / Archive. Both solutions are interesting because they solve most of my problems:

  • Unlimited space, so I don't have to worry about scaling my servers.
  • Redundancy, so I don't have to fear losing my backups.
  • They run in the cloud, which means fewer I/O problems (in theory).
  • Restoration is faster (in theory).
  • The price is relatively cheap (about 1000$ / month for 100TB of live data).
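
To put that price in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the cost scales linearly with the amount of stored data and ignores request, transfer and retrieval fees, which real providers usually bill separately. The function names are made up for illustration.

```python
# Rough cost arithmetic based only on the figure quoted above
# (about 1000$ / month for 100TB of live data).
# Assumption: linear pricing, no request or egress fees.
PRICE_PER_MONTH_USD = 1000.0  # figure from the post
CAPACITY_TB = 100.0           # figure from the post


def monthly_cost_usd(data_tb: float) -> float:
    """Estimate the monthly bill for `data_tb` terabytes of live data."""
    price_per_tb = PRICE_PER_MONTH_USD / CAPACITY_TB
    return data_tb * price_per_tb


def yearly_cost_usd(data_tb: float) -> float:
    """Estimate the yearly bill for `data_tb` terabytes of live data."""
    return 12 * monthly_cost_usd(data_tb)


if __name__ == "__main__":
    # The 30TB mentioned at the start of the post.
    print(f"monthly: {monthly_cost_usd(30):.0f}$")
    print(f"yearly:  {yearly_cost_usd(30):.0f}$")
```

Under those assumptions, 30TB comes to roughly 300$ per month, or 3600$ per year.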

Unfortunately, there are also some blocking cons:

  • I didn't want to delegate my backups to a third party, because it implied encrypting EVERYTHING. Encryption implies a lot of CPU, and makes the backups much slower than a simple rsync. And don't tell me about encrypting multi-terabyte databases on the fly. It's insane.
  • You don't control the price. If your backup provider doubles their price, you just have to pay or rethink your whole backup policy, which might be even more expensive.
  • I/Os in Amazon S3 & friends are a joke when you need speed.

I started to have a look at various tools and ended up thinking about using an HDFS cluster as a backup backend.

  • HDFS works on a cluster, which means you don't have to think about filling this or that server anymore.
  • HDFS scales horizontally.
  • HDFS works great with big big files.
  • HDFS splits the big files in chunks, so storing a 10+TB database is easy.
  • HDFS works like object storage, so you can easily run mysqldump | xbstream -c | hdfs to store large MySQL databases.
  • Because you're running on a bunch of servers at the same time, you solve the I/O problems.
  • HDFS manages replication. No more lost backups because a single server crashes.
  • HDFS is perfect for JBOD. No more RAID, which costs money and I/Os.
  • You can use small machines with just a bunch of 4 to 6TB spinning disks and let the magic happen.
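
To make the chunking and replication points concrete, the following Python sketch estimates how many blocks a large file is split into and how much raw disk space it consumes. It assumes the HDFS defaults of 128 MiB blocks (dfs.blocksize) and 3 replicas (dfs.replication); a real cluster may be configured differently, and the function names are hypothetical.

```python
# Sketch: how HDFS chunking and replication translate into blocks and raw disk usage.
# Assumption: default settings, 128 MiB blocks and a replication factor of 3.
BLOCK_SIZE_BYTES = 128 * 1024**2
REPLICATION = 3
TIB = 1024**4


def block_count(file_size_bytes: int, block_size: int = BLOCK_SIZE_BYTES) -> int:
    """Number of blocks a file is split into (ceiling division)."""
    return -(-file_size_bytes // block_size)


def raw_storage_bytes(file_size_bytes: int, replication: int = REPLICATION) -> int:
    """Raw disk space used across the cluster once every block is replicated.

    The last block only occupies the bytes it actually holds, so the total
    is simply the file size multiplied by the replication factor.
    """
    return file_size_bytes * replication


if __name__ == "__main__":
    dump_size = 10 * TIB  # a 10 TiB database dump
    print("blocks:", block_count(dump_size))
    print("raw TiB:", raw_storage_bytes(dump_size) / TIB)
```

With these defaults, a 10 TiB dump becomes 81920 blocks spread across the data nodes and uses 30 TiB of raw disk, which is the price of surviving a server crash without RAID.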

Once again there are a few cons:

  • HDFS is not so good at managing a gazillion small files.
  • Unlike ZFS / rsnapshot, HDFS does not handle file deduplication natively (but space is cheap).
  • Complexity: you need a full HDFS cluster with name nodes, journal nodes, etc.
  • The HDFS client requires the whole Java stack, which you don't want to install everywhere.
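
The small files issue comes from the NameNode keeping the metadata for every file and every block in memory. The Python sketch below compares the same amount of data stored as one big file and as many tiny ones. The figure of about 150 bytes per object is a commonly quoted rule of thumb and is used here as an assumption, not a measured value. The block size is the HDFS default of 128 MiB.

```python
# Sketch: why a gazillion small files hurts the NameNode.
# Assumption: ~150 bytes of NameNode heap per file or block object (rule of thumb).
BLOCK_SIZE_BYTES = 128 * 1024**2  # HDFS default block size
BYTES_PER_OBJECT = 150            # rough rule of thumb, not a measurement
TIB = 1024**4


def namenode_objects(file_count: int, file_size_bytes: int) -> int:
    """Count metadata objects: one per file plus one per block of each file."""
    blocks_per_file = -(-file_size_bytes // BLOCK_SIZE_BYTES)  # ceiling division
    return file_count * (1 + blocks_per_file)


def namenode_memory_bytes(file_count: int, file_size_bytes: int) -> int:
    """Very rough NameNode heap estimate for the given set of files."""
    return namenode_objects(file_count, file_size_bytes) * BYTES_PER_OBJECT


if __name__ == "__main__":
    one_big = namenode_memory_bytes(1, TIB)                # 1 TiB as a single file
    many_small = namenode_memory_bytes(TIB // 4096, 4096)  # 1 TiB as 4 KiB files
    print(f"single file: {one_big / 1e6:.1f} MB of metadata")
    print(f"4 KiB files: {many_small / 1e9:.1f} GB of metadata")
```

Under these assumptions the same terabyte costs about a megabyte of NameNode memory as one file, and tens of gigabytes as 4 KiB files. Bundling small files into an archive before uploading is one way around this.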

Implementation

I started to work on a quick and dirty proof of concept to provide an HDFS-backed backup system.

  • It uses a lightweight HDFS client written in Go.
  • It manages backup rotation with variable retention (hourly / daily / weekly / monthly).
  • It runs parallel backups.
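
The post does not show how the rotation is implemented, so the following Python sketch is only one possible way to express a variable retention policy, not the tool's actual logic. For each tier it keeps the newest backup of each of the most recent N periods. The retention counts and function names are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical retention settings: number of distinct periods kept per tier.
RETENTION = {"hourly": 24, "daily": 7, "weekly": 4, "monthly": 12}


def _bucket(ts: datetime, tier: str):
    """Map a timestamp to the period it belongs to for the given tier."""
    if tier == "hourly":
        return (ts.year, ts.month, ts.day, ts.hour)
    if tier == "daily":
        return (ts.year, ts.month, ts.day)
    if tier == "weekly":
        year, week, _weekday = ts.isocalendar()
        return (year, week)
    if tier == "monthly":
        return (ts.year, ts.month)
    raise ValueError(f"unknown tier: {tier}")


def backups_to_keep(timestamps, retention=RETENTION):
    """Return the set of backup timestamps that the policy retains."""
    keep = set()
    newest_first = sorted(timestamps, reverse=True)
    for tier, count in retention.items():
        seen = set()
        for ts in newest_first:
            bucket = _bucket(ts, tier)
            if bucket in seen:
                continue  # a newer backup already covers this period
            if len(seen) == count:
                break     # enough periods kept for this tier
            seen.add(bucket)
            keep.add(ts)
    return keep


def backups_to_delete(timestamps, retention=RETENTION):
    """Return the backups that fall outside every retention tier."""
    return set(timestamps) - backups_to_keep(timestamps, retention)


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    hourly_backups = [start + timedelta(hours=i) for i in range(72)]
    policy = {"hourly": 2, "daily": 2}
    for ts in sorted(backups_to_keep(hourly_backups, policy)):
        print("keep", ts)
```

A backup kept by several tiers is only counted once, so the total number of retained backups is at most the sum of the per-tier counts.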

I started to test it on a small HDFS cluster:

  • 2 small 20$/month servers.
  • 4 * 4TB JBOD spinning disks.

For directories full of small files like /etc/, the throughput is about 30% slower than a simple rsync. For large files, the throughput is 20% faster than rsync because we're limited by the network. The good point: restoring a file is not about looking for a needle in a haystack anymore. All my prerequisites are satisfied. The bad point: complexity. Building even a small HDFS cluster is a bit overkill for your home backup. But for professional use, it works like a charm.

Original post can be found here.


Siddharth Garg
Software Development Engineer

