Data disaster recovery service update and planning input request

We are seeking input on disaster recovery services for data stored on our clusters. Over the last year, due to the increasing volume of data generated across MIT, the central IS&T TSM service has become unable to support full off-site mirroring of /orcd/ storage in a timely fashion. Off-site mirrors are now typically 2 to 3 months behind the current cluster state. For this reason, we are currently unable to offer disaster recovery backups on Pool or rental storage.

Data remains reasonably secure locally. We use RAID-Z3 storage, whose built-in redundancy protects against up to 3 simultaneous drive failures. Our operations team monitors this system closely and replaces failed drives rapidly so that redundancy is maintained at all times.

We are currently evaluating cost-effective options that can take the place of TSM. To help us plan future services, please email orcd-help@mit.edu with your expectations for off-site data mirroring. Specifically, we would love to hear:

  1. To what extent your data is unique (exists nowhere else).
  2. To what extent some of your data is already mirrored or backed up elsewhere.
  3. To what extent you independently mirror critical data to multiple locations, irrespective of services provided by ORCD.

Note: /home directories are smaller and are mirrored off-site every day, with archives retained for 14 days.