The Visual Genomics Centre operates and maintains an HPC storage environment to support bioinformatics research throughout Canada. Our environment is based on SAM-QFS, a high-performance storage and archiving system that provides both long-term retention and high availability. In addition, we use NetBackup for general backups of servers that do not need long-term archiving.


SAM-QFS consists of the Storage Archive Manager (SAM) and the Quick File System (QFS), which together provide data management and high performance. SAM is a hierarchical storage manager, meaning it stores data on different media (disk, tape) according to policy. QFS is a shared file system that supports up to 256 nodes and provides access to data at device-rated speeds.
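To make the hierarchical storage idea concrete, here is a minimal Python sketch of a disk cache sitting in front of archive media. It assumes a simplified model that is not taken from SAM-QFS internals: a file may be released from the disk cache only once it has at least one archive copy, release candidates are visited in the order files were first written, and reading a released ("offline") file stages it back. The names `HierarchicalStore`, `write`, `archive` and `read` are hypothetical and are not part of any SAM-QFS interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ManagedFile:
    """A file under hierarchical storage management (hypothetical model)."""
    name: str
    size_gb: float
    online: bool = True  # True while the data is resident on the disk cache
    archive_copies: set[str] = field(default_factory=set)  # media holding a copy


class HierarchicalStore:
    """Toy HSM: a disk cache in front of archive media.

    Assumptions (not SAM-QFS internals): a file may be released from the
    disk cache only once it has at least one archive copy, releasing walks
    files in the order they were first written, and reading an offline
    file stages it back onto the cache.
    """

    def __init__(self, cache_capacity_gb: float) -> None:
        self.cache_capacity_gb = cache_capacity_gb
        self.files: dict[str, ManagedFile] = {}

    def cache_used_gb(self) -> float:
        return sum(f.size_gb for f in self.files.values() if f.online)

    def write(self, name: str, size_gb: float) -> None:
        # New or modified data lands on the disk cache first; any earlier
        # archive copies are stale, so the file starts with none.
        self.files[name] = ManagedFile(name, size_gb)
        self._release_if_needed(keep=name)

    def archive(self, name: str, medium: str) -> None:
        self.files[name].archive_copies.add(medium)

    def read(self, name: str) -> str:
        f = self.files[name]
        if f.online:
            return "online"
        f.online = True  # stage the data back from an archive copy
        self._release_if_needed(keep=name)
        return "staged"

    def _release_if_needed(self, keep: str) -> None:
        # Release archived files (other than `keep`) until the cache fits.
        for f in self.files.values():
            if self.cache_used_gb() <= self.cache_capacity_gb:
                break
            if f.online and f.archive_copies and f.name != keep:
                f.online = False
```

For example, with a 10 GB cache, writing and archiving a 6 GB file and then writing a second 6 GB file releases the first one; a later `read` of the first file returns `"staged"` and brings it back online.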

Features & Benefits of SAM-QFS 

Data Protection
  • Multiple archive copies of each file are kept on different storage media, protecting against loss from corruption or media failure
  • SAM-Remote allows archive copies to be created on remote systems
  • Ability to restore files from older archive copies
Long Term Retention
  • Unused data can remain "offline" indefinitely and is staged back to disk automatically when needed
  • Import/Export tapes as needed
High Performance & Availability
  • Data written and accessed at device-rated speeds, enabling high application I/O rates
  • Fully redundant environment with dedicated infrastructure
  • Extremely fast disaster recovery
  • Additional storage media can be added without impacting availability
  • Ability to grow file systems
  • Support for up to 256 QFS nodes

VGC SAM-QFS Environment

The VGC SAM-QFS environment consists of master and failover metadata servers, a number of QFS clients, and storage media such as disk arrays and tape libraries. Our master metadata server is a Sun Fire V890 with four 1350 MHz UltraSPARC IV processors and 16 GB of RAM. Our failover server is a Sun Fire V880 with four 900 MHz UltraSPARC III processors and 8 GB of RAM. All hosts are connected via Fibre Channel to a storage area network (SAN), using Sun StorEdge Traffic Manager software (multipathed I/O) for redundancy.

Our primary disk cache for online files is a Sun StorEdge 6320 with 12 terabytes of high-performance Fibre Channel disks in a RAID 5 configuration. Copy 1 of our file archives is held on 40 terabytes of Sun StorEdge SATA arrays, followed by 424 terabytes (compressed) of tape provided by two StorageTek L700 libraries. Below is a general overview of our environment.

SAM-QFS Environment Overview


Data Archival

Our SAM-QFS policy creates three archive copies of each file, referred to as copy 1, copy 2 and copy 3. The interval at which these copies are created is controlled by a predefined policy set for each file system. Copy 1 resides on the SATA arrays and is usually created shortly after a file has been created or modified on the primary disk cache. Copy 2 is usually created within 4 to 12 hours (depending on policy) and resides on tape in a StorageTek L700 library.
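The copy schedule can be modelled as a per-copy delay measured from a file's last modification. The Python sketch below assumes that model; the 10-minute delay for copy 1 and the 12-hour delay for copy 3 are hypothetical, and copy 2 uses the lower end of the 4 to 12 hour range. `ArchivePolicy`, `copies_due` and the medium labels are illustrative names, not SAM-QFS configuration syntax.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ArchivePolicy:
    """Hypothetical per-file-system archiving policy.

    Maps each copy number to (medium, delay): the copy is due once the file
    has gone unmodified for at least `delay`.
    """
    copies: dict[int, tuple[str, timedelta]]

    def due_times(self, modified: datetime) -> dict[int, datetime]:
        """When each copy becomes due, given the file's last modification."""
        return {n: modified + delay for n, (_, delay) in self.copies.items()}

    def copies_due(self, modified: datetime, now: datetime) -> list[int]:
        """Copy numbers that should have been made by `now`."""
        return sorted(n for n, due in self.due_times(modified).items() if due <= now)


# Illustrative values only; real intervals are set per file system.
EXAMPLE_POLICY = ArchivePolicy({
    1: ("SATA disk archive", timedelta(minutes=10)),  # hypothetical delay
    2: ("L700 tape", timedelta(hours=4)),             # low end of 4-12 h
    3: ("remote L700e tape", timedelta(hours=12)),    # hypothetical delay
})

if __name__ == "__main__":
    modified = datetime(2024, 1, 1, 9, 0)
    for hours in (1, 5, 12):
        now = modified + timedelta(hours=hours)
        print(f"{hours:>2} h after modification: copies due {EXAMPLE_POLICY.copies_due(modified, now)}")
```

Run as a script, this reports `[1]` after 1 hour, `[1, 2]` after 5 hours and `[1, 2, 3]` after 12 hours.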

For added data security, archive copy 3 resides in a remote data center on a Fibre Channel-attached StorageTek L700e library, accessed over a virtual private network (VPN). This ensures that we always have an archive copy to restore from, even in the event of a major data center or building disaster. The SAM-QFS Architecture diagram below provides a general overview of the stages in which archive copies are created.
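The value of the off-site copy can be illustrated with a small restore-selection sketch in Python. It assumes that the lowest-numbered usable copy is preferred (SATA disk, then local tape, then remote tape) and that a copy is unusable when its site is down; `COPY_SITES` and `choose_restore_copy` are hypothetical names, and this is not a description of how SAM-QFS itself selects a copy to stage from.

```python
from __future__ import annotations

# Hypothetical mapping of archive copy number to the site holding it,
# following the layout described above (copies 1 and 2 local, copy 3 remote).
COPY_SITES = {1: "primary", 2: "primary", 3: "remote"}


def choose_restore_copy(existing: set[int], unavailable_sites: set[str]) -> int | None:
    """Return the copy number to restore from, or None if no copy is usable.

    Assumption: copies are tried in copy-number order (SATA disk, then local
    tape, then remote tape), skipping any copy whose site is unavailable.
    """
    for copy in sorted(existing):
        if COPY_SITES[copy] not in unavailable_sites:
            return copy
    return None


if __name__ == "__main__":
    all_copies = {1, 2, 3}
    print(choose_restore_copy(all_copies, set()))        # normal operation
    print(choose_restore_copy(all_copies, {"primary"}))  # primary site lost
```

In normal operation this picks copy 1 and prints `1`; with the whole primary data center unavailable it falls back to the remote copy and prints `3`.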


SAM-QFS Architecture




