Data Access Alliance

The DAA is a voluntary and open organization of research-data-storage providers and is being developed in collaboration with the Dataverse and Globus projects as a pilot of the National Data Service. The DAA has two aims:

  • to facilitate global dissemination of, and compute access to, data sets.
  • to minimize the chance of data loss by replicating SBDB data sets.

The development of the DAA has been guided by Merce Crosas and Ian Foster. Meetings with advisors and directors of DAA centers are held quarterly. For more information about the DAA, please see our 2016 Nature Communications article.

DAA Centers

Although it is expected that DAA membership and architecture will evolve rapidly, in its current state the DAA framework already provides a global solution for data dissemination. DAA centers in Europe, Asia, North America and South America replicate the entire SBDB collection and provide local access to members of regional communities. There are four DAA centers:

  1. Harvard Medical School in the USA - coordinating director: Dr. Piotr Sliz, technical lead: Pete Meyer

  2. Uppsala University in Sweden - coordinating director: Philippe Maia

  3. Shanghai Institutes for Biological Sciences in China - coordinating director: Ming Lei

  4. Institut Pasteur de Montevideo in Uruguay - coordinating director: Alejandro Buschiazzo

As a secondary service, DAA centers can provide local, direct access to data sets for their institutional research groups. For example, Harvard Medical School hosts the entire collection and provides direct access to all data from its computing center.

DAA Satellites

The DAA infrastructure is further extended by the DAA satellites, which replicate subsets of the SBDB collection in their local storage for direct access by members of individual institutions. This mode of participation provides an attractive option for research institutions to develop local archives of all primary data generated by the local community.

We plan to deploy the satellite distribution mode across all 99 institutions that currently participate in the SBGrid Consortium.

Currently, two satellite sites are in operation:

  • NE-CAT (Northeastern Collaborative Access Team; sector 24-ID) synchrotron beamline at the Advanced Photon Source, in Argonne, IL, replicates all SBDB data sets that originate from NE-CAT beamlines and makes them available to beamline staff and users.

  • Yale University replicates all data sets from Yale laboratories on its institutional storage and makes them accessible to structural biology workstations through the Network File System.
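
The difference between a center and a satellite can be illustrated with a minimal sketch. It is not the DAA's actual replication software: the `DataSet` record, its fields, the identifiers and sizes, and both selection functions are hypothetical and exist only to show a center taking the whole collection while a satellite takes the subset originating from one site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    # All fields are hypothetical; they do not reflect the real SBDB schema.
    dataset_id: str   # made-up identifier
    origin: str       # made-up label for the originating site
    size_gb: float    # made-up size, used only to total the storage needed


def select_for_center(catalog):
    """A center replicates the entire collection."""
    return list(catalog)


def select_for_satellite(catalog, origin):
    """A satellite replicates only the data sets that originate from one site."""
    return [d for d in catalog if d.origin == origin]


# Illustrative catalog; none of these entries are real SBDB records.
catalog = [
    DataSet("ds-001", "NE-CAT", 4.0),
    DataSet("ds-002", "Yale", 2.5),
    DataSet("ds-003", "NE-CAT", 6.0),
    DataSet("ds-004", "other", 1.5),
]

necat_subset = select_for_satellite(catalog, "NE-CAT")
necat_storage_gb = sum(d.size_gb for d in necat_subset)
```

In this sketch a satellite's storage requirement grows only with the data its own community produces, which is what makes the satellite mode practical for institutions that cannot host the full collection.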

We anticipate that, as research storage infrastructure catches up with the capacities required to archive larger collections of diffraction data sets, some DAA satellites will elect to replicate a larger fraction of SBDB archives and make them available to the general community.

Joining

If you are interested in joining the DAA, please contact us.