AF-M predictions accompanying the manuscript: Predictomes: A classifier-curated database of AlphaFold-modeled protein-protein interactions


Data DOI: 10.15785/SBGRID/1155 | ID: 1155

Walter Laboratory, Harvard Medical School

Release Date: 4 Feb 2025

Data Access Instructions

1. If this dataset is locally available, it should be accessible at /programs/datagrid/1155

2. To download this dataset, please run the following command from your Terminal on a Linux or OS X workstation:

'rsync -av rsync://data.sbgrid.org/10.15785/SBGRID/1155 .' (Harvard Medical School, USA)

Depending on your location, faster access may be available from a closer Tier 1 site:

'rsync -av rsync://sbgrid.icm.uu.se/10.15785/SBGRID/1155 .' (Uppsala University, Sweden)

'rsync -av rsync://sbgrid.pasteur.edu.uy/10.15785/SBGRID/1155 .' (Institut Pasteur de Montevideo, Uruguay)

'rsync -av rsync://sbgrid.ncpss.org/10.15785/SBGRID/1155 .' (Shanghai Institutes for Biological Sciences, China)

3. After the transfer is completed, please issue the following command to verify data integrity:

'cd 1155 ; shasum -c files.sha'
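Where shasum is not installed, the integrity check in step 3 can be approximated in Python. The following is a minimal sketch, not the SBGrid tooling. It assumes files.sha uses the usual shasum layout (hex digest, a space, a mode character, then the path relative to the dataset directory) and that the hash algorithm can be inferred from the digest length. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

# Hex digest length -> hashlib algorithm name. Inferring the algorithm from
# the digest length is an assumption about how files.sha was produced.
ALGO_BY_HEX_LEN = {40: "sha1", 56: "sha224", 64: "sha256", 96: "sha384", 128: "sha512"}


def parse_manifest_line(line):
    """Split one 'digest<space><mode>path' line; return (digest, path) or None."""
    line = line.rstrip("\n")
    if not line.strip():
        return None
    digest, sep, rest = line.partition(" ")
    if not sep or len(rest) < 2:
        return None
    # rest[0] is the mode character: ' ' (text) or '*' (binary).
    return digest.lower(), rest[1:]


def file_digest(path, algo, chunk_size=1 << 20):
    """Hash a file in chunks so large prediction archives are not read into memory at once."""
    h = hashlib.new(algo)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(root, manifest_name="files.sha"):
    """Return (path, status) pairs; status is OK, FAILED, MISSING, or UNSUPPORTED."""
    root = Path(root)
    results = []
    with open(root / manifest_name, "r", encoding="utf-8") as fh:
        for line in fh:
            parsed = parse_manifest_line(line)
            if parsed is None:
                continue
            expected, rel_path = parsed
            algo = ALGO_BY_HEX_LEN.get(len(expected))
            target = root / rel_path
            if algo is None:
                status = "UNSUPPORTED"
            elif not target.is_file():
                status = "MISSING"
            elif file_digest(target, algo) == expected:
                status = "OK"
            else:
                status = "FAILED"
            results.append((rel_path, status))
    return results
```

Calling verify_manifest("1155") from the directory that received the transfer returns one entry per manifest line; any status other than OK suggests re-running rsync for that file.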

Storage requirements: 446G
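Given the size of the dataset, it may be worth checking free disk space before starting the transfer. The sketch below is one way to do that with the Python standard library. It assumes "446G" means 446 GiB, which the listing does not state; if decimal gigabytes are meant, the check is slightly conservative. The has_room helper and the 5% safety margin are hypothetical choices.

```python
import shutil

# Assumption: "446G" is read as 446 GiB (2**30 bytes each).
REQUIRED_BYTES = 446 * 2**30


def has_room(path=".", required=REQUIRED_BYTES, margin=0.05):
    """Return True if the filesystem holding `path` has `required` bytes free plus a margin."""
    free = shutil.disk_usage(path).free
    return free >= required * (1 + margin)


if __name__ == "__main__":
    # Check the current directory, which is where the rsync commands write to.
    print("enough space" if has_room() else "not enough space for the dataset")
```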

Biological Sample:

N/A

Dataset Type:

Structural Model

Subject Composition:

Protein

Collection Facility:

N/A

Data Creation Date:

1 Jul 2024

Related Datasets:

None


Cite this Dataset

Schmid, EW; Walter, J. 2025. "N/A.", SBGrid Data Bank, V1, https://doi.org/10.15785/SBGRID/1155.


Dataset Description

This dataset contains all AlphaFold-Multimer (AF-M) v2.3 pairwise structure predictions accompanying the publication "Predictomes: A classifier-curated database of AlphaFold-modeled protein-protein interactions". It includes the prediction pairs used to train random forest classifiers, including SPOC; the pairs used for 30 ranking experiments; all pairs that belong to the genome maintenance matrix on predictomes.org; and three proteome-wide in silico interaction screens conducted with human DONSON, human STK19, and human USP37. All pairs were generated with ColabFold v1.5.2. All predictions used the AF-M multimer version 3 weights (models 1, 2, and 4) with 3 recycles, templates enabled, 1 ensemble, no dropout, and no AMBER relaxation. The multiple sequence alignments (MSAs; unpaired + paired) supplied to AF-M were generated by the MMseqs2 server using default settings. The total sequence length per run was generally capped at 3,600 amino acids to avoid exhausting GPU memory.
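The settings and the length cap described above can be summarized in a short sketch. This is illustrative only and is not the authors' pipeline: the field names, the helper functions, and the choice to treat the 3,600-residue cap as inclusive are assumptions. The description also says the cap applied "generally", so some pairs in the dataset may exceed it.

```python
from dataclasses import dataclass

MAX_TOTAL_RESIDUES = 3600  # cap stated in the dataset description


@dataclass(frozen=True)
class PredictionSettings:
    """Values taken from the description; field names are hypothetical."""
    colabfold_version: str = "1.5.2"
    model_numbers: tuple = (1, 2, 4)
    num_recycles: int = 3
    use_templates: bool = True
    num_ensemble: int = 1
    use_dropout: bool = False
    amber_relax: bool = False


def total_length(seq_a, seq_b):
    """Combined number of residues in a protein pair."""
    return len(seq_a) + len(seq_b)


def within_cap(seq_a, seq_b, cap=MAX_TOTAL_RESIDUES):
    """Return True if the pair fits under the cap (treated as inclusive here)."""
    return total_length(seq_a, seq_b) <= cap
```

For example, within_cap("M" * 1800, "M" * 1800) is True, while adding one more residue to either chain makes it False.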

Project Members

Name | Additional Roles | Affiliation While Working on the Project
Ernst W Schmid | Data Collector, Depositor | Harvard Medical School
Johannes Walter | PI | Harvard Medical School

Reprocessing Instructions

none


License and Terms of use

License: CC0

Terms: Our Community Norms as well as good scientific practices expect that proper credit is given via citation. Please use the data citation, as generated by the SBGrid Data Bank.