FAQs 1 — BLOODPAC

A Data Commons is a shared digital data resource for a scientific community. Data Commons typically consist of a platform that allows management, exploration, analysis and sharing of data to provide scientific evidence, to help create consensus, and to develop best practices for the community. A Data Commons can also support workspaces with Jupyter notebooks, R, Stata and other applications for the collaborative analysis of data.

The BLOODPAC Data Commons (BPDC) uses data to transform informal discussions into evidence-based decision-making among consortium members and nonmembers. The BPDC is built on the Gen3 platform, making it a one-stop shop to deposit, study and share liquid biopsy data following FAIR principles for scientific data management and stewardship. To find out more about BLOODPAC’s approach, click here.

The BLOODPAC Data Commons was created and is managed by the UChicago Center for Translational Data Science (CTDS). This page has more information. The source code for the platform can be found here. The platform is open source and released under the Apache License, version 2.0.

Members can access all features of the BLOODPAC Data Commons. Nonmembers can use the BLOODPAC Discovery Portal, a publicly available subset of the BPDC which allows researchers to store and search liquid biopsy datasets of interest, and then either download the datasets or analyze them in a cloud-based workspace. If you have publication data to contribute, or would like to search current datasets in the BLOODPAC Discovery Portal, click here.

Each BLOODPAC member organization provides a list of authorized users who have access to the Data Commons portal. Only member organizations and their approved employees/affiliates have access to the data in the BLOODPAC Data Commons. Individuals within a member organization can request access through their organization's lead BLOODPAC liaison/contact. Member organizations are responsible for notifying BLOODPAC when user access needs to be modified. In addition, access is reviewed in an annual audit.

In general, member-submitted datasets consist of contrived or patient samples profiled as part of analytical validation, clinical validation, or clinical research studies. For instance, we do not accept cell line data from test development. We are agnostic to the molecular profiling approach and bio-fluid (e.g. plasma, CSF, saliva), but we ask that relevant clinical and patient-context information be submitted with any dataset, including patient demographics, tumor pathology, treatment response, prior treatment regimens, findings from medical imaging, and comorbidities.

Because solid tumor tissue or cells are the gold standard for clinical diagnostics, we are especially interested in comparing NGS results (or results from other molecular profiling modalities) from matched solid tumor tissue with the corresponding bio-fluid liquid biopsy data. If you wish to submit data from a longitudinal study in progress, we are happy to accept submissions as the data become available. In addition, we ask that all submitted datasets be part of a peer-reviewed publication or white paper.

More specifically, datasets contributed by BLOODPAC members should support one of the current BLOODPAC projects including:

Project Exhale: patient data focusing on liquid-tissue concordance in lung cancer
BLOODPAC’s Pre-Analytical MTDEs

If you do not have a dataset in one of the areas noted above, we are happy to discuss alternative datasets for contribution.

Member-provided controlled data uploaded to the BLOODPAC Data Commons cannot be removed or transferred under the terms of our data agreement contracts, and it is accessible only to approved users from member organizations. However, when data is made publicly available (i.e. uploaded to the BLOODPAC Discovery Portal), it can be downloaded locally and transferred outside the Data Commons.

First, please reach out to us at info@bloodpac.org to begin the submission process. In general, we will need the following files:

A peer-reviewed publication or white paper
The dataset(s) associated with the paper
A metadata file describing the dataset(s)
A data dictionary that describes the variables in the dataset(s)
Any additional information necessary to replicate the published results or figures based on the submitted dataset(s)

The BLOODPAC Minimum Technical Data Element (MTDE) Working Group developed recommendations for 11 required preanalytic attributes that are essential for the studies it sponsors and for data contributed to the BLOODPAC Data Commons. These 11 recommended preanalytic data elements (formerly MTDEs), along with the process used to identify them, are described here.

We encourage members to submit the following types of de-identified data: read-level data (e.g. BAM files), call-level data (e.g. VCF files), performance-level data (e.g. PPA, NPA), and metadata files. A combination of metadata, performance-level data, and call-level data is acceptable, and adding read-level data is ideal. Imaging data in standard medical imaging formats is also accepted. We have no restrictions or expectations on file names.
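PPA (positive percent agreement) and NPA (negative percent agreement) are the usual ways to summarize an assay's agreement with a reference method, such as matched tumor tissue, when neither one is treated as absolute truth. The sketch below shows how the two values come out of a 2×2 concordance table. The counts and the `ConcordanceCounts` class are hypothetical and used only for illustration. This is not a BLOODPAC-specified submission format.

```python
from dataclasses import dataclass


@dataclass
class ConcordanceCounts:
    """Hypothetical 2x2 concordance counts: liquid biopsy assay vs. a reference method."""
    tp: int  # positive by both the assay and the reference
    fp: int  # positive by the assay only
    fn: int  # positive by the reference only
    tn: int  # negative by both


def ppa(c: ConcordanceCounts) -> float:
    """Positive percent agreement: TP / (TP + FN)."""
    if c.tp + c.fn == 0:
        raise ValueError("no reference-positive results; PPA is undefined")
    return c.tp / (c.tp + c.fn)


def npa(c: ConcordanceCounts) -> float:
    """Negative percent agreement: TN / (TN + FP)."""
    if c.tn + c.fp == 0:
        raise ValueError("no reference-negative results; NPA is undefined")
    return c.tn / (c.tn + c.fp)


if __name__ == "__main__":
    # Illustrative counts only, not real study data.
    counts = ConcordanceCounts(tp=45, fp=2, fn=5, tn=148)
    print(f"PPA = {ppa(counts):.1%}, NPA = {npa(counts):.1%}")
```

Submitting the underlying counts along with the summary percentages lets other researchers recompute these metrics and check them.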

We accept the following four formats for genomic data: BAM, CRAM, unaligned BAM, and FASTQ. For BAM/CRAM/uBAM, we expect proper read group information in the file header.
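In the SAM/BAM/CRAM specification, read groups are declared in the header as `@RG` lines, and each line must carry an `ID` tag. You can print the header text with `samtools view -H file.bam`. The sketch below is a minimal pre-submission check on that text. It also checks for `SM` (sample), but that is an assumption for illustration. The SAM specification does not require `SM`, and BLOODPAC's exact expectations may differ.

```python
# Tags to require on every @RG line. ID is mandatory per the SAM spec;
# SM is included here as an assumption, not a documented BLOODPAC rule.
REQUIRED_RG_TAGS = ("ID", "SM")


def parse_read_groups(header_text: str) -> list:
    """Return one {tag: value} dict per @RG line in a SAM-format header."""
    groups = []
    for line in header_text.splitlines():
        if not line.startswith("@RG\t"):
            continue
        tags = {}
        for field in line.split("\t")[1:]:
            key, sep, value = field.partition(":")  # "SM:sample01" -> ("SM", "sample01")
            if sep:
                tags[key] = value
        groups.append(tags)
    return groups


def check_read_groups(header_text: str) -> list:
    """List human-readable problems; an empty list means the header looks OK."""
    groups = parse_read_groups(header_text)
    problems = []
    if not groups:
        problems.append("no @RG lines in header")
    for i, rg in enumerate(groups):
        missing = [tag for tag in REQUIRED_RG_TAGS if tag not in rg]
        if missing:
            problems.append(f"@RG #{i + 1} missing {', '.join(missing)}")
    return problems


if __name__ == "__main__":
    header = (
        "@HD\tVN:1.6\tSO:coordinate\n"
        "@RG\tID:rg1\tSM:sample01\tPL:ILLUMINA\n"
        "@RG\tID:rg2\tPL:ILLUMINA\n"
    )
    print(check_read_groups(header))
```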

For processed/downstream genomic data, we accept VCFs. We do not require a standard naming or header convention for submitted VCFs, but we prefer that anyone submitting VCFs also provide the corresponding raw data.

We expect one FASTQ tarball per sample, in either .tar or .tar.gz format. If a sample has multiple read groups, include them all in the same tarball. Files inside the tarball should be named according to the pattern “readgroupname_[12s].(fq|fastq)(.gz)?”. The suffix can be fq, fastq, fq.gz, or fastq.gz. The prefixes are readgroupname_1 and readgroupname_2 for paired-end reads, or readgroupname_s for single-end reads. At this time, we cannot accept FASTQ chunks.

We do not require or expect repeated data submissions from our members.

We encourage all submitters to familiarize themselves with FAIR data principles and make every effort to ensure their data submissions meet these guidelines.

Members can log in to the BLOODPAC member portal to access resources on data submission and features of the data portal. You can also contact the BLOODPAC HelpDesk at bpa-support@datacommons.io for more information.
