Frequently Asked Questions
Where can I learn more about data contribution?
The data group maintains a submission Wiki on Synapse: https://www.synapse.org/#!Synapse:syn8011461/wiki/
What formats can I use to send genomic data?
We accept four formats: BAM, CRAM, unaligned BAM (uBAM), and FASTQ. For BAM, CRAM, and uBAM files, we expect proper read group information in the file header.
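As a rough pre-submission check for the read group requirement, the sketch below lists the read group IDs found in SAM-format header text, such as the output of `samtools view -H`. The FAQ does not define what "proper" read group information means, so this sketch assumes only that the header should contain at least one @RG line with an ID tag. The function name `read_group_ids` and the sample header are hypothetical.

```python
def read_group_ids(header_text):
    """Return the ID of every @RG line in SAM-format header text.

    An @RG line without an ID tag contributes None to the result.
    """
    ids = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@RG":
            continue
        # Header fields after the record type look like TAG:value.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        ids.append(tags.get("ID"))
    return ids


# Hypothetical header with one read group.
header = "@HD\tVN:1.6\tSO:unsorted\n@RG\tID:rgA\tSM:sample1\tPL:ILLUMINA\n"
ids = read_group_ids(header)

# Assumption: "proper" means at least one @RG line, each with an ID.
header_looks_ok = bool(ids) and all(ids)
print(ids, header_looks_ok)
```

A header with no @RG lines yields an empty list, which this check treats as a failure.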
What is the best format for FASTQ submission?
Submit one FASTQ tarball per sample, as either .tar or .tar.gz. If a sample has multiple read groups, include them all in the same tarball. Name the files inside the tarball according to the pattern “readgroupname_[12s].(fq|fastq)(.gz)?”. The extension can be fq, fastq, fq.gz, or fastq.gz. Use the prefixes readgroupname_1 and readgroupname_2 for paired-end reads, or readgroupname_s for single-end reads.
At this time, we cannot accept FASTQ chunks.
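The FASTQ naming pattern above can be checked before upload. The following is a minimal sketch, not an official validator: the function names are hypothetical, the dots in the pattern are escaped so they match literally, and two assumptions are added that the FAQ does not state explicitly, namely that each read group has either both _1 and _2 files or a single _s file, and that any directory prefix inside the tarball can be ignored.

```python
import re
import tarfile

# Pattern from the FAQ, with literal dots escaped.
MEMBER_NAME = re.compile(r"^(?P<read_group>.+)_(?P<end>[12s])\.(fq|fastq)(\.gz)?$")


def check_member_names(names):
    """Return (read_groups, problems) for a list of file names."""
    ends_by_group = {}
    problems = []
    for name in names:
        match = MEMBER_NAME.match(name)
        if match is None:
            problems.append(f"{name}: does not match the naming pattern")
            continue
        ends_by_group.setdefault(match["read_group"], set()).add(match["end"])
    for group, ends in sorted(ends_by_group.items()):
        # Assumption: paired reads need both _1 and _2; single reads use _s alone.
        if ends not in ({"1", "2"}, {"s"}):
            problems.append(f"{group}: expected _1 and _2, or _s alone, found {sorted(ends)}")
    return sorted(ends_by_group), problems


def check_tarball(path):
    """Check the regular files inside a .tar or .tar.gz archive."""
    with tarfile.open(path, "r:*") as tar:
        # Assumption: only the base name matters, not the directory inside the archive.
        names = [m.name.rsplit("/", 1)[-1] for m in tar.getmembers() if m.isfile()]
    return check_member_names(names)


groups, problems = check_member_names(["rgA_1.fq.gz", "rgA_2.fq.gz", "rgB_s.fastq"])
print(groups, problems)
```

Calling `check_tarball("sample1.tar.gz")` (a hypothetical file name) applies the same checks to an archive on disk and returns any problems as a list of messages.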
What formats can I use to send processed/downstream genomic data?
We accept VCFs. There is no standard naming or header convention for submitted VCFs. If you submit VCFs, we prefer to also receive the raw data.
May I submit data in increments?
Yes. Please fill out a new data inventory form for every dataset you submit. For example, if you submit data in three increments, fill out the data inventory form three times, once for each dataset.
Are there file name expectations?
Yes. For FASTQ naming, please see “What is the best format for FASTQ submission?” above. For BAM, CRAM, and unaligned BAM files, we expect proper read group information in the file header but place no restrictions on the file names.