900 square-meter DNA Sequencing Center fully dedicated to Whole Genome Sequencing
Genetic Data: FASTQ, BAM and VCF

With Dante Labs whole genomes, you always get your raw data. We provide it because it represents your DNA, it is yours, and it is an asset for life: in the coming months and years you will be able to use your raw data with new tools, from Dante Labs and from other organizations.

Instead of keeping the raw data and forcing you to return to us, we give you the raw data. 

Raw data can be confusing. It is a lot of files. Some are very large and hard to understand.

In a nutshell:

  • the VCF SNP is the most commonly used file (e.g. on third-party websites), followed by the VCF INDEL

  • if you don't know what the FASTQ or BAM files are, it will be very hard to read them (they are about 90-100 GB each and working with them requires bioinformatics knowledge)

The table below gives a short description of each file you will receive from Dante Labs.

When you sequence your genome with Dante Labs (Whole Genome, Whole GenomeZ, Whole GenomeL), you will get this data:

VCF SNP

VCF stands for Variant Call Format. It is a standardized text file format for representing SNP, INDEL, SV and CNV variation calls.

SNPs (Single Nucleotide Polymorphisms, pronounced “snips”) are the most common type of genetic variation among people. Each SNP represents a difference in a single DNA building block, called a nucleotide.

This is the most commonly used VCF (e.g. on third-party tools like Sequencing.com).
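If you are comfortable with a little Python, the sketch below shows roughly what one line of a VCF file contains. It relies only on the eight fixed, tab-separated columns defined by the VCF specification. The function name `parse_vcf_line` and the example line are made up for illustration; they are not taken from Dante Labs files or tools.

```python
# The eight fixed, tab-separated columns defined by the VCF specification.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Return a dict for one VCF data line, or None for a header line.

    Hypothetical helper for illustration only.
    """
    if line.startswith("#"):  # '##' meta lines and the '#CHROM' column header
        return None
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])        # 1-based position on the chromosome
    record["ALT"] = record["ALT"].split(",")  # more than one alternate allele is allowed
    return record


# Made-up example line, not real data.
example = "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30"
print(parse_vcf_line(example))
```

Real VCF files usually carry extra per-sample columns after INFO; this sketch simply ignores them.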

VCF INDEL

Indel is a molecular biology term for insertions or deletions in your DNA. The number of INDELs in human genomes is second only to the number of SNPs. They have a key role in your genetics.
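To make the difference between a SNP and an INDEL concrete, here is a minimal Python sketch that labels a variant by comparing the lengths of its reference (REF) and alternate (ALT) alleles, as they appear in a VCF file. The function `variant_type` is a hypothetical helper, and the rule is a simplification: it assumes plain base strings and treats symbolic alleles such as `<DEL>` as "OTHER".

```python
def variant_type(ref, alt):
    """Label one REF/ALT pair as 'SNP', 'INDEL' or 'OTHER' (simplified rule)."""
    if alt.startswith("<") or alt == "*":
        return "OTHER"   # symbolic or spanning alleles are out of scope here
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"     # one base replaced by another
    if len(ref) != len(alt):
        return "INDEL"   # bases were inserted or deleted
    return "OTHER"       # e.g. a multi-base substitution of equal length


print(variant_type("A", "G"))    # SNP
print(variant_type("A", "ATT"))  # INDEL (insertion)
print(variant_type("ACG", "A"))  # INDEL (deletion)
```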

VCF SV

SVs, or Structural Variants, are large DNA sequences that are inserted, inverted, deleted or duplicated within genomes.

VCF CNV

A CNV (copy number variation) is when the number of copies of a particular gene varies from one individual to the next. Some cancers are believed to be associated with elevated copy numbers of particular genes.

BAM

Binary Alignment Map (BAM) is the comprehensive raw data of genome sequencing; it consists of the lossless, compressed binary representation of the Sequence Alignment Map. BAM files are 90-100 gigabytes in size. They are generated by aligning the FASTQ files to the reference genome.
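Because a BAM file is compressed binary data, you cannot inspect it in a text editor, but you can check that a download is intact enough to be recognized. BAM uses a gzip-compatible compression (BGZF), and the decompressed data starts with the four bytes `BAM\x01`. The sketch below, with the hypothetical helper name `looks_like_bam`, reads only those first bytes, so it stays fast even on a 100 GB file. It is a quick sanity check under these assumptions, not a full validation.

```python
import gzip


def looks_like_bam(path):
    """Return True if the file decompresses to data starting with the BAM magic bytes."""
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError):  # not gzip-compatible, unreadable, or truncated
        return False
```

For real work with alignments, dedicated tools such as samtools are the usual choice.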

FASTQ

FASTQ files contain billions of entries and are about 90-100 gigabytes in size, making them too large to open in a normal text editor. FASTQ files are the ultimate raw data.
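Since a FASTQ file is too large to open all at once, the usual approach is to read it one entry at a time. The Python sketch below assumes the common four-line layout per entry (a header starting with `@`, the sequence, a `+` separator, and the quality string) and handles both plain and `.gz` files. The name `read_fastq` and the file name in the usage comment are hypothetical.

```python
import gzip


def read_fastq(path):
    """Yield (name, sequence, quality) one entry at a time, without loading the whole file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break                    # end of file
            sequence = handle.readline().rstrip("\n")
            handle.readline()            # the '+' separator line
            quality = handle.readline().rstrip("\n")
            yield header[1:], sequence, quality  # drop the leading '@'


# Usage sketch (hypothetical file name): look at the first three entries only.
# from itertools import islice
# for name, seq, qual in islice(read_fastq("sample.fastq.gz"), 3):
#     print(name, len(seq))
```

Because the function is a generator, memory use stays small no matter how many entries the file holds.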

 

If you are interested in learning more, we suggest: