Genetic Data: FASTQ, BAM and VCF


Generoso Ianniciello



With Dante Labs whole genomes, you always get your raw data. We provide it because it represents your DNA, it is yours, and it is an asset for life: in the coming months and years you will be able to use your raw data with new tools, from Dante Labs and from other organizations.

Instead of keeping the raw data and forcing you to return to us, we give you the raw data. 

Raw data can be confusing. It is a lot of files. Some are very large and hard to understand.

In a nutshell:

  • the VCF SNP is the most commonly used file (e.g. on third-party websites), followed by the VCF INDEL

  • if you don't know what the FASTQ or BAM files are, they will be very hard to read (they are about 100 GB each and require bioinformatics knowledge)

The table below gives a short description of each file, which we hope will help you understand what you will receive from Dante Labs.

When you sequence your genome with Dante Labs (Whole Genome, Whole GenomeZ, Whole GenomeL), you will get this data:


VCF stands for Variant Call Format. It is a standardized text file format for representing SNP, INDEL, SV and CNV variation calls.
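Because a VCF is plain text, its basic layout can be illustrated in a few lines of Python. The following is a minimal sketch, not an official Dante Labs tool: it assumes an uncompressed, tab-separated VCF whose data lines start with the standard CHROM, POS, ID, REF and ALT columns, and the function names and the example variant are invented for illustration.

```python
def parse_vcf_line(line):
    """Split one VCF data line and keep its first five fixed columns."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt = fields[:5]
    return {
        "chrom": chrom,
        "pos": int(pos),        # 1-based position on the chromosome
        "id": var_id,           # "." when the variant has no identifier
        "ref": ref,             # allele in the reference genome
        "alt": alt.split(","),  # one or more alternate alleles
    }


def read_vcf(lines):
    """Yield parsed records, skipping header lines (they start with '#')."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line)


# Made-up example content, not real sequencing output.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t12345\t.\tA\tG\t50\tPASS\t.\n",
]
records = list(read_vcf(example))
print(records[0])
```

For a real file you would pass an open file handle instead of the `example` list; files ending in `.vcf.gz` can be opened with `gzip.open(path, "rt")` from the Python standard library.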

SNPs (Single Nucleotide Polymorphisms, pronounced “snips”) are the most common type of genetic variation among people. Each SNP represents a difference in a single DNA building block, called a nucleotide.

This is the most commonly used VCF (e.g. on third-party tools like


Indel is a molecular biology term for insertions or deletions in your DNA. The number of INDELs in human genomes is second only to the number of SNPs. They have a key role in your genetics.
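The difference between a SNP and an INDEL can be seen directly in the reference (REF) and alternate (ALT) alleles of a VCF record. Here is a rough sketch of that idea, assuming simple sequence alleles; the function name `classify_variant` is hypothetical, and real variant callers use more detailed rules.

```python
def classify_variant(ref, alt):
    """Classify one REF/ALT allele pair by comparing allele lengths."""
    # Symbolic alleles such as <DEL> or "*" are not plain sequences.
    if alt.startswith("<") or alt in ("*", "."):
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"        # one nucleotide replaced by another
    if len(alt) > len(ref):
        return "insertion"  # extra bases compared with the reference
    if len(alt) < len(ref):
        return "deletion"   # bases missing compared with the reference
    return "other"          # same length but longer than one base


print(classify_variant("A", "G"))
print(classify_variant("A", "ATT"))
print(classify_variant("ATT", "A"))
```

Insertions and deletions together make up the INDEL category, which is why they are delivered in a separate VCF from the SNPs.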


SVs, or Structural Variants, are large DNA sequences that are inserted, inverted, deleted or duplicated within genomes.


A CNV (copy number variation) occurs when the number of copies of a particular gene or DNA segment varies from one individual to the next. Some cancers are believed to be associated with elevated copy numbers of particular genes.


Binary Alignment Map (BAM) is the comprehensive raw data of genome sequencing; it is the lossless, compressed binary representation of the Sequence Alignment Map (SAM) format. BAM files are 90-100 gigabytes in size. They are generated by aligning the FASTQ files to the reference genome.
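Since a BAM file is compressed binary data, it cannot be read as text, but you can check that a download is intact enough to be recognized as BAM. The sketch below relies on two facts from the BAM specification: the file is stored as gzip-compatible blocks (BGZF), and the decompressed data begins with the four bytes `BAM\1`. The function name `looks_like_bam` is made up for this example, and the test data is a tiny fake stream rather than a real genome.

```python
import gzip
import io

BAM_MAGIC = b"BAM\x01"  # first four bytes of decompressed BAM data


def looks_like_bam(stream):
    """Return True if a binary stream decompresses to data starting with the BAM magic."""
    try:
        with gzip.GzipFile(fileobj=stream) as gz:
            return gz.read(4) == BAM_MAGIC
    except (OSError, EOFError):
        # Not gzip-compatible at all, or truncated before the first bytes.
        return False


# Tiny fake streams for illustration only.
fake_bam = io.BytesIO(gzip.compress(BAM_MAGIC + b"\x00" * 8))
not_bam = io.BytesIO(b"plain text, not compressed")
print(looks_like_bam(fake_bam))
print(looks_like_bam(not_bam))
```

With a real file you would pass `open(path, "rb")` as the stream. This only checks the first few bytes; to actually inspect alignments, dedicated tools such as samtools are the usual choice.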


FASTQ files contain billions of entries and are about 90-100 gigabytes in size, making them too large to open in a normal text editor. FASTQ files are the ultimate raw data.
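Because a FASTQ file is too large to open at once, programs read it one entry at a time. The following is a minimal sketch of that approach, assuming the common layout of four lines per entry (header starting with `@`, sequence, `+` separator, quality string) and Phred+33 quality encoding; the function names and the example read are invented for illustration.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples one entry at a time."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between entries
        if not header.startswith("@"):
            raise ValueError("expected a header line starting with '@'")
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], sequence, quality


def phred_scores(quality):
    """Decode a quality string, assuming Phred+33 encoding."""
    return [ord(char) - 33 for char in quality]


# One made-up entry, not real sequencing output.
example = io.StringIO("@read1\nACGT\n+\nIIII\n")
for name, sequence, quality in read_fastq(example):
    print(name, sequence, phred_scores(quality))
```

Only one entry is held in memory at any time, so the same loop works whether the file has four entries or billions. Compressed files ending in `.fastq.gz` can be opened with `gzip.open(path, "rt")` and passed in the same way.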


