From us decoding your genome, to you understanding your VCF file

The fusion of modern technology with bioinformatics has given genome sequencing companies different options for decoding your genome. Two giants currently dominate the sequencing market: microarrays and next-generation sequencing (NGS). Microarrays decode segments of your genome. They arrived on the market first, and thus are the industry ‘standard’. More recently, NGS technologies have been developed, introducing an accurate and more comprehensive sequencing alternative for your genome. Each company adopts one of the two options. Older, well-established companies tend to remain with the ‘standard’, whereas newer companies, like ourselves, tend to adopt NGS technologies. Therefore, the company you choose to understand your genome with determines how your genome will be decoded.

Let’s compare these genomic giants a little further. As stated previously, microarrays decode segments of your genome. These segments are of scientific significance and generally contain DNA controlling ‘important’ phenotypic traits relevant to diseases that are currently ‘actionable’. Added together, these segments mean that around 600,000 letters of your genome are analysed. Next up, NGS. Today, NGS allows companies to sequence your whole genome, which contains 6 billion letters. Thus, companies using NGS can provide 10,000 times more data for a slightly higher price. We at Dante, like other newer companies, use NGS. This allows you to know every single variant in your genetic code, instead of only being informed of variants within predetermined segments of your genome.
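
For the curious, the comparison above is simple arithmetic. Here is a minimal sketch in Python that uses only the two approximate figures quoted in this post; the function names are ours, chosen purely for illustration.

```python
# Approximate figures quoted in this post.
MICROARRAY_LETTERS = 600_000          # letters analysed by a typical microarray
WHOLE_GENOME_LETTERS = 6_000_000_000  # letters read by whole genome sequencing


def fold_increase(ngs_letters: int, array_letters: int) -> float:
    """How many times more letters NGS reads than a microarray."""
    return ngs_letters / array_letters


def fraction_covered(array_letters: int, genome_letters: int) -> float:
    """Share of the whole genome that a microarray looks at."""
    return array_letters / genome_letters


if __name__ == "__main__":
    ratio = fold_increase(WHOLE_GENOME_LETTERS, MICROARRAY_LETTERS)
    share = fraction_covered(MICROARRAY_LETTERS, WHOLE_GENOME_LETTERS)
    print(f"NGS reads {ratio:,.0f} times more letters")
    print(f"A microarray covers {share:.2%} of the genome")
```

In other words, on these figures a microarray looks at roughly one letter in every 10,000.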

As you can imagine, 6 billion letters’ worth of information is a lot of data (around 150 Gb!). So, at Dante we go a step further. We take the data from your 6 billion letters and compare it to a reference ‘healthy’ human genome. From this, we generate a VCF (Variant Call Format) report. This VCF report tells you exactly where in your 6 billion letters you differ from the reference genome, or more simply: every single variant you have relative to the reference. The report narrows your genetic data down to the specific variants you have and excludes the vast portions of your genome that are identical to the reference genome. Of course, because your whole genome was sequenced, you also have the option to receive your full raw data from us, available on request!
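
To make the idea concrete, here is a toy sketch in Python of what "keep only the differences" means. It is not how our pipeline (or any real variant caller) works: real tools align millions of short reads, handle two copies of each chromosome, and score the quality of every call. The two short sequences and the names `Variant` and `call_variants` are made up for illustration.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    position: int  # 1-based position in the reference
    ref: str       # letter in the reference genome
    alt: str       # letter found in the sample


def call_variants(reference: str, sample: str) -> List[Variant]:
    """Return only the positions where the sample differs from the reference."""
    if len(reference) != len(sample):
        raise ValueError("this toy example only compares equal-length sequences")
    return [
        Variant(pos, ref_letter, sample_letter)
        for pos, (ref_letter, sample_letter) in enumerate(zip(reference, sample), start=1)
        if ref_letter != sample_letter
    ]


if __name__ == "__main__":
    reference = "GATTACAGATTACA"  # made-up reference
    sample = "GATTGCAGATCACA"     # made-up sample with two differences
    for variant in call_variants(reference, sample):
        print(variant.position, variant.ref, variant.alt)
```

Out of 14 letters, only the two that differ are reported. The other 12 are left out precisely because they match the reference.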

Every choice comes with a decision point, and here there is room for confusion and misunderstanding. This is what we (companies like Dante Labs) and you both face. When genome companies provide you with your fully sequenced genome, ever-curious and fast-learning customers want to know more (and quite rightly so, knowledge is power)! Software tools such as Promethease give you the opportunity to learn more about your genome by comparing the genetic data we (the companies) supply to the reference genome. These tools also generate a VCF-style report, depending on the ‘data input’ … and this ultimately depends on your genome sequencing provider.

And this is the problem.

Because companies like us already provide a VCF report, this report becomes your ‘data input’ for software such as Promethease. The data is then uploaded to these tools for comparison with a reference genome.

But the VCF report has already singled out the points where you differ from the reference genome and removed the parts that are the same.

Unfortunately, software built simply to compare ‘input data’ to the reference genome cannot recognise this. Therefore, the parts that have already been removed (because they are the same) appear as ‘-/-’ or ‘no data’. Technically, this is correct, because there is no data there! But that is only because the VCF has already kept the letters that vary and removed the letters that do not.
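
The difference between the two readings can be sketched in a few lines of Python. This is an illustration only, not the logic of Promethease or any other real tool: genotypes are simplified to a single letter, the data is made up, and the "reference-aware" reading assumes the position was actually sequenced, which a variant-only VCF cannot confirm by itself. The function names are hypothetical.

```python
from typing import Dict

reference = "GATTACAGATTACA"  # made-up reference sequence

# A variant-only report: just the positions that differ (1-based), nothing else.
vcf_variants: Dict[int, str] = {5: "G", 11: "C"}


def naive_lookup(position: int) -> str:
    """Treat anything missing from the report as 'no data'."""
    return vcf_variants.get(position, "-/-")


def reference_aware_lookup(position: int) -> str:
    """Treat anything missing from the report as matching the reference.

    Assumption: the position was sequenced and simply showed no variant.
    """
    return vcf_variants.get(position, reference[position - 1])


if __name__ == "__main__":
    for position in (3, 5):
        print(position, naive_lookup(position), reference_aware_lookup(position))
```

At position 5, where there is a variant, both readings agree. At position 3, the naive reading reports ‘-/-’ while the reference-aware reading reports the reference letter.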

The technological age is upon us, and it’s happening a little too fast for everyone to handle, including the companies! However, with Dante, the product you receive is your VCF report drawn from all 6 billion letters, with the option of receiving your 6 billion letters’ worth of raw data too! Big data, lots of knowledge, and empowerment for you: this is our aim, so now it’s your choice.