We have developed an interactive web-based tool, AVIA, to explore and interpret large sets of genomic variations
(single nucleotide variations and insertions/deletions) to help guide and summarize genomic experiments. The tool
is based on coupling a comprehensive annotation pipeline with a flexible visualization method. We leveraged the
ANNOVAR (Wang et al., 2010) framework for assigning functional impact to genomic variations by extending its
list of reference annotation databases (RefSeq, UCSC, SIFT, PolyPhen, etc.) with additional sources developed in-house
(Non-B DB, PolyBrowse). Because many users also have their own annotation sources, AVIA also accepts user-supplied
annotation files. The results can be obtained in tabular format or as tracks in whole-genome circular views
generated by the Circos application (Krzywinski et al., 2009). Users can also select different sets of pre-computed tracks
for reference, including whole-genome distributions of genomic features (genes, exons, repeats) and variation analysis
tracks for the 69 public genomes released by Complete Genomics (CGI).
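As a rough illustration of how a whole-genome distribution track can be built, the sketch below bins variant positions into fixed-size windows and emits one line per window in the plain-text layout Circos reads for histogram tracks (`chromosome start end value`). The 1 Mb window, the `hs` chromosome prefix (the naming used by Circos's bundled human karyotype), and the function name `variant_density_track` are assumptions for illustration; this is not AVIA's own track-generation code.

```python
from collections import Counter


def variant_density_track(variants, window=1_000_000, prefix="hs"):
    """Bin variants into fixed-size windows and return Circos histogram lines.

    `variants` is an iterable of (chromosome, 1-based position) tuples such as
    ("chr1", 12345). Names are rewritten to the Circos human karyotype style
    ("chr1" -> "hs1"); each output line is "chrom start end count", with
    window boundaries starting at 0.
    """
    counts = Counter()
    for chrom, pos in variants:
        bare = chrom[3:] if chrom.startswith("chr") else chrom
        counts[(prefix + bare, (pos - 1) // window)] += 1

    lines = []
    for (name, bin_index), n in sorted(counts.items()):
        start = bin_index * window
        end = start + window - 1
        lines.append(f"{name} {start} {end} {n}")
    return lines


if __name__ == "__main__":
    demo = [("chr1", 150), ("chr1", 999_999), ("chr1", 1_000_001), ("chr2", 5)]
    for line in variant_density_track(demo):
        print(line)
```

Writing the returned lines to a file and referencing it from a `<plot>` block of type `histogram` in the Circos configuration would display the density; the window size trades resolution against visual noise.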
This version of AVIA focuses on assessing gene-related impact.
It can produce tracks showing the distribution of genes carrying variations with specific functional effects, such as non-synonymous variations,
frameshifts, variations in miRNA target sites, or variations in G-quadruplexes in 5'UTRs. Additional modules
that assess the functional implications of variations in non-coding regions of the genome are under development.
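To make the idea of an effect-specific gene track concrete, the following minimal sketch keeps only variants whose annotated effect falls in a chosen set and counts the affected genes. The effect labels are written in the style of ANNOVAR's exonic variant function categories (for example `nonsynonymous SNV`, `frameshift deletion`), but exact labels vary between ANNOVAR versions, and the `AnnotatedVariant` record with one gene and one effect per variant is a hypothetical simplification.

```python
from collections import Counter
from typing import FrozenSet, Iterable, NamedTuple


class AnnotatedVariant(NamedTuple):
    # Hypothetical simplified record: one gene and one effect label per variant.
    chrom: str
    pos: int
    gene: str
    effect: str


# Effect categories of interest, labeled in the style of ANNOVAR exonic output.
DEFAULT_EFFECTS: FrozenSet[str] = frozenset({
    "nonsynonymous SNV",
    "frameshift insertion",
    "frameshift deletion",
})


def genes_by_effect(variants: Iterable[AnnotatedVariant],
                    effects: FrozenSet[str] = DEFAULT_EFFECTS) -> Counter:
    """Count variants per gene, keeping only the requested effect categories."""
    return Counter(v.gene for v in variants if v.effect in effects)


if __name__ == "__main__":
    # Toy records with made-up genes and positions.
    toy = [
        AnnotatedVariant("chr1", 1000, "GENE1", "nonsynonymous SNV"),
        AnnotatedVariant("chr1", 1500, "GENE1", "synonymous SNV"),
        AnnotatedVariant("chr2", 2000, "GENE2", "frameshift deletion"),
    ]
    print(genes_by_effect(toy))
```

The per-gene counts could then be mapped to gene coordinates to draw a track; passing a different `effects` set yields a different track from the same annotated input.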
During exploratory work with AVIA, users can browse different tracks with their data and then re-generate signature plots to summarize the project. To our knowledge, this is
the first web-based program that integrates annotation, visualization, and
impact analysis.
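For the G-quadruplex case, one common way to flag candidate G4 motifs is a Quadparser-style pattern: four runs of three or more guanines separated by loops of 1 to 7 nucleotides. The sketch below uses that pattern to test whether a single-nucleotide substitution within a 5'UTR sequence removes every motif. The pattern, loop lengths, and function names are assumptions for illustration; AVIA draws on Non-B DB, whose motif definitions may differ, and this sketch scans only the given strand.

```python
import re

# Quadparser-style G4 motif: four G-runs (>= 3 G) separated by 1-7 nt loops.
# Only the supplied strand is scanned; C-rich motifs (G4 on the opposite
# strand) are not detected here.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def has_g4(seq: str) -> bool:
    """Return True if the sequence contains at least one candidate G4 motif."""
    return G4_PATTERN.search(seq.upper()) is not None


def snv_disrupts_g4(seq: str, offset: int, alt: str) -> bool:
    """Return True if substituting `alt` at 0-based `offset` removes all G4 motifs."""
    mutated = seq[:offset] + alt + seq[offset + 1:]
    return has_g4(seq) and not has_g4(mutated)


if __name__ == "__main__":
    utr = "GGGAGGGAGGGAGGG"
    print(snv_disrupts_g4(utr, 1, "T"))  # breaks the first G-run
    print(snv_disrupts_g4(utr, 7, "C"))  # changes only a loop base
```

Changing a loop base usually leaves the motif intact, whereas breaking a G-run can abolish it, which is why a variant's position within the motif matters for this kind of annotation.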
We have also developed a variation detection pipeline for Sanger sequencing,
in particular for PCR-directed resequencing. Using the open source programs
phred (base calling) and polyphred (variation detection), together with
in-house tools, we identify mutations in sequences by comparison with
an NCBI reference sequence. Once variations have been identified, we determine
each mutation's impact on the gene and retrieve related disease information.
For diagnostic purposes, images of each variation can be obtained
for use in publications or patient files.
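To illustrate one step in determining a mutation's impact on a gene, the sketch below takes a coding sequence and a single-base substitution, translates the affected codon before and after the change with the standard genetic code, and classifies the result. It assumes the variant has already been mapped to a 0-based position on the forward-strand coding sequence; `classify_snv` and its labels are illustrative, not the pipeline's own.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered TCAG x TCAG x TCAG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position `pos` of a coding sequence."""
    cds = cds.upper()
    frame = pos % 3
    start = pos - frame                      # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:frame] + alt.upper() + ref_codon[frame + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gain"
    if ref_aa == "*":
        return "stop loss"
    return "nonsynonymous"


if __name__ == "__main__":
    cds = "ATGGCCTAA"                        # Met-Ala-Stop
    print(classify_snv(cds, 4, "A"))         # GCC -> GAC (Ala -> Asp)
```

A production pipeline would also handle insertions/deletions (frameshift versus in-frame), minus-strand genes, and splice sites, which this sketch deliberately leaves out.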
For our Sanger pipeline, we have implemented three methods of chromatogram retrieval:
querying the NCBI Trace Archive, querying publicly available requests,
or submitting files by FTP. With the NCBI Trace Archive method, users can
query the archive by project, submitting center, gene, or trace ID (trace_id). Because
trace files are large, LIMS and Trace Archive retrieval are the
preferred methods; FTP transfers may be slow.
The FTP option is limited to small sets of traces (less than 2 GB); however, multiple
submissions can be made and then combined. Like many variation detection programs for
Sanger data (e.g., Mutation Surveyor, VarDetect, Genalys), our pipeline detects variations
from peak-height ratios between two fluorescent base signals at the same position.
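The peak-ratio comparison used for Sanger data can be sketched as follows: at each base position, compare the height of the second-highest fluorescence peak with the highest one, and report an IUPAC ambiguity code (a possible heterozygote) when the ratio reaches a threshold. The 0.3 threshold, the per-position dictionary of four peak heights, and the name `call_position` are assumptions for illustration; the programs named above use more elaborate criteria than a single fixed cutoff.

```python
from typing import Dict, Optional

# IUPAC ambiguity codes for two-base (heterozygous) calls.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_position(peaks: Dict[str, float], min_ratio: float = 0.3) -> Optional[str]:
    """Call one chromatogram position from its peak heights for A, C, G and T.

    Returns the dominant base for a clear call, an IUPAC code when the secondary
    peak is at least `min_ratio` times the primary peak, or None if there is no signal.
    """
    ranked = sorted(peaks.items(), key=lambda kv: kv[1], reverse=True)
    (b1, h1), (b2, h2) = ranked[0], ranked[1]
    if h1 <= 0:
        return None
    if h2 / h1 >= min_ratio:
        return IUPAC[frozenset(b1 + b2)]
    return b1


if __name__ == "__main__":
    print(call_position({"A": 900, "C": 20, "G": 15, "T": 30}))   # clear A
    print(call_position({"A": 10, "C": 500, "G": 12, "T": 420}))  # C/T mix
```

Lowering `min_ratio` increases sensitivity to weak secondary peaks at the cost of more false heterozygote calls from background noise.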