Data Example

File Format

Note

  • For local usage, all formats except rsID require the data to be sorted first by sequence name and then by leftmost coordinate.
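
As a rough illustration of the sorting requirement, the Python sketch below checks that records are grouped by sequence name and ordered by leftmost coordinate within each sequence. The helper name `is_coordinate_sorted` and its column arguments are hypothetical and not part of the tool. It assumes tab-separated input and only checks that each sequence name forms one contiguous block, because the exact order of sequence names the tool expects is not stated here.

```python
def is_coordinate_sorted(lines, chrom_col=0, pos_col=1):
    """Return True if records are grouped by sequence name and, within each
    sequence, ordered by non-decreasing leftmost coordinate.

    Hypothetical helper for illustration; column arguments are 0-based.
    """
    finished = set()   # sequence names whose block has already ended
    current = None     # sequence name of the block being read
    last_pos = None    # last coordinate seen in the current block
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue   # skip blank, meta, and header lines
        fields = line.split("\t")
        chrom, pos = fields[chrom_col], int(fields[pos_col])
        if chrom != current:
            if chrom in finished:
                return False   # sequence name reappears after its block ended
            if current is not None:
                finished.add(current)
            current, last_pos = chrom, pos
        elif pos < last_pos:
            return False       # coordinate decreased within one sequence
        else:
            last_pos = pos
    return True


if __name__ == "__main__":
    sample = ["1\t64649\trs181431124\n", "1\t81125\trs560365426\n"]
    print(is_coordinate_sorted(sample))
```

For a BED-like file the same check applies with `pos_col` pointing at the START column.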
rsID Format

The first column of an rsID file must be the rsID; other columns are optional.

Local Usage: -I:rsid /path/to/file

rs11191416	4.67E-15
rs4918072	9.63E-10
rs61848342	6.38E-10
rs4752700	8.02E-11
rs1887318	1.73E-17
...
VCF Format

A VCF file must contain meta-information lines and a header line.

Local Usage: -I:vcf /path/to/file

##VCF meta lines...
#VCF header line
1	64649	rs181431124	A	C	100	PASS	.
1	81125	rs560365426	T	C	100	PASS	.
1	81712	rs558839829	C	T	100	PASS	.
1	88230	rs543088928	T	C	100	PASS	.
...
VCF-Like Format

Note

  • Meta information and header line are optional for VCF-like format.
  • The first five columns are the same as the VCF format.

Local Usage: -I:vcfLike /path/to/file

1	64649	rs181431124	A	C	col6	col7
1	81125	rs560365426	T	C	col6	col7
1	81712	rs558839829	C	T	col6	col7
1	88230	rs543088928	T	C	col6	col7
1	99687	rs139153227	C	T	col6	col7
...
BED-Like Format

The first three columns of BED-like format must be CHROM, START, and END; other columns are optional.

Local Usage: -I:bed /path/to/file

chr1	10141	10237
chr1	235503	235929
chr1	237727	237953
chr1	565508	565728
chr1	567451	567955
...
BED-Like Allele Format

The first five columns of BED-like allele format must be CHROM, START, END, REF, and ALT; other columns are optional.

Local Usage: -I:bedAllele /path/to/file

1	10000	10001	T	A
1	10000	10001	T	C
1	10000	10001	T	G
1	10001	10002	A	C
1	10002	10003	A	C
...
Coord-Only Format

The first two columns of Coord-Only format must be CHROM and POS; other columns are optional.

Local Usage: -I:coordOnly /path/to/file

1	69091
1	69092
1	13116
1	1645399
1	3706538
1	3706816
...
Coord-Allele Format

The first four columns of Coord-Allele format must be CHROM, POS, REF, and ALT; other columns are optional.

Local Usage: -I:coordAllele /path/to/file

1	10000	10001	T	A
1	10000	10001	T	C
1	10000	10001	T	G
1	10001	10002	A	C
1	10002	10003	A	C
...
TAB Format

The TAB format should be used with the following attributes:

  • Required: c,b,e
  • Optional: ref,alt,0,ci,sep

Local Usage: -I:tab,c=1,b=4,e=5,0=true, /path/to/file

1	13482	G	C
1	48204	G	A
1	52152	ATAAT	A
1	54712	T	TTTTC
1	57226	G	A
1	62863	CACTT	C
1	63336	C	T
1	68082	T	C
1	73269	T	A
1	76856	T	A
...

Annotation Database File

A bgzip-compressed file that contains annotation information for genomic features.
An annotation database file must be indexed with Index before it can be used by the query and annotation programs.

VCF Format Example(1000G_p3.sort.vcf.gz)

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Phase 3 of the 1000 Genomes Project is the completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

##VCF meta lines...
#VCF header line
1	67948	rs556268856	T	C	100	PASS	AC=3;AF=0.000599042;AN=5008;NS=2504;DP=25933;EAS_AF=0.003;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
1	67955	rs576545302	T	A	100	PASS	AC=3;AF=0.000599042;AN=5008;NS=2504;DP=25869;EAS_AF=0;AMR_AF=0;AFR_AF=0.0023;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
1	68082	rs367789441	T	C	100	PASS	AC=169;AF=0.033746;AN=5008;NS=2504;DP=25952;EAS_AF=0.001;AMR_AF=0.0331;AFR_AF=0.003;EUR_AF=0.0984;SAS_AF=0.0429;AA=.|||;VT=SNP
1	68118	rs562240137	T	C	100	PASS	AC=1;AF=0.000199681;AN=5008;NS=2504;DP=25464;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
1	68247	rs527989887	G	A	100	PASS	AC=4;AF=0.000798722;AN=5008;NS=2504;DP=22292;EAS_AF=0;AMR_AF=0;AFR_AF=0.003;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
1	68337	rs540193047	C	G	100	PASS	AC=1;AF=0.000199681;AN=5008;NS=2504;DP=19265;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
...
VCF Format Example(cosmic.sort.vcf.gz)

COSMIC, the Catalogue Of Somatic Mutations In Cancer, is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer. COSMIC collects these somatic mutation data from a variety of public sources into one standardized repository and makes them easily explorable in a variety of graphical, tabulated and downloadable ways.

##VCF meta lines...
#VCF header line
1	69224	COSV58737130	A	C	.	.	GENE=OR4F5;STRAND=+;LEGACY_ID=COSM3677745;CDS=c.134A>C;AA=p.D45A;CNT=1
1	69230	COSV58737076	A	C	.	.	GENE=OR4F5;STRAND=+;LEGACY_ID=COSM3677746;CDS=c.140A>C;AA=p.H47P;CNT=1
1	69236	COSV58737142	A	C	.	.	GENE=OR4F5;STRAND=+;LEGACY_ID=COSM3677747;CDS=c.146A>C;AA=p.H49P;CNT=1
1	69270	COSV58736820	A	G	.	.	GENE=OR4F5;STRAND=+;LEGACY_ID=COSM5424184;SNP;CDS=c.180A>G;AA=p.S60=;CNT=1
1	69345	COSV58736780	C	A	.	.	GENE=OR4F5;STRAND=+;LEGACY_ID=COSM911918;CDS=c.255C>A;AA=p.I85=;CNT=1
1	69359	COSV58736910	G	T	.	.	GENE=OR4F5;STRAND=+;LEGACY_ID=COSM6401900;CDS=c.269G>T;AA=p.C90F;CNT=2
1	69486	COSV58736947	C	T	.	.	GENE=OR4F5;STRAND=+;LEGACY_ID=COSM6734473;CDS=c.396C>T;AA=p.N132=;CNT=1
1	69511	COSV58736924	A	G	.	.	GENE=OR4F5;STRAND=+;LEGACY_ID=COSM4144171;SNP;CDS=c.421A>G;AA=p.T141A;CNT=1
1	69517	COSV58737059	G	A	.	.	GENE=OR4F5;STRAND=+;LEGACY_ID=COSM3492078;CDS=c.427G>A;AA=p.G143R;CNT=1
...
BED-like Format Example(roadmap.sort.bed.gz)

The NIH Roadmap Epigenomics Mapping Consortium was launched with the goal of producing a public resource of human epigenomic data to catalyze basic biology and disease-oriented research. The Consortium leverages experimental pipelines built around next-generation sequencing technologies to map DNA methylation, histone modifications, chromatin accessibility and small RNA transcripts in stem cells and primary ex vivo tissues selected to represent the normal counterparts of tissues and organ systems frequently involved in human disease.

#chrom	chromStart	chromEnd	name	score	strand	signalValue	pValue	qValue	peak	cellMark	cellID	cellName
1	9959	10511	Rank_24984	68	.	2.62240	6.83866	4.25719	287	E062-H3K9me3	E062	Primary mononuclear cells from peripheral blood
1	9975	10710	Rank_1550	475	.	12.86108	47.54574	44.03423	370	E080-H3K9me3	E080	Fetal Adrenal Gland
1	10003	10563	Rank_1363	488	.	13.42901	48.85612	44.82438	248	E084-H3K9me3	E084	Fetal Intestine Large
1	10011	10631	Rank_3377	217	.	8.95720	21.74096	18.31963	239	E092-H3K9me3	E092	Fetal Stomach
1	10012	10397	Rank_15590	193	.	7.50000	19.36886	16.46100	157	E061-H3K36me3	E061	Foreskin Melanocyte Primary Cells skin03
1	10012	10425	Rank_15065	71	.	3.84862	7.13214	4.01348	237	E012-H3K9me3	E012	hESC Derived CD56+ Ectoderm Cultured Cells
...
Coord-Allele Format Example(dbscSNV.sort.tab.gz)

dbscSNV includes all potential human SNVs within splicing consensus regions (−3 to +8 at the 5’ splice site and −12 to +2 at the 3’ splice site), i.e. scSNVs, related functional annotations and two ensemble prediction scores for predicting their potential of altering splicing.

1	860326	A	C	1	924946	n	y	upstream	SAMD11	.	.	UTR5	ENSG00000187634	.	.	0.00764482882370293	0.03
1	860326	A	G	1	924946	n	y	upstream	SAMD11	.	.	UTR5	ENSG00000187634	.	.	0.00764482882370293	0.032
1	860326	A	T	1	924946	n	y	upstream	SAMD11	.	.	UTR5	ENSG00000187634	.	.	0.00692000194525311	0.03
1	860327	A	C	1	924947	n	y	upstream	SAMD11	.	.	UTR5	ENSG00000187634	.	.	0.00430955476585136	0.04
1	860327	A	G	1	924947	n	y	upstream	SAMD11	.	.	UTR5	ENSG00000187634	.	.	0.00430955476585136	0.04
1	860327	A	T	1	924947	n	y	upstream	SAMD11	.	.	UTR5	ENSG00000187634	.	.	0.00430955476585136	0.042
...

Result File

Count Result

Query: q2.sort.bed  Database: 1000G_p3.sort.vcf.gz, roadmap.sort.bed.gz

Output file: q2.sort.bed.count.gz

Chr	Begin	End	1000g	roadmap	Total
chr1	10141	10237	2	171	173
chr1	235503	235929	4	48	52
chr1	237727	237953	4	51	55
chr1	565508	565728	29	94	123
chr1	567451	567955	51	132	183
chr1	569701	570067	34	154	188
chr1	570129	570274	7	37	44
...
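
The count table above can be sanity-checked with a few lines of Python. This is a minimal sketch, assuming the file is tab-separated, has already been decompressed, starts with the header row shown above, and that the columns between End and Total are the per-database counts. The function name `check_count_totals` is hypothetical.

```python
def check_count_totals(lines):
    """Return the rows whose Total differs from the sum of the per-database counts.

    Assumes the first three columns are Chr, Begin, End and that the header
    row contains a column named "Total".
    """
    rows = [
        line.rstrip("\n").split("\t")
        for line in lines
        if line.strip() and line.strip() != "..."
    ]
    header, records = rows[0], rows[1:]
    total_idx = header.index("Total")
    mismatches = []
    for rec in records:
        counts = [int(value) for value in rec[3:total_idx]]
        if sum(counts) != int(rec[total_idx]):
            mismatches.append(rec)
    return mismatches


if __name__ == "__main__":
    sample = [
        "Chr\tBegin\tEnd\t1000g\troadmap\tTotal",
        "chr1\t10141\t10237\t2\t171\t173",
        "chr1\t235503\t235929\t4\t48\t52",
    ]
    print(check_count_totals(sample))  # [] because every Total matches
```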
QueryRegion Result

Query: 1:2298288-2298289  Database: 1000G_p3.sort.vcf.gz

Output1 in console:

1000G_p3.sort.vcf.gz	1	2298289	rs182863424	T	C	100	PASS	AC=4;AF=0.000798722;AN=5008;NS=2504;DP=17505;EAS_AF=0.003;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=T|||;VT=SNP

Query: 1:959100-959200  Database: 1000G_p3.sort.vcf.gz, cosmic.sort.vcf.gz

Output2 in console:

1000g	1	959104	rs538473605	G	A	100	PASS	AC=1;AF=0.000199681;AN=5008;NS=2504;DP=17741;EAS_AF=0.001;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP
1000g	1	959128	rs548777990	C	T	100	PASS	AC=2;AF=0.000399361;AN=5008;NS=2504;DP=17304;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0.001;AA=C|||;VT=SNP
1000g	1	959137	rs568781463	G	C	100	PASS	AC=1;AF=0.000199681;AN=5008;NS=2504;DP=17086;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0.001;AA=G|||;VT=SNP
1000g	1	959155	rs3845291	G	A	100	PASS	AC=2404;AF=0.480032;AN=5008;NS=2504;DP=16907;EAS_AF=0.7361;AMR_AF=0.6182;AFR_AF=0.1309;EUR_AF=0.5398;SAS_AF=0.5286;AA=G|||;VT=SNP
1000g	1	959169	rs3845292	G	C	100	PASS	AC=2830;AF=0.565096;AN=5008;NS=2504;DP=16335;EAS_AF=0.8264;AMR_AF=0.6398;AFR_AF=0.3154;EUR_AF=0.5497;SAS_AF=0.5961;AA=c|||;VT=SNP
1000g	1	959193	rs188044457	G	A	100	PASS	AC=6;AF=0.00119808;AN=5008;NS=2504;DP=15909;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0.0051;AA=G|||;VT=SNP
cosmic	1	959109	COSV65072933	C	T	.	.	GENE=AGRN;STRAND=+;LEGACY_ID=COSN6051067;CDS=c.463+1267C>T;AA=p.?;CNT=1
Intersect Result("OVERLAP" file)

An OVERLAP file is the result of the Intersect program and can serve as an intermediate file for AnnotationIntersectFile.

Description of OVERLAP file format:

  • Comment lines start with '@';
  • Query lines start with '#';
  • Database lines start with the database tag.

Note

  • Comment lines can be removed by setting -RC true.
  • Comment lines are required by the AnnotationIntersectFile program. Please refer to the AnnotationIntersectFile section for details.
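
The three line types described above are enough to read an OVERLAP file back into memory. The following Python sketch groups database lines under the query line they follow. It assumes tab-separated, already decompressed text; the function name `parse_overlap` and the returned structure are illustrative choices, not part of the tool.

```python
def parse_overlap(lines):
    """Split OVERLAP lines into comments and per-query records.

    Returns (comments, records):
      comments - list of (key, value) pairs from '@' lines, kept in order
                 because keys such as db_path may repeat
      records  - list of (query_fields, hits); each hit is (db_tag, fields)
    """
    comments, records = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            key, _, value = line[1:].partition("=")
            comments.append((key, value))
        elif line.startswith("#"):
            # query line: "#query<TAB>field1<TAB>field2..."
            records.append((line.split("\t")[1:], []))
        else:
            if not records:
                raise ValueError("database line found before any query line")
            tag, *fields = line.split("\t")
            records[-1][1].append((tag, fields))
    return comments, records


if __name__ == "__main__":
    sample = [
        "@db_tag=1000g",
        "@end",
        "#query\t1\t81125\trs560365426\tT\tC",
        "1000g\t1\t81125\trs560365426\tT\tC",
    ]
    comments, records = parse_overlap(sample)
    for query, hits in records:
        print(query, [tag for tag, _ in hits])
```

A query line with no database lines after it simply gets an empty hit list.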

Exact Mode Example  Query: q1.sort.vcf  Database: 1000G_p3.sort.vcf.gz, cosmic.sort.vcf.gz

Output file: q1.sort.vcf.overlap.gz

@query_file=/test_data/q1.sort.vcf
@query_format=2,1,2,2,0,##,4,5,true
@header=CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
@db_path=/test_data/1000G_p3.sort.vcf.gz
@db_index_type=VARNOTE
@db_tag=1000g
@db_path=/test_data/cosmic.sort.vcf.gz
@db_index_type=TBI
@db_tag=cosmic
@out_file=/test_data/q1.sort.vcf.overlap.gz
@end
#query	1	81125	rs560365426	T	C	100	PASS	.
1000g	1	81125	rs560365426	T	C	100	PASS	AC=1;AF=0.000199681;AN=5008;NS=2504;DP=22536;EAS_AF=0.001;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
#query	1	81712	rs558839829	C	T	100	PASS	.
1000g	1	81712	rs558839829	C	T	100	PASS
...
#query	1	914333	rs13302979	C	G	100	PASS	.
1000g	1	914333	rs13302979	C	G	100	PASS	AC=2786;AF=0.55631;AN=5008;NS=2504;DP=12459;EAS_AF=0.8056;AMR_AF=0.6542;AFR_AF=0.2436;EUR_AF=0.6034;SAS_AF=0.6043;AA=G|||;VT=SNP
cosmic	1	914333	COSV58020681	C	G	.	.	GENE=PERM1;STRAND=-;LEGACY_ID=COSM4591185;SNP;CDS=c.1795G>C;AA=p.E599Q;CNT=18
cosmic	1	914333	COSV58020681	C	G	.	.	GENE=PERM1_ENST00000341290;STRAND=-;LEGACY_ID=COSM4591185;SNP;CDS=c.1735G>C;AA=p.E579Q;CNT=18
#query	1	923978	rs70949537	A	AG	100	PASS	.
1000g	1	923978	rs70949537	A	AG	100	PASS	AC=4568;AF=0.912141;AN=5008;NS=2504;DP=18544;EAS_AF=0.9554;AMR_AF=0.9424;AFR_AF=0.7761;EUR_AF=0.9573;SAS_AF=0.9836;AA=|||unknown(HR);VT=INDEL
...

Output file with option -loj true (left join: report every query record, whether or not it has intersected records):

...
#query	1	64649	rs181431124	A	C	100	PASS	.
#query	1	81125	rs560365426	T	C	100	PASS	.
1000g	1	81125	rs560365426	T	C	100	PASS	AC=1;AF=0.000199681;AN=5008;NS=2504;DP=22536;EAS_AF=0.001;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
#query	1	81712	rs558839829	C	T	100	PASS	.
1000g	1	81712	rs558839829	C	T	100	PASS	AC=1;AF=0.000199681;AN=5008;NS=2504;DP=20171;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
...

Intersect Mode Example  Query: q2.sort.bed  Database: roadmap.sort.bed.gz

Output file: q2.sort.bed.overlap.gz

@query_file=/test_data/q2.sort.bed
@query_format=65536,1,2,3,0,##,-1,-1,false
@header=col1	col2	col3
@db_path=/test_data/roadmap.sort.bed.gz
@db_index_type=VARNOTE
@db_tag=roadmap.sort.bed.gz
@out_file=/test_data/q2.sort.bed.overlap.gz
@end
#query	chr1	10141	10237
roadmap.sort.bed.gz	1	9959	10511	Rank_24984	68	.	2.62240	6.83866	4.25719	287	E062-H3K9me3	E062	Primary mononuclear cells from peripheral blood
roadmap.sort.bed.gz	1	9975	10710	Rank_1550	475	.	12.86108	47.54574	44.03423	370	E080-H3K9me3	E080	Fetal Adrenal Gland
roadmap.sort.bed.gz	1	10003	10563	Rank_1363	488	.	13.42901	48.85612	44.82438	248	E084-H3K9me3	E084	Fetal Intestine Large
roadmap.sort.bed.gz	1	10011	10631	Rank_3377	217	.	8.95720	21.74096	18.31963	239	E092-H3K9me3	E092	Fetal Stomach
roadmap.sort.bed.gz	1	10012	10397	Rank_15590	193	.	7.50000	19.36886	16.46100	157	E061-H3K36me3	E061	Foreskin Melanocyte Primary Cells skin03
roadmap.sort.bed.gz	1	10012	10425	Rank_15065	71	.	3.84862	7.13214	4.01348	237	E012-H3K9me3	E012	hESC Derived CD56+ Ectoderm Cultured Cells
roadmap.sort.bed.gz	1	10012	10643	Rank_6221	143	.	7.06203	14.32418	10.81042	249	E093-H3K9me3	E093	Fetal Thymus
roadmap.sort.bed.gz	1	10013	10542	Rank_10363	165	.	6.79631	16.50425	13.62928	243	E016-H3K9me3	E016	HUES64 Cells
...
#query	chr1	235503	235929
roadmap.sort.bed.gz	1	235476	235948	Rank_67025	61	.	3.83294	6.17368	4.21646	262	E118-H3K4me1	E118	HepG2 Hepatocellular Carcinoma Cell Line
roadmap.sort.bed.gz	1	235502	235748	Rank_76537	93	.	4.95012	9.30910	7.14876	151	E034-H3K27me3	E034	Primary T cells from peripheral blood
roadmap.sort.bed.gz	1	235538	235953	Rank_78927	47	.	2.99401	4.70182	2.87583	135	E123-H3K4me2	E123	K562 Leukemia Cells
roadmap.sort.bed.gz	1	235538	236066	Rank_36164	97	.	4.30883	9.79599	7.66409	129	E123-H3K27ac	E123	K562 Leukemia Cells
roadmap.sort.bed.gz	1	235541	235947	Rank_185476	35	.	2.60000	3.56270	1.77043	198	E114-H3K27me3	E114	A549 EtOH 0.02pct Lung Carcinoma Cell Line
roadmap.sort.bed.gz	1	235545	235761	Rank_84115	57	.	4.18234	5.75779	4.17927	191	E030-H3K4me1	E030	Primary neutrophils from peripheral blood
...

Multiple Mode Example  Query: q1.sort.vcf  Database: 1000G_p3.sort.vcf.gz, roadmap.sort.bed.gz

Output file: q1.twomode.overlap.gz

@query_file=/test_data/q1.sort.vcf
@query_format=2,1,2,2,0,##,4,5,true
@header=CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
@db_path=/test_data/1000G_p3.sort.vcf.gz
@db_index_type=VARNOTE
@db_tag=1000g
@db_path=/test_data/roadmap.sort.bed.gz
@db_index_type=VARNOTE
@db_tag=roadmap
@out_file=/test_data/q1.twomode.overlap.gz
@end
#query	1	81125	rs560365426	T	C	100	PASS	.
1000g	1	81125	rs560365426	T	C	100	PASS	AC=1;AF=0.000199681;AN=5008;NS=2504;DP=22536;EAS_AF=0.001;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
#query	1	81712	rs558839829	C	T	100	PASS	.
1000g	1	81712	rs558839829	C	T	100	PASS	AC=1;AF=0.000199681;AN=5008;NS=2504;DP=20171;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
#query	1	88230	rs543088928	T	C	100	PASS	.
1000g	1	88230	rs543088928	T	C	100	PASS	AC=1;AF=0.000199681;AN=5008;NS=2504;DP=18579;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP
roadmap	1	88072	88328	Rank_137612	41	.	3.51627	4.16325	2.13235	117	E107-H3K27me3	E107	Skeletal Muscle Male
roadmap	1	88103	88478	Rank_26335	76	.	4.25000	7.66459	4.62783	188	E117-H3K27me3	E117	HeLa-S3 Cervical Carcinoma Cell Line
roadmap	1	88106	88400	Rank_14127	90	.	6.46950	9.08008	6.56416	154	E070-H3K27me3	E070	Brain Germinal Matrix
roadmap	1	88126	88414	Rank_14959	103	.	5.72283	10.34884	7.39796	170	E108-H3K27me3	E108	Skeletal Muscle Female
...

Intersection with remote DB  Query: q3.sort.tab  Database: http://202.113.53.226/VarNoteDB/VarNoteDB_AF_gnomAD_Genome.vcf.gz

Output file: q3.sort.tab.overlap.gz

@query_file=/test_data/q3.sort.tab
@query_format=0,1,2,2,0,##,3,4,false
@header=col1	col2	col3	col4
@db_path=http://202.113.53.226/VarNoteDB/VarNoteDB_AF_gnomAD_Genome.vcf.gz
@db_index_type=VARNOTE
@db_tag=gnomAD
@out_file=/test_data/q3.sort.tab.overlap.gz
@end
#query	1	13482	G	C
gnomAD	1	13482	rs537951473	G	C	624.47	RF;AC0	AC=0;AF=0.00000e+00;AN=29724;BaseQRankSum=-1.74200e+00;ClippingRankSum=-3.87000e-01;DP=1059428;FS=4.58900e+00;InbreedingCoeff=-1.20000e-03;MQ=2.99300e+01;MQRankSum=-1.70000e-01;QD=4.40000e-01;ReadPosRankSum=6.37000e-01;SOR=1.39200e+00;VQSLOD=-5.04700e+01;VQSR_culprit=MQ;GQ_HIST_ALT=1|0|1|1|0|0|1|0|1|1|2|1|1|1|0|0|3|0|1|3;DP_HIST_ALT=0|0|0|0|0|0|0|0|1|2|0|0|1|4|2|2|0|1|3|0;AB_HIST_ALT=0|0|11|7|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;GQ_HIST_ALL=31|144|91|140|130|49|66|30|16|14|13|6|16|4|16|7|36|11|44|14593;DP_HIST_ALL=250|310|107|26|31|35|1005|2734|2043|1549|1445|1280|1150|917|748|510|384|265|202|142;AB_HIST_ALL=0|0|11|7|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;AC_Male=0;AC_Female=0;AN_Male=16338;AN_Female=13386;AF_Male=0.00000e+00;AF_Female=0.00000e+00;GC_Male=8169,0,0;GC_Female=6693,0,0;GC_raw=15439,18,0;AC_raw=18;AN_raw=30914;GC=14862,0,0;AF_raw=5.82260e-04;Hom_AFR=0;Hom_AMR=0;Hom_ASJ=0;Hom_EAS=0;Hom_FIN=0;Hom_NFE=0;Hom_OTH=0;Hom=0;Hom_raw=0;AC_AFR=0;AC_AMR=0;AC_ASJ=0;AC_EAS=0;AC_FIN=0;AC_NFE=0;AC_OTH=0;AN_AFR=8662;AN_AMR=798;AN_ASJ=240;AN_EAS=1618;AN_FIN=3494;AN_NFE=13990;AN_OTH=922;AF_AFR=0.00000e+00;AF_AMR=0.00000e+00;AF_ASJ=0.00000e+00;AF_EAS=0.00000e+00;AF_FIN=0.00000e+00;AF_NFE=0.00000e+00;AF_OTH=0.00000e+00;POPMAX=.;AC_POPMAX=.;AN_POPMAX=.;AF_POPMAX=.;DP_MEDIAN=72;DREF_MEDIAN=1.24822e-06;GQ_MEDIAN=60;AB_MEDIAN=1.36395e-01;AS_RF=9.76716e-03;AS_FilterStatus=RF|AC0;AS_RF_NEGATIVE_TRAIN=1;CSQ=C|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000423562|unprocessed_pseudogene||||||||||rs537951473|1|881|-1||SNV|1|HGNC|38034||||||||||||||C:0.0004|C:0|C:0.0015|C:0|C:0|C:0|C:0|||C:0|C:4.500e-05|C:0.003195|C:0.0001704|C:0|C:0|C:0|C:0||||||||||||,C|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000438504|unprocessed_pseudogene||||||||||rs537951473|1|881|-1||SNV|1|HGNC|38034|YES|||||||||||||C:0.0004|C:0|C:0.0015|C:0|C:0|C:0|C:0|||C:0|C:4.500e-05|C:0.003195|C:0.0001704|C:0|C:0|C:0|C:0|||||||||
|||,C|non_coding_transcript_exon_variant&non_coding_transcript_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000450305|transcribed_unprocessed_pseudogene|6/6||ENST00000450305.2:n.444G>C||444|||||rs537951473|1||1||SNV|1|HGNC|37102||||||||||||||C:0.0004|C:0|C:0.0015|C:0|C:0|C:0|C:0|||C:0|C:4.500e-05|C:0.003195|C:0.0001704|C:0|C:0|C:0|C:0||||||||||||,C|non_coding_transcript_exon_variant&non_coding_transcript_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000456328|processed_transcript|3/3||ENST00000456328.2:n.730G>C||730|||||rs537951473|1||1||SNV|1|HGNC|37102|YES|||||||||||||C:0.0004|C:0|C:0.0015|C:0|C:0|C:0|C:0|||C:0|C:4.500e-05|C:0.003195|C:0.0001704|C:0|C:0|C:0|C:0||||||||||||,C|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000488147|unprocessed_pseudogene||||||||||rs537951473|1|922|-1||SNV|1|HGNC|38034||||||||||||||C:0.0004|C:0|C:0.0015|C:0|C:0|C:0|C:0|||C:0|C:4.500e-05|C:0.003195|C:0.0001704|C:0|C:0|C:0|C:0||||||||||||,C|non_coding_transcript_exon_variant&non_coding_transcript_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000515242|transcribed_unprocessed_pseudogene|3/3||ENST00000515242.2:n.723G>C||723|||||rs537951473|1||1||SNV|1|HGNC|37102||||||||||||||C:0.0004|C:0|C:0.0015|C:0|C:0|C:0|C:0|||C:0|C:4.500e-05|C:0.003195|C:0.0001704|C:0|C:0|C:0|C:0||||||||||||,C|non_coding_transcript_exon_variant&non_coding_transcript_variant|MODIFIER|DDX11L1|ENSG00000223972|Transcript|ENST00000518655|transcribed_unprocessed_pseudogene|3/4||ENST00000518655.2:n.561G>C||561|||||rs537951473|1||1||SNV|1|HGNC|37102||||||||||||||C:0.0004|C:0|C:0.0015|C:0|C:0|C:0|C:0|||C:0|C:4.500e-05|C:0.003195|C:0.0001704|C:0|C:0|C:0|C:0||||||||||||,C|downstream_gene_variant|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000538476|unprocessed_pseudogene||||||||||rs537951473|1|929|-1||SNV|1|HGNC|38034||||||||||||||C:0.0004|C:0|C:0.0015|C:0|C:0|C:0|C:0|||C:0|C:4.500e-05|C:0.003195|C:0.0001704|C:0|C:0|C:0|C:0||||||||||||,C|downstream_gene_varia
nt|MODIFIER|WASH7P|ENSG00000227232|Transcript|ENST00000541675|unprocessed_pseudogene||||||||||rs537951473|1|881|-1||SNV|1|HGNC|38034||||||||||||||C:0.0004|C:0|C:0.0015|C:0|C:0|C:0|C:0|||C:0|C:4.500e-05|C:0.003195|C:0.0001704|C:0|C:0|C:0|C:0||||||||||||,C|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00001576075|CTCF_binding_site||||||||||rs537951473|1||||SNV|1||||||||||||||||C:0.0004|C:0|C:0.0015|C:0|C:0|C:0|C:0|||C:0|C:4.500e-05|C:0.003195|C:0.0001704|C:0|C:0|C:0|C:0||||||||||||;GC_AFR=4331,0,0;GC_AMR=399,0,0;GC_ASJ=120,0,0;GC_EAS=809,0,0;GC_FIN=1747,0,0;GC_NFE=6995,0,0;GC_OTH=461,0,0;Hom_Male=0;Hom_Female=0
#query	1	48204	G	A
gnomAD	1	48204	rs548809068	G	A,T	2087.50	PASS	AC=9,2;AF=3.16589e-04,7.03532e-05;AN=28428;BaseQRankSum=7.67000e-01;ClippingRankSum=1.70000e-02;DP=526368;FS=1.31000e+00;InbreedingCoeff=3.64000e-02;MQ=2.71500e+01;MQRankSum=-2.23000e-01;QD=3.16000e+00;ReadPosRankSum=5.61000e-01;SOR=8.28000e-01;VQSLOD=-4.89500e+01;VQSR_culprit=MQ;GQ_HIST_ALT=0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|1|1|10,0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0;DP_HIST_ALT=0|0|0|0|0|1|1|3|2|0|2|2|0|0|1|0|0|0|1|0,0|0|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;AB_HIST_ALT=0|0|2|2|4|3|1|0|0|0|0|1|0|0|0|0|0|0|0|0,0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0;GQ_HIST_ALL=98|54|55|191|382|280|594|623|322|656|714|356|1381|267|869|320|1060|100|935|5870;DP_HIST_ALL=181|702|1383|1550|2125|1843|3002|3745|548|37|6|2|1|0|1|0|0|0|1|0;AB_HIST_ALL=0|0|2|2|4|3|1|0|0|0|0|1|0|0|0|0|0|0|0|0;AC_Male=3,2;AC_Female=6,0;AN_Male=15750;AN_Female=12678;AF_Male=1.90476e-04,1.26984e-04;AF_Female=4.73261e-04,0.00000e+00;GC_Male=7871,3,0,0,0,1;GC_Female=6333,6,0,0,0,0;GC_raw=15113,13,0,0,0,1;AC_raw=13,2;AN_raw=30254;GC=14204,9,0,0,0,1;AF_raw=4.29695e-04,6.61070e-05;Hom_AFR=0,0;Hom_AMR=0,1;Hom_ASJ=0,0;Hom_EAS=0,0;Hom_FIN=0,0;Hom_NFE=0,0;Hom_OTH=0,0;Hom=0,1;Hom_raw=0,1;AC_AFR=9,0;AC_AMR=0,2;AC_ASJ=0,0;AC_EAS=0,0;AC_FIN=0,0;AC_NFE=0,0;AC_OTH=0,0;AN_AFR=8212;AN_AMR=726;AN_ASJ=250;AN_EAS=1588;AN_FIN=3076;AN_NFE=13686;AN_OTH=890;AF_AFR=1.09596e-03,0.00000e+00;AF_AMR=0.00000e+00,2.75482e-03;AF_ASJ=0.00000e+00,0.00000e+00;AF_EAS=0.00000e+00,0.00000e+00;AF_FIN=0.00000e+00,0.00000e+00;AF_NFE=0.00000e+00,0.00000e+00;AF_OTH=0.00000e+00,0.00000e+00;POPMAX=AFR,AMR;AC_POPMAX=9,2;AN_POPMAX=8212,726;AF_POPMAX=1.09596e-03,2.75482e-03;DP_MEDIAN=42,11;DREF_MEDIAN=1.58489e-16,7.93930e-32;GQ_MEDIAN=99,33;AB_MEDIAN=2.22222e-01,4.90833e-01;AS_RF=1.07496e-01,4.06749e-01;AS_FilterStatus=RF,PASS;AS_RF_NEGATIVE_TRAIN=1,2;CSQ=A|upstream_gene_variant|MODIFIER|OR4G4P|ENSG00000268020|Transcript|ENST00000594647|unprocessed_pseudogene||||||||||rs548809068|1|4845|1||SNV||HGNC|14822|||||||
|||||||A:0.0004||A:0.0008|A:0|A:0.001|A:0|A:0||||||||||||||||||||||,T|upstream_gene_variant|MODIFIER|OR4G4P|ENSG00000268020|Transcript|ENST00000594647|unprocessed_pseudogene||||||||||rs548809068|2|4845|1||SNV||HGNC|14822||||||||||||||A:0.0004||A:0.0008|A:0|A:0.001|A:0|A:0||||||||||||||||||||||,A|upstream_gene_variant|MODIFIER|OR4G4P|ENSG00000268020|Transcript|ENST00000606857|unprocessed_pseudogene||||||||||rs548809068|1|4269|1||SNV||HGNC|14822|YES|||||||||||||A:0.0004||A:0.0008|A:0|A:0.001|A:0|A:0||||||||||||||||||||||,T|upstream_gene_variant|MODIFIER|OR4G4P|ENSG00000268020|Transcript|ENST00000606857|unprocessed_pseudogene||||||||||rs548809068|2|4269|1||SNV||HGNC|14822|YES|||||||||||||A:0.0004||A:0.0008|A:0|A:0.001|A:0|A:0||||||||||||||||||||||;GC_AFR=4097,9,0,0,0,0;GC_AMR=362,0,0,0,0,1;GC_ASJ=125,0,0,0,0,0;GC_EAS=794,0,0,0,0,0;GC_FIN=1538,0,0,0,0,0;GC_NFE=6843,0,0,0,0,0;GC_OTH=445,0,0,0,0,0;Hom_Male=0,1;Hom_Female=0,0
#query	1	52152	ATAAT	A
gnomAD	1	52152	rs568235219	ATAAT	A	7526.08	PASS	AC=14;AF=5.34392e-04;AN=26198;BaseQRankSum=3.51000e-01;ClippingRankSum=1.96000e-01;DP=472975;FS=1.11430e+01;InbreedingCoeff=-4.00000e-04;MQ=5.10900e+01;MQRankSum=-5.23000e-01;QD=1.23400e+01;ReadPosRankSum=7.69000e-01;SOR=8.23000e-01;VQSLOD=-4.27800e-01;VQSR_culprit=MQRankSum;VQSR_NEGATIVE_TRAIN_SITE;GQ_HIST_ALT=0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0|0|1|0|14;DP_HIST_ALT=0|0|0|4|1|2|4|1|2|1|0|0|0|0|0|0|1|0|0|0;AB_HIST_ALT=0|0|1|1|0|3|2|0|0|1|2|2|0|0|1|1|1|1|0|0;GQ_HIST_ALL=199|284|157|419|523|341|774|765|412|816|809|414|1312|254|844|293|1014|97|845|4278;DP_HIST_ALL=516|1175|1724|1894|2097|1709|1741|2319|990|337|164|77|45|27|17|5|7|1|1|3;AB_HIST_ALL=0|0|1|1|0|3|2|0|0|1|2|2|0|0|1|1|1|1|0|0;AC_Male=12;AC_Female=2;AN_Male=14590;AN_Female=11608;AF_Male=8.22481e-04;AF_Female=1.72295e-04;GC_Male=7283,12,0;GC_Female=5802,2,0;GC_raw=14834,16,0;AC_raw=16;AN_raw=29700;GC=13085,14,0;AF_raw=5.38721e-04;Hom_AFR=0;Hom_AMR=0;Hom_ASJ=0;Hom_EAS=0;Hom_FIN=0;Hom_NFE=0;Hom_OTH=0;Hom=0;Hom_raw=0;AC_AFR=0;AC_AMR=0;AC_ASJ=0;AC_EAS=0;AC_FIN=0;AC_NFE=14;AC_OTH=0;AN_AFR=7434;AN_AMR=610;AN_ASJ=240;AN_EAS=1582;AN_FIN=2490;AN_NFE=13048;AN_OTH=794;AF_AFR=0.00000e+00;AF_AMR=0.00000e+00;AF_ASJ=0.00000e+00;AF_EAS=0.00000e+00;AF_FIN=0.00000e+00;AF_NFE=1.07296e-03;AF_OTH=0.00000e+00;POPMAX=NFE;AC_POPMAX=14;AN_POPMAX=13048;AF_POPMAX=1.07296e-03;DP_MEDIAN=31;DREF_MEDIAN=6.25594e-46;GQ_MEDIAN=99;AB_MEDIAN=4.96838e-01;AS_RF=9.72455e-01;AS_FilterStatus=PASS;CSQ=-|upstream_gene_variant|MODIFIER|OR4G4P|ENSG00000268020|Transcript|ENST00000594647|unprocessed_pseudogene||||||||||rs568235219|1|893|1||deletion|1|HGNC|14822||||||||||||||-:0.0006||-:0.0008|-:0|-:0|-:0|-:0.002||||||||||||||||||||||,-|upstream_gene_variant|MODIFIER|OR4G4P|ENSG00000268020|Transcript|ENST00000606857|unprocessed_pseudogene||||||||||rs568235219|1|317|1||deletion|1|HGNC|14822|YES|||||||||||||-:0.0006||-:0.0008|-:0|-:0|-:0|-:0.002||||||||||||||||||||||;GC_AFR=3717,0,0;GC_AMR=305,0,0;GC_ASJ=120,0,0;GC
_EAS=791,0,0;GC_FIN=1245,0,0;GC_NFE=6510,14,0;GC_OTH=397,0,0;Hom_Male=0;Hom_Female=0
...
Annotation Configuration File

Format of annotation configuration file:

  • Comment line starts with '#';
  • Database tag starts with '@';
  • Configuration fields include: fields, info_fields, cols, out_names, has_header, header_path, comment_indicator, vcf_info_path

Description of fields:

  • fields: Specifies the fields to extract. Valid fields for VCF are CHROM, BEGIN, REF, ALT, QUAL, FILTER, and INFO. For BED and TAB formats, use field names from the file header or from an external header file.
  • info_fields: Used for VCF format only; must be applied together with fields=[INFO].
  • cols: For BED and TAB files without a header, use cols to specify the columns to extract. The default column name is "col"+n.
  • out_names: Rename extracted fields.
  • header_path: Used for BED and TAB format to specify header path.
  • comment_indicator: Used for BED and TAB format to specify the comment-line indicator.
  • vcf_info_path: Used with BED format (annotation output format is VCF) to specify field INFO.
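
To make the layout concrete, here is a minimal Python sketch that reads an annotation configuration into a dictionary keyed by database tag. It is not the tool's own parser. It assumes that lines starting with '#' can be ignored when collecting field settings, that each setting is written as key=value, and that list values are written as [a, b]. The name `parse_annoc` is hypothetical.

```python
def parse_annoc(text):
    """Parse annotation configuration text into {db_tag: {field: value}}.

    Values written as [a, b] become lists of strings; other values stay strings.
    """
    config, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or '#' line (assumed ignorable here)
        if line.startswith("@"):
            # a database tag opens a new section
            current = config.setdefault(line[1:].strip(), {})
            continue
        if current is None:
            raise ValueError(f"field outside a database section: {line!r}")
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {line!r}")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        current[key.strip()] = value
    return config


if __name__ == "__main__":
    example = "@1000g\nfields=[INFO]\ninfo_fields=[AC, AF]\n"
    print(parse_annoc(example))
```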

all_dbs.annoc

@1000g
fields=[INFO]
info_fields=[AC, AF]
out_names=[AC:1000g_AC, AF:1000g_AF]

@gnomAD
fields=[INFO]
info_fields=[AC, AF]
out_names=[AC:gnomAD_AC, AF:gnomAD_AF]

@cosmic
fields=[INFO]
info_fields=[GENE, STRAND]
out_names=[GENE:cosmic_GENE, STRAND:cosmic_STRAND]

@dbscSNV
header_path=/path/to/config/dbscSNV.header
fields=[RefSeq_gene, rf_score]
out_names=[RefSeq_gene:dbscSNV_refGene, rf_score:dbscSNV_rf_score]

dbscSNV.header

chr	pos	ref	alt	hg38_chr	hg38_pos	RefSeq?	Ensembl?	RefSeq_region	RefSeq_gene	RefSeq_functional_consequence	RefSeq_id_c_change_p_change	Ensembl_region	Ensembl_gene	Ensembl_functional_consequence	Ensembl_id_c_change_p_change	ada_score	rf_score
Annotation Result("ANNO" file)

Annotation without configuration file  Query: q1.sort.vcf  Database: 1000G_p3.sort.vcf.gz, cosmic.sort.vcf.gz

Output file: q1.sort.vcf.allfields.anno.gz

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=1000g_CHROM,Number=.,Type=String,Description="">
##INFO=<ID=1000g_POS,Number=.,Type=String,Description="">
##INFO=<ID=1000g_REF,Number=.,Type=String,Description="">
...
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele. Format: AA|REF|ALT|IndelType. AA: Ancestral allele, REF:Reference Allele, ALT:Alternate Allele, IndelType:Type of Indel (REF, ALT and IndelType are only defined for indels)">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1)">
...
##INFO=<ID=cosmic_ALT,Number=.,Type=String,Description="">
##INFO=<ID=cosmic_CHROM,Number=.,Type=String,Description="">
##INFO=<ID=cosmic_FILTER,Number=.,Type=String,Description="">
##INFO=<ID=cosmic_ID,Number=.,Type=String,Description="">
##INFO=<ID=cosmic_POS,Number=.,Type=String,Description="">
...
##contig=<ID=1,assembly=b37,length=249250621>
##fileDate=20150218
##reference=ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
##source=1000GenomesPhase3Pipeline
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
...
1	81125	rs560365426	T	C	100	PASS	.;1000g_CHROM=1;1000g_POS=81125;1000g_ID=rs560365426;1000g_REF=T;1000g_ALT=C;1000g_QUAL=100;1000g_FILTER=PASS;1000g_AC=1;1000g_AF=0.000199681;1000g_NS=2504;1000g_AN=5008;1000g_EAS_AF=0.001;1000g_EUR_AF=0;1000g_AFR_AF=0;1000g_AMR_AF=0;1000g_SAS_AF=0;1000g_DP=22536;1000g_AA=.|||;1000g_VT=SNP
1	81712	rs558839829	C	T	100	PASS	.;1000g_CHROM=1;1000g_POS=81712;1000g_ID=rs558839829;1000g_REF=C;1000g_ALT=T;1000g_QUAL=100;1000g_FILTER=PASS;1000g_AC=1;1000g_AF=0.000199681;1000g_NS=2504;1000g_AN=5008;1000g_EAS_AF=0;1000g_EUR_AF=0;1000g_AFR_AF=0.0008;1000g_AMR_AF=0;1000g_SAS_AF=0;1000g_DP=20171;1000g_AA=.|||;1000g_VT=SNP
1	88230	rs543088928	T	C	100	PASS	.;1000g_CHROM=1;1000g_POS=88230;1000g_ID=rs543088928;1000g_REF=T;1000g_ALT=C;1000g_QUAL=100;1000g_FILTER=PASS;1000g_AC=1;1000g_AF=0.000199681;1000g_NS=2504;1000g_AN=5008;1000g_EAS_AF=0;1000g_EUR_AF=0;1000g_AFR_AF=0.0008;1000g_AMR_AF=0;1000g_SAS_AF=0;1000g_DP=18579;1000g_AA=.|||;1000g_VT=SNP
1	99687	rs139153227	C	T	100	PASS	.;1000g_CHROM=1;1000g_POS=99687;1000g_ID=rs139153227;1000g_REF=C;1000g_ALT=T;1000g_QUAL=100;1000g_FILTER=PASS;1000g_AC=161;1000g_AF=0.0321486;1000g_NS=2504;1000g_AN=5008;1000g_EAS_AF=0.001;1000g_EUR_AF=0.0895;1000g_AFR_AF=0.0045;1000g_AMR_AF=0.0331;1000g_SAS_AF=0.0419;1000g_DP=17422;1000g_AA=.|||;1000g_VT=SNP
...

Annotation with configuration file config/all_dbs.annoc  Query: q1.sort.vcf  Database: 1000G_p3.sort.vcf.gz, cosmic.sort.vcf.gz

Output file: q1.sort.vcf.anno.gz

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=1000g_AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=1000g_AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1)">
##INFO=<ID=cosmic_GENE,Number=1,Type=String,Description="Gene name">
##INFO=<ID=cosmic_STRAND,Number=1,Type=String,Description="Gene strand">
##contig=<ID=1,assembly=b37,length=249250621>
##fileDate=20150218
##reference=ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
##source=1000GenomesPhase3Pipeline
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
...
1	895903	rs544271560	G	A	100	PASS	.;1000g_AC=7;1000g_AF=0.00139776
1	899747	rs368028255	G	A	100	PASS	.;1000g_AC=1;1000g_AF=0.000199681
1	901328	rs574254395	A	G	100	PASS	.;1000g_AC=4;1000g_AF=0.000798722
1	902018	rs567034360	A	G	100	PASS	.;1000g_AC=1;1000g_AF=0.000199681
1	905307	rs528578943	G	A	100	PASS	.;1000g_AC=18;1000g_AF=0.00359425
1	908062	rs527519589	C	T	100	PASS	.;1000g_AC=9;1000g_AF=0.00179712
1	914333	rs13302979	C	G	100	PASS	.;1000g_AC=2786;1000g_AF=0.55631;cosmic_GENE=PERM1,PERM1_ENST00000341290;cosmic_STRAND=-,-
1	923978	rs70949537	A	AG	100	PASS	.;1000g_AC=4568;1000g_AF=0.912141
1	924628	rs552249487	C	T	100	PASS	.;1000g_AC=1;1000g_AF=0.000199681
1	929190	rs9777939	A	G	100	PASS	.;1000g_AC=4688;1000g_AF=0.936102
...

BED output format  Query: q1.sort.vcf  Database: cosmic.sort.vcf.gz

Output file: q1.sort.bed.anno.gz

CHROM	BEGIN	END	REF	ALT	QUAL	FILTER	1000g_AC	1000g_AF
1	81124	81125	T	C	100	PASS	1	0.000199681
1	81711	81712	C	T	100	PASS	1	0.000199681
1	88229	88230	T	C	100	PASS	1	0.000199681
1	99686	99687	C	T	100	PASS	161	0.0321486
1	254262	254263	C	T	100	PASS	1	0.000199681
1	534168	534169	G	A	100	PASS	6	0.00119808
1	565078	565079	A	G	100	PASS	2	0.000399361
1	570093	570094	G	A	100	PASS	78	0.0155751
1	715845	715846	G	A	100	PASS	40	0.00798722
1	722602	722603	T	C	100	PASS	89	0.0177716
...
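
In the BED output above, each record's BEGIN is one less than the VCF POS and END equals POS, which is the usual shift from 1-based VCF positions to 0-based, half-open BED intervals. The Python sketch below reproduces that arithmetic. The function name `vcf_pos_to_bed` is hypothetical, and the handling of REF alleles longer than one base is an assumption, since the example only shows single-base variants.

```python
def vcf_pos_to_bed(pos, ref):
    """Convert a 1-based VCF POS to a 0-based, half-open (BEGIN, END) pair.

    END spans the REF allele; for a single-base REF this gives END == POS.
    Treatment of multi-base REF alleles is an assumption of this sketch.
    """
    begin = pos - 1
    end = begin + len(ref)
    return begin, end


if __name__ == "__main__":
    print(vcf_pos_to_bed(81125, "T"))  # (81124, 81125), matching the first row above
```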

Remote database  Query: q1.sort.vcf  Database: 1000G_p3.sort.vcf.gz, VarNoteDB_AF_gnomAD_Genome.vcf.gz

Output file: q1.sort.vcf.remote.anno.gz

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=1000g_AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=1000g_AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1)">
##INFO=<ID=gnomAD_AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=gnomAD_AF,Number=A,Type=Float,Description="Allele Frequency among genotypes, for each ALT allele, in the same order as listed">
##contig=<ID=1,assembly=b37,length=249250621>
##fileDate=20150218
##reference=ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
##source=1000GenomesPhase3Pipeline
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
1	64649	rs181431124	A	C	100	PASS	.;gnomAD_AC=584;gnomAD_AF=2.05402e-02
1	81125	rs560365426	T	C	100	PASS	.;1000g_AC=1;1000g_AF=0.000199681
1	81712	rs558839829	C	T	100	PASS	.;1000g_AC=1;1000g_AF=0.000199681;gnomAD_AC=14;gnomAD_AF=5.39915e-04
1	88230	rs543088928	T	C	100	PASS	.;1000g_AC=1;1000g_AF=0.000199681
1	99687	rs139153227	C	T	100	PASS	.;1000g_AC=161;1000g_AF=0.0321486;gnomAD_AC=1434;gnomAD_AF=6.64689e-02
1	254263	rs558650540	C	T	100	PASS	.;1000g_AC=1;1000g_AF=0.000199681
1	534169	rs59089120	G	A	100	PASS	.;1000g_AC=6;1000g_AF=0.00119808;gnomAD_AC=37;gnomAD_AF=1.24899e-03
 ...
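
The annotated INFO column can be split back into individual values with a small helper. This Python sketch assumes the semicolon-separated key=value layout shown in the records above, skips the "." placeholder, and treats keys without a value as flags. The name `parse_info` is illustrative only.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict.

    key=value items map to their string value; bare keys (flags) map to True;
    the "." placeholder and empty items are skipped.
    """
    result = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        result[key] = value if sep else True
    return result


if __name__ == "__main__":
    info = ".;1000g_AC=1;1000g_AF=0.000199681;gnomAD_AC=14;gnomAD_AF=5.39915e-04"
    parsed = parse_info(info)
    # Values are kept as strings; convert explicitly when numbers are needed.
    print(float(parsed["1000g_AF"]), float(parsed["gnomAD_AF"]))
```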
Run program with configuration file

Configuration file with all possible arguments for Intersect:

# configuration file with all possible arguments
# each database section should start with [db]
query_file=./q1.sort.vcf

# format options of query
query_format=vcf
chrom=1
begin=2
end=2
ref=4
alt=5
zero_based=false
comment_indicator=##
header_indicator=#
has_header=true

# output options
out_file=./q1.sort.vcf.overlap.gz
is_loj=false
is_zip=true

# other options
thread=4
is_log=true
use_jdk_inflater=false

[db]
db_path=./1000G_p3.sort.vcf.gz
db_index_type=VarNote
db_tag=1000g
db_mode=1

[db]
db_path=./cosmic.sort.vcf.gz
db_index_type=TBI
db_tag=cosmic
db_mode=1

Configuration file with all required arguments for Intersect:

# configuration file with all required arguments
query_file=./q1.sort.vcf

# other options
thread=4

[db]
db_path=./1000G_p3.sort.vcf.gz
db_tag=1000g
db_mode=1

[db]
db_path=./cosmic.sort.vcf.gz
db_index_type=TBI
db_tag=cosmic
db_mode=1

Configuration file with all possible arguments for Annotation:

# configuration file with all possible arguments
# each database section should start with [db]
query_file=./q1.sort.vcf

# format options of query
query_format=vcf
chrom=1
begin=2
end=2
ref=4
alt=5
zero_based=false
comment_indicator=##
header_indicator=#
has_header=true

# annotation options
anno_config=./config/all_dbs.annoc
force_overlap=false
out_format=VCF

# output options
out_file=./q1.sort.vcf.overlap.gz
is_loj=false
is_zip=true

# other options
thread=4
is_log=true
use_jdk_inflater=false

[db]
db_path=./1000G_p3.sort.vcf.gz
db_index_type=VarNote
db_tag=1000g
db_mode=1

[db]
db_path=./cosmic.sort.vcf.gz
db_index_type=TBI
db_tag=cosmic
db_mode=1
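
Because the [db] section name repeats once per database, a generic INI reader such as Python's configparser in its default strict mode would reject these files as having duplicate sections. The sketch below is one hedged way to read them for inspection: it collects the top-level options and one dictionary per [db] section. It assumes '#' at the start of a line marks a comment, as in the examples above. The name `parse_run_config` is hypothetical and this is not how the tool itself loads the file.

```python
def parse_run_config(text):
    """Return (options, databases) from a run configuration.

    options   - dict of key/value pairs that appear before the first [db]
    databases - list with one dict per [db] section, in file order
    """
    options, databases = {}, []
    target = options
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line == "[db]":
            target = {}
            databases.append(target)
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {line!r}")
        target[key.strip()] = value.strip()
    return options, databases


if __name__ == "__main__":
    example = (
        "query_file=./q1.sort.vcf\n"
        "thread=4\n"
        "[db]\n"
        "db_path=./1000G_p3.sort.vcf.gz\n"
        "db_tag=1000g\n"
    )
    options, databases = parse_run_config(example)
    print(options["query_file"], [db["db_tag"] for db in databases])
```

Values such as comment_indicator=## are preserved because only lines that begin with '#' are treated as comments.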