Data Example

Query File

Note

rsID format is supported by the online version but not by the local version.

Tab format is supported by the local version but not by the online version.

rsID Format
rs367896724
rs4970354
rs28460227
rs12731916
rs13303005
rs13303005
...
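
A query list may contain repeated or malformed identifiers (rs13303005 appears twice in the example above). The following is a minimal pre-check sketch, not part of vSampler itself; the function name `read_rsids` and the `rs<digits>` pattern are assumptions made for illustration.

```python
import re

# Assumed shape of an rsID: the literal prefix "rs" followed by digits.
RSID_PATTERN = re.compile(r"^rs\d+$")

def read_rsids(lines):
    """Return unique, well-formed rsIDs in first-seen order."""
    seen = set()
    result = []
    for line in lines:
        rsid = line.strip()
        # Skip blank lines, placeholders such as "...", and repeats.
        if not RSID_PATTERN.match(rsid) or rsid in seen:
            continue
        seen.add(rsid)
        result.append(rsid)
    return result

example = ["rs367896724", "rs4970354", "rs13303005", "rs13303005", "..."]
unique_rsids = read_rsids(example)
print(unique_rsids)
```

Whether vSampler removes duplicates on its own is not stated here, so this step is optional.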
VCF Format
##VCF meta lines...
#VCF header lines
1	64649	rs181431124	A	C	100	PASS	.
1	81125	rs560365426	T	C	100	PASS	.
1	81712	rs558839829	C	T	100	PASS	.
1	88230	rs543088928	T	C	100	PASS	.
1	99687	rs139153227	C	T	100	PASS	.
1	254263	rs558650540	C	T	100	PASS	.
1	534169	rs59089120	G	A	100	PASS	.
...
VCF-like

Note

VCF-like format has no comment or header lines.

The first five columns are the same as the VCF format.

1	64649	rs181431124	A	C	col6	col7
1	81125	rs560365426	T	C	col6	col7
1	81712	rs558839829	C	T	col6	col7
1	88230	rs543088928	T	C	col6	col7
1	99687	rs139153227	C	T	col6	col7
1	254263	rs558650540	C	T	col6	col7
1	534169	rs59089120	G	A	col6	col7
...
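
Because VCF and VCF-like inputs share their first five columns, one reader can cover both. The sketch below is illustrative only: `Variant` and `parse_vcf_like` are hypothetical names, and skipping lines that start with `#` is an assumption about how header lines should be treated.

```python
from typing import Iterable, Iterator, NamedTuple

class Variant(NamedTuple):
    chrom: str
    pos: int
    rsid: str
    ref: str
    alt: str

def parse_vcf_like(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield the first five columns of each tab-separated data line."""
    for line in lines:
        line = line.rstrip("\n")
        # VCF meta and header lines start with '#'; VCF-like files have none.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a data line (for example a "..." placeholder)
        chrom, pos, rsid, ref, alt = fields[:5]
        yield Variant(chrom, int(pos), rsid, ref, alt)

sample = [
    "##VCF meta lines...",
    "1\t64649\trs181431124\tA\tC\t100\tPASS\t.",
    "1\t81125\trs560365426\tT\tC\tcol6\tcol7",
]
variants = list(parse_vcf_like(sample))
print(variants[0])
```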
Coord-Only

The first two columns are:

  1. chrom - The name of the chromosome.
  2. pos - The position of the feature in the chromosome, 1-based.
1	10177
1	1122916
1	1999840
1	2533552
1	2557191
1	2854172
...
Coord-Allele

The first four columns are:

  1. chrom - The name of the chromosome.
  2. pos - The position of the feature in the chromosome, 1-based.
  3. ref - The reference allele.
  4. alt - The alternative allele.
1	10177	A	AC	0.4056
1	63479713	G	GCTA	0.4533
1	1122916	A	G	0.0845
1	1042927	G	T	0.1342
1	1999840	C	T	0.1243
...
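
The four Coord-Allele columns map naturally onto the `chrom:pos:ref:alt` keys that appear in sampler.out.txt. A small sketch of that conversion follows; `coord_allele_key` is a hypothetical helper, and it assumes tab-separated input with any extra columns ignored.

```python
def coord_allele_key(line):
    """Build a 'chrom:pos:ref:alt' key from a Coord-Allele line."""
    # Only the first four columns are used; later columns are ignored.
    chrom, pos, ref, alt = line.rstrip("\n").split("\t")[:4]
    return f"{chrom}:{int(pos)}:{ref}:{alt}"

key = coord_allele_key("1\t10177\tA\tAC\t0.4056")
print(key)
```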
Tab

Tab format should be used together with the attributes {c, b, e, ref, alt, 0}; please refer to tagArgument for more details.

For example, c=2,b=3,e=4,0=true was used to define the following data:

col1	1	10177	10179	col4
col1	1	63479713	63479813	col4
col1	1	1122916	1122918	col4
col1	1	1042927	1043927	col4
col1	1	1999840	1999940	col4
...
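
To make the attribute string concrete, here is a rough sketch of how it could be applied to a data line. It rests on assumptions not confirmed by this page: that c, b and e are 1-based column indices for chromosome, begin and end, and that 0=true marks the coordinates as zero-based. The helper names are hypothetical; tagArgument remains the authoritative description.

```python
def parse_tab_spec(spec):
    """Split 'c=2,b=3,e=4,0=true' into a dict of attribute strings."""
    attrs = {}
    for item in spec.split(","):
        key, _, value = item.partition("=")
        attrs[key.strip()] = value.strip()
    return attrs

def extract_region(line, attrs):
    """Pick chromosome, begin and end from a line (assumed 1-based columns)."""
    fields = line.split()  # splits on tabs and on runs of spaces
    chrom = fields[int(attrs["c"]) - 1]
    begin = int(fields[int(attrs["b"]) - 1])
    end = int(fields[int(attrs["e"]) - 1])
    return chrom, begin, end

attrs = parse_tab_spec("c=2,b=3,e=4,0=true")
region = extract_region("col1\t1\t10177\t10179\tcol4", attrs)
print(attrs, region)
```

The sketch only selects columns; it does not convert between zero-based and 1-based coordinates.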

Configuration File

The configuration file used to build the database.

roadmap=/path/to/Roadmap_127Epi.bed.gz

1000G=/path/to/1kg.phase3.v5.shapeit2.eur.hg19.all.split.multi.vcf.gz

bit_file=/path/to/1kg.phase3.v5.shapeit2.eur.hg19.all.split.multi.vcf.gz.bit

gene_file=/path/to/gencode.v32lift37.annotation.gene.sort.gtf.gz

ser_path=/path/to/ser/hg19_ensembl.ser

vf_path=/path/to/vf.json

tissue_list=/path/to/tissue.list

tissue_path=/path/to/GTEx_v8.signif_variant_gene_pairs.txt.gz

gc_path=/path/to/hg19.gc5Base.bed.gz

ld_window=100

maf_cutoff=-1

genome=hg19

output_dir=/path/to/output
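
The configuration above is a plain list of `key=value` pairs, so it can be checked before a build with a few lines of code. This is a sketch under assumptions: `parse_config` is a hypothetical helper, and treating lines that start with `#` as comments is a guess rather than documented behavior.

```python
def parse_config(text):
    """Read key=value lines into a dict, skipping blank lines."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: '#' starts a comment line.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        config[key.strip()] = value.strip()
    return config

example = """
gc_path=/path/to/hg19.gc5Base.bed.gz

ld_window=100

maf_cutoff=-1

genome=hg19
"""
config = parse_config(example)
print(config["genome"], int(config["ld_window"]))
```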

Output Files

anno.out.txt

This file contains the annotations of the query and control variants.

  1. LABEL, query or control
  2. CHR, sequence name
  3. POS, position
  4. REF, reference allele
  5. ALT, alternative allele
  6. MAF, minor allele frequency
  7. DTCT, distance to the nearest gene
  8. GENE_DIS_KB*, gene density around the variant, where * is the value of the -GP parameter
  9. LD_BUDDIES_GT*, number of LD buddies, where * is the value of the -LDD parameter
  10. GC_Content_BP*, percentage of GC content, where * is the value of the -BP parameter
LABEL	CHR	POS	REF	ALT	MAF	DTCT	Gene_Dis_KB500
query	1	2854172	G	C	0.010900	31100	28
control1	1	179953553	C	T	0.012900	29645	27
control2	1	85189424	A	G	0.028800	32938	27
control3	1	48057464	G	T	0.047700	29385	27
control4	1	225809505	C	T	0.028800	31339	23
control5	1	67929952	A	G	0.023900	33854	22
control6	1	212935123	A	G	0.011900	30001	34
control7	1	157840538	C	G	0.036800	27508	28
control8	1	101144690	T	G	0.021900	27187	29
...
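
One way to sanity-check anno.out.txt is to compare each control's MAF with that of its query. The sketch below assumes that every `query` row is followed by its own controls, as in the excerpt above; the function names are hypothetical.

```python
import csv
import io

def group_by_query(text):
    """Group rows so that each query row collects the controls after it."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    groups = []
    for row in reader:
        if row["LABEL"] == "query":
            groups.append({"query": row, "controls": []})
        elif groups:
            groups[-1]["controls"].append(row)
    return groups

def max_maf_deviation(group):
    """Largest absolute MAF difference between a query and its controls."""
    query_maf = float(group["query"]["MAF"])
    return max(
        (abs(float(c["MAF"]) - query_maf) for c in group["controls"]),
        default=0.0,
    )

example = (
    "LABEL\tCHR\tPOS\tREF\tALT\tMAF\tDTCT\tGene_Dis_KB500\n"
    "query\t1\t2854172\tG\tC\t0.010900\t31100\t28\n"
    "control1\t1\t179953553\tC\tT\t0.012900\t29645\t27\n"
    "control2\t1\t85189424\tA\tG\t0.028800\t32938\t27\n"
)
groups = group_by_query(example)
print(len(groups), max_maf_deviation(groups[0]))
```

A deviation within the MAF range reported in sampler.config.txt suggests the controls were matched as configured.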
sampler.config.txt

This file contains the configuration of the job, i.e. the parameters used when running vSampler.

Query File: input.txt
Database File: EUR.gz

Exclude input SNPs: true
Sampling across chromosomes: false
Variant type specific: true
Sample control number: 100
Annotation number: 100
MAF deviation: [-0.05, 0.05]

Distance to closest tss deviation: [-5000, 5000]
Gene density in distance: 500KB
Gene density deviation: [-10, 10]
Variant Region Match: false
Number of insufficient match: 103

Proportion of insufficient match:  0.0344
Median size of insufficient pool: 60.0
Median proportion of of insufficient pool:  0.600
Number of variants excluded: 3
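
The job report uses `Name: value` lines, which makes it straightforward to load for logging or comparison across runs. A minimal sketch follows, with hypothetical helper names; it assumes that each setting sits on one line and that ranges are written as `[low, high]`.

```python
import ast

def parse_job_config(text):
    """Read 'Name: value' lines into a dict of strings."""
    settings = {}
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank line or a line without a colon
        settings[key.strip()] = value.strip()
    return settings

def parse_range(value):
    """Turn a string such as '[-0.05, 0.05]' into a (low, high) tuple."""
    low, high = ast.literal_eval(value)
    return float(low), float(high)

example = """Sample control number: 100
MAF deviation: [-0.05, 0.05]

Distance to closest tss deviation: [-5000, 5000]
Proportion of insufficient match:  0.0344
"""
settings = parse_job_config(example)
print(parse_range(settings["MAF deviation"]))
```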
sampler.out.txt

This file reports the sampling outputs.

  1. ratio, the size of the sampling pool relative to the number of controls (SN), written as pool:controls
  2. query, user input variant
  3. annotation, annotation from database
  4. controlN, the n-th control variant
ratio	query	annotation	control1	control2	control3	control4	control5	control6	control7	control8	control9	control10	control11	control12	control13	control14	control15	control16	control17	control18	control19	control20	control21	control22	control23	control24	control25	control26	control27	control28	control29	control30	control31	control32	control33	control34	control35	control36	control37	control38	control39	control40	control41	control42	control43	control44	control45	control46	control47	control48	control49	control50	control51	control52	control53	control54	control55	control56	control57	control58	control59	control60	control61	control62	control63	control64	control65	control66	control67	control68	control69	control70	control71	control72	control73	control74	control75	control76	control77	control78	control79	control80	control81	control82	control83	control84	control85	control86	control87	control88	control89	control90	control91	control92	control93	control94	control95	control96	control97	control98	control99	control100
9160:100	1:2854172:G:C	1:2854172:G:C	1:179953553:C:T	1:85189424:A:G	1:48057464:G:T	1:225809505:C:T	1:67929952:A:G	1:212935123:A:G	1:157840538:C:G	1:101144690:T:G	1:242047347:A:G	1:89180893:T:C	1:185730038:G:C	1:64023363:G:A	1:117782144:A:G	1:86919011:G:A	1:119718067:A:G	1:157710893:C:A	1:20383766:C:T	1:246641380:G:A	1:68669479:T:G	1:205225480:G:C	1:95419064:G:T	1:183639566:C:G	1:114857147:C:T	1:36722309:G:A	1:93781806:C:T	1:179591414:G:T	1:184384740:A:T	1:185433179:T:C	1:235150199:G:T	1:29530718:G:A	1:202189594:G:A	1:90492618:G:A	1:171841662:C:T	1:89429126:A:T	1:38816676:A:G	1:54548798:T:C	1:163007664:G:A	1:207658025:T:C	1:236796218:C:T	1:20387575:G:T	1:201107974:C:T	1:31802167:T:C	1:229735229:C:G	1:51949832:G:A	1:2531211:G:A	1:9321633:G:C	1:41900284:C:T	1:185643883:T:C	1:22384713:G:T	1:243186881:A:G	1:58981823:T:A	1:211725120:A:G	1:59365820:C:T	1:166688415:T:C	1:35992011:T:C	1:22443482:A:T	1:59135525:A:C	1:212180861:G:A	1:182247649:A:G	1:9322155:C:T	1:211333734:A:G	1:147199689:G:C	1:19838908:C:T	1:226467391:G:C	1:84831864:A:G	1:166917857:C:G	1:174023412:C:T	1:86919502:A:T	1:243111945:C:T	1:109447270:C:T	1:22343070:C:T	1:209404552:G:C	1:146612437:C:G	1:180233705:C:T	1:169659138:T:C	1:116478247:G:C	1:205686004:C:G	1:173281261:C:T	1:115177771:T:C	1:117786057:G:T	1:166663665:C:T	1:229609139:T:G	1:143174982:G:A	1:230813638:C:G	1:67960819:C:T	1:165769308:T:G	1:117085412:A:G	1:109551602:G:A	1:116886547:G:T	1:84896939:C:T	1:212174207:T:C	1:206976119:T:C	1:43952955:T:C	1:204319140:T:C	1:203495475:C:T	1:162984461:A:G	1:51490709:G:A	1:85639045:G:A	1:10304810:A:G	1:212935021:C:T
13528:100	1:4540244:T:G	1:4540244:T:G	1:218627980:T:C	1:218063027:G:A	1:208900231:A:C	1:99733022:A:C	1:183576561:C:T	1:210560647:G:A	1:77539447:A:C	1:238104628:A:G	1:208863701:C:T	1:118239981:C:T	1:164329837:C:A	1:219462937:G:A	1:81551673:T:C	1:221728873:T:C	1:198501247:G:A	1:79207124:G:T	1:233055507:G:T	1:240905398:C:T	1:90519649:C:T	1:18342468:G:T	1:214098919:A:G	1:209280702:T:A	1:104112945:G:A	1:67394918:A:G	1:173168988:A:T	1:83904847:G:C	1:223351890:C:G	1:103572249:C:T	1:68628664:G:A	1:213658675:G:A	1:118140425:G:A	1:242681394:A:T	1:239371588:C:T	1:84473726:G:A	1:233970186:G:A	1:77684216:A:G	1:243273667:G:A	1:184115578:G:A	1:239014183:A:C	1:14745118:A:C	1:70918890:A:G	1:238271426:T:C	1:30605323:G:A	1:245837851:T:A	1:104075623:C:T	1:80909978:G:A	1:223318105:G:T	1:34746143:A:T	1:184000875:T:C	1:189096408:T:C	1:73654086:T:A	1:61103421:T:C	1:57281281:G:A	1:118475805:C:T	1:30703035:A:G	1:87786610:C:T	1:98837117:A:G	1:5033437:C:T	1:98051531:T:A	1:101602275:C:T	1:218307178:G:T	1:232946359:C:T	1:244278311:G:T	1:208438271:A:G	1:59244920:A:T	1:217310364:A:G	1:69514213:A:G	1:221916281:C:T	1:237165142:A:G	1:215174338:A:G	1:30890583:G:A	1:68916770:A:G	1:243145701:T:C	1:245391640:G:A	1:98685388:A:G	1:164960929:A:G	1:34634011:G:A	1:189008742:G:A	1:221721593:C:G	1:215047420:A:G	1:80517561:C:T	1:118324978:G:A	1:198481771:C:T	1:214158772:G:A	1:96208859:C:T	1:244278402:G:A	1:208410626:T:A	1:79093910:G:A	1:233048254:C:T	1:4473793:C:T	1:238654738:G:T	1:176182934:A:C	1:56403984:T:C	1:240926675:T:C	1:99921083:T:G	1:108567100:G:A	1:81565135:T:A	1:184280245:C:T	1:68298811:C:T	1:113934217:T:C
...
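
Each data row of sampler.out.txt can be split into its ratio, query, annotation and control fields. The sketch below is one possible reader, not vSampler code: `parse_sampler_row` and `parse_variant` are hypothetical names, and the example row is shortened to two controls.

```python
def parse_variant(key):
    """Split 'chrom:pos:ref:alt' into its four parts."""
    chrom, pos, ref, alt = key.split(":")
    return chrom, int(pos), ref, alt

def parse_sampler_row(line):
    """Split one tab-separated data row of sampler.out.txt."""
    fields = line.rstrip("\n").split("\t")
    # The ratio column is written as '<pool size>:<control number>'.
    pool_size, control_number = (int(x) for x in fields[0].split(":"))
    return {
        "pool_size": pool_size,
        "control_number": control_number,
        "query": fields[1],
        "annotation": fields[2],
        "controls": fields[3:],
    }

row = parse_sampler_row(
    "9160:100\t1:2854172:G:C\t1:2854172:G:C\t1:179953553:C:T\t1:85189424:A:G"
)
print(row["pool_size"], parse_variant(row["controls"][0]))
```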
input.exclude.txt

vSampler contains only variants located on autosomes with MAF > 0.01, based on 1000 Genomes Project phase 3 genotype data. Query variants outside this scope are excluded and listed in this file.

1:240713704
2:166437374
2:170922481
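
To see which query variants were actually sampled, the excluded `chrom:pos` entries can be subtracted from the input. A minimal sketch, assuming one entry per line as shown above; the helper names are hypothetical.

```python
def load_excluded(lines):
    """Collect the non-empty 'chrom:pos' entries into a set."""
    return {line.strip() for line in lines if line.strip()}

def drop_excluded(variants, excluded):
    """Keep only the (chrom, pos) pairs not listed as excluded."""
    return [(c, p) for c, p in variants if f"{c}:{p}" not in excluded]

excluded = load_excluded(["1:240713704\n", "2:166437374\n", "2:170922481\n"])
query = [("1", 2854172), ("1", 240713704), ("2", 170922481)]
kept = drop_excluded(query, excluded)
print(kept)
```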