Data Example
Query File
Note
The rsID format is supported online but not locally.
The Tab format is supported locally but not online.
rsID Format
rs367896724
rs4970354
rs28460227
rs12731916
rs13303005
rs13303005
...
VCF Format
##VCF meta lines...
#VCF header lines
1 64649 rs181431124 A C 100 PASS .
1 81125 rs560365426 T C 100 PASS .
1 81712 rs558839829 C T 100 PASS .
1 88230 rs543088928 T C 100 PASS .
1 99687 rs139153227 C T 100 PASS .
1 254263 rs558650540 C T 100 PASS .
1 534169 rs59089120 G A 100 PASS .
...
VCF-like
Note
The VCF-like format has no comment or header lines.
The first five columns are the same as the VCF format.
1 64649 rs181431124 A C col6 col7
1 81125 rs560365426 T C col6 col7
1 81712 rs558839829 C T col6 col7
1 88230 rs543088928 T C col6 col7
1 99687 rs139153227 C T col6 col7
1 254263 rs558650540 C T col6 col7
1 534169 rs59089120 G A col6 col7
...
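As a rough illustration of the VCF-like layout, the sketch below reads the first five columns of one line and ignores anything after them. It assumes the columns are separated by whitespace (tabs or spaces); the names `Variant` and `parse_vcf_like` are hypothetical and not part of vSampler.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int         # 1-based position, as in VCF
    variant_id: str  # third column, e.g. an rsID
    ref: str
    alt: str


def parse_vcf_like(line: str) -> Variant:
    """Parse the first five whitespace-separated columns; ignore the rest."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    chrom, pos, variant_id, ref, alt = fields[:5]
    return Variant(chrom, int(pos), variant_id, ref, alt)


# First example line from the listing above.
variant = parse_vcf_like("1 64649 rs181431124 A C col6 col7")
print(variant)
```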
Coord-Only
The first two columns are:
1. chrom - The name of the chromosome.
2. pos - The position of the feature in the chromosome, 1-based.
1 10177
1 1122916
1 1999840
1 2533552
1 2557191
1 2854172
...
Coord-Allele
The first four columns are:
1. chrom - The name of the chromosome.
2. pos - The position of the feature in the chromosome, 1-based.
3. ref
4. alt
1 10177 A AC 0.4056
1 63479713 G GCTA 0.4533
1 1122916 A G 0.0845
1 1042927 G T 0.1342
1 1999840 C T 0.1243
...
Tab
The Tab format should be used together with the attributes {c, b, e, ref, alt, 0}; please refer to tagArgument for more details.
For example:
col1 1 10177 10179 col4
col1 1 63479713 63479813 col4
col1 1 1122916 1122918 col4
col1 1 1042927 1043927 col4
col1 1 1999840 1999940 col4
...
Configuration File
The configuration file used to build the database.
roadmap=/path/to/Roadmap_127Epi.bed.gz
1000G=/path/to/1kg.phase3.v5.shapeit2.eur.hg19.all.split.multi.vcf.gz
bit_file=/path/to/1kg.phase3.v5.shapeit2.eur.hg19.all.split.multi.vcf.gz.bit
gene_file=/path/to/gencode.v32lift37.annotation.gene.sort.gtf.gz
ser_path=/path/to/ser/hg19_ensembl.ser
vf_path=/path/to/vf.json
tissue_list=/path/to/tissue.list
tissue_path=/path/to/GTEx_v8.signif_variant_gene_pairs.txt.gz
gc_path=/path/to/hg19.gc5Base.bed.gz
ld_window=100
maf_cutoff=-1
genome=hg19
output_dir=/path/to/output
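For scripting around a configuration like the one above, a minimal sketch of reading the `key=value` lines into a dictionary is shown here. It assumes one setting per line with no quoting or comment syntax; `parse_config` is a hypothetical helper, not a vSampler function.

```python
from typing import Dict


def parse_config(text: str) -> Dict[str, str]:
    """Collect key=value pairs; lines without '=' are skipped."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# A few of the settings from the example above.
sample = """\
ld_window=100
maf_cutoff=-1
genome=hg19
output_dir=/path/to/output
"""
config = parse_config(sample)
print(config["genome"], int(config["ld_window"]), int(config["maf_cutoff"]))
```

Values are kept as strings, so numeric settings such as `ld_window` need an explicit conversion by the caller.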
Output Files
anno.out.txt
This file contains the annotations of query and controls.
LABEL, query or control
CHR, sequence name
POS, position
REF, reference allele
ALT, alternative allele
MAF, minor allele frequency
DTCT, distance to the nearest gene
GENE_DIS_KB*, gene density of variants, * is the parameter of -GP
LD_BUDDIES_GT*, number of LD buddies, * is the parameter of -LDD
GC_Content_BP*, percentage of GC content, * is the parameter of -BP
LABEL CHR POS REF ALT MAF DTCT Gene_Dis_KB500
query 1 2854172 G C 0.010900 31100 28
control1 1 179953553 C T 0.012900 29645 27
control2 1 85189424 A G 0.028800 32938 27
control3 1 48057464 G T 0.047700 29385 27
control4 1 225809505 C T 0.028800 31339 23
control5 1 67929952 A G 0.023900 33854 22
control6 1 212935123 A G 0.011900 30001 34
control7 1 157840538 C G 0.036800 27508 28
control8 1 101144690 T G 0.021900 27187 29
...
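One way to inspect this file is to load it into a list of per-row dictionaries and compare each control with the query. The sketch below assumes the columns are whitespace-separated and that the first row is the header; `read_annotations` is a hypothetical helper name.

```python
from typing import Dict, List


def read_annotations(text: str) -> List[Dict[str, str]]:
    """Map each data row to a dict keyed by the header names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


# Header and first three rows from the example above.
sample = """\
LABEL CHR POS REF ALT MAF DTCT Gene_Dis_KB500
query 1 2854172 G C 0.010900 31100 28
control1 1 179953553 C T 0.012900 29645 27
control2 1 85189424 A G 0.028800 32938 27
"""
rows = read_annotations(sample)
query = rows[0]
for row in rows[1:]:
    # Difference between each control and the query for two annotations.
    maf_diff = float(row["MAF"]) - float(query["MAF"])
    dtct_diff = int(row["DTCT"]) - int(query["DTCT"])
    print(row["LABEL"], round(maf_diff, 4), dtct_diff)
```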
sampler.config.txt
This file contains the configuration of the job (i.e., the parameters used when running vSampler).
Query File: input.txt
Database File: EUR.gz
Exclude input SNPs: true
Sampling across chromosomes: false
Variant type specific: true
Sample control number: 100
Annotation number: 100
MAF deviation: [-0.05, 0.05]
Distance to closest tss deviation: [-5000, 5000]
Gene density in distance: 500KB
Gene density deviation: [-10, 10]
Variant Region Match: false
Number of insufficient match: 103
Proportion of insufficient match: 0.0344
Median size of insufficient pool: 60.0
Median proportion of of insufficient pool: 0.600
Number of variants excluded: 3
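If the job parameters need to be read back programmatically, a sketch along these lines may help. It assumes every line has the form `Name: value` and that the deviation ranges are written as a bracketed pair, as in the example above; `parse_report` is a hypothetical helper.

```python
import json
from typing import Dict


def parse_report(text: str) -> Dict[str, str]:
    """Split each 'Name: value' line on the first colon."""
    report: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            report[key.strip()] = value.strip()
    return report


# A subset of the lines from the example above.
sample = """\
Sample control number: 100
MAF deviation: [-0.05, 0.05]
Distance to closest tss deviation: [-5000, 5000]
Number of variants excluded: 3
"""
report = parse_report(sample)
# A range such as "[-0.05, 0.05]" happens to be valid JSON.
maf_low, maf_high = json.loads(report["MAF deviation"])
print(maf_low, maf_high, int(report["Sample control number"]))
```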
sampler.out.txt
This file reports the sampling outputs.
ratio, the ratio of sampling pool : control number (SN)
query, user input variant
annotation, annotation from database
controlN, the n-th control variant
ratio query annotation control1 control2 control3 control4 control5 control6 control7 control8 control9 control10 control11 control12 control13 control14 control15 control16 control17 control18 control19 control20 control21 control22 control23 control24 control25 control26 control27 control28 control29 control30 control31 control32 control33 control34 control35 control36 control37 control38 control39 control40 control41 control42 control43 control44 control45 control46 control47 control48 control49 control50 control51 control52 control53 control54 control55 control56 control57 control58 control59 control60 control61 control62 control63 control64 control65 control66 control67 control68 control69 control70 control71 control72 control73 control74 control75 control76 control77 control78 control79 control80 control81 control82 control83 control84 control85 control86 control87 control88 control89 control90 control91 control92 control93 control94 control95 control96 control97 control98 control99 control100
9160:100 1:2854172:G:C 1:2854172:G:C 1:179953553:C:T 1:85189424:A:G 1:48057464:G:T 1:225809505:C:T 1:67929952:A:G 1:212935123:A:G 1:157840538:C:G 1:101144690:T:G 1:242047347:A:G 1:89180893:T:C 1:185730038:G:C 1:64023363:G:A 1:117782144:A:G 1:86919011:G:A 1:119718067:A:G 1:157710893:C:A 1:20383766:C:T 1:246641380:G:A 1:68669479:T:G 1:205225480:G:C 1:95419064:G:T 1:183639566:C:G 1:114857147:C:T 1:36722309:G:A 1:93781806:C:T 1:179591414:G:T 1:184384740:A:T 1:185433179:T:C 1:235150199:G:T 1:29530718:G:A 1:202189594:G:A 1:90492618:G:A 1:171841662:C:T 1:89429126:A:T 1:38816676:A:G 1:54548798:T:C 1:163007664:G:A 1:207658025:T:C 1:236796218:C:T 1:20387575:G:T 1:201107974:C:T 1:31802167:T:C 1:229735229:C:G 1:51949832:G:A 1:2531211:G:A 1:9321633:G:C 1:41900284:C:T 1:185643883:T:C 1:22384713:G:T 1:243186881:A:G 1:58981823:T:A 1:211725120:A:G 1:59365820:C:T 1:166688415:T:C 1:35992011:T:C 1:22443482:A:T 1:59135525:A:C 1:212180861:G:A 1:182247649:A:G 1:9322155:C:T 1:211333734:A:G 1:147199689:G:C 1:19838908:C:T 1:226467391:G:C 1:84831864:A:G 1:166917857:C:G 1:174023412:C:T 1:86919502:A:T 1:243111945:C:T 1:109447270:C:T 1:22343070:C:T 1:209404552:G:C 1:146612437:C:G 1:180233705:C:T 1:169659138:T:C 1:116478247:G:C 1:205686004:C:G 1:173281261:C:T 1:115177771:T:C 1:117786057:G:T 1:166663665:C:T 1:229609139:T:G 1:143174982:G:A 1:230813638:C:G 1:67960819:C:T 1:165769308:T:G 1:117085412:A:G 1:109551602:G:A 1:116886547:G:T 1:84896939:C:T 1:212174207:T:C 1:206976119:T:C 1:43952955:T:C 1:204319140:T:C 1:203495475:C:T 1:162984461:A:G 1:51490709:G:A 1:85639045:G:A 1:10304810:A:G 1:212935021:C:T
13528:100 1:4540244:T:G 1:4540244:T:G 1:218627980:T:C 1:218063027:G:A 1:208900231:A:C 1:99733022:A:C 1:183576561:C:T 1:210560647:G:A 1:77539447:A:C 1:238104628:A:G 1:208863701:C:T 1:118239981:C:T 1:164329837:C:A 1:219462937:G:A 1:81551673:T:C 1:221728873:T:C 1:198501247:G:A 1:79207124:G:T 1:233055507:G:T 1:240905398:C:T 1:90519649:C:T 1:18342468:G:T 1:214098919:A:G 1:209280702:T:A 1:104112945:G:A 1:67394918:A:G 1:173168988:A:T 1:83904847:G:C 1:223351890:C:G 1:103572249:C:T 1:68628664:G:A 1:213658675:G:A 1:118140425:G:A 1:242681394:A:T 1:239371588:C:T 1:84473726:G:A 1:233970186:G:A 1:77684216:A:G 1:243273667:G:A 1:184115578:G:A 1:239014183:A:C 1:14745118:A:C 1:70918890:A:G 1:238271426:T:C 1:30605323:G:A 1:245837851:T:A 1:104075623:C:T 1:80909978:G:A 1:223318105:G:T 1:34746143:A:T 1:184000875:T:C 1:189096408:T:C 1:73654086:T:A 1:61103421:T:C 1:57281281:G:A 1:118475805:C:T 1:30703035:A:G 1:87786610:C:T 1:98837117:A:G 1:5033437:C:T 1:98051531:T:A 1:101602275:C:T 1:218307178:G:T 1:232946359:C:T 1:244278311:G:T 1:208438271:A:G 1:59244920:A:T 1:217310364:A:G 1:69514213:A:G 1:221916281:C:T 1:237165142:A:G 1:215174338:A:G 1:30890583:G:A 1:68916770:A:G 1:243145701:T:C 1:245391640:G:A 1:98685388:A:G 1:164960929:A:G 1:34634011:G:A 1:189008742:G:A 1:221721593:C:G 1:215047420:A:G 1:80517561:C:T 1:118324978:G:A 1:198481771:C:T 1:214158772:G:A 1:96208859:C:T 1:244278402:G:A 1:208410626:T:A 1:79093910:G:A 1:233048254:C:T 1:4473793:C:T 1:238654738:G:T 1:176182934:A:C 1:56403984:T:C 1:240926675:T:C 1:99921083:T:G 1:108567100:G:A 1:81565135:T:A 1:184280245:C:T 1:68298811:C:T 1:113934217:T:C
...
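The rows of this file can be unpacked with a few lines of code. The sketch below assumes whitespace-separated fields, a `pool:controls` ratio in the first column, and variants written as `chrom:pos:ref:alt`, as in the example above. The helper names are hypothetical, and the sample row is shortened to two controls for brevity.

```python
from typing import Dict, Tuple


def split_variant(variant: str) -> Tuple[str, int, str, str]:
    """Split 'chrom:pos:ref:alt' into its four parts."""
    chrom, pos, ref, alt = variant.split(":")
    return chrom, int(pos), ref, alt


def parse_sampler_row(line: str) -> Dict[str, object]:
    """Unpack ratio, query, annotation and the control variants."""
    fields = line.split()
    pool_size, control_number = (int(x) for x in fields[0].split(":"))
    return {
        "pool_size": pool_size,
        "control_number": control_number,
        "query": fields[1],
        "annotation": fields[2],
        "controls": fields[3:],
    }


# Start of the first data row above, truncated after control2.
row = "9160:100 1:2854172:G:C 1:2854172:G:C 1:179953553:C:T 1:85189424:A:G"
parsed = parse_sampler_row(row)
print(parsed["pool_size"], parsed["control_number"])
print([split_variant(v) for v in parsed["controls"]])
```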
input.exclude.txt
vSampler only contains variants located on autosomes with MAF > 0.01, based on 1000 Genomes Project phase 3 genotype data; query variants outside this scope are excluded and listed in this file.
1:240713704
2:166437374
2:170922481
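To see which query variants were dropped, the excluded coordinates can be loaded into a set and compared with the input. This is a minimal sketch assuming one `chrom:pos` entry per line; `read_excluded` and the `queries` list are hypothetical.

```python
from typing import Set, Tuple


def read_excluded(text: str) -> Set[Tuple[str, int]]:
    """Read 'chrom:pos' entries into a set of (chrom, pos) pairs."""
    excluded: Set[Tuple[str, int]] = set()
    for entry in text.split():
        chrom, pos = entry.split(":")
        excluded.add((chrom, int(pos)))
    return excluded


# The three entries from the example above.
sample = "1:240713704\n2:166437374\n2:170922481\n"
excluded = read_excluded(sample)

# Hypothetical query list: one retained variant, one excluded variant.
queries = [("1", 2854172), ("2", 166437374)]
kept = [q for q in queries if q not in excluded]
print(kept)
```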