`bcftools` provides utilities for working with data in the Variant Call Format (`.vcf`). The manual fully documents the arguments and features, and the developers have written their own “HowTo” page. The goal of this post is to walk through some scenarios with a reproducible dataset to showcase the `bcftools` functionality I use regularly.
Note that this will not be an exhaustive demonstration of all `bcftools` features, nor will it cover other `.vcf` parsing/manipulation tools or Linux utilities (e.g. `awk`, `sed`) that can be handy for working with variant calling data.
The examples should be reproducible given the setup described below. However, the output at the command line will look slightly different from the inline output in this post: for legibility, I ran each command, excluded the header, and read the results back in as a text file. The inline output shows at most 6 rows, followed by a placeholder row of dots (`. . . . . . . . . . .`) when there are more results.
- How do I concatenate multiple vcf files?
- How do I subset for individual samples by name?
- How do I restrict a vcf to only include INDELs?
- How do I filter a vcf by SNP ID?
- How do I filter a vcf by genomic coordinates?
- How do I format genotypes as nucleotides in a vcf?
- How do I merge multiple vcf files?
- How do I extract genotypes for multiple samples from a single vcf?
- How do I change the chromosome names in a vcf?
- How do I inspect a vcf without the header?
- How do I view only the header in a vcf?
Setup
To get started we need to find some data to work with and do a bit of pre-processing:
- Download all of the files for the 20130502 release of the 1000 Genomes Project (these are compressed `.vcf.gz` files, each with a `.tbi` index)
- Download a `.vcf.gz` (and `.tbi`) for sites annotated by ClinVar[1]
- Create `.vcf.gz` files for each chromosome (1-22) filtered to only include the ClinVar sites
- Create a tabix index for each of the newly created `.vcf.gz` files
The code that follows performs all of the steps described above. Keep in mind that each step (especially downloading and filtering the 1000 Genomes data) may take quite a while, as these files are large (~20 GB total). You’ll need a system with sufficient storage that has `wget`, `parallel`, `bcftools`, and `tabix` installed.
```shell
## download 1000 genomes vcf files
## (URLs are quoted so the wildcard is expanded by wget on the FTP server, not by the local shell)
wget "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/*.vcf.gz*"
## download clinvar vcf (and its .tbi index)
wget "ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar.vcf.gz*"
## use parallel to restrict each chromosome (chr1 to chr22) to clinvar sites
find . -type f -name "ALL.chr[1-9]*vcf.gz" | parallel "bcftools view {} -R clinvar.vcf.gz --output-type z --output {}.clinvar.vcf.gz"
## make sure all vcf.gz files are tabix indexed
find . -type f -name "ALL.chr[1-9]*.clinvar.vcf.gz" | parallel "tabix {}"
```
With the data processed, we can move on to the scenarios. All subsequent code uses `bcftools` version 1.10:
```shell
bcftools --version
```
```
bcftools 1.10.2-27-g9d66868
Using htslib 1.10.2-33-g1bbcd02
Copyright (C) 2019 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
```
Scenarios
Concatenate multiple files together
If we wanted to concatenate (i.e. “stack”) multiple `.vcf` files together, we can use `bcftools concat`, so long as the input files contain the same samples in the same order. In this example, we’ll combine all of the chromosomes (1-22) into a single file.
The `--output-type z` argument specifies that the output will be a compressed (bgzipped) VCF, and the `--output` flag allows us to explicitly name the resulting file:
```shell
bcftools concat ALL.chr*.clinvar.vcf.gz --output-type z --output all.clinvar.vcf.gz
```
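The shell expands `ALL.chr*.clinvar.vcf.gz` in sorted, not numeric, order (in the C locale: chr1, chr10, ..., chr19, chr2, ...; the exact order depends on your locale), and `bcftools concat` writes the inputs in the order given. If you would rather have the chromosomes in natural order, the sketch below is one option. It assumes GNU coreutils `sort` (for `-V`, version sort), and `clinvar_files.txt` is a hypothetical name for the list passed to `--file-list`.

```shell
# Write the per-chromosome file names in natural order (chr1, chr2, ..., chr22).
# Assumes GNU coreutils sort, which provides -V (version sort).
ls ALL.chr*.clinvar.vcf.gz | sort -V > clinvar_files.txt

# --file-list reads the input file names from a file, one per line
bcftools concat --file-list clinvar_files.txt --output-type z --output all.clinvar.vcf.gz

# index the result so it can later be queried by region
bcftools index -t all.clinvar.vcf.gz
```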
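NOTE: `bcftools concat` is not equivalent to `bcftools merge`. `concat` stacks records from files that share the same samples, whereas `merge` combines files with different samples into one multi-sample file. For an example of the latter, see below.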
Select individual samples by name
`bcftools view -s` allows for subsetting by sample ID.
The combined `all.clinvar.vcf.gz` file above contains multiple samples. Here we’ll create individual compressed `.vcf` files for the NA20536 and HG03718 samples, along with a tabix index for each file (using `bcftools index -t`):
```shell
bcftools view -s NA20536 all.clinvar.vcf.gz --output-type z --output NA20536.clinvar.vcf.gz
bcftools view -s HG03718 all.clinvar.vcf.gz --output-type z --output HG03718.clinvar.vcf.gz
## note: bcftools index -t is equivalent to tabix here
bcftools index -t NA20536.clinvar.vcf.gz
bcftools index -t HG03718.clinvar.vcf.gz
```
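Before subsetting, it can help to check which sample IDs a file actually contains: `bcftools query -l` prints them from the header. The loop below is a small sketch that repeats the two subsetting commands above for any list of sample IDs (here the same two used in this post). If you instead want several samples kept together in one file, `-s` also accepts a comma-separated list such as `-s NA20536,HG03718`.

```shell
# print the first few sample IDs stored in the header
bcftools query -l all.clinvar.vcf.gz | head -n 5

# write one compressed, indexed file per sample
for sample in NA20536 HG03718; do
  bcftools view -s "$sample" all.clinvar.vcf.gz --output-type z --output "${sample}.clinvar.vcf.gz"
  bcftools index -t "${sample}.clinvar.vcf.gz"
done
```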
Filter to only include INDELs
`bcftools view -v` (`--types`) restricts the file to the specified variant types: “snps”, “indels”, “mnps”, or “other”.
We can use it to filter the `.vcf` to only include INDELs:
```shell
bcftools view -v indels NA20536.clinvar.vcf.gz
```
CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | NA20536 |
---|---|---|---|---|---|---|---|---|---|
1 | 978603 | rs35881187 | CCT | C | 100 | PASS | AC=2;AF=0.479233;AN=2;NS=2504;DP=14705;EAS_AF=0.8036;AMR_AF=0.6412;AFR_AF=0.0348;EUR_AF=0.5487;SAS_AF=0.5593;VT=INDEL | GT | 1|1 |
1 | 984171 | rs140904842 | CAG | C | 100 | PASS | AC=2;AF=0.920527;AN=2;NS=2504;DP=7127;EAS_AF=0.9891;AMR_AF=0.9769;AFR_AF=0.7602;EUR_AF=0.9742;SAS_AF=0.9714;VT=INDEL | GT | 1|1 |
1 | 1168239 | rs533071750 | C | CG | 100 | PASS | AC=0;AF=0.000599042;AN=2;NS=2504;DP=9648;EAS_AF=0;AMR_AF=0.0029;AFR_AF=0;EUR_AF=0.001;SAS_AF=0;AA=?|GGGGGGG|GGGGGGGG|unsure;VT=INDEL;EX_TARGET | GT | 0|0 |
1 | 2343991 | rs570192538 | CCA | C | 100 | PASS | AC=0;AF=0.00459265;AN=2;NS=2504;DP=9045;EAS_AF=0;AMR_AF=0;AFR_AF=0.0174;EUR_AF=0;SAS_AF=0;VT=INDEL | GT | 0|0 |
1 | 2435830 | rs555614613 | TTCC | T | 100 | PASS | AC=0;AF=0.00579073;AN=2;NS=2504;DP=15005;EAS_AF=0;AMR_AF=0.0029;AFR_AF=0.0204;EUR_AF=0;SAS_AF=0;VT=INDEL;EX_TARGET | GT | 0|0 |
1 | 2492946 | rs149579135 | AG | A | 100 | PASS | AC=0;AF=0.00359425;AN=2;NS=2504;DP=17775;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0.0129;EUR_AF=0;SAS_AF=0;AA=G|G|-|deletion;VT=INDEL | GT | 0|0 |
. | . | . | . | . | . | . | . | . | . |
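Two related options, sketched below with the files created above: `-V` (`--exclude-types`) drops the listed variant types instead of keeping them, and piping `-H` output (records only, no header) to `wc -l` counts records. The output name `NA20536.noindels.clinvar.vcf.gz` is hypothetical.

```shell
# keep everything except INDELs (-V / --exclude-types is the inverse of -v)
# NA20536.noindels.clinvar.vcf.gz is a hypothetical output name
bcftools view -V indels NA20536.clinvar.vcf.gz --output-type z --output NA20536.noindels.clinvar.vcf.gz

# count INDEL records; -H omits the header so only variant lines are counted
n_indels=$(bcftools view -H -v indels NA20536.clinvar.vcf.gz | wc -l)
echo "INDEL records: ${n_indels}"
```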
Filter by rsid
With `bcftools` you can filter a `.vcf` file for certain sites by passing in a file that contains the IDs to be retained.
Assuming we have the following RSIDs in a file called `snps.list`[2]:
```
rs145413551
rs34610323
rs79548709
rs371163239
rs148716910
rs374704178
```
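If you are following along, one way to create `snps.list` is with a here-document (a sketch; any text editor works just as well). It writes exactly the IDs listed above, one per line.

```shell
# write the RSIDs to snps.list, one ID per line
cat > snps.list <<'EOF'
rs145413551
rs34610323
rs79548709
rs371163239
rs148716910
rs374704178
EOF
```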
We can use `snps.list` to filter with `bcftools view` (the quotes keep the shell from interpreting the expression):
```shell
bcftools view --include 'ID==@snps.list' NA20536.clinvar.vcf.gz
```
CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | NA20536 |
---|---|---|---|---|---|---|---|---|---|
17 | 648546 | rs34610323 | C | T | 100 | PASS | AC=0;AF=0.0159744;AN=2;NS=2504;DP=21874;EAS_AF=0;AMR_AF=0.0058;AFR_AF=0.0575;EUR_AF=0;SAS_AF=0;AA=C|||;VT=SNP;EX_TARGET | GT | 0|0 |
2 | 31620566 | rs145413551 | G | T | 100 | PASS | AC=0;AF=0.000199681;AN=2;NS=2504;DP=19652;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET | GT | 0|0 |
21 | 45707000 | rs374704178 | G | A | 100 | PASS | AC=0;AF=0.000399361;AN=2;NS=2504;DP=11479;EAS_AF=0;AMR_AF=0;AFR_AF=0.0015;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET | GT | 0|0 |
5 | 151721 | rs148716910 | G | A | 100 | PASS | AC=0;AF=0.00279553;AN=2;NS=2504;DP=18789;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET | GT | 0|0 |
8 | 1841816 | rs79548709 | C | T | 100 | PASS | AC=0;AF=0.00519169;AN=2;NS=2504;DP=16683;EAS_AF=0;AMR_AF=0;AFR_AF=0.0197;EUR_AF=0;SAS_AF=0;AA=C|||;VT=SNP;EX_TARGET | GT | 0|0 |
8 | 3889458 | rs371163239 | T | A | 100 | PASS | AC=0;AF=0.000199681;AN=2;NS=2504;DP=15669;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=T|||;VT=SNP;EX_TARGET | GT | 0|0 |
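The same `@file` syntax works in the other direction: passing `ID=@snps.list` to `--exclude` instead of `--include` keeps every site except the listed IDs. A minimal sketch; the output name `NA20536.without_snps.clinvar.vcf.gz` is hypothetical.

```shell
# keep every site EXCEPT the IDs listed in snps.list
# NA20536.without_snps.clinvar.vcf.gz is a hypothetical output name
bcftools view --exclude 'ID=@snps.list' NA20536.clinvar.vcf.gz \
  --output-type z --output NA20536.without_snps.clinvar.vcf.gz
```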
Filter by chromosome and/or position
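The `--regions` (`-r`) flag takes chromosome and/or position coordinates to filter the `.vcf`. Because it jumps to those coordinates using the index, the input must be indexed (hence the `.tbi` files created above).
If we wanted to restrict to chromosome 5:
```shell
bcftools view --regions 5 NA20536.clinvar.vcf.gz
```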
CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | NA20536 |
---|---|---|---|---|---|---|---|---|---|
5 | 40417 | esv3603720;esv3603721 | G |  | 100 | PASS | AC=0,0;AF=0.000199681,0.000798722;AN=2;CS=DUP_uwash;END=176437;NS=2504;SVTYPE=CNV;DP=16231;EAS_AF=0,0;AMR_AF=0,0;AFR_AF=0,0;EUR_AF=0,0.003;SAS_AF=0.001,0.001;VT=SV;EX_TARGET | GT | 0|0 |
5 | 124186 | esv3603731 | T |  | 100 | PASS | AC=0;AF=0.000199681;AN=2;CS=DUP_gs;END=163795;NS=2504;SVTYPE=DUP;DP=19153;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0.001;SAS_AF=0;VT=SV;EX_TARGET | GT | 0|0 |
5 | 143490 | rs142208662 | C | T | 100 | PASS | AC=0;AF=0.00279553;AN=2;NS=2504;DP=19664;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AA=c|||;VT=SNP;EX_TARGET | GT | 0|0 |
5 | 151721 | rs148716910 | G | A | 100 | PASS | AC=0;AF=0.00279553;AN=2;NS=2504;DP=18789;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET | GT | 0|0 |
5 | 156288 | rs193920840 | C | T | 100 | PASS | AC=0;AF=0.000199681;AN=2;NS=2504;DP=17617;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0.001;AA=C|||;VT=SNP;EX_TARGET | GT | 0|0 |
5 | 162045 | rs568109142 | G | A | 100 | PASS | AC=0;AF=0.000199681;AN=2;NS=2504;DP=15391;EAS_AF=0.001;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET | GT | 0|0 |
. | . | . | . | . | . | . | . | . | . |
And if we were interested in a specific region (let’s say chromosome 10, anywhere between positions 800,000 and 900,000, inclusive):
```shell
bcftools view --regions 10:800000-900000 NA20536.clinvar.vcf.gz
```
CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | NA20536 |
---|---|---|---|---|---|---|---|---|---|
10 | 859076 | rs144565605 | T | C | 100 | PASS | AC=0;AF=0.000199681;AN=2;NS=2504;DP=15608;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=T|||;VT=SNP;EX_TARGET | GT | 0|0 |
10 | 860990 | rs144883024 | G | A | 100 | PASS | AC=0;AF=0.00259585;AN=2;NS=2504;DP=18990;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0.0091;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET | GT | 0|0 |
10 | 871816 | rs79707128 | T | A | 100 | PASS | AC=0;AF=0.0211661;AN=2;NS=2504;DP=21039;EAS_AF=0;AMR_AF=0.0058;AFR_AF=0.0703;EUR_AF=0;SAS_AF=0.0092;AA=T|||;VT=SNP;EX_TARGET | GT | 0|0 |
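For more than a handful of regions, `--regions-file` (`-R`) reads them from a file instead; the setup step above already used it with the ClinVar VCF. A plain-text regions file is tab-delimited with columns CHROM, BEG and END (1-based, inclusive). The sketch below writes a hypothetical `regions.txt` holding the chromosome 10 window from the example above plus an arbitrary second window on chromosome 5. If an input file is not indexed, `--targets` (`-t`) and `--targets-file` (`-T`) filter the same way by streaming through the whole file instead of using the index.

```shell
# hypothetical regions file: CHROM<TAB>BEG<TAB>END, 1-based and inclusive
printf '10\t800000\t900000\n5\t140000\t160000\n' > regions.txt

# keep sites inside any of the listed regions (uses the .tbi index)
bcftools view --regions-file regions.txt NA20536.clinvar.vcf.gz

# same kind of filter without an index: stream through the whole file
bcftools view --targets 10:800000-900000 NA20536.clinvar.vcf.gz
```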
Format translated genotype output
`bcftools query` outputs the contents of the `.vcf` as text. The contents are specified with a format string (`-f`/`--format`) that lists the fields to extract, along with separators and line endings.
In this scenario, we’ll pull out the ID (RSID), chromosome, position, translated genotype, and “type” (SNP, INDEL, etc.) in tab-separated format. `%TGT` translates the genotype into nucleotides (e.g. `G|G` instead of `0|0`), and the square brackets repeat their contents for every sample:
```shell
bcftools query -f '%ID\t%CHROM\t%POS[\t%TGT]\t%TYPE\n' NA20536.clinvar.vcf.gz
```
ID | CHROM | POS | GT | TYPE |
---|---|---|---|---|
rs41285790 | 1 | 865628 | G|G | SNP |
rs113383096 | 1 | 879481 | G|G | SNP |
rs112433394 | 1 | 880944 | G|G | SNP |
rs113226136 | 1 | 887409 | G|G | SNP |
rs112966263 | 1 | 887989 | A|A | SNP |
rs58931985 | 1 | 889450 | C|C | SNP |
. | . | . | . | . |
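Two variations on the command above, sketched here: `-H` (`--print-header`) adds a header line naming each column, and `%GT` prints the genotype as allele indices (e.g. `0|0`) alongside the nucleotides from `%TGT`. The output name `NA20536.genotypes.tsv` is hypothetical.

```shell
# same fields plus the untranslated genotype, with a header line naming each column
# NA20536.genotypes.tsv is a hypothetical output name
bcftools query -H -f '%ID\t%CHROM\t%POS[\t%GT\t%TGT]\t%TYPE\n' NA20536.clinvar.vcf.gz > NA20536.genotypes.tsv

# peek at the header line and the first records
head -n 3 NA20536.genotypes.tsv
```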
Merge vcf files together
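`bcftools merge` combines files that contain different samples into a single multi-sample file. The input files must be indexed, which is why we created `.tbi` files for each sample above.
To merge individual sample `.vcf` files into one:
```shell
bcftools merge NA20536.clinvar.vcf.gz HG03718.clinvar.vcf.gz --output-type z --output NA20536.HG03718.clinvar.vcf.gz
```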
CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | NA20536 | HG03718 |
---|---|---|---|---|---|---|---|---|---|---|
1 | 865628 | rs41285790 | G | A | 100 | PASS | NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=33950;AF=0.00279553;EAS_AF=0;AMR_AF=0.0072;AFR_AF=0;EUR_AF=0.005;SAS_AF=0.0041;AN=4;AC=0 | GT | 0|0 | 0|0 |
1 | 879481 | rs113383096 | G | C | 100 | PASS | NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=27530;AF=0.0197684;EAS_AF=0;AMR_AF=0.0058;AFR_AF=0.0719;EUR_AF=0;SAS_AF=0;AN=4;AC=0 | GT | 0|0 | 0|0 |
1 | 880944 | rs112433394 | G | A | 100 | PASS | NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=41446;AF=0.00259585;EAS_AF=0;AMR_AF=0;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AN=4;AC=0 | GT | 0|0 | 0|0 |
1 | 887409 | rs113226136 | G | C | 100 | PASS | NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=39832;AF=0.00119808;EAS_AF=0;AMR_AF=0;AFR_AF=0.0045;EUR_AF=0;SAS_AF=0;AN=4;AC=0 | GT | 0|0 | 0|0 |
1 | 887989 | rs112966263 | A | G | 100 | PASS | NS=2504;AA=G|||;VT=SNP;EX_TARGET;DP=36768;AF=0.00579073;EAS_AF=0;AMR_AF=0;AFR_AF=0.0219;EUR_AF=0;SAS_AF=0;AN=4;AC=0 | GT | 0|0 | 0|0 |
1 | 889450 | rs58931985 | C | A | 100 | PASS | NS=2504;AA=C|||;VT=SNP;EX_TARGET;DP=32298;AF=0.00159744;EAS_AF=0;AMR_AF=0;AFR_AF=0.0061;EUR_AF=0;SAS_AF=0;AN=4;AC=0 | GT | 0|0 | 0|0 |
. | . | . | . | . | . | . | . | . | . | . |
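A quick sanity check after merging, sketched below: index the merged file (so it can be used with `--regions` like the single-sample files) and list its samples with `bcftools query -l`, which should print NA20536 and HG03718.

```shell
# index the merged file for region queries
bcftools index -t NA20536.HG03718.clinvar.vcf.gz

# list the samples in the merged file, one per line
bcftools query -l NA20536.HG03718.clinvar.vcf.gz
```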
Parse genotypes for multiple samples
Given a multi-sample `.vcf`, you can parse genotypes for each individual. Wrapping the whole format string in square brackets prints one line per sample at each site:
```shell
bcftools query -f '[%CHROM\t%POS\t%SAMPLE\t%TGT\n]' NA20536.HG03718.clinvar.vcf.gz
```
CHROM | POS | SAMPLE | GT |
---|---|---|---|
1 | 865628 | NA20536 | G|G |
1 | 865628 | HG03718 | G|G |
1 | 879481 | NA20536 | G|G |
1 | 879481 | HG03718 | G|G |
1 | 880944 | NA20536 | G|G |
1 | 880944 | HG03718 | G|G |
. | . | . | . |
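`bcftools query` also accepts `-s` to limit output to selected samples, so the same per-sample format string can be run on one individual from a multi-sample file. A sketch; `HG03718.genotypes.tsv` is a hypothetical output name.

```shell
# one line per site for HG03718 only; the [...] block repeats for each selected sample
# HG03718.genotypes.tsv is a hypothetical output name
bcftools query -s HG03718 -f '[%CHROM\t%POS\t%SAMPLE\t%TGT\n]' \
  NA20536.HG03718.clinvar.vcf.gz > HG03718.genotypes.tsv
```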
Edit chromosome names
You can rename chromosomes with `bcftools annotate --rename-chrs`. The command requires a file with the desired naming convention, one “old\tnew” pair per line (here saved as `chromosomes.txt`, where `\t` denotes a tab):
```
1\tchr1
2\tchr2
3\tchr3
4\tchr4
5\tchr5
6\tchr6
7\tchr7
8\tchr8
9\tchr9
10\tchr10
11\tchr11
12\tchr12
13\tchr13
14\tchr14
15\tchr15
16\tchr16
17\tchr17
18\tchr18
19\tchr19
20\tchr20
21\tchr21
22\tchr22
```
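Rather than typing the mapping by hand, the file can be generated with a short loop. This sketch (assuming `seq` is available, as it is on most Linux systems) writes `chromosomes.txt` with a real tab between the old and new names:

```shell
# write "1<TAB>chr1" through "22<TAB>chr22", one pair per line
for i in $(seq 1 22); do
  printf '%s\tchr%s\n' "$i" "$i"
done > chromosomes.txt
```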
```shell
bcftools annotate --rename-chrs chromosomes.txt NA20536.clinvar.vcf.gz
```
CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | NA20536 |
---|---|---|---|---|---|---|---|---|---|
chr1 | 865628 | rs41285790 | G | A | 100 | PASS | AC=0;AF=0.00279553;AN=2;NS=2504;DP=16975;EAS_AF=0;AMR_AF=0.0072;AFR_AF=0;EUR_AF=0.005;SAS_AF=0.0041;AA=g|||;VT=SNP;EX_TARGET | GT | 0|0 |
chr1 | 879481 | rs113383096 | G | C | 100 | PASS | AC=0;AF=0.0197684;AN=2;NS=2504;DP=13765;EAS_AF=0;AMR_AF=0.0058;AFR_AF=0.0719;EUR_AF=0;SAS_AF=0;AA=g|||;VT=SNP;EX_TARGET | GT | 0|0 |
chr1 | 880944 | rs112433394 | G | A | 100 | PASS | AC=0;AF=0.00259585;AN=2;NS=2504;DP=20723;EAS_AF=0;AMR_AF=0;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AA=g|||;VT=SNP;EX_TARGET | GT | 0|0 |
chr1 | 887409 | rs113226136 | G | C | 100 | PASS | AC=0;AF=0.00119808;AN=2;NS=2504;DP=19916;EAS_AF=0;AMR_AF=0;AFR_AF=0.0045;EUR_AF=0;SAS_AF=0;AA=g|||;VT=SNP;EX_TARGET | GT | 0|0 |
chr1 | 887989 | rs112966263 | A | G | 100 | PASS | AC=0;AF=0.00579073;AN=2;NS=2504;DP=18384;EAS_AF=0;AMR_AF=0;AFR_AF=0.0219;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET | GT | 0|0 |
chr1 | 889450 | rs58931985 | C | A | 100 | PASS | AC=0;AF=0.00159744;AN=2;NS=2504;DP=16149;EAS_AF=0;AMR_AF=0;AFR_AF=0.0061;EUR_AF=0;SAS_AF=0;AA=C|||;VT=SNP;EX_TARGET | GT | 0|0 |
. | . | . | . | . | . | . | . | . | . |
View without header
To view only the records, without the header, use the `-H` (`--no-header`) flag:
```shell
bcftools view -H NA20536.HG03718.clinvar.vcf.gz
```
X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 |
---|---|---|---|---|---|---|---|---|---|---|
1 | 865628 | rs41285790 | G | A | 100 | PASS | NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=33950;AF=0.00279553;EAS_AF=0;AMR_AF=0.0072;AFR_AF=0;EUR_AF=0.005;SAS_AF=0.0041;AN=4;AC=0 | GT | 0|0 | 0|0 |
1 | 879481 | rs113383096 | G | C | 100 | PASS | NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=27530;AF=0.0197684;EAS_AF=0;AMR_AF=0.0058;AFR_AF=0.0719;EUR_AF=0;SAS_AF=0;AN=4;AC=0 | GT | 0|0 | 0|0 |
1 | 880944 | rs112433394 | G | A | 100 | PASS | NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=41446;AF=0.00259585;EAS_AF=0;AMR_AF=0;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AN=4;AC=0 | GT | 0|0 | 0|0 |
1 | 887409 | rs113226136 | G | C | 100 | PASS | NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=39832;AF=0.00119808;EAS_AF=0;AMR_AF=0;AFR_AF=0.0045;EUR_AF=0;SAS_AF=0;AN=4;AC=0 | GT | 0|0 | 0|0 |
1 | 887989 | rs112966263 | A | G | 100 | PASS | NS=2504;AA=G|||;VT=SNP;EX_TARGET;DP=36768;AF=0.00579073;EAS_AF=0;AMR_AF=0;AFR_AF=0.0219;EUR_AF=0;SAS_AF=0;AN=4;AC=0 | GT | 0|0 | 0|0 |
1 | 889450 | rs58931985 | C | A | 100 | PASS | NS=2504;AA=C|||;VT=SNP;EX_TARGET;DP=32298;AF=0.00159744;EAS_AF=0;AMR_AF=0;AFR_AF=0.0061;EUR_AF=0;SAS_AF=0;AN=4;AC=0 | GT | 0|0 | 0|0 |
. | . | . | . | . | . | . | . | . | . | . |
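If you only need to know how many records a file holds, printing them all with `-H` is not necessary: for an indexed file, `bcftools index --nrecords` (`-n`) reads the total from the index, and `--stats` (`-s`) breaks it down by contig (name, length or `.` if unknown, and record count). A sketch, assuming the merged file has been indexed, e.g. with `bcftools index -t`:

```shell
# total number of records, read from the .tbi index
n_records=$(bcftools index --nrecords NA20536.HG03718.clinvar.vcf.gz)
echo "records: ${n_records}"

# per-contig record counts
bcftools index --stats NA20536.HG03718.clinvar.vcf.gz
```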
View only header
To view only the header (i.e. extract the header), use the `-h` (`--header-only`) flag:
```shell
bcftools view -h clinvar.vcf.gz
```
```
##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=2020-02-17
##source=ClinVar
##reference=GRCh37
##ID=<Description="ClinVar Variation ID">
##INFO=<ID=AF_ESP,Number=1,Type=Float,Description="allele frequencies from GO-ESP">
##INFO=<ID=AF_EXAC,Number=1,Type=Float,Description="allele frequencies from ExAC">
##INFO=<ID=AF_TGP,Number=1,Type=Float,Description="allele frequencies from TGP">
##INFO=<ID=ALLELEID,Number=1,Type=Integer,Description="the ClinVar Allele ID">
##INFO=<ID=CLNDN,Number=.,Type=String,Description="ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDNINCL,Number=.,Type=String,Description="For included Variant : ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDISDB,Number=.,Type=String,Description="Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNDISDBINCL,Number=.,Type=String,Description="For included Variant: Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNHGVS,Number=.,Type=String,Description="Top-level (primary assembly, alt, or patch) HGVS expression.">
##INFO=<ID=CLNREVSTAT,Number=.,Type=String,Description="ClinVar review status for the Variation ID">
##INFO=<ID=CLNSIG,Number=.,Type=String,Description="Clinical significance for this single variant">
##INFO=<ID=CLNSIGCONF,Number=.,Type=String,Description="Conflicting clinical significance for this single variant">
##INFO=<ID=CLNSIGINCL,Number=.,Type=String,Description="Clinical significance for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:clinical significance.">
##INFO=<ID=CLNVC,Number=1,Type=String,Description="Variant type">
##INFO=<ID=CLNVCSO,Number=1,Type=String,Description="Sequence Ontology id for variant type">
##INFO=<ID=CLNVI,Number=.,Type=String,Description="the variant's clinical sources reported as tag-value pairs of database and variant identifier">
##INFO=<ID=DBVARID,Number=.,Type=String,Description="nsv accessions from dbVar for the variant">
##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Gene(s) for the variant reported as gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
##INFO=<ID=MC,Number=.,Type=String,Description="comma separated list of molecular consequence in the form of Sequence Ontology ID|molecular_consequence">
##INFO=<ID=ORIGIN,Number=.,Type=String,Description="Allele origin. One or more of the following values may be added: 0 - unknown; 1 - germline; 2 - somatic; 4 - inherited; 8 - paternal; 16 - maternal; 32 - de-novo; 64 - biparental; 128 - uniparental; 256 - not-tested; 512 - tested-inconclusive; 1073741824 - other">
##INFO=<ID=RS,Number=.,Type=String,Description="dbSNP ID (i.e. rs number)">
##INFO=<ID=SSR,Number=1,Type=Integer,Description="Variant Suspect Reason Codes. One or more of the following values may be added: 0 - unspecified, 1 - Paralog, 2 - byEST, 4 - oldAlign, 8 - Para_EST, 16 - 1kg_failed, 1024 - other">
##contig=<ID=1>
##contig=<ID=2>
##contig=<ID=3>
##contig=<ID=4>
##contig=<ID=5>
##contig=<ID=6>
##contig=<ID=7>
##contig=<ID=8>
##contig=<ID=9>
##contig=<ID=10>
##contig=<ID=11>
##contig=<ID=12>
##contig=<ID=13>
##contig=<ID=14>
##contig=<ID=15>
##contig=<ID=16>
##contig=<ID=17>
##contig=<ID=18>
##contig=<ID=19>
##contig=<ID=20>
##contig=<ID=21>
##contig=<ID=22>
##contig=<ID=X>
##contig=<ID=Y>
##contig=<ID=MT>
##bcftools_viewVersion=1.10.2-27-g9d66868+htslib-1.10.2-33-g1bbcd02
##bcftools_viewCommand=view -h clinvar.vcf.gz; Date=Fri Feb 28 19:06:40 2020
#CHROM POS ID REF ALT QUAL FILTER INFO
```
[1] From the ClinVar VCF documentation: This file contains variations submitted through clinical channels. The variations contained in this file are therefore a mixture of variations asserted to be pathogenic as well as those known to be non-pathogenic. The user should note that any variant may have different assertions regarding clinical significance and that this file will contain only those that are the most “pathogenic”.

[2] This solution is based on a Biostars post: https://www.biostars.org/p/373852/