bcftools Primer

Feb 28, 2020

bcftools provides utilities for working with data in Variant Call Format (.vcf). The manual documents every argument and feature in full, and the developers maintain their own “HowTo” page. The goal of this post is to walk through some scenarios with a reproducible dataset to showcase the bcftools functionality I use regularly.

Note that this is not an exhaustive demonstration of all bcftools features, nor does it cover other .vcf parsing/manipulation tools or Linux utilities (e.g. awk, sed) that can be handy for working with variant calling data.

The examples should be reproducible given the setup described below. However, the output at the command line will look slightly different from the inline output in this post. For legibility, I’ve run each of the commands, excluded the header, and read the results back in as a text file. The inline output in this post shows at most 6 rows, with a final placeholder row (. . . . . . . . . . .) if necessary.

How do I concatenate multiple vcf files?
How do I subset for individual samples by name?
How do I restrict a vcf to only include INDELs?
How do I filter a vcf by SNP ID?
How do I filter a vcf by genomic coordinates?
How do I format the genotype as nucleotides in a vcf?
How do I merge multiple vcf files?
How do I extract genotypes for multiple samples from a single vcf?
How do I change the chromosome names in a vcf?
How do I inspect a vcf without the header?
How do I view only the header in a vcf?

Setup

To get started we need to find some data to work with and do a bit of pre-processing:

Download all of the files for the 20130502 release of the 1000 Genomes Project (these are in compressed .vcf.gz format, each with a .tbi index)
Download a .vcf.gz (and .tbi) for sites annotated by ClinVar ¹
Create .vcf.gz files for each chromosome (1-22) filtered to only include the ClinVar sites
Create a tabix index for each of the newly created .vcf.gz files

The code that follows performs all of the steps described above. Keep in mind that each step (especially downloading and filtering the 1000 Genomes data) may take quite a while, as these files are large (~20GB total). You’ll need a system with sufficient storage and with wget, GNU parallel, bcftools, and tabix installed.

## download 1000 genomes vcf files
wget "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/*.vcf.gz*"

## download clinvar vcf
wget "ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar.vcf.gz*"

## use parallel to restrict each chromosome (chr1 to chr22) to clinvar sites
find . -type f -name "ALL.chr[1-9]*vcf.gz" | parallel "bcftools view {} -R clinvar.vcf.gz --output-type z --output {}.clinvar.vcf.gz"

## make sure all vcf.gz files are tabix indexed
find . -type f -name "ALL.chr[1-9]*.clinvar.vcf.gz" | parallel "tabix {}"
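If GNU parallel isn’t available, the same two steps can be run one file at a time with a plain shell loop. This is a minimal sketch, assuming the downloaded files sit in the current directory and bcftools is on the PATH; restrict_to_clinvar is a hypothetical helper name, not part of bcftools, and it indexes with bcftools index -t rather than tabix.

```shell
# Hypothetical helper: restrict one 1000 Genomes file to the ClinVar sites,
# write it as <input>.clinvar.vcf.gz (same naming as the parallel version), then index it.
restrict_to_clinvar() {
  bcftools view "$1" -R clinvar.vcf.gz --output-type z --output "$1.clinvar.vcf.gz" &&
    bcftools index -t "$1.clinvar.vcf.gz"
}

# Process chromosomes 1-22 sequentially (slower than GNU parallel)
for f in ALL.chr[1-9]*.vcf.gz; do
  [ -e "$f" ] || continue                          # the glob matched no files
  case "$f" in *.clinvar.vcf.gz) continue ;; esac  # skip outputs of an earlier run
  restrict_to_clinvar "$f"
done
```

The case guard matters because ALL.chr[1-9]*.vcf.gz also matches the *.clinvar.vcf.gz outputs if the loop is run a second time.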

With the data processed we can move onto the scenarios.

All subsequent code will use bcftools version 1.10.

bcftools --version

bcftools 1.10.2-27-g9d66868
Using htslib 1.10.2-33-g1bbcd02
Copyright (C) 2019 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Scenarios

Concatenate multiple files together

To concatenate (i.e. “stack”) multiple .vcf files, use bcftools concat, as long as the input files contain the same samples in the same order. In this example, we’ll combine all of the chromosomes (1-22) into a single file.

The --output-type z argument specifies that the output will be compressed, and the --output flag allows us to explicitly name the resulting file:

bcftools concat ALL.chr*.clinvar.vcf.gz --output-type z --output all.clinvar.vcf.gz

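NOTE: bcftools concat is not equivalent to bcftools merge. concat stacks files that share the same samples (e.g. one file per chromosome), whereas merge combines files that contain different samples. For an example of the latter, see below.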
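The shell expands ALL.chr*.clinvar.vcf.gz in lexicographic order, so all.clinvar.vcf.gz holds the chromosomes as 1, 10, 11, ..., 19, 2, 20, and so on (consistent with chromosome 17 appearing before chromosome 2 in the rsID example below). If you prefer natural chromosome order, one option is to write the file names to a list sorted with GNU sort -V and pass it to bcftools concat --file-list. A sketch, assuming GNU coreutils; clinvar_files.txt is a hypothetical file name.

```shell
# Write the per-chromosome files to a list in natural order (chr1, chr2, ..., chr22).
# sort -V (GNU coreutils) compares embedded numbers numerically, unlike plain glob order.
find . -maxdepth 1 -type f -name 'ALL.chr*.clinvar.vcf.gz' | sort -V > clinvar_files.txt

# Concatenate only when bcftools is installed and at least one file was found
if command -v bcftools >/dev/null 2>&1 && [ -s clinvar_files.txt ]; then
  bcftools concat --file-list clinvar_files.txt --output-type z --output all.clinvar.vcf.gz
fi
```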

Select individual samples by name

bcftools view -s allows for subsetting by sample ID.

The combined all.clinvar.vcf.gz file above contains multiple samples. Here we’ll create individual compressed .vcf files for NA20536 and HG03718 samples, along with a tabix index for each file (using bcftools index -t):

bcftools view -s NA20536 all.clinvar.vcf.gz --output-type z --output NA20536.clinvar.vcf.gz
bcftools view -s HG03718 all.clinvar.vcf.gz --output-type z --output HG03718.clinvar.vcf.gz

## note: bcftools index -t is equivalent to tabix here
bcftools index -t NA20536.clinvar.vcf.gz
bcftools index -t HG03718.clinvar.vcf.gz
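To split out several samples without repeating the commands, the IDs can be kept in a file and processed in a loop. A sketch, assuming a hypothetical samples.txt with one ID per line; the bcftools calls are the same ones shown above, and they only run when bcftools and all.clinvar.vcf.gz are present.

```shell
# Hypothetical sample list: one sample ID per line
printf '%s\n' NA20536 HG03718 > samples.txt

# One compressed, indexed .vcf.gz per sample (skipped if bcftools or the input is missing)
if command -v bcftools >/dev/null 2>&1 && [ -f all.clinvar.vcf.gz ]; then
  while read -r sample; do
    bcftools view -s "$sample" all.clinvar.vcf.gz --output-type z --output "$sample.clinvar.vcf.gz"
    bcftools index -t "$sample.clinvar.vcf.gz"
  done < samples.txt
fi
```

bcftools query -l all.clinvar.vcf.gz prints the sample names in a file, which is handy for looking up IDs. To keep all listed samples together in a single output file instead of splitting them, bcftools view -S samples.txt reads the IDs from the file.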

Filter to only include INDELs

bcftools view -v (--types) restricts the output to the listed variant types, given as a comma-separated list of “snps”, “indels”, “mnps”, or “other”.

We can use the command to filter the .vcf to only include INDELs:

bcftools view -v indels NA20536.clinvar.vcf.gz

CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA20536
1	978603	rs35881187	CCT	C	100	PASS	AC=2;AF=0.479233;AN=2;NS=2504;DP=14705;EAS_AF=0.8036;AMR_AF=0.6412;AFR_AF=0.0348;EUR_AF=0.5487;SAS_AF=0.5593;VT=INDEL	GT	1|1
1	984171	rs140904842	CAG	C	100	PASS	AC=2;AF=0.920527;AN=2;NS=2504;DP=7127;EAS_AF=0.9891;AMR_AF=0.9769;AFR_AF=0.7602;EUR_AF=0.9742;SAS_AF=0.9714;VT=INDEL	GT	1|1
1	1168239	rs533071750	C	CG	100	PASS	AC=0;AF=0.000599042;AN=2;NS=2504;DP=9648;EAS_AF=0;AMR_AF=0.0029;AFR_AF=0;EUR_AF=0.001;SAS_AF=0;AA=?|GGGGGGG|GGGGGGGG|unsure;VT=INDEL;EX_TARGET	GT	0|0
1	2343991	rs570192538	CCA	C	100	PASS	AC=0;AF=0.00459265;AN=2;NS=2504;DP=9045;EAS_AF=0;AMR_AF=0;AFR_AF=0.0174;EUR_AF=0;SAS_AF=0;VT=INDEL	GT	0|0
1	2435830	rs555614613	TTCC	T	100	PASS	AC=0;AF=0.00579073;AN=2;NS=2504;DP=15005;EAS_AF=0;AMR_AF=0.0029;AFR_AF=0.0204;EUR_AF=0;SAS_AF=0;VT=INDEL;EX_TARGET	GT	0|0
1	2492946	rs149579135	AG	A	100	PASS	AC=0;AF=0.00359425;AN=2;NS=2504;DP=17775;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0.0129;EUR_AF=0;SAS_AF=0;AA=G|G|-|deletion;VT=INDEL	GT	0|0
.	.	.	.	.	.	.	.	.	.
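To see how many records of a given type a file contains, the -v filter can be combined with -H (no header, covered at the end of this post) and wc -l. A minimal sketch; count_type is a hypothetical helper name. For the complement, bcftools view -V (--exclude-types) drops the listed types instead of keeping them, e.g. bcftools view -V indels NA20536.clinvar.vcf.gz.

```shell
# Hypothetical helper: count_type FILE TYPE
# Prints the number of records of one variant type (snps, indels, mnps, other).
# -H suppresses the header so that wc -l counts only variant records.
count_type() {
  bcftools view -H -v "$2" "$1" | wc -l
}

# Example (requires bcftools and the single-sample file created earlier):
#   count_type NA20536.clinvar.vcf.gz indels
```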

Filter by rsid

With bcftools you can filter a .vcf file for certain sites by passing in a file that contains the IDs to be retained.

Assuming we have the following RSIDs in a file called snps.list²:

rs145413551
rs34610323
rs79548709
rs371163239
rs148716910
rs374704178
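One way to create snps.list is to print the IDs one per line. The same file also works in reverse: --exclude 'ID=@snps.list' drops these sites instead of keeping them.

```shell
# Write the six rsIDs to snps.list, one per line
printf '%s\n' rs145413551 rs34610323 rs79548709 rs371163239 rs148716910 rs374704178 > snps.list
```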

We can use snps.list to filter with bcftools view:

bcftools view --include ID==@snps.list NA20536.clinvar.vcf.gz

CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA20536
17	648546	rs34610323	C	T	100	PASS	AC=0;AF=0.0159744;AN=2;NS=2504;DP=21874;EAS_AF=0;AMR_AF=0.0058;AFR_AF=0.0575;EUR_AF=0;SAS_AF=0;AA=C|||;VT=SNP;EX_TARGET	GT	0|0
2	31620566	rs145413551	G	T	100	PASS	AC=0;AF=0.000199681;AN=2;NS=2504;DP=19652;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET	GT	0|0
21	45707000	rs374704178	G	A	100	PASS	AC=0;AF=0.000399361;AN=2;NS=2504;DP=11479;EAS_AF=0;AMR_AF=0;AFR_AF=0.0015;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET	GT	0|0
5	151721	rs148716910	G	A	100	PASS	AC=0;AF=0.00279553;AN=2;NS=2504;DP=18789;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET	GT	0|0
8	1841816	rs79548709	C	T	100	PASS	AC=0;AF=0.00519169;AN=2;NS=2504;DP=16683;EAS_AF=0;AMR_AF=0;AFR_AF=0.0197;EUR_AF=0;SAS_AF=0;AA=C|||;VT=SNP;EX_TARGET	GT	0|0
8	3889458	rs371163239	T	A	100	PASS	AC=0;AF=0.000199681;AN=2;NS=2504;DP=15669;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=T|||;VT=SNP;EX_TARGET	GT	0|0

Filter by chromosome and/or position

The --regions (-r) flag takes a comma-separated list of chromosomes and/or coordinates (chr, chr:pos, or chr:beg-end) to filter the .vcf. It requires the input file to be indexed.

To restrict to chromosome 5:

bcftools view --regions 5 NA20536.clinvar.vcf.gz

CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA20536
5	40417	esv3603720;esv3603721	G	,	100	PASS	AC=0,0;AF=0.000199681,0.000798722;AN=2;CS=DUP_uwash;END=176437;NS=2504;SVTYPE=CNV;DP=16231;EAS_AF=0,0;AMR_AF=0,0;AFR_AF=0,0;EUR_AF=0,0.003;SAS_AF=0.001,0.001;VT=SV;EX_TARGET	GT	0|0
5	124186	esv3603731	T		100	PASS	AC=0;AF=0.000199681;AN=2;CS=DUP_gs;END=163795;NS=2504;SVTYPE=DUP;DP=19153;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0.001;SAS_AF=0;VT=SV;EX_TARGET	GT	0|0
5	143490	rs142208662	C	T	100	PASS	AC=0;AF=0.00279553;AN=2;NS=2504;DP=19664;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AA=c|||;VT=SNP;EX_TARGET	GT	0|0
5	151721	rs148716910	G	A	100	PASS	AC=0;AF=0.00279553;AN=2;NS=2504;DP=18789;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET	GT	0|0
5	156288	rs193920840	C	T	100	PASS	AC=0;AF=0.000199681;AN=2;NS=2504;DP=17617;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0.001;AA=C|||;VT=SNP;EX_TARGET	GT	0|0
5	162045	rs568109142	G	A	100	PASS	AC=0;AF=0.000199681;AN=2;NS=2504;DP=15391;EAS_AF=0.001;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET	GT	0|0
.	.	.	.	.	.	.	.	.	.

And if we were interested in a specific region (let’s say chromosome 10, anywhere between positions 800000 and 900000):

bcftools view --regions 10:800000-900000 NA20536.clinvar.vcf.gz

CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA20536
10	859076	rs144565605	T	C	100	PASS	AC=0;AF=0.000199681;AN=2;NS=2504;DP=15608;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=T|||;VT=SNP;EX_TARGET	GT	0|0
10	860990	rs144883024	G	A	100	PASS	AC=0;AF=0.00259585;AN=2;NS=2504;DP=18990;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0.0091;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET	GT	0|0
10	871816	rs79707128	T	A	100	PASS	AC=0;AF=0.0211661;AN=2;NS=2504;DP=21039;EAS_AF=0;AMR_AF=0.0058;AFR_AF=0.0703;EUR_AF=0;SAS_AF=0.0092;AA=T|||;VT=SNP;EX_TARGET	GT	0|0
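When there are many regions, they can be stored in a file and passed with -R (--regions-file) instead, as the setup step did with clinvar.vcf.gz. A sketch, assuming a hypothetical regions.tsv in the default tab-delimited format (CHROM, BEG, END; 1-based and inclusive); the second interval on chromosome 17 is an arbitrary example. Like --regions, -R requires an indexed input, and a file ending in .bed would instead be read as 0-based BED.

```shell
# Hypothetical regions file: CHROM<TAB>BEG<TAB>END, 1-based and inclusive
printf '10\t800000\t900000\n17\t600000\t700000\n' > regions.tsv

# Filter with the regions file (skipped if bcftools or the input is missing)
if command -v bcftools >/dev/null 2>&1 && [ -f NA20536.clinvar.vcf.gz ]; then
  bcftools view -R regions.tsv NA20536.clinvar.vcf.gz
fi
```

The command-line equivalent is a comma-separated list: --regions 10:800000-900000,17:600000-700000.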

Format translated genotype output

bcftools query outputs the contents of a .vcf as text. The format string passed with -f specifies the fields to extract, the separators, and the line endings; anything inside square brackets is printed once per sample.

In this scenario, we’ll pull out the ID (RSID), chromosome, position, a translated genotype, and the “type” (SNP, INDEL, etc.) in tab-separated format:

bcftools query -f "%ID\t%CHROM\t%POS[\t%TGT]\t%TYPE\n" NA20536.clinvar.vcf.gz

ID	CHROM	POS	GT	TYPE
rs41285790	1	865628	G|G	SNP
rs113383096	1	879481	G|G	SNP
rs112433394	1	880944	G|G	SNP
rs113226136	1	887409	G|G	SNP
rs112966263	1	887989	A|A	SNP
rs58931985	1	889450	C|C	SNP
.	.	.	.	.
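bcftools query does not print column names unless asked to, so to save a tab-separated file with the column names shown in the table above, one option is to write the header line first and append the query output. A sketch; the output name NA20536.genotypes.tsv is hypothetical. (bcftools query -H also prints a header line, but with its own column labels.)

```shell
# Column names first (matching the format string below), then the records
printf 'ID\tCHROM\tPOS\tGT\tTYPE\n' > NA20536.genotypes.tsv

# Append one line per site (skipped if bcftools or the input is missing)
if command -v bcftools >/dev/null 2>&1 && [ -f NA20536.clinvar.vcf.gz ]; then
  bcftools query -f '%ID\t%CHROM\t%POS[\t%TGT]\t%TYPE\n' NA20536.clinvar.vcf.gz >> NA20536.genotypes.tsv
fi
```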

Merge vcf files together

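bcftools merge combines files that contain different samples into a single multi-sample file. The input files must be bgzip-compressed and indexed, which is why we indexed the single-sample files above.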

To merge individual sample .vcf files into one:

bcftools merge NA20536.clinvar.vcf.gz HG03718.clinvar.vcf.gz --output-type z --output NA20536.HG03718.clinvar.vcf.gz

CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA20536	HG03718
1	865628	rs41285790	G	A	100	PASS	NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=33950;AF=0.00279553;EAS_AF=0;AMR_AF=0.0072;AFR_AF=0;EUR_AF=0.005;SAS_AF=0.0041;AN=4;AC=0	GT	0|0	0|0
1	879481	rs113383096	G	C	100	PASS	NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=27530;AF=0.0197684;EAS_AF=0;AMR_AF=0.0058;AFR_AF=0.0719;EUR_AF=0;SAS_AF=0;AN=4;AC=0	GT	0|0	0|0
1	880944	rs112433394	G	A	100	PASS	NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=41446;AF=0.00259585;EAS_AF=0;AMR_AF=0;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AN=4;AC=0	GT	0|0	0|0
1	887409	rs113226136	G	C	100	PASS	NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=39832;AF=0.00119808;EAS_AF=0;AMR_AF=0;AFR_AF=0.0045;EUR_AF=0;SAS_AF=0;AN=4;AC=0	GT	0|0	0|0
1	887989	rs112966263	A	G	100	PASS	NS=2504;AA=G|||;VT=SNP;EX_TARGET;DP=36768;AF=0.00579073;EAS_AF=0;AMR_AF=0;AFR_AF=0.0219;EUR_AF=0;SAS_AF=0;AN=4;AC=0	GT	0|0	0|0
1	889450	rs58931985	C	A	100	PASS	NS=2504;AA=C|||;VT=SNP;EX_TARGET;DP=32298;AF=0.00159744;EAS_AF=0;AMR_AF=0;AFR_AF=0.0061;EUR_AF=0;SAS_AF=0;AN=4;AC=0	GT	0|0	0|0
.	.	.	.	.	.	.	.	.	.	.
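With more than a couple of inputs, the file names can be listed in a file and passed with --file-list. A sketch, assuming a hypothetical merge_files.txt with one file name per line; it also creates any missing .tbi index first, since merge expects every input to be indexed, and it only runs when bcftools and both input files are present.

```shell
# Hypothetical list of the single-sample files to merge, one per line
printf '%s\n' NA20536.clinvar.vcf.gz HG03718.clinvar.vcf.gz > merge_files.txt

if command -v bcftools >/dev/null 2>&1 && [ -f NA20536.clinvar.vcf.gz ] && [ -f HG03718.clinvar.vcf.gz ]; then
  # Create a tabix index for any input that does not have one yet
  while read -r f; do
    [ -f "$f.tbi" ] || bcftools index -t "$f"
  done < merge_files.txt
  bcftools merge --file-list merge_files.txt --output-type z --output NA20536.HG03718.clinvar.vcf.gz
fi
```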

Parse genotypes for multiple samples

Given a multi-sample .vcf, you can parse genotypes for each individual:

bcftools query -f '[%CHROM\t%POS\t%SAMPLE\t%TGT\n]' NA20536.HG03718.clinvar.vcf.gz

CHROM	POS	SAMPLE	GT
1	865628	NA20536	G|G
1	865628	HG03718	G|G
1	879481	NA20536	G|G
1	879481	HG03718	G|G
1	880944	NA20536	G|G
1	880944	HG03718	G|G
.	.	.	.

Edit chromosome names

You can rename chromosomes with bcftools annotate --rename-chrs. The command requires a tab-separated file (here called chromosomes.txt) with one “old\tnew” pair of names per line:

1\tchr1
2\tchr2
3\tchr3
4\tchr4
5\tchr5
6\tchr6
7\tchr7
8\tchr8
9\tchr9
10\tchr10
11\tchr11
12\tchr12
13\tchr13
14\tchr14
15\tchr15
16\tchr16
17\tchr17
18\tchr18
19\tchr19
20\tchr20
21\tchr21
22\tchr22
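Rather than typing the 22 lines by hand, the mapping file can be generated with a loop. A minimal sketch; like the listing above, it only covers chromosomes 1-22, while the ClinVar header shown at the end of this post also declares X, Y, and MT.

```shell
# Write "old<TAB>new" pairs for chromosomes 1-22 to chromosomes.txt
for i in $(seq 1 22); do
  printf '%s\tchr%s\n' "$i" "$i"
done > chromosomes.txt
```

Swapping the two fields in the printf format gives the reverse mapping (chr1 back to 1).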

bcftools annotate --rename-chrs chromosomes.txt NA20536.clinvar.vcf.gz

CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA20536
chr1	865628	rs41285790	G	A	100	PASS	AC=0;AF=0.00279553;AN=2;NS=2504;DP=16975;EAS_AF=0;AMR_AF=0.0072;AFR_AF=0;EUR_AF=0.005;SAS_AF=0.0041;AA=g|||;VT=SNP;EX_TARGET	GT	0|0
chr1	879481	rs113383096	G	C	100	PASS	AC=0;AF=0.0197684;AN=2;NS=2504;DP=13765;EAS_AF=0;AMR_AF=0.0058;AFR_AF=0.0719;EUR_AF=0;SAS_AF=0;AA=g|||;VT=SNP;EX_TARGET	GT	0|0
chr1	880944	rs112433394	G	A	100	PASS	AC=0;AF=0.00259585;AN=2;NS=2504;DP=20723;EAS_AF=0;AMR_AF=0;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AA=g|||;VT=SNP;EX_TARGET	GT	0|0
chr1	887409	rs113226136	G	C	100	PASS	AC=0;AF=0.00119808;AN=2;NS=2504;DP=19916;EAS_AF=0;AMR_AF=0;AFR_AF=0.0045;EUR_AF=0;SAS_AF=0;AA=g|||;VT=SNP;EX_TARGET	GT	0|0
chr1	887989	rs112966263	A	G	100	PASS	AC=0;AF=0.00579073;AN=2;NS=2504;DP=18384;EAS_AF=0;AMR_AF=0;AFR_AF=0.0219;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET	GT	0|0
chr1	889450	rs58931985	C	A	100	PASS	AC=0;AF=0.00159744;AN=2;NS=2504;DP=16149;EAS_AF=0;AMR_AF=0;AFR_AF=0.0061;EUR_AF=0;SAS_AF=0;AA=C|||;VT=SNP;EX_TARGET	GT	0|0
.	.	.	.	.	.	.	.	.	.

View without header

To view only the records without the header (i.e. to suppress the header), use the -H flag:

bcftools view -H NA20536.HG03718.clinvar.vcf.gz

X1	X2	X3	X4	X5	X6	X7	X8	X9	X10	X11
1	865628	rs41285790	G	A	100	PASS	NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=33950;AF=0.00279553;EAS_AF=0;AMR_AF=0.0072;AFR_AF=0;EUR_AF=0.005;SAS_AF=0.0041;AN=4;AC=0	GT	0|0	0|0
1	879481	rs113383096	G	C	100	PASS	NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=27530;AF=0.0197684;EAS_AF=0;AMR_AF=0.0058;AFR_AF=0.0719;EUR_AF=0;SAS_AF=0;AN=4;AC=0	GT	0|0	0|0
1	880944	rs112433394	G	A	100	PASS	NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=41446;AF=0.00259585;EAS_AF=0;AMR_AF=0;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AN=4;AC=0	GT	0|0	0|0
1	887409	rs113226136	G	C	100	PASS	NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=39832;AF=0.00119808;EAS_AF=0;AMR_AF=0;AFR_AF=0.0045;EUR_AF=0;SAS_AF=0;AN=4;AC=0	GT	0|0	0|0
1	887989	rs112966263	A	G	100	PASS	NS=2504;AA=G|||;VT=SNP;EX_TARGET;DP=36768;AF=0.00579073;EAS_AF=0;AMR_AF=0;AFR_AF=0.0219;EUR_AF=0;SAS_AF=0;AN=4;AC=0	GT	0|0	0|0
1	889450	rs58931985	C	A	100	PASS	NS=2504;AA=C|||;VT=SNP;EX_TARGET;DP=32298;AF=0.00159744;EAS_AF=0;AMR_AF=0;AFR_AF=0.0061;EUR_AF=0;SAS_AF=0;AN=4;AC=0	GT	0|0	0|0
.	.	.	.	.	.	.	.	.	.	.

View only header

To view only the header (i.e. to extract it), use the -h flag:

bcftools view -h clinvar.vcf.gz

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=2020-02-17
##source=ClinVar
##reference=GRCh37
##ID=<Description="ClinVar Variation ID">
##INFO=<ID=AF_ESP,Number=1,Type=Float,Description="allele frequencies from GO-ESP">
##INFO=<ID=AF_EXAC,Number=1,Type=Float,Description="allele frequencies from ExAC">
##INFO=<ID=AF_TGP,Number=1,Type=Float,Description="allele frequencies from TGP">
##INFO=<ID=ALLELEID,Number=1,Type=Integer,Description="the ClinVar Allele ID">
##INFO=<ID=CLNDN,Number=.,Type=String,Description="ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDNINCL,Number=.,Type=String,Description="For included Variant : ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
##INFO=<ID=CLNDISDB,Number=.,Type=String,Description="Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNDISDBINCL,Number=.,Type=String,Description="For included Variant: Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
##INFO=<ID=CLNHGVS,Number=.,Type=String,Description="Top-level (primary assembly, alt, or patch) HGVS expression.">
##INFO=<ID=CLNREVSTAT,Number=.,Type=String,Description="ClinVar review status for the Variation ID">
##INFO=<ID=CLNSIG,Number=.,Type=String,Description="Clinical significance for this single variant">
##INFO=<ID=CLNSIGCONF,Number=.,Type=String,Description="Conflicting clinical significance for this single variant">
##INFO=<ID=CLNSIGINCL,Number=.,Type=String,Description="Clinical significance for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:clinical significance.">
##INFO=<ID=CLNVC,Number=1,Type=String,Description="Variant type">
##INFO=<ID=CLNVCSO,Number=1,Type=String,Description="Sequence Ontology id for variant type">
##INFO=<ID=CLNVI,Number=.,Type=String,Description="the variant's clinical sources reported as tag-value pairs of database and variant identifier">
##INFO=<ID=DBVARID,Number=.,Type=String,Description="nsv accessions from dbVar for the variant">
##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Gene(s) for the variant reported as gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
##INFO=<ID=MC,Number=.,Type=String,Description="comma separated list of molecular consequence in the form of Sequence Ontology ID|molecular_consequence">
##INFO=<ID=ORIGIN,Number=.,Type=String,Description="Allele origin. One or more of the following values may be added: 0 - unknown; 1 - germline; 2 - somatic; 4 - inherited; 8 - paternal; 16 - maternal; 32 - de-novo; 64 - biparental; 128 - uniparental; 256 - not-tested; 512 - tested-inconclusive; 1073741824 - other">
##INFO=<ID=RS,Number=.,Type=String,Description="dbSNP ID (i.e. rs number)">
##INFO=<ID=SSR,Number=1,Type=Integer,Description="Variant Suspect Reason Codes. One or more of the following values may be added: 0 - unspecified, 1 - Paralog, 2 - byEST, 4 - oldAlign, 8 - Para_EST, 16 - 1kg_failed, 1024 - other">
##contig=<ID=1>
##contig=<ID=2>
##contig=<ID=3>
##contig=<ID=4>
##contig=<ID=5>
##contig=<ID=6>
##contig=<ID=7>
##contig=<ID=8>
##contig=<ID=9>
##contig=<ID=10>
##contig=<ID=11>
##contig=<ID=12>
##contig=<ID=13>
##contig=<ID=14>
##contig=<ID=15>
##contig=<ID=16>
##contig=<ID=17>
##contig=<ID=18>
##contig=<ID=19>
##contig=<ID=20>
##contig=<ID=21>
##contig=<ID=22>
##contig=<ID=X>
##contig=<ID=Y>
##contig=<ID=MT>
##bcftools_viewVersion=1.10.2-27-g9d66868+htslib-1.10.2-33-g1bbcd02
##bcftools_viewCommand=view -h clinvar.vcf.gz; Date=Fri Feb 28 19:06:40 2020
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
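Because -h prints plain text, the header can be piped into other tools. For example, before renaming chromosomes it can help to check which contig names a file declares. A sketch using grep and sed; contig_ids is a hypothetical helper that reads a header on standard input.

```shell
# Hypothetical helper: read a VCF header on stdin and print one contig ID per line,
# e.g. "##contig=<ID=1>" -> "1" and "##contig=<ID=MT,length=16569>" -> "MT"
contig_ids() {
  grep '^##contig=' | sed 's/^##contig=<ID=\([^,>]*\).*/\1/'
}

# Usage (requires bcftools):
#   bcftools view -h clinvar.vcf.gz | contig_ids
```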

¹ From the ClinVar vcf documentation: This file contains variations submitted through clinical channels. The variations contained in this file are therefore a mixture of variations asserted to be pathogenic as well as those known to be non-pathogenic. The user should note that any variant may have different assertions regarding clinical significance and that this file will contain only those that are the most “pathogenic”.
² This solution is based on a Biostars post: https://www.biostars.org/p/373852/
