Files in samtools 1.22:

Name                    Modified    Size
README.txt              2025-05-30  18.9 kB
bcftools-1.22.tar.bz2   2025-05-30  8.2 MB
samtools-1.22.tar.bz2   2025-05-30  9.3 MB
htslib-1.22.tar.bz2     2025-05-30  4.8 MB
Total: 4 items, 22.3 MB
------------------------------------------------------------------------------
htslib - changes v1.22
------------------------------------------------------------------------------

Note this release changes the default output CRAM version from 3.0 to 3.1.
HTSlib and SAMtools have been able to read CRAM 3.1 since version 1.12,
but other tools may not yet be able to cope.  We know Noodles reads
CRAM 3.1 and htsjdk has a draft implementation that has not yet been
released.

HTSlib has options for modifying the output formats, which are exposed
in SAMtools.  When specifying an output format you can explicitly change
the version via e.g. `samtools view -O cram,version=3.0 ...`.
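
As a rough illustration of the version override above, the following Python
sketch builds a command line that writes CRAM 3.0.  Only the
`-O cram,version=3.0` argument comes from these notes; the use of `-T` for
the reference, `-o` for the output, and all file names are assumptions
chosen for the example.

```python
import shlex

def cram30_command(in_path, out_path, ref_fasta):
    """Build a samtools argv that forces CRAM 3.0 output."""
    return [
        "samtools", "view",
        "-T", ref_fasta,            # reference FASTA (assumed usage)
        "-O", "cram,version=3.0",   # version override from the release notes
        "-o", out_path,             # output file (assumed usage)
        in_path,
    ]

if __name__ == "__main__":
    # Print the command rather than running it, so no samtools install is needed.
    print(shlex.join(cram30_command("in.bam", "out.cram", "ref.fa")))
```

Passing the list to `subprocess.run(..., check=True)` would execute it on a
system where samtools is installed.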

Further documentation on this change can be found at
https://www.htslib.org/benchmarks/CRAM.html

HTSlib no longer fetches CRAM reference data from EBI's server by default.
Your organisation may wish to set up local infrastructure to supply
reference sequences, e.g., using the new ref-cache tool included in this
HTSlib release. See the REF_CACHE and REF_PATH environment variables
documented in https://www.htslib.org/doc/reference_seqs.html and the
SAMtools manpage for details.
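
REF_PATH and REF_CACHE entries are templates in which `%s` stands for the
MD5 checksum of a reference sequence and `%Ns` consumes the next N characters
of it.  The Python sketch below shows that expansion as understood from the
documentation linked above; it is not HTSlib's own code, and the function
name and example directory are hypothetical.

```python
import re

def expand_ref_template(template, md5):
    """Expand %s / %Ns placeholders with successive pieces of an MD5 string."""
    pos = 0

    def repl(match):
        nonlocal pos
        width = match.group(1)
        if width:                   # %Ns: take the next N characters
            piece = md5[pos:pos + int(width)]
        else:                       # %s: take everything that is left
            piece = md5[pos:]
        pos += len(piece)
        return piece

    return re.sub(r"%(\d*)s", repl, template)

if __name__ == "__main__":
    # Hypothetical cache directory and a made-up 32-character checksum.
    print(expand_ref_template("/data/hts-ref/%2s/%2s/%s",
                              "0123456789abcdef0123456789abcdef"))
```

Splitting the checksum across two directory levels keeps any single
directory from accumulating a very large number of files.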

Updates
-------

* NEW. Add ref-cache, a caching proxy for reference sequences.  This is a
  local server of reference sequences, for use when encoding or decoding CRAM
  files that use reference-based compression. (PR #1911, PR #1921, PR #1922)

* Add support for matching VCF lines by ID. (PR #1844, addresses issue
  bcftools#1739 reported by Han Cao)

* Make it possible to test for VCF_REF as declared in the documentation.
  (PR #1879)

* Updated VCF code to work with VCF 4.4 prefixed phasing info. (PR#1861,
  fixes #1847.  Reported by John Marshall)

* Use the highest VCF version when merging headers. (PR#1912, see
  bcftools#2395 and bcftools#2404)

* Update RLEN calculation for VCF 4.4 and 4.5. (PR#1897, fixes #1820. 
  Reported by Dave Lawrence)

* Convert U to T instead of U to N when parsing SAM.  Although the SAM format
  itself can contain U, the BAM format cannot. (PR #1854, fixes samtools#2131
  reported by James Ferguson)

* Add an hts_crc32 function to use zlib or libdeflate.  The libdeflate crc32
  function is faster than native zlib and should be used when available. (PR
  #1850)

* Increase the input block size for bgzip. This deals with a slow down
  introduced in PR #1493 when reading from a pipe. (PR #1768, fixes
  #1767.  Reported by Konstantin Riege)

* Allow BYTE_ARRAY_STOP to work on non-zero STOP code with TOK3.  Although
  the htscodecs name tokeniser uses a NUL between names there is no reason
  why another value could not be used.  This change lets CRAM recognise other
  separator values. (PR #1871)

* Remove cram seek ability to do range queries via SEEK_CUR.  A probable
  misfeature from the original implementation. (PR #1878, fixes #1877. 
  Reported by Rick Wertenbroek)

* Add hts_tpool_worker_id() API.  This may be used to associate data with a
  thread rather than to a job. (PR #1875)

* Update bcf_synced_reader to use htsFile. (PR #1868, implements #1862. 
  Requested by Brent Pedersen)

* Exit with return value 1 on tabix parse error.  This previously returned 0.
  (PR #1887, fixes #1885.  Reported by Fan-iX)

* Automatically recognise BED vs TSV files and add the option -C, --coords to
  set index positions (1 or 0 based coordinates) in annot-tsv. (PR #1894)

* Reading SQ lines with multiple differing LN will now fail.  Such lines are
  invalid (by the spec) and previous handling was inconsistent. (PR #1882,
  fixes #1866)

* Return errors instead of EOF after all I/O errors etc in
  hts_itr_multi_next/sam_itr_next/sam_read1/vcf_parse/bcf_read. (PR#1899.
   Thanks to John Marshall)

* Remove UR:file:// and UR:ftp:// from ref search path, plus REF_PATH to EBI.
  Removing EBI as the default fallback when REF_PATH is not set prevents
  an unintended DDoS on EBI's servers. (PR#1881. PR#1915, fixes oss-fuzz
  issue 418125747)
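
To illustrate the U to T conversion listed above: BAM stores each base as a
4-bit code drawn from the alphabet `=ACMGRSVTWYHKDBN`, which has no slot for
U.  The Python sketch below is an illustration rather than HTSlib's parser.
It maps U to the code for T and any other unrepresentable character to N.
The function name is hypothetical.

```python
BAM_BASES = "=ACMGRSVTWYHKDBN"      # index in this string = 4-bit BAM code
CODE = {base: i for i, base in enumerate(BAM_BASES)}
CODE["U"] = CODE["T"]               # RNA uracil is stored as T, not N

def encode_bases(seq):
    """Return the 4-bit BAM codes for a SAM sequence string."""
    n_code = CODE["N"]
    # Unknown characters fall back to N; case is ignored in this sketch.
    return [CODE.get(base.upper(), n_code) for base in seq]

if __name__ == "__main__":
    print(encode_bases("ACGU"))
```

Mapping U to T keeps RNA reads usable after a round trip through BAM,
whereas mapping to N would discard the base call.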

Build Changes
-------------

* Detect the presence of getauxval() and elf_aux_info() for *BSD variants.
  (PR #1835, thanks to Brad Smith)

* Make HAVE_ATTRIBUTE_TARGET check also check that SSSE3 intrinsics work.
  Mainly for use with old compilers. (PR #1886, fixes #1838 and
  pysam-developers/pysam#1327.  Thanks to John Marshall)

* Fix broken tests due to MSYS2 changes. Due to changes in how MSYS2 perl
  reported the identity of the OS it was built for, our tests were failing to
  adapt to the Windows style file locations. (PR #1892)

* Updated htscodecs submodule to version 1.6.3 (PR #1917)

* Fix the script used to build the symbol version file. (PR #1918)

Bug fixes
---------

* Fix possible 1 byte underflow in find_file_extension(). Fixes an issue
  reported by OSS-Fuzz. (PR #1840, fixes oss-fuzz id 71740)

* Replace home-brew string end searching with memchr() to speed up looking at
  long aux tags. (PR #1842)

* Prevent segfault on empty tbi index.  This could happen when a VCF file has
  a header but no data lines. (PR #1845, fixes bcftools#2286.  Reported by
  Devon Ryan)

* Fix CRAM embed_ref=2 with seqs overlapping ref end. (PR #1848 and PR #1849
  which fixed oss-fuzz issue 372547397)

* Fix sam_hdr_remove_line_pos() not dealing with the 0 index position
  properly. (PR #1853.  Thanks to Julian Regalado Perez)

* Fix threaded sam_read1() after EOF.  Prevents sam_read1() getting stuck
  when trying to read after EOF and waiting forever for data that is never
  going to arrive. (PR #1856, fixes #1855.  Reported by Yan Gao)

* Fix a bug in breakend detection. It was incorrectly assuming that the ALT
  allele is of equal length to REF allele, but the VCF specification allows
  breakend insertions. (PR #1858, fixes bcftools#2317.  Reported by
  Nicolai von Kügelgen).

* Fix cram_encode fuzzer issue caused by negative reference lengths. 
  Reported by OSS-Fuzz. (PR #1863 fixes oss-fuzz issue 382922241)

* Fixed a typo in vcf.h. (PR #1870, thanks to Yu Wang)

* Reset variant types after updating alleles with bcf_update_alleles() or
  bcf_update_alleles_str().  Prevents an out-of-bounds access by bcftools
  consensus. (PR #1883)

* Recognize T > A[chr15:12345[ breakend type in VCF. (PR#1903, fixes
  bcftools#2389.  Reported by Dennis Hendriksen)

* Fix possible buffer overruns in expand_path(). (PR#1907)
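
The breakend fixes above concern ALT alleles written in VCF breakend
notation, where one or more bases sit on one side of a bracketed mate
position such as `A[chr15:12345[`.  A hedged Python sketch of recognising
the four bracket forms, including multi-base insertions, follows.  The
regular expression is illustrative only and is not the check HTSlib performs.

```python
import re

# Bases before a bracketed mate position, or a bracketed mate position
# followed by bases.  The backreference forces both brackets to match.
BREAKEND = re.compile(
    r"^(?:"
    r"[ACGTNacgtn]+([\[\]])[^\[\]]+:\d+\1"
    r"|"
    r"([\[\]])[^\[\]]+:\d+\2[ACGTNacgtn]+"
    r")$"
)

def is_breakend(alt):
    """Return True if alt looks like a VCF breakend allele."""
    return BREAKEND.match(alt) is not None

if __name__ == "__main__":
    for alt in ("A[chr15:12345[", "AGT]2:321681]", "]13:123456]T", "ACGT"):
        print(alt, is_breakend(alt))
```

Note that the base string may be longer than REF, which is the insertion
case the breakend detection fix addresses.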

Documentation updates
---------------------

* Add instructions to INSTALL for FreeBSD, NetBSD and OpenBSD. (PR #1843)

* Clarify bam_set1() parameter documentation to note that quality values do
  not have the ASCII 33 offset. (PR #1891.  Thanks to Chris Wright)

* Fixed incorrectly named table in bam1_t structure documentation. (PR #1923.
  Thanks to Julian Hess)

------------------------------------------------------------------------------
samtools - changes v1.22
------------------------------------------------------------------------------

Note this release changes the default output CRAM version from 3.0 to 3.1.
HTSlib and SAMtools have been able to read CRAM 3.1 since version 1.12,
but other tools may not yet be able to cope.  We know Noodles reads
CRAM 3.1 and htsjdk has a draft implementation that has not yet been
released.

HTSlib has options for modifying the output formats, which are exposed
in SAMtools.  When specifying an output format you can explicitly change
the version via e.g. `samtools view -O cram,version=3.0 ...`.

Further documentation on this change can be found at
https://www.htslib.org/benchmarks/CRAM.html

HTSlib no longer fetches CRAM reference data from EBI's server by default.
Your organisation may wish to set up local infrastructure to supply
reference sequences, e.g., using the new ref-cache tool included in this
HTSlib release. See the REF_CACHE and REF_PATH environment variables
documented in https://www.htslib.org/doc/reference_seqs.html and the
SAMtools manpage for details.

New work and changes:

* New `samtools checksum` command.  This checksums sequence, name, quality
  and barcode tags in an order and orientation agnostic manner, to facilitate
  validation of no data loss between raw fastq (or unmapped crams) through
  alignment, duplication marking, sorting, and other processing operations to
  get to the final aligned bam/cram. (PR#2122)

* Extend `samtools sort -M` to distinguish between mapped and unmapped files.
  (PR#2110, fixes #2105.  Reported by Armin Töpfer)

* Allow the `samtools sort` "merging from..." message to be silenced. Setting
  the verbosity to 0 or 1 will now silence this message. (PR#2197, resolves
  #2185.  Requested by Alex Predeus)

* Add `--save-counts` option to `samtools view`.  Adds an option to store
  counts of records processed, accepted and rejected by filtering to a file.
  (PR#2120, resolves #2038.  Requested by Chang Y)

* `samtools fasta` and `fastq` can now make faidx/fqidx indexes while writing
  using the `--write-index` option. (PR#2125, resolves #2118.  Requested by
  Filipe G. Vieira)

* Add a warning for `samtools fastq` on coordinate sorted data. (PR#2176,
  fixes #2169 and #2161.  Reported by wook2014)

* Add `-i` option to `samtools tview` to hide inserts. (PR#2123.  Thanks to
  Benjamin Bræstrup Sayoc)

* Show optional headers with `samtools bedcov -H`. (PR#2140, fixes #2126. 
  Reported by biounix)

* `samtools consensus` now supports proper multi-threading.  Previously
  this was restricted to decompression only, but it should now scale
  better. (PR#2174, supersedes PR#2141)

* Add `samtools consensus -T ref.fa` functionality.  This reports the
  reference value if a consensus value cannot be calculated. (PR#2153,
  fixes an additional request in #1915)

* In `samtools consensus`, do not use consensus N for "*" (absent) calls that
  are masked due to insufficient depth. (PR#2204, fixes #2167.  Reported by
  sanschiffre)

* Improve `plot-bamstats` quality plots. (PR#2143 combined with PR#2116
  (thanks to James Gilbert))

* Make `reheader -h` use /tmp and honour TMPDIR. (PR#2168, related to #2165. 
  Reported by Zhang Yuanfeng)

* Set sort order header tag to unsorted when ordering is lost during
  `samtools merge`. (PR#2173, fixes #2159.  Reported by Filipe G. Vieira)

* Protect against merging CRAM files with different headers. (PR#2220, fixes
  #2218.  Reported by Kevin Lewis)

* `samtools stats` bug-fix to checksum calculation for quality values.  This
  corrects the checksums but in turn makes the calculated value different to
  that reported by previous samtools versions. (PR#2193, fixes #2187)

* Clarification for `samtools stats` when used on files with different sort
  orders. (PR#2198, fixes #2177.  Reported by Filipe G. Vieira)

* In `samtools stats`, dovetailed (completely overlapping) read pairs are
  now always counted as inward-oriented.  Previously they could have been
  inwards or outwards depending on read ordering. (PR#2216, resolves #2210.
   Requested by Pontus Hüer)
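
The order- and orientation-agnostic idea behind `samtools checksum` can be
sketched in Python.  This is not the algorithm or output format samtools
uses: the CRC-32 hash, the wrapping sum used to combine records, and the
record layout are all assumptions chosen to show why read order and strand
need not affect the result.

```python
import zlib

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def original_orientation(seq, qual, is_reverse):
    """Undo the reverse-complement applied to reverse-strand alignments."""
    if is_reverse:
        return seq.translate(COMPLEMENT)[::-1], qual[::-1]
    return seq, qual

def checksum(records):
    """records: iterable of (name, seq, qual, is_reverse) tuples."""
    total = 0
    for name, seq, qual, is_reverse in records:
        seq, qual = original_orientation(seq, qual, is_reverse)
        crc = zlib.crc32("\0".join((name, seq, qual)).encode("ascii"))
        total = (total + crc) & 0xFFFFFFFF   # addition ignores record order
    return total

if __name__ == "__main__":
    raw = [("r1", "ACGT", "IIII", False), ("r2", "AACC", "ABCD", False)]
    aligned = [("r2", "GGTT", "DCBA", True), ("r1", "ACGT", "IIII", False)]
    print(checksum(raw) == checksum(aligned))
```

Because each record is hashed on its own and the hashes are combined with a
commutative operation, sorting or re-orienting reads leaves the total
unchanged, while altering any base or quality value changes it.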

Documentation:

* Correct the example for 1:1 `samtools consensus` coords. (PR#2113, fixes
  #2111.  Reported by schorlton-bugseq)

* Document the fastq format options used in SAMtools and HTSlib. (PR#2123,
  fixes #2121)

* Remove mention of threads from `samtools cat` man page. (PR#2162, fixes
  #2160.  Reported by Brandon Pickett)

* Update `samtools merge` man page to include `--template-coordinate`.
  (PR#2164.  Thanks to Nils Homer)

* Revised CRAM reference sequence documentation in the samtools man page.
  (PR#2178)

* Added fish shell completion and renamed completion for bash shell.  These
  files can be copied to appropriate directories by the user.  For full
  functionality it requires Python3.5+ and installed samtools manpages.
  (PR#2203.  Thanks to LunarEclipse363)

* Fix URL printed by the `seq_cache_populate.pl` script. (PR#2222.  Thanks to
  Charles Plessy)

Bug fixes:

* `samtools consensus` previously could give different results for BAM and
  CRAM files with the same content.  This was because MD/NM tag generation
  was disabled in CRAM, but the `decode_md=0` option did nothing with BAM.
  Note with `--no-adj-MQ` both BAM and CRAM gave identical results. Now
  use `--input-fmt-option decode_md=0` to get the old CRAM behaviour.
  Otherwise, both BAM and CRAM will be utilising MD/NM to locally modify
  mapping quality. (PR#2156)

* `samtools consensus` without `-a` previously still padded with leading Ns
  in some cases.  It now consistently removes both leading and trailing Ns.
  Use "-a" if you want all reference bases displayed. (Part of PR#2174 above)

* Change how `markdup` looks for single reads.  Due to changes to `fixmate`
  in 1.21, `markdup` no longer recognised single reads that would normally
  have been part of a pair. (PR#2117, fixes #2117.  Reported by
  Kristy Horan)

* Fix `samtools merge` crash on BAM files with malformed headers. (PR#2128,
  fixes #2127.  Reported by Frostb1te)

* Fix `faidx --write-index` invalid free. (PR#2147, fixes #2142.  Reported by
  Alex Leonard)

* Fix `samtools fastq -i` to force CRAM aux tag decoding. (PR#2155, fixes
  #2155.  Reported by Alex Leonard)

Non user-visible changes and build improvements:

* Improve htslib#PRnum support for Cirrus-CI and GitHub Actions. (PR#2115)

* Fix broken tests due to MSYS2 changes. Due to changes in how MSYS2 perl
  reported the identity of the OS it was built for, our tests were failing to
  adapt to the Windows style file locations. (PR #2196)

* Upgrade to `_XOPEN_SOURCE=700`, to match HTSlib.  Also replace `usleep()`
  with `nanosleep()`. (PR#2221)

------------------------------------------------------------------------------
bcftools - changes v1.22
------------------------------------------------------------------------------

Changes affecting the whole of bcftools, or multiple commands:

* Add support for matching lines by ID via the --pair-logic and --collapse
  options (#1739)

* The -i/-e filtering expressions

    - The expressions now properly match the regex negation of missing
      values, e.g. -i 'TAG!~"\."' (#2355)

    - Added support for Fisher's exact test

* Add the option `-v, --verbosity INT` to all bcftools commands and
  plugins. Verbosity values bigger than 3 are passed to the underlying
  HTSlib library so that the user can investigate network issues and
  other problems occurring at the library level.
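
For readers unfamiliar with Fisher's exact test, now available in filtering
expressions, here is a small Python sketch of the two-sided test on a 2x2
table using the hypergeometric distribution.  It only illustrates the
statistic.  How bcftools exposes it in expressions and which tail convention
it uses are not described in these notes, so treat those details as
assumptions.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))

if __name__ == "__main__":
    print(fisher_exact_two_sided(3, 1, 1, 3))
```

The small relative tolerance guards against floating-point ties between
tables that are exactly as probable as the observed one.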

Changes affecting specific commands:

* bcftools annotate

    - Fix Number in the header definition of transferred FILTER and ID tags
      (#2335)

* bcftools call

    - The `-s, --samples` option was not working properly.  It now works
      and also supports sample negation as advertised in the manual page,
      e.g. `-s ^sample1,sample2` to include all samples but sample1 and
      sample2 (#2380)

* bcftools consensus

    - Preserve entire missing gVCF blocks with --missing (#2350)

    - Fixed a bug where the `-S, --samples-file` option was ignored
      (#2398)

* bcftools convert

    - The command `convert --gvcf2vcf` was not filling the REF allele when
      BCF was output (#243)

* bcftools csq

    - Check the input GFF for features outside transcript boundaries and
      extend the transcript to contain the feature fully (#2323)

    - Add experimental support for alternative genetic code tables,
      accessible via a new option `-C, --genetic-code` (#2368)

    - Change in the `--unify-chr-names` option: automatic sequence name
      modification is no longer attempted, and the prefixes to trim must
      be given explicitly. For example, if run with
      `--unify-chr-names chr,Chromosome,-`, the program will trim the
      "chr" prefix in the VCF, "Chromosome" in the GFF, leaving the
      fasta unchanged (#2378)

* bcftools +fill-tags

    - Thanks to the extension of filtering expressions with Fisher's exact
      test, the plugin can now be used to add FT annotation (#1582)

* bcftools merge

    - Preserve phasing in half-missing genotypes (#2331)

    - The option `--merge none` is expected to create no new multiallelic
      sites, but it should still allow merging of, say, A>C with A>C,AT
      (#2333)

    - Make `--merge both` work with indel-only records; for example, the
      multiallelic site G>GT,T should be merged with G>GT (#2339)

    - Merge symbolic alleles only when they have both the same type,
      e.g. <DEL>, and the same length, i.e. the INFO/END coordinate (#2362)

    - Fix a bug where an incorrectly formatted gVCF file with overlapping
      blocks would trigger an infinite loop in the program (#2410)

* bcftools mpileup

    - The -r/-R options now merge overlapping regions, preventing the output
      of duplicate sites

* bcftools norm

    - Print the number of removed duplicate sites in the final statistics
      (#2346)

    - Preserve the original alleles in `--old-rec-tag` when `--check-ref s`
      requested (#2357)

    - Print a warning when INFO/SVLEN is not defined as Number=A (#2371)

* plot-vcfstats

    - Make the option `-s, --sample-names` functional again (#2353)

* bcftools +prune

    - New option to remove or annotate clusters of sites within a window

* bcftools query

    - The functions used in -i/-e filtering expressions (such as SUM,
      MEDIAN, etc) can now be used in formatting expressions (#2271). If
      the VCF contains INFO/AD and FORMAT/AD, try:

        bcftools query test.vcf -f '%CHROM:%POS \t [ %AD]  \t [ %sSUM(FMT/AD)]'
        bcftools query test.vcf -f '%CHROM:%POS \t [ %AD]  \t [ %SUM(FMT/AD)]'
        bcftools query test.vcf -f '%CHROM:%POS \t [ %AD]  \t   %SUM(FMT/AD)'
        bcftools query test.vcf -f '%CHROM:%POS \t [ %AD]  \t   %SUM(INFO/AD)'

    - Make it possible to refer to the ID column from the FORMAT expression
      (#2337)

        bcftools query test.vcf -f 'ID=%ID  ID=[ %/ID]  vs  FMT_ID=[ %ID]'

* bcftools roh

    - New visualization tool misc/roh-viz, see below

* bcftools +setGT

    - Support for setting missing genotypes with arbitrary ploidy
      via `-n c:./.` (#2303)

* bcftools +split-vep

    - The `-s, --select` option was extended to print only one consequence.
      Previously it was possible to select a single transcript (e.g., the
      one with the worst consequence), and it was possible to filter by
      consequence severity (e.g., missense or worse), but in some cases
      multiple consequences are reported within a single transcript (e.g.,
      start_lost&splice_region). The extended option allows printing only
      the worst part, for example as
         --select primary:missense+:worst

* bcftools +trio-dnm2

    - Fix a problem with --strictly-novel option which would neglect the
      presence of the apparent de novo allele in the father for male
      offspring

    - Fix a problem with uncalled mosaic chrX variants in males

* roh-viz

    - HTML/JavaScript visualization of bcftools/roh output and homozygosity
      rate.

* bcftools +vrfs

    - New experimental plugin for scoring variants and assessing site
      noisiness (variant read frequency profiles) from a large number of
      unaffected parental samples
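
The region merging mentioned under `bcftools mpileup` above follows a
familiar pattern: sort the intervals, then fold each one into its
predecessor when they overlap, so that no position is visited twice.  A
minimal Python sketch, assuming 1-based inclusive (chrom, start, end) tuples
and not taken from the bcftools source:

```python
def merge_regions(regions):
    """Merge overlapping (chrom, start, end) regions; coordinates are inclusive."""
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous region on the same chromosome: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged

if __name__ == "__main__":
    print(merge_regions([("chr1", 100, 200), ("chr1", 150, 300),
                         ("chr2", 1, 10), ("chr1", 400, 500)]))
```

Without such a step, a site covered by two requested regions would be
reported once per region.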

Source: README.txt, updated 2025-05-30