Meng et al., 2023 - Google Patents

Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach

Document ID
5614553162732858376
Author
Meng Q
Chandak S
Zhu Y
Weissman T
Publication year
2023
Publication venue
Scientific Reports

Snippet

The amount of data produced by genome sequencing experiments has been growing rapidly over the past several years, making compression important for efficient storage, transfer and analysis of the data. In recent years, nanopore sequencing technologies have …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30312 Storage and indexing structures; Management thereof
    • G06F17/30321 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30386 Retrieval requests
    • G06F17/30424 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30289 Database design, administration or maintenance
    • G06F17/30303 Improving data quality; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30067 File systems; File servers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30861 Retrieval from the Internet, e.g. browsers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30943 Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
    • G06F17/30946 Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30017 Multimedia data retrieval; Retrieval of more than one type of audiovisual media
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/21 Text processing
    • G06F17/22 Manipulating or registering by use of codes, e.g. in sequence of text characters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/10 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
    • G06F19/22 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for sequence comparison involving nucleotides or amino acids, e.g. homology search, motif or SNP [Single-Nucleotide Polymorphism] discovery or sequence alignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for programme control, e.g. control unit
    • G06F9/06 Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme

Similar Documents

Publication Publication Date Title
Cox et al. Large-scale compression of genomic sequence databases with the Burrows–Wheeler transform
Holley et al. Bifrost: highly parallel construction and indexing of colored and compacted de Bruijn graphs
Deorowicz FQSqueezer: k-mer-based compression of sequencing data
Kuhnle et al. Efficient construction of a complete index for pan-genomics read alignment
Harris et al. Improved representation of sequence bloom trees
Novak et al. A graph extension of the positional Burrows–Wheeler transform and its applications
Břinda et al. Simplitigs as an efficient and scalable representation of de Bruijn graphs
US20070174238A1 (en) Indexing and searching numeric ranges
Ginart et al. Optimal compressed representation of high throughput sequence data via light assembly
Bonfield CRAM 3.1: advances in the CRAM file format
Kokot et al. CoLoRd: compressing long reads
Shiryev et al. Indexing and searching petabase-scale nucleotide resources
Břinda et al. Efficient and robust search of microbial genomes via phylogenetic compression
Meng et al. Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach
Al-Okaily et al. Toward a better compression for DNA sequences using Huffman encoding
Song et al. Centrifuger: lossless compression of microbial genomes for efficient and accurate metagenomic sequence classification
Yi et al. Kssd: sequence dimensionality reduction by k-mer substring space sampling enables real-time large-scale datasets analysis
Li BWT construction and search at the terabase scale
Shibuya et al. Space-efficient representation of genomic k-mer count tables
Dufresne et al. The K-mer File Format: a standardized and compact disk representation of sets of k-mers
Pibiri et al. Meta-colored compacted de Bruijn graphs
Shibuya et al. Better quality score compression through sequence-based quality smoothing
Sun et al. PMFFRC: a large-scale genomic short reads compression optimizer via memory modeling and redundant clustering
Campanelli et al. Where the patterns are: repetition-aware compression for colored de Bruijn graphs
Kredens et al. Vertical lossless genomic data compression tools for assembled genomes: A systematic literature review