Ilkhechi et al., 2020 - Google Patents

Deepsqueeze: Deep semantic compression for tabular data

Document ID
4832615398456292895
Author
Ilkhechi A
Crotty A
Galakatos A
Mao Y
Fan G
Shi X
Çetintemel U
Publication year
2020
Publication venue
Proceedings of the 2020 ACM SIGMOD international conference on management of data

Snippet

With the rapid proliferation of large datasets, efficient data compression has become more important than ever. Columnar compression techniques (e.g., dictionary encoding, run-length encoding, delta encoding) have proved highly effective for tabular data, but they typically …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30067 File systems; File servers
    • G06F17/30129 Details of further file system functionalities
    • G06F17/3015 Redundancy elimination performed by the file system
    • G06F17/30153 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30067 File systems; File servers
    • G06F17/30129 Details of further file system functionalities
    • G06F17/3015 Redundancy elimination performed by the file system
    • G06F17/30156 De-duplication implemented within the file system, e.g. based on file segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30289 Database design, administration or maintenance
    • G06F17/30303 Improving data quality; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30312 Storage and indexing structures; Management thereof
    • G06F17/30321 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30587 Details of specialised database models
    • H ELECTRICITY
    • H03 BASIC ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same information or similar information or a subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
    • H03M7/3086 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing a sliding window, e.g. LZ77
    • H ELECTRICITY
    • H03 BASIC ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same information or similar information or a subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M7/4031 Fixed length to variable length coding
    • H03M7/4037 Prefix coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • H ELECTRICITY
    • H03 BASIC ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same information or similar information or a subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3082 Vector coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled

Similar Documents

Publication Publication Date Title
Ilkhechi et al. Deepsqueeze: Deep semantic compression for tabular data
US10901948B2 (en) Query predicate evaluation and computation for hierarchically compressed data
CN107210753B (en) Lossless reduction of data by deriving data from prime data units residing in a content association filter
Zender Bit Grooming: statistically accurate precision-preserving quantization with compression, evaluated in the netCDF Operators (NCO, v4.4.8+)
Bhattacharjee et al. Comparison study of lossless data compression algorithms for text data
Feng et al. MLC: An efficient multi-level log compression method for cloud backup systems
Shun et al. Practical parallel Lempel-Ziv factorization
Gao et al. Squish: Near-optimal compression for archival of relational datasets
JP2018524886A (en) Perform multi-dimensional search, content associative retrieval, and keyword-based retrieval and retrieval for lossless data using basic data sheaves
Yu et al. Unlocking the Power of Numbers: Log Compression via Numeric Token Parsing
Niemi et al. Burrows‐Wheeler post‐transformation with effective clustering and interpolative coding
Talasila et al. Generalized deduplication: Lossless compression by clustering similar data
US20230367752A1 (en) Systems and methods for processing timeseries data
Oswald et al. An efficient text compression algorithm-data mining perspective
Hishida et al. Beyond Compression: A Comprehensive Evaluation of Lossless Floating-Point Compression
Zavadskyi et al. Binary mixed-digit data compression codes
Oswald et al. Text and image compression based on data mining perspective
Oswald et al. Hash based frequent pattern mining approach to text compression
Mahmoudi et al. Comparison of Compression Algorithms in text data for Data Mining
Collet et al. OpenZL: A Graph-Based Model for Compression
Das ByteZip: Efficient Lossless Compression for Structured Byte Streams Using DNNs
Tenhunen Scientific Methods in Relational Database Compression Research
Persson et al. Compression Selection for Columnar Data using Machine-Learning and Feature Engineering
Shapira Compressed transitive delta encoding
Ferragina et al. Compressibility Measures and Succinct Data Structures for Piecewise Linear Approximations