DeepSqueeze: Deep semantic compression for tabular data
Ilkhechi et al., 2020
- Document ID
- 4832615398456292895
- Author
- Ilkhechi A
- Crotty A
- Galakatos A
- Mao Y
- Fan G
- Shi X
- Çetintemel U
- Publication year
- 2020
- Publication venue
- Proceedings of the 2020 ACM SIGMOD international conference on management of data
Snippet
With the rapid proliferation of large datasets, efficient data compression has become more important than ever. Columnar compression techniques (e.g., dictionary encoding, run-length encoding, delta encoding) have proved highly effective for tabular data, but they typically …
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
          - G06F17/30067—File systems; File servers
            - G06F17/30129—Details of further file system functionalities
              - G06F17/3015—Redundancy elimination performed by the file system
                - G06F17/30153—Redundancy elimination performed by the file system using compression, e.g. sparse files
                - G06F17/30156—De-duplication implemented within the file system, e.g. based on file segments
          - G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
            - G06F17/30289—Database design, administration or maintenance
              - G06F17/30303—Improving data quality; Data cleansing
            - G06F17/30312—Storage and indexing structures; Management thereof
              - G06F17/30321—Indexing structures
            - G06F17/30587—Details of specialised database models
        - G06F17/10—Complex mathematical operations
      - G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- H—ELECTRICITY
  - H03—BASIC ELECTRONIC CIRCUITRY
    - H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
      - H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same information or similar information or a subset of information is represented by a different sequence or number of digits
        - H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
          - H03M7/3084—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method
            - H03M7/3086—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method employing a sliding window, e.g. LZ77
          - H03M7/40—Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
            - H03M7/4031—Fixed length to variable length coding
              - H03M7/4037—Prefix coding
          - H03M7/3082—Vector coding

Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Ilkhechi et al. | | DeepSqueeze: Deep semantic compression for tabular data |
| US10901948B2 (en) | | Query predicate evaluation and computation for hierarchically compressed data |
| CN107210753B (en) | | Lossless reduction of data by deriving data from prime data units residing in a content association filter |
| Zender | | Bit Grooming: statistically accurate precision-preserving quantization with compression, evaluated in the netCDF Operators (NCO, v4.4.8+) |
| Bhattacharjee et al. | | Comparison study of lossless data compression algorithms for text data |
| Feng et al. | | MLC: An efficient multi-level log compression method for cloud backup systems |
| Shun et al. | | Practical parallel Lempel-Ziv factorization |
| Gao et al. | | Squish: Near-optimal compression for archival of relational datasets |
| JP2018524886A (en) | | Perform multi-dimensional search, content associative retrieval, and keyword-based retrieval and retrieval for lossless data using basic data sheaves |
| Yu et al. | | Unlocking the Power of Numbers: Log Compression via Numeric Token Parsing |
| Niemi et al. | | Burrows‐Wheeler post‐transformation with effective clustering and interpolative coding |
| Talasila et al. | | Generalized deduplication: Lossless compression by clustering similar data |
| US20230367752A1 (en) | | Systems and methods for processing timeseries data |
| Oswald et al. | | An efficient text compression algorithm-data mining perspective |
| Hishida et al. | | Beyond Compression: A Comprehensive Evaluation of Lossless Floating-Point Compression |
| Zavadskyi et al. | | Binary mixed-digit data compression codes |
| Oswald et al. | | Text and image compression based on data mining perspective |
| Oswald et al. | | Hash based frequent pattern mining approach to text compression |
| Mahmoudi et al. | | Comparison of Compression Algorithms in text data for Data Mining |
| Collet et al. | | OpenZL: A Graph-Based Model for Compression |
| Das | | ByteZip: Efficient Lossless Compression for Structured Byte Streams Using DNNs |
| Tenhunen | | Scientific Methods in Relational Database Compression Research |
| Persson et al. | | Compression Selection for Columnar Data using Machine-Learning and Feature Engineering |
| Shapira | | Compressed transitive delta encoding |
| Ferragina et al. | | Compressibility Measures and Succinct Data Structures for Piecewise Linear Approximations |