Similarity filtering with multibit trees for record linkage
Bachteler et al., 2013
- Document ID
- 4353362606844105045
- Author
- Bachteler T
- Reiher J
- Schnell R
- Publication year
- 2013
- Publication venue
- German Record Linkage Center, Working Paper Series, No. WP-GRLC-2013-01
Snippet
Record linkage is the process of identifying pairs of records that refer to the same real-world object within or across data files. Typically, each record pair is compared using a similarity function and then classified as either a presumed match or a presumed non-match. However, if …
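The compare-then-classify step described in the snippet can be illustrated with a minimal sketch. It is not the multibit tree method of the paper: the Dice coefficient on character bigrams, the 0.8 threshold, the function names (`bigrams`, `dice`, `link`), and the sample names are all assumptions chosen for illustration. The sketch compares every cross-file pair, so the number of comparisons grows with the product of the two file sizes.

```python
from itertools import product


def bigrams(text):
    """Return the set of character bigrams of a lower-cased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice(a, b):
    """Dice coefficient between the bigram sets of two strings (0.0 to 1.0)."""
    x, y = bigrams(a), bigrams(b)
    # Padding guarantees at least one bigram per string, so the sum is never 0.
    return 2 * len(x & y) / (len(x) + len(y))


def link(file_a, file_b, threshold=0.8):
    """Compare every cross-file pair; keep pairs scoring at or above the threshold."""
    matches = []
    for (i, rec_a), (j, rec_b) in product(enumerate(file_a), enumerate(file_b)):
        score = dice(rec_a, rec_b)
        if score >= threshold:  # classified as a presumed match
            matches.append((i, j, score))
    return matches


# Made-up sample data (assumption): one name per record.
file_a = ["Johnson", "Miller"]
file_b = ["Jonson", "Smith"]
matches = link(file_a, file_b)
```

With these made-up names, only the Johnson/Jonson pair clears the threshold; the other three pairs are classified as presumed non-matches.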
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30613—Indexing
- G06F17/30619—Indexing indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/30675—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30312—Storage and indexing structures; Management thereof
- G06F17/30321—Indexing structures
- G06F17/3033—Hash tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
- G06F17/30946—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
- G06F17/30964—Querying
- G06F17/30979—Query processing
- G06F17/30985—Query processing by using string matching techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30424—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30289—Database design, administration or maintenance
- G06F17/30303—Improving data quality; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
- G06F17/30864—Retrieval from the Internet, e.g. browsers by querying, e.g. search engines or meta-search engines, crawling techniques, push systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3074—Audio data retrieval
- G06F17/30778—Audio database index structures and management thereof
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Bachteler et al. | Similarity filtering with multibit trees for record linkage | |
| US7996369B2 (en) | Method and apparatus for improving performance of approximate string queries using variable length high-quality grams | |
| US7860853B2 (en) | Document matching engine using asymmetric signature generation | |
| CN100557606C (en) | Method and apparatus for finding a string | |
| US7747642B2 (en) | Matching engine for querying relevant documents | |
| US7516130B2 (en) | Matching engine with signature generation | |
| Nishimoto et al. | Optimal-time queries on BWT-runs compressed indexes | |
| JP3149337B2 (en) | Method and system for data compression using a system-generated dictionary | |
| US8171029B2 (en) | Automatic generation of ontologies using word affinities | |
| US8244767B2 (en) | Composite locality sensitive hash based processing of documents | |
| US20110087668A1 (en) | Clustering of near-duplicate documents | |
| US8266150B1 (en) | Scalable document signature search engine | |
| Fu et al. | Privacy-preserving smart similarity search based on simhash over encrypted data in cloud computing | |
| KR20070049664A (en) | Multi-stage quality processing system and method for use with token space storage | |
| US20120124060A1 (en) | Method and system of identifying adjacency data, method and system of generating a dataset for mapping adjacency data, and an adjacency data set | |
| WO2008073820A1 (en) | Identifying relationships among database records | |
| Raza et al. | Accelerating pattern-based time series classification: a linear time and space string mining approach | |
| US8140546B2 (en) | Computer system for performing aggregation of tree-structured data, and method and computer program product therefor | |
| Navarro | Indexing highly repetitive string collections | |
| US10885121B2 (en) | Fast filtering for similarity searches on indexed data | |
| Flor | A fast and flexible architecture for very large word n-gram datasets | |
| CN101248433A (en) | Matching engine with signature generation and correlation detection | |
| Zhang et al. | Effective and Fast Near Duplicate Detection via Signature‐Based Compression Metrics | |
| Transier et al. | Compressed inverted indexes for in-memory search engines | |
| Zhang | Transform based and search aware text compression schemes and compressed domain text retrieval |