
Bachteler et al., 2013 - Google Patents

Similarity filtering with multibit trees for record linkage

Document ID
4353362606844105045
Author
Bachteler T
Reiher J
Schnell R
Publication year
2013
Publication venue
German Record Linkage Center, Working Paper Series, No. WP-GRLC-2013-01

Snippet

Record linkage is the process of identifying pairs of records that refer to the same real-world object within or across data files. Basically, each record pair is compared with a similarity function and then classified into supposedly matching and non-matching pairs. However, if …
Continue reading at papers.ssrn.com (PDF)
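The compare-and-classify step described in the snippet can be sketched as follows. This is a minimal illustration under stated assumptions, not the multibit-tree filtering method of the paper: each record is assumed to be a single string, the similarity function is assumed to be the Dice coefficient on character bigrams, and the threshold is an assumed parameter. The names `bigrams`, `dice`, and `classify_pairs` are hypothetical. Because every pair is compared, the number of comparisons is |A| × |B| across two files (or n(n − 1)/2 within one file).

```python
from itertools import combinations, product


def bigrams(value: str) -> set[str]:
    # Assumed preprocessing (hypothetical): lowercase and pad with one space
    # on each side, so the first and last characters also form bigrams.
    padded = f" {value.lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice(a: str, b: str) -> float:
    # Dice coefficient on bigram sets: 2|A ∩ B| / (|A| + |B|).
    # Padding guarantees both sets are non-empty, so there is no division by zero.
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y))


def classify_pairs(file_a, file_b=None, threshold=0.8):
    # Exhaustive comparison: within one file when file_b is None,
    # otherwise across two files (every record of A with every record of B).
    if file_b is None:
        pairs = combinations(enumerate(file_a), 2)
    else:
        pairs = product(enumerate(file_a), enumerate(file_b))
    matches, non_matches = [], []
    for (i, a), (j, b) in pairs:
        score = dice(a, b)
        # Threshold classification into supposed matches and non-matches.
        target = matches if score >= threshold else non_matches
        target.append((i, j, score))
    return matches, non_matches


matches, non_matches = classify_pairs(["Smith", "Jones"], ["Smyth", "Johnson"], threshold=0.6)
print(matches)  # [(0, 0, 0.6666666666666666)]
```

Here "Smith" and "Smyth" share 4 of 6 bigrams each, giving 2 × 4 / 12 ≈ 0.67, so they are classified as a match at the assumed threshold of 0.6; the other three pairs fall below it.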

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING; COUNTING
        • G06F ELECTRICAL DIGITAL DATA PROCESSING
          • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
            • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
              • G06F17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
                • G06F17/30613 Indexing
                  • G06F17/30619 Indexing indexing structures
                • G06F17/30634 Querying
                  • G06F17/30657 Query processing
                    • G06F17/30675 Query execution
              • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
                • G06F17/30312 Storage and indexing structures; Management thereof
                  • G06F17/30321 Indexing structures
                    • G06F17/3033 Hash tables
                • G06F17/30386 Retrieval requests
                  • G06F17/30424 Query processing
                • G06F17/30289 Database design, administration or maintenance
                  • G06F17/30303 Improving data quality; Data cleansing
              • G06F17/30943 Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
                • G06F17/30946 Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
                • G06F17/30964 Querying
                  • G06F17/30979 Query processing
                    • G06F17/30985 Query processing by using string matching techniques
              • G06F17/30861 Retrieval from the Internet, e.g. browsers
                • G06F17/30864 Retrieval from the Internet, e.g. browsers by querying, e.g. search engines or meta-search engines, crawling techniques, push systems
              • G06F17/3074 Audio data retrieval
                • G06F17/30778 Audio database index structures and management thereof

Similar Documents

Publication (number or authors) and title
Bachteler et al. Similarity filtering with multibit trees for record linkage
US7996369B2 (en) Method and apparatus for improving performance of approximate string queries using variable length high-quality grams
US7860853B2 (en) Document matching engine using asymmetric signature generation
CN100557606C (en) Method and apparatus for finding a string
US7747642B2 (en) Matching engine for querying relevant documents
US7516130B2 (en) Matching engine with signature generation
Nishimoto et al. Optimal-time queries on BWT-runs compressed indexes
JP3149337B2 (en) Method and system for data compression using a system-generated dictionary
US8171029B2 (en) Automatic generation of ontologies using word affinities
US8244767B2 (en) Composite locality sensitive hash based processing of documents
US20110087668A1 (en) Clustering of near-duplicate documents
US8266150B1 (en) Scalable document signature search engine
Fu et al. Privacy-preserving smart similarity search based on simhash over encrypted data in cloud computing
KR20070049664A (en) Multi-stage quality processing system and method for use with token space storage
US20120124060A1 (en) Method and system of identifying adjacency data, method and system of generating a dataset for mapping adjacency data, and an adjacency data set
WO2008073820A1 (en) Identifying relationships among database records
Raza et al. Accelerating pattern-based time series classification: a linear time and space string mining approach
US8140546B2 (en) Computer system for performing aggregation of tree-structured data, and method and computer program product therefor
Navarro Indexing highly repetitive string collections
US10885121B2 (en) Fast filtering for similarity searches on indexed data
Flor A fast and flexible architecture for very large word n-gram datasets
CN101248433A (en) Matching engine with signature generation and correlation detection
Zhang et al. Effective and Fast Near Duplicate Detection via Signature‐Based Compression Metrics
Transier et al. Compressed inverted indexes for in-memory search engines
Zhang Transform based and search aware text compression schemes and compressed domain text retrieval