[go: up one dir, main page]

Ah-Pine et al., 2016 - Google Patents

Similarity based hierarchical clustering with an application to text collections

Ah-Pine et al., 2016

View PDF
Document ID
2932362250936166665
Author
Ah-Pine J
Wang X
Publication year
Publication venue
International Symposium on Intelligent Data Analysis

External Links

Snippet

Lance-Williams formula is a framework that unifies seven schemes of agglomerative hierarchical clustering. In this paper, we establish a new expression of this formula using cosine similarities instead of distances. We state conditions under which the new formula is …
Continue reading at hal.science (PDF) (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30705Clustering or classification
    • G06F17/3071Clustering or classification including class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30613Indexing
    • G06F17/30619Indexing indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30705Clustering or classification
    • G06F17/30707Clustering or classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30312Storage and indexing structures; Management thereof
    • G06F17/30321Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30861Retrieval from the Internet, e.g. browsers
    • G06F17/30864Retrieval from the Internet, e.g. browsers by querying, e.g. search engines or meta-search engines, crawling techniques, push systems
    • G06F17/30867Retrieval from the Internet, e.g. browsers by querying, e.g. search engines or meta-search engines, crawling techniques, push systems with filtering and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30634Querying
    • G06F17/30657Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30386Retrieval requests
    • G06F17/30424Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30587Details of specialised database models
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30067File systems; File servers
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6217Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00Subject matter not provided for in other groups of this subclass
    • G06N99/005Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run

Similar Documents

Publication Publication Date Title
Qian et al. Efficient distance metric learning by adaptive sampling and mini-batch stochastic gradient descent (SGD)
Yang et al. A new feature selection based on comprehensive measurement both in inter-category and intra-category for text categorization
Jiang et al. An improved K-nearest-neighbor algorithm for text categorization
Hussain et al. Multi-view document clustering via ensemble method
Bikku et al. A contemporary feature selection and classification framework for imbalanced biomedical datasets
Nguyen et al. Supervised term weighting centroid-based classifiers for text categorization
Hussain et al. Co-clustering of multi-view datasets
Ah-Pine et al. Similarity based hierarchical clustering with an application to text collections
Lattanzi et al. A framework for parallelizing hierarchical clustering methods
Aljedani et al. Multi-label Arabic text classification: an overview
Wong et al. Feature selection and feature extraction: highlights
Fang et al. Fast training for large-scale one-versus-all linear classifiers using tree-structured initialization
Leung et al. Finding efficiencies in frequent pattern mining from big uncertain data
Erenel et al. Improving the precision-recall trade-off in undersampling-based binary text categorization using unanimity rule
Pang et al. Parallel multi-graph classification using extreme learning machine and MapReduce
Erdinç et al. MCMSTStream: applying minimum spanning tree to KD-tree-based micro-clusters to define arbitrary-shaped clusters in streaming data
Čech et al. Comparing MapReduce-based k-NN similarity joins on Hadoop for high-dimensional data
Ukey et al. Efficient continuous kNN join over dynamic high-dimensional data
Li et al. Mining association rules based on deep pruning strategies
Magliani et al. LSH kNN graph for diffusion on image retrieval
Yu et al. A classifier chain algorithm with k-means for multi-label classification on clouds
Gupta et al. Feature selection: an overview
De Vries et al. Parallel streaming signature em-tree: A clustering algorithm for web scale applications
Ding et al. A framework for distributed nearest neighbor classification using Hadoop
Akila et al. Executing the Apriori Hybrid Algorithm in Semi-structured Mining Datasets and Comparison with HD Algorithm