Similarity based hierarchical clustering with an application to text collections
Ah-Pine et al., 2016
- Document ID
- 2932362250936166665
- Author
- Ah-Pine J
- Wang X
- Publication year
- 2016
- Publication venue
- International Symposium on Intelligent Data Analysis
Snippet
The Lance-Williams formula is a framework that unifies seven schemes of agglomerative hierarchical clustering. In this paper, we establish a new expression of this formula using cosine similarities instead of distances. We state conditions under which the new formula is …
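For orientation, the classical distance-based Lance-Williams update can be sketched as below. This is only the textbook form, d(k, i∪j) = α_i·d(k,i) + α_j·d(k,j) + β·d(i,j) + γ·|d(k,i) − d(k,j)|, and not the cosine-similarity expression the paper derives, which the snippet does not show. The function names `lance_williams_coeffs` and `agglomerate` are hypothetical, and only four of the seven schemes are included.

```python
def lance_williams_coeffs(method, n_i, n_j, n_k):
    """Return (alpha_i, alpha_j, beta, gamma) for the classical update.

    n_i, n_j are the sizes of the two clusters being merged and n_k is
    the size of the other cluster whose distance is being updated.
    """
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":  # group average (UPGMA)
        n = n_i + n_j
        return n_i / n, n_j / n, 0.0, 0.0
    if method == "ward":  # assumes squared Euclidean input distances
        n = n_i + n_j + n_k
        return (n_i + n_k) / n, (n_j + n_k) / n, -n_k / n, 0.0
    raise ValueError(f"unknown method: {method}")


def agglomerate(dist_matrix, method="average"):
    """Naive agglomerative clustering driven by the Lance-Williams update.

    dist_matrix is a symmetric list of lists of pairwise distances.
    Returns a list of merges (members_a, members_b, merge_distance).
    """
    n = len(dist_matrix)
    sizes = {i: 1 for i in range(n)}
    members = {i: (i,) for i in range(n)}
    # Distances between active clusters, keyed by an ordered id pair.
    d = {(i, j): float(dist_matrix[i][j])
         for i in range(n) for j in range(i + 1, n)}

    def key(a, b):
        return (a, b) if a < b else (b, a)

    merges = []
    next_id = n
    while len(sizes) > 1:
        # Pick the closest pair of active clusters.
        (i, j), d_ij = min(d.items(), key=lambda kv: kv[1])
        # Update distances from every other cluster k to the merged cluster.
        for k in sizes:
            if k in (i, j):
                continue
            a_i, a_j, beta, gamma = lance_williams_coeffs(
                method, sizes[i], sizes[j], sizes[k])
            d_ki, d_kj = d[key(k, i)], d[key(k, j)]
            d[key(k, next_id)] = (a_i * d_ki + a_j * d_kj
                                  + beta * d_ij + gamma * abs(d_ki - d_kj))
        # Drop every entry that still refers to the two merged clusters.
        for k in list(sizes):
            d.pop(key(k, i), None)
            d.pop(key(k, j), None)
        merges.append((members[i], members[j], d_ij))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    return merges


# Three points on a line at positions 0, 1 and 5.
example = [[0, 1, 5],
           [1, 0, 4],
           [5, 4, 0]]
single_heights = [m[2] for m in agglomerate(example, "single")]
complete_heights = [m[2] for m in agglomerate(example, "complete")]
average_heights = [m[2] for m in agglomerate(example, "average")]
```

The point of the sketch is that one update rule, with different coefficients, reproduces each linkage scheme without revisiting the original points. The quadratic-time minimum search per merge is chosen for readability, not efficiency.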
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30705—Clustering or classification
- G06F17/3071—Clustering or classification including class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30613—Indexing
- G06F17/30619—Indexing indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30705—Clustering or classification
- G06F17/30707—Clustering or classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30312—Storage and indexing structures; Management thereof
- G06F17/30321—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
- G06F17/30864—Retrieval from the Internet, e.g. browsers by querying, e.g. search engines or meta-search engines, crawling techniques, push systems
- G06F17/30867—Retrieval from the Internet, e.g. browsers by querying, e.g. search engines or meta-search engines, crawling techniques, push systems with filtering and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30424—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30587—Details of specialised database models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30067—File systems; File servers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
Similar Documents
| Publication | Title |
|---|---|
| Qian et al. | Efficient distance metric learning by adaptive sampling and mini-batch stochastic gradient descent (SGD) |
| Yang et al. | A new feature selection based on comprehensive measurement both in inter-category and intra-category for text categorization |
| Jiang et al. | An improved K-nearest-neighbor algorithm for text categorization |
| Hussain et al. | Multi-view document clustering via ensemble method |
| Bikku et al. | A contemporary feature selection and classification framework for imbalanced biomedical datasets |
| Nguyen et al. | Supervised term weighting centroid-based classifiers for text categorization |
| Hussain et al. | Co-clustering of multi-view datasets |
| Ah-Pine et al. | Similarity based hierarchical clustering with an application to text collections |
| Lattanzi et al. | A framework for parallelizing hierarchical clustering methods |
| Aljedani et al. | Multi-label Arabic text classification: an overview |
| Wong et al. | Feature selection and feature extraction: highlights |
| Fang et al. | Fast training for large-scale one-versus-all linear classifiers using tree-structured initialization |
| Leung et al. | Finding efficiencies in frequent pattern mining from big uncertain data |
| Erenel et al. | Improving the precision-recall trade-off in undersampling-based binary text categorization using unanimity rule |
| Pang et al. | Parallel multi-graph classification using extreme learning machine and MapReduce |
| Erdinç et al. | MCMSTStream: applying minimum spanning tree to KD-tree-based micro-clusters to define arbitrary-shaped clusters in streaming data |
| Čech et al. | Comparing MapReduce-based k-NN similarity joins on Hadoop for high-dimensional data |
| Ukey et al. | Efficient continuous kNN join over dynamic high-dimensional data |
| Li et al. | Mining association rules based on deep pruning strategies |
| Magliani et al. | LSH kNN graph for diffusion on image retrieval |
| Yu et al. | A classifier chain algorithm with k-means for multi-label classification on clouds |
| Gupta et al. | Feature selection: an overview |
| De Vries et al. | Parallel streaming signature em-tree: A clustering algorithm for web scale applications |
| Ding et al. | A framework for distributed nearest neighbor classification using Hadoop |
| Akila et al. | Executing the Apriori Hybrid Algorithm in Semi-structured Mining Datasets and Comparison with HD Algorithm |