
CN110807101A - Scientific and technical literature big data classification method - Google Patents

Scientific and technical literature big data classification method

Info

Publication number
CN110807101A
CN110807101A CN201911066136.1A CN201911066136A
Authority
CN
China
Prior art keywords
classification
documents
sentences
keywords
topological relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911066136.1A
Other languages
Chinese (zh)
Inventor
Zhang Xiaodan
Liang Bing
Wang Li
Bai Haiyan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
INSTITUTE OF SCIENCE AND TECHNOLOGY INFORMATION OF CHINA
Original Assignee
INSTITUTE OF SCIENCE AND TECHNOLOGY INFORMATION OF CHINA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by INSTITUTE OF SCIENCE AND TECHNOLOGY INFORMATION OF CHINA
Publication of CN110807101A


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a scientific and technical literature big data classification method, belonging to the technical field of big data text mining. The method comprises: S1, constructing a topological relation graph composed of nodes and edges, where the nodes are documents, sentences, and keywords in STKOS, and the edges are the relations between documents and sentences, documents and keywords, sentences and sentences, sentences and keywords, and keywords and keywords; S2, converting the topological relation graph into a topological relation matrix; S3, training the classification model with the training data and the topological relation matrix constructed from the training data; S4, document classification: inputting batches of documents to be classified into the trained classification model to obtain the probability that each document belongs to each category. Compared with the prior art, the sentences in the constructed topological relation graph take word order into account and the keywords are terms indexed by experts, which improves classification accuracy; the adopted classification model needs no repeated training and samples the input of each convolution layer, which improves classification efficiency.

Description

Scientific and technical literature big data classification method
Technical Field
The invention relates to a classification method for scientific and technical literature big data, in particular a deep learning classification method, and belongs to the technical field of big data text mining. The invention constructs a topological relation graph from documents, sentences, and keywords and realizes document classification through a FASTGCN graph neural network model. The method improves the accuracy and efficiency of classifying scientific and technical literature big data.
Background
Big data mining of scientific and technological literature is currently a hot problem in data mining research, and a key question in this field is how to classify scientific and technological literature big data accurately and efficiently. Deep learning is a big data mining approach that has emerged in recent years and has made progress in document big data classification. Commonly used deep learning methods for document big data include word embeddings, convolutional neural networks (CNN), and LSTM. Although these methods have achieved certain classification results, each has limitations: even after optimization and improvement, word embedding methods remain limited in handling sequential context; CNN methods can only handle input data that forms a regular (grid-structured) matrix; and LSTM methods perform well mainly on short-text classification.
The graph neural network is a model for graph classification developed in the last two years and is currently one of the hot topics in deep learning research. It can process irregular matrices, making up for the limitation of the CNN model. The model performs graph convolution operations on a constructed topological relation graph to extract features and realize classification, and has achieved good classification results in fields such as computer vision and machine translation.
The input of a graph neural network is a topological relation graph, so different topological relation graphs lead to different classification results; the construction of the graph therefore strongly influences the result. Existing graph neural network text classification methods mainly build the topological relation graph from texts, from sentences, or from texts together with extracted words. The method that builds the graph from texts and extracted words achieves high classification accuracy, but because GCN is a transductive graph neural network model that must be retrained at classification time, classification tasks with real-time requirements cannot be guaranteed; moreover, this method does not consider word order when constructing the topological relation graph, which slightly harms accuracy. The invention proposes a new solution aimed mainly at these efficiency and accuracy problems of the model.
Disclosure of Invention
The invention aims to provide a graph neural network classification method that solves the accuracy and efficiency problems of scientific and technical literature big data classification.
The invention is realized by the following technical scheme.
A method for classifying scientific and technical literature big data comprises the following steps:
step 1, constructing a topological relation graph:
The topological relation graph is composed of nodes and edges. The nodes are documents, sentences, and keywords: a document node consists of the title, document keywords, and abstract of a document; a sentence node is a sentence with word-order characteristics extracted from the document abstract; a keyword node is a term in STKOS, a super dictionary developed by the national book literature center;
preferably, the sentence extraction algorithm employs an LSTM method.
Edges are the relations between nodes: documents and sentences, documents and keywords, sentences and sentences, sentences and keywords, and keywords and keywords;
Preferably, the relation between a document and a sentence is described by the similarity of their word2vec representations; the relation between a document and a keyword is described by TFIDF; the relation between sentences is described by the similarity of their word2vec representations; the relation between a sentence and a keyword is described by CHI; and the relation between keywords is described by PMI.
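As an illustration of the word2vec-based similarity named above, the sketch below averages per-word vectors into a sentence vector and compares two sentences with cosine similarity. It is a minimal stand-in, not the patent's implementation: the embedding table, its values, and the simple averaging scheme are all hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sentence_vector(tokens, embeddings):
    """Average the word vectors of a sentence (a stand-in for word2vec output)."""
    return np.mean([embeddings[t] for t in tokens if t in embeddings], axis=0)

# Toy 3-dimensional "word2vec" table (hypothetical values, for illustration only).
emb = {
    "graph":   np.array([0.9, 0.1, 0.0]),
    "network": np.array([0.8, 0.2, 0.1]),
    "music":   np.array([0.0, 0.1, 0.9]),
}

s1 = sentence_vector(["graph", "network"], emb)
s2 = sentence_vector(["music"], emb)
sim = cosine(s1, s2)
print(round(sim, 3))
```

The same cosine measure applies to document-sentence pairs by embedding the document text (title, keywords, abstract) the same way.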
Step 2, converting the topological relation graph into a topological matrix;
The topological matrix is a two-dimensional matrix whose rows and columns correspond to documents, sentences, and keywords; the matrix entries are the relation values between the corresponding nodes;
Step 3, training the classification model by using the training data together with the topological relation matrix constructed from the training data in steps 1 and 2, to obtain a trained classification model;
Preferably, the classification model adopts a FASTGCN model with 3 convolution layers; the activation function is ReLU; the classification function is softmax; and the error function is the cross-entropy function. The model classification result is compared with the labeled input document classification to obtain the error, and the model parameters are trained by back-propagating the error with gradient descent until the error falls within a preset threshold range.
Preferably, in order to improve efficiency, data input to each convolutional layer is sampled and input.
Preferably, a Markov-chain algorithm is selected for sampling.
Step 4, classifying the documents to be classified: a topological relation graph of the batch of documents to be classified is constructed as in step 1 and converted into a matrix as in step 2, and the matrix together with the documents to be classified is input into the FASTGCN model trained in step 3 for classification; the probability that each document belongs to each category is obtained, and the category with the maximum probability is selected as the document classification.
Advantageous effects
To improve classification accuracy, a topological relation graph is constructed from scientific and technical documents, sentences, and keywords: the extracted sentences take word order into account, making up for a deficiency of text GCN, and the keywords are terms indexed by experts in the scientific and technical knowledge organization system STKOS developed by the national book literature center (Knowledge organization system construction thought facing foreign scientific and technical document information, Sun Tan, Liu Wu, Books and Information, 2013.1.(1)). To improve classification efficiency, the FASTGCN classification model is adopted, which overcomes the need for repeated training in the GCN model; in addition, the input of each convolution layer is sampled, which greatly improves classification efficiency. Therefore, the topological relation graph and the FASTGCN classification model of the method achieve accurate and efficient classification.
Drawings
FIG. 1 is a topological relation diagram of scientific and technical literature constructed by the invention.
FIG. 2 is a schematic diagram of a scientific and technical literature classification model constructed by the present invention.
Fig. 3 is a flowchart illustrating a scientific and technical literature classification method according to an embodiment of the present invention.
Detailed Description
The present invention is described in detail below with reference to the accompanying drawings and embodiments, together with the technical problems solved and the advantages achieved by its technical solutions. The described embodiments are intended only to aid understanding of the invention and do not limit it in any way.
Examples
As an implementation of the object of the present invention, as shown in fig. 3, the process of the scientific and technical literature big data classification method is as follows:
1) constructing a document big data classification topological relation graph
The topological relation graph consists of nodes and edges and is represented as G = (V, E), where V is the set of nodes and E is the set of relations.
As shown in fig. 1, the nodes are divided into three classes, drawn as circles of different sizes: document, sentence, and keyword nodes. Document nodes consist of the title, keywords, and abstract of a document. Sentence nodes are sentences with word-order characteristics extracted from the document abstract; many methods exist for this extraction, such as naive Bayes and maximum entropy, and the LSTM method is adopted here. Keyword nodes adopt terms in STKOS, a super dictionary developed by the national book literature center.
Edges are the different relations among nodes and, according to the relation type, are divided into five kinds: documents and keywords, keywords and keywords, keywords and sentences, sentences and documents, and sentences and sentences.
Then: V = {document, sentence, and keyword nodes}, and E = {document-keyword, keyword-keyword, keyword-sentence, document-sentence, and sentence-sentence relations}.
Many algorithms are available to describe the above relations, for example: pointwise mutual information (PMI), TF-IDF (term frequency-inverse document frequency), mutual information (MI), CHI (chi-square), sen2vec, word2vec, and so on.
PMI is mainly used to calculate the semantic similarity between words. The basic idea is to count the probability that two words occur together in a text: the higher the probability, the tighter the correlation and the higher the relevance. The PMI value of two words word1 and word2 is calculated as follows:

PMI(word1, word2) = log( P(word1, word2) / ( P(word1) · P(word2) ) )

where P(·) denotes the probability of occurrence in a document.
TF-IDF is a commonly used weighting technique in information retrieval and text mining, widely applied in search, document classification, and related fields. Its main idea is that if a word or phrase appears with a high frequency (TF) in one article and rarely appears in other articles, it is considered to have good category-distinguishing ability and is suitable for classification. Term frequency (TF) is the number of times a given term appears in the document. The main idea of inverse document frequency (IDF) is that the fewer the documents containing a given term, the larger the IDF and the better the term distinguishes categories. See (A review of TF-IDF algorithm research, Computer Applications, 2009, 29(z1)).
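A minimal TF-IDF weight in the sense just described can be sketched as follows. The toy corpus and the plain logarithmic IDF (no smoothing) are illustrative assumptions, not the patent's exact formula.

```python
import math

def tf_idf(term, doc, docs):
    """TF-IDF weight of a term in one document of a collection.
    TF = count of the term in the document / document length;
    IDF = log(N / number of documents containing the term)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / df) if df else 0.0
    return tf * idf

docs = [
    ["graph", "convolution", "graph"],
    ["music", "theory"],
    ["graph", "music"],
]
w = tf_idf("graph", docs[0], docs)
print(round(w, 4))
```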
MI is used to measure the amount of information shared between feature words and document classes (Zhang Feng, Xu Xin. Research advances in machine-learning-based text classification technology [J]. Journal of Software, 2006, (9)).
The CHI feature selection algorithm uses the basic idea of hypothesis testing in statistics: first assume that a feature word is unrelated to the category; if the test statistic computed from the CHI distribution deviates far enough from the threshold, the null hypothesis is rejected with high confidence and the alternative hypothesis is accepted, namely that the feature word is highly associated with the category. For details see (A comparative study of feature extraction methods in Chinese text classification, Journal of Chinese Information Processing, 2004, 18(1)).
Preferably, the following algorithms are used in this embodiment to describe the relations between different nodes:
Let the relation value of E be y, and let x1 and x2 be neighboring nodes. According to the relation type, y is respectively:
y = PMI(x1, x2): similarity between keywords;
y = CHI(x1, x2): similarity between keywords and sentences;
y = COS(word2vec(x1), word2vec(x2)): similarity between sentences, and between sentences and documents;
y = TFIDF(x1, x2): relevance of a keyword to a document.
A document is composed of its title, keywords, and abstract. Sentences are obtained by extraction from the document abstract with an LSTM (long short-term memory) model and contain word-order relations. Keywords come from the terminology layer of the super dictionary STKOS developed by the national book literature center.
2) Construction of FASTGCN Classification model
The classification model consists of convolution layers, a fully connected layer, and a classification layer. Scientific and technical documents are learned through the convolution layers to obtain document features, which are input to the fully connected layer and then classified by the classification layer to obtain the final classification result.
As shown in fig. 2, the document topological relation graph is converted into a matrix and input to the classification model for classification, obtaining the category (drawn from the category layer of STKOS) to which each document belongs.
The FASTGCN model cannot directly process the document topological relation graph; therefore, before being input to the FASTGCN model, the graph must be converted into a matrix that the model can process.
The constructed topological structure graph is converted into a topological relation matrix and input to the classification model. The topological relation matrix is the set of relations between the nodes (documents, keywords, and sentences): document-keyword, keyword-keyword, keyword-sentence, sentence-document, and sentence-sentence.
A relation matrix is constructed from the relation values Y, where the set Y = { PMI(x1, x2) (keyword-keyword relation), CHI(x1, x2) (keyword-sentence relation), COS(word2vec(x1), word2vec(x2)) (sentence-sentence and sentence-document relations), TFIDF(x1, x2) (keyword-document relation) }. The columns and rows are documents, sentences, and keywords arranged in order; the value at each position is the relation value between the corresponding pair of documents, sentences, or keywords; the diagonal entries are 1; and, since there is no relation between documents, the matrix elements corresponding to document pairs are set to 0.
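The matrix layout just described can be sketched as follows. The node names and relation values below are hypothetical placeholders; only the layout rules come from the description (rows/columns ordered as documents, sentences, keywords; diagonal entries 1; document-document entries 0).

```python
import numpy as np

# Hypothetical node ordering: documents first, then sentences, then keywords.
nodes = ["doc1", "doc2", "sent1", "sent2", "kw1"]
n = len(nodes)
idx = {name: i for i, name in enumerate(nodes)}

A = np.zeros((n, n))
np.fill_diagonal(A, 1.0)  # diagonal entries are 1, as in the description

# Example relation values (illustrative numbers, not computed here):
relations = [
    ("doc1", "sent1", 0.8),   # document-sentence: word2vec cosine similarity
    ("doc1", "kw1", 0.5),     # document-keyword: TFIDF
    ("sent1", "sent2", 0.6),  # sentence-sentence: word2vec cosine similarity
    ("sent2", "kw1", 0.3),    # sentence-keyword: CHI
]
for a, b, v in relations:
    A[idx[a], idx[b]] = A[idx[b], idx[a]] = v  # relations are symmetric

# Document-document entries stay 0 (no relation between documents).
print(A[idx["doc1"], idx["doc2"]])
```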
A data set is prepared for training, of which 85% is extracted as training data and the remaining 15% as test data. The documents and labels of the training data, the documents and labels of the test data, and the relation matrix constructed from the training data set are used as inputs to train the FASTGCN classification model. For the FASTGCN classification model used in this embodiment, the ReLU function is selected as the activation function, which quickly and accurately handles data transfer between convolution layers; the classification function is softmax; and, so that the neural network converges to the error interval, the cross-entropy function is chosen as the error function. The model classification result (category 1 ... category n in STKOS) is compared with the labeled input document classification to obtain the error, and the model parameters are trained by back-propagating the error with gradient descent until the error falls within the preset threshold range, yielding the trained FASTGCN model.
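A graph-convolution forward pass with ReLU, softmax, and a cross-entropy loss, as configured above, can be sketched as follows. This is a toy illustration, not the FASTGCN model itself (FASTGCN additionally samples nodes per layer); the graph, features, weights, and labels are random data fabricated for shape-checking only.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def normalize(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2} used by GCN-style models."""
    d = A.sum(axis=1)
    D = np.diag(1.0 / np.sqrt(d))
    return D @ A @ D

n, f, c = 6, 4, 3                                  # nodes, features, classes
A = np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)   # toy chain graph + self-loops
A_hat = normalize(A)
X = rng.normal(size=(n, f))
W1 = rng.normal(scale=0.1, size=(f, 8))
W2 = rng.normal(scale=0.1, size=(8, c))

def forward(A_hat, X):
    H = relu(A_hat @ X @ W1)        # graph convolution layer with ReLU
    return softmax(A_hat @ H @ W2)  # classification layer with softmax

P = forward(A_hat, X)
labels = np.array([0, 0, 1, 1, 2, 2])
loss = -np.log(P[np.arange(n), labels]).mean()     # cross-entropy error
print(P.shape, round(float(loss), 3))
```

Training would back-propagate this loss through W1 and W2 with gradient descent until it falls within the preset threshold, as the embodiment states.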
Preferably, to improve efficiency, the data input to each convolution layer is first sampled to reduce the amount of data. Any sampling algorithm may be used, such as the Box-Muller algorithm, Monte Carlo methods, or Markov-chain methods; in this embodiment the Markov-chain Gibbs algorithm is used (Decomposable Markov network structure learning with missing data, Chinese Journal of Computers, 2004, 27(9)).
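Per-layer node sampling to cut the convolution cost can be sketched as below. Note this uses FastGCN-style importance sampling by squared column norms as a simplified stand-in for the Gibbs/Markov-chain sampler named in the embodiment, and the relation matrix is random toy data.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_layer_nodes(A, size, rng):
    """Sample nodes for one convolution layer with probability proportional
    to the squared column norm of the (normalized) adjacency matrix, as in
    FastGCN-style importance sampling. This is a simplified stand-in for
    the Gibbs sampler cited in the embodiment."""
    q = (A ** 2).sum(axis=0)
    q = q / q.sum()
    return rng.choice(A.shape[0], size=size, replace=False, p=q)

n = 10
A = rng.random((n, n))
A = (A + A.T) / 2  # symmetric toy relation matrix
picked = sample_layer_nodes(A, size=4, rng=rng)
print(sorted(picked.tolist()))
```

Each convolution layer then aggregates only over the sampled nodes, which is what reduces the amount of data per layer.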
3) Classification process
The classification flow is shown in fig. 3.
Documents to be classified are input in batches. A topological relation graph of the batch is constructed and converted into a topological matrix, and the matrix together with the documents to be classified is input into the trained FASTGCN classification model to obtain the category of each document.
To improve classification efficiency, the invention adopts the FASTGCN model for the document big data classification task. The model is an inductive graph neural network model that classifies nodes on a topological relation graph; once trained, it need not be retrained for formal classification, so it is efficient, though on its own its accuracy is lower. To address accuracy, the topological relation graph constructed by the invention uses existing keywords, sentences with word-order relations, and documents as nodes. The keyword nodes adopt existing terms in the super dictionary STKOS developed by the national book literature center, which ensures their accuracy. The sentence nodes are sentences with word-order characteristics extracted from document abstracts, so the topological relation graph contains word-order information. The document nodes are texts composed of the titles, keywords, and abstracts of scientific documents. Feature extraction and classification of scientific and technical literature big data are realized through FASTGCN (the classification model is shown in fig. 2). In this way, classification accuracy is guaranteed to the greatest extent while classification efficiency is also guaranteed. To further improve efficiency, the method samples the input layer and each convolution layer of FASTGCN.
Results of the experiment
The data used in this experiment come from real data of the national book literature center and fall into five categories: general social science theory, military, medicine and health, industrial technology, and aerospace. Each category contains 20,000 records, of which 15,000 are used for training and 5,000 for testing. The document data are in TXT format. The models compared in the experiment are text GCN, text FASTGCN, and the model proposed by the invention.
The topological relation graphs adopted by text GCN and text FASTGCN consist of documents and words extracted from the documents, while the method of the invention uses documents, sentences extracted from the documents, and existing STKOS keywords. The loss function is cross entropy, the classification function is softmax, and the activation function is ReLU.
Through experiments, the classification results are shown in table 1.
TABLE 1 comparison of classification results for classification models
[Table 1 image not reproduced: precision, recall, F1 value, and classification time for text GCN, text FASTGCN, and the proposed model]
The evaluation metrics are precision, recall, F1 value, and classification time. As the table shows, text FASTGCN has the lowest classification accuracy, reaching at most 0.5586, but the shortest classification time.
Text GCN reaches an accuracy of at most 0.9030 but has the longest classification time.
The method proposed by the invention has the highest accuracy, 0.9471, with a classification time greater than that of text FASTGCN and lower than that of text GCN; overall, the proposed method performs best in combined efficiency and accuracy.
The above detailed description is intended to illustrate the objects, aspects and advantages of the present invention, and it should be understood that the above detailed description is only exemplary of the present invention and is not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (6)

1. A scientific and technical literature big data classification method, characterized in that the method comprises the following steps:
S1, constructing a topological relation graph: the topological relation graph is composed of nodes and edges; the nodes are documents, sentences, and keywords, where a document node consists of the title, document keywords, and abstract of a document, a sentence node is a sentence with word-order characteristics extracted from the document abstract, and a keyword node is a term in STKOS, a super dictionary developed by the national book literature center; the edges are the relations between nodes: documents and sentences, documents and keywords, sentences and sentences, and sentences and keywords;
s2, converting the topological relation graph into a topological relation matrix;
s3, training the classification model by using the training data and the topological relation matrix constructed through S1 and S2 based on the training data to obtain a trained classification model;
S4, classifying documents to be classified: a topological relation graph of the batch of documents to be classified is constructed as in S1 and converted into a matrix as in S2, and the matrix together with the documents to be classified is input into the classification model trained in S3 for classification; the probability that each document belongs to each category is obtained, and the category with the maximum probability is selected as the document classification.
2. The method of claim 1, wherein: the sentence extraction algorithm uses the LSTM method.
3. The method of claim 1, wherein: the relation between a document and a sentence is described by word2vec similarity; the relation between a document and a keyword is described by TFIDF; the relation between sentences is described by word2vec similarity; the relation between a sentence and a keyword is described by CHI; and the relation between keywords is described by PMI.
4. The method of claim 1, wherein: the classification model adopts a FASTGCN model with 3 convolution layers; the activation function is ReLU; the classification function is softmax; the error function is the cross-entropy function; the model classification result is compared with the labeled input document classification to obtain an error, and the model parameters are trained by back-propagating the error with gradient descent until the error falls within a preset threshold range.
5. The method according to any one of claims 1 to 4, wherein: in order to improve efficiency, the data input to each convolutional layer is sampled and then input.
6. The method of claim 5, wherein: and the Markov algorithm is selected for sampling.
CN201911066136.1A 2019-10-15 2019-11-04 Scientific and technical literature big data classification method Pending CN110807101A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910975978 2019-10-15
CN2019109759782 2019-10-15

Publications (1)

Publication Number Publication Date
CN110807101A true CN110807101A (en) 2020-02-18

Family

ID=69501051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911066136.1A Pending CN110807101A (en) 2019-10-15 2019-11-04 Scientific and technical literature big data classification method

Country Status (1)

Country Link
CN (1) CN110807101A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170140030A1 (en) * 2008-03-05 2017-05-18 Kofax, Inc. Systems and methods for organizing data sets
US20160078038A1 (en) * 2014-09-11 2016-03-17 Sameep Navin Solanki Extraction of snippet descriptions using classification taxonomies
CN109977223A (en) * 2019-03-06 2019-07-05 中南大学 A method of the figure convolutional network of fusion capsule mechanism classifies to paper

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIE CHEN et al.: "《FastGCN: fast learning with graph convolutional networks via importance sampling》", 《ARXIV》 *
LIANG YAO et al.: "《Graph Convolutional Networks for Text Classification》", 《ARXIV》 *
ZHU XIANG: "《Research on Text Classification and Automatic Summarization Methods Based on Distributed Representation》", 《China Master's Theses Full-text Database, Information Science and Technology》 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814842B (en) * 2020-06-17 2023-11-03 北京邮电大学 Object classification method and device based on multichannel graph convolution neural network
CN111814842A (en) * 2020-06-17 2020-10-23 北京邮电大学 Object classification method and device based on multi-channel graph convolutional neural network
CN112163069A (en) * 2020-09-27 2021-01-01 广东工业大学 Text classification method based on graph neural network node feature propagation optimization
CN112163069B (en) * 2020-09-27 2024-04-12 广东工业大学 Text classification method based on graph neural network node characteristic propagation optimization
CN112231476B (en) * 2020-10-14 2023-06-06 中国科学技术信息研究所 Improved graph neural network scientific and technical literature big data classification method
CN112231476A (en) * 2020-10-14 2021-01-15 中国科学技术信息研究所 Improved graph neural network scientific and technical literature big data classification method
CN112380345A (en) * 2020-11-20 2021-02-19 山东省计算中心(国家超级计算济南中心) COVID-19 scientific literature fine-grained classification method based on GNN
CN112380345B (en) * 2020-11-20 2022-03-29 山东省计算中心(国家超级计算济南中心) A GNN-based method for fine-grained classification of COVID-19 scientific literature
CN112434134A (en) * 2020-12-04 2021-03-02 中国科学院深圳先进技术研究院 Search model training method and device, terminal equipment and storage medium
WO2022116324A1 (en) * 2020-12-04 2022-06-09 中国科学院深圳先进技术研究院 Search model training method, apparatus, terminal device, and storage medium
CN112434134B (en) * 2020-12-04 2023-10-20 中国科学院深圳先进技术研究院 Search model training method, device, terminal equipment and storage medium
WO2022193627A1 (en) * 2021-03-15 2022-09-22 华南理工大学 Markov chain model-based paper collective classification method and system, and medium
CN113536508B (en) * 2021-07-30 2023-11-21 齐鲁工业大学 A manufacturing network node classification method and system
CN113536508A (en) * 2021-07-30 2021-10-22 齐鲁工业大学 Method and system for classifying manufacturing network nodes
CN114511027B (en) * 2022-01-29 2022-11-11 重庆工业职业技术学院 English remote data extraction method through big data network
CN114511027A (en) * 2022-01-29 2022-05-17 重庆工业职业技术学院 Method for extracting English remote data through big data network
CN114860937A (en) * 2022-05-17 2022-08-05 海南大学 Sentence classification method and system based on Chinese bionic document abstract
CN114860937B (en) * 2022-05-17 2024-08-06 海南大学 A sentence classification method and system based on Chinese bionic literature abstracts
CN116304110A (en) * 2023-03-30 2023-06-23 重庆工业职业技术学院 Working method for constructing knowledge graph by using English vocabulary data
CN116304110B (en) * 2023-03-30 2023-09-08 重庆工业职业技术学院 Working methods for building knowledge graphs using English vocabulary data
CN119227012A (en) * 2024-12-03 2024-12-31 南京信息工程大学 A quantitative method for correlation degree of literature research fields based on multi-dimensional feature fusion

Similar Documents

Publication Publication Date Title
CN110807101A (en) Scientific and technical literature big data classification method
Abubakar et al. Sentiment classification: Review of text vectorization methods: Bag of words, Tf-Idf, Word2vec and Doc2vec
CN105183833B (en) A user model-based microblog text recommendation method and recommendation device
Etzioni et al. Open information extraction from the web
Adeleke et al. Comparative analysis of text classification algorithms for automated labelling of Quranic verses
CN111178053B (en) Text generation method for generating abstract extraction by combining semantics and text structure
Ju et al. An efficient method for document categorization based on word2vec and latent semantic analysis
Ali et al. Named entity recognition using deep learning: A review
Li et al. Stacking-based ensemble learning on low dimensional features for fake news detection
Monika et al. Machine learning approaches for sentiment analysis: A survey
Kandhro et al. Classification of Sindhi headline news documents based on TF-IDF text analysis scheme
Li et al. Combination of multiple feature selection methods for text categorization by using combinatorial fusion analysis and rank-score characteristic
CN112231476B (en) Improved graphic neural network scientific literature big data classification method
Khekare et al. Text normalization and summarization using advanced natural language processing
Sharma et al. Resume classification using elite bag-of-words approach
CN116756346A (en) An information retrieval method and device
Yelmen Multi-class document classification based on deep neural network and Word2Vec
Aalaa Abdulwahab et al. Documents classification based on deep learning
Alshammary et al. Evaluating The Impact of Feature Extraction Techniques on Arabic Reviews Classification
Alharithi Performance analysis of machine learning approaches in automatic classification of Arabic language
Liu et al. Chinese news text classification and its application based on combined-convolutional neural network
Yao et al. Method and dataset mining in scientific papers
Elnadree et al. Performance Investigation of Features Extraction and Classification Approaches for Sentiment Analysis Systems
Haque et al. Sentiment analysis in low-resource Bangla text using active learning
Nasrullah et al. Sentiment analysis in arabic language using machine learning: Iraqi dialect case study

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200218
