
CN113919333B - Text knowledge supplementation method and device based on knowledge graph - Google Patents


Info

Publication number
CN113919333B
CN113919333B (application CN202111235816.9A)
Authority
CN
China
Prior art keywords
vector
text
concept
weight
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111235816.9A
Other languages
Chinese (zh)
Other versions
CN113919333A (en)
Inventor
吴天博
王健宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202111235816.9A
Publication of CN113919333A
Application granted
Publication of CN113919333B
Legal status: Active

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/42Data-driven translation
    • G06F40/44Statistical methods, e.g. probability models
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract


The embodiment of the present application provides a method and device for supplementing text knowledge based on a knowledge graph, the method comprising: obtaining and concatenating the character vector, word vector and topic vector of the text, and inputting them into a bidirectional GRU to obtain a hidden state; using a self-attention mechanism to process the hidden state, obtaining a feature matrix, and converting it into a feature vector through a pooling layer; calling the knowledge graph to conceptualize the text, obtaining a concept set including a concept vector; calculating the relationship weight between the concept vector and the feature vector; calculating the importance weight of the concept vector in the concept set; adjusting the corresponding relationship weight using the importance weight, and performing weighted calculation on each concept vector according to the adjusted relationship weight to obtain a concept set feature, so as to supplement the text with knowledge through the concept set feature. The present application expands the text features from the character granularity, word granularity and text granularity levels, and uses the knowledge graph to make up for the lack of text context information.

Description

Text knowledge supplementing method and device based on knowledge graph
Technical Field
The application relates to the field of natural language processing, in particular to a text knowledge supplementing method and device based on a knowledge graph.
Background
Text classification is a widely used natural language processing technique. By utilizing text classification, implicit information in massive texts in a network can be quickly mined, and the method is widely applied to the fields of information retrieval, question-answering systems, dialogue systems and the like.
Short texts on the web are brief, lack context, and are imprecisely worded, which makes semantic understanding difficult and features sparse in text classification. A knowledge graph can effectively alleviate the feature-sparsity problem of such text, but existing text knowledge supplementing methods that introduce external knowledge still suffer from the low quality of the introduced knowledge, so the text still lacks context information and the semantic understanding effect remains poor.
Disclosure of Invention
The application aims to solve the problems in the prior art to at least a certain extent, and provides a text knowledge supplementing method, a device, a computer device and a computer readable storage medium based on a knowledge graph, which can improve the quality of introduced knowledge so as to fully and effectively supplement the knowledge to a text.
The technical scheme of the embodiment of the application is as follows:
in a first aspect, the present application provides a text knowledge supplementing method based on a knowledge graph, the method comprising:
acquiring a character vector, a word vector and a theme vector of a text;
splicing the character vector, the word vector and the theme vector to obtain a word vector matrix;
inputting the word vector matrix to a bidirectional Gated Recurrent Unit (GRU) network to output a hidden state through the bidirectional GRU network;
processing the hidden state by using a self-attention mechanism to obtain a feature matrix of the text;
Invoking a knowledge graph to perform conceptual processing on the text to obtain a concept set comprising concept vectors;
inputting the feature matrix to a pooling layer to output feature vectors through the pooling layer;
calculating a relationship weight between the concept vector and the feature vector using an attention mechanism;
calculating importance weights of the concept vectors in the concept set using a self-attention mechanism;
and adjusting the corresponding relation weight by using the importance weight, and carrying out weighted calculation on each concept vector according to the adjusted relation weight to obtain concept set characteristics so as to carry out knowledge supplement on the text through the concept set characteristics.
According to some embodiments of the application, the relationship weight is calculated as:

αi = softmax(wa^T · tanh(Wa · [ci ; q] + b1))

where αi is the relationship weight between the i-th concept vector in the concept set and the feature vector q of the text, ci is the i-th concept vector in the concept set, i is an integer greater than 1, Wa is a weight matrix, wa is a weight vector, da is a hyper-parameter, and b1 is a bias.
According to some embodiments of the application, the importance weight is calculated as:

βi = softmax(wb^T · tanh(Wb · ci + b2))

where βi is the importance weight of the i-th concept vector within the concept set, ci is the i-th concept vector in the concept set, Wb is a weight matrix, wb is a weight vector, db is a hyper-parameter, and b2 is a bias.
According to some embodiments of the application, the calculation formula for adjusting the corresponding relation weight by using the importance weight is:
αi=softmax(γαi+(1-γ)βi)
where γ is an adaptive coefficient based on the neural network, and γ ∈ (0, 1).
According to some embodiments of the application, the method for obtaining the character vector of the text comprises the following steps:
establishing a convolutional neural network (CNN) model with character granularity as the input unit, and extracting character features of each word in the word sequence of the text through the CNN model to obtain a character vector.
According to some embodiments of the application, the method for obtaining the word vector of the text comprises the following steps:
and mapping words in the text into word vectors.
According to some embodiments of the application, the method for obtaining the topic vector of the text comprises the following steps:
inputting the text into a word2vec (word-to-vector) model to obtain a plurality of word vectors corresponding to the text;
inputting each word vector into a pre-trained Latent Dirichlet Allocation (LDA) model, so that the LDA model outputs a plurality of topics corresponding to the word vector and probability distribution values corresponding to the topics;
Selecting a topic with the largest probability distribution value from the topics as a target topic, and acquiring a topic word file corresponding to the target topic, wherein the topic word file comprises a plurality of topic words and probability values corresponding to the topic words;
Sorting the plurality of subject words according to the probability values from large to small, and selecting the first K subject words as target subject words;
acquiring the weight of each target subject word according to the probability value of each target subject word;
according to the weight of each target subject word, weighting operation is carried out on K target subject words, so that subject features corresponding to the word vectors are obtained;
And obtaining the topic vector corresponding to the text according to the topic feature corresponding to each word vector.
In a second aspect, the present application provides a text knowledge supplementing device based on a knowledge graph, including:
the vector acquisition module is used for acquiring character vectors, word vectors and theme vectors of the text;
the vector splicing module is used for splicing the character vector, the word vector and the theme vector to obtain a word vector matrix;
The hidden state acquisition module is used for inputting the word vector matrix into a bidirectional GRU network so as to output a hidden state through the bidirectional GRU network;
the hidden state processing module is used for processing the hidden state by using a self-attention mechanism to obtain a feature matrix of the text;
The text knowledge acquisition module is used for calling a knowledge graph to perform conceptual processing on the text to obtain a concept set comprising concept vectors;
The feature matrix processing module is used for inputting the feature matrix into a pooling layer so as to output feature vectors through the pooling layer;
A first weight acquisition module for calculating a relationship weight between the concept vector and the feature vector using an attention mechanism;
A second weight acquisition module for calculating importance weights of the concept vectors in the concept set using a self-attention mechanism;
and the text knowledge supplementing module is used for adjusting the corresponding relation weight by utilizing the importance weight, and carrying out weighted calculation on each concept vector according to the adjusted relation weight to obtain concept set characteristics so as to supplement knowledge to the text through the concept set characteristics.
In a third aspect, the present application provides a computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by one or more of the processors, cause the one or more processors to perform the steps of the method as described in any of the first aspects above.
In a fourth aspect, the present application also provides a computer readable storage medium readable and writable by a processor, the storage medium storing computer instructions which when executed by one or more processors cause the one or more processors to perform the steps of a method as described in any of the first aspects above.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
According to the embodiment of the application, the character vector, the word vector and the topic vector of the text are acquired and spliced into a word vector matrix, which is used as the input of the bidirectional GRU network so that the hidden state is output through the network. The hidden state is then processed with a self-attention mechanism to obtain the feature matrix of the text, and the knowledge graph is invoked to conceptualize the text, obtaining a concept set comprising concept vectors. In addition, the feature matrix is input to the pooling layer to output a feature vector, the relationship weight between each concept vector and the feature vector is calculated with an attention mechanism, and the importance weight of each concept vector in the concept set is calculated with a self-attention mechanism; the corresponding relationship weights are then adjusted with the importance weights, and each concept vector is weighted according to the adjusted relationship weights to obtain the concept set feature, through which knowledge is supplemented to the text. This text knowledge supplementing method models the text at character, word and text granularity, expanding the text features, and introduces knowledge from the knowledge graph by associating the knowledge in the text with the graph. Adjusting the relationship weights with the importance weights increases the weight of key concept vectors more reasonably and reduces incorrect concepts introduced by entity ambiguity or irrelevant noise in the knowledge graph. Classifying text based on this method therefore improves the quality of the introduced knowledge and supplements knowledge more fully and effectively, so that the text is classified more accurately.
Drawings
FIG. 1 is a flow chart of a method for supplementing text knowledge based on a knowledge graph according to an embodiment of the application;
FIG. 2 is a flow chart of a text knowledge supplement method based on a knowledge graph according to another embodiment of the application;
FIG. 3 is a schematic diagram of a text classification model provided by an embodiment of the application;
FIG. 4 is a schematic diagram of a text knowledge supplement apparatus based on a knowledge graph according to an embodiment of the application;
fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "A and/or B" describes an association between objects and covers three cases: A alone, A and B together, or B alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" and similar expressions mean any combination of those items, including any combination of single or plural items; for example, at least one of a, b and c may represent a, b, c, a and b, a and c, b and c, or a, b and c, where a, b and c may each be single or plural.
It should be appreciated that embodiments of the present application may acquire and process relevant data based on artificial intelligence techniques. Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results. AI infrastructure technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
Referring to fig. 1, fig. 1 shows a flow chart of a text knowledge supplementing method based on a knowledge graph according to an embodiment of the present application. It can be understood that the method of the embodiment of the application can be applied to a server, can also be applied to a terminal, can also be applied to a system comprising the terminal and the server, and can be realized through interaction of the terminal and the server. As shown in fig. 1, the method comprises the steps of:
In step S110, a character vector, a word vector, and a topic vector of the text are acquired.
In some embodiments, a CNN model with the granularity of characters as an input unit is established, and character feature extraction is performed on each word in the word sequence in the text through the CNN model to obtain a character vector.
Morphological information in the characters of a word, such as a prefix or suffix, can be extracted using a convolutional neural network, and character embeddings can provide additional information for words lacking word vectors by serving as an extension of the word vector. The character-level embedding of each word is trained and concatenated with the word vector as input to the text encoding model. For a text with word sequence {x1, x2, …, xi, …, xn}, where xi denotes the i-th word containing a character sequence of length L with character embedding vectors cj, the character sequence of each word xi is processed with a standard convolutional neural network:

ei^c = max( tanh( WCNN · cj:j+ke-1 + bCNN ) )

where WCNN and bCNN are training parameters, ke is the convolution kernel size, the window cj:j+ke-1 is the run of ke consecutive character embeddings starting at position j, and max denotes max pooling over all positions j.
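As a concrete illustration, the following is a minimal PyTorch sketch of this character-level extraction; the layer sizes, vocabulary size and padding choice are illustrative assumptions rather than parameters fixed by the application:

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Character-level feature extractor: character embeddings, a 1-D
    convolution (playing the role of W_CNN / b_CNN) and max-over-time pooling."""
    def __init__(self, n_chars=128, char_dim=30, out_dim=50, ke=3):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # conv.weight / conv.bias correspond to W_CNN / b_CNN above
        self.conv = nn.Conv1d(char_dim, out_dim, kernel_size=ke, padding=ke // 2)

    def forward(self, char_ids):                      # char_ids: (batch, L)
        c = self.char_emb(char_ids)                   # (batch, L, char_dim)
        h = torch.tanh(self.conv(c.transpose(1, 2)))  # (batch, out_dim, L)
        return h.max(dim=2).values                    # max pooling -> (batch, out_dim)

# One word of 6 characters -> a 50-dimensional character vector
vec = CharCNN()(torch.randint(0, 128, (1, 6)))
print(vec.shape)  # torch.Size([1, 50])
```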
In some embodiments, the word vector of the text is obtained by mapping the words in the text to word vectors.
The words in the word sequence {x1, x2, …, xi, …, xn} of the text are mapped one by one into word vectors; it will be appreciated that a word2vec model may be utilized to map each word in the word sequence to a word vector.
In some embodiments, referring to fig. 2, fig. 2 shows a flow chart of a text knowledge supplement method based on a knowledge graph, where the method for obtaining a topic vector of a text is as follows:
And step S210, inputting the text into a word2vec model to obtain a plurality of word vectors corresponding to the text.
Each word in the text is mapped to a vector using the word2vec model, obtaining a word vector corresponding to each word of the text.
Step S220, for each word vector, respectively inputting the word vectors into a pre-trained LDA model so that the LDA model outputs a plurality of topics corresponding to the word vector and probability distribution values corresponding to the topics.
It can be understood that topic model training on a text training set with an LDA model yields a topic-word file comprising X topics, with Y topic words and their probability values for each topic, where X and Y are integers greater than 1; inputting the word vector of a text word into the trained LDA model then yields a plurality of topics corresponding to that word vector and the probability distribution over those topics.
For example, 500 news articles are selected as a text training set. The training set is first cleaned (stop words and non-Chinese special characters are removed), then word segmentation is performed, each article is converted into a numeric representation, and the result is used to train the LDA model. LDA is an unsupervised learning method, and only the number of topics is specified during training, for example 4. After training, a topic-word file is obtained that includes the 4 topics and, for each of them, a plurality of topic words with probabilities, e.g. (0, "0.08*fund + 0.014*market + 0.013*company + 0.012*investment + 0.011*stock + 0.010*dividend + …"), where 0 denotes the first topic and 0.08 is the probability distribution value of the topic word "fund" under the first topic. Word vectors of text words are then input into the trained LDA model to obtain a topic distribution such as [(0, 0.07), (1, 0.46), (2, 0.24), (3, 0.23)]; taking (1, 0.46) as an example, 1 denotes the second topic and 0.46 the probability that the input word vector belongs to that topic.
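For readers who want to reproduce the LDA step, the following hedged sketch uses gensim's standard bag-of-words LDA as a stand-in for the trained topic model described above; the toy corpus and all variable names are illustrative assumptions:

```python
from gensim import corpora
from gensim.models import LdaModel

# Toy stand-in for the 500 cleaned and word-segmented news articles
docs = [["fund", "market", "investment"], ["stock", "dividend", "company"]]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# Unsupervised training; only the topic number is specified (4 in the example)
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=4, passes=10)

# Topic-word file: per-topic lists of "prob*word" terms
print(lda.print_topics(num_words=6))

# Topic distribution of a new document, e.g. [(0, 0.07), (1, 0.46), ...]
bow = dictionary.doc2bow(["fund", "stock"])
print(lda.get_document_topics(bow, minimum_probability=0.0))
```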
Step S230, selecting a topic with the largest probability distribution value from the topics as a target topic, and acquiring a topic word file corresponding to the target topic, wherein the topic word file comprises a plurality of topic words and probability values corresponding to the topic words.
For example, when the topic probability distribution is [ (0,0.07), (1,0.46), (2,0.24), (3,0.23) ], it is known that the topic with the largest probability distribution value is the second topic, the second topic is taken as the target topic, and the topic word file including the target topic and Y topic words and probability values corresponding to the target topic is obtained.
Step S240, the plurality of subject words are ranked according to the probability values from large to small, and the first K subject words are selected as target subject words.
After a subject word file comprising a target subject, subject words corresponding to the subject and a probability value is acquired, sorting the subject words according to the probability value from large to small, and selecting the first K subject words as the target subject words, wherein K is an integer larger than 1.
Illustratively, a subject word file comprising the target subject and its subject words with probability values is obtained, and the subject words are sorted by probability value from largest to smallest, e.g. (0, "0.08*fund + 0.014*market + 0.013*company + 0.012*investment + 0.011*stock + 0.010*dividend + …"); the first 5 subject words are selected as target subject words, i.e., fund, market, company, investment and stock.
Step S250, according to the probability value of each target subject word, the weight of each target subject word is obtained.
The probability values {p1, p2, …, pK} of the target subject words {r1, r2, …, rK} are normalized to serve as the weights of the target subject words.
Illustratively, when the probability values corresponding to the K = 5 target subject words are (0.08, 0.014, 0.013, 0.012, 0.011), the probability values can be normalized with the following formula:

qi = pi / (p1 + p2 + … + pK)

where qi is the normalized value of the probability value pi, i.e., the weight corresponding to the i-th target subject word.
And step S260, carrying out weighting operation on K target subject words according to the weight of each target subject word to obtain subject features corresponding to the word vectors.
It can be appreciated that the subject feature of each subject word is obtained by multiplying the weight of the target subject word by the word vector corresponding to the target subject word.
Step S270, according to the topic features corresponding to each word vector, obtaining the topic vector corresponding to the text.
Illustratively, the topic vector t is calculated as:

t = Σ (m = 1..K) qm × C(rm)

where K is the number of target subject words, qm is the weight of the m-th target subject word, C(rm) is the word vector of that target subject word, and qm × C(rm) is its subject feature. The topic feature is thus obtained by summing the weighted subject features of the target subject words.
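Steps S240 to S270 can be summarized in a short sketch; the function and variable names below are illustrative assumptions, and the word embeddings C(r_m) are assumed to be given:

```python
import numpy as np

def topic_vector(topic_words, word_vec, k=5):
    """topic_words: list of (word, probability) pairs for the target topic;
    word_vec: dict mapping a word to its embedding C(r_m)."""
    # S240: sort by probability and keep the top-K subject words
    top = sorted(topic_words, key=lambda wp: wp[1], reverse=True)[:k]
    probs = np.array([p for _, p in top])
    weights = probs / probs.sum()          # S250: normalized weights q_i
    # S260/S270: weighted sum of the subject-word vectors
    return sum(q * word_vec[w] for (w, _), q in zip(top, weights))

# Illustrative call with random 50-dimensional embeddings
words = [("fund", 0.08), ("market", 0.014), ("company", 0.013),
         ("investment", 0.012), ("stock", 0.011), ("dividend", 0.010)]
emb = {w: np.random.randn(50) for w, _ in words}
print(topic_vector(words, emb).shape)      # (50,)
```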
And step S120, splicing the character vector, the word vector and the theme vector to obtain a word vector matrix.
Illustratively, the character vector, word vector and topic vector of each word are concatenated as Ei = [ei^c ; ei^w ; ei^t], where ei^c, ei^w and ei^t denote the character vector, the word vector and the topic vector of the i-th word of the text respectively, thereby obtaining the word vector matrix E = (E1, E2, …, En).
Step S130, inputting the word vector matrix into a bidirectional GRU network to output hidden states through the bidirectional GRU network.
It can be understood that the word vector matrix E = (E1, E2, …, En) is used as the input of the bidirectional GRU network: the forward GRU reads the input sequence in forward order (E1, E2, …, En), and the reverse GRU reads it in reverse order (En, En-1, …, E1). At each time t, the gated recurrent unit processes the input vector Et to produce a forward hidden state h→t and a reverse hidden state h←t, and the forward hidden state at each moment is concatenated with the reverse hidden state at that moment to obtain the hidden state at that moment: ht = [h→t ; h←t].
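In PyTorch terms, the forward and reverse hidden states and their concatenation are produced by a single bidirectional GRU layer; the dimensions below are illustrative assumptions:

```python
import torch
import torch.nn as nn

emb_dim, hidden = 200, 128          # illustrative sizes
bigru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)

E = torch.randn(1, 10, emb_dim)     # word vector matrix E = (E_1, ..., E_n)
H, _ = bigru(E)                     # H[:, t] = [forward h_t ; reverse h_t]
print(H.shape)                      # torch.Size([1, 10, 256]) = 2 * hidden
```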
And step S140, processing the hidden state by using a self-attention mechanism to obtain a feature matrix of the text.
The hidden state at each time step is processed with a self-attention mechanism: the words input at each time step are re-weighted according to the attention calculation so that important words receive higher weight, and the feature matrix of the text is obtained.
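A minimal sketch of this step is given below; the application does not fix the attention variant, so the scaled dot-product form used here is an illustrative assumption:

```python
import torch
import torch.nn.functional as F

def self_attend(H):
    """Re-weight the BiGRU hidden states H (n, d) by their relevance to
    each other, so important time steps receive higher weight."""
    scores = H @ H.T / H.size(-1) ** 0.5   # (n, n) attention scores
    A = F.softmax(scores, dim=-1)
    return A @ H                           # feature matrix of the text

H = torch.randn(10, 256)                   # 10 time steps of BiGRU states
print(self_attend(H).shape)                # torch.Size([10, 256])
```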
And step S150, invoking a knowledge graph to perform conceptualization processing on the text to obtain a concept set comprising concept vectors.
Keywords in the text are matched against entities in the knowledge graph and mapped to target entities through entity linking, yielding the concept set of the text; the concept set is then vectorized to obtain a concept set comprising concept vectors.
Illustratively, keywords in the text are mapped to entities of the encyclopedic knowledge graph CN-DBpedia or the Microsoft Concept Graph; the text keywords are mapped to target entities through entity linking, yielding a text concept set C = (c1, c2, …, ci, …, cm) comprising concept vectors, where ci denotes the i-th concept vector in the concept set.
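As a toy illustration of this step, a hand-written dictionary stands in for the CN-DBpedia / Microsoft Concept Graph lookup and the entity-linking step; everything in this sketch, including the embedding function, is an assumption for exposition:

```python
import numpy as np

# Toy stand-in for the knowledge graph: keyword -> isA concepts
kg = {
    "fund":  ["financial product", "investment vehicle"],
    "stock": ["security", "financial product"],
}

def conceptualize(keywords, embed):
    """Map text keywords to target concepts via the KG lookup, then
    vectorize to obtain the concept set C = (c_1, ..., c_m)."""
    concepts = [c for w in keywords if w in kg for c in kg[w]]
    return np.stack([embed(c) for c in concepts])

# Illustrative call with random embeddings standing in for concept vectors
emb = lambda c: np.random.randn(100)
print(conceptualize(["fund", "stock"], emb).shape)   # (4, 100)
```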
Step S160, inputting the feature matrix to a pooling layer to output feature vectors through the pooling layer.
The feature matrix of the text is compressed into a feature vector by the max-pooling layer. Illustratively, the feature matrix h′ ∈ R^(n×2n) is transformed into the feature vector q ∈ R^(2n) through the max-pooling layer.
Step S170, calculating a relationship weight between the concept vector and the feature vector using an attention mechanism.
In some embodiments, the relationship weight is calculated as:

αi = softmax(wa^T · tanh(Wa · [ci ; q] + b1))

where αi is the relationship weight between the i-th concept vector in the concept set and the feature vector q of the text, ci is the i-th concept vector in the concept set, i is an integer greater than 1, Wa is a weight matrix, wa is a weight vector, da is a hyper-parameter, and b1 is a bias.
Step S180, calculating importance weights of the concept vectors in the concept set using a self-attention mechanism.
A self-attention mechanism is introduced over the concept set and an attention calculation is performed to obtain the importance weight of each concept vector within the whole concept set. It should be noted that the attention mechanism gives larger weights to important concepts and smaller weights to unimportant ones, so as to highlight the important concepts in the concept set.
In some embodiments, the importance weight is calculated as:

βi = softmax(wb^T · tanh(Wb · ci + b2))

where βi is the importance weight of the i-th concept vector within the concept set, ci is the i-th concept vector in the concept set, Wb is a weight matrix, wb is a weight vector, db is a hyper-parameter, and b2 is a bias.
And step S190, adjusting the corresponding relation weight by using the importance weight, and carrying out weighted calculation on each concept vector according to the adjusted relation weight to obtain a concept set feature so as to carry out knowledge supplement on the text through the concept set feature.
In some embodiments, the calculation formula for adjusting the corresponding relation weight by using the importance weight is:
αi=softmax(γαi+(1-γ)βi)
where γ is an adaptive coefficient based on the neural network, and γ ∈ (0, 1).
The importance weight is utilized to adjust the relation weight, so that the semantic relevance of the text and the corresponding concept set is better calculated, the weight of the key concept vector is more reasonably increased, and incorrect concepts introduced due to ambiguity or irrelevant noise of the entity are reduced.
A weighted sum of the concept vectors is calculated according to the adjusted relationship weights, thereby obtaining the semantic vector representing the concepts, i.e., the concept set feature:

p = Σ (i = 1..m) αi · ci

where m is the total number of concept vectors in the concept set, ci is the i-th concept vector in the concept set, and αi is the adjusted relationship weight of the corresponding concept vector ci.
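Steps S170 to S190 can be combined into one module; the following PyTorch sketch uses the additive-attention form reconstructed above, so the exact formula and all dimension names are assumptions consistent with, but not guaranteed identical to, the application's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptAttention(nn.Module):
    """Relation weights, importance weights, adaptive fusion, and the
    weighted sum that yields the concept set feature."""
    def __init__(self, d_c, d_q, d_a=64, d_b=64):
        super().__init__()
        self.W_a = nn.Linear(d_c + d_q, d_a)      # weight matrix Wa + bias b1
        self.w_a = nn.Linear(d_a, 1, bias=False)  # weight vector wa
        self.W_b = nn.Linear(d_c, d_b)            # weight matrix Wb + bias b2
        self.w_b = nn.Linear(d_b, 1, bias=False)  # weight vector wb
        self.g = nn.Parameter(torch.tensor(0.0))  # gamma, kept in (0,1) via sigmoid

    def forward(self, C, q):
        # C: (m, d_c) concept vectors; q: (d_q,) text feature vector
        m = C.size(0)
        cq = torch.cat([C, q.unsqueeze(0).expand(m, -1)], dim=1)
        alpha = F.softmax(self.w_a(torch.tanh(self.W_a(cq))).squeeze(1), dim=0)
        beta = F.softmax(self.w_b(torch.tanh(self.W_b(C))).squeeze(1), dim=0)
        gamma = torch.sigmoid(self.g)
        a = F.softmax(gamma * alpha + (1 - gamma) * beta, dim=0)  # adjusted weights
        return a @ C                              # concept set feature p

C = torch.randn(7, 100)          # 7 concept vectors
q = torch.randn(256)             # text feature vector from the pooling layer
print(ConceptAttention(100, 256)(C, q).shape)   # torch.Size([100])
```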
Referring to fig. 3, fig. 3 shows a schematic structural diagram of a text classification model according to an embodiment of the present application, and a text knowledge supplementing method based on a knowledge graph according to an embodiment of the present application is described on the text classification model.
The text classification model integrates a knowledge graph, an attention mechanism and a bidirectional GRU network, and mainly comprises two parts:
In the first part, text encoding: after the character vector, word vector and topic vector of the text are acquired, they are spliced into a word vector matrix, which is input to the bidirectional GRU network to output hidden states; the hidden states are finally processed through a self-attention layer to obtain the feature matrix of the text.
In the second part, knowledge conceptualization encoding: the knowledge graph is invoked to conceptualize the text, yielding a concept set comprising concept vectors, and the feature matrix is input to the pooling layer to output the feature vector. The relationship weight between each concept vector and the feature vector is calculated with an attention mechanism, and the importance weight of each concept vector in the concept set is calculated with a self-attention mechanism. The corresponding relationship weights are adjusted with the importance weights, raising the weights of the concept vectors closely related to the text and reducing the introduction of incorrect concepts caused by entity ambiguity or irrelevant noise, and each concept vector is weighted according to the adjusted relationship weights to obtain the concept set feature.
It will further be appreciated that although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous.
According to the embodiment of the application, the character vector, the word vector and the topic vector of the text are acquired and spliced into a word vector matrix, which is used as the input of the bidirectional GRU network so that the hidden state is output through the network. The hidden state is then processed with a self-attention mechanism to obtain the feature matrix of the text, and the knowledge graph is invoked to conceptualize the text, obtaining a concept set comprising concept vectors. In addition, the feature matrix is input to the pooling layer to output a feature vector, the relationship weight between each concept vector and the feature vector is calculated with an attention mechanism, and the importance weight of each concept vector in the concept set is calculated with a self-attention mechanism; the corresponding relationship weights are then adjusted with the importance weights, and each concept vector is weighted according to the adjusted relationship weights to obtain the concept set feature, through which knowledge is supplemented to the text. This text knowledge supplementing method models the text at character, word and text granularity, expanding the text features, and introduces knowledge from the knowledge graph by associating the knowledge in the text with the graph. Adjusting the relationship weights with the importance weights increases the weight of key concept vectors more reasonably and reduces incorrect concepts introduced by entity ambiguity or irrelevant noise in the knowledge graph. Classifying text based on this method therefore improves the quality of the introduced knowledge and supplements knowledge more fully and effectively, so that the text is classified more accurately.
Referring to fig. 4, a text knowledge supplementing device 200 based on a knowledge graph according to an embodiment of the present application includes:
a vector obtaining module 210, configured to obtain a character vector, a word vector, and a topic vector of a text;
a vector stitching module 220, configured to stitch the character vector, the word vector, and the topic vector to obtain a word vector matrix;
a hidden state acquisition module 230, configured to input the word vector matrix to a bidirectional GRU network, so as to output a hidden state through the bidirectional GRU network;
a hidden state processing module 240, configured to process the hidden state by using a self-attention mechanism to obtain a feature matrix of the text;
the text knowledge acquisition module 250 is configured to invoke a knowledge graph to perform conceptual processing on the text, so as to obtain a concept set including concept vectors;
a feature matrix processing module 260, configured to input the feature matrix to a pooling layer, so as to output a feature vector through the pooling layer;
A first weight acquisition module 270 for calculating a relationship weight between the concept vector and the feature vector using an attention mechanism;
a second weight acquisition module 280 for calculating importance weights of the concept vectors in the concept set using a self-attention mechanism;
The text knowledge supplementing module 290 is configured to adjust the corresponding relation weight by using the importance weight, and perform weighted calculation on each concept vector according to the adjusted relation weight, so as to obtain a concept set feature, so as to supplement knowledge to the text through the concept set feature.
In some embodiments, the vector acquisition module specifically includes:
A character vector acquisition unit for acquiring a character vector of a text;
a word vector acquisition unit for acquiring a word vector of a text;
And the theme vector acquisition unit is used for acquiring the theme vector of the text.
In some embodiments, the text knowledge supplementing module specifically includes:
the weight adjusting unit is used for adjusting the corresponding relation weight by utilizing the importance weight;
and the weighting calculation unit is used for carrying out weighting calculation on each concept vector according to the adjusted relation weight to obtain the concept set characteristics.
And the knowledge supplementing unit is used for supplementing knowledge to the text through the concept set characteristics.
According to the embodiment of the application, the vector acquisition module acquires the character vector, the word vector and the topic vector of the text, and the vector splicing module splices them into a word vector matrix. The hidden state acquisition module feeds the word vector matrix to the bidirectional GRU network to output hidden states, the hidden state processing module processes the hidden states with a self-attention mechanism to obtain the feature matrix of the text, and the text knowledge acquisition module invokes the knowledge graph to conceptualize the text, obtaining a concept set comprising concept vectors. In addition, the feature matrix processing module inputs the feature matrix to the pooling layer to output a feature vector, the first weight acquisition module calculates the relationship weight between each concept vector and the feature vector using an attention mechanism, and the second weight acquisition module calculates the importance weight of each concept vector in the concept set using a self-attention mechanism. Finally, the text knowledge supplementing module adjusts the corresponding relationship weights using the importance weights and performs a weighted calculation on each concept vector according to the adjusted relationship weights to obtain the concept set feature, through which knowledge is supplemented to the text. The device models the text at character, word and text granularity, expanding the text features, and introduces knowledge from the knowledge graph by associating the knowledge in the text with the graph. Adjusting the relationship weights with the importance weights increases the weight of key concept vectors more reasonably and reduces incorrect concepts introduced by entity ambiguity or irrelevant noise in the knowledge graph; classifying text on this basis improves the quality of the introduced knowledge and supplements it more fully and effectively, so that the text is classified more accurately.
It should be noted that, because the content of information interaction and execution process between the modules/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
Fig. 5 shows a computer device 300 provided by an embodiment of the application. The computer device 300 may be a server or a terminal, and the internal structure of the computer device 300 includes, but is not limited to:
a memory 310 for storing a program;
The processor 320 is configured to execute the program stored in the memory 310, and when the processor 320 executes the program stored in the memory 310, the processor 320 is configured to execute the above-described text knowledge supplement method based on the knowledge graph.
The processor 320 and the memory 310 may be connected by a bus or other means.
The memory 310 serves as a non-transitory computer readable storage medium that can be used to store non-transitory software programs and non-transitory computer executable programs, such as the knowledge-graph-based text knowledge supplement method described in any embodiment of the invention. The processor 320 implements the above-described knowledge graph-based text knowledge supplement method by running non-transitory software programs and instructions stored in the memory 310.
The memory 310 may include a memory program area that may store an operating system, an application program required for at least one function, and a memory data area that may store a text knowledge supplement method based on a knowledge graph as described above. In addition, memory 310 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some implementations, memory 310 may optionally include memory located remotely from processor 320, which may be connected to processor 320 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the knowledge-graph-based text knowledge supplement method described above are stored in the memory 310, which when executed by the one or more processors 320, perform the knowledge-graph-based text knowledge supplement method provided by any embodiment of the invention.
The embodiment of the application also provides a computer-readable storage medium which stores computer-executable instructions for executing the text knowledge supplementing method based on the knowledge graph.
In one embodiment, the storage medium stores computer-executable instructions that are executed by one or more control processors, for example, by one or more processors 320 in the computer device 300, so that the one or more processors 320 perform the method for supplementing text knowledge based on knowledge graph according to any embodiment of the invention.
The embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically include computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media.
The preferred embodiments of the present invention have been described in detail, but the present invention is not limited to the above embodiments. Those skilled in the art will appreciate that various equivalent modifications and substitutions may be made without departing from the spirit of the present invention, and such modifications and substitutions are intended to fall within the scope of the present invention as defined in the following claims.

Claims (8)

1. A text knowledge supplementing method based on a knowledge graph, characterized in that the method comprises:
acquiring a character vector, a word vector and a topic vector of a text;
splicing the character vector, the word vector and the topic vector to obtain a word vector matrix;
inputting the word vector matrix to a bidirectional GRU network, so as to output a hidden state through the bidirectional GRU network;
processing the hidden state with a self-attention mechanism to obtain a feature matrix of the text;
invoking a knowledge graph to conceptualize the text to obtain a concept set comprising concept vectors;
inputting the feature matrix to a pooling layer, so as to output a feature vector through the pooling layer;
calculating a relationship weight between each concept vector and the feature vector using an attention mechanism;
calculating an importance weight of each concept vector in the concept set using a self-attention mechanism;
adjusting the corresponding relationship weight using the importance weight, and performing a weighted calculation on each concept vector according to the adjusted relationship weight to obtain a concept set feature, so as to supplement the text with knowledge through the concept set feature;
wherein the relationship weight is calculated as
αi = softmax(wa^T · tanh(Wa · [ci ; q] + b1))
where αi is the relationship weight between the i-th concept vector in the concept set and the feature vector q of the text, ci denotes the i-th concept vector in the concept set, i is an integer greater than 1, Wa is a weight matrix, wa is a weight vector, da is a hyper-parameter, and b1 is a bias;
and the importance weight is calculated as
βi = softmax(wb^T · tanh(Wb · ci + b2))
where βi is the importance weight of the i-th concept vector within the concept set, ci denotes the i-th concept vector in the concept set, Wb is a weight matrix, wb is a weight vector, db is a hyper-parameter, and b2 is a bias.
2. The method according to claim 1, characterized in that the corresponding relationship weight is adjusted using the importance weight according to:
αi = softmax(γαi + (1-γ)βi)
where γ is an adaptive coefficient based on the neural network, γ ∈ (0, 1).
3. The method according to claim 1, characterized in that the character vector of the text is obtained as follows:
establishing a CNN model with character granularity as the input unit, and extracting character features of each word in the word sequence of the text through the CNN model to obtain the character vector.
4. The method according to claim 1, characterized in that the word vector of the text is obtained as follows:
mapping the words in the text to word vectors.
5. The method according to claim 1, characterized in that the topic vector of the text is obtained as follows:
inputting the text into a word2vec model to obtain a plurality of word vectors corresponding to the text;
inputting each word vector into a pre-trained LDA model, so that the LDA model outputs a plurality of topics corresponding to the word vector and probability distribution values corresponding to the topics;
selecting the topic with the largest probability distribution value from the topics as a target topic, and obtaining a topic-word file corresponding to the target topic, the topic-word file comprising a plurality of topic words and the probability values corresponding to the topic words;
sorting the plurality of topic words by probability value from largest to smallest, and selecting the first K topic words as target topic words;
obtaining the weight of each target topic word according to the probability value of each target topic word;
performing a weighted operation on the K target topic words according to the weight of each target topic word to obtain a topic feature corresponding to the word vector;
obtaining the topic vector corresponding to the text according to the topic features corresponding to the word vectors.
6. A text knowledge supplementing device based on a knowledge graph, characterized by comprising:
a vector acquisition module for acquiring a character vector, a word vector and a topic vector of a text;
a vector splicing module for splicing the character vector, the word vector and the topic vector to obtain a word vector matrix;
a hidden state acquisition module for inputting the word vector matrix to a bidirectional GRU network, so as to output a hidden state through the bidirectional GRU network;
a hidden state processing module for processing the hidden state with a self-attention mechanism to obtain a feature matrix of the text;
a text knowledge acquisition module for invoking a knowledge graph to conceptualize the text to obtain a concept set comprising concept vectors;
a feature matrix processing module for inputting the feature matrix to a pooling layer, so as to output a feature vector through the pooling layer;
a first weight acquisition module for calculating a relationship weight between each concept vector and the feature vector using an attention mechanism;
a second weight acquisition module for calculating an importance weight of each concept vector in the concept set using a self-attention mechanism;
a text knowledge supplementing module for adjusting the corresponding relationship weight using the importance weight, and performing a weighted calculation on each concept vector according to the adjusted relationship weight to obtain a concept set feature, so as to supplement the text with knowledge through the concept set feature;
wherein the relationship weight is calculated as
αi = softmax(wa^T · tanh(Wa · [ci ; q] + b1))
where αi is the relationship weight between the i-th concept vector in the concept set and the feature vector q of the text, ci denotes the i-th concept vector in the concept set, i is an integer greater than 1, Wa is a weight matrix, wa is a weight vector, da is a hyper-parameter, and b1 is a bias;
and the importance weight is calculated as
βi = softmax(wb^T · tanh(Wb · ci + b2))
where βi is the importance weight of the i-th concept vector within the concept set, ci denotes the i-th concept vector in the concept set, Wb is a weight matrix, wb is a weight vector, db is a hyper-parameter, and b2 is a bias.
7. A computer device, characterized by comprising:
a memory for storing a program; and
a processor for executing the program stored in the memory, wherein when the processor executes the program stored in the memory, the processor is configured to perform the method according to any one of claims 1 to 5.
8. A computer-readable storage medium, characterized in that it stores computer-executable instructions for performing the method according to any one of claims 1 to 5.
CN202111235816.9A 2021-10-22 2021-10-22 Text knowledge supplementation method and device based on knowledge graph Active CN113919333B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111235816.9A | 2021-10-22 | 2021-10-22 | Text knowledge supplementation method and device based on knowledge graph

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111235816.9A | 2021-10-22 | 2021-10-22 | Text knowledge supplementation method and device based on knowledge graph

Publications (2)

Publication Number | Publication Date
CN113919333A (en) | 2022-01-11
CN113919333B (en) | 2025-04-04

Family

ID=79242459

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111235816.9A (Active, CN113919333B) | Text knowledge supplementation method and device based on knowledge graph | 2021-10-22 | 2021-10-22

Country Status (1)

Country Link
CN (1) CN113919333B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114742052A (en) * 2022-04-25 2022-07-12 平安普惠企业管理有限公司 Text subject extraction method, device, equipment and storage medium
CN117216268B (en) * 2023-08-29 2025-09-09 大连理工大学 Intelligent knowledge classification method
CN119692343B (en) * 2025-02-25 2025-06-17 小哆智能科技(北京)有限公司 A method and system for extracting topics from user question text

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717334A (en) * 2019-09-10 2020-01-21 上海理工大学 Text emotion analysis method based on BERT model and double-channel attention
CN110781305A (en) * 2019-10-30 2020-02-11 北京小米智能科技有限公司 Text classification method and device based on classification model and model training method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008046104A2 (en) * 2006-10-13 2008-04-17 Collexis Holding, Inc. Methods and systems for knowledge discovery
CN111223498A (en) * 2020-01-10 2020-06-02 平安科技(深圳)有限公司 Intelligent emotion recognition method and device and computer readable storage medium
CN112100401B (en) * 2020-09-14 2024-05-07 北京大学 Knowledge graph construction method, device, equipment and storage medium for science and technology services

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717334A (en) * 2019-09-10 2020-01-21 上海理工大学 Text emotion analysis method based on BERT model and double-channel attention
CN110781305A (en) * 2019-10-30 2020-02-11 北京小米智能科技有限公司 Text classification method and device based on classification model and model training method

Also Published As

Publication number Publication date
CN113919333A (en) 2022-01-11

Similar Documents

Publication Publication Date Title
CN113919333B (en) Text knowledge supplementation method and device based on knowledge graph
CN109783817B (en) A Computational Model of Text Semantic Similarity Based on Deep Reinforcement Learning
CN110837738B (en) Method, device, computer equipment and storage medium for identifying similarity
US6173275B1 (en) Representation and retrieval of images using context vectors derived from image information elements
CN113239700A (en) Text semantic matching device, system, method and storage medium for improving BERT
US6760714B1 (en) Representation and retrieval of images using content vectors derived from image information elements
CN111400470A (en) Question processing method and device, computer equipment and storage medium
CN111221944B (en) Text intention recognition method, device, equipment and storage medium
CN113569094B (en) Video recommendation method, device, electronic device and storage medium
CN112257449A (en) Named entity recognition method and device, computer equipment and storage medium
CN110069627A (en) Classification method, device, electronic equipment and the storage medium of short text
CN109918507B (en) An Improved Text Classification Method Based on TextCNN
CN112100377B (en) Text classification method, apparatus, computer device and storage medium
CN114329029B (en) Object retrieval method, device, equipment and computer storage medium
CN112115702A (en) Intention recognition method, device, dialogue robot and computer readable storage medium
CN111881264B (en) A method and electronic device for long text retrieval in open domain question answering tasks
CN113704528B (en) Cluster center determining method, device and equipment and computer storage medium
CN113761868A (en) Text processing method and device, electronic equipment and readable storage medium
US20240249133A1 (en) Systems, apparatuses, methods, and non-transitory computer-readable storage devices for training artificial-intelligence models using adaptive data-sampling
CN111858878A (en) Method, system and storage medium for automatically extracting answer from natural language text
CN111639186A (en) Multi-class multi-label text classification model and device dynamically embedded with projection gate
CN118504643A (en) Compression method, device, storage medium and program product of neural network model
CN110728135A (en) Text theme indexing method and device, electronic equipment and computer storage medium
CN113849679B (en) Image retrieval method, device, electronic device and storage medium
CN116108140A (en) Method for matching legal provision by natural language

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant