CN110969005A - Method and device for determining similarity between entity corpora - Google Patents
- Publication number
- CN110969005A CN201811151935.4A
- Authority
- CN
- China
- Prior art keywords
- training
- entity
- test
- corpus
- entity corpus
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and a device for determining similarity between entity corpora. A training device randomly extracts a training set from a preset entity corpus, pairs the entity corpora in the training set to obtain training entity corpus relationship pairs, obtains the matrix vector corresponding to each training entity corpus relationship pair, and processes the matrix vectors with a convolutional neural network to obtain the training classification probability of each training entity corpus relationship pair, thereby completing the training of the convolutional neural network. An intelligent customer service that uses the convolutional neural network and the preset entity corpus can then provide users with an accurate answer search function. This solves the technical problem in the prior art that, because the information input by a user is inaccurate, the intelligent customer service system cannot find the correct answer in its knowledge base, which degrades the user experience.
    Description
Technical Field
      The invention relates to the technical field of deep learning, in particular to a method and a device for determining similarity between entity corpora.
    Background
      With the rapid development of artificial intelligence technology, it is often not enough to apply the extracted relationships between entity corpora to text search. For example, in the tax domain, the relationship between tax entity corpora refers to the similarity between them. Methods for extracting relationships between entity corpora fall into three categories. The first is supervised learning, which treats relationship extraction as a classification problem: effective features are designed from the training data to learn classification models, and the trained classifier is then used to predict relationships. Its drawback is that training requires a large amount of manually labeled entity corpora, and corpus labeling is usually time-consuming and labor-intensive. The second is semi-supervised learning, which mainly uses bootstrapping: for each relationship to be extracted, several seed instances are set manually, and the relationship templates and further instances corresponding to that relationship are then extracted iteratively from the data. The third is unsupervised learning, which assumes that entity pairs with the same semantic relationship have similar context information. The semantic relationship of each entity corpus relationship pair can therefore be represented by the context information of that pair, and the semantic relationships of all entity pairs are clustered.
Existing supervised relationship extraction methods achieve good results, but they rely heavily on natural language processing annotations, such as part-of-speech tags and syntactic parses, to provide classification features. These annotations usually contain many errors, which are propagated and amplified in the relationship extraction system and ultimately degrade the extraction results.
      For example, with existing intelligent customer service systems, tax payment services are entering the intelligent era of "internet + tax". Intelligent customer service provides taxpayers with convenient, intelligent and ubiquitous service. In an intelligent customer service system such as the WeChat official account of a certain city, taxpayers can usually enter their questions by voice or text at the consultation entrance; the intelligent customer service finds matching answers in a tax knowledge base through artificial intelligence technologies such as speech recognition and natural language understanding, and returns them to the taxpayers as text, text with pictures, web links and the like. However, taxpayers are distributed all over the country. During tax consultation, Putonghua is mixed with various dialects, and the spoken expressions for tax entities differ between regions or are imprecise. The intelligent customer service system therefore generally cannot match the nonstandard spoken content accurately with the standard answers, so answers cannot be found quickly and satisfaction with the intelligent question-answering system is low. For example, the "tax disk" in the spoken language of taxpayers in a certain place means the same thing as the "golden tax disk" in the standard knowledge base, but the two are different words. The intelligent customer service system cannot treat the spoken content and the answer in the standard knowledge base as an exact match, so the answer cannot be found accurately and satisfaction with the intelligent customer service system is low.
      Therefore, the prior art has at least the following technical problems:
      for the intelligent customer service system, because the information input by the user is not accurate, the intelligent customer service system cannot find the correct answer from the knowledge base of the intelligent customer service system, and therefore the user experience is reduced.
    Disclosure of Invention
      The embodiment of the invention provides a method and a device for determining similarity between entity corpora, which are used to solve the technical problem in the prior art that an intelligent customer service system cannot find the correct answer in its knowledge base because the information input by a user is inaccurate, which degrades the user experience.
      In a first aspect, an embodiment of the present invention provides a method for determining similarity between entity corpuses, including:
      randomly extracting a training set from a preset entity corpus, wherein the training set is composed of a plurality of entity corpora;
      matching any entity corpus in the training set with each entity corpus except the entity corpus until all the entity corpuses in the training set are matched, thereby obtaining a plurality of training entity corpus relation pairs;
      acquiring each training statement matrix vector corresponding to each training entity corpus relationship pair;
      processing the training statement matrix vectors by using a convolutional neural network to obtain the training classification probability of the training entity corpus relationship pairs;
      and determining the similarity between the training entity corpora in the training entity corpus relationship pair based on the training classification probability.
      Optionally, the obtaining of each training statement matrix vector corresponding to each training entity corpus relationship pair specifically includes:
      acquiring a first set of word vectors corresponding to all words forming the training set, wherein each entity corpus in the training set is formed by a plurality of words;
      and acquiring a training sentence matrix vector of each training entity corpus relationship pair based on the first set, wherein the training sentence matrix vector is composed of a plurality of word vectors.
      Optionally, the processing the matrix vector of each training statement by using a convolutional neural network to obtain the training classification probability of each training entity corpus relationship pair specifically includes:
      performing convolution operation on each training statement matrix vector to acquire training characteristic information corresponding to the training entity corpus relationship pair;
      sampling each training characteristic information to obtain a plurality of training optimal characteristics of each training entity corpus pair;
      combining the training optimal features to obtain training local optimal features of the training entity corpus pairs;
      and processing each training local optimal characteristic by using a Softmax model to obtain the training classification probability of each training entity corpus pair.
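The four operations above (convolution, sampling, merging, Softmax) can be sketched as a forward pass in NumPy. This is only an illustrative sketch under stated assumptions, not the implementation of the invention: the sampling step is assumed to be max pooling over each feature map, merging is assumed to be concatenation of the pooled values, and the names `conv_features`, `max_pool`, `classify`, `filters`, `W_out` and `b_out` are hypothetical. The weights are random, so the output is not a trained probability.

```python
import numpy as np

def conv_features(X, filters):
    """Slide each (h x k) filter over the n x k sentence matrix X.

    Returns one feature map (a 1-D array of length n - h + 1) per filter.
    """
    n, _ = X.shape
    maps = []
    for W in filters:
        h = W.shape[0]
        maps.append(np.array([np.sum(X[i:i + h] * W) for i in range(n - h + 1)]))
    return maps

def max_pool(maps):
    """Assumed sampling step: keep the largest value of each feature map,
    then merge the results into a single vector by concatenation."""
    return np.array([m.max() for m in maps])

def softmax(z):
    """Numerically stable Softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(X, filters, W_out, b_out):
    """Convolution -> sampling -> merging -> Softmax classification."""
    pooled = max_pool(conv_features(X, filters))
    return softmax(W_out @ pooled + b_out)

# Hypothetical sizes: a 6-word relationship pair with 4-dimensional word vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
filters = [rng.normal(size=(h, 4)) for h in (2, 3)]   # two filter heights
W_out = rng.normal(size=(2, len(filters)))            # two classes assumed
b_out = np.zeros(2)
probs = classify(X, filters, W_out, b_out)
```

Each filter spans the full word-vector width, so it slides only along the word axis; this is a common choice for text, assumed here rather than taken from the description.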
      Optionally, the randomly extracting the training set from the preset entity corpus specifically includes:
      extracting a training set and a test set from a preset entity corpus by using a random extraction algorithm; the union of the training set and the test set is the preset entity corpus, and the training set and the test set have no intersection.
      After the processing of the training statement matrix vectors by using the convolutional neural network to obtain the training classification probability of each training entity corpus relationship pair, the method further includes:
      pairing any entity corpus in the test set with each entity corpus except the entity corpus until all the entity corpuses in the test set are paired, so as to obtain a plurality of test entity corpus relation pairs, wherein the test set consists of a plurality of entity corpuses;
      obtaining each test statement matrix vector corresponding to each test entity corpus relationship pair;
      processing the matrix vector of each test statement by using the convolutional neural network to obtain the classification probability of the corpus relationship pair of each test entity;
      and outputting the classification probability of each corpus relationship pair of the test entities, so that a user can judge whether the convolutional neural network needs to be trained again based on the classification probability.
      Optionally, the obtaining of each test statement matrix vector corresponding to each test entity corpus relationship pair specifically includes:
      acquiring a second set of word vectors corresponding to all words forming the test set, wherein each entity corpus in the test set is formed by a plurality of words;
      and acquiring a test statement matrix vector of each test entity corpus relationship pair based on the second set, wherein the test statement matrix vector is composed of a plurality of word vectors.
      Optionally, the processing the matrix vector of each test statement by using the convolutional neural network to obtain the test classification probability of each test entity corpus relationship pair specifically includes:
      performing convolution operation on each test statement matrix vector to acquire test characteristic information corresponding to the test entity corpus relationship pair;
      sampling each test characteristic information to obtain a plurality of test optimal characteristics of each test entity corpus pair;
      merging the test optimal features to obtain test local optimal features of the corpus pairs of the test entities;
      and processing the local optimal characteristics of each test by using a Softmax model to obtain the test classification probability of each test entity corpus pair.
      Optionally, the preset entity corpus is a preset tax entity corpus.
      In a second aspect, an embodiment of the present invention provides an apparatus for determining similarity between entity corpuses, including:
      the extraction unit is used for randomly extracting a training set from a preset entity corpus, wherein the training set is composed of a plurality of entity corpora;
      a first matching unit, configured to match any entity corpus in the training set with each entity corpus except the entity corpus until all entity corpuses in the training set are matched, so as to obtain a plurality of training entity corpus relationship pairs;
      the first acquisition unit is used for acquiring each training statement matrix vector corresponding to each training entity corpus relationship pair;
      a second obtaining unit, configured to process the matrix vectors of the training sentences by using a convolutional neural network, and obtain a training classification probability of each training entity corpus relationship pair;
      and the determining unit is used for determining the similarity between the training entity corpora in the training entity corpus relationship pair based on the training classification probability.
      Optionally, the first obtaining unit specifically includes:
      a first obtaining subunit, configured to obtain a first set of word vectors corresponding to all words forming the training set, where each entity corpus in the training set is formed by a plurality of words;
      and a second obtaining subunit, configured to obtain a training sentence matrix vector of each training entity corpus relationship pair based on the first set, where the training sentence matrix vector is formed by a plurality of word vectors.
      Optionally, the second obtaining unit specifically includes:
      the first operation subunit is used for performing convolution operation on the matrix vectors of the training sentences to acquire training characteristic information corresponding to the training entity corpus relationship pairs;
      the first sampling subunit is used for sampling and processing each training characteristic information to obtain a plurality of training optimal characteristics of each training entity corpus pair;
      the first merging subunit is used for merging the training optimal features to obtain training local optimal features of the training entity corpus pairs;
      and the first classification subunit is used for processing each training local optimal feature by using a Softmax model and acquiring the training classification probability of each training entity corpus pair.
      Optionally, the apparatus further comprises:
      a second pairing unit, configured to pair any entity corpus in the test set with each entity corpus except the entity corpus after the training classification probability of each training entity corpus relationship pair is obtained by processing the training statement matrix vectors by using a convolutional neural network, until all entity corpuses in the test set are paired, thereby obtaining a plurality of test entity corpus relationship pairs, where the test set is composed of a plurality of entity corpuses;
      a third obtaining unit, configured to obtain each test statement matrix vector corresponding to each test entity corpus relationship pair;
      a fourth obtaining unit, configured to process the test statement matrix vectors by using the convolutional neural network, and obtain a classification probability of the corpus relationship pair of each test entity;
      and the output unit is used for outputting the classification probability of each corpus relationship pair of the test entities, so that a user can judge whether the convolutional neural network needs to be trained again based on the classification probability.
      Optionally, the third obtaining unit specifically includes:
      a third obtaining subunit, configured to obtain a second set of word vectors corresponding to all words forming the test set, where each entity corpus in the test set is formed by a plurality of words;
      and a fourth obtaining subunit, configured to obtain, based on the second set, a test statement matrix vector of each test entity corpus relationship pair, where the test statement matrix vector is formed by a plurality of word vectors.
      Optionally, the fourth obtaining unit specifically includes:
      the second operation subunit is used for performing convolution operation on the matrix vectors of the test statements to acquire test characteristic information corresponding to the corpus relationship pair of the test entity;
      the second sampling subunit is used for sampling and processing each test characteristic information to obtain a plurality of test optimal characteristics of each test entity corpus pair;
      the second merging subunit is used for merging the test optimal features to obtain the test local optimal features of each test entity corpus pair;
      and the second classification subunit is used for processing the local optimal characteristics of each test by using a Softmax model and acquiring the test classification probability of each test entity corpus pair.
      Optionally, the preset entity corpus is a preset tax entity corpus.
      In a third aspect, an embodiment of the present invention provides an apparatus for determining similarity between entity corpuses, including:
      at least one processor, and a memory coupled to the at least one processor;
      wherein the memory stores instructions executable by the at least one processor, the at least one processor performing the method as described in the first aspect above by executing the instructions stored by the memory.
      In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, including:
      the computer-readable storage medium has stored thereon computer instructions which, when executed by at least one processor of the apparatus for determining similarity between entity corpuses, implement the method as described in the above first aspect.
      One or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:
      In the invention, the device for determining the similarity between entity corpora performs the method for determining the similarity between entity corpora. That is, it randomly extracts a training set from a preset entity corpus; pairs any entity corpus in the training set with each of the other entity corpora until all entity corpora in the training set are paired, thereby obtaining a plurality of training entity corpus relationship pairs; obtains the training statement matrix vector corresponding to each training entity corpus relationship pair; and processes each training statement matrix vector with a convolutional neural network to obtain the training classification probability of each training entity corpus relationship pair. This completes the learning process of the convolutional neural network on the preset entity corpus, so that an intelligent customer service using the convolutional neural network and the preset entity corpus can provide users with an accurate answer search function. The technical problem in the prior art, that the intelligent customer service system cannot find the correct answer in its knowledge base because the information input by a user is inaccurate and the user experience is therefore degraded, is thus solved, and the technical effect of improving the user experience is achieved.
    Drawings
      FIG. 1 is a schematic structural diagram of a convolutional neural network according to an embodiment of the present invention;
      FIG. 2 is a flowchart illustrating a method for determining similarity between entity corpuses according to an embodiment of the present invention;
      FIG. 3 is a flowchart of a method for determining a training classification probability of a corpus relationship pair of a training entity using a convolutional neural network according to an embodiment of the present invention;
      FIG. 4 is a flowchart of a method for determining whether retraining the convolutional neural network is required according to an embodiment of the present invention;
      FIG. 5 is a schematic diagram of an intelligent customer service system according to an embodiment of the present invention, which employs a method for determining similarity between entity corpuses;
      FIG. 6 is a schematic structural diagram illustrating an apparatus for determining similarity between entity corpuses according to an embodiment of the present invention;
      FIG. 7 is a schematic physical structure diagram of an apparatus for determining similarity between entity corpuses according to an embodiment of the present invention.
    Detailed Description
      In order to solve the technical problem, the technical scheme in the embodiment of the invention has the following general idea:
      a method and a device for determining similarity between entity corpora specifically include:
      randomly extracting a training set from a preset entity corpus, wherein the training set is composed of a plurality of entity corpora;
      matching any entity corpus in the training set with each entity corpus except the entity corpus until all the entity corpuses in the training set are matched, thereby obtaining a plurality of training entity corpus relation pairs;
      acquiring each training statement matrix vector corresponding to each training entity corpus relationship pair;
      processing the training statement matrix vectors by using a convolutional neural network to obtain the training classification probability of the training entity corpus relationship pairs;
      and determining the similarity between the training entity corpora in the training entity corpus relationship pair based on the training classification probability.
      In order to better understand the technical solutions, the technical solutions will be described in detail below with reference to the drawings and the specific embodiments of the specification, and it should be understood that the embodiments and specific features of the embodiments of the present invention are detailed descriptions of the technical solutions of the present invention, and are not limitations of the technical solutions of the present invention, and the technical features of the embodiments and examples of the present invention may be combined with each other without conflict.
      In an embodiment of the invention, the convolutional neural network comprises a convolutional layer, a pooling layer and a fully connected layer. The Convolutional Neural Network (CNN) derives from the neural mechanism of vision: Hubel et al. found that the visual neural mechanism contains a network structure that reduces the complexity of the network and is invariant to changes such as scaling and translation, and this gave rise to the convolutional neural network. Referring to FIG. 1, the basic structure of a CNN is a hierarchical recursive network structure that mainly includes two kinds of layers, convolutional layers and sampling layers, and also includes fully connected layers; the input of the convolutional neural network takes the form of a matrix vector. The convolutional layer is also called the feature extraction layer, and the sampling layer is also called the feature mapping layer or pooling layer. This two-layer structure can be understood as reducing the feature dimension and the number of parameters to be optimized, which is the advantage of convolutional networks over fully connected neural networks. Sharing local weights reduces the number of parameters in the network, and the approach works well in speech recognition and image processing. Based on these advantages, the convolutional neural network also has great advantages in text processing.
      The training device can be any terminal equipment which can run a computer program, such as a mobile phone, a tablet computer, a desktop computer and the like;
      an entity corpus relationship pair may be composed of two entity corpora or of multiple entity corpora; for example, an entity corpus relationship pair may be represented as <e1, e2>, where e1, e2 ∈ E, e1 and e2 are entity corpora, and E is the preset entity corpus;
      the sentence matrix vector corresponding to the entity corpus relationship pair may be represented as X, where X is a two-dimensional n × k matrix, n is the number of words in the entity corpus relationship pair, and k is the dimension of the word vector x_i of the i-th word constituting the entity corpus relationship pair.
      The above list is merely illustrative and not intended to be a specific limitation of the embodiment of the present invention.
      Referring to fig. 2, an embodiment of the present invention provides a method for determining similarity between entity corpuses, including the following steps:
      Step S101, a training set is randomly extracted from a preset entity corpus, wherein the training set is composed of a plurality of entity corpora.
      Step S102, any entity corpus in the training set is paired with each entity corpus except the entity corpus until all the entity corpora in the training set are paired, and therefore a plurality of training entity corpus relation pairs are obtained.
      Step S103, obtaining each training statement matrix vector corresponding to each training entity corpus relationship pair.
      Step S104, processing each training statement matrix vector by using a convolutional neural network to obtain the training classification probability of each training entity corpus relationship pair.
      Step S105, based on the training classification probability, determining the similarity between the training entity corpora in the training entity corpus relationship pair.
      Firstly, step S101 is executed to randomly extract a training set from a preset entity corpus, wherein the training set is composed of a plurality of entity corpora.
      Specifically, the algorithm used in the training set extraction may be a random extraction algorithm, or may be other algorithms that can implement a random extraction function, and is not limited herein.
      Further, randomly extracting the training set from the preset entity corpus also comprises randomly extracting the test set from the preset entity corpus; the union of the training set and the test set is the preset entity corpus, and the training set and the test set have no intersection.
      Specifically, the method for extracting the test set includes extracting the test set while randomly extracting the training set from the predetermined entity corpus, for example, simultaneously extracting the training set and the test set from the predetermined entity corpus by using a random extraction algorithm; another method for extracting the test set is to use a part except the training set in a preset entity corpus as the test set after the training set is extracted; another method for extracting the test set is to randomly extract the test set from the predetermined entity corpus by using an algorithm, and then use the part of the predetermined entity corpus other than the test set as a training set.
      In addition, the ratio of the training set to the test set can be freely set, for example, the ratio of the training set to the test set is 1:2, that is, the number of the entity corpora constituting the training set is one third of the total number of the entity corpora in the predetermined entity corpus.
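The random extraction with a 1:2 ratio can be sketched as follows. This is a minimal sketch, assuming a shuffle-and-slice split; the function name `split_corpus`, the `train_fraction` and `seed` parameters, and the 30-item placeholder corpus are hypothetical and only serve the illustration.

```python
import random

def split_corpus(corpus, train_fraction, seed=0):
    """Randomly split `corpus` into a training set and a test set.

    The two sets are disjoint and their union is the whole corpus.
    """
    items = list(corpus)
    rng = random.Random(seed)      # fixed seed only to make the sketch repeatable
    rng.shuffle(items)
    n_train = round(len(items) * train_fraction)
    return items[:n_train], items[n_train:]

# Hypothetical preset entity corpus with 30 entries.
corpus = [f"entity_{i}" for i in range(1, 31)]

# Ratio 1:2, i.e. the training set holds one third of the corpus.
train_set, test_set = split_corpus(corpus, train_fraction=1 / 3)
```

Slicing one shuffled list yields both sets at once, so the no-intersection and full-union conditions hold by construction.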
      After the training set is extracted, step S102 is executed to pair any entity corpus in the training set with each entity corpus except the entity corpus until all the entity corpuses in the training set are paired, so as to obtain a plurality of training entity corpus relationship pairs.
      Specifically, suppose for example that the training set includes 10 entity corpora numbered 1, 2, …, 10. To obtain the training entity corpus relationship pairs, entity corpus 1 may be paired with each of entity corpora 2-10 to obtain 9 training entity corpus relationship pairs, then entity corpus 2 may be paired with each of entity corpora 3-10 to obtain 8 training entity corpus relationship pairs, and so on, until entity corpus 9 is paired with entity corpus 10, giving 45 training entity corpus relationship pairs in total. The order and manner of pairing the entity corpora in the training set are not limited, as long as all the entity corpora in the training set are paired and no entity corpus relationship pair is repeated.
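The pairing in the 10-entity example corresponds to enumerating all unordered pairs, which can be sketched with the Python standard library. The function name `make_relation_pairs` is hypothetical, and the integers simply stand in for the numbered entity corpora.

```python
from itertools import combinations

def make_relation_pairs(entities):
    """Pair every entity with every other entity exactly once (unordered pairs)."""
    return list(combinations(entities, 2))

# Entity corpora numbered 1..10, as in the example.
training_set = list(range(1, 11))
pairs = make_relation_pairs(training_set)

# 9 + 8 + ... + 1 pairs in total.
pair_count = len(pairs)
```

For n entity corpora this yields n(n-1)/2 pairs, so the number of relationship pairs grows quadratically with the size of the training set.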
      After step S102 is completed, step S103 is executed to obtain each training sentence matrix vector corresponding to each training entity corpus relationship pair.
      Further, the obtaining of each training statement matrix vector corresponding to each training entity corpus relationship pair specifically includes:
      acquiring a first set of word vectors corresponding to all words forming the training set, wherein each entity corpus in the training set is formed by a plurality of words;
      and acquiring a training sentence matrix vector of each training entity corpus relationship pair based on the first set, wherein the training sentence matrix vector is composed of a plurality of word vectors.
      Specifically, the word vectors in the first set may be obtained by using a Word2Vec model, which converts natural language into a vector form that a computer can process. For example, suppose an entity corpus in the training set is "I love Beijing", which consists of the 3 words "I", "love", and "Beijing". The Word2Vec model converts "I", "love", and "Beijing" into word vectors, which may be [1,0,0], [0,1,0], and [0,0,1]; the length of a word vector is determined by the number of non-repeated words constituting the preset entity corpus. Each word corresponds to one word vector, and the first set includes the word vectors corresponding to all non-repeated words constituting the training set.
      After the first set is obtained, the training sentence matrix vector of each training entity corpus relationship pair in the training set can be obtained based on it. Each training entity corpus relationship pair consists of two training entity corpora, that is, of a plurality of words, so the corresponding training sentence matrix vector is composed of the word vectors of the words that make up the pair.
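The construction of the first set and of a training sentence matrix vector can be sketched as below. The sketch mirrors the one-hot vectors of the example rather than a trained Word2Vec model, which would normally produce dense vectors. Whitespace tokenization, the helper names, and the second sentence "I love Shanghai" are assumptions made only for illustration.

```python
def build_word_vectors(entity_corpora):
    """Map each non-repeated word to a one-hot vector (the 'first set')."""
    vocab = []
    for sentence in entity_corpora:
        for word in sentence.split():      # assumption: whitespace tokenization
            if word not in vocab:
                vocab.append(word)
    size = len(vocab)
    return {
        word: [1.0 if i == j else 0.0 for j in range(size)]
        for i, word in enumerate(vocab)
    }

def sentence_matrix(pair, word_vectors):
    """Stack the word vectors of both entity corpora in a pair, one row per word."""
    first, second = pair
    return [word_vectors[w] for w in first.split() + second.split()]

training_set = ["I love Beijing", "I love Shanghai"]   # hypothetical corpora
vectors = build_word_vectors(training_set)
matrix = sentence_matrix((training_set[0], training_set[1]), vectors)
print(len(matrix), len(matrix[0]))  # 6 4
```

The resulting matrix has one row per word of the pair and one column per non-repeated word of the training set.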
      After step S103 is executed, step S104 is executed, in which the convolutional neural network is used to process each training sentence matrix vector to obtain the training classification probability of each training entity corpus relationship pair.
      Further, referring to fig. 3, processing the matrix vectors of the training sentences by using a convolutional neural network to obtain the training classification probability of the corpus relationship pairs of the training entities, specifically including the following steps:
      step S104a, performing convolution operation on each training statement matrix vector to obtain training characteristic information corresponding to the training entity corpus relation pair.
      Step S104b, sampling each training characteristic information, and obtaining a plurality of training optimal characteristics of each training entity corpus pair.
      Step S104c, merging the training optimal features to obtain the training local optimal features of each training entity corpus pair.
      And step S104d, processing each training local optimal feature by using a Softmax model, and acquiring the training classification probability of each training entity corpus pair.
      In step S104, step S104a is executed first, a convolution operation is performed on each training sentence matrix vector, and training feature information corresponding to the training entity corpus relationship pair is acquired.
      Specifically, after a training sentence matrix vector is input into the convolutional neural network, the convolutional layer of the convolutional neural network performs a convolution operation on it to obtain the training feature information corresponding to the training entity corpus relationship pair. For example, the convolutional layer presets the size of a filtering window for a filter and uses the filter to convolve the input matrix vector. If the size of the filtering window is m and a bias is added in the convolution operation, the feature information after the convolution operation can be expressed as:
      $c_i = f(w \cdot x_{i:i+m-1} + b)$
      where $c_i$ is the i-th feature value after the convolution operation, $f(\cdot)$ is the convolution kernel function selected for this layer, $w$ is the weight matrix of the filter with $w \in \mathbb{R}^{h \times m}$, $h \times m$ is the size of the selected filtering window, $b \in \mathbb{R}$ is the bias, and $x_{i:i+m-1}$ denotes the words from the i-th word to the (i+m-1)-th word of the text sentence. In addition, the convolutional layer may perform convolution operations using a plurality of filters, each of which may set its own filtering window size.
      After the convolution layer of the convolutional neural network performs convolution operation on the training sentence matrix vector, the obtained training feature information corresponding to the training entity corpus can be represented as a feature matrix c:
      $c = [c_1, c_2, \ldots, c_{n-h+1}]$
      where $c \in \mathbb{R}^{n-h+1}$.
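The convolution of step S104a can be sketched in plain Python as follows. This is an illustration under stated assumptions: the kernel function f is taken to be ReLU, the filter spans a window of consecutive word rows across the full word-vector width, and the numeric values are made up. None of these choices is specified in the original text.

```python
def relu(value):
    """Assumed choice for the kernel function f; the text does not fix one."""
    return max(0.0, value)

def convolve(sentence_matrix, weights, bias, activation=relu):
    """Slide one filter over consecutive word rows of a sentence matrix.

    weights has one row per word in the window and one column per
    word-vector component; each window position yields one feature value
    f(w . x + b).
    """
    window = len(weights)
    n = len(sentence_matrix)
    features = []
    for i in range(n - window + 1):
        rows = sentence_matrix[i:i + window]
        total = sum(
            w * x
            for w_row, x_row in zip(weights, rows)
            for w, x in zip(w_row, x_row)
        )
        features.append(activation(total + bias))
    return features

# Hypothetical 4-word sentence with 2-dimensional word vectors.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filter_weights = [[1.0, -1.0], [0.5, 0.5]]   # window of 2 words
feature_map = convolve(sentence, filter_weights, bias=0.0)
print(feature_map)  # [1.5, 0.0, 0.0]
```

With n words and a window of 2 words, the filter produces n - 2 + 1 = 3 feature values, matching the length of the feature matrix c for that window size.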
      After the feature information is obtained, step S104b is executed to perform sampling processing on each training feature information, and obtain a plurality of training optimal features of each training entity corpus pair.
      Specifically, for example, after the convolutional layer of the convolutional neural network performs the convolution operation on the training sentence matrix vector, a plurality of convolution results (e.g., the feature matrix c) are obtained. The pooling layer of the convolutional neural network may sample these convolution results by the max-pooling method, taking the maximum value to obtain the training optimal features of the training entity corpus pair.
      After the step S104b is completed, the step S104c is executed to perform merging processing on the training optimal features, so as to obtain the training local optimal features of each training entity corpus pair.
      Specifically, the plurality of training optimal features are combined into one training local optimal feature, which aggregates the statistics of the plurality of training optimal features and reduces the dimensionality of the optimal features.
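Steps S104b and S104c can be sketched together. The sketch assumes that each filter produces one feature map, that max-pooling keeps the single largest value of each map, and that merging means collecting those maxima into one vector. The text does not spell out the merge operation, so that part in particular is an assumption, as are the function names and the sample values.

```python
def max_pool(feature_map):
    """Keep the largest value of one feature map (one 'optimal feature')."""
    return max(feature_map)

def merge_pooled(feature_maps):
    """Collect one pooled value per filter into a single feature vector."""
    return [max_pool(feature_map) for feature_map in feature_maps]

# Hypothetical feature maps from three filters for one entity corpus pair.
feature_maps = [
    [1.5, 0.0, 0.0],
    [0.2, 0.9, 0.4],
    [0.0, 0.3, 0.1],
]
local_optimal = merge_pooled(feature_maps)
print(local_optimal)  # [1.5, 0.9, 0.3]
```

Nine feature values are reduced to three here, which illustrates the dimensionality reduction mentioned above.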
      And after the step S104c is executed, the step S104d is executed, the Softmax model is used for processing each training local optimal feature, and the training classification probability of each training entity corpus pair is obtained.
      Specifically, after receiving the training local optimal features, the fully-connected layer of the convolutional neural network performs relationship classification on the local optimal features by using a Softmax model to obtain classification probability.
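The Softmax classification of step S104d can be sketched as a linear layer followed by a normalized exponential. The two-class layout (similar, non-similar), the helper names, and all weight values below are assumptions for illustration; in practice the weights would be learned during training.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]   # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, weights, biases):
    """Fully connected layer followed by Softmax; one weight row per class."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)

# Hypothetical parameters for two classes: [similar, non-similar].
features = [1.5, 0.9, 0.3]
weights = [[0.8, 0.1, -0.2], [-0.5, 0.3, 0.4]]
biases = [0.0, 0.1]
probabilities = classify(features, weights, biases)
```

The first entry of `probabilities` can then be read as the classification probability that the pair is similar.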
      After step S104 or step S104d is completed, step S105 is executed to determine the similarity between the training entity corpuses in the training entity corpus relationship pair based on the training classification probability.
      Specifically, the similarity between the training entity corpora in a training entity corpus relationship pair may be expressed as similar (Y) or non-similar (N). The similarity is determined based on the training classification probability: if the training classification probability is greater than a preset threshold, the training entity corpora in the pair are determined to be similar; if it is less than the preset threshold, they are determined to be non-similar.
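The threshold decision of step S105 is small enough to show directly. The threshold value 0.5 and the function name are assumptions. The text does not say what happens when the probability equals the threshold, so this sketch arbitrarily treats that case as non-similar.

```python
def judge_similarity(similar_probability, threshold=0.5):
    """Return 'Y' (similar) or 'N' (non-similar) for one relationship pair.

    Assumption: a probability exactly equal to the threshold counts as 'N'.
    """
    return "Y" if similar_probability > threshold else "N"

print(judge_similarity(0.83))  # Y
print(judge_similarity(0.21))  # N
```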
      Further, referring to fig. 4, after the step S105 is executed, the training method further includes the following steps:
      step S201, pairing any entity corpus in the test set with each entity corpus except the entity corpus until all entity corpuses in the test set are paired, thereby obtaining a plurality of test entity corpus relationship pairs, wherein the test set is composed of a plurality of entity corpuses.
      Step S202, obtaining each test statement matrix vector corresponding to each test entity corpus relationship pair.
      Step S203, processing the matrix vectors of the test statements by using the convolutional neural network, and acquiring the classification probability of the corpus relationship pairs of the test entities.
      And step S204, outputting the classification probability of each corpus relationship pair of the test entities, so that a user can judge whether the convolutional neural network needs to be trained again based on the classification probability.
      After step S104d is executed, step S201, step S202, step S203, and step S204 are executed in sequence, wherein the specific method for executing the test set in step S201, step S202, and step S203 is the same as the specific method for executing the training set in step S102, step S103, and step S104, respectively, and is not described herein again.
      When step S203 is executed, the method specifically includes:
      performing convolution operation on each test statement matrix vector to acquire test characteristic information corresponding to the test entity corpus relationship pair;
      sampling each test characteristic information to obtain a plurality of test optimal characteristics of each test entity corpus pair;
      merging the test optimal features to obtain test local optimal features of the corpus pairs of the test entities;
      and processing the local optimal characteristics of each test by using a Softmax model to obtain the test classification probability of each test entity corpus pair.
      The specific method for executing the test set specifically included in step S203 is the same as the specific method for executing the training set in steps S104a to S104d, and is not repeated here.
      For step S204, specifically, the classification probability of each corpus pair of test entities is output, where the classification probability includes a group of classification probability values, and the classification probability value is a classification relative probability corresponding to a value of a local optimal feature, and the convolutional neural network outputs the classification probability and words corresponding to the classification probability, and evaluates the output result by using the following formula:
      
      
      
      where $r_i$ represents the number of test entity corpus relationship pairs correctly classified into the i-th class, $t_i$ is the total number of training entity corpus relationship pairs determined as the i-th class, $a_i$ is the total number of test entity corpus relationship pairs of the i-th class in the test set, and $F_1$ is a predefined index function.
      Further, the convolutional neural network outputs the accuracy, the recall rate, and F1 to the user, so that the user can judge, based on the accuracy, the recall rate, and F1, whether the convolutional neural network needs to be trained again. For example, if the accuracy is 60%, the user may determine that the convolutional neural network needs to be trained again.
      Further, when the convolutional neural network outputs the accuracy, the recall rate, and F1 to the user, these values may be output in text format through a display interface of the training device.
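The evaluation formulas themselves are not reproduced in this text, so the following sketch relies on the conventional definitions, which are consistent with the symbols defined above: accuracy (precision) as r_i / t_i, recall as r_i / a_i, and F1 as their harmonic mean. These formulas, the function name, and the sample counts are assumptions, not a transcription of the original.

```python
def evaluate(correct, predicted, actual):
    """Compute accuracy (precision), recall, and F1 for one class.

    correct   -- pairs correctly classified into the class (r_i)
    predicted -- pairs determined as the class (t_i)
    actual    -- pairs of the class in the test set (a_i)
    """
    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts chosen to reproduce the 60% accuracy of the example.
precision, recall, f1 = evaluate(correct=30, predicted=50, actual=40)
print(f"accuracy={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

A user could compare these values against a chosen acceptance level to decide whether to retrain the network.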
      Further, the preset entity corpus is a preset tax entity corpus. For example, the preset tax entity corpus may be an intelligent customer service tax knowledge base (the knowledge base contains 7000 relevant tax knowledge items and 11000 extended questions).
      For example, referring to fig. 5, the method for determining similarity between entity corpora is applied to an intelligent customer service system, where the preset entity corpus is a preset tax entity corpus, and the method executed by the intelligent customer service system is as follows:
      A training set or a test set is randomly extracted from the preset tax entity corpus, and the entity corpora in the training set or the test set are paired to obtain a plurality of training entity corpus relationship pairs or test entity corpus relationship pairs. If training entity corpus relationship pairs are input into the convolutional neural network, the output is the similarity between the training entity corpora in the training set, which may be similar or non-similar; if test entity corpus relationship pairs are input into the convolutional neural network, the output is the accuracy, the recall rate, F1, and the like, which give the user the information needed to decide whether retraining is necessary.
      After the intelligent customer service executes the method, when the user uses the intelligent customer service system, even if the input information is inaccurate, the system can search similar entity corpora in the preset tax entity corpus based on the information input by the user, so that answers required by the user are searched.
      Referring to fig. 6, based on the same inventive concept, a second embodiment of the present invention provides an apparatus for determining similarity between entity corpuses, including:
      an extracting unit 601, configured to randomly extract a training set from a preset entity corpus, where the training set is composed of a plurality of entity corpora;
      a first pairing unit 602, configured to pair any entity corpus in the training set with each entity corpus except the entity corpus until all entity corpuses in the training set are paired, so as to obtain a plurality of training entity corpus relationship pairs;
      a first obtaining unit 603, configured to obtain each training statement matrix vector corresponding to each training entity corpus relationship pair;
      a second obtaining unit 604, configured to process the matrix vectors of the training sentences by using a convolutional neural network, and obtain a training classification probability of each training entity corpus relationship pair;
      a determining unit 605, configured to determine, based on the training classification probability, a similarity between the training entity corpuses in the training entity corpus relationship pair.
      Optionally, the first obtaining unit specifically includes:
      a first obtaining subunit, configured to obtain a first set of word vectors corresponding to all words forming the training set, where each entity corpus in the training set is formed by a plurality of words;
      and a second obtaining subunit, configured to obtain a training sentence matrix vector of each training entity corpus relationship pair based on the first set, where the training sentence matrix vector is formed by a plurality of word vectors.
      Optionally, the second obtaining unit specifically includes:
      the first operation subunit is used for performing convolution operation on the matrix vectors of the training sentences to acquire training characteristic information corresponding to the training entity corpus relationship pairs;
      the first sampling subunit is used for sampling and processing each training characteristic information to obtain a plurality of training optimal characteristics of each training entity corpus pair;
      the first merging subunit is used for merging the training optimal features to obtain training local optimal features of the training entity corpus pairs;
      and the first classification subunit is used for processing each training local optimal feature by using a Softmax model and acquiring the training classification probability of each training entity corpus pair.
      Optionally, the extracting unit is further configured to:
      extracting a test set from a preset entity corpus by using a random extraction algorithm; the union of the training set and the test set is the preset entity corpus, and the training set and the test set have no intersection.
      Optionally, the apparatus further comprises:
      a second pairing unit, configured to pair any entity corpus in the test set with each entity corpus except the entity corpus after the test classification probability of each training entity corpus relationship pair is obtained by processing the training statement matrix vector by using a convolutional neural network, until all entity corpuses in the test set are paired, thereby obtaining a plurality of test entity corpus relationship pairs, where the test set is composed of a plurality of entity corpuses;
      a third obtaining unit, configured to obtain each test statement matrix vector corresponding to each test entity corpus relationship pair;
      a fourth obtaining unit, configured to process the test statement matrix vectors by using the convolutional neural network, and obtain a classification probability of the corpus relationship pair of each test entity;
      and the output unit is used for outputting the classification probability of each corpus relationship pair of the test entities, so that a user can judge whether the convolutional neural network needs to be trained again based on the classification probability.
      Optionally, the third obtaining unit specifically includes:
      a third obtaining subunit, configured to obtain a second set of word vectors corresponding to all words forming the test set, where each entity corpus in the test set is formed by a plurality of words;
      and a fourth obtaining subunit, configured to obtain, based on the second set, a test statement matrix vector of each test entity corpus relationship pair, where the test statement matrix vector is formed by a plurality of word vectors.
      Optionally, the fourth obtaining unit specifically includes:
      the second operation subunit is used for performing convolution operation on the matrix vectors of the test statements to acquire test characteristic information corresponding to the corpus relationship pair of the test entity;
      the second sampling subunit is used for sampling and processing each test characteristic information to obtain a plurality of test optimal characteristics of each test entity corpus pair;
      the second merging subunit is used for merging the test optimal features to obtain the test local optimal features of each test entity corpus pair;
      and the second classification subunit is used for processing the local optimal characteristics of each test by using a Softmax model and acquiring the test classification probability of each test entity corpus pair.
      Optionally, the preset entity corpus is a preset tax entity corpus.
      Referring to fig. 7, based on the same inventive concept, a third embodiment of the present invention provides an apparatus for determining similarity between entity corpuses, including:
      at least one processor 701, and a memory 702 coupled to the at least one processor;
      wherein the memory 702 stores instructions executable by the at least one processor 701, and the at least one processor 701 executes the steps of the method as described in the above method embodiments by executing the instructions stored by the memory 702.
      Optionally, the processor 701 may specifically include a Central Processing Unit (CPU) and an Application Specific Integrated Circuit (ASIC), which may be one or more integrated circuits for controlling program execution, may be a hardware circuit developed by using a Field Programmable Gate Array (FPGA), and may be a baseband processor.
      Optionally, the processor 701 may include at least one processing core.
      Optionally, the apparatus further includes a memory 702, and the memory 702 may include a Read Only Memory (ROM), a Random Access Memory (RAM), and a disk memory. The memory 702 is used for storing data required by the processor 701 in operation.
      Based on the same inventive concept, a fourth embodiment of the present invention provides a computer-readable storage medium, including:
      the computer-readable storage medium has stored thereon computer instructions which, when executed by at least one processor of the training apparatus, implement the method as described in the above-mentioned method embodiments.
      The technical scheme in the embodiment of the invention at least has the following technical effects or advantages:
      In the present invention, the device for determining similarity between entity corpora performs the method for determining similarity between entity corpora. That is, it randomly extracts a training set from a preset entity corpus; pairs any entity corpus in the training set with each entity corpus except that entity corpus until all entity corpora in the training set are paired, thereby obtaining a plurality of training entity corpus relationship pairs; obtains each training sentence matrix vector corresponding to each training entity corpus relationship pair; and processes each training sentence matrix vector by using a convolutional neural network to obtain the training classification probability of each training entity corpus relationship pair. This completes the learning process of the convolutional neural network for the preset entity corpus, so that an intelligent customer service using the convolutional neural network and the preset entity corpus can provide the user with an accurate answer search function. The invention thereby solves the technical problem in the prior art that, when the information input by a user is inaccurate, the intelligent customer service system cannot find the correct answer in its knowledge base and the user experience is degraded, and achieves the technical effect of improving the user experience.
      While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
      As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
      Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
      These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
      These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
      It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
    Claims (18)
1. A method for determining similarity between entity corpuses, comprising:
      randomly extracting a training set from a preset entity corpus, wherein the training set is composed of a plurality of entity corpora;
      matching any entity corpus in the training set with each entity corpus except the entity corpus until all the entity corpuses in the training set are matched, thereby obtaining a plurality of training entity corpus relation pairs;
      acquiring each training statement matrix vector corresponding to each training entity corpus relationship pair;
      processing the training statement matrix vectors by using a convolutional neural network to obtain the training classification probability of the training entity corpus relationship pairs;
      and determining the similarity between the training entity corpora in the training entity corpus relationship pair based on the training classification probability.
    2. The method according to claim 1, wherein the obtaining of each training sentence matrix vector corresponding to each training entity corpus relationship pair specifically comprises:
      acquiring a first set of word vectors corresponding to all words forming the training set, wherein each entity corpus in the training set is formed by a plurality of words;
      and acquiring a training sentence matrix vector of each training entity corpus relationship pair based on the first set, wherein the training sentence matrix vector is composed of a plurality of word vectors.
    3. The method according to claim 1 or 2, wherein the processing the matrix vector of each training sentence by using a convolutional neural network to obtain the training classification probability of each corpus relationship pair of the training entities specifically comprises:
      performing convolution operation on each training statement matrix vector to acquire training characteristic information corresponding to the training entity corpus relationship pair;
      sampling each training characteristic information to obtain a plurality of training optimal characteristics of each training entity corpus pair;
      combining the training optimal features to obtain training local optimal features of the training entity corpus pairs;
      and processing each training local optimal characteristic by using a Softmax model to obtain the training classification probability of each training entity corpus pair.
    4. The method of claim 1 or 2, wherein the randomly extracting training sets from a predetermined corpus of entities further comprises:
      extracting a test set from a preset entity corpus by using a random extraction algorithm; the union of the training set and the test set is the preset entity corpus, and the training set and the test set have no intersection.
    5. The method of claim 4, wherein after said processing each of said training sentence matrix vectors using a convolutional neural network to obtain a test classification probability for each of said training entity corpus relationship pairs, said method further comprises:
      pairing any entity corpus in the test set with each entity corpus except the entity corpus until all the entity corpuses in the test set are paired, so as to obtain a plurality of test entity corpus relation pairs, wherein the test set consists of a plurality of entity corpuses;
      obtaining each test statement matrix vector corresponding to each test entity corpus relationship pair;
      processing the matrix vector of each test statement by using the convolutional neural network to obtain the classification probability of the corpus relationship pair of each test entity;
      and outputting the classification probability of each corpus relationship pair of the test entities, so that a user can judge whether the convolutional neural network needs to be trained again based on the classification probability.
    6. The method according to claim 5, wherein said obtaining each test statement matrix vector corresponding to each test entity corpus relationship pair specifically comprises:
      acquiring a second set of word vectors corresponding to all words forming the test set, wherein each entity corpus in the test set is formed by a plurality of words;
      and acquiring a test statement matrix vector of each test entity corpus relationship pair based on the second set, wherein the test statement matrix vector is composed of a plurality of word vectors.
    7. The method according to claim 5 or 6, wherein the processing of the matrix vector of each test statement by using the convolutional neural network to obtain the test classification probability of each test entity corpus relationship pair specifically comprises:
      performing convolution operation on each test statement matrix vector to acquire test characteristic information corresponding to the test entity corpus relationship pair;
      sampling each test characteristic information to obtain a plurality of test optimal characteristics of each test entity corpus pair;
      merging the test optimal features to obtain test local optimal features of the corpus pairs of the test entities;
      and processing the local optimal characteristics of each test by using a Softmax model to obtain the test classification probability of each test entity corpus pair.
    8. The method of claim 1, 2, 5, or 6, wherein the predetermined corpus of entities is a predetermined taxation corpus of entities.
    9. An apparatus for determining similarity between entity corpuses, comprising:
      the extraction unit is used for randomly extracting a training set from a preset entity corpus, wherein the training set is composed of a plurality of entity corpora;
      a first matching unit, configured to match any entity corpus in the training set with each entity corpus except the entity corpus until all entity corpuses in the training set are matched, so as to obtain a plurality of training entity corpus relationship pairs;
      the first acquisition unit is used for acquiring each training statement matrix vector corresponding to each training entity corpus relationship pair;
      a second obtaining unit, configured to process the matrix vectors of the training sentences by using a convolutional neural network, and obtain a training classification probability of each training entity corpus relationship pair;
      and the determining unit is used for determining the similarity between the training entity corpora in the training entity corpus relationship pair based on the training classification probability.
    10. The apparatus of claim 9, wherein the first obtaining unit specifically comprises:
      a first obtaining subunit, configured to obtain a first set of word vectors corresponding to all words forming the training set, where each entity corpus in the training set is formed by a plurality of words;
      and a second obtaining subunit, configured to obtain a training sentence matrix vector of each training entity corpus relationship pair based on the first set, where the training sentence matrix vector is formed by a plurality of word vectors.
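As a rough illustration of how a sentence matrix vector might be assembled from word vectors, the sketch below looks up each word of a relationship pair in a word-vector table and stacks the results row by row. The whitespace tokenisation, the concatenation of the two entity corpora in order, and the two-dimensional toy vectors are all assumptions; the claim does not fix any of them. `sentence_matrix` and `word_vectors` are hypothetical names.

```python
def sentence_matrix(pair, word_vectors):
    """Stack the word vectors of both entity corpora of a pair, row by row."""
    first, second = pair
    words = first.split() + second.split()  # assumed whitespace tokenisation
    return [word_vectors[word] for word in words]

# Toy word-vector table standing in for the "first set" of word vectors.
word_vectors = {
    "income": [0.2, 0.1],
    "tax": [0.9, 0.4],
    "sales": [0.3, 0.8],
}
matrix = sentence_matrix(("income tax", "sales tax"), word_vectors)
print(matrix)
```

Each row of the resulting matrix is one word vector, so its height follows the number of words in the pair and its width is the word-vector dimension.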
    11. The apparatus according to claim 9 or 10, wherein the second obtaining unit specifically comprises:
      a first operation subunit, configured to perform a convolution operation on each training sentence matrix vector to obtain training feature information corresponding to each training entity corpus relationship pair;
      a first sampling subunit, configured to sample each piece of training feature information to obtain a plurality of training optimal features of each training entity corpus relationship pair;
      a first merging subunit, configured to merge the training optimal features to obtain a training local optimal feature of each training entity corpus relationship pair;
      and a first classification subunit, configured to process each training local optimal feature by using a Softmax model to obtain the training classification probability of each training entity corpus relationship pair.
    12. The apparatus of claim 9 or 10, wherein the extraction unit is further configured to:
      extract a test set from the preset entity corpus by using a random extraction algorithm, wherein the union of the training set and the test set is the preset entity corpus, and the training set and the test set do not intersect.
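The constraint in this claim (training set and test set together cover the preset entity corpus and share no element) can be met by shuffling once and cutting the shuffled list in two. The sketch below assumes an 80/20 split and a fixed seed purely for illustration; the claim specifies neither. `split_corpus`, `train_fraction` and `seed` are hypothetical names.

```python
import random

def split_corpus(corpus, train_fraction, seed=None):
    """Randomly partition a corpus into disjoint training and test sets."""
    rng = random.Random(seed)
    shuffled = list(corpus)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical preset entity corpus of ten items.
corpus = [f"entity_{i}" for i in range(10)]
train, test = split_corpus(corpus, train_fraction=0.8, seed=42)
print(len(train), len(test))
```

Because both sets are slices of the same shuffled list, their union is the whole corpus and their intersection is empty by construction, provided the corpus contains no duplicate entries.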
    13. The apparatus of claim 12, wherein the apparatus further comprises:
      a second pairing unit, configured to, after the training sentence matrix vectors have been processed by using the convolutional neural network to obtain the training classification probability of each training entity corpus relationship pair, pair any entity corpus in the test set with each of the other entity corpora in the test set until all entity corpora in the test set have been paired, so as to obtain a plurality of test entity corpus relationship pairs, wherein the test set is composed of a plurality of entity corpora;
      a third obtaining unit, configured to obtain the test sentence matrix vector corresponding to each test entity corpus relationship pair;
      a fourth obtaining unit, configured to process each test sentence matrix vector by using the convolutional neural network to obtain a test classification probability of each test entity corpus relationship pair;
      and an output unit, configured to output the test classification probability of each test entity corpus relationship pair, so that a user can judge, based on the test classification probability, whether the convolutional neural network needs to be retrained.
    14. The apparatus according to claim 13, wherein the third obtaining unit specifically includes:
      a third obtaining subunit, configured to obtain a second set of word vectors corresponding to all words forming the test set, where each entity corpus in the test set is formed by a plurality of words;
      and a fourth obtaining subunit, configured to obtain, based on the second set, a test sentence matrix vector of each test entity corpus relationship pair, where the test sentence matrix vector is formed by a plurality of word vectors.
    15. The apparatus according to claim 13 or 14, wherein the fourth obtaining unit specifically comprises:
      a second operation subunit, configured to perform a convolution operation on each test sentence matrix vector to obtain test feature information corresponding to each test entity corpus relationship pair;
      a second sampling subunit, configured to sample each piece of test feature information to obtain a plurality of test optimal features of each test entity corpus relationship pair;
      a second merging subunit, configured to merge the test optimal features to obtain a test local optimal feature of each test entity corpus relationship pair;
      and a second classification subunit, configured to process each test local optimal feature by using a Softmax model to obtain the test classification probability of each test entity corpus relationship pair.
    16. The apparatus of claim 9, 10, 13, or 14, wherein the preset entity corpus is a preset taxation entity corpus.
    17. An apparatus for determining similarity between entity corpora, comprising:
      at least one processor, and a memory coupled to the at least one processor;
      wherein the memory stores instructions executable by the at least one processor, the at least one processor performing the method of any one of claims 1-8 by executing the instructions stored by the memory.
    18. A computer-readable storage medium, comprising:
      the computer-readable storage medium having stored thereon computer instructions which, when executed by at least one processor of the apparatus for determining similarity between entity corpora, implement the method according to any one of claims 1-8.
    Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201811151935.4A (CN110969005B, en) | 2018-09-29 | 2018-09-29 | Method and device for determining similarity between entity corpora | 
Publications (2)
| Publication Number | Publication Date | 
|---|---|
| CN110969005A (en) | 2020-04-07 | 
| CN110969005B (en) | 2023-10-31 | 
Family
ID=70027498
Family Applications (1)
| Application Number | Status | Publication | Priority Date | Filing Date | Title | 
|---|---|---|---|---|---|
| CN201811151935.4A | Active | CN110969005B (en) | 2018-09-29 | 2018-09-29 | Method and device for determining similarity between entity corpora | 
Country Status (1)
| Country | Link | 
|---|---|
| CN (1) | CN110969005B (en) | 
Patent Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US20120158687A1 (en) * | 2010-12-17 | 2012-06-21 | Yahoo! Inc. | Display entity relationship | 
| US20160148116A1 (en) * | 2014-11-21 | 2016-05-26 | International Business Machines Corporation | Extraction of semantic relations using distributional relation detection | 
| CN108292310A (en) * | 2015-11-05 | 2018-07-17 | 微软技术许可有限责任公司 | For the relevant technology of digital entities | 
| CN106484675A (en) * | 2016-09-29 | 2017-03-08 | 北京理工大学 | Fusion distributed semantic and the character relation abstracting method of sentence justice feature | 
| US20180157643A1 (en) * | 2016-12-06 | 2018-06-07 | Siemens Aktiengesellschaft | Device and method for natural language processing | 
| CN107038480A (en) * | 2017-05-12 | 2017-08-11 | 东华大学 | A kind of text sentiment classification method based on convolutional neural networks | 
| CN107220237A (en) * | 2017-05-24 | 2017-09-29 | 南京大学 | A kind of method of business entity's Relation extraction based on convolutional neural networks | 
| CN108021555A (en) * | 2017-11-21 | 2018-05-11 | 浪潮金融信息技术有限公司 | A kind of Question sentence parsing measure based on depth convolutional neural networks | 
Non-Patent Citations (3)
| Title | 
|---|
| MATTHEW FRANCIS-LANDAU ET AL: "Capturing Semantic Similarity for Entity Linking with Convolutional Neural Networks", ARXIV, pages 1 - 7 * | 
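| Liu Kai; Fu Haidong; Zou Yuwei; Gu Jinguang: "Weakly supervised relation extraction for Chinese medical text based on convolutional neural networks", Computer Science (计算机科学), no. 10, pages 254 - 258 * | 
| Wei Yong: "Text classification method combining associative semantics with convolutional neural networks", Control Engineering (控制工程), no. 02, pages 187 - 190 * | 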
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN112101041A (en) * | 2020-09-08 | 2020-12-18 | 平安科技(深圳)有限公司 | Entity relationship extraction method, device, equipment and medium based on semantic similarity | 
| WO2021121198A1 (en) * | 2020-09-08 | 2021-06-24 | 平安科技(深圳)有限公司 | Semantic similarity-based entity relation extraction method and apparatus, device and medium | 
| CN112487201A (en) * | 2020-11-26 | 2021-03-12 | 西北工业大学 | Knowledge graph representation method using shared parameter convolutional neural network | 
| CN113051900A (en) * | 2021-04-30 | 2021-06-29 | 中国平安人寿保险股份有限公司 | Synonym recognition method and device, computer equipment and storage medium | 
| CN113051900B (en) * | 2021-04-30 | 2023-08-22 | 中国平安人寿保险股份有限公司 | Synonym recognition method, synonym recognition device, computer equipment and storage medium | 
Also Published As
| Publication number | Publication date | 
|---|---|
| CN110969005B (en) | 2023-10-31 | 
Similar Documents
| Publication | Publication Date | Title | 
|---|---|---|
| CN108363790B (en) | Method, device, equipment and storage medium for evaluating comments | |
| CN116561538A (en) | Question-answer scoring method, question-answer scoring device, electronic equipment and storage medium | |
| CN113220832B (en) | Text processing method and device | |
| CN113255331B (en) | Text error correction method, device and storage medium | |
| CN111125295B (en) | A method and system for obtaining answers to food safety questions based on LSTM | |
| CN110968725B (en) | Image content description information generation method, electronic device and storage medium | |
| CN117453895B (en) | Intelligent customer service response method, device, equipment and readable storage medium | |
| CN111460149A (en) | Text classification method, related device and readable storage medium | |
| CN112101042A (en) | Text emotion recognition method and device, terminal device and storage medium | |
| CN110852071B (en) | Knowledge point detection method, device, equipment and readable storage medium | |
| CN116842168B (en) | Cross-domain problem processing method and device, electronic equipment and storage medium | |
| CN107783958B (en) | Target statement identification method and device | |
| CN110969005B (en) | Method and device for determining similarity between entity corpora | |
| CN112632956A (en) | Text matching method, device, terminal and storage medium | |
| CN117094383A (en) | Joint training method, system, equipment and storage medium for language model | |
| CN107797981B (en) | Target text recognition method and device | |
| CN110334204B (en) | Exercise similarity calculation recommendation method based on user records | |
| CN112464664A (en) | Multi-model fusion Chinese vocabulary repeated description extraction method | |
| CN117056545A (en) | Question data generation method, content acquisition method and device | |
| CN115481620A (en) | Phonetic spelling error correction method, device, computer equipment and storage medium | |
| CN112686052B (en) | Test question recommendation and related model training method, electronic equipment and storage device | |
| CN115292492A (en) | Method, device and equipment for training intention classification model and storage medium | |
| CN116955697A (en) | Analysis method, analysis device, analysis equipment, analysis medium and analysis program product for search results | |
| CN108733757B (en) | Text search method and system | |
| CN113886521A (en) | An automatic labeling method of text relations based on similar vocabulary | 
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||