
CN116842194B - A power semantic knowledge graph system and method - Google Patents

A power semantic knowledge graph system and method

Info

Publication number
CN116842194B
CN116842194B (application CN202310806140.7A)
Authority
CN
China
Prior art keywords
semantic
power
feature vector
optimized
audit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310806140.7A
Other languages
Chinese (zh)
Other versions
CN116842194A (en)
Inventor
梁寿愚
何宇斌
李映辰
张坤
吴小刚
李文朝
胡荣
周华锋
江伟
顾慧杰
符秋稼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Southern Power Grid Co Ltd
Original Assignee
China Southern Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Southern Power Grid Co Ltd filed Critical China Southern Power Grid Co Ltd
Priority to CN202310806140.7A
Publication of CN116842194A
Application granted
Publication of CN116842194B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G06F40/35Discourse or dialogue representation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12Accounting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Technology Law (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • Human Computer Interaction (AREA)

Abstract

A power semantic knowledge graph system and method are disclosed. An intelligent question-answering system is constructed using deep-learning-based natural language processing combined with a power-domain knowledge graph, so as to optimize the power audit workflow and improve audit efficiency.

Description

Electric power semantic knowledge graph system and method
Technical Field
The application relates to the field of electric power, and in particular relates to an electric power semantic knowledge graph system and an electric power semantic knowledge graph method.
Background
The audit workflow of an electric power system is complex and involves many links. The audit data are not confined to a single vertical domain and span a wide lateral range. Auditors need up-to-date knowledge of various laws, regulations, historical issues, and the like. Manual approaches typically rely on consulting paper documents or searching the Internet, which makes it difficult to obtain relevant information quickly and greatly reduces audit efficiency.
With the rapid development of artificial intelligence in recent years, big-data-driven AI technology has been merging with the electric power industry and related sectors at unprecedented breadth and depth, and AI assistance has become an urgent need in audit work. An intelligent question-answering system aims to respond automatically to questions posed by a user. However, existing question-answering systems focus mainly on open-domain chit-chat, and intelligent question-answering research in professional fields generally targets medicine, education, e-commerce and the like, with little related work in the audit field. For auditing complex business processes, existing question-answering systems cannot adequately meet the requirements of power audit question answering.
Thus, an optimized solution is desired.
Disclosure of Invention
The present application has been made to solve the above-mentioned technical problems. The embodiment of the application provides a power semantic knowledge graph system and a method, which utilize a natural language processing technology based on deep learning and combine a knowledge graph in the power field to construct an intelligent question-answering system so as to optimize a power auditing workflow and enhance auditing efficiency.
According to one aspect of the present application, there is provided a power semantic knowledge graph system, comprising:
The problem acquisition module is used for acquiring a power audit problem;
the candidate entity extraction module is used for extracting semantic embedded representation of the first candidate entity from the power semantic knowledge graph;
the matching module is used for analyzing and processing the electric power audit problem and the semantic embedded representation of the first candidate entity based on a deep convolutional neural network model to obtain an optimized semantic matching feature matrix; and
the output result generation module is used for determining whether to output the first candidate entity based on the optimized semantic matching feature matrix.
According to another aspect of the present application, there is provided a power semantic knowledge graph method, including:
acquiring a power audit problem;
extracting a semantic embedded representation of a first candidate entity from the power semantic knowledge graph;
analyzing and processing the power audit problem and the semantic embedded representation of the first candidate entity based on a deep convolutional neural network model to obtain an optimized semantic matching feature matrix; and
determining whether to output the first candidate entity based on the optimized semantic matching feature matrix.
According to the embodiment of the disclosure, an intelligent question-answering system is constructed by utilizing a natural language processing technology based on deep learning and combining a power field knowledge graph so as to optimize a power audit workflow and enhance audit efficiency.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The above and other objects, features and advantages of the present application will become more apparent from the following detailed description of embodiments of the present application with reference to the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application, are incorporated in and constitute a part of this specification, illustrate the application together with its embodiments, and do not constitute a limitation of the application. In the drawings, like reference numerals generally refer to like parts or steps.
FIG. 1 is a block diagram of a power semantic knowledge graph system, according to an embodiment of the application;
FIG. 2 is a system architecture diagram of a power semantic knowledge graph system, according to an embodiment of the application;
FIG. 3 is a block diagram of a matching module in a power semantic knowledge-graph system, according to an embodiment of the application;
FIG. 4 is a block diagram of a semantic analysis unit in a power semantic knowledge graph system, according to an embodiment of the present application;
FIG. 5 is a block diagram of an output result generation module in a power semantic knowledge-graph system, according to an embodiment of the application;
fig. 6 is a flowchart of a power semantic knowledge graph method according to an embodiment of the application.
Detailed Description
Hereinafter, exemplary embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
As used in the specification and in the claims, the terms "a," "an," and/or "the" are not specific to the singular and may include the plural unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may also include other steps or elements.
A flowchart is used in the present application to describe the operations performed by a system according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed in order precisely. Rather, the various steps may be processed in reverse order or simultaneously, as desired. Also, other operations may be added to or removed from these processes.
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
The auditing work business flow of the electric power system is complex, and relates to a plurality of links. Wherein the audit data has complex characteristics of non-vertical field and large transverse span. Audit workers need to know various laws and regulations, historical problems, etc. in real time. The manual method generally adopts modes of paper data inquiry, internet search and the like, and related information is difficult to quickly obtain by the methods, so that auditing operation efficiency is greatly influenced. Thus, an optimized solution is desired.
Fig. 1 is a block diagram of a power semantic knowledge graph system, according to an embodiment of the application. Fig. 2 is a system architecture diagram of a power semantic knowledge graph system, according to an embodiment of the application. As shown in fig. 1 and 2, the power semantic knowledge graph system 300 according to an embodiment of the present application includes a problem obtaining module 310 configured to obtain a power audit problem, an alternative entity extracting module 320 configured to extract a semantic embedded representation of a first alternative entity from a power semantic knowledge graph, a matching module 330 configured to analyze and process the power audit problem and the semantic embedded representation of the first alternative entity based on a deep convolutional neural network model to obtain an optimized semantic matching feature matrix, and an output result generating module 340 configured to determine whether to output the first alternative entity based on the optimized semantic matching feature matrix.
In particular, the problem acquisition module 310 is configured to acquire a power audit problem. Power auditing refers to the evaluation and analysis of the electricity usage of a building or facility to determine whether energy consumption and cost can be reduced by improving the manner and equipment in which energy is used.
Specifically, in one example of the application, the power audit problem can be acquired in various ways, including: (1) a user inputs the power audit problem directly into the intelligent question-answering system, which receives and acquires it; (2) the intelligent question-answering system recognizes and extracts text related to power audit through OCR technology to acquire the power audit problem; and (3) the intelligent question-answering system recognizes and converts voice input through speech recognition technology to acquire the power audit problem.
It should be noted that, in other specific examples of the present application, the power audit problem may also be obtained by other means, for example: searching for a professional power audit company or organization that can provide professional power audit services and help identify potential energy waste and improvement opportunities; browsing the websites of such companies or organizations to learn about their service scope, fee standards, customer reviews and the like; contacting their customer service staff to consult on power audit matters, such as how a power audit is performed, which data and information need to be provided, and the period and cost of a power audit; preparing the relevant data and information, such as the electricity consumption, energy consumption and cost of a building or piece of equipment; and arranging for power auditors to carry out an on-site inspection of the building or equipment, collect more detailed data and information, and provide improvement suggestions and plans.
In particular, the candidate entity extraction module 320 is configured to extract the semantic embedded representation of the first candidate entity from the power semantic knowledge graph. Here, a Knowledge Graph (knowledgegraph) is a method for representing and organizing Knowledge based on a Graph data structure.
A knowledge graph is a semi-structured knowledge representation method that uses an entity-relationship graph to describe real-world entities and the relationships between them. After knowledge is extracted from text and databases, it goes through processing steps such as concept classification, relation extraction and semantic analysis, and is then organized as entities, attributes and relations into a large-scale graph database. A knowledge graph has the following characteristics: (1) it is easy to understand and extend, because concepts and relations are represented as nodes and edges, which is intuitive and convenient to expand; (2) it is semantically rich, because standardizing and semantically processing entities and relations improves a machine's understanding and use of the data; and (3) it supports an efficient reasoning mechanism, because the knowledge graph is built on a graph database and can be reasoned over and analyzed with graph-based algorithms.
Accordingly, in one possible implementation, the semantic embedded representation of the first candidate entity may be extracted from the power semantic knowledge graph by first determining the name or identifier of the first candidate entity. For example, assume the name of the first candidate entity is "electric automobile"; the entity node for "electric automobile" is then located in the power semantic knowledge graph. This can be accomplished with a graph query language (e.g., SPARQL) or a graph database engine (e.g., Neo4j), after which all relevant attributes and relationships of "electric automobile" are retrieved. These attributes and relationships may include the manufacturer, model, battery capacity, charging mode, horsepower, etc. of the "electric automobile". The attributes and relationships are then converted into vector representations; pre-trained natural language processing models (e.g., BERT, GPT) or graph neural network models (e.g., GCN, GAT) may be used for this conversion, and all vector representations are combined into one vector representing "electric automobile". This can be achieved by simply concatenating the vectors or by using an aggregation function (e.g., average, maximum, weighted average); the resulting vector is the semantic embedded representation of the "electric automobile". It is noted that the quality and accuracy of the semantic embedded representation depend on the model and algorithm used, as well as the completeness and accuracy of the extracted attributes and relationships.
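A minimal sketch of this retrieve-and-aggregate idea is given below. It is not the patent's implementation: it assumes a Neo4j instance holding the power knowledge graph and a pre-trained sentence encoder; the connection details, Cypher property names and model name are illustrative only.

```python
# Sketch: build one embedding vector for a candidate entity by fetching its
# attributes/relations from a graph database and mean-pooling text embeddings.
from neo4j import GraphDatabase
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed model

def entity_embedding(uri: str, auth: tuple, entity_name: str) -> np.ndarray:
    """Fetch an entity's relations and mean-pool their text embeddings."""
    driver = GraphDatabase.driver(uri, auth=auth)
    with driver.session() as session:
        records = session.run(
            "MATCH (e {name: $name})-[r]->(n) "
            "RETURN type(r) AS rel, coalesce(n.name, '') AS target",
            name=entity_name,
        )
        texts = [entity_name] + [f"{entity_name} {rec['rel']} {rec['target']}" for rec in records]
    driver.close()
    vectors = encoder.encode(texts)   # one vector per attribute/relation text
    return vectors.mean(axis=0)       # simple aggregation into one embedding

# e.g. entity_embedding("bolt://localhost:7687", ("neo4j", "password"), "electric automobile")
```

Mean pooling is only one of the aggregation choices mentioned above; concatenation or weighted averaging could be substituted.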
In particular, in one specific example of the present application, as shown in fig. 3, the matching module 330 includes a semantic analysis unit 331 configured to perform semantic analysis on the electric power audit problem to obtain a multi-scale electric power audit problem semantic coding feature vector, an association matching unit 332 configured to perform association coding on the multi-scale electric power audit problem semantic coding feature vector and the semantic embedded representation of the first candidate entity to obtain a semantic matching feature matrix, and an expression effect optimization unit 333 configured to perform expression effect optimization of text semantic association features on the semantic matching feature matrix to obtain the optimized semantic matching feature matrix.
Specifically, the semantic analysis unit 331 is configured to perform semantic analysis on the electric power audit problem to obtain a multi-scale electric power audit problem semantic coding feature vector. In one specific example of the present application, as shown in fig. 4, the semantic analysis unit 331 includes a word segmentation subunit 3311 for performing word segmentation processing on the electric power audit problem to obtain a sequence of electric power audit problem descriptors, a word embedding subunit 3312 for passing the sequence of electric power audit problem descriptors through a word embedding layer to obtain a sequence of electric power audit problem descriptor embedding vectors, and a natural language processing subunit 3313 for passing the sequence of electric power audit problem descriptor embedding vectors through a dual-pipeline model including a first natural language processing model and a second natural language processing model to obtain the multi-scale electric power audit problem semantic coding feature vector.
More specifically, the word segmentation subunit 3311 is configured to perform word segmentation processing on the electric power audit problem to obtain a sequence of electric power audit problem descriptors. In the technical scheme of the application, the word segmentation processing is carried out on the electric power audit problem, so that the electric power audit problem can be decomposed into words one by one, and the problem is changed into a sequence which is easier to process.
Word segmentation is the process of splitting a text into individual words according to certain rules. It helps a computer better understand human language, because the computer must convert human language into a form it can understand and process, and word segmentation is an important step in that conversion.
Accordingly, in one possible implementation, the power audit problem may be word-segmented to obtain the sequence of power audit problem descriptors as follows: (1) preprocess the power audit problem, including removing stop words, punctuation marks and the like, to obtain plain text; (2) segment the text into individual words using a word segmentation tool; (3) apply part-of-speech tagging to determine each word's grammatical role in the sentence, such as noun, verb or adjective; (4) remove stop words that carry no practical meaning for the power audit problem description, such as "of" or "is"; (5) apply stemming to reduce inflected forms, such as different verb tenses or noun forms, to their base forms, for example reducing "running" to "run"; and (6) obtain the sequence of power audit problem descriptors by arranging the processed words in sentence order.
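A minimal sketch of such a segmentation step follows, assuming Chinese questions and the jieba tokenizer; the stop-word set is a tiny illustrative placeholder, not the patent's list.

```python
# Sketch: segment a power audit question and drop stop words, keeping sentence order.
import jieba

STOP_WORDS = {"的", "了", "是", "吗", "，", "？", "。"}  # placeholder stop-word set

def tokenize_question(question: str) -> list[str]:
    """Split a power audit question into descriptor tokens in sentence order."""
    tokens = jieba.lcut(question)                     # word segmentation
    return [t for t in tokens if t.strip() and t not in STOP_WORDS]

# tokenize_question("变电站的电力审计周期是多久？")
# -> e.g. ["变电站", "电力", "审计", "周期", "多久"]
```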
More specifically, the word embedding subunit 3312 is configured to pass the sequence of the electric audit problem description through a word embedding layer to obtain a sequence of electric audit problem description embedding vectors. It should be appreciated that the word embedding layer may convert a sequence of power audit problem descriptors into a sequence of power audit problem descriptor embedding vectors, a process also commonly referred to as word vectorization. That is, the word embedding layer may map words into a high-dimensional vector space and make them into a form of vectors.
Accordingly, in one possible implementation, the sequence of power audit problem descriptors may be passed through the word embedding layer to obtain the sequence of power audit problem descriptor embedding vectors as follows: prepare a word embedding matrix in which each row corresponds to the embedding vector of one word (the matrix can be pre-trained or trained on the power audit problem descriptor sequences); convert each word in the sequence into its corresponding embedding vector by looking it up in the word embedding matrix; concatenate all embedding vectors in sequence order to obtain the sequence of power audit problem descriptor embedding vectors; optionally apply preprocessing steps to the embedding vector sequence, such as normalization or truncation, to ensure the embedding vectors share the same length and range; and finally provide the sequence of power audit problem descriptor embedding vectors to downstream tasks, such as classification or regression models, for further analysis and prediction.
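A minimal PyTorch sketch of such an embedding lookup is shown below; the vocabulary and embedding dimension are illustrative assumptions, not values from the patent.

```python
# Sketch: map descriptor tokens to indices and look them up in an embedding matrix,
# producing the sequence of question-descriptor embedding vectors.
import torch
import torch.nn as nn

vocab = {"<unk>": 0, "电力": 1, "审计": 2, "周期": 3}      # illustrative vocabulary
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=128)

def embed_tokens(tokens: list[str]) -> torch.Tensor:
    ids = torch.tensor([vocab.get(t, vocab["<unk>"]) for t in tokens])
    return embedding(ids)                               # shape: (seq_len, 128)

# embed_tokens(["电力", "审计", "周期"]).shape -> torch.Size([3, 128])
```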
More specifically, the natural language processing subunit 3313 is configured to pass the sequence of power audit problem descriptor embedding vectors through a dual-pipeline model including a first natural language processing model and a second natural language processing model to obtain the multi-scale power audit problem semantic coding feature vector. That is, a dual-pipeline model comprising the first and second natural language processing models is used to express the semantic information of the power audit problem more comprehensively and accurately; the two models perform semantic understanding of the sequence of descriptor embedding vectors to different degrees.
In an embodiment of the present application, the first natural language processing model is a recurrent neural network (Recurrent Neural Network, RNN) model, and the second natural language processing model is a Long Short-Term Memory (LSTM) network model. Recurrent neural networks are mainly applied to sequence data, such as speech, text and time series. They can process variable-length sequences and learn correlations and patterns from them. Feedback connections exist between neurons in a recurrent neural network, passing earlier information to the current time step and forming a dynamic network structure. The core of the RNN model is the recurrent unit (Recurrent Unit, RU), which uses the input at the current time step together with the output and state from the previous time step to compute the current output and state. Unlike the traditional recurrent neural network model, a Long Short-Term Memory network gives each neuron three gates: an input gate, a forget gate and an output gate. These gates allow the LSTM network to decide what information to take in, which information to retain, and when to output information. An LSTM network is typically composed of multiple LSTM cells, each with an internal state unit and three gates; by connecting multiple LSTM cells recursively, a deep LSTM network can be formed to learn and model correlations and patterns in multi-step temporal data. Since the recurrent neural network model and the long short-term memory network model learn semantic association information under different receptive fields from the sequence of power audit problem descriptor embedding vectors, processing that sequence with the dual-pipeline model formed by the first and second natural language processing models converts it into the multi-scale power audit problem semantic coding feature vector and improves the model's ability to understand and classify the power audit problem.
In one specific example of the present application, the natural language processing subunit 3313 includes: a first-scale natural language processing secondary subunit configured to pass the sequence of power audit problem descriptor embedding vectors through the first natural language processing model of the dual-pipeline model to obtain a first-scale power audit problem semantic coding feature vector; a second-scale natural language processing secondary subunit configured to pass the sequence of power audit problem descriptor embedding vectors through the second natural language processing model of the dual-pipeline model to obtain a second-scale power audit problem semantic coding feature vector; and a multi-scale fusion secondary subunit configured to fuse the first-scale and second-scale power audit problem semantic coding feature vectors to obtain the multi-scale power audit problem semantic coding feature vector, as sketched below.
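The following PyTorch sketch illustrates such a dual-pipeline encoder under stated assumptions: a vanilla RNN and an LSTM each read the embedded question, their final hidden states serve as the first- and second-scale coding vectors, and the two are fused here by concatenation, which is only one possible fusion choice. Layer sizes are illustrative.

```python
# Sketch: dual-pipeline encoder (RNN + LSTM) producing a multi-scale question vector.
import torch
import torch.nn as nn

class DualPipelineEncoder(nn.Module):
    def __init__(self, embed_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)    # first pipeline
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # second pipeline

    def forward(self, embedded: torch.Tensor) -> torch.Tensor:
        # embedded: (batch, seq_len, embed_dim)
        _, h_rnn = self.rnn(embedded)           # h_rnn: (1, batch, hidden_dim)
        _, (h_lstm, _) = self.lstm(embedded)    # h_lstm: (1, batch, hidden_dim)
        first_scale = h_rnn[-1]                 # first-scale semantic coding vector
        second_scale = h_lstm[-1]               # second-scale semantic coding vector
        return torch.cat([first_scale, second_scale], dim=-1)  # multi-scale vector

# encoder = DualPipelineEncoder()
# vm = encoder(embed_tokens(["电力", "审计", "周期"]).unsqueeze(0))  # shape (1, 512)
```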
It should be noted that, in other specific examples of the present application, the sequence of power audit problem descriptor embedding vectors may also be processed by a dual-pipeline model including a first natural language processing model and a second natural language processing model in other ways to obtain the multi-scale power audit problem semantic coding feature vector. For example, the power audit problem description is first converted into a sequence of word embedding vectors; each word may be mapped to a fixed-length vector representation using a pre-trained word vector model such as Word2Vec or GloVe. The word embedding vector sequence is then processed with a first natural language processing model, such as a convolutional neural network (CNN) or a long short-term memory network (LSTM), to extract local features at the word, phrase or sentence level. The local features extracted by the first model are then processed with a second natural language processing model, such as a recurrent neural network (RNN) or an attention network, to extract global features that capture the semantic information of the entire power audit problem description, thereby yielding the semantically encoded feature vector of the audit problem. This multi-scale feature vector may be used for classification, clustering or other related tasks.
It should be noted that, in other specific examples of the present application, semantic analysis of the power audit problem to obtain the multi-scale power audit problem semantic coding feature vector may also be performed in other ways, for example: (1) data preprocessing, which preprocesses the power audit problem through operations such as stop-word removal, word segmentation and stemming to obtain a text sequence; (2) semantic embedding representation, which converts the text sequence into vector representations, either with a pre-trained word vector model (such as Word2Vec or GloVe) or by training a deep learning model on the text sequence to obtain an embedding for each word; and (3) multi-scale semantic analysis, which applies convolution operations with several different kernels to the semantic embedding representations to obtain semantic feature maps at different scales. A pooling operation is applied to the semantic feature map at each scale to obtain fixed-size feature vectors; the feature vectors from the different scales are fused, for example by simple weighted averaging or a more complex attention mechanism; and the fused feature vector is normalized so that its components share the same scale and range, yielding the multi-scale power audit problem semantic coding feature vector.
Specifically, the association matching unit 332 is configured to perform association coding on the multi-scale power audit problem semantic coding feature vector and the semantic embedded representation of the first candidate entity to obtain a semantic matching feature matrix. In the technical scheme of the application, after the vectorized semantic expression of the problem and the first candidate entity is obtained, the matching degree and the association degree between the semantic representation of the problem and the semantic representation of the first candidate entity are expected to be calculated. That is, a mapping relationship between the semantic representation of the problem and the semantic representation of the first candidate entity is established by adopting a mode of association coding. In this way, the semantic matching feature matrix obtained by association coding can fully consider the interrelationship between the problem semantic expression and the semantic expression of the first candidate entity, and is not limited to the semantic expression of each split. More specifically, the multi-scale power audit problem semantic coding feature vector and the semantic embedded representation of the first candidate entity are subjected to associated coding by the following associated formula to obtain the semantic matching feature matrix, wherein the formula is as follows: Where V m represents the multi-scale power audit problem semantically encoded feature vector, Representing a transpose of the multi-scale power audit problem semantically encoded feature vector, V n representing a semantically embedded representation of the first candidate entity, M representing the semantically matched feature matrix,Representing vector multiplication.
Accordingly, in one possible implementation, the multi-scale power audit problem semantic coding feature vector and the semantic embedded representation of the first candidate entity may be associatively encoded to obtain the semantic matching feature matrix as follows: encode the multi-scale power audit problem as a feature vector, for example by converting the problem into a numerical vector with natural language processing techniques such as a bag-of-words model or a word embedding model; represent the semantic embedding of the first candidate entity as a feature vector, for example by converting the entity into a numerical vector with a word embedding model such as Word2Vec or GloVe; and associate the problem feature vector with the entity feature vector to obtain the semantic matching feature matrix, for example via cosine similarity or product computations. The semantic matching feature matrix can then be fed into a machine learning algorithm for classification or clustering to address the multi-scale power audit problem.
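A minimal sketch of the association coding follows, assuming the association formula is the vector product M = V_m^T ⊗ V_n as reconstructed above, i.e. an outer product in which row i of M pairs feature i of the question vector with the whole entity embedding. Dimensions are illustrative.

```python
# Sketch: outer-product association coding of the question vector and entity embedding.
import torch

def association_encode(vm: torch.Tensor, vn: torch.Tensor) -> torch.Tensor:
    """M = Vm^T (x) Vn : (dm,) x (dn,) -> (dm, dn) semantic matching feature matrix."""
    return torch.outer(vm, vn)

# vm = torch.randn(512); vn = torch.randn(384)
# association_encode(vm, vn).shape -> torch.Size([512, 384])
```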
Specifically, the expression effect optimization unit 333 is configured to optimize the expression effect of text semantic association features for the semantic matching feature matrix to obtain the optimized semantic matching feature matrix. In a specific example of the present application, the expression effect optimization unit 333 includes: a multi-source information fusion pre-verification distribution evaluation optimization subunit, configured to perform multi-source information fusion pre-verification distribution evaluation optimization on each row feature vector of the semantic matching feature matrix to obtain a plurality of optimized row feature vectors; and an arrangement subunit, configured to arrange the plurality of optimized row feature vectors into the optimized semantic matching feature matrix.
More specifically, the multi-source information fusion pre-verification distribution evaluation optimization subunit is configured to perform multi-source information fusion pre-verification distribution evaluation optimization on each row feature vector of the semantic matching feature matrix to obtain the plurality of optimized row feature vectors. In the technical solution of the present application, when the semantic embedded representation of the first candidate entity and the multi-scale power audit problem semantic coding feature vector are associatively encoded to obtain the semantic matching feature matrix, the multi-scale power audit problem semantic coding feature vector is associated position by position with the semantic embedded representation of the first candidate entity. Each row feature vector of the semantic matching feature matrix can therefore be regarded as the association of one feature value of the multi-scale power audit problem semantic coding feature vector with the entire semantic embedded representation of the first candidate entity, so the semantic matching feature matrix is equivalent to a combined feature set of the local feature sets corresponding to the row feature vectors. Moreover, since the text semantic association feature distribution of the multi-scale power audit problem expressed by the multi-scale power audit problem semantic coding feature vector is laid out across the feature distributions of the row feature vectors, the row feature vectors have mutually associated neighborhood distribution relationships as well as multi-source information association relationships corresponding to the graph semantic embedding association distribution information of the candidate entity. Therefore, in order to improve the expression effect of the semantic matching feature matrix as a whole on the text semantic association features of the power audit problem at different scales, the applicant performs multi-source information fusion pre-verification distribution evaluation optimization on each row feature vector, for example denoted V_i, to obtain an optimized row feature vector V_i', which is specifically expressed as:
wherein V_i is the i-th row feature vector of the semantic matching feature matrix, V_j is the j-th row feature vector of the semantic matching feature matrix, V̄ is the mean feature vector, n is a neighborhood-setting hyperparameter, log denotes the base-2 logarithm, ⊖ denotes position-wise subtraction, and V_i' is the i-th optimized row feature vector of the optimized semantic matching feature matrix. Here, for the local feature collection formed by several mutually associated neighborhood parts, the multi-source information fusion pre-verification distribution evaluation optimization can, based on a quasi-maximum-likelihood estimate of the robustness of the feature distribution fusion, effectively fold the pre-verification information of each feature vector onto the local synthetic distribution; by constructing a pre-verification distribution under multi-source conditions, it yields an optimization paradigm of standard expected fusion information that can evaluate the internal associations within the collection and the relationships of change between collections, thereby improving the information expression effect of feature vector fusion based on multi-source information association. Therefore, arranging the optimized row feature vectors V_i' into the optimized semantic matching feature matrix improves the expression effect of the semantic matching feature matrix as a whole on the text semantic association features of the power audit problem at different scales.
More specifically, the arrangement subunit is configured to arrange the plurality of optimized row feature vectors into the optimized semantic matching feature matrix. That is, after the plurality of optimized row feature vectors are obtained, they are arranged two-dimensionally to obtain the optimized semantic matching feature matrix.
Accordingly, in one possible implementation, the plurality of optimized row feature vectors may be arranged into the optimized semantic matching feature matrix by placing the optimized row feature vectors in a certain order to form a matrix. The order can be determined by factors such as feature importance or the dimensionality after dimension reduction; each optimized row feature vector is normalized according to a certain rule to ensure that all vectors lie on the same scale, and the normalized optimized row feature vectors are then arranged together to form the optimized semantic matching feature matrix.
It should be noted that, in other specific examples of the present application, the expression effect of the text semantic association features may also be optimized for the semantic matching feature matrix in other ways to obtain the optimized semantic matching feature matrix. For example, the semantic matching feature matrix is first preprocessed with operations such as stop-word removal, stemming and part-of-speech tagging; these preprocessing steps can improve the expressive power and discriminability of the feature vectors. The semantic matching feature matrix can then be optimized using expressions of text semantic association features, for example by weighting word vectors with TF-IDF or by using text similarity measures such as edit distance or Jaccard similarity; the dimensionality of the feature vectors can be reduced with methods such as principal component analysis (PCA) or linear discriminant analysis (LDA) to improve training speed and generalization; and the optimal machine learning model and parameters can be selected with model selection and tuning methods such as cross-validation and grid search to improve the performance and robustness of the model.
It is worth mentioning that, in other specific examples of the present application, the deep convolutional neural network model may also be used in other ways to analyze and process the power audit problem and the semantic embedded representation of the first candidate entity to obtain the optimized semantic matching feature matrix, for example: (1) data preprocessing, which preprocesses the text data of the power audit problem and the first candidate entity through operations such as stop-word removal, word segmentation and stemming to obtain text sequences; (2) semantic embedding representation, which converts the text sequences into vector representations, either with a pre-trained word vector model (such as Word2Vec or GloVe) or by training a deep learning model to obtain an embedding for each word; (3) model construction, which builds a suitable deep convolutional neural network model, including convolutional layers, pooling layers and fully connected layers, according to the task requirements, to extract features from the semantic embedding representations; (4) model training, which trains the deep convolutional neural network model on a labeled dataset and adjusts the model parameters so that it extracts features better; (5) semantic matching, which uses the trained model to extract features from the power audit problem and the first candidate entity and computes their matching degree to obtain the semantic matching feature matrix; and (6) result generation, which classifies the candidate entity according to the semantic matching feature matrix and provides a reference for subsequent decisions.
In particular, the output result generating module 340 is configured to determine whether to output the first candidate entity based on the optimized semantic matching feature matrix. In one specific example of the present application, as shown in fig. 5, the output result generating module 340 includes a classifying unit 341 configured to pass the optimized semantic matching feature matrix through a classifier to obtain a classification result, where the classification result is used to indicate whether a probability that the first candidate entity is an entity most relevant to the electric power audit problem exceeds a predetermined threshold, and an output unit 342 configured to determine whether to output the first candidate entity based on the classification result.
Specifically, the classifying unit 341 is configured to pass the optimized semantic matching feature matrix through a classifier to obtain a classification result, where the classification result is used to indicate whether the probability that the first candidate entity is the entity most relevant to the power audit problem exceeds a predetermined threshold. That is, the semantic matching feature matrix is passed through a classifier to obtain a classification result that is used to represent whether the probability that the first candidate entity is the most relevant entity to the power audit problem exceeds a predetermined threshold. In a specific example of the present application, the classification unit 341 includes an expansion subunit, configured to expand the optimized semantic matching feature matrix into a classification feature vector based on a row vector or a column vector, a full-connection encoding subunit, configured to perform full-connection encoding on the classification feature vector by using multiple full-connection layers of the classifier to obtain an encoded classification feature vector, and a classification result generation subunit, configured to pass the encoded classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
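A minimal PyTorch sketch of such a classifier head is given below: the optimized semantic matching feature matrix is flattened into a classification feature vector, passed through fully connected layers, and mapped by Softmax to class probabilities. The layer sizes, the two-class label order and the 0.5 decision threshold are illustrative assumptions, not values from the patent.

```python
# Sketch: flatten the matching matrix, fully-connected encoding, Softmax classification.
import torch
import torch.nn as nn

class MatchClassifier(nn.Module):
    def __init__(self, rows: int = 512, cols: int = 384, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Flatten(),                    # expand the matrix into a feature vector
            nn.Linear(rows * cols, hidden),  # full-connection encoding
            nn.ReLU(),
            nn.Linear(hidden, 2),            # two classes: relevant / not relevant
        )

    def forward(self, matching_matrix: torch.Tensor) -> torch.Tensor:
        logits = self.mlp(matching_matrix)   # (batch, 2)
        return torch.softmax(logits, dim=-1) # class probabilities

# probs = MatchClassifier()(association_encode(vm, vn).unsqueeze(0))
# output_entity = probs[0, 1].item() > 0.5   # does it exceed the predetermined threshold?
```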
The classifier can learn a classification rule from the relationship between the feature matrices and the classification labels in the training data, and this rule is used at inference time to classify the input semantic matching feature matrix and obtain the classification result. It is worth mentioning that the classifier computes the probability values for "the probability that the first candidate entity is the entity most relevant to the power audit problem exceeds the predetermined threshold" and "the probability that the first candidate entity is the entity most relevant to the power audit problem does not exceed the predetermined threshold", and the classification result is the label with the larger of the two probability values.
A classifier is a machine learning model that is used to classify input data into different categories or labels. It may be a simple rule set or a complex mathematical model. Classifiers typically use known sets of training data to learn how to classify new data into known classes. In practical applications, the classifier can be used in the fields of image recognition, speech recognition, natural language processing, and the like.
It should be noted that, in other specific examples of the present application, the optimized semantic matching feature matrix may also be passed through a classifier in other ways to obtain the classification result indicating whether the probability that the first candidate entity is the entity most relevant to the power audit problem exceeds the predetermined threshold. For example, the optimized semantic matching feature matrix is used as input to train a classifier with a machine learning algorithm; commonly used classifiers include support vector machines (SVMs), decision trees and random forests. The trained classifier is then applied to test data, i.e. the entities to be classified, such as fitness equipment or power equipment, to obtain their classification results, and it is judged whether the classification result exceeds the predetermined threshold. The predetermined threshold is set according to practical requirements and performance indicators and is used to judge whether the classification result reaches the expected target.
Specifically, the output unit 342 is configured to determine whether to output the first candidate entity based on the classification result. That is, if the probability in the classification result exceeds the predetermined threshold, the first candidate entity is considered the entity most relevant to the power audit problem and is output; otherwise, it is considered not relevant to the power audit problem, and other candidate entities continue to be considered.
It should be noted that, in other specific examples of the present application, whether to output the first candidate entity may also be determined based on the optimized semantic matching feature matrix in other ways. For example, text data are collected and preprocessed, including word segmentation, stop-word removal and stemming; the preprocessed text data are converted into numerical vectors using methods such as a bag-of-words model, TF-IDF or Word2Vec; semantic matching is performed using measures such as cosine similarity, Jaccard similarity or Euclidean distance; and a semantic matching feature matrix is constructed from the matching results, where each row represents one piece of text data and each column represents one semantic feature. The feature dimensionality is then reduced with methods such as PCA or LDA to optimize the feature matrix; a machine learning model is trained on the optimized feature matrix with the training data, using classification, regression or clustering methods to solve the multi-scale power audit problem; and the trained model is applied to the test data to perform classification, regression or clustering and to output the first candidate entity.
As described above, the power semantic knowledge graph system 300 according to the embodiment of the present application may be implemented in various wireless terminals, such as a server or the like having a power semantic knowledge graph algorithm. In one possible implementation, the power semantic knowledge-graph system 300 according to an embodiment of the present application may be integrated into a wireless terminal as a software module and/or hardware module. For example, the power semantic knowledge graph system 300 may be a software module in the operating system of the wireless terminal or may be an application developed for the wireless terminal, although the power semantic knowledge graph system 300 may be one of a plurality of hardware modules of the wireless terminal.
Alternatively, in another example, the power semantic knowledge-graph system 300 and the wireless terminal may be separate devices, and the power semantic knowledge-graph system 300 may be connected to the wireless terminal through a wired and/or wireless network and transmit interaction information in an agreed data format.
Further, a power semantic knowledge graph method is provided.
Fig. 6 is a flowchart of a power semantic knowledge graph method according to an embodiment of the application. As shown in Fig. 6, the power semantic knowledge graph method comprises the steps of: S110, acquiring a power audit problem; S120, extracting a semantic embedded representation of a first candidate entity from the power semantic knowledge graph; S130, analyzing and processing the power audit problem and the semantic embedded representation of the first candidate entity based on a deep convolutional neural network model to obtain an optimized semantic matching feature matrix; and S140, determining whether to output the first candidate entity based on the optimized semantic matching feature matrix. A sketch of how these steps compose is given after this paragraph.
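The following end-to-end sketch shows one way steps S110-S140 could be chained, reusing the illustrative helpers from the earlier sketches (tokenize_question, embed_tokens, DualPipelineEncoder, entity_embedding, association_encode, MatchClassifier). None of these names come from the patent itself, the connection details are placeholders, and the expression-effect optimization step is omitted because its formula is not reproduced here.

```python
# Sketch: compose steps S110-S140 under the assumptions stated above.
import torch

def answer_power_audit_question(question: str, candidate_entity: str) -> bool:
    # S110: acquire the power audit problem
    tokens = tokenize_question(question)
    # S120: semantic embedded representation of the first candidate entity
    vn = torch.from_numpy(entity_embedding("bolt://localhost:7687",
                                           ("neo4j", "password"),
                                           candidate_entity)).float()
    # S130: deep-model analysis -> semantic matching feature matrix
    vm = DualPipelineEncoder()(embed_tokens(tokens).unsqueeze(0)).squeeze(0)
    matching_matrix = association_encode(vm, vn)
    # S140: classify and decide whether to output the candidate entity
    probs = MatchClassifier(rows=vm.numel(), cols=vn.numel())(matching_matrix.unsqueeze(0))
    return probs[0, 1].item() > 0.5

# answer_power_audit_question("变电站的电力审计周期是多久？", "变电站")
```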
In summary, the power semantic knowledge graph method according to the embodiment of the present application uses deep-learning-based natural language processing technology in combination with a knowledge graph of the power field to construct an intelligent question-answering system, thereby optimizing the power audit workflow and improving audit efficiency.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (6)

1. A power semantic knowledge graph system, characterized by comprising:
a problem acquisition module, configured to acquire a power audit problem;
a candidate entity extraction module, configured to extract a semantic embedding representation of a first candidate entity from a power semantic knowledge graph;
a matching module, configured to analyze and process the power audit problem and the semantic embedding representation of the first candidate entity based on a deep convolutional neural network model to obtain an optimized semantic matching feature matrix; and
an output result generation module, configured to determine, based on the optimized semantic matching feature matrix, whether to output the first candidate entity;
wherein the matching module comprises:
a semantic analysis unit, configured to perform semantic analysis on the power audit problem to obtain a multi-scale power audit problem semantic encoding feature vector;
an association matching unit, configured to perform association encoding on the multi-scale power audit problem semantic encoding feature vector and the semantic embedding representation of the first candidate entity to obtain a semantic matching feature matrix; and
an expression effect optimization unit, configured to optimize the expression effect of text semantic association features of the semantic matching feature matrix to obtain the optimized semantic matching feature matrix;
wherein the association matching unit is configured to perform association encoding on the multi-scale power audit problem semantic encoding feature vector and the semantic embedding representation of the first candidate entity according to the following association formula to obtain the semantic matching feature matrix:
M = V_m^T ⊗ V_n
wherein V_m denotes the multi-scale power audit problem semantic encoding feature vector, V_m^T denotes the transposed vector of the multi-scale power audit problem semantic encoding feature vector, V_n denotes the semantic embedding representation of the first candidate entity, M denotes the semantic matching feature matrix, and ⊗ denotes vector multiplication;
wherein the expression effect optimization unit comprises:
a multi-source information fusion prior distribution evaluation optimization subunit, configured to perform multi-source information fusion prior distribution evaluation optimization on each row feature vector of the semantic matching feature matrix to obtain a plurality of optimized row feature vectors; and
an arrangement subunit, configured to arrange the plurality of optimized row feature vectors into the optimized semantic matching feature matrix;
wherein the multi-source information fusion prior distribution evaluation optimization subunit is configured to perform multi-source information fusion prior distribution evaluation optimization on each row feature vector of the semantic matching feature matrix according to the following optimization formula to obtain the plurality of optimized row feature vectors,
wherein V_i is the i-th row feature vector of the semantic matching feature matrix, V_j is the j-th row feature vector of the semantic matching feature matrix, V̄ is the mean feature vector, n is a neighborhood-setting hyperparameter, log denotes the base-2 logarithm, ⊖ denotes position-wise subtraction, and V_i′ is the i-th optimized row feature vector of the optimized semantic matching feature matrix.

2. The power semantic knowledge graph system according to claim 1, wherein the semantic analysis unit comprises:
a word segmentation subunit, configured to perform word segmentation on the power audit problem to obtain a sequence of power audit problem description words;
a word embedding subunit, configured to pass the sequence of power audit problem description words through a word embedding layer to obtain a sequence of power audit problem description word embedding vectors; and
a natural language processing subunit, configured to pass the sequence of power audit problem description word embedding vectors through a dual-pipeline model comprising a first natural language processing model and a second natural language processing model to obtain the multi-scale power audit problem semantic encoding feature vector.

3. The power semantic knowledge graph system according to claim 2, wherein the natural language processing subunit comprises:
a first-scale natural language processing secondary subunit, configured to pass the sequence of power audit problem description word embedding vectors through the first natural language processing model of the dual-pipeline model to obtain a first-scale power audit problem semantic encoding feature vector;
a second-scale natural language processing secondary subunit, configured to pass the sequence of power audit problem description word embedding vectors through the second natural language processing model of the dual-pipeline model to obtain a second-scale power audit problem semantic encoding feature vector; and
a multi-scale fusion secondary subunit, configured to fuse the first-scale power audit problem semantic encoding feature vector and the second-scale power audit problem semantic encoding feature vector to obtain the multi-scale power audit problem semantic encoding feature vector.

4. The power semantic knowledge graph system according to claim 3, wherein the output result generation module comprises:
a classification unit, configured to pass the optimized semantic matching feature matrix through a classifier to obtain a classification result, the classification result indicating whether the probability that the first candidate entity is the entity most relevant to the power audit problem exceeds a predetermined threshold; and
an output unit, configured to determine, based on the classification result, whether to output the first candidate entity.

5. The power semantic knowledge graph system according to claim 4, wherein the classification unit comprises:
an unfolding subunit, configured to unfold the optimized semantic matching feature matrix into a classification feature vector based on row vectors or column vectors;
a fully connected encoding subunit, configured to perform fully connected encoding on the classification feature vector using a plurality of fully connected layers of the classifier to obtain an encoded classification feature vector; and
a classification result generation subunit, configured to pass the encoded classification feature vector through a Softmax classification function of the classifier to obtain the classification result.

6. A power semantic knowledge graph method, using the power semantic knowledge graph system according to claim 1, characterized by comprising:
acquiring a power audit problem;
extracting a semantic embedding representation of a first candidate entity from a power semantic knowledge graph;
analyzing and processing the power audit problem and the semantic embedding representation of the first candidate entity based on a deep convolutional neural network model to obtain an optimized semantic matching feature matrix; and
determining, based on the optimized semantic matching feature matrix, whether to output the first candidate entity.
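As a worked illustration of the association formula recited in claim 1 (reconstructed above as M = V_mᵀ ⊗ V_n from the variable definitions), the sketch below computes the semantic matching feature matrix as an outer product; the vector values, their dimensions, and the use of NumPy are assumptions for illustration only and form no part of the claims.

```python
# Toy computation of the association encoding M = Vm^T (x) Vn from claim 1,
# with "vector multiplication" read as an outer product so that one row of M
# corresponds to each problem-encoding feature and one column to each
# entity-embedding feature.
import numpy as np

Vm = np.array([0.2, 0.5, 0.3])        # multi-scale problem semantic encoding vector
Vn = np.array([0.1, 0.4, 0.4, 0.1])   # candidate-entity semantic embedding

M = np.outer(Vm, Vn)                  # semantic matching feature matrix

print(M.shape)   # (3, 4)
print(M.round(2))
```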
CN202310806140.7A 2023-07-03 2023-07-03 A power semantic knowledge graph system and method Active CN116842194B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310806140.7A CN116842194B (en) 2023-07-03 2023-07-03 A power semantic knowledge graph system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310806140.7A CN116842194B (en) 2023-07-03 2023-07-03 A power semantic knowledge graph system and method

Publications (2)

Publication Number Publication Date
CN116842194A CN116842194A (en) 2023-10-03
CN116842194B true CN116842194B (en) 2025-02-28

Family

ID=88170204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310806140.7A Active CN116842194B (en) 2023-07-03 2023-07-03 A power semantic knowledge graph system and method

Country Status (1)

Country Link
CN (1) CN116842194B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117110798B (en) * 2023-10-25 2024-02-13 国网江苏省电力有限公司苏州供电分公司 Fault detection method and system for smart distribution network
CN117370582B (en) * 2023-11-02 2024-06-04 广州蓝图地理信息技术有限公司 Natural resource element three-dimensional materialization modeling method based on multi-data fusion
CN117556027B (en) * 2024-01-12 2024-03-26 一站发展(北京)云计算科技有限公司 Intelligent interaction system and method based on digital human technology
CN118349657B (en) * 2024-04-30 2024-09-27 湖南科技职业学院 Knowledge graph-based intelligent traditional Chinese medicine query and answer system and method

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7945527B2 (en) * 2006-09-21 2011-05-17 Aebis, Inc. Methods and systems for interpreting text using intelligent glossaries
US11783130B2 (en) * 2019-05-06 2023-10-10 John Snow Labs Inc. Using unsupervised machine learning for automatic entity resolution of natural language records
US12346825B2 (en) * 2020-01-22 2025-07-01 Accenture Global Solutions Limited Utilizing natural language processing similarity matching to determine whether a problem requires quantum computing or classical computing
CN111325028B (en) * 2020-02-20 2021-06-18 齐鲁工业大学 A method and device for intelligent semantic matching based on deep hierarchical coding
CN111310438B (en) * 2020-02-20 2021-06-08 齐鲁工业大学 Chinese sentence semantic intelligent matching method and device based on multi-granularity fusion model
CN111506722B (en) * 2020-06-16 2024-03-08 平安科技(深圳)有限公司 Knowledge graph question-answering method, device and equipment based on deep learning technology
CN112949312B (en) * 2021-03-26 2024-10-22 中国美术学院 Product knowledge fusion method and system
JP2023021877A (en) * 2021-08-02 2023-02-14 広海 大谷 Development of internet and service, and method for enhancing security
CN115936119B (en) * 2022-12-14 2025-08-22 中科(厦门)数据智能研究院 A knowledge representation learning method based on graph attention network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271506A (en) * 2018-11-29 2019-01-25 武汉大学 A kind of construction method of the field of power communication knowledge mapping question answering system based on deep learning
CN111444744A (en) * 2018-12-29 2020-07-24 北京市商汤科技开发有限公司 Living body detection method, living body detection device, and storage medium
WO2021051503A1 (en) * 2019-09-19 2021-03-25 平安科技(深圳)有限公司 Semantic representation model-based text classification method and apparatus, and computer device
CN113672720A (en) * 2021-09-14 2021-11-19 国网天津市电力公司 A Question Answering Method for Electric Power Audit Based on Knowledge Graph and Semantic Similarity

Also Published As

Publication number Publication date
CN116842194A (en) 2023-10-03

Similar Documents

Publication Publication Date Title
CN110134757B (en) Event argument role extraction method based on multi-head attention mechanism
CN116842194B (en) A power semantic knowledge graph system and method
CN108399158B (en) Attribute emotion classification method based on dependency tree and attention mechanism
CN107239529B (en) A classification method of public opinion hotspots based on deep learning
CN111444344B (en) Entity classification method, entity classification device, computer equipment and storage medium
Tang et al. Multi-label patent categorization with non-local attention-based graph convolutional network
CN113743083B (en) Test question difficulty prediction method and system based on deep semantic characterization
Al-Tameemi et al. Interpretable multimodal sentiment classification using deep multi-view attentive network of image and text data
CN113705238A (en) Method and model for analyzing aspect level emotion based on BERT and aspect feature positioning model
CN114357151B (en) Processing method, device, equipment and storage medium of text category recognition model
CN119669530B (en) Knowledge graph generation-assisted teaching question answering method and system based on LLM
CN118277509A (en) Knowledge graph-based data set retrieval method
CN117807232A (en) Commodity classification method, commodity classification model construction method and device
CN117494051A (en) Classification processing method, model training method and related device
CN114356990A (en) Base named entity recognition system and method based on transfer learning
CN118014703A (en) Visual intelligent decision system and method based on digital platform
CN118012990A (en) A prompt text generation method, system, computer device and storage medium
CN118093888A (en) Knowledge-graph-based motor equipment fault intelligent diagnosis method
CN113961667A (en) Intelligent question-answering system based on Bert's dynamic threshold adjustment
CN115033689B (en) Original network Euclidean distance calculation method based on small sample text classification
CN116562293A (en) Power grid business data entity relation extraction method based on whole word shielding
CN116150366A (en) An early rumor detection method, system, electronic device and storage medium
Sindhu et al. Deep Learning for Sentiment Analysis: Exploring the Power of Deep Learning Techniques in Opinion Mining
CN114610882A (en) A method and system for detecting abnormal equipment codes based on electric short text classification
CN119830231B (en) A method and system for generating test knowledge points based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant