Disclosure of Invention
The invention aims to provide an electronic manual processing method, an interactive system, an electronic device and a readable storage medium, which extract text association information from a technical document and use that information to respond effectively to the operation behaviors of an interactive electronic manual client.
In order to solve the technical problems, the invention provides the following technical scheme:
An electronic manual processing method, comprising:
acquiring a technical document, and performing chapter decomposition on the technical document to acquire chapter text;
extracting text feature vectors of the chapter text, and classifying the article content of the chapter text based on the text feature vectors to obtain article classification;
extracting the topics of the chapter texts of different classes to obtain text topics;
Extracting keywords from the chapter text to obtain article keywords;
Storing the article classification, the text subject and the article keyword as text associated information of corresponding chapter text;
And responding to the operation behavior of the interactive electronic manual client, and outputting target chapter text associated with the operation behavior based on the text association information.
Preferably, performing chapter decomposition on the technical document to obtain chapter text, including:
Acquiring a technical document, and analyzing the technical document to obtain an article title and a title level;
And carrying out chapter decomposition on the technical document based on the article title and the title level to obtain chapter text.
Preferably, extracting keywords from the chapter text to obtain article keywords, including:
word segmentation processing is carried out on the chapter text to obtain a plurality of words forming the chapter text;
calculating the term frequency and the inverse document frequency of each word, and taking the product of the term frequency and the inverse document frequency as the word score;
And selecting the article keywords from a plurality of words by utilizing the word scores.
Preferably, extracting the theme of the chapter text of different classes to obtain a text theme includes:
Taking the chapter texts of different classes as text groups;
training the theme of the text group by using a theme model algorithm;
after convergence is completed, determining the topic similarity of the article by using a machine learning text similarity algorithm;
And determining the text theme from the trained theme based on the similarity of the article theme.
Preferably, classifying the article content of the chapter text based on the text feature vector to obtain article classification, including:
classifying, by using a naive Bayes algorithm based on Bayes' theorem, through calculating the probability of each feature in the text feature vector occurring under the condition of a given category, to obtain the article classification.
Preferably, obtaining the technical document includes:
receiving the technical document from a designated interface;
acquiring title format information of the technical document by using the designated interface; the title format information includes a title level.
Preferably, the acquiring, using the specified interface, title format information of the technical document includes:
Creating a chapter object corresponding to the technical document in the process of receiving the technical document by using the designated interface; the chapter object stores chapter title contents, chapter text contents and a sub-chapter list;
Traversing each text segment in the technical document in a circulating way, and judging whether the current text is a title or not by utilizing the title format information;
if not, determining that the current text is text content, and writing the current text into chapter text content of the title object generated last time;
If yes, and the current text is a title of the same level as the title object generated last time or a title of a higher level than it, determining that the text content of the title object generated last time is finished, generating a new title object, and writing the current text into its chapter title content; if the current text is a lower-level title of the title object generated last time, generating a new title object, writing the current text into its chapter title content, and adding the new title object to the sub-chapter list of the title object generated last time;
Converting the technical document into a JSON format object array divided according to a document structure by utilizing the chapter object; and each value in the array corresponds to each chapter object of the technical document, and if the chapter object has a sub-chapter, the chapter object has a sub-chapter object list as an attribute.
An interactive system, comprising:
An interactive electronic manual client, a file management server and a file analysis server;
The file analysis server is used for acquiring a technical document, and performing chapter decomposition on the technical document to acquire chapter text; extracting text feature vectors of the chapter text, and classifying the article content of the chapter text based on the text feature vectors to obtain article classification; extracting the topics of the chapter texts of different classes to obtain text topics;
Extracting keywords from the chapter text to obtain article keywords; taking the article classification, the text theme and the article keyword as text association information of corresponding chapter text; transmitting the text-related information and the technical document to the file management server;
The file management server is used for receiving and storing the text association information and the technical document; responding to the operation behavior of the interactive electronic manual client, and outputting a target chapter text associated with the operation behavior based on the text association information;
The interactive electronic manual client is used for providing an operation interface, interacting with the file management server and outputting the target chapter text fed back by the file management server.
An electronic device, comprising:
A memory for storing a computer program;
And the processor is used for realizing the steps of the electronic manual processing method when executing the computer program.
A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the electronic manual processing method described above.
The method provided by the embodiment of the invention comprises the following steps: acquiring a technical document, and performing chapter decomposition on the technical document to acquire chapter text; extracting text feature vectors of the chapter text, and classifying article contents of the chapter text based on the text feature vectors to obtain article classification; extracting the topics of the chapter texts of different classes to obtain text topics; extracting keywords from the chapter text to obtain article keywords; storing article classification, text subject and article keywords as text associated information of corresponding chapter text; in response to the operational behavior of the interactive electronic manual client, outputting a target chapter text associated with the operational behavior based on the text-related information.
In the invention, in order to process each part of a technical document effectively, the complete technical document is decomposed into chapters to obtain chapter texts. The chapter texts are classified to obtain article classifications; keywords are extracted from the chapter texts to obtain article keywords; and topics are extracted from chapter texts of different classes to obtain text topics. The article classification, text topic and article keywords corresponding to each chapter text are stored as the text association information of that chapter text. When a user operates the interactive electronic manual client, a target chapter text associated with the operation behavior can be output, in response to that behavior, based on the text association information. Because the text association information is stored at the granularity of chapter text and comprises the article classification, the text topic and the article keywords, the invention can provide more accurate output during interaction than an approach that responds at the granularity of the complete document based on keywords alone.
The technical effects are as follows: the technical document is decomposed chapter by chapter, and key information is extracted and classified for each chapter by combining different algorithms, so that text associated information comprising article classification, text subject and article keywords is obtained, and the inquiring and interacting efficiency of the electronic manual can be improved based on the text associated information.
Correspondingly, the embodiment of the invention also provides a system, equipment and a readable storage medium corresponding to the electronic manual processing method based on the artificial intelligence, which have the technical effects and are not repeated herein.
Detailed Description
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart of a method for processing an electronic manual according to an embodiment of the present invention, where the method can be applied to an interactive system (technical manual digital interactive system) as shown in fig. 2, and the method includes the following steps:
S101, acquiring a technical document, and performing chapter decomposition on the technical document to obtain chapter text.
In the technical manual digital interaction system, there are a large number of technical documents, and in order to realize efficient interaction based on the technical documents, in this embodiment, it is proposed to split the technical documents into units of chapter text for processing.
In particular, the technical document may be obtained by receiving the technical document, or downloading the technical document, or directly reading the technical document from a storage medium. In the present embodiment, the acquisition channel itself of the technical document is not limited.
In practical applications, the technical document may be embodied as a technical manual or other technical document of the device or system. The specific contents of what is described in the technical document and the specific division of chapters therein are not limited in this embodiment.
After the technical document is acquired, it can be divided into a plurality of chapter texts according to information in the technical document such as the title format.
In one embodiment of the present invention, performing chapter decomposition on a technical document to obtain chapter text includes:
Acquiring a technical document, and analyzing the technical document to obtain an article title and a title level;
and performing chapter decomposition on the technical document based on the article titles and the title levels to obtain chapter texts.
For convenience of description, the above steps are described in combination.
After the technical document is obtained, the technical document may be parsed to obtain article titles and title levels. Then, based on the article title and the title level, the technical document can be divided into chapters, so that each chapter text of the technical document is obtained.
In one embodiment of the invention, obtaining a technical document includes:
Receiving a technical document from a specified interface;
Acquiring title format information of a technical document by using a designated interface; the title format information includes a title level.
For convenience of description, the two steps are described in combination.
Specifically, the designated interface may be any interface through which a document can be uploaded and its title format information obtained, such as Apache POI. Apache POI (the name originated as an acronym for "Poor Obfuscation Implementation") is a free, open-source, cross-platform library written in Java that provides APIs for Java programs to read and write files in Microsoft Office formats.
That is, information can be extracted from a Word document through Apache POI. The technical data are uploaded as Word documents in a unified format; all title format information in the Word document is stored, and the title levels are distinguished when it is stored.
In one embodiment of the present invention, obtaining title format information of a technical document includes:
Creating a chapter object corresponding to the technical document in the process of receiving the technical document by using the designated interface; the chapter object stores chapter title contents, chapter text contents and a sub-chapter list;
Traversing each text segment in the technical document in a circulating way, and judging whether the current text is a title or not by using title format information;
if not, determining that the current text is text content, and writing the current text into chapter text content of the title object generated last time;
if yes, and the current text is a title of the same level as the title object generated last time or a title of a higher level than it, determining that the text content of the title object generated last time is finished, generating a new title object, and writing the current text into its chapter title content; if the current text is a lower-level title of the title object generated last time, generating a new title object, writing the current text into its chapter title content, and adding the new title object to the sub-chapter list of the title object generated last time;
Converting the technical document into a JSON format object array divided according to the document structure by using the chapter object; wherein, each value in the array corresponds to each chapter object of the technical document, and if the chapter object has a sub-chapter, the chapter object has a sub-chapter object list as an attribute.
For convenience of description, the above steps are described in combination.
Specifically, a chapter object may be defined for storing chapter title content, chapter text content, and a sub-chapter list.
Each text segment in the document can be traversed in a loop, and whether the segment is a title is judged from its format. If it is not a title, the segment is treated as text content and written into the text content of the title object generated last time. If it is a title of the same level as the title object generated last time, or a title of a higher level, the title object generated last time is regarded as having completed its text content, and a new title object is generated. If it is a lower-level title of the title object generated last time, a new title object is generated and used as a sub-level title of the previous title object.
In practical application, the whole document can be converted, according to the chapter objects defined above, into an object array in JSON (JavaScript Object Notation, a lightweight data interchange format) format. Each value in the array corresponds to a chapter object of the article; if a chapter object has sub-chapters, it has a sub-chapter object list as an attribute, in which all of its sub-chapter objects are stored.
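The traversal and JSON conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the input is assumed to be a list of (heading_level, text) pairs standing in for the title format information that an interface such as Apache POI would supply, and the names Chapter, build_chapters and to_json, as well as the use of a stack to locate the parent chapter, are hypothetical choices made for this example.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional, Tuple


@dataclass
class Chapter:
    """Chapter object: title content, text content and a sub-chapter list."""
    title: str
    level: int
    text: str = ""
    sub_chapters: List["Chapter"] = field(default_factory=list)


def build_chapters(segments: List[Tuple[Optional[int], str]]) -> List[Chapter]:
    """Build a chapter tree from (heading_level, text) segments.

    heading_level is None for body text and 1, 2, 3, ... for titles
    (assumed input format; 1 is the top level).
    """
    roots: List[Chapter] = []
    stack: List[Chapter] = []  # chapters still open, outermost first
    for level, text in segments:
        if level is None:
            # Body text goes to the most recently generated chapter.
            # Text that appears before any title is skipped in this sketch.
            if stack:
                current = stack[-1]
                current.text = text if not current.text else current.text + "\n" + text
            continue
        # Same-level or higher-level title: the previous chapters are finished.
        while stack and stack[-1].level >= level:
            stack.pop()
        chapter = Chapter(title=text, level=level)
        if stack:
            # Lower-level title: add it to the parent's sub-chapter list.
            stack[-1].sub_chapters.append(chapter)
        else:
            roots.append(chapter)
        stack.append(chapter)
    return roots


def to_json(chapters: List[Chapter]) -> str:
    """Serialize the chapter tree as a JSON array of chapter objects."""
    return json.dumps([asdict(c) for c in chapters], ensure_ascii=False, indent=2)
```

For instance, calling build_chapters on a "Engine" level-1 title followed by an "Oil change" level-2 title yields one top-level chapter whose sub_chapters list holds the "Oil change" chapter, and to_json then produces the nested array described above.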
In order to improve the text quality, after the technical document is decomposed into chapter texts, operations such as text cleaning, word segmentation and stop-word filtering can be performed on the text data in units of chapter text.
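As an illustration of these preprocessing operations, the following sketch cleans a chapter text, splits it into words and filters stop words. It assumes English-like text that can be tokenized with a regular expression; text in a language without word delimiters would need a dedicated word segmenter instead. The stop-word list and the function names are hypothetical and deliberately small.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "on", "for"}


def clean_text(text: str) -> str:
    """Remove markup remnants, collapse whitespace and lowercase the text."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()


def tokenize(text: str) -> List[str]:
    """Split cleaned text into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", clean_text(text))


def preprocess(text: str) -> List[str]:
    """Clean, segment and stop-word-filter one chapter text."""
    return [token for token in tokenize(text) if token not in STOP_WORDS]
```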
S102, extracting text feature vectors of the chapter text, and classifying the article content of the chapter text based on the text feature vectors to obtain article classification.
After the chapter text is obtained, the text feature vector of the chapter text can be extracted, and the chapter text can then be classified based on the text feature vector, thereby obtaining the article classification.
In practical application, each chapter text can be input into a text content classification model for classification and identification, so that article classification of each chapter text is obtained.
In one embodiment of the present invention, classifying article content of chapter text based on text feature vectors to obtain article classification includes:
classifying, by using a naive Bayes algorithm based on Bayes' theorem, through calculating the probability of each feature in the text feature vector occurring under the condition of a given category, so as to obtain the article classification.
The naive Bayes algorithm is based on Bayes' theorem and classifies by calculating the probability of each feature occurring under the condition of a given category, so it can be used for text classification. For a given text sample, the naive Bayes algorithm calculates the posterior probability of each category and selects the category with the highest posterior probability as the classification result.
Specifically, text feature vectors can be obtained from all chapter texts of the technical document through a machine learning text feature extraction algorithm, and article content classification is performed through a naive Bayes algorithm.
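A minimal sketch of such a classifier is given below. It assumes that the text feature vector is a simple bag of words and uses a multinomial naive Bayes model with Laplace smoothing; the embodiment does not specify either choice, and the class name, category labels and training samples are hypothetical. Scores are computed in log space, so they are proportional to the posterior probabilities rather than equal to them.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set, Tuple


class NaiveBayesClassifier:
    """Multinomial naive Bayes over word tokens with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # smoothing constant (assumed value)
        self.class_doc_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocabulary: Set[str] = set()

    def fit(self, samples: Sequence[Tuple[List[str], str]]) -> None:
        """Count words per category from (tokens, label) training samples."""
        for tokens, label in samples:
            self.class_doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)

    def log_scores(self, tokens: List[str]) -> Dict[str, float]:
        """Return log P(category) + sum of log P(word | category)."""
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        scores: Dict[str, float] = {}
        for label, doc_count in self.class_doc_counts.items():
            score = math.log(doc_count / total_docs)  # prior
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # words never seen in training are ignored
                count = self.word_counts[label][token]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, tokens: List[str]) -> str:
        """Select the category with the highest posterior score."""
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)
```

Trained on a few labelled chapter texts, predict returns the label whose score is highest, which corresponds to selecting the category with the highest posterior probability.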
And S103, extracting the topics of the chapter texts of different classes to obtain text topics.
In practice, chapter texts may relate to the same topic even though they belong to different categories. To support recommendation during interaction, recommendations can be made based not only on classification but also on topic. Therefore, in this embodiment, topic extraction is performed on the chapter texts of different classes to obtain common text topics. Specifically, an algorithm such as a clustering algorithm can be used to extract the text topics.
In one embodiment of the present invention, extracting a theme from different types of chapter text to obtain a text theme includes:
Taking the chapter texts of different classes as text groups;
training the theme of the text group by using a theme model algorithm;
after convergence is completed, determining the topic similarity of the article by using a machine learning text similarity algorithm;
Based on the similarity of the article topics, determining the text topics from the trained topics.
That is, according to the article classification result, the articles of different classes can be taken as groups for topic extraction: the LDA (Latent Dirichlet Allocation) algorithm is trained until it converges, a machine learning text similarity algorithm is then used to judge the similarity of the article topics, and topics with higher similarity are extracted and stored as the same class of topic.
The LDA topic model belongs to a topic model algorithm and can be used for presuming topic distribution of a document. The topic of each document in the document set can be given in the form of probability distribution, so that topic clustering or text classification can be performed according to the topic distribution after analyzing some documents and extracting the topic distribution of the documents.
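The step that follows topic training can be sketched as below. The sketch does not train an LDA model; it assumes that a trained model has already produced, for each topic, a word-probability distribution, and the sample distributions in the usage note are invented for illustration. Cosine similarity, the 0.8 threshold and the greedy grouping strategy are assumptions, since the embodiment does not name a specific text similarity algorithm.

```python
import math
from typing import Dict, List

TopicVector = Dict[str, float]  # word -> probability within one topic


def cosine_similarity(a: TopicVector, b: TopicVector) -> float:
    """Cosine similarity between two sparse word-probability vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_similar_topics(
    topics: Dict[str, TopicVector], threshold: float = 0.8
) -> List[List[str]]:
    """Greedily group topics whose similarity to a group's first member
    reaches the threshold; each group is treated as one class of topic."""
    groups: List[List[str]] = []
    for name, vector in topics.items():
        for group in groups:
            representative = topics[group[0]]
            if cosine_similarity(vector, representative) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups
```

With two topics that both concentrate on words such as "oil" and "filter" and a third that concentrates on "voltage" and "warning", the first two fall into one group and the third forms its own group.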
And S104, extracting keywords from the chapter text to obtain article keywords.
The interaction based on keywords is also one of the channels for improving the effectiveness of the interaction, so in this embodiment, keyword extraction may be performed on the chapter text, thereby obtaining the article keywords.
In practical applications, the chapter text may first be segmented into words, and the occurrences of each word may then be counted, so that the article keywords are selected based on how often the words occur.
In one embodiment of the present invention, keyword extraction is performed on chapter text to obtain article keywords, including:
performing word segmentation processing on the chapter text to obtain a plurality of words forming the chapter text;
calculating the term frequency and the inverse document frequency of each word, and taking the product of the term frequency and the inverse document frequency as the word score;
article keywords are selected from a number of terms using the term score.
For convenience of description, the above steps are described in combination.
TF (Term Frequency) is the term frequency, and IDF (Inverse Document Frequency) is the inverse document frequency.
TF-IDF (term frequency-inverse document frequency) is a weighting technique commonly used in information retrieval and data mining to evaluate how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, and decreases as the frequency with which it appears across the corpus increases.
That is, in practical application, TF-IDF keyword extraction algorithm may be performed in units of chapters in an article, to obtain key information contained in each chapter, and to extract and store keywords in the chapters.
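The chapter-level keyword extraction can be sketched as follows, treating each chapter text as one document. The sketch assumes the chapter texts have already been segmented into word lists, and it uses the unsmoothed form IDF = log(N / df), where N is the number of chapters and df is the number of chapters containing the word; other IDF variants exist and the embodiment does not say which one is used. The function names and the parameter k are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf_scores(chapters: List[List[str]]) -> List[Dict[str, float]]:
    """Compute a TF-IDF score for every word of every chapter."""
    chapter_count = len(chapters)
    document_frequency: Counter = Counter()
    for tokens in chapters:
        document_frequency.update(set(tokens))  # count each word once per chapter

    all_scores: List[Dict[str, float]] = []
    for tokens in chapters:
        counts = Counter(tokens)
        scores: Dict[str, float] = {}
        for word, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(chapter_count / document_frequency[word])
            scores[word] = tf * idf  # word score = TF x IDF
        all_scores.append(scores)
    return all_scores


def top_keywords(scores: Dict[str, float], k: int = 3) -> List[str]:
    """Select the k highest-scoring words; ties are broken alphabetically."""
    return sorted(scores, key=lambda word: (-scores[word], word))[:k]
```

A word that occurs in every chapter receives an IDF of zero under this variant, so it is never preferred as a keyword over words that are specific to a chapter.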
And S105, storing the article classification, the text theme and the article keyword as text associated information of the corresponding chapter text.
In practical applications, a file management server may be provided, which is used to store all text data and text-related information of all material files.
The text association information includes article classification, text topic and article keyword information.
And S106, responding to the operation behaviors of the interactive electronic manual client, and outputting target chapter text associated with the operation behaviors based on the text association information.
An operation interface can be provided at the interactive electronic manual client, and a user can perform related operations through the operation interface. The interactive electronic manual client captures the operation behaviors of the user and feeds them back to the server, and the server can respond to the operation behaviors. Specifically, the target chapter text associated with the operation behavior may be output based on the text association information.
Of course, the operation interface may also be presented using VR (Virtual Reality) and other technologies; in this embodiment, the specific presentation form of the interaction is not limited.
For example: the user can input query information at the interactive electronic manual client; a similarity calculation is performed between the key content of the query information and the text association information, and the target chapter texts in the query list are selected based on the calculated distance information. Alternatively, when the interactive electronic manual client displays chapter text A, the text association information of chapter text A is acquired, and a chapter text B whose text association information is closely associated with that of chapter text A is found from the file management server.
For another example: the electronic manual processing method can be applied to a VR-based vehicle equipment maintenance system in combination with interactive electronic manual technology. Specifically, the user identifies a vehicle fault point through virtual-real fusion technology, and the target chapter texts on the maintenance component and the related maintenance techniques are obtained according to the virtually identified component that the user operates through virtual-real interaction. Alternatively, related articles are searched directly using article keywords input by the user, text association information related to the text topics of those articles is acquired, related chapter texts are obtained through that text association information, and the related articles and related chapter texts are displayed together on the user terminal.
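The two response modes in these examples, answering a query and recommending chapters related to the one being displayed, can be sketched as follows. The record layout, the in-memory list used as a store, the Jaccard similarity measure and the bonus given to chapters that share a text topic are all assumptions made for illustration; the embodiment only states that the response is based on the text association information.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class TextAssociation:
    """Text association information stored for one chapter text."""
    chapter_id: str
    classification: str
    topic: str
    keywords: Set[str]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of common elements between two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _ranked(scored: List[Tuple[float, str]], limit: int) -> List[str]:
    """Sort by descending score, then by chapter id, and keep the best."""
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chapter_id for _, chapter_id in scored[:limit]]


def search(query_terms: Set[str], store: List[TextAssociation], limit: int = 3) -> List[str]:
    """Return target chapter ids ranked by similarity to the query terms."""
    scored = []
    for record in store:
        terms = record.keywords | {record.classification, record.topic}
        score = jaccard(query_terms, terms)
        if score > 0:
            scored.append((score, record.chapter_id))
    return _ranked(scored, limit)


def related_chapters(current: TextAssociation, store: List[TextAssociation], limit: int = 3) -> List[str]:
    """Return chapters closely associated with the chapter being displayed."""
    scored = []
    for record in store:
        if record.chapter_id == current.chapter_id:
            continue
        score = jaccard(current.keywords, record.keywords)
        if record.topic == current.topic:
            score += 1.0  # assumed bonus for sharing the same text topic
        if score > 0:
            scored.append((score, record.chapter_id))
    return _ranked(scored, limit)
```

In this sketch a chapter on a different classification can still be recommended when it shares a text topic with the displayed chapter, which mirrors the cross-class topic extraction described earlier.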
The method provided by the embodiment of the invention comprises the following steps: acquiring a technical document, and performing chapter decomposition on the technical document to acquire chapter text; extracting text feature vectors of the chapter text, and classifying article contents of the chapter text based on the text feature vectors to obtain article classification; extracting the topics of the chapter texts of different classes to obtain text topics; extracting keywords from the chapter text to obtain article keywords; storing article classification, text subject and article keywords as text associated information of corresponding chapter text; in response to the operational behavior of the interactive electronic manual client, outputting a target chapter text associated with the operational behavior based on the text-related information.
In the invention, in order to process each part of a technical document effectively, the complete technical document is decomposed into chapters to obtain chapter texts. The chapter texts are classified to obtain article classifications; keywords are extracted from the chapter texts to obtain article keywords; and topics are extracted from chapter texts of different classes to obtain text topics. The article classification, text topic and article keywords corresponding to each chapter text are stored as the text association information of that chapter text. When a user operates the interactive electronic manual client, a target chapter text associated with the operation behavior can be output, in response to that behavior, based on the text association information. Because the text association information is stored at the granularity of chapter text and comprises the article classification, the text topic and the article keywords, the invention can provide more accurate output during interaction than an approach that responds at the granularity of the complete document based on keywords alone.
The technical effects are as follows: the technical document is decomposed chapter by chapter, and key information is extracted and classified for each chapter by combining different algorithms, so that text associated information comprising article classification, text subject and article keywords is obtained, and the inquiring and interacting efficiency of the electronic manual can be improved based on the text associated information.
Corresponding to the above method embodiment, the embodiment of the present invention further provides an interactive system, where the interactive system described below and the electronic manual processing method described above may be referred to correspondingly.
Referring to fig. 2, the system includes:
An interactive electronic manual client 100, a file management server 200, and a file analysis server 300;
The file analysis server is used for acquiring the technical document, and decomposing the chapter of the technical document to obtain a chapter text; extracting text feature vectors of the chapter text, and classifying article contents of the chapter text based on the text feature vectors to obtain article classification; extracting the topics of the chapter texts of different classes to obtain text topics;
Extracting keywords from the chapter text to obtain article keywords; the article classification, the text theme and the article keyword are used as text association information of the corresponding chapter text; transmitting the text associated information and the technical document to a file management server;
the file management server is used for receiving and storing the text association information and the technical document; responding to the operation behaviors of the interactive electronic manual client, and outputting target chapter text associated with the operation behaviors based on the text association information;
And the interactive electronic manual client is used for providing an operation interface, interacting with the file management server and outputting a target chapter text fed back by the file management server.
By applying the system provided by the embodiment of the invention, the technical document is acquired, and chapter decomposition is carried out on the technical document to acquire chapter text; extracting text feature vectors of the chapter text, and classifying article contents of the chapter text based on the text feature vectors to obtain article classification; extracting the topics of the chapter texts of different classes to obtain text topics; extracting keywords from the chapter text to obtain article keywords; storing article classification, text subject and article keywords as text associated information of corresponding chapter text; in response to the operational behavior of the interactive electronic manual client, outputting a target chapter text associated with the operational behavior based on the text-related information.
In a specific embodiment of the present invention, a file parsing server is specifically configured to obtain a technical document, parse the technical document, and obtain an article title and a title level;
and performing chapter decomposition on the technical document based on the article titles and the title levels to obtain chapter texts.
In one specific embodiment of the invention, the file parsing server is specifically configured to perform word segmentation processing on the chapter text to obtain a plurality of words forming the chapter text;
calculating the term frequency and the inverse document frequency of each word, and taking the product of the term frequency and the inverse document frequency as the word score;
selecting the article keywords from the plurality of words by using the word scores.
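As an illustration only, the keyword scoring described above can be sketched as follows. The function name `tfidf_keywords`, the parameter `top_k`, the natural-logarithm form of the inverse document frequency, and the sample chapters are assumptions made for this sketch and are not specified by the embodiment.

```python
import math
from collections import Counter

def tfidf_keywords(chapters, top_k=3):
    """Return the top_k highest-scoring words for each tokenized chapter."""
    n_docs = len(chapters)
    # Document frequency: number of chapters in which each word occurs.
    doc_freq = Counter()
    for words in chapters:
        doc_freq.update(set(words))

    results = []
    for words in chapters:
        counts = Counter(words)
        total = len(words)
        scores = {}
        for word, count in counts.items():
            tf = count / total                         # term frequency
            idf = math.log(n_docs / doc_freq[word])    # inverse document frequency
            scores[word] = tf * idf                    # word score
        # Highest score first; ties are broken alphabetically for stability.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:top_k])
    return results

# Hypothetical, already-segmented chapter texts.
chapters = [
    ["pump", "seal", "replace", "pump", "valve"],
    ["valve", "inspect", "valve", "torque"],
    ["pump", "valve", "overview"],
]
keywords = tfidf_keywords(chapters, top_k=2)
```

In this sample, "valve" occurs in every chapter, so its inverse document frequency is zero and it is never selected, even though it is the most frequent word in the second chapter.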
In one embodiment of the present invention, the file parsing server is specifically configured to take the chapter texts of each class as a text group;
training topics for the text group by using a topic model algorithm;
after convergence is completed, determining the article topic similarity by using a machine learning text similarity algorithm;
determining the text topics from the trained topics based on the article topic similarity.
In a specific embodiment of the present invention, the file parsing server is specifically configured to use a naive Bayes algorithm which, based on Bayes' theorem, classifies the chapter text by calculating the probability that the features in the text feature vector occur under a given category, so as to obtain the article classification.
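For illustration, a minimal multinomial naive Bayes classifier over word-count features might look as follows. This is a sketch under stated assumptions: the class name `NaiveBayesClassifier`, the additive smoothing parameter `alpha`, and the training samples are hypothetical, and the embodiment does not prescribe this particular formulation.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesClassifier:
    """Multinomial naive Bayes with additive smoothing (alpha must be > 0)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents per class
        self.word_counts = defaultdict(Counter)  # word occurrences per class
        self.vocab = set()

    def fit(self, docs, labels):
        for words, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, words):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # log P(class) + sum of log P(word | class)
            score = math.log(self.class_counts[label] / total_docs)
            denom = (sum(self.word_counts[label].values())
                     + self.alpha * len(self.vocab))
            for word in words:
                if word in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[label][word]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical pre-classified chapter texts.
train_docs = [
    ["replace", "seal", "pump"],
    ["tighten", "bolt", "torque"],
    ["pump", "overview", "function"],
    ["system", "overview", "layout"],
]
train_labels = ["maintenance", "maintenance", "description", "description"]
model = NaiveBayesClassifier(alpha=1.0).fit(train_docs, train_labels)
predicted = model.predict(["replace", "bolt"])
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.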
In one embodiment of the present invention, the file parsing server is specifically configured to receive the technical document through a designated interface;
acquiring title format information of the technical document by using the designated interface; the title format information includes the title level.
In one embodiment of the present invention, the file parsing server is specifically configured to create chapter objects corresponding to the technical document while receiving the technical document through the designated interface; each chapter object stores chapter title content, chapter text content and a sub-chapter list;
traversing each text segment in the technical document in a loop, and judging whether the current text is a title by using the title format information;
if not, determining that the current text is text content, and writing the current text into the chapter text content of the chapter object generated last time;
if yes, and the current text is a title at the same level as, or at a higher level than, the chapter object generated last time, determining that the text content of the chapter object generated last time is complete, generating a new chapter object, and writing the current text into its chapter title content; if the current text is a lower-level title of the chapter object generated last time, generating a new chapter object, writing the current text into its chapter title content, and adding the new chapter object to the sub-chapter list of the chapter object generated last time;
converting the technical document, by using the chapter objects, into an array of JSON-format objects divided according to the document structure; wherein each value in the array corresponds to a chapter object of the technical document, and if a chapter object has sub-chapters, it has a list of sub-chapter objects as an attribute.
The electronic manual processing method can be applied to the system, and in order to facilitate the understanding and implementation of the electronic manual processing method by those skilled in the art, the electronic manual processing method is described in detail below with reference to a specific application scenario as an example.
Specifically, the electronic manual processing method may be implemented in a system as shown in fig. 2, and the details are as follows:
Interactive electronic manual client (hereinafter referred to as the client): used for information browsing or equipment interaction with the interactive electronic manual through a visual operation interface. For example, the client performs maintenance operations on simulated equipment in a VR mode, and relevant technical documents are then displayed according to the maintenance operation information from the interaction.
File management server: used for storing all text data and text association data of all data files.
File parsing server: used for parsing the retrieved data files and obtaining the related text association data and articles.
The text association data (i.e., the text association information) comprises the article classification, text topic and article keyword information.
Referring to fig. 3, the method includes the steps of:
S1, the data file is uploaded to the file parsing server, and the file parsing server decomposes the file into chapters according to the parsed article titles and title levels, so as to obtain the article content contained under each chapter title.
That is, the file parsing server acquires the positions of all titles in the document, and judges the chapter level of each title according to its text style or title number.
Specifically, the extraction of Word document information can be completed through Apache POI: the Word document is uploaded in a unified format, all title format information in the Word document is stored, and the title levels are distinguished when it is stored.
Further, a chapter object is defined for storing the chapter title content, the chapter text content and a sub-chapter list. Each text segment in the document is traversed in a loop, and whether the text is a title is judged according to its format. If the text is not a title, it is regarded as text content of the previous chapter object. If the text is a title at the same level as the chapter object generated last time, or at a higher level, the chapter object generated last time is regarded as having complete text content, and a new chapter object is generated. If the text is a lower-level title of the chapter object generated last time, a new chapter object is generated and added as a sub-chapter of the previous chapter object.
Further, according to the defined chapter objects, the whole document is converted into an array of JSON-format objects divided according to the document structure. Each value in the array corresponds to a chapter object of the article; if a chapter object has sub-chapters, it has a list of sub-chapter objects as an attribute, in which all of its sub-chapter objects are stored.
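The chapter decomposition of step S1 can be sketched as follows, purely as an illustration. The sketch assumes that title detection has already been done (for example from stored title format information) and that each text segment arrives as a `(level, text)` pair, where level 0 marks body text. The function name `build_chapters`, the `level` field and the dictionary keys are hypothetical; a stack of open chapters is one possible way to track the "chapter object generated last time".

```python
import json

def build_chapters(paragraphs):
    """Build a nested chapter list from (level, text) pairs.

    level == 0 means body text; level >= 1 is a title level (1 = top level).
    """
    roots = []   # top-level chapter objects
    stack = []   # chain of currently open chapters, outermost first
    for level, text in paragraphs:
        if level == 0:
            # Body text belongs to the most recently generated chapter.
            # Text appearing before the first title is ignored in this sketch.
            if stack:
                stack[-1]["content"].append(text)
            continue
        # A same-level or higher-level title closes the open chapters.
        while stack and stack[-1]["level"] >= level:
            stack.pop()
        chapter = {"level": level, "title": text, "content": [], "children": []}
        if stack:
            stack[-1]["children"].append(chapter)  # lower-level title: sub-chapter
        else:
            roots.append(chapter)
        stack.append(chapter)
    return roots

# Hypothetical document already split into text segments.
paragraphs = [
    (1, "1 Overview"),
    (0, "This manual covers the pump."),
    (2, "1.1 Safety"),
    (0, "Disconnect power first."),
    (1, "2 Maintenance"),
    (0, "Replace the seal yearly."),
]
chapters = build_chapters(paragraphs)
document_json = json.dumps(chapters, ensure_ascii=False, indent=2)
```

The resulting `document_json` string is an array with one object per top-level chapter, each carrying its sub-chapters in the `children` list.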
S2, the file parsing server analyzes the chapter text through AI algorithms, and the analysis results are collated and sent to the file management server.
The file parsing server performs the text analysis. First, text preprocessing is performed on all uploaded data files, article by article; the text preprocessing includes operations such as text cleaning, word segmentation and stop-word filtering of the text data.
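A minimal sketch of such preprocessing is given below for English-language text. The stop-word list, the markup-stripping rule and the function name `preprocess` are assumptions made for illustration; for languages without whitespace word boundaries, a dedicated word segmenter would take the place of the regular-expression tokenizer.

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in"}

def preprocess(text):
    """Clean the text, split it into words, and drop stop words."""
    cleaned = re.sub(r"<[^>]+>", " ", text)       # text cleaning: strip markup tags
    cleaned = cleaned.lower()                     # normalize case
    tokens = re.findall(r"[a-z0-9]+", cleaned)    # word segmentation
    return [t for t in tokens if t not in STOP_WORDS]  # stop-word filtering

tokens = preprocess("<p>Replace the seal of the Pump.</p>")
```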
When text data is uploaded through the file parsing server for the first time, text data that has already been classified is selected for uploading, and the algorithm parameters (the weight and smoothing factors) are not tuned at first. After model training and prediction are completed, the result is evaluated by an evaluation algorithm; the parameter settings are then adjusted and training continues until the evaluation value reaches its maximum, after which new data files are gradually added to complete the classification model.
According to the article classification results, topics are extracted from the articles of each class: an LDA algorithm is trained until convergence, the similarity of the article topics is then judged by using a machine learning text similarity algorithm, and topics with high similarity are extracted and stored as topics of the same class.
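The embodiment does not name the text similarity algorithm. As one possible illustration, trained topics can be represented as word-weight distributions and grouped by cosine similarity; cosine similarity, the threshold of 0.8, the function names and the sample distributions below are all assumptions of this sketch, and the LDA training itself is taken as already done.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-weight dictionaries."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def group_similar_topics(topics, threshold=0.8):
    """Greedily group topics whose similarity to a group's first member
    reaches the threshold."""
    groups = []
    for name, dist in topics.items():
        for group in groups:
            representative = topics[group[0]]
            if cosine_similarity(dist, representative) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])  # no similar group found: start a new one
    return groups

# Hypothetical topic-word distributions, as a topic model might produce.
topics = {
    "t0": {"pump": 0.5, "seal": 0.3, "leak": 0.2},
    "t1": {"pump": 0.45, "seal": 0.35, "leak": 0.2},
    "t2": {"wiring": 0.6, "fuse": 0.4},
}
groups = group_similar_topics(topics)
```

Here "t0" and "t1" share nearly the same word weights and fall into one group, while "t2" has no words in common with them and forms its own group.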
A TF-IDF keyword extraction algorithm is run with each chapter of an article as the unit, so as to obtain the keyword information contained in each chapter; the keywords of the chapter are extracted and stored.
S3, the file management server stores the relevant document analysis information according to the analysis results of the file parsing server.
S4, the associated information is displayed when the client performs information browsing and interaction.
When the client performs VR electronic document interaction, the associated article chapters are retrieved from the file management server for display according to the text association data associated with the maintenance interaction, and the related article content can also be browsed directly.
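How the stored text association data might be used to pick the target chapters in step S4 can be sketched as follows. The embodiment does not specify a matching rule, so the record layout `ChapterRecord`, the scoring by keyword overlap plus a topic match, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ChapterRecord:
    """Text association information stored for one chapter."""
    chapter_id: str
    classification: str
    topic: str
    keywords: set = field(default_factory=set)

def find_target_chapters(records, operation_terms, classification=None):
    """Rank chapters by how well they match the terms of an operation."""
    terms = {t.lower() for t in operation_terms}
    scored = []
    for record in records:
        if classification is not None and record.classification != classification:
            continue  # restrict the search to one article classification
        score = len(terms & record.keywords)  # keyword overlap
        if record.topic in terms:
            score += 1                        # bonus for a matching topic
        if score > 0:
            scored.append((score, record.chapter_id))
    # Best match first; ties are ordered by chapter id.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [chapter_id for _, chapter_id in scored]

# Hypothetical stored records and a hypothetical maintenance interaction.
records = [
    ChapterRecord("3.2", "maintenance", "pump", {"seal", "replace", "gasket"}),
    ChapterRecord("1.1", "description", "pump", {"overview", "function"}),
    ChapterRecord("4.1", "maintenance", "wiring", {"fuse", "replace"}),
]
targets = find_target_chapters(
    records, ["replace", "seal", "pump"], classification="maintenance"
)
```

Filtering by classification first and then ranking by topic and keywords reflects the point that association information stored per chapter allows a narrower match than keywords alone over whole documents.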
Therefore, the invention solves the problem that manual searching is inefficient when the interactive electronic manual stores a large number of data files. Text information is extracted from the articles automatically by machine learning algorithms, and the shortcomings of any single algorithm are compensated by combining several different algorithms.
Corresponding to the above method embodiment, an embodiment of the present invention further provides an electronic device; the electronic device described below and the electronic manual processing method described above may be cross-referenced.
Referring to fig. 4, the electronic device includes:
a memory 332 for storing a computer program;
A processor 322 for implementing the steps of the electronic manual processing method of the above-described method embodiment when executing the computer program.
Specifically, referring to fig. 5, fig. 5 is a schematic diagram of a specific structure of an electronic device according to the present embodiment. The electronic device may vary considerably depending on its configuration or performance, and may include one or more processors (central processing units, CPU) and a memory 332, where the memory 332 stores one or more computer programs 342 or data 344. The memory 332 may be transient storage or persistent storage. The program stored in the memory 332 may include one or more modules (not shown), each of which may include a series of instruction operations on the data processing apparatus. Still further, the processor 322 may be configured to communicate with the memory 332 and to execute, on the electronic device 301, the series of instruction operations in the memory 332.
The electronic device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341.
The steps in the electronic manual processing method described above may be implemented by the structure of the electronic device.
Corresponding to the above method embodiment, an embodiment of the present invention further provides a readable storage medium; the readable storage medium described below and the electronic manual processing method described above may be cross-referenced.
A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the electronic manual processing method of the above method embodiment.
The readable storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In this specification, the embodiments are described in a progressive manner, and each embodiment focuses on its differences from the other embodiments, so that the same or similar parts of the embodiments may be cross-referenced. Since the system disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief, and the relevant points can be found in the description of the method section.