CN114764506A - Network security knowledge base construction method, equipment, storage medium and device - Google Patents
Network security knowledge base construction method, equipment, storage medium and device
- Publication number
- CN114764506A (CN202110045843.3A)
- Authority
- CN
- China
- Prior art keywords
- processed
- security
- word
- knowledge base
- network security
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Bioethics (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method, equipment, a storage medium and a device for constructing a network security knowledge base. A historical security document library is obtained, and words are extracted from the historical security documents in the library to obtain a word set to be processed. Relevance analysis is performed on each word to be processed in the word set to obtain a relevance analysis result, and the word set is grouped according to that result to obtain a word sense association group and a grammar association group. A directed acyclic graph of the word set to be processed is generated according to the word sense association group and the grammar association group, and the network security knowledge base is established according to the directed acyclic graph. Compared with the existing approach of scattering network security information across different locations in different systems, the network security information is stored centrally in the network security knowledge base in the form of a directed acyclic graph, which can improve the efficiency of searching for network security information.
Description
Technical Field
The invention relates to the technical field of the Internet, and in particular to a method, equipment, a storage medium and a device for constructing a network security knowledge base.
Background
At present, network security information is usually scattered across different locations in different systems. As a result, a user who needs to look up network security information cannot quickly find the target information, which degrades the user experience.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a method, equipment, a storage medium and a device for constructing a network security knowledge base, and aims to solve the technical problem of how to establish the network security knowledge base.
In order to achieve the above object, the present invention provides a method for constructing a network security knowledge base, which comprises the following steps:
acquiring a historical security document library, and extracting words of historical security documents in the historical security document library to obtain a word set to be processed;
performing relevance analysis on each word to be processed in the word set to be processed to obtain a relevance analysis result;
grouping the word set to be processed according to the relevance analysis result to obtain a word sense association group and a grammar association group;
and generating a directed acyclic graph of the word set to be processed according to the word sense association group and the grammar association group, and establishing a network security knowledge base according to the directed acyclic graph.
Optionally, the step of acquiring a historical security document library, and extracting words from the historical security documents in the historical security document library to obtain a word set to be processed specifically includes:
when a knowledge base construction instruction is received, determining a historical security document base according to the knowledge base construction instruction;
traversing the historical security documents in the historical security document library, and taking the traversed historical security documents as security documents to be processed;
extracting words from the security document to be processed to obtain words to be processed;
and after the traversal of the historical security document is finished, generating a word set to be processed according to the words to be processed.
Optionally, the step of extracting words from the security document to be processed to obtain words to be processed includes:
carrying out security phrase identification on the security document to be processed to obtain security phrases to be processed;
and carrying out named entity recognition on the security phrases to be processed to obtain words to be processed.
Optionally, the step of performing security phrase identification on the to-be-processed security document to obtain a to-be-processed security phrase specifically includes:
cutting the to-be-processed security document through a preset statistical language model to obtain an initial security phrase;
acquiring the occurrence frequency of the initial security phrase, and screening the initial security phrase according to the occurrence frequency to acquire a candidate security phrase;
and acquiring the statistical characteristics of the candidate security phrases, and screening the candidate security phrases according to the statistical characteristics to acquire the security phrases to be processed.
Optionally, the step of obtaining the statistical characteristics of the candidate security phrases, and screening the candidate security phrases according to the statistical characteristics to obtain the security phrases to be processed specifically includes:
acquiring statistical characteristics of the candidate security phrases, and generating quality scores of the candidate security phrases according to the statistical characteristics;
sorting the candidate security phrases according to the quality scores to obtain a sorting result;
and screening the candidate security phrases according to the sorting result to obtain the security phrases to be processed.
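The frequency screening and quality-score ranking described above can be sketched as follows. This is a minimal illustration, not the patented implementation: plain n-gram enumeration stands in for the unspecified "preset statistical language model", pointwise mutual information is an assumed choice of statistical characteristic, and the names `candidate_phrases`, `quality_score`, `select_phrases`, `min_count` and `top_k` are hypothetical.

```python
import math
from collections import Counter


def candidate_phrases(tokens, max_len=3, min_count=2):
    """Enumerate n-grams (n = 2..max_len) and keep those seen at least min_count times."""
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    # Frequency screening: initial phrases -> candidate phrases.
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


def quality_score(phrase, count, unigram_counts, total):
    """Assumed statistical characteristic: pointwise mutual information of the phrase."""
    p_phrase = count / total
    p_independent = math.prod(unigram_counts[w] / total for w in phrase)
    return math.log(p_phrase / p_independent)


def select_phrases(tokens, top_k=2):
    """Rank candidate phrases by quality score and keep the top_k."""
    unigram_counts = Counter(tokens)
    total = len(tokens)
    candidates = candidate_phrases(tokens)
    ranked = sorted(
        candidates,
        key=lambda p: quality_score(p, candidates[p], unigram_counts, total),
        reverse=True,
    )
    return [" ".join(p) for p in ranked[:top_k]]


tokens = ("sql injection attack uses sql injection payload "
          "the attack uses the payload the sql injection").split()
print(select_phrases(tokens))
```

A real system would likely combine several characteristics (for example, frequency, informativeness and completeness) into the quality score; the single PMI feature here only shows where such a score plugs into the ranking step.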
Optionally, the step of performing named entity recognition on the security phrases to be processed to obtain words to be processed specifically includes:
carrying out sequence marking on the security phrases to be processed to obtain target security phrases;
and carrying out named entity recognition on the target security phrase through a preset entity recognition model to obtain the words to be processed.
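As an illustration of how sequence tagging feeds named entity recognition, the sketch below decodes a BIO-tagged token sequence into entity spans. The BIO scheme, the tag labels `VULN` and `ACTOR`, and the function name `decode_bio` are assumptions; the source does not specify the tagging scheme or the preset entity recognition model, whose output is simply hard-coded here.

```python
def decode_bio(tokens, tags):
    """Collect entity spans from BIO tags: B-X starts an entity, I-X continues it, O is outside."""
    entities, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; flush the one in progress, if any.
            if current:
                entities.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)
        else:
            # "O" or an inconsistent I- tag closes the current entity.
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities


# Hypothetical model output for one security phrase.
tokens = ["An", "XSS", "vulnerability", "lets", "a", "malicious", "attacker", "insert", "code"]
tags = ["O", "B-VULN", "I-VULN", "O", "O", "B-ACTOR", "I-ACTOR", "O", "O"]
print(decode_bio(tokens, tags))
```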
Optionally, the step of performing relevance analysis on each word to be processed in the word set to be processed to obtain a relevance analysis result specifically includes:
carrying out synonym analysis on each word to be processed in the word set to be processed to obtain a synonym analysis result;
extracting the abbreviations of the words to be processed in the word set to be processed to obtain abbreviation extraction results;
performing grammar correlation analysis on each word to be processed in the word set to be processed to obtain a grammar correlation analysis result;
and generating a relevance analysis result according to the synonym analysis result, the abbreviation extraction result and the grammar correlation analysis result.
Optionally, the step of grouping the word set to be processed according to the relevance analysis result to obtain a word sense association group and a grammar association group specifically includes:
searching the word set to be processed according to the synonym analysis result and the abbreviation extraction result to obtain a word sense association group;
and searching the word set to be processed according to the grammar correlation analysis result to obtain a grammar association group.
Optionally, the step of generating a directed acyclic graph of the word set to be processed according to the word sense association group and the grammar association group, and establishing a network security knowledge base according to the directed acyclic graph specifically includes:
determining natural language triples according to the grammar association group;
generating a directed acyclic graph of the word set to be processed according to the word sense association group and the natural language triples;
and establishing a network security knowledge base according to the directed acyclic graph.
Optionally, the step of generating a directed acyclic graph of the word set to be processed according to the word sense association group and the natural language triples specifically includes:
establishing a word sense mapping relation table among the words to be processed according to the word sense association group;
establishing a grammar mapping relation table among the words to be processed according to the natural language triples;
and generating the directed acyclic graph of the word set to be processed according to the word sense mapping relation table and the grammar mapping relation table.
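One way to read the two mapping tables is sketched below: the word sense mapping table maps every surface form to a canonical form, the grammar mapping table stores the predicate for each (subject, object) pair, and the graph is an adjacency list derived from both. The table layouts, the choice of the first group member as canonical form, the function name `build_graph` and the predicate "exploits" are all assumptions made for illustration.

```python
def build_graph(sense_groups, triples):
    """Build both mapping tables and an adjacency-list graph from them."""
    # Word sense mapping table: surface form -> canonical form (first member of its group).
    sense_table = {word: group[0] for group in sense_groups for word in group}

    # Grammar mapping table: (subject, object) -> predicate, using canonical forms.
    grammar_table = {}
    for subj, pred, obj in triples:
        s = sense_table.get(subj, subj)
        o = sense_table.get(obj, obj)
        grammar_table[(s, o)] = pred

    # Directed graph: node -> list of (predicate, successor).
    graph = {}
    for (s, o), pred in grammar_table.items():
        graph.setdefault(s, []).append((pred, o))
        graph.setdefault(o, [])
    return sense_table, grammar_table, graph


sense_groups = [["XSS vulnerability", "Cross-Site Scripting", "XSS"]]
triples = [
    ("malicious attacker", "exploits", "XSS"),  # hypothetical triple
    ("malicious attacker", "inserts", "malicious HTML code"),
]
sense_table, grammar_table, graph = build_graph(sense_groups, triples)
print(graph["malicious attacker"])
```

Collapsing synonyms onto one canonical node keeps the graph small; whether the result is actually acyclic depends on the triples and would need a separate check.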
Optionally, after the step of generating a directed acyclic graph of the word set to be processed according to the word sense association group and the grammar association group, and establishing a network security knowledge base according to the directed acyclic graph, the method for constructing a network security knowledge base further includes:
when a query instruction is received, determining keywords to be queried according to the query instruction;
and searching a target directed acyclic graph corresponding to the keyword to be queried in the network security knowledge base, and displaying the target directed acyclic graph.
Optionally, the step of searching for the target directed acyclic graph corresponding to the keyword to be queried in the network security knowledge base and displaying the target directed acyclic graph specifically includes:
searching a target directed acyclic graph corresponding to the keyword to be queried in the network security knowledge base;
determining querying device information according to the query instruction, and determining an information display template according to the querying device information;
and writing the target directed acyclic graph into the information display template to obtain information to be displayed, and displaying the information to be displayed.
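The query flow above might look like the following sketch, which assumes the query instruction is a dictionary carrying a keyword and a device type, and that the knowledge base maps each keyword to the outgoing edges of its subgraph. The `TEMPLATES` table, the device names and `answer_query` are hypothetical; only `string.Template` is a real standard-library class.

```python
from string import Template

# Hypothetical display templates keyed by querying device type.
TEMPLATES = {
    "mobile": Template("$keyword: $relations"),
    "desktop": Template("Keyword: $keyword\nRelations: $relations"),
}


def answer_query(knowledge_base, query):
    """Look up the keyword's subgraph and write it into the template for the querying device."""
    keyword = query["keyword"]
    edges = knowledge_base.get(keyword)
    if edges is None:
        return None
    # Fall back to the desktop template when the device type is unknown.
    template = TEMPLATES.get(query.get("device"), TEMPLATES["desktop"])
    relations = "; ".join(f"{pred} -> {obj}" for pred, obj in edges)
    return template.substitute(keyword=keyword, relations=relations)


knowledge_base = {"malicious attacker": [("insert", "malicious HTML code")]}
print(answer_query(knowledge_base, {"keyword": "malicious attacker", "device": "mobile"}))
```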
In addition, in order to achieve the above object, the present invention further provides a network security knowledge base constructing device, which includes a memory, a processor, and a network security knowledge base constructing program stored in the memory and operable on the processor, wherein the network security knowledge base constructing program is configured to implement the steps of the network security knowledge base constructing method as described above.
In addition, to achieve the above object, the present invention further provides a storage medium, on which a network security knowledge base constructing program is stored, and the network security knowledge base constructing program implements the steps of the network security knowledge base constructing method as described above when executed by a processor.
In addition, in order to achieve the above object, the present invention further provides a network security knowledge base constructing apparatus, including: an extraction module, an analysis module, a grouping module and an establishment module;
the extraction module is used for acquiring a historical security document library and extracting words from the historical security documents in the historical security document library to obtain a word set to be processed;
the analysis module is used for carrying out relevance analysis on each word to be processed in the word set to be processed to obtain a relevance analysis result;
the grouping module is used for grouping the word set to be processed according to the relevance analysis result to obtain a word sense association group and a grammar association group;
the establishment module is used for generating a directed acyclic graph of the word set to be processed according to the word sense association group and the grammar association group, and establishing a network security knowledge base according to the directed acyclic graph.
Optionally, the extraction module is further configured to determine, when receiving a knowledge base construction instruction, a historical security document library according to the knowledge base construction instruction;
the extraction module is further used for traversing the historical security documents in the historical security document library, and taking the traversed historical security documents as security documents to be processed;
the extraction module is also used for extracting words from the to-be-processed security document to obtain to-be-processed words;
the extraction module is further used for generating a word set to be processed according to the words to be processed after the traversal of the historical security document is finished.
Optionally, the extraction module is further configured to perform security phrase identification on the to-be-processed security document to obtain a to-be-processed security phrase;
the extraction module is further used for conducting named entity recognition on the to-be-processed security phrases to obtain to-be-processed words.
Optionally, the extraction module is further configured to cut the to-be-processed security document through a preset statistical language model to obtain an initial security phrase;
the extraction module is further configured to obtain an occurrence frequency of the initial security phrase, and filter the initial security phrase according to the occurrence frequency to obtain a candidate security phrase;
the extraction module is further configured to obtain statistical characteristics of the candidate security phrases, and filter the candidate security phrases according to the statistical characteristics to obtain security phrases to be processed.
Optionally, the extraction module is further configured to obtain statistical features of the candidate security phrases, and generate quality scores of the candidate security phrases according to the statistical features;
the extraction module is further configured to rank the candidate security phrases according to the quality scores to obtain a ranking result;
the extraction module is further configured to filter the candidate security phrases according to the sorting result to obtain security phrases to be processed.
Optionally, the extraction module is further configured to perform sequence tagging on the security phrase to be processed to obtain a target security phrase;
the extraction module is further used for conducting named entity recognition on the target security phrase through a preset entity recognition model to obtain words to be processed.
In the invention, a historical security document library is obtained, and words are extracted from the historical security documents in the library to obtain a word set to be processed. Relevance analysis is performed on each word to be processed in the word set to obtain a relevance analysis result; the word set is grouped according to that result to obtain a word sense association group and a grammar association group; a directed acyclic graph of the word set is generated according to the two groups; and a network security knowledge base is established according to the directed acyclic graph. Compared with the existing approach of scattering network security information across different locations in different systems, the invention stores the network security information centrally in the network security knowledge base in the form of a directed acyclic graph, which can improve the efficiency of searching for network security information.
Drawings
FIG. 1 is a schematic structural diagram of a network security knowledge base construction device of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of a method for constructing a network security knowledge base according to the present invention;
FIG. 3 is a flowchart illustrating a second embodiment of the method for constructing a network security knowledge base according to the present invention;
FIG. 4 is a schematic diagram of a directed acyclic graph according to an embodiment of a method for constructing a network security knowledge base of the present invention;
FIG. 5 is a flowchart illustrating a third embodiment of the method for building a network security knowledge base according to the present invention;
FIG. 6 is a flowchart illustrating a fourth embodiment of the method for building a network security knowledge base according to the present invention;
FIG. 7 is a block diagram showing the configuration of a first embodiment of the network security knowledge base constructing apparatus according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a network security knowledge base construction device of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the network security knowledge base construction device may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable communication between these components. The user interface 1003 may include a display screen (Display); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface, and in the present invention the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the architecture shown in fig. 1 does not constitute a limitation of the network security knowledge base construction device, which may include more or fewer components than shown, combine some components, or use a different arrangement of components.
As shown in fig. 1, the memory 1005, which is a computer storage medium, may include an operating system, a network communication module, a user interface module, and a network security knowledge base construction program.
In the network security knowledge base construction device shown in fig. 1, the network interface 1004 is mainly used for connecting with a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting user equipment; the network security knowledge base construction device invokes, through the processor 1001, a network security knowledge base construction program stored in the memory 1005, and executes the network security knowledge base construction method provided by the embodiment of the present invention.
Based on the hardware structure, the embodiment of the method for constructing the network security knowledge base is provided.
Referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the method for constructing a network security knowledge base of the present invention.
In a first embodiment, the method for constructing the network security knowledge base comprises the following steps:
Step S10: acquiring a historical security document library, and extracting words from the historical security documents in the historical security document library to obtain a word set to be processed.
It should be understood that the execution subject of this embodiment is the network security knowledge base constructing device, where the network security knowledge base constructing device may be an electronic device such as a computer or a server, and may also be other devices that can achieve the same or similar functions.
It should be noted that the historical security document library may be, for example, a local security document library of the network security knowledge base construction device, which is not limited in this embodiment.
It should be understood that extracting words from the historical security documents in the historical security document library to obtain the word set to be processed may consist of performing word segmentation on the historical security documents to obtain words to be matched, matching the words to be matched against the security information words in a preset security word library, and taking the successfully matched words as the words to be processed.
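The segmentation-and-matching step can be sketched as follows. The sketch assumes English text tokenized with a simple regular expression (segmenting Chinese text would need a dedicated segmenter), a small hypothetical lexicon `SECURITY_LEXICON` in place of the preset security word library, and a greedy longest-match strategy that the source does not prescribe.

```python
import re

# Hypothetical preset security word library.
SECURITY_LEXICON = {"xss", "sql injection", "firewall", "malicious attacker"}


def extract_words(document, lexicon=SECURITY_LEXICON, max_len=3):
    """Segment the document and keep the longest token runs that appear in the lexicon."""
    tokens = re.findall(r"[a-z0-9]+", document.lower())
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so "sql injection" wins over a shorter match.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                found.append(candidate)
                i += n
                break
        else:
            i += 1  # no lexicon entry starts here
    return found


print(extract_words("A malicious attacker bypassed the firewall using SQL injection."))
```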
Further, so that the historical security document library can be selected manually, acquiring the historical security document library and extracting words from the historical security documents in the historical security document library to obtain a word set to be processed includes:
when a knowledge base construction instruction is received, determining a historical security document library according to the knowledge base construction instruction; traversing the historical security documents in the historical security document library and taking each traversed historical security document as a security document to be processed; extracting words from the security document to be processed to obtain words to be processed; and, after the traversal of the historical security documents is finished, generating a word set to be processed from the words to be processed.
Further, in order to improve the reliability of the words to be processed, extracting words from the security document to be processed to obtain the words to be processed includes:
carrying out security phrase identification on the security document to be processed to obtain security phrases to be processed, and carrying out named entity recognition on the security phrases to be processed to obtain the words to be processed.
Step S20: performing relevance analysis on each word to be processed in the word set to be processed to obtain a relevance analysis result.
It should be understood that performing relevance analysis on each word to be processed in the word set to be processed may consist of performing synonym analysis on each word to obtain a synonym analysis result, performing abbreviation extraction on each word to obtain an abbreviation extraction result, performing grammar correlation analysis on each word to obtain a grammar correlation analysis result, and generating the relevance analysis result according to the synonym analysis result, the abbreviation extraction result and the grammar correlation analysis result.
It should be understood that the synonym analysis of each word to be processed in the word set to be processed may be synonym mining based on a synonym resource, synonym mining based on pattern matching, or synonym mining based on a bootstrapping method, each of which yields a synonym analysis result.
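As one illustration of pattern-based mining, the sketch below looks for the common "Long Form (ABBR)" pattern to pair a capitalized phrase with the abbreviation that follows it in parentheses. The regular expression and the name `mine_abbreviations` are assumptions; the source does not state which patterns are used, and a real system would need many more patterns.

```python
import re

# Two to five capitalized words followed by an upper-case abbreviation in parentheses.
ABBREVIATION_PATTERN = re.compile(
    r"((?:[A-Z][\w-]*\s+){1,4}[A-Z][\w-]*)\s*\(([A-Z]{2,6})\)"
)


def mine_abbreviations(text):
    """Return (long form, abbreviation) pairs found by the parenthetical pattern."""
    return ABBREVIATION_PATTERN.findall(text)


text = ("attackers exploit Cross-Site Scripting (XSS) and "
        "Structured Query Language (SQL) injection.")
print(mine_abbreviations(text))
```

Each mined pair can then be treated as a synonym candidate when the word sense association groups are formed.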
It can be understood that the grammar correlation analysis of each word to be processed in the word set to be processed may be performed based on rules, a dictionary, or an online knowledge base to obtain the grammar correlation analysis result.
It should be understood that the grammar correlation analysis of each word to be processed in the word set to be processed may be based on pattern or rule extraction, on supervised learning with sequence labeling, or on supervised learning with text classification.
Step S30: grouping the word set to be processed according to the relevance analysis result to obtain a word sense association group and a grammar association group.
It can be understood that grouping the word set to be processed according to the relevance analysis result may consist of searching the word set to be processed according to the synonym analysis result and the abbreviation extraction result to obtain the word sense association group, and searching the word set to be processed according to the grammar correlation analysis result to obtain the grammar association group.
In a specific implementation, for example, XSS vulnerability, Cross-Site Scripting, Cross Site Scripting, and XSS can constitute a word sense association group.
XSS vulnerability, malicious attacker, insert and malicious HTML code can constitute a grammar association group; malicious HTML code, may be and <script>alert(1)</script> can also constitute a grammar association group.
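Forming word sense association groups from pairwise synonym and abbreviation results can be sketched with a union-find structure, as below. The use of union-find and the name `group_synonyms` are assumptions; the source only says the word set is searched according to the analysis results. The pairs are hard-coded to mirror the XSS example.

```python
def group_synonyms(words, pairs):
    """Merge words connected by synonym or abbreviation pairs into groups (union-find)."""
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the root, halving the path on the way.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    # A word with no synonym does not form an association group.
    return [g for g in groups.values() if len(g) > 1]


words = ["XSS vulnerability", "Cross-Site Scripting", "Cross Site Scripting",
         "XSS", "malicious attacker"]
pairs = [("XSS", "Cross-Site Scripting"),
         ("Cross-Site Scripting", "Cross Site Scripting"),
         ("XSS vulnerability", "XSS")]
print(group_synonyms(words, pairs))
```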
Step S40: generating a directed acyclic graph of the word set to be processed according to the word sense association group and the grammar association group, and establishing a network security knowledge base according to the directed acyclic graph.
It should be understood that this step may consist of determining natural language triples according to the grammar association group, generating a directed acyclic graph of the word set to be processed according to the word sense association group and the natural language triples, and establishing a network security knowledge base according to the directed acyclic graph.
In a specific implementation, for example, the preset grammar phrase format may be <subject, predicate, object>. Extracting from the grammar association group of XSS vulnerability, malicious attacker, insert and malicious HTML code according to the preset grammar phrase format yields the natural language triple <malicious attacker, insert, malicious HTML code>; extracting from the grammar association group of malicious HTML code, may be and <script>alert(1)</script> according to the preset grammar phrase format yields the natural language triple <malicious HTML code, may be, <script>alert(1)</script>>.
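Because the knowledge base relies on the graph being acyclic, it can help to verify that property when the triples are turned into edges. The sketch below is one possible check, using the standard-library `graphlib` module (Python 3.9 or later); the function name `topological_order` is hypothetical and the source does not describe how acyclicity is ensured.

```python
from graphlib import CycleError, TopologicalSorter


def topological_order(triples):
    """Treat each <subject, predicate, object> triple as an edge subject -> object
    and return the nodes in topological order, or raise ValueError on a cycle."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    predecessors = {}
    for subj, _pred, obj in triples:
        predecessors.setdefault(obj, set()).add(subj)
        predecessors.setdefault(subj, set())
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as exc:
        raise ValueError(f"triples contain a cycle: {exc.args[1]}") from exc


triples = [
    ("malicious attacker", "insert", "malicious HTML code"),
    ("malicious HTML code", "may be", "<script>alert(1)</script>"),
]
print(topological_order(triples))
```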
In the first embodiment, a historical security document library is obtained, and words are extracted from the historical security documents in the library to obtain a word set to be processed. Relevance analysis is performed on each word to be processed in the word set to obtain a relevance analysis result; the word set is grouped according to that result to obtain a word sense association group and a grammar association group; a directed acyclic graph of the word set is generated according to the two groups; and a network security knowledge base is established according to the directed acyclic graph. Compared with the existing approach of scattering network security information across different locations in different systems, this embodiment stores the network security information centrally in the network security knowledge base in the form of a directed acyclic graph, which can improve the efficiency of searching for network security information.
Referring to fig. 3, fig. 3 is a flowchart illustrating a second embodiment of the method for constructing a network security knowledge base according to the present invention, and the second embodiment of the method for constructing a network security knowledge base is proposed based on the first embodiment illustrated in fig. 2.
In the second embodiment, the step S10 includes:
Step S101: when a knowledge base construction instruction is received, determining a historical security document library according to the knowledge base construction instruction.
It should be noted that the knowledge base construction instruction may be a control instruction input by a user through a user interaction interface of the network security knowledge base construction device; it may also be a control instruction input by a user through a terminal device that has established a communication connection with the network security knowledge base construction device in advance, which is not limited in this embodiment.
The historical security document library may be an online security knowledge base, a local security dictionary, or the like, which is not limited in this embodiment.
It should be understood that determining the historical security document library according to the knowledge base construction instruction may be: extracting an identifier from the knowledge base construction instruction to obtain an instruction identifier, and looking up the historical security document library corresponding to the instruction identifier in a preset document table. The preset document table contains the correspondence between instruction identifiers and historical security document libraries, and this correspondence may be preset by a user, which is not limited in this embodiment.
Step S102: traversing the historical security documents in the historical security document library, and taking the traversed historical security documents as security documents to be processed.
It should be understood that the historical security documents in the historical security document library may be traversed in random order, or according to the storage time of the historical security documents, which is not limited in this embodiment.
Step S103: extracting words from the security document to be processed to obtain words to be processed.
It can be understood that extracting words from the security document to be processed may be: performing word segmentation on the security document to be processed to obtain words to be matched, matching the words to be matched against the security information words in a preset security word library, and taking the successfully matched words as the words to be processed.
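The segmentation-and-matching variant described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the lexicon contents, the whitespace/alphanumeric tokenizer, and the longest-match strategy are all assumptions made for the example, and `extract_words` and `PRESET_SECURITY_WORDS` are hypothetical names.

```python
import re

# Hypothetical preset security word library (illustrative entries only).
PRESET_SECURITY_WORDS = {"xss", "sql injection", "csrf"}


def extract_words(document, lexicon, max_len=3):
    """Segment a document into tokens and keep the spans found in the lexicon."""
    tokens = re.findall(r"[a-z0-9]+", document.lower())
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                found.append(candidate)
                i += n
                break
        else:
            # No lexicon entry starts here; move on by one token.
            i += 1
    return found
```

For English text a regular expression is enough to segment tokens; a Chinese-language document would need a dedicated word segmenter in place of the `re.findall` call.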
Further, in order to improve the reliability of the words to be processed, extracting words from the security document to be processed to obtain the words to be processed includes:
performing security phrase identification on the security document to be processed to obtain security phrases to be processed, and performing named entity recognition on the security phrases to be processed to obtain the words to be processed.
Step S104: after the traversal of the historical security documents is finished, generating a word set to be processed according to the words to be processed.
It should be understood that generating the word set to be processed according to the words to be processed may be aggregating the words to be processed into the word set to be processed.
In the second embodiment, when a knowledge base construction instruction is received, the historical security document library is determined according to the instruction; the historical security documents in the library are traversed, and each traversed document is taken as a security document to be processed; words are extracted from the security document to be processed to obtain words to be processed; and after the traversal is finished, the word set to be processed is generated from the words to be processed, so that the word set to be processed can be determined quickly.
In the second embodiment, the step S20 includes:
Step S201: performing synonym analysis on each word to be processed in the word set to be processed to obtain a synonym analysis result.
It should be understood that the synonym analysis may be synonym mining based on a synonym resource, synonym mining based on pattern matching, or synonym mining based on a bootstrap method, each of which yields a synonym analysis result.
It should be noted that the synonym resource may be synonym information crawled from a resource website. The bootstrap method may first use known patterns to find synonym pairs and then use those synonym pairs to find new patterns; these two steps are repeated until a termination condition is reached.
In a specific implementation, for example, Cross-Site Scripting and Cross Site Scripting are synonyms.
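The bootstrap loop described above (patterns find pairs, pairs find new patterns, repeat until nothing changes) can be sketched as follows. The sketch makes strong simplifying assumptions that are not in the original: each sentence has the shape "<term A><connector><term B>", a pattern is just the connector string, and the termination condition is "no new pair and no new pattern, or a round limit". The function name and the sample sentences are hypothetical.

```python
def bootstrap_synonyms(sentences, seed_patterns, max_rounds=10):
    """Alternate between pattern -> pairs and pairs -> patterns until a fixed point."""
    patterns = set(seed_patterns)
    pairs = set()
    for _ in range(max_rounds):
        # Step 1: use the known patterns to find synonym pairs.
        found_pairs = set()
        for sentence in sentences:
            for pattern in patterns:
                left, sep, right = sentence.partition(pattern)
                if sep and left.strip() and right.strip():
                    found_pairs.add((left.strip(), right.strip()))
        # Step 2: use the known pairs to find new connector patterns.
        found_patterns = set()
        for a, b in pairs | found_pairs:
            for sentence in sentences:
                if (sentence.startswith(a) and sentence.endswith(b)
                        and len(sentence) > len(a) + len(b)):
                    connector = sentence[len(a):len(sentence) - len(b)]
                    if connector.strip():
                        found_patterns.add(connector)
        # Termination condition: nothing new was learned in this round.
        if found_pairs <= pairs and found_patterns <= patterns:
            break
        pairs |= found_pairs
        patterns |= found_patterns
    return pairs, patterns


sentences = [
    "XSS is also called cross-site scripting",
    "XSS stands for cross-site scripting",
    "CSRF stands for cross-site request forgery",
]
pairs, patterns = bootstrap_synonyms(sentences, [" is also called "])
```

In this toy corpus the seed pattern finds the XSS pair, that pair reveals the connector " stands for ", and the new connector in turn finds the CSRF pair. A practical system would also score patterns and pairs to limit drift, which this sketch omits.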
Step S202: performing abbreviation extraction on each word to be processed in the word set to be processed to obtain an abbreviation extraction result.
It can be understood that the abbreviation extraction on each word to be processed in the word set to be processed may be performed based on rules, a dictionary, or an online knowledge base.
Step S203: performing grammar correlation analysis on each word to be processed in the word set to be processed to obtain a grammar correlation analysis result.
It should be understood that the grammar correlation analysis of each word to be processed in the word set to be processed may be based on pattern or rule extraction, on supervised learning with sequence labeling, or on supervised learning with text classification.
Step S204: generating a relevance analysis result according to the synonym analysis result, the abbreviation extraction result and the grammar correlation analysis result.
It should be understood that generating the relevance analysis result may be taking the synonym analysis result, the abbreviation extraction result and the grammar correlation analysis result together as the relevance analysis result.
In the second embodiment, synonym analysis, abbreviation extraction and grammar correlation analysis are each performed on every word to be processed in the word set to be processed, yielding a synonym analysis result, an abbreviation extraction result and a grammar correlation analysis result, and the relevance analysis result is generated from these three results, so that the accuracy of the relevance analysis result can be improved.
In the second embodiment, the step S30 includes:
Step S301: searching the word set to be processed according to the synonym analysis result and the abbreviation extraction result to obtain a word sense association group.
It should be understood that obtaining the word sense association group may be storing a word to be processed, together with its synonyms and abbreviations, in the same word sense association group.
In a specific implementation, for example, XSS vulnerability, Cross-Site Scripting, Cross Site Scripting, and XSS can constitute a word sense association group.
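One way to collect a word together with its synonyms and abbreviations into a single group is a union-find structure over the pairs reported by the two analyses. The text does not prescribe a data structure, so this is only a sketch under that assumption; `build_sense_groups` and the pair lists are hypothetical.

```python
def build_sense_groups(words, synonym_pairs, abbreviation_pairs):
    """Merge words linked by synonym or abbreviation pairs into groups."""
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links to the representative, halving the path on the way.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b in list(synonym_pairs) + list(abbreviation_pairs):
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = {}
    for w in words:
        groups.setdefault(find(w), set()).add(w)
    return list(groups.values())


words = ["XSS vulnerability", "Cross-Site Scripting", "Cross Site Scripting",
         "XSS", "SQL injection"]
synonyms = [("Cross-Site Scripting", "Cross Site Scripting"),
            ("XSS vulnerability", "Cross-Site Scripting")]
abbreviations = [("XSS", "Cross Site Scripting")]
groups = build_sense_groups(words, synonyms, abbreviations)
```

With these inputs the four XSS-related terms end up in one group and "SQL injection" stays in a group of its own.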
Step S302: searching the word set to be processed according to the grammar correlation analysis result to obtain a grammar association group.
It can be understood that obtaining the grammar association group may be storing a word to be processed, together with the words grammatically correlated with it, in the same grammar association group.
In a specific implementation, for example, XSS vulnerability, malicious attacker, insertion, and malicious html code may constitute a grammar association group; "the malicious html code can be <script>alert(1)</script>" may also constitute a grammar association group.
In the second embodiment, the word set to be processed is searched according to the synonym analysis result and the abbreviation extraction result to obtain a word sense association group, and searched according to the grammar correlation analysis result to obtain a grammar association group, so that the reliability of the word sense association group and the grammar association group can be improved.
In the second embodiment, the step S40 includes:
Step S401: determining natural language triples according to the grammar association group.
It should be understood that the natural language triples may be obtained by extracting from the grammar association group according to a preset grammatical phrase format.
In a specific implementation, for example, the preset grammatical phrase format may be <subject, predicate, object>. Extracting from the grammar association group consisting of XSS vulnerability, malicious attacker, insertion, and malicious html code according to the preset grammatical phrase format yields the triple <malicious attacker, insertion, malicious html code>; extracting from the grammar association group "the malicious html code can be <script>alert(1)</script>" according to the same format yields the triple <malicious html code, can be, <script>alert(1)</script>>.
Step S402: generating a directed acyclic graph of the word set to be processed according to the word sense association group and the natural language triples.
It should be appreciated that the directed acyclic graph of the word set to be processed may be generated through IsA relationship extraction over the word sense association group and the natural language triples. The IsA relationship extraction may be based on patterns, an online encyclopedia, or word vectors, which is not limited in this embodiment.
Further, in order to improve the reliability of the directed acyclic graph, the step S402 includes:
establishing a word sense mapping relation table among the words to be processed according to the word sense association group;
establishing a grammar mapping relation table among the words to be processed according to the natural language triples;
and generating the directed acyclic graph of the word set to be processed according to the word sense mapping relation table and the grammar mapping relation table.
It can be understood that generating the directed acyclic graph of the word set to be processed according to the word sense mapping relation table and the grammar mapping relation table may be storing both tables into the directed acyclic graph of the word set to be processed.
In a specific implementation, the directed acyclic graph may be as shown in fig. 4, where XSS vulnerability -> Cross-Site Scripting attack -> Cross Site Scripting -> XSS represents a first directed acyclic graph, and XSS vulnerability -> malicious attacker inserts malicious html code -> the malicious html code may be <script>alert(1)</script> represents a second directed acyclic graph, which is not limited in this embodiment.
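As an illustration of combining the two mapping relation tables into one graph, the sketch below merges them into an adjacency map and uses the standard-library `graphlib` module (Python 3.9+) to confirm that the result is acyclic. Representing each table as a mapping from a source word to its target words is an assumption of the example, and `build_dag` is a hypothetical name.

```python
from graphlib import CycleError, TopologicalSorter


def build_dag(sense_table, grammar_table):
    """Merge two source -> targets tables and verify the result has no cycle."""
    edges = {}
    for table in (sense_table, grammar_table):
        for src, targets in table.items():
            edges.setdefault(src, set()).update(targets)
            for target in targets:
                edges.setdefault(target, set())
    # TopologicalSorter reads the mapping as node -> predecessors. Passing
    # successors instead yields a reversed order, so it is flipped afterwards.
    try:
        order = list(TopologicalSorter(edges).static_order())
    except CycleError as exc:
        raise ValueError(f"mapping tables contain a cycle: {exc}") from exc
    order.reverse()
    return edges, order


sense_table = {
    "XSS vulnerability": ["Cross-Site Scripting attack"],
    "Cross-Site Scripting attack": ["Cross Site Scripting"],
    "Cross Site Scripting": ["XSS"],
}
grammar_table = {
    "XSS vulnerability": ["malicious attacker inserts malicious html code"],
    "malicious attacker inserts malicious html code": ["<script>alert(1)</script>"],
}
edges, order = build_dag(sense_table, grammar_table)
```

The returned `order` lists every node before the nodes it points to, which is only possible when the merged graph is a directed acyclic graph.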
Step S403: establishing a network security knowledge base according to the directed acyclic graph.
It should be understood that establishing the network security knowledge base according to the directed acyclic graph may be storing the directed acyclic graph into a preset network security knowledge base to obtain the network security knowledge base. The preset network security knowledge base may be preset by a user, which is not limited in this embodiment.
In the second embodiment, natural language triples are determined according to the grammar association group, a directed acyclic graph of the word set to be processed is generated according to the word sense association group and the natural language triples, and a network security knowledge base is established according to the directed acyclic graph, so that a reliable and accurate network security knowledge base can be established.
Referring to fig. 5, fig. 5 is a flowchart illustrating a third embodiment of the method for constructing a network security knowledge base according to the present invention, and the third embodiment of the method for constructing a network security knowledge base is proposed based on the second embodiment illustrated in fig. 3.
In a third embodiment, the step S103 includes:
Step S1031: performing security phrase identification on the security document to be processed to obtain security phrases to be processed.
It should be understood that the security phrase identification on the security document to be processed may be performed through a supervised learning model. The supervised learning approach may comprise the following steps: determining candidate phrases from the security document to be processed, computing statistical characteristics of the candidate phrases and labeling samples, feeding the labeled candidate phrases to a classifier for training, scoring the quality of the candidate phrases with the trained classifier, and determining the security phrases to be processed according to the quality scores.
Further, in practical applications, sample labeling requires user participation, which degrades the user experience. Therefore, in order to determine the security phrases to be processed automatically, the step S1031 includes:
cutting the to-be-processed security document through a preset statistical language model to obtain an initial security phrase;
acquiring the occurrence frequency of the initial security phrase, and screening the initial security phrase according to the occurrence frequency to acquire a candidate security phrase;
and acquiring the statistical characteristics of the candidate security phrases, and screening the candidate security phrases according to the statistical characteristics to acquire the security phrases to be processed.
It should be noted that the preset statistical language model may be preset by a user, and in this embodiment, an n-gram model is taken as an example for description.
In a specific implementation, for example, the cutting of the to-be-processed security document through the preset statistical language model to obtain the initial security phrase may be the cutting of the to-be-processed security document through an n-gram model to obtain the initial security phrase. Where n may be any integer from 1 to 6, which is not limited in this embodiment.
It can be understood that screening the initial security phrases according to the occurrence frequency may be determining whether the occurrence frequency is greater than a preset frequency threshold, and taking the initial security phrases whose occurrence frequency is greater than the preset frequency threshold as the candidate security phrases. The preset frequency threshold may be preset by a user; in this embodiment, a threshold of 30 is taken as an example.
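The n-gram cutting and frequency screening described above can be sketched as follows. The sketch assumes the document has already been split into tokens; the defaults `max_n=6` and `min_count=30` mirror the example values in the text, and the function name is hypothetical.

```python
from collections import Counter


def ngram_candidates(tokens, max_n=6, min_count=30):
    """Count all n-grams for n = 1..max_n and keep those occurring more than min_count times."""
    counts = Counter(
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    )
    # The text keeps phrases whose frequency is strictly greater than the threshold.
    return {gram: count for gram, count in counts.items() if count > min_count}
```

For instance, on the token sequence `["sql", "injection", "attack"] * 3` with `max_n=2` and `min_count=2`, the bigram `("sql", "injection")` occurs three times and is kept, while `("attack", "sql")` occurs only twice and is dropped.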
It should be noted that the statistical characteristics may be characteristics such as tf-idf, TextRank, pointwise mutual information (PMI), and left and right neighbor entropy. tf-idf and TextRank can screen out words that occur frequently but are not important, such as pronouns, adverbs, prepositions and auxiliary words. PMI measures the cohesion of a phrase: by comparing n-gram phrases with the same n value, it determines which segmentation is more reasonable and identifies segmentations that cross word boundaries; for example, "movie theater" is more cohesive than "movie". Left and right neighbor entropy measures how varied the words appearing immediately to the left and right of a phrase are; the more varied they are, the more reasonable the boundaries of the phrase.
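Two of these characteristics have compact textbook definitions, sketched below for a tokenized corpus: PMI of an adjacent word pair, log2(p(x, y) / (p(x) p(y))), and the Shannon entropy of the words seen immediately to one side of a phrase. The probability estimates (simple relative frequencies) and the function names are assumptions of the example, not details given in the text.

```python
import math
from collections import Counter


def pmi(tokens, x, y):
    """Pointwise mutual information (base 2) of the adjacent pair (x, y)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    if bigrams[(x, y)] == 0:
        return float("-inf")  # the pair never occurs together
    p_xy = bigrams[(x, y)] / (len(tokens) - 1)
    p_x = unigrams[x] / len(tokens)
    p_y = unigrams[y] / len(tokens)
    return math.log2(p_xy / (p_x * p_y))


def neighbor_entropy(tokens, phrase, side):
    """Entropy (base 2) of the tokens directly left or right of each phrase occurrence."""
    phrase = tuple(phrase)
    n = len(phrase)
    neighbors = Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == phrase:
            j = i - 1 if side == "left" else i + n
            if 0 <= j < len(tokens):
                neighbors[tokens[j]] += 1
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in neighbors.values())


tokens = "the xss attack and the xss payload".split()
```

In this toy corpus "xss" is always preceded by "the" (left entropy 0) but followed by two different words (right entropy 1 bit), which suggests that "the xss" is a weaker phrase boundary on the right than on the left.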
It should be understood that, the screening of the candidate security phrases according to the statistical characteristics to obtain the security phrases to be processed may be to match the statistical characteristics with preset characteristic conditions, and to screen the candidate security phrases according to the matching results to obtain the security phrases to be processed. The preset characteristic condition may be preset by a user, which is not limited in this embodiment.
Further, in order to improve reliability of the security phrases to be processed, the obtaining statistical characteristics of the candidate security phrases and screening the candidate security phrases according to the statistical characteristics to obtain the security phrases to be processed includes:
acquiring statistical characteristics of the candidate security phrases, and generating quality scores of the candidate security phrases according to the statistical characteristics;
sorting the candidate security phrases according to the quality scores to obtain a sorting result;
and screening the candidate security phrases according to the sorting result to obtain the security phrases to be processed.
It should be appreciated that generating the quality scores of the candidate security phrases according to the statistical characteristics may be generating the quality scores through a preset scoring rule based on the statistical characteristics. The preset scoring rule may be preset by a user, which is not limited in this embodiment.
It can be understood that sorting the candidate security phrases according to the quality scores may be sorting the candidate security phrases in descending order of quality score to obtain the sorting result.
It should be understood that screening the candidate security phrases according to the sorting result may be taking the top n candidate security phrases in the sorting result as the security phrases to be processed. Here n may be preset by a user, which is not limited in this embodiment.
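The score, sort, and top-n steps can be sketched as follows. The text leaves the preset scoring rule open, so the weighted sum used here is purely an assumed example of such a rule; the feature names, weights, and values are illustrative and `top_n_phrases` is a hypothetical name.

```python
def top_n_phrases(features, weights, n):
    """Score each phrase as a weighted sum of its features and keep the n best."""
    scores = {
        phrase: sum(weights[name] * value for name, value in feats.items())
        for phrase, feats in features.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)  # descending quality score
    return ranked[:n]


features = {
    "xss vulnerability": {"tfidf": 0.8, "pmi": 0.9},
    "of the": {"tfidf": 0.1, "pmi": 0.2},
    "sql injection": {"tfidf": 0.7, "pmi": 0.6},
}
weights = {"tfidf": 0.5, "pmi": 0.5}
selected = top_n_phrases(features, weights, n=2)
```

Here "of the" scores lowest and is dropped, leaving the two security-related phrases as the security phrases to be processed.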
Step S1032: performing named entity recognition on the security phrases to be processed to obtain words to be processed.
It should be understood that the named entity recognition on the security phrases to be processed may be performed through a preset recognition model to obtain the words to be processed. The preset recognition model may be a rule matching model, a supervised learning model, or a deep-learning-based NER model, which is not limited in this embodiment.
Further, in order to improve reliability of the to-be-processed term, the performing named entity recognition on the to-be-processed security phrase to obtain the to-be-processed term includes:
carrying out sequence marking on the security phrase to be processed to obtain a target security phrase;
and carrying out named entity recognition on the target security phrase through a preset entity recognition model to obtain words to be processed.
It should be noted that the sequence annotation may be a BIO sequence annotation, where B indicates the beginning of a noun, I indicates the middle or ending part of the noun, and O indicates that a character is not an entity, which is not limited in this embodiment.
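To make the BIO scheme concrete, the sketch below turns a token sequence and its B/I/O tags back into entity strings. It assumes the tags have already been produced by some model; a stray I tag with no preceding B is treated like O, which is a choice made for this example, and `decode_bio` is a hypothetical name.

```python
def decode_bio(tokens, tags):
    """Collect the token spans marked B (begin) followed by I (inside) tags."""
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # O tag, or an I tag that does not continue an entity.
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities
```

For the tokens `["the", "XSS", "vulnerability", "and", "sql", "injection"]` tagged `["O", "B", "I", "O", "B", "I"]`, the decoded entities are "XSS vulnerability" and "sql injection".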
The preset entity recognition model may be preset by a user, and in this embodiment, a semi-supervised learning model is taken as an example for description.
In a specific implementation, for example, words such as "XSS vulnerability" and "sql injection" are mined from the security SDL document set.
In the third embodiment, security phrase identification is performed on the security document to be processed to obtain security phrases to be processed, and named entity recognition is performed on the security phrases to be processed to obtain words to be processed; determining the words to be processed through security phrase identification and named entity recognition can improve their reliability.
Referring to fig. 6, fig. 6 is a flowchart illustrating a fourth embodiment of the method for constructing a network security knowledge base according to the present invention, and the fourth embodiment of the method for constructing a network security knowledge base according to the present invention is proposed based on the first embodiment illustrated in fig. 2.
In the fourth embodiment, after the step S40, the method further includes:
Step S50: when a query instruction is received, determining a keyword to be queried according to the query instruction.
It should be noted that the query instruction may be a control instruction input by the user through the network security knowledge base construction device, which is not limited in this embodiment.
It should be understood that determining the keyword to be queried according to the query instruction may be performing information extraction on the query instruction to obtain the keyword to be queried.
Step S60: searching the network security knowledge base for a target directed acyclic graph corresponding to the keyword to be queried, and displaying the target directed acyclic graph.
It should be understood that, the searching for the target directed acyclic graph corresponding to the keyword to be queried in the network security knowledge base may be to match the terms in the network security knowledge base with the keyword to be queried, and use the directed acyclic graph corresponding to the successfully matched term as the target directed acyclic graph.
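A minimal sketch of this lookup, assuming the knowledge base is held in memory as a mapping from each term to the terms it points to: the keyword is matched against the terms, and the part of the graph reachable from the matched terms is returned as the target directed acyclic graph. Case-insensitive substring matching and the name `find_target_graph` are assumptions of the example.

```python
def find_target_graph(knowledge_base, keyword):
    """Return the subgraph reachable from every term that matches the keyword."""
    key = keyword.strip().lower()
    stack = [term for term in knowledge_base if key in term.lower()]
    subgraph = {}
    while stack:
        node = stack.pop()
        if node in subgraph:
            continue  # already visited
        subgraph[node] = sorted(knowledge_base.get(node, ()))
        stack.extend(subgraph[node])
    return subgraph


knowledge_base = {
    "XSS vulnerability": {"Cross-Site Scripting"},
    "Cross-Site Scripting": {"XSS"},
    "XSS": set(),
    "SQL injection": set(),
}
```

With this data, querying "cross-site" returns the node "Cross-Site Scripting" together with its successor "XSS", and a keyword that matches no term returns an empty graph.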
Further, in order to improve the visibility of the target directed acyclic graph, the step S60 includes:
searching the network security knowledge base for a target directed acyclic graph corresponding to the keyword to be queried;
determining query device information according to the query instruction, and determining an information display template according to the query device information;
and writing the target directed acyclic graph into the information display template to obtain information to be displayed, and displaying the information to be displayed.
It should be understood that determining the information display template according to the query device information may be looking up the information display template corresponding to the query device information in a preset display template library. The preset display template library contains the correspondence between query device information and information display templates, and this correspondence may be preset by a user, which is not limited in this embodiment.
In the fourth embodiment, when a query instruction is received, a keyword to be queried is determined according to the query instruction, a target directed acyclic graph corresponding to the keyword to be queried is searched in the network security knowledge base, and the target directed acyclic graph is displayed, so that the target directed acyclic graph corresponding to the query instruction can be displayed, and the security information query efficiency of a user is improved.
In addition, an embodiment of the present invention further provides a storage medium, where the storage medium stores a network security knowledge base building program, and the network security knowledge base building program, when executed by a processor, implements the steps of the network security knowledge base building method described above.
In addition, referring to fig. 7, an embodiment of the present invention further provides a network security knowledge base constructing apparatus, where the network security knowledge base constructing apparatus includes: an extraction module 10, an analysis module 20, a grouping module 30 and a building module 40;
The extraction module 10 is configured to obtain a historical security document library, and perform word extraction on the historical security documents in the historical security document library to obtain a word set to be processed.
The analysis module 20 is configured to perform relevance analysis on each word to be processed in the word set to be processed, so as to obtain a relevance analysis result.
The grouping module 30 is configured to group the word sets to be processed according to the analysis result to obtain a word sense association group and a grammar association group.
The building module 40 is configured to generate a directed acyclic graph of the word set to be processed according to the word sense association group and the grammar association group, and establish a network security knowledge base according to the directed acyclic graph.
In this embodiment, a historical security document library is obtained, and words are extracted from the historical security documents in the library to obtain a word set to be processed. Relevance analysis is performed on each word to be processed in the word set to obtain a relevance analysis result, and the word set is grouped according to that result to obtain a word sense association group and a grammar association group. A directed acyclic graph of the word set to be processed is generated according to the word sense association group and the grammar association group, and a network security knowledge base is established according to the directed acyclic graph. Compared with the existing approach, in which network security information is scattered across different locations in different systems, this embodiment stores the network security information centrally in the network security knowledge base in the form of a directed acyclic graph, which can improve the efficiency of searching for network security information.
Other embodiments or specific implementation manners of the network security knowledge base construction device according to the present invention may refer to the above method embodiments, and are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present invention or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g., a Read Only Memory (ROM)/Random Access Memory (RAM), a magnetic disk, an optical disk), and includes several instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
The invention discloses A1, a method for constructing a network security knowledge base, which comprises the following steps:
acquiring a historical security document library, and extracting words of historical security documents in the historical security document library to obtain a word set to be processed;
performing relevance analysis on each word to be processed in the word set to be processed to obtain a relevance analysis result;
grouping the word sets to be processed according to the analysis result to obtain word sense association sets and grammar association sets;
and generating a directed acyclic graph of the word set to be processed according to the word meaning association and the grammar association, and establishing a network security knowledge base according to the directed acyclic graph.
A2, the method for constructing a network security knowledge base as described in A1, wherein the step of obtaining a historical security document library, and performing word extraction on historical security documents in the historical security document library to obtain a word set to be processed specifically includes:
when a knowledge base construction instruction is received, determining a historical security document base according to the knowledge base construction instruction;
traversing the historical security documents in the historical security document library, and taking the traversed historical security documents as security documents to be processed;
extracting words from the security document to be processed to obtain words to be processed;
and after the traversal of the historical security document is finished, generating a word set to be processed according to the words to be processed.
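As a non-limiting illustration, the traversal described in A2 can be sketched as follows. The function names `build_word_set` and `whitespace_extractor`, and the whitespace tokenizer itself, are assumptions made for this sketch; the embodiments extract words through security phrase identification and named entity recognition instead.

```python
from typing import Callable, Iterable


def build_word_set(
    documents: Iterable[str],
    extract_words: Callable[[str], list[str]],
) -> set[str]:
    """Traverse the document library and collect the words of every document."""
    words: set[str] = set()
    for document in documents:  # each traversed document is the one "to be processed"
        words.update(extract_words(document))
    return words  # generated only after the traversal has finished


def whitespace_extractor(document: str) -> list[str]:
    """Hypothetical stand-in extractor: lowercase tokens with punctuation stripped."""
    tokens = (token.strip(".,;:()").lower() for token in document.split())
    return [token for token in tokens if token]


library = [
    "SQL injection allows remote code execution.",
    "Remote code execution via buffer overflow.",
]
word_set = build_word_set(library, whitespace_extractor)
```

Passing the extractor as a parameter keeps the traversal independent of how words are actually recognized.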
A3, the method for constructing a network security knowledge base as described in A2, wherein the step of extracting words from the security document to be processed to obtain words to be processed specifically includes:
carrying out security phrase identification on the security document to be processed to obtain security phrases to be processed;
and carrying out named entity recognition on the security phrases to be processed to obtain words to be processed.
A4, the method for constructing a network security knowledge base as described in A3, wherein the step of performing security phrase identification on the security document to be processed to obtain the security phrase to be processed specifically includes:
segmenting the to-be-processed security document through a preset statistical language model to obtain initial security phrases;
acquiring the occurrence frequency of the initial security phrase, and screening the initial security phrase according to the occurrence frequency to acquire a candidate security phrase;
and acquiring the statistical characteristics of the candidate security phrases, and screening the candidate security phrases according to the statistical characteristics to acquire the security phrases to be processed.
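As a non-limiting illustration of the first two screening stages in A4, the sketch below enumerates word n-grams and keeps those that occur often enough. Plain n-gram enumeration is used here as a simple stand-in for the preset statistical language model, and the name `candidate_phrases` and the parameters `max_len` and `min_count` are assumptions of this sketch.

```python
from collections import Counter


def candidate_phrases(tokens, max_len=3, min_count=2):
    """Count every n-gram up to max_len and keep those seen at least min_count times."""
    counts = Counter(
        tuple(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    )
    # Frequency screening: initial phrases below the threshold are discarded.
    return {phrase: count for phrase, count in counts.items() if count >= min_count}


tokens = "remote code execution bug allows remote code execution".split()
candidates = candidate_phrases(tokens)
```

The surviving candidates would then go through the statistical-feature screening described in the last step.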
A5, the method for constructing a network security knowledge base according to A4, wherein the step of obtaining the statistical characteristics of the candidate security phrases, and screening the candidate security phrases according to the statistical characteristics to obtain the security phrases to be processed specifically includes:
acquiring statistical characteristics of the candidate security phrases, and generating quality scores of the candidate security phrases according to the statistical characteristics;
sorting the candidate security phrases according to the quality scores to obtain a sorting result;
and screening the candidate security phrases according to the sorting result to obtain the security phrases to be processed.
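As a non-limiting illustration of A5, the sketch below scores two-word candidates, sorts them, and keeps the best ones. Pointwise mutual information (PMI) is assumed here as the statistical characteristic; the text does not say which characteristics are used, and the name `rank_bigrams` and the parameter `top_k` are hypothetical.

```python
import math
from collections import Counter


def rank_bigrams(tokens, top_k=2):
    """Score each bigram by PMI, sort by score, and keep the top_k candidates."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    scores = {}
    for (first, second), count in bigrams.items():
        p_pair = count / total_bigrams
        p_first = unigrams[first] / total_unigrams
        p_second = unigrams[second] / total_unigrams
        # Quality score: how much more often the pair occurs than chance predicts.
        scores[(first, second)] = math.log(p_pair / (p_first * p_second))

    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]  # screening according to the sorting result


tokens = "buffer overflow in the parser and the buffer overflow in the kernel".split()
top_phrases = rank_bigrams(tokens, top_k=2)
```

PMI tends to favor rare pairs, so in practice it would likely be combined with the frequency screening of the preceding step.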
A6, the method for constructing a network security knowledge base according to A3, wherein the step of carrying out named entity recognition on the security phrases to be processed to obtain words to be processed specifically includes:
carrying out sequence labeling on the security phrase to be processed to obtain a target security phrase;
and carrying out named entity recognition on the target security phrase through a preset entity recognition model to obtain words to be processed.
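As a non-limiting illustration of A6, the sketch below turns a sequence-labeled phrase into named entities. It assumes the common BIO labeling scheme and made-up labels (`PRODUCT`, `VULN`); in the embodiments the tags would come from the preset entity recognition model, which is not reproduced here.

```python
def decode_bio(tokens, tags):
    """Collect (entity text, label) pairs from BIO tags aligned with tokens."""
    entities = []
    current, label = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close the previous one first.
            if current:
                entities.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)  # continuation of the open entity
        else:
            # "O" or an inconsistent tag closes any open entity.
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities


tokens = ["Apache", "Struts", "is", "affected", "by", "CVE-2017-5638"]
tags = ["B-PRODUCT", "I-PRODUCT", "O", "O", "O", "B-VULN"]  # hypothetical model output
entities = decode_bio(tokens, tags)
```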
A7, the method for constructing a network security knowledge base as described in any one of A1-A6, wherein the step of performing relevance analysis on each word to be processed in the word set to be processed to obtain a relevance analysis result specifically includes:
carrying out synonym analysis on each word to be processed in the word set to be processed to obtain a synonym analysis result;
extracting the abbreviations of the words to be processed in the word set to be processed to obtain abbreviation extraction results;
performing grammar relevance analysis on each word to be processed in the word set to be processed to obtain a grammar relevance analysis result;
and generating a relevance analysis result according to the synonym analysis result, the abbreviation extraction result and the grammar relevance analysis result.
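As a non-limiting illustration of the abbreviation extraction step in A7, the sketch below pairs an all-capitals abbreviation in parentheses with the words that precede it when their initials match. This heuristic and the name `extract_abbreviations` are assumptions of the sketch; the text does not specify how abbreviations are extracted.

```python
import re


def extract_abbreviations(text):
    """Map each parenthesized abbreviation to the long form whose initials match it."""
    pairs = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        abbreviation = match.group(1)
        preceding = text[:match.start()].split()
        words = preceding[-len(abbreviation):]
        initials_match = len(words) == len(abbreviation) and all(
            word[0].upper() == letter for word, letter in zip(words, abbreviation)
        )
        if initials_match:
            pairs[abbreviation] = " ".join(words)
    return pairs


text = (
    "A remote code execution (RCE) flaw was flagged by the "
    "intrusion detection system (IDS)."
)
abbreviations = extract_abbreviations(text)
```

Abbreviations whose letters are not plain initials (for example XSS for cross site scripting) are not matched by this heuristic and would need a dictionary or a learned model.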
A8, the method for constructing a network security knowledge base as described in A7, wherein the step of grouping the word set to be processed according to the relevance analysis result to obtain a word sense association set and a grammar association set specifically includes:
searching the word set to be processed according to the synonym analysis result and the abbreviation extraction result to obtain the word sense association set;
and searching the word set to be processed according to the grammar relevance analysis result to obtain the grammar association set.
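As a non-limiting illustration of how a word sense association set might be assembled, the sketch below merges words that the synonym analysis or abbreviation extraction marked as equivalent. The union-find structure and the name `group_by_sense` are choices made for this sketch, not something the text prescribes, and the example pairs are illustrative.

```python
def group_by_sense(words, equivalent_pairs):
    """Merge words linked by synonym or abbreviation pairs into sense groups."""
    parent = {word: word for word in words}

    def find(word):
        # Follow parent links to the group representative, halving the path as we go.
        while parent[word] != word:
            parent[word] = parent[parent[word]]
            word = parent[word]
        return word

    for left, right in equivalent_pairs:
        parent[find(left)] = find(right)

    groups = {}
    for word in words:
        groups.setdefault(find(word), set()).add(word)
    return list(groups.values())


words = ["RCE", "remote code execution", "arbitrary code execution", "firewall"]
pairs = [
    ("RCE", "remote code execution"),                       # from abbreviation extraction
    ("remote code execution", "arbitrary code execution"),  # from synonym analysis
]
sense_groups = group_by_sense(words, pairs)
```

Equivalence is treated as transitive here, so an abbreviation ends up in the same group as every synonym of its long form.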
A9, the method for constructing a network security knowledge base according to any one of A1-A6, wherein the step of generating a directed acyclic graph of the word set to be processed according to the word sense association set and the grammar association set, and establishing a network security knowledge base according to the directed acyclic graph specifically includes:
determining a natural language triple according to the grammar association set;
generating a directed acyclic graph of the word set to be processed according to the word meaning association set and the natural language triples;
and establishing a network security knowledge base according to the directed acyclic graph.
A10, the method for constructing a network security knowledge base as described in A9, wherein the step of generating the directed acyclic graph of the word set to be processed according to the word sense association set and the natural language triples specifically includes:
establishing a word sense mapping relation table among the words to be processed according to the word sense association set;
establishing a grammar mapping relation table among the words to be processed according to the natural language triples;
and generating the directed acyclic graph of the word set to be processed according to the word sense mapping relation table and the grammar mapping relation table.
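As a non-limiting illustration of A10, the sketch below combines a word sense mapping table with natural language triples into a graph and checks that it is acyclic. The table layout (word to canonical form), the triple shape (subject, predicate, object) and the name `build_dag` are assumptions of the sketch; it relies on the standard-library `graphlib` module, available from Python 3.9.

```python
from graphlib import TopologicalSorter


def build_dag(sense_table, triples):
    """Build {source: {target: predicate}} and fail if the result contains a cycle."""
    graph = {}
    for subject, predicate, obj in triples:
        # The word sense mapping table folds synonyms and abbreviations into one node.
        source = sense_table.get(subject, subject)
        target = sense_table.get(obj, obj)
        graph.setdefault(source, {})[target] = predicate
        graph.setdefault(target, {})
    # prepare() raises graphlib.CycleError (a ValueError subclass) on a cycle.
    # Edge direction does not matter for cycle detection.
    TopologicalSorter({node: set(targets) for node, targets in graph.items()}).prepare()
    return graph


sense_table = {"RCE": "remote code execution"}
triples = [
    ("attacker", "exploits", "buffer overflow"),
    ("buffer overflow", "enables", "RCE"),
    ("remote code execution", "compromises", "server"),
]
dag = build_dag(sense_table, triples)
```

Because `RCE` is mapped to its long form before the edges are added, the second and third triples join at a single node rather than creating two disconnected ones.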
A11, the method for constructing a network security knowledge base as described in any one of A1-A6, wherein after the step of generating a directed acyclic graph of the word set to be processed according to the word sense association set and the grammar association set, and establishing a network security knowledge base according to the directed acyclic graph, the method further includes:
when a query instruction is received, determining keywords to be queried according to the query instruction;
and searching a target directed acyclic graph corresponding to the keyword to be queried in the network security knowledge base, and displaying the target directed acyclic graph.
A12, the method for constructing a network security knowledge base as described in A11, wherein the step of searching for a target directed acyclic graph corresponding to the keyword to be queried in the network security knowledge base and displaying the target directed acyclic graph specifically includes:
searching a target directed acyclic graph corresponding to the keyword to be queried in the network security knowledge base;
determining querying device information according to the query instruction, and determining an information display template according to the querying device information;
and writing the target directed acyclic graph into the information display template to obtain information to be displayed, and displaying the information to be displayed.
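As a non-limiting illustration of A12, the sketch below looks up the edges stored for a keyword and writes them into a template chosen by device type. The device types, the template texts and the name `render_query` are all hypothetical, and only the keyword's direct edges are shown rather than a full subgraph.

```python
from string import Template

# Hypothetical display templates keyed by the querying device type.
TEMPLATES = {
    "mobile": Template("$keyword: $edges"),
    "desktop": Template("Keyword: $keyword\nRelations:\n$edges"),
}


def render_query(knowledge_base, keyword, device_type):
    """Find the keyword's outgoing edges and fill the device-specific template."""
    subgraph = knowledge_base.get(keyword, {})
    template = TEMPLATES.get(device_type, TEMPLATES["desktop"])
    separator = "; " if device_type == "mobile" else "\n"
    edges = separator.join(
        f"{keyword} -[{predicate}]-> {target}" for target, predicate in subgraph.items()
    )
    return template.substitute(keyword=keyword, edges=edges)


knowledge_base = {"buffer overflow": {"remote code execution": "enables"}}
mobile_view = render_query(knowledge_base, "buffer overflow", "mobile")
```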
The invention discloses B13, network security knowledge base construction equipment, comprising: a memory, a processor, and a network security knowledge base construction program stored on the memory and executable on the processor, wherein the network security knowledge base construction program, when executed by the processor, implements the steps of the network security knowledge base construction method described above.
The invention discloses C14, a storage medium on which a network security knowledge base construction program is stored, wherein the program, when executed by a processor, implements the steps of the network security knowledge base construction method described above.
The invention discloses D15, a network security knowledge base construction device, comprising: an extraction module, an analysis module, a grouping module and an establishing module;
the extraction module is used for acquiring a historical security document library and extracting words of the historical security documents in the historical security document library to obtain a word set to be processed;
the analysis module is used for carrying out relevance analysis on each word to be processed in the word set to be processed to obtain a relevance analysis result;
the grouping module is used for grouping the word set to be processed according to the relevance analysis result to obtain a word sense association set and a grammar association set;
the establishing module is used for generating a directed acyclic graph of the word set to be processed according to the word sense association set and the grammar association set, and establishing a network security knowledge base according to the directed acyclic graph.
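As a non-limiting illustration, the four modules of D15 can be wired together as interchangeable callables. The class name `KnowledgeBaseBuilder`, the call signatures and the stub modules below are assumptions made for this sketch only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class KnowledgeBaseBuilder:
    """Chains the extraction, analysis, grouping and establishing modules."""
    extract: Callable    # document library -> word set
    analyze: Callable    # word set -> relevance analysis result
    group: Callable      # (word set, analysis) -> (sense set, grammar set)
    establish: Callable  # (sense set, grammar set) -> knowledge base

    def build(self, document_library):
        words = self.extract(document_library)
        analysis = self.analyze(words)
        sense_set, grammar_set = self.group(words, analysis)
        return self.establish(sense_set, grammar_set)


# Stub modules, only to show the data flow between the four stages.
builder = KnowledgeBaseBuilder(
    extract=lambda library: {word for doc in library for word in doc.split()},
    analyze=lambda words: {"count": len(words)},
    group=lambda words, analysis: (sorted(words), []),
    establish=lambda sense_set, grammar_set: {"nodes": sense_set, "edges": grammar_set},
)
knowledge_base = builder.build(["buffer overflow", "overflow exploit"])
```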
D16, the network security knowledge base construction device as described in D15, wherein the extraction module is further configured to determine the historical security document library according to a knowledge base construction instruction when the knowledge base construction instruction is received;
the extraction module is further used for traversing the historical security documents in the historical security document library, and taking the traversed historical security documents as security documents to be processed;
the extraction module is also used for extracting words from the to-be-processed security document to obtain to-be-processed words;
the extraction module is further used for generating a word set to be processed according to the words to be processed after the traversal of the historical security document is finished.
D17, the network security knowledge base construction device as described in D16, wherein the extraction module is further configured to perform security phrase identification on the to-be-processed security document to obtain to-be-processed security phrases;
the extraction module is further used for conducting named entity recognition on the to-be-processed security phrase to obtain to-be-processed words.
D18, the network security knowledge base construction device as described in D17, wherein the extraction module is further configured to segment the to-be-processed security document through a preset statistical language model to obtain initial security phrases;
the extraction module is further configured to obtain an occurrence frequency of the initial security phrase, and filter the initial security phrase according to the occurrence frequency to obtain a candidate security phrase;
the extraction module is further configured to obtain statistical characteristics of the candidate security phrases, and filter the candidate security phrases according to the statistical characteristics to obtain security phrases to be processed.
D19, the network security knowledge base construction device as described in D18, wherein the extraction module is further configured to obtain statistical characteristics of the candidate security phrases, and generate quality scores of the candidate security phrases according to the statistical characteristics;
the extracting module is further configured to rank the candidate security phrases according to the quality scores to obtain a ranking result;
the extraction module is further configured to filter the candidate security phrases according to the sorting result to obtain security phrases to be processed.
D20, the network security knowledge base construction device as described in D17, wherein the extraction module is further configured to perform sequence labeling on the security phrases to be processed to obtain a target security phrase;
the extraction module is further used for conducting named entity recognition on the target security phrase through a preset entity recognition model to obtain words to be processed.
Claims (10)
1. A method for constructing a network security knowledge base is characterized by comprising the following steps:
acquiring a historical security document library, and extracting words of historical security documents in the historical security document library to obtain a word set to be processed;
performing relevance analysis on each word to be processed in the word set to be processed to obtain a relevance analysis result;
grouping the word set to be processed according to the relevance analysis result to obtain a word sense association set and a grammar association set;
and generating a directed acyclic graph of the word set to be processed according to the word sense association set and the grammar association set, and establishing a network security knowledge base according to the directed acyclic graph.
2. The method for constructing a network security knowledge base according to claim 1, wherein the step of acquiring a historical security document library and extracting words from the historical security documents in the historical security document library to obtain a word set to be processed specifically comprises:
when a knowledge base construction instruction is received, determining the historical security document library according to the knowledge base construction instruction;
traversing the historical security documents in the historical security document library, and taking the traversed historical security documents as security documents to be processed;
extracting words from the security document to be processed to obtain words to be processed;
and after the traversal of the historical security document is finished, generating a word set to be processed according to the words to be processed.
3. The method for constructing a network security knowledge base according to claim 2, wherein the step of extracting words from the security document to be processed to obtain words to be processed specifically comprises:
carrying out security phrase identification on the security document to be processed to obtain a security phrase to be processed;
and carrying out named entity recognition on the security phrases to be processed to obtain words to be processed.
4. The method for constructing a network security knowledge base according to claim 3, wherein the step of performing security phrase recognition on the to-be-processed security document to obtain the to-be-processed security phrase specifically comprises:
segmenting the to-be-processed security document through a preset statistical language model to obtain initial security phrases;
acquiring the occurrence frequency of the initial security phrase, and screening the initial security phrase according to the occurrence frequency to acquire a candidate security phrase;
and acquiring the statistical characteristics of the candidate security phrases, and screening the candidate security phrases according to the statistical characteristics to acquire the security phrases to be processed.
5. The method for constructing a network security knowledge base according to claim 4, wherein the step of obtaining the statistical characteristics of the candidate security phrases and screening the candidate security phrases according to the statistical characteristics to obtain the security phrases to be processed specifically comprises:
acquiring statistical characteristics of the candidate security phrases, and generating quality scores of the candidate security phrases according to the statistical characteristics;
sorting the candidate security phrases according to the quality scores to obtain a sorting result;
and screening the candidate security phrases according to the sorting result to obtain the security phrases to be processed.
6. The method for constructing a network security knowledge base according to claim 3, wherein the step of performing named entity recognition on the security phrases to be processed to obtain words to be processed specifically comprises:
carrying out sequence labeling on the security phrase to be processed to obtain a target security phrase;
and carrying out named entity recognition on the target security phrase through a preset entity recognition model to obtain words to be processed.
7. The method for constructing the network security knowledge base according to any one of claims 1 to 6, wherein the step of performing relevance analysis on each word to be processed in the set of words to be processed to obtain a relevance analysis result specifically comprises:
performing synonym analysis on each word to be processed in the word set to be processed to obtain a synonym analysis result;
extracting the abbreviations of the words to be processed in the word set to be processed to obtain abbreviation extraction results;
performing grammar relevance analysis on each word to be processed in the word set to be processed to obtain a grammar relevance analysis result;
and generating a relevance analysis result according to the synonym analysis result, the abbreviation extraction result and the grammar relevance analysis result.
8. Network security knowledge base construction equipment, characterized by comprising: a memory, a processor, and a network security knowledge base construction program stored on the memory and executable on the processor, wherein the network security knowledge base construction program, when executed by the processor, implements the steps of the network security knowledge base construction method according to any one of claims 1 to 7.
9. A storage medium, characterized in that the storage medium has stored thereon a network security knowledge base construction program, which when executed by a processor implements the steps of the network security knowledge base construction method according to any one of claims 1 to 7.
10. A network security knowledge base construction device, characterized by comprising: an extraction module, an analysis module, a grouping module and an establishing module;
the extraction module is used for acquiring a historical security document library and extracting words of the historical security documents in the historical security document library to obtain a word set to be processed;
the analysis module is used for carrying out relevance analysis on each word to be processed in the word set to be processed to obtain a relevance analysis result;
the grouping module is used for grouping the word set to be processed according to the relevance analysis result to obtain a word sense association set and a grammar association set;
the establishing module is used for generating a directed acyclic graph of the word set to be processed according to the word sense association set and the grammar association set, and establishing a network security knowledge base according to the directed acyclic graph.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110045843.3A CN114764506A (en) | 2021-01-13 | 2021-01-13 | Network security knowledge base construction method, equipment, storage medium and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN114764506A (en) | 2022-07-19 |
Family
ID=82363579
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110045843.3A (Pending) CN114764506A (en) | 2021-01-13 | 2021-01-13 | Network security knowledge base construction method, equipment, storage medium and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114764506A (en) |
- 2021-01-13: CN application CN202110045843.3A (patent CN114764506A), status: Pending
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109325201B (en) | Method, device, equipment and storage medium for generating entity relationship data | |
| US9489401B1 (en) | Methods and systems for object recognition | |
| US20150234927A1 (en) | Application search method, apparatus, and terminal | |
| CN109634436B (en) | Method, device, equipment and readable storage medium for associating input method | |
| CN108090351B (en) | Method and apparatus for processing request message | |
| US9317608B2 (en) | Systems and methods for parsing search queries | |
| CN103136228A (en) | Image search method and image search device | |
| US8631097B1 (en) | Methods and systems for finding a mobile and non-mobile page pair | |
| CN109299235B (en) | Knowledge base searching method, device and computer readable storage medium | |
| CN108287927B (en) | Method and device for obtaining information | |
| CN113609261A (en) | Vulnerability information mining method and device based on knowledge graph of network information security | |
| CN114722137A (en) | Security policy configuration method, device and electronic device based on sensitive data identification | |
| CN115309968A (en) | A method and device for generating webpage fingerprint rules based on a resource search engine | |
| CN104462307A (en) | Searching method and device for object in terminal | |
| US20220058214A1 (en) | Document information extraction method, storage medium and terminal | |
| CN117591624B (en) | Test case recommendation method based on semantic index relation | |
| US11507593B2 (en) | System and method for generating queryeable structured document from an unstructured document using machine learning | |
| CN112579937A (en) | Character highlight display method and device | |
| CN113505889B (en) | Processing method and device of mapping knowledge base, computer equipment and storage medium | |
| CN118964693A (en) | Knowledge question answering method, device, readable medium, electronic device and program product | |
| CN107220249B (en) | Classification-based full-text search | |
| CN112487159A (en) | Search method, search device, and computer-readable storage medium | |
| CN115150354B (en) | Method and device for generating domain name, storage medium and electronic equipment | |
| CN114764506A (en) | Network security knowledge base construction method, equipment, storage medium and device | |
| CN115186240A (en) | Social network user alignment method, device and medium based on relevance information |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | CB02 | Change of applicant information | Country or region after: China. Address after: 100020 1765, 15th floor, 17th floor, building 3, No.10, Jiuxianqiao Road, Chaoyang District, Beijing. Applicant after: Beijing 360 Zhiling Technology Co.,Ltd. Address before: 100020 1765, 15th floor, 17th floor, building 3, No.10, Jiuxianqiao Road, Chaoyang District, Beijing. Applicant before: Beijing Hongxiang Technical Service Co.,Ltd. Country or region before: China |