
CN114625929B - Method and device for sampling message - Google Patents


Info

Publication number
CN114625929B
Authority
CN
China
Prior art keywords
strategy
acquisition
hash table
source port
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210267837.7A
Other languages
Chinese (zh)
Other versions
CN114625929A (en
Inventor
邵慧丽
李亚辉
肖成民
王虹
杨晓娟
万焱
李向通
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING LEADSEC TECHNOLOGY CO LTD
Beijing Venustech Cybervision Co ltd
Original Assignee
BEIJING LEADSEC TECHNOLOGY CO LTD
Beijing Venustech Cybervision Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING LEADSEC TECHNOLOGY CO LTD, Beijing Venustech Cybervision Co ltd
Priority to CN202210267837.7A
Publication of CN114625929A
Application granted
Publication of CN114625929B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9014 Indexing; Data structures therefor; Storage structures hash tables
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22 Parsing or analysis of headers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and device for sampling messages. The method includes: reading an acquisition strategy file, creating a time hash table according to acquisition time information corresponding to each acquisition strategy in the acquisition strategy file, and creating a source port hash table according to source port information corresponding to each acquisition strategy in the acquisition strategy file; matching the message to be acquired with the time hash table and the source port hash table, and selecting acquisition strategies from the acquisition strategy file according to the matching result to put them into a pre-screening strategy set; and if the message to be acquired matches the confirmation function corresponding to any strategy in the pre-screening strategy set, acquiring the message to be acquired.

Description

Method and device for sampling message
Technical Field
The present application relates to the field of computer networks, and in particular, to a method and apparatus for sampling a message.
Background
Message sampling is mainly used in the field of computer networks. In some technologies, two methods are used to sample messages. One is a direct matching method: the capture time and five-tuple of a message are matched against the first acquisition strategy, and the message is acquired if the match succeeds; otherwise matching continues with the next acquisition strategy until all acquisition strategies have been tried. This method has the following drawbacks:
(1) The matching efficiency is low. In a network environment, the amount of messages in a short time is very large. Each message is matched with the time and quintuple information of all acquisition strategies, and the time complexity is high.
(2) Packet loss. When a large number of messages arrive in a short time, the low matching efficiency causes subsequent messages to accumulate, resulting in packet loss.
The other is a cascaded hash table method, which coarsely screens the acquisition strategies to narrow the matching range but occupies a large memory space. For example, cascading only the source port and the destination port to build the hash table already requires 2^32 entries, i.e. 4G of memory. Therefore, although this method improves matching efficiency, its memory footprint is large.
Disclosure of Invention
The application provides a method for sampling a message that achieves low memory occupation, rapid pre-screening, and rapid, accurate matching.
The application provides a method for sampling a message, which comprises the following steps:
reading an acquisition strategy file, creating a time hash table according to acquisition time information corresponding to each acquisition strategy in the acquisition strategy file, and creating a source port hash table according to source port information corresponding to each acquisition strategy in the acquisition strategy file;
Matching the message to be acquired with the time hash table and the source port hash table, and selecting an acquisition strategy from the acquisition strategy file according to a matching result to put the acquisition strategy into a pre-screening strategy set;
And if the message to be acquired is matched with the confirmation function corresponding to any strategy in the pre-screening strategy set, acquiring the message to be acquired.
In an exemplary embodiment, the creating a time hash table according to the collection time information corresponding to each collection policy in the collection policy file includes:
Establishing a time hash table with preset length;
Analyzing the acquisition strategy file into a strategy array, traversing the strategy array, and calculating a time hash value of the acquisition time range of each acquisition strategy through a time hash function;
and creating a time hash table according to the time hash value of each acquisition strategy.
In an exemplary embodiment, the creating a time hash table according to the time hash value of each acquisition policy includes:
Placing the index of the strategy array into a time hash table, wherein the index of the strategy array is used for representing the collection strategy stored in the corresponding position of the analyzed strategy array;
and if the strategy array subscript conflicts with the subscript in the time hash table, the strategy array subscript is linked to the time hash table conflict chain.
In an exemplary embodiment, the creating a source port hash table according to source port information corresponding to each acquisition policy in the acquisition policy file includes:
establishing a source port hash table with a preset length;
analyzing the acquisition strategy file into a strategy array, traversing the strategy array, and calculating a source port hash value of an acquisition port range of each acquisition strategy through a source port hash function;
and creating a source port hash table according to the source port hash value of each acquisition strategy.
In an exemplary embodiment, the creating a source port hash table according to the source port hash value of each acquisition policy includes:
placing the index of the strategy array into a source port hash table, wherein the index of the strategy array is used for representing the collection strategy stored in the corresponding position of the analyzed strategy array;
and if the strategy array subscript conflicts with the subscript in the source port hash table, the strategy array subscript is linked to the source port hash table conflict chain.
In an exemplary embodiment, the matching the message to be collected with the time hash table and the source port hash table, and selecting a collection policy from the collection policy file according to a matching result, and putting the collection policy into a pre-screening policy set includes:
calculating a time hash value of a message to be acquired, inquiring the time hash table, and determining a first acquisition strategy set of the time period;
calculating a source port hash value of a message to be acquired, inquiring a source port hash table, and determining a second acquisition strategy set of the source port range;
and selecting an intersection set of the first acquisition strategy set and the second acquisition strategy set as a pre-screening strategy set.
In an exemplary embodiment, no acquisition operation is performed when the intersection of the first acquisition strategy set and the second acquisition strategy set is empty.
In an exemplary embodiment, the confirmation function is generated according to each collection strategy in the collection strategy file, and the confirmation function names are in one-to-one correspondence with the collection strategies; the confirmation function name is uniquely generated from the acquisition strategy id, and the confirmation function body is generated from the other acquisition conditions of the acquisition strategy.
The application also provides a device for sampling the message, which comprises: a memory and a processor; the memory is configured to store a program for collecting a message, and the processor is configured to read and execute the program for collecting a message, and execute the method for sampling a message according to any one of the foregoing embodiments.
The present application also provides a storage medium having stored therein a program for collecting messages, the program being arranged to perform the method of sampling a message according to any of the above embodiments at run-time.
Compared with the related art, the application provides a method for sampling a message, which comprises the following steps: reading an acquisition strategy file, creating a time hash table according to acquisition time information corresponding to each acquisition strategy in the acquisition strategy file, and creating a source port hash table according to source port information corresponding to each acquisition strategy in the acquisition strategy file; matching the message to be acquired with the time hash table and the source port hash table, and selecting an acquisition strategy from the acquisition strategy file according to a matching result to put the acquisition strategy into a pre-screening strategy set; and if the message to be acquired is matched with the confirmation function corresponding to any strategy in the pre-screening strategy set, acquiring the message to be acquired. In the embodiment of the application, the collected messages are inquired by adopting the hash table, so that the memory can be saved, and the rapid pre-screening can be realized, so that the policy matching range is reduced. And compared with variable indirect addressing, the hard coding of the confirmation function is fast in executing speed when the final matching is carried out, so that the policy matching efficiency is improved. The embodiment of the application realizes a method for sampling messages with small memory occupation, rapid pre-screening and rapid accurate matching.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. Other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The accompanying drawings are included to provide an understanding of the principles of the application, and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain, without limitation, the principles of the application.
FIG. 1 is a flow chart of a method for sampling a message according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for sampling an acquisition message according to an acquisition strategy in an exemplary embodiment;
FIG. 3 is a schematic diagram of creating a time hash table according to an acquisition policy in an exemplary embodiment;
FIG. 4 is a schematic diagram of creating a source port hash table according to an acquisition policy in an exemplary embodiment;
FIG. 5 is a schematic diagram of an apparatus for sampling a message according to an embodiment of the present application;
FIG. 6 is a flow chart of an exemplary method of sampling a message.
Detailed Description
The present application has been described in terms of several embodiments, but the description is illustrative and not restrictive, and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the described embodiments. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or in place of any other feature or element of any other embodiment unless specifically limited.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The disclosed embodiments, features and elements of the present application may also be combined with any conventional features or elements to form a unique inventive arrangement as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive arrangements to form another unique inventive arrangement as defined in the claims. It is therefore to be understood that any of the features shown and/or discussed in the present application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not to be restricted except in light of the attached claims and their equivalents. Further, various modifications and changes may be made within the scope of the appended claims.
Furthermore, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other sequences of steps are possible as will be appreciated by those of ordinary skill in the art. Accordingly, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Furthermore, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
The embodiment of the disclosure provides a method for sampling a message, as shown in fig. 1, the method includes steps S100-S120, specifically as follows:
S100, reading an acquisition strategy file, creating a time hash table according to acquisition time information corresponding to each acquisition strategy in the acquisition strategy file, and creating a source port hash table according to source port information corresponding to each acquisition strategy in the acquisition strategy file;
S110, matching the message to be acquired with the time hash table and the source port hash table, and selecting an acquisition strategy from the acquisition strategy file according to a matching result to put the acquisition strategy into a pre-screening strategy set;
S120, if the message to be acquired is matched with a confirmation function corresponding to any strategy in the pre-screening strategy set, acquiring the message to be acquired.
In this embodiment, the acquisition policy file includes a plurality of acquisition policies; each acquisition policy may be specified by a user and added to an acquisition policy profile. For example: the collection strategy file format and the meaning of each field are as follows:
id=0
src_ip=192.168.39.2-192.168.39.254
dst_ip=192.168.35.25-192.168.35.36
src_port=6000-8000
dst_port=80-8888
protocol=TCP
starttime=00:00:10
endtime=00:00:30
Here, id is the number of the acquisition strategy, and each acquisition strategy has exactly one id; src_ip, dst_ip, src_port, and dst_port are the quadruple information, and each of these fields may be a range; protocol is the protocol name; starttime and endtime delimit the time range for acquisition.
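The field layout above can be loaded into a structure one `key=value` line at a time. The following is a minimal sketch in C, not the parser used by the application: the `policy_t` layout, the function names, and the choice to store times as seconds since midnight are all assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one acquisition strategy. */
typedef struct {
    int      id;
    uint32_t sip_min, sip_max;    /* IPv4 addresses, host byte order */
    uint32_t dip_min, dip_max;
    int      sport_min, sport_max;
    int      dport_min, dport_max;
    char     protocol[8];
    int      start_sec, end_sec;  /* seconds since midnight */
} policy_t;

/* Parse "a.b.c.d" into a 32-bit value; returns 0 on success. */
static int parse_ip(const char *s, uint32_t *out)
{
    unsigned a, b, c, d;
    if (sscanf(s, "%u.%u.%u.%u", &a, &b, &c, &d) != 4) return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255) return -1;
    *out = ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
    return 0;
}

/* Parse "min-max" where both ends are dotted IPv4 addresses. */
static int parse_ip_range(const char *s, uint32_t *lo, uint32_t *hi)
{
    char a[32], b[32];
    if (sscanf(s, "%31[^-]-%31s", a, b) != 2) return -1;
    return (parse_ip(a, lo) != 0 || parse_ip(b, hi) != 0) ? -1 : 0;
}

/* Parse "min-max" where both ends are port numbers. */
static int parse_port_range(const char *s, int *lo, int *hi)
{
    return sscanf(s, "%d-%d", lo, hi) == 2 ? 0 : -1;
}

/* Parse "HH:MM:SS" into seconds since midnight. */
static int parse_time(const char *s, int *sec)
{
    int h, m, x;
    if (sscanf(s, "%d:%d:%d", &h, &m, &x) != 3) return -1;
    *sec = h * 3600 + m * 60 + x;
    return 0;
}

/* Apply one "key=value" line to a policy; returns 0 on success. */
int policy_set_field(policy_t *p, const char *line)
{
    char key[16], val[64];
    if (sscanf(line, "%15[^=]=%63s", key, val) != 2) return -1;
    if (strcmp(key, "id") == 0)
        return sscanf(val, "%d", &p->id) == 1 ? 0 : -1;
    if (strcmp(key, "src_ip") == 0)
        return parse_ip_range(val, &p->sip_min, &p->sip_max);
    if (strcmp(key, "dst_ip") == 0)
        return parse_ip_range(val, &p->dip_min, &p->dip_max);
    if (strcmp(key, "src_port") == 0)
        return parse_port_range(val, &p->sport_min, &p->sport_max);
    if (strcmp(key, "dst_port") == 0)
        return parse_port_range(val, &p->dport_min, &p->dport_max);
    if (strcmp(key, "protocol") == 0) {
        snprintf(p->protocol, sizeof p->protocol, "%s", val);
        return 0;
    }
    if (strcmp(key, "starttime") == 0) return parse_time(val, &p->start_sec);
    if (strcmp(key, "endtime") == 0)   return parse_time(val, &p->end_sec);
    return -1;  /* unknown field name */
}
```

Calling `policy_set_field` once per line of a record fills one `policy_t`; an array of such structures would play the role of the strategy array used in the later steps.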
In an exemplary embodiment, the creating a time hash table according to the collection time information corresponding to each collection policy in the collection policy file includes: establishing a time hash table with preset length; analyzing the acquisition strategy file into a strategy array, traversing the strategy array, and calculating a time hash value of the acquisition time range of each acquisition strategy through a time hash function; and creating a time hash table according to the time hash value of each acquisition strategy.
In an exemplary embodiment, the creating a time hash table according to the time hash value of each acquisition policy includes: placing the index of the strategy array into a time hash table, wherein the index of the strategy array is used for representing the collection strategy stored in the corresponding position of the analyzed strategy array; and if the strategy array subscript conflicts with the subscript in the time hash table, the strategy array subscript is linked to the time hash table conflict chain.
In step S100, the time hash table may be created from the acquisition times of the acquisition strategies as follows. A hash array of length 86400 (24×60×60) is established, with one slot per second over a range of one day, and every slot is initialized to -1, meaning no acquisition strategy. Let the acquisition time be (x1:x2:x3), where x1 is the hour, x2 is the minute, and x3 is the second; the time hash value is then y = x1×60×60 + x2×60 + x3. The strategy array is traversed in order, the hash values of the acquisition time range of each strategy are calculated through the time hash function, and the strategy subscript is put into the hash table at those hash values; if a conflict exists, the strategy subscript is recorded in a conflict chain. This completes the creation of the time hash table.
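As an illustration of the time hash function, the sketch below takes the hash to be the number of seconds since midnight (hours weighted by 60×60, minutes by 60), which is what a table of length 86400 implies. The function names and the -1 error convention are assumptions, not part of the source.

```c
#include <stdio.h>

/* Time hash: seconds since midnight, in 0..86399.
 * Returns -1 if any field is out of range. */
int time_hash(int hour, int minute, int second)
{
    if (hour < 0 || hour > 23 || minute < 0 || minute > 59 ||
        second < 0 || second > 59)
        return -1;
    return hour * 60 * 60 + minute * 60 + second;
}

/* Parse "HH:MM:SS" and return its time hash, or -1 on malformed input. */
int time_hash_str(const char *hhmmss)
{
    int h, m, s;
    if (sscanf(hhmmss, "%d:%d:%d", &h, &m, &s) != 3)
        return -1;
    return time_hash(h, m, s);
}
```

With this definition, a strategy whose range is 00:00:10 to 00:00:30 occupies slots 10 through 30, and 23:59:59 maps to the last slot, 86399.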
In an exemplary embodiment, the creating a source port hash table according to source port information corresponding to each acquisition policy in the acquisition policy file includes: establishing a source port hash table with a preset length; analyzing the acquisition strategy file into a strategy array, traversing the strategy array, and calculating a source port hash value of an acquisition port range of each acquisition strategy through a source port hash function; and creating a source port hash table according to the source port hash value of each acquisition strategy.
In an exemplary embodiment, the creating a source port hash table according to the source port hash value of each acquisition policy includes: placing the index of the strategy array into a source port hash table, wherein the index of the strategy array is used for representing the collection strategy stored in the corresponding position of the analyzed strategy array; and if the strategy array subscript conflicts with the subscript in the source port hash table, the strategy array subscript is linked to the source port hash table conflict chain.
In step S100, the source port hash table may be created as follows. A hash table of length 65536 is established, covering the port number range (0-65535), and every slot is initialized to -1, meaning no acquisition strategy. If x is the source port number, the source port hash value is y = x. The strategy array is traversed in order, the hash values of the source port range of each strategy are calculated through the port hash function, and the strategy subscript is put into the source port hash table at those hash values; if a conflict exists, the strategy subscript is recorded in a conflict chain.
In step S100, the creation of the source port hash table according to the source port hash value of each acquisition policy and the creation of the time hash table according to the time hash value of each acquisition policy may be performed in parallel; that is, the two tables may be created in no particular order.
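The table construction described above (slots initialized to -1, the first strategy subscript stored in the slot, later ones linked on a conflict chain) can be sketched as follows. This is one possible layout under stated assumptions: `slot_t`, `chain_node`, and the function names are hypothetical, and the per-slot pointer makes this version larger in memory than a bare array of subscripts; it is written for readability.

```c
#include <stdlib.h>

#define PORT_SLOTS 65536  /* one slot per port number 0..65535 */

typedef struct chain_node {
    int policy_idx;
    struct chain_node *next;
} chain_node;

typedef struct {
    int first;          /* strategy subscript, -1 means "no strategy" */
    chain_node *chain;  /* further subscripts added on conflict */
} slot_t;

/* Allocate a table of n slots, every slot initialized to -1. */
slot_t *table_new(size_t n)
{
    slot_t *t = malloc(n * sizeof *t);
    if (!t) return NULL;
    for (size_t i = 0; i < n; i++) {
        t[i].first = -1;
        t[i].chain = NULL;
    }
    return t;
}

/* Record policy_idx in every slot lo..hi (hash y = x). Returns 0 on success. */
int table_add_range(slot_t *t, size_t n, size_t lo, size_t hi, int policy_idx)
{
    if (lo > hi || hi >= n) return -1;
    for (size_t i = lo; i <= hi; i++) {
        if (t[i].first == -1) {          /* empty slot: store directly */
            t[i].first = policy_idx;
            continue;
        }
        chain_node *node = malloc(sizeof *node);
        if (!node) return -1;
        node->policy_idx = policy_idx;
        node->next = NULL;
        chain_node **tail = &t[i].chain; /* append, preserving insertion order */
        while (*tail) tail = &(*tail)->next;
        *tail = node;
    }
    return 0;
}

/* Copy the subscripts stored at slot key into out (at most max entries).
 * Returns the number of subscripts written. */
size_t table_lookup(const slot_t *t, size_t n, size_t key, int *out, size_t max)
{
    size_t k = 0;
    if (key >= n || t[key].first == -1) return 0;
    if (k < max) out[k++] = t[key].first;
    for (const chain_node *c = t[key].chain; c && k < max; c = c->next)
        out[k++] = c->policy_idx;
    return k;
}

/* Release the table and all conflict chains. */
void table_free(slot_t *t, size_t n)
{
    if (!t) return;
    for (size_t i = 0; i < n; i++) {
        chain_node *c = t[i].chain;
        while (c) {
            chain_node *next = c->next;
            free(c);
            c = next;
        }
    }
    free(t);
}
```

Because strategies are added in array order and each chain is appended at its tail, every lookup returns subscripts in ascending order. The same functions would serve for the time hash table with a length of 86400 and the time hash values as keys.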
In step S110, the message to be collected is matched with the time hash table and the source port hash table, and collection policies are selected from the collection policy file according to the matching result and placed into a pre-screening policy set. That is, matching is performed for each captured message, and the hash tables, once generated, can be reused for many messages. For example, if a TCP message with the quadruple (192.168.39.2, 8000, 192.168.35.25, 9000) is captured at 00:00:25, applying the hash function of the time hash table to the capture time gives 25, and the strategies stored at subscript 25 of the time hash table are queried.
In an exemplary embodiment, the message to be collected is matched with the time hash table and the source port hash table, and the collection strategy is selected from the collection strategy file according to the matching result to be placed into a pre-screening strategy set, so that the implementation process is as follows: calculating a time hash value of a message to be acquired, inquiring the time hash table, and determining a first acquisition strategy set of the time period; calculating a source port hash value of a message to be acquired, inquiring a source port hash table, and determining a second acquisition strategy set of the source port range; and selecting an intersection set of the first acquisition strategy set and the second acquisition strategy set as a pre-screening strategy set. And when the intersection of the first acquisition strategy set and the second acquisition strategy set is empty, not performing acquisition operation. For example: the specific acquisition algorithm of the time pre-screening set A is that the time hash value of the acquisition time of the message is calculated according to a time hash value calculation formula, then the time hash table is inquired, and the acquisition strategy set of the time period is acquired and recorded as the time pre-screening set A. The specific acquisition algorithm of the source port pre-screening set B is that the source port hash value of a message is calculated according to a source port hash value calculation formula, then a source port hash table is queried, an acquisition strategy set of the source port is acquired, and the acquisition strategy set is recorded as the source port pre-screening set B. And taking the intersection of the time pre-screening set A and the source port pre-screening set B to obtain a pre-screening strategy set C. 
Because the acquisition strategy array is traversed sequentially when the hash tables are created, the subscripts in each conflict chain are in ascending order; that is, the time pre-screening set A and the source port pre-screening set B are two ordered arrays. Taking the intersection therefore means intersecting two ordered arrays. In this embodiment, a double-pointer method is used, following the idea of a two-way merge: one pointer marks a position in each set, the two pointed-to values are compared, and the pointer at the smaller value advances (both advance when the values are equal, and the value is added to the result). The resulting intersection is recorded as the pre-screening strategy set C. Each strategy in the pre-screening strategy set C is then traversed and its confirmation function is called; if a match succeeds, the message is acquired, and if no match succeeds after the whole set has been traversed, the message is not acquired.
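The double-pointer intersection of the two ordered sets can be sketched as below. The function name and signature are assumptions; the caller is assumed to provide an output buffer with room for at least min(na, nb) entries.

```c
#include <stddef.h>

/* Intersect two ascending arrays of strategy subscripts.
 * Writes the common values to out and returns how many were written.
 * out must have room for at least min(na, nb) entries. */
size_t intersect_sorted(const int *a, size_t na,
                        const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;                 /* a[i] cannot appear in b; skip it */
        } else if (a[i] > b[j]) {
            j++;                 /* b[j] cannot appear in a; skip it */
        } else {
            out[k++] = a[i];     /* common subscript */
            i++;
            j++;
        }
    }
    return k;
}
```

Each element of either array is visited at most once, so the cost is linear in the combined length of A and B. A return value of 0 corresponds to an empty set C, in which case no acquisition is performed.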
In step S120, if the message to be acquired matches the confirmation function corresponding to any strategy in the pre-screening strategy set, the message to be acquired is acquired.
In an exemplary embodiment, the confirmation function is generated according to each collection policy in the collection policy file, and the confirmation function names correspond one-to-one with the collection policies. The process of generating the confirmation function is as follows: first, the acquisition strategy file is read; then a function name is generated according to the strategy id, and confirmation function source code is generated for the strategy according to its acquisition conditions; finally, code that records the confirmation function address is generated.
Example
The present embodiment provides a method for sampling messages according to acquisition strategies. First, the acquisition conditions of each acquisition strategy are built into source code (for example, C program source code, although the present application is not limited to C); the source code includes two parts, the confirmation functions and the code that records the confirmation function addresses. A hash table is then established for the acquisition time and for the source port of the acquisition strategies, respectively, to enable rapid screening of the strategies. Finally, the screened acquisition strategies are matched exactly through their confirmation functions to decide whether the message is acquired. As shown in FIG. 2, the whole implementation process includes the following operations:
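One way to picture the "code for recording the confirmation function address" is an array of function pointers indexed by strategy subscript. The sketch below is illustrative only: `packet_t`, `confirm_fn`, `confirm_table`, and `should_capture` are hypothetical names, and the two function bodies are placeholders that check only the source port rather than every acquisition condition.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the fields a confirmation function inspects. */
typedef struct {
    uint32_t sip, dip;
    uint16_t sport, dport;
    const char *proto;
    int time_sec;            /* capture time, seconds since midnight */
} packet_t;

typedef bool (*confirm_fn)(const packet_t *);

/* Placeholder bodies: real generated functions would test every condition. */
static bool capture_1(const packet_t *p) { return p->sport >= 6000 && p->sport <= 8000; }
static bool capture_2(const packet_t *p) { return p->sport >= 1 && p->sport <= 7777; }

/* Address table: entry i belongs to the strategy at array subscript i. */
static const confirm_fn confirm_table[] = { capture_1, capture_2 };

/* Return true if any pre-screened strategy's confirmation function accepts
 * the packet; subscripts outside the table are ignored. */
bool should_capture(const packet_t *p, const int *prescreen, size_t n)
{
    const size_t table_len = sizeof confirm_table / sizeof confirm_table[0];
    for (size_t i = 0; i < n; i++) {
        int idx = prescreen[i];
        if (idx >= 0 && (size_t)idx < table_len && confirm_table[idx](p))
            return true;
    }
    return false;
}
```

Here each call goes through the table directly to a function whose comparison constants are compiled in, which is the property the source contrasts with reading the conditions from variables at run time.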
Step 1: generate the confirmation function source code for the acquisition strategies according to the acquisition strategy file.
In step 1, suppose the content of the acquisition policy file set by the user is as follows:
id=1
src_ip=192.168.39.2-192.168.39.254
dst_ip=192.168.35.25-192.168.35.36
src_port=6000-8000
dst_port=8999-9200
protocol=TCP
starttime=00:00:10
endtime=00:00:30
id=2
src_ip=192.168.39.2-192.168.39.254
dst_ip=192.168.35.15-192.168.35.66
src_port=1-7777
dst_port=80-7777
protocol=TCP
starttime=00:00:20
endtime=23:59:59
A script generates a confirmation function for each strategy according to the content of the acquisition strategy file. The specific process is as follows: the script reads the content of the strategy file; it first parses the id of the acquisition strategy using the field name id and the equals sign, and validates it; it then parses the minimum source IP and the maximum source IP using the equals sign and the hyphen, validating each field; it parses the values of the remaining fields, such as the minimum destination IP, the maximum destination IP, and the minimum source port, in the same way; finally, it fills the parsed values into comparison logic in a source code file, following the syntax of the programming language. A matching function name is generated according to the strategy id, the matching function address is recorded, and the function is compiled into the target program. Taking the acquisition strategy with id 1 in step 1 as an example, the pseudocode of the generated C language source code is as follows:
bool capture_1(sip, dip, sport, dport, proto, time)
{
if(sip in range(192.168.39.2-192.168.39.254)
&&dip in range(192.168.35.25-192.168.35.36)
&&(sport>=6000&&sport<=8000)
&&(dport>=8999&&dport<=9200)
&&strcmp(proto,"TCP")==0
&&time in range(00:00:10-00:00:30))
{
return true;
}
return false;
}
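For comparison, a compilable C version of the pseudocode above might look like the following. It is a sketch under assumptions that the source does not specify: IP addresses are passed as 32-bit values in host byte order, the time is passed as seconds since midnight, and the `IP4` macro is a helper introduced here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Build a host-order IPv4 value from four octets (helper for this sketch). */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Confirmation function for the strategy with id 1: every bound is a
 * constant taken from the strategy file, so nothing is looked up at run time. */
bool capture_1(uint32_t sip, uint32_t dip, uint16_t sport, uint16_t dport,
               const char *proto, int time_sec)
{
    return sip >= IP4(192, 168, 39, 2)  && sip <= IP4(192, 168, 39, 254)
        && dip >= IP4(192, 168, 35, 25) && dip <= IP4(192, 168, 35, 36)
        && sport >= 6000 && sport <= 8000
        && dport >= 8999 && dport <= 9200
        && strcmp(proto, "TCP") == 0
        && time_sec >= 10 && time_sec <= 30;  /* 00:00:10 to 00:00:30 */
}
```

For the message used in the later steps, captured at 00:00:25 with quadruple (192.168.39.2, 8000, 192.168.35.25, 9000) and protocol TCP, every comparison holds and the function returns true.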
Step 2: establish the hash tables according to the acquisition strategy file.
In step 2, two hash tables are created, a time hash table and a source port hash table. The creation process is as follows:
Firstly, analyzing an acquisition strategy file into a structure object array, putting a structure body of an acquisition strategy with an id of 1 at a position with an array subscript of 0, and putting a structure body of an acquisition strategy with an id of 2 at a position with an array subscript of 1, wherein the subscript of the strategy array marks the acquisition strategy. The hash table is then built using the array subscripts.
Then, a time hash table is built according to the policy times in the policy array: a hash array of length 86400 (24×60×60) is built, with one slot per second over a range of one day. A hash operation is performed on the time range of each acquisition strategy. The acquisition time of the strategy with id 1 is 00:00:10-00:00:30, and that of the strategy with id 2 is 00:00:20-23:59:59. Applying the hash function of the time hash table, the strategy with id 1 covers hash values 10 (0×60×60+0×60+10) to 30 (0×60×60+0×60+30), and the strategy with id 2 covers 20 (0×60×60+0×60+20) to 86399 (23×60×60+59×60+59). The array content at hash table subscripts (10-30) is first assigned 0, and then the array content at subscripts (20-86399) is assigned 1. Because the slots at subscripts (20-30) already hold subscript 0, a conflict occurs there and subscript 1 is linked to the time hash table conflict chain. The resulting time hash table is shown in FIG. 3.
Finally, a source port hash table is established according to the source ports in the policy array. A hash array of length 65536 is established first. Then a hash operation is performed on the source port range of each acquisition strategy: applying the hash function of the source port hash table to the port range (6000-8000) of the strategy with id 1 gives hash values (6000-8000), and applying it to the port range (1-7777) of the strategy with id 2 gives hash values (1-7777). The array content at hash table subscripts (6000-8000) is first assigned 0, and then the array content at subscripts (1-7777) is assigned 1. Because the slots at subscripts (6000-7777) already hold subscript 0, a conflict occurs there and subscript 1 is linked to the source port hash table conflict chain. The source port hash table is then complete, as shown in FIG. 4. The source port hash table and the time hash table may be built in parallel or one after the other; there is no required order between them.
Step 3, querying the time hash table to obtain a time pre-screening set A.
In step 3, if a message with the four-tuple (192.168.39.2, 8000, 192.168.35.25, 9000) and the TCP protocol is captured at 00:00:25, the hash function of the time hash table is applied to the capture time to obtain 25. The entry at subscript 25 of the time hash table is queried, which yields the subscripts (0, 1) of the strategies to be acquired at that moment; that is, the time pre-screening set A is (0, 1), and A is not empty.
Step 4, querying the source port hash table to obtain a source port pre-screening set B.
In step 4, the captured message is the same as in step 3. The hash function of the source port hash table is applied to the source port of the message to obtain the hash value 8000. The content of entry 8000 of the source port hash table array is checked, which yields policy subscript 0 for source port 8000; that is, the source port pre-screening set B is (0), and B is not empty.
Step 5, taking the intersection of the time pre-screening set A and the source port pre-screening set B to obtain a pre-screening policy set C; if C is an empty set, no acquisition is performed.
In step 5, the time pre-screening set A (0, 1) obtained in step 3 and the source port pre-screening set B (0) obtained in step 4 are intersected to obtain the pre-screening policy set C (0), and C is not empty.
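Steps 3 to 5 can be traced with a short Python sketch. For brevity, and purely as an assumption of this illustration, the two tables are represented as dictionaries containing only the slots the example touches; `second_of_day` and `prescreen` are hypothetical names.

```python
def second_of_day(hhmmss: str) -> int:
    """Assumed time hash function: 'HH:MM:SS' to second-of-day."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Stand-ins for the full 86400-slot and 65536-slot tables (only the needed slots are filled).
time_table = {25: [0, 1]}
port_table = {8000: [0]}


def prescreen(capture_time: str, src_port: int):
    """Return (A, B, C): time set, source port set, and their intersection."""
    set_a = set(time_table.get(second_of_day(capture_time), []))  # step 3
    set_b = set(port_table.get(src_port, []))                     # step 4
    return set_a, set_b, set_a & set_b                            # step 5


# Message captured at 00:00:25 with source port 8000.
set_a, set_b, set_c = prescreen("00:00:25", 8000)
```

An empty C means the message is not acquired and no confirmation function needs to be called.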
Step 6, traversing each strategy in the pre-screening policy set C, calling the confirmation function of the strategy, and checking whether the message needs to be acquired.
In step 6, the pre-screening policy set C (0) is traversed. The policy id and the confirmation function of the policy are obtained according to subscript 0, and the message is matched by that confirmation function, the code of which is shown in step 1. Each acquisition condition is matched in turn; if all conditions match, the message is acquired.
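The idea of a hard-coded confirmation function can be illustrated as follows. This is a hedged Python stand-in, not the confirmation function code of step 1: the generator `generate_confirm_source`, the dictionary keys, and the source IP, destination IP, destination port and protocol values of the policy are hypothetical, chosen so that the example message matches.

```python
def generate_confirm_source(policy):
    """Emit source text for a confirmation function with the policy values baked in as literals."""
    return (
        f"def confirm_{policy['id']}(second, src_ip, src_port, dst_ip, dst_port, protocol):\n"
        f"    return ({policy['start']} <= second <= {policy['end']}\n"
        f"            and src_ip == {policy['src_ip']!r}\n"
        f"            and {policy['src_port'][0]} <= src_port <= {policy['src_port'][1]}\n"
        f"            and dst_ip == {policy['dst_ip']!r}\n"
        f"            and dst_port == {policy['dst_port']}\n"
        f"            and protocol == {policy['protocol']!r})\n"
    )


# Hypothetical policy with id 1; only the time range and source port range come from the example.
policy = {
    "id": 1, "start": 10, "end": 30,
    "src_ip": "192.168.39.2", "src_port": (6000, 8000),
    "dst_ip": "192.168.35.25", "dst_port": 9000, "protocol": "TCP",
}

namespace = {}
exec(generate_confirm_source(policy), namespace)  # compile the generated function
confirm_1 = namespace["confirm_1"]                # name derived uniquely from the policy id

matched = confirm_1(25, "192.168.39.2", 8000, "192.168.35.25", 9000, "TCP")
```

Because every comparison is against a literal in the generated code, no policy structure has to be dereferenced while a message is being checked.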
The embodiment of the application provides a method for sampling messages according to acquisition strategies. The method compiles the acquisition conditions into source code and uses hash tables to pre-screen the acquisition strategies, thereby achieving fast strategy matching. First, a confirmation function code is generated for each acquisition strategy configured by the user. Then, one hash table is built for the strategy acquisition time and another for the source port. Finally, during matching at run time, the strategies are pre-screened through the hash tables, and confirmation function matching is then performed on the pre-screened strategy set. Because independent acquisition time and source port hash tables are used, the total memory occupation is 148.3KB, which is far lower than that of a cascaded hash table. Because the confirmation function is generated by hard coding, it executes at run time faster than variable indirect addressing. Therefore, the method and system for sampling messages according to acquisition strategies provided by the embodiment of the application have the advantages of low memory occupation, fast pre-screening, and fast and accurate matching, and overcome the shortcomings of the conventional method for sampling messages.
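The 148.3KB figure is consistent with one byte per table slot, which is an assumption of the following Python check and is not stated in the text. The cascaded size is likewise only one possible reading, namely one slot per (second, port) pair.

```python
TIME_SLOTS = 24 * 60 * 60   # 86400
PORT_SLOTS = 65536
BYTES_PER_SLOT = 1          # assumption: each slot stores a one-byte policy subscript

# Two independent tables: the slot counts add.
separate_kib = (TIME_SLOTS + PORT_SLOTS) * BYTES_PER_SLOT / 1024

# Hypothetical cascaded table with one slot per (second, port) pair: the slot counts multiply.
cascaded_kib = TIME_SLOTS * PORT_SLOTS * BYTES_PER_SLOT / 1024
```

Under these assumptions the independent tables need 148.375 KB, against 5,529,600 KB (about 5.3 GB) for the multiplied layout.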
The application also provides a device for sampling the message, as shown in fig. 5, the device comprises: a memory and a processor; the memory is configured to store a program for collecting a message, and the processor is configured to read and execute the program for collecting a message, and execute the method for sampling a message according to any one of the foregoing embodiments.
The present application also provides a storage medium having stored therein a program for sampling a message, the program being arranged to perform the method of sampling a message of any of the above embodiments at run-time.
Example
The present embodiment provides a process flow for sampling messages according to an acquisition strategy, as shown in fig. 6:
Step 600: capture traffic and obtain the acquisition policy file.
Step 601: calculate the corresponding time hash value and port hash value according to the time information and the port information in the acquisition policy file.
Step 602: query the time hash table according to the time hash value; when the time hash value matches the time hash table, jump to step 603; if the query result is empty, no acquisition is performed and the process proceeds to step 612.
Step 603: obtain the time pre-screening set A; continue with step 604.
Step 604: query the port hash table according to the port hash value; when the port hash value matches the port hash table, go to step 605; if the query result is empty, no acquisition is performed and the process proceeds to step 612.
Step 605: obtain the port pre-screening set B; continue with step 606.
Step 606: intersect the time pre-screening set A and the source port pre-screening set B to obtain the pre-screening policy set C.
Step 607: judge whether C is an empty set; if C is empty, jump to step 612; if C is not empty, perform step 608.
Step 608: determine whether the length of the pre-screening policy set C is greater than i, where the initial value of i is 0; if so, perform step 609; otherwise, jump to step 612.
Step 609: find the confirmation function according to the policy subscript.
Step 610: match the message with the confirmation function; if the match succeeds, perform step 611; if the match fails, set i=i+1 and re-execute step 608.
Step 611: collect the message.
Step 612: do not collect the message.
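The decision part of this flow (steps 602 to 612) can be sketched as a single Python function. The sketch assumes the hash values have already been computed and stored in the message, uses small dictionaries in place of the full tables, and uses hypothetical names (`should_collect`, `confirm_functions`) and hypothetical confirmation conditions.

```python
def should_collect(message, time_table, port_table, confirm_functions):
    """Return True if the message is collected (step 611), False otherwise (step 612)."""
    set_a = time_table.get(message["second"], [])    # steps 602-603
    if not set_a:
        return False
    set_b = port_table.get(message["src_port"], [])  # steps 604-605
    if not set_b:
        return False
    set_c = sorted(set(set_a) & set(set_b))          # step 606
    if not set_c:                                    # step 607
        return False
    i = 0
    while len(set_c) > i:                            # step 608
        confirm = confirm_functions[set_c[i]]        # step 609
        if confirm(message):                         # step 610
            return True                              # step 611
        i += 1
    return False                                     # step 612


# Stand-in tables holding only the slots used here, plus two hypothetical confirmation functions.
time_table = {25: [0, 1]}
port_table = {8000: [0]}
confirm_functions = {
    0: lambda m: m["protocol"] == "TCP" and m["dst_port"] == 9000,
    1: lambda m: m["protocol"] == "UDP",
}

message = {"second": 25, "src_port": 8000, "dst_port": 9000, "protocol": "TCP"}
collected = should_collect(message, time_table, port_table, confirm_functions)
```

The loop stops at the first confirmation function that matches, so a message is collected at most once however many strategies it satisfies.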
In this embodiment, the acquisition strategies are pre-screened through the hash tables to achieve fast strategy matching: during matching, the strategies are first pre-screened through the hash tables, and confirmation function matching is then performed on the pre-screened strategy set. Because independent acquisition time and source port hash tables are used, the total memory occupation is 148.3KB, which is far lower than that of a cascaded hash table. Therefore, the method and system for sampling messages according to acquisition strategies provided by the application have the advantages of low memory occupation, fast pre-screening, and fast and accurate matching, and overcome the shortcomings of the conventional method for sampling messages.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.

Claims (9)

1. A method of sampling a message, the method comprising:
reading an acquisition strategy file, creating a time hash table according to acquisition time information corresponding to each acquisition strategy in the acquisition strategy file, and creating a source port hash table according to source port information corresponding to each acquisition strategy in the acquisition strategy file;
Matching the message to be acquired with the time hash table and the source port hash table, and selecting an acquisition strategy from the acquisition strategy file according to a matching result to put the acquisition strategy into a pre-screening strategy set;
if the message to be acquired is matched with a confirmation function corresponding to any strategy in the pre-screening strategy set, acquiring the message to be acquired;
the confirmation function is generated by hard coding according to each acquisition strategy in the acquisition strategy file, and the confirmation function names are in one-to-one correspondence with the acquisition strategies; the confirmation function name is uniquely generated from the acquisition strategy id, and the confirmation function body is generated from the acquisition time information STARTTIME, ENDTIME in the acquisition strategy, the source port information src_ip, src_port, the destination port information dst_ip, dst_port, and the protocol name protocol;
the collection policies include the collection policy id, the collection time information STARTTIME, ENDTIME, the source port information src_ip, src_port, the destination port information dst_ip, dst_port, and the protocol name protocol.
2. The method for sampling a message according to claim 1, wherein the creating a time hash table according to the collection time information corresponding to each collection policy in the collection policy file includes:
establishing a time hash table with preset length;
Analyzing the acquisition strategy file into a strategy array, traversing the strategy array, and calculating a time hash value of the acquisition time range of each acquisition strategy through a time hash function;
and creating a time hash table according to the time hash value of each acquisition strategy.
3. The method of sampling a message according to claim 2, wherein creating a time hash table based on the time hash value of each acquisition strategy comprises:
Placing the index of the strategy array into a time hash table, wherein the index of the strategy array is used for representing the collection strategy stored in the corresponding position of the analyzed strategy array;
and if the strategy array subscript conflicts with the subscript in the time hash table, the strategy array subscript is linked to the time hash table conflict chain.
4. The method for sampling a message according to claim 3, wherein creating a source port hash table according to source port information corresponding to each collection policy in the collection policy file comprises:
Establishing a source port hash table with a preset length;
Analyzing the acquisition strategy file into a strategy array, traversing the strategy array, and calculating a source port hash value of an acquisition source port range of each acquisition strategy through a source port hash function;
and creating a source port hash table according to the source port hash value of each acquisition strategy.
5. The method of sampling a message according to claim 4, wherein creating a source port hash table based on the source port hash value for each sampling policy comprises:
placing the index of the strategy array into a source port hash table, wherein the index of the strategy array is used for representing the collection strategy stored in the corresponding position of the analyzed strategy array;
and if the strategy array subscript conflicts with the subscript in the source port hash table, the strategy array subscript is linked to the source port hash table conflict chain.
6. The method for sampling a message according to claim 5, wherein the matching the message to be sampled with the time hash table and the source port hash table, and selecting a sampling strategy from the sampling strategy file according to the matching result, and placing the sampling strategy into a pre-screening strategy set comprises:
calculating a time hash value of a message to be acquired, inquiring the time hash table, and determining a first acquisition strategy set of the time period;
Calculating a source port hash value of a message to be acquired, inquiring the source port hash table, and determining a second acquisition strategy set of the source port range;
and selecting an intersection set of the first acquisition strategy set and the second acquisition strategy set as a pre-screening strategy set.
7. The method of sampling a message according to claim 6, wherein the method further comprises:
and when the intersection of the first acquisition strategy set and the second acquisition strategy set is empty, not performing acquisition operation.
8. An apparatus for sampling a message, the apparatus comprising: a memory and a processor; the memory is used for storing a program for collecting messages, and the processor is used for reading and executing the program for collecting messages and executing the method for sampling and collecting messages according to any one of claims 1-7.
9. A storage medium, wherein the storage medium has stored therein a program for collecting messages, the program being arranged to perform the method of sampling a collected message as claimed in any one of claims 1 to 7 at run-time.
CN202210267837.7A 2022-03-17 2022-03-17 Method and device for sampling message Active CN114625929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210267837.7A CN114625929B (en) 2022-03-17 2022-03-17 Method and device for sampling message

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210267837.7A CN114625929B (en) 2022-03-17 2022-03-17 Method and device for sampling message

Publications (2)

Publication Number Publication Date
CN114625929A CN114625929A (en) 2022-06-14
CN114625929B true CN114625929B (en) 2024-08-13

Family

ID=81902142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210267837.7A Active CN114625929B (en) 2022-03-17 2022-03-17 Method and device for sampling message

Country Status (1)

Country Link
CN (1) CN114625929B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103973684A (en) * 2014-05-07 2014-08-06 北京神州绿盟信息安全科技股份有限公司 Rule compiling and matching method and device
CN105913281A (en) * 2016-04-12 2016-08-31 宁波极动精准广告传媒有限公司 Advertisement publishing method based on classified Hash table
CN112511441A (en) * 2020-11-18 2021-03-16 潍柴动力股份有限公司 Message processing method and device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101525623B1 (en) * 2008-12-18 2015-06-03 삼성전자주식회사 Method and apparatus for filtering network traffic
US9237128B2 (en) * 2013-03-15 2016-01-12 International Business Machines Corporation Firewall packet filtering
CN106326258B (en) * 2015-06-26 2022-04-08 中兴通讯股份有限公司 URL matching method and device
CN106936799B (en) * 2015-12-31 2021-05-04 阿里巴巴集团控股有限公司 Message cleaning method and device
CN106790742A (en) * 2016-11-23 2017-05-31 北京锐安科技有限公司 A kind of method and device of IP matchings
CN106713015B (en) * 2016-12-07 2019-06-04 武汉斗鱼网络科技有限公司 A scheme testing method and server
CN111818099B (en) * 2020-09-02 2020-12-04 南京云信达科技有限公司 TCP (Transmission control protocol) message filtering method and device
CN112491901B (en) * 2020-11-30 2023-03-24 北京锐驰信安技术有限公司 Network flow fine screening device and method

Also Published As

Publication number Publication date
CN114625929A (en) 2022-06-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant