CN117272292A - Data processing method, device, equipment and computer readable storage medium - Google Patents
Data processing method, device, equipment and computer readable storage medium
- Publication number
- CN117272292A (CN202311401326.0A)
- Authority
- CN
- China
- Prior art keywords
- detection
- data
- detected
- weights
- detection result
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/554—Detecting local intrusion or implementing counter-measures involving event detection and direct action
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/566—Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/568—Computer malware detection or handling, e.g. anti-virus arrangements eliminating virus, restoring damaged files
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/03—Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
- G06F2221/033—Test or assess software
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Virology (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Debugging And Monitoring (AREA)
Abstract
The application provides a data processing method, apparatus, device, and computer readable storage medium, applicable to various data detection scenarios such as cloud technology, artificial intelligence, games, network security, application security, intelligent traffic, maps, and in-vehicle systems. The data processing method includes: in response to an abnormality detection request for data to be detected, obtaining N detection results produced by detecting the data to be detected with N detection modes respectively, where N is a positive integer greater than 1 and each detection mode is used to detect whether the data to be detected is abnormal; arranging the N detection results in reverse order based on the priority of the detection modes to obtain a detection result sequence; traversing the detection result sequence, and ending the traversal as soon as a final detection result can be determined from the detection results traversed so far; and processing the data to be detected based on the final detection result. The application can improve data detection efficiency.
Description
Technical Field
The present invention relates to data processing technology in the field of computer applications, and in particular, to a data processing method, apparatus, device, and computer readable storage medium.
Background
To improve the security of computer applications, application-related data is often checked for abnormalities, and to improve detection accuracy, multiple detection modes are usually used together. In this process, however, the final detection result is obtained only after the detection results of all the detection modes have been combined, which reduces data detection efficiency.
Disclosure of Invention
The embodiment of the application provides a data processing method, a device, equipment, a computer readable storage medium and a computer program product, which can improve the data detection efficiency.
The technical scheme of the embodiment of the application is realized as follows:
The embodiment of the application provides a data processing method, which comprises the following steps:
in response to an abnormality detection request for data to be detected, obtaining N detection results produced by detecting the data to be detected with N detection modes respectively, wherein N is a positive integer greater than 1, and each detection mode is used for detecting whether the data to be detected is abnormal;
arranging the N detection results in reverse order based on the priority of the detection modes to obtain a detection result sequence;
traversing the detection result sequence, and ending the traversal of the detection result sequence when a final detection result is determined based on the traversed detection results;
and processing the data to be detected based on the final detection result.
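The steps above can be illustrated with a minimal sketch. It is not the claimed implementation: the names `DetectionResult` and `final_result`, the convention that a larger `priority` value means a higher priority, and the use of `None` for an undecided verdict are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class DetectionResult:
    mode: str               # name of the detection mode (hypothetical)
    priority: int           # assumption: larger value = higher priority
    verdict: Optional[str]  # "black", "white", or None when undecided


def final_result(results: Iterable[DetectionResult]) -> Tuple[Optional[str], int]:
    """Return (final verdict, number of results actually traversed)."""
    # Arrange the results so that the highest-priority mode comes first.
    ordered = sorted(results, key=lambda r: r.priority, reverse=True)
    visited = 0
    for result in ordered:
        visited += 1
        # End the traversal as soon as a final result can be determined.
        if result.verdict is not None:
            return result.verdict, visited
    return None, visited


results = [
    DetectionResult("signature", 1, "white"),
    DetectionResult("behavior", 3, None),
    DetectionResult("model", 2, "black"),
]
verdict, visited = final_result(results)
```

In this example only two of the three results are examined, which is the source of the efficiency gain the method describes: lower-priority results are never processed once a decision has been reached.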
An embodiment of the present application provides a data processing apparatus, including:
a request response module, configured to obtain, in response to an abnormality detection request for data to be detected, N detection results produced by detecting the data to be detected with N detection modes respectively, wherein N is a positive integer greater than 1, and each detection mode is used for detecting whether the data to be detected is abnormal;
a result ordering module, configured to arrange the N detection results in reverse order based on the priority of the detection modes to obtain a detection result sequence;
a result determining module, configured to traverse the detection result sequence and end the traversal of the detection result sequence when a final detection result is determined based on the traversed detection results;
and a data processing module, configured to process the data to be detected based on the final detection result.
In this embodiment of the present application, the result determining module is further configured to determine, as the final detection result, that the data to be detected is a white sample when the traversed detection result indicates that a first probability is greater than or equal to a white sample probability threshold, where the first probability is the probability that the data to be detected is the white sample, and the white sample is data unrelated to any abnormal feature.
In this embodiment of the present application, the result determining module is further configured to determine, as the final detection result, that the data to be detected is a black sample when the traversed detection result indicates that a second probability is greater than or equal to a black sample probability threshold, where the second probability is the probability that the data to be detected is the black sample, and the black sample is data that hits an abnormal feature.
In this embodiment of the present application, the result determining module is further configured to: obtain a second probability sequence corresponding to the traversed detection result sequence when the traversed detection result indicates that the second probability is smaller than the black sample probability threshold; determine, from M target weights corresponding to M detection modes, a target weight sequence based on the detection mode sequence corresponding to the traversed detection result sequence, where the M detection modes include the N detection modes, M is a positive integer with M ≥ N, and each target weight in the target weight sequence represents the weight of a detection mode; combine the target weight sequence and the second probability sequence correspondingly to obtain a current comprehensive probability; and, when the current comprehensive probability is greater than or equal to the black sample probability threshold, determine that the data to be detected is the black sample as the final detection result.
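The threshold checks and the comprehensive probability described above can be sketched as follows. This is a hedged illustration only: the threshold values, the function name `decide`, and the choice of a plain weighted sum as the way of "correspondingly combining" weights and probabilities are assumptions, since the text does not fix any of them.

```python
from typing import List, Optional, Sequence, Tuple

WHITE_THRESHOLD = 0.9  # assumed white sample probability threshold
BLACK_THRESHOLD = 0.8  # assumed black sample probability threshold


def decide(
    results: Sequence[Tuple[float, float, float]],
) -> Tuple[Optional[str], int]:
    """results holds (white_prob, black_prob, weight), already in priority order.

    Returns (final verdict, number of results traversed).
    """
    black_probs: List[float] = []
    weights: List[float] = []
    for visited, (p_white, p_black, weight) in enumerate(results, start=1):
        if p_white >= WHITE_THRESHOLD:
            return "white", visited
        if p_black >= BLACK_THRESHOLD:
            return "black", visited
        # Neither threshold is met: combine everything traversed so far.
        black_probs.append(p_black)
        weights.append(weight)
        # Assumption: the comprehensive probability is a weighted sum.
        combined = sum(w * p for w, p in zip(weights, black_probs))
        if combined >= BLACK_THRESHOLD:
            return "black", visited
    return None, len(results)


verdict = decide([(0.1, 0.5, 1.0), (0.2, 0.5, 0.8)])
```

Here neither result alone reaches the black sample threshold, but after the second result the weighted sum is 1.0 × 0.5 + 0.8 × 0.5 = 0.9, so the traversal ends with a black verdict.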
In this embodiment of the present application, the data processing apparatus further includes a weight determining module, configured to take M initial weights of the M detection modes as M ith weights and to perform the following processing iteratively over i, where i is a natural number: selecting L ith weights from the M ith weights, where L is a positive integer and L ≤ M; adjusting the L ith weights correspondingly based on L adjustment directions to obtain L (i+1)th weights; replacing the L ith weights among the M ith weights with the L (i+1)th weights to obtain M first weights to be detected, and determining the current index value corresponding to the M first weights to be detected; when the current index value is smaller than or equal to a detection index threshold, determining the M first weights to be detected as the M (i+1)th weights; and determining, as the M target weights, the M current weights obtained during the iteration whose detection index value is greater than the detection index threshold.
In this embodiment of the present application, the weight determining module is further configured to: adjust the L ith weights correspondingly based on the L adjustment directions to obtain L intermediate weights; replace the L ith weights among the M ith weights with the L intermediate weights to obtain M second weights to be detected, and determine a first index value corresponding to the M second weights to be detected; iteratively perform the following processing: when a second index value corresponding to the M ith weights is smaller than or equal to the first index value, continuing to adjust the L intermediate weights based on the L adjustment directions, or, when the second index value is greater than the first index value, adjusting the L ith weights in the direction opposite to the L adjustment directions; and determine the L locally optimal weights obtained by the iterative adjustment as the L (i+1)th weights.
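The iterative weight adjustment can be sketched as a simple coordinate search. The sketch makes several simplifying assumptions that are not in the text: one weight is adjusted per round (L = 1), weights are selected in round-robin order, the step size is fixed, weights are clamped to [0, 1], and `metric` is a hypothetical stand-in for whatever detection index is measured.

```python
from typing import Callable, List, Sequence, Tuple


def tune_weights(
    initial: Sequence[float],
    metric: Callable[[List[float]], float],
    threshold: float,
    step: float = 0.05,
    max_rounds: int = 100,
) -> Tuple[List[float], float]:
    """Adjust weights until the index value exceeds the threshold."""
    weights = list(initial)
    best = metric(weights)
    for i in range(max_rounds):
        if best > threshold:
            break  # the current weights qualify as target weights
        idx = i % len(weights)  # assumption: round-robin selection, L = 1
        for direction in (1.0, -1.0):
            moved = False
            while True:
                trial = list(weights)
                trial[idx] = min(1.0, max(0.0, trial[idx] + direction * step))
                score = metric(trial)
                if score <= best:
                    break  # no improvement: stop moving in this direction
                weights, best = trial, score
                moved = True
            if moved:
                break  # a local optimum was found; skip the reverse direction
    return weights, best


def toy_metric(w: List[float]) -> float:
    # Hypothetical index that peaks when every weight equals 0.5.
    return 1.0 - sum((x - 0.5) ** 2 for x in w)


weights, score = tune_weights([0.1, 0.9], toy_metric, threshold=0.99)
```

The inner loop keeps moving a weight in one direction while the index improves and tries the opposite direction only when the first step brings no gain, which mirrors the "continue or reverse" rule described above.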
In the embodiment of the application, the iterative adjustment of the initial weights is triggered by one or more of the following: an adjustment time being reached, a detection mode being added or deleted, a detection mode being updated, the detection index value being found to be lower than the detection index threshold, an adjustment operation being received, and an adjustment event being triggered.
In this embodiment of the present application, the request response module is further configured to: obtain, in response to the abnormality detection request for the data to be detected, a data identifier from the abnormality detection request, where the data identifier is used to identify the data to be detected; match, in a detection result library that is stored in a first storage area and maps identifiers to detection results, the detection results corresponding to the data identifier; obtain, based on the matched detection results, the N detection results produced by detecting the data to be detected with the N detection modes respectively; and, when the matching fails, detect the data to be detected with the N detection modes respectively to obtain the N detection results.
In this embodiment of the present application, the request response module is further configured to: determine, from the N detection modes, a sequence of detection modes to be rechecked based on one or both of the detection time and the version of the detection mode; detect the data to be detected with the sequence of detection modes to be rechecked to obtain a recheck result sequence; and replace the corresponding one or more detection results among the matched detection results with the recheck result sequence to obtain the N detection results produced by detecting the data to be detected with the N detection modes respectively.
In this embodiment of the present application, the request response module is further configured to: when the matching fails, determine the data that matches the data identifier in a sample database as the data to be detected; select the N detection modes for the data to be detected from the M detection modes; and detect the data to be detected with the N detection modes respectively to obtain the N detection results.
In this embodiment of the present application, the data processing apparatus further includes an information storage module, configured to: obtain N pieces of information to be displayed for the N detection results and final display information for the final detection result; store the N detection results and the final detection result in the detection result library; and store the N pieces of information to be displayed and the final display information in a second storage area, where the processing speed of the second storage area is lower than that of the first storage area.
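The lookup, recheck, and fallback behavior described in the three preceding paragraphs can be sketched as follows. This is an assumption-laden illustration: the result library is modeled as an in-memory dictionary, only the detection mode version (not the detection time) is used to decide what to recheck, and the names `get_results`, `load_sample`, and the engine tuple layout are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical layout: mode name -> (version, scan function).
Engines = Dict[str, Tuple[int, Callable[[bytes], float]]]
# Hypothetical result library: data id -> {mode name: (version, result)}.
ResultLibrary = Dict[str, Dict[str, Tuple[int, float]]]


def get_results(
    data_id: str,
    load_sample: Callable[[str], bytes],
    engines: Engines,
    library: ResultLibrary,
) -> Dict[str, float]:
    """Return one detection result per mode, rescanning only when needed."""
    cached = library.get(data_id)
    if cached is None:
        # Matching failed: fetch the sample and run every detection mode.
        sample = load_sample(data_id)
        cached = {name: (ver, scan(sample)) for name, (ver, scan) in engines.items()}
        library[data_id] = cached
        return {name: result for name, (_, result) in cached.items()}
    # Matching succeeded: recheck only modes whose version has changed.
    stale = [
        name
        for name, (ver, _) in engines.items()
        if name not in cached or cached[name][0] != ver
    ]
    if stale:
        sample = load_sample(data_id)
        for name in stale:
            ver, scan = engines[name]
            cached[name] = (ver, scan(sample))
    return {name: cached[name][1] for name in engines}


calls = []


def make_scan(name: str, value: float) -> Callable[[bytes], float]:
    def scan(sample: bytes) -> float:
        calls.append(name)  # record each real scan for demonstration
        return value
    return scan


engines = {"sig": (1, make_scan("sig", 0.2)), "ml": (1, make_scan("ml", 0.7))}
library: ResultLibrary = {}
first = get_results("id1", lambda _id: b"sample", engines, library)
second = get_results("id1", lambda _id: b"sample", engines, library)
engines["ml"] = (2, make_scan("ml", 0.9))  # simulate an updated detection mode
third = get_results("id1", lambda _id: b"sample", engines, library)
```

The second request triggers no scans at all, and the third rescans only the updated mode, so a repeatedly reported sample does not cause every detection mode to run again.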
The embodiment of the application provides an electronic device for data processing, which comprises:
a memory for storing computer executable instructions or computer programs;
and the processor is used for realizing the data processing method provided by the embodiment of the application when executing the computer executable instructions or the computer programs stored in the memory.
The embodiment of the application provides a computer readable storage medium, which stores computer executable instructions or a computer program, wherein the computer executable instructions or the computer program are used for realizing the data processing method provided by the embodiment of the application when being executed by a processor.
The embodiment of the application provides a computer program product, which comprises computer executable instructions or a computer program, and the computer executable instructions or the computer program realize the data processing method provided by the embodiment of the application when being executed by a processor.
The embodiment of the application has at least the following beneficial effects: in response to an abnormality detection request for data to be detected, N detection results produced by detecting the data with N detection modes respectively are obtained and then traversed in order of detection priority, and the traversal ends as soon as the final detection result can be determined from the detection results traversed so far. This reduces the number of detection results that have to be processed and therefore improves data detection efficiency.
Drawings
FIG. 1 is an exemplary schematic diagram of detection;
FIG. 2 is a schematic diagram of an exemplary cache detection result;
FIG. 3 is an exemplary decision diagram;
FIG. 4 is a schematic diagram of the architecture of a data processing system provided by embodiments of the present application;
FIG. 5 is a schematic structural diagram of the server in FIG. 4 according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating a data processing method according to an embodiment of the present disclosure;
FIG. 7 is a second flow chart of a data processing method according to an embodiment of the present disclosure;
FIG. 8 is a schematic flow chart of determining target weights according to an embodiment of the present application;
FIG. 9 is a third flow chart of a data processing method according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of an exemplary plug-in detection scenario provided in an embodiment of the present application;
FIG. 11 is a schematic diagram of an exemplary Trojan horse detection scenario provided in an embodiment of the present application;
FIG. 12 is a schematic diagram of an exemplary game detection scenario provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of exemplary decision region content provided by an embodiment of the present application;
FIG. 14 is an exemplary consolidated schematic provided by embodiments of the present application;
FIG. 15 is a schematic diagram of an exemplary engine information area provided by an embodiment of the present application;
FIG. 16 is an exemplary plug-in architecture diagram provided by embodiments of the present application;
FIG. 17 is an exemplary decision flow chart provided by an embodiment of the present application;
FIG. 18 is an exemplary scan engine update flow diagram provided by an embodiment of the present application;
FIG. 19 is an exemplary scan engine expansion flowchart provided by an embodiment of the present application;
FIG. 20 is an exemplary coefficient adjustment flowchart provided in an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail with reference to the accompanying drawings, and the described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without making any inventive effort are within the scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
In the following description, the terms "first", "second", and the like are used to distinguish between similar objects and do not represent a particular ordering of the objects, it being understood that the "first", "second", or the like may be interchanged with one another, if permitted, to enable embodiments of the application described herein to be practiced otherwise than as illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used in the embodiments of the present application have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in the embodiments of the application is for the purpose of describing the embodiments of the application only and is not intended to be limiting of the application.
Before the embodiments of the present application are described in further detail, the terms and expressions used in the embodiments are explained; the following explanations apply to them.
1) Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In the embodiment of the application, the detection mode may be implemented with artificial intelligence technology.
2) Machine learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the various fields of artificial intelligence. Machine learning typically includes techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning. In the embodiment of the present application, the detection mode may be a detection function module implemented by machine learning.
3) An artificial neural network is a mathematical model that mimics the structure and function of a biological neural network. Exemplary structures of the artificial neural network in the embodiments of the present application include a graph convolutional network (Graph Convolutional Network, GCN, a neural network for processing graph-structured data), a deep neural network (Deep Neural Networks, DNN), a convolutional neural network (Convolutional Neural Network, CNN), a recurrent neural network (Recurrent Neural Network, RNN), a neural state machine (Neural State Machine, NSM), and a phase-functioned neural network (Phase-Functioned Neural Network, PFNN), among others. In the embodiment of the application, the detection mode may be an artificial neural network model.
4) A sample refers to data to be subjected to abnormality detection, such as running process data, module data, driver data, and memory block data; in this embodiment of the present application, the data to be detected is a sample.
5) Sample-based countermeasure refers to collecting a sample (e.g., from a client device) and scanning the sample so that countermeasures against malicious activity can be taken based on the scan results. Scanning is used to acquire data such as sample features and statistics of the sample features (e.g., the number of times the sample features hit abnormal features), and it can be implemented by a scan engine (also called a detection mode); that is, the scan engine scans the sample to obtain data such as sample features and statistics. The data processing method provided by the embodiment of the application is applied to sample-based countermeasure scenarios.
6) Detection refers to a process of determining whether a sample is normal, and detection results are, for example, black samples, white samples, non-black and non-white samples, and the like. The black samples are, for example, plug-in samples, malicious samples (virus files, attack programs, etc.), combinations of the two, and the like, and are also called abnormal samples; white samples, also known as normal samples (e.g., browser processes, operating system processes, etc.); a non-black non-white sample refers to a sample type for which it is not possible to determine whether it is a black sample or a white sample.
It should be noted that, in order to improve the security of computer applications, application-related samples are often checked for abnormalities, that is, sample-based countermeasures are performed; in sample-based countermeasure applications, multiple scan engines are typically employed to improve the detection accuracy for a sample. Referring to FIG. 1, FIG. 1 is an exemplary schematic diagram of detection; as shown in FIG. 1, a sample 1-4 is scanned by a scan engine 1-1 to a scan engine 1-3, and the respective scan results are integrated to obtain a final scan result 1-5.
However, in the above detection process, because the scan engines are independent of one another, the scan result of each scan engine has to be acquired one by one and all the scan results then integrated to determine the final scan result, which reduces data detection efficiency. In addition, a sample that is reported repeatedly is scanned repeatedly; even though caching can reduce the number of scans, a cached scan result may no longer be accurate after a scan engine is updated, and the caches themselves consume resources. Furthermore, the scan engines output scan results in different forms, which increases the complexity of integrating multiple scan results and further reduces data detection efficiency; and each scan engine may misjudge, which affects detection accuracy.
For example, building on FIG. 1, FIG. 2 is a schematic diagram of exemplary caching of detection results; as shown in FIG. 2, the scan results of the scan engine 1-1 for the sample 1-4 are stored by the cache 2-1, the scan results of the scan engine 1-2 for the sample 1-4 are stored by the cache 2-2, and the scan results of the scan engine 1-3 for the sample 1-4 are stored by the cache 2-3. However, caches 2-1 through 2-3 increase resource consumption.
It should be noted that the scan results of different scan engines are often combined to reach a decision, in which case a combination policy is generated between the scan engines concerned; the final scan result is then obtained from the results of the combination policies together with the scan results of the scan engines that do not take part in any combination.
For example, referring to FIG. 3, FIG. 3 is an exemplary decision diagram; as shown in FIG. 3, the sample 3-5 is scanned by the scan engines 3-1 to 3-4, where the scan result of the scan engine 3-1 for the sample 3-5 is stored by the cache 3-61, the scan result of the scan engine 3-2 for the sample 3-5 is stored by the cache 3-62, the scan result of the scan engine 3-3 for the sample 3-5 is stored by the cache 3-63, and the scan result of the scan engine 3-4 for the sample 3-5 is stored by the cache 3-64. In addition, a combination policy 3-71 is generated between the scan engine 3-3 and the scan engine 3-2, and a combination policy 3-72 is generated between the scan engine 3-3 and the scan engine 3-4; finally, the scan result of the scan engine 3-1 for the sample 3-5, the result corresponding to the combination policy 3-71, and the result corresponding to the combination policy 3-72 are integrated to obtain a final scan result 3-8.
It should be noted that, because the output formats of the scan engines are different, when generating the combination policy, the output format of each scan engine needs to be determined first, and then different analysis policies are adopted to analyze the different output formats, so as to obtain a result corresponding to the combination policy; thus, the data detection efficiency is also affected.
It should be further noted that, in the process of obtaining the final scan result based on the plurality of scan engines, there is a case where the scan results of different scan engines collide, thereby affecting the accuracy of data detection.
Based on this, the embodiments of the present application provide a data processing method, apparatus, device, computer readable storage medium and computer program product, which can improve data detection efficiency and accuracy, and reduce storage consumption. The following describes an exemplary application of an electronic device for data processing (hereinafter referred to as a data processing device) provided in an embodiment of the present application, where the data processing device provided in the embodiment of the present application may be implemented as various types of terminals such as a smart phone, a smart watch, a notebook computer, a tablet computer, a desktop computer, an intelligent home appliance, a set-top box, an intelligent vehicle-mounted device, a portable music player, a personal digital assistant, a dedicated messaging device, an intelligent voice interaction device, a portable game device, and an intelligent sound box, or may be implemented as a server, or may be a combination of the two. In the following, an exemplary application when the data processing apparatus is implemented as a server will be described.
With reference now to FIG. 4, FIG. 4 is a schematic diagram illustrating an architecture of a data processing system according to an embodiment of the present application; as shown in FIG. 4, to support a data processing application, in data processing system 100, terminal 200 (terminal 200-1 and terminal 200-2 are illustratively shown) is coupled to server 400 via network 300, and network 300 may be a wide area network or a local area network, or a combination of both. In addition, database 500 is included in data processing system 100 for providing data support to server 400; FIG. 4 shows the case in which the database 500 is independent of the server 400, but the database 500 may also be integrated in the server 400, which is not limited in the embodiment of the present application.
The terminal 200 is configured to send, in response to an operation (the graphical interface 210-1 is shown as an example), an abnormality detection request for data to be detected to the server 400 through the network 300, and is further configured to display the final detection result (the graphical interface 210-2 is shown as an example).
The server 400 is configured to: obtain, in response to the abnormality detection request, N detection results produced by detecting the data to be detected with N detection modes respectively, where N is a positive integer greater than 1 and each detection mode is used for detecting whether the data to be detected is abnormal; arrange the N detection results in reverse order based on the priority of the detection modes to obtain a detection result sequence; traverse the detection result sequence, and end the traversal when the final detection result is determined based on the traversed detection results; and process the data to be detected based on the final detection result. The server 400 is further configured to transmit the final detection result to the terminal 200 through the network 300.
In some embodiments, the server 400 may be a stand-alone physical server, a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDNs), and basic cloud computing services such as big data and artificial intelligence platforms. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiments of the present application.
Referring to fig. 5, fig. 5 is a schematic structural diagram of the server in fig. 4 according to an embodiment of the present application. As shown in fig. 5, the server 400 includes: at least one processor 410, a memory 450, at least one network interface 420, and a user interface 430. The various components in server 400 are coupled together by bus system 440. It is understood that the bus system 440 is used to enable connection and communication between these components. In addition to a data bus, the bus system 440 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as bus system 440 in fig. 5.
The processor 410 may be an integrated circuit chip having signal processing capabilities, for example a general purpose processor (such as a microprocessor or any conventional processor), a digital signal processor (Digital Signal Processor, DSP), or another programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like.
The user interface 430 includes one or more output devices 431, including one or more speakers and/or one or more visual displays, that enable presentation of the media content. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
Memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like. Memory 450 optionally includes one or more storage devices physically remote from processor 410.
Memory 450 includes volatile memory or nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a random access Memory (Random Access Memory, RAM). The memory 450 described in the embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 450 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 451 including system programs, e.g., a framework layer, a core library layer, a driver layer, etc., for handling various basic system services and performing hardware-related tasks;
a network communication module 452 for reaching other electronic devices via one or more (wired or wireless) network interfaces 420, exemplary network interfaces 420 including Bluetooth, Wi-Fi, universal serial bus (Universal Serial Bus, USB), and the like;
a presentation module 453 for enabling presentation of information (e.g., a user interface for operating peripheral devices and displaying content and information) via one or more output devices 431 (e.g., a display screen, speakers, etc.) associated with the user interface 430;
an input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.
In some embodiments, the data processing apparatus provided in the embodiments of the present application may be implemented in software. Fig. 5 shows the data processing apparatus 455 stored in the memory 450, which may be software in the form of a program, a plug-in, or the like, and includes the following software modules: a request response module 4551, a result ordering module 4552, a result determination module 4553, a data processing module 4554, a weight determination module 4555, and an information storage module 4556. These modules are logical, and thus may be arbitrarily combined or further split according to the functions implemented. The functions of the respective modules will be described hereinafter.
In some embodiments, the data processing apparatus provided in the embodiments of the present application may be implemented in hardware, and by way of example, the data processing apparatus provided in the embodiments of the present application may be a processor in the form of a hardware decoding processor that is programmed to perform the data processing method provided in the embodiments of the present application, for example, the processor in the form of a hardware decoding processor may employ one or more application specific integrated circuits (Application Specific Integrated Circuit, ASIC), DSP, programmable logic device (Programmable Logic Device, PLD), complex programmable logic device (Complex Programmable Logic Device, CPLD), field programmable gate array (Field-Programmable Gate Array, FPGA) or other electronic component.
In some embodiments, the terminal or server may implement the data processing method provided in the embodiments of the present application by running various computer-executable instructions or computer programs. For example, the computer-executable instructions may be micro-program-level commands, machine instructions, or software instructions. The computer program may be a native program or a software module in an operating system; may be a native application (APP), i.e., a program that needs to be installed in an operating system to run, such as a countermeasure APP or an antivirus APP; or may be an applet that can be embedded in any APP, i.e., a program that only needs to be downloaded into a browser environment to run. In general, the computer-executable instructions may be instructions in any form, and the computer program may be an application, module, or plug-in in any form.
The data processing method provided in the embodiment of the present application will be described below in connection with exemplary applications and implementations of the data processing apparatus provided in the embodiment of the present application. In addition, the data processing method provided by the embodiment of the application is applied to various data detection scenes such as cloud technology, artificial intelligence, games, network security, application security, intelligent traffic, maps, vehicle-mounted and the like.
Referring to fig. 6, fig. 6 is a schematic flow chart of a data processing method according to an embodiment of the present application, in which an execution body of each step in fig. 6 is a data processing apparatus; the steps shown in fig. 6 will be described below.
Step 101, responding to an abnormal detection request for the data to be detected, and acquiring N detection results for respectively detecting the data to be detected by adopting N detection modes.
In the embodiment of the application, when anomaly detection is to be performed on the data to be detected, the data processing device receives an anomaly detection request for the data to be detected; the data processing device then responds to the request by starting the flow of anomaly detection on the data to be detected. In this flow, N detection results obtained by detecting the data to be detected in N detection modes respectively are first obtained, where N is a positive integer greater than 1, and each detection mode is used for detecting whether the data to be detected is anomalous.
It should be noted that the data to be detected refers to data for which it is to be detected whether an abnormality exists, and may be running process data, invoked driver data, memory block data, transmission data, game operation data, storage data, network application data, or other data in which an abnormality may exist. Since the data to be detected corresponds to a data specification, an abnormality refers to a situation that does not conform to the data specification, such as a virus file or prohibited operation data. The anomaly detection request may be triggered by one or more of the following: arrival of a detection time (e.g., the detection time of periodic detection), receipt of a detection operation, a specified event (e.g., data loss or file corruption), and the like, which is not limited in the embodiments of the present application. Different types of data to be detected correspond to different detection mode sets; for example, when the data to be detected is data of type A, the corresponding detection mode set is detection mode set A, and when the data to be detected is data of type B, the corresponding detection mode set is detection mode set B. The data processing device can therefore determine the corresponding detection mode set based on characteristics such as the type of the data to be detected; the N detection modes may be the whole detection mode set or only some of the detection modes in the set, which is not limited in the embodiments of the present application. Here, the N detection results may be obtained by the data processing apparatus detecting the data to be detected in real time in the N detection modes, may be cached results obtained by the data processing apparatus, or may be a combination of the two, which is not limited in the embodiments of the present application.
Referring to fig. 7, fig. 7 is a second flowchart of a data processing method according to an embodiment of the present application, where an execution body of each step in fig. 7 is a data processing apparatus; as shown in fig. 7, in the embodiment of the present application, step 101 may be implemented through step 1011, step 1012, and step 1013A (or step 1013B); that is, the data processing apparatus acquires N detection results for detecting the data to be detected in the N detection methods, respectively, in response to the abnormality detection request for the data to be detected, including step 1011, step 1012, and step 1013A (or step 1013B), respectively, which will be described below.
Step 1011, obtaining a data identification from the abnormality detection request in response to the abnormality detection request for the data to be detected.
It should be noted that, because the anomaly detection request is used for requesting to perform anomaly detection on the data to be detected, the anomaly detection request carries an identifier for identifying the data to be detected, namely, a data identifier; thus, the data processing apparatus can obtain the data identification from the abnormality detection request. It is readily apparent that the data identification is used to identify the data to be detected.
Step 1012, matching the detection results corresponding to the data identifier in a detection result library that is stored in the first storage area and indexed by identifier.
In the embodiment of the application, a detection result library is stored in the first storage area, and the detection result library stores the detection results corresponding to the identifier of each piece of detected data. Here, the data processing device can access the detection result library in the first storage area and compare the data identifier with the identifiers in the detection result library one by one, so as to match the detection results corresponding to the data identifier. When an identifier matching the data identifier is found, the matching is determined to be successful, and the data to be detected is previously detected data; when no identifier matching the data identifier is found, the matching is determined to have failed, and the data to be detected is data undergoing anomaly detection for the first time.
It should be noted that the first storage area is used for storing the detection result library; the data read-write speed of the first storage area is greater than a specified read-write speed, and the specified read-write speed is determined based on the actual implementation. The first storage area may be a storage device such as an in-memory database. The identifier of data subjected to anomaly detection uniquely represents the data content and may be a data digest, such as the Message-Digest Algorithm 5 (MD5) value of the data. In addition, the data processing method provided by the embodiment of the application is suitable for scenarios in which anomaly detection is performed on data in a plurality of detection modes, so each identifier in the detection result library corresponds to a plurality of detection results.
It can be understood that, by storing the detection results of detected data in the detection result library, when the data to be detected is previously detected data, the final N detection results can be determined directly based on the detection results in the detection result library, so that the detection cost for repeatedly detected data is reduced and its detection efficiency is improved. In addition, storing the detection result library in a first storage area whose read-write speed is greater than the specified read-write speed improves the speed of matching detection results and thus the speed of determining the N detection results.
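As a non-limiting illustration, steps 1011 and 1012 can be sketched as a keyed lookup. The sketch assumes a plain Python dictionary as a stand-in for the detection result library in the first storage area; the names `result_library`, `data_identifier`, and `match_results`, and the layout of a stored result, are hypothetical and do not come from the embodiments.

```python
import hashlib

# Hypothetical in-memory stand-in for the detection result library:
# identifier -> {detection mode name: detection result}.
result_library = {}


def data_identifier(content: bytes) -> str:
    """Derive a data identifier from the data content (MD5 digest)."""
    return hashlib.md5(content).hexdigest()


def match_results(identifier: str):
    """Return the stored detection results for the identifier, or None on a miss."""
    return result_library.get(identifier)


sample = b"example payload"
ident = data_identifier(sample)

first_lookup = match_results(ident)    # miss: data detected for the first time
result_library[ident] = {"mode_a": {"white": 0.95, "black": 0.10}}
second_lookup = match_results(ident)   # hit: previously detected data
```

A miss corresponds to the failed match handled in step 1013B, and a hit corresponds to the successful match handled in step 1013A.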
Step 1013A, based on the matched detection results, determining N detection results of the data to be detected by N detection methods.
It should be noted that, when the matching succeeds, the data processing device determines the plurality of detection results corresponding to the matched identifier in the detection result library as the matched detection results; here, the number of matched detection results may be N. Based on the update status of the N detection modes, the data processing apparatus may directly determine the matched detection results as the N detection results (in this case, the data processing apparatus may also obtain the final detection result from the detection result library, so that the subsequent processing can be omitted), or may re-detect some or all of the matched detection results to obtain the N detection results, so as to ensure the accuracy and timeliness of the N detection results. It is easy to see that, if no detection mode has been updated since the matched detection results were obtained, the matched detection results are the N detection results; if a detection mode has been updated since the matched detection results were obtained, re-detection is performed in the updated detection mode.
In this embodiment of the present application, the data processing apparatus determining, based on the matched detection results, the N detection results obtained by detecting the data to be detected in the N detection modes includes: the data processing apparatus determines a sequence of modes to be re-detected from the N detection modes based on one or both of the detection time and the detection mode version; detects the data to be detected using the sequence of modes to be re-detected to obtain a re-detection result sequence; and, based on the re-detection result sequence, replaces the corresponding one or more detection results among the matched detection results to obtain the N detection results of the data to be detected in the N detection modes. The re-detection result sequence is in one-to-one correspondence with the sequence of modes to be re-detected, and is also in one-to-one correspondence with the one or more replaced detection results.
The detection time refers to the time at which a detection result was obtained, and the detection mode version refers to the version of the detection mode that produced the detection result, which may be the version of the detection mode itself or the version of a resource used by the detection mode. Here, the data processing device can read the update time of each detection mode, and each detection result includes its detection time; when the detection time is earlier than the update time, the detection mode corresponding to the detection result is a mode to be re-detected; otherwise, the detection result is determined to be up to date and no re-detection is needed. In addition, the data processing device can also read the latest version of each detection mode, and each detection result includes its detection mode version; when the latest version is inconsistent with the detection mode version, the detection mode corresponding to the detection result is a mode to be re-detected; otherwise, the detection result is determined to be up to date and no re-detection is needed. The sequence of modes to be re-detected is the sequence formed by the modes to be re-detected among the N detection modes, and a mode to be re-detected is a detection mode, among the N detection modes, in which the data to be detected is detected again. The update time and the latest version of a detection mode are recorded when the update of the detection mode is determined to affect the content of the detection result (as opposed to affecting only the speed at which the detection result is obtained).
It can be understood that, after the detection results are matched, the data processing device further judges whether to re-detect the data to be detected based on the update status of the N detection modes, so that the timeliness and accuracy of the N detection results are improved, which in turn improves the accuracy of data detection.
Step 1013B, when the matching fails, detecting the data to be detected in the N detection modes respectively to obtain N detection results.
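The re-detection decision of step 1013A can be sketched as follows. This is a minimal illustration under assumptions: the `StoredResult` fields, the function names, and the example mode names are hypothetical, and the `redetect` callback stands in for actually running a detection mode.

```python
from dataclasses import dataclass


@dataclass
class StoredResult:
    mode: str            # detection mode that produced the result
    detected_at: float   # detection time (seconds since epoch)
    mode_version: str    # detection mode version used for this result
    black_prob: float    # probability that the data is a black sample


def modes_to_redetect(stored, update_times, latest_versions):
    """Return modes whose stored result predates an update or uses an old version."""
    stale = []
    for r in stored:
        outdated_time = r.detected_at < update_times.get(r.mode, float("-inf"))
        outdated_version = r.mode_version != latest_versions.get(r.mode, r.mode_version)
        if outdated_time or outdated_version:
            stale.append(r.mode)
    return stale


def refresh(stored, stale, redetect):
    """Replace stale results with re-detection results; keep the others as they are."""
    return [redetect(r.mode) if r.mode in stale else r for r in stored]


stored = [
    StoredResult("signature", 100.0, "v1", 0.2),
    StoredResult("behavior", 300.0, "v2", 0.4),
]
update_times = {"signature": 200.0, "behavior": 250.0}
latest_versions = {"signature": "v2", "behavior": "v2"}

stale = modes_to_redetect(stored, update_times, latest_versions)
fresh = refresh(stored, stale, lambda m: StoredResult(m, 400.0, "v2", 0.9))
```

Here only the "signature" result is replaced, because it was obtained before that mode's update; the "behavior" result is reused unchanged.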
When the matching fails, the data processing device determines that the detection result of the data to be detected does not exist in the detection result library; the data processing equipment determines N detection modes for carrying out abnormal detection on the data to be detected, and respectively detects the data to be detected by adopting the N detection modes, so that N detection results are obtained; wherein, N detection modes are in one-to-one correspondence with N detection results.
In this embodiment of the present application, when the matching fails, the data processing device detecting the data to be detected in the N detection modes respectively to obtain N detection results includes: when the matching fails, the data processing device determines, from a sample database in which identifiers and data are stored in correspondence, the data matching the data identifier as the data to be detected; selects the N detection modes for the data to be detected from M detection modes; and then detects the data to be detected in the N detection modes respectively to obtain the N detection results.
The data processing apparatus stores the data obtained for abnormality detection in the sample database, with each identifier stored in correspondence with its data. When the data processing device finds the data corresponding to the data identifier in the sample database, the matched data is determined to be the data to be detected; in this case, the data to be detected is data that has not undergone abnormality detection since it was acquired. When the data processing device does not find the data corresponding to the data identifier in the sample database, it acquires the data corresponding to the data identifier from the client device to obtain the data to be detected. Here, the M detection modes are the detection mode set corresponding to the type of the data to be detected, so the M detection modes include the N detection modes, where M is a positive integer and M ≥ N.
It can be appreciated that the data processing apparatus stores data in the sample database, so that the cost of collecting repeated data is reduced and the efficiency of data detection can be improved.
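The handling of a failed match in step 1013B can be sketched as below. The sketch assumes a dictionary as the sample database and simple callables as detection modes; `load_data`, `detect_all`, the mode names, and the fixed probabilities returned by the example modes are all hypothetical.

```python
def load_data(identifier, sample_db, fetch_from_client):
    """Return the data for the identifier, preferring the sample database."""
    data = sample_db.get(identifier)
    if data is None:
        data = fetch_from_client(identifier)
        sample_db[identifier] = data   # keep it so the data is not collected again
    return data


def detect_all(data, modes):
    """Run each selected detection mode once; one detection result per mode."""
    return {name: detect(data) for name, detect in modes.items()}


fetch_calls = []


def fetch(identifier):
    # Stand-in for acquiring the data from the client device.
    fetch_calls.append(identifier)
    return b"payload"


# M = 3 detection modes for this data type (illustrative fixed outputs).
all_modes = {
    "signature": lambda d: {"white": 0.2, "black": 0.7},
    "behavior": lambda d: {"white": 0.5, "black": 0.3},
    "heuristic": lambda d: {"white": 0.4, "black": 0.4},
}
# N = 2 of the M modes are selected for this detection.
selected = {name: all_modes[name] for name in ("signature", "behavior")}

sample_db = {}
data = load_data("id-1", sample_db, fetch)    # fetched from the client
again = load_data("id-1", sample_db, fetch)   # served from the sample database
results = detect_all(data, selected)
```

The second call to `load_data` does not contact the client, which mirrors the reduced collection cost described above.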
Step 102, arranging the N detection results in reverse order based on the priorities of the detection modes to obtain a detection result sequence.
It should be noted that each detection mode has a priority within its detection mode set, and the priority may be positively correlated with an index such as the accuracy of the detection mode. Thus, after obtaining the N detection results and before determining the final detection result based on them, the data processing apparatus arranges the N detection results in reverse order of the priorities of the detection modes (so that the detection result of the highest-priority detection mode comes first), and then determines the final detection result based on the resulting detection result sequence.
It can be appreciated that the data processing apparatus can set the corresponding priority by numbering the detection modes; the number may be positively or negatively correlated with the priority when the numbers are different.
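Step 102 can be sketched as a single sort. The sketch assumes that a larger number means a higher priority and that the highest-priority result should come first; the mode names and the `priorities` mapping are hypothetical.

```python
# Assumed convention: larger number = higher priority.
priorities = {"signature": 3, "behavior": 2, "heuristic": 1}

results = [
    {"mode": "heuristic", "black": 0.6},
    {"mode": "signature", "black": 0.1},
    {"mode": "behavior", "black": 0.3},
]

# Highest-priority detection result first.
sequence = sorted(results, key=lambda r: priorities[r["mode"]], reverse=True)
order = [r["mode"] for r in sequence]
```

If the numbering were negatively correlated with priority instead, the same call without `reverse=True` would give the equivalent ordering.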
Step 103, traversing the detection result sequence, and ending the traversal of the detection result sequence when the final detection result is determined based on the traversed detection results.
In the embodiment of the application, after obtaining the detection result sequence, the data processing device traverses the detection result sequence to determine a final detection result based on the detection result with the highest priority. Here, when the data processing apparatus determines a final detection result based on the currently traversed detection result, the traversal of the detection result sequence is ended. In addition, the data processing device may determine the final detection result based on the currently traversed detection result alone, and may determine the final detection result based on at least one traversed detection result, which is not limited in this embodiment of the present application.
It should be noted that the final detection result refers to the final determination made for the data to be detected, that is, the final result of detecting whether the data to be detected is abnormal; for example, the data to be detected is a black sample, or the data to be detected is a white sample. A white sample is data unrelated to abnormal features, namely data that does not hit any abnormal feature, such as an operating system process in a task manager. A black sample is data that hits an abnormal feature, such as a virus file, an attack program, or a running cheat plug-in process. In addition, an abnormal feature refers to a feature used for determining an abnormality, such as a malicious function, a malicious operation mode, or a mode of skipping a specified process.
In the embodiment of the application, the data processing device determines a final detection result based on the traversed detection result, which may be the case of determining that the data to be detected is a white sample by the current traversed detection result; that is, the data processing apparatus determines the data to be detected as the white sample as the final detection result when the traversed detection result indicates that the first probability is greater than or equal to the white sample probability threshold.
It should be noted that the detection result includes a first probability, which is the probability that the data to be detected is a white sample. Here, the data processing apparatus can obtain a white sample probability threshold, which is the lowest white sample probability at which the data to be detected is determined to be a white sample, for example, 0.99, 1, or the like. The data processing device compares the first probability of the currently traversed detection result with the white sample probability threshold; if the first probability is greater than or equal to the white sample probability threshold, the probability that the data to be detected is a white sample is high, so the data processing device determines that the data to be detected is a white sample and obtains the final detection result that the data to be detected is a white sample.
In the embodiment of the application, the data processing device determines a final detection result based on the traversed detection result, or may determine that the data to be detected is a black sample according to the currently traversed detection result; that is, the data processing apparatus determines that the data to be detected is a black sample as the final detection result when the traversed detection result indicates that the second probability is greater than or equal to the black sample probability threshold.
It should be noted that the detection result includes a second probability, which is the probability that the data to be detected is a black sample; the first probability and the second probability are independent of each other, and there is no relative association between them. Here, the data processing apparatus can obtain a black sample probability threshold, which is the lowest black sample probability at which the data to be detected is determined to be a black sample, for example, 1, 0.99, or the like. The data processing device compares the second probability of the currently traversed detection result with the black sample probability threshold; if the second probability is greater than or equal to the black sample probability threshold, the probability that the data to be detected is a black sample is high, so the data processing device determines that the data to be detected is a black sample and obtains the final detection result that the data to be detected is a black sample.
In the embodiment of the application, the data processing device determining a final detection result based on the traversed detection results may also be a case in which the data to be detected is determined to be a black sample from the currently traversed detection result together with the previously traversed detection results. That is, when the currently traversed detection result indicates that the second probability is less than the black sample probability threshold, the data processing apparatus acquires a second probability sequence corresponding to the traversed detection result sequence; determines a target weight sequence from the M target weights corresponding to the M detection modes based on the detection mode sequence corresponding to the traversed detection result sequence, where each target weight in the target weight sequence represents the weight of a detection mode; combines the target weight sequence with the second probability sequence element by element to obtain a current comprehensive probability; and finally, when the current comprehensive probability is greater than or equal to the black sample probability threshold, determines that the data to be detected is a black sample as the final detection result.
It should be noted that the data processing device compares the second probability in the currently traversed detection result with the black sample probability threshold; if the second probability is less than the black sample probability threshold, the probability that the data to be detected is a black sample, as judged from this detection result alone, is low, so the data processing device makes a comprehensive determination using all the second probabilities in all the traversed detection results. The traversed detection result sequence consists of all traversed detection results, including the currently traversed detection result and the previously traversed detection results. The detection mode sequence is the sequence formed by the detection modes that produced the traversed detection result sequence, and the two sequences are in one-to-one correspondence. Because the M detection modes are used together to judge whether an abnormality exists, each detection mode corresponds to a target weight, and the target weight represents the importance of the corresponding detection mode among the M detection modes; and because the detection modes in the detection mode sequence belong to the M detection modes, the data processing device can obtain a target weight sequence in one-to-one correspondence with the detection mode sequence. The data processing device then uses the target weight sequence to perform a weighted combination of the second probability sequence corresponding to the traversed detection result sequence, thereby obtaining the current comprehensive probability. When the current comprehensive probability is greater than or equal to the black sample probability threshold, the data processing device determines that the probability of the data to be detected being a black sample is high, and therefore determines that the data to be detected is a black sample, obtaining the final detection result that the data to be detected is a black sample.
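The single-result decisions described above, together with the early end of the traversal in step 103, can be sketched as follows. The threshold values follow the examples in the text; the dictionary keys `"white"` and `"black"` and the function name `decide` are hypothetical.

```python
WHITE_THRESHOLD = 0.99   # lowest first probability accepted as a white sample
BLACK_THRESHOLD = 0.99   # lowest second probability accepted as a black sample


def decide(sequence):
    """Traverse results in priority order; stop at the first decisive one.

    Returns (verdict, number of results traversed); verdict is None when no
    single result is decisive.
    """
    for count, result in enumerate(sequence, start=1):
        if result["white"] >= WHITE_THRESHOLD:
            return "white", count
        if result["black"] >= BLACK_THRESHOLD:
            return "black", count
    return None, len(sequence)


sequence = [
    {"white": 0.50, "black": 0.20},    # not decisive
    {"white": 0.10, "black": 0.995},   # decisive: black sample
    {"white": 1.00, "black": 0.00},    # never reached
]
verdict, traversed = decide(sequence)
```

In this example the traversal ends after the second detection result, so the third result is never examined.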
In addition, since each detection result includes one second probability, the data processing apparatus can obtain a second probability sequence corresponding to the traversed detection result sequence.
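The comprehensive determination can be sketched as below. The embodiments do not give the exact combination formula here, so the sketch assumes a plain weighted sum of the second probabilities with weights that are not normalized to sum to 1; the function names, mode names, and weight values are hypothetical.

```python
def combined_black_probability(weights, black_probs):
    """Assumed combination: weighted sum of the traversed second probabilities."""
    return sum(w * p for w, p in zip(weights, black_probs))


def decide_with_history(sequence, target_weights, black_threshold):
    """Traverse in priority order; fall back to the combined probability."""
    traversed = []
    for result in sequence:
        traversed.append(result)
        if result["black"] >= black_threshold:
            return "black", len(traversed)
        weights = [target_weights[r["mode"]] for r in traversed]
        probs = [r["black"] for r in traversed]
        if combined_black_probability(weights, probs) >= black_threshold:
            return "black", len(traversed)
    return None, len(traversed)


# Illustrative target weights for M = 3 detection modes.
target_weights = {"signature": 1.0, "behavior": 0.8, "heuristic": 0.5}
sequence = [
    {"mode": "signature", "black": 0.6},   # combined so far: 0.6
    {"mode": "behavior", "black": 0.5},    # combined so far: 0.6 + 0.4 = 1.0
    {"mode": "heuristic", "black": 0.9},   # not reached
]
outcome = decide_with_history(sequence, target_weights, 0.99)
```

Neither of the first two second probabilities reaches the threshold on its own, but their weighted combination does, so the traversal ends after two results.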
Step 104, processing the data to be detected based on the final detection result.
In the embodiment of the application, when the data processing device processes the data to be detected based on the final detection result, it determines, based on the content of the final detection result, whether to apply a countermeasure to the data to be detected or to cancel the countermeasure. Here, when the final detection result indicates that the data to be detected is a white sample, the countermeasure against the data to be detected is canceled; when the final detection result indicates that the data to be detected is a black sample, a countermeasure is applied to the data to be detected. Thus, processing the data to be detected includes applying a countermeasure or canceling a countermeasure. In addition, the countermeasure is determined based on the type of the data to be detected; for example, when the data to be detected is malicious information such as a virus file, the countermeasure refers to processing such as deletion or blocking of execution; when the data to be detected is an operation instruction of an account, the countermeasure refers to controlling the operation permissions of the account; and so on.
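Step 104 can be sketched as a small dispatch on the final detection result and the data type. The type names and the action strings below are hypothetical labels chosen for illustration; they stand in for whatever concrete countermeasures an implementation provides.

```python
# Hypothetical mapping from data type to the countermeasure for black samples.
COUNTERMEASURES = {
    "file": "delete_or_block_execution",
    "account_operation": "restrict_account_permissions",
}


def process(data_type: str, verdict: str) -> str:
    """Choose the action for the data based on the final detection result."""
    if verdict == "white":
        return "cancel_countermeasure"
    if verdict == "black":
        return COUNTERMEASURES.get(data_type, "default_countermeasure")
    raise ValueError(f"unknown final detection result: {verdict!r}")


action_for_virus = process("file", "black")
action_for_clean = process("file", "white")
```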
It can be understood that when responding to an abnormal detection request for data to be detected, N detection results of detecting the data to be detected in N detection modes are obtained first, then the N detection results are traversed based on the detection priority, and under the condition that the final detection result is determined based on the traversed current detection result, the traversal of the detection results is ended; therefore, the data processing amount of the detection result is reduced, and the data detection efficiency can be improved. In addition, the final detection result is preferentially acquired based on the detection result with the highest priority, so that the detection accuracy can be ensured.
Referring to fig. 8, fig. 8 is a schematic flow chart of determining target weights according to an embodiment of the present application, where the execution subject of each step in fig. 8 is the data processing apparatus; as shown in fig. 8, in the embodiment of the present application, the data processing method further includes steps 105 to 110; that is, before determining the target weight sequence from the M target weights corresponding to the M detection modes based on the detection mode sequence corresponding to the traversed detection result sequence, the data processing apparatus further performs steps 105 to 110, each of which is described below.
Step 105, determining M initial weights of M detection modes as M ith weights.
It should be noted that each detection mode corresponds to an initial weight, which represents the initially set importance of the detection mode within the detection mode set; thus, the M detection modes correspond to M initial weights, and the M detection modes are in one-to-one correspondence with the M initial weights. Here, the data processing apparatus takes each initial weight as an ith weight, thereby obtaining M ith weights, so as to iteratively adjust the M initial weights through iteration i.
In an embodiment of the present application, the data processing apparatus triggers iterative adjustment of the M initial weights by one or more of the following: arrival of an adjustment time, addition or deletion of a detection mode, update of a detection mode, a detection index value being lower than the detection index threshold, receipt of an adjustment operation, and a triggered adjustment event.
It should be noted that arrival of the adjustment time means that iterative adjustment of the M initial weights is triggered at a specified time, for example periodically or at random. Addition or deletion of a detection mode means that a detection mode is added to or deleted from the M detection modes. Update of a detection mode means that one of the M detection modes is updated. The detection index threshold is the lowest acceptable index value for detection performed with the M detection modes; thus, when the detection index value of the M detection modes is lower than the detection index threshold, the detection accuracy of the M detection modes is low, and adjustment of the M initial weights is triggered. Receipt of an adjustment operation means, for example, that an operation instructing adjustment of the M initial weights is received. A triggered adjustment event is an event that triggers adjustment of the M initial weights, for example, receipt of negative feedback information.
In the embodiment of the present application, the data processing apparatus iterates i to perform the following processing (step 106 to step 109), where each iteration is used to adjust some (but not all) of the M initial weights.
Step 106, selecting L ith weights from the M ith weights.
In the embodiment of the application, the data processing device selects L ith weights from the M ith weights for adjustment; the weights may be selected sequentially by detection mode, or by combinations of detection modes; therefore, the L detection modes corresponding to the L ith weights selected in different iterations may be entirely different or partially different, which is not limited in the embodiment of the present application. For example, if the weight selected in the previous iteration corresponds to detection mode 1, the weight of detection mode 2 may be selected for adjustment in the next iteration, the weight of detection mode 3 in the iteration after that, and so on; for another example, if the weights selected in the previous iteration correspond to detection mode 1 and detection mode 2, the weights of detection mode 1 and detection mode 3 may be selected in combination for adjustment in the next iteration, the weights of detection mode 1 and detection mode 4 in the iteration after that, and so on.
Step 107, correspondingly adjusting the L ith weights based on L adjustment directions to obtain L (i+1)th weights.
In this embodiment of the present application, the data processing apparatus sets an adjustment direction for each of the L ith weights, where the adjustment direction may be a direction in which the corresponding weight is increased by a specified step size, or a direction in which the corresponding weight is decreased by a specified step size; here, when L is a positive integer greater than 1, the L adjustment directions may be the same or different, which is not limited in the embodiment of the present application.
It should be noted that the (i+1)th weight is a locally optimal weight obtained by adjustment along the adjustment direction; thus, the (i+1)th weight may be equal to the ith weight, in which case the data processing apparatus adjusts the L ith weights zero times based on the L adjustment directions, that is, the ith weight is already the locally optimal weight. The (i+1)th weight may also be the ith weight after one or more adjustments, in which case the data processing apparatus adjusts the L ith weights one or more times based on the L adjustment directions, that is, the (i+1)th weight obtained by adjusting the ith weight one or more times based on the L adjustment directions is the locally optimal weight.
In this embodiment of the present application, the data processing device correspondingly adjusting the L ith weights based on the L adjustment directions to obtain the L (i+1)th weights includes: the data processing device correspondingly adjusts the L ith weights based on the L adjustment directions to obtain L intermediate weights; replaces the L ith weights among the M ith weights with the L intermediate weights to obtain M second weights to be detected, and determines a first index value corresponding to the M second weights to be detected; and then iteratively performs the following processing: when the second index value corresponding to the M ith weights is smaller than or equal to the first index value, continuing to adjust the L intermediate weights based on the L adjustment directions; or, when the second index value is greater than the first index value, adjusting the L ith weights in the direction opposite to the L adjustment directions; and finally, determines the L locally optimal weights obtained by the iterative adjustment as the L (i+1)th weights.
It should be noted that, when the second index value corresponding to the M ith weights is smaller than or equal to the first index value, adjusting the L ith weights based on the L adjustment directions improves the detection index value, so the data processing apparatus continues to adjust the L intermediate weights based on the L adjustment directions until the detection index value of the iteratively adjusted L weights decreases for the first time; at that point the L locally optimal weights have been found, and they are the result of the previous adjustment. When the second index value is greater than the first index value, the L ith weights are adjusted in the direction opposite to the L adjustment directions; if the detection index value after this adjustment is still smaller than or equal to the second index value, the data processing device determines the L ith weights as the L locally optimal weights; if the detection index value after this adjustment is greater than the second index value, the data processing device continues to adjust in the opposite direction until the detection index value of the iteratively adjusted L weights decreases for the first time; at that point the L locally optimal weights have been found, and they are the result of the previous adjustment.
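The search for the L locally optimal weights described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the names `shift` and `local_optimum`, the dictionary representation of weights, the step size of 0.1 and the `max_steps` bound are all assumptions, and `index_value` stands for any function that returns the detection index value of a full weight set.

```python
from typing import Callable, Dict

Weights = Dict[str, float]


def shift(weights: Weights, directions: Dict[str, int], signed_step: float) -> Weights:
    """Move only the selected weights; a direction is +1 (increase) or -1 (decrease)."""
    moved = dict(weights)
    for name, direction in directions.items():
        moved[name] = round(moved[name] + direction * signed_step, 6)
    return moved


def local_optimum(weights: Weights,
                  directions: Dict[str, int],
                  index_value: Callable[[Weights], float],
                  step: float = 0.1,
                  max_steps: int = 50) -> Weights:
    """Adjust the selected weights until the index value first decreases."""
    base_value = index_value(weights)        # "second index value" (ith weights)
    best = shift(weights, directions, step)  # intermediate weights
    best_value = index_value(best)           # "first index value"
    sign = 1.0
    if base_value > best_value:
        # The forward step lowered the index value, so try the opposite direction.
        best = shift(weights, directions, -step)
        best_value = index_value(best)
        if best_value <= base_value:
            return dict(weights)             # the ith weights are already locally optimal
        sign = -1.0
    for _ in range(max_steps):               # hypothetical bound on the walk
        candidate = shift(best, directions, sign * step)
        value = index_value(candidate)
        if value < best_value:               # first decrease: keep the previous result
            break
        best, best_value = candidate, value
    return best
```

Only the weights named in `directions` are moved; all other weights are carried through unchanged, which mirrors replacing the L ith weights among the M ith weights.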
Step 108, replacing the L ith weights among the M ith weights with the L (i+1)th weights to obtain M first weights to be detected, and determining the current index value corresponding to the M first weights to be detected.
It should be noted that the M first weights to be detected include the L (i+1)th weights, and the ith weights other than the L ith weights among the M ith weights.
In the embodiment of the application, the first index value, the current index value, the detection index value and the other index values each include one or both of a coverage rate and an accuracy rate; the accuracy rate refers to the ratio of the number of actual black samples among the samples determined as black samples to the number of samples determined as black samples, and the coverage rate refers to the ratio of the number of samples determined as black samples to the actual number of black samples.
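As a small illustration of these two index values, the sketch below computes them from sets of sample identifiers. It assumes that the accuracy rate is the share of samples judged black that are actually black (the definition given with the correction mechanism), and it reads the coverage rate as the share of actual black samples that were judged black; that second reading is an interpretation, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def accuracy_and_coverage(judged_black: Iterable[str],
                          actual_black: Iterable[str]) -> Tuple[float, float]:
    """Return (accuracy rate, coverage rate) for black-sample detection."""
    judged, actual = set(judged_black), set(actual_black)
    hits = len(judged & actual)  # judged black and actually black
    accuracy = hits / len(judged) if judged else 0.0
    coverage = hits / len(actual) if actual else 0.0
    return accuracy, coverage


# Four samples judged black, two of them truly black, out of three truly black samples.
accuracy, coverage = accuracy_and_coverage({"a", "b", "c", "d"}, {"a", "b", "e"})
```

Here the accuracy rate is 2/4 and the coverage rate is 2/3.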
Step 109, determining the M first weights to be detected as M (i+1)th weights when the current index value is smaller than or equal to the detection index threshold.
In the embodiment of the application, the data processing device compares the current index value with the detection index threshold; when the current index value is smaller than or equal to the detection index threshold, adjustment needs to continue, so the M first weights to be detected are determined as the M (i+1)th weights, and adjustment of the weights continues through iteration i until the obtained detection index value is greater than the detection index threshold.
The detection index threshold corresponds to the content of the index value; for example, when the index value includes an accuracy rate, the detection index threshold includes an accuracy rate threshold, and the accuracy rate is compared with the accuracy rate threshold; when the index value includes a coverage rate, the detection index threshold includes a coverage rate threshold, and the coverage rate is compared with the coverage rate threshold.
Step 110, determining M current weights, obtained in the iteration i, with the detection index value greater than the detection index threshold value as M target weights.
It should be noted that, when the data processing apparatus determines that the detection index values corresponding to the adjusted M current weights are greater than the detection index threshold through iteration i, the data processing apparatus ends the adjustment, and determines the M current weights as M target weights.
It can be understood that, by adjusting weights corresponding to the M detection modes respectively, the detection indexes of the M detection modes are greater than the detection index threshold, so that the detection accuracy can be improved.
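Steps 105 to 110 as a whole can be sketched as an outer loop around a per-weight adjustment. The sketch below is a simplified illustration under several assumptions: one weight is selected per iteration in sequence (L = 1), weights are kept within [0, 1], the inner adjustment accepts only strict improvements, and `max_rounds` is a safety bound that the source does not mention. The function names are hypothetical.

```python
from typing import Callable, List


def climb_one(weights: List[float], k: int,
              index_value: Callable[[List[float]], float], step: float) -> List[float]:
    """Simplified steps 107-108: move weight k while the index value strictly improves."""
    best, best_value = list(weights), index_value(weights)
    for direction in (1, -1):
        improved = False
        while True:
            candidate = list(best)
            candidate[k] = round(candidate[k] + direction * step, 6)
            if not 0.0 <= candidate[k] <= 1.0:  # assumed weight range
                break
            value = index_value(candidate)
            if value <= best_value:
                break
            best, best_value, improved = candidate, value, True
        if improved:
            break
    return best


def tune_weights(initial: List[float],
                 index_value: Callable[[List[float]], float],
                 threshold: float, step: float = 0.1,
                 max_rounds: int = 20) -> List[float]:
    weights = list(initial)                   # step 105: initial weights as ith weights
    for i in range(max_rounds):               # hypothetical safety bound
        k = i % len(weights)                  # step 106: sequential selection, L = 1
        weights = climb_one(weights, k, index_value, step)
        if index_value(weights) > threshold:  # step 110: target weights reached
            break
        # step 109: otherwise the weights become the (i+1)th weights and iteration continues
    return weights
```

With a toy index function that peaks at weights (0.7, 0.2), starting from (0.5, 0.5) and a threshold of 0.95, the loop first tunes the first weight and then the second before stopping.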
Referring to fig. 9, fig. 9 is a flowchart III of a data processing method provided in an embodiment of the present application, where an execution body of each step in fig. 9 is a data processing apparatus; as shown in fig. 9, in the embodiment of the present application, step 104 further includes step 111 and step 112; that is, the data processing apparatus traverses the detection result sequence, and after finishing the traversal of the detection result sequence when the final detection result is determined based on the traversed detection result, the data processing method further includes steps 111 and 112, which will be described below, respectively.
Step 111, acquiring N pieces of information to be displayed of N detection results and final display information of a final detection result.
It should be noted that the N detection results are in one-to-one correspondence with the N pieces of information to be displayed, and the information to be displayed is the display information of a detection result. The detection result includes at least one of the following information: the detection mode number, the sample type (such as a white sample, a black sample, or a non-black non-white sample) and the probability corresponding to the sample type, the detection time, and the detection mode version; the information to be displayed includes at least one of the following information: the detection result, the data acquisition time, remark information, and additional information. The final display information includes at least one of the following information: process information of obtaining the final detection result (namely, the process of determining the final detection result based on the N detection results), and the countermeasure processing mode.
Step 112, storing the N detection results and the final detection result into a detection result library, and storing the N pieces of information to be displayed and the final display information into a second storage area.
It can be understood that the data processing device stores information for judging the data to be detected into the first storage area, and stores N pieces of information to be displayed and final display information for display into the second storage area, so that hierarchical storage of the information is realized; the processing speed (such as the read-write speed) of the first storage area is larger than that of the second storage area, so that the consumption of storage resources can be reduced.
In the embodiment of the application, when the detection result comprises a first probability and a second probability, the final detection result is determined based on the second probability when the traversed detection result indicates that the first probability is smaller than the white sample probability threshold; when the traversed detection result indicates that the second probability is smaller than the black sample probability threshold, the current comprehensive probability is acquired; and when the current comprehensive probability is smaller than the black sample probability threshold, traversal of the detection result sequence continues.
In the following, an exemplary application of the embodiments of the present application in a practical application scenario will be described. The exemplary application may be an application scenario in which multiple scan engines are used to scan samples to obtain a final determination result, such as cloud inspection, game playback (Replay), Trojan detection, and plug-in detection; cloud inspection means that a sample is transmitted to a server, or scanning logic is deployed at a client, so as to perform scan determination; game playback means that a video playback acquired from a game server is taken as a sample for scan determination.
Referring to fig. 10, fig. 10 is a schematic diagram of an exemplary plug-in detection scenario provided in an embodiment of the present application; as shown in fig. 10, after a sample collection service 10-11 in a server 10-1 (referred to as the data processing device) collects a sample 10-21 to be detected (referred to as the data to be detected) from a client 10-2, a plurality of plug-in analysis engines 10-12 (plug-in analysis engine 1, plug-in analysis engine 2, plug-in analysis engine 3, …, plug-in analysis engine m1 are exemplarily shown; referred to as the N detection modes) are used to scan the sample 10-21 to be detected until a final scan result 10-3 (referred to as the final detection result) is obtained, and a plug-in countermeasure 10-4 (referred to as processing the data to be detected) is performed based on the final scan result 10-3. In addition, the sample information display platform 10-5 can also display at least one of the following information: the scan result, the scan time and the scan engine version number of each plug-in analysis engine, the acquisition time of the sample 10-21 to be detected, and the final scan result 10-3.
Referring to fig. 11, fig. 11 is a schematic diagram of an exemplary Trojan detection scenario provided in an embodiment of the present application; as shown in fig. 11, after a sample collection service 11-11 in a server 11-1 (referred to as the data processing device) collects a sample 11-21 to be detected (referred to as the data to be detected) from a client 11-2, a plurality of Trojan analysis engines 11-12 (Trojan analysis engine 1, Trojan analysis engine 2, Trojan analysis engine 3, …, Trojan analysis engine m2 are exemplarily shown; referred to as the N detection modes) are used to scan the sample 11-21 to be detected until a final scan result 11-3 (referred to as the final detection result) is obtained, and Trojan detection and removal 11-4 (referred to as processing the data to be detected) is performed based on the final scan result 11-3.
Referring to fig. 12, fig. 12 is a schematic diagram of an exemplary game detection scenario provided in an embodiment of the present application; as shown in fig. 12, after a cloud storage service 12-11 in a server 12-1 (referred to as the data processing device) collects a sample 12-21 to be detected (referred to as the data to be detected) from a client 12-2, a plurality of game analysis engines 12-12 (game analysis engine 1, game analysis engine 2, game analysis engine 3, …, game analysis engine m3 are exemplarily shown; referred to as the N detection modes) are used to scan the sample 12-21 to be detected until a final scan result 12-3 (referred to as the final detection result) is obtained, and a malicious operation countermeasure 12-4 (referred to as processing the data to be detected) is performed based on the final scan result 12-3.
It should be noted that, in the embodiment of the present application, numbering, a unified output format, hierarchical data storage, an optimized determination algorithm and a correction mechanism are used to integrate multiple scan engines for comprehensive determination; each of these is described below.
Numbering refers to numbering the scan engines or the sub-scan engines (also known as decision strategies) in a scan engine. For example, scan engine 1 is numbered 101 and scan engine 2 is numbered 102; for another example, the sub-scan engine in the plug-in detection scenario that hits the "string contains simulated key" policy is numbered 1000. Here, the priorities corresponding to the scan engines and the sub-scan engines may be represented by the numbers; for example, the priority is positively or negatively correlated with the number. In addition, the priority may be positively correlated with the detection accuracy of the scan engine itself; and whether to set a priority may be determined according to the actual implementation.
The unified output format is used for standardizing the scan results of the scan engines so that data can be stored hierarchically. The standardization principles include: dividing the scan result into blocks according to the function of each field; for example, for a field used for determination, the corresponding information is stored in a determination area (referred to as the first storage area); for a field used for display, the corresponding information is stored in a display area (referred to as the second storage area) outside the determination area.
Referring to fig. 13, fig. 13 is a schematic diagram illustrating exemplary contents of a determination area according to an embodiment of the present application; as shown in fig. 13, the scan result set 13-1 (scan_results) includes an engine number 13-11 (src), a scan result 13-12 (match), and a scan time 13-13 (update; the scan time and the scan result 13-12 are collectively referred to as the detection result); the scan result 13-12 includes information in three dimensions, namely a decision strategy number 13-121 (sid), a decision result 13-122 (risk, an enumeration type in which, for example, 1 represents non-black and non-white, 2 represents white, and 4 represents black), and a probability 13-123 (prob, between 0 and 1, 1 by default); the first probability and the second probability in the embodiments of the present application refer to the probability corresponding to the sample type determined by a scan engine, which may be determined from the decision results and probabilities of one or more decision strategies of that scan engine. Here, the scan result set and the scan result are arrays, so horizontal expansion can be achieved.
It should be noted that, through the unified output format, the scan results of different scan engines can be combined, and the combination is a process of horizontally expanding based on the same field.
Illustratively, referring to FIG. 14, FIG. 14 is an exemplary consolidated schematic provided by embodiments of the present application; as shown in fig. 14, in the scan result set 14-1, 4 engine numbers are exemplarily shown: engine number 14-11 through engine number 14-14. Wherein the scan time corresponding to the engine number 14-11 is the scan time 14-111, the scan result corresponding to the engine number 14-11 is the scan result 14-112, and the scan result 14-112 comprises two scan records; the scanning time corresponding to the engine number 14-12 is 14-121, the scanning result corresponding to the engine number 14-12 is 14-122, and the scanning result 14-122 comprises two scanning records; the scanning time corresponding to the engine number 14-13 is the scanning time 14-131, the scanning result corresponding to the engine number 14-13 is the scanning result 14-132, and the scanning result 14-132 comprises a scanning record; the scan time corresponding to the engine number 14-14 is the scan time 14-141, the scan result corresponding to the engine number 14-14 is the scan result 14-142, and the scan result 14-142 includes two scan records.
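The unified format and the merge step can be illustrated with plain dictionaries. The field names `scan_results`, `src`, `update`, `match`, `sid`, `risk` and `prob` come from the description of fig. 13; the concrete values, the use of a Unix timestamp for `update`, and the function name `merge_scan_results` are assumptions made only for this sketch.

```python
from typing import Any, Dict, List

ScanOutput = Dict[str, List[Dict[str, Any]]]


def merge_scan_results(*engine_outputs: ScanOutput) -> ScanOutput:
    """Merge per-engine outputs by extending the shared scan_results array."""
    merged: ScanOutput = {"scan_results": []}
    for output in engine_outputs:
        merged["scan_results"].extend(output["scan_results"])
    return merged


# Hypothetical outputs of two engines in the unified format.
engine_101 = {"scan_results": [{
    "src": 101, "update": 1700000000,
    "match": [{"sid": 1000, "risk": 4, "prob": 0.8},   # 4: black
              {"sid": 1001, "risk": 1, "prob": 1.0}],  # 1: non-black, non-white
}]}
engine_102 = {"scan_results": [{
    "src": 102, "update": 1700000300,
    "match": [{"sid": 2000, "risk": 2, "prob": 1.0}],  # 2: white
}]}

merged = merge_scan_results(engine_101, engine_102)
```

Because every engine writes the same fields, merging needs no per-engine conversion; a new engine only adds one more entry to the array.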
Here, the information in the determination area is used for countermeasures, and the information in the display area is used for display on the operation platform. Therefore, the determination area and the display area are stored hierarchically according to the data amount and the read-write frequency (QPS); an exemplary hierarchical storage manner is shown in table 1.
TABLE 1
As can be seen from Table 1, compared with the display area, the determination area has a smaller data amount and a higher read-write frequency, so an in-memory cache (e.g., Redis) or a distributed database (e.g., Tcaplus) can be used; the display area has a lower read-write frequency than the determination area, so a disk-based database (e.g., MySQL) can be used.
It should be noted that the standardization principles further include: adding an engine information area in the determination area to store engine information (engine_info). The engine information includes the engine number, the engine version number (e.g., the "so" field) and the engine resource file version number (e.g., the "cfg" field; together with the engine version number, referred to as the detection mode version) of each scan engine.
Referring to fig. 15, fig. 15 is a schematic diagram of an exemplary engine information area provided in an embodiment of the present application; as shown in fig. 15, in the engine information 15-1, an engine number (an engine number 15-11 is exemplarily shown), an engine version number (an engine version number 15-12 is exemplarily shown), and an engine resource file version number (an engine resource file version number 15-13 is exemplarily shown) of each scan engine are included.
Here, when deciding whether to initiate a rescan by a scan engine, the engine resource file version number in the existing determination result may be compared with the engine resource file version number in the engine information, and the engine version number in the existing determination result may be compared with the engine version number in the engine information; rescanning is performed if either of the two is inconsistent, and rescanning is canceled if both are consistent. In addition, the scan time in the existing determination result may be compared with the update time of the scan engine; rescanning is performed if the two are inconsistent, and canceled if they are consistent.
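The version comparison can be sketched as below, on the reading that a version mismatch is what warrants a rescan (consistent with the determination flow, in which an updated scan engine leads to rescanning). The `so` and `cfg` field names come from the engine information description; storing those versions alongside each existing result, and the function name, are assumptions.

```python
from typing import Dict


def needs_rescan(stored_versions: Dict[str, str], engine_info: Dict[str, str]) -> bool:
    """Rescan when the engine version ("so") or resource file version ("cfg") differs."""
    return any(stored_versions.get(field) != engine_info.get(field)
               for field in ("so", "cfg"))


# Hypothetical versions recorded with an existing result versus current engine information.
stored = {"so": "1.2.0", "cfg": "20231001"}
current = {"so": "1.2.0", "cfg": "20231015"}
rescan = needs_rescan(stored, current)  # the resource file changed, so a rescan is due
```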
The optimized determination algorithm, that is, the final determination algorithm, is as follows: the sequence of scan engines is traversed based on priority; when the determination information of the traversed scan engine indicates that the sample is a white sample (for example, the probability of a white sample is 1), the traversal ends and the sample is determined to be a white sample; when the determination information of the traversed scan engine indicates that the probability that the sample is a black sample (referred to as the second probability) is 1, the traversal ends and the sample is determined to be a black sample; when the determination information of the traversed scan engine indicates that the probability that the sample is a black sample is 0, traversal continues with the next scan engine; when the determination information of the traversed scan engine indicates that the probability that the sample is a black sample is greater than 0 and smaller than 1, the black probabilities obtained so far (referred to as the second probability sequence) are weighted and summed based on the scan engine coefficients to obtain a black probability value (referred to as the current comprehensive probability); if the black probability value is greater than a black probability threshold (referred to as the black sample probability threshold), the sample is determined to be a black sample; if the black probability value is smaller than or equal to the black probability threshold, traversal continues with the next scan engine.
When weighting and summing the acquired black probabilities based on the scan engine coefficients, this can be achieved by expression (1).
$$P = \sum_{i=1}^{n} \alpha_i \, p_i \tag{1}$$
Wherein, $P$ is the black probability value obtained by the summation; $\alpha_i$ ($i \in \{1, 2, \ldots, n\}$) is the coefficient of the ith scan engine; $p_i$ is the black probability of the ith scan engine; $n$ represents the number of scan engines that have been scanned.
In addition, in a normalization scenario, $\alpha_i$ satisfies expression (2).
$$\sum_{i=1}^{m} \alpha_i = 1 \tag{2}$$
Wherein, $m$ is the total number of scan engines.
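The traversal and the weighted sum can be put together in a short sketch. It assumes that each engine result is reduced to a white probability and a black probability, that the results are already sorted by priority, and that a sample for which the traversal is exhausted without a decision is reported as "undetermined"; that last outcome, the tuple layout and the function name are assumptions, not statements of the embodiment.

```python
from typing import Dict, List, Tuple

# (engine number, white probability, black probability), sorted by priority.
EngineResult = Tuple[int, float, float]


def final_decision(results: List[EngineResult],
                   coefficients: Dict[int, float],
                   black_threshold: float) -> str:
    """Traverse engine results by priority and stop as soon as a decision is reached."""
    seen: List[Tuple[int, float]] = []  # black probabilities gathered so far
    for engine, white_prob, black_prob in results:
        if white_prob == 1:
            return "white"
        if black_prob == 1:
            return "black"
        if black_prob == 0:
            continue                    # nothing to add; go to the next engine
        seen.append((engine, black_prob))
        # Weighted sum over the engines traversed so far (expression (1)).
        score = sum(coefficients[e] * p for e, p in seen)
        if score > black_threshold:
            return "black"
    return "undetermined"               # assumed outcome when no engine decides


coefficients = {101: 0.4, 102: 0.3, 103: 0.2, 104: 0.1}  # hypothetical, sums to 1
results = [(101, 0.0, 0.0), (102, 0.0, 0.6), (103, 0.0, 0.9), (104, 1.0, 0.0)]
verdict = final_decision(results, coefficients, black_threshold=0.3)
```

In this example engine 102 contributes 0.18, which is not enough on its own; adding engine 103 raises the sum to about 0.36, which exceeds the threshold, so the traversal stops before engine 104 is consulted.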
The correction mechanism refers to the process of correcting the scan engine coefficients (the coefficient of the ith scan engine is denoted here as $\alpha_i$). The coefficients $\alpha_i$ are corrected periodically according to the coverage rate and the accuracy rate of black samples (the accuracy rate is the ratio of the number of actual black samples among the samples determined as black samples to the number of samples determined as black samples, and the coverage rate is the ratio of the number of samples determined as black samples to the total number of actual black samples). For example, the correction mechanism first adjusts $\alpha_1$ by one step (e.g., increases or decreases it by 0.1) and determines the resulting change in the coverage rate and the accuracy rate. If the coverage rate and the accuracy rate improve, adjustment continues in the same direction; otherwise, adjustment proceeds in the opposite direction, until the coverage rate and the accuracy rate reach their optimal values; then the remaining coefficients, from $\alpha_2$ onward, are adjusted in the same way. Of course, a machine learning model can also be introduced, and the coefficient of each scan engine can be obtained through training.
The overall architecture of the embodiments of the present application is described below, taking plug-in countermeasures as an example.
Referring to FIG. 16, FIG. 16 is an exemplary plug-in countermeasure architecture diagram provided by embodiments of the present application; as shown in fig. 16, the sample collection service 16-11 in the server 16-1 is configured to collect a sample 16-3 from the client 16-2; the scan management service 16-12 is configured to determine, through the scan result storage 16-13, whether scanning is required, to determine the plurality of plug-in analysis engines 16-17 (plug-in analysis engine 1, plug-in analysis engine 2, plug-in analysis engine 3, …, plug-in analysis engine m4 are exemplarily shown) to which the scan is to be distributed, and to aggregate the scan results and store them into the scan result storage 16-13; the determination service 16-14 is used for determining a final scan result according to the determination results of all engines and the coefficients of all engines; the plug-in suppression service 16-15 is used for applying countermeasures to the accounts corresponding to samples finally determined to be black samples; the suppression result analysis module 16-16 corrects the coefficients of each engine according to the accuracy rate and the coverage rate. Here, information display may also be performed by the sample information display platform 16-18.
Based on fig. 16, referring to fig. 17, fig. 17 is an exemplary decision flow chart provided in an embodiment of the present application; as shown in fig. 17, this exemplary determination flow includes steps 1701 to 1710, and each step is described below.
Step 1701, receiving a sample digest reported by the client.
Step 1702, querying the scan result based on the sample digest.
It should be noted that, when the client reports the MD5 of the sample (i.e., the sample digest, referred to as the data identifier) to the server, the server queries whether a scan result for that MD5 exists.
Step 1703, determining whether a scan result exists. If yes, go to step 1704, otherwise go to step 1705.
Step 1704, determining whether to rescan. If yes, go to step 1707, otherwise go to step 1709.
Step 1705, determining whether a sample corresponding to the sample digest exists. If yes, go to step 1707, otherwise go to step 1706.
Step 1706, collect a sample.
Step 1707, the sample is distributed to a scan engine for scanning.
Step 1708, store scan results.
Step 1709, determining a final scan result.
If the scanning result exists and it is determined that rescanning is not needed, generating a final scanning result according to the existing scanning result, and determining whether to perform countermeasure based on the final scanning result; if the MD5 of the sample does not exist, the sample is sequentially subjected to the processes of collecting the sample, scanning and calculating the final scanning result, and whether the countermeasure is performed or not is determined based on the final scanning result. If there is a scan result and the scan engine is updated, the final scan result is regenerated after rescanning.
Step 1710, performing plug-in countermeasure based on the final scanning result.
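The branching in steps 1701 to 1709 can be summarized in one function. This is a sketch under assumptions: the stores are plain dictionaries keyed by MD5, and `should_rescan`, `collect`, `scan` and `decide` are hypothetical callables standing in for the rescan check, the sample collection service, the scan engines and the determination service. For brevity the rescan branch also looks the sample up in the sample store rather than assuming it is present.

```python
from typing import Any, Callable, Dict


def handle_report(md5: str,
                  scan_store: Dict[str, Any],
                  sample_store: Dict[str, Any],
                  should_rescan: Callable[[Any], bool],
                  collect: Callable[[str], Any],
                  scan: Callable[[Any], Any],
                  decide: Callable[[Any], Any]) -> Any:
    """Return the final scan result for a reported sample digest."""
    results = scan_store.get(md5)                  # steps 1702-1703
    if results is None or should_rescan(results):  # step 1704
        sample = sample_store.get(md5)             # step 1705
        if sample is None:
            sample = collect(md5)                  # step 1706
            sample_store[md5] = sample
        results = scan(sample)                     # step 1707
        scan_store[md5] = results                  # step 1708
    return decide(results)                         # step 1709


# Toy wiring: the first report triggers collection and scanning, the second reuses the result.
scanned = []


def toy_scan(sample: str) -> list:
    scanned.append(sample)
    return ["result:" + sample]


scan_store: Dict[str, Any] = {}
sample_store: Dict[str, Any] = {}
args = (scan_store, sample_store, lambda r: False, lambda d: "sample-" + d,
        toy_scan, lambda r: ("black", r))
first = handle_report("abc", *args)
second = handle_report("abc", *args)
```

The countermeasure of step 1710 would then be driven by the returned final scan result.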
The update flow of the scan engine is described below.
Referring to FIG. 18, FIG. 18 is an exemplary scan engine update flow diagram provided by embodiments of the present application; as shown in fig. 18, the exemplary scan engine update flow includes steps 1801 to 1806, which are described below.
Step 1801, start updating the scan engine.
Step 1802, determine whether the update affects the stored scan results. If yes, go to step 1803, otherwise go to step 1806.
It should be noted that, when the engine itself or the engine resource file is updated, it is necessary to first determine whether the update affects the stored scan result; for example, an update such as an increase in the scanning efficiency of the scan engine or a change in the algorithm does not affect the stored scan results.
Step 1803, regenerating the version number of the scan engine.
It should be noted that the version number of the regenerated scan engine may be one or both of the engine version number and the engine resource file version number.
Step 1804, send the regenerated version number to the scan management service.
Step 1805, the scan management service updates the regenerated version number to the engine information of the scan engine area.
Step 1806, end scan engine update.
It should be noted that if the update of the scan engine does not affect the stored scan results, the update of the scan engine is ignored; otherwise, the scanning management service is informed after the new version number is generated. The scan management service maintains engine information of the scan engine area, and updates the engine information after receiving the notification.
The following describes the expansion flow of the scan engine.
Referring to FIG. 19, FIG. 19 is an exemplary scan engine expansion flowchart provided by embodiments of the present application; as shown in fig. 19, the exemplary scan engine expansion flow includes steps 1901 to 1906, each of which is described below.
Step 1901, start scan engine expansion.
Step 1902, normalizing the output of the scan engine.
When a new scan engine is connected, its output format first needs to be standardized; that is, information used for real-time determination is stored in a determination area, and information used for display is stored outside the determination area.
Step 1903, determining the engine number of the scan engine.
Step 1904, determining the version number of the scan engine.
It should be noted that, the number of the scan engine is determined according to the priority relationship, and an initial version number is generated to notify the scan management service, so that the scan management service inserts a mapping relationship of a new scan engine in the engine information.
Step 1905, adjusting the scan engine coefficients.
It should be noted that, the coefficient of each scan engine may be adjusted by randomly determining a value from 0 to 1, and then executing the coefficient correction procedure.
Step 1906, end scan engine expansion.
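The expansion steps can be illustrated with the sketch below. It assumes, purely for illustration, that the determination fields are named `black_prob` and `white_prob`, that the initial version number is 1, and that the engine number has already been chosen according to the priority relationship; none of these details come from the application.

```python
import random
from typing import Any, Dict, Tuple

# Assumed names of the fields used for real-time determination.
DETERMINATION_KEYS = {"black_prob", "white_prob"}


def normalize_output(raw: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Step 1902: split raw engine output into determination and display parts."""
    determination = {k: v for k, v in raw.items() if k in DETERMINATION_KEYS}
    display = {k: v for k, v in raw.items() if k not in DETERMINATION_KEYS}
    return determination, display


def register_engine(engine_info: Dict[int, int],
                    coefficients: Dict[int, float],
                    engine_no: int,
                    rng: random.Random) -> None:
    """Steps 1903 to 1905: record number, initial version, and a starting coefficient."""
    if engine_no in engine_info:
        raise ValueError("engine number already in use")
    engine_info[engine_no] = 1               # initial version number (assumed)
    coefficients[engine_no] = rng.random()   # random value in [0, 1)
```

The random starting coefficient is only a seed; the coefficient correction procedure is expected to refine it afterwards.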
The following describes the coefficient adjustment flow of the scan engine.
Referring to fig. 20, fig. 20 is an exemplary coefficient adjustment flowchart provided by an embodiment of the present application; as shown in fig. 20, the exemplary coefficient adjustment flow includes steps 2001 to 2007, each of which is described below.
Step 2001, starting to adjust the scan engine coefficients.
Step 2002, determining the coefficients of a subset of the scan engines to be adjusted and the adjustment direction.
Step 2003, adjusting the coefficients of the subset of scan engines based on the adjustment direction.
Step 2004, obtaining locally optimal coefficients for the subset of scan engines.
After the coefficients of the subset are adjusted in the adjustment direction, if the accuracy and coverage rate improve, the adjustment continues in the same direction until the accuracy and coverage rate decrease, and the result of the previous adjustment is taken as the locally optimal coefficients of the subset. If the accuracy and coverage rate decrease after the first adjustment, the coefficients are instead adjusted in the reverse direction. If the accuracy and coverage rate still decrease, the coefficients before adjustment are taken as the locally optimal coefficients; if they improve, the reverse adjustment continues until the accuracy and coverage rate decrease, and the result of the previous adjustment is taken as the locally optimal coefficients of the subset.
Step 2005, obtaining the current accuracy and coverage rate.
Step 2006, determining whether the overall optimum has been reached. If yes, go to step 2007; otherwise, go to step 2002.
Step 2007, finishing the adjustment of the scan engine coefficients.
For example, suppose there are currently 4 scan engines whose initial coefficients are 0.25, 0.25, 0.25, 0.25, with a corresponding accuracy of 95%. The 1st and 2nd scan engines are adjusted first with a step size of 0.05. After the adjustment the coefficients are 0.3, 0.2, 0.25, 0.25 and the accuracy is 96%; that is, this adjustment improves the accuracy. The adjustment continues to 0.35, 0.15, 0.25, 0.25, which gives an accuracy of 94%; this adjustment reduces the accuracy and is therefore invalid, so 0.3 and 0.2 are determined to be the locally optimal coefficients of the 1st and 2nd scan engines. If instead the accuracy had dropped to 94% after the initial adjustment, the coefficients could be adjusted in the reverse direction to 0.2, 0.3, 0.25, 0.25 and onward until the accuracy reaches its maximum, which ends the associated adjustment of the 1st and 2nd scan engines. The associated adjustment of the 1st and 3rd scan engines can then be performed, and so on. In addition, the coefficients of all scan engines can be adjusted together according to actual conditions. After multiple rounds of adjustment, the overall optimal scan coefficients are obtained.
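The pairwise adjustment in this example can be sketched as a simple hill climb. The function `adjust_pair` and the `metric` callback (standing in for the measured accuracy) are hypothetical; treating an unchanged score as "not improved" and rounding to avoid floating-point drift are assumptions made for the sketch.

```python
from typing import Callable, List, Sequence


def adjust_pair(coeffs: Sequence[float], i: int, j: int, step: float,
                metric: Callable[[Sequence[float]], float]) -> List[float]:
    """Shift weight between engines i and j until the metric stops improving."""
    best = list(coeffs)
    best_score = metric(best)
    for direction in (+1, -1):  # chosen direction first, then the reverse
        improved = False
        while True:
            trial = list(best)
            trial[i] = round(trial[i] + direction * step, 10)
            trial[j] = round(trial[j] - direction * step, 10)
            if trial[i] < 0 or trial[j] < 0:
                break  # do not let a coefficient go negative
            score = metric(trial)
            if score <= best_score:
                break  # invalid adjustment: keep the previous result
            best, best_score, improved = trial, score, True
        if improved:
            break  # local optimum found in this direction
    return best
```

With a metric that returns 95%, 96%, and 94% for the three coefficient settings in the example, the function stops at 0.3, 0.2, 0.25, 0.25. Repeating the call for other engine pairs gives one round of the overall adjustment.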
In the embodiments of the present application, the display information and the scan results can be stored after serialization (for example, using JSON or protobuf); the coefficients can be adjusted using any regression strategy; and the determination area and the display area can be implemented by at least one storage device.
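As a small illustration of the serialization mentioned here, the sketch below uses Python's standard `json` module to produce one record for the determination area and one for the display area. The record layout and the function name `serialize_result` are assumptions, not details from the application.

```python
import json
from typing import Any, Dict, Tuple


def serialize_result(md5: str,
                     determination: Dict[str, Any],
                     display: Dict[str, Any]) -> Tuple[str, str]:
    """Serialize the two parts of a scan result as separate JSON strings."""
    # Each record carries the MD5 so that the two areas can be joined later.
    determination_blob = json.dumps({"md5": md5, **determination}, sort_keys=True)
    display_blob = json.dumps({"md5": md5, **display}, sort_keys=True)
    return determination_blob, display_blob
```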
It can be understood that, in the embodiments of the present application: first, hierarchical storage in a unified format reduces the consumption of storage resources; second, numbering, priority setting, and an optimized determination algorithm reduce the computation spent on scan results and resolve the determination conflicts that arise when all scan results are combined, thereby improving detection efficiency and accuracy; third, correcting the scan engine coefficients through feedback improves the efficiency and flexibility of adding and removing scan engines as well as the detection accuracy; and fourth, deciding whether a stored scan result should be re-detected based on the version and update time of the scan engine improves the timeliness and accuracy of the scan results.
The following continues with an exemplary structure in which the data processing apparatus 455 provided by the embodiments of the present application is implemented as software modules. In some embodiments, as shown in fig. 5, the software modules of the data processing apparatus 455 stored in the memory 450 may include:
The request response module 4551 is configured to obtain N detection results for respectively detecting the data to be detected in N detection manners in response to an abnormality detection request for the data to be detected, where N is a positive integer greater than 1, and the detection manners are used to detect whether the data to be detected is abnormal;
the result ordering module 4552 is configured to perform reverse order ordering on the N detection results based on the priority of the detection manner, to obtain a detection result sequence;
the result determining module 4553 is configured to traverse the detection result sequence, and when determining a final detection result based on the traversed detection result, end the traversal of the detection result sequence;
and the data processing module 4554 is used for performing countermeasure processing on the data to be detected based on the final detection result.
In this embodiment of the present application, the result determining module 4553 is further configured to determine that the data to be detected is a white sample as the final detection result when the traversed detection result indicates that a first probability is greater than or equal to a white sample probability threshold, where the first probability is a probability that the data to be detected is the white sample, and the white sample is data independent of an abnormal feature.
In this embodiment of the present application, the result determining module 4553 is further configured to determine that the data to be detected is a black sample as the final detection result when the traversed detection result indicates that a second probability is greater than or equal to a black sample probability threshold, where the second probability is a probability that the data to be detected is the black sample, and the black sample is data hitting an abnormal feature.
In this embodiment of the present application, the result determining module 4553 is further configured to: obtain a second probability sequence corresponding to the traversed detection result sequence when the traversed detection result indicates that the second probability is smaller than the black sample probability threshold; determine, from M target weights corresponding to M detection manners, a target weight sequence based on the detection manner sequence corresponding to the traversed detection result sequence, where the M detection manners include the N detection manners, M is a positive integer with M ≥ N, and each target weight in the target weight sequence represents the weight of a detection manner; correspondingly combine the target weight sequence and the second probability sequence to obtain a current comprehensive probability; and when the current comprehensive probability is greater than or equal to the black sample probability threshold, determine that the data to be detected is a black sample as the final detection result.
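The behavior of the result determining module 4553 can be sketched as a priority-ordered traversal that stops as soon as a final result is reached. The sketch makes several assumptions that the text does not fix: a larger `priority` value means higher priority, "correspondingly combining" is taken to be a weighted sum, and `"undetermined"` is returned if the traversal finishes without a decision. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Detection:
    manner: str        # identifier of the detection manner
    priority: int      # larger value = higher priority (assumption)
    white_prob: float  # first probability
    black_prob: float  # second probability


def final_result(detections: List[Detection], weights: Dict[str, float],
                 white_threshold: float, black_threshold: float) -> str:
    """Traverse results from highest to lowest priority with early exit."""
    ordered = sorted(detections, key=lambda d: d.priority, reverse=True)
    combined = 0.0  # comprehensive probability over the traversed results
    for d in ordered:
        if d.white_prob >= white_threshold:
            return "white"
        if d.black_prob >= black_threshold:
            return "black"
        # Below the black threshold: fold this result into the weighted sum.
        combined += weights[d.manner] * d.black_prob
        if combined >= black_threshold:
            return "black"
    return "undetermined"  # assumed fallback, not specified in the text
```

Because the loop returns at the first decisive result, lower-priority results are never examined once a higher-priority one settles the outcome.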
In this embodiment of the present application, the data processing apparatus 455 further includes a weight determining module 4555, configured to determine M initial weights of the M detection manners as M i-th weights, and perform the following processing by iterating i, where i is a natural number: selecting L i-th weights from the M i-th weights, where L is a positive integer and L ≤ M; correspondingly adjusting the L i-th weights based on L adjustment directions to obtain L (i+1)-th weights; replacing the L i-th weights among the M i-th weights with the L (i+1)-th weights to obtain M first weights to be detected, and determining a current index value corresponding to the M first weights to be detected; when the current index value is smaller than or equal to a detection index threshold, determining the M first weights to be detected as M (i+1)-th weights; and determining, as the M target weights, the M current weights obtained during the iteration of i whose detection index value is larger than the detection index threshold.
In this embodiment of the present application, the weight determining module 4555 is further configured to correspondingly adjust L ith weights based on L adjustment directions, to obtain L intermediate weights; based on the L intermediate weights, replacing L ith weights in the M ith weights to obtain M second weights to be detected, and determining first index values corresponding to the M second weights to be detected; the following process is iteratively performed: when the second index value corresponding to the M ith weights is smaller than or equal to the first index value, correspondingly adjusting the L intermediate weights based on the L adjustment directions; or when the second index value is greater than the first index value, reversely adjusting the L ith weights based on the L adjustment directions; and determining the L local optimal weights adjusted by iteration as L (i+1) th weights.
In the embodiments of the present application, the iterative adjustment of the initial weights is triggered by one or more of the following: an adjustment time being reached, a detection manner being added or deleted, a detection manner being updated, the detection index value falling below the detection index threshold, an adjustment operation being received, and an adjustment event being triggered.
In this embodiment of the present application, the request response module 4551 is further configured to: obtain, in response to the abnormality detection request for the data to be detected, a data identifier from the abnormality detection request, where the data identifier is used to identify the data to be detected; match a detection result corresponding to the data identifier in a detection result library that is stored in a first storage area and maps identifiers to detection results; determine, based on the matched detection results, the N detection results obtained by detecting the data to be detected in the N detection manners respectively; and when the matching fails, detect the data to be detected in the N detection manners respectively to obtain the N detection results.
In this embodiment of the present application, the request response module 4551 is further configured to: determine, from the N detection manners, a sequence of detection manners to be re-detected based on one or both of the detection time and the detection manner version; detect the data to be detected using the sequence of detection manners to be re-detected to obtain a re-detection result sequence; and replace, based on the re-detection result sequence, the corresponding one or more detection results among the matched detection results, to obtain the N detection results obtained by detecting the data to be detected in the N detection manners respectively.
In this embodiment of the present application, the request response module 4551 is further configured to: when the matching fails, determine, as the data to be detected, the data matched with the data identifier in a sample database that maps identifiers to data; select the N detection manners for the data to be detected from the M detection manners; and detect the data to be detected in the N detection manners respectively to obtain the N detection results.
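The selective re-detection described for the request response module 4551 can be sketched as below: stale entries are identified by version or age, and only those entries are replaced. `CachedDetection`, the `max_age_seconds` rule, and the `detect` callback are hypothetical, and the sketch assumes every cached manner has an entry in `current_versions`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CachedDetection:
    manner: str
    manner_version: int
    detected_at: float  # detection time, in seconds
    black_prob: float


def manners_to_redetect(cached: List[CachedDetection],
                        current_versions: Dict[str, int],
                        now: float, max_age_seconds: float) -> List[str]:
    """Pick manners whose stored result is outdated by version or by age."""
    stale = []
    for c in cached:
        outdated = c.manner_version != current_versions[c.manner]
        expired = now - c.detected_at > max_age_seconds
        if outdated or expired:
            stale.append(c.manner)
    return stale


def refresh(cached: List[CachedDetection], stale: List[str],
            detect: Callable[[str], float],
            current_versions: Dict[str, int], now: float) -> List[CachedDetection]:
    """Replace only the stale entries; keep the remaining stored results."""
    return [
        CachedDetection(c.manner, current_versions[c.manner], now, detect(c.manner))
        if c.manner in stale else c
        for c in cached
    ]
```

The result still contains N entries, one per detection manner, so it can be passed unchanged to the ordering and traversal stages.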
In this embodiment of the present application, the data processing apparatus 455 further includes an information storage module 4556, configured to: obtain N pieces of information to be displayed corresponding to the N detection results, and final display information of the final detection result; and store the N detection results and the final detection result in the detection result library, and store the N pieces of information to be displayed and the final display information in a second storage area, where the processing speed of the second storage area is lower than that of the first storage area.
Embodiments of the present application provide a computer program product comprising computer-executable instructions or a computer program stored in a computer-readable storage medium. The processor of the data processing apparatus reads the computer-executable instructions or the computer program from the computer-readable storage medium, and executes the computer-executable instructions or the computer program, so that the data processing apparatus performs the data processing method described in the embodiment of the present application.
The present embodiments provide a computer-readable storage medium in which computer-executable instructions or a computer program are stored, which when executed by a processor, cause the processor to perform a data processing method provided by the embodiments of the present application, for example, a data processing method as shown in fig. 6.
In some embodiments, the computer-readable storage medium may be FRAM, ROM, flash memory, magnetic surface memory, an optical disk, or a CD-ROM; it may also be any of various devices including one or any combination of the above memories.
In some embodiments, computer-executable instructions may be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, in the form of programs, software modules, scripts, or code, and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, computer-executable instructions may, but need not, correspond to files in a file system, may be stored as part of a file that holds other programs or data, such as in one or more scripts in a hypertext markup language (Hyper Text Markup Language, HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, the computer-executable instructions may be deployed to be executed on one electronic device (in this case, the one electronic device is referred to as a data processing device), or on a plurality of electronic devices located at one place (in this case, a plurality of electronic devices located at one place are referred to as a data processing device), or on a plurality of electronic devices distributed at a plurality of places and interconnected via a communication network (in this case, a plurality of electronic devices distributed at a plurality of places and interconnected via a communication network are referred to as a data processing device).
It will be appreciated that the embodiments of the present application involve related data such as the data to be detected and samples. When the embodiments of the present application are applied to specific products or technologies, user permission or consent needs to be obtained, and the collection, use, and processing of the related data need to comply with the relevant laws, regulations, and standards of the relevant countries and regions. The related data should be collected and processed strictly in accordance with the requirements of the relevant national laws and regulations, the informed consent or separate consent of the personal information subject should be obtained, and subsequent data use and processing should be carried out within the scope authorized by the laws and regulations and by the personal information subject. In addition, the collection, use, and processing of the related data should follow the principles of legality, legitimacy, and necessity, should not involve acquiring data types that are prohibited or restricted by laws and regulations, and should not hinder the normal operation of a target website.
In summary, in the embodiments of the present application, when responding to an abnormality detection request for data to be detected, N detection results obtained by detecting the data to be detected in N detection manners are acquired first; the N detection results are then traversed based on the detection priority, and the traversal ends as soon as a final detection result is determined from the currently traversed detection result. This reduces the amount of detection result data that is processed, which improves detection efficiency, and resolves the determination conflicts that arise when all detection results are combined. In addition, because the final detection result is obtained preferentially from the detection result with the highest priority, detection accuracy can be ensured. Hierarchical storage reduces the consumption of storage resources, and adjusting the weights of the detection manners improves the efficiency and flexibility of adding and removing detection manners as well as the detection accuracy. Furthermore, deciding whether a stored detection result should be re-detected based on the detection manner version and the detection time improves the timeliness and accuracy of the detection results.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application. Any modifications, equivalent substitutions, improvements, etc. that are within the spirit and scope of the present application are intended to be included within the scope of the present application.
Claims (14)
1. A method of data processing, the method comprising:
responding to an abnormality detection request aiming at data to be detected, obtaining N detection results for respectively detecting the data to be detected by adopting N detection modes, wherein N is a positive integer greater than 1, and the detection modes are used for detecting whether the data to be detected is abnormal or not;
the N detection results are arranged in reverse order based on the priority of the detection mode, and a detection result sequence is obtained;
traversing the detection result sequence, and ending the traversing of the detection result sequence when a final detection result is determined based on the traversed detection result;
and processing the data to be detected based on the final detection result.
2. The method according to claim 1, wherein the method further comprises:
and when the traversed detection result indicates that the first probability is larger than or equal to a white sample probability threshold, determining that the data to be detected is a white sample as the final detection result, wherein the first probability is the probability that the data to be detected is the white sample, and the white sample is independent of abnormal characteristics.
3. The method according to claim 1, wherein the method further comprises:
And when the traversed detection result indicates that the second probability is larger than or equal to a black sample probability threshold, determining that the data to be detected is a black sample as the final detection result, wherein the second probability is the probability that the data to be detected is the black sample, and the black sample is the data hit with abnormal characteristics.
4. The method according to claim 1, wherein the method further comprises:
when the traversed detection result shows that the second probability is smaller than the black sample probability threshold value, acquiring a second probability sequence corresponding to the traversed detection result sequence;
determining a target weight sequence based on a detection mode sequence corresponding to the traversed detection result sequence from M target weights corresponding to M detection modes, wherein the M detection modes comprise N detection modes, M is a positive integer, and the target weights in the target weight sequence represent the weights of the detection modes;
correspondingly combining the target weight sequence and the second probability sequence to obtain a current comprehensive probability;
and when the current comprehensive probability is greater than or equal to a black sample probability threshold, determining that the data to be detected is a black sample as the final detection result.
5. The method according to claim 4, wherein before the determining, from M target weights corresponding to M detection modes, a target weight sequence based on a detection mode sequence corresponding to the traversed detection result sequence, the method further comprises:
determining M initial weights of M detection modes as M ith weights, and iterating i to execute the following processing, wherein i is a natural number:
selecting L ith weights from M ith weights, wherein L is a positive integer;
correspondingly adjusting L ith weights based on L adjustment directions to obtain L (i+1) th weights;
based on the L (i+1) th weights, replacing L (i) th weights in the M (i) th weights to obtain M first weights to be detected, and determining current index values corresponding to the M first weights to be detected;
when the current index value is smaller than or equal to a detection index threshold value, determining M first weights to be detected as M (i+1) th weights;
and determining M current weights, obtained in the iteration i, with the detection index value larger than the detection index threshold value as M target weights.
6. The method of claim 5, wherein the correspondingly adjusting L ith weights based on L adjustment directions to obtain L (i+1) th weights comprises:
Correspondingly adjusting L ith weights based on L adjustment directions to obtain L intermediate weights;
based on the L intermediate weights, replacing L ith weights in the M ith weights to obtain M second weights to be detected, and determining first index values corresponding to the M second weights to be detected;
the following process is iteratively performed: when the second index value corresponding to the M ith weights is smaller than or equal to the first index value, correspondingly adjusting the L intermediate weights based on the L adjustment directions; or when the second index value is greater than the first index value, reversely adjusting the L ith weights based on the L adjustment directions;
and determining the L local optimal weights adjusted by iteration as L (i+1) th weights.
7. The method according to claim 5 or 6, characterized in that the iterative adjustment of the M initial weights is triggered by one or more of the following: an adjustment time being reached, a detection mode being added or deleted, a detection mode being updated, a detection index value being lower than the detection index threshold, an adjustment operation being received, and an adjustment event being triggered.
8. The method according to any one of claims 1 to 6, wherein the obtaining, in response to an abnormality detection request for data to be detected, N detection results for respectively detecting the data to be detected in N detection manners includes:
obtaining a data identifier from the abnormality detection request in response to the abnormality detection request for the data to be detected, wherein the data identifier is used for identifying the data to be detected;
matching a detection result corresponding to the data identifier in a detection result library that is stored in a first storage area and maps identifiers to detection results;
determining, based on the matched detection results, N detection results obtained by detecting the data to be detected by N detection modes respectively;
and when the matching fails, adopting N detection modes to detect the data to be detected respectively, and obtaining N detection results.
9. The method according to claim 8, wherein determining N detection results of the data to be detected by the N detection means based on the matched detection results, respectively, includes:
determining a mode sequence to be rechecked from N detection modes based on one or two of detection time and detection mode versions;
detecting the data to be detected by adopting the mode sequence to be re-detected to obtain a re-detection result sequence;
and based on the re-detection result sequence, replacing one or more corresponding detection results in the matched detection results to obtain N detection results of the data to be detected, wherein the N detection results are detected by the N detection modes respectively.
10. The method of claim 8, wherein the detecting the data to be detected by using N detection methods when the matching fails to obtain N detection results includes:
when the matching fails, determining the data matched with the data identifier in a sample database corresponding to the identifier and the data as the data to be detected;
selecting N detection modes of the data to be detected from M detection modes;
and respectively detecting the data to be detected by adopting N detection modes to obtain N detection results.
11. The method of any one of claims 1 to 6, wherein the traversing the sequence of test results, when determining a final test result based on the traversed test result, further comprises, after ending the traversing of the sequence of test results:
acquiring N pieces of information to be displayed of N detection results and final display information of the final detection results;
storing the N detection results and the final detection results into a detection result library, and storing the N information to be displayed and the final display information into a second storage area, wherein the processing speed of the second storage area is smaller than that of the first storage area.
12. A data processing apparatus, characterized in that the data processing apparatus comprises:
the request response module is used for responding to an abnormal detection request aiming at the data to be detected, acquiring N detection results for respectively detecting the data to be detected by adopting N detection modes, wherein N is a positive integer greater than 1, and the detection modes are used for detecting whether the data to be detected is abnormal or not;
the result ordering module is used for carrying out reverse order arrangement on the N detection results based on the priority of the detection mode to obtain a detection result sequence;
the result determining module is used for traversing the detection result sequence, and finishing the traversing of the detection result sequence when determining a final detection result based on the traversed detection result;
and the data processing module is used for processing the data to be detected based on the final detection result.
13. An electronic device for data processing, the electronic device comprising:
a memory for storing computer executable instructions or computer programs;
a processor for implementing the data processing method of any one of claims 1 to 11 when executing computer-executable instructions or computer programs stored in the memory.
14. A computer-readable storage medium storing computer-executable instructions or a computer program, which, when executed by a processor, implements the data processing method of any one of claims 1 to 11.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311401326.0A CN117272292B (en) | 2023-10-26 | 2023-10-26 | Data processing method, device, equipment and computer readable storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN117272292A (en) | 2023-12-22 |
| CN117272292B (en) | 2024-02-27 |
Family
ID=89202563
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311401326.0A (Active) CN117272292B (en) | 2023-10-26 | 2023-10-26 | Data processing method, device, equipment and computer readable storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117272292B (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9009820B1 (en) * | 2010-03-08 | 2015-04-14 | Raytheon Company | System and method for malware detection using multiple techniques |
| CN111400126A (en) * | 2020-02-19 | 2020-07-10 | 中国平安人寿保险股份有限公司 | Network service abnormal data detection method, device, equipment and medium |
| CN112818066A (en) * | 2019-11-15 | 2021-05-18 | 深信服科技股份有限公司 | Time sequence data anomaly detection method and device, electronic equipment and storage medium |
| CN114065187A (en) * | 2022-01-18 | 2022-02-18 | 中诚华隆计算机技术有限公司 | Abnormal login detection method and device, computing equipment and storage medium |
| CN115730305A (en) * | 2021-08-31 | 2023-03-03 | 杭州盈高科技有限公司 | Application program detection method and device, nonvolatile storage medium and processor |
| CN116305106A (en) * | 2022-09-09 | 2023-06-23 | 深信服科技股份有限公司 | Data detection method, device and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN117272292B (en) | 2024-02-27 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Erdődi et al. | Simulating SQL injection vulnerability exploitation using Q-learning reinforcement learning agents | |
| US12200014B2 (en) | Lifelong learning based intelligent, diverse, agile, and robust system for network attack detection | |
| Anderson et al. | Evading machine learning malware detection | |
| KR102323290B1 (en) | Systems and methods for detecting data anomalies by analyzing morphologies of known and/or unknown cybersecurity threats | |
| CN117879970A (en) | Network security protection method and system | |
| IL268052A (en) | Continuous learning for intrusion detection | |
| CN110188543A (en) | White list library, white list program library update method and industrial control system | |
| US11604833B1 (en) | Database integration for machine learning input | |
| US20230281315A1 (en) | Malware process detection | |
| Lesturgie et al. | Coalescence times, life history traits and conservation concerns: An example from four coastal shark species from the Indo‐Pacific | |
| CN118428454A (en) | A model-independent meta-learning method, device, equipment and storage medium | |
| CN117272292B (en) | Data processing method, device, equipment and computer readable storage medium | |
| CN115001763B (en) | Phishing website attack detection method and device, electronic equipment and storage medium | |
| CN119557215A (en) | A limit testing method, system, device and medium for artificial intelligence model | |
| CN114510980B (en) | Model feature acquisition method and device, electronic equipment and storage medium | |
| KR102618707B1 (en) | Device and method for generating learning data utilizing penetration test attack data, and learning device and method for artificial neural network model utilizing the learning data | |
| CN117687890B (en) | Abnormal operation identification method, system, medium and equipment based on operation log | |
| US11930048B1 (en) | Testing complex decision systems using outcome learning-based machine learning models | |
| US20230319086A1 (en) | Method, product, and system for network security management using a reasoning and inference engine | |
| Zeng et al. | Approximating behavioral equivalence of models using top-k policy paths. | |
| CN110798454A (en) | A method to defend against attacks based on the assessment of attack organization capability | |
| Zhang et al. | An intrusion detection scheme based on repeated game in smart home | |
| KR20240104542A (en) | System for judging abnormal behavior of users and judging method therefor | |
| CN110225019B (en) | Network security processing method and device | |
| KR101988205B1 (en) | Virtual private network service system | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |