RU2845174C1

RU2845174C1 - Use of genetic algorithms and method of reference vectors in application to detection of intrusions and analysis of network traffic

Info

Publication number: RU2845174C1
Application number: RU2024105788A
Authority: RU
Inventors: Сергей Александрович Чарынцев; Андрей Геннадьевич Егоров
Original assignee: Общество С Ограниченной Ответственностью "Юзергейт"
Filing date: 2024-03-06
Publication date: 2025-08-14

Abstract

FIELD: physics.

SUBSTANCE: invention relates to a method of marking potentially dangerous data blocks. In the method, data blocks are received; parameters of data samples are set, where a data sample is a section of a data block made up of successive information transmission units; a method is set for converting data sections, each represented by a sequence of information transmission units; separate sets of data samples are formed in accordance with the set parameters; from each sample, the data section whose sequence of information transmission units has the minimum entropy among the possible sections of that sample is selected; the data sections are converted into compressed data sections, which are checked, using a preset classifier, for probable membership among the compressed sections of dangerous data blocks; when such a compressed section is found, a notification of the potential danger of the data block is generated and the data block is transmitted over the computer network together with the notification; the danger of blocks for which a notification was generated, and the safety of blocks for which no notification was generated, are then checked.

EFFECT: high accuracy of determining hazard of code embedded in packets.

19 cl, 3 dwg

Description

TECHNICAL FIELD

The invention relates to network security technologies, namely to a method for analyzing packets in a data transmission network (network packets). It can be used both for filtering network packets, that is, for removing packets from network traffic, and for warning about possible threats so that a decision can be made on whether a more thorough analysis of the network packets is needed.

BACKGROUND ART

Methods for analyzing data transmitted over information networks as data packets, which represent information blocks at the physical level, are used as part of firewall technology. The effectiveness of such analysis significantly affects both the security of networks and of end users' hardware and software systems, and the throughput and quality of data transmission in information networks. The corresponding technologies are constantly evolving.

In general, data analysis methods either perform a complete check of network packets, for example packets transmitted using the Internet Protocol (IP), for compliance with predetermined rules, as shown in US patent US 5864666 A, or rely on certain assumptions when checking the data, as shown in US patent US 11882095 B2. In the latter case, the results of the check are probabilistic and cannot be considered fully reliable. The main disadvantage of this and similar traffic analysis systems, when used as intrusion detection tools, is that most of them have limited capabilities for exchanging information with user applications external to them, and that they typically use a pre-prepared set of scenarios to respond to changes in the network environment.

Because the data analysis methods mentioned above have certain shortcomings, combined protection methods are used to filter dangerous traffic when the security of network systems protected by firewalls must be increased, as shown in US patent application US 2022385635 A1. In such methods, network packets are first filtered using machine learning technologies, after which potentially dangerous packets are exhaustively checked for compliance with a predetermined set of features.

Since exhaustive data verification is, to a certain extent, a trivial task, advancing the state of the art in terms of the accuracy of identifying potentially dangerous data blocks and the operating speed of the hardware and software systems that implement new technologies is an important problem.

US patent application US 2022385635 A1, which discloses the technical solution closest to the claimed invention, uses a traffic log to indicate possible malicious activity on the Internet. Traffic log records are analyzed for anomalous activity. The analysis clusters record vectors and checks whether network packet metadata belongs to clusters associated with the normal activity of participants in the data exchange. Anomalous data is treated as potentially dangerous and is analyzed for compliance with predetermined rules. The clustering uses a distance parameter between the vectors of the analyzed log records.

A significant drawback of this known solution is that it cannot analyze the contents of packets or blocks of transmitted data for data that could harm participants in electronic interaction carried out over physical data transmission networks.

The present invention does not relate to methods for exhaustive analysis of network packets, but it is suitable for use within network traffic analysis technologies as a tool for preliminarily determining the probability that a network packet may be dangerous to participants in a data exchange system.

The objective of the present invention is to eliminate the disadvantages of the prior art and to achieve a technical result consisting in the ability to prevent false-positive alerts about the danger of data contained in network packets and to identify malicious code for which no formal search features have been defined and for which no effective protection exists. The invention increases the accuracy of determining the danger of code embedded in packets, while allowing threat detection to be corrected automatically, for example when the system's resistance to unwanted effects increases. In the latter case, false-positive notifications do not compromise the security of the system but reduce its performance. The invention makes it possible to recognize new and unknown network attacks, including so-called "zero-day" attacks. It also enables intrusion detection through feedback from external systems, so that decisions are based not only on analysis of code that is malicious but also on patterns in network traffic that accompany, or may accompany, network intrusions.

SUMMARY OF THE INVENTION

To fulfil the purpose of the invention and achieve the technical result, a method for marking potentially dangerous data blocks is proposed. In the method, using a digital representation of data implemented as a set of memory cells whose physical state corresponds to the represented data stored in the memory of a hardware and software system, the following stages of signal conversion and data processing are carried out:

data blocks are received sequentially, the data blocks being presented as blocks of structured information transmission units intended for transmission over a computer network, after which the received data blocks are transmitted over the computer network, where the structure of the data blocks is defined by the data transmission protocol of the computer network;

parameters of data samples are set, where a data sample is a section of a data block consisting of consecutive information transmission units, and the parameters of a sample comprise the sample length and a starting address that characterizes the position of the first information transmission unit in the sample; the set parameters, represented in a binary data storage format, define the number of data samples in each of the data blocks, the starting address of the first data sample in each of the data blocks, and the offsets of the starting addresses of subsequent data samples relative to the starting address of the first data sample in each of the data blocks;

a method is set for converting data sections, each represented by a sequence of information transmission units, which converts the sequence of information transmission units of each data section into a compressed data section consisting of a predetermined number of bits, such that, for data blocks that are similar in their dangerous effect on the devices of the computer network, compressed data sections that are similar according to specified criteria correspond to similar data blocks;

(A) for the sequentially received data blocks, separate sets of data samples are formed in accordance with the set parameters of the data samples;

from each data sample, a data section is selected such that its sequence of information transmission units has the minimum entropy among the possible data sections of the sample;

the data sections are converted into compressed data sections;

(B) the compressed data sections of the data blocks are checked for probable membership among the compressed sections of dangerous data blocks using a pre-configured classifier of compressed data sections. The classifier operates in a multidimensional space whose number of dimensions equals the number of bits in a compressed section; for this purpose, the coordinates of a multidimensional plane (hyperplane) are specified such that most points whose coordinates are the compressed sections of known-dangerous data blocks lie on a first side of the plane, and most points whose coordinates are the compressed sections of known-safe data blocks lie on the second, opposite side. The coordinates of a compressed section in this space are the values of the bits at the corresponding positions of its digital representation. When a compressed section of a data block lies on the first side of the plane, a notification of the potential danger of the data block is generated, and the data block is transmitted over the computer network together with that notification;

wherein:

(C) the danger of the blocks for which a notification of potential danger has been generated is checked, and the safety of the blocks for which no notification of potential danger has been generated is checked;

when, within a predetermined number of blocks, mismatched blocks are identified, namely dangerous blocks for which no notification of potential danger was generated and safe blocks for which a notification of potential danger was generated, the coordinates of the multidimensional plane used at stage (B) are refined so as to minimize the number of mismatched blocks in the subsequent predetermined number of blocks;

(D) if the minimum number of mismatched blocks in the predetermined number of blocks does not change when the coordinates of the multidimensional plane are refined, the parameters of the samples at stage (A) are changed so as to minimize the number of mismatched blocks, as follows:

sets of previously used sample parameters are selected at random;

from a random pair of the sets of previously used parameters, a resulting set of parameters is formed, such that the significant bits of the parameters are grouped into sectors of a predetermined length with predetermined addresses, and the bits of each parameter of the resulting set are formed from the sectors of the parameters of the random pair by randomly selecting, for each address, the sector of the corresponding parameter from one member of the random pair;

from several sets of parameters, drawn from the previously used sets and the resulting set, the best set is selected, namely the one that yields the minimum number of mismatched blocks at stage (C), and the best set of parameters is used at stage (A).
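Stages (B) and (C) can be illustrated with a minimal Python sketch. It assumes the hyperplane is stored as one weight per bit plus a bias term; the function names `bits_of`, `is_flagged` and `count_mismatches`, and this particular representation of the plane, are illustrative assumptions and do not appear in the patent.

```python
from typing import Iterable, Sequence, Tuple


def bits_of(compressed: bytes) -> list[int]:
    """Unpack a compressed section into 0/1 coordinates, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in compressed for i in range(8)]


def is_flagged(compressed: bytes, weights: Sequence[float], bias: float) -> bool:
    """Stage (B): True when the point lies on the first ('dangerous') side of the plane."""
    coords = bits_of(compressed)
    if len(coords) != len(weights):
        raise ValueError("one weight per bit of the compressed section is required")
    return sum(w * x for w, x in zip(weights, coords)) + bias > 0


def count_mismatches(
    labelled: Iterable[Tuple[bytes, bool]], weights: Sequence[float], bias: float
) -> int:
    """Stage (C): count dangerous blocks left unflagged plus safe blocks that were flagged."""
    return sum(is_flagged(c, weights, bias) != dangerous for c, dangerous in labelled)


# Hypothetical 8-bit compressed sections: only the first bit matters here.
weights = [1.0, 0, 0, 0, 0, 0, 0, 0]
bias = -0.5
labelled = [(b"\x80", True), (b"\x7f", False), (b"\x00", True)]
print(count_mismatches(labelled, weights, bias))
```

The mismatch count is the quantity that both the refinement of the plane and the parameter search of stage (D) try to minimize.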

In a particular, preferred embodiment of the invention, the method is used in networks where the data blocks are network data blocks in the format of packets of a packet data network, for example the formats used on the Internet or in local area networks. In another preferred embodiment, the data blocks are grouped information transmission units, each 1 byte in size. In yet another embodiment, the sample parameters are chosen so that the sequences of information transmission units making up different samples may overlap. In a particular embodiment, the number of samples in a packet may be determined by the packet length in accordance with the step and length of the data samples.
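As a rough illustration of how samples might be cut from a packet, the sketch below assumes 1-byte information transmission units and three integer parameters: the offset of the first sample, the step between sample starts, and the sample length. The name `sample_windows` and the rule of dropping windows that would run past the end of the packet are assumptions made for this example.

```python
def sample_windows(block: bytes, first_offset: int, step: int, length: int) -> list[bytes]:
    """Cut fixed-length samples from a block; samples overlap whenever step < length."""
    if first_offset < 0 or step <= 0 or length <= 0:
        raise ValueError("first_offset must be >= 0, step and length must be > 0")
    windows = []
    start = first_offset
    # Keep only windows that fit entirely inside the block (an assumption of this sketch).
    while start + length <= len(block):
        windows.append(block[start:start + length])
        start += step
    return windows


# Step 2 with length 4 yields overlapping samples.
print(sample_windows(b"abcdefgh", first_offset=1, step=2, length=4))
```

Under these assumptions, the number of samples follows from the packet length as `(len(block) - first_offset - length) // step + 1` whenever at least one window fits.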

In yet another particular embodiment, the parameters of one of the samples are formed by inverting all the bits of randomly selected sectors of one of the previously used sets of parameters.
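The sector inversion described here resembles the mutation operator of a genetic algorithm. A minimal sketch, assuming each parameter is held as a fixed-width unsigned integer whose bits are grouped into sectors counted from the least significant end; `invert_sectors`, `mutate` and the selection probability `rate` are hypothetical names and values.

```python
import random


def invert_sectors(value: int, total_bits: int, sector_len: int, sectors: list[int]) -> int:
    """Invert every bit of the listed sectors (sector 0 holds the least significant bits)."""
    for s in sectors:
        shift = s * sector_len
        if not 0 <= shift < total_bits:
            raise ValueError(f"sector {s} is outside the parameter")
        width = min(sector_len, total_bits - shift)  # the last sector may be shorter
        value ^= ((1 << width) - 1) << shift
    return value


def mutate(value: int, total_bits: int, sector_len: int,
           rng: random.Random, rate: float = 0.25) -> int:
    """Pick each sector independently with probability `rate` and invert it."""
    n_sectors = -(-total_bits // sector_len)  # ceiling division
    chosen = [s for s in range(n_sectors) if rng.random() < rate]
    return invert_sectors(value, total_bits, sector_len, chosen)


print(bin(invert_sectors(0b10100101, total_bits=8, sector_len=4, sectors=[0])))
```

Inverting the same sector twice restores the original value, so a mutation of this kind is always reversible.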

In yet another preferred embodiment of the invention, the parameters of the method are preconfigured before the method is put into use: the initial parameters of the data samples are set at random, and the method is run on sequentially received data blocks of a training sequence whose blocks are known in advance to be either dangerous or safe.

In a particular embodiment, the training sequence consists of a finite number of blocks that can be received repeatedly, and the blocks to be received may be chosen at random. In a particular embodiment of the invention, the method is suspended and its operations are run to check the quality of marking of potentially dangerous data blocks. For this purpose, data blocks of a test sequence are received; these blocks differ from the blocks of the training sequence and are known in advance to be either dangerous or safe. If more than 70% of the known-dangerous blocks are marked and more than 70% of the blocks are unmarked, the current settings of the method are used for marking blocks of network traffic; otherwise, additional preconfiguration of the parameters of the method is performed.
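The acceptance rule can be sketched as follows. The sketch assumes that the second 70% threshold applies to the known-safe blocks of the test sequence, which is one reading of the text rather than something it states explicitly; `passes_quality_check` is a hypothetical name.

```python
from typing import Iterable, Tuple


def passes_quality_check(results: Iterable[Tuple[bool, bool]], threshold: float = 0.70) -> bool:
    """results holds (is_dangerous, was_marked) pairs for the test sequence."""
    pairs = list(results)
    dangerous = [marked for is_dangerous, marked in pairs if is_dangerous]
    safe = [marked for is_dangerous, marked in pairs if not is_dangerous]
    if not dangerous or not safe:
        raise ValueError("the test sequence needs both dangerous and safe blocks")
    marked_rate = sum(dangerous) / len(dangerous)    # dangerous blocks that were marked
    unmarked_rate = safe.count(False) / len(safe)    # safe blocks left unmarked (assumed reading)
    return marked_rate > threshold and unmarked_rate > threshold


good = [(True, True)] * 8 + [(True, False)] * 2 + [(False, False)] * 9 + [(False, True)]
print(passes_quality_check(good))
```

Both comparisons are strict, matching the wording "more than 70%"; a rate of exactly 70% sends the method back to preconfiguration.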

In a particular embodiment of the invention, stage (D) is performed using the sequence of operations that implements a "genetic algorithm" on hardware and software computing means.
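The recombination and selection steps of stage (D) could look roughly like the sketch below. It assumes each parameter set is a tuple of fixed-width unsigned integers and that the fitness of a set is supplied as a callable returning its mismatch count; `crossover_param`, `crossover`, `select_best` and the toy fitness function are illustrative names, not the patent's.

```python
import random
from typing import Callable, Sequence, Tuple

Params = Tuple[int, ...]


def crossover_param(a: int, b: int, total_bits: int, sector_len: int,
                    rng: random.Random) -> int:
    """Build one child parameter by taking each sector from a randomly chosen parent."""
    child = 0
    for shift in range(0, total_bits, sector_len):
        width = min(sector_len, total_bits - shift)
        mask = ((1 << width) - 1) << shift
        donor = a if rng.random() < 0.5 else b
        child |= donor & mask
    return child


def crossover(parent_a: Params, parent_b: Params, total_bits: int, sector_len: int,
              rng: random.Random) -> Params:
    """Apply the sector-wise crossover to every corresponding pair of parameters."""
    return tuple(crossover_param(a, b, total_bits, sector_len, rng)
                 for a, b in zip(parent_a, parent_b))


def select_best(candidates: Sequence[Params], mismatches: Callable[[Params], int]) -> Params:
    """Keep the parameter set with the fewest mismatched blocks."""
    return min(candidates, key=mismatches)


rng = random.Random(0)
history = [(0x00, 0x10), (0xFF, 0x2F)]         # previously used (offset, length) sets
parent_a, parent_b = rng.sample(history, 2)    # random pair of previously used sets
child = crossover(parent_a, parent_b, total_bits=8, sector_len=4, rng=rng)
best = select_best(history + [child], mismatches=lambda p: abs(p[0] - 0xF0))  # toy fitness
print(child, best)
```

In a real run, `mismatches` would evaluate a parameter set by running stages (A) to (C) on a batch of blocks, which is far more expensive than the toy function used here.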

In another particular embodiment of the invention, the data sections are converted into compressed data sections using the sequence of operations used for locality-sensitive hashing (LSH). In a particular embodiment of locality-sensitive hashing, the "Trend Micro Locality Sensitive Hash" sequence of operations is used.
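To show what a fixed-width, similarity-preserving digest looks like, the sketch below uses SimHash over byte n-grams. This is a different and much simpler locality-sensitive scheme than Trend Micro Locality Sensitive Hash and is substituted here purely for illustration; the names `simhash` and `hamming`, the 64-bit width and the 3-byte n-gram size are all assumptions.

```python
import hashlib


def simhash(section: bytes, bits: int = 64, ngram: int = 3) -> int:
    """Fixed-width digest in which similar inputs tend to differ in few bits."""
    if not 1 <= bits <= 64:
        raise ValueError("bits must be between 1 and 64")
    counts = [0] * bits
    n_grams = max(1, len(section) - ngram + 1)
    for i in range(n_grams):
        gram = section[i:i + ngram]
        h = int.from_bytes(hashlib.blake2b(gram, digest_size=8).digest(), "big")
        for bit in range(bits):
            counts[bit] += 1 if (h >> bit) & 1 else -1
    # A digest bit is set when the majority of n-gram hashes have that bit set.
    return sum(1 << bit for bit, c in enumerate(counts) if c > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two digests."""
    return bin(a ^ b).count("1")


d1 = simhash(b"GET /index.html HTTP/1.1")
d2 = simhash(b"GET /index.htm HTTP/1.1")
print(hamming(d1, d2))
```

Each digest bit can serve directly as one coordinate of the multidimensional space used by the classifier, so the digest width fixes the number of dimensions.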

One example of an implementation of a genetic algorithm is disclosed in CN 106817376 A. In a particular embodiment, the hyperplane is constructed using the sequence of operations used in the support vector method, or support vector machine (SVM). When the method is implemented in hardware, the data blocks are received and transmitted over a physical network for transmitting electromagnetic signals as packets of electromagnetic signals, which are converted into a digital representation of the data and back. In a particular embodiment, the entropy parameter is the maximum achievable compression ratio of the data sections under a given data compression method, where an increase in the compression ratio, that is, the ratio of the original data section to its compressed representation, indicates a decrease in entropy. A lossy data compression method may be used as the data compression method. In addition, the degree of predictability of the data in the data sections may be used as the entropy parameter, where high predictability indicates low entropy.
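The compression-ratio proxy for entropy can be sketched with Python's standard `zlib` module. The choice of zlib (a lossless compressor), the sliding one-byte stride and the names `compression_ratio` and `lowest_entropy_section` are assumptions made for this example only.

```python
import zlib


def compression_ratio(section: bytes) -> float:
    """Original size divided by compressed size; a higher ratio suggests lower entropy."""
    return len(section) / len(zlib.compress(section, 9))


def lowest_entropy_section(sample: bytes, section_len: int) -> bytes:
    """Return the section of the sample that compresses best (first one on ties)."""
    if not 0 < section_len <= len(sample):
        raise ValueError("section_len must be between 1 and len(sample)")
    candidates = (sample[i:i + section_len]
                  for i in range(len(sample) - section_len + 1))
    return max(candidates, key=compression_ratio)


sample = b"A" * 64 + bytes(range(64))  # a repetitive run followed by non-repeating bytes
print(lowest_entropy_section(sample, 64) == b"A" * 64)
```

On very short sections the fixed overhead of the compressed stream dominates the ratio, so a practical implementation would probably need sections of at least a few dozen bytes or a different estimator.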

To search for threat signatures or patterns, it is preferable to use the method of characteristic samples, in which the determining factors are the number of samples used to analyze a signature, the sample length, and the offset of the sample relative to the beginning of the packet or to the beginning of a sequence of related data. In general, the problem has an ideal solution for each signature: for each threat there exist window parameters under which the windows contain only the data directly related to the threat. Such data may include definitions of the parameters for unpacking high-entropy archives, passwords for unpacking, verification strings, and other meta-information. In particular embodiments, low-entropy sections may include data that looks randomly configured but cannot be compressed, or data that is deliberately inserted into a file unchanged.

In the context of this invention, low-entropy data sections are understood to mean not only explicitly ordered data blocks, but also data blocks that have a chaotic structure yet can correlate with predetermined structures. In the latter case, the search for data structures within pseudo-random structures can be carried out by analogy with the processing of data contained in satellite navigation signals. In this case, structure-preserving data compression algorithms provide similarity for a predetermined number of vectors.
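The analogy with satellite navigation signals can be illustrated by sliding a known template over a data stream and looking for a correlation peak, much as a receiver correlates against a known code. The sketch below is only an assumed illustration of that idea; the bit-to-chip mapping and the function names are hypothetical and are not taken from the description.

```python
def to_chips(data: bytes) -> list[int]:
    """Map every bit to +1 or -1, most significant bit first."""
    return [1 if (byte >> shift) & 1 else -1
            for byte in data for shift in range(7, -1, -1)]


def best_correlation(stream: bytes, template: bytes):
    """Slide the template over the stream bit by bit.

    Returns (bit_offset, score) for the highest normalized correlation,
    where score lies in [-1, 1], or None if the template does not fit.
    """
    s, t = to_chips(stream), to_chips(template)
    if len(t) == 0 or len(s) < len(t):
        return None
    best = (0, -1.0)
    for offset in range(len(s) - len(t) + 1):
        score = sum(a * b for a, b in zip(s[offset:], t)) / len(t)
        if score > best[1]:
            best = (offset, score)
    return best
```

A score close to 1 at some offset suggests that the seemingly chaotic section contains the predetermined structure at that position.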

Among other things, a genetic algorithm can be used to control the parameters of a dimensionality-reducing hash function, for example by specifying the length of the resulting hash.

The proposed method can be used simultaneously on several stand-alone computing systems, where each computing system implements a cycle of operations of the proposed method and control of the setting values is coordinated between them if necessary. For example, "training" and "testing" of the method can be carried out on different computing systems.

In a particular implementation, the coordinates of the multidimensional plane (hyperplane) are determined using notifications about disruption or continued normal operation of computing systems, based on the results of checking blocks during the marking process.

In the context of the invention, the terms "algorithm", "method", "technique", "machine" and similar terms describe operations implemented by changing the state of physical objects, such as processor registers, memory cells and similar elements. A change in the state of such objects is itself a physical process, for example charge transfer inside solid-state devices or changes in the parameters of an electromagnetic field. "Setting parameters" are likewise physical parameters characterizing individual elements of data storage units; for example, such a parameter may be the orientation of a magnetic domain, a potential level, or the charge value of a capacitive memory cell.

DESCRIPTION OF DRAWINGS

The implementation of the invention is described below with reference to the attached drawings, which are presented to explain the essence of the invention and in no way limit its scope. The following drawings are attached to the application:

Fig. 1 illustrates an example of a computing device used as part of a general-purpose computing system in which the present invention can be applied.

Fig. 2 illustrates a simplified schematic functional diagram of one embodiment of a system in which the proposed method can be used.

Fig. 3 briefly illustrates the sequence of operations that can be performed when implementing the invention.

DETAILED DESCRIPTION OF THE INVENTION

Fig. 1 shows an example of a computing device, for example a server, used as part of a general-purpose computing system in which the present invention can be applied. The system contains a multi-purpose computing device in the form of a computer 20 or server, containing a processor 21, a system memory 22 and a system bus 23. The system bus forms the electrical connections that support information exchange, including control interaction, between the various system components, including the system memory, and the processor 21.

The system bus 23 is generally used to transmit data and addresses and to control the functional state of the units connected to it. The system memory 22 can be implemented as a combination of a read-only memory (ROM) 24 and a random access memory (RAM) 25. The ROM 24 stores the instruction codes of the basic input/output system (basic operating system) 26, which perform the initial start-up of the computing device and then transfer control to the units specified in the ROM configuration data. These codes can be used, for example, to deploy and launch a higher-level operating system.

The computer 20 may also contain a large-capacity data storage device 27, such as a solid-state drive or a magnetic disk drive. Removable data storage devices 29 are used via a hardware and software interface 28, which provides the mechanical and electrical connection of the corresponding storage device, such as a flash card, to the hardware interfaces and allows data to be read from and written to the corresponding data storage unit. In a particular case, the interface may be implemented as a drive 30 for optical disks 31. To connect to the system bus, the storage device 27 and the hardware and software interfaces contain electrical connectors that attach to hardware and software interfaces converting the data into a format suitable for transmission as electrical signals over the system bus 23.

The hard disk drive 27, the magnetic disk drive 28 and the optical disk drive 30 are connected to the system bus 23 via the hard disk drive interface 32, the magnetic disk drive interface 33 and the optical drive interface 34, respectively.

The storage devices and their associated computer-readable media provide non-volatile storage of computer-readable instructions, data structures, program modules and other data for the computer 20.

Although the typical configuration described here uses a hard disk, a removable magnetic disk 29 and a removable optical disk 31, a person skilled in the art will appreciate that other types of computer-readable media capable of storing computer-accessible data may also be used in a typical operating environment, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memory (RAM), read-only memory (ROM) and the like.

Various program modules, including the operating system 35, may be stored on the hard disk, the magnetic disk 29, the optical disk 31, the ROM 24 or the RAM 25. The computer 20 contains a file system 36 associated with or included in the operating system 35, one or more software applications 37, other program modules 38 and program data 39. A user may control the computer 20 by entering commands using the keyboard 40 and the pointing device 42.

These and other input devices are often connected to the processor 21 via a serial port interface 46, which is connected to the system bus, but they may also be connected via other interfaces, such as a parallel port, a game port or a universal serial bus (USB). A monitor 47 or another type of visual display device is also connected to the system bus 23 via an interface, for example a video adapter 48. In addition to the monitor 47, personal computers typically include other peripheral input and output devices.

The computer 20 may be connected to a network environment via hardware connections that allow logical connections to be formed with one or more remote computers 49. The remote computer (or computers) 49 may be another computer, a server, a router, a network PC, a peer device or another node of the same network, and typically contains most or all of the elements described above for the computer 20, although only the information storage device 50 is shown. The logical connections include a local area network 51 and a wide area network 52.

When used in a LAN environment, the computer 20 is connected to the local area network 51 via a network interface or adapter 53. When used in a WAN environment, the computer 20 typically uses a modem 54 or other means to establish a connection to the wide area network 52, such as the Internet.

The network card 54 is connected to the system bus 23 via the serial port interface 46. In a network environment, the program modules described with respect to the computer 20, or portions of them, may be stored on a remote information storage device. The network connections shown are those commonly used, but other means may be used to establish data transmission networks between computers.

Fig. 2 shows a simplified schematic functional diagram of one embodiment of a system in which the proposed method can be used.

As shown in Fig. 2, the system uses a processor 500, which calculates and forms the setting parameters according to which data blocks are analyzed, in particular cases in real time. Some of the parameters and settings are stored in the memory of a general-purpose non-volatile memory device 515, such as a flash drive. These setting parameters can be written to the non-volatile memory device that stores the initial settings of the module for analyzing data blocks (network blocks). The sequence of reading and writing the settings can be initiated by a command from the processor 500 or by a command received from an external device that transmitted the setting parameters. The setting parameters can be changed at the request of the network block filtering module, for example when the initial settings must be restored because the network block filtering module requires an emergency reboot.

The processor can access the non-volatile memory device 515 either in direct access mode or through the functional controller. In particular, the processor can access the random access memory through a high-speed system controller, not shown in the figures, which can be combined with the processor on one chip as a "system on a chip", indicated in Fig. 6 by reference numeral 670.

The processor or "system on a chip" 670 and the functional controller may interact over dedicated tracks or connections on the printed circuit board of the settings control module.

The system functions may include remote access to the device over SSH (Secure Shell) even when only standby power is available, control of platform parameters through a CLI (command-line interface), and access to the functions of the main processor through the CLI. The remote access functions make it possible to update system parameters promptly, for example based on the results of testing parameters on other devices or in other configurations.

Implementing these functions through the system settings control unit significantly increases the fault tolerance of the platform. If an abnormal situation occurs, it is always possible to connect to the PMC over a secure SSH channel, perform an initial diagnosis of the fault and take action to eliminate it.

As shown in Fig. 2, the system comprises a settings control module 100 and a network data block processing module 110, connected to each other by detachable hardware connections. The physical lines of the detachable connection can be used to distribute the load of the power supplies and to transfer data between the elements of the system. Data exchange 621 can be performed over PCIe, using an adapter built into the processor chip. On the side of the network block control module, the PCIe adapter 120 can be implemented as a separate chip or can be part of the hardware logic of a programmable logic device (PLD), such as an FPGA. So that a network protocol can be used when the settings control module receives settings, part of the physical channels 620 of the connection 120 can be used to connect a network adapter, which in modern systems is usually part of the chip 210 on which the processor is implemented. Making maximum use of the data exchange adapters located on the processor chip makes it possible to optimize system performance. The system can use systems on a chip. A system on a chip (SoC), or single-chip system, is an electronic circuit that performs the functions of an entire device (for example, a computer) and is placed on a single integrated circuit.

The functional controller, which mainly performs the functions of a "south bridge", can also be implemented as part of the processor die. The functional controller can perform operations 640 that change the operating parameters of the hardware devices of the system, for example controlling the clock frequency 610 of the clock generator 615. The need to control the clock frequency can arise when the intensity of data transmission in the information network decreases or increases. Physical channels 678 and 679 can be allocated for data exchange over the general-purpose input/output interface protocols and the universal asynchronous receiver-transmitter protocol, respectively, so that various peripheral devices 690, 691 can be connected.

In general, the system for implementing the method comprises means 10 for receiving input signals of a first physical channel of the data transmission network and converting the input signals into digital representations of input network data blocks, where the network data blocks are characterized by the address of the recipient of the network data block;

means 11 for converting digital representations of output network data blocks into output signals for transmission over a second physical channel of the data transmission network;

means for converting digital representations of input network data blocks into digital representations of output network data blocks; and

a means for analyzing digital representations of input network data blocks, configured to generate notifications about the potential danger of data blocks when the digital or network representations of the input network data blocks meet predetermined conditions. For this purpose, the system can comprise a network data block processing module and a system settings control module connected to each other. The network data block processing module carries means for storing digital representations of network data blocks, means for storing digital representations of the predetermined conditions, and means for filtering digital representations of input network data blocks. The filtering means contain hardware logic configured to compare the predetermined conditions with the digital representations of the input data blocks and hardware logic that converts digital representations of input network data blocks into digital representations of output network data blocks. The module also carries a non-volatile memory device storing the initial settings of the network block processing means, and a system clock generator that clocks the hardware logic while it performs operations.

Fig. 3 schematically shows the sequence of operations used to determine the initial settings of the method.

At the initial stage 301, an array of an arbitrary sequence of network blocks is collected, to which no filtering operations of any kind have been applied. At stage 302, the data is pre-processed; for example, data blocks are selected at random from the array collected at stage 301. The selected blocks of the array are used as a training sample at stage 303. The other blocks of the array can be used as a test data set. The danger of the data blocks in the array is determined either with known tools whose effectiveness is established in advance, or by testing the blocks on hardware and software computing systems and determining the negative impact of the data contained in the blocks on the operation of known systems. Different configurations of computing systems can have different degrees of vulnerability, so for the blocks of the array it is possible to specify not only the potential danger of using a block but also the probability that using it causes system failures.
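The preparation of the training and test sets at stages 301 to 303 can be sketched as a random split of labeled blocks. The record layout, the field names and the default 70/30 split are assumptions made for this illustration and are not specified in the description.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledBlock:
    """Hypothetical record for one collected network block."""
    payload: bytes
    dangerous: bool
    failure_probability: float  # estimated chance that the block disrupts a system


def split_blocks(blocks, train_fraction=0.7, seed=0):
    """Randomly split labeled blocks into (training, test) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(blocks)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```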

Next, at the initial stage, the parameters for forming samples from data blocks are set arbitrarily. Using these parameters, data is sampled from the blocks at stage 304. At stage 305 the data samples are compressed using the tlsh function, after which, at stage 306, the hyperplane coordinates are determined and refined, taking into account the values of the vectors represented by the hashes as well as knowledge of whether the blocks are malicious and the probability that a block is malicious. The adjusted hyperplane parameters are applied to the blocks to determine how accurately danger is identified. If the check 307 shows that the hyperplane reflects the danger of the blocks with the required degree of reliability, the accuracy is considered acceptable and the settings of the method are applied to detect 308 anomalies when analyzing real network traffic.
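Stages 304 to 307 can be sketched as a small pipeline: take a sample, reduce it to a short vector, and fit a separating hyperplane. The sketch below makes two loudly stated substitutions. `toy_digest` is a hypothetical stand-in and is not the tlsh function, and the hyperplane is fitted with simple perceptron updates rather than a support vector method. All names are assumptions made for the example.

```python
def toy_digest(sample: bytes, size: int = 16) -> list[float]:
    """Stand-in for a locality-sensitive hash (NOT tlsh).

    Returns a normalized histogram of byte-pair buckets, so that
    similar samples map to nearby vectors.
    """
    counts = [0] * size
    for a, b in zip(sample, sample[1:]):
        counts[(a * 33 + b) % size] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def take_sample(block: bytes, offset: int, length: int) -> bytes:
    """Cut a window out of a data block (stage 304)."""
    return block[offset:offset + length]


def train_hyperplane(vectors, labels, epochs=50, rate=0.1):
    """Fit (weights, bias) with perceptron updates.

    Labels are +1 (dangerous) or -1 (safe). This is a simpler substitute
    for a support vector method; it also yields a separating hyperplane
    when the data is linearly separable.
    """
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + rate * y * xi for wi, xi in zip(w, x)]
                b += rate * y
    return w, b


def is_dangerous(block: bytes, offset: int, length: int, w, b) -> bool:
    """Classify a block by the side of the hyperplane its digest falls on."""
    x = toy_digest(take_sample(block, offset, length), len(w))
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0
```

Accuracy on held-out blocks, computed with `is_dangerous`, would then serve as the check at stage 307.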

Additionally, the accuracy of the settings can be checked at stage 308 using test blocks. If the settings cannot be used to protect computing systems, for example because their accuracy is low in general or low with respect to certain impacts, the data sample parameters are adjusted using historical data and a genetic algorithm, and the configuration of the method, including determination of the hyperplane coordinates, is repeated from stage 304.

The invention makes it possible to develop tools that use heuristic data processing, including so-called "genetic algorithms", to promptly generate packet blocking rules by comparing current packets with historical data. In this case, each session is analyzed. Sessions whose packets have anomalous content are analyzed and recorded separately, which reduces the number of false positives and the blocking of IP addresses that do not generate dangerous code. This property is achieved by taking into account only anomalous sessions with similar properties. The data processing operations that make up a substantial part of the proposed invention are optimized for implementation in hardware logic, for example using chips with predetermined logical connections or field-programmable gate arrays (FPGAs).

The proposed method is not intended to recognize the type of threat. Instead, it determines, with a certain adjustable probability, whether a received data block may contain a threat. Systems used together with the proposed method can then delete suspicious blocks without further analysis, analyze the contents of the blocks, or analyze their effect on test systems. Identifying suspicious blocks for which no threat templates or signatures exist makes it possible to detect zero-day threats and filter out the corresponding data before any changes are made to the protected systems. False positives that occur while the method is running are also recorded and used to train or refine the coordinates of the hyperplane. If the number of false positives is large, changes are made to the genetic algorithm.

Intrusion Detection Systems (IDS), which specialize in automatic recognition of intrusions and threats in monitored local network traffic, are hardware and software tools that, as a rule, continuously monitor network traffic and the activity of system entities in order to prevent, detect and log attacks. The proposed invention makes it possible to generate warnings, including warnings about network intrusions, using feedback mechanisms that do not require feedback signals to be generated in real time or in response to every intrusion-related system failure.

In the proposed invention, the support vector method (machine) is used to form the settings or, in other words, the parameters of the separating boundary, a multidimensional hyperplane that conditionally divides the data characterizing dangerous data packets from the data characterizing packets that pose no threat to the network. In multidimensional systems it is impossible to strictly separate data presented in binary form, so the task of the classifier is to select the hyperplane parameters so that blocks are sorted as effectively as possible. The hyperplane parameters can subsequently be used not only in the system on which training was carried out, but also in other systems that use corresponding support vector machines. In general, the quality of the hyperplane parameters depends on the number of training data blocks. However, with the proposed method there may be cases in which the data blocks used to train the support vector machine are not relevant to the possible threats. In such cases the parameters of the preceding operations must be adjusted.

For example, if the settings generated by the genetic algorithm select for analysis sequences that do not overlap with malicious code, the subsequent operations will produce results no better than random ones. Forming the parameters of these settings is therefore a very significant stage in implementing the invention.

The instruction sets that carry out the computational operations of the method, including determination of the support vector machine parameters, are known, but their actual implementation may depend on the type of operating system, the processor model and other parameters of the hardware-software complexes implementing the proposed method. In a particular case, user-programmable integrated circuits, for example FPGAs (field-programmable gate arrays), which allow hardware logic elements and their connections to be formed at the user's discretion, can be used to implement the operations of the method. With regard to the proposed method, the use of an FPGA is advisable for determining data sections with low entropy, for calculating hash functions, including by the table method, and for determining the "location" of data blocks relative to the hyperplane and adjusting the coordinates of the hyperplane (multidimensional plane). Using a conventional Intel or AMD platform to perform the operations of the method is, as a rule, inefficient, since 90-98% of the processor's transistors are idle while scalar products of vectors are computed. A specific task such as computing the scalar product of vectors requires many multipliers and adders. That is, with the same number of transistors as a general-purpose processor, it is possible to build a specialized system that uses almost 100% of its resources for multiplication and addition alone. Such a system will outperform a general-purpose processor by several orders of magnitude. When hardware-reconfigurable computing systems are used to implement the method, hardware structures can be created that match the computational structure of the task at the hardware level. FPGA chips can be used to create general-purpose systems. For more specialized tasks, chips can be created with a fixed topology that optimally matches the data processing sequence, including the ability to organize parallel data processing streams without time-multiplexing the operations.

The largest number of genetic-algorithm iterations occurs during the initial configuration of the method parameters, when samples taken with the initial or intermediate parameters may not intersect dangerous data at all. That is, at the early stages the data sample parameters may be such that the properties of the sampled data show no correlation with the properties of "dangerous" blocks.

In this case, the most effective element of the genetic algorithm is "mutation", that is, a random change of the sample parameters.
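As an illustration of the mutation step described above, the following minimal sketch flips each bit of a parameter string independently with a small probability. The bit-string representation follows the later description of the parameters; the function name `mutate`, the per-bit probability `rate` and the example values are assumptions made only for this sketch and are not taken from the source.

```python
import random


def mutate(chromosome: str, rate: float, rng: random.Random) -> str:
    """Flip each bit of a bit-string chromosome independently with probability `rate`."""
    flipped = []
    for bit in chromosome:
        if rng.random() < rate:
            flipped.append("1" if bit == "0" else "0")
        else:
            flipped.append(bit)
    return "".join(flipped)


# Hypothetical 16-bit parameter string; the seed only makes the run repeatable.
rng = random.Random(0)
parent = "0000111100001111"
child = mutate(parent, rate=0.1, rng=rng)
```

A higher `rate` explores the parameter space more aggressively, which may be useful when no sample overlaps dangerous data yet, at the cost of discarding settings that already work.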

During operation, the genetic algorithm can be applied either in parallel mode or with different "generations" of settings running on different computing complexes. The method is most effective when computing clusters are used in which different devices run different "generations" of settings.

The parameters handled by the genetic algorithm are bit sequences that characterize the length of a sample and the address of the sample relative to the beginning of the packet, or relative to the beginning of the data sequence if packets of different standards are expected to lay out their data differently. The number of samples in one sequence can also be a parameter, where each sample of the sequence is characterized by its length and its offset relative to the start address of the packet.
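The encoding described above can be sketched as follows. The source states only that the parameters are bit sequences describing sample length and offset; the field widths (`LENGTH_BITS`, `OFFSET_BITS`), the field order and the names `Sample`, `decode` and `extract` are hypothetical choices made for this example.

```python
from dataclasses import dataclass

# Assumed field widths; the source does not specify them.
LENGTH_BITS = 8
OFFSET_BITS = 12
GENE_BITS = LENGTH_BITS + OFFSET_BITS


@dataclass(frozen=True)
class Sample:
    offset: int  # offset from the start of the packet, in bytes
    length: int  # sample length, in bytes


def decode(chromosome: str) -> list[Sample]:
    """Split a bit string into fixed-width genes, one (length, offset) pair per sample."""
    if len(chromosome) % GENE_BITS != 0:
        raise ValueError("chromosome length must be a multiple of GENE_BITS")
    samples = []
    for i in range(0, len(chromosome), GENE_BITS):
        gene = chromosome[i:i + GENE_BITS]
        length = int(gene[:LENGTH_BITS], 2)
        offset = int(gene[LENGTH_BITS:], 2)
        samples.append(Sample(offset=offset, length=length))
    return samples


def extract(packet: bytes, samples: list[Sample]) -> list[bytes]:
    """Cut the described samples out of a packet (slices past the end are truncated)."""
    return [packet[s.offset:s.offset + s.length] for s in samples]


# Two samples: 4 bytes at offset 2, and 3 bytes at offset 8.
chromosome = "00000100" + "000000000010" + "00000011" + "000000001000"
packet = bytes(range(16))
parts = extract(packet, decode(chromosome))
```

With this layout the number of samples is implied by the chromosome length, which is one possible way to treat the sample count as a tunable parameter.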

The method showed maximum efficiency when used with the tlsh technology described in Oliver, J., Cheng, C., Chen, Y.: TLSH - A Locality Sensitive Hash. 4th Cybercrime and Trustworthy Computing Workshop, Sydney, November 2013 https://github.com/trendmicro/tlsh/blob/master/TLSH_CTC_final.pdf. The description given in that source can be regarded as one particular case of implementing dimensionality-reducing hashing in the present invention.

In a particular implementation, the genetic algorithm is used to configure the parameters of a sliding-window technique in which the windows are data sections. The adjustable parameters are the length of the data samples (windows) and others such as the address of the first window in the data block and the step between the start addresses of successive windows. The genetic algorithm reduces the volume of data that has to be examined. The iterative refinement of the parameters keeps decision accuracy at an acceptable level while increasing decision speed. In addition, the proposed method allows the computing operations to be accelerated substantially in hardware. In particular, locality-preserving hashes can be computed in parallel for several sections of the data blocks, or determined in a minimum number of clock cycles of the computing system.
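The sliding-window parameters named above (address of the first window, window length, step between window start addresses) can be illustrated with a short sketch. The function name `sliding_windows` and the choice to discard windows that would run past the end of the block are assumptions of this example.

```python
def sliding_windows(block: bytes, start: int, length: int, step: int) -> list[bytes]:
    """Return the full-length windows of `block` beginning at `start`, `step` bytes apart."""
    if start < 0 or length <= 0 or step <= 0:
        raise ValueError("start must be non-negative; length and step must be positive")
    last_start = len(block) - length  # last position where a full window still fits
    return [block[i:i + length] for i in range(start, last_start + 1, step)]


# 10-byte block, first window at offset 1, windows of 4 bytes, step of 3 bytes.
windows = sliding_windows(b"abcdefghij", start=1, length=4, step=3)
```

Increasing `step` or shortening `length` reduces the amount of data passed to the hashing stage, which is the trade-off the genetic algorithm is described as tuning.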

Using the TLSH algorithm reduces the dimensionality of the data blocks, that is, the number of conditional dimensions of the data block vectors. This simplifies construction of the so-called hyperplane, that is, of the parameters of the boundary separating the blocks: on one side of the boundary lie vectors that indicate the presence of threats with high probability, and on the other side lie vectors that, with high probability, belong to safe blocks.
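Determining on which side of the hyperplane a vector lies reduces to a scalar product plus an offset. The sketch below assumes a linear boundary with weights `w` and bias `b`, and assumes that a hash digest is available as a hexadecimal string that is turned into a vector of byte values. The helper names, the example weights and the short digests are hypothetical and are not real TLSH output.

```python
def digest_to_vector(hex_digest: str) -> list[int]:
    """Interpret a hexadecimal digest as a vector of byte values (0..255)."""
    return list(bytes.fromhex(hex_digest))


def decision_value(w: list[float], x: list[int], b: float) -> float:
    """Signed position of x relative to the hyperplane w.x + b = 0 (not normalised by |w|)."""
    if len(w) != len(x):
        raise ValueError("weight vector and data vector must have the same dimension")
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def is_suspicious(w: list[float], x: list[int], b: float) -> bool:
    """Blocks on the positive side of the hyperplane are marked as potentially dangerous."""
    return decision_value(w, x, b) > 0.0


# Hypothetical 3-dimensional example; real digests are much longer.
w, b = [0.5, -0.25, 1.0], -10.0
safe = digest_to_vector("00ff10")
risky = digest_to_vector("ff0010")
```

The multiply-and-accumulate loop in `decision_value` is the operation that the description proposes to offload to multipliers and adders in hardware.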

Network threats tend to change and evolve constantly. To effectively identify unknown new threats that share some features with particular known threats, the most up-to-date comparison methods will be used. In particular, it is planned to evaluate whether comparison technologies developed for comparing different DNA genomes can be used effectively. Genetic algorithms implemented in the hardware-software complex make it possible to create, as quickly as possible, algorithms that detect anomalies in the client's traffic and infrastructure.

Unlike conventional methods of "eliminating vulnerabilities", the use of a genetic algorithm makes it possible to maintain stable protection of equipment against malicious code. Where vulnerabilities are eliminated by changing the structure of operating systems or the conditions for searching for vulnerabilities using identified templates and signatures of dangerous code, the genetic algorithm keeps the resilience of the protected system at an acceptable level by applying fuzzy rules for eliminating vulnerabilities that cannot be analyzed in the usual way. It is important that attackers cannot currently generate dangerous code at random, since causing damage relies on previously known properties of systems with a stable configuration. Similarly, eliminating known vulnerabilities by standard methods involves changing explicitly defined properties of the systems that malicious code exploits to cause damage.

In addition, almost all computing systems and systems that use software spend significant resources on preventing unwanted impacts. The proposed method is self-modifying and makes it possible to reduce the computing resources spent on detecting sections of code that previously affected the systems in an unacceptable way but no longer do so.

Before the proposed method is used to protect information and computing systems, its settings are initially configured. For this purpose, the method according to the claims is applied to training and test data blocks. The level of potential danger of such data blocks is known in advance, so the settings of the technologies used in the method can be formed effectively and promptly, after which the settings can be tested to determine their level of reliability.

The main task of the genetic algorithm is to form settings that extract, as efficiently as possible, information that may relate to security threats from data blocks used on the Internet, such as transport packets. Here efficiency is determined not by whether an ideal approach to data extraction is possible, but by how the efficiency achieved compares with the overhead of achieving a result close to the ideal. The search for the most informative packet fragments can be reduced to the set cover problem, which reduces in polynomial time to the vertex cover problem and is NP-complete. To date, genetic algorithms have proven competitive in solving many NP-complete problems.

The extracted information is analyzed for the possible presence of threats at subsequent stages. Accordingly, false positives are taken into account when tuning all adjustable parameters of the proposed method, or only those of the support vector machine when the coordinates of the hyperplane are determined.

When the genetic algorithm is running, a base offset for the first sample can be specified. This parameter can affect the results of the dimensionality-reducing, order-preserving hashing, which in particular cases makes it possible to change the hashing results without changing the settings of the hashing technology.

For data that cannot be compressed, the bit sequence selected for analysis can be checked for correlation with the bit sequences defined for threat templates. If a correlation is found despite the apparently chaotic structure, the entropy is considered low, since the data is in all likelihood ordered data disguised as high-entropy data.
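The check described above can be sketched under several assumptions: entropy is measured as the Shannon entropy of the byte distribution, correlation is measured as the fraction of agreeing bits between equal-length sequences, and both cutoffs are arbitrary. The source defines none of these, so the function names and threshold values below are hypothetical.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def bit_agreement(a: bytes, b: bytes) -> float:
    """Fraction of bit positions in which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return 1.0 - differing / (8 * len(a))


def treated_as_low_entropy(window: bytes, patterns: list[bytes],
                           entropy_cutoff: float = 4.0,
                           agreement_cutoff: float = 0.9) -> bool:
    """Low measured entropy, or high measured entropy that correlates with a threat pattern."""
    if byte_entropy(window) < entropy_cutoff:
        return True
    return any(len(p) == len(window) and bit_agreement(window, p) >= agreement_cutoff
               for p in patterns)


# A window of 32 distinct bytes has 5 bits of entropy per byte, above the assumed cutoff.
window = bytes(range(32))
```

In this sketch a window that looks random is still treated as low-entropy when it closely matches a known pattern, mirroring the reasoning in the paragraph above.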

In terms of the present invention, "trained" technologies are data processing and analysis stages whose operating parameters have been tuned and are the most effective possible. In this form, trained technologies can be replicated in hardware to form hardware-software modules. As noted above, individual hardware-software modules can be used to refine the settings and thus to locally increase the efficiency of data processing on the corresponding module. Subsequently, the settings of individual modules can be superposed to form even more effective settings for data processing systems. One option is to periodically compare the performance of the modules on a common test sequence of data blocks and select the most effective configuration for use on all modules. The settings of different modules can also serve as input data for the genetic algorithm. Deploying trained systems on a set of hardware-software modules while simultaneously modifying their parameters can also be regarded as an effective implementation of the present invention.

The window initially selected for analysis is represented as a sequence of bytes. The length of the window (number of bytes) is chosen by the genetic algorithm, in which parameters formed by "natural selection" are evaluated by how effectively they mark data blocks. A marking accuracy of 50%, for example, is ineffective, since it corresponds to choosing at random which blocks to mark as potentially dangerous.
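The accuracy figure mentioned above can be computed as the share of blocks whose marking matches the known label. The sketch assumes boolean labels and a test set with equal numbers of dangerous and safe blocks, in which case marking every block the same way yields exactly 50%; the function name `marking_accuracy` is hypothetical.

```python
def marking_accuracy(predicted: list[bool], actual: list[bool]) -> float:
    """Share of blocks whose 'potentially dangerous' marking matches the known label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Balanced test set: two dangerous blocks, two safe blocks.
actual = [True, True, False, False]
uninformative = marking_accuracy([True, True, True, True], actual)
perfect = marking_accuracy(actual, actual)
```

On an unbalanced test set the uninformative baseline differs from 50%, so a fitness function built on plain accuracy may need a different reference point.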

By analogy with biological objects, which are characterized by optimality as well as by the consistency and efficiency of their operation, so-called "genetic algorithms" or "evolutionary computations" imitate natural selection in living nature. Natural selection is characterized by the following entities:

Population - the collection of individuals at the stage of evolution under consideration.

Individual - a single representative of the population.

Chromosome - a structure that contains the genetic code of an individual.

Gene - a specific part of a chromosome that encodes an innate quality of the individual.

Fitness function (fitness) - determines how close an individual is to solving the problem.

A genetic algorithm is an optimization method based on the principles of natural evolution of a population of individuals. The optimization task is to maximize the fitness function.

The term "genetic algorithm", as applied to the present invention, denotes operations for selecting optimal parameters of sets of fixed-length bit strings, where a set of strings (a "chromosome") characterizes an "individual", predetermined sections of the strings are treated as "genes", the fitness function corresponds to the ability to detect potentially dangerous blocks as reliably as possible, and the "population" is the set of individuals, with the exception of individuals obtained through "mutation" for which the ability to realize the "fitness function" has been confirmed. Here, a mutation is a set of parameters obtained by randomly changing the "genes" of the "chromosome" of individuals from the population.
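The mapping between a fixed-length bit string and the parameters it encodes can be illustrated with a minimal sketch. The gene names (`start`, `length`, `step`) and their 8-bit widths are assumptions chosen for illustration; the description does not fix a particular layout.

```python
# Hypothetical gene layout (name, width in bits); not specified by the source.
GENE_LAYOUT = (("start", 8), ("length", 8), ("step", 8))
CHROMOSOME_BITS = sum(width for _, width in GENE_LAYOUT)


def encode(params: dict) -> str:
    """Pack integer parameters into a fixed-length bit string (a "chromosome")."""
    parts = []
    for name, width in GENE_LAYOUT:
        value = params[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        parts.append(format(value, f"0{width}b"))
    return "".join(parts)


def decode(chromosome: str) -> dict:
    """Read each predetermined section of the bit string (a "gene") as an integer."""
    if len(chromosome) != CHROMOSOME_BITS or set(chromosome) - {"0", "1"}:
        raise ValueError("chromosome must be a bit string of the fixed length")
    params, pos = {}, 0
    for name, width in GENE_LAYOUT:
        params[name] = int(chromosome[pos:pos + width], 2)
        pos += width
    return params
```

Because every gene occupies a fixed position, two chromosomes can be recombined section by section without producing a string of a different length.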

The position of the window in the data packet is also selected by the genetic algorithm. For a single data packet, parameters may be specified that correspond to the use of several windows from the data block. The position of each window, that is, the starting address of the window (data sample) relative to the starting address of the data block, can also be selected using the genetic algorithm. Specifying the parameters of each window of the data block exactly gives maximum efficiency, but the overhead of computing optimal parameters for every window makes this approach impractical when time and computing resources are limited. The best trade-off between performance and efficiency is to select the starting address of the first window in the data packet, the window length in bytes, and the distance (address offset) between the selected windows. The number of windows can be specified as a "gene" of the genetic algorithm when a small number of windows corresponds to the maximum efficiency of the method. It is optimal to specify the number of windows indirectly, through the window length, the address of the first window, and the step (offset) between windows. In this case, the number of windows is determined indirectly by the length of the data block.
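The indirect specification of the number of windows can be sketched as follows. The function names are hypothetical, and the sketch assumes that only complete windows are kept, so a trailing fragment shorter than the window length is discarded; the description does not state how such a fragment is handled.

```python
def window_offsets(block_len: int, start: int, length: int, step: int) -> list[int]:
    """Starting addresses of all complete windows inside a block of block_len bytes."""
    if start < 0 or length <= 0 or step <= 0:
        raise ValueError("start must be >= 0; length and step must be > 0")
    # The last valid offset is block_len - length, hence the exclusive bound.
    return list(range(start, block_len - length + 1, step))


def extract_windows(block: bytes, start: int, length: int, step: int) -> list[bytes]:
    """Cut the data samples (windows) out of a data block."""
    return [block[o:o + length] for o in window_offsets(len(block), start, length, step)]
```

With `step` smaller than `length` the windows overlap, and the number of windows grows with the block length without being stored as a separate parameter.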

The number of windows can also be determined without the genetic algorithm, for example when it is reliably established that the contents of windows beyond a certain count do not affect the presence of a threat in the packet, such as the likelihood that the packet is used to carry out an intrusion.

One way of selecting sample (window) values, commonly known as a "genetic algorithm", is disclosed in PRC patent CN 106817376 A. The sequence of operations disclosed in that patent can be applied in the proposed invention.

For example, when implementing the genetic algorithm, "parent chromosomes" are selected, that is, bit sequences used as sample configuration parameters that were applied in previous filtering operations and proved effective.

A number of "parent chromosomes" is selected to produce offspring. Some of the chromosomes undergo random changes: the positions of the mutated genes are chosen randomly, the number of mutations is specified in advance, and a mutation may either change a gene or leave it unchanged. Genes are then crossed randomly to form a predetermined number of parameter sets, i.e. "offspring". The offspring parameters are applied to block filtering, and the most effective "offspring", including in comparison with the currently used parameters of the hyperplane construction method and of the dimensionality-reducing hashing, is used for subsequent filtering.
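One generation of this procedure might look like the sketch below. It is an illustration under stated assumptions, not the patented implementation: chromosomes are bit strings, a "gene" is a sector of `SECTOR` bits (a hypothetical width), `mutation_share` is a hypothetical probability that a parent is mutated, and `fitness` is a caller-supplied function standing in for the measured labeling effectiveness.

```python
import random

SECTOR = 4  # hypothetical gene (sector) width in bits


def mutate(chromosome: str, n_mutations: int, rng: random.Random) -> str:
    """Re-draw a preset number of randomly chosen bits; a bit may keep its value."""
    bits = list(chromosome)
    for pos in rng.sample(range(len(bits)), n_mutations):
        bits[pos] = rng.choice("01")
    return "".join(bits)


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Build a child by taking each sector from a randomly chosen parent."""
    if len(a) != len(b) or len(a) % SECTOR:
        raise ValueError("parents must share a length that is a multiple of SECTOR")
    return "".join(rng.choice((a, b))[i:i + SECTOR] for i in range(0, len(a), SECTOR))


def next_generation(parents, n_offspring, n_mutations, mutation_share, fitness, rng):
    """Mutate some parents, cross random pairs, and return the fittest candidate."""
    pool = [mutate(p, n_mutations, rng) if rng.random() < mutation_share else p
            for p in parents]
    offspring = [crossover(*rng.sample(pool, 2), rng) for _ in range(n_offspring)]
    # The current parameter sets stay in the comparison, so the result is never worse.
    return max(offspring + list(parents), key=fitness)
```

Keeping the current parents among the candidates reflects the statement that an offspring is adopted only if it is the most effective, including relative to the parameters already in use.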

A locality-sensitive function common to all blocks is used to check the blocks, and a new configuration of the training algorithms is used to determine the hyperplane. Before further use, the result is evaluated either on a test sequence or by comparing labeling results in real time, considering not only how effectively dangerous blocks are detected but also the rates of false-positive and false-negative blocks. Note that filtering is a preliminary operation within the overall network security system, so letting a certain percentage of dangerous blocks through does not cause critical failures of external systems. The method reduces the load in the case of threats to which the external system has developed a notional "immunity".
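The hyperplane check and the count of false positives and false negatives can be sketched as follows. The sketch assumes the compressed section is given as a tuple of bit values and that the hyperplane is described by a weight vector and a bias (hypothetical names); how those coefficients are trained is outside the sketch.

```python
def is_flagged(bits, weights, bias) -> bool:
    """True if the point whose coordinates are the bit values lies on the 'dangerous' side."""
    if len(bits) != len(weights):
        raise ValueError("one weight is required per bit of the compressed section")
    return sum(w * b for w, b in zip(weights, bits)) + bias > 0


def count_nonconforming(samples, weights, bias):
    """Return (false_positives, false_negatives) over labeled (bits, dangerous) pairs."""
    false_pos = false_neg = 0
    for bits, dangerous in samples:
        flagged = is_flagged(bits, weights, bias)
        if flagged and not dangerous:
            false_pos += 1
        elif dangerous and not flagged:
            false_neg += 1
    return false_pos, false_neg
```

The sum of the two counts is one possible quantity to minimize when the hyperplane coordinates are refined or when candidate window parameters are compared.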

Manufacturers of hardware and software systems constantly update their data and equipment protection tools, including by blocking dangerous code. For this reason, the initial tuning of the system parameters handled by automatic configuration or "training" is preferably performed on hardware and software that contains no protection against malicious code. The configured or trained system can then analyze a larger volume of threat-relevant data. Subsequent use of the method on data blocks of real traffic, where "training" relies on information about the behavior of real systems, excludes from consideration data blocks that carry potentially dangerous code posing no threat to the network systems. At the same time, structures that follow general patterns of dangerous code construction, for example those related to "zero-day" threats, can be detected effectively.

When the genetic algorithm is used, the genes are "sequences" in the form of chains of bits or bytes of a given length, in which bits may be changed at any position in the chain. In particular, for example when bits are grouped into bytes or into other bit structures, the genetic algorithm makes it possible to identify significant bytes that do not affect the results of the analysis, or whose structure has been determined not to affect the task at hand.

The genetic algorithm works by forming combinations of "genes" with minimal time expenditure. For example, genes with minimal effectiveness are excluded from consideration; for the remaining genes, the sections that affect the task are identified where possible by applying them sequentially, and these sections are then combined. The operations are performed randomly and, in the general case, there is no final stage. Nevertheless, tuning of the genetic algorithm can be used to update the security system to account for new threat events and the elimination of old threats. Moreover, the proposed method can be used to identify code that negatively affects the operation of computing systems but does not result from the actions of attackers. In this case, code analysis makes it possible to identify potentially dangerous sections of developed code during testing.

At the next stage, the selected windows are analyzed for sections of data with low entropy, that is, for data whose bit sequence indicates an artificial origin. Such data may be sections representing text in natural or programming languages, or data sequences of a repetitive nature. In the context of the proposed method, low entropy indicates a high degree of organization or predictability of the data.

Low-entropy data sections are used for subsequent analysis. Data sections or selected windows in which no patterns in the data distribution are found are not used in further analysis.
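Selecting the lowest-entropy section of a window can be sketched as below. The sketch assumes Shannon entropy over byte frequencies as the entropy measure and a fixed section length (a hypothetical parameter); the description does not prescribe a specific estimator, and the claims also allow a compression ratio to serve as the entropy parameter.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return sum(c / n * math.log2(n / c) for c in Counter(data).values())


def lowest_entropy_section(window: bytes, section_len: int) -> bytes:
    """Return the contiguous section of the window with minimal entropy."""
    if not 0 < section_len <= len(window):
        raise ValueError("section_len must be between 1 and the window length")
    candidates = (window[i:i + section_len]
                  for i in range(len(window) - section_len + 1))
    return min(candidates, key=shannon_entropy)
```

A window whose best section still has entropy close to the maximum shows no detectable regularity and, following the text above, would be dropped from further analysis.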

In a particular case, the method can be used with locality-sensitive hashing (LSH), a probabilistic method for reducing the dimensionality of multidimensional data whose basic principle is to choose hash functions over certain dimensions so that similar objects have the same hash with high probability.

In a particular implementation, the hash function can be specified in tabular form.

The MinHash, SimHash, random projection, and tlsh algorithms can be used to compute the LSH.
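As an illustration of one of the listed options, the following is a minimal SimHash-style sketch. The shingle size, the 64-bit fingerprint length, and the use of BLAKE2b as the per-shingle hash are assumptions made for the example and are not taken from the description; tlsh and the other listed algorithms work differently.

```python
import hashlib


def simhash(data: bytes, bits: int = 64, shingle: int = 4) -> int:
    """SimHash-style fingerprint built from overlapping byte shingles."""
    if bits % 8 or not 8 <= bits <= 512:
        raise ValueError("bits must be a multiple of 8 between 8 and 512")
    if len(data) < shingle:
        shingles = [data]
    else:
        shingles = [data[i:i + shingle] for i in range(len(data) - shingle + 1)]
    votes = [0] * bits
    for s in shingles:
        h = int.from_bytes(hashlib.blake2b(s, digest_size=bits // 8).digest(), "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set when most shingle hashes have that bit set.
    return sum(1 << i for i, v in enumerate(votes) if v > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Inputs that share most of their shingles tend to produce fingerprints with a small Hamming distance, which is the property that lets the fixed-length fingerprint stand in for the original section in the classifier.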

A study of the proposed algorithm showed that applying the genetic algorithm gives a significant advantage in false positives, of 12%, over hashing with unchanged hashing parameters, as compared across various hashing parameters, with comparable attack detection quality, i.e. number of effective detections. At the same time, during the stages in which the genetic algorithm is applied, for example as the number of training detections grows, the algorithm increases the processor load by 72% relative to standard algorithms, but it reduces the processor load in steady-state traffic processing because the dimensionality of the processed data is reduced by 56%.

Although the invention has been described with reference to the disclosed embodiments, it should be apparent to those skilled in the art that the specific experiments described in detail are provided merely to illustrate the present invention and should not be regarded as limiting its scope in any way. It should be understood that various modifications are possible without departing from the essence of the present invention.

Claims

1. A method for marking potentially dangerous data blocks, in which, using a digital representation of data implemented as a set of memory cells whose physical state corresponds to the represented data stored in the memory of a hardware and software complex, the following stages of signal conversion and data processing are carried out:

sequentially receiving data blocks presented as blocks of structured information transmission units intended for transmission over a computer network, after which the received data blocks are transmitted over the computer network, where the structure of the data blocks is determined by the protocol for data transmission over the computer network;

specifying parameters of data samples, where a data sample is a section of a data block consisting of consecutive information transmission units, and the specified sample parameters contain the sample length and the starting address characterizing the position of the first information transmission unit in the sample, wherein the specified parameters of the data samples determine the number of data samples in each data block, presented in a binary data storage format, the starting address of the first data sample in each of the data blocks, and the offsets of the starting addresses of subsequent data samples relative to the starting address of the first data sample in each of the data blocks;

specifying a method for converting data sections represented by a sequence of information transmission units, which converts the sequence of information transmission units of each data section into a sequence of a predetermined number of bits of a compressed data section in such a way that, for data blocks having a similar dangerous effect on computer network devices, when compressed data sections are compared according to specified criteria, similar data sections correspond to similar data blocks;

(A) forming, for the data blocks, separate sets of data samples from the sequentially received data blocks in accordance with the specified parameters of the data samples;

selecting from each data sample a data section such that the selected data section is a sequence of information transmission units whose entropy is minimal among the possible data sections of the sample;

converting the data sections into compressed data sections;

(B) checking the compressed data sections of the data blocks for probable membership in the compressed data sections of dangerous data blocks using a pre-configured classifier of compressed data sections, for which purpose, in a multidimensional space whose number of dimensions equals the number of bits in the compressed sections, the coordinates of a multidimensional plane are specified such that the majority of points with the coordinates of the compressed data sections of known dangerous data blocks lie on a first side of the multidimensional plane, the majority of points with the coordinates of the compressed sections of known non-dangerous data blocks lie on the second, opposite side of the multidimensional plane, and the coordinates of a compressed section in the multidimensional space are given by the values of the bits located at the corresponding positions of the digital representation of the compressed section, where, when a compressed section of a data block lies on the first side of the multidimensional plane, a notification of the potential danger of the data block is generated and the corresponding data block is transmitted over the computer network together with the notification of the potential danger of the block;

wherein:

(C) checking the danger of blocks for which a potential danger notification has been generated and checking the safety of blocks for which no potential danger notification has been generated;

when non-conforming blocks are detected within a predetermined number of blocks, namely dangerous blocks for which no potential danger notification was generated and safe blocks for which a potential danger notification was generated, refining the coordinates of the multidimensional plane used in step (B) so as to ensure a minimum number of non-conforming blocks in the subsequent predetermined number of blocks;

(D) if the minimum number of non-conforming blocks in a predetermined number of blocks does not change when the coordinates of the multidimensional plane are refined, changing the sample parameters of step (A) to ensure the minimum number of non-conforming blocks, as follows:

randomly selecting sets of previously used sample parameters;

forming, from a random pair of sets of previously used parameters, a resulting set of parameters such that the significant bits of the parameters are grouped into sectors of a predetermined length with predetermined addresses, and the bits of the parameters of the resulting set are formed from the sectors of the parameters of the random pair by randomly selecting, for each address, the sector of the corresponding parameter from one set of the random pair;

selecting, from several sets of parameters among the previously used sets and the resulting set, the best set of parameters, for which the minimum number of non-conforming blocks is obtained in step (C), and using the best set of parameters in step (A).

2. The method according to claim 1, characterized in that the operations of the method are used in networks where the data blocks are network data blocks in the format of packets of a packet data network.

3. The method according to claim 2, characterized in that the operations of the method are used on the Internet or a local area network.

4. The method according to any of the preceding claims, characterized in that the data blocks are grouped information transmission units, each 1 byte in size.

5. The method according to any of the preceding claims, characterized in that the sample parameters are selected so that the sequences of information transmission units making up different samples may overlap.

6. The method according to any of the preceding claims, characterized in that the number of samples in a packet is determined by the packet length in accordance with the step and length of the data samples.

7. The method according to any of the preceding claims, characterized in that the parameters of one of the samples are formed by inverting all bits of randomly selected sectors of one of the used sets of parameters.

8. The method according to any of the preceding claims, characterized in that, before the method according to claim 1 is used, the parameters of the method are preliminarily tuned, for which the initial parameters of the data samples are set randomly and the method according to claim 1 is carried out on sequentially received data blocks of a training sequence of blocks, the blocks of which are known to be dangerous or safe data blocks.

9. The method according to any of the preceding claims, characterized in that the training sequence of blocks is a finite number of blocks that may be received multiple times, the data blocks to be received being selected randomly.

10. The method according to any of the preceding claims, characterized in that the method is suspended at any of its stages and the operations of the method according to claim 1 are performed to check the quality of marking of potentially dangerous data blocks, for which purpose data blocks of a test sequence of data blocks are received, which differ from the blocks of the training sequence and are known to be dangerous or safe, and, if more than 70% of the known dangerous blocks are marked and more than 70% of the known safe blocks are not marked, the settings of the method are used for marking network traffic blocks; otherwise, additional preliminary tuning of the parameters of the method is performed.

11. The method according to claim 1, characterized in that stage (D) is performed using the sequence of operations that implements "genetic algorithm" technology on software and hardware computing equipment.

12. The method according to any of the preceding claims, characterized in that data sections are converted into compressed data sections using the sequence of operations that implements locality-sensitive hashing (LSH) technology.

13. The method according to claim 12, characterized in that the sequence of operations of the "Trend Micro Locality Sensitive Hash" (tlsh) technology is used as the locality-sensitive hashing technology.

14. The method according to any of the preceding claims, characterized in that the coordinates of the multidimensional plane are determined using the sequence of operations that implements support vector machine (SVM) technology.

15. The method according to any of the preceding claims, characterized in that the data blocks are received and transmitted via an electromagnetic signal transmission medium in the form of structured electromagnetic signal packets, where the electromagnetic signal packets are converted into a digital representation of the data blocks and the digital representation of the data blocks is converted into electromagnetic signal packets.

16. The method according to any of the preceding claims, characterized in that data sections are compressed using a given data compression technology, and the maximum possible degree of compression of data sections by the given data compression technology is used as the entropy parameter, wherein an increase in the degree of compression, i.e. the ratio of the volume of the original data section to the volume of its compressed representation, characterizes a decrease in entropy.

17. The method according to any of the preceding claims, characterized in that the compression technology used is a lossy data compression technology.

18. The method according to any of the preceding claims, characterized in that a hash function with changeable parameters is used as the dimensionality-reducing hash function, and at step (D) the values of the changeable parameters of the hash function are used as one of the sets of sample parameters.

19. The method according to claim 18, characterized in that the length of the resulting value of the hash function is used as a parameter of the hash function.