
CN119854325A - Distributed storage method and system based on AI flow identification - Google Patents

Distributed storage method and system based on AI flow identification

Info

Publication number
CN119854325A
Authority
CN
China
Prior art keywords
node
gnn
cdn network
network node
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510340158.1A
Other languages
Chinese (zh)
Inventor
杨建军
高欣
陆政
陈叶华
周洪印
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
HANGZHOU TRINET INFORMATION TECHNOLOGY CO LTD
Original Assignee
HANGZHOU TRINET INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by HANGZHOU TRINET INFORMATION TECHNOLOGY CO LTD filed Critical HANGZHOU TRINET INFORMATION TECHNOLOGY CO LTD
Priority to CN202510340158.1A priority Critical patent CN119854325A
Publication of CN119854325A publication Critical patent CN119854325A
Pending legal-status Critical Current

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • H04L45/03 Topology update or discovery by updating link state protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • H04L45/08 Learning-based routing, e.g. using neural networks or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/38 Flow based routing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/44 Distributed routing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a distributed storage method and system based on AI traffic identification. The method comprises: obtaining state information of a current CDN network node and constructing a first GNN node in a GNN model by simulation; constructing adjacent nodes of the GNN model according to communication relationships between the current CDN network node and different CDN network nodes; obtaining communication traffic information between the first GNN node and the corresponding adjacent nodes; identifying traffic feature data in the communication traffic information with an AI model and constructing a dynamic adjacency matrix between the first GNN node and the adjacent nodes; updating the states of the first GNN node and the adjacent nodes with the dynamic adjacency matrix; executing the caching policy or traffic processing policy of the corresponding CDN network node according to the updated states of the first GNN node and the adjacent nodes; and routing the current CDN network node to the optimal CDN network node according to the caching policy or traffic processing policy.

Description

Distributed storage method and system based on AI flow identification
Technical Field
The invention relates to the technical field of distributed storage, in particular to a distributed storage method and system based on AI flow identification.
Background
Existing conventional distributed storage methods achieve data redundancy and backup by storing data on servers in multiple physical locations: even if one server fails, the copies stored on other servers remain available, ensuring data persistence and system stability. However, traditional distributed storage does not comprehensively analyze traffic; for example, data traffic carrying network threats or illegal content is still stored according to the distributed data slicing method, so the stored data is low in security and cleanliness, and storage resources are easily occupied maliciously when a server node is attacked. Meanwhile, in the process of scheduling storage resources, traditional distributed storage does not effectively analyze data traffic, so the distributed storage efficiency is low.
Disclosure of Invention
One object of the invention is to provide a distributed storage method and system based on AI traffic identification, which use an AI model to identify traffic data of different storage nodes. The AI model identifies network attack data and illegal content data that may exist in the traffic data, and the traffic is intercepted or scheduled based on the identified network attack data and illegal content data, thereby achieving targeted handling of network attack traffic and illegal content traffic, improving the data security of distributed storage, reducing the occupation of distributed storage resources by illegal content data, and improving the storage effect of the distributed data.
Another object of the invention is to provide a distributed storage method and system based on AI traffic identification, which use a CDN (content delivery network) as the distributed storage system, use the AI model to identify and analyze the traffic of different CDN network nodes to obtain the network attack data and illegal content data in the traffic of the different CDN network nodes, and carry out the corresponding CDN network node caching policies based on the network attack data and illegal content data.
A further object of the invention is to provide a distributed storage method and system based on AI traffic identification, which use a GNN model (graph neural network model) to simulate the node behaviors of CDN network nodes, including but not limited to traffic data. The relevant data of a CDN network node serve as the node features in the GNN model, and the results of AI analysis of the traffic data between different CDN network nodes serve as the edge features (adjacency matrix) between different nodes in the GNN model; the GNN model constructs a dynamic adjacency matrix based on the traffic features identified by the AI model in order to update the data of the corresponding CDN network nodes or execute a caching policy.
In order to achieve at least one of the above objects, the present invention further provides a distributed storage method based on AI traffic identification, the method comprising:
Acquiring current CDN network node state information, constructing a first GNN node in a GNN model by simulation according to the current CDN network node state information, and constructing adjacent nodes of the GNN model according to communication relationships between the current CDN network node and different CDN network nodes;
Acquiring communication traffic information between the first GNN node and the corresponding adjacent nodes, identifying traffic feature data in the communication traffic information by using an AI model, and constructing a dynamic adjacency matrix between the first GNN node and the adjacent nodes according to the traffic feature data;
Updating the states of the first GNN node and the adjacent nodes by using the dynamic adjacency matrix, and executing a caching policy or traffic processing policy of the corresponding CDN network node according to the updated states of the first GNN node and the adjacent nodes;
Acquiring a current CDN network node routing link table, and routing the current CDN network node to the optimal CDN network node according to the caching policy or the traffic processing policy.
According to one preferred embodiment of the invention, the current CDN network node state information comprises a hardware resource state, a network resource state and a cache state. The hardware resource state comprises the CPU utilization, memory utilization and read-write performance data of the cache disk; the network resource state comprises the uplink and downlink bandwidth utilization of the current CDN network node, the packet loss rate of the current CDN network node, the network request rate, the number of concurrent connections and the number of URLs; the cache state comprises the cache hit rate, cache eviction rate, cache warm-up state and cached content type. The hardware resource state, network resource state and cache state are each preprocessed to obtain the feature matrix of the first GNN node, and the dynamic adjacency matrix is constructed according to the traffic feature data between the first GNN node and the corresponding adjacent nodes identified by the AI model.
According to another preferred embodiment of the invention, the construction of the dynamic adjacency matrix comprises: identifying and obtaining the network traffic feature data between the first GNN node and the corresponding adjacent nodes by using AI models including an image recognition AI model, a text recognition AI model, a speech recognition AI model and a network attack recognition AI model, wherein the traffic feature data comprise illegal video frame data, illegal voice data, illegal text data, and the type and number of network attacks; preprocessing the different traffic feature data and converting them into the dynamic adjacency matrix; and updating the states of the first GNN node and the corresponding adjacent nodes according to the dynamic adjacency matrix.
According to another preferred embodiment of the present invention, the method for updating the states of the first GNN node and the corresponding adjacent nodes includes constructing a multi-head attention coefficient from the traffic feature data obtained by AI-model traffic identification, α_ij = Softmax_j(LeakyReLU(a^T [W h_i ∥ W h_j])), wherein i denotes the current first GNN node, j denotes an adjacent node having a communication relationship with the first GNN node, h_i denotes the first GNN node feature, a and W denote different learnable parameters, W being a weight matrix and a being an action parameter converted from the traffic feature data between nodes, T denotes the matrix transpose, LeakyReLU denotes the activation function, Softmax denotes the linear classification function, and the attention coefficient α_ij denotes the similar-feature weight of the first GNN node and the corresponding adjacent node.
According to another preferred embodiment of the present invention, the method for updating the states of the first GNN node and the corresponding adjacent nodes further includes computing the multi-head attention coefficients α_ij in parallel to capture the relationship types between the first GNN node i and the adjacent nodes j. Computing the update of the first GNN node from the multi-head attention coefficients α_ij includes constructing the dynamic adjacency matrix A_dynamic[i,j]_k = α_ij^k, wherein k denotes the corresponding attention head, replacing the original static adjacency matrix A_static with the dynamic adjacency matrix A_dynamic[i,j]_k, and, after the replacement, updating the first GNN node according to h_i' = σ((1/K) Σ_{k=1}^{K} Σ_{j∈N(i)} A_dynamic[i,j]_k W^k h_j), wherein h_i' denotes the updated predicted first GNN node state, N denotes the node relationship function, K denotes the total number of attention heads, and σ is an activation function; the caching policy or traffic processing policy of the corresponding current CDN network node is executed based on the updated predicted first GNN node state h_i'.
According to another preferred embodiment of the present invention, the caching policy or traffic processing policy includes: based on the updated predicted first GNN node state h_i', updating the predicted cache state of the current CDN network node so as to allow incoming caching of data from the corresponding adjacent node; rate-limiting network attack data that the current CDN network node predicts to originate from the corresponding adjacent node, or diverting such data to a specific CDN network node; and applying policies including rate limiting, access prohibition and cache prohibition to violating data that the current CDN network node predicts to originate from the corresponding adjacent node.
According to another preferred embodiment of the present invention, the CDN network nodes include a central CDN network node and edge CDN network nodes. After the current CDN network node routing link table is obtained, the central CDN network node determines the edge CDN network node closest to the current CDN network node according to the routing link table, and transmits the data of the adjacent node to that closest edge CDN network node through route conversion according to the caching policy, so as to achieve efficient processing of the cached data.
According to another preferred embodiment of the present invention, the method for calculating the distance L from the current CDN network node to the closest edge CDN network node includes obtaining the round-trip time RTT from the current CDN network node to any one edge CDN network node, calculating the hop count Hops to that edge CDN network node, and calculating the bandwidth utilization P of the current CDN network node, whereupon the distance L is computed as L = β·RTT + λ·Hops + γ·P, wherein β, λ and γ are respective weight coefficients.
In order to achieve at least one of the above objects, the present invention further provides a distributed storage system based on AI traffic identification, which performs the above-described distributed storage method based on AI traffic identification.
The present invention further provides a computer-readable storage medium storing a computer program that is executed by a processor to implement the above-described AI-traffic-identification-based distributed storage method.
Drawings
Fig. 1 shows a flow chart of a distributed storage method based on AI flow identification of the invention.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention. The preferred embodiments in the following description are by way of example only and other obvious variations will occur to those skilled in the art. The basic principles of the invention defined in the following description may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
It will be understood that the terms "a" and "an" should be interpreted as referring to "at least one" or "one or more," i.e., in one embodiment, the number of elements may be one, while in another embodiment, the number of elements may be plural, and the term "a" should not be interpreted as limiting the number.
Referring to fig. 1, the invention discloses a distributed storage method and system based on AI traffic identification. In the method, CDN network node state information is first acquired, and GNN nodes (graph neural network nodes) are constructed by feature-conversion simulation from the CDN network node state information. The CDN network nodes are content delivery network nodes and include a central CDN network node and edge CDN network nodes; the central CDN network node is equivalent to a master control node and is used to monitor a plurality of edge CDN network nodes and to perform content delivery caching for them, and it can also perform traffic control among the plurality of edge CDN network nodes or directionally deliver the content of corresponding edge CDN network nodes to a specific CDN network node. The invention uses the CDN network nodes for distributed storage and carries out caching policies and traffic control policies in different modes according to the traffic characteristics between different CDN network nodes, in particular caching policies and traffic control policies aimed at traffic containing network attacks and violation data. The invention uses a GNN model (graph neural network model) as the driving model of the distributed storage strategy: the GNN model simulates the states of the CDN network nodes and the relationships between different CDN network nodes, the adjacency relationships of the GNN model (equivalent to the connecting edges between GNN model nodes) are constructed from the traffic feature data between different CDN network nodes, the state changes of the GNN model nodes are judged according to these adjacency relationships, and the corresponding caching policies and traffic control policies of the CDN network nodes are executed based on the updated predicted states of the GNN model nodes. Therefore, the invention can effectively improve the security of the distributed cache, effectively reduce the malicious occupation of distributed storage resources caused by abnormal traffic, and improve the resource utilization of the distributed storage.
The method for acquiring the CDN network node state information to construct the GNN model node specifically comprises acquiring the CDN network node state information, which includes a hardware resource state, a network resource state and a cache state. The hardware resource state includes, but is not limited to, the CPU utilization, GPU utilization, memory utilization and cache-disk read-write performance data of the current CDN network node, and is used to characterize the overall performance of the edge device on which the current CDN network node resides; in general, the CDN network nodes of edge devices serve as the distributed cache hardware resources, and the hardware deployed on different edge devices differs in performance. The network resource state of the CDN network node is automatically identified according to changes in the network environment of the current CDN network node; for example, during high-concurrency interaction among multiple CDN network nodes, network bandwidth may be insufficient and the network request rate may drop, so identifying the network resource state provides, to a certain extent, reference data for the CDN network node caching policy and traffic limiting policy and thereby improves the distributed storage efficiency of the CDN network nodes as a whole. The cache state includes, but is not limited to, the cache hit rate, cache eviction rate, cache warm-up state and cached content type. The cache hit rate describes the performance of the cache system; in a specific service scenario, a higher cache hit rate means the cache system of the CDN network node distributes the current service content requests more effectively, so that the CDN network node of the edge device is highly adapted to the corresponding service content. The cache hit rate can therefore serve as a state parameter of the GNN graph node to better describe the adaptability of different CDN network nodes to the related service traffic. Similarly, the state parameters converted from the cache eviction rate, cache warm-up state and cached content type also describe how well the CDN network node of the corresponding edge device is adapted to the corresponding service content, which is not described in detail in the present invention.
Furthermore, after the relevant state data of the hardware resource state, network resource state and cache state of the current CDN network node are obtained, data preprocessing including, but not limited to, normalization, standardization and feature transformation is applied so that the relevant state data of the current CDN network node become more regular, and the normalized state data are classified by label encoding (LabelEncoder) to construct different features. Label encoding (LabelEncoder) is an existing method for constructing classification features, which is not described in detail in the present invention. The GNN graph node feature matrix is constructed from the label-encoded state data of the hardware resource state, network resource state and cache state of the current CDN network node; this feature matrix is defined as the feature matrix of the first GNN node and describes the relevant state data of the current CDN network node.
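As an illustrative, non-limiting sketch of the feature construction described above (the field names, value ranges and the use of scikit-learn's LabelEncoder are assumptions of this example, not requirements of the invention), the state of one CDN network node could be converted into a first-GNN-node feature vector as follows:

    import numpy as np
    from sklearn.preprocessing import LabelEncoder

    # Hypothetical raw state of one CDN network node (illustrative names only).
    node_state = {
        "cpu_usage": 0.62,          # hardware resource state
        "mem_usage": 0.48,
        "disk_read_mbps": 410.0,
        "bandwidth_usage": 0.71,    # network resource state
        "packet_loss": 0.013,
        "cache_hit_rate": 0.87,     # cache state
        "cache_content_type": "video",
    }

    def build_feature_vector(state, content_types=("video", "audio", "text", "other")):
        """Normalize numeric state values and label-encode the categorical one."""
        encoder = LabelEncoder().fit(list(content_types))
        numeric = np.array([v for k, v in state.items() if k != "cache_content_type"],
                           dtype=float)
        # Simple min-max style scaling; real preprocessing would use fitted statistics.
        numeric = (numeric - numeric.min()) / (numeric.max() - numeric.min() + 1e-9)
        content_code = encoder.transform([state["cache_content_type"]]).astype(float)
        return np.concatenate([numeric, content_code])

    h_i = build_feature_vector(node_state)   # feature vector of the first GNN node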
It should be noted that one of the core technical points of the present invention is the construction of the adjacency matrix of the first GNN node. The adjacency matrix of the first GNN node is a matrix describing the relationship between the first GNN node and its adjacent nodes, where the adjacent nodes and the first GNN node are defined as CDN network nodes having a communication relationship, in particular CDN network nodes having a content delivery scheduling relationship. The adjacency matrix contains the traffic feature data between the first GNN node and the adjacent nodes, and the adjacency matrix constructed from these traffic feature data is used to update and predict the state changes of the first GNN node and the adjacent nodes. It should also be noted that the traffic data may flow from the first GNN node to an adjacent node or from an adjacent node to the first GNN node; the above designations may be swapped according to the actual direction of traffic transmission, which is not specifically limited in the present invention.
In the invention, the traffic feature data present in the traffic data need to be identified. Since the traffic data may include, but are not limited to, network attack data and illegal video, audio and text, the invention uses existing AI models, including an image recognition AI model, a text recognition AI model, a speech recognition AI model and a network attack recognition AI model, to identify and obtain the network traffic feature data between the first GNN node and the corresponding adjacent nodes. The traffic features identified by these existing AI models are output as traffic feature labels, and the traffic feature matrix is then constructed from these labels in the same label encoding (LabelEncoder) manner. The traffic feature labels include, but are not limited to, illegal video frame data, illegal voice data, illegal text data, and the type and number of network attacks.
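As a similarly hedged sketch, the labels produced by the traffic-recognition AI models for one pair of communicating nodes could be flattened into an edge feature vector before being converted into the dynamic adjacency matrix; the category names and the log compression below are illustrative assumptions:

    import numpy as np

    # Hypothetical per-category findings of the AI models for traffic between node i and node j.
    traffic_labels = {
        "illegal_video_frames": 3,
        "illegal_audio_segments": 0,
        "illegal_text_messages": 1,
        "network_attacks": {"ddos": 12, "sql_injection": 0},
    }

    def edge_features(labels):
        """Flatten the per-category traffic findings into one edge feature vector."""
        attack_total = sum(labels["network_attacks"].values())
        vec = np.array([
            labels["illegal_video_frames"],
            labels["illegal_audio_segments"],
            labels["illegal_text_messages"],
            attack_total,
        ], dtype=float)
        return np.log1p(vec)  # compress heavy-tailed counts before feeding the GNN

    e_ij = edge_features(traffic_labels)  # edge features between GNN nodes i and j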
It should be noted that, in a conventional GNN model, the adjacency matrix of the GNN network nodes is generally a static matrix used for message passing to update the GNN network node states; for example, the conventional GNN adjacency-matrix message passing mechanism can be written as h_i^(l+1) = σ(Σ_{j∈N(i)} A_static[i,j] W h_j^(l)), wherein A_static is the conventional static adjacency matrix, h_i^(l) is the GNN node state before the update and h_i^(l+1) is the updated predicted GNN node state (both in the form of node state feature matrices), N denotes the node relationship function, i and j denote different nodes, and W is a learnable parameter, generally the weight matrix corresponding to the node state features.
In order to adapt to dynamic traffic characteristics and to dynamically update and drive the GNN network node states, the invention replaces the conventional static adjacency matrix A_static with a dynamic adjacency matrix A_dynamic[i,j]_k based on the traffic feature data. The dynamic adjacency matrix A_dynamic[i,j]_k is realized as follows: a multi-head attention coefficient is constructed from the traffic feature data obtained by AI-model traffic identification, α_ij = Softmax_j(LeakyReLU(a^T [W h_i ∥ W h_j])), wherein i denotes the current first GNN node, j denotes an adjacent node having a communication relationship with the first GNN node, h_i denotes the first GNN node feature, a and W denote different learnable parameters, W being a weight matrix and a being an action parameter converted from the traffic feature data between nodes, T denotes the matrix transpose, LeakyReLU denotes the activation function, Softmax denotes the linear classification function, and the attention coefficient α_ij denotes the similar-feature weight of the first GNN node and the corresponding adjacent node. The multi-head attention coefficients α_ij constructed from the traffic feature data are further used to construct the dynamic adjacency matrix A_dynamic[i,j]_k; since the multi-head attention coefficients α_ij express the influence of different types of traffic features on the update of the first GNN node and the corresponding adjacent nodes, the dynamic adjacency matrix A_dynamic[i,j]_k can be constructed directly through parameter conversion.
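A minimal sketch of the attention-coefficient computation, written against the formula above in plain NumPy (the matrix shapes and the 0.2 LeakyReLU slope are assumptions of the example):

    import numpy as np

    def leaky_relu(x, slope=0.2):
        return np.where(x > 0, x, slope * x)

    def attention_coefficients(h, W, a, neighbors_of_i, i):
        """GAT-style coefficients alpha_ij over the neighbors of node i.

        h: (num_nodes, F) node feature matrix; W: (F, F') weight matrix;
        a: (2*F',) attention vector converted from traffic feature data.
        """
        Wh = h @ W
        scores = np.array([
            leaky_relu(a @ np.concatenate([Wh[i], Wh[j]]))
            for j in neighbors_of_i
        ])
        scores -= scores.max()                       # numerical stability for Softmax
        alpha = np.exp(scores) / np.exp(scores).sum()
        return dict(zip(neighbors_of_i, alpha))      # {j: alpha_ij}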
Furthermore, the update of the first GNN node is computed from the multi-head attention coefficients α_ij as follows: the dynamic adjacency matrix A_dynamic[i,j]_k = α_ij^k is constructed from the multi-head attention coefficients, wherein k denotes the corresponding attention head; the dynamic adjacency matrix A_dynamic[i,j]_k replaces the original static adjacency matrix A_static; after the replacement, the first GNN node is updated according to h_i' = σ((1/K) Σ_{k=1}^{K} Σ_{j∈N(i)} A_dynamic[i,j]_k W^k h_j), wherein h_i' denotes the updated predicted first GNN node state, N denotes the node relationship function, K denotes the total number of attention heads and σ is an activation function; the caching policy or traffic processing policy of the corresponding current CDN network node is executed based on the updated predicted first GNN node state h_i'.
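The corresponding node-state update, again as a hedged sketch of the formula above (tanh is used here as one possible choice of the activation σ):

    import numpy as np

    def update_node_state(h, W_heads, A_dynamic, i, neighbors_of_i):
        """Average-over-heads update of node i using A_dynamic[i, j, k].

        h: (N, F) node features; W_heads: list of K weight matrices of shape (F, F');
        A_dynamic: (N, N, K) dynamic adjacency matrix built from the attention coefficients.
        """
        K = len(W_heads)
        agg = np.zeros(W_heads[0].shape[1])
        for k, Wk in enumerate(W_heads):
            for j in neighbors_of_i:
                agg += A_dynamic[i, j, k] * (h[j] @ Wk)
        return np.tanh(agg / K)   # h_i': updated predicted state of the first GNN node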
The caching policy or traffic processing policy includes the following: based on the updated predicted first GNN node state h_i', updating the predicted cache state of the current CDN network node so as to allow incoming caching of data from the corresponding adjacent node; rate-limiting network attack data that the current CDN network node predicts to originate from the corresponding adjacent node, or diverting such data to a specific CDN network node; and applying policies including rate limiting, access prohibition and cache prohibition to violating data that the current CDN network node predicts to originate from the corresponding adjacent node. The above caching policies and traffic control policies are merely exemplary, and the invention is not limited thereto.
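One purely illustrative way to map the updated predicted node state onto the policies listed above; the score names and thresholds are assumptions of the example, not values defined by the invention:

    def apply_node_policy(predicted_state, attack_thr=0.7, violation_thr=0.5):
        """Map the updated node state to a caching / traffic-handling action."""
        if predicted_state["attack_score"] > attack_thr:
            return "rate_limit_or_redirect"      # divert suspected network attack traffic
        if predicted_state["violation_score"] > violation_thr:
            return "block_access_and_cache"      # refuse to cache violating content
        return "allow_incoming_cache"            # normal content: admit into the cache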
In one preferred embodiment of the present invention, in order to find the CDN network node closest to the current CDN network node, after the current CDN network node routing link table is obtained, the central CDN network node determines the edge CDN network node closest to the current CDN network node according to the routing link table, and transmits the data of the adjacent node to that closest edge CDN network node through route conversion according to the caching policy, thereby achieving efficient processing of the cached data. The distance L from the current CDN network node to the closest edge CDN network node is calculated as follows: the round-trip time RTT from the current CDN network node to any one edge CDN network node is obtained, the hop count Hops to that edge CDN network node is calculated, and the bandwidth utilization P of the current CDN network node is calculated; the distance L is then computed as L = β·RTT + λ·Hops + γ·P, wherein β, λ and γ are respective weight coefficients.
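A short sketch of the distance computation and of selecting the closest edge CDN network node, assuming the weighted-sum form of L given above and illustrative weight values:

    def node_distance(rtt_ms, hops, bandwidth_util, beta=0.5, lam=0.3, gamma=0.2):
        """L = beta*RTT + lambda*Hops + gamma*P (weights are example values)."""
        return beta * rtt_ms + lam * hops + gamma * bandwidth_util

    def closest_edge_node(candidates):
        """candidates: list of (node_id, rtt_ms, hops, bandwidth_util) tuples."""
        return min(candidates, key=lambda c: node_distance(c[1], c[2], c[3]))[0]

    # Example: three edge nodes as seen from the current CDN network node.
    edges = [("edge-a", 18.0, 3, 0.40), ("edge-b", 25.0, 2, 0.55), ("edge-c", 12.0, 5, 0.80)]
    best = closest_edge_node(edges)   # edge node with the smallest weighted distance L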
The processes described above with reference to flowcharts may be implemented as computer software programs in accordance with the disclosed embodiments of the application. Embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via a communication portion, and/or installed from a removable medium. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU). The computer readable medium of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wire segments, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless segments, radio lines, fiber optic cables, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be understood by those skilled in the art that the embodiments of the present invention described above and shown in the drawings are merely illustrative and not restrictive of the current invention, and that this invention has been shown and described with respect to the functional and structural principles thereof, without departing from such principles, and that any modifications or adaptations of the embodiments of the invention may be possible and practical.

Claims (10)

1. A distributed storage method based on AI traffic identification, characterized in that the method comprises: obtaining current CDN network node state information, constructing a first GNN node in a GNN model by simulation according to the current CDN network node state information, and constructing adjacent nodes of the GNN model according to communication relationships between the current CDN network node and different CDN network nodes; obtaining communication traffic information between the first GNN node and the corresponding adjacent nodes, identifying traffic feature data in the communication traffic information by using an AI model, and constructing a dynamic adjacency matrix between the first GNN node and the adjacent nodes according to the traffic feature data; updating the states of the first GNN node and the adjacent nodes by using the dynamic adjacency matrix, and executing a caching policy or traffic processing policy of the corresponding CDN network node according to the updated states of the first GNN node and the adjacent nodes; and obtaining a current CDN network node routing link table, and routing the current CDN network node to the optimal CDN network node according to the caching policy or traffic processing policy.
2. The distributed storage method based on AI traffic identification according to claim 1, characterized in that the current CDN network node state information comprises a hardware resource state, a network resource state and a cache state, wherein the hardware resource state comprises CPU utilization, memory utilization and read-write performance data of the cache disk; the network resource state comprises the uplink and downlink bandwidth utilization of the current CDN network node, the packet loss rate of the current CDN network node, the network request rate, the number of concurrent connections and the number of URLs; the cache state comprises the cache hit rate, cache eviction rate, cache warm-up state and cached content type; the hardware resource state, the network resource state and the cache state are each preprocessed to obtain the feature matrix of the first GNN node, and the dynamic adjacency matrix is constructed according to the traffic feature data between the first GNN node and the corresponding adjacent nodes identified by the AI model.
3. The distributed storage method based on AI traffic identification according to claim 2, characterized in that the construction of the dynamic adjacency matrix comprises: identifying and obtaining the network traffic feature data between the first GNN node and the corresponding adjacent nodes by using AI models including an image recognition AI model, a text recognition AI model, a speech recognition AI model and a network attack recognition AI model, wherein the traffic feature data comprise illegal video frame data, illegal voice data, illegal text data, and the type and number of network attacks; preprocessing the different traffic feature data and converting them into the dynamic adjacency matrix; and updating the states of the first GNN node and the corresponding adjacent nodes according to the dynamic adjacency matrix.
4. The distributed storage method based on AI traffic identification according to claim 1, characterized in that the method for updating the states of the first GNN node and the corresponding adjacent nodes comprises: constructing a multi-head attention coefficient from the traffic feature data obtained by AI-model traffic identification, α_ij = Softmax_j(LeakyReLU(a^T [W h_i ∥ W h_j])), wherein i denotes the current first GNN node, j denotes an adjacent node having a communication relationship with the first GNN node, h_i denotes the first GNN node feature, a and W denote different learnable parameters, W being a weight matrix and a being an action parameter converted from the traffic feature data between nodes, T denotes the matrix transpose, LeakyReLU denotes the activation function, Softmax denotes the linear classification function, and the attention coefficient α_ij denotes the similar-feature weight of the first GNN node and the corresponding adjacent node.
5. The distributed storage method based on AI traffic identification according to claim 4, characterized in that the method for updating the states of the first GNN node and the corresponding adjacent nodes further comprises: computing the multi-head attention coefficients α_ij in parallel to capture the relationship types between the first GNN node i and the adjacent nodes j; computing the update of the first GNN node from the multi-head attention coefficients α_ij comprises constructing the dynamic adjacency matrix A_dynamic[i,j]_k = α_ij^k, wherein k denotes the corresponding attention head, replacing the original static adjacency matrix A_static with the dynamic adjacency matrix A_dynamic[i,j]_k, and after the replacement updating the first GNN node according to h_i' = σ((1/K) Σ_{k=1}^{K} Σ_{j∈N(i)} A_dynamic[i,j]_k W^k h_j), wherein h_i' denotes the updated predicted first GNN node state, N denotes the node relationship function, K denotes the total number of attention heads and σ is an activation function; and executing the caching policy or traffic processing policy of the corresponding current CDN network node based on the updated predicted first GNN node state h_i'.
6. The distributed storage method based on AI traffic identification according to claim 5, characterized in that the caching policy or traffic processing policy comprises: based on the updated predicted first GNN node state h_i', updating the predicted cache state of the current CDN network node so as to allow incoming caching of data from the corresponding adjacent node; based on the updated predicted first GNN node state h_i', rate-limiting network attack data that the current CDN network node predicts to originate from the corresponding adjacent node, or diverting such data to a specific CDN network node; and based on the updated predicted first GNN node state h_i', applying policies including rate limiting, access prohibition and cache prohibition to violating data that the current CDN network node predicts to originate from the corresponding adjacent node.
7. The distributed storage method based on AI traffic identification according to claim 1, characterized in that the CDN network nodes comprise a central CDN network node and edge CDN network nodes; after the current CDN network node routing link table is obtained, the central CDN network node determines the edge CDN network node closest to the current CDN network node according to the current CDN network node routing link table, and transmits the data of the adjacent node to the closest edge CDN network node through route conversion according to the caching policy, thereby achieving efficient processing of the cached data.
8. The distributed storage method based on AI traffic identification according to claim 7, characterized in that the method for calculating the distance L from the current CDN network node to the closest edge CDN network node comprises: obtaining the round-trip time RTT from the current CDN network node to any one edge CDN network node, calculating the hop count Hops to the edge CDN network node, and calculating the bandwidth utilization P of the current CDN network node, whereupon the distance L is computed as L = β·RTT + λ·Hops + γ·P, wherein β, λ and γ are respective weight coefficients.
9. A distributed storage system based on AI traffic identification, characterized in that the system executes the distributed storage method based on AI traffic identification according to any one of claims 1-8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the distributed storage method based on AI traffic identification according to any one of claims 1-8.
CN202510340158.1A 2025-03-21 2025-03-21 Distributed storage method and system based on AI flow identification Pending CN119854325A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510340158.1A CN119854325A (en) 2025-03-21 2025-03-21 Distributed storage method and system based on AI flow identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510340158.1A CN119854325A (en) 2025-03-21 2025-03-21 Distributed storage method and system based on AI flow identification

Publications (1)

Publication Number Publication Date
CN119854325A true CN119854325A (en) 2025-04-18

Family

ID=95367670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510340158.1A Pending CN119854325A (en) 2025-03-21 2025-03-21 Distributed storage method and system based on AI flow identification

Country Status (1)

Country Link
CN (1) CN119854325A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102640472A (en) * 2009-12-14 2012-08-15 瑞典爱立信有限公司 Dynamic cache selection method and system
CN113705959A (en) * 2021-05-11 2021-11-26 北京邮电大学 Network resource allocation method and electronic equipment
US20210383228A1 (en) * 2020-06-05 2021-12-09 Deepmind Technologies Limited Generating prediction outputs using dynamic graphs
US20240023028A1 (en) * 2022-11-22 2024-01-18 Intel Corporation Wireless network energy saving with graph neural networks
CN118678377A (en) * 2024-08-16 2024-09-20 深圳市大数据研究院 Spectrum coverage map construction method and device based on heterogeneous graph neural network
CN119561738A (en) * 2024-11-21 2025-03-04 吉林大学 A graph neural network-based intrusion detection method, device and medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102640472A (en) * 2009-12-14 2012-08-15 瑞典爱立信有限公司 Dynamic cache selection method and system
US20210383228A1 (en) * 2020-06-05 2021-12-09 Deepmind Technologies Limited Generating prediction outputs using dynamic graphs
CN113705959A (en) * 2021-05-11 2021-11-26 北京邮电大学 Network resource allocation method and electronic equipment
US20240023028A1 (en) * 2022-11-22 2024-01-18 Intel Corporation Wireless network energy saving with graph neural networks
CN118678377A (en) * 2024-08-16 2024-09-20 深圳市大数据研究院 Spectrum coverage map construction method and device based on heterogeneous graph neural network
CN119561738A (en) * 2024-11-21 2025-03-04 吉林大学 A graph neural network-based intrusion detection method, device and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIASEN WANG et al., "Energy Saving Based on Transformer Models with LeakyRelu Activation Function", 2023 13th ICIST, 29 December 2023 (2023-12-29) *
LUO Heng, "Network Traffic Intrusion Detection Method Based on Dynamic Spatio-Temporal Graph Neural Networks", Computer Engineering, 6 March 2025 (2025-03-06) *

Similar Documents

Publication Publication Date Title
CN113037687B (en) Traffic identification method and electronic equipment
US12273275B2 (en) Dynamic allocation of network resources using external inputs
US20210303984A1 (en) Machine-learning based approach for classification of encrypted network traffic
US10592578B1 (en) Predictive content push-enabled content delivery network
CN110417903A (en) Information processing method and system based on cloud computing
CN113642700A (en) Cross-platform multimodal public opinion analysis method based on federated learning and edge computing
US20210029052A1 (en) Methods and apparatuses for packet scheduling for software- defined networking in edge computing environment
CN113535348A (en) A resource scheduling method and related device
CN117560433A (en) DPU (digital versatile unit) middle report Wen Zhuaifa order preserving method and device, electronic equipment and storage medium
CN114742166B (en) A communication network field maintenance model migration method based on delay optimization
CN113395183B (en) Virtual node scheduling method and system for network simulation platform VLAN interconnection
US12284209B2 (en) Bridging between client and server devices using proxied network metrics
CN119854325A (en) Distributed storage method and system based on AI flow identification
CN118714567A (en) Device identification and access method and device based on radio frequency information and network traffic
CN117596207A (en) Network flow control method and device, electronic equipment and storage medium
CN117215772A (en) Task processing method and device, computer equipment, storage medium and product
CN113194071B (en) Method, system and medium for detecting DDoS (distributed denial of service) based on unsupervised deep learning in SDN (software defined network)
CN116915432A (en) Method, device, equipment and storage medium for arranging calculation network security
CN110324354B (en) Method, device and system for network tracking long chain attack
CN115664826A (en) Data file encryption method and device, computing equipment and storage medium
CN114978585A (en) Deep learning symmetric encryption protocol identification method based on flow characteristics
Magaia et al. An edge-based smart network monitoring system for the Internet of Vehicles
CN119629179B (en) An information processing system based on mimicry decision method
CN119520435B (en) Communication scheduling method, apparatus, computer equipment, computer-readable storage medium, and computer program product
CN118797266B (en) Training method of prediction model and prediction method of information propagation path

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination