Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a micro-service fault positioning method and device based on a self-encoder (autoencoder) and a service dependency graph. A self-encoder model learns how the monitoring indexes fluctuate while the micro-services run normally and uses this to judge their running condition, which removes the need, present in existing micro-service fault positioning methods, to manually set thresholds for the various monitoring indexes before anomalies can be diagnosed. The method further uses the utilization monitoring indexes of the resources on the server host to weight the nodes in the service dependency graph, thereby improving the accuracy with which the faulty micro-service is located automatically.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
a micro-service fault positioning method based on a self-encoder and a service dependency graph comprises the following steps:
step 1: collecting, while the micro-service system is running, the physical resource utilization indexes of the server hosts and the response time indexes of call requests between micro-services;
step 2: training a self-encoder model with the response time monitoring sequences as training data, reconstructing the response time, and judging whether the micro-service system is abnormal by calculating the reconstruction error of the response time index data;
step 3: mapping each micro-service to a corresponding node, capturing the call relations between micro-services by parsing the communication data between them, constructing directed edges between the nodes from these call relations, and constructing a service call relation graph in which the reconstruction error of the response time index data is the anomaly weight of each node;
step 4: correlating the reconstruction error of the response time index data with the physical resource utilization index data of the server host, and calculating the anomaly weight of each graph node in the service call relation graph;
step 5: based on the anomaly weights of the graph nodes in the service call relation graph updated in step 4, using a weighted PageRank algorithm to infer and locate the faulty micro-service that caused the anomaly.
Wherein:
in step 1, the physical resource utilization indexes reflect the resource usage at the level of the physical machine or virtual machine that runs a micro-service instance, the response time index reflects the time a micro-service in the micro-service system takes to respond to requests from other micro-services, and both kinds of index are monitored and collected in real time with the Prometheus tool.
In step 2:
the response time monitoring index sequence data of micro-service $v_i$ collected at time $t$ over the time window $w$, denoted $X_i^t = (x_i^{t-h+1}, \dots, x_i^{t})$, is used as the input of the trained self-encoder, where $v_i \in V$, $V$ represents the set of micro-services, and $X_i^t$ is a vector of dimension $h$. The encoder maps $X_i^t$ to a $d$-dimensional latent feature representation $Z_i^t$:
$$Z_i^t = g(X_i^t W + b) \qquad (1)$$
where $g$ is the activation function, $h$ is the number of response time monitoring samples collected within the time window $w$, $d$ is the dimension of the latent feature representation, $W$ is the weight matrix of $h$ rows and $d$ columns between the input layer and the hidden layer, and $b$ is the $d$-dimensional bias vector added when the input layer is mapped to the hidden layer. The decoder then reconstructs the latent feature representation $Z_i^t$ into the response time index monitoring sequence data $\hat{X}_i^t$ of micro-service $v_i$:
$$\hat{X}_i^t = g(Z_i^t W' + c) \qquad (2)$$
where $W'$ is the weight matrix of $d$ rows and $h$ columns between the hidden layer and the output layer, and $c$ is the $h$-dimensional bias vector added when the hidden layer is mapped to the output layer. The reconstruction mean square error $E_i^t$ between $X_i^t$ and $\hat{X}_i^t$ is calculated:
$$E_i^t = \frac{1}{h} \sum_{e=t-h+1}^{t} \left( x_i^{e} - \hat{x}_i^{e} \right)^2 \qquad (3)$$
In the self-encoder training phase, the response time monitoring sequence data of micro-service $v_i$ collected while the micro-service system runs normally is used as training data to train the self-encoder model. After multiple rounds of training, the converged self-encoder model has learned the characteristics of normal response time series data, so its reconstruction of the response time monitoring values of micro-service $v_i$ in normal operation is close to the monitored values, and the corresponding reconstruction error is small and fluctuates within a stable range. The mean $\mu$ and standard deviation $\sigma$ of the reconstruction error in this state are calculated, and the anomaly detection threshold of micro-service $v_i$ is determined as $\alpha_i = \mu + 3\sigma$. While the running state of micro-service $v_i$ is detected in real time, if the reconstruction error satisfies $E_i^t > \alpha_i$, micro-service $v_i$ is considered abnormal.
The specific flow in the step 3 is as follows:
step 3-1: the set of micro-services in the micro-service system is denoted $V = \{v_1, v_2, \dots, v_n\}$, where $n$ represents the number of micro-services; each $v_i \in V$ is mapped to a graph node $s_i$, finally giving the graph node set $S = \{s_1, s_2, \dots, s_n\}$;
step 3-2: the call relations between micro-services are captured by parsing the communication data between them; if micro-service $v_i$ sends a service request to micro-service $v_j$, a directed edge $z_{ij}$ pointing from graph node $s_i$ to graph node $s_j$ is constructed, finally forming the edge set $Z = \{z_{ij}\}$; identical service requests produce only one directed edge, and a service call relation graph without anomaly weights is generated;
step 3-3: the reconstruction error of the response time monitoring index of micro-service $v_i$ is used as the initial anomaly weight $f_i$ of graph node $s_i$; the initial anomaly weight of every micro-service is calculated by traversal to obtain the graph node anomaly weight set $F = \{f_1, f_2, \dots, f_n\}$; each anomaly weight $f_i$ in the set $F$ is assigned to graph node $s_i$ as its anomaly weight, finally giving the service call relation graph $G(S, Z, F)$.
In step 4, based on the service call relation graph $G(S, Z, F)$, the anomaly weight of each graph node is updated automatically from the anomaly weights of its adjacent graph nodes. For any graph node $s_j \in S$, $j \in \{1, 2, \dots, n\}$, the adjacent graph nodes that have a directed edge pointing to $s_j$ form the set $AN(s_j)$, and the adjacent graph nodes that have a directed edge pointing to any graph node in $AN(s_j)$ form the set $NAN(s_j)$.
The average anomaly weight $aScore(s_j)$ of $AN(s_j)$ is calculated:
$$aScore(s_j) = \frac{1}{inDegree(s_j)} \sum_{s_i \in AN(s_j)} f_i \qquad (4)$$
where $f_i$ represents the anomaly weight of graph node $s_i$ and $inDegree(s_j)$ represents the in-degree of graph node $s_j$. The average anomaly weight $cScore(s_j)$ of $NAN(s_j)$ is calculated:
$$cScore(s_j) = \frac{1}{|NAN(s_j)|} \sum_{s_k \in NAN(s_j)} f_k \qquad (5)$$
where $aScore(s_j)$ reflects the overall degree of anomaly of $AN(s_j)$ and $cScore(s_j)$ represents the overall degree of anomaly of $NAN(s_j)$. Combining the features of $aScore(s_j)$ and $cScore(s_j)$, the anomaly weight $acScore(s_j)$ of graph node $s_j$ is calculated:
$$acScore(s_j) = aScore(s_j) - cScore(s_j) \qquad (6)$$
The higher the value of $acScore(s_j)$, the higher the overall degree of anomaly of the graph nodes adjacent to $s_j$ and the lower the overall degree of anomaly of the graph nodes adjacent to $AN(s_j)$, and so the higher the probability that the micro-service $v_j$ corresponding to graph node $s_j$ is the root cause of the fault. A Pearson correlation function measures the correlation between the response time reconstruction error sequence $D_i^t = (\delta_i^{t-h+1}, \dots, \delta_i^{t})$ of micro-service $v_i$, collected at time $t$ over the time window, and the sequence data of each physical resource monitoring index on the host where micro-service $v_i$ is deployed:
$$P(D_i^t, M_r^t) = \frac{\sum_{e=t-h+1}^{t} (\delta_i^{e} - \bar{\delta}_i)(m_r^{e} - \bar{m}_r)}{\sqrt{\sum_{e=t-h+1}^{t} (\delta_i^{e} - \bar{\delta}_i)^2} \, \sqrt{\sum_{e=t-h+1}^{t} (m_r^{e} - \bar{m}_r)^2}} \qquad (7)$$
where $M_r^t$ represents the sequence data of the $r$-th physical resource monitoring index collected at time $t$ over the time window $w$, $\bar{m}_r$ represents the mean of $M_r^t$, $r \in \{1, 2, \dots, k\}$, $k$ represents the number of physical resources, $\delta_i^{e}$ is the reconstruction error at time $e$ obtained from $\hat{x}_i^{e}$, the reconstructed response time monitoring value of micro-service $v_i$ at time $e$, and $x_i^{e}$, the response time monitoring value of micro-service $v_i$ at time $e$, and $\bar{\delta}_i$ represents the mean of the reconstruction errors in $D_i^t$. The degree of anomaly of micro-service $v_i$ is reflected by how high this correlation score is, and the final anomaly weight $AS(s_i)$ of graph node $s_i$ is calculated by combining $acScore(s_i)$ with the correlation score, as given by formula (8).
In step 5:
the anomaly weight of every micro-service is calculated by traversal, completing the update of the anomaly weights of all graph nodes in the service call relation graph; a weighted PageRank algorithm then performs a random walk in the service call relation graph $G(S, Z, F)$. The node transition probability matrix $U$ of the service call relation graph is defined first:
$$U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ u_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ u_{n1} & u_{n2} & \cdots & u_{nn} \end{pmatrix} \qquad (9)$$
where $u_{ij}$ represents the probability of a random walk from graph node $s_j$ to graph node $s_i$. The walk probability is tied to the anomaly weights of the graph nodes, and $u_{ij}$ is calculated as:
$$u_{ij} = \begin{cases} \dfrac{AS(s_i)}{linkOut(s_j)}, & s_j \to s_i \\ 0, & \text{otherwise} \end{cases} \qquad (10)$$
where $s_j \to s_i$ indicates that a directed edge pointing from graph node $s_j$ to graph node $s_i$ exists, and $linkOut(s_j)$ represents the sum of the anomaly weights of all graph nodes that $s_j$ points to. For any graph node $s_i \in S$, $i \in \{1, 2, \dots, n\}$, the PR score is initialized to $PR_0(s_i) = 1/n$, and the PR scores of all graph nodes are expressed as the vector $R_0$:
$$R_0 = (PR_0(s_1), PR_0(s_2), \dots, PR_0(s_n))^T \qquad (11)$$
During each round of the random walk, the PR score of each graph node is updated iteratively:
$$R_c = \beta U \cdot R_{c-1} + (1 - \beta) R_0 \qquad (12)$$
where $R_c$ represents the PR score vector of all graph nodes after iteration round $c$, $U \in \mathbb{R}^{n \times n}$ represents the random walk probability matrix, and $\beta \in (0, 1)$ represents the damping coefficient, typically $\beta = 0.85$. After iterative updating, the PR score of each graph node converges; the higher the PR score of a graph node, the greater the probability that the corresponding micro-service is the fault root cause. Finally, a ranked list of fault root cause micro-services is output in descending order of the PR scores of the graph nodes.
The invention also provides a device that uses the micro-service fault positioning method based on a self-encoder and a service dependency graph, comprising a data collection module, an anomaly detection module and a fault locating module. The data collection module collects the physical resource utilization indexes of the server hosts and the response time indexes of call requests between micro-services while the micro-service system is running. The anomaly detection module trains a self-encoder model with the response time monitoring sequences as training data, reconstructs the response time, and judges whether the micro-service system is abnormal by calculating the reconstruction error of the response time index data. The fault locating module maps each micro-service to a corresponding node, captures the call relations between micro-services by parsing the communication data between them, constructs directed edges between the nodes from these call relations, and constructs a service call relation graph in which the reconstruction error of the response time index data is the anomaly weight of each node; it then correlates the reconstruction error with the physical resource utilization index data of the server host, calculates the anomaly weight of each graph node in the service call relation graph and, based on the updated anomaly weights, uses a weighted PageRank algorithm to infer and locate the faulty micro-service that caused the anomaly.
The invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the above-described self-encoder and service dependency graph based micro-service fault locating method steps.
The invention also provides an electronic device comprising a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, realizes the steps of the micro-service fault positioning method based on the self-encoder and the service dependency graph.
The invention also provides a computer program product characterized by comprising a computer program/instruction which, when executed by a processor, implements the above-mentioned microservice fault location method steps based on self-encoders and service dependency graphs.
Compared with the prior art, the invention has the advantages that:
1. according to the invention, the association relation between the micro-service running state and response time fluctuation is learned through training the self-encoder model, so that the micro-service running state is detected in real time, and the problem that various monitoring index thresholds are required to be manually set for abnormality diagnosis in the existing micro-service fault positioning method is solved; and the call relations among the micro services are captured by analyzing the communication data among the micro services, so that a service call relation diagram is constructed to simulate a fault propagation path, and the abnormal weights of nodes in the call relation diagram are updated by combining the reconstruction errors of the response time of the self-encoder to the micro services and the utilization rate of system resources, so that the fault micro services are automatically positioned based on a weighted PageRank algorithm, and the fault positioning accuracy is improved.
2. According to the invention, the monitoring index data in the normal running state of the micro-service is used as the input of the self-encoder, and the self-encoder is used for encoding and reconstructing, so that compared with the traditional anomaly detection method, the method can capture the index data hiding characteristic in the normal running state of the micro-service, thereby improving the accuracy of real-time anomaly detection.
3. The invention uses the self-encoder's reconstruction error on the micro-service monitoring indexes as the anomaly weight of the service call relation graph nodes. Because the reconstruction error reflects how far the real-time monitoring index deviates from the normal monitoring index, the reconstruction error (that is, the anomaly weight) reflects the degree of anomaly of the micro-service.
4. The invention introduces the aScore, cScore and acScore scores to measure how likely a micro-service is to be the fault root cause. $aScore(s_j)$ reflects the overall degree of anomaly of $AN(s_j)$, and $cScore(s_j)$ represents the overall degree of anomaly of $NAN(s_j)$. $acScore(s_j)$ integrates the anomaly information of node $s_j$ and its surrounding nodes: the higher $acScore(s_j)$ is, the higher the overall degree of anomaly of the nodes adjacent to $s_j$ and the lower the overall degree of anomaly of the nodes adjacent to $AN(s_j)$, and hence the higher the probability that the micro-service $v_j$ corresponding to node $s_j$ is the fault root cause.
5. The invention updates the anomaly weight of each node by using the Pearson correlation function to calculate the correlation between the response time monitoring index and the sequence data of each physical resource monitoring index. Resource utilization thus becomes part of the anomaly weight calculation, which strengthens the link between locating the micro-service fault root cause and resource utilization.
6. By improving the PageRank algorithm with the anomaly weights of connected nodes in the service call relation graph, the invention designs a weighted PageRank algorithm in which the walk strategy of the random walk depends on the anomaly weights of each node and its connected nodes, so that nodes with higher anomaly weights are visited more frequently.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The invention provides a method and a device for automatically positioning micro-service faults based on a self-encoder and a service dependency graph, wherein the overall framework of the method and the device is shown in figure 1 and mainly comprises 3 modules. The data collection module is responsible for collecting application program indexes and system level index data, wherein the application program indexes are used for detecting application performance problems, and the system level indexes are used for updating node weights in a subsequent service call relation graph. The anomaly detection module is used for carrying out coding reconstruction on the application index data through the self-encoder and detecting whether the micro-service module is abnormal or not. Once the system abnormality is detected, the fault locating module constructs a service call relation diagram by analyzing call relations among the micro services to analyze an abnormal propagation path, calculates the abnormal weight of each micro service node in the service call relation diagram by utilizing the correlation of the system resource utilization rate and the micro service performance, and finally locates the fault root cause micro service module by utilizing a weighted PageRank algorithm.
Referring to fig. 2, fig. 2 is a flowchart of a method for automatically locating a micro service fault based on a self-encoder and a service dependency graph according to an embodiment of the present invention, and specifically, the method includes:
Step 1: the physical resource utilization indexes and the response time indexes of calls between micro-services are monitored and collected in real time with the Prometheus tool while the micro-service system is running. A physical resource utilization index reflects the resource usage at the level of the physical machine or virtual machine that runs a micro-service instance, such as CPU utilization and memory utilization; the response time index reflects the length of time a micro-service in the micro-service system takes to respond to requests from other micro-services.
The invention expresses the $k$ physical resource monitoring indexes as $M = \{m_1, m_2, \dots, m_k\}$, and the continuous monitoring values of any monitoring index $m_r$ as time series data $\{m_r^{t}\}$, where $m_r \in M$, $r \in \{1, 2, \dots, k\}$. To model how the monitoring values of index $m_r$ change, a sliding window $w$ of length $h$ intercepts the monitoring sequence of index $m_r$ at monitoring index collection time $t$, expressed as $M_r^t = (m_r^{t-h+1}, \dots, m_r^{t})$, where $m_r^{t}$ represents the monitoring value of index $m_r$ at time $t$. The monitoring sequence data collected by the $k$ physical resource monitoring indexes at time $t$ over the time window $w$ form the monitoring index data matrix $M^t = (M_1^t, M_2^t, \dots, M_k^t)^T$.
In a micro-service system $V = \{v_1, v_2, \dots, v_n\}$ having $n$ micro-service modules, for any micro-service $v_i \in V$, the response time monitoring sequence data collected at time $t$ over the time window $w$ is expressed as $X_i^t = (x_i^{t-h+1}, \dots, x_i^{t})$, where $x_i^{t}$ represents the monitored response time of micro-service $v_i$ at time $t$. The response time monitoring sequence data collected by the $n$ micro-services at time $t$ over the time window $w$ form the matrix $X^t = (X_1^t, X_2^t, \dots, X_n^t)^T$.
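As an illustration only, the window interception described in step 1 could be sketched in Python as follows. The function names `sliding_window` and `window_matrix` and the sample values are hypothetical and are not part of the invention; the sketch assumes that the window holds the h most recent samples ending at time t.

```python
def sliding_window(series, t, h):
    """Return the h most recent samples of `series` ending at index t (inclusive)."""
    if h <= 0 or t + 1 < h or t >= len(series):
        raise ValueError("window does not fit inside the series")
    return list(series[t - h + 1 : t + 1])


def window_matrix(all_series, t, h):
    """One window per metric (or per micro-service), stacked as rows."""
    return [sliding_window(s, t, h) for s in all_series]


# Hypothetical samples for k = 2 resource metrics (CPU and memory utilization).
cpu = [0.20, 0.22, 0.25, 0.80, 0.85, 0.90]
mem = [0.50, 0.50, 0.51, 0.52, 0.52, 0.53]

# 2 rows (one per metric) by 4 columns (one per sample in the window).
M_t = window_matrix([cpu, mem], t=5, h=4)
```

The same helper can be applied to the response time series of the n micro-services to obtain one window per micro-service.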
Step 2: the response time monitoring index sequence data of micro-service $v_i$ collected at time $t$ over the time window $w$, denoted $X_i^t = (x_i^{t-h+1}, \dots, x_i^{t})$, is used as the input of the trained self-encoder. The encoding layer maps $X_i^t$ to a $d$-dimensional latent feature representation $Z_i^t$:
$$Z_i^t = g(X_i^t W + b) \qquad (13)$$
where $g$ is the activation function, $h$ is the number of response time monitoring samples collected within the time window $w$, $d$ is the dimension of the latent feature representation, $W$ is the weight matrix of $h$ rows and $d$ columns between the input layer and the hidden layer, and $b$ is the $d$-dimensional bias vector added when the input layer is mapped to the hidden layer. The decoder then reconstructs the latent feature representation $Z_i^t$ into the response time index monitoring sequence data $\hat{X}_i^t$ of micro-service module $v_i$:
$$\hat{X}_i^t = g(Z_i^t W' + c) \qquad (14)$$
where $W'$ is the weight matrix of $d$ rows and $h$ columns between the hidden layer and the output layer, and $c \in \mathbb{R}^{h}$ is the bias vector added when the hidden layer is mapped to the output layer. The reconstruction mean square error $E_i^t$ between $X_i^t$ and $\hat{X}_i^t$ is then calculated:
$$E_i^t = \frac{1}{h} \sum_{e=t-h+1}^{t} \left( x_i^{e} - \hat{x}_i^{e} \right)^2 \qquad (15)$$
In the self-encoder training phase, the response time monitoring sequence data of micro-service $v_i$ collected while the micro-service system runs normally is used as training data to train the self-encoder model. After multiple rounds of training, the converged self-encoder model has learned the characteristics of normal response time series data. The self-encoder model's reconstruction of the response time monitoring values of micro-service $v_i$ in normal operation is therefore close to the monitored values, and the corresponding reconstruction error is small and fluctuates within a stable range. The anomaly detection threshold of micro-service module $v_i$ is determined from the mean $\mu$ and standard deviation $\sigma$ of the reconstruction error in this state as $\alpha_i = \mu + 3\sigma$. While the running state of micro-service $v_i$ is detected in real time, if the reconstruction error is found to satisfy $E_i^t > \alpha_i$, micro-service $v_i$ is considered abnormal.
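A minimal Python sketch of the detection logic of step 2 is given below under stated assumptions: a sigmoid is chosen for the activation function g, the weights are small untrained placeholder values rather than a trained model, the standard deviation is taken as the population standard deviation, and all function names are hypothetical. Training of the self-encoder is not shown.

```python
import math
import statistics


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(x, W, bias):
    """Row-vector convention: x @ W + bias, with W stored as len(x) rows."""
    return [sum(x[r] * W[r][col] for r in range(len(x))) + bias[col]
            for col in range(len(bias))]


def reconstruct(x, W, b, W_dec, c, g=sigmoid):
    z = [g(v) for v in affine(x, W, b)]          # encoder: h values -> d values
    return [g(v) for v in affine(z, W_dec, c)]   # decoder: d values -> h values


def mse(x, x_hat):
    """Reconstruction mean square error over the window."""
    return sum((p - q) ** 2 for p, q in zip(x, x_hat)) / len(x)


def threshold(normal_errors):
    """alpha = mu + 3 * sigma over errors observed during normal operation."""
    return statistics.mean(normal_errors) + 3 * statistics.pstdev(normal_errors)


def is_anomalous(error, alpha):
    return error > alpha


# Placeholder (untrained) parameters for h = 3 and d = 2; assumptions only.
W = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]      # h rows, d columns
b = [0.0, 0.0]                                  # d-dimensional
W_dec = [[0.3, -0.1, 0.2], [0.2, 0.5, -0.4]]    # d rows, h columns
c = [0.0, 0.0, 0.0]                             # h-dimensional

x = [0.2, 0.4, 0.6]                             # one response time window
x_hat = reconstruct(x, W, b, W_dec, c)
err = mse(x, x_hat)

# Hypothetical reconstruction errors recorded during normal operation.
normal_errors = [0.010, 0.012, 0.011, 0.009, 0.013]
alpha = threshold(normal_errors)
```

With a trained model, `err` for each new window would be compared with `alpha` through `is_anomalous`.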
Step 3: each micro-service is mapped to a corresponding node, the call relations between micro-services are captured by parsing the communication data between them, directed edges are constructed between the nodes, and a service call relation graph is constructed in which the reconstruction error of the response time index data of each micro-service is the anomaly weight of its node. The specific flow is as follows:
step 3-1: the set of micro-services in the micro-service system is denoted $V = \{v_1, v_2, \dots, v_n\}$, where $n$ represents the number of micro-services. Each $v_i \in V$ is mapped to a graph node $s_i$, finally giving the graph node set $S = \{s_1, s_2, \dots, s_n\}$;
step 3-2: the call relations between micro-services are captured by parsing the communication data between them; if micro-service $v_i$ sends a service request to micro-service $v_j$, a directed edge $z_{ij}$ pointing from $s_i$ to $s_j$ is constructed, forming the edge set $Z = \{z_{ij} \mid 1 \le i, j \le n\}$; identical service requests produce only one directed edge;
step 3-3: the reconstruction error of the response time monitoring index of micro-service $v_i$ is used as the initial anomaly weight $f_i$ of graph node $s_i$. The initial anomaly weight of every micro-service is calculated by traversal to obtain the graph node anomaly weight set $F = \{f_1, f_2, \dots, f_n\}$, finally giving the service call relation graph $G(S, Z, F)$.
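The construction of the service call relation graph in step 3 could, for example, be sketched in Python as below. The function name `build_call_graph`, the service names and the error values are hypothetical, and the sketch assumes that the call relations have already been parsed into caller and callee pairs.

```python
def build_call_graph(services, calls, initial_weight):
    """Build the node set S, edge set Z and weight set F of the call graph.

    services: list of micro-service names, one graph node each
    calls: iterable of (caller, callee) pairs parsed from communication data
    initial_weight: dict mapping each name to its reconstruction error
    """
    nodes = list(services)
    known = set(nodes)
    edges = set()
    for caller, callee in calls:
        if caller in known and callee in known:
            # A set keeps one directed edge even if the request repeats.
            edges.add((caller, callee))
    weights = {s: initial_weight[s] for s in nodes}
    return nodes, edges, weights


# Hypothetical example with three micro-services.
services = ["front-end", "orders", "payment"]
calls = [("front-end", "orders"), ("front-end", "orders"), ("orders", "payment")]
errors = {"front-end": 0.02, "orders": 0.15, "payment": 0.40}

S, Z, F = build_call_graph(services, calls, errors)
```

Storing the edges in a set is one simple way to satisfy the rule that identical service requests produce only one directed edge.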
Step 4: the walk strategy of the random walk algorithm between nodes depends on the walk probability between the nodes, and the walk probability is tied to the anomaly weights of the nodes, so the anomaly weight assigned to each graph node in the service call relation graph $G(S, Z, F)$ is critical to the fault positioning accuracy of the random walk algorithm. The invention updates the anomaly weights of the graph nodes using both the relations between graph nodes in the service call relation graph and the relation between the running state of a micro-service and the resource utilization of its host, so that graph nodes with a higher degree of anomaly obtain a larger walk probability in the random walk algorithm, which makes the fault positioning more interpretable. Based on the service call relation graph $G(S, Z, F)$, the anomaly weight of each graph node is updated automatically from the anomaly weights of its adjacent graph nodes. For any graph node $s_j \in S$, $j \in \{1, 2, \dots, n\}$, the adjacent graph nodes that have a directed edge pointing to $s_j$ form the set $AN(s_j)$, and the adjacent graph nodes that have a directed edge pointing to any graph node in $AN(s_j)$ form the set $NAN(s_j)$.
The average anomaly weight $aScore(s_j)$ of $AN(s_j)$ is then calculated:
$$aScore(s_j) = \frac{1}{inDegree(s_j)} \sum_{s_i \in AN(s_j)} f_i \qquad (16)$$
where $f_i$ represents the anomaly weight of graph node $s_i$ and $inDegree(s_j)$ represents the in-degree of graph node $s_j$. The average anomaly weight $cScore(s_j)$ of $NAN(s_j)$ is calculated:
$$cScore(s_j) = \frac{1}{|NAN(s_j)|} \sum_{s_k \in NAN(s_j)} f_k \qquad (17)$$
where $aScore(s_j)$ reflects the overall degree of anomaly of $AN(s_j)$ and $cScore(s_j)$ represents the overall degree of anomaly of $NAN(s_j)$. Combining the features of $aScore(s_j)$ and $cScore(s_j)$, the anomaly weight $acScore(s_j)$ of node $s_j$ is calculated:
$$acScore(s_j) = aScore(s_j) - cScore(s_j) \qquad (18)$$
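The scores aScore, cScore and acScore can be illustrated with the short Python sketch below. It rests on assumptions that the text does not settle: cScore is taken as the plain mean of the anomaly weights over NAN, an empty AN or NAN set contributes a score of 0, and the function names and example values are hypothetical.

```python
def predecessors(edges, targets):
    """Nodes that have a directed edge pointing at any node in `targets`."""
    return {src for src, dst in edges if dst in targets}


def mean_weight(nodes, weights):
    """Average anomaly weight of a node set; 0.0 for an empty set (assumption)."""
    return sum(weights[n] for n in nodes) / len(nodes) if nodes else 0.0


def ac_score(node, edges, weights):
    an = predecessors(edges, {node})   # AN: nodes pointing to `node`
    nan = predecessors(edges, an)      # NAN: nodes pointing to any node in AN
    a = mean_weight(an, weights)       # aScore; len(AN) equals the in-degree
    c = mean_weight(nan, weights)      # cScore, plain mean (assumption)
    return a - c


# Hypothetical chain A -> B -> C with initial anomaly weights.
edges = {("A", "B"), ("B", "C")}
weights = {"A": 0.1, "B": 0.6, "C": 0.9}

score_c = ac_score("C", edges, weights)   # AN = {B}, NAN = {A}
score_b = ac_score("B", edges, weights)   # AN = {A}, NAN is empty
```

In this example node C obtains the higher score because its caller B is strongly anomalous while B's own caller A is not.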
Clearly, $acScore(s_j)$ integrates the anomaly information of graph node $s_j$ itself and of its surrounding graph nodes: the higher the value of $acScore(s_j)$, the higher the overall degree of anomaly of the graph nodes adjacent to $s_j$ and the lower the overall degree of anomaly of the graph nodes adjacent to $AN(s_j)$, and so the higher the probability that the micro-service $v_j$ corresponding to graph node $s_j$ is the root cause of the fault. In addition, the response time of a micro-service is related to changes in the performance indexes of the host on which it is deployed, and the invention selects the Pearson correlation function to measure the correlation between the response time reconstruction error sequence $D_i^t = (\delta_i^{t-h+1}, \dots, \delta_i^{t})$ of micro-service $v_i$, collected at time $t$ over the time window, and the sequence data of each physical resource monitoring index on the host where micro-service $v_i$ is deployed:
$$P(D_i^t, M_r^t) = \frac{\sum_{e=t-h+1}^{t} (\delta_i^{e} - \bar{\delta}_i)(m_r^{e} - \bar{m}_r)}{\sqrt{\sum_{e=t-h+1}^{t} (\delta_i^{e} - \bar{\delta}_i)^2} \, \sqrt{\sum_{e=t-h+1}^{t} (m_r^{e} - \bar{m}_r)^2}} \qquad (19)$$
where $M_r^t$ represents the sequence data of the $r$-th physical resource monitoring index collected at time $t$ over the time window $w$, $\bar{m}_r$ represents the mean of $M_r^t$, $r \in \{1, 2, \dots, k\}$, $\delta_i^{e}$ is the reconstruction error at time $e$ obtained from $\hat{x}_i^{e}$, the reconstructed response time monitoring value of micro-service $v_i$ at time $e$, and $x_i^{e}$, the response time monitoring value of micro-service $v_i$ at time $e$, and $\bar{\delta}_i$ represents the mean of the reconstruction errors in $D_i^t$. The degree of anomaly of micro-service $v_i$ can be reflected by how high this correlation score is, so the invention calculates the final anomaly weight $AS(s_i)$ of graph node $s_i$ by combining $acScore(s_i)$ with the correlation score, as given by formula (20).
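The Pearson correlation used in step 4 could be computed as in the following Python sketch. The function names and sample sequences are hypothetical, and returning 0 for a constant sequence is an assumption made here to avoid division by zero. The rule that combines the correlation score with acScore into the final anomaly weight is not reproduced in this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # constant sequence: undefined, treated as 0 (assumption)
    return cov / (norm_x * norm_y)


def resource_correlations(error_seq, resource_windows):
    """Correlation of the error sequence with each resource metric window."""
    return [pearson(error_seq, window) for window in resource_windows]


# Hypothetical reconstruction errors and resource windows over h = 4 samples.
error_seq = [0.01, 0.02, 0.03, 0.04]
cpu = [0.2, 0.4, 0.6, 0.8]   # rises together with the error
mem = [0.5, 0.5, 0.5, 0.5]   # flat

corrs = resource_correlations(error_seq, [cpu, mem])
```

Here the CPU window is perfectly linearly related to the error sequence, while the flat memory window carries no correlation information.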
Step 5: the anomaly weight of every micro-service node is calculated by traversal to complete the update of the anomaly weights of all graph nodes in the service call relation graph. A weighted PageRank algorithm then performs a random walk in the service call relation graph $G(S, Z, F)$; the walk probability is calculated from the relation between the anomaly weight of a graph node and the anomaly weights of its connected graph nodes, which improves the accuracy of locating the fault root cause. The walk strategy of the weighted PageRank algorithm is based on the probability that each graph node visits the other graph nodes, so the node transition probability matrix $U$ of the service call relation graph is defined first:
$$U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ u_{21} & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ u_{n1} & u_{n2} & \cdots & u_{nn} \end{pmatrix} \qquad (21)$$
where $u_{ij}$ represents the probability of a random walk from graph node $s_j$ to graph node $s_i$. The walk probability is tied to the anomaly weights of the graph nodes, and $u_{ij}$ is calculated as:
$$u_{ij} = \begin{cases} \dfrac{AS(s_i)}{linkOut(s_j)}, & s_j \to s_i \\ 0, & \text{otherwise} \end{cases} \qquad (22)$$
where $s_j \to s_i$ indicates that a directed edge pointing from graph node $s_j$ to graph node $s_i$ exists, and $linkOut(s_j)$ represents the sum of the anomaly weights of the graph nodes that $s_j$ points to. For any graph node $s_i \in S$, $i \in \{1, 2, \dots, n\}$, the PR score is initialized to $PR_0(s_i) = 1/n$, and the PR scores of all graph nodes are expressed as the vector $R_0$:
$$R_0 = (PR_0(s_1), PR_0(s_2), \dots, PR_0(s_n))^T \qquad (23)$$
During each round of the random walk, the PR score of each graph node is updated iteratively:
$$R_c = \beta U \cdot R_{c-1} + (1 - \beta) R_0 \qquad (24)$$
where $R_c$ represents the PR score vector of all graph nodes after iteration round $c$, $U \in \mathbb{R}^{n \times n}$ represents the random walk probability matrix, and $\beta \in (0, 1)$ represents the damping coefficient, typically $\beta = 0.85$. After continued iterative updating, the PR score of each graph node converges; at that point, the higher the PR score of a graph node, the greater the probability that the corresponding micro-service module is the fault root cause. Finally, a ranked list of fault root cause micro-services is output in descending order of the PR scores of the graph nodes.
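The iterative update of step 5 can be sketched in Python as follows. This is an illustration under assumptions, not the reference implementation: the walk probability from a node to a successor is taken as the successor's anomaly weight divided by the sum of the anomaly weights of all successors, the weights are assumed to be positive, nodes without outgoing edges simply pass on no score, and the function name and example values are hypothetical.

```python
def weighted_pagerank(nodes, edges, weight, beta=0.85, rounds=100, tol=1e-10):
    """Rank nodes by PR score using anomaly-weighted transition probabilities."""
    n = len(nodes)
    # linkOut(j): sum of the anomaly weights of the nodes that j points to.
    link_out = {j: sum(weight[dst] for src, dst in edges if src == j)
                for j in nodes}
    r0 = {s: 1.0 / n for s in nodes}
    r = dict(r0)
    for _ in range(rounds):
        nxt = {}
        for i in nodes:
            inflow = sum(weight[i] / link_out[j] * r[j]
                         for j, dst in edges if dst == i and link_out[j] > 0)
            nxt[i] = beta * inflow + (1 - beta) * r0[i]
        delta = max(abs(nxt[s] - r[s]) for s in nodes)
        r = nxt
        if delta < tol:   # scores have converged
            break
    ranking = sorted(nodes, key=lambda s: r[s], reverse=True)
    return ranking, r


# Hypothetical graph: A calls B and C, B calls C; C has the largest weight.
nodes = ["A", "B", "C"]
edges = {("A", "B"), ("A", "C"), ("B", "C")}
weight = {"A": 0.1, "B": 0.3, "C": 0.9}

ranking, scores = weighted_pagerank(nodes, edges, weight)
```

In this example the walk is drawn toward C, which therefore appears first in the ranked list of root cause candidates.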
To evaluate the effectiveness of the invention, the evaluation indexes AC@K and MAP are used to measure the fault positioning effect. AC@K represents the probability that the fault root cause micro-service is among the first K micro-services output by the root cause prediction; the higher the AC@K score, the more accurately the model locates faults. MAP quantifies the average fault location accuracy.
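For illustration, AC@K as described above could be computed over a set of fault injection cases as in the Python sketch below. The function name `ac_at_k` and the example cases are hypothetical. MAP is not sketched, because its exact formula is not given in the text.

```python
def ac_at_k(cases, k):
    """Fraction of cases whose true root cause is among the top-k ranked services.

    cases: list of (ranked_services, true_root_cause) pairs
    """
    hits = sum(1 for ranking, root in cases if root in ranking[:k])
    return hits / len(cases)


# Hypothetical results of three fault injections whose true root cause is "C".
cases = [
    (["C", "B", "A"], "C"),   # root cause ranked first
    (["B", "C", "A"], "C"),   # root cause ranked second
    (["A", "B", "C"], "C"),   # root cause ranked third
]

ac1 = ac_at_k(cases, 1)
ac2 = ac_at_k(cases, 2)
ac3 = ac_at_k(cases, 3)
```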
Table 1 shows the fault location accuracy of the invention on the different micro-services in Sock-shop when faults such as network delay (Latency), CPU resource shortage (CPU Hog) and memory leak (Memory Leak) are injected. As can be seen from Table 1, the average fault location accuracy (MAP) of MicroEncoder on each micro-service exceeds 85%.
TABLE 1
To further evaluate its effectiveness, the invention was compared with other advanced methods, including Random selection, MicroRCA and AAMR. First, the fault location accuracy of the invention and of the comparison methods was tested under three fault types: network delay (Latency), CPU resource shortage (CPU Hog) and memory leak (Memory Leak). The experimental results are shown in fig. 3. For all three injected fault types and for the different values of K, the fault positioning accuracy of the invention is higher than that of the other methods, which indicates that the invention effectively improves fault positioning accuracy.
Further, the average fault location accuracy of the method is measured by calculating a fault location evaluation index MAP, and an experimental result is shown in FIG. 4. As can be seen from fig. 4, the average fault location accuracy of the present invention for network delay (Latency) is 92%, the average fault location accuracy for CPU resource shortage (CPU Hog) is 86.4%, and the average fault location accuracy for Memory Leak (Memory Leak) is 91.2%, which is superior to the comparison method.
The invention also provides a device for the micro-service fault positioning method based on a self-encoder and a service dependency graph, comprising a data collection module, an anomaly detection module and a fault locating module. The data collection module collects the micro-service physical resource indexes and response time indexes, where the response time indexes are used to detect the response speed of calls between micro-services and the physical resource indexes are used to update the anomaly weights of the graph nodes in the service call relation graph. The anomaly detection module encodes and reconstructs the data from the data collection module with the self-encoder and detects whether a micro-service is abnormal. The fault locating module constructs a service call relation graph by parsing the call relations between micro-services, analyzes the anomaly propagation path, calculates the anomaly weight of each micro-service in the service call relation graph from the physical resource indexes and the micro-service response time indexes, and locates the fault root cause micro-service with a weighted PageRank algorithm.
Referring to fig. 5, which shows a schematic structural diagram of a micro-service fault location device based on a self-encoder and a service dependency graph according to an exemplary embodiment of the present invention, the device provided in this embodiment includes a data collection unit 301, an anomaly detection unit 302, and a fault location unit 303. The data collection unit 301 collects the micro-service physical resource indexes and response time indexes, where the response time indexes are used to detect the response speed of calls between micro-services and the physical resource indexes are used to update the anomaly weights of the graph nodes in the service call relation graph. The anomaly detection unit 302 encodes and reconstructs the data from the data collection unit 301 with a self-encoder to detect whether a micro-service is abnormal. The fault location unit 303 constructs a service call relation graph by parsing the call relations between micro-services, analyzes the anomaly propagation path, calculates the anomaly weight of each micro-service in the service call relation graph from the physical resource indexes and the micro-service response time indexes, and locates the fault root cause with a weighted PageRank algorithm.
Referring to fig. 6, a schematic structural diagram of a micro-service fault positioning device based on a self-encoder and a service dependency graph according to an embodiment of the present application is shown, hereinafter referred to as device 6, where the device 6 may be integrated in the foregoing electronic apparatus. As shown in fig. 6, the device includes a memory 602, a processor 601, an input device 603, an output device 604, and a communication interface. The memory 602 may be a separate physical unit connected to the processor 601, the input device 603, and the output device 604 through a bus; alternatively, the memory 602 and the processor 601 may be integrated together, implemented by hardware, or the like. The memory 602 is used to store a program implementing the above method embodiment, or the respective modules of the apparatus embodiment, and the processor 601 invokes the program to perform the operations of the above method embodiment. The input device 603 includes, but is not limited to, a keyboard and a mouse; the output device 604 includes, but is not limited to, a display screen. The communication interface is used to transmit and receive various types of messages and includes, but is not limited to, a wireless interface or a wired interface.
Alternatively, when part or all of the micro-service fault positioning method of the above-described embodiment is implemented by software, the device may include only the processor. In that case the memory for storing the program is located outside the device, and the processor is connected to the memory via a circuit or wire for reading and executing the program stored in the memory. The processor may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
The processor may further comprise a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The memory may include volatile memory, such as random-access memory (RAM); the memory may also include non-volatile memory, such as flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory may also comprise a combination of the above types of memory.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the invention without departing from the principles thereof are intended to be within the scope of the invention as set forth in the following claims.