
CN119883852B - Server performance intelligent monitoring method and system - Google Patents

Server performance intelligent monitoring method and system

Info

Publication number
CN119883852B
Authority
CN
China
Prior art keywords
load
performance
data
performance data
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202510370279.0A
Other languages
Chinese (zh)
Other versions
CN119883852A (en)
Inventor
曾泓瀚
罗忠少
苏景
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yunhan Technology Co ltd
Original Assignee
Shenzhen Yunhan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yunhan Technology Co ltd
Priority to CN202510370279.0A
Publication of CN119883852A
Application granted
Publication of CN119883852B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3433Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment for load management
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452Performance evaluation by statistical analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses an intelligent monitoring method and system for server performance, relating to the technical field of server performance. Load history data is analyzed to determine the interval range and the degree of change of the load, which provides a reliable basis for subsequently splitting the interval range and takes past changes in server load into account, so that the relationship between detailed load intervals and performance data can be described accurately. The relationship between load and performance history data is confirmed within each partial interval of the load to obtain a standard load-performance association, with the normal values of the performance history data selected from the load-performance correspondence. Current load data and performance data are input into the standard load-performance association to determine the deviation condition, and the deviation condition is analyzed to output the current performance information of the target server. This improves the accuracy and adaptability of server performance monitoring and helps ensure efficient and safe operation of the server.

Description

Intelligent monitoring method and system for server performance
Technical Field
The invention relates to the technical field of server performance, in particular to an intelligent monitoring method and system for server performance.
Background
Intelligent monitoring of server performance has developed in response to steadily rising requirements for server stability and reliability in the information technology field. With the expansion of enterprise business and the acceleration of digital transformation, servers act as the core infrastructure supporting application systems, and their performance directly affects business continuity and user experience. Traditional server management often cannot identify potential problems accurately and in real time, so fault warning and handling are delayed.
Intelligent monitoring schemes were therefore developed. They combine data collection, data analysis and early-warning mechanisms: key performance indicators of the server, such as load, response time and throughput, are collected in real time, and machine learning algorithms are used for anomaly detection and performance trend prediction. These techniques improve the accuracy and efficiency of monitoring and enable comprehensive evaluation and optimization of server performance. Such a scheme can identify performance bottlenecks and potential fault risks in time, provide data support for operation and maintenance teams, and help ensure stable server operation and service continuity.
In the prior art, server performance is monitored only according to the performance data parameters themselves, without considering how different server loads affect performance. As a result, the accuracy and adaptability of server performance monitoring are poor, and safe operation of the server cannot be effectively ensured.
Therefore, how to improve the accuracy and adaptability of server performance monitoring is a technical problem to be solved at present.
Disclosure of Invention
The invention aims to solve the problem in the prior art that the accuracy and adaptability of server performance monitoring are poor because the influence of different load conditions is not considered, and provides an intelligent monitoring method for server performance, comprising the following steps:
Collecting load history data of a target server, analyzing the load history data to determine the interval range and the degree of change of the load, and splitting the interval range of the load according to the degree of change to obtain a plurality of partial intervals of the load;
Collecting performance history data of the target server according to the plurality of partial intervals of the load, and confirming the association between the load and the performance history data in the different partial intervals of the load to obtain a standard load-performance association;
Acquiring load data and performance data of the target server for the current period, and inputting the load data and the performance data into the standard load-performance association to determine a deviation condition;
Analyzing the deviation condition and outputting the current performance information of the target server, so as to monitor the performance of the server.
In some embodiments of the application, analyzing the load history data to determine the interval range and the degree of change of the load includes:
Drawing a histogram of the load from the load history data, marking the minimum and maximum values of the load on the histogram, and taking the range from the minimum to the maximum as the interval range of the load;
Identifying load gradual-change segments and load jump segments in the load history data, and aggregating them separately to obtain a load gradual-change segment record and a load jump segment record;
Determining, based on the load gradual-change segment record and the load jump segment record, the occurrence frequency of each range within the interval range of the load; determining from these occurrence frequencies the frequency weight of each record in each range; computing differences over the different ranges of each record; and weighting the differences by the frequency weights of the corresponding ranges to obtain a gradual-change degree and a jump-change degree;
Determining the degree of change of the interval range of the load by combining the gradual-change degree and the jump-change degree.
In some embodiments of the present application, splitting the interval range of the load according to the degree of change to obtain a plurality of partial intervals of the load includes:
Calculating the dispersion of the load history data, determining a split count from the dispersion and the degree of change of the interval range of the load, and splitting the interval range of the load into that number of parts to obtain the plurality of partial intervals of the load.
In some embodiments of the present application, confirming the association between the load and the performance history data in the different partial intervals of the load to obtain the standard load-performance association includes:
Classifying the performance data, determining the performance data categories and the distribution range of each kind of performance data in the different partial intervals of the load, calculating the sensitivity between the load and each kind of performance data, and computing the distribution parameters of each kind of performance data;
The distribution parameters of the performance data comprise the mean, standard deviation, quartiles, interquartile range (IQR), skewness and kurtosis;
Setting a first Z-score threshold according to the mean and the standard deviation, a second Z-score threshold according to the quartiles and the IQR, and a third Z-score threshold according to the skewness and the kurtosis;
Setting a fourth Z-score threshold by combining the first, second and third Z-score thresholds;
Distinguishing normal values from abnormal values in the distribution range of each kind of performance data by means of the fourth Z-score threshold, eliminating the abnormal values, and constructing the standard load-performance association from the normal values in the distribution range of the performance data and the sensitivity between the load and each kind of performance data.
In some embodiments of the application, the standard association of load-performance is constructed from normal values in the distribution range of performance data and the sensitivity between load and each performance data, including,
Fitting the relation between the load and the distribution range of the performance data under different part intervals on the basis of the normal value in the distribution range of the performance data to obtain a regression model, and adjusting the regression coefficient in the regression model through the sensitivity between the load and each performance data to describe the standard association relation of the load and the performance.
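The fitting step above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: it assumes a simple linear model per partial interval, and the adjustment rule (shrinking the slope by the absolute sensitivity while keeping the line through the mean point) is an assumption, since the text does not specify how the coefficient is adjusted. The function name `fit_interval_model` is hypothetical.

```python
from statistics import mean


def fit_interval_model(loads, perf, sensitivity):
    """Least-squares line perf ~ slope * load + intercept for one load interval.

    loads, perf: normal (outlier-free) samples observed in that interval.
    sensitivity: load/performance sensitivity, e.g. a correlation in [-1, 1].
    """
    mx, my = mean(loads), mean(perf)
    sxx = sum((x - mx) ** 2 for x in loads)
    sxy = sum((x - mx) * (y - my) for x, y in zip(loads, perf))
    slope = sxy / sxx if sxx else 0.0
    # Assumed adjustment (not stated in the source): shrink the slope by
    # |sensitivity| and keep the line passing through the mean point.
    slope *= abs(sensitivity)
    return slope, my - slope * mx
```

Under this assumption, a sensitivity of 1.0 leaves the ordinary least-squares fit unchanged, while a weaker sensitivity flattens the line toward the interval's mean performance.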
In some embodiments of the application, inputting the load data and the performance data into the standard load-performance association to determine the deviation condition includes:
Inputting load data and performance data into a corresponding regression model according to different partial intervals of the load data, counting deviation under each load partial interval, and drawing a deviation curve of each load partial interval;
And integrating the deviation curves of all the load part intervals, extracting the curve characteristics of all the deviation curves, and generating a deviation condition.
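As a rough illustration of the deviation step, the sketch below routes each current sample to the model of its load interval, records the residual as one point of that interval's deviation curve, and summarizes each curve. The linear `(slope, intercept)` model form and the three curve features (mean absolute deviation, peak deviation, end-to-start trend) are assumptions chosen for the example; the text does not name the curve features.

```python
from statistics import mean


def deviation_features(samples, intervals, models):
    """Build one deviation curve per load interval and summarize it.

    samples:   (load, performance) pairs in time order.
    intervals: (low, high) bounds of each partial interval of the load.
    models:    one (slope, intercept) pair per interval (assumed linear form).
    """
    curves = [[] for _ in intervals]
    for load, perf in samples:
        for i, (low, high) in enumerate(intervals):
            if low <= load <= high:
                slope, intercept = models[i]
                # Deviation = observed performance minus expected performance.
                curves[i].append(perf - (slope * load + intercept))
                break
    features = []
    for curve in curves:
        if not curve:
            features.append(None)  # no current samples fell in this interval
            continue
        features.append({
            "mean_abs": mean(abs(d) for d in curve),
            "peak": max(curve, key=abs),
            "trend": curve[-1] - curve[0],
        })
    return curves, features
```

A load value that lies exactly on a shared boundary is assigned to the lower interval, because the first matching interval wins.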
In some embodiments of the present application, analyzing the deviation condition to output the current performance information of the target server includes:
Determining curve characteristics under each type of performance data, generating a matching coefficient under each type of performance data by integrating the curve characteristics, determining a basic performance index of each type of performance data according to parameters of each type of performance data, and determining a performance index by combining the basic performance index and the matching coefficient of each type of performance data;
and integrating the performance indexes under all the performance data categories to output the current performance information of the target server.
In some embodiments of the present application, the performance metrics under all performance data categories are integrated to output current performance information for the target server, including,
Integrating the performance indexes under all performance data categories to obtain the overall performance index of the target server;
and analyzing the performance indexes under all the performance data categories to obtain the performance status of each type of performance data, and taking the performance status, the performance indexes and the overall performance indexes of each type of performance data as the current performance information of the target server.
Correspondingly, the application also provides an intelligent monitoring system for server performance, which comprises,
The first module is used for collecting load historical data of the target server, analyzing the load historical data to determine a range and a change degree of a load, and splitting the range of the load according to the change degree to obtain a plurality of partial intervals of the load;
The second module is used for collecting performance history data of the target server according to a plurality of partial intervals of the load, confirming association relations between the load and the performance history data under different partial intervals of the load, and obtaining standard association relations of the load-performance;
The third module is used for acquiring load data and performance data of the target server in a current period of time, and inputting the load data and the performance data into a standard association relation of load-performance to determine a deviation condition;
and the fourth module is used for analyzing the deviation condition and outputting the current performance information of the target server so as to monitor the performance of the server.
Compared with the prior art, the invention has the beneficial effects that:
1. The load history data is analyzed to determine the interval range and the change degree of the load, a reliable basis is provided for splitting the interval range of the subsequent load, and the previous change condition of the load of the server is considered, so that the association relation between the detailed interval range of the load and the performance data is accurately described. And confirming the association relation between the load and the performance history data in different part intervals of the load to obtain a standard association relation between the load and the performance, selecting normal values of the performance history data in the corresponding relation between the load and the performance history data, constructing the standard association relation between the load and the performance, and ensuring the accuracy of the description of the standard association relation between the load and each performance data in different load intervals.
2. The load data and the performance data are input into the standard association relation of load-performance to determine the deviation condition, the current performance information of the target server is output by analyzing the deviation condition, the accuracy and the adaptability of the performance monitoring of the server are improved, and the efficient and safe operation of the server is effectively ensured.
Drawings
FIG. 1 is a schematic flow chart of the intelligent monitoring method for server performance;
FIG. 2 is a schematic structural diagram of the intelligent monitoring system for server performance according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments.
Referring to fig. 1, the server performance intelligent monitoring method includes the steps of,
Step S101, collecting load historical data of a target server, analyzing the load historical data to determine a range and a change degree of a load, and splitting the range of the load according to the change degree to obtain a plurality of partial intervals of the load.
In this embodiment, system commands (e.g., uptime, top, sar) or dedicated monitoring tools (e.g., Zabbix, Nagios, Prometheus) are used to collect load history data of the target server. The selected tool should provide accurate and comprehensive load data, including but not limited to key metrics such as CPU utilization, memory utilization and network bandwidth. The collected load history data is stored in a secure, reliable database or file system, and is tidied as necessary (for example, deduplicated and formatted) to ensure its integrity and consistency. The data is then preprocessed, including data cleaning (such as removing abnormal values and missing values) and data normalization, so that the preprocessed data accurately reflects the load condition of the server. The preprocessed load data is analyzed with statistical tools (such as histograms and box plots) to determine the interval range of the load. In the prior art, load data is divided into three bands (low, medium and high load) according to its distribution, but such coarse, generic bands are not suitable for analyzing the specific relationship between load and performance. The load interval therefore needs to be split according to how the server load has changed in the past, and the relationship between load and performance is analyzed within each specific interval.
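The cleaning and normalization described above could look roughly like the following sketch. It assumes missing samples are represented as `None`, uses a plain Z-score cut-off for abnormal values and min-max scaling for normalization; the function name `preprocess_load` and the default limit of 3.0 are illustrative choices, not taken from the source.

```python
from statistics import mean, pstdev


def preprocess_load(samples, z_limit=3.0):
    """Drop missing samples, remove gross outliers, then min-max normalize."""
    present = [s for s in samples if s is not None]
    if not present:
        return []
    mu, sigma = mean(present), pstdev(present)
    if sigma > 0:
        # Keep only samples within z_limit standard deviations of the mean.
        kept = [s for s in present if abs(s - mu) / sigma <= z_limit]
    else:
        kept = present
    lo, hi = min(kept), max(kept)
    if hi == lo:
        return [0.0 for _ in kept]
    return [(s - lo) / (hi - lo) for s in kept]
```

If the interval range of the load is to be reported in original units (for example, percent CPU utilization), the normalization step would be applied only where scale-free values are needed.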
In some embodiments of the application, analyzing the load history data to determine the interval range and the degree of change of the load includes:
Drawing a histogram of the load from the load history data, marking the minimum and maximum values of the load on the histogram, and taking the range from the minimum to the maximum as the interval range of the load;
Identifying load gradual-change segments and load jump segments in the load history data, and aggregating them separately to obtain a load gradual-change segment record and a load jump segment record;
Determining, based on the load gradual-change segment record and the load jump segment record, the occurrence frequency of each range within the interval range of the load; determining from these occurrence frequencies the frequency weight of each record in each range; computing differences over the different ranges of each record; and weighting the differences by the frequency weights of the corresponding ranges to obtain a gradual-change degree and a jump-change degree;
Determining the degree of change of the interval range of the load by combining the gradual-change degree and the jump-change degree.
In this embodiment, the interval range of the load is the range between the minimum and maximum values of the load on the histogram, that is, the full range of values the load can take. Server load changes in two modes: gradual change and direct jump. In a gradual change, the load continuously increases or decreases over a period of time; the change is relatively smooth, and counting frequencies per sub-range is suitable for reflecting the overall trend of the load. In a direct jump, the load changes significantly within a short time; the change is usually abrupt, and statistics on specific values are suitable for capturing the fluctuation of the load. All load gradual-change segments and all load jump segments are aggregated separately to obtain the load gradual-change segment record and the load jump segment record. For each of the two modes, the occurrence frequency of each range within the interval range of the load is calculated and a frequency weight is assigned; when the differences are computed, they are weighted by the occurrence frequency of the corresponding load range. Load ranges that occur frequently are given a higher weight in the difference calculation, so that their influence on overall load fluctuation is reflected more accurately, and the degree of change of the load is obtained by combining the two modes.
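One possible reading of this frequency-weighted difference calculation is sketched below. Everything specific here is an assumption made for illustration: a step counts as a jump when its absolute difference exceeds `jump_threshold`, the interval range is cut into `n_ranges` equal ranges, a range's frequency weight is the share of a record's steps that start in it, and the two degrees are combined by a weighted sum controlled by `jump_share`.

```python
def change_degree(loads, jump_threshold=20.0, n_ranges=4, jump_share=0.5):
    """Return (overall, gradual, jump) change degrees of a load series."""
    lo, hi = min(loads), max(loads)
    width = (hi - lo) / n_ranges or 1.0  # avoid a zero width for flat series

    def range_index(value):
        return min(int((value - lo) / width), n_ranges - 1)

    # Each step is (range index of its starting load, absolute difference).
    gradual, jump = [], []
    for prev, cur in zip(loads, loads[1:]):
        step = (range_index(prev), abs(cur - prev))
        (jump if step[1] > jump_threshold else gradual).append(step)

    def weighted_degree(steps):
        if not steps:
            return 0.0
        counts = {}
        for idx, _ in steps:
            counts[idx] = counts.get(idx, 0) + 1
        # Frequency weight of a range = share of this record's steps in it.
        weighted = sum(counts[idx] / len(steps) * diff for idx, diff in steps)
        total = sum(counts[idx] / len(steps) for idx, _ in steps)
        return weighted / total

    g, j = weighted_degree(gradual), weighted_degree(jump)
    return (1 - jump_share) * g + jump_share * j, g, j
```

With this weighting, differences observed in ranges that the load visits often dominate each degree, which matches the stated intent of giving frequent load ranges more influence.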
In some embodiments of the present application, splitting the interval range of the load according to the degree of change to obtain a plurality of partial intervals of the load includes:
Calculating the dispersion of the load history data, determining a split count from the dispersion and the degree of change of the interval range of the load, and splitting the interval range of the load into that number of parts to obtain the plurality of partial intervals of the load.
In this embodiment, the dispersion of the load history data also reflects how the load changes. A split count is determined by combining the dispersion with the degree of change of the interval range of the load (for example, by weighted summation): the higher the dispersion and the degree of change, the larger the split count, so that the relationship between load and performance data can be analyzed accurately. Too many intervals increase the complexity of the analysis, while too few cannot accurately reflect the relationship between load and performance, so a reasonable split count must be chosen.
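A minimal sketch of the weighted-summation idea follows. The weights, the lower and upper bounds on the split count, the scaling of both inputs by the load span, and the use of equal-width intervals are all assumptions for the example; the source only states that higher dispersion and change degree should yield more intervals.

```python
from statistics import pstdev


def split_load_range(loads, change_degree, w_dispersion=0.5, w_change=0.5,
                     min_parts=2, max_parts=10):
    """Split the load range into equal-width partial intervals."""
    lo, hi = min(loads), max(loads)
    span = hi - lo
    if span == 0:
        return [(lo, hi)]
    # Scale both inputs by the span so they are comparable, unit-free ratios.
    dispersion = pstdev(loads) / span
    change = change_degree / span
    score = w_dispersion * dispersion + w_change * change
    # Clamp the split count so it stays within a manageable range.
    parts = round(min_parts + score * (max_parts - min_parts))
    parts = max(min_parts, min(max_parts, parts))
    width = span / parts
    return [(lo + i * width, lo + (i + 1) * width) for i in range(parts)]
```

The clamp reflects the trade-off described above: the upper bound limits analysis complexity and the lower bound prevents the intervals from becoming too coarse.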
Step S102, collecting performance history data of the target server according to a plurality of partial intervals of the load, and confirming association relations between the load and the performance history data under different partial intervals of the load to obtain standard association relations of the load and the performance.
In this embodiment, appropriate performance indicators are selected according to the application scenario and service requirements of the server. The indicators should comprehensively reflect the performance status of the server, for example response time, throughput and number of concurrent users. Monitoring tools or system commands are used to collect performance history data for each of the partial intervals of the load. The collected performance data must be aligned in time with the load data so that the subsequent association analysis is valid.
Server performance is reflected in multiple aspects or dimensions, each of which directly reflects a different performance condition of the server. These aspects are summarized below:
1. CPU performance
Reflects: the processing speed and concurrent processing capability of the server.
Performance conditions:
Processing speed: the CPU is the core component of the server and is responsible for executing computing tasks. A high-performance CPU executes computing tasks faster, improving the response speed and performance of the server. For example, CPU performance is critical in scenarios that require heavy computation, such as data analysis and scientific computing.
Concurrent processing capability: modern CPUs typically have multi-core designs and can process multiple tasks simultaneously. A multi-core CPU enhances the concurrent processing capability of the server, allowing it to handle more user requests or tasks at the same time. This is important for running large databases, virtualized environments or high-load applications.
2. Memory performance
Reflects: the task processing capability, system stability and database performance of the server.
Performance conditions:
Task processing capability: memory temporarily stores the data being processed by the CPU. Sufficient memory allows the server to process more data and requests, improving response speed. It also allows more programs to run at the same time, reducing the switching time between programs.
System stability: a larger memory capacity relieves shortages of system resources and provides more running space, improving the stability and reliability of the system.
Database performance: memory serves as the cache area of the database and has an important influence on database performance. A larger memory capacity provides more cache space and reduces the number of disk accesses, which can greatly improve database query and write speeds.
3. Disk performance
Reflects: the data storage and read speed of the server and the overall efficiency of the system.
Performance conditions:
Data storage and read speed: high-performance drives (e.g., SSDs or NVMe SSDs) provide much higher read and write speeds than traditional mechanical hard disks (HDDs). This directly affects application response time and data processing speed; for example, database queries, logging and file transfers can be significantly accelerated.
Overall system efficiency: faster disk read and write speeds mean shorter startup times for the operating system and applications, and higher overall system efficiency. High-performance drives also support higher IOPS (input/output operations per second), which helps keep operation stable under high load.
4. Network performance
Reflects: the data transmission speed and concurrent request processing capability of the server.
Performance conditions:
Data transmission speed: network bandwidth determines how fast data is transmitted over the network. High network bandwidth reduces data transmission delay and improves the efficiency of distributed computing and remote access.
Concurrent request processing capability: in high-concurrency scenarios, a server must process a large number of requests at the same time. Sufficient network bandwidth supports more concurrent connections, so that the server can handle these requests in a timely manner.
5. Reliability and stability
Reflects: the fault tolerance and long-term stable operation capability of the server.
Performance conditions:
Fault tolerance: high-quality server hardware and software design improves the fault tolerance of the server. For example, redundant power supplies and fans, predictive failure detection for hard disks and fans, and RAID systems are common techniques for improving server reliability.
Long-term stable operation capability: the stability and reliability of the server are also reflected in its ability to run for a long time without faults or performance degradation. This is critical for application scenarios that require continuous service (e.g., websites and databases).
In this embodiment, the performance history data of the target server is collected per partial interval of the load: for each partial interval, the performance history data recorded while the load was within that interval is collected.
In some embodiments of the present application, confirming the association between the load and the performance history data in the different partial intervals of the load to obtain the standard load-performance association includes:
Classifying the performance data, determining the performance data categories and the distribution range of each kind of performance data in the different partial intervals of the load, calculating the sensitivity between the load and each kind of performance data, and computing the distribution parameters of each kind of performance data;
The distribution parameters of the performance data comprise the mean, standard deviation, quartiles, interquartile range (IQR), skewness and kurtosis;
Setting a first Z-score threshold according to the mean and the standard deviation, a second Z-score threshold according to the quartiles and the IQR, and a third Z-score threshold according to the skewness and the kurtosis;
Setting a fourth Z-score threshold by combining the first, second and third Z-score thresholds;
Distinguishing normal values from abnormal values in the distribution range of each kind of performance data by means of the fourth Z-score threshold, eliminating the abnormal values, and constructing the standard load-performance association from the normal values in the distribution range of the performance data and the sensitivity between the load and each kind of performance data.
In this embodiment, the sensitivity between the load and each type of performance data is calculated over the distribution range of that performance data (i.e., the range of its parameter values); the sensitivity can be calculated using, for example, the Pearson correlation coefficient. To describe the relationship between load and performance accurately, abnormal data (errors, abnormal performance readings, etc.) are removed from the performance data. Normal and abnormal data are distinguished by the Z-score method: the Z-score of each data point is calculated and compared with a Z-score threshold to judge whether the point is abnormal.
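The two calculations named above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the patented implementation; the function names, the sample values, and the fixed threshold of 3 are assumptions chosen for the example.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no linear relationship
    return cov / (sx * sy)


def z_scores(values):
    """Z-score of every value relative to the population mean and std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def split_normal_abnormal(values, threshold=3.0):
    """Separate values whose |Z| exceeds the threshold from the rest."""
    normal, abnormal = [], []
    for v, z in zip(values, z_scores(values)):
        (abnormal if abs(z) > threshold else normal).append(v)
    return normal, abnormal


# Hypothetical samples from one partial interval of the load.
load = [10, 20, 30, 40, 50]                # e.g. requests per second
latency = [12.0, 14.0, 16.0, 18.0, 20.0]   # e.g. response time in ms
sensitivity = pearson(load, latency)       # close to 1: latency tracks load

cpu = [50.0] * 19 + [100.0]                # one suspicious reading
normal, abnormal = split_normal_abnormal(cpu, threshold=3.0)
```

In this example the single reading of 100.0 has a Z-score of about 4.36 and is separated from the 19 normal readings.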
In this embodiment, the distribution parameters of the performance data include the mean, standard deviation, quartiles, IQR, skewness, and kurtosis. These and related statistics are described below:
Mean: the mean is the most commonly used measure of the central tendency of the data. In the Z-score method, the mean is used to calculate the difference between each data point and the mean.
Median: the median is another measure of the central tendency of a data set; it is the value in the middle position after the data are sorted by size. The median is not affected by extreme values, so when the data set contains extreme values, the median reflects the central position of the data more accurately than the mean.
Mode: the mode is the value that occurs most frequently in the data set. The mode helps identify the most common case when analyzing a data distribution.
Quartiles: the quartiles divide the data into four parts, each containing 25% of the data points. The distance between the first quartile (Q1) and the third quartile (Q3) is called the interquartile range (IQR), an indicator of how dispersed the data are. The quartiles help identify outliers quickly; outliers are generally taken to be data points below Q1 - 1.5·IQR or above Q3 + 1.5·IQR.
Skewness: skewness describes the asymmetry of the data distribution. Positive skewness means the distribution is right-skewed, i.e., it has a longer tail toward larger values; negative skewness means it is left-skewed, with a longer tail toward smaller values. Skewness can affect the setting of the Z-score threshold because strongly skewed distributions are more prone to extreme values.
Kurtosis: kurtosis describes how sharply peaked the data distribution is. High kurtosis indicates that the data are concentrated near the mean, usually together with heavier tails, and low kurtosis indicates that the data are more diffuse. Kurtosis may also affect the setting of the Z-score threshold, because a distribution with higher kurtosis places more data points close to the mean while still producing occasional extreme values.
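The statistics listed above can be computed with the Python standard library, as in the following sketch. The function name `distribution_parameters`, the choice of population (rather than sample) moments, the "inclusive" quartile method, and the sample data are assumptions made for illustration; the text does not specify them.

```python
from statistics import mean, pstdev, quantiles


def distribution_parameters(values, fence_factor=1.5):
    """Summarize one performance metric within one partial load interval."""
    n = len(values)
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if sigma == 0:
        skewness, excess_kurtosis = 0.0, 0.0
    else:
        # Third and fourth standardized moments; 3 is subtracted so that a
        # normal distribution has an excess kurtosis of 0.
        skewness = sum(((v - mu) / sigma) ** 3 for v in values) / n
        excess_kurtosis = sum(((v - mu) / sigma) ** 4 for v in values) / n - 3.0
    return {
        "mean": mu,
        "std": sigma,
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
        # Outlier fences from the quartile rule described above.
        "lower_fence": q1 - fence_factor * iqr,
        "upper_fence": q3 + fence_factor * iqr,
    }


samples = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical metric readings
params = distribution_parameters(samples)
```

For this symmetric sample the quartiles are 3, 5, and 7, the IQR is 4, the skewness is zero up to rounding, and the excess kurtosis is negative because the values are spread evenly rather than peaked.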
Based on the mean and standard deviation: when the data are approximately normally distributed, the mean and standard deviation can be used to calculate the Z-score, and a fixed threshold (e.g., 3) can be set to identify outliers. However, if the data distribution deviates from the normal distribution or the volatility is large, the threshold may need to be adjusted.
Based on the quartiles and IQR: for data that fluctuate strongly or deviate from the normal distribution, the quartiles and IQR may be used to assist in setting the Z-score threshold. For example, the IQR of the data may be calculated and Q1 - 1.5·IQR and Q3 + 1.5·IQR used as the boundaries for outliers. The Z-score threshold is then adjusted based on these boundaries.
Based on skewness and kurtosis: the skewness and kurtosis of the data may also be taken into account when setting the Z-score threshold. If the skewness is larger or the kurtosis is higher, it may be desirable to set a looser threshold to avoid misjudging normal fluctuations as anomalies.
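The three ways of deriving a threshold can be sketched as below. The text gives only the qualitative rules, so the concrete mappings here are assumptions: the second threshold expresses the quartile fences in standard-deviation units, and the third loosens a base threshold in proportion to skewness and positive excess kurtosis using the hypothetical coefficients `a` and `b`.

```python
def first_threshold(base=3.0):
    """Fixed Z-score threshold for approximately normal data."""
    return base


def second_threshold(mu, sigma, q1, q3, fence_factor=1.5):
    """Z-score equivalent of the farther quartile fence (assumed mapping)."""
    iqr = q3 - q1
    lower = q1 - fence_factor * iqr
    upper = q3 + fence_factor * iqr
    if sigma == 0:
        return 0.0
    return max(abs(lower - mu), abs(upper - mu)) / sigma


def third_threshold(skewness, excess_kurtosis, base=3.0, a=0.1, b=0.05):
    """Loosen the base threshold for skewed or heavy-tailed data (assumed rule)."""
    return base * (1.0 + a * abs(skewness) + b * max(excess_kurtosis, 0.0))


# Hypothetical distribution parameters for one metric in one load interval.
z1 = first_threshold()
z2 = second_threshold(mu=50.0, sigma=4.0, q1=47.0, q3=53.0)
z3 = third_threshold(skewness=1.0, excess_kurtosis=2.0)
```

With these inputs the fences lie at 38 and 62, which is 3 standard deviations from the mean, and the skewed, heavy-tailed case yields a looser threshold of about 3.6.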
In this embodiment, a fourth Z-score threshold is set by combining the first Z-score threshold, the second Z-score threshold, and the third Z-score threshold, and the calculation formula is as follows:
[Formula not preserved in the source text.]
Wherein the result is the fourth Z-score threshold of the j-th performance data under the i-th partial interval of the load. The inputs are: the contribution weights of the first, second, and third Z-score thresholds; the first, second, and third Z-score thresholds of the j-th performance data under the i-th partial interval of the load; the maximum value and the minimum value of these three thresholds; and a first constant of the j-th performance data under the i-th partial interval of the load;
In this embodiment, the correction term uses the influence of the average of the maximum value and the minimum value of the first, second, and third Z-score thresholds to correct the average of the three, and the first constant balances the size of the correction function.
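Because the formula itself is not legible in the text, the following is only an illustrative sketch of one combination that uses the same ingredients as the definitions above: contribution weights, the three thresholds, their maximum and minimum, and a balancing constant. The additive form, the default weights, and the constant `c` are all assumptions and should not be read as the patent's formula.

```python
def fourth_threshold(z1, z2, z3, weights=(0.4, 0.3, 0.3), c=0.5):
    """Combine three Z-score thresholds (hypothetical form).

    weighted: weighted sum of the three thresholds.
    correction: how far the midpoint of the largest and smallest threshold
    lies from the plain average of the three, scaled by the constant c.
    """
    w1, w2, w3 = weights
    weighted = w1 * z1 + w2 * z2 + w3 * z3
    midrange = (max(z1, z2, z3) + min(z1, z2, z3)) / 2.0
    average = (z1 + z2 + z3) / 3.0
    return weighted + c * (midrange - average)


# Hypothetical thresholds for one metric in one partial load interval.
z4 = fourth_threshold(3.0, 3.0, 3.6)   # about 3.23
```

When the three thresholds agree, the correction vanishes and the result equals the common value, provided the weights sum to 1.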
In some embodiments of the application, the standard association of load-performance is constructed from normal values in the distribution range of performance data and the sensitivity between load and each performance data, including,
Fitting the relation between the load and the distribution range of the performance data under different partial intervals on the basis of the normal values in the distribution range of the performance data to obtain a regression model, and adjusting the regression coefficients in the regression model through the sensitivity between the load and each type of performance data, so as to describe the standard association relation of load-performance.
In this embodiment, an appropriate regression model (such as linear regression or polynomial regression) is selected to fit, on the basis of the normal values in the distribution range of the performance data, the relationship between the load and the distribution range of the performance data under each partial interval. The model is trained on the collected data to obtain preliminary regression coefficients. The sensitivity coefficient reflects the extent to which changes in load affect changes in the performance data: a larger sensitivity coefficient means the load has a pronounced influence on the performance data, and a smaller one means the influence is weaker. The regression coefficients in the regression model are then adjusted according to the result of the sensitivity analysis. If a given type of performance data is highly sensitive to the load, its regression coefficient can be increased appropriately to strengthen the modeled influence of the load on it; conversely, if its sensitivity is low, its regression coefficient can be reduced appropriately. The adjusted regression model is verified with a validation data set to evaluate its prediction accuracy and stability. If the model does not perform well, the above steps are repeated to optimize it until a satisfactory model is obtained. The standard association relation of load-performance under the different load intervals is thus described by different regression models.
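The fit, adjust, and validate loop described above can be sketched for the simplest case, a straight line per partial interval. The least-squares fit is standard; the adjustment rule in `adjust_slope` (with its `reference` and `gain` parameters) is a hypothetical choice, since the text states only that the coefficient is increased or reduced according to sensitivity.

```python
from statistics import mean


def fit_line(loads, values):
    """Ordinary least-squares fit of values = slope * load + intercept."""
    mx, my = mean(loads), mean(values)
    sxx = sum((x - mx) ** 2 for x in loads)
    if sxx == 0:
        raise ValueError("load values must not all be identical")
    slope = sum((x - mx) * (y - my) for x, y in zip(loads, values)) / sxx
    return slope, my - slope * mx


def adjust_slope(slope, sensitivity, reference=0.5, gain=0.2):
    """Scale the slope up for |sensitivity| above the reference, down below it."""
    return slope * (1.0 + gain * (abs(sensitivity) - reference))


def mean_absolute_error(model, loads, values):
    """Average |actual - predicted| on a validation set."""
    slope, intercept = model
    errors = [abs(y - (slope * x + intercept)) for x, y in zip(loads, values)]
    return sum(errors) / len(errors)


# Hypothetical normal values from one partial interval of the load.
train_loads = [10, 20, 30, 40]
train_latency = [12.0, 14.0, 16.0, 18.0]
slope, intercept = fit_line(train_loads, train_latency)   # about 0.2 and 10

adjusted_model = (adjust_slope(slope, sensitivity=1.0), intercept)
mae = mean_absolute_error((slope, intercept), [50], [20.0])
```

One model of this kind would be kept per partial load interval and per performance data category; the validation error decides whether the adjusted model is accepted or the steps are repeated.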
Step S103, load data and performance data of the target server in a current period of time are obtained, and the load data and the performance data are input into a standard association relation of load-performance to determine deviation conditions.
In this embodiment, real-time data collection means that a monitoring tool is used to acquire the load data and performance data of the target server in real time over the current period of time. The acquired data are recorded and stored in a safe and reliable database or file system. Each data record should include both load data and performance data for subsequent correlation analysis and monitoring. The load data and performance data are then input into the corresponding regression model, and the deviation of the performance data is computed.
In some embodiments of the application, the load data and the performance data are input into a standard association of load-performance to determine a bias condition, including,
Inputting load data and performance data into a corresponding regression model according to different partial intervals of the load data, counting deviation under each load partial interval, and drawing a deviation curve of each load partial interval;
And integrating the deviation curves of all the load part intervals, extracting the curve characteristics of all the deviation curves, and generating a deviation condition.
In this embodiment, the performance deviation is the difference between the value predicted by the regression model for a given load and the actual value. The smaller the deviation, the better the actual load-performance relationship matches the standard association relation of load-performance. The curve characteristics of a deviation curve include its slope, extrema, smoothness, and so on.
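A deviation curve and its characteristics can be sketched as follows. The text names slope, extremum, and smoothness without defining them, so the definitions here are assumptions: the slope is the least-squares trend of the deviation over the sample index, the extremum is the deviation of largest magnitude, and smoothness is reported as "roughness", the mean absolute second difference (0 for a perfectly smooth curve).

```python
def deviation_curve(model, loads, actuals):
    """Actual minus predicted performance for each observed load."""
    slope, intercept = model
    return [a - (slope * x + intercept) for x, a in zip(loads, actuals)]


def curve_features(deviations):
    """Slope, extremum and roughness of one deviation curve."""
    n = len(deviations)
    if n == 0:
        raise ValueError("deviation curve is empty")
    idx_mean = (n - 1) / 2.0
    dev_mean = sum(deviations) / n
    sxx = sum((i - idx_mean) ** 2 for i in range(n))
    sxy = sum((i - idx_mean) * (d - dev_mean) for i, d in enumerate(deviations))
    second_diffs = [
        deviations[i + 1] - 2.0 * deviations[i] + deviations[i - 1]
        for i in range(1, n - 1)
    ]
    return {
        "slope": sxy / sxx if sxx else 0.0,
        "extremum": max(deviations, key=abs),
        "roughness": (
            sum(abs(s) for s in second_diffs) / len(second_diffs)
            if second_diffs else 0.0
        ),
    }


# Hypothetical regression model (slope, intercept) for one load interval.
model = (2.0, 10.0)
loads = [1, 2, 3, 4]
actuals = [12.0, 15.0, 18.0, 21.0]
deviations = deviation_curve(model, loads, actuals)   # [0.0, 1.0, 2.0, 3.0]
features = curve_features(deviations)
```

Here the deviation grows steadily by 1 per sample, so the curve has a slope of 1, an extremum of 3, and zero roughness; a steadily rising deviation like this would suggest the server is drifting away from its standard load-performance relation.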
Step S104, the deviation condition is analyzed and the current performance information of the target server is output, thereby monitoring the server performance.
In this embodiment, the current performance of the target server is analyzed comprehensively from two aspects: the parameter values of the performance data, and the deviation condition, i.e., how well the observed load-performance relationship matches the standard one.
In some embodiments of the present application, analyzing the bias conditions to output current performance information of the target server includes,
Determining curve characteristics under each type of performance data, generating a matching coefficient under each type of performance data by integrating the curve characteristics, determining a basic performance index of each type of performance data according to parameters of each type of performance data, and determining a performance index by combining the basic performance index and the matching coefficient of each type of performance data;
and integrating the performance indexes under all the performance data categories to output the current performance information of the target server.
In some embodiments of the present application, the performance metrics under all performance data categories are integrated to output current performance information for the target server, including,
Integrating the performance indexes under all performance data categories to obtain the overall performance index of the target server;
and analyzing the performance indexes under all the performance data categories to obtain the performance status of each type of performance data, and taking the performance status, the performance indexes and the overall performance indexes of each type of performance data as the current performance information of the target server.
In this embodiment, all the curve characteristics are integrated (normalized, combined as a weighted sum, and mapped to a matching coefficient) to obtain the matching coefficient between each type of performance data and the load. The parameter values of each type of performance data are evaluated to obtain a basic performance index (how good the performance parameters are), and the performance indexes under all performance data categories are integrated to output the current performance information of the target server. The calculation formula is as follows:
[Formula not preserved in the source text.]
Wherein the result is the current performance information of the target server (the total server performance). The inputs are: the number of performance data categories; the matching coefficient and the basic performance index of the i-th performance data; the number of performance data categories whose performance is less than the corresponding threshold; the conversion coefficient of the performance difference; the matching coefficient and the basic performance index of the j-th performance data whose performance is less than the corresponding threshold; and a preset constant. The correction term corrects the result by the sum of the performances of all performance data that are less than their corresponding thresholds, and the preset constant balances the correction function.
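Since the formula is not legible in the text, the sketch below only illustrates how the named ingredients could fit together: a matching coefficient per category, a basic performance index per category, their product as the per-category performance index, and a correction driven by the categories that fall below their thresholds. The mapping in `matching_coefficient`, the subtractive correction, and the parameters `conversion` and `constant` are assumptions, not the patent's formula.

```python
def matching_coefficient(features, weights):
    """Map normalized curve features (0 = perfect match) into (0, 1]."""
    penalty = sum(weights[name] * features[name] for name in weights)
    return 1.0 / (1.0 + penalty)


def overall_performance(categories, conversion=0.5, constant=1.0):
    """categories maps name -> (matching coefficient, basic index, threshold).

    Returns the overall index, the per-category indexes, and a per-category
    status ("normal" or "degraded").
    """
    indexes = {name: a * b for name, (a, b, _) in categories.items()}
    status = {
        name: "degraded" if indexes[name] < t else "normal"
        for name, (_, _, t) in categories.items()
    }
    below = [indexes[name] for name, s in status.items() if s == "degraded"]
    # Hypothetical correction: the summed index of the degraded categories,
    # damped by how many there are plus a balancing constant.
    correction = conversion * sum(below) / (len(below) + constant)
    return sum(indexes.values()) - correction, indexes, status


coefficient = matching_coefficient(
    {"slope": 0.5, "extremum": 0.5},        # hypothetical normalized features
    {"slope": 1.0, "extremum": 1.0},        # hypothetical feature weights
)
categories = {
    "cpu": (1.0, 80.0, 60.0),               # index 80, above its threshold
    "memory": (coefficient, 60.0, 50.0),    # index 30, below its threshold
}
overall, indexes, status = overall_performance(categories)
```

The per-category status, the per-category indexes, and the overall index together correspond to the three pieces of current performance information listed in the text.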
Correspondingly, the application also provides an intelligent monitoring system for server performance, as shown in figure 2, comprising,
The first module is used for collecting load historical data of the target server, analyzing the load historical data to determine a range and a change degree of a load, and splitting the range of the load according to the change degree to obtain a plurality of partial intervals of the load;
The second module is used for collecting performance history data of the target server according to a plurality of partial intervals of the load, confirming association relations between the load and the performance history data under different partial intervals of the load, and obtaining standard association relations of the load-performance;
The third module is used for acquiring load data and performance data of the target server in a current period of time, and inputting the load data and the performance data into a standard association relation of load-performance to determine a deviation condition;
and the fourth module is used for analyzing the deviation condition and outputting the current performance information of the target server so as to monitor the performance of the server.
Compared with the prior art, the invention has the beneficial effects that:
1. The load history data is analyzed to determine the interval range and the degree of change of the load, which provides a reliable basis for the subsequent splitting of the load's interval range and takes the past variation of the server load into account, so that the association between each detailed load interval and the performance data is described accurately. The association between the load and the performance history data is confirmed under the different partial intervals of the load to obtain the standard association relation of load-performance; only the normal values of the performance history data are selected when constructing this relation, which ensures that the standard association between the load and each type of performance data is described accurately in every load interval.
2. The load data and the performance data are input into the standard association relation of load-performance to determine the deviation condition, and the current performance information of the target server is output by analyzing the deviation condition. This improves the accuracy and adaptability of server performance monitoring and effectively helps ensure efficient and safe operation of the server.
From the above description of the embodiments, it will be clear to those skilled in the art that the present invention may be implemented in hardware, or by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and which includes several instructions for causing a computer device (such as a personal computer, a server, or a network device) to execute the method described in each implementation scenario of the present invention.
Those skilled in the art will appreciate that the drawing is merely a schematic illustration of a preferred implementation scenario and that the modules or flows in the drawing are not necessarily required to practice the invention.
Those skilled in the art will appreciate that the modules of an apparatus in an implementation scenario may be distributed in the apparatus as described for that implementation scenario, or may be changed correspondingly and located in one or more apparatuses different from that of the implementation scenario. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to the technical scheme and inventive concept of the present invention, shall be covered by the scope of protection of the present invention.

Claims (8)

1. A server performance intelligent monitoring method, characterized by comprising:
collecting load history data of a target server, analyzing the load history data to determine an interval range and a degree of change of the load, and splitting the interval range of the load according to the degree of change to obtain a plurality of partial intervals of the load;
collecting performance history data of the target server according to the plurality of partial intervals of the load, and confirming the association between the load and the performance history data under different partial intervals of the load to obtain a standard association relation of load-performance;
obtaining load data and performance data of the target server over a current period of time, and inputting the load data and the performance data into the standard association relation of load-performance to determine a deviation condition;
analyzing the deviation condition and outputting current performance information of the target server, thereby monitoring the server performance;
wherein analyzing the load history data to determine the interval range and the degree of change of the load comprises:
drawing a histogram of the load from the load history data, marking the minimum value and the maximum value of the load on the histogram, and taking the range from the minimum value to the maximum value of the load as the interval range of the load;
identifying load gradual-change segments and load jump segments from the load history data, and integrating the load gradual-change segments and the load jump segments separately to obtain a load gradual-change segment record and a load jump segment record;
determining, based on the load gradual-change segment record and the load jump segment record, the occurrence frequency of different ranges within the interval range of the load; determining from the occurrence frequency the frequency weights of the different ranges of the load gradual-change segment record and of the load jump segment record; differencing the different ranges of each record and weighting the differences by the frequency weights of the corresponding ranges, to obtain a gradual degree of change and a jump degree of change;
combining the gradual degree of change and the jump degree of change to determine the degree of change of the interval range of the load.
2. The server performance intelligent monitoring method according to claim 1, characterized in that splitting the interval range of the load according to the degree of change to obtain a plurality of partial intervals of the load comprises:
calculating the degree of dispersion of the load history data, determining a split number according to the degree of dispersion and the degree of change of the interval range of the load, and splitting the interval range of the load accordingly by the split number to obtain the plurality of partial intervals of the load.
3. The server performance intelligent monitoring method according to claim 1, characterized in that confirming the association between the load and the performance history data under different partial intervals of the load to obtain the standard association relation of load-performance comprises:
classifying the performance data, determining the performance data categories corresponding to the different partial intervals of the load and the distribution range of each type of performance data, calculating the sensitivity between the load and each type of performance data, and computing the distribution parameters of each type of performance data;
the distribution parameters of the performance data including the mean, standard deviation, quartiles, IQR, skewness, and kurtosis;
setting a first Z-score threshold according to the mean and the standard deviation, setting a second Z-score threshold according to the quartiles and the IQR, and setting a third Z-score threshold according to the skewness and the kurtosis;
setting a fourth Z-score threshold by combining the first Z-score threshold, the second Z-score threshold, and the third Z-score threshold;
distinguishing normal values from abnormal values in the distribution range of each type of performance data by means of the fourth Z-score threshold, eliminating the abnormal values, and constructing the standard association relation of load-performance from the normal values in the distribution range of the performance data and the sensitivity between the load and each type of performance data.
4. The server performance intelligent monitoring method according to claim 3, characterized in that constructing the standard association relation of load-performance from the normal values in the distribution range of the performance data and the sensitivity between the load and each type of performance data comprises:
fitting, on the basis of the normal values in the distribution range of the performance data, the relationship between the load and the distribution range of the performance data under different partial intervals to obtain a regression model, and adjusting the regression coefficients in the regression model through the sensitivity between the load and each type of performance data, so as to describe the standard association relation of load-performance.
5. The server performance intelligent monitoring method according to claim 4, characterized in that inputting the load data and the performance data into the standard association relation of load-performance to determine the deviation condition comprises:
inputting the load data and the performance data into the corresponding regression model according to the different partial intervals of the load data, computing the deviation under each partial interval of the load, and drawing a deviation curve for each partial interval of the load;
integrating the deviation curves of all partial intervals of the load, extracting the curve characteristics of all deviation curves, and generating the deviation condition.
6. The server performance intelligent monitoring method according to claim 5, characterized in that analyzing the deviation condition and outputting the current performance information of the target server comprises:
determining the curve characteristics under each type of performance data, integrating the curve characteristics to generate a matching coefficient for each type of performance data, determining a basic performance index of each type of performance data according to the parameters of that performance data, and determining a performance index by combining the basic performance index and the matching coefficient of each type of performance data;
integrating the performance indexes under all performance data categories to output the current performance information of the target server.
7. The server performance intelligent monitoring method according to claim 6, characterized in that integrating the performance indexes under all performance data categories to output the current performance information of the target server comprises:
integrating the performance indexes under all performance data categories to obtain an overall performance index of the target server;
analyzing the performance indexes under all performance data categories to obtain the performance status of each type of performance data, and taking the performance status and performance index of each type of performance data, together with the overall performance index, as the current performance information of the target server.
8. A server performance intelligent monitoring system, characterized in that it is used to implement the server performance intelligent monitoring method according to any one of claims 1 to 7, the system comprising:
a first module for collecting load history data of a target server, analyzing the load history data to determine an interval range and a degree of change of the load, and splitting the interval range of the load according to the degree of change to obtain a plurality of partial intervals of the load;
a second module for collecting performance history data of the target server according to the plurality of partial intervals of the load, and confirming the association between the load and the performance history data under different partial intervals of the load to obtain a standard association relation of load-performance;
a third module for obtaining load data and performance data of the target server over a current period of time, and inputting the load data and the performance data into the standard association relation of load-performance to determine a deviation condition;
a fourth module for analyzing the deviation condition and outputting current performance information of the target server, thereby monitoring the server performance.
CN202510370279.0A 2025-03-27 2025-03-27 Server performance intelligent monitoring method and system Active CN119883852B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510370279.0A CN119883852B (en) 2025-03-27 2025-03-27 Server performance intelligent monitoring method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202510370279.0A CN119883852B (en) 2025-03-27 2025-03-27 Server performance intelligent monitoring method and system

Publications (2)

Publication Number Publication Date
CN119883852A CN119883852A (en) 2025-04-25
CN119883852B true CN119883852B (en) 2025-07-08

Family

ID=95424707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510370279.0A Active CN119883852B (en) 2025-03-27 2025-03-27 Server performance intelligent monitoring method and system

Country Status (1)

Country Link
CN (1) CN119883852B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354092A (en) * 2015-11-19 2016-02-24 东软集团股份有限公司 Method, device and system for predicting application performance risk
CN117311984A (en) * 2023-11-03 2023-12-29 北京创璞科技有限公司 Method and system for balancing server load based on comparison service

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11244410B2 (en) * 2020-02-24 2022-02-08 Leading Edge Power Solutions, Llc Technologies for dynamically dispatching generator power
CN118585397A (en) * 2024-05-29 2024-09-03 深圳市灏鑫达电子科技有限公司 Solid state drive performance monitoring method and system
CN119421175A (en) * 2024-10-15 2025-02-11 上海创蓝云智信息科技股份有限公司 A real-time SMS communication link verification and performance monitoring method

Also Published As

Publication number Publication date
CN119883852A (en) 2025-04-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant