CN119106350B - Long-time sequence power load prediction method based on improved Informer model

Long-time sequence power load prediction method based on improved Informer model

Info

Publication number
CN119106350B
CN119106350B
Authority
CN
China
Prior art keywords
time
model
information
encoder
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411106406.8A
Other languages
Chinese (zh)
Other versions
CN119106350A
Inventor
朱春强
米路革麻
卢欣超
罗敏楠
朱海萍
梁潇
朱莉
陈曦
司恒斌
杜国维
王梦婷
王婧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Science and Technology
Training Center of State Grid Shaanxi Electric Power Co Ltd
Original Assignee
Xian University of Science and Technology
Training Center of State Grid Shaanxi Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Science and Technology, Training Center of State Grid Shaanxi Electric Power Co Ltd filed Critical Xian University of Science and Technology
Priority to CN202411106406.8A priority Critical patent/CN119106350B/en
Publication of CN119106350A publication Critical patent/CN119106350A/en
Application granted granted Critical
Publication of CN119106350B publication Critical patent/CN119106350B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/2136 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on sparsity criteria, e.g. with an overcomplete basis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/096 Transfer learning
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J 3/00 Circuit arrangements for AC mains or AC distribution networks
    • H02J 3/003 Load forecast, e.g. methods or systems for forecasting future load demand
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2123/00 Data types
    • G06F 2123/02 Data types in the time domain, e.g. time-series data
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J 2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J 2203/10 Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J 2203/00 Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J 2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S 10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S 10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Power Engineering (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a long-time-series power load prediction method based on an improved Informer model, and relates to the technical field of power load prediction. Meteorological features, scalar information, position information, time information and trend information are separately encoded and embedded through a feature embedding layer of the SSI model, providing the model with a richer input feature representation. Probability sparse attention is sparsified a second time by evaluating the sparsity of the key matrix, reducing the time complexity of the attention mechanism from O(L ln L) to O((ln L)²), and the final predicted value is output through the encoder and decoder. The method solves the problems that the original Informer model, when applied to power load prediction, has insufficient feature richness and a probability sparse attention module that is still not sparse enough, thereby improving the accuracy and efficiency of long-time-series power load prediction.

Description

Long-time sequence power load prediction method based on improved Informer model
Technical Field
The invention relates to the technical field of power load prediction, in particular to a long-time sequence power load prediction method based on an improved Informer model.
Background
With the rapid development of the social economy and continuous technological progress, demand for electric power in residential life, industrial production, business activities, information communication, traffic management and other fields keeps increasing. The wide adoption of intelligent technologies and products has further expanded society's demand for electricity. However, electric energy is an instantaneous energy source that is difficult to store in large quantities for long periods, so a balance point must be found between its production and its use: when generation exceeds the actual demand, unnecessary waste results, and when generation falls short of demand, power shortages, large-area blackouts, emergency failures of power equipment and a series of safety problems can occur. Accurate power load prediction helps power plants and supply departments find the supply-demand balance point and avoid these problems. Research on power load prediction methods can therefore help power operators plan supply effectively, reduce power waste and cost, and ensure stable operation of the power system, providing the safe and stable power supply that is of great significance to social production and life.
Current deep learning models, such as RNNs and LSTMs, have strong feature-learning capability and have improved load prediction accuracy. However, when processing ultra-long sequences they suffer from heavy computation, long training times and difficulty in capturing long-term dependencies. In addition, these models often ignore the temporal-granularity information of the data when processing time series, resulting in inaccurate predictions.
In summary, how to improve the accuracy and efficiency of long-time-series power load prediction is the problem addressed by the present application.
Disclosure of Invention
The invention mainly aims to provide a long-time-series power load prediction method based on an improved Informer model, which solves the problems that the original Informer model, when applied to power load prediction, has insufficient feature richness and a probability sparse attention module that is still not sparse enough, thereby improving the accuracy and efficiency of long-time-series power load prediction.
To achieve the above object, the present invention provides a long-time-series power load prediction method based on an improved Informer model.
The invention provides a long-time series power load prediction method based on an improved Informer model, which comprises the following steps:
Acquiring time sequence data and an SSI model, extracting feature information of the time sequence data through a feature embedding module, and calculating the feature information to obtain first output information, wherein the feature information comprises weather features, global time features, trend features, scalar features and position features, the SSI model is used for indicating a model based on an improved Informer model, and the SSI model comprises a feature embedding module, an encoder module, a decoder module and an output layer;
Screening, secondary sparsification and attention-distillation processing are performed on the first output information by an encoder module to obtain second output information;
Processing the second output information through a decoder module to obtain a high-dimensional feature vector, wherein the decoder comprises a probability sparse self-attention module and a multi-head attention module;
And taking the high-dimensional feature vector as input of an output layer, and converting the high-dimensional feature vector into a final predicted value through the output layer.
Optionally, before the time series data and the SSI model are acquired, the SSI model can be constructed. Construction first involves data preprocessing, including missing-value filling and data normalization, after which the data are divided into a training set and a testing set. The time series data are input into the feature embedding layer, where an XGBoost model performs feature selection on the meteorological information and a position encoder, a time encoder, a trend encoder and a scalar encoder extract the features of each dimension, enriching the feature information available to the model. These features are input to the encoder part of the model, which is composed of a multi-head secondary-sparse ProbSparse Self-Attention mechanism and a distillation mechanism; the encoder screens out the important query and key matrices and captures the long-term correlations of the load sequence through self-attention distilling dimension reduction. The output of the encoder is then fed to the decoder, which contains masked multi-head probability sparse attention and multi-head attention. Finally, the high-dimensional output of the decoder is passed to the output layer, which outputs the final predicted value.
Optionally, the obtaining time series data and an SSI model, extracting feature information of the time series data through a feature embedding module, and calculating the feature information to obtain first output information, where the feature information includes weather features, global time features, trend features, scalar features and position features, the SSI model is used to indicate a model based on an improved Informer model, and the SSI model includes the feature embedding module, an encoder module, a decoder module and an output layer, and includes:
analyzing the meteorological information in the time series data through an XGBoost model to obtain the meteorological features;
processing the time information in the time series data through a global time encoder to obtain the global time features;
Acquiring the trend feature in the time series data by a trend encoder;
Acquiring the scalar features and the position features in the time series data through direct observation and a position encoder;
And performing linear addition calculation on the meteorological features, the global time features, the trend features, the scalar features and the position features to obtain first output information.
Optionally, the analyzing, by the XGBoost model, the weather information in the time-series data to obtain the weather feature includes:
And performing feature selection on the meteorological information using the XGBoost model, and taking the highest-scoring features as the meteorological features used in model calculation.
Optionally, the processing, by the global time encoder, of the time information in the time series data to obtain the global time features includes:
And obtaining the global time features by splitting and encoding the time series data at six time granularities: year, month, day, hour, minute and holiday.
Optionally, the obtaining, by a trend encoder, the trend feature in the time series data includes:
Constructing a matrix D by the time series data of the past h days at the moment t;
constructing a matrix P from the time series data of the past h weeks at time t and constructing a matrix M from the time series data of the past h months at time t;
and then carrying out fusion on D, P and M to form a trend matrix, and finally extracting the trend characteristics by convolution.
Optionally, the acquiring scalar features and position features in the time series data through direct observation and a position encoder includes:
Extracting the scalar features in the time series data by direct observation;
extracting the position features in the time series data by a position encoder.
Optionally, the screening, secondary sparsification and attention-distillation processing of the first output information by the encoder module to obtain the second output information includes:
screening the first output information by calculating the similarity of the self-attention distribution and the uniform distribution to obtain dominant query data;
performing secondary sparse processing on the dominant query data by a secondary sparse probability sparse attention method to obtain secondary sparse data;
And distilling the secondary sparse data to obtain the second output information.
In this method, the meteorological features, scalar information, position information, time information and trend information are separately encoded and embedded through the feature embedding layer, providing the model with a richer input feature representation; probability sparse attention is sparsified a second time by evaluating the sparsity of the key matrix, reducing the time complexity of the attention mechanism from O(L ln L) to O((ln L)²); and the final predicted value is output through the encoder and decoder. The method solves the problems that the original Informer model, when applied to power load prediction, has insufficient feature richness and a probability sparse attention module that is still not sparse enough, thereby improving the accuracy and efficiency of long-time-series power load prediction.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a schematic flow chart of the long-time-series power load prediction method based on an improved Informer model according to the first embodiment of the present application;
FIG. 2 is a second schematic flow chart of the long-time-series power load prediction method based on an improved Informer model according to the present application;
FIG. 3 is a third schematic flow chart of the long-time-series power load prediction method based on an improved Informer model according to the present application.
Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein.
In the present application, words such as "exemplary" or "such as" are used to mean examples, illustrations or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In recent years, with the rapid development of the social economy and continuous technological progress, demand for electric power in residential life, industrial production, business activities, information communication, traffic management and other fields keeps increasing. The wide adoption of intelligent technologies and products has further expanded society's demand for electricity. However, electric energy is an instantaneous energy source that is difficult to store in large quantities for long periods, so a balance point must be found between its production and its use: when generation exceeds the actual demand, unnecessary waste results, and when generation falls short of demand, power shortages, large-area blackouts, emergency failures of power equipment and a series of safety problems can occur. Accurate power load prediction helps power plants and supply departments find the supply-demand balance point and avoid these problems. Power load prediction is therefore an important link in power system operation and planning, of great significance for ensuring stable operation and optimizing resource allocation; research on power load prediction methods can help power operators plan supply effectively, reduce power waste and cost, and ensure stable operation of the power system, providing the safe and stable power supply that is of great significance to social production and life.
Current deep learning models, such as RNNs and LSTMs, or neural network models with more complex parameters, are adopted for training and inference; these models have strong feature-learning capability and improve load prediction accuracy. However, when processing ultra-long sequences they suffer from heavy computation, long training times and difficulty in capturing long-term dependencies. In addition, these models often ignore the temporal-granularity information of the data when processing time series, resulting in inaccurate predictions.
In summary, how to improve the accuracy and efficiency of long-time-series power load prediction is the problem addressed by the present application.
In this method, the meteorological features, scalar information, position information, time information and trend information are separately encoded and embedded through the feature embedding layer, providing the model with a richer input feature representation; probability sparse attention is sparsified a second time by evaluating the sparsity of the key matrix, reducing the time complexity of the attention mechanism from O(L ln L) to O((ln L)²); and the final predicted value is output through the encoder and decoder. The method solves the problems that the original Informer model, when applied to power load prediction, has insufficient feature richness and a probability sparse attention module that is still not sparse enough, thereby improving the accuracy and efficiency of long-time-series power load prediction.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
FIG. 1 is a schematic flow chart of the long-time-series power load prediction method based on an improved Informer model according to the first embodiment; the execution subject is a power load prediction system. As shown in FIG. 1, the method provided by the first embodiment includes:
s101, acquiring time sequence data and an SSI model, extracting characteristic information of the time sequence data through a characteristic embedding module, and calculating the characteristic information to obtain first output information.
Wherein the feature information refers to meteorological features, global time features, trend features, scalar features and position features. The SSI model indicates a model based on an improved Informer model and comprises the feature embedding module, an encoder module, a decoder module and an output layer.
The meteorological features are obtained by performing feature selection on the meteorological information with the XGBoost model and taking the highest-scoring features as the meteorological features used in model calculation.
The global time features are obtained by providing global time information to the model through a designed global time encoder E_t(X). The data X are split and encoded at six time granularities: year, month, day, hour, minute and holiday. For example, for data whose current time is 2023-03-03 15:10, the extracted global time feature is represented by the numerical vector [23, 3, 3, 15, 1, 0], which in turn represents year 23, month 3, day 3, hour 15, minute segment 1 (the 60 minutes of an hour are divided into 6 segments, so minute 10 is represented as 1) and non-holiday 0 (a holiday would be 1). This vector is input into a fully connected network to obtain the global time feature, defined by the formula:
x_time = E_t(X)
where X is the vector representation of the extracted global time feature values and E_t(·) is the fully connected layer.
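As an illustration only (this sketch is not the patent's code; the holiday flag is a stand-in assumption supplied by the caller), the six-granularity encoding can be reproduced as follows:

```python
# Minimal sketch: encode a timestamp into the six-granularity vector
# [year, month, day, hour, minute-segment, holiday] described above.
from datetime import datetime

def global_time_vector(ts: datetime, is_holiday: bool = False) -> list:
    return [
        ts.year % 100,      # 2023 -> 23
        ts.month,           # 3
        ts.day,             # 3
        ts.hour,            # 15
        ts.minute // 10,    # 60 minutes split into 6 segments: minute 10 -> 1
        1 if is_holiday else 0,
    ]

print(global_time_vector(datetime(2023, 3, 3, 15, 10)))  # [23, 3, 3, 15, 1, 0]
```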
The trend features are obtained through a designed trend encoder E_s(X) and consist of three parts: a matrix D constructed from the load data of the past h days at time t, a matrix P constructed from the load data of the past h weeks at time t, and a matrix M constructed from the load data of the past h months at time t. D, P and M are then fused into a trend matrix T ∈ R^{3×h}, and features are finally extracted by convolution. The trend encoding is defined as:
x_tre = E_s(X) = Conv1d(T), where T = concat(D, P, M)
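A minimal sketch of the trend matrix construction under stated assumptions: hourly sampling, so a day, week and month correspond to strides of 24, 168 and 720 steps; all names, strides and sizes are illustrative rather than the patent's implementation.

```python
# Build D, P, M from a 1-D hourly load series, fuse them into T (3 x h),
# and extract a convolutional trend feature x_tre.
import torch
import torch.nn as nn

def trend_features(load: torch.Tensor, t: int, h: int = 8) -> torch.Tensor:
    D = torch.stack([load[t - 24 * k] for k in range(1, h + 1)])   # past h days
    P = torch.stack([load[t - 168 * k] for k in range(1, h + 1)])  # past h weeks
    M = torch.stack([load[t - 720 * k] for k in range(1, h + 1)])  # past h months
    T = torch.stack([D, P, M])                                     # T in R^{3 x h}
    conv = nn.Conv1d(in_channels=3, out_channels=1, kernel_size=3, padding=1)
    return conv(T.unsqueeze(0)).squeeze(0)                         # x_tre

x_tre = trend_features(torch.randn(6000), t=5999)
```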
The scalar features and position features are obtained by extracting the scalar feature x_sca through direct observation and the position feature x_pos through a position encoder.
The first output information is calculated from the feature information as follows:
the meteorological features, global time features, trend features, scalar features and position features are linearly added to obtain the first output information.
The aim is to extract complete time features, periodicity and trend features and to enhance the utilization of meteorological features when the model is applied to load prediction tasks; these features play an important role in mining nonlinear long-distance dependencies in the data and improving the prediction effect.
Optionally, before the time series data and the SSI model are acquired, the SSI model can be constructed. Construction first involves data preprocessing, including missing-value filling and data normalization, after which the data are divided into a training set and a testing set. The time series data are input into the feature embedding layer, where an XGBoost model performs feature selection on the meteorological information and a position encoder, a time encoder, a trend encoder and a scalar encoder extract the features of each dimension, enriching the feature information available to the model. These features are input to the encoder part of the model, which is composed of a multi-head secondary-sparse ProbSparse Self-Attention mechanism and a distillation mechanism; the encoder screens out the important query and key matrices and captures the long-term correlations of the load sequence through self-attention distilling dimension reduction. The output of the encoder is then fed to the decoder, which contains masked multi-head probability sparse attention and multi-head attention. Finally, the high-dimensional output of the decoder is passed to the output layer, which outputs the final predicted value.
S102, the first output information is screened, secondarily sparsified and processed by the attention distillation mechanism in the encoder module to obtain second output information.
The screening, secondary sparsification and attention-distillation processing of the first output information by the encoder module to obtain the second output information is completed by the following steps:
Step one, the first output information is screened by the encoder module:
Dominant queries in the first output information are screened by computing the similarity between the self-attention distribution and the uniform distribution, which can be quantified with the Kullback-Leibler divergence, defined as:
KL(q‖p) = ln Σ_{j=1..N_k} e^{q_i·k_j^T/√d} − (1/N_k) Σ_{j=1..N_k} (q_i·k_j^T/√d) − ln N_k
where q_i is the i-th query, k_j is the j-th key, and N_k is the number of keys. The attention probability distribution α of query q_i is expressed as:
α(q_i, k_j) = e^{q_i·k_j^T/√d} / Σ_l e^{q_i·k_l^T/√d}
Removing the constant in the KL divergence definition, a sparsity measure is designed to quantify the significance of a query, defined as:
M(q_i, K) = ln Σ_{j=1..N_k} e^{q_i·k_j^T/√d} − (1/N_k) Σ_{j=1..N_k} (q_i·k_j^T/√d)
The first term is the Log-Sum-Exp of q_i with all keys, and the second is their arithmetic mean. The higher the value of M(q_i, K), the higher the probability that the attention of q_i contains dominant dot-product pairs in the head of the long-tail self-attention distribution. The complexity of this sparsity measure is still quadratic, because it requires the dot product of every query with every key, so an alternative max-mean measurement was designed for the Informer model, defined as:
M̄(q_i, K) = max_j(q_i·k_j^T/√d) − (1/N_k) Σ_{j=1..N_k} (q_i·k_j^T/√d)
Under the long-tail distribution, only L_K·ln L_K randomly selected dot-product pairs are used to compute M̄(q_i, K), i.e., the other pairs are filled with zeros. Finally, only the s queries with the highest M̄ values are selected as dominant queries for the self-attention calculation with each key, defined as:
A(Q, K, V) = Softmax(Q̄·K^T/√d)·V
where Q is the query matrix, composed of the data points of the queries; K is the key matrix, providing the characteristics of the information required by the queries; V is the value matrix, representing the actual value information; and Q̄ is a sparse matrix comprising the s dominant queries. By assigning a specific value to s, the ProbSparse self-attention mechanism reduces the time and space complexity of self-attention from N² to N×s.
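The following hedged sketch illustrates the max-mean measure M̄ and the top-s dominant-query selection described above, with a simplified random key sampling to cap the cost; the shapes, sampling size and value matrix are illustrative assumptions, not the patent's implementation.

```python
# Score every query against a random key sample, keep the s queries with the
# highest max-minus-mean measure, and attend with the reduced Q_bar.
import math
import torch

def dominant_queries(Q: torch.Tensor, K: torch.Tensor, s: int) -> torch.Tensor:
    L_K, d = K.shape
    sample_k = max(1, 5 * int(math.log(L_K)))            # sampled keys (assumed factor 5)
    idx = torch.randint(L_K, (sample_k,))
    scores = Q @ K[idx].T / math.sqrt(d)                 # (L_Q, sample_k)
    M_bar = scores.max(dim=-1).values - scores.mean(dim=-1)  # max-mean measure
    return M_bar.topk(s).indices                         # indices of dominant queries

Q, K, V = torch.randn(96, 64), torch.randn(96, 64), torch.randn(96, 64)
Q_bar = Q[dominant_queries(Q, K, s=8)]                   # reduced query matrix
attn = torch.softmax(Q_bar @ K.T / math.sqrt(64), dim=-1) @ V  # (8, 64)
```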
Step two, the secondary-sparse ProbSparse Self-Attention mechanism provided by the invention performs a second round of sparsification on the data obtained in step one; the specific flow is as follows:
The Kullback-Leibler divergence is first used to evaluate the sparsity of K, where the sparsity evaluation of the i-th key is defined as:
M(k_i, Q̄) = ln Σ_{j=1..N_q} e^{q_j·k_i^T/√d} − (1/N_q) Σ_{j=1..N_q} (q_j·k_i^T/√d)
where Q̄ is the query matrix of dominant queries, q_j is the j-th query, k_i is the i-th key, and N_q is the number of dominant queries. The first term is the Log-Sum-Exp of k_i with all queries, and the second is their arithmetic mean; the higher the value, the higher the probability that the attention of k_i contains dominant dot-product pairs (k_i, q) in the head of the long-tail self-attention distribution.
Based on this evaluation, the invention evaluates the sparsity of K and then selects the Top-u keys (u = ln L/c, with c a constant) from the evaluation result. These are the u keys with the highest activity in the sparsity evaluation, and the matrix K̄ they form replaces K in the original formula.
It should be noted that if a sparsity score were computed exactly for every key, additional computation would result. Using the assumption that the dot-product results obey a long-tail distribution, only a randomly sampled subset of queries and keys needs to be computed when scoring each key, which reduces the time cost of the key sparsity evaluation. Thus, under the long-tail distribution, only U = ln L_Q · ln L_K randomly sampled dot-product pairs are needed to compute M(k_i, Q̄), and the other dot-product pairs are filled with zeros; zero filling is used because the max operator in M(k_i, Q̄) is insensitive to zero values and numerically stable.
In this way, the input of the attention dot-product operation is converted from (Q̄, K) to (Q̄, K̄), which allows the sparsified ProbSparse Self-Attention to operate on only O((ln L)²) dot-product pairs. Finally, the score matrix S computed by the dot product of Q̄ and K̄ is multiplied with the value matrix V; the invention smooths S with a median-filling method and denotes the filled matrix as S̄. On this basis, the invention proposes a new measurement on the attention mechanism of the Informer model, defined as:
A(Q̄, K̄, V) = Softmax(S̄/√d)·V, with S = Q̄·K̄^T and S̄ the median-filled S
where Q̄ is the query matrix containing only the few selected queries whose contribution to the attention value has a high specific weight, K̄ is the key matrix with the highest activity ranking screened by the secondary-sparse ProbSparse Self-Attention, V is the value matrix, and d is the dimension of Q, K and V.
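A sketch of the secondary sparsification under stated assumptions (the patent's exact scoring and median-filling details may differ): each key is scored against the dominant queries with the same max-minus-mean measure, the top-u keys form K̄, and the skipped score entries are median-filled before the softmax.

```python
# Secondary sparsification: keep only the u most active keys for the dominant
# queries Q_bar, median-fill the unvisited score entries, then attend over V.
import math
import torch

def secondary_sparse_attention(Q_bar: torch.Tensor, K: torch.Tensor,
                               V: torch.Tensor, u: int) -> torch.Tensor:
    L_K, d = K.shape
    scores_k = K @ Q_bar.T / math.sqrt(d)                        # key-vs-query scores
    M_key = scores_k.max(dim=-1).values - scores_k.mean(dim=-1)  # key activity M(k_i, Q)
    top_u = M_key.topk(u).indices                                # most active keys -> K_bar
    S = torch.full((Q_bar.shape[0], L_K), float("nan"))
    S[:, top_u] = Q_bar @ K[top_u].T / math.sqrt(d)              # dots only on Q_bar x K_bar
    S = torch.nan_to_num(S, nan=S[:, top_u].median().item())     # median filling -> S_bar
    return torch.softmax(S, dim=-1) @ V

out = secondary_sparse_attention(torch.randn(8, 64), torch.randn(96, 64),
                                 torch.randn(96, 64), u=16)
```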
Step three, the output of step two is processed by the attention distillation mechanism; the specific flow is as follows:
After the matrices Q, K and V have been operated on by the secondary-sparse self-attention mechanism, an attention distilling operation is still needed to reduce the feature dimension while keeping the attention mechanism highly sensitive to the sequence relationships. The attention distilling operation is performed by a 1-D convolution layer with an ELU activation function, followed by a max-pooling layer with stride 2. The distilling operation is defined as:
X_{j+1} = MaxPool(ELU(Conv1d([X_j]_AB)))
where [·]_AB denotes the attention block, which contains the multi-head secondary-sparse ProbSparse Self-Attention and the basic operations, and Conv1d(·) uses the ELU(·) activation function. Through the distilling operation, the dimension of each head's probability sparse self-attention feature map is reduced to half of its original size, and all output information is concatenated as the hidden representation of the model encoder.
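A minimal PyTorch sketch of the distilling layer as defined above — Conv1d with ELU followed by max-pooling with stride 2, halving the temporal length; the kernel sizes and dimensions are illustrative assumptions.

```python
# Distilling layer: convolve along time, apply ELU, then halve the sequence
# length with stride-2 max pooling.
import torch
import torch.nn as nn

class DistillingLayer(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.act = nn.ELU()
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, L, d_model)
        x = self.conv(x.transpose(1, 2))                  # convolve the time axis
        return self.pool(self.act(x)).transpose(1, 2)     # (batch, L/2, d_model)

out = DistillingLayer(64)(torch.randn(2, 96, 64))         # -> (2, 48, 64)
```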
It can be appreciated that the first output information output by the feature embedding module serves as the encoder input of the secondary-sparse ProbSparse Self-Attention. The ProbSparse Self-Attention mechanism holds that the sparsity of the self-attention matrix manifests as different degrees of attention between queries and keys (i.e., different activity, with higher self-attention scores). A dominant query is a query whose larger dot products dominate when its self-attention distribution matches the long-tail distribution. Experiments by the invention show that the dot product under the ProbSparse Self-Attention mechanism sparsifies the query matrix while some keys keep only weak associations with almost all queries; computing these keys against the queries increases the time and space complexity of the self-attention module. In other words, the attention matrix produced by the sparse self-attention mechanism is still sparse, so the invention performs a secondary sparsification on the attention score matrix: it re-evaluates the self-attention under the ProbSparse Self-Attention mechanism and realizes the secondary sparsification of ProbSparse Self-Attention by sparsifying the key matrix K. The attention scores of the queries and keys after secondary sparsification remain high overall, showing that the secondary sparsification method can indeed remove the keys with lower activity.
And S103, processing the second output information through a decoder module to obtain a high-dimensional feature vector.
The method for processing the second output information by the decoder module to obtain the high-dimensional feature vector is as follows:
Next, the second output information acquired from the encoder is input as part of the decoder, which is composed of a masked probability sparse self-attention layer and a multi-head self-attention layer. The masked dot products are here set directly to −∞, the output of the multi-head self-attention layer is concatenated with the result of the encoder, and the resulting high-dimensional vector representation is defined as:
X_decoder = Concat(MultiHeadAttention(Q_de, K_de, V_de), X_encoder)
where Q_de, K_de and V_de are the query, key and value vectors of the decoder, respectively, and X_encoder is the output of the encoder.
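As a rough illustration (PyTorch's stock MultiheadAttention stands in for the patent's masked probability sparse attention; all shapes are assumed), the fusion step can be sketched as:

```python
# Masked self-attention over the decoder input, then concatenation with the
# encoder output to form X_decoder.
import torch
import torch.nn as nn

L_de, L_en, d = 72, 48, 64
mha = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)
x_de, x_en = torch.randn(1, L_de, d), torch.randn(1, L_en, d)
mask = torch.triu(torch.ones(L_de, L_de, dtype=torch.bool), diagonal=1)  # masked dots -> -inf
out, _ = mha(x_de, x_de, x_de, attn_mask=mask)
X_decoder = torch.cat([out, x_en], dim=1)  # Concat(MultiHeadAttention(...), X_encoder)
```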
S104, taking the high-dimensional feature vector as input of an output layer, and converting the high-dimensional feature vector into a final predicted value through the output layer.
The high-dimensional feature vector is converted into the final predicted value through the output layer as follows: the high-dimensional feature vector obtained in step S103 is input into a fully connected layer, i.e., the output layer, whose output is fully connected to the load value to obtain the final result. The input of the output layer at time t is the concatenation of two parts, defined as:
X_de = Concat(X_token, X_0)
where X_token is a start mark of length L_token taken from the known sequence, and X_0 is a placeholder of the target sequence length L_y, set to 0. The length of the decoder input sequence is thus the sum of the lengths of X_token and X_0, L_de = L_token + L_y.
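An illustrative sketch of the decoder input construction; the names follow the notation above, while the feature dimension and example lengths are assumptions, not values from the patent:

```python
# Build the decoder input: the last L_token known values as a start mark,
# followed by an L_y zero placeholder, so L_de = L_token + L_y.
import torch

def decoder_input(history: torch.Tensor, L_token: int, L_y: int) -> torch.Tensor:
    x_token = history[-L_token:]                 # start mark from the known sequence
    x_0 = torch.zeros(L_y, history.shape[-1])    # zero placeholder for the targets
    return torch.cat([x_token, x_0], dim=0)      # length L_de = L_token + L_y

x_de = decoder_input(torch.randn(96, 7), L_token=48, L_y=24)  # (72, 7)
```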
According to the long-time-series power load prediction method based on the improved Informer model provided by this embodiment, the feature information of the time series data is extracted by the feature embedding module and linearly added to obtain the first output information; the first output information is screened, secondarily sparsified and processed by the attention distillation mechanism in the encoder module to obtain the second output information; the decoder module processes the second output information to obtain a high-dimensional feature vector; and the high-dimensional feature vector is taken as the input of the output layer and converted by the output layer into the final predicted value. This method solves the problems that the original Informer model has insufficient feature richness and a probability sparse attention module that is still not sparse enough when applied to power load prediction, thereby improving the accuracy and efficiency of long-time-series power load prediction.
FIG. 2 is a second schematic flow chart of the long-time-series power load prediction method based on an improved Informer model, which this embodiment describes in detail. As shown in FIG. 2, the method provided in this embodiment includes:
S201, analyzing the meteorological information in the time series data through the XGBoost model to obtain the meteorological features.
The meteorological features have an important influence on load prediction, but part of the meteorological information is weakly correlated with load prediction or redundant, which can reduce the performance of the prediction model. Therefore, in the meteorological feature extraction part, the invention uses XGBoost to perform feature selection on the meteorological information X_wth, which includes weather, temperature, humidity, wind speed, etc. The Top_f features with the highest feature importance scores, ranked by feature split frequency, are selected and added to a feature subset T_wth; a fully connected layer serves as the meteorological encoder E_w(·); and finally T_wth ∈ R^{Top_f×t} is input into E_w(·) to obtain the meteorological features, expressed as:
x_wth = E_w(T_wth)
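A hedged sketch of the feature-selection step: features are ranked by XGBoost importance and the Top_f highest-scoring ones are kept. The data, column set and Top_f value are illustrative assumptions.

```python
# Rank meteorological columns by XGBoost feature importance and keep Top_f.
import numpy as np
from xgboost import XGBRegressor

X_wth = np.random.rand(1000, 4)   # stand-ins: weather, temperature, humidity, wind speed
y = np.random.rand(1000)          # load values
model = XGBRegressor(n_estimators=50).fit(X_wth, y)
top_f = 2
keep = np.argsort(model.feature_importances_)[::-1][:top_f]
T_wth = X_wth[:, keep]            # feature subset fed to the weather encoder E_w
```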
S202, processing the time information in the time series data through the global time encoder to obtain the global time features.
The invention provides global time information for the model through the designed global time encoder. The time series data are split and encoded at six time granularities: year, month, day, hour, minute and holiday. For example, for data whose current time is 2023-03-03 15:10, the extracted global time feature is represented by the numerical vector [23, 3, 3, 15, 1, 0], which in turn represents year 23, month 3, day 3, hour 15, minute segment 1 (the 60 minutes of an hour are divided into 6 segments, so minute 10 is represented as 1) and non-holiday 0 (a holiday would be 1). This vector is input into a fully connected network to obtain the global time feature, defined as:
x_time = E_t(X)
where X is the vector representation of the extracted global time feature values and E_t(·) is the fully connected layer.
It will be appreciated that load prediction is a classical time series prediction task, and the global time features play an important role in the model's learning of dependencies along the time dimension of the data; the global time encoder is therefore designed to provide global time information for the model.
S203, acquiring the trend features in the time series data through the trend encoder.
The trend features are obtained through the trend encoder E_s(X) and consist of three parts: a matrix D constructed from the load data of the past h days at time t, a matrix P constructed from the load data of the past h weeks at time t, and a matrix M constructed from the load data of the past h months at time t. D, P and M are then fused into a trend matrix T ∈ R^{3×h}, and features are finally extracted by convolution. The trend encoding is defined as:
x_tre = E_s(X) = Conv1d(T), where T = concat(D, P, M)
It will be appreciated that the purpose of this step is to better reflect the periodicity and trending of the load data.
S204, acquiring the scalar features and position features in the time series data through direct observation and a position encoder.
The scalar feature x_sca can be obtained from the time series data by direct observation.
The position feature x_pos is obtained in the specific implementation as follows:
The input representation consists of three independent parts: a scalar projection and the embeddings of the local and global time stamps.
For the value x_i^t of the input sequence at time t, the representation method first projects it onto a d_model-dimensional vector u_i^t through a d_model-channel convolution filter with kernel size 3 and stride 1, where d_model is the dimension of the input representation.
Then, a fixed position embedding is used at time t to preserve the local context and acquire the position information; the position features are defined as:
PE_(pos, 2j) = sin(pos / (2L_x)^{2j/d_model})
PE_(pos, 2j+1) = cos(pos / (2L_x)^{2j/d_model})
where pos = 1, 2, ..., L_x and j = 1, 2, ..., ⌊d_model/2⌋. L_x is the length of the input sequence, and L_x = L_en or L_x = L_de represents the input of the encoder or the decoder of the proposed model, respectively.
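A sketch of the fixed sinusoidal position embedding defined above, using the (2L_x) base from the formula; the L_x and d_model values are illustrative assumptions.

```python
# Fixed position embedding: sin on even dimensions, cos on odd dimensions.
import numpy as np

def position_embedding(L_x: int, d_model: int) -> np.ndarray:
    pe = np.zeros((L_x, d_model))
    pos = np.arange(L_x)[:, None]
    j = np.arange(0, d_model, 2)[None, :]            # even dimension indices = 2j
    angle = pos / np.power(2 * L_x, j / d_model)
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

x_pos = position_embedding(L_x=96, d_model=64)        # one row per sequence position
```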
The purpose of this step is to obtain global hierarchical timestamps (e.g., week, month and year) and agnostic timestamps (e.g., holidays and other events), which are necessary for capturing the long-term dependencies of long-sequence time-series prediction.
And S205, performing linear addition calculation on the meteorological features, the global time features, the trend features, the scalar features and the position features to obtain first output information.
This step is implemented by linearly adding the five parts obtained in steps S201 to S204, namely the meteorological features, global time features, trend features, scalar features and position features, to obtain the input of the model, defined as:
x_input = Add(x_wth, x_time, x_tre, x_pos, x_sca)
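Assuming all five embeddings have been projected to a common (L, d_model) shape, Add(·) reduces to an element-wise sum, as in this illustrative sketch:

```python
# Element-wise sum of the five aligned embeddings to form the model input.
import torch

L, d_model = 96, 64
x_wth, x_time, x_tre, x_pos, x_sca = (torch.randn(L, d_model) for _ in range(5))
x_input = x_wth + x_time + x_tre + x_pos + x_sca
```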
S206, screening the first output information by calculating the similarity of the self-attention distribution and the uniform distribution to obtain dominant query data.
The dominant queries in the first output information are screened by computing the similarity between the self-attention distribution and the uniform distribution; this can be quantified with the Kullback-Leibler divergence, defined as:
KL(q‖p) = ln Σ_{j=1..N_k} e^{q_i·k_j^T/√d} − (1/N_k) Σ_{j=1..N_k} (q_i·k_j^T/√d) − ln N_k
where q_i is the i-th query, k_j is the j-th key, and N_k is the number of keys. The attention probability distribution α of query q_i is expressed as:
α(q_i, k_j) = e^{q_i·k_j^T/√d} / Σ_l e^{q_i·k_l^T/√d}
Removing the constant in the KL divergence definition, a sparsity measure is designed to quantify the significance of a query, defined as:
M(q_i, K) = ln Σ_{j=1..N_k} e^{q_i·k_j^T/√d} − (1/N_k) Σ_{j=1..N_k} (q_i·k_j^T/√d)
The first term is the Log-Sum-Exp of q_i with all keys, and the second is their arithmetic mean. The higher the value of M(q_i, K), the higher the probability that the attention of q_i contains dominant dot-product pairs in the head of the long-tail self-attention distribution. The complexity of this sparsity measure is still quadratic, because it requires the dot product of every query with every key, so an alternative max-mean measurement was designed for the Informer model, defined as:
M̄(q_i, K) = max_j(q_i·k_j^T/√d) − (1/N_k) Σ_{j=1..N_k} (q_i·k_j^T/√d)
Under the long-tail distribution, only L_K·ln L_K randomly selected dot-product pairs are used to compute M̄(q_i, K), i.e., the other pairs are filled with zeros. Finally, only the s queries with the highest M̄ values are selected as dominant queries for the self-attention calculation with each key, defined as:
A(Q, K, V) = Softmax(Q̄·K^T/√d)·V
where Q is the query matrix, composed of the data points of the queries; K is the key matrix, providing the characteristics of the information required by the queries; V is the value matrix, representing the actual value information; and Q̄ is a sparse matrix comprising the s dominant queries. By assigning a specific value to s, the ProbSparse self-attention mechanism reduces the time and space complexity of self-attention from N² to N×s.
S207, performing secondary sparse processing on the dominant query data through a secondary sparse probability sparse attention method to obtain secondary sparse data.
The specific flow of performing secondary sparse processing on dominant query data by the secondary sparse probability sparse attention method to obtain secondary sparse data is as follows:
The Kullback-Leibler divergence is first used to evaluate the sparsity of K, where the sparsity evaluation of the i-th key is defined as:
M(k_i, Q̄) = ln Σ_{j=1..N_q} e^{q_j·k_i^T/√d} − (1/N_q) Σ_{j=1..N_q} (q_j·k_i^T/√d)
where Q̄ is the query matrix of dominant queries, q_j is the j-th query, k_i is the i-th key, and N_q is the number of dominant queries. The first term is the Log-Sum-Exp of k_i with all queries, and the second is their arithmetic mean; the higher the value, the higher the probability that the attention of k_i contains dominant dot-product pairs (k_i, q) in the head of the long-tail self-attention distribution.
Based on this evaluation, the invention evaluates the sparsity of K and then selects the Top-u keys (u = ln L/c, with c a constant) from the evaluation result. These are the u keys with the highest activity in the sparsity evaluation, and the matrix K̄ they form replaces K in the original formula.
It should be noted that if a sparsity score were computed exactly for every key, additional computation would result. Using the assumption that the dot-product results obey a long-tail distribution, only a randomly sampled subset of queries and keys needs to be computed when scoring each key, which reduces the time cost of the key sparsity evaluation. Thus, under the long-tail distribution, only U = ln L_Q · ln L_K randomly sampled dot-product pairs are needed to compute M(k_i, Q̄), and the other dot-product pairs are filled with zeros; zero filling is used because the max operator in M(k_i, Q̄) is insensitive to zero values and numerically stable.
In this way, the input of the attention dot-product operation is converted from (Q̄, K) to (Q̄, K̄), which allows the sparsified ProbSparse Self-Attention to operate on only O((ln L)²) dot-product pairs. Finally, the score matrix S computed by the dot product of Q̄ and K̄ is multiplied with the value matrix V; the invention smooths S with a median-filling method and denotes the filled matrix as S̄. On this basis, the invention proposes a new measurement on the attention mechanism of the Informer model, defined as:
A(Q̄, K̄, V) = Softmax(S̄/√d)·V, with S = Q̄·K̄^T and S̄ the median-filled S
where Q̄ is the query matrix containing only the few selected queries whose contribution to the attention value has a high specific weight, K̄ is the key matrix with the highest activity ranking screened by the secondary-sparse ProbSparse Self-Attention, V is the value matrix, and d is the dimension of Q, K and V.
And S208, performing distillation operation on the secondary sparse data to obtain second output information.
The specific flow of obtaining the second output information by carrying out distillation operation on the secondary sparse data is as follows:
After the matrices Q, K and V have been operated on by the secondary-sparse self-attention mechanism, an attention distilling operation is still needed to reduce the feature dimension while keeping the attention mechanism highly sensitive to the sequence relationships. The attention distilling operation is performed by a 1-D convolution layer with an ELU activation function, followed by a max-pooling layer with stride 2. The distilling operation is defined as:
X_{j+1} = MaxPool(ELU(Conv1d([X_j]_AB)))
where [·]_AB denotes the attention block, which contains the multi-head secondary-sparse ProbSparse Self-Attention and the basic operations, and Conv1d(·) uses the ELU(·) activation function. Through the distilling operation, the dimension of each head's probability sparse self-attention feature map is reduced to half of its original size, and all output information is concatenated as the hidden representation of the model encoder.
And S209, processing the second output information through a decoder module to obtain a high-dimensional feature vector.
And S210, taking the high-dimensional feature vector as an input of an output layer, and converting the high-dimensional feature vector into a final predicted value through the output layer.
Steps S209 to S210 are similar to steps S103 to S104, and will not be described here.
According to the long-time-series power load prediction method based on the improved Informer model provided by this embodiment, the meteorological information in the time series data is analyzed through the XGBoost model to obtain the meteorological features; the time information is processed through the global time encoder to obtain the global time features; the trend features are obtained through the trend encoder; the scalar features and position features are obtained through direct observation and the position encoder; the feature information is linearly added to obtain the first output information; the first output information is screened, secondarily sparsified and processed by the attention distillation mechanism in the encoder module to obtain the second output information; the decoder module processes the second output information to obtain a high-dimensional feature vector; and the high-dimensional feature vector is taken as the input of the output layer and converted by the output layer into the final predicted value. This method solves the problems that the original Informer model has insufficient feature richness and a probability sparse attention module that is still not sparse enough when applied to power load prediction, thereby improving the accuracy and efficiency of long-time-series power load prediction.
FIG. 3 is a schematic flow chart of the long-time-series power load prediction method based on an improved Informer model according to the third embodiment of the present application, which describes in detail the steps of analyzing the meteorological information in the time series data through the XGBoost model to obtain the meteorological features, processing the time information through the global time encoder to obtain the global time features, obtaining the trend features through the trend encoder, and obtaining the scalar features and position features through direct observation and a position encoder. As shown in FIG. 3, the method provided in this embodiment includes:
And S301, performing feature selection on the weather information by using the XGBoost model, and selecting the features with the highest scores as the weather features to be used in the model calculation.
In the meteorological feature extraction part, the invention uses XGBoost to perform feature selection on the meteorological information X_wth, which comprises weather condition, temperature, humidity, wind speed and the like. The Top_f features with the highest feature importance scores are selected and added into a feature subset T_wth, a fully connected layer is used as the meteorological encoder E_w(·), and finally T_wth ∈ R^(Top_f×t) is input into E_w(·) to obtain the weather features, formulated as follows:
x_wth = E_w(T_wth)
It will be appreciated that the weather features have a significant impact on load prediction, but some weather information is weakly correlated with the load or redundant, which reduces the performance of the prediction model; therefore, the invention uses XGBoost to perform feature selection on the weather information X_wth.
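A brief sketch of this selection step is given below, assuming a pandas DataFrame weather holding the candidate meteorological columns and a load series as the training target; the function name, hyper-parameters and default Top_f value are hypothetical.

import numpy as np
import pandas as pd
import xgboost as xgb

def select_weather_features(weather: pd.DataFrame, load: pd.Series,
                            top_f: int = 5) -> pd.DataFrame:
    """Fit XGBoost on the meteorological columns and keep the Top_f
    features with the highest importance scores (the subset T_wth)."""
    model = xgb.XGBRegressor(n_estimators=200, max_depth=4)
    model.fit(weather.values, load.values)
    order = np.argsort(model.feature_importances_)[::-1]   # descending scores
    return weather.iloc[:, order[:top_f]]                  # T_wth, fed to E_w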
S302, obtaining the global time features by splitting and encoding the time series data with six time granularities of year, month, day, hour, minute and holiday.
The invention provides global time information for the model by designing a global time encoder. The time series data are split-coded from six time granularities: year, month, day, hour, minute and holiday. For example, for data with the current time 2023-03-03 15:10, the extracted global time feature is represented by the numerical vector [23, 3, 3, 15, 1, 0], which in turn represents year 23, month 3, day 3, hour 15, minute segment 1 (60 minutes is divided into 6 segments, so minute 10 falls in segment 1) and non-holiday 0 (a holiday is coded as 1). This vector is input into a fully connected network to obtain the global time feature, defined as follows:
x_time = E_t(X)
where X is the numerical vector representation of the extracted global time values and E_t(·) is the fully connected layer.
It will be appreciated that load prediction is a classical time series prediction task, in which the global time features play an important role in helping the model learn dependencies along the time dimension; the global time encoder is therefore designed to provide global time information for the model.
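As an illustration, a minimal sketch of this split coding is shown below, using Python's datetime and a caller-supplied holiday flag; the actual holiday calendar lookup is outside the scope of this sketch.

from datetime import datetime

def global_time_vector(ts: datetime, holiday: bool) -> list[int]:
    """Split a timestamp into the six granularities
    [year, month, day, hour, minute-segment, holiday-flag]."""
    return [
        ts.year % 100,      # two-digit year, e.g. 2023 -> 23
        ts.month,
        ts.day,
        ts.hour,
        ts.minute // 10,    # 60 minutes divided into 6 segments
        1 if holiday else 0,
    ]

# 2023-03-03 15:10 on a working day -> [23, 3, 3, 15, 1, 0]
print(global_time_vector(datetime(2023, 3, 3, 15, 10), holiday=False))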
And S303, constructing a matrix D from the time series data of the past h days at time t.
S304, constructing a matrix P from the time series data of the past h weeks at time t, and constructing a matrix M from the time series data of the past h months at time t.
And S305, fusing D, P and M to form a trend matrix, and finally extracting the trend features by convolution.
The steps S303 and S304 are conventional operations of those skilled in the art, and may be implemented by using conventional matrix construction techniques, so that details are not described herein.
The specific implementation of fusing D, P and M to form a trend matrix and finally extracting the trend features by convolution is as follows:
The trend features are obtained through the trend encoder in three main parts: a matrix D is constructed by using the load data of the past h days at time t, a matrix P is constructed by using the load data of the past h weeks at time t, and a matrix M is constructed by using the load data of the past h months at time t. Then, D, P and M are fused to form a trend matrix T ∈ R^(3×h), and finally feature extraction is carried out by convolution, where the trend code is defined as follows:
x_tre = E_s(X) = Conv1d(T), T = concat(D, P, M)
It will be appreciated that the purpose of this step is to better reflect the periodicity and trend of the load data.
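A minimal sketch of the trend matrix construction and the convolutional trend encoding follows, assuming hourly load samples indexed so that position i is i hours before time t and approximating a month as 30 days; the channel count of the convolution is an illustrative assumption.

import torch
import torch.nn as nn

def trend_matrix(load: torch.Tensor, h: int) -> torch.Tensor:
    """Build T = concat(D, P, M) in R^(3 x h): the load at the same
    time t over the past h days, h weeks and h (30-day) months."""
    day, week, month = 24, 24 * 7, 24 * 30
    D = torch.stack([load[k * day] for k in range(1, h + 1)])
    P = torch.stack([load[k * week] for k in range(1, h + 1)])
    M = torch.stack([load[k * month] for k in range(1, h + 1)])
    return torch.stack([D, P, M])                  # shape (3, h)

h = 4
T = trend_matrix(torch.randn(24 * 30 * h + 1), h)
conv = nn.Conv1d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
x_tre = conv(T.unsqueeze(0))                       # trend features E_s(X)
print(x_tre.shape)                                 # torch.Size([1, 8, 4])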
And S306, extracting the scalar features in the time series data through direct observation.
The scalar feature x_sca can be directly observed and acquired from the time series data by one skilled in the art.
S307, extracting the position features of the time series data through a position encoder.
The position feature x_pos is acquired in a specific implementation as follows:
The input representation consists of three independent parts: a scalar projection, and the embeddings of the local and global time stamps.
For time t, the values of the input sequence are first projected onto a d_model-dim vector through a 1-D convolution filter with kernel size 3 and stride 1, where d_model is the dimension of the input representation.
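A short sketch of this value projection follows, assuming a univariate input series and circular padding (as used in common Informer implementations); the shapes are illustrative.

import torch
import torch.nn as nn

d_model = 512
value_proj = nn.Conv1d(in_channels=1, out_channels=d_model, kernel_size=3,
                       stride=1, padding=1, padding_mode="circular")
x = torch.randn(8, 96, 1)                          # (batch, L_x, 1) scalar series
u = value_proj(x.transpose(1, 2)).transpose(1, 2)  # (batch, L_x, d_model)
print(u.shape)                                     # torch.Size([8, 96, 512])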
Then, the local context is preserved by using a fixed position embedding at time t to acquire the position information, where the position features are defined as follows:
PE_(pos, 2j) = sin(pos / (2L_x)^(2j/d_model))
PE_(pos, 2j+1) = cos(pos / (2L_x)^(2j/d_model))
where pos = 1, 2, ..., L_x and j = 1, 2, ..., ⌊d_model/2⌋. L_x is the length of the input sequence, and L_x = L_en or L_de represents the input length of the encoder or the decoder of the proposed model, respectively.
The purpose of this step is to obtain the global hierarchical time stamps (e.g., week, month and year) and agnostic time stamps (e.g., holidays and others), which are necessary for capturing the long-term dependencies in long-sequence time-series prediction.
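Under this reading of the formula (with base 2L_x, following the Informer paper's fixed position embedding), the encoding can be sketched as follows; this is an illustrative interpretation, not the patent's code.

import torch

def fixed_position_embedding(L_x: int, d_model: int) -> torch.Tensor:
    """PE[pos, 2j]   = sin(pos / (2*L_x)^(2j/d_model))
       PE[pos, 2j+1] = cos(pos / (2*L_x)^(2j/d_model))"""
    pe = torch.zeros(L_x, d_model)
    pos = torch.arange(L_x, dtype=torch.float32).unsqueeze(1)
    j = torch.arange(0, d_model, 2, dtype=torch.float32)
    div = (2.0 * L_x) ** (j / d_model)
    pe[:, 0::2] = torch.sin(pos / div)
    pe[:, 1::2] = torch.cos(pos / div)
    return pe                          # x_pos, shape (L_x, d_model)

pe = fixed_position_embedding(L_x=96, d_model=512)
print(pe.shape)                        # torch.Size([96, 512])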
According to the long-time-series power load prediction method based on the improved Informer model provided by this embodiment, the weather information in the time series data is analyzed through the XGBoost model to obtain the weather features; the time stamp data in the time series data are processed through the global time encoder to obtain the global time features, so as to obtain the global hierarchical time stamps (e.g., week, month and year) and agnostic time stamps (e.g., holidays and others); the trend features in the time series data are obtained through the trend encoder; the scalar features and the position features in the time series data are obtained through direct observation and the position encoder; the feature information is linearly added to obtain the first output information; the first output information is processed by the encoder module through screening, secondary sparsification and the attention distillation mechanism to obtain the second output information; the second output information is processed by the decoder module to obtain the high-dimensional feature vector; and the high-dimensional feature vector is taken as the input of the output layer and converted by the output layer into the final predicted value. By this method, the problems that the original Informer model has insufficient feature richness and that its probability-sparse attention module is still not sparse enough when applied to power load prediction are solved, thereby improving the accuracy and efficiency of long-time-series power load prediction.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are alternative embodiments, and that the acts and modules referred to are not necessarily required for the present application.
It will be appreciated that the device embodiments described above are merely illustrative and that the device of the application may be implemented in other ways. For example, the division of the units/modules in the above embodiments is merely a logic function division, and there may be another division manner in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be omitted or not performed.
In addition, each functional unit/module in each embodiment of the present application may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together, unless otherwise specified. The integrated units/modules described above may be implemented either in hardware or in software program modules.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (5)

1. A long-time-series power load prediction method based on an improved Informer model, for application to a power load prediction system, the method comprising:
The method comprises the steps of: obtaining time series data and an SSI model, extracting characteristic information of the time series data through a characteristic embedding module, and calculating the characteristic information to obtain first output information, wherein the characteristic information comprises weather characteristics, global time characteristics, trend characteristics, scalar characteristics and position characteristics, the SSI model is used for indicating a model improved on the basis of the Informer model, and the SSI model comprises the characteristic embedding module, an encoder module, a decoder module and an output layer; the obtaining of the time series data and the SSI model, the extracting of the characteristic information of the time series data through the characteristic embedding module, and the calculating of the characteristic information to obtain the first output information specifically comprises the steps of: analyzing the weather information in the time series data through the XGBoost model to obtain the weather characteristics, processing the time stamp data in the time series data through the global time encoder to obtain the global time characteristics, obtaining the trend characteristics in the time series data through the trend encoder, obtaining the scalar characteristics and the position characteristics in the time series data through direct observation and the position encoder, and performing linear addition on the weather characteristics, the global time characteristics, the trend characteristics, the scalar characteristics and the position characteristics to obtain the first output information;
obtaining second output information after the first output information is subjected to screening, secondary sparsification and attention distillation mechanism processing by the encoder module, which specifically comprises the steps of: screening the first output information to obtain dominant query data by calculating the similarity between the self-attention distribution and a uniform distribution; performing secondary sparse processing on the dominant query data through a secondary sparse probability-sparse attention method to obtain secondary sparse data; and performing the attention distillation operation on the secondary sparse data to obtain the second output information;
processing the second output information by the decoder module to obtain a high-dimensional feature vector;
and taking the high-dimensional feature vector as an input of the output layer, and converting the high-dimensional feature vector into a final predicted value through the output layer.
2. The method of claim 1, wherein the analyzing of the weather information in the time series data by the XGBoost model to obtain the weather features comprises:
performing feature selection on the weather information by using the XGBoost model, and selecting the features with the highest scores as the weather features to be used in the model calculation.
3. The method of claim 1, wherein the processing of the time series data by the global time encoder to obtain global time characteristics comprises:
obtaining the global time characteristics by splitting and encoding said time series data according to six time granularities of year, month, day, hour, minute and holiday.
4. The method of claim 1, wherein the obtaining, by a trend encoder, the trend feature in the time series data comprises:
constructing a matrix D from said time series data of the past h days at time t;
constructing a matrix P from said time series data of the past h weeks at time t, and constructing a matrix M from said time series data of the past h months at time t;
fusing D, P and M to form a trend matrix, and finally extracting the trend features by convolution.
5. The method of claim 1, wherein the obtaining of the scalar features and the position features in the time series data by direct observation and a position encoder comprises:
Extracting the scalar features in the time series data by direct observation;
extracting the position features in the time series data by a position encoder.