CN120562928B

CN120562928B - A wind turbine multi-source data fusion and power prediction method

Info

Publication number: CN120562928B
Application number: CN202511062248.5A
Authority: CN
Inventors: 盛立; 李春昱; 高明; 周东华; 席霄鹏; 钟麦英
Original assignee: China University of Petroleum East China
Current assignee: China University of Petroleum East China
Priority date: 2025-07-31
Filing date: 2025-07-31
Publication date: 2025-09-26
Anticipated expiration: 2045-07-31
Also published as: CN120562928A

Abstract

The present invention discloses a method for multi-source data fusion and power prediction of a wind turbine. The method comprises: S1: preprocessing raw data from a wind turbine monitoring and data acquisition system, including feature selection, outlier removal, and data set generation and partitioning; S2: constructing a Transformer basis generation module and combining it with a multi-layer fully connected neural network to form a Transformer block; S3: constructing a power curve basis generation module and combining it with a multi-layer fully connected neural network to form a power curve block; S4: connecting several Transformer blocks using the double residual stacking principle to form a Transformer stack; S5: connecting several power curve blocks using the double residual stacking principle to form a power curve stack; S6: connecting the Transformer stack and the power curve stack in series, establishing a dynamic trainable weighting mechanism, fusing the outputs of each stack, and obtaining a multi-step wind power prediction result; and S7: determining the optimal configuration scheme for network hyperparameters based on the prediction results using a grid search method.

Description

Multi-source data fusion and power prediction method for wind turbine generator

Technical Field

The invention belongs to the technical field of wind power prediction, and particularly relates to a multi-source data fusion and power prediction method of a wind turbine generator.

Background

With the continuous improvement of the ratio of wind power generation in a global renewable energy system, accurate and physical interpretable wind power prediction has become a key supporting means for safe operation and economic dispatch of a power grid. In the running process of the wind turbine generator, the high-frequency fluctuation and intermittent characteristics of the wind speed not only enable the power output to show strong nonlinearity and obvious threshold constraint (cut-in wind speed, rated wind speed and cut-out wind speed), but also further increase the complexity and uncertainty of input characteristics due to the fact that the monitoring data are low in sampling rate and high in noise. In short-term prediction (a range from a few minutes to a few hours), the high-precision requirement is particularly urgent, and the high-precision requirement is directly related to the reduction of the wind rejection rate, the optimization of the generated energy and the reasonable formulation of a unit maintenance plan, but the error accumulation effect and the model robustness in multi-step prediction are insufficient, so that the improvement of the prediction performance is still restricted.

Existing wind power prediction techniques can be broadly divided into two main categories, a statistical model based on a random process and a data-driven model based on machine learning. Statistical methods such as autoregressive moving average and Gaussian process regression can capture the randomness and time correlation of wind speed and power to a certain extent, but the strong dependence on stationarity and linear assumption makes it difficult to process the highly non-stationarity and multisource coupling characteristics in wind power monitoring data. In contrast, the traditional machine learning methods such as a support vector machine, a k-nearest neighbor and a shallow neural network have remarkable progress in short-term wind power prediction through nonlinear mapping and multidimensional feature fusion, however, the models still have limited performance in the aspects of deep capture of long-time sequence dependency and adaptive modeling of large-scale heterogeneous data.

In recent years, the deep learning method rapidly becomes a research hotspot of wind power prediction by virtue of the end-to-end characteristic extraction and strong nonlinear fitting capability. Short-term and local space-time characteristics are effectively captured through a gating mechanism or expansion convolution by a long-term memory network (LSTM), a gating circulating unit (GRU), a Time Convolution Network (TCN) and other circulating and convolution architectures, and a transducer-based model such as Informer, autoformer further utilizes a global self-attention mechanism to improve the middle-long term prediction accuracy. Furthermore, the graph neural network and its variants enable modeling of non-euclidean space dependencies by constructing a set or sensor topology. However, most of the depth models belong to a 'black box' paradigm, and physical characteristics and operation constraints of the wind turbine generator are ignored, so that the prediction results lack reliability and robustness under extreme working conditions or topology changes.

In order to make up for the lack of interpretation of a 'black box' model, a physical information machine learning method is started to be applied to the field of wind power prediction, and mainly comprises taking a conservation equation as a loss regular term, performing physical pre-training based on simulation data, performing a physical-deep learning mixed model, performing physical perception architecture design and other norms. Although the physical prior is integrated into the network to a certain extent, the method is often limited to be popularized and applied in the practical large-scale and multi-working-condition wind power plant due to the problems that the calculation of the higher derivative is complex, the deviation between simulation and actual measurement data is large, the weight of an integrated strategy is difficult to self-adaptively adjust, the architecture needs to be repeatedly designed aiming at specific scenes, and the like.

In summary, how to realize flexible self-adaptation to the wind power time sequence mode while considering both high-precision prediction and physical interpretability is a core problem to be solved in the current wind power prediction technology. Aiming at the limitations of static equal weight aggregation, lack of physical meaning of basis functions and the like of the existing nerve base expansion analysis (NBEATSx) network, physical prior is introduced and dynamic trainable weights are given to a model so as to construct a wind power prediction framework with both interpretability and robustness, which is a key point for realizing engineering application of a wind power prediction technology.

Disclosure of Invention

Aiming at the problems existing in the prior art, the invention provides a multi-source data fusion and power prediction method for a wind turbine, which realizes flexible self-adaption of a wind power time sequence mode while considering high precision and physical interpretability of prediction.

In order to achieve the above purpose, the invention provides a method for multi-source data fusion and power prediction of a wind turbine, comprising the following steps:

S1, carrying out data preprocessing on original data of a wind turbine monitoring and data acquisition system, wherein the data preprocessing comprises feature selection, outlier rejection, data set generation and division;

S2, constructing a transducer base generation module, and combining the transducer base generation module with a multi-layer fully-connected neural network to form a transducer block:

S3, constructing a power curve base generating module, and combining the power curve base generating module with a multi-layer fully-connected neural network to form a power curve block;

s4, connecting a plurality of transducer blocks through a dual residual stacking principle to form a transducer stack:

S5, connecting a plurality of power curve blocks through a double residual stacking principle to form a power curve stack;

s6, connecting a transducer stack and a power curve stack in series, establishing a dynamic trainable weighting mechanism, and fusing output of each stack to obtain a multi-step prediction result of wind power;

And S7, determining an optimal configuration scheme of the network super-parameters by utilizing a grid search method according to the prediction result.

The method for combining the transducer base and the multi-layer fully-connected neural network to form the transducer block comprises the steps of adopting a multi-layer stack-block architecture to realize time sequence decomposition and prediction by integrating exogenous variables, wherein the transducer block is a basic structural unit of the multi-layer weight connected neural network, all transducer blocks process input data according to the same logic, and a plurality of blocks are connected through a double residual stacking principle to form a stack structure, and the transducer block comprises a dynamic residual part and a static exogenous variable part, wherein the dynamic residual part is expressed as:

Wherein, the Representing the stack index and,The block index is represented as such,The size of the batch is indicated and,The designed multi-layer fully-connected neural network outputs two vectors including the trace-back value of a transducer blockAnd predicted value, wherein,To predict window length.

Further, the transducer block comprises a multi-layer fully connected neural network responsible for learning the base expansion coefficientThe specific process is as follows:

network integration backtracking base And a prediction basisAnd in the base layer, for each sampleBy means of backtracking basesAnd a prediction basisExplicit summation operation is carried out, and backtracking base expansion coefficients are respectively carried outAnd prediction base expansion coefficientMapping to backtracking valuesAnd predicted valueThe process is expressed as:

。

Further, the generating step of the transducer base includes:

S201 to give input tensor AndSplice formation along the time dimensionWherein,In order to backtrack the window of the window,Transpose tensors to prediction windows, following standard transducer architecturePerforming linear projection to obtain:

Wherein, the For the embedded layer size, a scaling factorFor stabilizing gradient magnitude, time-position information is integrated into embedded features by sinusoidal coding function, position coding matrixOffline pre-computation, whereinFor pre-calculated maximum position coding length and dynamic truncation to match sequence lengthThe final code is expressed as:

S202, encoding by using a transducer encoder layer, wherein each layer comprises a multi-head self-attention, residual connection and a position-by-position feedforward network, the process is expressed as:

Wherein, the For the following triangle causal mask, defined as:

S203 outputting the transducer through the output projection layer From the dimensionMapping to target dimensions:

By joint permutation and residual join operations, the fusion is expressed as:

。

Further, the power curve base generation module captures nonlinear characteristics of a wind power curve by adopting a smooth logic growth model, and the mathematical expression is as follows:

Wherein, the Indicating wind speed asThe predicted power at the time of the time,At the time of the maximum power output to be achieved,Is a steepness parameter controlling the rate of rise of the curve,Is the wind speed of half maximum power,AndCut-in wind speed and rated wind speed, respectively.

Further, the processing method of the power curve base generating module comprises the following steps:

S301, parameter processing and dimension adjustment, setting AndRespectively representing wind speed data in a backtracking window and a prediction window, and splicing two input components along a time dimension:

Wherein, the Based on rated cut-out wind speedFor formula (VI)Scaling the horizontal and vertical coordinates of the smoothed logic cliff-growth model to a [0,1] range, wherein the horizontal coordinates represent wind speed and the vertical coordinates represent power:

Wherein, the ,,,,,,Parameter(s)、And input sequenceThe dimensions of (2) are adjusted as follows:

S302, constructing a power curve basis function, and solving a smooth logic Style growth model through element-level broadcast operation in tensor calculation:

will be third dimension And a fourth dimensionMerging and removing single-instance dimension to obtain the following steps:

Wherein, the And (2) andCalculation formulaNon-linear scaling term in (a)And the dimension of the product is expanded to obtain the product:

the final basis function matrix is obtained by:

Wherein, the Represents element-level multiplication operations, anAnd finally, by tensor slicingSplitting into a backtracking basis function and a prediction basis function:

s303, processing the base expansion coefficient, for AndThe ReLU activation function is applied to ensure non-negativity, and then normalized by softmax:

Wherein, the Representing normalization along the last dimension of the tensor, the result is then split into backtracking base expansion coefficientsAnd prediction base expansion coefficient。

Further, the stack is obtained by forming a stack from a plurality of blocks, the firstThe input of each stack is a backtracking valueThe prediction output of the stack is aggregated from the prediction results of the blocks within the stack:

。

further, the multi-step prediction result of the wind power is a weighted sum of all stack prediction results:

Wherein, the Is a trainable weight coefficient.

Further, the numbers of the transducer blocks and the power curve blocks are obtained by using a grid search optimizing mode through a data set respectively.

Compared with the prior art, the invention has the advantages and positive effects that:

Firstly, through dual residual connection and a multi-layer stack-block architecture, the invention realizes efficient propagation of residual errors and multi-component separation modeling of signals, organically combines the non-linear fitting capacity of deep learning with the interpretability of a traditional decomposition method, greatly improves the precision and stability of time sequence feature extraction, secondly, introduces a trainable weight matrix to replace a traditional fixed accumulation strategy, enables the model to dynamically and adaptively adjust the contribution of each stack output to final prediction, enhances the response capacity to power change under different wind speed modes, thirdly, embeds the operation constraints such as cut-in wind speed, rated wind speed, cut-out wind speed, rated power and the like of a wind turbine generator into a smooth logic Stir growth basis function, ensures that the output strictly meets the physical consistency requirement through the special power curve stack injection physical priori, accords with the non-linear saturation characteristic of the wind turbine generator, and finally, adopts a Transformer encoder to generate basis vectors, thereby effectively capturing complex time dependence. The design not only overcomes the limitations of the traditional NBEATSx fixed stack and simple aggregation mode, but also realizes the deep cooperation of physical consistency and interpretability, provides a more accurate, reliable and interpretable solution for wind power prediction, and effectively bridges the gap between physical priori knowledge and a data driving model.

Drawings

FIG. 1 is a flowchart of a method for multi-source data fusion and power prediction of a wind turbine according to an embodiment of the invention;

FIG. 2 is a schematic diagram of a physical information adaptive weighting neural based expansion analysis network according to an embodiment of the present invention;

FIG. 3 is a graph showing the comparison of the shape of a curve of a smooth logic-Style growth model under different parameters according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an overall framework of a short-term wind power prediction method according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a transducer base generation module framework;

FIG. 6 is a schematic diagram of a power curve base generation module framework;

FIG. 7 is a box plot of the distribution of prediction errors of the Bi-LSTM network at each prediction step;

FIG. 8 is a box plot of the distribution of prediction errors of a transducer network at each prediction step;

FIG. 9 is a box plot of the distribution of the prediction error of the TCN network at each prediction step;

FIG. 10 is a box plot of the distribution of prediction errors for NBEATSx networks at each prediction step;

FIG. 11 is a box diagram showing the distribution of prediction errors in each prediction step for a network according to embodiments PIAW-NBEATSx of the present invention;

FIG. 12 is a PIAW-NBEATSx network generated global prediction Comparing with the actual measured power curve;

FIG. 13 is a Transformer stack prediction for PIAW-NBEATSx networks Decomposing the component graph;

FIG. 14 is a power curve stack prediction for PIAW-NBEATSx networks Decomposing the component graph;

FIG. 15 is a dynamic weighting factor for PIAW-NBEATSx networks AndEvolution diagram in training process;

FIG. 16 is a graph of a steady convergence procedure for training and validation loss for PIAW-NBEATSx networks;

FIG. 17 is a graph of the interval prediction results for PIAW-NBEATSx networks at different confidence levels;

FIG. 18 is an enlarged view of a portion of FIG. 17 at A;

fig. 19 is a partial enlarged view at B in fig. 17.

Detailed Description

The present invention will be specifically described below by way of exemplary embodiments. It will be appreciated, however, that the invention may be beneficially incorporated in other embodiments without further recitation.

The invention provides a wind turbine generator multisource data fusion and power prediction method based on a physical information self-adaptive weight neural expansion analysis network, which is based on a wind power short-term prediction framework of the physical information self-adaptive weight neural expansion analysis (PIAW-NBEATSx) network, and realizes the deep fusion of physical priori knowledge and a data driving model to predict the power of the wind turbine. The model directly embeds fan operation constraints and fan power curve characteristics into the architecture through a dedicated power curve stack, and captures complex time-dependent relationships by means of a transducer stack. In addition, the model adopts a dynamic trainable weighting mechanism to replace a fixed aggregation mode, and the contribution of each stack is self-adaptively balanced. These designs not only ensure physical consistency of the predicted results, but also promote interpretability of the model and robustness to non-physical outputs.

Referring to fig. 1, a physical information adaptive weight neural base expansion analysis network based on data fusion carries out short-term prediction on wind power. The method comprises the following steps:

s1, carrying out data preprocessing on original data of a wind turbine monitoring and data acquisition System (SCADA), wherein the data preprocessing comprises feature selection, outlier rejection, data set generation and division.

The specific feature selection method is to calculate a correlation matrix through pearson correlation coefficients to realize feature selection, and only one representative variable is reserved for a variable group which approximates complete collinearity. And then removing the physical abnormal value, normalizing the data to the range of [0,1], and completing preliminary preprocessing to generate a data set. The data set is divided into a training set, a verification set and a test set according to a certain proportion, and the specific division proportion can be divided according to the data quantity and the actual requirement, so that the limitation is not made.

S2, constructing a transducer base generation module, and combining the transducer base generation module with a multi-layer fully-connected neural network to form a transducer block.

The method adopts a multi-level stack-block architecture, and realizes time sequence decomposition and prediction by integrating exogenous variables. The transducer is a deep learning architecture, the transducer blocks are basic structural units of a multi-layer weight connected neural network, all transducer blocks process input data according to the same logic, and a plurality of transducer blocks are connected through a double residual stacking principle to form a stack structure. For example, the firstStacks ofMiddle (f)Individual blocksThe input of (1) comprises two parts, the dynamic residual part:

Wherein, the The size of the batch is indicated and,For backtracking window length, static exogenous variable part. The transform block then outputs two vectors, the traceback value of the blockAnd predicted value, wherein,To predict window length.

Each transducer block contains a multi-layer Fully Connected Neural Network (FCNN) responsible for learning the base expansion coefficientsThe specific process is as follows:

the multi-layer fully-connected neural network integrates a backtracking base And a prediction basisBacktracking baseAnd a prediction basisMay be a predefined basis function or a basis vector derived by data learning. In the base layer, for each sampleBy means of backtracking basesAnd a prediction basisExplicit summation operation is carried out to trace back the base expansion coefficients in the base expansion coefficients respectivelyAnd prediction base expansion coefficientMapping to backtracking valuesAnd predicted value. The process can be expressed as:

the generation step of the transducer base comprises the following three core stages:

s201 embedding and position coding

Given an input tensor(Backtracking Window) and(Prediction window) first formed by stitching along the time dimensionWherein. Following the standard fransformer architecture, transpose tensors intoPerforming linear projection to obtain :

Wherein, the For the embedded layer size, a scaling factorFor stabilizing the gradient magnitude.

The time position information is fused into the embedded feature by a sinusoidal coding function. Position coding matrixOffline precalculationFor a pre-calculated maximum position code length) and dynamically truncated to match the sequence length. The final code is expressed as:

S202, constructing a transducer encoder layer for encoding

Applying a transform encoder layer, each layer containing multi-headed self-attention, residual connection, and position-by-position Feed Forward Network (FFN), the process is expressed as:

Wherein, the For the following triangle causal mask, defined as:

the mask is used to mask future time steps to preserve causal relationships in the attention mechanism.

S203 output projection and residual error connection

Outputting a transducer through an output projection layerFrom the dimensionMapping to target dimensions(For blocks of basis vectors obtained based on a data learning model, there are ):

By joint permutation and residual join operations, the fusion is expressed as:

S3, constructing a power curve base generating module, and combining the power curve base generating module with a multi-layer fully-connected neural network to form a power curve block.

A smooth logic growth model (smooth logistic growth models, SLGMs) is adopted to capture nonlinear characteristics of a Wind Power Curve (WPC), and the mathematical expression is as follows:

Wherein, the Indicating wind speed asThe predicted power at the time of the time,At maximum power output (typically rated power),Is a steepness parameter controlling the rate of rise of the curve,At half maximum power wind speed (i.e. at this wind speed the power output is ),AndCut-in wind speed and rated wind speed, respectively. By setting different parameter combinations, curves of different shapes can be obtained, as shown in fig. 3.

The processing flow of the power curve base generation module comprises the following steps:

S301 parameter processing and dimension adjustment

Is provided withAndRespectively representing wind speed data within the backtracking window and the prediction window.

First, two input components are spliced along the time dimension:

Wherein, the 。

Due toTo normalize the processed sequence, it is necessary to base the cut-out wind speed on the nominalThe parameters in equation (8) are adjusted to scale the horizontal and vertical coordinates of the smooth logistic growth model to the [0,1] range, where the horizontal coordinates represent wind speed and the vertical coordinates represent power:

(9)

Wherein, the , , , , , , 。

Parameters to facilitate tensor broadcasting in a deep learning framework 、And input sequenceThe dimensions of (2) are adjusted as follows:

s302, constructing a power curve basis function

The smooth logical stellite growth model is solved by element-level broadcast operations in tensor computation:

(10)

The third dimension and the fourth dimension (corresponding to And) Merging and removing single-instance dimension to obtain the following steps:

(11)

Wherein, the And (2) and 。

Nonlinear scaling term in calculation equation (9)And the dimension of the product is expanded to obtain the product:

(12)

the final basis function matrix is obtained by:

(13)

Wherein, the Represents element-level multiplication operations, an. In particular, the method comprises the steps of,Manipulating the second and third dimension of the tensor fromIs replaced by 。

Finally, slice through tensorSplitting into a backtracking basis function and a prediction basis function:

(14)

s303 base expansion coefficient processing

As shown in fig. 2, the multi-layer "stack-block" architecture resulting from the neural base extension analysis (NBEATSx), the network includes two stacks, a transducer stack (blue block) whose basis vectors are generated by the transducer base generation module, and a power curve stack (green block) whose predefined basis functions are generated by the power curve base generation module. Each block comprises a multi-layer Fully Connected Neural Network (FCNN) for learning base expansion coefficients. Global predictorBy adaptive weightingAnd constructing, namely realizing the self-adaptive fusion of all stack prediction results.

For the power curve block, since the physical meaning of the base expansion coefficient representation of the fully connected neural network output is the weight of the smooth logic growth model SLGMs, the power curve block is needed to be added firstAndThe ReLU activation function is applied to ensure non-negativity, and then normalized by softmax:

(15)

Wherein, the Representing normalization along the last dimension of the tensor. The result is then split into traceback base expansion coefficientsAnd prediction base expansion coefficient 。

S4, connecting a plurality of transducer blocks through a dual residual stacking principle to form a transducer stack.

The number of the Transformer blocks is obtained by using a data set in a grid search optimizing mode. A plurality of blocks form a stack, the firstThe input of each stack is a backtracking valueThe prediction output of the stack is aggregated from the prediction results of the blocks within the stack:

(16)

s5, connecting a plurality of power curve blocks through a double residual stacking principle to form a power curve stack.

The number of the power curve blocks is obtained by using a data set in a grid search optimizing mode.

S6, connecting a transducer stack and a power curve stack in series, establishing a dynamic trainable weighting mechanism, and fusing output of each stack to obtain a multi-step prediction result of wind power.

The global prediction (i.e., model output) is a weighted sum of all stack predictors:

(17)

Wherein, the Is a trainable weight coefficient.

And after the optimal configuration scheme is obtained, predicting the wind power by using a physical information-based self-adaptive weight neural expansion basis analysis network of the optimal configuration scheme.

The method will be further described with reference to historical data obtained from a year-round monitoring and data collection system of a wind farm 2023 in China.

1) Acquiring data sets and performance metrics

The performance of the predictive model was verified using historical data from a China wind farm 2023 year-round supervisory control and data acquisition (SCADA) system. The data contained 37 variables with a sampling frequency of 15 minutes. As shown in fig. 4-6, the transducer basis vector and power curve basis function generation module includes a cut-in wind speedRated wind speedCut-out wind speedAnd rated powerThe key parameters are explicitly embedded into the power curve basis functions through a smooth logical growth model.

The data set construction flow is such that, first, a correlation matrix is calculated by pearson correlation coefficients to achieve feature selection, leaving only one representative variable for the set of variables that are approximately fully co-linear. 13 variables are finally screened out as static exogenous variables X in the process, wherein the static exogenous variables comprise wind speed, grid phase voltage, power factor, blade current value, blade angle, impeller rotating speed, gearbox oil temperature generator winding temperature, generator driving end/non-driving end bearing temperature, environment temperature and converter side/grid side module temperature, and the 13 variables are selected characteristics. And then removing the physical abnormal value, normalizing the data to the range of [0,1], and completing preliminary pretreatment to obtain a data set. Backtracking window lengthSet to the predicted window length5 Times of (2).

To evaluate the accuracy of the predictive model, performance metrics including Mean Absolute Percentage Error (MAPE), symmetric Mean Absolute Percentage Error (SMAPE), normalized Mean Absolute Error (NMAE), normalized Root Mean Square Error (NRMSE) and decision coefficient # )。

2) Experimental setup and proposed predictive model configuration

In the experiment, the dataset was divided into training, validation and test sets in a 6:2:2 ratio. Physical information adaptive weight neural expansion basis analysis (PIAW-NBEATSx) networks, which are the networks constructed by the present invention, were trained on training sets using an adaptive moment estimation (Adam, adaptive Moment Estimation) optimizer. The Mean Square Error (MSE) is taken as the loss function. Super-parameters are optimized by three sets of grid searches, architecture parameters (number of blocks per stackNumber of hidden layers per blockNumber of hidden units per layer) Regularization parameter (discard rate)Regularization coefficient of L1Regularization coefficient of L2) Learning rate scheduling (initial learning rate)Attenuation factorMinimum loss reduction threshold). And guiding parameter updating through model performance on the verification set. The final optimized network is configured as(The data set is subjected to network optimization to obtain a transducer block and a power curve block with three numbers respectively), 、、、、、、、Specific details are shown in Table 1. After training, the model generates five-step prediction on the test set, and the five-step prediction is spliced to form a complete prediction sequence. All experiments were performed by Python 3.9, run on NVIDIA RTX a4000 GPU.

TABLE 1 physical information based adaptive weighted neural extension basis analysis (PIAW-NBEATSx) network architecture

3) Deterministic prediction result comparison

Four reference prediction models were developed, including the original neural based expansion analysis (NBEATSx) model and three sequence-to-sequence two-way long and short term memory networks (Bi-LSTM), time Convolutional Networks (TCN), and Transformer variants. All models have the same input and output dimensions, with the input being 14-dimensional and the output being 5 steps, and all contain residual shortcuts from the original input to the output. The neural based expansion analysis (NBEATSx) model employs the architecture in table 1, but uses classical trend stacking and seasonal stacking.

TABLE 2 hyper-parameter configuration of reference models

Wherein, the Indicating the hidden layer size, the hidden indicating dimension of Bi-LSTM isIncluding a forward dimension and a rearward dimension.

The super parameters of all models are optimized through grid search, the search range covers the hidden layer size, the layer number, the discarding rate, the learning rate and the batch size, the search space is kept consistent, and the same loss function and early-stop strategy are adopted. The main superparameters are summarized in table 2.

Table 3 comparison of the performance of predictive models

Table 3 compares the performance of the physical information based adaptive weighted neural extension basis analysis PIAW-NBEATSx network with the baseline model from five metrics. Notably, due to the strong variability of real data and the large number of near zero wind values (corresponding to fan down conditions), the Mean Absolute Percentage Error (MAPE) of the baseline model is extremely high. The reference model has limited ability to capture complex time dependencies and multi-scale patterns, resulting in significant prediction errors accumulated on low-value samples. In contrast, neural extension-based analysis NBEATSx alleviates this problem to some extent by virtue of the interpretable stacked architecture and signal decomposition mechanism.

The PIAW-NBEATSx provided by the invention has obvious advantages on all evaluation indexes, not only realizes the lowest error rate, but also obtains the highest decision coefficient. This excellent performance results from the innovative construction of a power curve stack that explicitly embeds the operating parameter constraints into the basis functions, ensuring that the predicted results conform to the nonlinear saturation characteristics of WPCs. In addition, a trainable weight mechanism and a transducer stack are introduced, so that the accuracy and stability of prediction are further improved.

Overall, these comparisons verify the necessity of incorporating a priori knowledge in wind power prediction. The proposed method not only improves the interpretability and reliability of the prediction results, but also exhibits excellent prediction accuracy and robustness when processing an actual data set comprising a large number of near-zero samples.

Fig. 7-11 visualize the prediction error distribution of all models over each prediction step by box plots. As the prediction horizon is extended, the quartile range becomes wider and the whisker becomes longer, reflecting that the uncertainty of long-term prediction is increasing. In contrast, the box plot for the PIAW-NBEATSx model is significantly more compact, with a concentrated outlier distribution, where the number of bits and the mean (shown by the solid and dashed lines, respectively) almost coincide with zero, indicating that the model has very little bias and high stability. Benchmark models, especially LSTM and TCN networks, are significantly wider in case line graphs and more diffuse in outlier distributions, which is indicative of their poor stability and susceptibility to extreme errors.

4) Discussion of the proposed method

I) Interpretability analysis

FIG. 12 compares PIAW-NBEATSx network generated global predictionsWith measured power curves, FIGS. 13 and 14 illustrate the transducer stack predictionsAnd power curve stack predictionTwo decomposition components. Its dynamic weight coefficientAndEvolution during training as shown in FIG. 15, the weights converge from 1 to 1AndThe transform stack was shown to contribute significantly higher to global predictions than the power curve stack. This illustrates that timing dependent modeling dominates the prediction, while the stack, guided by physical a priori information, complements the regularization. Figure 16 further shows a steady convergence process of training and validation loss with no sign of overfitting.

Ii) sensitivity analysis

To evaluate the contribution of the three core innovations, ablation experiments were performed in this study focusing on the three parts of the dynamic trainable weight, power curve stack and transducer stack.

TABLE 4 evaluation of predicted Performance of the Transformer stack in PIAW-NBEATSx replaced with other types of stack Nbeatsx variant models

Table 5 predicted performance evaluation of the power curve stack in PIAW-NBEATSx replaced with other types of stack Nbeatsx variant models

Both the training and validation processes employ Mean Square Error (MSE) as the loss function. PI-NBEATSx represents the simplified network resulting from the removal of the dynamic weighting mechanism from the proposed method. Table 3 shows that MAPE increased 22.7% (from 9.21% to 11.30%) and R ² decreased (from 0.9874 to 0.9846) after removal of the dynamic weighting mechanism. This suggests that this mechanism plays a key role in adaptively balancing multi-stack contributions.

Table 4 lists the predicted performance of several NBEATSx variants. Each variant decomposes the time sequence into different components through a special stack, wherein the trend stack models the long-term trend by using a polynomial basis function, the seasonal stack captures periodic oscillation by means of sine and cosine basis functions, the identity stack omits the basis function generation, and the full-connection network is directly output(As defined by equation (2)) as a block predictor. Furthermore, the architecture of WaveNet, TCN, LSTM, etc. can be used as an alternative timing dependent learner, which functions like a transform encoder, extracting features from the input sequence and using the network output as a basis vector. The results of Table 3, in combination with which the PIAW-NBEATSx architecture performs significantly better than all variants, verify the necessity of the dual stack design shown in FIG. 2.

When replacing the Transformer stack (Table 4), the model performance is greatly degraded by the limitations of the predefined basis functions, resulting in a rush of MAPE up to 144.6% -534.7% for the conventional stack (trend/seasonal), and the R2 value (≤0.978) is still lower than for the proposed model, although the general timing model (WaveNet/TCN/LSTM) partially alleviates this problem (MAPE between 80.6% -294%). This suggests that the transfomer is an irreplaceable component of the core adaptive timing modeling-its multi-headed attention mechanism effectively captures complex dependencies, avoiding the assumed trend or periodic generalized bias in traditional approaches.

Experiments replacing the power curve stack (table 5) reveal their domain specific value. WaveNet, the general timing model achieves MAPE (10.1522%) closest to PIAW-NBEATSx, but the NMSE (0.1603%) is still higher than the original architecture (0.1463%). Conventional stacks produce extreme errors due to physical property mismatch. These results demonstrate that the power curve stack solves the generalization limitation of a pure data driven model in a specific scenario by embedding the physical constraints derived from WPC. In particular, in dealing with nonlinear saturation effects, its role as a domain knowledge carrier is indispensable.

Innovative synergy of the two types of stacks is key to performance improvement, namely, a transducer stack is used as a general timing engine to extract complex features, and power curve stacks are injected with prior knowledge (such as wind power physical constraint) in the field to ensure physical consistency. Ablation studies have shown that removing either stack results in a significant performance degradation, replacing the transducer stack results in overall performance degradation, while replacing the power curve stack results in predicted collapse in a particular scenario. In contrast, the conventional NBEATSx variant relies on a generic basis function, which makes it difficult to characterize complex domain-specific structures in an industrial time series. This verifies the design advantages of PIAW-NBEATSx-both overcoming the limitations of the fixed stack in conventional NBEATSx and enhancing the engineering applicability of the framework by configurable domain-specific blocks. The method provides more accurate, reliable and interpretable prediction, and provides a new approach for research and application of complex wind power time sequence analysis.

Iii) Uncertainty analysis

We evaluate the probabilistic predictive capability of PIAW-NBEATSx networks using quantile loss (Pinball loss), defined as:

the loss function quantifies the prediction quantile and the actual value in Deviation under horizontal.

As shown in fig. 17-19, the model achieved an empirical coverage of 50%, 70% and 90%, with average interval widths of 59.90 kw, 93.82 kw and 199.03 kw, respectively. A relatively narrow interval (especially at the 50% level) indicates that the model has accurate uncertainty quantization capability.

Notably, FIG. 18 shows that PIAW-NBEATSx maintains high accuracy and low uncertainty in the low power range (+.750kilowatts). This local accuracy explains its MAPE advantage in table 4 over the baseline model and all NBEATSx variants-a large number of near zero values in the wind power dataset would significantly pull up the MAPE, while PIAW-NBEATSx exhibited its excellent robustness in power prediction with the lowest MAPE of 9.21%.

The foregoing description is, of course, merely illustrative of preferred embodiments of the present invention, and it should be understood that the present invention is not limited to the above-described embodiments, but is intended to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.

Claims

1. A method for multi-source data fusion and power prediction of a wind turbine generator system, comprising the following steps:

S1. Data preprocessing of the raw data of the wind turbine monitoring and data acquisition system, including feature selection, outlier removal, data set generation and partitioning;

S2. Construct a Transformer-based generation module and combine it with a multi-layer fully connected neural network to form a Transformer block:

S3. Construct a power curve base generation module and combine it with a multi-layer fully connected neural network to form a power curve block. The power curve base generation module uses a smooth logistic growth model to capture the nonlinear characteristics of the wind power curve. Its mathematical expression is:

in, Indicates wind speed The predicted power at is the maximum power output, is the steepness parameter that controls the rising rate of the curve, is the half-maximum power wind speed, and are the cut-in wind speed and rated wind speed respectively;

The processing method of the power curve base generation module includes:

S301: Parameter processing and dimension adjustment, setting and Represent the wind speed data in the lookback window and the forecast window respectively, and concatenate the two input components along the time dimension:

in, ; Based on rated cut-out wind speed Formula Adjust the parameters in to scale the horizontal and vertical coordinates of the smoothed logistic growth model to the range [0,1], where the horizontal coordinate represents wind speed and the vertical coordinate represents power:

in, , , , , , , ;parameter 、 and the input sequence The dimensions of are adjusted as follows:

S302: Construct the power curve basis function and solve the smoothed logistic growth model through element-level broadcasting in tensor calculations:

The third dimension and the fourth dimension After merging and removing the singleton dimension, we get:

in, ,and ; Calculation formula Nonlinear scaling term in And after expanding its dimension we get:

The final basis function matrix is obtained as follows:

in, represents element-wise multiplication, and ; Finally, through tensor slicing Split into backtracking basis function and predictive basis function:

S303: base expansion coefficient processing, and Apply the ReLU activation function to ensure non-negativity, and then normalize it through softmax:

in, Represents normalization along the last dimension of the tensor; the result is then split into the back-basis expansion coefficients and the predicted base expansion coefficient ;

S4. Connect several Transformer blocks through the double residual stacking principle to form a Transformer stack:

S5, connecting the plurality of power curve blocks through a double residual stacking principle to form a power curve stack;

S6: Connect the Transformer stack and the power curve stack in series, establish a dynamic trainable weighting mechanism, fuse the outputs of each stack, and obtain a multi-step prediction result of wind power;

S7. Determine the optimal configuration of network hyperparameters using the grid search method based on the prediction results.

2. A wind turbine multi-source data fusion and power prediction method according to claim 1, characterized in that: the Transformer base is combined with a multi-layer fully connected neural network to form a Transformer block by adopting a multi-level "stack-block" architecture to achieve time series decomposition and prediction by integrating exogenous variables; the Transformer block is the basic structural unit of the multi-layer weighted connection neural network, all Transformer blocks follow the same logic to process input data, and multiple blocks are connected to form a stack structure through the double residual stacking principle; the Transformer block includes a dynamic residual part and a static exogenous variable part, and the dynamic residual part is expressed as:

in, represents the stack index, Represents the block index, represents the batch size, is the length of the lookback window; the designed multi-layer fully connected neural network outputs two vectors: including the lookback value of the Transformer block and predicted values ,in, is the prediction window length.

3. A wind turbine multi-source data fusion and power prediction method according to claim 2, characterized in that the Transformer block comprises a multi-layer fully connected neural network, which is responsible for learning the basis expansion coefficient The specific process is as follows:

Network Integration Backtracking Base and prediction base ; In the base layer, for each sample , using the backtracking basis and prediction base Perform explicit summation operations and expand the backtracking basis coefficients and the predicted base expansion coefficient Mapping to backtracking value and predicted values , the process is expressed as:

.

4. The method for multi-source data fusion and power prediction of a wind turbine generator set according to claim 3, wherein the step of generating the Transformer basis comprises:

S201: Given an input tensor and Splicing along the time dimension ,in , is the lookback window, is the prediction window; following the standard Transformer architecture, the tensor is transposed to And perform linear projection to get :

in, is the embedding layer size, scaling factor Used to stabilize the gradient magnitude; the time position information is integrated into the embedded features through the sine encoding function, the position encoding matrix Offline precomputation, where Encodes the length of the maximum position for the precomputed sequence and dynamically truncated to match the sequence length. , the final encoding is expressed as:

S202: Apply the Transformer encoder layer for encoding. Each layer contains multi-head self-attention, residual connection and position-by-position feedforward network. The process is expressed as:

in, is the lower triangular causal mask, defined as:

S203: Output Transformer through the output projection layer From the dimension Mapping to target dimension :

By combining permutation and residual connection operations, the fusion is expressed as:

.

5. A method for multi-source data fusion and power prediction of a wind turbine according to claim 1, characterized in that the stack acquisition method is: multiple blocks constitute a stack, the first The input of the stack is the backtrace value , the prediction output of the stack is aggregated from the prediction results of each block within it:

.

6. A wind turbine multi-source data fusion and power prediction method according to claim 1, characterized in that the multi-step wind power prediction result is a weighted sum of all stacked prediction results:

in, is the trainable weight coefficient.

7. A wind turbine multi-source data fusion and power prediction method according to claim 1, characterized in that the number of the Transformer blocks and the power curve blocks are respectively obtained by grid search optimization using a data set.