Marine environment information data assimilation method and system based on a deep learning model
Technical Field
The invention relates to the technical field of marine environment information data processing, in particular to a marine environment information data assimilation method and system based on a deep learning model.
Background
Marine environment information can be acquired directly through observation equipment. However, because of objective factors such as the equipment and the environment, the observed data often contain errors or cannot be measured at all. Therefore, before the observation data are used, error analysis must be performed on the acquired raw data to obtain the most reliable observation data. Data assimilation is the bridge between a system model and observed data. It is a sequential inference method linking a dynamic system model with static observed parameters. Within the framework of an existing ocean dynamical system, it combines the forecast field of a numerical model with high-precision observations from different sources and continuously reduces the random errors of the numerical model through optimization. The result is an analysis field that is more accurate or more complete than either the original numerical model result or the observations alone and that better describes the real state of the ocean. Data assimilation is widely applied in ocean data analysis and numerical forecasting.
At present, assimilation methods for marine environment information mainly include the variational method, the ensemble Kalman filter, the ensemble optimal smoothing method and the like. Observation data of marine environment elements such as temperature, salinity and ocean current velocity are generally acquired periodically by sensors deployed at monitoring points. The acquired observations are distributed over time and three-dimensional space, and compared with the variability of the marine environment the sample is sparse. When an assimilation method is applied to sparse observation data, the analysis field of the dynamic system model can deviate from the real ocean state. The technical problem to be solved is how to achieve high-resolution prediction of the spatial distribution fields of ocean environment elements such as temperature, salinity and current velocity in an observation area from sparse observation data, so as to meet a high-precision dynamic system model's requirement for high-resolution ocean environment information. To solve it, a marine environment information data assimilation method and system based on a deep learning model are provided.
Disclosure of Invention
The main object of the invention is to provide a marine environment information data assimilation method and system based on a deep learning model that effectively solve the problems described in the Background.
To achieve the above object, the invention adopts the following technical solution:
The marine environment information data assimilation method based on the deep learning model comprises the following steps:
S1, acquiring time series observation data of any marine environmental element over its spatial distribution field in an observation area, wherein the marine environmental elements include temperature, salinity and ocean current velocity;
S2, classifying the observed data into N sets (ensemble members) according to sampling time, defining $x^{a}_{i,t}$ as the state analysis value of the ith set at time t and $x^{f}_{i,t+1}$ as the state predicted value of the ith set at time t+1, expressing the acquired time series observation data with the ensemble Kalman filter expression, and iteratively updating the state analysis value, wherein the ensemble Kalman filter expression is:
$$x^{f}_{i,t+1} = M_{t,t+1}\,x^{a}_{i,t} + w_{i,t}$$
wherein $M_{t,t+1}$ is the linear prediction operator from time t to time t+1, $w_{i,t}$ is the model error at time t with $w_{i,t}\sim N(0,Q_t)$, and $Q_t$ is the model error covariance matrix of the ensemble; $x^{f}_{i,t}$ is the state predicted value of the ith set at time t; $x^{a}_{i,t}$ is the state analysis value of the ith set at time t; the superscript a denotes the current (analysis) state and the superscript f denotes the predicted state;
The iterative update of the state analysis value is expressed as follows:
in formula (1), $x^{a}_{i,t+1}$ is the state analysis value of the ith set at time t+1, $K_{t+1}$ is the gain matrix, $v_{i,t}$ is the observation error at time t with $v_{i,t}\sim N(0,R_t)$, and $H_{t+1}$ is the observation operator at time t+1;
in formula (2), $\bar{x}^{a}_{t+1}$ is the mean of the state analysis values at time t+1, and N is the number of sets;
in formula (3), $P$ is the analysis field error covariance matrix, H is the observation operator, and $R_t$ is the measurement noise covariance matrix at time t;
in formula (4), $\bar{x}^{f}_{t}$ is the mean of the state predicted values of the sets at time t;
in formula (7), $\bar{x}^{f}_{t+1}$ is the mean of the state predicted values at time t+1;
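The forecast and analysis steps of S2 can be sketched in Python with NumPy. This is a minimal sketch under assumptions the text does not state: a linear observation operator, an observation vector named `y_obs` (not defined above), and the standard stochastic (perturbed-observation) ensemble Kalman filter update, in which the gain uses the sample covariance of the forecast ensemble. The function names are hypothetical.

```python
import numpy as np

def ensemble_forecast(x_analysis, M, Q, rng):
    """Forecast step: x^f_{i,t+1} = M_{t,t+1} x^a_{i,t} + w_{i,t}, with w_{i,t} ~ N(0, Q_t).

    x_analysis: (N, n) array whose row i is the analysis state x^a_{i,t}
    M:          (n, n) linear prediction operator
    Q:          (n, n) model-error covariance
    """
    N, n = x_analysis.shape
    w = rng.multivariate_normal(np.zeros(n), Q, size=N)  # one model-error draw per member
    return x_analysis @ M.T + w                           # row-wise M @ x_i

def enkf_analysis(x_forecast, y_obs, H, R, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    x_forecast: (N, n) forecast ensemble, row i is x^f_{i,t+1}
    y_obs:      (m,) observation vector (assumed; not named in the text)
    H:          (m, n) linear observation operator
    R:          (m, m) observation-noise covariance
    Returns the (N, n) analysis ensemble and its mean.
    """
    N = x_forecast.shape[0]
    x_mean = x_forecast.mean(axis=0)                      # forecast ensemble mean
    anomalies = x_forecast - x_mean
    P_f = anomalies.T @ anomalies / (N - 1)               # sample forecast error covariance
    S = H @ P_f @ H.T + R                                 # innovation covariance
    K = np.linalg.solve(S, H @ P_f).T                     # K = P_f H^T S^{-1} (S, P_f symmetric)
    v = rng.multivariate_normal(np.zeros(len(y_obs)), R, size=N)  # perturbed observations
    innovations = (y_obs + v) - x_forecast @ H.T          # (N, m)
    x_analysis = x_forecast + innovations @ K.T           # apply K to each member's innovation
    return x_analysis, x_analysis.mean(axis=0)
```

With a very small observation-noise covariance the gain approaches the identity and the analysis mean is pulled almost onto the observation; with a large one the analysis stays close to the forecast.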
S3, taking the iteratively updated data error $w_{i,t}$ as the input of a generator G and the acquired observation data as real data to construct a generative adversarial network (GAN), generating virtual observation data G(z) with the generator G, and training the discriminator D and the generator G of the GAN with the observation data and the generated virtual observation data G(z) until the discriminator D outputs D(G(z))=1;
The training process of the GAN comprises the following steps:
S31, the generation network of the generator G produces a virtual observation data sample from the input iteratively updated data error $w_{i,t}$; the sample is fed to the discriminator D and judged by its discrimination network;
S32, the discrimination network classifies the input virtual observation data samples and the real data and calculates the classification error;
S33, the parameters of the discrimination network are updated according to the classification error to improve the classification accuracy;
S34, the generation network updates its own parameters according to the feedback from the discrimination network so as to generate more realistic virtual observation data samples;
S35, steps S31-S34 are repeated until a preset number of training rounds or a convergence condition is reached;
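The alternating update order of S31-S35 can be illustrated with a toy one-dimensional GAN in plain NumPy. Everything model-specific here is a hypothetical stand-in: the generator is the affine map G(z)=a*z+b, the discriminator is a single logistic unit, the generator uses the common non-saturating loss, and the generator input array stands in for the iteratively updated error samples $w_{i,t}$. A discriminator this small can only separate distributions by their means, and alternating gradient steps on such a pair may oscillate rather than settle, so the sketch shows the update order, not a tuned training recipe.

```python
import numpy as np

def sigmoid(s):
    # tanh form of the logistic function; avoids overflow in exp for large |s|
    return 0.5 * (1.0 + np.tanh(0.5 * s))

def train_toy_gan(real, gen_input, epochs=2000, lr=0.05, batch=64, seed=0):
    """Alternate discriminator and generator updates (S31-S35) on 1-D data.

    Hypothetical toy models:
      generator      G(z) = a*z + b
      discriminator  D(x) = sigmoid(w*x + c)
    real:      1-D array of observed samples (positive examples)
    gen_input: 1-D array fed to G, standing in for the error samples w_{i,t}
    """
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0            # generator parameters
    w, c = 0.0, 0.0            # discriminator parameters
    for _ in range(epochs):    # S35: fixed number of rounds
        x_r = rng.choice(real, batch)
        z = rng.choice(gen_input, batch)
        x_f = a * z + b        # S31: generate virtual samples
        # S32-S33: discriminator step on L_D = -mean log D(x_r) - mean log(1 - D(x_f)), G frozen
        d_r, d_f = sigmoid(w * x_r + c), sigmoid(w * x_f + c)
        grad_w = np.mean((d_r - 1.0) * x_r) + np.mean(d_f * x_f)
        grad_c = np.mean(d_r - 1.0) + np.mean(d_f)
        w, c = w - lr * grad_w, c - lr * grad_c
        # S34: generator step on the non-saturating loss L_G = -mean log D(G(z)), D frozen
        d_f = sigmoid(w * x_f + c)
        dx = (d_f - 1.0) * w   # derivative of -log D(x) with respect to x, per sample
        a, b = a - lr * np.mean(dx * z), b - lr * np.mean(dx)
    return (a, b), (w, c)
```

A typical call passes observed samples as `real` and an equally long array of error samples as `gen_input`; in practice both networks would be deeper and the stopping rule of S35 would monitor a convergence criterion rather than a fixed count.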
the training process of the generator G comprises the following steps:
S311, acquiring data samples of the iteratively updated data error;
S312, transforming the data samples of the data error;
S313, judging the classification result of the discriminator D as true or false, wherein the output of the discriminator D is treated as a binary variable, D(x)=1 or D(x)=0: an output D(x)=1 means the discriminator D classifies the input as true, and D(x)=0 means it classifies the input as false;
S314, calculating the loss of the discrimination network according to the classification result of the discriminator D;
S315, backpropagating through the discriminator D and the generator G to obtain the gradient of the discrimination network;
S316, updating the network parameters of the generator G using the gradient of the discrimination network;
the training process of the discriminator D comprises the following steps:
S321, classifying the input real data and the virtual observation data samples generated by the generator G;
S322, the loss function of the discrimination network of the discriminator D penalizes the discriminator's misclassifications;
S323, the discriminator D updates the weights of the discrimination network through backpropagation;
The discriminator D judges the authenticity of its input, i.e. it distinguishes real data from the virtual observation data generated by the generator G. Its input is x and its output is D(x), the probability that x is real data: an output D(x)=1 means that x is real data with 100% probability, and D(x)=0 means that x cannot be real data;
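S313 treats the discriminator's output as a binary label, while D(x) itself is a probability. One simple way to reconcile the two, assumed here rather than taken from the text, is to threshold D(x) at 0.5; the classification error of S32 is then the fraction of misclassified samples. The function names and the threshold are hypothetical.

```python
import numpy as np

def classify(d_out, threshold=0.5):
    """Map discriminator probabilities D(x) to binary labels (1 = real, 0 = virtual)."""
    return (np.asarray(d_out, dtype=float) >= threshold).astype(int)

def classification_error(d_real, d_fake, threshold=0.5):
    """Fraction of samples D gets wrong: real samples labelled 0 plus virtual samples labelled 1."""
    wrong = np.sum(classify(d_real, threshold) == 0) + np.sum(classify(d_fake, threshold) == 1)
    return wrong / (len(d_real) + len(d_fake))
```

For example, with real-sample outputs [0.9, 0.4] and virtual-sample outputs [0.1, 0.7], one real and one virtual sample are misclassified, giving an error of 0.5.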
S4, synthesizing a number of virtual observation data samples with the trained generator G model, wherein the virtual observation data sample size and the real data sample size satisfy the following relation:
wherein $Q_i$ is the data sample size of the ith set of observation data, $Q_v$ is the virtual observation data sample size, $Q_{max}$ is the maximum data sample size among the N sets of observation data, and ε is a constant coefficient not less than 1;
S5, repeating steps S2-S4 until the assimilation performance evaluation index reaches a set expected value, wherein the assimilation performance evaluation index comprises the root mean square error (RMSE) of the observed data and the ensemble mean $RMSE_t$ of the predicted values of the observed data at time t; the RMSE of the observed data is calculated as:
the ensemble mean $RMSE_t$ of the predicted values of the observed data is calculated as:
wherein L is the assimilation time length, X(t) is the true state at time t, and $x^{f}_{i,t-1}$ is the state predicted value of the ith set at time t-1.
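The two RMSE formulas are not reproduced in this text. Assuming they take the usual root-mean-square form, the indices could be computed as below: the first averages squared errors over the L time steps of the window, and the second compares the mean of the N member forecasts with the true state X(t). Both the forms and the function names are assumptions.

```python
import numpy as np

def rmse_over_window(estimate, truth):
    """RMSE over an assimilation window of L time steps (assumed form).

    estimate, truth: arrays of shape (L,) or (L, n); truth plays the role of X(t).
    """
    err = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def ensemble_mean_rmse_t(member_forecasts, truth_t):
    """RMSE of the ensemble-mean forecast against X(t) at a single time t (assumed form).

    member_forecasts: (N, n) array, row i is member i's prediction of the state at time t
    truth_t:          (n,) true state X(t)
    """
    mean_forecast = np.asarray(member_forecasts, dtype=float).mean(axis=0)
    err = mean_forecast - np.asarray(truth_t, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))
```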
The marine environment information data assimilation system based on the deep learning model comprises a data acquisition module, a data classification module, a data processing module, a generative adversarial network construction module, a data iteration module and a performance judgment module;
the data acquisition module is used to acquire time series observation data of any marine environmental element over its spatial distribution field in an observation area, wherein the marine environmental elements include temperature, salinity and ocean current velocity;
the data classification module is used to classify the observed data into N sets according to sampling time and to define $x^{a}_{i,t}$ as the state analysis value of the ith set at time t and $x^{f}_{i,t+1}$ as the state predicted value of the ith set at time t+1;
the data processing module is used to express the acquired time series observation data with the ensemble Kalman filter algorithm and to iteratively update the state analysis value; the generative adversarial network construction module is used to take the iteratively updated data error $w_{i,t}$ as the input of the generator G and the acquired observation data as real data to construct the generative adversarial network, to generate virtual observation data G(z) with the generator G, to train the discriminator D and the generator G with the observation data and G(z), and, once the discriminator D outputs D(G(z))=1, to synthesize a number of virtual observation data samples with the trained generator G model;
the data iteration module is used to express the obtained virtual observation data samples with the ensemble Kalman filter algorithm and to iteratively update the state analysis value;
the performance judgment module is used to evaluate the data assimilation process and to decide, against the set expected value of the assimilation performance evaluation index, whether the iterative update continues, wherein the index comprises the root mean square error (RMSE) of the observed data and the ensemble mean $RMSE_t$ of the predicted values of the observed data at time t;
the system comprises a memory, a processor, and a computer program stored on the memory and executable on the processor.
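The cooperation of the six modules can be sketched as a small Python class in which each module is an injected callable. The class name, the field names and the `max_rounds` safety limit are hypothetical; the loop mirrors the described flow of acquiring and classifying the data, assimilating it, and then alternating virtual-sample synthesis and re-assimilation until the performance judgment module is satisfied.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class AssimilationSystem:
    """Hypothetical wiring of the six modules; each module is a plain callable."""
    acquire: Callable[[], Any]              # data acquisition module
    classify: Callable[[Any], Any]          # data classification module (N sets)
    assimilate: Callable[[Any], Any]        # data processing / data iteration module (EnKF)
    augment: Callable[[Any, Any], Any]      # GAN construction module (virtual samples)
    good_enough: Callable[[Any], bool]      # performance judgment module
    max_rounds: int = 10                    # safety limit on outer iterations

    def run(self) -> Any:
        sets = self.classify(self.acquire())
        state = self.assimilate(sets)                 # first pass on the real observations
        for _ in range(self.max_rounds):
            if self.good_enough(state):               # evaluation index reached its target
                break
            virtual = self.augment(sets, state)       # synthesize virtual observation samples
            state = self.assimilate(virtual)          # re-assimilate with the virtual samples
        return state
```

Keeping each module as a plain callable lets the EnKF and GAN components be replaced or tested in isolation.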
The invention has the following advantages:
Compared with the prior art, the invention acquires time series observation data of any marine environmental element over its spatial distribution field in an observation area; classifies the observation data into N sets according to sampling time, defining $x^{a}_{i,t}$ as the state analysis value of the ith set at time t and $x^{f}_{i,t+1}$ as the state predicted value of the ith set at time t+1; expresses the acquired time series observation data with an ensemble Kalman filter algorithm and iteratively updates the state analysis value; takes the iteratively updated data error $w_{i,t}$ as the input of a generator G and the acquired observation data as real data to construct a generative adversarial network; generates virtual observation data G(z) with the generator G and trains the discriminator D and the generator G with the observation data and G(z); synthesizes a number of virtual observation data samples with the trained generator G model; and controls the iterations of the assimilation flow by checking whether the assimilation performance evaluation index reaches the set expected value. The invention thereby achieves high-resolution prediction of the spatial distribution fields of ocean environment elements in an observation area from sparse observation data, meeting a high-precision dynamic system model's requirement for high-resolution ocean environment information.
Drawings
FIG. 1 is a flow chart of a marine environmental information data assimilation method based on a deep learning model of the invention;
FIG. 2 is a block diagram of a deep learning model-based marine environmental information data assimilation system of the present invention;
FIG. 3 is a structural block diagram of the generative adversarial network constructed by the present invention.
Detailed Description
The present invention is further described below with reference to specific embodiments. The drawings are for illustration only; they are schematic rather than physical drawings and are not to be construed as limiting the invention. To better illustrate the embodiments, some components in the drawings are omitted, enlarged or reduced, and they do not represent the dimensions of an actual product.
The specific implementation flow of the technical scheme of the invention comprises the following steps:
Step 1, acquiring time series observation data of any marine environmental element over its spatial distribution field in an observation area, wherein the marine environmental elements include temperature, salinity and ocean current velocity.
Step 2, classifying the observed data into N sets according to sampling time, and defining $x^{a}_{i,t}$ as the state analysis value of the ith set at time t and $x^{f}_{i,t+1}$ as the state predicted value of the ith set at time t+1.
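The text does not specify how the observations are divided into N sets by sampling time. Assuming contiguous blocks of roughly equal size in chronological order, Step 2 could be sketched as follows; `split_into_sets` is a hypothetical name.

```python
import numpy as np

def split_into_sets(times, values, n_sets):
    """Group observations into n_sets by sampling time.

    Assumption: sets are contiguous blocks of near-equal size in chronological order.
    Returns a list of (times, values) array pairs, one per set.
    """
    order = np.argsort(times, kind="stable")       # chronological order, ties kept stable
    t_sorted = np.asarray(times)[order]
    v_sorted = np.asarray(values)[order]
    return list(zip(np.array_split(t_sorted, n_sets), np.array_split(v_sorted, n_sets)))
```

For five observations and two sets, `np.array_split` gives the earlier three samples to the first set and the later two to the second.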
Step 3, expressing the acquired time series observation data with the ensemble Kalman filter expression and iteratively updating the state analysis value, wherein the ensemble Kalman filter expression is:
$$x^{f}_{i,t+1} = M_{t,t+1}\,x^{a}_{i,t} + w_{i,t}$$
wherein $M_{t,t+1}$ is the linear prediction operator from time t to time t+1, $w_{i,t}$ is the model error at time t with $w_{i,t}\sim N(0,Q_t)$, and $Q_t$ is the model error covariance matrix of the ensemble;
The iterative update of the state analysis value is expressed as:
in formula (1), $x^{a}_{i,t+1}$ is the state analysis value of the ith set at time t+1, $K_{t+1}$ is the gain matrix, and $v_{i,t}$ is the observation error at time t with $v_{i,t}\sim N(0,R_t)$;
in formula (2), $\bar{x}^{a}_{t+1}$ is the mean of the state analysis values at time t+1;
in formula (3), $P$ is the analysis field error covariance matrix, H is the observation operator, and $R_t$ is the measurement noise covariance matrix at time t;
in formula (4), $\bar{x}^{f}_{t}$ is the mean of the state predicted values of the sets at time t.
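For reference, the standard textbook form of the stochastic ensemble Kalman filter, written in the symbols defined above, is given below. Which line corresponds to which numbered formula in the original is an assumption, as is the observation vector $y_{t+1}$, which the text does not name. In this standard form the gain uses the sample covariance of the forecast ensemble, written $P^{f}_{t+1}$ here, which is presumably the matrix the text calls the analysis field error covariance.

```latex
\begin{aligned}
x^{f}_{i,t+1} &= M_{t,t+1}\,x^{a}_{i,t} + w_{i,t}, \qquad w_{i,t}\sim N(0,Q_t)\\
\bar{x}^{f}_{t+1} &= \frac{1}{N}\sum_{i=1}^{N} x^{f}_{i,t+1}\\
P^{f}_{t+1} &= \frac{1}{N-1}\sum_{i=1}^{N}\left(x^{f}_{i,t+1}-\bar{x}^{f}_{t+1}\right)\left(x^{f}_{i,t+1}-\bar{x}^{f}_{t+1}\right)^{\mathsf{T}}\\
K_{t+1} &= P^{f}_{t+1}H_{t+1}^{\mathsf{T}}\left(H_{t+1}P^{f}_{t+1}H_{t+1}^{\mathsf{T}}+R_t\right)^{-1}\\
x^{a}_{i,t+1} &= x^{f}_{i,t+1} + K_{t+1}\left(y_{t+1}+v_{i,t}-H_{t+1}\,x^{f}_{i,t+1}\right), \qquad v_{i,t}\sim N(0,R_t)\\
\bar{x}^{a}_{t+1} &= \frac{1}{N}\sum_{i=1}^{N} x^{a}_{i,t+1}
\end{aligned}
```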
Step 4, taking the iteratively updated data error $w_{i,t}$ as the input of a generator G and the acquired observation data as real data to construct a generative adversarial network, whose structure is shown in FIG. 3; generating virtual observation data G(z) with the generator G; and training the discriminator D and the generator G of the generative adversarial network with the observation data and the generated virtual observation data G(z) until the discriminator D outputs D(G(z))=1;
The training process of the generative adversarial network comprises the following steps:
S41, the generation network of the generator G produces a virtual observation data sample from the input iteratively updated data error $w_{i,t}$; the sample is fed to the discriminator D and judged by its discrimination network;
S42, the discrimination network classifies the input virtual observation data samples and the real data and calculates the classification error;
S43, the parameters of the discrimination network are updated according to the classification error to improve the classification accuracy;
S44, the generation network updates its own parameters according to the feedback from the discrimination network so as to generate more realistic virtual observation data samples;
S45, steps S41-S44 are repeated until a preset number of training rounds or a convergence condition is reached;
the training process of the generator G comprises the following steps:
S411, acquiring data samples of the iteratively updated data error;
S412, transforming the data samples of the data error;
S413, judging the classification result of the discriminator D as true or false, wherein the output of the discriminator D is treated as a binary variable, D(x)=1 or D(x)=0: an output D(x)=1 means the discriminator D classifies the input as true, and D(x)=0 means it classifies the input as false;
S414, calculating the loss of the discrimination network according to the classification result of the discriminator D, the loss being given by the loss function:
$$J^{(D)}\left(\theta^{(D)},\theta^{(G)}\right) = -\frac{1}{2}\,\mathbb{E}_{x\sim P}\left[\log D(x)\right] - \frac{1}{2}\,\mathbb{E}_{z}\left[\log\left(1-D(G(z))\right)\right]$$
wherein $J^{(D)}(\theta^{(D)},\theta^{(G)})$ is the loss of the discrimination network, E denotes the expectation, and x~P denotes that x follows the distribution P;
S415, backpropagating through the discriminator D and the generator G to obtain the gradient of the discrimination network;
S416, updating the network parameters of the generator G using the gradient of the discrimination network;
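The discrimination-network loss of step S414 can be estimated from a batch of discriminator outputs by replacing each expectation with a sample mean. This sketch assumes the standard GAN discriminator cost, $J^{(D)} = -\frac{1}{2}E_{x\sim P}[\log D(x)] - \frac{1}{2}E_z[\log(1-D(G(z)))]$; the clipping constant `eps` is an added safeguard against log(0), and the function name is hypothetical.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Batch estimate of J^(D) = -1/2 E[log D(x)] - 1/2 E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real observation samples
    d_fake: discriminator outputs D(G(z)) on virtual samples
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)  # keep log finite
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(-0.5 * np.mean(np.log(d_real)) - 0.5 * np.mean(np.log(1.0 - d_fake)))
```

A discriminator that outputs 0.5 for every input has loss log 2 ≈ 0.693, the value at which it cannot tell real from virtual data; a perfect discriminator drives the loss towards 0.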
the training process of the discriminator D comprises the following steps:
S421, classifying the input real data and the virtual observation data samples generated by the generator G;
S422, the loss function of the discrimination network of the discriminator D penalizes the discriminator's misclassifications;
S423, the discriminator D updates the weights of the discrimination network through backpropagation;
The discriminator D judges the authenticity of its input, i.e. it distinguishes real data from the virtual observation data generated by the generator G. Its input is x and its output is D(x), the probability that x is real data: an output D(x)=1 means that x is real data with 100% probability, and D(x)=0 means that x cannot be real data;
It should be noted that in the training phase the generative adversarial network uses two data sources. The first is real data, i.e. the sampled observation data, which are the examples the generator G tries to imitate and serve as positive examples in training. The second is the data produced by the generator G, which serve as negative examples. While the discriminator D is being trained, the generator G is held fixed: when G produces data for training D, the weights of its network do not change. At the start of training, G produces data that are obviously fake and that D easily identifies as fake. As training proceeds, G's output comes closer to fooling D, until D begins to classify the fake data as real.
Step 5, synthesizing a number of virtual observation data samples with the trained generator G model, wherein the virtual observation data sample size and the real data sample size satisfy the following relation:
wherein $Q_i$ is the data sample size of the ith set of observation data, $Q_v$ is the virtual observation data sample size, $Q_{max}$ is the maximum data sample size among the N sets of observation data, and ε is a constant coefficient not less than 1;
Step 6, repeating Steps 2 to 5 until the assimilation performance evaluation index reaches a set expected value, wherein the assimilation performance evaluation index comprises the root mean square error (RMSE) of the observed data and the ensemble mean $RMSE_t$ of the predicted values of the observed data at time t; the RMSE of the observed data is calculated as:
the ensemble mean $RMSE_t$ of the predicted values of the observed data is calculated as:
wherein L is the assimilation time length and X(t) is the true state at time t;
When the calculated assimilation performance evaluation index meets the set expected value, the time series observation data of the marine environment elements over the time length L have been fully assimilated. If L is less than the total sampling duration of the observation data, the remaining observation data still need to be assimilated; if L is greater than or equal to the total sampling duration, all of the sampled observation data have been assimilated.
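The coverage check described above, comparing the window length L with the total sampling duration, amounts to stepping a window across the record until nothing remains. A minimal sketch, assuming contiguous, non-overlapping windows (the text does not say) and using the hypothetical name `assimilation_windows`:

```python
def assimilation_windows(total_duration, window_length):
    """Split the sampled record [0, total_duration) into successive windows of length L.

    The last window is truncated at total_duration; if L >= total_duration,
    a single window covers the whole record and nothing remains to assimilate.
    """
    if window_length <= 0:
        raise ValueError("window_length must be positive")
    windows, start = [], 0.0
    while start < total_duration:
        end = min(start + window_length, total_duration)
        windows.append((start, end))
        start = end
    return windows
```

For a 10-unit record and L = 4, this yields the windows (0, 4), (4, 8) and (8, 10); each window would be assimilated until the evaluation index reaches its expected value before moving to the next.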
The foregoing has shown and described the basic principles, main features and advantages of the present invention. Those skilled in the art will understand that the present invention is not limited to the embodiments described above; the embodiments and the description merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.