Combustion optimization method for thermal power generating unit
Technical Field
The invention relates to the field of automatic control of thermal power generating units, and in particular to a combustion optimization method for a thermal power generating unit.
Background
With the continued growth of the new energy industry, wind and solar power are gradually reshaping the power grid, and because of the intermittency of these new energy sources, flexibility retrofitting of existing units has become one of the main development directions. The principal technical difficulty of flexibility retrofitting is enabling large-capacity coal-fired units to perform deep peak shaving down to ultra-low load, which places higher requirements on the combustion stability of such units. Boiler combustion optimization reduces NOx generation by means of staged air distribution, coordinated coal blending and the like, while also taking into account economic indexes such as CO emission concentration and boiler efficiency and safety indexes such as preventing overtemperature of high-temperature steam tube walls; it is therefore a complex problem of multi-field coupling, multi-variable constraint and multi-objective optimization. Under deep peak-shaving operation, the reduced load can cause unstable boiler combustion, ineffective operation of the denitration system, and overtemperature of the tube walls of the boiler steam-water system. How to effectively guarantee boiler efficiency and NOx emission under deep peak shaving is therefore an important research problem in combustion optimization.
Boiler combustion conditions are complicated and subject to many constraints, and current swarm intelligence optimization algorithms often cannot find the optimal solution. Deep reinforcement learning can directly use the state description of the environment and the evaluation of actions to dynamically adjust the control strategy and thereby meet the final control target, but learning purely from environmental feedback can make the learning efficiency low.
To address these problems, the invention improves the asynchronous advantage actor-critic (A3C) algorithm: a Dyna structure is added to increase training efficiency, the proportion of learning performed in the virtual environment is adjusted dynamically, and the state design and reward design of the algorithm are carried out for the boiler combustion system, thereby reducing the time required for training while taking the various constraints into account.
Disclosure of Invention
The invention aims to overcome the defects of the background art and provides a combustion optimization method for a thermal power generating unit, which is realized by the following scheme:
The invention provides a combustion optimization method for a thermal power generating unit, comprising the following steps:
Step 1, carrying out the state design and reward design of the algorithm according to the condition of the boiler equipment;
Step 2, designing the network structure and basic parameters of the asynchronous advantage actor-critic algorithm;
Step 3, establishing a multi-variable transfer function model of the boiler combustion system;
Step 4, establishing the asynchronous advantage actor-critic algorithm based on the virtual environment model according to the transfer function model in Step 3, and training it.
The algorithm state design in Step 1 comprises the set values, actual values, adjustment quantities and deviations required by the algorithm. For the boiler combustion system, the set values comprise the carbon monoxide set value CO_sp, the exhaust gas temperature set value T_e,sp and the NOx concentration set value NO_x,sp; the adjustment quantities comprise the total air quantity D_AIR, the coal quantity of the n-th burner layer D_B,n, the i-th layer primary air damper opening V_f,i, the j-th layer secondary air damper opening V_s,j, the k-th layer fuel air damper opening V_c,k and the burner swing angle A_f; the actual values are the carbon monoxide quantity CO, the exhaust gas temperature T_e and the NOx concentration NO_x; e_CO, e_Te and e_NOx are respectively the carbon monoxide deviation, the exhaust gas temperature deviation and the NOx concentration deviation; and ΔT_P is the safety margin between the maximum actual wall temperature of the water wall and the overtemperature limit. The state S_t of the coordinated control system at time t is:
S_t = {CO_sp, T_e,sp, NO_x,sp, D_AIR, D_B,n, V_f,i, V_s,j, V_c,k, A_f, CO, T_e, NO_x, e_CO, e_Te, e_NOx, ΔT_P}
The algorithm reward design in Step 1 is divided into a continuous reward term and a control-quantity change-rate limiting term. The continuous reward term is:
where e_t is the weighted deviation at time t and K_1 is the weight of the continuous reward term;
the control-quantity change-rate limiting term is:
where K_2 is the weight of the control-quantity change-rate limiting term;
The asynchronous advantage actor-critic algorithm based on the virtual environment model in Step 4 adds a Dyna structure to each thread using the transfer function model of Step 3, so that the algorithm trains in the transfer function model while learning in the real environment, improving learning efficiency. A dynamic weight μ is added for learning in the virtual environment, and the parameters are then updated as:
θ'_a ← θ'_a + μ·ε_a·dθ'_a,    θ'_v ← θ'_v + μ·ε_v·dθ'_v
where R_m is the cumulative reward obtained in the virtual environment model and is set to zero when it is negative, r_m is the maximum reward the virtual environment model can give per step, j is the number of repeated executions in the virtual environment model, and μ decreases as the number of global parameter updates grows and as the cumulative reward in each thread grows; θ'_a and θ'_v are the global shared parameters used in the virtual environment, and T is the global shared counter.
Advantageous effects
According to the method disclosed by the invention, the boiler combustion system is optimized with the asynchronous advantage actor-critic algorithm, which can handle the multi-objective, multi-coupling problem; at the same time, the added Dyna structure improves the learning efficiency of the algorithm so that it converges more quickly, training is simple, and the method can be realized in engineering practice.
Drawings
Fig. 1 is a schematic diagram of a combustion system of a thermal power generating unit.
FIG. 2 is a diagram of the boiler combustion optimization training of the present invention.
Detailed description of the preferred embodiments
The invention will be further described with reference to the drawings and the detailed description.
Fig. 1 is a schematic diagram of the boiler combustion system of a thermal power generating unit. It is a six-input, three-output system: the inputs are the total air quantity, the coal quantity of each burner layer (or the total fuel quantity), the primary air damper opening of each layer, the secondary air damper opening of each layer, the fuel air damper opening of each layer and the burner swing angle; the outputs are the carbon monoxide concentration, the NOx concentration and the exhaust gas temperature.
FIG. 2 shows the training structure of the boiler combustion optimization based on the deep reinforcement learning algorithm, including the reward design, state design, network parameter settings and virtual environment settings of the algorithm. First, the state and reward of the algorithm are designed; the state and reward value are then fed into the algorithm, the state being passed simultaneously to the virtual environment in the Dyna structure and to the actual algorithm. After the algorithm outputs an action, the action is integrated to obtain the adjustment quantity at the actual current moment, and this adjustment quantity acts on the actual boiler combustion system to obtain the state and reward at the next moment. Learning is repeated in this way until the algorithm converges, as sketched in the example following this paragraph.
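By way of illustration only, the following Python sketch outlines the training flow of FIG. 2; the real plant, the Dyna virtual environment and the agent are passed in as abstract objects, and all names (real_env, virtual_env, agent, learn_virtual and so on) are assumptions for exposition rather than part of the disclosed method.

```python
# Sketch of the Dyna-augmented training loop: one real-environment step per
# control interval, followed by several repeated learning steps in the virtual
# transfer-function model (weighted by the dynamic factor mu inside the agent).
def train(real_env, virtual_env, agent, n_virtual_steps, max_episodes):
    for _ in range(max_episodes):
        state, reward = real_env.reset()
        done = False
        while not done:
            action = agent.act(state, reward)                   # actor output -> adjustment quantity
            next_state, reward, done = real_env.step(action)    # act on the real boiler combustion system
            agent.learn(state, action, reward, next_state)      # learning from the real environment
            for _ in range(n_virtual_steps):                    # repeated learning in the Dyna virtual model
                v_next, v_reward = virtual_env.step(action)
                agent.learn_virtual(state, action, v_reward, v_next)
            state = next_state
```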
The method mainly comprises the following steps:
Step 1, carrying out the state design and reward design of the algorithm according to the condition of the boiler equipment;
Step 2, designing the network structure and basic parameters of the asynchronous advantage actor-critic algorithm;
Step 3, establishing a multi-variable transfer function model of the boiler combustion system;
Step 4, establishing the asynchronous advantage actor-critic algorithm based on the virtual environment model according to the transfer function model in Step 3, and training it.
The algorithm state design comprises the set values, actual values, adjustment quantities and deviations required by the algorithm. For the boiler combustion system, the set values comprise the carbon monoxide set value CO_sp, the exhaust gas temperature set value T_e,sp and the NOx concentration set value NO_x,sp; the adjustment quantities comprise the total air quantity D_AIR, the coal quantity of the n-th burner layer D_B,n, the i-th layer primary air damper opening V_f,i, the j-th layer secondary air damper opening V_s,j, the k-th layer fuel air damper opening V_c,k and the burner swing angle A_f; the actual values are the carbon monoxide quantity CO, the exhaust gas temperature T_e and the NOx concentration NO_x; e_CO, e_Te and e_NOx are respectively the carbon monoxide deviation, the exhaust gas temperature deviation and the NOx concentration deviation; and ΔT_P is the safety margin between the maximum actual wall temperature of the water wall and the overtemperature limit. The state S_t of the coordinated control system at time t is:
S_t = {CO_sp, T_e,sp, NO_x,sp, D_AIR, D_B,n, V_f,i, V_s,j, V_c,k, A_f, CO, T_e, NO_x, e_CO, e_Te, e_NOx, ΔT_P}
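By way of illustration only, the following Python sketch assembles the state vector S_t defined above for the simplified case of a single burner layer (so that D_B,n, V_f,i, V_s,j and V_c,k are treated as scalars); all function and variable names are assumptions for exposition.

```python
import numpy as np

def build_state(co_sp, te_sp, nox_sp,              # set values
                d_air, d_b, v_f, v_s, v_c, a_f,    # current adjustment quantities
                co, te, nox,                       # measured actual values
                wall_temp_max, wall_temp_limit):   # water-wall temperatures
    """Assemble S_t = {CO_sp, T_e,sp, NO_x,sp, D_AIR, D_B, V_f, V_s, V_c, A_f,
    CO, T_e, NO_x, e_CO, e_Te, e_NOx, dT_P} for a single burner layer."""
    e_co, e_te, e_nox = co_sp - co, te_sp - te, nox_sp - nox   # tracking deviations
    d_tp = wall_temp_limit - wall_temp_max                     # overtemperature safety margin
    return np.array([co_sp, te_sp, nox_sp,
                     d_air, d_b, v_f, v_s, v_c, a_f,
                     co, te, nox,
                     e_co, e_te, e_nox, d_tp], dtype=np.float32)
```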
The algorithm reward design in Step 1 is divided into a continuous reward term and a control-quantity change-rate limiting term, where the continuous reward term is:
where e_t is the weighted deviation at time t, [λ_1, λ_2, λ_3]^T is the deviation weight matrix used to adjust the relative importance of the three control targets, and K_1 is the weight of the continuous reward term;
the control-quantity change-rate limiting term is:
where K_2 is the weight of the control-quantity change-rate limiting term;
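By way of illustration only, the following Python sketch shows one possible shape of the reward described above. The exact expressions of the continuous reward term and the change-rate limiting term are given as formulas in the disclosure and are not reproduced here; the simple penalty form below (negative weighted deviation plus a penalty on control changes that exceed a limit) is an assumption for exposition, with K_1, K_2 and the weight vector [λ_1, λ_2, λ_3]^T as parameters.

```python
import numpy as np

def reward(e_co, e_te, e_nox, du, du_max,
           lambdas=(1.0, 1.0, 1.0), k1=1.0, k2=1.0):
    """Assumed illustrative reward: continuous term from the weighted deviation
    e_t, plus a control-quantity change-rate limiting term."""
    e_t = np.dot(lambdas, [abs(e_co), abs(e_te), abs(e_nox)])       # weighted deviation e_t
    continuous_term = -k1 * e_t                                      # continuous reward term (weight K_1)
    rate_term = -k2 * np.sum(np.maximum(np.abs(du) - du_max, 0.0))   # change-rate limiting term (weight K_2)
    return continuous_term + rate_term
```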
The network structure of the asynchronous advantage actor-critic algorithm in Step 2 consists of 4 fully connected layers. The critic network input layer has 20 nodes, comprising the state information and the reward information; the two hidden layers have 80 nodes each; and the output layer has 1 node. The actor network input layer has 20 nodes, comprising the state information and the output of the critic network; its hidden layers are the same as those of the critic network; and its output layer has 3 nodes. The basic parameters are the global maximum number of updates T_max, the maximum number of updates per thread t_max, the actor network learning rate ε_a, the critic network learning rate ε_v, the number of repetitions n of the virtual environment model, the thread update discount factor γ and the global update frequency F_update, with the specific settings as follows:
The convergence speed of the algorithm can generally be judged from the boiler structure and design parameters, and the above parameters are adjusted accordingly.
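By way of illustration only, the following PyTorch sketch realizes the 20-80-80-1 critic and 20-80-80-3 actor networks described above; the activation functions and the way the critic output is combined into the actor input are assumptions for exposition.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(20, 80), nn.ReLU(),   # input: state information + reward information (20 nodes)
            nn.Linear(80, 80), nn.ReLU(),   # two hidden layers of 80 nodes
            nn.Linear(80, 1))               # output: state value (1 node)

    def forward(self, x):
        return self.net(x)

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(20, 80), nn.ReLU(),   # input: state information + critic output (20 nodes)
            nn.Linear(80, 80), nn.ReLU(),   # hidden layers identical to the critic network
            nn.Linear(80, 3))               # output: action / adjustment vector (3 nodes)

    def forward(self, x):
        return self.net(x)
```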
The multi-variable transfer function model of the boiler combustion system in Step 3 serves as a virtual environment for auxiliary training, so its accuracy requirement is not high. A conventional disturbance experiment at a given load point is sufficient to directly establish the 6-input, 3-output transfer function model of the boiler combustion system. The model is used only as the virtual environment in the Dyna structure to improve the initial convergence rate of the algorithm and does not affect the final accuracy.
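By way of illustration only, the following Python sketch shows one way to use an identified transfer function model as the Dyna virtual environment. The text above only specifies a 6-input, 3-output model obtained from disturbance tests; the first-order-lag structure per input-output channel and the placeholder gain and time-constant matrices below are assumptions for exposition.

```python
import numpy as np

class VirtualBoilerModel:
    """Assumed 6-input, 3-output virtual environment: each channel is a
    discretized first-order lag K/(T s + 1); outputs are superposed."""
    def __init__(self, gains, time_constants, dt=1.0):
        self.K = np.asarray(gains, dtype=float)           # 3x6 steady-state gains (from identification)
        self.T = np.asarray(time_constants, dtype=float)  # 3x6 time constants [s]
        self.dt = dt                                      # sample time [s]
        self.y_ch = np.zeros((3, 6))                      # per-channel outputs

    def step(self, u):
        """Advance one sample with input vector u (length 6) and return the
        3 outputs (CO concentration, NOx concentration, exhaust temperature)."""
        u = np.asarray(u, dtype=float)
        alpha = self.dt / (self.T + self.dt)              # first-order lag coefficient per channel
        self.y_ch += alpha * (self.K * u - self.y_ch)     # y[k+1] = y[k] + alpha * (K*u - y[k])
        return self.y_ch.sum(axis=1)                      # superpose channel responses per output
```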
The asynchronous advantage actor-critic algorithm based on the virtual environment model in Step 4 adds a Dyna structure to each thread using the transfer function model of Step 3, so that the algorithm trains in the transfer function model while learning in the real environment, improving learning efficiency. A dynamic weight μ is added for learning in the virtual environment, and the parameters are then updated as:
θ'_a ← θ'_a + μ·ε_a·dθ'_a,    θ'_v ← θ'_v + μ·ε_v·dθ'_v
where R_m is the cumulative reward obtained in the virtual environment model and is set to zero when it is negative, r_m is the maximum reward the virtual environment model can give per step, j is the number of repeated executions in the virtual environment model, and μ decreases as the number of global parameter updates grows and as the cumulative reward in each thread grows; θ'_a and θ'_v are the global shared parameters used in the virtual environment, and T is the global shared counter.
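By way of illustration only, the following Python sketch applies the weighted update θ'_a ← θ'_a + μ·ε_a·dθ'_a, θ'_v ← θ'_v + μ·ε_v·dθ'_v stated above. The expression for μ is given as a formula in the disclosure and is not reproduced here; the function dyna_weight below therefore only assumes a form that respects the stated behaviour: μ lies in [0, 1], decreases as the global update count T grows, and decreases as the clipped cumulative reward R_m approaches its bound j·r_m.

```python
import numpy as np

def dyna_weight(Rm, rm, j, T, T_max):
    """Assumed form of the dynamic weight mu (not the patent's formula)."""
    Rm = max(Rm, 0.0)                          # Rm is set to zero when negative
    progress = min(T / T_max, 1.0)             # grows with the number of global parameter updates
    quality = min(Rm / (j * rm), 1.0)          # grows with the cumulative virtual-environment reward
    return (1.0 - progress) * (1.0 - quality)  # mu decreases with both, staying in [0, 1]

def apply_virtual_update(theta_a, theta_v, d_theta_a, d_theta_v, eps_a, eps_v, mu):
    """theta'_a <- theta'_a + mu*eps_a*d_theta'_a; theta'_v <- theta'_v + mu*eps_v*d_theta'_v."""
    theta_a = theta_a + mu * eps_a * np.asarray(d_theta_a)
    theta_v = theta_v + mu * eps_v * np.asarray(d_theta_v)
    return theta_a, theta_v
```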