Combustion optimization method for thermal power generating unit
    
      Technical Field
      The invention relates to the field of automatic control of thermal power generating units, and in particular to a combustion optimization method for a thermal power generating unit.
    
    
      Background
      With the continuous growth of the new energy industry, wind and solar power are gradually changing the structure of the power grid, and because new energy output is unstable, flexibility retrofitting of existing units has become one of the main development directions. The main technical difficulty of flexibility retrofitting is enabling large-capacity coal-fired units to perform deep peak shaving down to ultra-low loads, which places higher demands on their combustion stability. Boiler combustion optimization reduces NOx generation through staged air distribution, coal blending and similar measures, while also taking into account economic indexes such as CO emission concentration and boiler efficiency, and safety indexes such as preventing over-temperature of high-temperature steam tube walls. It is therefore a complex problem of multi-field coupling, multivariable constraints and multi-objective optimization. Under deep peak shaving, the reduced load can destabilize boiler combustion, prevent the denitration system from operating effectively, and cause over-temperature of the tube walls of the boiler steam-water system. How to effectively guarantee boiler efficiency and NOx emission performance is therefore an important research problem for combustion optimization under deep peak shaving.
      Boiler combustion conditions are complex and subject to many constraints, and current swarm intelligence optimization algorithms cannot reliably find the optimal solution. Deep reinforcement learning can directly use the state description of the environment and the evaluation of actions to adjust its strategy dynamically and reach the final control target; however, when it learns only from feedback from the real environment, its learning efficiency is low.
      To address these problems, the invention improves the asynchronous advantage actor-critic (A3C) algorithm: it adds a Dyna structure to increase training efficiency, dynamically adjusts the proportion of learning performed in the virtual environment, and designs the state and reward of the algorithm specifically for the boiler combustion system, thereby reducing the training time required while accounting for the various constraints.
    
    
      Disclosure of Invention
      The invention aims to overcome the defects described in the background and provides a combustion optimization method for a thermal power generating unit, realized by the following scheme:
       the combustion optimization method for a thermal power generating unit provided by the invention comprises the following steps:
       Step 1, designing the state and reward of the algorithm according to the condition of the boiler equipment;
       Step 2, designing the network structure and basic parameters of the asynchronous advantage actor-critic (A3C) algorithm;
       Step 3, establishing a multivariable transfer function model of the boiler combustion system;
       Step 4, establishing an A3C algorithm based on a virtual environment model built from the transfer function model of Step 3, and training it;
       The state design of the algorithm in Step 1 comprises the set values, actual values, adjustment quantities and deviations required by the algorithm. For the boiler combustion system, the set values are the carbon monoxide set value $CO_{sp}$, the exhaust gas temperature set value $T_{e,sp}$ and the NOx concentration set value $NO_{x,sp}$; the adjustment quantities are the total air quantity $D_{AIR}$, the coal quantity of the n-th burner layer $D_{B,n}$, the opening of the i-th layer primary air damper $V_{f,i}$, the opening of the j-th layer secondary air damper $V_{s,j}$, the opening of the k-th layer fuel air damper $V_{c,k}$ and the burner swing angle $A_f$; the actual values are the carbon monoxide quantity $CO$, the exhaust gas temperature $T_e$ and the NOx concentration $NO_x$; $e_{CO}$, $e_{T_e}$ and $e_{NO_x}$ are the carbon monoxide deviation, the exhaust gas temperature deviation and the NOx concentration deviation, respectively; and $\Delta T_P$ is the safety margin between the maximum actual water-wall temperature and the over-temperature limit. The state $S_t$ of the coordinated control system at time $t$ is:
      $$S_t = \{CO_{sp}, T_{e,sp}, NO_{x,sp}, D_{AIR}, D_{B,n}, V_{f,i}, V_{s,j}, V_{c,k}, A_f, CO, T_e, NO_x, e_{CO}, e_{T_e}, e_{NO_x}, \Delta T_P\}$$
       The reward design of the algorithm in Step 1 consists of a continuous reward term and a control-quantity change-rate limiting term. The continuous reward term is:
        where $e_t$ is the weighted deviation at time $t$ and $K_1$ is the weight of the continuous reward term;
       the control-quantity change-rate limiting term is:
        where $K_2$ is the weight of the control-quantity change-rate limiting term;
       The A3C algorithm based on the virtual environment model in Step 4 uses the transfer function model of Step 3 to add a Dyna structure to each thread, so that the algorithm also trains in the transfer function model while learning in the real environment, which improves learning efficiency. A dynamic weight μ is added for learning in the virtual environment, and the parameters are then updated as follows:
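      $$\theta'_a \leftarrow \theta'_a + \mu\,\varepsilon_a\,d\theta'_a, \qquad \theta'_v \leftarrow \theta'_v + \mu\,\varepsilon_v\,d\theta'_v$$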
       Here $\varepsilon_a$ and $\varepsilon_v$ are the actor and critic network learning rates. In the expression for μ, $R_m$ is the cumulative reward in the virtual environment model and is set to zero when $R_m < 0$, $r_m$ is the maximum reward the virtual environment model can give in each step, and $j$ is the number of repeated executions in the virtual environment model; μ decreases as the number of global parameter updates increases and as the cumulative reward in each thread increases. $\theta'_a$ and $\theta'_v$ are the globally shared parameters in the virtual environment, and $T$ is the global shared counter.
      Advantageous effects
      In the disclosed method, the boiler combustion system is optimized with the asynchronous advantage actor-critic algorithm, which handles the multi-objective, multi-coupling problem, and the added Dyna structure improves the learning efficiency of the algorithm so that it converges faster. The algorithm is simple to train and can be applied in engineering practice.
    
    
      Drawings
      Fig. 1 is a schematic diagram of a combustion system of a thermal power generating unit.
      Fig. 2 is a diagram of the boiler combustion optimization training process of the present invention.
    
    
      Detailed description of the preferred embodiments
      The invention will be further described with reference to the drawings and the detailed description.
      Fig. 1 is a schematic diagram of the boiler combustion system of a thermal power generating unit. The system has six inputs and three outputs: the inputs are the total air quantity, the fuel quantity of each layer (or the total fuel quantity), the opening of the primary air damper of each layer, the opening of the secondary air damper of each layer, the opening of the exhaust air damper of each layer and the burner swing angle; the outputs are the carbon monoxide concentration, the NOx concentration and the exhaust gas temperature.
      Fig. 2 is the training diagram of the boiler combustion system algorithm based on deep reinforcement learning, covering the reward design, state design, network parameter settings and virtual environment settings of the algorithm. First, the state and reward of the algorithm are designed; the state and reward values are then fed into the algorithm, with the state input simultaneously to the virtual environment in Dyna and to the actual algorithm. The actions output by the algorithm are integrated to obtain the adjustment quantities at the current moment, which are applied to the actual boiler combustion system to obtain the state and reward at the next moment. This learning cycle is repeated until the algorithm converges.
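      The training cycle described above can be sketched for one thread as a Dyna-style loop. This is a minimal sketch under stated assumptions: the callables `real_step`, `model_step`, `policy` and `learn` are hypothetical placeholders for the boiler interface, the transfer function model, the actor network and the A3C update, and starting each simulated rollout from the newest real state is an assumption rather than a detail given in the description.

```python
from typing import Callable, List, Tuple

State = List[float]
Action = List[float]
StepFn = Callable[[State, Action], Tuple[State, float]]


def dyna_thread_loop(real_step: StepFn, model_step: StepFn,
                     policy: Callable[[State], Action],
                     learn: Callable[[State, Action, float, State, bool], None],
                     s0: State, n_real_steps: int, j_model_steps: int) -> State:
    """Run one thread: each real interaction is followed by j simulated steps in the model.

    All callables are hypothetical placeholders, not the invention's interfaces.
    """
    s = s0
    for _ in range(n_real_steps):
        a = policy(s)                        # actor proposes the adjustment quantities
        s_next, r = real_step(s, a)          # act on the boiler, observe next state and reward
        learn(s, a, r, s_next, False)        # learn from real experience
        s_sim = s_next                       # assumption: rollout starts from newest real state
        for _ in range(j_model_steps):       # Dyna phase: j repeats in the virtual environment
            a_sim = policy(s_sim)
            s_sim_next, r_sim = model_step(s_sim, a_sim)
            learn(s_sim, a_sim, r_sim, s_sim_next, True)  # flagged as virtual experience
            s_sim = s_sim_next
        s = s_next
    return s
```

      The boolean flag passed to `learn` lets an update routine tell real and virtual experience apart, for example to scale virtual-experience updates by the dynamic weight μ.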
      The method mainly comprises the following steps:
       Step 1, designing the state and reward of the algorithm according to the condition of the boiler equipment;
       Step 2, designing the network structure and basic parameters of the asynchronous advantage actor-critic (A3C) algorithm;
       Step 3, establishing a multivariable transfer function model of the boiler combustion system;
       Step 4, establishing an A3C algorithm based on a virtual environment model built from the transfer function model of Step 3, and training it;
       The state design of the algorithm comprises the set values, actual values, adjustment quantities and deviations required by the algorithm. For the boiler combustion system, the set values are the carbon monoxide set value $CO_{sp}$, the exhaust gas temperature set value $T_{e,sp}$ and the NOx concentration set value $NO_{x,sp}$; the adjustment quantities are the total air quantity $D_{AIR}$, the coal quantity of the n-th burner layer $D_{B,n}$, the opening of the i-th layer primary air damper $V_{f,i}$, the opening of the j-th layer secondary air damper $V_{s,j}$, the opening of the k-th layer fuel air damper $V_{c,k}$ and the burner swing angle $A_f$; the actual values are the carbon monoxide quantity $CO$, the exhaust gas temperature $T_e$ and the NOx concentration $NO_x$; $e_{CO}$, $e_{T_e}$ and $e_{NO_x}$ are the carbon monoxide deviation, the exhaust gas temperature deviation and the NOx concentration deviation, respectively; and $\Delta T_P$ is the safety margin between the maximum actual water-wall temperature and the over-temperature limit. The state $S_t$ of the coordinated control system at time $t$ is:
      $$S_t = \{CO_{sp}, T_{e,sp}, NO_{x,sp}, D_{AIR}, D_{B,n}, V_{f,i}, V_{s,j}, V_{c,k}, A_f, CO, T_e, NO_x, e_{CO}, e_{T_e}, e_{NO_x}, \Delta T_P\}$$
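      As an illustration, the state vector $S_t$ could be assembled from one plant snapshot as below. The class `CombustionSnapshot` and the function `build_state` are hypothetical names, each multi-layer quantity is assumed to be a list with one entry per layer, and the sign convention of the deviations (set value minus actual value) is an assumption, since the description does not define it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CombustionSnapshot:
    """Hypothetical container for one sample of boiler measurements and commands."""
    co_sp: float              # CO set value
    te_sp: float              # exhaust gas temperature set value
    nox_sp: float             # NOx concentration set value
    d_air: float              # total air quantity D_AIR
    d_b: List[float]          # coal quantity per burner layer, D_B,n
    v_f: List[float]          # primary air damper openings, V_f,i
    v_s: List[float]          # secondary air damper openings, V_s,j
    v_c: List[float]          # fuel air damper openings, V_c,k
    a_f: float                # burner swing angle A_f
    co: float                 # actual CO
    te: float                 # actual exhaust gas temperature
    nox: float                # actual NOx concentration
    wall_temps: List[float]   # measured water-wall temperatures
    wall_limit: float         # over-temperature limit


def build_state(s: CombustionSnapshot) -> List[float]:
    """Flatten a snapshot into the state S_t in the order listed above."""
    e_co = s.co_sp - s.co     # assumed sign convention: set value minus actual value
    e_te = s.te_sp - s.te
    e_nox = s.nox_sp - s.nox
    delta_tp = s.wall_limit - max(s.wall_temps)   # safety margin to over-temperature
    return [s.co_sp, s.te_sp, s.nox_sp, s.d_air, *s.d_b, *s.v_f, *s.v_s, *s.v_c, s.a_f,
            s.co, s.te, s.nox, e_co, e_te, e_nox, delta_tp]
```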
       The reward design of the algorithm in Step 1 consists of a continuous reward term and a control-quantity change-rate limiting term, where the continuous reward term is:
       where $e_t$ is the weighted deviation at time $t$, $[\lambda_1, \lambda_2, \lambda_3]^T$ is the deviation weight vector that sets the relative weighting of the three control targets, and $K_1$ is the weight of the continuous reward term;
       the control-quantity change-rate limiting term is:
        where $K_2$ is the weight of the control-quantity change-rate limiting term;
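      The explicit reward formulas are not reproduced in this text, so the sketch below uses an assumed form only to show how the quantities defined above fit together: the weighted deviation is taken as $e_t = \lambda_1|e_{CO}| + \lambda_2|e_{T_e}| + \lambda_3|e_{NO_x}|$, the continuous term as $-K_1 e_t$, and the change-rate limiting term as $-K_2 \sum |u_t - u_{t-1}|$ over the adjustment quantities. These expressions and the function names are hypothetical and are not the formulas of the invention.

```python
from typing import Sequence


def weighted_deviation(errors: Sequence[float], lambdas: Sequence[float]) -> float:
    """Assumed weighted deviation: e_t = sum_i lambda_i * |e_i| over (e_CO, e_Te, e_NOx)."""
    return sum(lam * abs(e) for lam, e in zip(lambdas, errors))


def reward(errors: Sequence[float], lambdas: Sequence[float],
           u_now: Sequence[float], u_prev: Sequence[float],
           k1: float, k2: float) -> float:
    """Assumed total reward: continuous term plus change-rate limiting term (both penalties)."""
    r_continuous = -k1 * weighted_deviation(errors, lambdas)
    r_rate_limit = -k2 * sum(abs(a - b) for a, b in zip(u_now, u_prev))
    return r_continuous + r_rate_limit


# Usage with illustrative numbers (not plant data)
r = reward(errors=[20.0, -5.0, -5.0], lambdas=[0.5, 1.0, 1.0],
           u_now=[1.0, 2.0], u_prev=[0.0, 4.0], k1=0.1, k2=0.5)
```

      Penalizing the change of each adjustment quantity discourages abrupt damper and coal-feed moves, which is the stated purpose of the change-rate limiting term.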
       In Step 2, each network of the A3C algorithm consists of four fully connected layers. The critic network has an input layer of 20 nodes carrying state and reward information, two hidden layers of 80 nodes each, and an output layer of 1 node. The actor network has an input layer of 20 nodes carrying state information and the output of the critic network, hidden layers identical to those of the critic network, and an output layer of 3 nodes. The basic parameters are the global maximum number of updates $T_{max}$, the maximum number of updates per thread $t_{max}$, the actor network learning rate $\varepsilon_a$, the critic network learning rate $\varepsilon_v$, the number of repetitions in the virtual environment model $n$, the thread update discount factor $\gamma$ and the global update frequency $F_{update}$, with the specific settings as follows:
       These parameters can be adjusted according to the boiler structure and design parameters, from which the convergence speed of the algorithm can generally be estimated.
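      The network shapes of Step 2 can be sketched with NumPy as follows, together with the discounted n-step return in which the thread discount factor $\gamma$ enters a standard A3C update over at most $t_{max}$ steps. This is an illustrative sketch under assumptions: ReLU hidden activations, He-style initialization and a tanh actor output are choices made here rather than details of the invention, the 20 input features are represented by a zero vector, and the return computation follows the usual A3C formulation rather than a formula given in the description.

```python
import numpy as np
from typing import List, Sequence, Tuple


def init_mlp(sizes: Sequence[int], rng: np.random.Generator):
    """Create (weight, bias) pairs for a fully connected network with the given layer sizes."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x, out_act=None):
    """Forward pass; ReLU on hidden layers (assumed), optional output activation."""
    h = np.asarray(x, dtype=float)
    for idx, (w, b) in enumerate(params):
        h = h @ w + b
        if idx < len(params) - 1:
            h = np.maximum(h, 0.0)
    return out_act(h) if out_act is not None else h


def nstep_returns(rewards: List[float], values: List[float],
                  bootstrap: float, gamma: float) -> Tuple[List[float], List[float]]:
    """Discounted n-step returns and advantages for one thread segment."""
    returns = [0.0] * len(rewards)
    R = bootstrap                    # critic value of the last state, or 0 if terminal
    for i in reversed(range(len(rewards))):
        R = rewards[i] + gamma * R
        returns[i] = R
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages


rng = np.random.default_rng(0)
critic = init_mlp([20, 80, 80, 1], rng)   # four layers: 20 -> 80 -> 80 -> 1
actor = init_mlp([20, 80, 80, 3], rng)    # four layers: 20 -> 80 -> 80 -> 3
s = np.zeros(20)                          # placeholder for the 20 input features
v = forward(critic, s)                    # state value, shape (1,)
a = forward(actor, s, np.tanh)            # three bounded action components
```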
      The multivariable transfer function model of the boiler combustion system in Step 3 serves as a virtual environment for auxiliary training, so its accuracy requirement is low: the 6-input, 3-output transfer function model can be established directly from ordinary disturbance tests at a given load point. The model is used only as the virtual environment in Dyna to speed up the initial convergence of the algorithm and does not affect its final accuracy.
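      A minimal sketch of such a virtual environment, assuming every input-output channel is identified as a first-order lag $K/(Ts+1)$ without dead time and discretized with a zero-order hold, so that each channel obeys $x_{k+1} = a x_k + K(1-a) u_k$ with $a = e^{-\Delta t / T}$. The class name, the channel structure and any gains or time constants passed in are hypothetical; in practice they would come from the disturbance tests at the chosen load point. Inputs and outputs are treated as deviations from that operating point.

```python
import math
from typing import List, Sequence


class FirstOrderMIMOModel:
    """Hypothetical 6-input, 3-output model built from first-order lags K / (T s + 1)."""

    def __init__(self, gains: Sequence[Sequence[float]],
                 time_consts: Sequence[Sequence[float]], dt: float):
        self.gains = [list(row) for row in gains]              # gains[o][i]: output o, input i
        self.alpha = [[math.exp(-dt / t) for t in row] for row in time_consts]
        self.x = [[0.0] * len(row) for row in self.gains]      # one state per channel

    def step(self, u: Sequence[float]) -> List[float]:
        """Advance one sample with the input deviations u held constant (zero-order hold)."""
        y = []
        for o, row in enumerate(self.gains):
            total = 0.0
            for i, k in enumerate(row):
                a = self.alpha[o][i]
                self.x[o][i] = a * self.x[o][i] + k * (1.0 - a) * u[i]
                total += self.x[o][i]                          # outputs superpose all channels
            y.append(total)
        return y
```

      Because the model only accelerates early training, a coarse structure like this is consistent with the low accuracy requirement stated above.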
      The A3C algorithm based on the virtual environment model in Step 4 uses the transfer function model of Step 3 to add a Dyna structure to each thread, so that the algorithm trains in the transfer function model while learning in the real environment, which improves learning efficiency. A dynamic weight μ for learning in the virtual environment is added, and the parameters are then updated as follows:
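      $$\theta'_a \leftarrow \theta'_a + \mu\,\varepsilon_a\,d\theta'_a, \qquad \theta'_v \leftarrow \theta'_v + \mu\,\varepsilon_v\,d\theta'_v$$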
      Here $\varepsilon_a$ and $\varepsilon_v$ are the actor and critic network learning rates. In the expression for μ, $R_m$ is the cumulative reward in the virtual environment model and is set to zero when $R_m < 0$, $r_m$ is the maximum reward the virtual environment model can give in each step, and $j$ is the number of repeated executions in the virtual environment model; μ decreases as the number of global parameter updates increases and as the cumulative reward in each thread increases. $\theta'_a$ and $\theta'_v$ are the globally shared parameters in the virtual environment, and $T$ is the global shared counter.
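      Because the expression for μ is not reproduced here, the sketch below uses a hypothetical weight that only mimics the stated behaviour: μ falls as the global count $T$ approaches $T_{max}$ and as the clipped cumulative virtual reward $R_m$ approaches an assumed ceiling $j \cdot r_m$. The function names and the product form are illustrative assumptions and are not the formula of the invention; the update itself follows the parameter update rule given above.

```python
from typing import List


def dyna_weight(T: int, T_max: int, R_m: float, r_m: float, j: int) -> float:
    """Hypothetical virtual-learning weight mu in [0, 1] (not the patent's formula)."""
    R_m = max(R_m, 0.0)                          # R_m is set to zero when it is negative
    progress = min(T / T_max, 1.0)               # fraction of global updates already performed
    reward_ratio = min(R_m / (j * r_m), 1.0)     # R_m relative to the assumed ceiling j * r_m
    return (1.0 - progress) * (1.0 - reward_ratio)


def virtual_update(theta: List[float], grad: List[float], mu: float, lr: float) -> List[float]:
    """theta' <- theta' + mu * lr * d(theta'), applied element-wise."""
    return [p + mu * lr * g for p, g in zip(theta, grad)]


# Usage: halfway through training with half of the assumed attainable virtual reward
mu = dyna_weight(T=50, T_max=100, R_m=5.0, r_m=1.0, j=10)
theta_a = virtual_update([1.0, -2.0], [0.5, 0.5], mu, lr=0.1)
```

      Shrinking μ in this way lets the coarse transfer function model dominate early learning and hands control to real-environment experience as training progresses.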