
CN118494790B - Ammonia working medium thruster thrust stability control method and system - Google Patents


Info

Publication number
CN118494790B
Authority
CN
China
Prior art keywords: control, local, control input, thruster, constraint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410940206.6A
Other languages
Chinese (zh)
Other versions
CN118494790A (en)
Inventor
贾云涛
沈岩
樊明洲
罗群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yidong Aerospace Technology Co ltd
Original Assignee
Beijing Yidong Aerospace Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yidong Aerospace Technology Co ltd filed Critical Beijing Yidong Aerospace Technology Co ltd
Priority to CN202410940206.6A
Publication of CN118494790A
Application granted
Publication of CN118494790B
Legal status: Active
Anticipated expiration

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B64 AIRCRAFT; AVIATION; COSMONAUTICS
    • B64G COSMONAUTICS; VEHICLES OR EQUIPMENT THEREFOR
    • B64G1/00 Cosmonautic vehicles
    • B64G1/22 Parts of, or equipment specially adapted for fitting in or to, cosmonautic vehicles
    • B64G1/24 Guiding or controlling apparatus, e.g. for attitude control
    • B64G1/242 Orbits and trajectories
    • B64G1/244 Spacecraft control systems
    • B64G1/245 Attitude control algorithms for spacecraft attitude control

Landscapes

  • Engineering & Computer Science (AREA)
  • Remote Sensing (AREA)
  • Chemical & Material Sciences (AREA)
  • Combustion & Propulsion (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides a method and a system for stably controlling the thrust of an ammonia working medium thruster, relating to the technical field of thruster control. The method comprises: dividing the working interval into a plurality of local areas based on historical operating conditions, and constructing a local linear model library; selecting the corresponding local model, switching the corresponding local controller, and determining a control input; updating the local model in real time with a model optimization network, generating an updated dynamic model through online learning; evaluating control performance through a reward function using an action evaluation algorithm, generating an adjustment strategy, and self-tuning the controller parameters to obtain adaptive controller parameters; and, with minimization of a cost function as the objective, obtaining a globally optimal control input sequence through a control optimizing algorithm, taking the first element of the sequence as the current control instruction and the remaining elements as the reference control trajectory, and realizing thrust stability control based on the current control instruction and the reference control trajectory.

Description

Ammonia working medium thruster thrust stability control method and system
Technical Field
The invention relates to the technical field of thruster control, in particular to a method and a system for stably controlling the thrust of an ammonia working medium thruster.
Background
The ammonia working medium thruster is a common actuating mechanism for satellite and spacecraft attitude and orbit control, with advantages such as continuously adjustable thrust, high specific impulse, and freedom from pollution. However, because the physical properties of the ammonia working medium change drastically near the thermodynamic critical point, and the internal flow and heat transfer processes of the thruster are complex, the thrust output precision and stability of the thruster are difficult to guarantee.
Traditional control methods for ammonia working medium thrusters mainly comprise open-loop control based on pressure-flow characteristics and closed-loop control based on thermodynamic models. These methods generally struggle to cope with thrust fluctuations caused by propellant property changes and external disturbances, and the robustness of the controller needs improvement.
In summary, the existing thrust control methods for ammonia working medium thrusters suffer from insufficient precision and poor robustness, and cannot meet the high-precision, high-stability requirements of spacecraft attitude and orbit control.
Disclosure of Invention
The embodiment of the invention provides a method and a system for stably controlling the thrust of an ammonia working medium thruster, which can solve the problems in the prior art.
In a first aspect of an embodiment of the present invention,
Provided is a thrust stability control method for an ammonia working medium thruster, comprising the following steps:
Dividing a working interval of the ammonia working medium thruster into a plurality of local areas based on the historical operating conditions of the ammonia working medium thruster, and constructing a local linear model library; selecting a corresponding local model from the local linear model library according to the working condition state of the ammonia working medium thruster, switching to the corresponding local controller, and determining a control input;
Updating the local model in real time by using a model optimization network, and generating an updated dynamic model by on-line learning and adjusting network parameters based on the working condition state, the control input and the response output of the ammonia working medium thruster; using an action evaluation algorithm to evaluate control performance through a reward function based on the updated dynamic model and combining the control input and the response output, generating an adjustment strategy of the controller parameters, and performing self-tuning on the controller parameters of the local controller to obtain self-adaptive controller parameters;
according to the updated dynamic model and the self-adaptive controller parameters, combining the state constraint and the control input constraint of the ammonia working medium thruster, taking the minimized cost function as a target, obtaining a global optimal control input sequence through a control optimizing algorithm, taking the first element of the global optimal control input sequence as a current control instruction, taking the other elements of the global optimal control input sequence as reference control tracks, and realizing thrust stability control based on the current control instruction and the reference control tracks.
In an alternative embodiment of the present invention,
Selecting a corresponding local model from the local linear model library according to the working condition state of the ammonia working medium thruster, and switching a corresponding local controller, wherein the determining control input comprises:
collecting measurement signals of the ammonia working medium thruster, generating a working condition state vector, classifying the working condition state vector based on a pre-trained Gaussian mixture model, determining posterior probability of the working condition state vector belonging to each local model, and determining a local controller by taking a local model corresponding to the maximum value in the posterior probability as an adaptive local model;
Based on the local controllers, acquiring a plurality of adjacent local controllers, calculating the control law of each adjacent local controller, determining the fusion factor of each control law, and fusing the control laws of the adjacent local controllers under the constraint conditions that the fusion factors sum to 1 at each moment and that the switching process is continuously differentiable, to determine a switching control law;
based on the Lyapunov function stability condition, constructing a linear matrix inequality, and based on the linear matrix inequality, solving all fusion factors in the switching control law to generate a stable and smooth switching control law, thereby obtaining a control input signal.
In an alternative embodiment of the present invention,
Updating the local model in real time by using a model optimization network, and adjusting network parameters through online learning based on the working condition state, the control input and the response output of the ammonia working medium thruster, wherein the generating the updated dynamic model comprises the following steps:
Acquiring the working condition state and control input of the ammonia working medium thruster;
based on a pre-trained kernel extreme learning machine network, taking the working condition state and the control input as network inputs, calculating the kernel function values between them and pre-acquired training samples using a Gaussian kernel function, and taking these kernel function values as the kernel function outputs of the hidden layer nodes;
Solving network output weights through a least square method based on kernel function output of the hidden layer node to obtain optimized local model parameters; and updating the local model of the ammonia working medium thruster in real time based on the optimized local model parameters.
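The kernel extreme learning machine step described above (Gaussian kernel values against the training set as the hidden layer, output weights solved in closed form by regularized least squares) might be sketched as follows. This is a minimal numpy illustration, not the patent's implementation: the training data, kernel width, and regularization constant `C` are all assumed values, and the synthetic response stands in for the thruster's measured output.

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel values between the row vectors of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Training samples: inputs are (operating state, control input) pairs,
# targets are the response output; here a smooth synthetic mapping.
X_train = rng.uniform(-1, 1, size=(60, 2))
y_train = np.sin(X_train[:, 0]) + 0.5 * X_train[:, 1]

# Output weights solved by (ridge-regularized) least squares -- the
# closed-form step that replaces iterative network training.
C = 1e3                                        # regularization constant (assumed)
K = gaussian_kernel(X_train, X_train)
beta = np.linalg.solve(K + np.eye(len(K)) / C, y_train)

def kelm_predict(X_new):
    """Hidden-layer output = kernel values against the training samples."""
    return gaussian_kernel(X_new, X_train) @ beta
```

In an online setting the training set and `beta` would be refreshed as new operating data arrive, updating the local model parameters in real time.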
In an alternative embodiment of the present invention,
And evaluating control performance through a reward function based on the updated dynamic model by using an action evaluation algorithm and combining the control input and the response output, generating an adjustment strategy of the controller parameters, performing self-tuning on the controller parameters of the local controller, and obtaining self-adaptive controller parameters, wherein the self-tuning comprises the following steps:
the action evaluation algorithm is constructed based on a reinforcement learning method;
Initializing a global action network, a global evaluation network and a plurality of parallel processing units;
For each parallel processing unit, the following operations are performed:
copying parameters from the global action network and the global evaluation network, generating a local action network and a local evaluation network, interacting with a controlled object through the local action network based on an updated dynamic model, collecting sample data and storing the sample data in a local data cache;
Extracting a group of sample data from a local data cache, calculating the gradient of the local evaluation network, and uploading the gradient to a global evaluation network;
Based on the sample data, calculating the policy update direction and step size of the local action network through a pre-constructed trust region policy optimization algorithm, uploading the update direction and step size to the global action network, asynchronously updating the global action network, and generating an updated global action network;
Generating an adjustment strategy of the controller parameters by utilizing the updated global action network;
based on the adjustment strategy, self-setting the controller parameters of the local controller to obtain self-adaptive controller parameters;
Controlling the controlled object based on the self-adaptive controller parameters, and calculating a reward function value according to an actual control result;
feeding back the reward function value to the global evaluation network, and updating the global evaluation network by combining an actual control result;
And repeating the operation of the parallel processing unit until the global action network and the global evaluation network reach preset control targets, and determining optimal self-adaptive controller parameters.
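The parallel actor-critic loop above can be reduced to a miniature numpy sketch. This is an assumption-laden toy, not the patent's implementation: each "network" is shrunk to a parameter vector, the trust-region step is replaced by a plain advantage-weighted direction, the workers run sequentially, and the name `worker_step` and all numeric values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Global "networks" reduced to parameter vectors for illustration.
global_actor = np.zeros(3)
global_critic = np.zeros(3)

def worker_step(global_actor, global_critic):
    """One parallel-unit cycle: copy globals, collect samples, push updates."""
    local_actor = global_actor.copy()          # copy parameters from the globals
    local_critic = global_critic.copy()
    # "Interaction with the controlled object": a synthetic batch of
    # (feature, reward) samples standing in for the cached transitions.
    feats = rng.normal(size=(16, 3))
    rewards = feats @ np.array([1.0, -0.5, 0.2]) + 0.01 * rng.normal(size=16)
    # Critic gradient of the value-fit error, to be uploaded to the global.
    critic_grad = feats.T @ (feats @ local_critic - rewards) / len(feats)
    # Actor direction: advantage-weighted feature mean (a stand-in for the
    # trust-region update direction described in the text).
    adv = rewards - feats @ local_critic
    actor_dir = (feats * adv[:, None]).mean(axis=0)
    return actor_dir, critic_grad

for _ in range(200):                           # repeat until the target is met
    actor_dir, critic_grad = worker_step(global_actor, global_critic)
    global_critic -= 0.1 * critic_grad         # asynchronous global updates
    global_actor += 0.05 * actor_dir
```

After enough cycles the global critic's parameters approach the coefficients generating the synthetic rewards, mirroring how the global evaluation network converges as workers feed back gradients.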
In an alternative embodiment of the present invention,
Based on the sample data, calculating the policy update direction and step size of the local action network through a pre-constructed trust region policy optimization algorithm comprises:
the policy update direction is calculated as follows:

$$\nabla_\theta J(\theta) = \frac{1}{N}\sum_{i=1}^{N}\nabla_\theta \log \pi_\theta(a_i \mid s_i)\, A^{\pi_\theta}(s_i, a_i), \qquad F\,d = \nabla_\theta J(\theta)$$

wherein ∇_θ represents the gradient operator with respect to the policy parameter θ; J(θ) represents the cumulative return of the current policy under the policy parameter θ; N represents the total number of samples and i the sample index; π_θ(a_i|s_i) represents the probability of selecting action a_i in state s_i under the policy parameter θ; ∇_θ log π_θ(a_i|s_i) represents the logarithmic policy gradient of the i-th sample; A^{π_θ}(s_i, a_i) represents the advantage function value of taking action a_i in state s_i; F represents the information matrix carrying curvature information in different directions of the policy parameter space; and d represents the policy update direction vector, obtained by solving F d = ∇_θ J(θ).
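In code, this update direction (the sampled policy gradient preconditioned by the information matrix F, i.e. a natural-gradient step) might be computed as follows. The per-sample log-policy gradients and advantages are random stand-ins, and estimating F from the same log-gradients is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

# Per-sample log-policy gradients  grad_theta log pi(a_i|s_i)  and
# advantage values A(s_i, a_i); random stand-ins for illustration.
N, dim = 100, 4
log_grads = rng.normal(size=(N, dim))
advantages = rng.normal(size=N)

# grad_J = (1/N) * sum_i  log_grads_i * advantage_i
grad_J = (log_grads * advantages[:, None]).mean(axis=0)

# Information (Fisher) matrix estimated from the same log-gradients,
# with a small jitter to guarantee invertibility.
F = (log_grads.T @ log_grads) / N + 1e-6 * np.eye(dim)

# Policy update direction: solve F d = grad_J
d = np.linalg.solve(F, grad_J)
```

The step size along `d` would then come from the trust-region constraint, which is omitted here.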
In an alternative embodiment of the present invention,
According to the updated dynamic model and the self-adaptive controller parameters, combining the state constraint and the control input constraint of the ammonia working medium thruster, taking the minimized cost function as a target, and obtaining a global optimal control input sequence through a control optimizing algorithm comprises the following steps:
establishing state constraint and control input constraint of the ammonia working medium thruster;
Determining a thrust error quadratic function based on the error between the actual thrust and the target thrust, and constructing a thrust error term; determining a control input difference quadratic function based on the difference between control inputs at adjacent moments, and constructing a control input change rate item; determining a penalty function based on the violation state constraint and a penalty value generated by violating the control input constraint, and constructing a constraint violation penalty item; constructing a cost function based on the thrust error term, the control input change rate term and the constraint violation penalty term;
Constructing a control optimizing algorithm by taking the minimized cost function as a target;
And in each iteration process, the control optimizing algorithm constructs a quadratic programming based on the gradient and the Hessian matrix at the current iteration point, solves the quadratic programming to obtain the increment of the control input sequence, updates the iteration point, judges whether the preset convergence condition is met, enters the next iteration if the preset convergence condition is not met, otherwise, terminates the iteration, and outputs the current corresponding control input sequence to obtain the global optimal control input sequence.
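The iterative scheme described here (build a quadratic model from the gradient and Hessian at the current iterate, solve it for the control-sequence increment, update, and test convergence) can be sketched on a toy cost. This is a minimal numpy illustration under assumed dynamics: the cost combines a quadratic thrust-error surrogate and a control-rate term, so each quadratic-programming step is exact and the loop converges almost immediately; constraints are omitted.

```python
import numpy as np

# Toy horizon of control inputs with a thrust-error term plus a
# control-input change-rate term (both quadratic in this sketch).
K = 5
target = np.ones(K)

def cost_grad_hess(u):
    e = u - target                          # thrust-error surrogate
    du = np.diff(u, prepend=u[0])           # input change between steps
    J = e @ e + 0.1 * (du @ du)
    # Analytic gradient and Hessian of the quadratic cost
    D = np.eye(K) - np.eye(K, k=-1)
    D[0] = 0.0                              # no rate penalty on the first step
    grad = 2 * e + 0.2 * (D.T @ (D @ u))
    H = 2 * np.eye(K) + 0.2 * (D.T @ D)
    return J, grad, H

u = np.zeros(K)
for it in range(50):
    J, g, H = cost_grad_hess(u)
    step = np.linalg.solve(H, -g)           # solve the local quadratic program
    u = u + step                            # update the iteration point
    if np.linalg.norm(step) < 1e-8:         # preset convergence condition
        break
```

With a genuinely nonlinear thruster model the gradient and Hessian would change between iterations and several quadratic-programming steps would be needed, but the loop structure is the same.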
In an alternative embodiment of the present invention,
Based on the thrust error term, the control input rate of change term, and the constraint violation penalty term, constructing a cost function includes:
the cost function has the following form:

$$J_1 = \sum_{k=1}^{K} \alpha_1\, e_k^2, \qquad J_2 = \sum_{k=1}^{K} \alpha_2\, \Delta u_k^2,$$
$$J_3 = \sum_{k=1}^{K}\Big[\beta_1 \sum_{j=1}^{m} \max\big(0,\, g_j(x_k)\big)^2 + \beta_2 \sum_{l=1}^{n} \max\big(0,\, h_l(u_k)\big)^2\Big],$$
$$J = \gamma_1 J_1 + \gamma_2 J_2 + \gamma_3 J_3$$

wherein J_1 denotes the thrust error term; k denotes a time step and K the total number of time steps; e_k denotes the thrust error at the k-th time step; α_1 denotes the thrust error sensitivity coefficient; J_2 denotes the control input change rate term; Δu_k denotes the difference between adjacent control inputs at the k-th time step; α_2 denotes the control input change rate sensitivity coefficient; J_3 denotes the constraint violation penalty term; j denotes the state constraint index and m the total number of state constraints; g_j(x_k) denotes the value of the j-th state constraint function at the k-th time step; β_1 denotes the sensitivity coefficient for violating state constraints; l denotes the control input constraint index and n the total number of control input constraints; h_l(u_k) denotes the value of the l-th control input constraint function at the k-th time step; β_2 denotes the sensitivity coefficient for violating control input constraints; J denotes the overall cost function; and γ_1, γ_2, γ_3 denote the contribution weights of the thrust error term, the control input change rate term, and the constraint violation penalty term, respectively.
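The cost function described here (thrust error term, control input change rate term, and constraint violation penalty term, combined with contribution weights) can be sketched directly. This is an illustrative numpy version with assumed default coefficients; the squared-hinge form max(0, ·)² of the violation penalty is an assumption, and constraint values are taken to be positive when violated.

```python
import numpy as np

def thrust_cost(e, du, g_vals, h_vals,
                alpha1=1.0, alpha2=0.1, beta1=10.0, beta2=10.0,
                gamma=(1.0, 1.0, 1.0)):
    """Cost J = gamma1*J1 + gamma2*J2 + gamma3*J3.

    e      : thrust errors per time step, shape (K,)
    du     : control-input differences per time step, shape (K,)
    g_vals : state-constraint values g_j(x_k), shape (K, m); > 0 means violated
    h_vals : input-constraint values h_l(u_k), shape (K, n); > 0 means violated
    All coefficient values are illustrative defaults.
    """
    J1 = alpha1 * np.sum(e ** 2)                             # thrust error term
    J2 = alpha2 * np.sum(du ** 2)                            # input change-rate term
    J3 = np.sum(beta1 * np.maximum(0.0, g_vals) ** 2) \
       + np.sum(beta2 * np.maximum(0.0, h_vals) ** 2)        # violation penalty
    g1, g2, g3 = gamma
    return g1 * J1 + g2 * J2 + g3 * J3

# Example: two time steps, no constraint violations
J_demo = thrust_cost(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                     np.zeros((2, 1)), np.zeros((2, 1)))
```

With no violations only the first two terms contribute, so the example evaluates to 1.0·1 + 0.1·1 = 1.1 under the default coefficients.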
In a second aspect of an embodiment of the present invention,
Provided is an ammonia working medium thruster thrust stability control system, comprising:
The first unit is used for dividing the working interval of the ammonia working medium thruster into a plurality of local areas based on the historical operating conditions of the ammonia working medium thruster, and constructing a local linear model library; selecting a corresponding local model from the local linear model library according to the working condition state of the ammonia working medium thruster, switching to the corresponding local controller, and determining a control input;
The second unit is used for updating the local model in real time by utilizing a model optimization network, and generating an updated dynamic model by adjusting network parameters through online learning based on the working condition state, the control input and the response output of the ammonia working medium thruster; using an action evaluation algorithm to evaluate control performance through a reward function based on the updated dynamic model and combining the control input and the response output, generating an adjustment strategy of the controller parameters, and performing self-tuning on the controller parameters of the local controller to obtain self-adaptive controller parameters;
And the third unit is used for obtaining a global optimal control input sequence by taking the minimum cost function as a target and adopting a control optimizing algorithm according to the updated dynamic model and the self-adaptive controller parameters and combining the state constraint and the control input constraint of the ammonia working medium thruster, taking the first element of the global optimal control input sequence as a current control instruction, taking the other elements of the global optimal control input sequence as a reference control track, and realizing thrust stability control based on the current control instruction and the reference control track.
In a third aspect of an embodiment of the present invention,
There is provided an electronic device including:
A processor;
A memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a fourth aspect of an embodiment of the present invention,
There is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
In the embodiment of the invention, the working interval is divided into a plurality of local areas and a local linear model library is constructed, so that a corresponding local model and a controller can be selected according to the current working condition state of the ammonia working medium thruster, the fine control for different working conditions is realized, and the adaptability and the stability of the system are improved; because the local model and the controller are constructed according to the historical working condition data, and can be selected and switched in real time according to the current working condition state, the control strategy can be adjusted in real time in the operation process of the ammonia working medium thruster, and the method is suitable for different working environments and operation conditions; by utilizing an action evaluation algorithm, combining the updated dynamic model, control input and thruster response output, evaluating control performance through a reward function, generating an adjustment strategy of the controller parameters, wherein the adjustment strategy can help the system to realize autonomous adjustment of the controller parameters so as to adapt to different working states and environmental changes and realize optimization of the self-adaptive controller parameters; taking the minimized cost function as an optimization target, so that the system can more efficiently reach the expected performance target in the control process; through a control optimizing algorithm, the system can find a globally optimal control input sequence, and the optimization of the thruster control is realized on the premise of meeting constraint conditions.
Drawings
FIG. 1 is a schematic flow chart of a method for controlling thrust stability of an ammonia working substance thruster according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a thrust stability control system of an ammonia working substance thruster according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a schematic flow chart of a method for controlling thrust stability of an ammonia working substance thruster according to an embodiment of the present invention, as shown in fig. 1, the method includes:
S101, dividing a working interval of an ammonia working medium thruster into a plurality of local areas based on the historical operating conditions of the ammonia working medium thruster, and constructing a local linear model library; selecting a corresponding local model from the local linear model library according to the working condition state of the ammonia working medium thruster, switching to the corresponding local controller, and determining a control input;
The historical operating conditions typically include valve opening, thruster inlet pressure, thruster inlet temperature, thruster outlet pressure, thruster outlet temperature, thruster wall temperature, thruster mass flow, actual output thrust, and operating time period, operating load, related internal control parameters, etc.;
collecting and arranging historical working condition data of the ammonia working medium thruster, wherein the historical working condition data comprise information such as working time period, working load, temperature and pressure change and the like; dividing a working interval into a plurality of local areas according to collected historical working condition data, preferably using a clustering algorithm to divide, constructing a local linear model for each local area through a linear system identification method, establishing a local linear model library, and storing the local linear model corresponding to each local area for later use;
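The partition-and-identify procedure described here (cluster the historical operating data into local regions, then fit a local linear model for each region by least squares) might look as follows in miniature. This is a numpy-only sketch under assumptions: the two-feature synthetic data, the minimal k-means, and the globally linear "true" response are all invented for illustration; a real system would cluster richer operating-condition vectors and identify a distinct model per region.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic historical operating data, e.g. [valve opening, inlet pressure],
# drawn around three distinct operating points.
data = np.vstack([rng.normal(m, 0.05, size=(50, 2)) for m in (0.2, 0.5, 0.8)])

# --- Divide the working interval into local areas with a minimal k-means ---
order = np.argsort(data[:, 0])
centers = data[order[[25, 75, 125]]]       # one deterministic seed per region
for _ in range(20):
    labels = np.argmin(((data[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([data[labels == j].mean(axis=0) for j in range(3)])

# --- Identify a local linear model per region by least squares ---
# The synthetic response is globally linear, so each local fit should
# recover the same coefficients; in practice each region has its own.
true_w = np.array([2.0, -1.0])
y = data @ true_w + 0.01 * rng.normal(size=len(data))
model_library = {
    j: np.linalg.lstsq(data[labels == j], y[labels == j], rcond=None)[0]
    for j in range(3)
}
```

The resulting `model_library` plays the role of the local linear model library: one identified parameter vector per local area, indexed by region.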
When the ammonia working medium thruster is in a specific working condition state, selecting a corresponding local model according to the working condition state, selecting by using a model matching method, switching to a local controller corresponding to the selected local model, and determining a control input so that the ammonia working medium thruster can keep stability and performance optimization under the working condition;
In the embodiment, the working interval is divided into a plurality of local areas, and a local linear model library is constructed, so that a corresponding local model and a controller can be selected according to the current working condition state of the ammonia working medium thruster, the fine control for different working conditions is realized, and the adaptability and the stability of the system are improved; because the local model and the controller are constructed according to the historical working condition data, and can be selected and switched in real time according to the current working condition state, the control strategy can be adjusted in real time in the operation process of the ammonia working medium thruster, and the method is suitable for different working environments and operation conditions; the local model and the controller can be used for selecting and switching according to the actual working condition state of the ammonia working medium thruster, so that the self-adaptability of the system is realized, the control strategy can be timely adjusted in the face of working condition change or system parameter drift, and the stability and the reliability of the system are ensured.
In an alternative embodiment, according to the working condition state of the ammonia working medium thruster, selecting a corresponding local model from the local linear model library, and switching the corresponding local controller, determining the control input includes:
collecting measurement signals of the ammonia working medium thruster, generating a working condition state vector, classifying the working condition state vector based on a pre-trained Gaussian mixture model, determining posterior probability of the working condition state vector belonging to each local model, and determining a local controller by taking a local model corresponding to the maximum value in the posterior probability as an adaptive local model;
Based on the local controllers, acquiring a plurality of adjacent local controllers, calculating the control law of each adjacent local controller, determining the fusion factor of each control law, and fusing the control laws of the adjacent local controllers under the constraint conditions that the fusion factors sum to 1 at each moment and that the switching process is continuously differentiable, to determine a switching control law;
based on the Lyapunov function stability condition, constructing a linear matrix inequality, and based on the linear matrix inequality, solving all fusion factors in the switching control law to generate a stable and smooth switching control law, thereby obtaining a control input signal.
The working condition state vector specifically refers to a feature vector extracted from a measurement signal of the ammonia working medium thruster, and is used for describing the current running state of the ammonia working medium thruster, such as the thrust, the rotating speed, the temperature and the like;
the posterior probability specifically refers to the probability of a certain hypothesis or model under the condition of given observation data, and in the invention, the posterior probability represents the probability that the working condition state vector belongs to each local model;
the control law specifically refers to a rule or algorithm for determining an output control signal according to the current working condition state of the ammonia working medium thruster, and the rule or algorithm is used for controlling the ammonia working medium thruster;
the fusion factor specifically refers to a weight coefficient of the adjacent local controllers in the control law fusion process, and is used for determining the comprehensive weight of the control inputs of the two adjacent controllers;
The switching control law is specifically a control law adopted when switching between different local models, and is obtained by fusing control laws of adjacent local controllers, so that continuity and stability of control input in the switching process are ensured;
Firstly, collecting measurement signals of an ammonia working medium thruster through a sensor, wherein the measurement signals comprise valve opening, thruster inlet pressure, inlet temperature, outlet pressure, outlet temperature, wall temperature, mass flow, actual output thrust and the like, and combining the measurement signals into a working condition state vector to represent the working state of the current thruster;
and classifying the working condition state vectors by using a pre-trained Gaussian mixture model. The gaussian mixture model is a probability density estimation method, and a probability distribution of an arbitrary shape is fitted by a weighted sum of a plurality of gaussian distributions. And in the pre-training stage, a large amount of historical working condition data are used for estimating parameters of the Gaussian mixture model, so that probability distribution characteristics of different working condition states can be accurately described. And when classifying, inputting the working condition state vector into the trained Gaussian mixture model, and calculating the posterior probability of the working condition state vector belonging to each local model. The posterior probability represents the probability that the thruster is positioned in each local area under the current working condition;
And selecting a local model corresponding to the maximum value from the posterior probability as an adaptive local model under the current working condition. Each local model corresponds to a pre-designed local controller, so that the corresponding local controller is determined by determining the adaptive local model;
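The classification-and-selection step can be sketched numerically. Given already-trained mixture parameters, the posterior probability that the current working condition state vector belongs to each local model is proportional to the component weight times the Gaussian likelihood, and the adapted local model is the argmax. All parameter values below are illustrative stand-ins for a pre-trained model, and diagonal covariances are an assumption of this sketch.

```python
import numpy as np

# Pre-trained mixture parameters (illustrative values, diagonal covariances):
# one Gaussian component per local model / operating region.
means = np.array([[0.2, 1.0], [0.5, 2.0], [0.8, 3.0]])
variances = np.full((3, 2), 0.01)
weights = np.array([0.3, 0.4, 0.3])

def posterior(x):
    """Posterior probability that state vector x belongs to each local model."""
    # log N(x; mu, diag sigma^2) up to a constant shared by all components
    log_lik = -0.5 * np.sum((x - means) ** 2 / variances + np.log(variances),
                            axis=1)
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max()             # numerical stabilization
    p = np.exp(log_post)
    return p / p.sum()

x = np.array([0.52, 2.05])                 # current operating-condition vector
p = posterior(x)
selected_model = int(np.argmax(p))         # index of the adapted local model
```

The `selected_model` index identifies both the adapted local model and its pre-designed local controller.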
In order to achieve smooth controller switching, the control laws of adjacent local controllers need to be fused. First, according to the adaptive local controller, a plurality of local controllers adjacent to it are acquired; "adjacent" refers to the controllers of those local areas that border the adapted local area in the partition of the working interval. For each adjacent local controller, a control law based on the current state is calculated, i.e., a control input is determined from the feedback information.
To merge these adjacent control laws smoothly, the concept of a fusion factor is introduced. The fusion factor represents the weight of each adjacent control law in the final control input. The fusion factors are required to sum to 1 at each instant so that the numerical range of the control input is unchanged before and after fusion. At the same time, the derivative of each fusion factor is taken into account as a constraint, to ensure that the switching process is continuously differentiable.
Finally, a linear matrix inequality is constructed by utilizing Lyapunov stability theory to ensure the stability of switching control. The linear matrix inequality is a special semi-definite programming problem and can be used for stability analysis and synthesis of a control system. By solving the linear matrix inequality, the fusion factor value meeting the stability condition can be obtained. Substituting the fusion factor obtained by solving into the weighted sum of adjacent control laws to obtain the final stable smooth switching control law. The stable smooth switching control law integrates the control actions of the adaptive local controller and the adjacent local controllers, ensures stable switching and simultaneously gives consideration to control performance. And outputting the switching control law to serve as a control input signal of the ammonia working medium thruster so as to realize accurate and stable control of the thrust.
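A minimal numerical picture of the fusion itself: two state-feedback control laws blended by fusion factors that sum to 1 at every instant and vary smoothly over the switch. The LMI-based solution for the factors is omitted here; the cosine ramp, the gains `K_a`/`K_b`, and the switch duration are assumed stand-ins for illustration only.

```python
import numpy as np

# Control laws of the previous and newly adapted controller: u = -K x
K_a = np.array([[2.0, 0.5]])               # previous local controller gain
K_b = np.array([[1.5, 0.8]])               # newly adapted controller gain
x = np.array([0.1, -0.2])                  # current state

def fused_input(t, T=1.0):
    """Blend the two control laws during a switch of duration T.

    The fusion factor ramps smoothly (cosine) from 0 to 1; the two factors
    (1 - lam) and lam sum to 1 at every instant, as the constraint requires.
    """
    lam = 0.5 * (1 - np.cos(np.pi * np.clip(t / T, 0.0, 1.0)))
    u_a = -(K_a @ x)
    u_b = -(K_b @ x)
    return (1 - lam) * u_a + lam * u_b

u_start = fused_input(0.0)   # pure previous control law
u_end = fused_input(1.0)     # pure new control law
```

At t = 0 the output equals the previous controller's input, at t = T the new controller's, and in between the blend is continuously differentiable, which is the behavior the Lyapunov/LMI analysis is meant to certify as stable.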
In the embodiment, the working states of the thrusters are classified by using a Gaussian mixture model, so that the system can respond correspondingly according to different working states; selecting an adaptive local model and a local controller according to the working condition state vector classification result, and realizing fine control on different working conditions; by fusing control laws of adjacent local controllers and introducing fusion factors, smooth transition of controller switching is realized, and instability of the system in the switching process is avoided; the linear matrix inequality is constructed by utilizing the Lyapunov stability theory, so that the stability of switching control is ensured, and the stability and reliability of the system in different working states are ensured; the control functions of the proper local controllers and the adjacent local controllers are combined, the requirements of control performance and stable switching are met, and the performance of the system is improved.
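As an illustration of the working-state classification idea mentioned above (the mixture parameters, the one-dimensional state, and the function names are all hypothetical; a real implementation would use a fitted multi-dimensional Gaussian mixture model):

```python
import math

# Sketch: classify the working-condition state with a 1-D Gaussian mixture
# and select the matching local controller index. Parameters are placeholders.

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def classify_state(x, weights, means, variances):
    """Return the mixture component with the highest posterior responsibility."""
    likelihoods = [w * gaussian_pdf(x, m, v)
                   for w, m, v in zip(weights, means, variances)]
    total = sum(likelihoods)
    responsibilities = [l / total for l in likelihoods]
    return max(range(len(weights)), key=responsibilities.__getitem__)

# Three hypothetical operating regimes: low, nominal, high thrust
weights   = [0.3, 0.5, 0.2]
means     = [0.2, 1.0, 2.0]      # e.g. a normalized chamber-pressure feature
variances = [0.05, 0.10, 0.20]

state = 1.1                       # current working-condition measurement
controller_index = classify_state(state, weights, means, variances)
print(controller_index)
```

The returned index selects the adapted local model and controller for the current regime; adjacent indices correspond to the neighboring controllers whose laws are fused during switching.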
S102, updating the local model in real time by using a model optimization network, and generating an updated dynamic model by adjusting network parameters through online learning based on the working condition state, the control input and the response output of the ammonia working medium thruster; using an action evaluation algorithm to evaluate control performance through a reward function based on the updated dynamic model and combining the control input and the response output, generating an adjustment strategy of the controller parameters, and performing self-tuning on the controller parameters of the local controller to obtain self-adaptive controller parameters;
In the embodiment, the network is optimized by using the model, on-line learning is performed based on the working condition state, the control input and the thruster response output, the network parameters are adjusted, an updated dynamic model is generated, the real-time monitoring and modeling of the dynamic characteristics of the thruster are realized, and the control system can adapt to the change of the performance of the thruster in time; and by utilizing an action evaluation algorithm, combining the updated dynamic model, the control input and the thruster response output, evaluating the control performance through a reward function, generating an adjustment strategy of the controller parameters, wherein the adjustment strategy can help the system to realize autonomous adjustment of the controller parameters so as to adapt to different working states and environmental changes and realize optimization of the self-adaptive controller parameters.
In an alternative embodiment, updating the local model in real time by using a model optimization network, and generating an updated dynamic model by adjusting network parameters through online learning based on the working condition state, the control input and the response output of the ammonia working medium thruster includes:
Acquiring the working condition state and control input of the ammonia working medium thruster;
based on a pre-trained kernel extreme learning machine network, inputting the working condition state and the control input, calculating kernel function values between the working condition state and control input and pre-acquired training samples by using a Gaussian kernel function, and taking the kernel function values as the kernel function outputs of the hidden layer nodes;
Solving network output weights through a least square method based on kernel function output of the hidden layer node to obtain optimized local model parameters; and updating the local model of the ammonia working medium thruster in real time based on the optimized local model parameters.
Firstly, acquiring working condition states and control inputs of an ammonia working medium thruster in real time through a sensor and a control system, wherein the working condition states reflect the current working condition of the thruster, and the control inputs refer to control instructions applied to the thruster at the current moment;
next, the local model is updated online using a pre-trained kernel extreme learning machine network. The kernel extreme learning machine is a single-hidden-layer feedforward neural network with advantages such as fast training speed and strong generalization capability. In the pre-training stage, a large amount of historical working condition data is used to initialize the input weights and hidden layer node parameters of the network, which are then kept fixed. In the online updating stage, the currently acquired working condition state and control input are fed into the pre-trained network, and the intermediate output of the network is obtained through the kernel function transformation of the hidden layer nodes.
The kernel function is a key component of the kernel extreme learning machine and is used for mapping input data to a high-dimensional feature space to realize a nonlinear transformation. The kernel function preferably uses a Gaussian kernel function, which is calculated separately for each hidden layer node for the current input working condition state and control input. The kernel function value represents the similarity between the current input and the hidden layer node: the higher the similarity, the larger the kernel function value. The calculated kernel function values are taken as the outputs of the hidden layer nodes and passed to the output layer of the network.
At the output layer, the network output weight is solved by a least square method, so that the error between the network output and the actual measured value is minimized. Because the parameters of the hidden layer node are already determined in the pre-training stage, the fast training of the network can be realized by only adjusting the output weight. And solving to obtain output weights, namely the updated local model parameters.
And finally, replacing the original local model parameters with the updated local model parameters to realize the real-time updating of the local model. The updated local model can more accurately describe the dynamic characteristics of the ammonia working medium thruster under the current working condition, and reliable priori information is provided for the design of the controller. The change of the dynamic characteristics of the thruster can be effectively adapted by updating the local model on line, and the self-adaption capability and the robustness of the control system are improved.
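A minimal sketch of this update step, assuming small illustrative (state, control input) → response samples and hypothetical kernel width and regularization values:

```python
import math

# Sketch of the KELM online update: Gaussian-kernel hidden layer plus
# least-squares output weights. All training data are illustrative.

def gaussian_kernel(x, c, gamma=1.0):
    return math.exp(-gamma * sum((xi - ci) ** 2 for xi, ci in zip(x, c)))

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def kelm_fit(samples, targets, gamma=1.0, reg=1e-3):
    """Least-squares output weights beta solving (K + reg*I) beta = y."""
    n = len(samples)
    K = [[gaussian_kernel(samples[i], samples[j], gamma) + (reg if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve_linear(K, targets)

def kelm_predict(x, samples, beta, gamma=1.0):
    return sum(b * gaussian_kernel(x, s, gamma) for b, s in zip(beta, samples))

# Hypothetical (state, control-input) pairs mapped to a thrust response
X = [[0.0, 0.1], [0.5, 0.4], [1.0, 0.9], [1.5, 1.2]]
y = [0.05, 0.45, 0.95, 1.40]
beta = kelm_fit(X, y)
print(round(kelm_predict([0.5, 0.4], X, beta), 2))
```

Because the hidden-layer structure is fixed in pre-training, only the linear system for the output weights is solved online, which is what makes the update fast.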
Wherein the online updating of the local model is performed in parallel with the switching of the controller. In each control period, an adapted local model and controller are selected according to the current working condition state, and then the adapted local model is updated by using the kernel extreme learning machine network. The updated local model will be used for the design and switching of the controller in the next control cycle. Through this strategy of real-time updating and switching, accurate and stable control of the thrust output of the ammonia working medium thruster can be achieved, meeting the high-precision and high-reliability requirements of spacecraft attitude and orbit control.
In the embodiment, the local model is updated online by utilizing the pre-trained kernel extreme learning machine network, so that the model parameters can be quickly adjusted to describe the dynamic characteristics of the thruster more accurately and provide reliable prior information; the fast training characteristic of the kernel extreme learning machine network realizes rapid updating of the local model parameters and rapid adaptation of the control system to the dynamic characteristics of the thruster; the online updating of the local model and the switching of the controller are performed in parallel, and this parallel strategy ensures that the latest local model parameters are available for controller design and switching in each control period, so that accurate and stable thrust output control is realized; through the parallel strategy of updating the local model and the controller in real time, accurate and stable control of the thrust output of the ammonia working medium thruster is achieved, meeting the high-precision and high-reliability requirements of spacecraft attitude and orbit control.
In an alternative embodiment, using an action evaluation algorithm, based on the updated dynamic model, in combination with the control input and the response output, evaluating control performance through a reward function, generating an adjustment strategy for controller parameters, self-tuning the controller parameters of the local controller, and obtaining adaptive controller parameters includes:
the action evaluation algorithm is constructed based on a reinforcement learning method;
Initializing a global action network, a global evaluation network and a plurality of parallel processing units;
For each parallel processing unit, the following operations are performed:
copying parameters from the global action network and the global evaluation network, generating a local action network and a local evaluation network, interacting with a controlled object through the local action network based on an updated dynamic model, collecting sample data and storing the sample data in a local data cache;
Extracting a group of sample data from a local data cache, calculating the gradient of the local evaluation network, and uploading the gradient to a global evaluation network;
Based on the sample data, calculating a strategy updating direction and a step length of a local action network by constructing a trust domain strategy optimization algorithm in advance, uploading the updating direction and the step length to a global action network, asynchronously updating the global action network, and generating an updated global action network;
Generating an adjustment strategy of the controller parameters by utilizing the updated global action network;
based on the adjustment strategy, self-setting the controller parameters of the local controller to obtain self-adaptive controller parameters;
Controlling the controlled object based on the self-adaptive controller parameters, and calculating a reward function value according to an actual control result;
feeding back the reward function value to the global evaluation network, and updating the global evaluation network by combining an actual control result;
And repeating the operation of the parallel processing unit until the global action network and the global evaluation network reach preset control targets, and determining optimal self-adaptive controller parameters.
Firstly, initializing a global action network and a global evaluation network for generating an adjustment strategy and an evaluation control effect of controller parameters, and simultaneously creating a plurality of parallel processing units, wherein each unit independently executes a reinforcement learning process and does not directly communicate with each other.
For each parallel processing unit, parameters are copied from the global action network and the global evaluation network to generate a local action network and a local evaluation network, allowing each unit to independently explore control strategies and improving the diversity of sample data; next, the local action network interacts with the controlled object, generates a control input according to the current state, and observes the state and reward value at the next moment. The process is based on the dynamic model updated in real time so as to adapt to changes in the characteristics of the controlled object, and the acquired sample data, including state transitions and reward values, are stored in a local data cache for subsequent network training;
After a certain number of samples are accumulated in the local data cache, a group of sample data is randomly extracted from it, and the gradient of the local evaluation network is calculated. The purpose of the evaluation network is to estimate the value function of state-action pairs, i.e. the long-term cumulative reward that can be achieved by taking a certain action in the current state. The gradient information reflects how closely the current evaluation network approximates the true value function, and it is uploaded to the global evaluation network for updating the evaluation network parameters;
In addition to evaluating the training of the network, the sample data is also used to optimize the policies of the action network; by adopting a trust domain policy optimization algorithm, the stability of the updating process is ensured while the policy performance is improved by limiting the updating amplitude of the policy; specifically, in each parallel unit, based on local sample data, calculating the direction and step length of action network strategy updating, uploading the updating direction and step length to a global action network, asynchronously updating parameters of the global action network, and adopting an asynchronous updating mechanism can reduce communication overhead among the parallel units and improve training efficiency;
The updated global action network is used to generate an adjustment policy for the controller parameters. Based on the adjustment strategy, self-setting the parameters of the local controller to obtain the parameters of the self-adaptive controller; the parameter of the self-adaptive controller reflects the optimal control strategy under the current dynamic environment, and can quickly respond to the change of the characteristics of the controlled object;
controlling the controlled object by utilizing the self-adaptive controller parameters, and calculating a reward function value according to the actual control effect, wherein the reward function is used for measuring the quality of the control effect and is a key feedback signal for reinforcement learning; and feeding back the reward function value and the actual control result to the global evaluation network for further optimization of the evaluation network parameters.
Each parallel processing unit is independently and repeatedly performed, the global action network and the evaluation network are continuously updated until a preset convergence condition is met or a given control target is reached, and the parallel learning mechanism can fully utilize the computing resources of a plurality of processing units and accelerate the convergence process of reinforcement learning;
finally, the global action network obtained through reinforcement learning can generate an optimal controller parameter adjustment strategy to realize the self-adaptive setting of the controller parameters.
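The full asynchronous actor-critic machinery does not fit in a short snippet, so the sketch below replaces the action and evaluation networks with a simple random-search stand-in; it only illustrates the reward-feedback loop in which candidate controller parameters are proposed, scored by a reward function on a hypothetical plant model, and kept when they improve the reward:

```python
import random

# Simplified stand-in for the reward-driven parameter tuning loop. The
# first-order plant model and reward shape are illustrative assumptions,
# not the patent's actual thruster dynamics.

def simulate(gain, steps=50, target=1.0):
    """Run the candidate gain on a crude plant; return accumulated reward."""
    state, reward = 0.0, 0.0
    for _ in range(steps):
        error = target - state
        u = gain * error                  # control law with the candidate gain
        state += 0.1 * (u - 0.2 * state)  # crude thruster response model
        reward += -abs(target - state)    # reward penalizes tracking error
    return reward

random.seed(0)
best_gain, best_reward = 0.1, simulate(0.1)
for _ in range(200):                      # "action" proposes a perturbed gain
    candidate = best_gain + random.uniform(-0.5, 0.5)
    if candidate <= 0:
        continue
    r = simulate(candidate)               # "evaluation" scores it via the reward
    if r > best_reward:                   # keep parameters that improve the reward
        best_gain, best_reward = candidate, r
print(round(best_gain, 2))
```

The same propose/score/keep structure is what the global action and evaluation networks implement, with the proposal step learned rather than random and the scoring distributed across parallel units.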
In the embodiment, by creating a plurality of parallel processing units, each unit independently executes the reinforcement learning process, and does not directly communicate with each other, the computing resources of the plurality of processing units can be fully utilized, the convergence process of reinforcement learning is accelerated, and the training efficiency is improved; the dynamic model updated in real time is utilized to adapt to the change of the characteristics of the controlled object, wherein the reinforcement learning process is based on sample data collected by the dynamic model, and the sample data comprises state transition and rewarding values, so that the control system can quickly respond to the change of the characteristics of the controlled object; by adopting an asynchronous updating mechanism, the method improves the strategy performance and ensures the stability of the updating process by limiting the updating amplitude of the strategy, reduces the communication overhead among parallel units and improves the training efficiency; generating an optimal controller parameter adjustment strategy through the global action network obtained through reinforcement learning, realizing the self-adaptive setting of the controller parameters, enabling the controller to quickly respond to the change under the dynamic environment, and improving the adaptability and performance of the system; and calculating a reward function value according to the actual control effect, feeding back to the global evaluation network, further optimizing the evaluation network parameters, continuously optimizing the evaluation network, improving the accurate evaluation of the control effect, and optimizing the control strategy.
In an alternative embodiment, calculating the policy update direction and step size of the local action network by the pre-constructed trust domain policy optimization algorithm based on the sample data comprises:
the policy update direction is calculated as follows:

∇_θ J(θ) = (1/N) · Σ_{i=1}^{N} ∇_θ log π_θ(a_i|s_i) · A^{π_θ}(s_i, a_i)

F = (1/N) · Σ_{i=1}^{N} ∇_θ log π_θ(a_i|s_i) · [∇_θ log π_θ(a_i|s_i)]^T

F · d = ∇_θ J(θ)

wherein, ∇_θ represents the operator for gradient calculation with respect to the policy parameter θ, J(θ) represents the cumulative return of the current policy under the policy parameter θ, N represents the total number of samples, i represents the sample number, π_θ(a_i|s_i) represents the probability of selecting the action a_i in the state s_i under the policy parameter θ, ∇_θ log π_θ(a_i|s_i) represents the logarithmic policy gradient of the i-th sample, A^{π_θ}(s_i, a_i) represents the advantage function value of taking the action a_i in the state s_i, F represents the information matrix measuring curvature information in different directions of the policy parameter space, and d represents the policy update direction vector.
The formula for calculating the policy update direction is composed of three parts:

The first part is the calculation formula of the policy gradient. It represents the gradient of the current policy with respect to the policy parameters and measures the influence of small changes of the policy parameters on the cumulative return; it is obtained by averaging the contributions of all samples. For each sample, the contribution is the product of two terms: one is the logarithmic policy gradient of taking the corresponding action in a given state, representing the influence of a small change of the policy parameters on the probability of taking that action; the other is the advantage function value, which indicates how much better than average the corresponding action is in that state;
the second part is a calculation formula of an information matrix, and the information matrix measures curvature information in different directions in a strategy parameter space and reflects shape characteristics of strategy distribution. The calculation of the information matrix is also based on averaging all samples, the contribution of each sample is the outer product of the logarithmic strategy gradient and its transpose, the information matrix can be regarded as a Riemann metric of the strategy parameter space, and the information matrix is used for correcting the gradient direction in the natural gradient method;
The third part obtains a final strategy updating direction by solving a linear equation set by utilizing the results of the first two parts, wherein the whole equation set represents the variation direction of mapping the strategy gradient vector to the parameter space, so that under the measurement of an information matrix, the inner product of the variation direction and the gradient vector is equal to the modular length of the gradient vector, the variation direction is as consistent as possible with the gradient direction on the basis of correcting the parameter space geometry, represents the fastest rising direction, solves the strategy updating direction obtained by the equation set, comprehensively considers the gradient information and the parameter space curvature information, and can accelerate the convergence of the strategy;
According to the formula, the gradient of the strategy parameters can be effectively estimated by measuring the gradient of the current strategy relative to the strategy parameters, namely the influence of the tiny change of the strategy parameters on the accumulated return and averaging the contributions of all samples, so that the optimization algorithm can be helped to converge to the optimal solution more quickly; by measuring curvature information in different directions in the strategy parameter space and reflecting shape characteristics of strategy distribution, an information matrix can be effectively estimated, and an optimization algorithm is helped to more accurately select a proper strategy updating direction; the final strategy updating direction is obtained by solving the linear equation set, gradient information and parameter space curvature information are comprehensively considered, so that the strategy updating direction can guide parameter updating more accurately, the convergence process of the strategy is accelerated, and the efficiency and stability of an optimization algorithm are improved; comprehensively, by calculating the strategy updating direction and the step length, the effective adjustment of strategy parameters is realized, and the performance and the convergence speed of an optimization algorithm are improved.
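Worked on made-up two-parameter sample data, the three parts above can be sketched as follows (the per-sample score vectors and advantage values are illustrative, and a small damping term is added to the diagonal to keep the information matrix invertible):

```python
# Sketch of the natural-gradient direction: g = mean of A_i * grad_log_pi_i,
# F = mean of the score outer products, then solve F d = g.
# All sample values are illustrative.

def mat_vec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def solve2(F, g):
    """Solve a 2x2 system F d = g by Cramer's rule."""
    det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    return [(g[0] * F[1][1] - g[1] * F[0][1]) / det,
            (F[0][0] * g[1] - F[1][0] * g[0]) / det]

log_pi_grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # per-sample score vectors
advantages = [0.5, -0.2, 0.8]                         # per-sample advantages
N = len(advantages)

# Policy gradient g and information matrix F (damping eps on the diagonal)
g = [sum(a * gi[k] for a, gi in zip(advantages, log_pi_grads)) / N
     for k in range(2)]
F = [[sum(gi[r] * gi[c] for gi in log_pi_grads) / N + (1e-6 if r == c else 0.0)
      for c in range(2)] for r in range(2)]

d = solve2(F, g)   # natural-gradient update direction, satisfies F d = g
print([round(x, 3) for x in d])
```

Solving F d = g (rather than stepping along g directly) corrects the raw gradient for the curvature of the policy parameter space, which is the role the text assigns to the information matrix.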
S103, according to the updated dynamic model and the self-adaptive controller parameters, combining the state constraint and the control input constraint of the ammonia working medium thruster, taking the minimized cost function as a target, obtaining a global optimal control input sequence through a control optimizing algorithm, taking the first element of the global optimal control input sequence as a current control instruction, taking the other elements of the global optimal control input sequence as reference control tracks, and realizing thrust stability control based on the current control instruction and the reference control tracks.
The state constraint specifically refers to a state constraint condition which must be met by the ammonia working substance thruster in the running process. The state limiting conditions may include various limitations of pressure, temperature, mass flow rate and the like of the thruster so as to ensure that the thruster operates within a safe range without damage or failure;
the control input constraint specifically refers to a constraint on the control input, i.e., a constraint on the control command applied by the thruster. The limiting conditions may include limitations on the valve opening of the thruster, the magnitude of the thrust, etc., to ensure that the control operation of the thruster is performed within acceptable limits without causing overload or other adverse effects;
the global optimal control input sequence specifically refers to an optimal control input sequence obtained by taking a minimum cost function as a target under the condition of considering state constraint and control input constraint. The control input sequence is calculated by an optimizing algorithm to ensure that the operation effect of the thruster is optimal under the given constraint condition;
In the embodiment, the state constraint and the control input constraint are considered, so that the thruster can be ensured to be always in a safe and stable working state in the running process, and meanwhile, the operation beyond the capacity range of the thruster is avoided; taking the minimized cost function as an optimization target, so that the system can more efficiently reach the expected performance target in the control process; through a control optimizing algorithm, the system can find a globally optimal control input sequence, and the optimization of the thruster control is realized on the premise of meeting constraint conditions; the first element of the global optimal control input sequence is used as a current control instruction, and the other elements are used as reference control tracks, so that the system can stably control the thruster according to the real-time condition and keep running on the optimal tracks.
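A small sketch of this receding-horizon usage, with hypothetical bounds and sequence values (simple clamping stands in for the control input constraint):

```python
# Sketch: the first element of the optimal sequence is applied now, the
# clamped tail serves as the reference trajectory until the next solve.
# Bounds and sequence values are illustrative.

U_MIN, U_MAX = 0.0, 1.5             # e.g. valve-opening limits

def apply_input_constraint(u):
    return min(max(u, U_MIN), U_MAX)

optimal_sequence = [0.92, 0.95, 1.70, 0.98, 1.00]   # raw optimizer output
bounded = [apply_input_constraint(u) for u in optimal_sequence]

current_command = bounded[0]        # applied to the thruster this cycle
reference_trajectory = bounded[1:]  # tracked until the next optimization

print(current_command, reference_trajectory)
```

At the next control period the optimization is re-solved from the new measured state, so only the first element of each sequence is ever actuated.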
In an alternative embodiment, according to the updated dynamic model and the adaptive controller parameter, in combination with a state constraint and a control input constraint of the ammonia working substance thruster, the obtaining the global optimal control input sequence by controlling the optimizing algorithm with the objective of minimizing the cost function includes:
establishing state constraint and control input constraint of the ammonia working medium thruster;
Determining a thrust error quadratic function based on the error between the actual thrust and the target thrust, and constructing a thrust error term; determining a control input difference quadratic function based on the difference between control inputs at adjacent moments, and constructing a control input change rate item; determining a penalty function based on the violation state constraint and a penalty value generated by violating the control input constraint, and constructing a constraint violation penalty item; constructing a cost function based on the thrust error term, the control input change rate term and the constraint violation penalty term;
Constructing a control optimizing algorithm by taking the minimized cost function as a target;
And in each iteration process, the control optimizing algorithm constructs a quadratic programming based on the gradient and the Hessian matrix at the current iteration point, solves the quadratic programming to obtain the increment of the control input sequence, updates the iteration point, judges whether the preset convergence condition is met, enters the next iteration if the preset convergence condition is not met, otherwise, terminates the iteration, and outputs the current corresponding control input sequence to obtain the global optimal control input sequence.
Firstly, establishing state constraint and control input constraint aiming at an ammonia working medium thruster, wherein the constraint reflects the safe working interval of the thruster and the physical limit of a controller, and the physical limit must be strictly satisfied in the optimizing process;
Next, a cost function is constructed to evaluate the control performance, the cost function being composed of three parts: a thrust error term, a control input rate of change term, and a constraint violation penalty term; the deviation between the actual thrust and the target thrust is measured by a thrust error term, the size of the error is described by adopting a quadratic function, and the larger the error is, the higher the cost function value is; the control input change rate term considers the difference value between the control inputs at adjacent moments, and is also described by using a quadratic function, so that the control smoothness is improved for inhibiting the severe change of the control inputs; the constraint violation punishment item punishs the condition of violating the state constraint or controlling the input constraint, the punishment value is in direct proportion to the degree of violating the constraint, the three items are weighted and summed to obtain a complete cost function expression, and the control performance is comprehensively evaluated;
After determining the cost function, constructing a control optimizing algorithm by taking the minimized cost function as a target, and finding out a global optimal control input sequence through iterative search by adopting a quadratic programming method based on a gradient and a Hessian matrix; specifically, in each iteration process, firstly, calculating a gradient of a cost function on a control input and a Hessian matrix at a current iteration point, wherein the gradient provides direction information of the decrease of the cost function value, the Hessian matrix characterizes local curvature characteristics of the cost function, and a quadratic programming sub-problem is constructed, and the objective of the quadratic programming sub-problem is to find an increment of the control input sequence under the condition of meeting state constraint and control input constraint, so that the cost function value is decreased to some extent;
Solving the quadratic programming sub-problem yields the optimal control input increment at the current iteration point; the increment is added to the current control input sequence, the iteration point is updated, and the next iteration begins. During the iterations, it is continuously checked whether a preset convergence condition is met, for example whether the decrease of the cost function value is smaller than a given threshold, or whether the number of iterations has reached an upper limit. Once the convergence condition is met, indicating that a locally optimal solution has been found, the iterative process terminates and the current control input sequence is output as the global optimal control input sequence;
The iterative optimization method based on quadratic programming utilizes the gradient of the cost function and Hessian information, gives consideration to the descending direction and the step length in the searching process, and can quickly converge to the local optimal solution. Optionally, the probability of finding the globally optimal solution can be further improved by designing reasonable starting points and convergence conditions. Meanwhile, each iteration strictly meets the state constraint and the control input constraint, so that the obtained control input sequence can be ensured to be physically feasible, and the constraint violation problem is avoided;
and finally, acting the optimized global optimal control input sequence on the ammonia working medium thruster to realize the accurate tracking control of the thrust.
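The iteration can be sketched as follows; note this simplified version replaces the quadratic-programming sub-problem with a damped Newton-type step using a diagonal Hessian bound and clipping to the input constraint, and all weights and targets are illustrative:

```python
# Sketch of the iterative optimization loop: use the gradient and a
# (diagonal) Hessian bound of the cost to step the control sequence,
# clip to the input constraint, stop when the cost decrease stalls.

TARGET = [1.0, 1.0, 1.0]            # target thrust over the horizon
W_ERR, W_RATE = 1.0, 0.1            # weights on error and input change rate
U_MIN, U_MAX = 0.0, 1.5

def cost(u):
    err = sum(W_ERR * (ui - t) ** 2 for ui, t in zip(u, TARGET))
    rate = sum(W_RATE * (u[k] - u[k - 1]) ** 2 for k in range(1, len(u)))
    return err + rate

def gradient(u):
    g = [2 * W_ERR * (ui - t) for ui, t in zip(u, TARGET)]
    for k in range(1, len(u)):
        d = 2 * W_RATE * (u[k] - u[k - 1])
        g[k] += d
        g[k - 1] -= d
    return g

u = [0.0, 0.0, 0.0]                 # initial control input sequence
prev = cost(u)
for _ in range(100):
    g = gradient(u)
    hess_diag = 2 * W_ERR + 4 * W_RATE   # conservative diagonal Hessian bound
    u = [min(max(ui - gi / hess_diag, U_MIN), U_MAX)
         for ui, gi in zip(u, g)]        # damped step + input constraint
    c = cost(u)
    if prev - c < 1e-10:            # convergence check on the cost decrease
        break
    prev = c

print([round(ui, 3) for ui in u])
```

With the thrust-error term dominating, the sequence converges to the target profile while every iterate remains inside the input bounds, mirroring the requirement that each iteration strictly satisfy the constraints.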
In the embodiment, the cost function consists of a thrust error term, a control input change rate term and a constraint violation penalty term, and multiple aspects of control performance are comprehensively considered, so that the control performance can be effectively estimated through construction of the cost function, and adjustment of a controller is guided in the optimizing process; the quadratic programming method based on the gradient and the Hessian matrix is adopted for control optimization, and a globally optimal control input sequence can be quickly found under the condition that constraint conditions are met; by means of iterative search, the system can effectively adjust control input to minimize a cost function and realize accurate tracking control of the thruster; the quadratic programming method utilizes the gradient of the cost function and Hessian information, considers the descending direction and the step length, can quickly converge to the local optimal solution, and improves the response speed and the performance of the system.
In an alternative embodiment, constructing a cost function based on the thrust error term, the control input rate of change term, and the constraint violation penalty term includes:
the cost function has the following formula:

J_1 = Σ_{k=1}^{K} [ e_k · exp(α_1 · |e_k|) ]²

J_2 = Σ_{k=1}^{K} [ Δu_k · exp(α_2 · |Δu_k|) ]²

J_3 = Σ_{k=1}^{K} { Σ_{j=1}^{m} [ max(g_j(x_k), 0) · exp(β_1 · max(g_j(x_k), 0)) ]² + Σ_{l=1}^{n} [ max(h_l(u_k), 0) · exp(β_2 · max(h_l(u_k), 0)) ]² }

J = γ_1 · J_1 + γ_2 · J_2 + γ_3 · J_3

Wherein J_1 denotes the thrust error term, k denotes the time instant, K denotes the total number of instants, e_k denotes the thrust error at the k-th instant, α_1 denotes the thrust error sensitivity coefficient, J_2 denotes the control input change rate term, Δu_k denotes the difference between adjacent control inputs at the k-th instant, α_2 denotes the control input change rate sensitivity coefficient, J_3 denotes the constraint violation penalty term, j denotes the state constraint number, m denotes the total number of state constraints, g_j(x_k) denotes the function value of the j-th state constraint at the k-th instant, β_1 denotes the sensitivity coefficient for violating state constraints, l denotes the control input constraint number, n denotes the total number of control input constraints, h_l(u_k) denotes the function value of the l-th control input constraint at the k-th instant, β_2 denotes the sensitivity coefficient for violating control input constraints, J denotes the cost function, γ_1 denotes the contribution weight of the thrust error term, γ_2 denotes the contribution weight of the control input change rate term, and γ_3 denotes the contribution weight of the constraint violation penalty term.
The formula describes the construction of the cost function, which comprises three main parts (a thrust error term, a control input change rate term and a constraint violation penalty term) together with their combination;
To calculate the thrust error term, the absolute value of the thrust error at each time instant is taken and then transformed by an exponential function whose argument is the product of the absolute error and a sensitivity coefficient. Squaring the transformed value amounts to a dynamic scaling of the squared error: when the error is small, the scaling factor is close to 1 and the error is barely amplified; when the error is large, the scaling factor grows rapidly and the amplification becomes significant. This treatment highlights the influence of large errors while suppressing the interference of small ones; the squared transformed errors are accumulated to obtain the thrust error term;
The control input change rate term is calculated in the same way as the thrust error term, but operates on the difference between control inputs at adjacent time instants: the absolute value of the control input change at each instant is taken, transformed by the exponential function, squared and accumulated to obtain the control input change rate term. The argument of the exponential function is the product of the absolute change and a sensitivity coefficient, so the change is scaled dynamically and the influence of large changes is highlighted;
The constraint violation penalty term is comparatively complex to calculate and is split into a state constraint violation penalty and a control input constraint violation penalty. For each state constraint, the function value at each time instant is computed and the larger of that value and 0 is taken to represent the degree of violation; the same exponential transformation and squaring are then applied, and the results are accumulated to give the state constraint violation penalty. The control input constraint violation penalty is computed in the same way for each control input constraint at each instant. Adding the state constraint violation penalty and the control input constraint violation penalty yields the total constraint violation penalty term;
Finally, the thrust error term, the control input change rate term and the constraint violation penalty term are combined into the complete cost function expression. The combination takes a logarithmic form: the weighted logarithm of the thrust error term is multiplied by the weighted logarithm of the control input change rate term, the weighted logarithm of the control input change rate term is multiplied by the weighted logarithm of the constraint violation penalty term, the weighted logarithm of the constraint violation penalty term is multiplied by the weighted logarithm of the thrust error term, and the three products are added to give the cost function value. This combination balances the contributions of the three terms; by adjusting the contribution indices in front of the three logarithms, the relative importance of each term can be tuned flexibly, achieving overall optimization of the control performance;
This formulation takes full account of the characteristics of the thrust control problem: by introducing sensitivity coefficients and contribution indices, thrust tracking accuracy and control smoothness can both be maintained while the constraint conditions are satisfied. The exponential transformation and the logarithmic combination make the sensitivity of the cost function to errors and changes adaptive, so that a good balance is achieved under different operating conditions; at the same time, the cost function has convenient mathematical properties for differentiation and optimization, providing a foundation for the design of the subsequent optimizing algorithm.
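The formula image itself is not reproduced in this text, so the following is a hedged reconstruction of the cost function purely from the prose description above (exponentially scaled squared errors and changes, max(·, 0) violation degrees, and pairwise products of weighted logarithms); the function signature and parameter names are illustrative assumptions, not the patent's notation.

```python
import numpy as np

def cost(e, du, g_vals, h_vals, a1, a2, b1, b2, g1, g2, g3):
    """Reconstruction of the described cost function from the prose
    (the patent's own formula image is not available in this text)."""
    # Thrust error term J1: scale each |e_k| by exp(a1*|e_k|), square, sum
    J1 = np.sum((np.abs(e) * np.exp(a1 * np.abs(e))) ** 2)
    # Control input change rate term J2: same treatment applied to Δu_k
    J2 = np.sum((np.abs(du) * np.exp(a2 * np.abs(du))) ** 2)
    # Constraint violation penalty J3: max(·, 0) measures violation degree
    gv = np.maximum(g_vals, 0.0)     # state constraint values g_j(x_k)
    hv = np.maximum(h_vals, 0.0)     # input constraint values h_l(u_k)
    J3 = (np.sum((gv * np.exp(b1 * gv)) ** 2)
          + np.sum((hv * np.exp(b2 * hv)) ** 2))
    # Pairwise products of the weighted logarithms combine the three terms
    L1, L2, L3 = g1 * np.log(J1), g2 * np.log(J2), g3 * np.log(J3)
    return L1 * L2 + L2 * L3 + L3 * L1

value = cost(np.array([1.0]), np.array([1.0]),
             np.array([0.5]), np.array([0.5]),
             0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0)
```

With all sensitivity coefficients set to zero and unit errors, J1 = J2 = 1, so their logarithms vanish and the combined value is zero; this degenerate case only checks the wiring of the reconstruction, not the patent's tuning.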
Fig. 2 is a schematic structural diagram of a thrust stability control system of an ammonia working substance thruster according to an embodiment of the present invention, as shown in fig. 2, the system includes:
The first unit is used for dividing the working interval of the ammonia working medium thruster into a plurality of local areas based on the history working condition of the ammonia working medium thruster, and constructing a local linear model library; selecting a corresponding local model from the local linear model library according to the working condition state of the ammonia working medium thruster, switching a corresponding local controller, and determining a control input;
The second unit is used for updating the local model in real time by utilizing a model optimization network, and generating an updated dynamic model by adjusting network parameters through online learning based on the working condition state, the control input and the response output of the ammonia working medium thruster; using an action evaluation algorithm to evaluate control performance through a reward function based on the updated dynamic model and combining the control input and the response output, generating an adjustment strategy of the controller parameters, and performing self-tuning on the controller parameters of the local controller to obtain self-adaptive controller parameters;
And the third unit is used for obtaining a global optimal control input sequence by taking the minimum cost function as a target and adopting a control optimizing algorithm according to the updated dynamic model and the self-adaptive controller parameters and combining the state constraint and the control input constraint of the ammonia working medium thruster, taking the first element of the global optimal control input sequence as a current control instruction, taking the other elements of the global optimal control input sequence as a reference control track, and realizing thrust stability control based on the current control instruction and the reference control track.
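As one illustration of the first unit's model-selection step (posterior probabilities of the condition-state vector under a pre-trained Gaussian mixture, with the maximum-posterior component taken as the adapted local model), a minimal sketch follows; the means, covariances and priors are hypothetical stand-ins for the pre-trained mixture parameters.

```python
import numpy as np

def select_local_model(x, means, covs, priors):
    """Posterior p(model m | condition vector x) under a Gaussian mixture;
    the model with the maximum posterior is the adapted local model."""
    post = []
    for mu, S, w in zip(means, covs, priors):
        d = x - mu
        k = len(x)
        norm = 1.0 / np.sqrt((2 * np.pi) ** k * np.linalg.det(S))
        lik = norm * np.exp(-0.5 * d @ np.linalg.solve(S, d))
        post.append(w * lik)          # prior-weighted likelihood
    post = np.array(post)
    post /= post.sum()                # normalize to posterior probabilities
    return int(np.argmax(post)), post

# Two hypothetical operating regions in a 2-D condition-state space
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
idx, p = select_local_model(np.array([2.8, 3.1]), means, covs, [0.5, 0.5])
```

Here the measured condition vector lies near the second region's mean, so the second local model (index 1) is selected and its controller would be switched in.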
In a third aspect of an embodiment of the present invention,
There is provided an electronic device including:
A processor;
A memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a fourth aspect of an embodiment of the present invention,
There is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
The present invention may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (9)

1. The thrust stability control method of the ammonia working medium thruster is characterized by comprising the following steps of:
Dividing a working interval of the ammonia working substance thruster into a plurality of local areas based on the history working condition of the ammonia working substance thruster, and constructing a local linear model library; selecting a corresponding local model from the local linear model library according to the working condition state of the ammonia working medium thruster, switching a corresponding local controller, and determining a control input;
Updating the local model in real time by using a model optimization network, and generating an updated dynamic model by on-line learning and adjusting network parameters based on the working condition state, the control input and the response output of the ammonia working medium thruster; using an action evaluation algorithm to evaluate control performance through a reward function based on the updated dynamic model and combining the control input and the response output, generating an adjustment strategy of the controller parameters, and performing self-tuning on the controller parameters of the local controller to obtain self-adaptive controller parameters;
According to the updated dynamic model and the self-adaptive controller parameters, combining the state constraint and the control input constraint of the ammonia working medium thruster, taking the minimized cost function as a target, obtaining a global optimal control input sequence through a control optimizing algorithm, taking the first element of the global optimal control input sequence as a current control instruction, taking the other elements of the global optimal control input sequence as reference control tracks, and realizing thrust stability control based on the current control instruction and the reference control tracks;
Selecting a corresponding local model from the local linear model library according to the working condition state of the ammonia working medium thruster, and switching a corresponding local controller, wherein the determining control input comprises:
collecting measurement signals of the ammonia working medium thruster, generating a working condition state vector, classifying the working condition state vector based on a pre-trained Gaussian mixture model, determining posterior probability of the working condition state vector belonging to each local model, and determining a local controller by taking a local model corresponding to the maximum value in the posterior probability as an adaptive local model;
Based on the local controllers, acquiring a plurality of adjacent local controllers, calculating the control law of each adjacent local controller, determining the fusion factor of each control law, and fusing the control laws of the adjacent local controllers by taking the sum of the fusion factors at each moment as 1 and continuously conducting a switching process as a constraint condition to determine a switching control law;
based on the Lyapunov function stability condition, constructing a linear matrix inequality, and based on the linear matrix inequality, solving all fusion factors in the switching control law to generate a stable and smooth switching control law, thereby obtaining a control input signal.
2. The method of claim 1, wherein updating the local model in real time using a model optimization network, adjusting network parameters via online learning based on the operating state, the control input, and a response output of an ammonia working substance thruster, the generating an updated dynamic model comprising:
Acquiring the working condition state and control input of the ammonia working medium thruster;
based on a pre-trained kernel extreme learning machine network, calculating kernel function values between the working condition state and the control input and a pre-acquired training sample by utilizing a Gaussian kernel function through inputting the working condition state and the control input, and outputting the kernel function values as kernel functions of hidden layer nodes;
Solving network output weights through a least square method based on kernel function output of the hidden layer node to obtain optimized local model parameters; and updating the local model of the ammonia working medium thruster in real time based on the optimized local model parameters.
3. The method of claim 1, wherein using an action evaluation algorithm to evaluate control performance via a reward function based on the updated dynamic model in combination with the control input and the response output to generate an adjustment strategy for controller parameters, wherein self-tuning the controller parameters of the local controller to obtain adaptive controller parameters comprises:
the action evaluation algorithm is constructed based on a reinforcement learning method;
Initializing a global action network, a global evaluation network and a plurality of parallel processing units;
For each parallel processing unit, the following operations are performed:
copying parameters from the global action network and the global evaluation network, generating a local action network and a local evaluation network, interacting with a controlled object through the local action network based on an updated dynamic model, collecting sample data and storing the sample data in a local data cache;
Extracting a group of sample data from a local data cache, calculating the gradient of the local evaluation network, and uploading the gradient to a global evaluation network;
Based on the sample data, calculating a strategy updating direction and a step length of a local action network by constructing a trust domain strategy optimization algorithm in advance, uploading the updating direction and the step length to a global action network, asynchronously updating the global action network, and generating an updated global action network;
Generating an adjustment strategy of the controller parameters by utilizing the updated global action network;
based on the adjustment strategy, self-setting the controller parameters of the local controller to obtain self-adaptive controller parameters;
Controlling the controlled object based on the self-adaptive controller parameters, and calculating a reward function value according to an actual control result;
feeding back the reward function value to the global evaluation network, and updating the global evaluation network by combining an actual control result;
And repeating the operation of the parallel processing unit until the global action network and the global evaluation network reach preset control targets, and determining optimal self-adaptive controller parameters.
4. A method according to claim 3, wherein calculating the policy update direction and step length of the local action network by a pre-constructed trust domain policy optimization algorithm based on the sample data comprises:
the policy update direction is calculated as follows:
wherein ∇_θ denotes the gradient operator with respect to the policy parameter θ, J(θ) denotes the cumulative return of the current policy under the policy parameter θ, N denotes the total number of samples, i denotes the sample number, π_θ(a_i|s_i) denotes the probability of selecting action a_i in state s_i under the policy parameter θ, ∇_θ log π_θ(a_i|s_i) denotes the logarithmic policy gradient of the i-th sample, A^{π_θ}(s_i, a_i) denotes the advantage function value of taking action a_i in state s_i, F denotes the information matrix carrying curvature information in different directions of the policy parameter space, and d denotes the policy update direction vector.
5. The method of claim 1, wherein based on the updated dynamic model and the adaptive controller parameters, in combination with a state constraint and a control input constraint of an ammonia-based thruster, targeting a minimization of a cost function, obtaining a globally optimal control input sequence by a control optimization algorithm comprises:
establishing state constraint and control input constraint of the ammonia working medium thruster;
Determining a thrust error quadratic function based on the error between the actual thrust and the target thrust, and constructing a thrust error term; determining a control input difference quadratic function based on the difference between control inputs at adjacent moments, and constructing a control input change rate item; determining a penalty function based on the violation state constraint and a penalty value generated by violating the control input constraint, and constructing a constraint violation penalty item; constructing a cost function based on the thrust error term, the control input change rate term and the constraint violation penalty term;
Constructing a control optimizing algorithm by taking the minimized cost function as a target;
And in each iteration process, the control optimizing algorithm constructs a quadratic programming based on the gradient and the Hessian matrix at the current iteration point, solves the quadratic programming to obtain the increment of the control input sequence, updates the iteration point, judges whether the preset convergence condition is met, enters the next iteration if the preset convergence condition is not met, otherwise, terminates the iteration, and outputs the current corresponding control input sequence to obtain the global optimal control input sequence.
6. The method of claim 5, wherein constructing a cost function based on the thrust error term, the control input rate of change term, and the constraint violation penalty term comprises:
the cost function has the following formula:
Wherein J1 denotes the thrust error term, k denotes a time instant, K denotes the total number of time instants, e_k denotes the thrust error at the k-th instant, α1 denotes the thrust error sensitivity coefficient, J2 denotes the control input change rate term, Δu_k denotes the difference between adjacent control inputs at the k-th instant, α2 denotes the control input change rate sensitivity coefficient, J3 denotes the constraint violation penalty term, j denotes the state constraint sequence number, m denotes the total number of state constraints, g_j(x_k) denotes the function value of the j-th state constraint at the k-th instant, β1 denotes the sensitivity coefficient for violating a state constraint, l denotes the control input constraint sequence number, n denotes the total number of control input constraints, h_l(u_k) denotes the function value of the l-th control input constraint at the k-th instant, β2 denotes the sensitivity coefficient for violating a control input constraint, J denotes the cost function, γ1 denotes the thrust error term contribution index, γ2 denotes the control input change rate term contribution index, and γ3 denotes the constraint violation penalty term contribution index.
7. An ammonia working substance thruster thrust stability control system for implementing the method according to any one of the preceding claims 1-6, characterized in that it comprises:
The first unit is used for dividing the working interval of the ammonia working medium thruster into a plurality of local areas based on the history working condition of the ammonia working medium thruster, and constructing a local linear model library; selecting a corresponding local model from the local linear model library according to the working condition state of the ammonia working medium thruster, switching a corresponding local controller, and determining a control input;
The second unit is used for updating the local model in real time by utilizing a model optimization network, and generating an updated dynamic model by adjusting network parameters through online learning based on the working condition state, the control input and the response output of the ammonia working medium thruster; using an action evaluation algorithm to evaluate control performance through a reward function based on the updated dynamic model and combining the control input and the response output, generating an adjustment strategy of the controller parameters, and performing self-tuning on the controller parameters of the local controller to obtain self-adaptive controller parameters;
And the third unit is used for obtaining a global optimal control input sequence by taking the minimum cost function as a target and adopting a control optimizing algorithm according to the updated dynamic model and the self-adaptive controller parameters and combining the state constraint and the control input constraint of the ammonia working medium thruster, taking the first element of the global optimal control input sequence as a current control instruction, taking the other elements of the global optimal control input sequence as a reference control track, and realizing thrust stability control based on the current control instruction and the reference control track.
8. An electronic device, comprising:
A processor;
A memory for storing processor-executable instructions;
Wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of claims 1 to 6.
9. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 6.
CN202410940206.6A 2024-07-15 2024-07-15 Ammonia working medium thruster thrust stability control method and system Active CN118494790B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410940206.6A CN118494790B (en) 2024-07-15 2024-07-15 Ammonia working medium thruster thrust stability control method and system

Publications (2)

Publication Number Publication Date
CN118494790A CN118494790A (en) 2024-08-16
CN118494790B true CN118494790B (en) 2024-10-15

Family

ID=92242939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410940206.6A Active CN118494790B (en) 2024-07-15 2024-07-15 Ammonia working medium thruster thrust stability control method and system

Country Status (1)

Country Link
CN (1) CN118494790B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019030949A1 (en) * 2017-08-10 2019-02-14 Mitsubishi Electric Corporation Spacecraft, and control system for controlling operation of spacecraft
CN113868961A (en) * 2021-10-18 2021-12-31 哈尔滨理工大学 A power tracking control method for nuclear power system based on adaptive value iteration

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753468B (en) * 2020-06-28 2021-09-07 中国科学院自动化研究所 Self-learning optimal control method and system for elevator system based on deep reinforcement learning
CN113885328A (en) * 2021-10-18 2022-01-04 哈尔滨理工大学 Nuclear power tracking control method based on integral reinforcement learning
CN114967676B (en) * 2022-04-12 2025-02-07 苏州感测通信息科技有限公司 Model predictive control trajectory tracking control system and method based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant