CN120335379B - Intelligent control method for satellite energy system and storage medium - Google Patents

Intelligent control method for satellite energy system and storage medium

Info

Publication number
CN120335379B
Authority
CN
China
Prior art keywords
data
energy system
state
satellite
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202510813906.3A
Other languages
Chinese (zh)
Other versions
CN120335379A (en
Inventor
薛均晓
李超
魏宁
王军
王聪
孙立剑
林承君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202510813906.3A
Publication of CN120335379A
Application granted
Publication of CN120335379B
Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The present application relates to a satellite energy system intelligent control method and storage medium, wherein the method includes: obtaining real-time operating data of the satellite energy system; inputting the real-time operating data into the satellite energy system environment model, using the large language sub-model in the environment model to perform semantic understanding and feature association analysis on the real-time operating data, and extracting key features; using the intelligent decision-making framework in the environment model to generate state data corresponding to the state space based on the key features; based on the key features and state data, generating action data corresponding to the action space, and based on the key features, state data and reward function, generating reward prediction values; generating an optimization strategy based on the action data and reward prediction values; and performing real-time control operations on the satellite energy system based on the optimization strategy. Through this application, the problems of difficulty in adapting to the dynamic space environment and inability to efficiently balance multi-task requirements are solved, and efficient management and optimization of the satellite energy system are achieved.

Description

Intelligent control method for satellite energy system and storage medium
Technical Field
The application relates to the technical field of satellite energy systems, in particular to an intelligent control method and storage medium for a satellite energy system.
Background
With the rapid development of satellite technology and the increasing complexity of space tasks, the management of a satellite energy system becomes a key link for ensuring the stable operation of a satellite and the smooth completion of the tasks. Satellite energy systems need to efficiently manage energy resources in highly dynamic, diverse space environments to address complex mission requirements and environmental challenges.
In the related art, conventional satellite energy management systems mostly adopt rule-based control strategies or simple optimization algorithms. These methods can meet basic requirements in specific scenarios, but they are often inflexible and adapt poorly when facing complex space environments and dynamic tasks, and they cannot effectively cope with emergencies and fluctuations in energy demand. Because they lack the capability for in-depth data analysis, conventional systems often have difficulty accurately predicting potential faults, which increases the risk of system failure. In addition, conventional methods often neglect dynamic changes in environmental conditions and task requirements when formulating an optimization strategy, so energy utilization efficiency is low and system performance cannot be fully exploited.
At present, no effective solution has been proposed for the problems in the related art that a dynamic space environment is difficult to adapt to and multi-task requirements cannot be balanced efficiently.
Disclosure of Invention
The embodiment of the application provides an intelligent control method, device and system for a satellite energy system, an electronic device and a storage medium, which at least solve the problems that the related technology is difficult to adapt to a dynamic space environment and the multi-task requirements cannot be balanced efficiently.
In a first aspect, an embodiment of the present application provides a satellite energy system intelligent control method, including:
Acquiring real-time operation data of a satellite energy system;
Inputting the real-time operation data into a trained satellite energy system environment model, and carrying out semantic understanding and feature association analysis on the real-time operation data by utilizing a large language submodel in the satellite energy system environment model to extract key features;
Generating, by using an intelligent decision framework in the satellite energy system environment model, state data corresponding to a state space based on the key features; generating action data corresponding to an action space based on the key features and the state data; and generating a reward prediction value based on the key features, the state data and a reward function, wherein the intelligent decision framework comprises the state space, the action space and the reward function;
Generating an optimization strategy based on the action data and the reward prediction value, and performing real-time control operation on the satellite energy system based on the optimization strategy.
In some of these embodiments, the method further comprises:
Generating an optimization target and a constraint condition based on the key features, in combination with a preset satellite task target and a preset environment constraint condition;
Defining the reward function based on the optimization target;
Defining the state space and the action space based on the constraint condition.
In some embodiments, the state space includes the current state of charge of the battery, the current output power of the solar panel, the current power consumption requirement of the load equipment, the current environmental conditions, the priority and real-time requirements of the satellite task, and the health status of the energy system;
The action space comprises an angle adjustment amount of the solar panel, a satellite attitude adjustment amount and an energy scheduling strategy adjustment amount;
The sub-objectives of the reward function comprise energy utilization efficiency, system stability and task completion degree.
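As an illustration only, the state space and action space listed above could be held in two small record types. The following is a minimal Python sketch; every field name, unit and the clamping bound is hypothetical and is not specified by the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyState:
    """One point in the state space (field names and units are assumptions)."""
    battery_soc: float        # current state of charge, 0.0 to 1.0
    panel_output_w: float     # current solar panel output power
    load_demand_w: float      # current power consumption requirement of the loads
    illumination_w_m2: float  # one example of a current environmental condition
    task_priority: int        # priority of the active satellite task
    task_deadline_s: float    # stand-in for the task's real-time requirement
    health_index: float       # health status of the energy system, 0.0 to 1.0


@dataclass(frozen=True)
class EnergyAction:
    """One point in the action space (field names and units are assumptions)."""
    panel_angle_delta_deg: float  # solar panel angle adjustment amount
    attitude_delta_deg: float     # satellite attitude adjustment amount
    load_budget_delta_w: float    # stand-in for the scheduling strategy adjustment

    def clamped(self, max_angle_deg: float = 5.0) -> "EnergyAction":
        """Limit both angular adjustments to a hypothetical per-step bound."""
        def clip(x: float) -> float:
            return max(-max_angle_deg, min(max_angle_deg, x))
        return EnergyAction(clip(self.panel_angle_delta_deg),
                            clip(self.attitude_delta_deg),
                            self.load_budget_delta_w)


state = EnergyState(0.62, 180.0, 150.0, 1361.0, 2, 600.0, 0.93)
action = EnergyAction(12.0, -1.5, -20.0).clamped()
```

Frozen records are used here so that a state or action cannot be modified after it has been logged; this is a design choice of the sketch, not a requirement of the method.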
In some of these embodiments, the generating an optimization strategy based on the action data and the reward prediction value includes:
Selecting, from the action data, an optimal action that maximizes the reward prediction value, and generating the optimization strategy based on the optimal action.
In some of these embodiments, the method further comprises:
Performing in-depth analysis on the running state of the satellite energy system by using the large language sub-model, and judging whether the current running state of the satellite energy system is abnormal;
If the current running state is in an abnormal mode, generating early warning information and continuing to use the intelligent decision framework to generate the state data, the action data and the reward prediction value;
If the current running state is not in an abnormal mode, directly using the intelligent decision framework to generate the state data, the action data and the reward prediction value.
In some embodiments, the determining whether the current operating state of the satellite energy system is abnormal includes:
generating a safety state set based on a preset environment constraint condition;
judging whether the current running state of the satellite energy system belongs to the safety state set or not;
if the current running state of the satellite energy system does not belong to the safety state set, the current running state is in an abnormal mode;
and if the current running state of the satellite energy system belongs to the safety state set, the current running state is not in an abnormal mode.
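The safety-set check can be sketched as follows. This is a minimal Python illustration that assumes the safety state set is a box of per-parameter ranges and that a state lying outside the set is the abnormal one; the parameter names and limits are hypothetical.

```python
# Hypothetical safe ranges derived from preset environment constraint conditions.
SAFE_RANGES = {
    "battery_soc": (0.2, 1.0),        # state of charge, fraction
    "battery_temp_c": (-10.0, 45.0),  # degrees Celsius
    "bus_voltage_v": (24.0, 32.0),    # volts
}


def violations(state, safe_ranges=SAFE_RANGES):
    """Return the names of parameters that are missing or outside their range."""
    out = []
    for name, (low, high) in safe_ranges.items():
        value = state.get(name)
        if value is None or not low <= value <= high:
            out.append(name)
    return out


def is_abnormal(state, safe_ranges=SAFE_RANGES):
    """Assumption of this sketch: a state outside the safety set is abnormal."""
    return bool(violations(state, safe_ranges))


def early_warning(state):
    """Build an early-warning message, or None when the state is in the set."""
    bad = violations(state)
    return f"WARNING: out of safe range: {', '.join(bad)}" if bad else None
```

In this sketch a missing reading is treated as a violation, so a sensor dropout also raises a warning rather than passing silently.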
In some of these embodiments, the training process of the satellite energy system environment model includes:
constructing a simulation environment of a satellite energy system and acquiring training data of the satellite energy system;
inputting the training data into an initial environment model, and running the initial environment model in the simulation environment, wherein the initial environment model comprises an initial large language sub-model and an optimization algorithm module;
Performing semantic understanding and feature association analysis on the training data by using an initial large language sub-model in the satellite energy system environment model, generating training key features, constructing a large language loss function, substituting the training key features into the large language loss function for calculation, and adjusting parameters of the initial large language sub-model with the calculation result of the large language loss function minimized as a target to obtain the trained large language sub-model;
Inputting the training key features into the optimization algorithm module, constructing an optimization loss function, and adjusting parameters of the optimization algorithm module with the aim of minimizing the calculation result of the optimization loss function, to obtain the trained intelligent decision framework;
And obtaining the trained satellite energy system environment model based on the large language sub-model and the intelligent decision framework.
In some of these embodiments, the acquiring training data of the satellite energy system includes:
Generating experience data by using a state transfer function in the simulation environment, wherein the state transfer function is used for describing state change of the satellite energy system under given current state, execution action and environmental random factors;
Based on the empirical data, the training data is obtained.
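Generating experience data from a state transfer function can be sketched as follows. This is a toy Python example under stated assumptions: the state is reduced to the battery state of charge, the action is the power granted to the loads, and the environmental random factor is Gaussian noise on the solar input. None of these modelling choices, names or constants come from the application.

```python
import random


def transition(soc, load_w, solar_w, rng, capacity_wh=100.0, dt_h=0.1):
    """Toy state transfer function: next state of charge given the current
    state, the executed action (load power) and a random environment factor."""
    noise_w = rng.gauss(0.0, 1.0)            # environmental random factor
    net_w = solar_w + noise_w - load_w       # net power into the battery
    next_soc = soc + net_w * dt_h / capacity_wh
    return min(1.0, max(0.0, next_soc))      # keep the charge within [0, 1]


def generate_experience(steps, seed=0, initial_soc=0.8):
    """Roll the toy simulator forward and record (state, action, next_state)."""
    rng = random.Random(seed)                # seeded for reproducible rollouts
    soc = initial_soc
    experience = []
    for _ in range(steps):
        load_w = rng.choice([20.0, 40.0, 60.0])  # hypothetical action set
        solar_w = rng.uniform(0.0, 50.0)         # hypothetical illumination
        next_soc = transition(soc, load_w, solar_w, rng)
        experience.append((soc, load_w, next_soc))
        soc = next_soc
    return experience


training_data = generate_experience(steps=100, seed=42)
```

Each tuple's next state is the following tuple's current state, so the list can be consumed directly as a sequence of transitions.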
In some embodiments, the optimization algorithm module comprises a policy network and a value network, and the inputting the training key features into the optimization algorithm module, constructing an optimization loss function, and adjusting parameters of the optimization algorithm module with the aim of minimizing the calculation result of the optimization loss function, to obtain the trained intelligent decision framework, includes:
Generating, via the policy network, an action probability distribution based on the training key features and the training state data, and selecting training action data corresponding to the action space based on the action probability distribution;
Calculating, via the value network, an instant reward value for the current training action data based on the training state data, the training action data, and the reward function;
Calculating an optimization loss function result according to the training state data, the training action data, the instant reward value and the next training state data generated by the state transfer function, and adjusting parameters of the optimization algorithm module with the aim of minimizing the optimization loss function result, to obtain the trained intelligent decision framework.
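The application does not give the form of the optimization loss function. One common way to combine a policy network and a value network is a one-step advantage actor-critic loss, sketched below in plain Python for a single transition. The loss form, the discount factor `gamma` and all function names are assumptions of this sketch, not the method claimed above.

```python
import math


def softmax(logits):
    """Turn policy-network outputs into an action probability distribution."""
    peak = max(logits)                       # subtract the max for stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_loss(logits, action, reward, value, next_value, gamma=0.99):
    """One-step advantage actor-critic loss for one transition (assumed form).

    logits      policy-network outputs for the current training state
    action      index of the training action that was selected
    reward      instant reward value for that action
    value       value estimate of the current training state
    next_value  value estimate of the next training state
    """
    probs = softmax(logits)
    td_target = reward + gamma * next_value
    advantage = td_target - value            # treated as a constant for the policy term
    policy_loss = -math.log(probs[action]) * advantage
    value_loss = 0.5 * advantage ** 2
    return policy_loss + value_loss, advantage


loss, advantage = actor_critic_loss(
    logits=[0.2, 1.1, -0.3], action=1, reward=0.85, value=0.6, next_value=0.7)
```

In a real training loop the two terms would be differentiated with an automatic-differentiation library and the advantage would be excluded from the policy gradient; the sketch only shows how the quantities named in the text enter the loss.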
In a second aspect, an embodiment of the present application provides a storage medium having stored thereon a computer program which, when executed by a processor, implements a satellite energy system intelligent control method as described in the first aspect above.
Compared with the related art, the intelligent control method for the satellite energy system provided by the embodiments of the application acquires satellite energy data in real time, analyzes key features through the large language sub-model, generates state data and action data, predicts rewards and optimizes the strategy, thereby realizing closed-loop control. This solves the problems that a dynamic space environment is difficult to adapt to and multi-task requirements cannot be balanced efficiently, and realizes efficient management and optimization of the satellite energy system.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a block diagram of the hardware architecture of a terminal of a satellite energy system intelligent control method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a satellite energy system intelligent control method according to an embodiment of the application;
FIG. 3 is a flow chart of a satellite energy system intelligent control method in accordance with a preferred embodiment of the present application;
FIG. 4 is a block diagram of a data preprocessing and feature extraction module in a satellite energy system intelligent control method according to a preferred embodiment of the present application;
FIG. 5 is a block diagram of state analysis and optimization requirement assessment in the intelligent control method of the satellite energy system according to the preferred embodiment of the present application;
FIG. 6 is a diagram of an intelligent decision framework in the intelligent control method of the satellite energy system according to the preferred embodiment of the application.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and examples in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by a person of ordinary skill in the art based on the embodiments provided by the present application without making any inventive effort, are intended to fall within the scope of the present application. Moreover, it should be appreciated that while such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as meaning that the disclosure of the present application is insufficient.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by those of ordinary skill in the art that the described embodiments of the application can be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," and similar referents in the context of the application are not to be construed as limiting the quantity, but rather as singular or plural. The terms "comprises," "comprising," "includes," "including," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to only those steps or elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in connection with the present application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means greater than or equal to two. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, A and B exist simultaneously, or B exists alone. The terms "first," "second," "third," and the like, as used herein, merely distinguish between similar objects and do not represent a particular ordering of the objects.
The method embodiment provided in this embodiment may be executed in a terminal, a computer or a similar computing device. Taking the operation on the terminal as an example, fig. 1 is a block diagram of the hardware structure of the terminal of the intelligent control method of the satellite energy system according to the embodiment of the invention. As shown in fig. 1, the terminal may include one or more processors 102 (only one is shown in fig. 1) (the processor 102 may include, but is not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA) and a memory 104 for storing data, and optionally, a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and not limiting on the structure of the terminal described above. For example, the terminal may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
The memory 104 may be used to store computer programs, such as software programs and modules of application software, such as computer programs corresponding to the intelligent control method of the satellite energy system in the embodiment of the present invention, and the processor 102 executes the computer programs stored in the memory 104 to perform various functional applications and data processing, that is, implement the above-mentioned method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. The specific example of the network described above may include a wireless network provided by a communication provider of the terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as a NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
The embodiment provides a satellite energy system intelligent control method, fig. 2 is a flowchart of the satellite energy system intelligent control method according to an embodiment of the application, as shown in fig. 2, the flowchart includes the following steps:
Step S201, acquiring real-time operation data of a satellite energy system;
The real-time operation data of the satellite energy system are collected in real time by sensors arranged on the satellite. The real-time operation data include power supply system data (such as battery voltage, current, temperature and charge-discharge state), load system data (the power consumption and working state of each device, such as the communication module, sensors and thrusters), and satellite environment data (such as solar illumination intensity, space radiation dose and satellite surface temperature). The data collected in this step cover the three dimensions of power supply, load and environment and provide complete input for subsequent model analysis; collecting the data in real time through sensors ensures timely monitoring and analysis of the state of the satellite energy system.
Step S202, inputting real-time operation data into a trained satellite energy system environment model, and carrying out semantic understanding and feature association analysis on the real-time operation data by utilizing a large language submodel in the satellite energy system environment model to extract key features;
The real-time operation data are input into the trained satellite energy system environment model and first undergo data preprocessing: cleaning (removing invalid, repeated and abnormal values to ensure the integrity and accuracy of the data), denoising (eliminating noise through a filtering algorithm or a statistical method to improve the signal-to-noise ratio), and normalization (uniformly scaling data of different dimensions and magnitudes). A large language sub-model is embedded in the satellite energy system environment model. Using the semantic understanding capability of the pre-trained large language model, semantic understanding and feature association analysis are performed on the preprocessed real-time operation data, historical data and real-time data are associated, and key features that are vital to energy system state evaluation and optimization decisions are extracted. The key features are high-order abstract indicators that are extracted from the real-time operation data and reflect the core state of the system; they integrate the dynamic changes of the power supply system, the load equipment and the environmental conditions, provide input for the intelligent decision framework, and support the generation of the optimization strategy. Key features include the state of health of the power supply system (e.g., battery aging), the power consumption trend of the load system (e.g., periodic high-load periods), dynamic changes in environmental conditions, and the like. The large language sub-model outputs the key features for use by the subsequent intelligent decision framework.
In this step, preprocessing operations such as cleaning, denoising and normalization are performed before the data are input into the large language sub-model, which improves data quality and provides a reliable basis for subsequent analysis. In-depth analysis of the real-time operation data by the large language sub-model identifies state changes and potential faults of the energy system more accurately, and the intelligent decision framework generates the optimization strategy based on the extracted key features, which improves decision efficiency and accuracy.
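The three preprocessing operations can be sketched for a single telemetry channel as follows. This is a minimal Python illustration that assumes range-based cleaning, a trailing moving-average filter and min-max normalization; the application names the operations but not these particular techniques, and the sample values are hypothetical.

```python
import math


def clean(values, lower, upper):
    """Drop missing readings and readings outside the plausible range."""
    return [v for v in values
            if v is not None and not math.isnan(v) and lower <= v <= upper]


def denoise(values, window=3):
    """Smooth with a trailing moving average (one simple filtering choice)."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def normalize(values):
    """Min-max scale to [0, 1]; a constant series maps to all zeros."""
    if not values:
        return []
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def preprocess(values, lower, upper, window=3):
    """Cleaning, then denoising, then normalization, in that order."""
    return normalize(denoise(clean(values, lower, upper), window))


# Hypothetical battery-voltage readings in volts; None, NaN and 99.0 are bad samples.
raw_voltage = [28.1, None, 28.3, 99.0, 27.9, float("nan"), 28.0]
features = preprocess(raw_voltage, lower=20.0, upper=35.0)
```

Cleaning runs first so that a single spurious spike such as 99.0 does not leak into the moving average or stretch the normalization range.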
Step S203, generating state data corresponding to a state space based on key features by utilizing an intelligent decision framework in a satellite energy system environment model, generating action data corresponding to an action space based on the key features and the state data, and generating a reward prediction value based on the key features, the state data and a reward function, wherein the intelligent decision framework comprises a state space, an action space and the reward function;
The state space is the set of all parameters describing the real-time running state of the satellite energy system and covers key dimensions such as energy supply and demand, environmental conditions and task requirements. The action space is the set of all optimization operations executable by the satellite energy system, which use energy efficiently by adjusting hardware or scheduling strategies. The reward function quantifies the merit of each action and integrates energy efficiency, system stability and task completion degree through a mathematical formula to guide the model in learning the optimal strategy. By perceiving the environment through the state space, executing control through the action space and evaluating effects through the reward function, the intelligent decision framework converts the complex optimization problem of the satellite energy system into a learnable decision process. Based on the key features extracted in the above steps (such as power supply health, load trend and dynamic environmental changes), state data are generated, which include the solar panel output power, the total power consumption requirement of the load equipment, the illumination intensity and the like. Based on the fused key features and state data, candidate actions in the action space, namely the action data, are output. The reward prediction value of each candidate action is calculated using the formula of the reward function, which is as follows:
In the above formula, the energy utilization efficiency (η) is determined by the effective energy utilization rate, the stability (σ) is calculated from the degree to which the parameters deviate from the safe range, the task completion degree (T) is weighted by priority, and the weight coefficients balance the priorities of the different optimization objectives.
Specifically, suppose the key features identify a correlation between the capacity reduction caused by battery aging and the illumination interruption in the shadow region, and predict that illumination cannot be recovered within the next 30 minutes, so that battery power supply is needed. The generated action data comprise: action 1, adjusting the satellite attitude to try to leave the shadow region in advance (which may increase energy consumption); action 2, suspending low-priority tasks and allocating all available energy (200 W) to the communication task; and action 3, reducing the voltage of the load equipment (a hidden action) to reduce instantaneous power consumption. The reward function is then evaluated. If action 1 successfully leaves the shadow region, the energy utilization efficiency η improves, but the attitude adjustment consumes energy (the stability index σ increases), and the reward may be r=0.7. Action 2 directly ensures that the task is completed (T=0.9), but the risk of battery exhaustion is high (σ=0.5), and the reward is r=0.85. Action 3 has the best stability (σ=0.2), but the task completion degree is reduced (T=0.6), and the reward is r=0.65.
In this step, the reward function integrates energy efficiency, system stability and task completion degree, balancing the different optimization objectives and avoiding the system imbalance caused by optimizing a single index. Action strategies are generated in real time, ensuring a quick response in a dynamic space environment (such as sudden illumination changes and load fluctuations). The complex energy system optimization problem is abstracted into an intelligent decision problem with a clearly defined state space, action space and reward function, making the problem more structured and solvable.
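The reward formula itself is not reproduced in this text, so the following Python sketch only illustrates the general idea of a weighted multi-objective reward. It assumes a linear combination in which σ is a deviation measure (smaller is better) and all three inputs lie in [0, 1]; the functional form and the weight values are hypothetical and will not reproduce the reward values of the worked example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weight coefficients balancing the three sub-objectives."""
    efficiency: float = 0.4   # weight on energy utilization efficiency (eta)
    stability: float = 0.3    # weight on the stability term
    task: float = 0.3         # weight on task completion degree (T)


def reward(eta, sigma, task_completion, weights=RewardWeights()):
    """Assumed form: w1 * eta + w2 * (1 - sigma) + w3 * T.

    sigma is treated as a deviation from the safe range, so a smaller
    sigma contributes a larger reward.
    """
    for name, x in (("eta", eta), ("sigma", sigma), ("T", task_completion)):
        if not 0.0 <= x <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {x}")
    return (weights.efficiency * eta
            + weights.stability * (1.0 - sigma)
            + weights.task * task_completion)


# Shifting weight toward task completion changes how the same outcome is scored.
balanced = reward(0.5, 0.5, 0.9)
task_first = reward(0.5, 0.5, 0.9, RewardWeights(0.2, 0.2, 0.6))
```

Keeping the weights in a separate record makes it straightforward to rebalance the objectives for a different mission phase without touching the scoring logic.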
Step S204, generating an optimization strategy based on the action data and the reward prediction value, and performing real-time control operation on the satellite energy system based on the optimization strategy.
Based on the action data (such as adjusting the angle of the solar panel or the satellite attitude) and the corresponding reward prediction values, the action with the highest reward value is selected as the optimal strategy. For example, if the reward prediction value r=0.85 of the action "suspend low-priority tasks" is higher than that of the other actions, that action is determined as the optimization strategy. The abstract action is converted into specific control instructions (such as "disable the observation equipment"), and the satellite control module executes the strategy to perform real-time control operations, such as adjusting the angle of the solar panel to maximize light capture, and dynamically allocating energy to preferentially guarantee high-priority tasks (such as communication equipment). In addition, the system state after execution (such as battery power recovery and task completion) can be monitored, and model parameters can be updated to optimize subsequent strategies. Finally, the real-time running state of the satellite energy system, the execution effect of the optimization strategy and the key performance indicators can be displayed visually through a human-computer interaction interface, and a detailed optimization report can be generated to provide decision support for operation and maintenance personnel. Through real-time control operations, this step adjusts dynamically according to the real-time state and task requirements of the satellite energy system and ensures efficient and stable operation of the system.
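The selection in step S204 amounts to taking the candidate action with the largest reward prediction value and mapping it to control instructions. A minimal Python sketch follows, using the three reward values from the worked example in the description; the action identifiers and the instruction strings in the mapping are hypothetical.

```python
# Reward prediction values from the worked example in the description.
candidates = {
    "adjust_attitude": 0.70,         # action 1: leave the shadow region early
    "suspend_low_priority": 0.85,    # action 2: give available energy to communication
    "reduce_load_voltage": 0.65,     # action 3: cut instantaneous power consumption
}

# Hypothetical mapping from an abstract action to concrete control instructions.
INSTRUCTIONS = {
    "adjust_attitude": ["rotate satellite attitude"],
    "suspend_low_priority": ["disable the observation equipment",
                             "allocate 200 W to the communication task"],
    "reduce_load_voltage": ["lower load equipment voltage"],
}


def select_action(predicted_rewards):
    """Return the action whose reward prediction value is highest."""
    if not predicted_rewards:
        raise ValueError("no candidate actions to choose from")
    return max(predicted_rewards, key=predicted_rewards.get)


def to_instructions(action):
    """Translate the chosen abstract action into control instructions."""
    return INSTRUCTIONS[action]


best = select_action(candidates)
commands = to_instructions(best)
```

With these values the sketch picks `suspend_low_priority`, matching the example in which the action with r=0.85 is determined as the optimization strategy.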
Through the above steps, real-time operation data are first acquired, and the large language sub-model performs semantic analysis and feature association on the raw data. This breaks through the limitations of conventional threshold judgment or simple statistical models and captures deep associations among multi-source heterogeneous data through contextual understanding (for example, semantic-level association analysis of solar wing temperature fluctuations and load power consumption changes). Then, a dynamic state space is constructed in the intelligent decision framework, and environmental parameters, task priorities and the like are mapped into high-dimensional feature vectors. Compared with a conventional table look-up method or a fixed rule base, a refined action space including "adjusting the solar wing angle", "dynamic power distribution" and the like can be generated in real time, and the strategy value is quantified through the reward function (such as energy utilization rate × task completion degree weight coefficient). Finally, in the strategy generation stage, the action sequence with the highest reward prediction value is dynamically selected through algorithms such as Monte Carlo tree search, for example preferentially guaranteeing power supply to the communication module and starting the standby load. Compared with a conventional priority queue algorithm, the multi-objective Pareto-optimal solution can be computed at the millisecond level, which effectively resolves the conflict between lagging adaptation to the dynamic environment and resource allocation. The technical problems that a satellite energy system has difficulty adapting to environmental changes and cannot efficiently balance multi-task requirements in a dynamic space environment are thereby solved.
In some of these embodiments, the method further comprises:
Based on key characteristics, generating an optimization target and a constraint condition by combining a preset satellite task target and an environment constraint condition;
defining a reward function based on the optimization objective;
Based on the constraints, a state space and an action space are defined.
Wherein, based on the key features (such as the battery state of health $E_{health}$, the load power consumption trend $P_{load}$, and the environmental dynamics $T_{env}$), the preset satellite task targets (task priority $R_{task}$), and the environmental constraints (such as illumination intensity and battery capacity limits), the optimization targets and constraint conditions are generated. The optimization targets comprise maximizing energy utilization efficiency, guaranteeing system stability, and meeting task requirements; the constraint conditions comprise the power output capacity, the load power consumption requirements, and the dynamic changes of the environment.
Specifically, the optimization objective is as follows:
Maximizing energy utilization efficiency: energy waste is reduced and the energy utilization rate is improved by optimizing the power output and the load distribution. The objective function may be expressed as:
$$\eta = \frac{P_{used}}{P_{total}}$$
In the above formula, $\eta$ is the energy utilization efficiency, $P_{used}$ is the energy that is effectively utilized, and $P_{total}$ is the total available energy.
Guaranteeing system stability: the power supply system and the load equipment are kept operating within a safe range, so that system faults caused by insufficient energy or overload are avoided. The objective function may be expressed as:
;
In the above formula, $\sigma$ is the system stability index, $S_i$ is the actual value of the i-th system parameter, $S_{target,i}$ is its target value, and $S_{max,i}$ and $S_{min,i}$ are the upper and lower safety limits of the parameter, respectively.
Meeting satellite task requirements: energy allocation is adjusted dynamically according to the priority and real-time requirements of the satellite tasks, so that key tasks are completed smoothly. The objective function may be expressed as:
$$T = \sum_{j} w_j C_j$$
In the above formula, $T$ is the task completion degree, $w_j$ is the priority weight of the j-th task, and $C_j$ is the task completion status (1 indicates complete, 0 indicates incomplete).
The constraint conditions are as follows:
Output capacity of the power supply system: limits such as the battery capacity and the output power of the solar panel are taken into account, so that the power supply system operates within a safe range. The constraint can be expressed as:
$$P_{min} \le P_{out} \le P_{max}$$
Wherein, $P_{out}$ is the output power of the power supply system, and $P_{min}$ and $P_{max}$ are the lower and upper limits of the output power, respectively.
Power consumption requirements of the load equipment: energy is distributed reasonably according to the power consumption characteristics and task priority of each device, so that local overload or energy shortage is avoided. The constraint can be expressed as:
$$\sum_{k} P_{k} \le P_{available}$$
Wherein, $P_{k}$ is the power consumption of the k-th load device, and $P_{available}$ is the currently available energy.
Dynamic changes of environmental conditions: the influence of environmental factors such as illumination intensity, temperature, and radiation on the energy system is taken into account, and the optimization strategy is adjusted dynamically. The constraint can be expressed as:
$$E_{min} \le E_{env} \le E_{max}$$
Wherein, $E_{env}$ is the influencing factor of the environmental conditions, and $E_{min}$ and $E_{max}$ are the lower and upper limits of the influencing factor, respectively.
Real-time running state of the satellite energy system: the system state is evaluated and the optimization strategy is adjusted based on real-time monitoring data, so that the stability and efficiency of the system are ensured. The constraint can be expressed as:
$$State_{current} \in State_{safe}$$
Wherein, $State_{current}$ is the current running state, and $State_{safe}$ is the set of safe states.
By considering the optimization targets and the constraint conditions together, a scientific and reasonable energy management strategy can be formulated, and efficient and stable operation of the satellite energy system is achieved. Satellite task targets described in natural language (such as "give priority to the scientific experiment load") are converted into mathematical constraints, so that the optimization direction is strictly aligned with the actual demands of the system, and the target ambiguity of the traditional threshold control method is avoided.
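The objectives and constraints described above can be evaluated numerically. The following Python sketch is illustrative only: it assumes that the energy utilization efficiency is the ratio of used to total energy, that the task completion degree is an unnormalized weighted sum of binary completion flags, and that the load constraint compares the summed load power with the available energy. All function names and sample values are hypothetical and are not taken from the application.

```python
from typing import Sequence


def energy_efficiency(p_used: float, p_total: float) -> float:
    """Energy utilization efficiency: effectively used energy over total available energy."""
    if p_total <= 0:
        raise ValueError("total available energy must be positive")
    return p_used / p_total


def task_completion(weights: Sequence[float], completed: Sequence[int]) -> float:
    """Weighted task completion; completed[j] is 1 if task j is complete, else 0."""
    if len(weights) != len(completed):
        raise ValueError("weights and completed must have the same length")
    return sum(w * c for w, c in zip(weights, completed))


def constraints_satisfied(p_out: float, p_min: float, p_max: float,
                          load_powers: Sequence[float], p_available: float) -> bool:
    """Check the output power bounds and that the total load fits the available energy."""
    within_output_limits = p_min <= p_out <= p_max
    load_is_feasible = sum(load_powers) <= p_available
    return within_output_limits and load_is_feasible


# Hypothetical sample values (watts for powers, dimensionless weights).
eta = energy_efficiency(p_used=450.0, p_total=600.0)
t = task_completion(weights=[0.5, 0.3, 0.2], completed=[1, 1, 0])
ok = constraints_satisfied(p_out=500.0, p_min=100.0, p_max=800.0,
                           load_powers=[200.0, 150.0, 100.0], p_available=500.0)
print(eta, t, ok)
```

In this sketch a candidate operating point would be rejected before its objectives are scored whenever `constraints_satisfied` returns `False`; that ordering is a design assumption, not something the text prescribes.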
In some of these embodiments, the state space includes the current state of charge of the battery, the current output power of the solar panel, the current power consumption requirement of the load device, the current environmental conditions, the priority and real-time requirements of the satellite task, and the health status of the energy system;
the action space comprises an angle adjustment amount of a solar panel, a satellite attitude adjustment amount and an energy scheduling strategy adjustment amount;
The sub-objectives in the reward function include energy utilization efficiency, system stability, and task completion.
The state space S is used for describing the real-time operation state of the satellite energy system, and is defined as:
$$S = \{P_{battery}, P_{solar}, P_{load}, T_{env}, R_{task}, E_{health}\}$$
Wherein, $P_{battery}$ is the current state of charge of the battery, $P_{solar}$ is the current output power of the solar panel, $P_{load}$ is the current power consumption requirement of the load device, $T_{env}$ is the current environmental condition (such as the ambient temperature and the illumination intensity), $R_{task}$ is the priority and real-time requirement of the satellite task, and $E_{health}$ is the health state of the energy system.
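The action space A is used for describing the optimal control operations of the satellite energy system, and is defined as follows:
$$A = \{\Delta\theta_{solar}, \Delta\theta_{attitude}, \Delta S_{schedule}\}$$
Wherein, $\Delta\theta_{solar}$ is the angle adjustment amount of the solar panel, $\Delta\theta_{attitude}$ is the adjustment amount of the satellite attitude, and $\Delta S_{schedule}$ is the adjustment of the energy scheduling strategy (such as task priority adjustment).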
The reward function R is used to evaluate the effect of each action. Its sub-objectives include energy utilization efficiency, system stability, and task completion degree, and it is defined as:
$$R = \lambda_1 \eta + \lambda_2 \sigma + \lambda_3 T$$
Wherein, $\eta$ is the energy utilization efficiency, calculated as $\eta = P_{used}/P_{total}$; $\sigma$ is the system stability index, calculated from the system parameters, their target values, and their safety limits as in the stability objective; $T$ is the task completion degree, calculated as $T = \sum_{j} w_j C_j$; and $\lambda_1$, $\lambda_2$, $\lambda_3$ are weight coefficients used for balancing the priorities of the different optimization objectives.
The state space is generated by performing semantic understanding and feature extraction on the real-time operation data through the large language submodel and fusing multi-dimensional data such as power supply, load, and environment data. The action space generates candidate action sequences through a deep reinforcement learning policy network based on the state space and the reward function. The reward function adjusts the weight of each sub-objective in real time according to environmental changes (such as sudden faults and task urgency). By combining energy efficiency, system stability, and task completion degree in the reward function, this step avoids the imbalance caused by single-objective optimization. The reward function quantifies complex targets into a learnable numerical value so that optimal actions can be generated. The state space is updated in real time and the action space responds quickly, which guarantees the real-time adaptability of the system. The action space strictly follows the limits of the power output capacity and the load demand, which avoids the risk of overload or energy shortage. Real-time data and historical features are fused to support accurate decisions; efficiency, stability, and task demands are balanced; manual intervention is reduced; and the survivability and task success rate of the satellite in a complex space environment are improved.
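As a minimal sketch of the weighted reward and of adjusting the sub-objective weights in real time, the following Python example combines the three sub-objectives linearly and shifts weight toward stability when a sudden fault is reported. The weight values, the class and function names, and the renormalization rule are assumptions made for illustration; the stability index is taken as an input because its exact formula is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    """Weight coefficients of the three sub-objectives (hypothetical values)."""
    efficiency: float = 0.4
    stability: float = 0.3
    completion: float = 0.3


def reward(eta: float, sigma: float, completion: float, w: RewardWeights) -> float:
    """Weighted sum of energy efficiency, stability index, and task completion degree."""
    return w.efficiency * eta + w.stability * sigma + w.completion * completion


def reweight_for_fault(w: RewardWeights, stability_boost: float = 0.3) -> RewardWeights:
    """Assumed rule: on a sudden fault, raise the stability weight, then renormalize to sum to 1."""
    raw = (w.efficiency, w.stability + stability_boost, w.completion)
    total = sum(raw)
    return RewardWeights(*(x / total for x in raw))


nominal = RewardWeights()
r_nominal = reward(eta=0.75, sigma=0.9, completion=0.8, w=nominal)

faulted = reweight_for_fault(nominal)
r_fault = reward(eta=0.75, sigma=0.9, completion=0.8, w=faulted)
print(r_nominal, r_fault)
```

Keeping the weights normalized makes reward values comparable before and after a reweighting, which matters if the same value network is expected to score both regimes.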
In some of these embodiments, generating an optimization strategy based on the action data and the reward prediction value includes:
Selecting an optimal action from the action data with the aim of maximizing the reward prediction value, and generating an optimization strategy based on the optimal action.
Based on the action data (the candidate actions in the action space) and the corresponding reward prediction values (for example, reward r=0.6 for "adjust solar panel angle" and reward r=0.85 for "pause low-priority tasks"), the action with the highest reward value is selected as the optimal strategy through the value network of deep reinforcement learning. For example, when the satellite enters a shadow zone, "pause low-priority tasks" is selected because of its higher reward value. The optimal action is then mapped to specific control instructions, such as suspending the observation equipment (saving 200 W of power consumption) or adjusting the solar panel angle (from 30° to 45°, maximizing light capture). The satellite control module issues the instruction (for example, driving the motor that adjusts the solar panel angle). In addition, parameters such as the battery charge and the task completion degree after execution can be tracked in real time, the execution result is fed back to the model, and the policy network parameters are updated to improve the accuracy of subsequent decisions. By selecting the action with the highest reward value, this method reduces energy waste and responds in real time to situations such as sudden illumination changes and load fluctuations.
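The selection step described above amounts to taking the candidate with the maximum reward prediction value and mapping it to control instructions. The following Python sketch shows this under simple assumptions: the two reward values are the ones from the example in the text, while the action identifiers and the instruction strings are hypothetical placeholders rather than a real satellite command set.

```python
from typing import Dict, List

# Reward prediction values from the example in the text; identifiers are hypothetical.
candidates: Dict[str, float] = {
    "adjust_solar_panel_angle": 0.6,
    "pause_low_priority_tasks": 0.85,
}

# Hypothetical mapping from abstract actions to concrete control instructions.
COMMANDS: Dict[str, List[str]] = {
    "adjust_solar_panel_angle": ["SET_PANEL_ANGLE 45"],
    "pause_low_priority_tasks": ["SUSPEND observation_device"],
}


def select_best_action(predicted_rewards: Dict[str, float]) -> str:
    """Return the action whose reward prediction value is highest."""
    if not predicted_rewards:
        raise ValueError("no candidate actions")
    return max(predicted_rewards, key=predicted_rewards.get)


def to_commands(action: str) -> List[str]:
    """Translate an abstract action into control instructions."""
    return COMMANDS[action]


best = select_best_action(candidates)
instructions = to_commands(best)
print(best, instructions)
```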
In some of these embodiments, the method further comprises:
Deep analysis is carried out on the running state of the satellite energy system by utilizing the large language submodel, and whether the current running state of the satellite energy system is abnormal or not is judged;
If the current running state is an abnormal mode, generating early warning information and continuously utilizing an intelligent decision framework to generate state data, action data and rewarding predicted values;
if the current running state is not in an abnormal mode, the intelligent decision framework is directly utilized to generate state data, action data and rewarding predicted values.
Wherein, the large language submodel performs semantic association analysis on the preprocessed real-time operation data (power supply system, load system, and environment data) and the historical data, extracts key features, and judges whether the current state is abnormal based on feature thresholds or dynamic rules (for example, the battery temperature exceeds the safe range, or the load power consumption does not match the task priority). If the state is judged abnormal, early warning information is generated (such as "battery temperature over limit, risk level: high"), and the intelligent decision framework is triggered to generate state data, action data, and reward prediction values. If the state is normal, the flow enters the intelligent decision framework directly to generate an optimization strategy. Through the semantic association capability of the large language model, abnormal modes such as battery aging and sudden load increase are identified quickly, and early warnings (such as "illumination interruption risk warning") are generated in advance, so that system breakdown or task interruption is avoided. Under abnormal conditions, the intelligent decision framework combines the early warning information to generate a targeted strategy (such as suspending non-critical tasks), which guarantees stable operation of the core functions. Under normal conditions, energy distribution is optimized based on real-time state data, and efficiency, stability, and task demands are balanced.
In some of these embodiments, determining whether the current operating state of the satellite energy system is abnormal includes:
generating a safety state set based on a preset environment constraint condition;
judging whether the current running state of the satellite energy system belongs to a safety state set or not;
If the current running state of the satellite energy system does not belong to the safety state set, the current running state is an abnormal mode;
If the current running state of the satellite energy system belongs to the safety state set, the current running state is not an abnormal mode.
Wherein, a safety state set is defined based on the preset environmental constraint conditions (such as the battery voltage range, the temperature threshold, and the upper limit of load power consumption). For example, the battery voltage safe range is 3.3 V ≤ $V_{battery}$ ≤ 4.2 V, and the temperature safe range is $T_{env}$ ∈ [-20 ℃, 50 ℃]. The safe ranges of the multidimensional parameters are combined into a safety state set $State_{safe}$ in a high-dimensional space. Based on the state data, the current running state $State_{current}$ is obtained, which includes the battery level $P_{battery}$, the solar output power $P_{solar}$, the ambient temperature $T_{env}$, and the like. If $State_{current} \notin State_{safe}$, the state is judged to be an abnormal mode and an early warning is triggered; if $State_{current} \in State_{safe}$, the state is judged to be a normal mode and the flow enters the optimization decision process directly. In addition, the large language submodel can provide auxiliary analysis, combining historical data and real-time features to identify implicit anomalies (such as a capacity fading trend caused by battery aging), which makes the judgment more comprehensive. In the abnormal mode, specific early warning content (such as "battery voltage over limit: current value 4.3 V, safe range 3.3 V to 4.2 V") is generated and pushed to the operation and maintenance interface. By comparing the preset constraint conditions with the real-time state, this embodiment quickly identifies out-of-range parameters (such as temperature overrun and voltage abnormality) and avoids the hysteresis of a traditional rule engine. The safe ranges of multiple parameters such as power supply, load, and environment are considered together, which helps avoid system-level cascading failures.
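The membership test against the safety state set can be sketched as a per-parameter range check, where a state is abnormal as soon as any parameter leaves its safe range. In the following Python example the voltage and temperature ranges are the ones given in the text; the parameter names, the warning format, and the function names are hypothetical, and the auxiliary analysis by the large language submodel is not modeled.

```python
from typing import Dict, List, Tuple

# Safe range per parameter (low, high). Ranges follow the example in the text;
# the parameter names are hypothetical.
SAFE_RANGES: Dict[str, Tuple[float, float]] = {
    "battery_voltage_v": (3.3, 4.2),
    "env_temperature_c": (-20.0, 50.0),
}


def find_violations(state: Dict[str, float],
                    safe_ranges: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return one warning per out-of-range parameter; empty if the state is in the safe set."""
    warnings = []
    for name, (low, high) in safe_ranges.items():
        value = state[name]
        if not (low <= value <= high):
            warnings.append(
                f"{name} over limit: current value {value}, safe range {low} to {high}"
            )
    return warnings


def is_abnormal(state: Dict[str, float]) -> bool:
    """A state is abnormal when it does not belong to the safety state set."""
    return len(find_violations(state, SAFE_RANGES)) > 0


normal_state = {"battery_voltage_v": 3.9, "env_temperature_c": 21.0}
overvoltage_state = {"battery_voltage_v": 4.3, "env_temperature_c": 21.0}

for s in (normal_state, overvoltage_state):
    print(is_abnormal(s), find_violations(s, SAFE_RANGES))
```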
In some of these embodiments, the training process of the satellite energy system environment model includes:
constructing a simulation environment of a satellite energy system and acquiring training data of the satellite energy system;
Inputting training data into an initial environment model, and running the initial environment model in a simulation environment, wherein the initial environment model comprises an initial large language sub-model and an optimization algorithm module;
Carrying out semantic understanding and feature association analysis on training data by utilizing an initial large language submodel in a satellite energy system environment model, generating training key features, constructing a large language loss function, substituting the training key features into the large language loss function for calculation, and adjusting parameters of the initial large language submodel with the calculation result of the large language loss function minimized as a target to obtain a trained large language submodel;
Inputting the training key features into an optimization algorithm module, constructing an optimization loss function, and adjusting parameters of the optimization algorithm module with the minimum calculation result of the optimization loss function as a target to obtain a trained intelligent decision frame;
Based on the large language submodel and the intelligent decision frame, a trained satellite energy system environment model is obtained.
Wherein, a high-fidelity simulation environment is constructed based on the physical model of the satellite energy system (such as the power supply system, the load system, and the environmental conditions), and dynamic scenes such as illumination intensity fluctuation, load demand change, and equipment aging are simulated. Preset tasks (such as orbit adjustment and burst communication tasks) are run in the simulation environment, and real-time data such as the voltage and current of the power supply system, the power consumption of the load equipment, and the ambient temperature are collected to form the training data. The initial environment model includes an initial large language submodel (Large Language Model, LLM) and an optimization algorithm module (a deep reinforcement learning network). The training data is input into the initial environment model, the initial large language submodel performs preliminary feature extraction on the training data, the optimization algorithm generates an initial action strategy, and the simulation environment updates the state according to the action and feeds back the reward value. Specifically, the initial large language submodel performs semantic association analysis on the training data and extracts key features (such as battery health and load power consumption trend). A large language loss function is constructed, which calculates the difference between the output features of the initial large language submodel and the actual state of the simulation environment:
In the above formula, $y_i$ is the real feature label (e.g., the battery capacity decay rate) provided by the simulation environment.
Then, the parameters of the initial large language submodel are adjusted by the gradient descent method to minimize the loss function and improve the feature extraction precision, so as to obtain the trained large language submodel.
The training key features (such as $E_{health}$ and $P_{load}$) extracted by the initial large language submodel are input into the optimization algorithm module, an optimization loss function is constructed, and the parameters of the optimization algorithm module are adjusted through experience replay and gradient descent so as to maximize the cumulative reward, thereby obtaining the trained intelligent decision framework. The trained large language submodel is combined with the intelligent decision framework: the large language submodel is responsible for real-time feature extraction, and the intelligent decision framework generates an action strategy based on the features, thereby obtaining the trained satellite energy system environment model.
The simulation environment can reproduce the highly dynamic nature of the space environment (such as illumination interruption in a shadow zone), so that the training data covers various extreme conditions. It avoids the high cost of experiments on a real satellite and quickly generates large-scale labeled data. The feasibility of the initial model is verified in the simulation environment and early defects are identified, which prevents satellite system faults caused by erroneous actions. The large language submodel can mine deep associations in the data through semantic understanding (such as the relation between battery temperature and load fluctuation), which improves feature quality, and the trained large language submodel can identify hidden anomalies (such as voltage fluctuation caused by battery aging). The trained optimization algorithm can quickly generate actions in response to dynamic environmental changes. The semantic understanding of the large language submodel complements the decision capability of the optimization algorithm and improves overall model performance, so that the model can adapt to unknown environments (such as sudden radiation interference) and the satellite operates stably in complex scenes.
In some of these embodiments, acquiring training data for a satellite energy system includes:
Generating experience data by using a state transfer function in a simulation environment, wherein the state transfer function is used for describing state change of a satellite energy system under given current state, execution action and environmental random factors;
based on the empirical data, training data is obtained.
Wherein, a high-fidelity simulation environment comprising the power supply system, the load system, and the environmental dynamics is constructed, and scenes such as illumination intensity change, load fluctuation, and equipment aging are simulated. The state transfer function is $s_{t+1} = f(s_t, a_t, e_t)$, where $s_t$ is the current state data (e.g., battery charge, load demand), $a_t$ is the action data to be performed (e.g., adjusting the angle of the solar panel), and $e_t$ is an environmental random factor (e.g., abrupt illumination change, device noise). The model is run in the simulation environment, and the state, action, reward, and next state of each step are recorded to form an experience data tuple $(s_t, a_t, r_t, s_{t+1})$. The experience data is stored in an experience pool D, which supports subsequent random sampling to break the data correlation. Data is sampled in batches from the experience pool D as training data for updating the parameters of the large language submodel and the optimization algorithm module. The training data includes the state data $s_t$, the action data $a_t$, the instant reward value $r_t$, and the next state data $s_{t+1}$. This method introduces the uncertainty of the environmental random factor $e_t$ to simulate illumination fluctuation, device noise, and the like, which enhances the adaptability of the model to the dynamic environment. Sampling data in batches from the experience pool improves data utilization and accelerates model convergence, and random sampling reduces data correlation and improves model generalization, so that the model can still make stable decisions in unknown scenes (such as burst radiation interference).
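The experience generation loop described above can be sketched as follows. This Python example is only a toy under stated assumptions: the state is reduced to a single battery charge value in [0, 1], the state transfer function simply adds the action and a Gaussian noise term, and the reward equals the next charge level. None of these dynamics come from the application; the class and function names are hypothetical.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

Transition = Tuple[float, float, float, float]  # (s_t, a_t, r_t, s_{t+1})


def transition(s: float, a: float, e: float) -> float:
    """Toy state transfer function s_{t+1} = f(s_t, a_t, e_t), clipped to [0, 1]."""
    return min(1.0, max(0.0, s + a + e))


class ExperiencePool:
    """Fixed-capacity experience pool D with uniform random sampling."""

    def __init__(self, capacity: int) -> None:
        self._buffer: Deque[Transition] = deque(maxlen=capacity)

    def add(self, item: Transition) -> None:
        self._buffer.append(item)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement breaks temporal correlation.
        return random.sample(list(self._buffer), batch_size)

    def __len__(self) -> int:
        return len(self._buffer)


rng = random.Random(0)
pool = ExperiencePool(capacity=1000)
s = 0.5
for _ in range(200):
    a = rng.choice([-0.05, 0.0, 0.05])  # hypothetical actions: discharge, hold, charge
    e = rng.gauss(0.0, 0.01)            # environmental random factor e_t
    s_next = transition(s, a, e)
    r = s_next                          # toy reward: a higher charge is better
    pool.add((s, a, r, s_next))
    s = s_next

batch = pool.sample(32)
print(len(pool), len(batch))
```

A bounded `deque` discards the oldest transitions once the capacity is reached, which is one common way to keep the pool close to the current policy's data distribution.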
In some embodiments, the training key features are input into an optimization algorithm module, an optimization loss function is constructed, the parameters of the optimization algorithm module are adjusted with the calculation result of the optimization loss function minimized as a target, and a trained intelligent decision frame is obtained, including:
the optimization algorithm module comprises a strategy network and a value network;
Generating action probability distribution based on the training key characteristics and the training state data through a strategy network, and selecting training action data corresponding to the action space based on the action probability distribution;
calculating an instant reward value for the current training action data based on the training state data, the training action data, and the reward function via the value network;
And calculating an optimized loss function result according to the training state data, the training action data, the instant rewarding value and the next training state data generated by the state transfer function, and adjusting parameters of the optimization algorithm module with the minimized optimized loss function result as a target to obtain a trained intelligent decision frame.
The optimization algorithm module comprises a policy network (Policy Network, π) and a value network (Value Network, Q). The training key features and the state data are input into the policy network, which outputs an action probability distribution representing the probability that each action in the action space is selected under the given current state data. For example, the action probability [0.6, 0.3, 0.1] corresponds to $A = \{\Delta\theta_{solar}, \Delta S_{schedule}, \Delta\theta_{attitude}\}$, where $\Delta\theta_{solar}$ is the adjustment amount of the solar panel angle, $\Delta\theta_{attitude}$ is the adjustment amount of the satellite attitude, and $\Delta S_{schedule}$ is the adjustment of the energy scheduling strategy (such as task priority adjustment); an action is then selected according to the probability distribution. The value network evaluates the long-term benefit of an action and guides the optimization direction of the policy network. The state data and the action data are input into the value network, and an instant reward value is calculated based on the reward function to quantify the effect of the action; for example, the reward r=0.85 of the solar panel angle adjustment action is higher than that of the other actions. The loss functions of the policy network and the value network of the optimization algorithm module are respectively defined as follows:
$$L_{\pi} = -\mathbb{E}_{s_t}\left[ Q\left(s_t, \pi(s_t \mid \theta_{\pi}) \mid \theta_Q\right) \right]$$
In the above formula, $L_{\pi}$ is the loss function of the policy network, whose goal is to maximize the expected long-term cumulative reward; $\mathbb{E}$ denotes the expectation; $Q(s_t, a_t \mid \theta_Q)$ is the state-action value function output by the value network, representing the expected long-term cumulative reward of executing action $a_t$ in state $s_t$; $s_t$ is the state data at time step t; $\pi(s_t \mid \theta_{\pi})$ is the action $a_t$ generated by the policy network under parameter $\theta_{\pi}$ according to state $s_t$; $\theta_{\pi}$ is the parameter of the policy network, optimized by the gradient descent method; and $\theta_Q$ is the parameter of the value network, which is held fixed while the policy network is trained (to avoid fluctuation of the target value).
$$L_Q = \frac{1}{N}\sum_{i=1}^{N}\left( Q(s_i, a_i \mid \theta_Q) - \left( r_i + \gamma \max_{a_{i+1}} Q'(s_{i+1}, a_{i+1} \mid \theta_{Q'}) \right) \right)^2$$
In the above formula, $L_Q$ is the loss function of the value network, whose goal is to make the predicted value approach the real target value (minimizing the mean square error); $N$ is the amount of training data sampled from the experience pool; $Q(s_i, a_i \mid \theta_Q)$ is the value predicted by the value network under parameter $\theta_Q$ for state $s_i$ and action $a_i$; $r_i$ is the instant reward value obtained after action $a_i$ is executed; $\gamma$ is a discount factor ($0 \le \gamma \le 1$) that balances the importance of current and future rewards (for example, $\gamma = 0.9$ weights near-term rewards more heavily than distant ones); $\max_{a_{i+1}} Q'(s_{i+1}, a_{i+1} \mid \theta_{Q'})$ selects the action $a_{i+1}$ that maximizes the output of the target value network in the next state $s_{i+1}$; and $\theta_{Q'}$ is the parameter of the target value network, which is synchronized periodically from $\theta_Q$ (to stabilize the training process).
The optimization loss function comprises the loss function of the policy network and the loss function of the value network. With the aim of minimizing the result of the optimization loss function, the network parameters $\theta_{\pi}$ and $\theta_Q$ are adjusted by the gradient descent method, and experience replay is used to break the data correlation. In this method, the trained policy network generates diversified actions and the trained value network evaluates them accurately, which balances efficiency, stability, and task requirements, and balances exploration and exploitation. Batch training and experience replay accelerate model convergence and shorten the training time, and the optimized model can adapt to an unknown dynamic environment.
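To make the two loss terms concrete, the following Python sketch evaluates a value network loss (mean squared error against a bootstrapped target) and a policy loss (negative mean value of the policy's actions) on a tiny tabular problem. It assumes discrete states and actions and uses a lookup table in place of neural networks; the table values, the batch, and all names are hypothetical, and no gradient step is performed.

```python
from typing import Callable, List, Sequence, Tuple

Transition = Tuple[int, int, float, int]  # (s_i, a_i, r_i, s_{i+1})
QFunction = Callable[[int, int], float]


def td_target(r: float, s_next: int, q_target: QFunction,
              actions: Sequence[int], gamma: float) -> float:
    """Target value: r_i + gamma * max over next actions of the target network output."""
    return r + gamma * max(q_target(s_next, a) for a in actions)


def value_loss(batch: List[Transition], q: QFunction, q_target: QFunction,
               actions: Sequence[int], gamma: float = 0.9) -> float:
    """Mean squared error between predicted values and target values."""
    errors = [
        (q(s, a) - td_target(r, s_next, q_target, actions, gamma)) ** 2
        for s, a, r, s_next in batch
    ]
    return sum(errors) / len(errors)


def policy_loss(states: Sequence[int], q: QFunction,
                policy: Callable[[int], int]) -> float:
    """Negative mean of Q(s, pi(s)); minimizing it maximizes the expected value."""
    return -sum(q(s, policy(s)) for s in states) / len(states)


# Toy tabular stand-ins for the networks (hypothetical values).
Q_TABLE = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.1}


def q(s: int, a: int) -> float:
    return Q_TABLE[(s, a)]


q_target = q  # target network immediately after synchronization with the value network


def policy(s: int) -> int:
    return 1 if s == 0 else 0


batch = [(0, 1, 0.85, 1), (1, 0, 0.6, 0)]
l_q = value_loss(batch, q, q_target, actions=[0, 1], gamma=0.9)
l_pi = policy_loss([0, 1], q, policy)
print(l_q, l_pi)
```

In a full implementation the two losses would be minimized by gradient descent on the respective network parameters, with the target network refreshed only periodically.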
The embodiments of the present application will be described and illustrated below by means of preferred embodiments.
Fig. 3 is a flowchart of a satellite energy system intelligent control method according to a preferred embodiment of the present application, and as shown in fig. 3, the satellite energy system intelligent control method comprises the steps of:
S1: constructing a satellite energy system environment model.
An environment model of the satellite energy system is built, wherein the environment model comprises power system data (such as battery voltage, current, temperature and the like), load system data (such as power consumption, working state and the like of each device) and satellite environment data (such as illumination intensity, temperature, radiation and the like) which run in real time. And acquiring data by using the sensor to form a complete running state description of the satellite energy system.
The input of the satellite energy system environment model is real-time operation data of the satellite energy system, wherein the real-time operation data mainly comprise power supply system data, load system data, satellite environment data and the like. Power system data including, but not limited to, battery voltage, current, temperature, charge-discharge status, solar panel output power, etc., is used to describe the real-time operating status of the satellite power system. Load system data including, but not limited to, power consumption, operating status, task priority, etc. of each device (e.g., communication device, sensor, computing unit, etc.) is used to describe the real-time requirements of the satellite load system. Satellite environmental data including, but not limited to, light intensity, temperature, radiation level, orbital position, etc., is used to describe the external environmental conditions under which the satellite operates. By collecting the multidimensional data, a complete satellite energy system environment model is constructed, and a comprehensive data base is provided for subsequent intelligent monitoring and optimization.
S2: establishing a data preprocessing and feature extraction module.
Fig. 4 is a frame diagram of a data preprocessing and feature extraction module in the intelligent control method of the satellite energy system according to the preferred embodiment of the present application, and as shown in fig. 4, the collected satellite energy system data is subjected to cleaning, denoising and normalization processing, and key features are extracted. And carrying out semantic understanding and feature association analysis on the historical data and the real-time data by using a large language model, and constructing high-quality data input.
The data preprocessing and feature extraction module is established based on real-time operation data of the satellite energy system, and is mainly used for cleaning, denoising and normalizing acquired power system data, load system data and satellite environment data, extracting key features and providing high-quality data input for subsequent large language model analysis.
Data cleaning: invalid values, repeated values, and abnormal values in the acquired data are removed to ensure the integrity and accuracy of the data. Data denoising: noise in the data is eliminated by a filtering algorithm or a statistical method to improve the signal-to-noise ratio. Data normalization: data with different dimensions and magnitudes is standardized in a unified way to facilitate subsequent analysis and model processing. Feature extraction: using the semantic understanding capability of the large language model, key features (such as the health state of the power supply system, the power consumption trend of the load system, and the dynamic change of environmental conditions) are extracted from the preprocessed data to provide a high-quality data basis for intelligent monitoring and optimization. Through this processing, the data preprocessing and feature extraction module significantly improves the usability of the data and provides reliable support for subsequent large language model analysis and optimization decisions.
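The cleaning, denoising, and normalization stages can be sketched on a single sensor channel as follows. This Python example is a simplification under stated assumptions: cleaning drops missing readings and readings outside a plausible range, denoising uses a trailing moving average, and normalization is min-max scaling. The text does not fix any of these particular methods; the function names and sample readings are hypothetical, and duplicate removal and the feature extraction by the large language model are not modeled.

```python
import math
from typing import List, Optional, Sequence


def clean(samples: Sequence[Optional[float]], low: float, high: float) -> List[float]:
    """Drop missing or NaN readings and readings outside the plausible range."""
    return [
        x for x in samples
        if x is not None and not math.isnan(x) and low <= x <= high
    ]


def moving_average(values: Sequence[float], window: int = 3) -> List[float]:
    """Trailing moving-average filter; early samples use a shorter window."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical battery voltage readings with a gap, an outlier, and a NaN.
raw_voltage = [3.9, None, 4.0, 99.0, 4.1, float("nan"), 4.2]
cleaned = clean(raw_voltage, low=0.0, high=5.0)
smoothed = moving_average(cleaned, window=2)
normalized = min_max_normalize(smoothed)
print(cleaned, smoothed, normalized)
```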
S3: analyzing the state of the energy system and evaluating the optimization requirements.
Fig. 5 is a frame diagram of state analysis and optimization requirement evaluation in the intelligent control method of the satellite energy system according to the preferred embodiment of the present application, as shown in fig. 5, the operation state of the satellite energy system is deeply analyzed based on a large language model, an abnormal mode is identified, a potential failure is predicted, and the optimization requirement of the current energy system is evaluated. And formulating an optimization target and constraint conditions by combining the satellite task target and the environment constraint.
In the state analysis and optimization demand evaluation of the energy system, the optimization targets mainly are maximizing the energy utilization efficiency, guaranteeing the system stability and meeting the satellite task demands, and the constraint conditions consider the output capacity of the power supply system, the power consumption demands of load equipment, the dynamic changes of environmental conditions, the real-time running state of the satellite energy system and the like.
The optimization targets are as follows:
Maximizing energy utilization efficiency: energy waste is reduced and the energy utilization rate is improved by optimizing power output and load distribution. The objective function may be expressed as:
$$\max \; \eta = \frac{E_{useful}}{E_{total}}$$
where $\eta$ is the energy utilization efficiency, $E_{useful}$ is the effectively utilized energy, and $E_{total}$ is the total available energy.
Ensuring system stability: the power supply system and the load equipment are kept operating within a safe range, avoiding system faults caused by insufficient energy or overload. The objective is expressed through a system stability index $\sigma$, which is computed from the actual value $x_i$ of the $i$-th system parameter, its target value $x_{target}$, and the upper and lower safety limits $x_{max}$ and $x_{min}$ of that parameter.
Meeting satellite task requirements: energy allocation is dynamically adjusted according to the priority and real-time requirements of the satellite tasks, ensuring that key tasks are completed smoothly. The objective function may be expressed as:
$$\max \; T = \sum_{j} w_j \, c_j$$
where $T$ is the task completion degree, $w_j$ is the priority weight of the $j$-th task, and $c_j$ is the completion state of the $j$-th task (1 indicates complete, 0 indicates incomplete).
The constraint conditions are as follows:
Output capacity of the power supply system: limits such as the battery capacity and the output power of the solar panel are taken into account so that the power supply system operates within a safe range. The constraint can be expressed as:
$$P_{min} \le P_{out} \le P_{max}$$
where $P_{out}$ is the output power of the power supply system, and $P_{min}$ and $P_{max}$ are the lower and upper limits of the output power, respectively.
Power consumption requirements of the load equipment: energy is allocated according to the power consumption characteristics and task priority of each device, avoiding local overload or energy shortage. The constraint can be expressed as:
$$\sum_{i} P_i \le E_{available}$$
where $P_i$ is the power consumption of the $i$-th load device and $E_{available}$ is the currently available energy.
Dynamic change of environmental conditions: the optimization strategy is dynamically adjusted to account for the influence of environmental factors such as illumination intensity, temperature, and radiation on the energy system. The constraint can be expressed as:
$$C_{min} \le C_{env} \le C_{max}$$
where $C_{env}$ is the influencing factor of the environmental conditions, and $C_{min}$ and $C_{max}$ are the lower and upper limits of the influencing factor, respectively.
Real-time operating state of the satellite energy system: the system state is evaluated and the optimization strategy is adjusted based on real-time monitoring data, ensuring the stability and efficiency of the system. The constraint can be expressed as:
$$S_t \in S_{safe}$$
where $S_t$ is the current state of the system and $S_{safe}$ is the set of safe states.
Through the comprehensive consideration of the optimization target and the constraint condition, a scientific and reasonable energy management strategy can be formulated, and the efficient and stable operation of the satellite energy system can be realized.
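The optimization targets and constraint conditions of step S3 can be evaluated numerically along the following lines. This is a sketch under stated assumptions: the efficiency and task-completion functions follow the definitions in the text, while the form of `stability_index` (one minus the mean deviation from target, normalized by the safe range) is an assumption introduced here for illustration, since the text only names the quantities it depends on. All function and parameter names are hypothetical.

```python
def energy_efficiency(e_useful, e_total):
    """Ratio of effectively utilized energy to total available energy."""
    return e_useful / e_total

def task_completion(weights, completed):
    """Priority-weighted sum of task completion states (1 = complete, 0 = not)."""
    return sum(w * c for w, c in zip(weights, completed))

def stability_index(values, targets, lower, upper):
    """ASSUMED form: 1 - mean(|x_i - target_i| / (max_i - min_i)); higher is more stable."""
    devs = [abs(x - t) / (hi - lo)
            for x, t, lo, hi in zip(values, targets, lower, upper)]
    return 1.0 - sum(devs) / len(devs)

def constraints_ok(p_out, p_min, p_max, load_powers, e_available,
                   c_env, c_min, c_max):
    """Check the power-output, load-demand, and environment constraints together."""
    return (p_min <= p_out <= p_max
            and sum(load_powers) <= e_available
            and c_min <= c_env <= c_max)

eta = energy_efficiency(80.0, 100.0)
task = task_completion([0.5, 0.3, 0.2], [1, 0, 1])
sigma = stability_index([32.0], [28.0], [20.0], [36.0])
feasible = constraints_ok(50.0, 0.0, 120.0, [20.0, 15.0], 60.0, 0.8, 0.0, 1.0)
overloaded = constraints_ok(50.0, 0.0, 120.0, [40.0, 30.0], 60.0, 0.8, 0.0, 1.0)
```

Treating the constraints as a single boolean check makes it easy to reject candidate actions before they are scored.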
Step S4: converting the energy system optimization problem into an intelligent decision problem.
Fig. 6 is a diagram of the intelligent decision framework in the intelligent control method of the satellite energy system according to the preferred embodiment of the present application. As shown in Fig. 6, the monitoring and optimization problem of the satellite energy system is converted into an intelligent decision problem based on a large language model. A state space and an action space are defined, and an intelligent decision framework is constructed. The state space comprises the power supply system state, the load system state, and the environmental conditions; the action space comprises power supply output adjustment, load mode optimization, and energy scheduling strategies.
Energy system optimization problem modeling: the monitoring and optimization problem of the satellite energy system is converted into an intelligent decision problem based on a large language model by defining a state space, an action space, and a reward function and constructing the intelligent decision framework from them.
The state space S describes the real-time operating state of the satellite energy system and is defined as:
$$S = \{E_{battery},\ P_{solar},\ P_{load},\ C_{env},\ T_{priority},\ E_{health}\}$$
where $E_{battery}$ is the current state of charge of the battery, $P_{solar}$ is the current output power of the solar panel, $P_{load}$ is the current power consumption demand of the load devices, $C_{env}$ denotes environmental conditions such as ambient temperature and illumination intensity, $T_{priority}$ is the priority and real-time demand of the satellite tasks, and $E_{health}$ is the health status of the energy system (such as the degree of battery aging and the risk of equipment failure).
The action space A describes the optimal control operations of the satellite energy system and is defined as:
$$A = \{\Delta\theta_{panel},\ \Delta\psi_{attitude},\ \Delta U_{schedule}\}$$
where $\Delta\theta_{panel}$ is the adjustment of the solar panel angle, $\Delta\psi_{attitude}$ is the adjustment of the satellite attitude, and $\Delta U_{schedule}$ is the adjustment of the energy scheduling strategy (such as a task priority adjustment).
Further, the optimal solar panel angle $\theta_{opt}$ is calculated according to the current illumination condition and the satellite position. The adjustment angle of the solar panel is defined as:
$$\Delta\theta_{panel} = \theta_{opt} - \theta_{current}$$
where $\theta_{current}$ is the current solar panel angle.
The optimal satellite attitude $\psi_{opt}$ is calculated according to the task requirements and the energy optimization target. The adjustment of the satellite attitude is defined as:
$$\Delta\psi_{attitude} = \psi_{opt} - \psi_{current}$$
where $\psi_{current}$ is the current satellite attitude.
The optimal energy scheduling strategy $U_{opt}$ is calculated according to the current energy system state and the task demands. The adjustment of the energy scheduling strategy is defined as:
$$\Delta U_{schedule} = U_{opt} - U_{current}$$
where $U_{current}$ is the current energy scheduling strategy.
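The state and action spaces defined above map naturally onto two small record types. The sketch below is illustrative only: the field names, units, and the choice to represent the scheduling strategy as a single scalar (the power share given to the priority task) are assumptions, not details from the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    e_battery: float    # battery state of charge, 0..1
    p_solar: float      # solar panel output power, W
    p_load: float       # load power demand, W
    c_env: float        # environmental condition, e.g. illumination factor
    t_priority: int     # priority of the active task
    e_health: float     # health score of the energy system, 0..1

@dataclass(frozen=True)
class Action:
    d_theta_panel: float  # solar panel angle adjustment, degrees
    d_attitude: float     # satellite attitude adjustment, degrees
    d_schedule: float     # change in power share for the priority task (assumed scalar)

def make_action(theta_opt, theta_cur, att_opt, att_cur, sched_opt, sched_cur):
    """Each component is the optimal value minus the current value."""
    return Action(theta_opt - theta_cur, att_opt - att_cur, sched_opt - sched_cur)

state = State(e_battery=0.6, p_solar=45.0, p_load=30.0,
              c_env=0.8, t_priority=2, e_health=0.95)
action = make_action(30.0, 25.0, 10.0, 12.0, 0.75, 0.5)
```

Frozen dataclasses keep states and actions immutable, which is convenient when they are later stored as experience tuples.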
The reward function R evaluates the effect of each action by jointly considering energy utilization efficiency, system stability, and task completion. It is defined as:
$$R = w_1 \eta + w_2 \sigma + w_3 T$$
where $\eta$ is the energy utilization efficiency, $\sigma$ is the system stability index, and $T$ is the task completion degree, each calculated as in the optimization targets of step S3, and $w_1$, $w_2$, $w_3$ are weight coefficients used to balance the priorities of the different optimization objectives.
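A weighted-sum reward of this kind can be sketched in a few lines. The default weights below are hypothetical values chosen for the example, and the stability term is assumed to be scaled so that a larger value means a more stable system.

```python
def reward(eta, sigma, task, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of efficiency, stability, and task completion.

    The weights are illustrative assumptions; sigma is assumed to be
    'higher is better' so that all three terms are rewarded.
    """
    w1, w2, w3 = weights
    return w1 * eta + w2 * sigma + w3 * task

r_balanced = reward(0.8, 0.9, 0.7)
r_task_first = reward(0.8, 0.9, 0.7, weights=(0.2, 0.2, 0.6))
```

Shifting weight toward the task term, as in `r_task_first`, lowers the reward for this particular sample because its task completion score is the weakest of the three.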
Intelligent decision framework construction: the state space, the action space, and the reward function are integrated into the intelligent decision framework based on the semantic understanding and reasoning capability of the large language model. The large language model analyzes historical data and real-time data to generate the optimal energy management strategy. The core of the decision framework is to find the optimal strategy through Markov decision process (MDP) modeling, so that the cumulative reward is maximized:
$$\pi^* = \arg\max_{\pi} \; \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R_t\right]$$
where $\gamma$ is the discount factor, used to balance the importance of current rewards against future rewards.
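The role of the discount factor is easiest to see by computing the discounted cumulative reward of a short reward sequence. The helper below is a generic sketch of that calculation; the function name and the sample rewards are illustrative.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t, accumulated backwards to avoid explicit powers."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

g_half = discounted_return([1.0, 1.0, 1.0], gamma=0.5)   # 1 + 0.5 + 0.25
g_zero = discounted_return([1.0, 1.0, 1.0], gamma=0.0)   # only the first reward counts
```

With a discount factor of 0 the agent is fully myopic, and values closer to 1 give future rewards nearly the same weight as immediate ones.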
Step S5: building a network structure that fuses the large language model with an optimization algorithm.
A network structure is designed that fuses the large language model with an optimization algorithm (such as deep reinforcement learning or a heuristic algorithm) to generate the reward function. The reward function jointly considers energy utilization efficiency, system stability, and task completion, and the semantic understanding and reasoning capability of the large language model improves the decision efficiency of the optimization algorithm.
The network structure design comprises the embedding of the large language model and the optimization algorithm, so as to realize intelligent monitoring and optimization of the satellite energy system.
Embedding the large language model: the pre-trained large language model is embedded into the optimization framework, and its semantic understanding and reasoning capability is used to analyze the real-time operating data of the satellite energy system in depth. The input of the large language model comprises the state space S and historical data, and the output is the semantic understanding and feature extraction result for the current system state.
The output of the large language model is defined as:
$$F = \{f_1, f_2, \ldots, f_n\}$$
where $f_i$ is the $i$-th key feature extracted by the large language model.
Optimization algorithm: the satellite energy system is optimized using deep reinforcement learning. The input is the features extracted by the large language model together with the state space S, and the output is an optimizing action in the action space A.
The objective of the optimization algorithm is to find the optimal action through the policy network $\pi$ and the value network $Q$:
$$a^* = \arg\max_{a \in A} Q(s, a; \theta_Q), \quad a \sim \pi(\cdot \mid s; \theta_\pi)$$
where $\theta_\pi$ and $\theta_Q$ are the parameters of the policy network and the value network, respectively.
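Selecting the action with the highest estimated value can be sketched over a small discrete set of candidates. In the example below the value function `toy_q` is a hypothetical stand-in for a trained value network, and the state is reduced to a pair of angles; neither comes from the patent.

```python
def select_action(q_value, state, candidates):
    """Return the candidate action with the highest estimated value."""
    return max(candidates, key=lambda a: q_value(state, a))

def toy_q(state, action):
    """Hypothetical value estimate: prefer panel adjustments that close the gap to the sun angle."""
    sun_angle, panel_angle = state
    return -abs(sun_angle - (panel_angle + action))

# State: sun at 40 degrees, panel at 25 degrees; candidates are angle adjustments in degrees.
best = select_action(toy_q, (40.0, 25.0), [-10.0, 0.0, 5.0, 10.0, 15.0, 20.0])
```

In a policy-plus-value setup, the candidate list would typically be sampled from the policy network rather than fixed in advance.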
Step S6: training and optimizing the model in the simulation environment.
The model is trained in the constructed satellite energy system simulation environment, and the large language model continuously learns optimization strategies from historical data and real-time data. Training experience is collected and model parameters are updated, gradually improving the monitoring accuracy and optimization capability of the model. The simulation environment simulates dynamic changes in the space environment, including fluctuating illumination conditions, changing load demand, and sudden fault events, so as to enhance the generalization capability of the model.
During model training and optimization, the network structure that fuses the large language model with the optimization algorithm is trained in the simulation environment: experience data are collected, model parameters are updated, and the monitoring accuracy and optimization capability of the model are gradually improved.
Simulation environment construction: a high-fidelity satellite energy system simulation environment is constructed to simulate dynamic changes of the power supply system, the load system, and the environmental conditions. The input of the simulation environment comprises power supply system data, load system data, and environment data, and the output is the real-time operating state and the optimization result of the system.
The state transition function of the simulation environment is defined as:
$$s_{t+1} = f(s_t, a_t, \epsilon_t)$$
where $s_t$ is the current state, $a_t$ is the current action, and $\epsilon_t$ is an environmental random factor.
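A state transition function of this shape can be illustrated with a toy battery model. Everything specific here is an assumption made for the example: the three-component state, the interpretation of the action as a change in solar power, the time step, the battery capacity, and the Gaussian noise term standing in for the environmental random factor.

```python
import random

def transition(state, action, rng, dt_h=0.1, capacity_wh=100.0, noise_w=0.0):
    """Toy instance of s_{t+1} = f(s_t, a_t, eps_t).

    state  = (state of charge 0..1, solar power in W, load power in W)
    action = change in solar power caused by a panel adjustment, in W (assumed)
    """
    soc, p_solar, p_load = state
    eps = rng.gauss(0.0, noise_w) if noise_w > 0 else 0.0  # environmental random factor
    p_solar_next = max(0.0, p_solar + action + eps)
    soc_next = soc + (p_solar_next - p_load) * dt_h / capacity_wh
    soc_next = min(1.0, max(0.0, soc_next))                # keep charge within physical bounds
    return (soc_next, p_solar_next, p_load)

rng = random.Random(0)
nxt = transition((0.5, 40.0, 30.0), action=10.0, rng=rng)
noisy = transition((0.5, 40.0, 30.0), action=10.0, rng=rng, noise_w=2.0)
```

Passing the random generator in explicitly keeps simulated episodes reproducible for a given seed.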
Experience data collection: the model is run in the simulation environment and experience data $(s_t, a_t, r_t, s_{t+1})$ are collected, where $s_t$ is the current state, $a_t$ is the current action, $r_t$ is the current reward, and $s_{t+1}$ is the next state. The experience data are stored in the experience pool D for subsequent model training and parameter updating.
Model training and parameter updating: a batch of experience data is randomly sampled from the experience pool D to train the large language model and the optimization algorithm. The training process comprises large language model training and optimization algorithm training.
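The experience pool and its random batch sampling can be sketched with a bounded queue. The class name, the capacity-based eviction of the oldest entries, and the seeded generator are assumptions made for this illustration; the text only states that experience is stored and randomly sampled.

```python
import random
from collections import deque

class ExperiencePool:
    """Fixed-capacity store of (s, a, r, s_next) tuples with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        self._buf = deque(maxlen=capacity)   # oldest entries are dropped when full
        self._rng = random.Random(seed)

    def add(self, s, a, r, s_next):
        self._buf.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Draw up to batch_size distinct experiences without replacement."""
        k = min(batch_size, len(self._buf))
        return self._rng.sample(list(self._buf), k)

    def __len__(self):
        return len(self._buf)

pool = ExperiencePool(capacity=3, seed=0)
for t in range(5):
    pool.add(s=t, a=0, r=1.0, s_next=t + 1)
batch = pool.sample(2)
```

Sampling at random rather than in collection order reduces the correlation between consecutive training examples.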
First, the parameters $\theta_{LLM}$ of the large language model are updated using the states and rewards in the experience data, improving its semantic understanding and feature extraction capability. The loss function is defined with respect to a target value $y$.
Next, the parameters of the optimization algorithm are updated using the states, actions, and rewards in the experience data. Separate loss functions are defined for the policy network and the value network of the deep reinforcement learning optimization algorithm.
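The exact loss formulas are not reproduced in this text. As an illustration only, the sketch below uses a common actor-critic pairing, which is an assumption and not necessarily the patent's choice: a mean squared temporal-difference error for the value network and a negative log-probability weighted by the advantage for the policy network. The batch layouts and function names are hypothetical.

```python
import math

def td_target(r, q_next_max, gamma):
    """One-step bootstrapped target: r + gamma * max_a' Q(s', a')."""
    return r + gamma * q_next_max

def value_loss(batch, gamma):
    """Mean squared TD error; each item is (q_sa, r, q_next_max)."""
    return sum((td_target(r, qn, gamma) - q) ** 2 for q, r, qn in batch) / len(batch)

def policy_loss(batch):
    """Negative mean of log pi(a|s) * advantage; each item is (prob_a, advantage)."""
    return -sum(math.log(p) * adv for p, adv in batch) / len(batch)

v_loss = value_loss([(1.0, 1.0, 2.0), (0.5, 0.0, 1.0)], gamma=0.5)
p_loss = policy_loss([(0.5, 1.0), (0.25, -1.0)])
```

Minimizing the value loss pulls each estimate toward its bootstrapped target, while minimizing the policy loss raises the probability of actions with positive advantage.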
Model parameters are updated using a gradient descent method. The performance of the model is periodically assessed during the training process, including monitoring accuracy, optimization effect, and task completion. And adjusting model parameters and training strategies according to the evaluation results, and further improving the performance of the model. The training and evaluation process is repeated until the model converges or reaches a predetermined performance index, so that the running state of the satellite energy system can be efficiently monitored, and an optimal energy management strategy can be generated.
Step S7: outputting and executing the optimal energy management strategy.
After training, the intelligent monitoring and optimization strategy of the satellite energy system based on the large language model is output. The satellite energy system is controlled in real time according to the strategy, for example by adjusting power supply output, optimizing the load mode, and executing energy scheduling, so that the energy system operates efficiently and stably.
Optimal strategy generation and execution: the optimal energy management strategy of the satellite energy system is generated based on the trained network structure that fuses the large language model with the optimization algorithm.
Based on the current state $s_t$ of the satellite energy system and historical data, the trained large language model and optimization algorithm are used to generate the optimal energy management strategy. The generation process of the optimal strategy is defined as:
$$a_t^* = \pi^*(s_t; \theta^*)$$
where $\pi^*$ is the trained optimal policy network and $\theta^*$ are its parameters.
The large language model performs semantic understanding and feature extraction on the current state, and the optimization algorithm generates the optimal action according to the extracted features. The generated actions include, but are not limited to, the solar panel angle, the satellite attitude, and the energy scheduling strategy.
The generated energy management strategy is applied to the satellite energy system and the corresponding control operations are performed. Adjusting the solar panel angle: the angle is dynamically adjusted according to the current illumination condition and energy demand, ensuring a stable and efficient energy supply. Adjusting the satellite attitude: the attitude is adjusted so that the solar panel always faces the sun, maximizing energy capture efficiency. Adjusting the energy scheduling strategy: energy distribution and scheduling are optimized according to task priority and the energy system state, ensuring that key tasks are completed smoothly.
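When a commanded adjustment is applied to hardware, it is common to bound it before execution. The sketch below shows one way to do that for the solar panel angle. The per-step rate limit and the mechanical angle limits are hypothetical safeguards added for this example and are not described in the patent.

```python
def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))

def apply_panel_adjustment(current_deg, delta_deg,
                           max_step_deg=5.0, limits=(-90.0, 90.0)):
    """Apply an angle adjustment, limiting both the step size and the final angle.

    max_step_deg and limits are assumed actuator constraints for illustration.
    """
    step = clamp(delta_deg, -max_step_deg, max_step_deg)
    return clamp(current_deg + step, limits[0], limits[1])

new_angle = apply_panel_adjustment(25.0, 12.0)      # large request is rate-limited
at_limit = apply_panel_adjustment(88.0, 5.0)        # result is held at the upper limit
```

With these assumed limits, a request of 12 degrees is executed as a 5 degree step, and the remainder would be issued in later control cycles.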
Step S8: visually displaying the monitoring and optimization results.
Key information such as the real-time operating state, abnormality warnings, fault predictions, and optimization strategies of the satellite energy system is displayed through a human-computer interaction interface, and a detailed optimization report is generated to provide decision support for ground control personnel. The displayed content comprises the real-time operating state, the execution effect of the optimization strategy, and key performance indicators, presented as charts, dashboards, dynamic simulations, and similar forms.
The intelligent monitoring and optimization method for the satellite energy system based on the large language model achieves efficient management and optimization of the satellite energy system. The method not only improves energy utilization efficiency and system stability, but also enhances the intelligence and adaptive capability of the system through the semantic understanding and reasoning capability of the large language model. In addition, the energy system optimization problem is converted into an intelligent decision problem, and combining the large language model with an optimization algorithm further improves decision efficiency and optimization accuracy. The network structure and the reward function design accelerate the convergence of the algorithm, and the trained model shows good generalization capability, supporting stable performance in an unknown space environment. Improved real-time performance and accuracy, optimized energy utilization, and reduced fault risk are the key advantages of the application, which provides technical support for the execution of satellite tasks and has broad application potential.
In addition, in combination with the intelligent control method of the satellite energy system in the above embodiment, the embodiment of the application can be implemented by providing a storage medium. The storage medium stores a computer program which when executed by a processor implements any of the satellite energy system intelligent control methods of the above embodiments.
It should be understood by those skilled in the art that the technical features of the above-described embodiments may be combined in any manner, and for brevity, all of the possible combinations of the technical features of the above-described embodiments are not described, however, they should be considered as being within the scope of the description provided herein, as long as there is no contradiction between the combinations of the technical features.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (8)

1. A satellite energy system intelligent control method, comprising:
obtaining real-time operating data of a satellite energy system;
inputting the real-time operating data into a trained satellite energy system environment model, and using a large language sub-model in the satellite energy system environment model to perform semantic understanding and feature association analysis on the real-time operating data to extract key features; wherein the training process of the satellite energy system environment model comprises: constructing a simulation environment of the satellite energy system and obtaining training data of the satellite energy system; inputting the training data into an initial environment model and running the initial environment model in the simulation environment, wherein the initial environment model comprises an initial large language sub-model and an optimization algorithm module; using the initial large language sub-model in the satellite energy system environment model to perform semantic understanding and feature association analysis on the training data to generate training key features, constructing a large language loss function, substituting the training key features into the large language loss function for calculation, and adjusting the parameters of the initial large language sub-model with the goal of minimizing the calculation result of the large language loss function, to obtain the trained large language sub-model; inputting the training key features into the optimization algorithm module, constructing an optimization loss function, and adjusting the parameters of the optimization algorithm module with the goal of minimizing the calculation result of the optimization loss function, to obtain a trained intelligent decision framework; and obtaining the trained satellite energy system environment model based on the large language sub-model and the intelligent decision framework;
using the intelligent decision framework in the satellite energy system environment model to generate, based on the key features, state data corresponding to a state space; generating, based on the key features and the state data, action data corresponding to an action space, and generating, based on the key features, the state data, and a reward function, a reward prediction value; wherein the intelligent decision framework comprises the state space, the action space, and the reward function; the state space comprises the current state of charge of the battery, the current output power of the solar panel, the current power consumption demand of the load equipment, the current environmental conditions, the priority and real-time demand of the satellite tasks, and the health status of the energy system; the action space comprises a solar panel angle adjustment amount, a satellite attitude adjustment amount, and an energy scheduling strategy adjustment amount; and the sub-goals in the reward function comprise energy utilization efficiency, system stability, and task completion;
generating an optimization strategy based on the action data and the reward prediction value; and performing real-time control operations on the satellite energy system based on the optimization strategy.

2. The satellite energy system intelligent control method according to claim 1, wherein the method further comprises:
generating optimization objectives and constraint conditions based on the key features, in combination with preset satellite task objectives and environmental constraints;
defining the reward function based on the optimization objectives;
defining the state space and the action space based on the constraint conditions.

3. The satellite energy system intelligent control method according to claim 1, wherein generating an optimization strategy based on the action data and the reward prediction value comprises:
selecting an optimal action from the action data with the goal of maximizing the reward prediction value, and generating the optimization strategy based on the optimal action.

4. The satellite energy system intelligent control method according to claim 1, wherein the method further comprises:
using the large language sub-model to perform an in-depth analysis of the operating state of the satellite energy system, and determining whether the current operating state of the satellite energy system is abnormal;
if the current operating state is an abnormal mode, generating warning information and continuing to use the intelligent decision framework to generate the state data, the action data, and the reward prediction value;
if the current operating state is not an abnormal mode, directly using the intelligent decision framework to generate the state data, the action data, and the reward prediction value.

5. The satellite energy system intelligent control method according to claim 4, wherein determining whether the current operating state of the satellite energy system is abnormal comprises:
generating a safe state set based on preset environmental constraints;
determining whether the current operating state of the satellite energy system belongs to the safe state set;
if the current operating state of the satellite energy system belongs to the safe state set, the current operating state is an abnormal mode;
if the current operating state of the satellite energy system does not belong to the safe state set, the current operating state is not an abnormal mode.

6. The satellite energy system intelligent control method according to claim 1, wherein obtaining training data of the satellite energy system comprises:
generating experience data using a state transition function in the simulation environment, the state transition function being used to describe the state change of the satellite energy system given the current state, the executed action, and environmental random factors;
obtaining the training data based on the experience data.

7. The satellite energy system intelligent control method according to claim 1, wherein inputting the training key features into the optimization algorithm module, constructing an optimization loss function, and adjusting the parameters of the optimization algorithm module with the goal of minimizing the calculation result of the optimization loss function to obtain the trained intelligent decision framework comprises:
the optimization algorithm module comprising a policy network and a value network;
generating, based on the training key features, training state data corresponding to the state space; generating, via the policy network, an action probability distribution based on the training key features and the training state data, and selecting training action data corresponding to the action space based on the action probability distribution;
calculating, via the value network, an immediate reward value of the current training action data based on the training state data, the training action data, and the reward function;
calculating an optimization loss function result according to the training state data, the training action data, the immediate reward value, and the next training state data generated by the state transition function, and adjusting the parameters of the optimization algorithm module with the goal of minimizing the optimization loss function result, to obtain the trained intelligent decision framework.

8. A storage medium, wherein a computer program is stored in the storage medium, and the computer program is configured to execute, when run, the satellite energy system intelligent control method according to any one of claims 1 to 7.
CN202510813906.3A 2025-06-18 2025-06-18 Intelligent control method for satellite energy system and storage medium Active CN120335379B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510813906.3A CN120335379B (en) 2025-06-18 2025-06-18 Intelligent control method for satellite energy system and storage medium

Publications (2)

Publication Number Publication Date
CN120335379A CN120335379A (en) 2025-07-18
CN120335379B true CN120335379B (en) 2025-09-23

Family

ID=96369084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510813906.3A Active CN120335379B (en) 2025-06-18 2025-06-18 Intelligent control method for satellite energy system and storage medium

Country Status (1)

Country Link
CN (1) CN120335379B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116692027A (en) * 2023-06-05 2023-09-05 浙江理工大学 Satellite Exploration Control System and Method Based on Deep Reinforcement Learning
CN118631314A (en) * 2024-05-23 2024-09-10 北京航空航天大学 A large model driven multi-satellite collaborative perception decision-making method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6813525B2 (en) * 2000-02-25 2004-11-02 Square D Company Energy management system
US11022720B2 (en) * 2019-10-25 2021-06-01 The Florid International University Board of Trustees System for forecasting renewable energy generation
CN118709748A (en) * 2024-06-28 2024-09-27 武汉大学 A deep reinforcement learning scheduling method and device for satellite multi-point target imaging
CN118798486A (en) * 2024-09-11 2024-10-18 山东科技职业学院 An industrial textile control system and method based on large language model
CN119472447A (en) * 2025-01-06 2025-02-18 成都星联芯通科技有限公司 Dynamic optimization method of high-orbit satellite network based on logic modeling and SDN controller
CN120013465A (en) * 2025-01-17 2025-05-16 中山大学 A method and system for generating energy decisions based on large language model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant