Detailed Description
      The present application will be described and illustrated with reference to the accompanying drawings and examples in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by a person of ordinary skill in the art based on the embodiments provided by the present application without making any inventive effort, are intended to fall within the scope of the present application. Moreover, it should be appreciated that while such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as rendering this disclosure insufficient.
      Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by those of ordinary skill in the art that the described embodiments of the application can be combined with other embodiments without conflict.
      Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," and similar referents in the context of the application are not to be construed as limiting the quantity, but rather as singular or plural. The terms "comprises," "comprising," "includes," "including," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to only those steps or elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in connection with the present application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means greater than or equal to two. "And/or" describes the association relationship of the associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that A and B exist simultaneously, or that B exists alone. The terms "first," "second," "third," and the like, as used herein, merely distinguish between similar objects and do not represent a particular ordering of objects.
      The method embodiment provided in this embodiment may be executed in a terminal, a computer or a similar computing device. Taking the operation on the terminal as an example, fig. 1 is a block diagram of the hardware structure of the terminal of the intelligent control method of the satellite energy system according to the embodiment of the invention. As shown in fig. 1, the terminal may include one or more processors 102 (only one is shown in fig. 1) (the processor 102 may include, but is not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA) and a memory 104 for storing data, and optionally, a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and not limiting on the structure of the terminal described above. For example, the terminal may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
      The memory 104 may be used to store computer programs, such as software programs and modules of application software, such as computer programs corresponding to the intelligent control method of the satellite energy system in the embodiment of the present invention, and the processor 102 executes the computer programs stored in the memory 104 to perform various functional applications and data processing, that is, implement the above-mentioned method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
      The transmission device 106 is used to receive or transmit data via a network. The specific example of the network described above may include a wireless network provided by a communication provider of the terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as a NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
      This embodiment provides an intelligent control method for a satellite energy system. Fig. 2 is a flowchart of the intelligent control method of the satellite energy system according to an embodiment of the application. As shown in fig. 2, the flow includes the following steps:
       Step S201, acquiring real-time operation data of a satellite energy system; 
       The real-time operation data of the satellite energy system is collected in real time by sensors arranged on the satellite. It comprises power supply system data (such as battery voltage, current, temperature and charge-discharge state), load system data (the power consumption and working state of each device, such as the communication module, sensors and thrusters) and satellite environment data (such as solar illumination intensity, space radiation dose and satellite surface temperature). The data collected in this step covers three dimensions, namely power supply, load and environment, and provides complete input for the subsequent model analysis. Collecting the data in real time through sensors ensures timely monitoring and analysis of the state of the satellite energy system. 
      Step S202, inputting real-time operation data into a trained satellite energy system environment model, and carrying out semantic understanding and feature association analysis on the real-time operation data by utilizing a large language submodel in the satellite energy system environment model to extract key features;
       After being input into the trained satellite energy system environment model, the real-time operation data first undergoes preprocessing: cleaning (removing invalid, repeated and abnormal values to ensure the integrity and accuracy of the data), denoising (eliminating noise with a filtering algorithm or a statistical method to improve the signal-to-noise ratio) and normalization (bringing data of different dimensions and magnitudes onto a unified scale). A large language sub-model is embedded in the satellite energy system environment model. Using the semantic understanding capability of the pre-trained large language model, it performs semantic understanding and feature association analysis on the preprocessed real-time operation data, associates historical data with real-time data, and extracts the key features that are vital to energy system state evaluation and optimization decisions. The key features are high-order abstract indexes that are extracted from the real-time operation data and reflect the core state of the system. They integrate the dynamic changes of the power supply system, the load equipment and the environmental conditions, provide input for the intelligent decision framework, and support the generation of the optimization strategy. Key features include the state of health of the power supply system (e.g., battery aging), the power consumption trend of the load system (e.g., periodic high-load periods), dynamic changes in environmental conditions, and the like. The large language sub-model outputs the key features for use by the subsequent intelligent decision framework. 
       Preprocessing operations such as cleaning, denoising and normalization, performed before the data is input into the large language sub-model, improve data quality and provide a reliable basis for subsequent analysis. Deep analysis of the real-time operation data by the large language sub-model identifies state changes and potential faults of the energy system more accurately, and the intelligent decision framework generates the optimization strategy based on the extracted key features, improving decision efficiency and accuracy. 
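       The preprocessing stage described above can be illustrated with a minimal Python sketch. The function names (`clean`, `denoise`, `normalize`), the trailing moving-average filter, the min-max scaling and the validity bounds are all assumptions chosen for illustration; the application does not prescribe a particular filtering or normalization algorithm, and the removal of repeated values is omitted here for brevity.

```python
import math
from typing import List, Optional


def clean(samples: List[Optional[float]], lo: float, hi: float) -> List[float]:
    """Drop missing, non-finite and out-of-range readings (bounds are assumed)."""
    return [x for x in samples
            if x is not None and math.isfinite(x) and lo <= x <= hi]


def denoise(samples: List[float], window: int = 3) -> List[float]:
    """Smooth with a trailing moving average (one possible filtering choice)."""
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def normalize(samples: List[float]) -> List[float]:
    """Min-max scale to [0, 1]; a constant series maps to all zeros."""
    if not samples:
        return []
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return [0.0 for _ in samples]
    return [(x - lo) / (hi - lo) for x in samples]


# Hypothetical battery-voltage readings with one missing value and one outlier.
raw = [3.7, None, 3.8, 99.0, 3.9, 4.0]
features = normalize(denoise(clean(raw, lo=0.0, hi=5.0)))
```

       In this sketch the cleaned and smoothed series is what would be handed to the large language sub-model; how that sub-model consumes numeric features is outside the scope of the example.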
      Step S203, generating state data corresponding to a state space based on key features by utilizing an intelligent decision framework in a satellite energy system environment model, generating action data corresponding to an action space based on the key features and the state data, and generating a reward prediction value based on the key features, the state data and a reward function, wherein the intelligent decision framework comprises a state space, an action space and the reward function;
       The state space is the set of all parameters describing the real-time running state of the satellite energy system, covering key dimensions such as energy supply and demand, environmental conditions and task demands. The action space is the set of all optimization operations executable by the satellite energy system, which make efficient use of energy by adjusting hardware or scheduling strategies. The reward function quantifies the merit of each action, integrating energy efficiency, system stability and task completion degree through a mathematical formula to guide the model to learn the optimal strategy. The intelligent decision framework thus converts the complex optimization problem of the satellite energy system into a learnable decision process: the state space perceives the environment, the action space executes control, and the reward function evaluates the effect. State data is generated based on the key features extracted in the preceding step (such as power supply health, load trend and dynamic environmental change); the state data includes the solar panel output power, the total power consumption requirement of the load equipment, the illumination intensity and the like. Candidate actions in the action space, namely the action data, are output based on the fused key features and state data. The reward prediction value of each candidate action is calculated using the formula of the reward function, which is as follows: 
      
        
      
       In the above formula, the energy utilization efficiency (η) is the share of the available energy that is effectively used, the stability (σ) is calculated from the degree to which the parameters deviate from the safe range, the task completion degree (T) is weighted by task priority, and the three weight coefficients balance the priorities of the different optimization objectives.
      Specifically, suppose the key features identify a correlation between the capacity reduction caused by battery aging and the illumination interruption in a shadow area, and predict that illumination cannot be recovered within the next 30 minutes, so that battery power supply is needed. The generated action data comprises: action 1, adjusting the satellite attitude to attempt to leave the shadow area early (which may increase energy consumption); action 2, suspending low-priority tasks and allocating all available energy (200 W) to the communication task; and action 3, reducing the load equipment voltage (an implicit action) to reduce instantaneous power consumption. The reward function evaluation is then performed. If action 1 successfully leaves the shadow area, the energy utilization efficiency η improves, but the attitude adjustment consumes energy (the stability index σ rises), and the reward may be R=0.7. Action 2 directly ensures that the task is completed (T=0.9), but the risk of battery exhaustion is high (σ=0.5), and the reward is R=0.85. Action 3 has the best stability (σ=0.2), but the task completion degree is reduced (T=0.6), and the reward is R=0.65.
      The reward function integrates energy efficiency, system stability and task completion degree, balancing the different optimization targets and avoiding the system imbalance caused by optimizing a single index. Action strategies are generated in real time, ensuring a quick response in a dynamic space environment (such as sudden illumination changes and load fluctuation). The complex energy system optimization problem is abstracted into an intelligent decision problem with a clearly defined state space, action space and reward function, making the problem more structured and solvable.
      Step S204, generating an optimization strategy based on the action data and the reward prediction value, and performing real-time control operation on the satellite energy system based on the optimization strategy.
      From the action data (such as adjusting the solar panel angle or the satellite attitude) and the corresponding reward prediction values, the action with the highest reward value is selected as the optimization strategy. For example, if the reward prediction value R=0.85 of the action "pause low-priority tasks" is higher than that of the other actions, that action is determined to be the optimization strategy. The abstract action is converted into specific control instructions (such as "disable the observation equipment"), and the satellite control module executes the strategy to perform real-time control operations, such as adjusting the angle of the solar panel to maximize light capture, or dynamically distributing energy to preferentially guarantee high-priority tasks (such as communication equipment). In addition, the system state after execution (such as battery power recovery and task completion) can be monitored, and the model parameters can be updated to optimize subsequent strategies. Finally, the real-time running state of the satellite energy system, the execution effect of the optimization strategy and the key performance indexes can be displayed visually through a human-computer interaction interface, and a detailed optimization report can be generated to provide decision support for operation and maintenance personnel. Through real-time control operation, this step adjusts dynamically to the real-time state and task requirements of the satellite energy system and ensures efficient and stable operation of the system.
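      The selection rule in step S204 amounts to taking the candidate with the largest reward prediction value. The following Python sketch uses the three reward values from the shadow-area example; the action identifiers and the function name `select_best_action` are hypothetical and serve only as an illustration.

```python
from typing import Dict, Tuple

# Reward prediction values taken from the shadow-area example;
# the action identifiers are hypothetical labels.
candidates: Dict[str, float] = {
    "adjust_attitude": 0.70,
    "pause_low_priority_tasks": 0.85,
    "reduce_load_voltage": 0.65,
}


def select_best_action(rewards: Dict[str, float]) -> Tuple[str, float]:
    """Return the (action, reward) pair with the highest reward prediction value."""
    if not rewards:
        raise ValueError("no candidate actions to choose from")
    return max(rewards.items(), key=lambda item: item[1])


best_action, best_reward = select_best_action(candidates)
```

      A plain maximum is the simplest reading of "the action with the highest reward value"; a search procedure over action sequences would replace this function without changing its interface.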
      Through the above steps, real-time operation data is first acquired, and the large language sub-model performs semantic analysis and feature association on the raw data. This goes beyond the limits of traditional threshold judgment or simple statistical models: the context understanding capability of the sub-model captures deep associations in multi-source heterogeneous data (for example, semantic-level association analysis of solar wing temperature fluctuation and load power consumption change). A dynamic state space is then constructed in the intelligent decision framework, and environmental parameters, task priorities and the like are mapped into high-dimensional feature vectors. Compared with a traditional table look-up method or a fixed rule base, a refined action space including "adjusting the solar wing angle", "dynamic power distribution" and the like can be generated in real time, and the strategy value is quantified by the reward function (for example, energy utilization rate multiplied by the task completion degree weight coefficient). Finally, in the strategy generation stage, the action sequence with the highest reward prediction value is selected dynamically by algorithms such as Monte Carlo tree search, for example, preferentially guaranteeing power to the communication module and starting the standby load. Compared with a traditional priority queue algorithm, multi-objective Pareto-optimal solution calculation can be completed at the millisecond level. The method effectively resolves the conflict between lagging adaptation to a dynamic environment and resource allocation, and thereby solves the technical problems that a satellite energy system is difficult to adapt to environmental change and cannot efficiently balance multi-task demands in a dynamic space environment.
      In some of these embodiments, the method further comprises:
       Based on key characteristics, generating an optimization target and a constraint condition by combining a preset satellite task target and an environment constraint condition; 
       defining a reward function based on the optimization objective; 
       Based on the constraints, a state space and an action space are defined. 
      Wherein, based on the key features (such as the battery state of health $E_{health}$, the load power consumption trend $P_{load}$ and the environmental dynamics $T_{env}$), the preset satellite task targets (task priority $R_{task}$) and the environmental constraints (such as illumination intensity and battery capacity limits), optimization targets and constraint conditions are generated. The optimization targets comprise maximizing energy efficiency, guaranteeing system stability and meeting task requirements, and the constraint conditions comprise power output capacity, load power consumption requirements and environmental dynamic changes.
      Specifically, the optimization objective is as follows:
       Maximizing energy utilization efficiency: energy waste is reduced and the energy utilization rate is improved by optimizing the power output and the load distribution. The objective function may be expressed as: 
       $\eta = P_{used} / P_{total}$;
       In the above formula, η is the energy utilization efficiency, $P_{used}$ is the effectively utilized energy, and $P_{total}$ is the total available energy. 
      Ensuring system stability: the power supply system and the load equipment are kept operating within a safe range, avoiding system faults caused by insufficient energy or overload. The objective function may be expressed as:
       ;
       In the above formula, σ is the system stability index, $S_i$ is the actual value of the i-th system parameter, $S_{target,i}$ is its target value, and $S_{max,i}$ and $S_{min,i}$ are the upper and lower safety limits of the parameter, respectively. 
      Meeting satellite task requirements: the energy allocation is adjusted dynamically according to the priority and real-time requirements of the satellite tasks, ensuring smooth completion of the key tasks. The objective function may be expressed as:
       $T = \sum_j w_j C_j$;
       In the above formula, T is the task completion degree, $w_j$ is the priority weight of the j-th task, and $C_j$ is the completion status of the j-th task (1 indicates complete, 0 indicates incomplete).
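      The three objective quantities can be computed as in the following Python sketch. The efficiency ratio follows the definitions of η given above. The form used for σ (the mean deviation from target, normalized by the width of the safe range) and the plain priority-weighted sum for T (with weights assumed to sum to 1) are assumptions made for illustration and are not asserted to be the formulas of the application; all function names are hypothetical.

```python
from typing import Sequence


def energy_efficiency(p_used: float, p_total: float) -> float:
    """eta = P_used / P_total."""
    if p_total <= 0:
        raise ValueError("p_total must be positive")
    return p_used / p_total


def stability_index(values: Sequence[float], targets: Sequence[float],
                    lowers: Sequence[float], uppers: Sequence[float]) -> float:
    """ASSUMED form: mean of |S_i - S_target,i| / (S_max,i - S_min,i).

    Smaller values mean the parameters sit closer to their targets.
    """
    terms = [abs(s - t) / (hi - lo)
             for s, t, lo, hi in zip(values, targets, lowers, uppers)]
    if not terms:
        raise ValueError("at least one parameter is required")
    return sum(terms) / len(terms)


def task_completion(weights: Sequence[float], completed: Sequence[int]) -> float:
    """ASSUMED form: sum of w_j * C_j, with priority weights summing to 1."""
    return sum(w * c for w, c in zip(weights, completed))


eta = energy_efficiency(p_used=160.0, p_total=200.0)
sigma = stability_index([4.0], [3.5], [3.0], [4.0])
completion = task_completion([0.5, 0.3, 0.2], [1, 1, 0])
```

      With these hypothetical inputs, 160 of 200 units of energy are used, one parameter sits half a range-width from its target, and the two highest-priority tasks are complete.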
      The constraint conditions are as follows:
       Output capacity of the power supply system: limits such as the battery capacity and the output power of the solar panel are taken into account so that the power supply system operates within a safe range. The constraint can be expressed as: 
      
        $P_{min} \le P_{out} \le P_{max}$
      
       Wherein, $P_{out}$ is the output power of the power supply system, and $P_{min}$ and $P_{max}$ are the lower and upper limits of the output power, respectively.
      Power consumption requirement of the load equipment: energy is distributed reasonably according to the power consumption characteristics and task priority of each device, avoiding local overload or energy shortage. The constraint can be expressed as:
      
        $\sum_i P_{load,i} \le P_{available}$
      
       Wherein, $P_{load,i}$ is the power consumption of the i-th load device, and $P_{available}$ is the currently available energy.
      Dynamic change of environmental conditions: the optimization strategy is adjusted dynamically in view of the influence of environmental factors such as illumination intensity, temperature and radiation on the energy system. The constraint can be expressed as:
      
        $T_{env,min} \le T_{env} \le T_{env,max}$
      
       Wherein, $T_{env}$ is the influencing factor of the environmental conditions, and $T_{env,min}$ and $T_{env,max}$ are the lower and upper limits of the influencing factor, respectively.
      Real-time running state of the satellite energy system: the system state is evaluated and the optimization strategy is adjusted based on real-time monitoring data, ensuring the stability and efficiency of the system. The constraint can be expressed as:
      
        $State_{current} \in State_{safe}$
      
       Wherein, $State_{current}$ is the current operating state, and $State_{safe}$ is the set of safe states.
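      The first three constraint conditions are simple range and budget checks, which the following Python sketch illustrates. The `Limits` container, the field names and the numeric values are hypothetical; the fourth constraint (membership of the current state in the safe state set) is a separate check and is not included here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Limits:
    """Hypothetical container for the bounds used by the constraints."""
    p_min: float      # lower limit of power supply output
    p_max: float      # upper limit of power supply output
    env_min: float    # lower limit of the environmental influencing factor
    env_max: float    # upper limit of the environmental influencing factor


def constraints_satisfied(p_out: float, load_powers: Sequence[float],
                          p_available: float, env_factor: float,
                          limits: Limits) -> bool:
    """Check output range, total load budget and environmental range."""
    power_ok = limits.p_min <= p_out <= limits.p_max
    load_ok = sum(load_powers) <= p_available
    env_ok = limits.env_min <= env_factor <= limits.env_max
    return power_ok and load_ok and env_ok


limits = Limits(p_min=0.0, p_max=300.0, env_min=-20.0, env_max=50.0)
feasible = constraints_satisfied(200.0, [80.0, 60.0, 40.0], 200.0, 25.0, limits)
overloaded = constraints_satisfied(200.0, [120.0, 60.0, 40.0], 200.0, 25.0, limits)
```

      A candidate action that would violate any of these checks can be discarded before its reward is evaluated, which is one way of keeping the action space within the stated limits.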
      By considering the optimization targets and the constraint conditions together, a sound energy management strategy can be formulated and efficient, stable operation of the satellite energy system is achieved. Satellite task targets described in natural language (such as "give priority to the scientific experiment load") are converted into mathematical constraints, so that the optimization direction is strictly aligned with the actual demands of the system and the target ambiguity of traditional threshold control methods is avoided.
      In some of these embodiments, the state space includes the current state of charge of the battery, the current output power of the solar panel, the current power consumption requirement of the load device, the current environmental conditions, the priority and real-time requirements of the satellite tasks, and the health status of the energy system;
       the action space comprises an angle adjustment amount of a solar panel, a satellite attitude adjustment amount and an energy scheduling strategy adjustment amount; 
       The sub-objectives in the reward function include energy utilization efficiency, system stability, and task completion. 
      The state space S is used for describing the real-time operation state of the satellite energy system, and is defined as:
      
        $S = \{P_{battery}, P_{solar}, P_{load}, T_{env}, R_{task}, E_{health}\}$
      
       Wherein, $P_{battery}$ is the current state of charge of the battery, $P_{solar}$ is the current output power of the solar panel, $P_{load}$ is the current power consumption requirement of the load device, $T_{env}$ is the current environmental conditions such as the ambient temperature and the illumination intensity, $R_{task}$ is the priority and real-time requirements of the satellite tasks, and $E_{health}$ is the health state of the energy system.
      The action space A is used for describing the optimization control operations of the satellite energy system, and is defined as follows:
      
        $A = \{\Delta\theta_{solar}, \Delta\theta_{attitude}, \Delta E_{schedule}\}$
      
       Wherein, $\Delta\theta_{solar}$ is the angle adjustment of the solar panel, $\Delta\theta_{attitude}$ is the adjustment of the satellite attitude, and $\Delta E_{schedule}$ is the adjustment of the energy scheduling strategy (such as task priority adjustment).
      The reward function R is used to evaluate the effect of each action, and the sub-objectives include energy utilization efficiency, system stability, and task completion, and is defined as:
      
        
      
       Wherein, η is the energy utilization efficiency, σ is the system stability index and T is the task completion degree, each calculated by the corresponding objective function given above, and the three weight coefficients balance the priorities of the different optimization objectives.
      The state space is generated by the large language sub-model performing semantic understanding and feature extraction on the real-time operation data and fusing multi-dimensional data such as power supply, load and environment. The action space generates candidate action sequences through a deep reinforcement learning policy network based on the state space and the reward function. The reward function adjusts the weight of each sub-objective in real time according to environmental changes (such as sudden faults and task urgency). By combining energy efficiency, system stability and task completion degree in the reward function, this embodiment avoids the imbalance caused by single-objective optimization. The reward function quantifies the complex targets into a learnable numerical value so that the optimal action can be generated. The state space is updated in real time and the action space responds quickly, which guarantees the real-time adaptability of the system. The action space strictly follows the limits of the power output capacity and the load demand, avoiding the risk of overload or energy shortage. Fusing real-time data with historical features supports accurate decisions and balances efficiency, stability and task demands, reducing manual intervention and improving the survivability and task success rate of the satellite in a complex space environment.
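      A weighted combination of the three sub-objectives, with weights that can be shifted at run time, might look like the following Python sketch. It is an illustration under stated assumptions rather than the reward function of the application: σ is treated as a deviation measure and therefore subtracted, the weight names and values are hypothetical, and `emphasize_stability` is an invented example of real-time weight adjustment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weight coefficients for the three sub-objectives."""
    efficiency: float
    stability: float
    task: float


def reward(eta: float, sigma: float, completion: float, w: RewardWeights) -> float:
    """ASSUMED form: weighted sum in which the deviation measure sigma is a penalty."""
    return w.efficiency * eta - w.stability * sigma + w.task * completion


def emphasize_stability(w: RewardWeights, factor: float = 2.0) -> RewardWeights:
    """Scale the stability weight (e.g. after a fault) and renormalize to sum to 1."""
    scaled = (w.efficiency, w.stability * factor, w.task)
    total = sum(scaled)
    return RewardWeights(*(x / total for x in scaled))


nominal = RewardWeights(efficiency=0.4, stability=0.2, task=0.4)
r_nominal = reward(eta=0.8, sigma=0.5, completion=0.9, w=nominal)
fault_mode = emphasize_stability(nominal)
r_fault = reward(eta=0.8, sigma=0.5, completion=0.9, w=fault_mode)
```

      Under the fault-mode weights the same action scores lower because its deviation from the safe range is penalized more heavily, which is the intended effect of shifting priority toward stability.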
      In some of these embodiments, generating an optimization strategy based on the action data and the reward prediction value includes:
       And selecting an optimal action from the action data with the aim of maximizing the reward predicted value, and generating an optimization strategy based on the optimal action. 
      Based on the action data (candidate actions in the action space) and the corresponding reward prediction values (for example, a reward R=0.6 for "adjust the solar panel angle" and a reward R=0.85 for "pause low-priority tasks"), the action with the highest reward value is selected as the optimal strategy through the value network of deep reinforcement learning. For example, when the satellite enters a shadow area, "pause low-priority tasks" is selected because of its higher reward value. The optimal action is then mapped to specific control commands, such as suspending the observation equipment (saving 200 W of power consumption) or adjusting the solar panel angle (from 30° to 45°, maximizing light capture), and the satellite control module issues the corresponding instruction (for example, to the drive motor that adjusts the angle of the solar panel). In addition, parameters such as battery power and task completion degree after execution can be tracked in real time, the execution result is fed back to the model, and the policy network parameters are updated to improve the accuracy of subsequent decisions. By selecting the action with the highest reward value, this embodiment reduces energy waste and responds in real time to situations such as sudden illumination changes and load fluctuation.
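      The mapping from an abstract optimal action to concrete control commands can be sketched in Python as a lookup table. The command strings, the table contents and the function name `plan_commands` are hypothetical; an actual satellite control module would define its own instruction format.

```python
from typing import Dict, List

# Hypothetical mapping from abstract actions to control instructions.
COMMAND_TABLE: Dict[str, List[str]] = {
    "pause_low_priority_tasks": ["SUSPEND observation_device"],
    "adjust_solar_panel_angle": ["SET solar_panel_angle_deg 45"],
}


def plan_commands(rewards: Dict[str, float]) -> List[str]:
    """Pick the highest-reward action and return its control instructions."""
    best = max(rewards, key=lambda action: rewards[action])
    if best not in COMMAND_TABLE:
        raise ValueError(f"no control mapping for action {best!r}")
    return list(COMMAND_TABLE[best])


# Reward values from the example: R=0.6 and R=0.85.
commands = plan_commands({
    "adjust_solar_panel_angle": 0.6,
    "pause_low_priority_tasks": 0.85,
})
```

      Keeping the table separate from the selection logic means new actions can be supported by adding entries rather than changing the decision code.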
      In some of these embodiments, the method further comprises:
       Performing deep analysis on the running state of the satellite energy system by utilizing the large language submodel, and judging whether the current running state of the satellite energy system is abnormal; 
       If the current running state is in an abnormal mode, generating early warning information and continuing to utilize the intelligent decision framework to generate the state data, the action data and the reward prediction value; 
       If the current running state is not in an abnormal mode, directly utilizing the intelligent decision framework to generate the state data, the action data and the reward prediction value. 
      The large language submodel performs semantic association analysis on the preprocessed real-time operation data (power supply system, load system and environment data) and the historical data, extracts the key features, and judges whether the current state is abnormal based on feature thresholds or dynamic rules (for example, the battery temperature exceeds the safe range, or the load power consumption does not match the task priority). If the state is judged abnormal, early warning information is generated (such as "battery temperature overrun, risk level: high") and the intelligent decision framework is triggered to generate the state data, action data and reward prediction values. If the state is normal, the flow enters the intelligent decision framework directly to generate the optimization strategy. With the semantic association capability of the large language model, abnormal modes such as battery aging and sudden load increases are identified quickly, and early warnings (such as an "illumination interruption risk early warning") are generated in advance, avoiding system breakdown or task interruption. Under abnormal conditions, the intelligent decision framework combines the early warning information to generate a targeted strategy (such as suspending non-critical tasks), guaranteeing stable operation of the core functions. Under normal conditions, energy distribution is optimized based on the real-time state data, balancing efficiency, stability and task requirements.
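      The branching described above, in which both paths reach the decision framework but the abnormal path also carries a warning, can be sketched in Python as follows. The callables `battery_too_hot` and `simple_decide`, the 45 ℃ threshold and the strategy names are hypothetical stand-ins for the anomaly judgment and the intelligent decision framework.

```python
from typing import Callable, Dict, Optional, Tuple

State = Dict[str, float]


def run_cycle(state: State,
              is_abnormal: Callable[[State], bool],
              decide: Callable[[State, Optional[str]], str]
              ) -> Tuple[Optional[str], str]:
    """Generate a warning if the state is abnormal, then always run the decision step."""
    warning = None
    if is_abnormal(state):
        warning = "early warning: operating state outside expected range"
    strategy = decide(state, warning)
    return warning, strategy


def battery_too_hot(state: State) -> bool:
    """Hypothetical rule: battery temperature above an assumed 45 degree limit."""
    return state["battery_temp_c"] > 45.0


def simple_decide(state: State, warning: Optional[str]) -> str:
    """Stand-in for the decision framework: react to a warning if present."""
    return "pause_non_critical_tasks" if warning else "optimize_energy_allocation"


abnormal_result = run_cycle({"battery_temp_c": 60.0}, battery_too_hot, simple_decide)
normal_result = run_cycle({"battery_temp_c": 20.0}, battery_too_hot, simple_decide)
```

      Passing the warning into the decision step reflects the statement that, under abnormal conditions, the framework combines the early warning information when generating a targeted strategy.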
      In some of these embodiments, determining whether the current operating state of the satellite energy system is abnormal includes:
       generating a safety state set based on a preset environment constraint condition; 
       judging whether the current running state of the satellite energy system belongs to a safety state set or not; 
       If the current running state of the satellite energy system does not belong to the safety state set, the current running state is in an abnormal mode; 
       If the current running state of the satellite energy system belongs to the safety state set, the current running state is not in an abnormal mode. 
      Wherein, a safety state set is defined based on preset environmental constraint conditions (such as the battery voltage range, temperature thresholds and the upper limit of load power consumption). For example, the safe range of the battery voltage is 3.3 V ≤ $V_{battery}$ ≤ 4.2 V, and the safe range of the temperature is $T_{env}$ ∈ [-20 ℃, 50 ℃]. The safe ranges of the multidimensional parameters are combined into a safe state set $State_{safe}$ in a high-dimensional space. Based on the state data, the current operating state $State_{current}$ is obtained, which includes the battery level $P_{battery}$, the solar output power $P_{solar}$, the ambient temperature $T_{env}$ and the like. If $State_{current} \notin State_{safe}$, the state is judged to be in an abnormal mode and an early warning is triggered; if $State_{current} \in State_{safe}$, the state is judged to be in a normal mode and the flow enters the optimization decision process directly. In addition, the large language submodel can perform auxiliary analysis, combining historical data and real-time features to identify implicit anomalies (such as a capacity fading trend caused by battery aging) and making the judgment more comprehensive. In the abnormal mode, specific early warning content (such as "battery voltage over limit: current value 4.3 V, safe range 3.3 V to 4.2 V") is generated and pushed to an operation and maintenance interface. By comparing the preset constraint conditions with the real-time state, this embodiment quickly identifies out-of-range parameters (such as temperature overrun and voltage abnormality) and avoids the hysteresis of a traditional rule engine; by considering the safe ranges of multiple parameters such as power supply, load and environment together, it avoids system-level cascading failures.
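      When each parameter has an independent safe range, membership in the safe state set reduces to a per-parameter range check. The following Python sketch uses the two ranges quoted above (3.3 V to 4.2 V and -20 ℃ to 50 ℃); the dictionary keys, function names and message wording are hypothetical.

```python
from typing import Dict, List, Tuple

# Safe ranges quoted in the example; the key names are hypothetical.
SAFE_RANGES: Dict[str, Tuple[float, float]] = {
    "battery_voltage_v": (3.3, 4.2),
    "env_temperature_c": (-20.0, 50.0),
}


def violations(state: Dict[str, float]) -> List[str]:
    """List a warning message for every parameter outside its safe range."""
    messages = []
    for name, (lo, hi) in SAFE_RANGES.items():
        value = state[name]
        if not lo <= value <= hi:
            messages.append(
                f"{name} over limit: current value {value}, safe range {lo} to {hi}"
            )
    return messages


def is_abnormal(state: Dict[str, float]) -> bool:
    """The state is abnormal when it does not belong to the safe state set."""
    return bool(violations(state))


over_voltage = {"battery_voltage_v": 4.3, "env_temperature_c": 25.0}
nominal_state = {"battery_voltage_v": 3.7, "env_temperature_c": 25.0}
warnings_raised = violations(over_voltage)
```

      Implicit anomalies such as a slow capacity fade would not be caught by these range checks; they correspond to the auxiliary analysis attributed to the large language submodel.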
      In some of these embodiments, the training process of the satellite energy system environment model includes:
       constructing a simulation environment of a satellite energy system and acquiring training data of the satellite energy system; 
       Inputting training data into an initial environment model, and running the initial environment model in a simulation environment, wherein the initial environment model comprises an initial large language sub-model and an optimization algorithm module; 
       Carrying out semantic understanding and feature association analysis on training data by utilizing an initial large language submodel in a satellite energy system environment model, generating training key features, constructing a large language loss function, substituting the training key features into the large language loss function for calculation, and adjusting parameters of the initial large language submodel with the calculation result of the large language loss function minimized as a target to obtain a trained large language submodel; 
       Inputting the training key features into the optimization algorithm module, constructing an optimization loss function, and adjusting the parameters of the optimization algorithm module with the goal of minimizing the result of the optimization loss function, to obtain a trained intelligent decision framework; 
       Obtaining the trained satellite energy system environment model based on the trained large language submodel and the trained intelligent decision framework. 
      Specifically, a high-fidelity simulation environment is constructed based on a physical model of the satellite energy system (such as the power supply system, the load system and the environmental conditions), and dynamic scenarios such as illumination intensity fluctuation, load demand changes and equipment aging are simulated. Preset tasks (such as orbit adjustment and burst communication tasks) are run in the simulation environment, and real-time data such as the voltage and current of the power supply system, the power consumption of the load equipment and the ambient temperature are collected to form the training data. The initial environment model includes an initial large language submodel (Large Language Model, LLM) and an optimization algorithm module (a deep reinforcement learning network). The training data is input into the initial environment model, the initial large language submodel performs preliminary feature extraction on the training data, the optimization algorithm module generates an initial action strategy, and the simulation environment updates the state according to the action and feeds back a reward value. Specifically, the initial large language submodel performs semantic association analysis on the training data and extracts key features (such as battery health and the load power consumption trend). A large language loss function is constructed to calculate the difference between the features output by the initial large language submodel and the actual state of the simulation environment:
      
        
      
       In the above formula, $y_i$ is the true feature label (for example, the battery capacity decay rate) provided by the simulation environment. 
      The parameters of the initial large language submodel are then adjusted by gradient descent to minimize the loss function and improve the feature extraction precision, yielding the trained large language submodel.
      The training key features (such as $E_{\text{health}}$ and $P_{\text{load}}$) extracted by the initial large language submodel are input into the optimization algorithm module, an optimization loss function is constructed, and the parameters of the optimization algorithm module are adjusted through experience replay and gradient descent so as to maximize the cumulative reward, yielding the trained intelligent decision framework. The trained large language submodel is then combined with the intelligent decision framework: the large language submodel is responsible for real-time feature extraction, and the intelligent decision framework generates an action strategy based on the features, yielding the trained satellite energy system environment model.
      The simulation environment can reproduce the highly dynamic behavior of the space environment (such as illumination interruption in a shadow area), so the training data covers a variety of extreme conditions. It avoids the high cost of experiments on a real satellite and quickly generates large-scale labeled data. The feasibility of the initial model is verified in the simulation environment and early defects are identified, which prevents satellite system faults caused by erroneous actions. The large language submodel mines deep associations in the data (such as the relation between battery temperature and load fluctuation) through semantic understanding, which improves feature quality, and the trained large language submodel can identify hidden anomalies (such as voltage fluctuation caused by battery aging). The trained optimization algorithm can quickly generate actions in response to dynamic environment changes. The semantic understanding of the large language submodel complements the decision capability of the optimization algorithm, which improves overall model performance, allows the model to adapt to unknown environments (such as sudden radiation interference), and supports stable operation of the satellite in complex scenarios.
      In some of these embodiments, acquiring training data for a satellite energy system includes:
       Generating experience data by using a state transfer function in a simulation environment, wherein the state transfer function is used for describing state change of a satellite energy system under given current state, execution action and environmental random factors; 
       obtaining the training data based on the experience data. 
      Specifically, a high-fidelity simulation environment comprising the power supply system, the load system and the environmental dynamics is constructed, and scenarios such as illumination intensity change, load fluctuation and equipment aging are simulated. The state transfer function is $s_{t+1} = f(s_t, a_t, e_t)$, where $s_t$ is the current state data (such as battery power and load demand), $a_t$ is the action data to be executed (such as adjusting the angle of the solar panel), and $e_t$ is the environmental random factor (such as abrupt illumination changes and device noise). The model is run in the simulation environment, and the state, action, reward and next state of each step are recorded to form an experience data tuple $(s_t, a_t, r_t, s_{t+1})$. The experience data is stored in an experience pool $D$, which supports subsequent random sampling to break the correlation between samples. Data is sampled in batches from the experience pool $D$ as training data for updating the parameters of the large language submodel and the optimization algorithm module. The training data includes the state data $s_t$, the action data $a_t$, the instant reward value $r_t$ and the next state data $s_{t+1}$. Introducing the uncertainty of the environmental random factor $e_t$ to simulate illumination fluctuation, equipment noise and the like enhances the adaptability of the model to a dynamic environment. Sampling data in batches from the experience pool improves data utilization and accelerates model convergence, while random sampling reduces data correlation and improves generalization, so that the model can still make stable decisions in unknown scenarios (such as burst radiation interference).
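      The experience generation step above can be sketched as follows. The dynamics inside `transition`, the reward, the action set and the noise level are all assumed toy values chosen only so that the example runs; the text specifies just the general form $s_{t+1} = f(s_t, a_t, e_t)$ and the tuple $(s_t, a_t, r_t, s_{t+1})$.

```python
import random


def transition(state, action, noise):
    """Toy state transfer function s_{t+1} = f(s_t, a_t, e_t) (assumed dynamics)."""
    battery, panel_angle = state
    # The action is a change of the panel angle, kept within 0 to 90 degrees.
    panel_angle = max(0.0, min(90.0, panel_angle + action))
    # Assumption: harvested energy falls off linearly with the distance of the
    # panel angle from 45 degrees, and the random factor perturbs it.
    harvest = max(0.0, 1.0 - abs(panel_angle - 45.0) / 45.0) + noise
    load_draw = 0.5  # assumed constant load consumption per step
    battery = max(0.0, min(100.0, battery + harvest - load_draw))
    return (battery, panel_angle)


def reward(state):
    """Assumed reward: normalized battery level of the resulting state."""
    return state[0] / 100.0


def collect_experience(steps, rng):
    """Roll out a random policy and record (s_t, a_t, r_t, s_{t+1}) tuples."""
    pool = []
    state = (50.0, 10.0)  # initial battery level and panel angle
    for _ in range(steps):
        action = rng.choice([-5.0, 0.0, 5.0])
        noise = rng.gauss(0.0, 0.05)  # environmental random factor e_t
        next_state = transition(state, action, noise)
        pool.append((state, action, reward(next_state), next_state))
        state = next_state
    return pool


experience = collect_experience(200, random.Random(0))
```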
      In some embodiments, inputting the training key features into the optimization algorithm module, constructing an optimization loss function, and adjusting the parameters of the optimization algorithm module with the goal of minimizing the result of the optimization loss function, to obtain a trained intelligent decision framework, includes:
       the optimization algorithm module comprises a policy network and a value network; 
       Generating, through the policy network, an action probability distribution based on the training key features and the training state data, and selecting the corresponding training action data from the action space based on the action probability distribution; 
       calculating, through the value network, an instant reward value for the current training action data based on the training state data, the training action data and the reward function; 
       And calculating the result of the optimization loss function according to the training state data, the training action data, the instant reward value and the next training state data generated by the state transfer function, and adjusting the parameters of the optimization algorithm module with the goal of minimizing the result of the optimization loss function, to obtain the trained intelligent decision framework. 
      The optimization algorithm module comprises a policy network ($\pi$) and a value network ($Q$). The training key features and the state data are input into the policy network, which outputs an action probability distribution giving the probability that each action in the action space is selected under the given current state data. For example, the action probabilities $[0.6, 0.3, 0.1]$ correspond to the action space $A = \{\Delta\theta_{\text{solar}}, \Delta S_{\text{schedule}}, \Delta\theta_{\text{attitude}}\}$, where $\Delta\theta_{\text{solar}}$ is the adjustment of the solar panel angle, $\Delta\theta_{\text{attitude}}$ is the adjustment of the satellite attitude, and $\Delta S_{\text{schedule}}$ is the adjustment of the energy scheduling strategy (such as task priority adjustment); an action is selected according to this probability distribution. The value network evaluates the long-term benefit of an action and guides the optimization direction of the policy network: the state data and the action data are input into the value network, and an instant reward value is calculated based on the reward function to quantify the effect of the action. For example, the reward for adjusting the solar panel angle is $r = 0.85$, which is higher than that of the other actions. The loss functions of the policy network and the value network of the optimization algorithm module are respectively defined as follows:
        $$L_\pi = -\mathbb{E}_{s_t}\left[ Q\left(s_t, \pi(s_t;\theta_\pi); \theta_Q\right) \right]$$
       In the above formula, $L_\pi$ is the loss function of the policy network, whose goal is to maximize the expected long-term cumulative reward; $\mathbb{E}$ denotes the expected value; $Q(s_t, a_t; \theta_Q)$ is the state-action value function output by the value network, representing the expected long-term cumulative reward of executing action $a_t$ in state $s_t$; $s_t$ is the state data at time step $t$; $\pi(s_t; \theta_\pi)$ is the action $a_t$ generated by the policy network with parameters $\theta_\pi$ according to state $s_t$; $\theta_\pi$ denotes the parameters of the policy network, which are optimized by gradient descent; and $\theta_Q$ denotes the parameters of the value network, which are held fixed while the policy network is trained (to avoid fluctuation of the target value). 
        $$L_Q = \frac{1}{N}\sum_{i=1}^{N}\left( Q(s_i, a_i; \theta_Q) - \left( r_i + \gamma \max_{a_{i+1}} Q'(s_{i+1}, a_{i+1}; \theta_{Q'}) \right) \right)^2$$
      In the above formula, $L_Q$ is the loss function of the value network, whose goal is to make the predicted value approach the true target value (minimizing the mean square error); $N$ is the number of training samples drawn from the experience pool; $Q(s_i, a_i; \theta_Q)$ is the value predicted by the value network with parameters $\theta_Q$ for state $s_i$ and action $a_i$; $r_i$ is the instant reward value obtained after executing action $a_i$; $\gamma$ is the discount factor ($0 \le \gamma \le 1$), which balances the importance of current and future rewards (for example, $\gamma = 0.9$ weights near-term rewards more heavily than distant ones); $\max_{a_{i+1}} Q'(s_{i+1}, a_{i+1}; \theta_{Q'})$ selects the action $a_{i+1}$ that maximizes the output of the target value network in the next state $s_{i+1}$; and $\theta_{Q'}$ denotes the parameters of the target value network, which are periodically synchronized from $\theta_Q$ (to stabilize the training process). 
      The optimization loss function comprises the loss function of the policy network and the loss function of the value network. With the goal of minimizing the result of the optimization loss function, the network parameters $\theta_\pi$ and $\theta_Q$ are adjusted by gradient descent, and experience replay is used to break the correlation between samples. In this way, the trained policy network generates diverse actions and the trained value network evaluates them accurately, so that efficiency, stability and task requirements are balanced, as are exploration and exploitation. Batch training and experience replay accelerate model convergence and shorten the training time, and the optimized model can adapt to an unknown dynamic environment.
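      The two losses described above can be sketched in plain Python as follows. The sketch assumes a discrete action set and treats the value network, target value network and policy as ordinary callables; it assumes the value loss is the mean squared error against the target $r_i + \gamma \max_{a'} Q'(s_{i+1}, a')$ and the policy loss is the negated mean value of the policy's actions. The toy function `q` and the sample batch are purely illustrative.

```python
def td_targets(batch, target_q, actions, gamma=0.9):
    """Target y_i = r_i + gamma * max over a' of Q'(s_{i+1}, a')."""
    return [
        r + gamma * max(target_q(s_next, a) for a in actions)
        for (_, _, r, s_next) in batch
    ]


def value_loss(batch, q, target_q, actions, gamma=0.9):
    """Mean squared error between Q(s_i, a_i) and the targets."""
    targets = td_targets(batch, target_q, actions, gamma)
    errors = [(q(s, a) - y) ** 2 for (s, a, _, _), y in zip(batch, targets)]
    return sum(errors) / len(errors)


def policy_loss(states, policy, q):
    """Negated mean value of the actions chosen by the policy."""
    return -sum(q(s, policy(s)) for s in states) / len(states)


def q(state, action):
    """Toy stand-in for the value network (an assumption for illustration)."""
    return 0.5 * state + action


actions = (0, 1)
# Each item is (state, action, reward, next_state).
batch = [(1.0, 0, 1.0, 2.0), (2.0, 1, 0.0, 3.0)]

# The target network is the same function here, as if it had just been synchronized.
loss_q = value_loss(batch, q, q, actions, gamma=0.9)
loss_pi = policy_loss([1.0, 2.0], lambda s: 1, q)
```

      In a real training loop the target callable would be a separate copy of the value network whose parameters are refreshed from the online network only periodically, as the text describes.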
      The embodiments of the present application will be described and illustrated below by means of preferred embodiments.
      Fig. 3 is a flowchart of a satellite energy system intelligent control method according to a preferred embodiment of the present application, and as shown in fig. 3, the satellite energy system intelligent control method comprises the steps of:
       Step S1, constructing a satellite energy system environment model. 
      An environment model of the satellite energy system is built, covering real-time power system data (such as battery voltage, current and temperature), load system data (such as the power consumption and working state of each device) and satellite environment data (such as illumination intensity, temperature and radiation). The data is acquired by sensors to form a complete description of the running state of the satellite energy system.
      The input of the satellite energy system environment model is real-time operation data of the satellite energy system, wherein the real-time operation data mainly comprise power supply system data, load system data, satellite environment data and the like. Power system data including, but not limited to, battery voltage, current, temperature, charge-discharge status, solar panel output power, etc., is used to describe the real-time operating status of the satellite power system. Load system data including, but not limited to, power consumption, operating status, task priority, etc. of each device (e.g., communication device, sensor, computing unit, etc.) is used to describe the real-time requirements of the satellite load system. Satellite environmental data including, but not limited to, light intensity, temperature, radiation level, orbital position, etc., is used to describe the external environmental conditions under which the satellite operates. By collecting the multidimensional data, a complete satellite energy system environment model is constructed, and a comprehensive data base is provided for subsequent intelligent monitoring and optimization.
      Step S2, establishing a data preprocessing and feature extraction module.
      Fig. 4 is a frame diagram of a data preprocessing and feature extraction module in the intelligent control method of the satellite energy system according to the preferred embodiment of the present application, and as shown in fig. 4, the collected satellite energy system data is subjected to cleaning, denoising and normalization processing, and key features are extracted. And carrying out semantic understanding and feature association analysis on the historical data and the real-time data by using a large language model, and constructing high-quality data input.
      The data preprocessing and feature extraction module is established based on real-time operation data of the satellite energy system, and is mainly used for cleaning, denoising and normalizing acquired power system data, load system data and satellite environment data, extracting key features and providing high-quality data input for subsequent large language model analysis.
      Data cleaning removes invalid values, repeated values and abnormal values from the collected data to ensure the integrity and accuracy of the data. Data denoising eliminates noise in the data by means of a filtering algorithm or a statistical method, improving the signal-to-noise ratio. Data normalization brings data of different dimensions and magnitudes onto a unified scale, which facilitates subsequent analysis and model processing. Key feature extraction uses the semantic understanding capability of the large language model to extract key features (such as the health state of the power supply system, the power consumption trend of the load system and the dynamic change of environmental conditions) from the preprocessed data, providing a high-quality data basis for intelligent monitoring and optimization. Through this processing, the data preprocessing and feature extraction module improves the usability of the data and provides reliable support for subsequent large language model analysis and optimization decisions.
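      The cleaning, denoising and normalization steps can be sketched as follows. The text does not fix particular algorithms, so the choices here are assumptions: records are (timestamp, value) pairs, outliers are values outside a plausible physical range, denoising is a trailing moving average, and normalization is min-max scaling.

```python
import math


def clean(records, low, high):
    """Drop invalid values, repeated timestamps and out-of-range outliers."""
    seen = set()
    cleaned = []
    for timestamp, value in records:
        if value is None or math.isnan(value):
            continue  # invalid value
        if timestamp in seen:
            continue  # repeated record
        if not (low <= value <= high):
            continue  # abnormal value outside the plausible range
        seen.add(timestamp)
        cleaned.append((timestamp, value))
    return cleaned


def moving_average(values, window=3):
    """Trailing moving average used here as a simple denoising filter."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def min_max_normalize(values):
    """Scale values to the range 0 to 1; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Battery voltage samples with a duplicate, a missing value and an outlier.
raw = [(0, 3.70), (1, 3.72), (1, 3.72), (2, None), (3, 9.99), (4, 3.80), (5, 3.78)]
cleaned = clean(raw, low=3.0, high=4.5)
voltages = [value for _, value in cleaned]
normalized = min_max_normalize(moving_average(voltages))
```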
      Step S3, analyzing the state of the energy system and evaluating the optimization requirements.
      Fig. 5 is a frame diagram of state analysis and optimization requirement evaluation in the intelligent control method of the satellite energy system according to the preferred embodiment of the present application, as shown in fig. 5, the operation state of the satellite energy system is deeply analyzed based on a large language model, an abnormal mode is identified, a potential failure is predicted, and the optimization requirement of the current energy system is evaluated. And formulating an optimization target and constraint conditions by combining the satellite task target and the environment constraint.
      In the state analysis and optimization demand evaluation of the energy system, the optimization targets mainly are maximizing the energy utilization efficiency, guaranteeing the system stability and meeting the satellite task demands, and the constraint conditions consider the output capacity of the power supply system, the power consumption demands of load equipment, the dynamic changes of environmental conditions, the real-time running state of the satellite energy system and the like.
      The optimization targets are as follows:
       Maximizing the energy utilization efficiency: energy waste is reduced and the energy utilization rate is improved by optimizing the power output and the load distribution. The objective function may be expressed as: 
        $$\max \eta = \frac{E_{\text{used}}}{E_{\text{total}}}$$
       where $\eta$ is the energy utilization efficiency, $E_{\text{used}}$ is the effectively used energy, and $E_{\text{total}}$ is the total available energy.
      Ensuring system stability: the power supply system and the load equipment are kept operating within a safe range, avoiding system faults caused by insufficient energy or overload. The objective function may be expressed as:
      
        
      
       where $\sigma$ is the system stability index, computed from the actual value of the i-th system parameter, its target value, and the upper and lower safety limits of that parameter.
      Meeting the satellite task requirements: the energy allocation is dynamically adjusted according to the priority and real-time requirements of the satellite tasks, ensuring that key tasks are completed smoothly. The objective function may be expressed as:
      
        
      
       where the task completion degree is computed from the priority weight of the j-th task and the completion state of that task (1 indicates completed, 0 indicates not completed).
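      The three optimization targets can be illustrated with the following sketch. Only the ingredients of each objective are stated in the text, so the exact forms used here are assumptions: efficiency as the ratio of used to total energy, stability as one minus the mean deviation from target normalized by the safe range width, and task completion as the priority-weighted fraction of completed tasks.

```python
def energy_efficiency(used_energy, total_energy):
    """Ratio of effectively used energy to total available energy."""
    return used_energy / total_energy


def stability_index(values, targets, lower, upper):
    """Assumed form: 1 minus the mean target deviation, scaled by safe range width."""
    deviations = [
        abs(x - t) / (hi - lo)
        for x, t, lo, hi in zip(values, targets, lower, upper)
    ]
    return 1.0 - sum(deviations) / len(deviations)


def task_completion(weights, completed):
    """Assumed form: priority-weighted fraction of completed tasks (1 or 0 each)."""
    return sum(w * c for w, c in zip(weights, completed)) / sum(weights)


eta = energy_efficiency(used_energy=80.0, total_energy=100.0)
sigma = stability_index(
    values=[3.9, 20.0],     # battery voltage (V), temperature (deg C)
    targets=[3.75, 15.0],
    lower=[3.3, -20.0],
    upper=[4.2, 50.0],
)
completion = task_completion(weights=[3, 2, 1], completed=[1, 1, 0])
```

      With these assumed forms all three quantities lie between 0 and 1 when the parameters stay inside their safe ranges, which makes them convenient to combine in a weighted sum.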
      The constraint conditions are as follows:
       Output capacity of the power supply system: limits such as the battery capacity and the output power of the solar panel are considered, ensuring that the power supply system operates within a safe range. The constraint can be expressed as: 
        $$P_{\min} \le P_{\text{out}} \le P_{\max}$$
       where $P_{\text{out}}$ is the output power of the power supply system, and $P_{\min}$ and $P_{\max}$ are the lower and upper limits of the output power, respectively.
      Power consumption requirements of the load equipment: energy is allocated reasonably according to the power consumption characteristics and task priority of each device, avoiding local overload or energy shortage. The constraint can be expressed as:
        $$\sum_{i=1}^{n} P_{\text{load},i} \le E_{\text{available}}$$
       where $P_{\text{load},i}$ is the power consumption of the i-th load device, and $E_{\text{available}}$ is the currently available energy.
      Dynamic changes of environmental conditions: the influence of environmental factors such as illumination intensity, temperature and radiation on the energy system is considered, and the optimization strategy is adjusted dynamically. The constraint can be expressed as:
        $$E_{\min} \le E_{\text{env}} \le E_{\max}$$
       where $E_{\text{env}}$ is the influencing factor of the environmental conditions, and $E_{\min}$ and $E_{\max}$ are the lower and upper limits of the influencing factor, respectively.
      Real-time running state of the satellite energy system: the system state is evaluated and the optimization strategy is adjusted based on real-time monitoring data, ensuring the stability and efficiency of the system. The constraint can be expressed as:
        $$State_{\text{current}} \in State_{\text{safe}}$$
       where $State_{\text{current}}$ is the current state of the system, and $State_{\text{safe}}$ is the safe state set.
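      The four constraint conditions can be checked together as in the following sketch. The function signature, the parameter names and the numeric values are hypothetical; each entry of the result simply records whether the corresponding constraint is satisfied.

```python
def check_constraints(p_out, p_limits, load_powers, available,
                      env_factor, env_limits, state_is_safe):
    """Return a mapping from constraint name to whether it is satisfied."""
    return {
        # Output power of the power supply system stays within its limits.
        "power_output": p_limits[0] <= p_out <= p_limits[1],
        # Total load consumption does not exceed what is currently available.
        "load_demand": sum(load_powers) <= available,
        # Environmental influencing factor stays within its limits.
        "environment": env_limits[0] <= env_factor <= env_limits[1],
        # Current state belongs to the safe state set.
        "safe_state": state_is_safe,
    }


result = check_constraints(
    p_out=95.0,
    p_limits=(20.0, 150.0),
    load_powers=[30.0, 25.0, 50.0],  # sums to 105.0, above the 100.0 available
    available=100.0,
    env_factor=35.0,
    env_limits=(-20.0, 50.0),
    state_is_safe=True,
)
violated = [name for name, satisfied in result.items() if not satisfied]
```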
      Through the comprehensive consideration of the optimization target and the constraint condition, a scientific and reasonable energy management strategy can be formulated, and the efficient and stable operation of the satellite energy system can be realized.
      Step S4, converting the energy system optimization problem into an intelligent decision problem.
      Fig. 6 is a diagram of an intelligent decision framework in the intelligent control method of the satellite energy system according to the preferred embodiment of the present application. As shown in fig. 6, the monitoring and optimization problem of the satellite energy system is converted into an intelligent decision problem based on a large language model. A state space and an action space are defined, and an intelligent decision framework is constructed. The state space comprises the power supply system state, the load system state and the environmental conditions, and the action space comprises power supply output adjustment, load mode optimization and energy scheduling strategies.
      To model the energy system optimization problem, the monitoring and optimization problem of the satellite energy system is converted into an intelligent decision problem based on a large language model, a state space, an action space and a reward function are defined, and an intelligent decision framework is constructed.
      The state space $S$ is used for describing the real-time operation state of the satellite energy system, and is defined as:
        $$S = \{P_{\text{battery}}, P_{\text{solar}}, P_{\text{load}}, E_{\text{env}}, T_{\text{task}}, E_{\text{health}}\}$$
       where $P_{\text{battery}}$ is the current state of charge of the battery, $P_{\text{solar}}$ is the current output power of the solar panel, $P_{\text{load}}$ is the current power consumption requirement of the load equipment, $E_{\text{env}}$ denotes the environmental conditions such as the ambient temperature and the illumination intensity, $T_{\text{task}}$ is the priority and real-time demand of the satellite tasks, and $E_{\text{health}}$ is the health status of the energy system (such as the battery aging degree and the equipment failure risk).
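      The action space $A$ is used for describing the optimization control operations of the satellite energy system, and is defined as:
        $$A = \{\Delta\theta_{\text{solar}}, \Delta S_{\text{schedule}}, \Delta\theta_{\text{attitude}}\}$$
       where $\Delta\theta_{\text{solar}}$ is the adjustment of the solar panel angle, $\Delta\theta_{\text{attitude}}$ is the adjustment of the satellite attitude, and $\Delta S_{\text{schedule}}$ is the adjustment of the energy scheduling strategy (such as task priority adjustment).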
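      Further, the optimal solar panel angle $\theta_{\text{solar}}^{\text{opt}}$ is calculated according to the current illumination condition and the satellite position, and the adjustment of the solar panel angle is defined as:
        $$\Delta\theta_{\text{solar}} = \theta_{\text{solar}}^{\text{opt}} - \theta_{\text{solar}}^{\text{current}}$$
       where $\theta_{\text{solar}}^{\text{current}}$ is the current solar panel angle.
      The optimal satellite attitude $\theta_{\text{attitude}}^{\text{opt}}$ is calculated according to the task requirements and the energy optimization target, and the adjustment of the satellite attitude is defined as:
        $$\Delta\theta_{\text{attitude}} = \theta_{\text{attitude}}^{\text{opt}} - \theta_{\text{attitude}}^{\text{current}}$$
       where $\theta_{\text{attitude}}^{\text{current}}$ is the current satellite attitude.
      The optimal energy scheduling strategy $S_{\text{schedule}}^{\text{opt}}$ is calculated according to the current energy system state and the task demands, and the adjustment of the energy scheduling strategy is defined as:
        $$\Delta S_{\text{schedule}} = S_{\text{schedule}}^{\text{opt}} - S_{\text{schedule}}^{\text{current}}$$
       where $S_{\text{schedule}}^{\text{current}}$ is the current energy scheduling strategy.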
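      The three adjustments can be computed as differences between an optimal setting and the current setting, as in the following sketch. The `Action` container, the dictionary keys and the representation of the scheduling strategy as a mapping from task name to priority are assumptions made for illustration; how the optimal settings are obtained is outside the scope of the sketch.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """One element of the action space: the three adjustment quantities."""
    d_theta_solar: float      # solar panel angle adjustment (degrees)
    d_schedule: dict          # per-task priority adjustment
    d_theta_attitude: float   # satellite attitude adjustment (degrees)


def compute_action(current, optimal):
    """Each adjustment is the optimal setting minus the current setting."""
    d_schedule = {
        task: optimal["schedule"][task] - current["schedule"].get(task, 0)
        for task in optimal["schedule"]
    }
    return Action(
        d_theta_solar=optimal["theta_solar"] - current["theta_solar"],
        d_schedule=d_schedule,
        d_theta_attitude=optimal["theta_attitude"] - current["theta_attitude"],
    )


current = {"theta_solar": 30.0, "theta_attitude": 5.0,
           "schedule": {"comm": 1, "imaging": 2}}
optimal = {"theta_solar": 42.5, "theta_attitude": 3.0,
           "schedule": {"comm": 2, "imaging": 1}}
action = compute_action(current, optimal)
```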
      The reward function $R$ is used for evaluating the effect of each action by jointly considering the energy utilization efficiency, the system stability and the task completion degree, and is defined as:
        $$R = w_1 \eta + w_2 \sigma + w_3 T_{\text{complete}}$$
       where $\eta$ is the energy utilization efficiency, $\sigma$ is the system stability index and $T_{\text{complete}}$ is the task completion degree, each calculated as given for the corresponding optimization target in step S3; and $w_1$, $w_2$ and $w_3$ are weight coefficients used to balance the priorities of the different optimization objectives.
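      A weighted reward of this kind can be sketched as follows, assuming the reward is a weighted sum of the three quantities. The default weights and the sample values are assumptions for illustration, not values given in the application.

```python
def reward(efficiency, stability, completion, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of efficiency, stability and task completion (assumed form)."""
    w_eff, w_stab, w_task = weights
    return w_eff * efficiency + w_stab * stability + w_task * completion


balanced = reward(0.8, 0.9, 0.5)
# Shifting weight toward task completion lowers the reward for the same state,
# because task completion is the weakest of the three quantities here.
task_focused = reward(0.8, 0.9, 0.5, weights=(0.2, 0.2, 0.6))
```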
      Construction of the intelligent decision framework: based on the semantic understanding and reasoning capability of the large language model, the state space, the action space and the reward function are integrated into the intelligent decision framework. The large language model analyzes the historical data and the real-time data to generate an optimal energy management strategy. The core of the decision framework is to find the optimal strategy through Markov decision process (MDP) modeling so that the cumulative reward is maximized:
        $$\pi^{*} = \arg\max_{\pi} \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \right]$$
       where $\gamma$ is the discount factor, used to balance the importance of current rewards against future rewards.
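      The role of the discount factor can be seen from a short computation of the discounted cumulative reward of one finite reward sequence. The function name and the reward values are illustrative only.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * r_t, accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


far_sighted = discounted_return([1.0, 1.0, 1.0], gamma=0.9)   # 1 + 0.9 + 0.81
short_sighted = discounted_return([2.0, 5.0, 7.0], gamma=0.0)  # only the first reward counts
```

      With a discount factor of 0 only the immediate reward matters, while values close to 1 let later rewards contribute almost as much as the current one.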
      Step S5, building a network structure in which the large language model and the optimization algorithm are fused.
      A network structure fusing the large language model with an optimization algorithm (such as deep reinforcement learning or a heuristic algorithm) is designed, and a reward function is generated. The reward function jointly considers the energy utilization efficiency, the system stability and the task completion degree, and the semantic understanding and reasoning capability of the large language model improves the decision efficiency of the optimization algorithm.
      The design of the fused network structure comprises the embedding of the large language model and the optimization algorithm, so as to realize intelligent monitoring and optimization of the satellite energy system.
      Embedding of the large language model: the pre-trained large language model is embedded into the optimization framework, and its semantic understanding and reasoning capability is used to perform in-depth analysis of the real-time operation data of the satellite energy system. The input of the large language model comprises the state space $S$ and historical data, and the output is the semantic understanding and feature extraction result for the current system state.
      The output of the large language model is defined as:
        $$F = \{f_1, f_2, \dots, f_n\}$$
       where $f_i$ is the i-th key feature extracted by the large language model.
      Optimization algorithm: deep reinforcement learning is used to optimize the satellite energy system. The input is the features extracted by the large language model together with the state space $S$, and the output is an optimization action in the action space $A$.
      The objective of the optimization algorithm is to find the optimal actions through the policy network pi and the value network Q:
      
        
      
      
        
      
       where $\theta_\pi$ and $\theta_Q$ are the parameters of the policy network and the value network, respectively.
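      Selecting an action from the probability distribution output by a policy network can be sketched as follows. The softmax over raw scores, the score values and the action names are assumptions for illustration; the scores are chosen so that the resulting probabilities are close to the [0.6, 0.3, 0.1] example used earlier in the description.

```python
import math
import random

# Hypothetical labels for the three adjustments in the action space.
ACTIONS = [
    "adjust_solar_panel_angle",
    "adjust_energy_schedule",
    "adjust_satellite_attitude",
]


def softmax(scores):
    """Turn raw policy scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(probabilities, rng):
    """Sample one action according to the given probabilities."""
    return rng.choices(ACTIONS, weights=probabilities, k=1)[0]


probs = softmax([2.0, 1.3, 0.2])  # roughly 0.60, 0.30, 0.10
rng = random.Random(0)
counts = {name: 0 for name in ACTIONS}
for _ in range(1000):
    counts[select_action(probs, rng)] += 1
```

      Sampling, rather than always taking the most probable action, keeps some exploration in the behaviour of the policy during training.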
      Step S6, training and optimizing the model in the simulation environment.
      Training a model in the constructed satellite energy system simulation environment, and continuously learning optimization strategies in historical data and real-time data through a large language model. And collecting training experience, updating model parameters, and gradually improving the monitoring precision and the optimizing capability of the model. The simulation environment simulates dynamic changes in the space environment, including illumination condition fluctuation, load demand change and sudden fault events, so as to enhance the generalization capability of the model.
      In the model training and optimizing process, through training a network structure fused with a large language model and an optimizing algorithm in a simulation environment, experience data are collected, model parameters are updated, and the monitoring precision and optimizing capability of the model are gradually improved.
      Simulation environment construction: a high-fidelity satellite energy system simulation environment is constructed to simulate the dynamic changes of the power supply system, the load system and the environmental conditions. The input of the simulation environment comprises power system data, load system data and environment data, and the output is the real-time running state and the optimization result of the system.
      The state transfer function of the simulation environment is defined as:
        $$s_{t+1} = f(s_t, a_t, e_t)$$
       where $s_t$ is the current state, $a_t$ is the current action, and $e_t$ is the environmental random factor.
      Experience data collection: the model is run in the simulation environment and experience data $(s_t, a_t, r_t, s_{t+1})$ is collected, where $s_t$ is the current state, $a_t$ is the current action, $r_t$ is the current reward, and $s_{t+1}$ is the next state. The experience data is stored in the experience pool $D$ for subsequent model training and parameter updating.
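      An experience pool with random batch sampling can be sketched as follows. The class name, the fixed capacity and the eviction of the oldest entries are assumptions; the description only requires that experience tuples are stored and later sampled at random.

```python
import random
from collections import deque


class ExperiencePool:
    """Bounded store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently discards the oldest entry when full.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw a random batch without replacement to break data correlation."""
        size = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), size)

    def __len__(self):
        return len(self._buffer)


pool = ExperiencePool(capacity=100, seed=0)
for step in range(150):
    # Placeholder transitions: the state is just the step index here.
    pool.add(state=step, action=0, reward=0.0, next_state=step + 1)
batch = pool.sample(32)
```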
      Model training and parameter updating: a batch of experience data is randomly sampled from the experience pool $D$ and used to train the large language model and the optimization algorithm. The training process comprises large language model training and optimization algorithm training.
      First, the parameters of the large language model are updated using the states and rewards in the experience data, improving its semantic understanding and feature extraction capability. The loss function is defined as:
      
        
      
       where $y_i$ is the target value.
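      Next, the parameters of the optimization algorithm are updated using the states, actions and rewards in the experience data. The loss functions of the policy network and the value network of the deep reinforcement learning optimization algorithm are respectively defined as follows:
        $$L_\pi = -\mathbb{E}_{s_t}\left[ Q\left(s_t, \pi(s_t;\theta_\pi); \theta_Q\right) \right]$$
        $$L_Q = \frac{1}{N}\sum_{i=1}^{N}\left( Q(s_i, a_i; \theta_Q) - \left( r_i + \gamma \max_{a_{i+1}} Q'(s_{i+1}, a_{i+1}; \theta_{Q'}) \right) \right)^2$$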
       Model parameters are updated using a gradient descent method. The performance of the model is periodically assessed during the training process, including monitoring accuracy, optimization effect, and task completion. And adjusting model parameters and training strategies according to the evaluation results, and further improving the performance of the model. The training and evaluation process is repeated until the model converges or reaches a predetermined performance index, so that the running state of the satellite energy system can be efficiently monitored, and an optimal energy management strategy can be generated. 
      Step S7, outputting and executing the optimal energy management strategy.
      After training, outputting the intelligent monitoring and optimizing strategy of the satellite energy system based on the large language model. And the satellite energy system is controlled in real time according to a strategy, such as adjusting power supply output, optimizing a load mode, executing energy scheduling and the like, so that efficient and stable operation of the energy system is realized.
      And generating and executing an optimal strategy, and generating an optimal energy management strategy of the satellite energy system based on a network structure fused with the trained large language model and the optimization algorithm.
      The trained large language model and optimization algorithm generate the optimal energy management strategy from the current state of the satellite energy system and the historical data. The generation process of the optimal strategy is defined as follows:
      
        
      
       wherein the policy network in the formula is the trained optimal policy network, together with its parameters.
      The large language model performs semantic understanding and feature extraction on the current state, and the optimization algorithm generates the optimal action from the extracted features. The generated actions include, but are not limited to, the angle of the solar panel, the attitude of the satellite, and the energy scheduling strategy.
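      The two-stage flow, feature extraction followed by action generation, can be sketched as follows. This is only an illustration: a hand-written feature function stands in for the large language model, a toy linear layer with tanh stands in for the trained policy network, and every field name, weight, and scaling range is a hypothetical assumption.

```python
import math


def extract_features(state):
    """Stand-in for the language model's feature extraction (hypothetical)."""
    return [
        state["battery_soc"],                     # state of charge, 0..1
        state["illumination"],                    # normalized sunlight, 0..1
        state["load_kw"] / state["max_load_kw"],  # load as a fraction of maximum
    ]


def policy(features, weights, bias):
    """Toy linear policy head; tanh keeps each action component in [-1, 1]."""
    raw = [sum(w * f for w, f in zip(row, features)) + b
           for row, b in zip(weights, bias)]
    return [math.tanh(r) for r in raw]


def to_command(action):
    """Map normalized actions to physical commands (ranges are assumptions)."""
    return {
        "panel_angle_deg": 90.0 * action[0],
        "attitude_offset_deg": 10.0 * action[1],
        "charge_fraction": 0.5 * (action[2] + 1.0),
    }


state = {"battery_soc": 0.6, "illumination": 0.9,
         "load_kw": 1.2, "max_load_kw": 2.0}
weights = [[0.0, 1.0, 0.0], [0.5, 0.0, -0.5], [-1.0, 0.0, 1.0]]
bias = [0.0, 0.0, 0.0]

command = to_command(policy(extract_features(state), weights, bias))
print(command)

# With all-zero weights the policy outputs the neutral action.
zero = [[0.0] * 3 for _ in range(3)]
neutral = to_command(policy(extract_features(state), zero, bias))
```

      Bounding the policy output before scaling it to physical units keeps every command inside its permitted range regardless of the network's raw output.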
      The generated energy management strategy is applied to the satellite energy system, and the corresponding control operations are performed. These operations include: adjusting the solar panel angle dynamically according to the current illumination conditions and energy demand, to keep the energy supply stable and efficient; adjusting the satellite attitude so that the solar panel always faces the sun, to maximize energy capture efficiency; and dynamically adjusting the energy scheduling strategy, optimizing energy distribution and scheduling according to task priority and the state of the energy system, to ensure that key tasks are completed.
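      Priority-based energy scheduling can be illustrated with a minimal sketch. It assumes a simple greedy rule that serves tasks in descending priority until the energy budget is exhausted. The application does not specify the scheduling rule, and the task names, priorities, and energy figures below are hypothetical.

```python
def schedule_energy(available_wh, tasks):
    """Greedy allocation: serve tasks in descending priority order.

    Each task receives its full demand if the budget allows, otherwise
    whatever energy remains. This is an illustrative rule, not the
    scheduling method of the application.
    """
    allocation = {}
    remaining = available_wh
    for task in sorted(tasks, key=lambda t: t["priority"], reverse=True):
        granted = min(task["demand_wh"], remaining)
        allocation[task["name"]] = granted
        remaining -= granted
    return allocation, remaining


tasks = [
    {"name": "payload", "priority": 1, "demand_wh": 50.0},
    {"name": "attitude_control", "priority": 3, "demand_wh": 40.0},
    {"name": "communications", "priority": 2, "demand_wh": 30.0},
]

allocation, remaining = schedule_energy(80.0, tasks)
print(allocation, remaining)
```

      With an 80 Wh budget, the two higher-priority tasks are fully served and the lowest-priority task receives only the remainder, which matches the goal of protecting key tasks when energy is scarce.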
      S8, visually displaying the monitoring and optimization results.
      Key information about the satellite energy system, such as the real-time running state, abnormality warnings, fault predictions, and optimization strategies, is displayed through the human-computer interaction interface, and a detailed optimization report is generated to provide decision support for ground control personnel. The displayed content includes the real-time running state, the execution effect of the optimization strategy, and key performance indexes, presented as charts, dashboards, dynamic simulations, and the like.
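      The report generation step can be sketched in a few lines. The example below assumes a plain-text report in which each key performance index is compared against a permitted range and flagged when it falls outside. The index names, values, and limits are hypothetical, and a real interface would render charts or dashboards rather than text.

```python
def build_report(kpis, limits):
    """Format key performance indexes as text and flag out-of-range values."""
    lines = ["Satellite energy system report"]
    for name, value in kpis.items():
        # Indexes without a configured range are never flagged.
        low, high = limits.get(name, (float("-inf"), float("inf")))
        status = "OK" if low <= value <= high else "WARNING"
        lines.append(f"{name:<24}{value:>10.2f}  {status}")
    return "\n".join(lines)


kpis = {"battery_soc": 0.15, "bus_voltage_v": 28.1, "panel_power_w": 410.0}
limits = {"battery_soc": (0.2, 1.0), "bus_voltage_v": (26.0, 30.0)}

report = build_report(kpis, limits)
print(report)
```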
      The intelligent monitoring and optimization method for a satellite energy system based on a large language model achieves efficient management and optimization of the satellite energy system. The method improves energy utilization efficiency and system stability, and it raises the intelligence level and adaptive capacity of the system through the semantic understanding and reasoning capability of the large language model. In addition, the energy system optimization problem is converted into an intelligent decision problem, and combining the large language model with the optimization algorithm further improves decision efficiency and optimization accuracy. The efficient network structure and the reward function design accelerate the convergence of the algorithm, and the trained model generalizes well, which supports stable performance in unknown space environments. Improved real-time performance and accuracy, optimized energy utilization, and significantly reduced fault risk are the key advantages of the application; they provide strong technical support for the execution of satellite tasks and indicate broad application potential.
      In addition, in combination with the intelligent control method of the satellite energy system in the above embodiments, an embodiment of the application may provide a storage medium. The storage medium stores a computer program which, when executed by a processor, implements any of the satellite energy system intelligent control methods of the above embodiments.
      It should be understood by those skilled in the art that the technical features of the above-described embodiments may be combined in any manner, and for brevity, all of the possible combinations of the technical features of the above-described embodiments are not described, however, they should be considered as being within the scope of the description provided herein, as long as there is no contradiction between the combinations of the technical features.
      The above examples illustrate only a few embodiments of the application. Although they are described in specific detail, they are not to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.