Disclosure of Invention
      The invention provides a satisfaction-based model training incentive method in federated learning, which aims to solve at least one of the technical problems in the prior art.
      The technical scheme of the invention is a satisfaction-based model training incentive method in federated learning, comprising the following steps:
       computing the information age and the service delay in a meta-computing framework based on federated learning;
       adjusting the data size by using a quality adjustment parameter, and combining the information age to obtain the model quality;
       adjusting the model quality and the service delay by using conversion parameters to obtain the satisfaction;
       obtaining the server utility according to the satisfaction and the server reward;
       constructing a Stackelberg game model according to the node utility and the server utility, wherein the server acts as the leader and determines a reward strategy, and each node acts as a follower and selects a node update period according to the reward strategy of the server;
       and solving the Stackelberg game model by using a deep reinforcement learning algorithm to obtain an incentive scheme.
      According to some embodiments of the invention, the information age is expressed as:
      
        
      
       where $A_i$ is the information age, $\theta_i$ is the node update period, and $t$ is the unit time.
      According to some embodiments of the invention, adjusting the data size by using the quality adjustment parameter and combining the information age to obtain the model quality comprises:
       multiplying the data amount collected per unit time period by the task duration to obtain a first intermediate value, and dividing the first intermediate value by the node update period to obtain the data size, expressed as:
      $$D_i = \frac{T D}{\theta_i},$$
       where $D_i$ is the data size, $T$ is the task duration, $D$ is the data amount collected per unit time period, and $\theta_i$ is the node update period;
       multiplying the quality adjustment parameter by the data size to obtain a second intermediate value, and dividing the second intermediate value by the information age to obtain the model quality, expressed as:
      $$Q_i = \frac{\rho D_i}{A_i},$$
       where $Q_i$ is the model quality, $\rho$ is the quality adjustment parameter, $D_i$ is the data size, and $A_i$ is the information age.
      According to some embodiments of the invention, the node update period is expressed as:
      $$\theta_i = c_i t + a_i t,$$
       where $\theta_i$ is the node update period, $c_i t$ is the time taken to collect and process the model training data, $a_i t$ is the duration from the end of data collection to the beginning of data collection in the next stage, and $t$ is the unit time.
      According to some embodiments of the invention, the service delay is expressed as:
      
        
      
       where $E_i$ is the service delay, $\theta_i$ is the node update period, and $t$ is the unit time.
      According to some embodiments of the invention, the conversion parameters include a quality conversion parameter and a delay conversion parameter;
       adjusting the model quality and the service delay by using the conversion parameters to obtain the satisfaction comprises:
       multiplying the quality conversion parameter by the model quality to obtain a third intermediate value;
       multiplying the delay conversion parameter by the service delay to obtain a fourth intermediate value;
       subtracting the fourth intermediate value from the third intermediate value to obtain the satisfaction, expressed as:
      $$G_i = \tau Q_i - \lambda E_i,$$
       where $G_i$ is the satisfaction, $\tau$ is the quality conversion parameter, $\lambda$ is the delay conversion parameter, $Q_i$ is the model quality, and $E_i$ is the service delay.
      According to some embodiments of the invention, obtaining the server utility according to the satisfaction and the server reward comprises:
       multiplying the satisfaction by the unit satisfaction profit to obtain the satisfaction gain;
       obtaining the server utility according to the difference between the satisfaction gain and the server reward, expressed as:
      $$V = \sum_{i} \left( \beta G_i - R_i \right),$$
       where $V$ is the server utility, $\beta$ is the unit satisfaction profit, $G_i$ is the satisfaction, $R_i$ is the server reward, and the sum runs over the participating nodes $i$.
      According to some embodiments of the invention, the node utility is obtained by:
       dividing the unit cost of maintaining the node update period by the node update period to obtain the cost;
       subtracting the cost from the server reward to obtain the node utility, expressed as:
      
        
      
      $$C_i = \frac{\sigma_i}{\theta_i},$$
      $$U_i = R_i - C_i,$$
       where $R_i$ is the server reward, $r_i$ is the unit reward, $\theta_i$ is the node update period, $C_i$ is the cost, $\sigma_i$ is the unit cost of maintaining the node update period, and $U_i$ is the node utility.
      The technical scheme of the invention also relates to an electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the satisfaction-based model training incentive method in federated learning when executing the computer program.
      The technical scheme of the invention also relates to a storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the satisfaction-based model training incentive method in federated learning.
      The beneficial effects of the invention are as follows. The information age and the service delay are computed in a meta-computing framework based on federated learning. The data size is adjusted by using the quality adjustment parameter and combined with the information age to obtain the model quality. The model quality and the service delay are then adjusted by using the conversion parameters to obtain the satisfaction, and the server utility is obtained according to the satisfaction and the server reward. A Stackelberg game model is constructed according to the node utility and the server utility, in which the server acts as the leader and determines the reward strategy, and each node acts as a follower and selects a node update period according to the reward strategy of the server. The Stackelberg game model is solved by using a deep reinforcement learning algorithm to obtain an incentive scheme. Adjusting the model quality and the service delay with the conversion parameters balances the two, and integrating the resulting satisfaction into the server utility allows nodes to be effectively incentivized to participate in federated learning.
      Further, additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
    
    
      Detailed Description
      The conception, specific structure, and technical effects produced by the present application will be clearly and completely described below with reference to the embodiments and the drawings to fully understand the objects, aspects, and effects of the present application. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
      It should be noted that, unless otherwise specified, when a feature is referred to as being "fixed" or "connected" to another feature, it may be directly or indirectly fixed or connected to the other feature. Further, the descriptions of the upper, lower, left, right, top, bottom, etc. used in the present invention are merely with respect to the mutual positional relationship of the respective constituent elements of the present invention in the drawings.
      Furthermore, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in the description presented herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used herein includes any combination of one or more of the associated listed items.
      It should be understood that although the terms first, second, third, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element of the same type from another. For example, a first element could also be termed a second element, and, similarly, a second element could also be termed a first element, without departing from the scope of the present invention.
      Referring to FIG. 1 to FIG. 4, in some embodiments, the present invention provides a satisfaction-based model training incentive method in federated learning, including, but not limited to, steps 101 to 106, each of which is described in turn below.
      Step 101, in a meta-computing framework based on federated learning, the information age and the service delay are computed.
      In a specific embodiment, the satisfaction-based model training incentive method in federated learning further comprises constructing the meta-computing framework based on federated learning before computing the information age and the service delay in that framework.
      Referring to FIG. 4, the satisfaction-based model training incentive method in federated learning is applied to a federated-learning-based meta-computing framework composed of five modules, namely a device management module, a resource scheduling module, a task management module, a zero-trust computing management module, and an identity and access management module.
      The device management module comprises a plurality of edge nodes. Its main purpose is to collect data from the production devices, integrate the computing, storage, and communication resources of the edge nodes, and then map these resources to servers, thereby converting them into objects that can be easily accessed by the resource scheduling module. The resource scheduling module includes a plurality of virtual edge nodes. These virtual edge nodes constantly monitor changes in the configuration details of the physical nodes, simulate the possible states of the nodes, and dynamically perform resource optimization. The task management module is located at the server end and is responsible for receiving requests from the target object, decomposing tasks, and designing an incentive scheme according to the constraints of the task. The zero-trust computing management module performs the global aggregation of federated learning through the blockchain. The identity and access management module ensures that the target object has the appropriate data access rights.
      Specifically, the target object is a user. Referring to FIG. 4, the target object views the devices on a virtual platform and issues a task; the task management module receives the task; the server plays a Stackelberg game with the virtual nodes according to the task requirements; the task management module obtains an incentive scheme, which is applied to the nodes of the device management module; the zero-trust computing management module performs the federated learning global model aggregation; the trained global model is uploaded to the target object; and the target object pays according to the incentive scheme.
      In some embodiments, in federated learning in which nodes cache data and the information age is considered, the information age of node $i$ is expressed as:
      
        
      
       where $A_i$ is the information age, $\theta_i$ is the node update period, $a_i t$ is the duration from the end of data collection to the beginning of data collection in the next stage, and $t$ is the unit time.
      In particular, the age of information (Age of Information, AoI) is used to measure model freshness and represents the time difference between the generation of the data and the receipt and processing of the data. Measuring freshness with the information age allows the model quality to reflect how fresh the underlying information is.
      In some embodiments, the service delay is expressed as:
      
        
      
       where $E_i$ is the service delay, $\theta_i$ is the node update period, $a_i t$ is the duration from the end of data collection to the beginning of data collection in the next stage, and $t$ is the unit time.
      In particular, the service delay represents the length of time that elapses from a node receiving a request to the node uploading its local model, including the data collection and model training periods.
      Step 102, the data size is adjusted by using the quality adjustment parameter and combined with the information age to obtain the model quality.
      Here, the data are the training data of the node, and the data size is the amount of data the node uses for training. The quality adjustment parameter is used to adjust the data size. The model quality is the quality of the model trained by the node.
      In some embodiments, adjusting the data size by using the quality adjustment parameter and combining the information age to obtain the model quality comprises:
       multiplying the data amount collected per unit time period by the task duration to obtain a first intermediate value, and dividing the first intermediate value by the node update period to obtain the data size, expressed as:
      $$D_i = \frac{T D}{\theta_i},$$
       where $D_i$ is the data size, $T$ is the task duration, $D$ is the data amount collected per unit time period, and $\theta_i$ is the node update period;
       multiplying the quality adjustment parameter by the data size to obtain a second intermediate value, and dividing the second intermediate value by the information age to obtain the model quality, expressed as:
      $$Q_i = \frac{\rho D_i}{A_i},$$
       where $Q_i$ is the model quality, $\rho$ is the quality adjustment parameter, $D_i$ is the data size, and $A_i$ is the information age.
      In some embodiments, the time taken to collect and process the model training data is added to the duration from the end of data collection to the beginning of data collection in the next stage to obtain the node update period, expressed as:
      $$\theta_i = c_i t + a_i t,$$
       where $\theta_i$ is the node update period, $c_i t$ is the time taken to collect and process the model training data, $a_i t$ is the duration from the end of data collection to the beginning of data collection in the next stage, and $t$ is the unit time.
      Specifically, the node update period is the period with which the node collects, computes, and transmits data. The node periodically updates its cached data according to the node update period. The duration from the end of data collection to the beginning of data collection in the next stage includes a traffic period and an idle period.
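      As a non-limiting illustration, the node update period, the data size, and the model quality described above can be sketched in Python as follows. The function and argument names are illustrative assumptions and do not appear in the original disclosure; the information age is passed in as a precomputed value, and the numbers in the usage lines are arbitrary.

```python
def node_update_period(c_i: float, a_i: float, t: float) -> float:
    """theta_i = c_i * t + a_i * t (collection/processing time plus the gap to the next collection)."""
    return c_i * t + a_i * t


def data_size(task_duration: float, data_per_unit_time: float, theta_i: float) -> float:
    """D_i = T * D / theta_i."""
    if theta_i <= 0:
        raise ValueError("node update period must be positive")
    return task_duration * data_per_unit_time / theta_i


def model_quality(rho: float, d_i: float, information_age: float) -> float:
    """Q_i = rho * D_i / A_i, with A_i supplied by the caller."""
    if information_age <= 0:
        raise ValueError("information age must be positive")
    return rho * d_i / information_age


# Arbitrary example values (assumptions, for illustration only).
theta = node_update_period(c_i=3, a_i=2, t=1.0)                       # 5.0
d = data_size(task_duration=100, data_per_unit_time=10, theta_i=theta)  # 200.0
q = model_quality(rho=0.5, d_i=d, information_age=4.0)                # 25.0
print(theta, d, q)
```

      In this sketch a longer node update period yields a smaller data size, and a larger information age yields a lower model quality, which matches the relationships stated above.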
      Step 103, the model quality and the service delay are adjusted by using the conversion parameters to obtain the satisfaction.
      In particular, the trade-off between low latency and high model quality is critical in the industrial metaverse, so the satisfaction is used to balance the model quality and the service delay of node $i$.
      It can be understood that the proposed satisfaction index comprehensively considers the data size, the information age, and the service delay, and balances the model quality and the service delay in the industrial metaverse. This differs from traditional quality assessment approaches, which focus on a single factor or do not fully consider these key factors.
      In some embodiments, the conversion parameters include a quality conversion parameter and a delay conversion parameter. Adjusting the model quality and the service delay by using the conversion parameters to obtain the satisfaction comprises:
       multiplying the quality conversion parameter by the model quality to obtain a third intermediate value;
       multiplying the delay conversion parameter by the service delay to obtain a fourth intermediate value;
       subtracting the fourth intermediate value from the third intermediate value to obtain the satisfaction, expressed as:
      $$G_i = \tau Q_i - \lambda E_i,$$
       where $G_i$ is the satisfaction, $\tau$ is the quality conversion parameter, $\lambda$ is the delay conversion parameter, $Q_i$ is the model quality, and $E_i$ is the service delay.
      In particular, the quality conversion parameter is used to adjust the model quality and the delay conversion parameter is used to adjust the service delay, so that the two are balanced. The satisfaction that balances the model quality and the service delay is integrated into the server utility, a Stackelberg game model is constructed according to the server utility, and the Stackelberg game model is solved to obtain an incentive scheme, so that nodes can be effectively incentivized to participate in federated learning.
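      As a non-limiting illustration, the satisfaction computation can be sketched as below. The function name and the numeric values are assumptions chosen only to show the trade-off; the model quality and the service delay are taken as given inputs.

```python
def satisfaction(tau: float, lam: float, model_quality: float, service_delay: float) -> float:
    """G_i = tau * Q_i - lam * E_i."""
    third_intermediate = tau * model_quality    # quality term
    fourth_intermediate = lam * service_delay   # delay penalty
    return third_intermediate - fourth_intermediate


# Same model quality, two different service delays (arbitrary values).
print(satisfaction(tau=2.0, lam=0.5, model_quality=25.0, service_delay=6.0))   # 47.0
print(satisfaction(tau=2.0, lam=0.5, model_quality=25.0, service_delay=10.0))  # 45.0
```

      With the model quality held fixed, a longer service delay lowers the satisfaction, and the ratio of the two conversion parameters controls how strongly delay is penalized relative to quality.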
      Step 104, the server utility is obtained according to the satisfaction and the server reward.
      In particular, the server utility is the utility that the server obtains from the nodes. The set of participating industrial Internet of Things nodes is denoted $\mathcal{I} = \{1, \ldots, I\}$.
      In some embodiments, obtaining the server utility according to the satisfaction and the server reward comprises:
       multiplying the satisfaction by the unit satisfaction profit to obtain the satisfaction gain;
       obtaining the server utility according to the difference between the satisfaction gain and the server reward, expressed as:
      $$V = \sum_{i} \left( \beta G_i - R_i \right),$$
       where $V$ is the server utility, $\beta$ is the unit satisfaction profit, $G_i$ is the satisfaction, $R_i$ is the server reward, and the sum runs over the participating nodes $i$.
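      As a non-limiting illustration, the server utility can be sketched as below. The sketch assumes that the per-node difference between the satisfaction gain and the server reward is summed over all participating nodes; the aggregation over nodes, the function name, and the numeric values are assumptions rather than part of the original disclosure.

```python
from typing import Sequence


def server_utility(beta: float, satisfactions: Sequence[float], rewards: Sequence[float]) -> float:
    """Sum over nodes of (beta * G_i - R_i); summation over nodes is an assumption."""
    if len(satisfactions) != len(rewards):
        raise ValueError("one satisfaction value and one reward are needed per node")
    return sum(beta * g - r for g, r in zip(satisfactions, rewards))


# Two nodes with arbitrary satisfaction values and rewards.
print(server_utility(beta=2.0, satisfactions=[47.0, 30.0], rewards=[20.0, 10.0]))  # 124.0
```

      Raising a node's reward lowers the server utility directly, so the server only benefits from a higher reward if it induces a sufficiently higher satisfaction from that node.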
      A node that participates in a federated learning task is rewarded by the server but also incurs a cost. In some embodiments, the node utility is obtained by:
       dividing the unit cost of maintaining the node update period by the node update period to obtain the cost;
       subtracting the cost from the server reward to obtain the node utility, expressed as:
      
        
      
      $$C_i = \frac{\sigma_i}{\theta_i},$$
      $$U_i = R_i - C_i,$$
       where $R_i$ is the server reward, $r_i$ is the unit reward, $\theta_i$ is the node update period, $C_i$ is the cost, $\sigma_i$ is the unit cost of maintaining the node update period, and $U_i$ is the node utility.
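      As a non-limiting illustration, the cost and the node utility can be sketched as below. The server reward is passed in as a given value, since only the cost and the final subtraction are described in words above; the function names and the numeric values are assumptions.

```python
def maintenance_cost(sigma_i: float, theta_i: float) -> float:
    """C_i = sigma_i / theta_i: a shorter update period is more expensive to maintain."""
    if theta_i <= 0:
        raise ValueError("node update period must be positive")
    return sigma_i / theta_i


def node_utility(server_reward: float, sigma_i: float, theta_i: float) -> float:
    """U_i = R_i - C_i, with R_i supplied by the caller."""
    return server_reward - maintenance_cost(sigma_i, theta_i)


# Arbitrary example values.
print(node_utility(server_reward=20.0, sigma_i=10.0, theta_i=5.0))  # 18.0
print(node_utility(server_reward=20.0, sigma_i=10.0, theta_i=2.0))  # 15.0
```

      For a fixed reward, shortening the node update period raises the cost and lowers the node utility, which is the tension the reward strategy of the server has to offset.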
      Step 105, a Stackelberg game model is constructed according to the node utility and the server utility, wherein the server acts as the leader and determines a reward strategy, and each node acts as a follower and selects a node update period according to the reward strategy of the server.
      Specifically, given that both the server and the nodes seek to maximize their own interests, the interaction between them is modeled as a two-stage Stackelberg game, in which the server, as the leader, determines the reward strategy and the nodes, as followers, respond with their node update periods.
      It will be appreciated that the utility models of the server and the nodes are built separately and that the incentive scheme is designed from the interests of both parties, unlike existing incentive schemes that focus only on node contributions or on the interests of a single party.
      In some embodiments, the Stackelberg game model is expressed as:
      $$\Omega = \left\{ \left( SP \cup \{i\}_{i \in \mathcal{I}} \right), \left( r_i, \theta_i \right), \left( V, U_i \right) \right\},$$
       where $(SP \cup \{i\}_{i \in \mathcal{I}})$ denotes the set consisting of the server and its corresponding nodes, $(r_i, \theta_i)$ denotes the strategy set, $(V, U_i)$ denotes the utility set, $SP$ is the server, $r_i$ is the unit reward, $i$ is a node, $\theta_i$ is the node update period, $V$ is the server utility, $U_i$ is the node utility, and $\Omega$ is the Stackelberg game model.
      Step 106, the Stackelberg game model is solved by using a deep reinforcement learning algorithm to obtain an incentive scheme.
      It should be noted that the incentive scheme includes a unit reward and a node update period.
      It will be appreciated that conventional methods for solving game equilibria require a large amount of information about the participants, which is difficult to obtain in the industrial metaverse. With deep reinforcement learning, the optimal strategy is learned from experience without prior information, which protects the privacy of the participants and avoids the difficulty of acquiring that information. In addition, the satisfaction-based model training incentive method in federated learning improves the utilization efficiency of system resources and the overall benefit without reducing model accuracy.
      In a specific embodiment, the Stackelberg game model is solved by using the MADDPG algorithm in DRL. Specifically, DRL is an abbreviation of Deep Reinforcement Learning, a machine learning method that combines deep learning (Deep Learning, DL) and reinforcement learning (Reinforcement Learning, RL). The MADDPG algorithm, i.e., the multi-agent deep deterministic policy gradient (Multi-Agent Deep Deterministic Policy Gradient) algorithm, is a reinforcement learning algorithm used in multi-agent environments.
      It can be understood that the utility optimization problem of the node utility and the server utility is converted into a Stackelberg game model, and the game equilibrium is solved by using the MADDPG algorithm in DRL. Compared with traditional heuristic algorithms, this approach adapts better to the complex, non-cooperative environment of the industrial metaverse and achieves better resource allocation and node selection strategies.
      To solve the Stackelberg game model with DRL, a state space (including the server's pricing strategies and the nodes' caching strategies), a partially observable space (observations based on the historical strategies of the nodes and the server), an action space (adjustments of the server's reward strategy and of the nodes' caching strategies), and reward functions (consistent with the utility functions) are defined, where the utility functions are the node utility function and the server utility function. An Actor-Critic architecture is used: the Actor network outputs actions according to the environment state, the Critic network evaluates the value of those actions, and the Actor and Critic networks are trained alternately through experience replay and gradient updates, so that the strategy converges toward an optimal strategy that maximizes the cumulative reward.
      Specifically, in the DRL-based Stackelberg game process shown in FIG. 2, the server acts as the leader and the nodes act as followers. During each training period, the server agent observes its state and determines its action, and each node agent observes its state and determines its action. The current state then transitions to the next state and the agents receive their rewards. The detailed composition of the DRL controller of each agent is shown in FIG. 3. The replay buffer stores the transition data collected during the interaction, including the current state, the action, the reward, and the next state. The stored transitions are sampled in batches to decorrelate sequential data and stabilize the training process. The Actor network and the Critic network are each composed of three fully connected layers. The Actor network takes the current state as input and outputs the corresponding action according to its policy. The Critic network evaluates the actions taken by the Actor network and provides a value estimate to guide policy improvement. The Actor and Critic networks are updated by two independent optimization modules: the policy optimizer updates the parameters of the Actor network based on the policy gradient, and the value optimizer minimizes the temporal-difference error to refine the value estimate of the Critic network.
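      As a non-limiting illustration, a replay buffer of the kind described above can be sketched with the Python standard library as follows. The class names, the fixed capacity with oldest-first eviction, and the uniform sampling are assumptions of this sketch; the Actor and Critic networks and their optimizers are not shown.

```python
import random
from collections import deque
from typing import List, NamedTuple, Optional, Sequence


class Transition(NamedTuple):
    """One interaction record: current state, action, reward, next state."""
    state: Sequence[float]
    action: Sequence[float]
    reward: float
    next_state: Sequence[float]


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest transitions are evicted first (assumption)."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state: Sequence[float], action: Sequence[float],
             reward: float, next_state: Sequence[float]) -> None:
        self._storage.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size: int) -> List[Transition]:
        """Uniform random batch without replacement, which breaks temporal correlation."""
        if batch_size > len(self._storage):
            raise ValueError("not enough transitions stored for this batch size")
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)


# Store five toy transitions in a buffer of capacity three, then draw a batch.
buffer = ReplayBuffer(capacity=3, seed=0)
for k in range(5):
    buffer.push([float(k)], [0.0], float(k), [float(k + 1)])
batch = buffer.sample(2)
print(len(buffer), len(batch))
```

      In a multi-agent setting, each agent (the server and each node) would typically keep or share such a buffer and draw batches from it when updating its Actor and Critic networks.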
      In one possible embodiment, the Stackelberg equilibrium is defined, and backward induction is used to analyze the optimal decisions of the nodes based on the node and server utility functions, so as to determine the optimal node update period and the optimal reward strategy of the server.
      It can be seen that, in a meta-computing framework based on federated learning, the information age and the service delay are computed; the data size is adjusted by using the quality adjustment parameter and combined with the information age to obtain the model quality; the model quality and the service delay are adjusted by using the conversion parameters to obtain the satisfaction; the server utility is obtained according to the satisfaction and the server reward; and a Stackelberg game model is constructed according to the node utility and the server utility, in which the server acts as the leader and determines the unit reward and each node acts as a follower and selects a node update period according to the unit reward of the server. The Stackelberg equilibrium is solved by using a deep reinforcement learning algorithm to obtain an incentive scheme. Adjusting the model quality and the service delay with the conversion parameters balances the two and improves overall system performance. Integrating the satisfaction that balances the model quality and the service delay into the server utility allows nodes to be effectively incentivized to participate in federated learning.
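      As a non-limiting illustration, backward induction for a single-leader, single-follower game can be sketched numerically as a grid search: for each candidate leader action, the follower's best response is computed first, and the leader then picks the action that maximizes its own utility given that response. The function names, the discrete grids, and in particular the two toy utility functions below are placeholders introduced only for this sketch; they are not the utility functions of the present disclosure.

```python
from typing import Callable, Sequence, Tuple

Utility = Callable[[float, float], float]  # (leader_action, follower_action) -> utility


def follower_best_response(leader_action: float, follower_grid: Sequence[float],
                           follower_utility: Utility) -> float:
    """Stage 2: the follower maximizes its own utility for a fixed leader action."""
    return max(follower_grid, key=lambda a: follower_utility(leader_action, a))


def solve_by_backward_induction(leader_grid: Sequence[float], follower_grid: Sequence[float],
                                follower_utility: Utility,
                                leader_utility: Utility) -> Tuple[float, float, float]:
    """Stage 1: the leader anticipates the follower's best response to each candidate action."""
    best = None
    for leader_action in leader_grid:
        response = follower_best_response(leader_action, follower_grid, follower_utility)
        value = leader_utility(leader_action, response)
        if best is None or value > best[2]:
            best = (leader_action, response, value)
    if best is None:
        raise ValueError("leader grid must not be empty")
    return best


# Toy placeholder utilities (assumptions, unrelated to the disclosed formulas).
def toy_follower(r: float, a: float) -> float:
    return r * a - a * a


def toy_leader(r: float, a: float) -> float:
    return 9.0 * a - r * a


result = solve_by_backward_induction([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0],
                                     toy_follower, toy_leader)
print(result)  # (4.0, 2.0, 10.0)
```

      A grid search of this kind needs full knowledge of the follower's utility function, which is the information requirement that the deep reinforcement learning approach described above is intended to avoid.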
      An embodiment of the invention also provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the satisfaction-based model training incentive method in federated learning when executing the computer program. The electronic device may be any intelligent terminal, such as a computer.
      An embodiment of the invention also provides a storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the satisfaction-based model training incentive method in federated learning.
      It should be appreciated that the method steps in embodiments of the present invention may be implemented or carried out by computer hardware, a combination of hardware and software, or by computer instructions stored in non-transitory computer-readable memory. The method may use standard programming techniques. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
      Furthermore, the operations of the processes described herein may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes (or variations and/or combinations thereof) described herein may be performed under control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications), by hardware, or combinations thereof, collectively executing on one or more processors. The computer program includes a plurality of instructions executable by one or more processors.
      Further, the method may be implemented in any type of computing platform operatively connected to a suitable computing platform, including, but not limited to, a personal computer, mini-computer, mainframe, workstation, network or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and so forth. Aspects of the invention may be implemented in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optical read and/or write storage medium, RAM, ROM, etc., such that it is readable by a programmable computer, which when read by a computer, is operable to configure and operate the computer to perform the processes described herein. Further, the machine readable code, or portions thereof, may be transmitted over a wired or wireless network. When such media includes instructions or programs that, in conjunction with a microprocessor or other data processor, implement the steps described above, the invention described herein includes these and other different types of non-transitory computer-readable storage media. The invention may also include the computer itself when programmed according to the methods and techniques of the present invention.
      The computer program can be applied to the input data to perform the functions described herein, thereby converting the input data to generate output data that is stored to the non-volatile memory. The output information may also be applied to one or more output devices such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including specific visual depictions of physical and tangible objects produced on a display.
      The present invention is not limited to the above embodiments. Any modification, equivalent substitution, or improvement that achieves the technical effects of the present invention by the same means falls within the spirit and principle of the present invention. Various modifications and variations of the technical solution and/or the embodiments are possible within the scope of the invention.