CN111190654B - Loading method and device of functional module, storage medium and electronic device
Publication number: CN111190654B; Application number: CN201911391742.0A
Classifications
- G06F9/44521—Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
- G06F9/44505—Configuring for program initiating, e.g. using registry, configuration files
Abstract
The invention discloses a loading method and device for functional modules, a storage medium and an electronic device. The method may comprise the following steps: acquiring a plurality of functional modules to be loaded by a target application; determining a target loading order of the plurality of functional modules in the current operating environment based on a reinforcement learning algorithm; and sequentially loading the plurality of functional modules according to the target loading order. By the method and the device, the effect of dynamically determining the loading order of each functional module by monitoring changes in the operating environment is achieved.
Description
Technical Field
The invention relates to the field of data processing, in particular to a loading method and device of a functional module, a storage medium and an electronic device.
Background
At present, the internal structure of an application becomes more and more complex as user demands grow, and new functional modules are continuously derived. If differences in running environments such as device performance and network rate are not considered, and all functional modules are loaded into the application at the same time when it is opened, the overall startup response time becomes too long, resulting in incompletely loaded functional modules, process interruption and the like.
In order to enable the application to start more smoothly and quickly, all functional modules that need to be loaded must be ordered before startup; however, when the application runs in the current running environment, the loading order of each functional module cannot be updated in real time, so the shortest startup time of the application under the current conditions cannot be reached.
Aiming at the problem in the prior art that the loading order of each functional module cannot be dynamically determined by monitoring changes in the operating environment, no effective solution has yet been proposed.
Disclosure of Invention
The invention mainly aims to provide a loading method and device for functional modules, a storage medium and an electronic device, so as to at least solve the technical problem that the loading order of each functional module cannot be dynamically determined by monitoring changes in the operating environment.
In order to achieve the above object, according to an aspect of the present invention, a method for loading a functional module is provided. The method can comprise the following steps: acquiring a plurality of functional modules to be loaded by a target application; determining a target loading sequence of the plurality of functional modules under the current operating environment based on a reinforcement learning algorithm; and sequentially loading a plurality of functional modules according to the target loading sequence.
Optionally, determining the target loading order of the plurality of functional modules in the current operating environment based on the reinforcement learning algorithm includes: an obtaining step of obtaining at least one expected value corresponding to a first functional module among the plurality of functional modules, wherein each expected value characterizes the expectation of loading one of the remaining functional modules after the first functional module is loaded, the remaining functional modules being those among the plurality of functional modules, other than the first functional module, whose loading order has not been determined; a determining step of determining the remaining functional module with the maximum expected value as a second functional module to be loaded after the first functional module; and a judging step of judging whether remaining functional modules still exist among the plurality of functional modules, and if so, taking the second functional module as the new first functional module and returning to the obtaining step, or if not, determining the order of the functional modules whose loading order has been determined as the target loading order, wherein the functional modules whose loading order has been determined include the first functional module and the second functional module.
Optionally, before the obtaining step is performed for the first time, the method further comprises one of: determining a functional module randomly selected from the plurality of functional modules as the first functional module; and determining a functional module whose target attribute meets a predetermined condition as the first functional module.
Optionally, the obtaining step includes: acquiring the first loading duration required to load each remaining functional module after the first functional module is loaded; determining, from the first loading duration, a first reward value for loading each remaining functional module, and acquiring a weight of the first reward value; and determining each expected value based at least on the first reward value of each remaining functional module and the weight of that first reward value.
Optionally, the method further comprises an updating step of updating each expected value corresponding to the first functional module to obtain at least one updated expected value; the determining step then includes: determining the remaining functional module corresponding to the maximum updated expected value among the at least one updated expected value as the second functional module to be loaded after the first functional module.
Optionally, the updating step includes: acquiring the weight of each expected value; acquiring a second loading duration of the functional modules whose loading order has been determined; determining, from the second loading duration, a second reward value for loading the functional modules whose loading order has been determined, and acquiring a weight of the second reward value; and updating each expected value based at least on the weight of each expected value, the second reward value and the weight of the second reward value, to obtain each updated expected value.
Optionally, the updating step includes: updating each expected value corresponding to the first functional module until at least one of the following conditions is met: the duration of sequentially loading the plurality of functional modules in the target loading order is less than a target threshold; or an update-end instruction is received.
Optionally, determining the target loading order of the plurality of functional modules in the current operating environment based on the reinforcement learning algorithm includes at least one of: determining the target loading order based on the reinforcement learning algorithm each time the target application is started; determining the target loading order based on the reinforcement learning algorithm when the number of times the target application has been started reaches a target number; and determining the target loading order based on the reinforcement learning algorithm at preset intervals.
In order to achieve the above object, according to another aspect of the present invention, a loading apparatus for functional modules is also provided. The device comprises: an acquisition unit configured to acquire a plurality of functional modules to be loaded by a target application; a determining unit configured to determine the target loading order of the functional modules in the current operating environment based on a reinforcement learning algorithm; and a loading unit configured to sequentially load the plurality of functional modules according to the target loading order.
In order to achieve the above object, according to another aspect of the present invention, a storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to execute, when run, the loading method of functional modules of the embodiments of the present invention.
In order to achieve the above object, according to another aspect of the present invention, an electronic device is also provided. The electronic device includes a memory and a processor. The memory has a computer program stored therein, and the processor is arranged to run the computer program to perform the loading method of functional modules of an embodiment of the invention.
According to the invention, a plurality of functional modules to be loaded by the target application are obtained; a target loading order of the plurality of functional modules in the current operating environment is determined based on a reinforcement learning algorithm; and the plurality of functional modules are loaded sequentially according to the target loading order. That is to say, in the present application, in the current operating environment of the target application, the reinforcement learning algorithm is used to determine the target loading order of the plurality of functional modules that suits the current operating environment, so that the loading order used in the original operating environment is dynamically updated to adapt to the current one. This solves the technical problem that the loading order of each functional module cannot be dynamically determined by monitoring changes in the operating environment, and achieves the technical effect of dynamically determining the loading order of each functional module by monitoring such changes.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
fig. 1 is a block diagram of a hardware structure of a mobile terminal according to a method for loading a functional module according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for loading a functional module according to an embodiment of the present invention;
FIG. 3 is a flowchart of a reinforcement learning based multi-function module dynamic loading method according to an embodiment of the present invention;
FIG. 4 is a diagram of a grid to Q-table conversion in a reinforcement learning algorithm according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a variation of an exploration rate according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a system for dynamically loading functional modules within software, in accordance with an embodiment of the present invention; and
fig. 7 is a schematic diagram of a loading apparatus for a functional module according to an embodiment of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances, so that the embodiments of the application described herein can be implemented in orders other than those illustrated or described here. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
The method provided by the embodiment of the application can be executed in a mobile terminal, a computer terminal or a similar operation device. Taking an example of the present invention running on a mobile terminal, fig. 1 is a block diagram of a hardware structure of the mobile terminal of a method for loading a functional module according to an embodiment of the present invention. As shown in fig. 1, the mobile terminal may include one or more (only one shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and optionally may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration, and does not limit the structure of the mobile terminal. For example, the mobile terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used for storing computer programs, for example, software programs and modules of application software, such as a computer program corresponding to a loading method of a functional module in the embodiment of the present invention, and the processor 102 executes the computer programs stored in the memory 104 to perform loading of various functional applications and functional modules, so as to implement the above-mentioned method. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices via a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
In this embodiment, a method for loading functional modules running on the above mobile terminal is provided. Fig. 2 is a flowchart of a method for loading a functional module according to an embodiment of the present invention. As shown in fig. 2, the method may include the following steps:
step S202, a plurality of functional modules to be loaded by the target application are obtained.
In the technical solution provided by step S202 of the present invention, the target application may be software that allows functional modules to be added to it according to user requirements. It may include a plurality of functional modules, which may be loaded into the target application when it is started, may be of different types, and may further include components implementing the respective functions of the target application.
For example, the target application of this embodiment is an instant messaging application, that is, a communication application including a plurality of functional modules, which may be a message sending module, a message receiving module, a message viewing module, a message closing module, a message deleting module and the like, without any limitation here.
Step S204: determining the target loading order of the functional modules in the current operating environment based on the reinforcement learning algorithm.
In the technical solution provided by step S204 of the present invention, after obtaining a plurality of functional modules to be loaded by a target application, a target loading sequence of the plurality of functional modules in a current operating environment is determined based on a reinforcement learning algorithm.
In this embodiment, the target application runs in the current environment, which may include information affecting the loading of the plurality of functional modules, such as, but not limited to, the type of operating system of the device on which the target application is installed, the usage rate of the central processing unit, the rate of the network used, the control of software resources corresponding to the target application, the number of times the target application has been opened, and the like.
In order to prevent all functional modules from being loaded simultaneously when the target application is opened, which would make the overall startup response time of the target application too long and lead to incompletely loaded functional modules, process interruption and the like, this embodiment determines, in the current operating environment, the target loading order of the functional modules to be loaded, i.e., the front-to-back order in which the functional modules are loaded into the target application.
This embodiment determines the target loading order of the plurality of functional modules in the current operating environment based on a reinforcement learning algorithm. In reinforcement learning, an agent selects an action to apply to the environment; the environment accepts the action, undergoes a state change, and produces a reward signal for the agent; the agent then selects the next action according to the reward signal and the current state of the environment, and each state offers a plurality of actions to choose from. The goal is thus to guide behavior through the rewards obtained by interacting with the environment, so as to obtain the maximum reward.
In order to avoid adding too much performance burden to the target application as a whole, the reinforcement learning algorithm of this embodiment may adapt the mechanism of the Q-Learning algorithm and improve on it. Unlike the conventional path planning application, every created sub-grid needs to be traversed (i.e. every functional module of the target application needs to be executed), only in a different order. An n x 1 grid is therefore initially created, representing n states in which the n functional modules of the target application are executed in sequence, and each state provides n-i action choices (i = 0 … n), that is, when each functional module is executed, there are n-i candidate functional modules that may be selected to be loaded next. After the Q-Learning mechanism is adapted, the above n x 1 grid is converted into a Q-table, in which each column represents one of the n-i action selections, each row represents a state, and the value of each cell represents the maximum expected future reward (Q value) for the given state and corresponding action.
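To make the grid-to-Q-table adaptation concrete, the following is a minimal Python sketch, for illustration only; NumPy, the module count n = 6, and identifiers such as q_table and remaining_actions are assumptions of this description, not part of the claimed method:

```python
import numpy as np

n = 6  # number of functional modules to load (illustrative value)

# The n x 1 grid of states adapted into an n x n Q-table:
# a row is the state (the functional module loaded most recently),
# a column is an action (a candidate module to load next).
# Every cell starts at a Q value of 0.
q_table = np.zeros((n, n))

def remaining_actions(loaded):
    """Modules whose loading order is still undetermined: the n - i
    action choices offered in the current state."""
    return [m for m in range(n) if m not in loaded]
```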
In this embodiment, the operation of selecting a functional module to load corresponds to an action in reinforcement learning, and the result of loading the selected functional module corresponds to the reward signal. The target loading order of the plurality of functional modules can be continuously re-determined as the current operating environment of the target application changes. The target loading order may be the arrangement that makes the startup time of the target application in the current operating environment shortest, i.e. the optimal loading order of the plurality of functional modules; the selection of the target loading order thus not only remains in a learning state at all times but also adapts to changes in the operating environment of the target application.
Step S206: the plurality of functional modules are loaded sequentially according to the target loading order.
In the technical solution provided by step S206 of the present invention, after the target loading order of the plurality of functional modules in the current operating environment is determined based on the reinforcement learning algorithm, the plurality of functional modules may be loaded into the target application sequentially according to the target loading order, with the loading duration of the plurality of functional modules being the shortest, so that the target application starts more smoothly and quickly.
In the related art, when the loading order of all functional modules to be loaded is determined, the loading priority of the functional modules is usually determined according to the importance of each functional module; the loading order is then determined according to the priority, and the functional modules are loaded sequentially in that order. Although this method meets the design requirement of loading the functional modules sequentially and lets users see and use the more important functional modules first, it is not user-friendly: the loading order cannot be adjusted according to the current operating environment of the application, and overlong startup response times or process termination may still occur. In another method, a developer lists all possible loading orders of the functional modules, performs a loading test on each order, finds the order with the shortest loading time and writes it into the startup program. Both belong to static loading methods.
In the present application, through the above steps S202 to S206, the loading order of the plurality of functional modules is determined in the current operating environment of the target application, and the target loading order applicable to the current operating environment may be determined dynamically, rather than by a static loading method, by exploiting the characteristics of the reinforcement learning algorithm. The loading order of the plurality of functional modules in the original operating environment is thus dynamically updated to adapt to the current operating environment, thereby solving the technical problem that the loading order of each functional module cannot be dynamically determined by monitoring changes in the operating environment, and achieving the technical effect of dynamically determining the loading order of each functional module by monitoring such changes.
The above method of this embodiment is further illustrated below.
As an alternative implementation, in step S204, determining the target loading order of the plurality of functional modules in the current operating environment based on the reinforcement learning algorithm includes: an obtaining step of obtaining at least one expected value corresponding to a first functional module among the plurality of functional modules, wherein each expected value characterizes the expectation of loading one of the remaining functional modules after the first functional module is loaded, the remaining functional modules being those, other than the first functional module, whose loading order has not been determined; a determining step of determining the remaining functional module with the maximum expected value as a second functional module to be loaded after the first functional module; and a judging step of judging whether remaining functional modules still exist among the plurality of functional modules, and if so, taking the second functional module as the new first functional module and returning to the obtaining step, or if not, determining the order of the functional modules whose loading order has been determined as the target loading order, wherein the functional modules whose loading order has been determined include the first functional module and the second functional module.
In this embodiment, when the target loading order of the plurality of functional modules in the current operating environment is determined based on the reinforcement learning algorithm, the obtaining step may be performed first: a first functional module is determined from the plurality of functional modules. The first functional module may be the currently loaded functional module and corresponds to the concept of a state at a certain moment in reinforcement learning, and at least one expected value corresponding to the first functional module is obtained. In the subsequent training process, the action selection in each state may be determined using these expected values, which may be calculated by a learned action-value function combined with a time-consumption evaluation. The correspondence between the first functional module and each expected value means that each expected value characterizes the expectation of loading one remaining functional module in the future after the target application has loaded the first functional module, the remaining functional modules being the functional modules, other than the first functional module, whose loading order among the plurality of functional modules has not been determined. The operation of selecting among the remaining functional modules corresponds to the multiple actions available in a state in reinforcement learning, and each expected value corresponds to the expected future reward (Q value) for performing a certain action in a state at a certain moment.
After at least one expected value corresponding to the first functional module among the plurality of functional modules is obtained, the determining step may be performed: an action is evaluated based on the at least one expected value corresponding to the first functional module, and this action determines the next functional module to be loaded after the first functional module. The remaining functional module with the highest expected value can be determined from the remaining functional modules; its loading duration may be the shortest compared with those of the other remaining functional modules. The remaining functional module with the highest expected value is then determined as the second functional module and is loaded after the first functional module; the operation of loading the second functional module after the first corresponds to the evaluated and selected action. The loading order of the first functional module and the second functional module is thereby determined.
After the remaining functional module with the maximum expected value is determined as the second functional module to be loaded after the first functional module, it is judged whether remaining functional modules still exist among the plurality of functional modules. If not, the target loading order of the plurality of functional modules has been determined. If remaining functional modules do exist, the second functional module is re-determined as the first functional module and the process returns to the obtaining step, in which at least one expected value corresponding to the newly determined first functional module is obtained; each expected value characterizes the expectation of loading a remaining functional module after the newly determined first functional module is loaded, the remaining functional modules being the functional modules, other than the newly determined first functional module, whose loading order has not yet been determined. The reinforcement learning algorithm thus continues until the target loading order is complete.
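The obtaining, determining and judging steps above amount to a greedy readout of the trained Q-table. A sketch under the same illustrative assumptions as before (function and variable names are hypothetical):

```python
def target_loading_order(q_table, first_module, n):
    """Greedy readout of the target loading order from a trained Q-table:
    starting from the first functional module, repeatedly determine the
    remaining module with the maximum expected value as the next (second)
    module, until no remaining modules are left."""
    order = [first_module]
    while len(order) < n:
        current = order[-1]
        remaining = [m for m in range(n) if m not in order]
        # Determining step: pick the remaining module with the largest Q value.
        order.append(max(remaining, key=lambda m: q_table[current][m]))
    return order
```

For the six-module example used later in this description, target_loading_order(q_table, 0, 6) would return a permutation of the six module indices.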
The method of determining the first functional module from the plurality of functional modules in this embodiment is described below.
As an optional implementation, before the obtaining step is performed for the first time, the method further comprises one of: determining a functional module randomly selected from the plurality of functional modules as the first functional module; and determining a functional module whose target attribute meets a predetermined condition as the first functional module.
In this embodiment, at the beginning of the startup of the target application, none of the functional modules has been opened and the expected value of each functional module is 0; correspondingly, the Q-table established in the reinforcement learning algorithm is initialized by setting the initial value of each cell to 0. In this case it is not known which action should be taken next, that is, it has not been determined which functional module needs to be loaded; one functional module may then be randomly selected from the plurality of functional modules and determined as the first functional module.
Optionally, in this embodiment, when a functional module randomly selected from the plurality of functional modules is determined as the first functional module, an epsilon-greedy strategy may be adopted for the random selection. The epsilon-greedy strategy sets an exploration rate (epsilon) that is initially at its maximum value of 1 and then decays; it specifies how the next functional module is selected at the initial stage of reinforcement learning training, when the expected value of each functional module is 0 and it is unknown which action should be taken next. A functional module is therefore randomly selected from the plurality of functional modules according to the exploration rate, performing a large amount of exploration, and the randomly selected functional module is determined as the first functional module. Optionally, this embodiment sets a large exploration rate at the beginning of training, which is gradually reduced as the expected values estimated through the reinforcement learning algorithm become more reliable.
Optionally, in this embodiment, attributes of the plurality of functional modules are obtained; these attributes may include, but are not limited to, the importance of the functional modules within the target application. A functional module whose target attribute meets a predetermined condition may then be searched for among the plurality of functional modules, where the predetermined condition is the constraint used to screen the first functional module out of the plurality of functional modules according to the target attribute; for example, the functional module with the greatest importance among the plurality of functional modules is determined as the one meeting the predetermined condition and is hence determined as the first functional module.
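Both ways of choosing the first functional module described above can be sketched as follows; the importance mapping and the exploration-rate default are illustrative assumptions:

```python
import random

def choose_first_module(modules, importance=None, epsilon=1.0):
    """Select the first functional module: either the module whose target
    attribute (here assumed to be importance) meets the predetermined
    condition of being greatest, or a random choice under the epsilon-greedy
    strategy, whose exploration rate starts at its maximum value of 1."""
    if importance is not None:
        return max(modules, key=lambda m: importance[m])
    if random.random() < epsilon:
        return random.choice(modules)  # large amount of exploration
    return modules[0]  # all expected values are 0 initially, any tie-break works
```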
The above obtaining step of this embodiment is further explained below.
As an optional implementation, the obtaining step includes: acquiring the first loading duration required to load each remaining functional module after the first functional module is loaded; determining, from the first loading duration, a first reward value for loading each remaining functional module, and acquiring a weight of the first reward value; and determining each expected value based at least on the first reward value of each remaining functional module and the weight of that first reward value.
In this embodiment, when at least one expected value corresponding to a first functional module among the plurality of functional modules is obtained, the first loading duration required to load each remaining functional module after the first functional module may be acquired first. The first loading duration is the basis for determining the reward function R corresponding to each remaining functional module; this embodiment may construct that function from a time-consumption evaluation, determine through the reward function the first reward value for loading each remaining functional module, and acquire the weight of the first reward value, which may be a given constant. After the first reward value of each remaining functional module and its weight are obtained, each corresponding expected value is determined based at least on the first reward value of each remaining functional module and the weight of that first reward value. Optionally, this embodiment uses a learned action-value function combined with the time-consumption evaluation to calculate the state, the reward value and the corresponding expected value output after loading the next functional module. The learned action-value function can be regarded as a calculator scrolling over the Q-table to find the row associated with the current state and the column associated with the action; it calculates and returns the current expected value of the matching cell, i.e., the expected future reward for performing the action in that state, which can be expressed by the following equation:
wherein s istFor indicating the status result of step t, atFor indicating an operation of selecting one remaining function module to be activated after the activation to the tth function module (corresponding to the action to be performed at the tth step in the reinforcement learning algorithm), EπIs used for expressing the operation of obtaining the expected value,for representing the sum of future rewards from the current step to the end of all executed actions, gamma for representing the weight of the first reward value, t for representing that the tth function block has been started (corresponding to the execution to the step in the reinforcement learning algorithm), gammatA weight value for representing a first reward value corresponding to the t-th step, R for representing a reward function, Rt+1For indicating the execution of action a at the t-th steptPrize value of, s0=stFor indicating the currently executing function block (corresponding to the currently initiated state in the reinforcement learning algorithm)),a0A is used to indicate an operation of selecting one remaining function module to be started after the currently executed function module (corresponding to the currently executed action in the reinforcement learning algorithm).
As an optional implementation, the method further comprises an updating step of updating each expected value corresponding to the first functional module to obtain at least one updated expected value; the determining step then includes: determining the remaining functional module corresponding to the maximum updated expected value among the at least one updated expected value as the second functional module to be loaded after the first functional module.
In this embodiment, each expected value corresponding to the first functional module may further be updated iteratively, so the updating step is performed to update each expected value corresponding to the first functional module and obtain at least one updated expected value. In the determining step, an action may then be evaluated and selected based on the at least one updated expected value corresponding to the first functional module; this action determines the functional module to be loaded next after the first functional module. The remaining functional module with the highest updated expected value may be determined from the remaining functional modules, its loading duration possibly being the shortest compared with those of the other remaining functional modules; it is then determined as the second functional module and loaded after the first functional module. The operation of loading the second functional module after the first corresponds to the evaluated and selected action, so the loading order of the first functional module and the second functional module can be determined again.
As an optional implementation, the updating step includes: acquiring the weight of each expected value; acquiring the second loading duration of the functional modules whose loading order has been determined; determining, from the second loading duration, a second reward value for loading the functional modules whose loading order has been determined, and acquiring a weight of the second reward value; and updating each expected value based at least on the weight of each expected value, the second reward value and the weight of the second reward value, to obtain each updated expected value.
In this embodiment, when the updating step is implemented, the weight of the expected value corresponding to each remaining functional module may be obtained first, and then the second reward value of the functional modules whose loading order has been determined. This second reward value corresponds to the reward value of the previous t steps in the reinforcement learning algorithm and may be determined from the second loading duration of those functional modules, the second loading duration being the basis for determining the reward function R corresponding to the functional modules whose loading order has been determined; the second reward value is then determined through the reward function and its weight is acquired. Each expected value is updated based at least on the weight of each expected value, the second reward value and the weight of the second reward value, yielding each updated expected value. This can be implemented with the Bellman formula, a dynamic programming equation for iteratively updating each expected value corresponding to the first functional module, which can be expressed by the following formula:
New(s, a) = Q(s, a) + α[R(s, a) + γ max Q'(s', a') - Q(s, a)]

where New(s, a) denotes the iteratively updated expected value corresponding to the first functional module; Q(s, a) denotes the current expected value corresponding to the remaining functional module; α denotes the weight of the expected value calculated this time for the remaining functional module; R(s, a) denotes the second reward value of the functional modules whose loading order has been determined (the reward value of the previous t steps); γ denotes the weight of each reward value; γ max Q'(s', a') denotes the optimal expected value (Q value) predicted for step t + 1; and the sum R(s, a) + γ max Q'(s', a') represents the expected value, predicted over the previous t steps, of performing the next action a_t in the current state s_t.
In the reinforcement learning training process of this embodiment, the Bellman equation is used to update each expected value corresponding to the first functional module: Q(s, a) = r + γ max Q(s', a'), which reads as the immediate reward value r obtained after taking action a in the current state s, plus the maximum future reward value max Q(s', a') discounted by γ.
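Written as code, the Bellman update above is a single in-place assignment (a sketch; the alpha and gamma defaults are illustrative, not values given in the text):

```python
def bellman_update(q_table, s, a, reward, next_remaining, alpha=0.1, gamma=0.9):
    """New(s, a) = Q(s, a) + alpha * [R(s, a) + gamma * max Q'(s', a') - Q(s, a)].
    next_remaining lists the actions still available in the next state s' = a
    (the module just loaded); their best Q value is the discounted term."""
    best_next = max((q_table[a][m] for m in next_remaining), default=0.0)
    q_table[s][a] += alpha * (reward + gamma * best_next - q_table[s][a])
```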
In this embodiment, the obtaining step, the determining step and the updating step are performed repeatedly: the action selection for loading a functional module is continuously completed, the reward values are updated, and the loading order of the plurality of functional modules is determined.
As an optional implementation, the updating step includes: updating each expected value corresponding to the first functional module until at least one of the following conditions is met: the duration of sequentially loading the plurality of functional modules in the target loading order is less than a target threshold; or an update-end instruction is received.
In this embodiment, when the updating step is executed, it may be judged whether the duration of sequentially loading the plurality of functional modules according to the target loading order is less than a target threshold. If it is less than the target threshold, the updating of each expected value corresponding to the first functional module ends; otherwise, each expected value corresponding to the first functional module may continue to be updated. Optionally, the reinforcement learning training may also be stopped manually: in response to an update-end instruction, the updating of each expected value corresponding to the first functional module ends.
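The two stopping conditions can be checked with a small predicate (a sketch; the names and the manual-stop flag are illustrative assumptions):

```python
def should_stop_updating(total_loading_seconds, target_threshold,
                         update_end_received=False):
    """Stop updating the expected values once the duration of sequentially
    loading all modules in the target order drops below the target threshold,
    or once an update-end instruction has been received."""
    return total_loading_seconds < target_threshold or update_end_received
```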
As an alternative embodiment, determining the target loading order of the plurality of functional modules in the current operating environment based on the reinforcement learning algorithm includes at least one of the following: determining the target loading order based on the reinforcement learning algorithm each time the target application is started; determining the target loading order based on the reinforcement learning algorithm when the number of times the target application has been started reaches a target number; and determining the target loading order based on the reinforcement learning algorithm at preset intervals.
In this embodiment, when the target loading order of the plurality of functional modules in the current operating environment is determined based on the reinforcement learning algorithm, the target loading order may be determined each time the target application is detected to start. Alternatively, the number of times the target application has currently been started may be obtained and compared with a target number; once the target number is reached, the target loading order is determined based on the reinforcement learning algorithm. That is, as the target application is started many times in daily use, the reinforcement learning training that determines the loading order of the multiple functional modules is completed and the loading order is determined. In this embodiment a preset interval may also be set, and the target loading order is then automatically re-determined based on the reinforcement learning algorithm at that interval, so that the plurality of functional modules are loaded sequentially according to the target loading order.
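The three triggering policies above could be combined as in the following sketch (parameter names and the precedence between the policies are assumptions of this illustration):

```python
def should_retrain(launch_count, target_count=None,
                   seconds_since_last_training=None, preset_interval=None):
    """Decide when to re-determine the target loading order: after a target
    number of launches, at a preset interval, or (by default) on every launch."""
    if target_count is not None:
        return launch_count >= target_count
    if preset_interval is not None and seconds_since_last_training is not None:
        return seconds_since_last_training >= preset_interval
    return True  # default policy: retrain each time the application starts
```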
The method of determining the loading order of the plurality of functional modules based on the reinforcement learning algorithm has several properties. It is comprehensive: when the action selections and states are set, all functional modules of the target application are covered, and different training results differ only in their order. It is stable: the optimal loading order of each functional module may be the result of a user starting the target application again and again in a personal operating environment to carry out intensive training, an iterative accumulation according to each different user's situation; the method is therefore highly inclusive, can accommodate various loading situations during training, and finally produces an optimal sequence that also runs stably. Finally, it is adaptive: even in the current operating environment, after the optimal loading order has been screened out through repeated intensive training, the changes in the startup time of the functional modules can still be monitored to judge whether the operating environment has changed, and continued training copes with such changes. This solves the technical problem that the loading order of each functional module cannot be dynamically determined by monitoring changes in the operating environment, and achieves the technical effect of dynamically determining the loading order of each functional module by monitoring such changes.
This embodiment adopts the steps of acquiring a plurality of functional modules to be loaded by a target application, determining the target loading order of the plurality of functional modules in the current operating environment based on a reinforcement learning algorithm, and sequentially loading the plurality of functional modules according to the target loading order. That is to say, in the present application, the reinforcement learning algorithm is used in the current operating environment of the target application to determine the target loading order of the plurality of functional modules that suits that environment, so that the loading order used in the original operating environment is dynamically updated to adapt to the current one, thereby solving the technical problem that the loading order of each functional module cannot be dynamically determined by monitoring changes in the operating environment, and achieving the corresponding technical effect.
Example 2
The technical solution of the present invention will be described below with reference to preferred embodiments.
With the increasing demands of users, the internal structure of the corresponding target application becomes more and more complex, and new functional modules are continuously derived. If differences in running environments such as device performance and network rate are not considered, and all functional modules are loaded at the same time when the target application is opened, the overall startup response time becomes too long, resulting in incompletely loaded functional modules, process interruption and the like. Therefore, in order to enable the software to start more smoothly and quickly, all functional modules that need to be loaded must be started in an ordered, block-wise manner.
At present, developers usually adopt certain traditional solutions. One is to determine the loading priority of the plurality of functional modules according to the importance of each functional module, sort the functional modules by priority, and sequentially load the sorted functional modules in blocks. The other is that the developer lists all ordering combinations of the plurality of functional modules, tests their operation, finds the ordering that takes the shortest time, writes it into the startup program and loads the plurality of functional modules accordingly.
If the loading of the plurality of functional modules is ordered according to the priority of each functional module, the requirements of the software designer can be met and users see and use the more important functional modules first; however, this is not user-friendly, the loading order cannot be adjusted according to the current running environment of the software, and overlong startup response times or process termination may occur. The other method achieves the shortest time consumption in the developer's debugging environment, but cannot adjust to the actual running environment of the target application nor dynamically update the loading order of each functional module by monitoring changes in the running environment, so the requirement of the shortest startup time under the current conditions cannot be met. Both are static methods.
In view of the above drawbacks of the related art, this embodiment provides a reinforcement-learning-based multi-module dynamic loading method that can continuously update the loading order as the number of times the user opens the target application in a personal operating environment increases, until the arrangement with the shortest startup time is found. Furthermore, the selection of the loading order always remains in a learning state and can adapt to changes in the operating environment of the target application.
The reinforcement learning-based multi-module dynamic loading method of the embodiment is further described below.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
FIG. 3 is a flowchart of a reinforcement learning-based multi-function module dynamic loading method according to an embodiment of the present invention. As shown in fig. 3, the method may include the steps of:
step S301, creating grids corresponding to a plurality of functional modules, and converting the grids into a Q-table form.
The method of this embodiment is based on reinforcement learning, and in order to avoid adding too much performance burden to the software as a whole, it adapts and improves the Q-Learning algorithm. Generally, the Q-Learning algorithm is mostly applied to path selection in an environment: an n x n grid is created, each cell represents a current state, each state has multiple action selections, and the expectation of the maximum future reward is obtained by combining evaluation indexes. The grid created by the Q-Learning algorithm in path planning applications is usually designed as n x n (not all states need to be traversed, as long as the specified end point can be reached). Unlike the conventional path planning application, the first step of this embodiment is to create an n x 1 grid representing n states (functional modules) that need to be executed in sequence, where n-i actions can be selected in each state (i = 0 … n) to choose the next functional module to load.
After the mechanism of the Q-Learning algorithm is adapted, the n x 1 grid is converted into a Q-table: each column represents one of the n-i action choices, each row represents a state, and the value of each cell represents the maximum expected future reward (i.e., the Q value) for the given state and corresponding action. The Q value is used to determine the action selection in each state during the subsequent intensive training, so this embodiment calculates the Q value by learning an action-value function combined with a time-consumption evaluation.
The following is a description of a specific example:
FIG. 4 is a diagram illustrating the grid to Q-table conversion in the reinforcement learning algorithm according to an embodiment of the present invention. As shown in fig. 4, a target application may contain six types of functional modules to be loaded. This embodiment creates a 6 x 1 grid and converts it into a 6 x 6 Q-table, where each column indicates the corresponding number of available action selections, "\\" indicates that such a functional module has already been loaded and cannot serve as a candidate action, and the initial Q value is 0.
Step S302: initialize the Q-table established in this embodiment, setting the initial value of each cell to 0.
In this embodiment, before the loading order of the plurality of functional modules is determined, every cell of the Q-table is given the same arbitrary value (0 in most cases). As exploration continues, the Q-table updates the Q(s, a) values by iteratively applying the Bellman equation (a dynamic programming equation), giving better and better approximations.
In step S303, given the functional module s to be loaded, an action a is selected based on the current Q-value estimate; this action determines the next functional module to be loaded.
In this embodiment, in the initial state none of the functional modules has been opened; one functional module may be selected at random, or according to its importance.
In this embodiment, the Q values are all 0 at the beginning and it is not determined which action should be taken next, so an epsilon-greedy strategy may be used for random selection.
The epsilon-greedy strategy initially sets the exploration rate to 1, its maximum value: at this stage none of the values in the Q-table can be trusted, so a large amount of exploration must be performed through random selection. A large epsilon is thus set at the beginning of reinforcement learning training and is gradually decreased as the agent becomes more confident about its estimated Q values, as shown in FIG. 5. FIG. 5 is a schematic diagram illustrating the variation of the exploration rate according to an embodiment of the present invention.
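A minimal sketch of such an epsilon-greedy selection with a decaying exploration rate might look as follows (the decay constants are assumptions; the patent only states that epsilon starts at its maximum and decreases):

```python
import random

def exploration_rate(episode, eps_start=1.0, eps_min=0.05, decay=0.99):
    """Schedule in the spirit of FIG. 5: epsilon starts at its maximum
    value 1 and decays as the Q-value estimates become more reliable."""
    return max(eps_min, eps_start * decay ** episode)

def choose_action(q_row, available, epsilon):
    """Epsilon-greedy: with probability epsilon explore a random
    not-yet-loaded module, otherwise exploit the best known Q value."""
    if random.random() < epsilon:
        return random.choice(available)
    return max(available, key=lambda a: q_row[a])
```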
Step S304, calculate the Q value output after loading advances to the next module, using the learned action value function in combination with the time-consumption evaluation.
In this embodiment, the learned action value function may be viewed as a lookup that scans the Q-table for the row associated with the current state and the column associated with the chosen action; it returns the Q value of the matching cell at the current time, i.e., the expected future reward for performing that action in that state, as shown in equation (1):

Qπ(s_t, a_t) = Eπ[ Σ_t γ^t · R_{t+1} | s_0 = s_t, a_0 = a_t ]    (1)

wherein s_t denotes the state reached at step t; a_t denotes the operation of selecting one remaining functional module to start after the t-th functional module has been started (corresponding to the action performed at step t in the reinforcement learning algorithm); Eπ denotes taking the expected value; Σ_t γ^t · R_{t+1} denotes the sum of all future rewards from the current step to the end of loading; γ denotes the weight applied to a reward value, so that γ^t is the weight of the reward value corresponding to step t; R denotes the reward function, and R_{t+1} denotes the reward value obtained for performing action a_t at step t; s_0 = s_t denotes the currently executing functional module (the current state in the reinforcement learning algorithm), and a_0 = a_t denotes the operation of selecting the next remaining functional module to start (the currently executed action in the reinforcement learning algorithm).
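The patent does not fix the exact shape of the time-consumption reward; one natural reading, used in the sketches here, is to reward short loading durations:

```python
def reward_from_duration(load_seconds):
    """Assumed reward: the shorter a module's loading duration, the
    larger the reward, so maximising the discounted return in
    equation (1) minimises the total start-up time."""
    return -load_seconds
```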
Step S305, update the Q(s, a) values in the Q-table using the Bellman equation.
In this embodiment, the Bellman equation is a dynamic programming equation for iteratively updating the Q value, as shown in equation (2):
New Q(s, a) = Q(s, a) + α [R(s, a) + γ · max Q'(s', a') − Q(s, a)]    (2)
wherein New Q(s, a) denotes the iteratively updated expected value for the first functional module; Q(s, a) denotes the expected value before the update; α denotes the weight given to the newly calculated expected value (the learning rate); R(s, a) denotes the second reward value, i.e., the reward obtained at step t for the functional modules whose loading order has already been determined; γ denotes the weight of each reward value; and γ · max Q'(s', a') denotes the optimal predicted expected value (Q value) at step t + 1. The sum R(s, a) + γ · max Q'(s', a') is the target from which the expected value of performing the next action a_t in the current state s_t is calculated.
In the reinforcement learning training process of this embodiment, the Bellman equation is used to update each expected value corresponding to the first functional module. In its simplified form, Q(s, a) = r + γ · max(Q(s', a')), the Bellman equation reads: the Q value of taking action a in the current state s equals the immediate reward value r plus the maximum discounted future reward γ · max(Q(s', a')).
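A sketch of one such update, directly transcribing equation (2) (the default alpha and gamma values are assumptions, not values prescribed by the patent):

```python
def bellman_update(q, state, action, reward, next_q_values,
                   alpha=0.1, gamma=0.9):
    """One application of equation (2):
    New Q(s,a) = Q(s,a) + alpha * [R + gamma * max Q(s',a') - Q(s,a)].
    next_q_values holds Q(s',a') for the actions still available; it is
    empty in the terminal state, where the future term is taken as 0."""
    future = max(next_q_values, default=0.0)
    q[state][action] += alpha * (reward + gamma * future - q[state][action])
```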
Step S306, repeat S303, S304, and S305, continually completing the action selections that load the functional modules and updating the Q values, until the loading order with the shortest time consumption is found or the reinforcement training is stopped manually.
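Putting the pieces above together, one training episode (one start of the target application) might be sketched as follows; load_times is a hypothetical list of measured module loading durations, passed in only to keep the example self-contained:

```python
def train_episode(q, load_times, episode):
    """One reinforcement-training episode (steps S303-S305 repeated):
    choose and load all n modules in some order, update Q after every
    step, and return the episode's total start-up time for monitoring."""
    n = len(load_times)
    available = list(range(n))
    eps = exploration_rate(episode)
    total_time = 0.0
    for step in range(n):
        action = choose_action(q[step], available, eps)
        reward = reward_from_duration(load_times[action])
        available.remove(action)
        next_qs = [q[step + 1][a] for a in available] if available else []
        bellman_update(q, step, action, reward, next_qs)
        total_time += load_times[action]
    return total_time
```

Calling train_episode once per application start, with the exploration rate decaying across episodes, would then converge toward the order with the shortest total start-up time, matching step S306.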
In summary, the method of this embodiment not only continuously updates the loading order as the number of times the user opens the target application in the personal operating environment increases, until the arrangement with the shortest start-up time is found, but also keeps learning the selection of the loading order so as to adjust dynamically to changes in the operating environment of the target application.
As an optional example, an embodiment of the present invention further provides a system for dynamically loading functional modules in software. It should be noted that the system for dynamically loading functional modules in software according to this embodiment may be used to execute the method for dynamically loading functional modules in software according to this embodiment of the present invention.
Fig. 6 is a schematic diagram of a system for dynamically loading functional modules in software according to an embodiment of the present invention. As shown in fig. 6, the system 60 for dynamically loading functional modules in software may include: an adaptation module 61, a training module 62 and a ranking module 63.
The adaptation module 61 is configured to adapt the method of this embodiment to the mechanism of the Q-Learning algorithm and to establish the corresponding grid structure and Q-table.
The training module 62 maintains the training state as the number of times the user opens the target application in the personal operating environment increases, continuously updating the loading order and iteratively updating the Q values through the Q function and the Bellman equation.
The sequencing module 63 is configured to determine the loading order of the plurality of functional modules under the current operating environment according to the last updated Q-table, once the loading order with the shortest time consumption has been determined or the reinforcement training has been stopped manually.
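As a sketch of what the sequencing module might do with the last updated Q-table (the function name and signature are assumptions), the final order can be read out by a greedy walk over the rows:

```python
def target_loading_order(q, n):
    """Greedy read-out of a trained n x n Q-table: in each state, pick
    the remaining module with the largest expected value (Q value)."""
    available = list(range(n))
    order = []
    for step in range(n):
        best = max(available, key=lambda a: q[step][a])
        available.remove(best)
        order.append(best)
    return order
```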
The method of determining the loading order of the plurality of functional modules based on the reinforcement learning algorithm has three properties. Comprehensiveness: when action selections and states are defined, all functional modules of the target application are covered, and different orders yield different training results. Stability: the optimal loading order in this embodiment is the result of the user starting the target application again and again in a personal operating environment, with reinforcement training iterating and accumulating according to each different user's situation; the method is therefore highly inclusive, can accommodate a wide variety of loading situations during training, and finally produces an optimal sequence that also runs stably. Adaptability: even after the optimal loading order has been screened out through repeated reinforcement training, the method keeps monitoring changes in start-up time consumption under the current operating environment, judges from them whether the operating environment has changed, and responds to any change through continued training. This solves the technical problem that the loading order of each functional module cannot be dynamically determined by monitoring changes in the operating environment, and achieves the technical effect of dynamically determining the loading order of each functional module by monitoring such changes.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
Example 3
The embodiment of the invention also provides a loading device of the functional module. It should be noted that the loading apparatus for functional modules in this embodiment may be used to execute the method for loading functional modules in this embodiment of the present invention.
Fig. 7 is a schematic diagram of a loading apparatus for a functional module according to an embodiment of the present invention. As shown in fig. 7, the loading device 70 of the functional module may include: an acquisition unit 71, a determination unit 72, and a loading unit 73.
The obtaining unit 71 is configured to obtain a plurality of functional modules to be loaded by the target application.
A determining unit 72, configured to determine a target loading order of the plurality of functional modules in the current operating environment based on a reinforcement learning algorithm.
And a loading unit 73, configured to sequentially load the plurality of functional modules according to the target loading order.
Optionally, the determining unit 72 includes: a first obtaining module, configured to perform the obtaining step of acquiring at least one expected value corresponding to a first functional module among the plurality of functional modules, where each expected value represents the expectation of loading one of the remaining functional modules after the first functional module is loaded, and the remaining functional modules are those functional modules, other than the first functional module, whose position in the loading order has not yet been determined; a first determining module, configured to perform the determining step of determining the remaining functional module with the maximum expected value as the second functional module to be loaded after the first functional module; and a judging module, configured to judge whether any remaining functional modules exist among the plurality of functional modules, and if so, to determine the second functional module as the first functional module and return to the obtaining step, or if not, to determine the order corresponding to the functional modules whose loading order has been determined as the target loading order, where the functional modules whose loading order has been determined include the first functional module and the second functional module.
Optionally, the apparatus further comprises one of: a first determining unit configured to determine a function module randomly selected from the plurality of function modules as a first function module; and the second determining unit is used for determining the functional module with the target attribute meeting the preset condition from the plurality of functional modules as the first functional module.
Optionally, the first obtaining module includes: an obtaining submodule, configured to obtain the first loading duration required to load each remaining functional module after the first functional module is loaded; a loading submodule, configured to determine, from the first loading duration, a first reward value for loading each remaining functional module and to obtain the weight of that first reward value; and a first determining submodule, configured to determine each corresponding expected value based at least on the first reward value of each remaining functional module and the weight of that first reward value.
Optionally, the apparatus further comprises: the updating unit is used for executing the updating step, updating each expected value corresponding to the first functional module, and obtaining at least one updated expected value; the first determining module includes: and the second determining submodule is used for determining the residual functional module corresponding to the maximum update expected value in the at least one update expected value as a second functional module which needs to be loaded after the first functional module is loaded.
Optionally, the update unit includes: the third obtaining module is used for obtaining the weight of each expected value; the fourth obtaining module is used for obtaining a second loading time length of the functional module with the determined loading sequence; the fifth obtaining module is used for determining a second reward value of the functional module which loads the determined loading sequence through the second loading time length and obtaining the weight of the second reward value; and the first updating module is used for updating each expected value at least based on the weight of each expected value, the second reward value and the weight of the second reward value to obtain each updated expected value.
Optionally, the updating unit includes: a second updating module, configured to update each expected value corresponding to the first functional module until at least one of the following conditions is met: the duration of sequentially loading the plurality of functional modules based on the target loading order is less than a target threshold; or an update end instruction is received.
Optionally, the determining unit 72 comprises at least one of: the second determining module is used for determining a target loading sequence based on a reinforcement learning algorithm under the condition that the target application is started every time; the third determining module is used for determining a target loading sequence based on a reinforcement learning algorithm under the condition that the number of times of starting the target application reaches the target number of times; and the fourth determining module is used for determining the target loading sequence based on a reinforcement learning algorithm at intervals of a preset interval.
In this embodiment, the plurality of functional modules to be loaded by the target application are acquired, the target loading order of the plurality of functional modules under the current operating environment is determined by the determining unit based on the reinforcement learning algorithm (the reinforcement learning model being obtained by training on the operating parameters of the target application), and the plurality of functional modules are loaded in sequence by the loading unit according to the target loading order. That is to say, in the present application, the reinforcement learning algorithm is used under the current operating environment of the target application to determine a target loading order of the plurality of functional modules that suits that environment, dynamically updating the loading order used in the original operating environment so as to adapt to the current one. This solves the technical problem that the loading order of each functional module cannot be dynamically determined by monitoring changes in the operating environment, and achieves the technical effect of dynamically determining the loading order of each functional module by monitoring such changes.
Example 4
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Example 5
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. A method for loading a functional module is characterized by comprising the following steps:
acquiring a plurality of functional modules to be loaded by a target application;
determining a target loading sequence of the plurality of functional modules under the current operating environment based on a reinforcement learning algorithm;
sequentially loading the plurality of functional modules according to the target loading sequence;
determining a target loading sequence of the plurality of functional modules under the current operating environment based on a reinforcement learning algorithm, wherein the determining comprises the following steps: an obtaining step of obtaining at least one expected value corresponding to a first functional module in the plurality of functional modules, wherein each expected value is used for representing the expectation of loading the remaining functional modules after loading the first functional module, and the remaining functional modules are functional modules, except the first functional module, in the plurality of functional modules, of which the loading order is not determined; a determining step of determining that the remaining functional module with the maximum expected value is a second functional module to be loaded after the first functional module is loaded; judging whether the plurality of functional modules have the residual functional modules or not, if so, determining the second functional module as the first functional module, and returning to the acquiring step, and if not, determining the sequence corresponding to the functional modules with the determined loading sequence as the target loading sequence, wherein the functional modules with the determined loading sequence comprise the first functional module and the second functional module;
wherein the obtaining step comprises: acquiring a first loading duration required for loading each remaining functional module after the first functional module is loaded; determining, from the first loading duration, a first reward value for loading each remaining functional module, and acquiring a weight of the first reward value; and determining each of the expected values based at least on the first reward value of each of the remaining functional modules and the weight of the first reward value of each of the remaining functional modules.
2. The method of claim 1, wherein prior to performing the obtaining step for the first time, the method further comprises one of:
determining a function module randomly selected from the plurality of function modules as the first function module;
and determining, from the plurality of functional modules, the functional module whose target attribute meets the preset condition as the first functional module.
3. The method of claim 1, further comprising:
updating, namely updating each expected value corresponding to the first functional module to obtain at least one updated expected value;
the determining step includes: and determining the remaining functional module corresponding to the maximum update expected value in the at least one update expected value as a second functional module which needs to be loaded after the first functional module is loaded.
4. The method of claim 3, wherein the updating step comprises:
acquiring the weight of each expected value;
acquiring a second loading time length of the functional module with the determined loading sequence;
determining a second reward value of the functional module with the determined loading sequence through the second loading time length, and acquiring a weight of the second reward value;
and updating each expected value at least based on the weight value of each expected value, the second reward value and the weight value of the second reward value to obtain each updated expected value.
5. The method of claim 3, wherein the updating step comprises:
updating each expected value corresponding to the first functional module until at least one of the following conditions is met: the duration of sequentially loading the plurality of functional modules based on the target loading order is less than a target threshold; or an update end instruction is received.
6. The method of any one of claims 1 to 5, wherein determining the target loading order of the plurality of functional modules in the current operating environment based on a reinforcement learning algorithm comprises at least one of:
determining the target loading order based on the reinforcement learning algorithm with each launch of the target application;
determining the target loading sequence based on the reinforcement learning algorithm under the condition that the number of times of starting the target application reaches a target number of times;
and determining the target loading sequence based on the reinforcement learning algorithm at intervals of a preset interval.
7. A loading apparatus for a functional module, comprising:
the system comprises an acquisition unit, a processing unit and a control unit, wherein the acquisition unit is used for acquiring a plurality of functional modules to be loaded by a target application;
the determining unit is used for determining the target loading sequence of the functional modules under the current operating environment based on a reinforcement learning algorithm;
the loading unit is used for sequentially loading the plurality of functional modules according to the target loading sequence;
wherein the determining unit is further configured to determine a target loading order of the plurality of functional modules in the current operating environment based on a reinforcement learning algorithm by: an obtaining step of obtaining at least one expected value corresponding to a first functional module in the plurality of functional modules, wherein each expected value is used for representing the expectation of loading the remaining functional modules after loading the first functional module, and the remaining functional modules are functional modules, except the first functional module, in the plurality of functional modules, of which the loading order is not determined; a determining step of determining that the remaining functional module with the maximum expected value is a second functional module to be loaded after the first functional module is loaded; judging whether the plurality of functional modules have the residual functional modules or not, if so, determining the second functional module as the first functional module, and returning to the acquiring step, and if not, determining the sequence corresponding to the functional modules with the determined loading sequence as the target loading sequence, wherein the functional modules with the determined loading sequence comprise the first functional module and the second functional module;
wherein the obtaining step comprises: acquiring a first loading duration required for loading each remaining functional module after the first functional module is loaded; determining, from the first loading duration, a first reward value for loading each remaining functional module, and acquiring a weight of the first reward value; and determining each of the expected values based at least on the first reward value of each of the remaining functional modules and the weight of the first reward value of each of the remaining functional modules.
8. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any of claims 1 to 6 when executed.
9. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 6.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911391742.0A CN111190654B (en) | 2019-12-30 | 2019-12-30 | Loading method and device of functional module, storage medium and electronic device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111190654A CN111190654A (en) | 2020-05-22 |
| CN111190654B true CN111190654B (en) | 2022-05-03 |
Family
ID=70707985
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201911391742.0A Active CN111190654B (en) | 2019-12-30 | 2019-12-30 | Loading method and device of functional module, storage medium and electronic device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111190654B (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112632384B (en) * | 2020-12-25 | 2024-07-05 | 北京百度网讯科技有限公司 | Data processing method and device for application program, electronic equipment and medium |
| CN115756614A (en) * | 2021-09-06 | 2023-03-07 | 北京小米移动软件有限公司 | Driver loading method, driver loading device and storage medium |
| CN117873583A (en) * | 2023-12-06 | 2024-04-12 | 天翼云科技有限公司 | File loading method and device |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7996666B2 (en) * | 2007-09-04 | 2011-08-09 | Apple Inc. | User influenced loading sequence of startup applications |
| US9798583B2 (en) * | 2015-12-04 | 2017-10-24 | Microsoft Technology Licensing, Llc | Onboarding of a service based on automated supervision of task completion |
| CN107766101B (en) * | 2017-09-30 | 2021-02-19 | 五八有限公司 | Method, device and equipment for processing App starting event |
| CN109901881B (en) * | 2018-11-27 | 2022-07-12 | 创新先进技术有限公司 | Plug-in loading method and device of application program, computer equipment and storage medium |
| CN110262847B (en) * | 2019-05-14 | 2022-05-20 | 百度(中国)有限公司 | Application program starting acceleration method and device and machine-readable storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |