CN102063640B - Robot Behavior Learning Model Based on Utility Difference Network - Google Patents
Robot Behavior Learning Model Based on Utility Difference Network
- Publication number: CN102063640B (application CN201010564142A)
- Authority: CN (China)
- Prior art keywords: action, network unit, layer, input, function
- Classification: Feedback Control In General (AREA)
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Description
Technical Field
The present invention relates to a robot behavior learning model based on a utility difference network, and belongs to one of the new applications in the field of artificial intelligence.
Background Art
Intelligent robot behavior generally refers to the process in which a robot reasons and makes decisions on the basis of perceiving its surrounding environment, thereby reaching intelligent behavioral decisions. Building an intelligent behavior decision-making model requires acquiring, representing, and reasoning over knowledge, and the model must be able to evaluate the quality of the robot's behavior automatically. At present, cognitive behavior models based on reinforcement learning are the first choice for intelligent behavior modeling because of their advantages in knowledge acquisition, adaptability to the decision-making environment, and reusability.
The reinforcement learning process requires exploration of the environment. It can be described as follows: in a given state, the decision maker selects and executes an action, and then perceives the next environment state and the corresponding reward. The decision maker is not told directly which action to take in which situation; instead, it adjusts its behavior according to the rewards it receives in order to obtain more reward. Simply put, reinforcement learning allows the decision maker to find the best sequence of actions through repeated trials.
At present, behavioral decision-making in robot reinforcement learning mostly uses reactive approaches based on specific knowledge or rules. The disadvantages of such approaches are, first, that knowledge acquisition is limited; second, that the acquired knowledge is often empirical and new knowledge cannot be learned in time; and third, that the reasoning process has poor real-time performance.
Summary of the Invention
Aiming at the shortcomings of current behavioral decision-making in robot reinforcement learning, the present invention establishes a robot behavior learning model based on a utility difference network. The model is an evaluation-based learning system: through interaction with the environment it automatically generates the control law of the system, which is then used to select actions. The robot behavior learning model based on the utility difference network solves the problems of limited knowledge acquisition and excessive reliance on experience in general behavior decision-making models, and the offline learning process and online decision-making process it implements solve the problem of poor real-time performance in reasoning.
A robot behavior learning model based on a utility difference network comprises: a utility fitting network unit, a differential signal calculation network unit, a confidence evaluation network unit, an action decision network unit, an action correction network unit, and an action execution unit. The utility fitting network unit computes the utility fitting value of the state space vector st produced after the action at at time t is executed by the action execution unit, and outputs it to the differential signal calculation network unit. The differential signal calculation network unit computes the differential signal ΔTDt from the input utility fitting value and from the immediate reward function computed from the state space vector st, and outputs ΔTDt to the utility fitting network unit, the confidence evaluation network unit, and the action decision network unit. The utility fitting network unit uses the differential signal ΔTDt to update the weights of its neural network. The confidence evaluation network unit uses the input vector of the input layer and the output vector of the hidden layer of the neural network in the utility fitting network unit, together with the differential signal, to compute the confidence of the action decision result, and outputs this confidence to the action correction network unit. The action decision network unit performs action selection learning according to the input differential signal ΔTDt and the state space vector st, and outputs the action selection functions to the action correction network unit, where j and k are integers greater than 0. The action correction network unit uses the input confidence to correct the input action selection functions, computes the selection probability of each corrected action, and outputs the action with the highest probability to the action execution unit for execution; the state space vector obtained after this action is executed is fed back to the utility fitting network unit, the differential signal calculation network unit, and the action decision network unit.
The learning model has two processes: an offline learning process and an online decision-making process. All of the above units participate in the offline learning process, whereas the online decision-making process involves only the action decision network unit finally obtained from offline learning and the action execution unit. In the online decision-making process, the action decision network unit computes the output action selection functions from the state space vector st obtained after the action executed at time t, outputs the finally selected action through an action selector to the action execution unit for execution, and the state space vector obtained after the action is executed is fed back to the action decision network unit.
The advantages and beneficial effects of the present invention are as follows:
(1) The robot learning model of the present invention does not need to compute correct actions directly; instead, it overcomes the difficulty of robot knowledge acquisition through a learning loop of action, environment interaction, and evaluation. Because the learning model does not require an explicit environment model, the causal structure of the environment is captured implicitly in the differential feedback network, which better ensures the completeness of the environmental knowledge acquired by the robot;
(2) The offline learning process designed in this model completes the learning of environmental knowledge before the robot makes decisions, and the online decision-making process further completes the robot's acquisition of environmental knowledge. Decision-making at run time no longer involves exploration or learning activities; it only requires computation and summation with the reconstructed network. This offline/online design ensures that the robot's behavioral decision-making has good real-time performance and better guarantees the timeliness and effectiveness of the robot's behavioral decisions.
Brief Description of the Drawings
Fig. 1 is a schematic structural diagram of the offline learning process in the first embodiment of the learning model of the present invention;
Fig. 2 is a schematic flow diagram of the action decision network in the first embodiment of the learning model of the present invention;
Fig. 3 is a schematic diagram of the coding structure used by the genetic operators in the action decision network in the first embodiment of the learning model of the present invention;
Fig. 4 is a schematic diagram of the crossover operation of the genetic operators in the action decision network in the first embodiment of the learning model of the present invention;
Fig. 5 is a schematic diagram of the online decision-making process in the second embodiment of the learning model of the present invention.
Detailed Description of the Embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. The first embodiment describes the offline learning process of the learning model in detail; the second embodiment describes the online decision-making process.
As shown in Fig. 1, the learning model of the present invention includes five parts: a utility fitting network unit 11, a differential signal calculation network unit 12, a confidence evaluation network unit 13, an action decision network unit 14, and an action correction network unit 15. All five parts participate in the offline learning process of the learning model.
The utility fitting network unit 11 computes the utility fitting value for each state space vector st produced after the action at selected at time t is executed by the action execution unit 16, and outputs the utility fitting value to the differential signal calculation network unit 12; the differential signal calculation network unit 12 outputs the differential signal ΔTDt to the confidence evaluation network unit 13 and the utility fitting network unit 11. The utility fitting network unit 11 then uses the differential signal ΔTDt supplied by the differential signal calculation network unit 12 to update itself continuously, so that the fitted utility approaches the true utility.
The differential signal calculation network unit 12 computes the differential signal ΔTDt from the input utility fitting value and from the immediate reward function computed from the state space vector st, and outputs ΔTDt to the utility fitting network unit 11, the confidence evaluation network unit 13, and the action decision network unit 14.
The confidence evaluation network unit 13 uses the input vector of the input layer and the output vector of the hidden layer of the neural network in the utility fitting network unit 11, together with the differential signal ΔTDt, to compute the confidence of the action decision result, and outputs this confidence to the action correction network unit 15 for adjusting the action selection.
The action decision network unit 14 uses a hierarchical genetic algorithm to optimize its neural network according to the input differential signal ΔTDt and the state space vector st, thereby realizing action selection learning, and outputs the action selection functions to the action correction network unit 15, where j and k are integers greater than 0.
The action correction network unit 15 uses the input confidence to correct the input action selection functions and outputs the action with the highest probability. The state space vector obtained after the action is executed is fed back to the utility fitting network unit 11, the differential signal calculation network unit 12, and the action decision network unit 14.
The utility fitting network unit 11 evaluates the utility of the state change caused by a particular action and produces the utility fitting value. It is built from a two-layer feedback neural network, as shown in Fig. 1. The input of the neural network is the state space vector st, the hidden-layer activation function is the Sigmoid function, the network output is the utility fitting value of the state after the action is executed, and the weight coefficients of the network are A, B, and C. The network contains n input units and h hidden units; each hidden unit receives the n inputs and has n connection weights, and the output unit receives n+h inputs and has n+h weights. The value of h can be set by the user; it is usually set to 3 and is set to 2 in this embodiment of the present invention.
The input vector of the neural network is xi(t), i = 1, 2, ..., n, where xi(t) is obtained by normalizing st. With g(·) denoting the Sigmoid function and aij(t) the input-to-hidden weights (the elements of A), the output of the j-th hidden unit is

yj(t) = g( Σi aij(t)·xi(t) ), j = 1, 2, ..., h.

The output of the utility fitting network 11 is the utility fitting value, written here as P̂(st); it is a linear combination of the input-layer and hidden-layer outputs:

P̂(st) = Σi bi(t)·xi(t) + Σj cj(t)·yj(t),

where bi(t) denotes the weights B between the input layer and the output layer, and cj(t) denotes the weights C between the hidden layer and the output layer.
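As an illustration, the forward pass just described can be sketched in Python/NumPy as follows. The array names follow the weight labels A, B, and C in the text; the concrete dimensions and the normalization of st are not specified in the description and are therefore assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def utility_forward(x, A, B, C):
    """Forward pass of the two-layer utility fitting network.

    x : normalized state vector x(t), shape (n,)
    A : input-to-hidden weights a_ij, shape (n, h)
    B : input-to-output weights b_i, shape (n,)
    C : hidden-to-output weights c_j, shape (h,)
    Returns the fitted utility P_hat and the hidden-layer output y,
    which is reused by the weight updates and the confidence network.
    """
    y = sigmoid(x @ A)          # hidden layer with Sigmoid activation
    P_hat = x @ B + y @ C       # linear combination of input- and hidden-layer outputs
    return P_hat, y
```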
The weights A, B, and C of the network are updated with the differential signal ΔTDt. If ΔTDt is positive, the previous action produced a positive effect, and the chance of that action being selected should therefore be reinforced. The weights B between the input layer and the output layer and the weights C between the hidden layer and the output layer are updated as follows:
bi(t+1) = bi(t) + λ·ΔTDt+1·xi(t), i = 1, 2, ..., n
cj(t+1) = cj(t) + λ·ΔTDt+1·yj(t), j = 1, 2, ..., h
where λ is a constant greater than zero that can be set by the user. The weights A between the input layer and the hidden layer are updated as follows:
aij(t+1) = aij(t) + λh·ΔTDt+1·yj(t)·sgn(cj(t))·xi(t)
where λh is a number greater than zero that can be set by the user, ΔTDt+1 denotes the differential signal corresponding to the state space vector produced after the action executed at time t+1, and sgn(·) is the sign function.
As shown in Fig. 1, the differential signal calculation network unit 12 computes the differential signal ΔTDt from the fitted utility output by the utility fitting network unit 11 and the immediate reward function R(st) of the state. According to the temporal-difference algorithm, ΔTDt is computed iteratively as

ΔTDt = R(st) + γ·P̂(st+1) - P̂(st),

where R(st) is the immediate evaluation of the state st, i.e., the output of the immediate reward function, and γ is the discount coefficient, which can be set by the user. P̂(st+1) denotes the utility fitting value of the state space vector st+1 produced after the action executed at time t+1, and P̂(st) denotes the utility fitting value of the state space vector st produced after the action executed at time t.
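A minimal sketch of the differential signal and the resulting weight updates, continuing the previous sketch. The discount γ and the learning rates λ and λh are user-set constants; the values below are placeholders.

```python
import numpy as np

def td_error(reward, P_hat_next, P_hat, gamma=0.9):
    """Differential signal: dTD_t = R(s_t) + gamma * P_hat(s_{t+1}) - P_hat(s_t)."""
    return reward + gamma * P_hat_next - P_hat

def update_critic(A, B, C, x, y, dtd, lam=0.05, lam_h=0.05):
    """Weight updates of the utility fitting network driven by the differential signal dtd."""
    A_new = A + lam_h * dtd * np.outer(x, y * np.sign(C))   # a_ij update uses sgn(c_j(t))
    B_new = B + lam * dtd * x                                # b_i update
    C_new = C + lam * dtd * y                                # c_j update
    return A_new, B_new, C_new
```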
The computed differential signal ΔTDt is used to train and update the weight coefficients of the utility fitting network unit 11 and the confidence evaluation network unit 13. If the differential signal ΔTDt indicates a positive effect, the action should be reinforced and its confidence should be increased as well, i.e., the model should be more certain that this action should be selected. In addition, ΔTDt is used to update the weights of the action selection functions in the action decision network unit 14, so as to guarantee that the optimal action is selected.
As shown in Fig. 1, when the action decision network unit 14 outputs the action decision functions, the confidence evaluation network unit 13 computes the confidence of the output action; this confidence is used to adjust the action selection. The inputs of the confidence evaluation network unit 13 are the vectors xi(t) and yj(t), which are taken from the input layer and the hidden layer of the utility fitting network unit 11.
The confidence p0(t) is computed as

p0(t) = Σi αi(t)·xi(t) + Σj βj(t)·yj(t),
where the weights αi(t) and βj(t) are updated as follows:
αi(t+1) = αi(t) + λp·ΔTDt+1·xi(t), i = 1, 2, ..., n
βj(t+1) = βj(t) + λp·ΔTDt+1·yj(t), j = 1, 2, ..., h
where λp is the learning rate, a value between 0 and 1; its empirical value is 0.618, and the user may set it according to experience. The formula above does not guarantee that p0(t) lies in the interval [0, 1], so a Sigmoid function is introduced to transform p0(t) into p(t), so that the output confidence is consistent with a random-function probability.
The confidence correction factor a smooths the learning process. Changing a changes the range over which learning adjusts to the environment; if a is too large, the learning system loses its adjusting effect, so a suitable value of a should be set from prior knowledge, with a > 0. In the present invention, a is taken in the range [1, 10].
The adjustment of action selection by the confidence reflects the uncertainty of the decision. It can be seen that as the utility of the state gradually approaches its true value, i.e., as ΔTDt grows, the confidence p(t) also increases and the action selection becomes more and more certain. The output confidence p(t) is then used to correct each action selection function output by the action decision network unit 14; this correction is performed in the action correction network unit 15.
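A sketch of the confidence evaluation unit under the same conventions. The linear form of p0(t) and the update rules for αi and βj follow the description; the exact Sigmoid transform involving the correction factor a is not reproduced in the text, so the scaling 1/(1 + exp(-p0(t)/a)) used here is an assumption.

```python
import numpy as np

def confidence(x, y, alpha, beta, a=2.0):
    """Confidence p(t) of the current action decision.

    x, y        : input-layer and hidden-layer vectors taken from the utility fitting network
    alpha, beta : confidence weights alpha_i(t) and beta_j(t)
    a           : confidence correction factor (assumed to scale the Sigmoid)
    """
    p0 = x @ alpha + y @ beta               # p0(t): linear combination of x and y
    return 1.0 / (1.0 + np.exp(-p0 / a))    # squash into [0, 1]

def update_confidence_weights(alpha, beta, x, y, dtd, lam_p=0.618):
    """alpha_i and beta_j updates driven by the differential signal dtd."""
    return alpha + lam_p * dtd * x, beta + lam_p * dtd * y
```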
The action decision network unit 14 is implemented as a neural network with four layers, as shown in Fig. 1. From the first to the fourth layer these are: the input layer, the fuzzy subset layer, the variable node layer, and the function output layer; the variable node layer is also called the function fitting layer. Let h = 1, 2, 3, 4 index the four layers, and let the input and output of the i-th node of layer h be denoted Ii^h and Oi^h, where i indexes the nodes of each layer. The first layer has I nodes, the second layer has I*J nodes, the third layer has L nodes, and the fourth layer has K nodes, where I, J, K, and L are positive integers. The mean mij and the variance σij are, respectively, the position parameter and the width of the Gaussian membership function of the j-th second-layer node corresponding to the input xi(t).
The input of the input layer of the neural network in the action decision network unit 14 is xi(t), obtained by normalizing the state space vector st; it represents the robot's situation information at the input time. The input of the i-th node of the input layer is Ii^1 = xi(t).
The fuzzy subset layer fuzzifies the input variables of the input layer, and its output is the membership degree of each input vector. Each xi(t) of the input layer corresponds to J nodes in the fuzzy subset layer (J = 2 in Fig. 1); each of these nodes represents one fuzzy subset of xi(t), and its output is the membership degree of xi(t) in that fuzzy subset. The activation function of each node is a Gaussian membership function, so the output is

Oij^2 = exp( -(xi(t) - mij)² / σij² ),

where Oij^2 is the j-th second-layer output corresponding to the input xi(t), and exp is the exponential function with the natural base e.
To fit the action functions, the neural network must be able to adjust its output to a certain extent, and the variable node layer implements this adjustment. The variable node layer realizes it by changing the number of nodes and the connection weights; the number of nodes and the connection weights are optimized with a hierarchical genetic algorithm, which dynamically determines their number and values so that the network fits the action functions (details are given later). The activation function of the variable node layer is a Gaussian function with position parameter ml and width σl. The number of connections between the second and third layers is likewise not fixed and must be adjusted dynamically during optimization; these connection weights are all 1. The output Ol^3 of a third-layer node is obtained by applying its Gaussian activation function to the inputs it receives from the second layer.
The number of fourth-layer nodes equals the number of selectable actions, and the function output layer outputs the fitted value of each action function, from which the selection probability of each action is computed. Every third-layer node is connected to the fourth layer; ωlk is the connection weight between the l-th node of the third layer and the k-th node of the fourth layer, and ωlk also needs to be adjusted dynamically during optimization. The output of the k-th fourth-layer node is

Ok^4 = Σl ωlk·Ol^3,

and this output Ok^4 is the action selection function of the k-th action.
Suppose the first layer of the network has I inputs and the i-th input has ki fuzzy partitions in the second layer; then the second layer has k1 + k2 + ... + kI nodes in total, and each node function is the membership function of one input with respect to one of its fuzzy subsets. To summarize, the network structure that must be adjusted and optimized dynamically consists of the number of third-layer nodes and the number of connections between the second and third layers. The network parameters that must be adjusted and optimized are the positions mij and widths σij of the second-layer input membership functions, the position parameters ml and widths σl of the Gaussian activation functions of the third (hidden) layer, and the connection weights ωlk between the third and fourth layers.
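The forward pass through the four-layer decision network can be sketched as follows. The Gaussian membership functions, the unit weights between the second and third layers, and the third-to-fourth-layer weights ωlk follow the description; how a third-layer node aggregates its connected second-layer outputs (a sum, below) and all concrete dimensions are assumptions.

```python
import numpy as np

def action_network_forward(x, m2, s2, conn23, m3, s3, w34):
    """Forward pass of the four-layer fuzzy action decision network.

    x      : normalized state vector, shape (I,)
    m2, s2 : second-layer membership positions m_ij and widths sigma_ij, shape (I, J)
    conn23 : 0/1 connections between second- and third-layer nodes, shape (I*J, L)
    m3, s3 : third-layer Gaussian positions m_l and widths sigma_l, shape (L,)
    w34    : third-to-fourth-layer weights omega_lk, shape (L, K)
    Returns the K action selection function values O^4_k.
    """
    # Layer 2: Gaussian membership of each input in each of its fuzzy subsets
    o2 = np.exp(-((x[:, None] - m2) ** 2) / (s2 ** 2)).reshape(-1)
    # Layer 3: each variable node aggregates its connected memberships (sum assumed),
    # then applies its own Gaussian activation
    z3 = o2 @ conn23
    o3 = np.exp(-((z3 - m3) ** 2) / (s3 ** 2))
    # Layer 4: O^4_k = sum_l omega_lk * O^3_l
    return o3 @ w34
```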
Here, a hybrid hierarchical genetic algorithm is used to optimize the structure and parameters of the neural network in the action decision network. The structural optimization determines the number of third-layer nodes and the number of connections between the second and third layers. The parameter optimization covers the position parameters mij and widths σij of the input membership functions, the position parameters ml and widths σl of the Gaussian functions of the third-layer hidden nodes, and the connection weights ωlk between the third and fourth layers. The hierarchical genetic algorithm optimizes and adjusts the neural network so that, in every round of decision-making, the network is continually optimized according to changes in the input differential signal to yield the action selection functions, thereby realizing action selection.
The action correction network unit 15 uses the evaluation value output by the confidence evaluation network unit 13, i.e., the action confidence p(t), to correct the action selection functions output by the action decision network unit 14, then computes the selection probability of each action and outputs the action with the highest probability.
The correction process generates a random function with the network's action selection output as its mean and p(t) as its probability, and takes it as the new action selection function Aj(st). The smaller p(t) is, the farther Aj(st) lies from the network output; conversely, the larger p(t) is, the closer it lies. The new Aj(st) replaces the original action selection function.
The larger the value of the action selection function Aj(st), the greater the probability that the corresponding action aj is selected. The selection probabilities are computed from the values of the action selection functions, and the action with the highest probability value is output.
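A sketch of the correction and selection step. The text specifies that the corrected function Aj(st) is drawn around the network output with a spread governed by p(t) and that the most probable action is output; the Gaussian noise model and the softmax normalization used to turn the corrected values into probabilities are assumptions.

```python
import numpy as np

def correct_and_select(a_hat, p, rng=None):
    """Correct the action selection values with confidence p and pick an action.

    a_hat : action selection function values from the decision network, shape (K,)
    p     : confidence p(t) in [0, 1]; small p pushes A_j(s_t) further from a_hat
    """
    rng = rng or np.random.default_rng()
    # Assumed noise model: the spread shrinks as the confidence grows
    a_corrected = rng.normal(loc=a_hat, scale=(1.0 - p) * (np.abs(a_hat) + 1e-6))
    # Assumed normalization: softmax turns the corrected values into selection probabilities
    probs = np.exp(a_corrected - a_corrected.max())
    probs /= probs.sum()
    return int(np.argmax(probs)), probs   # index of the most probable action, plus all probabilities
```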
In the robot behavior learning model, the action decision network unit 14 further comprises four sub-units: an encoding unit 141, a population initialization unit 142, a fitness function determination unit 143, and a genetic operation unit 144, as shown in Fig. 2.
The encoding unit 141 determines the chromosome structure used by the genetic algorithm. The hierarchical genetic algorithm is modeled on the hierarchical structure of biological chromosomes: the genes in a chromosome can be divided into regulatory genes and structural genes, and the role of a regulatory gene is to control whether structural genes are activated. Borrowing this characteristic of biological chromosomes, the optimization problem above is encoded as follows. Each individual in the population consists of two parts, which determine the structure and the parameters of the network. The gene structure of an individual uses a two-level hierarchical encoding, i.e., two layers that follow the gene hierarchy of biological chromosomes. The upper-layer genes encode the number of third-layer nodes and the second-layer input membership functions, that is, the number of third-layer nodes together with the parameters mij and σij of the second-layer input membership functions. As shown in Fig. 3, the part that controls the number of third-layer (hidden-layer) nodes is called the control genes. The lower layer consists of the parameter genes, which encode the membership functions of the third-layer (hidden-layer) nodes and the network connections, including the third-layer node membership-function parameters ml and σl, the connections between the second and third layers, and the connection weights ωlk between the third and fourth layers.
The control genes for the number of hidden nodes and the parameter genes that represent network connections are binary-coded, with "0" and "1" denoting "absent" and "present", respectively. The other genes, which represent membership-function parameters and connection weights, are real-valued, i.e., represented by real numbers. The third-layer structure is encoded as a binary string in which each bit represents one third-layer node and acts as a control gene: "1" means the node is active and "0" means it is not. The number of "1"s in the control gene string is therefore the actual number of active hidden-layer nodes of the neural network. Among the parameter genes, the genes for the connections between the second and third layers are binary-coded: "1" means the corresponding second-layer and third-layer nodes are connected, and "0" means they are not. The weight genes between the third and fourth layers are real-valued and represent the connection weights between the third and fourth layers.
It follows that the control genes control the number of nodes: if the control gene of a node is "0", the node has no connection to the preceding or following layer, and its corresponding parameter genes accordingly do not exist. The parameter genes are thus controlled by the control genes: if a node in the upper-layer control genes does not exist, the corresponding lower-layer parameter genes are not activated. This embodies the controlling role of the control genes, and this control corresponds to the topology of the network. The encoded chromosomes form a population, which is used to carry out the evolution.
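One possible in-code representation of the two-level chromosome described above. The field names and the array layout are illustrative; what is taken from the text is the division into binary control genes (one bit per candidate third-layer node), binary second-to-third-layer connection genes, and real-valued parameter genes for mij, σij, ml, σl, and ωlk.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Chromosome:
    control: np.ndarray   # binary control genes, one bit per candidate third-layer node ("1" = active)
    conn23: np.ndarray    # binary connection genes between the second and third layers
    m2: np.ndarray        # real-valued membership positions m_ij
    s2: np.ndarray        # real-valued membership widths sigma_ij
    m3: np.ndarray        # real-valued third-layer Gaussian positions m_l
    s3: np.ndarray        # real-valued third-layer Gaussian widths sigma_l
    w34: np.ndarray       # real-valued third-to-fourth-layer weights omega_lk

    def active_nodes(self) -> int:
        # the number of "1"s in the control genes is the actual hidden-node count
        return int(self.control.sum())

def random_chromosome(I, J, L_max, K, rng):
    """Random individual, as produced by the population initialization unit 142."""
    return Chromosome(
        control=rng.integers(0, 2, L_max),
        conn23=rng.integers(0, 2, (I * J, L_max)),
        m2=rng.normal(size=(I, J)), s2=rng.random((I, J)) + 0.1,
        m3=rng.normal(size=L_max), s3=rng.random(L_max) + 0.1,
        w34=rng.normal(size=(L_max, K)),
    )
```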
Further, the population initialization unit 142 initializes the chromosome population. For the genetic algorithm to run smoothly, a certain number of chromosome individuals must be generated beforehand; these individuals should be generated randomly and represent many possible network structures, i.e., the solution space should be sufficiently large. A suitable population size is important for the convergence of the genetic algorithm: if the population is too small it is difficult to obtain satisfactory results, and if it is too large the computation becomes complex. The population size is generally 10 to 160.
Further, the fitness function determination unit 143 determines the fitness function of the chromosomes. The fitness of an individual is expressed in terms of the individual error and the structural complexity, so that the complexity of the network is controlled while the individual error is optimized, thereby obtaining the optimal network structure. The fitness of the i-th individual is formed from its individual error E(i) and its structural complexity H(i), weighted by constants α and β that are greater than zero and satisfy α + β = 1, where

H(i) = 1 + exp[ -c·Ni(0) ].

E(i) measures the error between the j-th outputs of the i-th individual and the corresponding expected outputs yij; the expected output yij is the selection function of the expected action, so if a particular action is expected to be output, its expected value is set accordingly and all other expected action functions are set to 0. Ni(0) is the number of hidden-layer nodes of the i-th individual that are zero, c is a parameter adjustment factor, and b and c are constants. Using such a fitness function ensures that a suitable neural network structure is obtained while the network weights are optimized.
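A sketch of a fitness evaluation consistent with the description above: the individual error E(i) is taken here as the squared error between the outputs and the expected outputs, H(i) = 1 + exp[-c·Ni(0)], and the two terms are combined with weights α and β. The exact combination formula is not reproduced in the text, so the reciprocal form used below (higher fitness for lower error and lower complexity) is an assumption, as are the constant values.

```python
import numpy as np

def fitness(outputs, targets, n_zero_hidden, alpha=0.7, beta=0.3, b=1.0, c=0.1):
    """Fitness of one individual from its error and its structural complexity.

    outputs, targets : the individual's outputs and the expected outputs y_ij
    n_zero_hidden    : N_i(0), the number of inactive (zero) hidden nodes
    """
    E = np.sum((np.asarray(outputs) - np.asarray(targets)) ** 2)  # individual error (assumed squared error)
    H = 1.0 + np.exp(-c * n_zero_hidden)                          # structural complexity H(i)
    # Assumed combination: reward low error and low complexity, weighted by alpha and beta
    return alpha / (b + E) + beta / H
```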
Further, the genetic operation unit 144 performs the genetic operations, which include selection, crossover, and mutation. After selection, crossover, and mutation, the initial population has undergone one round of genetic operations and completed one round of evolution, yielding a new generation of offspring; this process is repeated so that evolution continues and the offspring converge to the optimum.
Selection chooses, according to the fitness of individuals and certain rules or methods, superior individuals from the previous generation to be inherited by the next generation. The algorithm uses elitist selection: according to fitness, the best individual of each generation is retained in the next generation, which guarantees the asymptotic convergence of the algorithm. For individual i, the selection probability is

pi = fi / Σj fj,

where fi is the fitness of individual i and N is the number of individuals in the population (the sum runs over j = 1, ..., N).
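A short sketch of the selection step: the best individual is always retained (elitism), and the remaining parents are drawn with probability pi = fi / Σj fj.

```python
import numpy as np

def select_parents(population, fitnesses, n_parents, rng):
    """Elitist, fitness-proportional selection of n_parents individuals."""
    fitnesses = np.asarray(fitnesses, dtype=float)
    probs = fitnesses / fitnesses.sum()                    # p_i = f_i / sum_j f_j
    best = int(np.argmax(fitnesses))                       # the elite individual is always kept
    rest = rng.choice(len(population), size=n_parents - 1, p=probs)
    return [population[best]] + [population[int(i)] for i in rest]
```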
Crossover randomly exchanges corresponding gene positions between two individuals. This process reflects random information exchange and is intended to produce new gene combinations, i.e., new individuals. When evolution has progressed to a certain point, and especially when most individuals in the population have become identical, crossover can no longer produce new individuals and only mutation can. Mutation changes gene positions with a certain probability in order to open up new search space; that is, mutation adds a global-optimization quality. Randomness plays an important role in crossover and mutation: only random crossover and mutation operations guarantee that new individuals appear, and this randomness is expressed through the crossover and mutation probabilities.
During the genetic operations, the crossover probability and the mutation probability have a great influence on the performance of the genetic algorithm. If, in the early stage of the genetic algorithm (GA) run, a large crossover probability and a small mutation probability are chosen, the convergence of the algorithm can be accelerated, which helps the search for the optimal solution. As the search progresses, however, the crossover probability must be reduced and the mutation probability increased, so that the algorithm does not easily fall into local extrema and can search for new solutions.
At the same time, the mutation probability must not be too large, otherwise the algorithm will have difficulty converging and will destroy the genes of the optimal solution. For solutions with high fitness, lower crossover and mutation probabilities are used so that they have a greater chance of entering the next generation; for solutions with low fitness, higher crossover and mutation probabilities are used so that they are eliminated as soon as possible. When premature convergence occurs, the crossover and mutation probabilities should be increased to accelerate the generation of new individuals. Following these principles, an adaptive scheme for the crossover probability pc and the mutation probability pm is adopted, in which pc and pm are computed from fmax, the maximum fitness in the population, favg, the average fitness, f, the larger fitness of the two individuals being crossed, and f′, the fitness of the individual being mutated.
When the evolution space is large, this method can quickly find the optimal solution; near convergence to a local optimum, it increases the diversity of the population. It can be seen that the mutation probability of the fittest individual is zero and that the crossover and mutation probabilities of individuals with high fitness are small, which protects good individuals, while individuals with low fitness have large crossover and mutation probabilities and are continually disrupted.
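The adaptive formulas themselves are not reproduced in the text, so the sketch below uses a widely used adaptive scheme that matches the behaviour just described (zero mutation probability for the fittest individual, small probabilities for above-average individuals, larger fixed probabilities for below-average ones); the constants k1 to k4 are assumptions.

```python
def adaptive_probs(f_cross, f_mut, f_max, f_avg, k1=0.9, k2=0.5, k3=0.9, k4=0.5):
    """Adaptive crossover probability pc and mutation probability pm.

    f_cross : f, the larger fitness of the two individuals selected for crossover
    f_mut   : f', the fitness of the individual selected for mutation
    """
    denom = max(f_max - f_avg, 1e-12)
    # Above-average individuals get probabilities that shrink as they approach f_max;
    # below-average individuals get fixed, larger probabilities
    pc = k1 * (f_max - f_cross) / denom if f_cross >= f_avg else k3
    pm = k2 * (f_max - f_mut) / denom if f_mut >= f_avg else k4
    return pc, pm
```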
Crossover is performed between the two selected individuals according to the crossover probability; the crossover operation acts on the corresponding parts of both the control genes and the parameter genes, as shown in Fig. 4. Such a crossover lets the corresponding genes of the two chromosomes be exchanged and also guarantees that binary-coded and real-coded genes are crossed at corresponding positions. Single-point crossover is used at corresponding positions of the two chromosomes: the same position is chosen at random in the two individuals, and the genes are exchanged at that position.
Mutation operates on all genes. For the control genes and the binary-coded genes among the parameter genes, bit mutation with logical negation is used, i.e., "1" becomes "0" and "0" becomes "1". For real-valued genes, a Gaussian mutation based on a linear combination is used, where α is the evolution rate, f is the fitness of each individual, and N(0, 1) is a normally distributed random function with mean 0 and standard deviation 1.
In summary, the steps of the hierarchical genetic algorithm for optimizing the neural network are as follows:
1. Encode the network structure and parameters according to the hierarchical structure and generate chromosome individuals.
2. Randomly generate an initial population of 2N chromosomes and set the evolution generation to t = 0.
3. Compute the fitness value of each individual and the maximum and average fitness of the population according to the formulas.
4. Select N individuals from the population as parents according to the individual selection probabilities, and set t = t + 1.
5. Randomly select two individuals from the parents and perform crossover according to the crossover probability. If crossover takes place, first copy the two individuals and keep the originals, then perform the crossover on the copies to produce two new individuals. Repeat until the whole parent population has been crossed.
6. Perform mutation on all individuals according to the mutation probability.
7. When the fitness of the best individual and the population fitness reach given thresholds, or the maximum number of generations is reached, the iteration of the algorithm has converged and the algorithm ends; otherwise return to step 3 and continue until the termination condition is met.
After the optimization, the network structure and parameters of the best individual are taken as the decision network and used to compute the action decisions.
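Putting the seven steps together, the optimization loop can be sketched as below. The operators are passed in as callables because their concrete form depends on the chromosome layout; the population bookkeeping (parents plus their offspring form the next generation) is an assumption about details the text leaves open.

```python
import numpy as np

def hga_optimize(init_individual, evaluate, crossover, mutate,
                 n=20, n_generations=100, fitness_threshold=None, seed=0):
    """Hierarchical-GA optimization loop following steps 1-7.

    init_individual(rng) -> individual      (steps 1-2: encoded random chromosome)
    evaluate(individual) -> fitness value   (step 3)
    crossover(a, b, rng) -> (child1, child2)
    mutate(individual, rng) -> individual
    """
    rng = np.random.default_rng(seed)
    population = [init_individual(rng) for _ in range(2 * n)]         # step 2: 2N individuals
    for _ in range(n_generations):
        fit = np.array([evaluate(ind) for ind in population])         # step 3
        if fitness_threshold is not None and fit.max() >= fitness_threshold:
            break                                                      # step 7: converged
        probs = fit / fit.sum()
        idx = rng.choice(len(population), size=n, p=probs)             # step 4: select N parents
        parents = [population[int(i)] for i in idx]
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):                 # step 5: crossover on copies
            offspring.extend(crossover(a, b, rng))
        pool = parents + offspring
        population = [mutate(ind, rng) for ind in pool]                # step 6: mutate all individuals
    return max(population, key=evaluate)                               # best network structure/parameters
```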
In the action decision network unit 14, the hierarchical genetic algorithm is used to optimize the structure and parameters of the network. After each new situation appears, the differential signal ΔTDt provided by the temporal-difference (TD) algorithm is first used to update the parameters of the action decision network, with the aim of obtaining more favorable candidate actions. Specifically, the differential signal ΔTDt is used to update, in every parameter gene of every chromosome in the population, the connection weights ωij between the i-th hidden node of the third layer and the j-th action selection function of the fourth layer, scaled by a weighting coefficient between 0 and 1 whose empirical value is 0.62; the genetic operations are then carried out. In this way the weight space corresponding to the action function is updated, and the new weights of the corresponding action obtained after the genetic operations should also be larger, reflecting the learning of this optimal action.
This embodiment trains the neural network with a hierarchical genetic algorithm to realize knowledge learning. It overcomes the reactive approaches based on specific knowledge or rules that dominate existing research on behavioral decision-making, and better solves the knowledge acquisition and reasoning problems of robot behavioral decision-making: the agent approaches completeness of knowledge by learning through interaction with the environment and has higher-level learning and reasoning abilities.
Fig. 5 is a schematic diagram of the online decision-making process in the second embodiment of the learning model of the present invention. After offline learning, the action decision network unit 14 finally obtained is optimal and is used for real-time online decision-making. The other units, namely the utility fitting network unit 11, the differential signal calculation network unit 12, the confidence evaluation network unit 13, and the action correction network unit 15, are removed in the online decision-making process and are no longer used. The action decision network unit 14 computes the output action selection functions from the state space vector st obtained after the selected action at is executed by the action execution unit 16, and outputs the finally selected action through the action selector; the state space vector obtained after this action is executed by the action execution unit 16 is then fed back to the action decision network unit 14.
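In the online phase only the optimized decision network remains, so each decision reduces to a single forward pass of that network followed by the action selector. A minimal sketch, assuming the selector simply picks the action with the largest selection-function value; the `decision_network` callable stands for the network obtained from offline learning.

```python
import numpy as np

def online_decide(state, decision_network):
    """One online decision step of the second embodiment.

    state            : normalized state space vector s_t
    decision_network : callable mapping a state vector to the K action selection values
    """
    action_values = decision_network(np.asarray(state, dtype=float))
    return int(np.argmax(action_values))   # index of the finally selected action
```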
This embodiment uses the trained neural network for real-time behavioral decision-making of the robot. The separation of the learning process from the decision-making process guarantees the efficiency of online decision-making and meets the needs of real-time operation.
Claims (1)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN 201010564142 CN102063640B (en) | 2010-11-29 | 2010-11-29 | Robot Behavior Learning Model Based on Utility Difference Network |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN102063640A CN102063640A (en) | 2011-05-18 |
| CN102063640B true CN102063640B (en) | 2013-01-30 |
Family
ID=43998910
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN 201010564142 Expired - Fee Related CN102063640B (en) | 2010-11-29 | 2010-11-29 | Robot Behavior Learning Model Based on Utility Difference Network |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN102063640B (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102402712B (en) * | 2011-08-31 | 2014-03-05 | 山东大学 | A Neural Network-Based Initialization Method for Robot Reinforcement Learning |
| CN107972026B (en) * | 2016-10-25 | 2021-05-04 | 河北亿超机械制造股份有限公司 | Robot, mechanical arm and control method and device thereof |
| CN108229640B (en) * | 2016-12-22 | 2021-08-20 | 山西翼天下智能科技有限公司 | Emotional expression method, device and robot |
| CN110705682B (en) * | 2019-09-30 | 2023-01-17 | 北京工业大学 | A system and method for predicting robot behavior based on multi-layer neural network |
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5129039A (en) * | 1988-09-17 | 1992-07-07 | Sony Corporation | Recurrent neural network with variable size intermediate layer |
| JP3412700B2 (en) * | 1993-06-28 | 2003-06-03 | 日本電信電話株式会社 | Neural network type pattern learning method and pattern processing device |
| CN1372506A (en) * | 2000-03-24 | 2002-10-02 | 索尼公司 | Robotic device behavior determination method and robotic device |
Non-Patent Citations (1)
| Title |
|---|
| JP Patent 3412700B2, 2003.03.28 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN102063640A (en) | 2011-05-18 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Khan | Neural fuzzy based intelligent systems and applications | |
| CN104133372B (en) | Room temperature control algolithm based on fuzzy neural network | |
| CN113687654A (en) | A neural network training method and path planning method based on evolutionary algorithm | |
| CN101599138A (en) | Land evaluation method based on artificial neural network | |
| CN111203887A (en) | Robot control system optimization method based on NSGA-II fuzzy logic reasoning | |
| Khan et al. | Neufuz: Neural network based fuzzy logic design algorithms | |
| CN108594793A (en) | A kind of improved RBF flight control systems fault diagnosis network training method | |
| CN102063640B (en) | Robot Behavior Learning Model Based on Utility Difference Network | |
| CN113780664A (en) | Time sequence prediction method based on TDT-SSA-BP | |
| CN113112021A (en) | Inference algorithm of human-like behavior decision model | |
| Ansari et al. | Parameter tuning of MLP, RBF, and ANFIS models using genetic algorithm in modeling and classification applications | |
| Lee et al. | A genetic algorithm based robust learning credit assignment cerebellar model articulation controller | |
| CN115586801B (en) | Gas blending concentration control method based on improved fuzzy neural network PID | |
| CN114912589B (en) | Image identification method based on full-connection neural network optimization | |
| CN113485099B (en) | Online learning control method of nonlinear discrete time system | |
| CN110598835B (en) | Automatic path-finding method for trolley based on Gaussian variation genetic algorithm optimization neural network | |
| CN114925190A (en) | Mixed inference method based on rule inference and GRU neural network inference | |
| CN105512754A (en) | Conjugate prior-based single-mode distribution estimation optimization method | |
| Grosan et al. | Hybrid intelligent systems | |
| Ponnambalam et al. | Regulation of Great lakes reservoir systems by a neuro-fuzzy optimization model | |
| Lin et al. | System identification based on dynamical training for recurrent interval type-2 fuzzy neural network | |
| Almeida et al. | Automatically searching near-optimal artificial neural networks. | |
| Wei et al. | The optimizing of fuzzy control rule based on particle swarm optimization algorithms | |
| Kalaiselvi et al. | Impact of intelligent controller in a multiprocess system using artificial neural network—BPN |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| C14 | Grant of patent or utility model | ||
| GR01 | Patent grant | ||
| C17 | Cessation of patent right | ||
| CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20130130 Termination date: 20131129 |