
CN113670306A - Unmanned vehicle navigation method based on deep reinforcement learning - Google Patents

Unmanned vehicle navigation method based on deep reinforcement learning

Info

Publication number
CN113670306A
CN113670306A
Authority
CN
China
Prior art keywords
value
unmanned vehicle
depth image
image
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010416877.4A
Other languages
Chinese (zh)
Inventor
卜祥津
许松枝
苗成生
修彩靖
钟国旗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Automobile Group Co Ltd
Original Assignee
Guangzhou Automobile Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Automobile Group Co Ltd filed Critical Guangzhou Automobile Group Co Ltd
Priority to CN202010416877.4A priority Critical patent/CN113670306A/en
Publication of CN113670306A publication Critical patent/CN113670306A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations
    • G01C21/206 Instruments for performing navigational calculations specially adapted for indoor navigation
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract


The invention provides a navigation method for an unmanned vehicle based on deep reinforcement learning. A depth image is acquired through a depth camera on the unmanned vehicle, sampled, and processed by bilinear interpolation to form a depth image matrix. Within this matrix, the relative position with respect to the starting point, computed from the vehicle's wheel-speed odometer, is embedded to form a second depth image matrix representing the state of the unmanned vehicle. The values of the second depth image matrix are compared one by one to find the minimum, which is compared against a set threshold: when the minimum is greater than the threshold, the motion of the unmanned vehicle is controlled kinematically; when it is smaller, the second depth image is input into a deep learning network and the next action is decided randomly or by the network. The invention makes network learning more efficient and the error converge to a smaller value, so that obstacle avoidance in unknown environments is better and map-collection efficiency is improved.


Description

Unmanned vehicle navigation method based on deep reinforcement learning
Technical Field
The invention relates to the technical field of unmanned vehicles, in particular to a navigation method of an unmanned vehicle based on deep reinforcement learning.
Background
Navigation of unmanned vehicles encompasses a variety of techniques for avoiding obstacles and reaching a target location. Navigating an unknown environment, however, is much harder than navigating a known one: the robot's motion relies heavily on the data collected from its sensors and on the efficiency of the algorithm that finds a good path, and the sensors on the mobile robot detect obstacles and map the environment during navigation toward the target location. When deep reinforcement learning is applied to the navigation problem, a successful training model depends heavily on the training-set information, so collecting training data inevitably takes a great deal of time; how to reduce training time effectively has therefore become a research focus in the field.
Existing inventions that take vision as input have two obvious defects: 1. they are strongly affected by illumination, and image recognition degrades under varying lighting conditions; 2. compared with depth-image input, they adapt poorly to the environment and generalize poorly from the training environment to an unknown one, and they require longer training time.
Disclosure of Invention
The embodiment of the invention provides an unmanned vehicle navigation method based on deep reinforcement learning, aiming to solve the technical problems of existing methods: poor adaptability to the environment, poor generalization from the training environment to an unknown environment, and long training time.
In one aspect of the present invention, a method for navigating an unmanned vehicle based on deep reinforcement learning is provided, including:
Step S1, obtaining a depth image through an RGBD depth camera on the unmanned vehicle, sampling the acquired depth image to a resolution of 160 × 120, then applying bilinear interpolation to obtain a depth image of size 80 × 80 × 1, and forming a depth image matrix from all the 80 × 80 × 1 depth images;
Step S2, computing, in the depth image matrix, the relative position with respect to the starting point from the unmanned vehicle's wheel-speed odometer, taking the x coordinate of this position as the first row of a second depth image and the y coordinate as its first column, and further assembling the second depth images into a second depth image matrix representing the state of the unmanned vehicle;
Step S3, comparing the values in the second depth image matrix one by one, computing the minimum value of the second depth pixel matrix with a quick-sort algorithm, and comparing the minimum with a set threshold; when the minimum is greater than the threshold, controlling the motion of the unmanned vehicle kinematically; when it is smaller, inputting the second depth image into a deep learning network, constructing a Markov state space, deciding the next action randomly or by the deep learning network, and comparing the minimum with the threshold again until the minimum is greater than the set threshold.
Further, in step S1, the specific process of sampling the acquired depth image to a resolution of 160 × 120 and then applying bilinear interpolation to obtain a depth image of size 80 × 80 × 1 is as follows: smooth the image with a Gaussian pyramid algorithm while retaining all boundary features of the image, obtain a 160 × 120 image by stepwise down-sampling, and then process the down-sampled 160 × 120 image with bilinear interpolation to obtain an 80 × 80 × 1 depth image.
Further, in the present invention,
the bilinear interpolation that converts the down-sampled 160 × 120 image into an 80 × 80 × 1 depth image proceeds by linearly interpolating the pixels along one direction of the image matrix according to the following formula, and then along the other direction:
f(x, y) ≈ f(0,0)(1 − x)(1 − y) + f(1,0)x(1 − y) + f(0,1)(1 − x)y + f(1,1)xy
where x is the coordinate coefficient of the pixel along the x axis of the image matrix, y is the coordinate coefficient along the y axis, and f(i, j) are the pixel values at the four neighbouring grid points.
Further, in step S3, the set threshold is adjusted according to the actual vehicle speed: when the turning radius of the unmanned vehicle becomes larger the threshold is increased, and when the turning radius becomes smaller it is decreased. If the threshold is set too large the training time becomes long; if it is set too small the vehicle may collide with obstacles.
Further, in the present invention,
the motion of the unmanned vehicle is controlled under kinematic constraints according to the following formulas:
v = K√(xg² + yg²)
where xg and yg are the coordinates of the target point in a Cartesian coordinate system, K is a first scale coefficient, and v is the speed of the unmanned vehicle;
ω = Kω(θg Θ θ)
where θg is the heading of the target point, θ is the current heading, Θ denotes the difference between the two angles of the target point and the current point, and Kω is a second scale coefficient.
Further, in the present invention,
in step S3, the deep learning network is a convolutional neural network comprising four convolutional layers and two fully connected layers, and the deep learning network applies gradient descent to the policy function πθ(s, a) according to the following formula:
dθ = ∇θ log πθ(s, a) · A(s)
where θ denotes the parameters of the neural network, πθ(s, a) is the policy function, and A(s) is the advantage function used to evaluate the policy-gradient update;
the deep learning network applies gradient descent to the evaluation function V(s, θv) according to the following formula:
dθv = ∂(R + γV(s′, θv) − V(s, θv))² / ∂θv
where R is the corresponding reward value, γ is the discount (greedy) coefficient, V is the state-value function, s′ is the next state, and v is the speed value of the unmanned vehicle.
Further, the reward/penalty value R includes a penalty for approaching an obstacle; the final value of a single episode is the sum of all penalty terms, which specifically comprise a collision penalty, a straight-driving or turning term, a term for driving toward the target point, a penalty for deviating from the target point, and a penalty for approaching an obstacle.
Further, in the present invention,
the straight-driving or turning term is calculated according to the following formula:
(0.1*v)/(|ω|+0.1)
where v is the speed of the unmanned vehicle and ω is its angular velocity;
the penalty for approaching an obstacle is calculated according to the following formula:
-1/(x-0.4)
where x is the minimum value within the second depth image matrix.
Further, in step S3, the Markov state space is composed of a plurality of arrays, and each array contains at least the current state data of the unmanned vehicle, its current action data, the corresponding reward value data, and its next state data.
Further, in step S1, the method also includes preprocessing the depth image to reduce the alternating bright and dark point (salt-and-pepper) noise in the image; the preprocessing includes at least median filtering, image cropping, and fast-marching inpainting.
In summary, the embodiments of the invention have the following beneficial effects:
In the navigation method for an unmanned vehicle based on deep reinforcement learning, the early-stage construction of the robot's state space is optimized by combining a kinematic constraint model. For the same training time, the state space constructed with the proposed training scheme is more reasonable and effective, so the network learns more efficiently, the error converges to a smaller value, and obstacle avoidance in unknown environments is better;
The method solves the problem of unmanned vehicle navigation in unknown environments through an end-to-end motion-decision navigation mode that dispenses with maps; at the same time, the invention can be used for map building in unknown environments, avoiding the labour of manually steering equipment to collect the map and improving map-collection efficiency.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort, and such drawings also fall within the scope of the present invention.
Fig. 1 is a schematic diagram of the motion decision model of the unmanned vehicle navigation method based on deep reinforcement learning provided by the invention.
Fig. 2 is a main flow diagram of the navigation method of the unmanned vehicle based on deep reinforcement learning according to the present invention.
Fig. 3 is a logic diagram of the navigation method of the unmanned vehicle based on deep reinforcement learning according to the present invention.
Fig. 4 is a reward and punishment rule chart of the unmanned vehicle navigation method based on deep reinforcement learning provided by the invention.
Fig. 5 is a schematic top view of a training environment of the deep reinforcement learning-based unmanned vehicle navigation method provided by the invention.
Fig. 6 is a schematic diagram of an error value curve of the navigation method of the unmanned vehicle based on deep reinforcement learning according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
As shown in Fig. 1, the navigation method for an unmanned vehicle based on deep reinforcement learning provided by the invention proposes a training scheme based on minimum depth-of-field information and optimizes the early-stage construction of the robot's state space by combining a kinematic constraint model, i.e., it reduces training time through artificial guidance. For the same training time, the state space constructed with the proposed scheme is more reasonable and effective, the network learns more efficiently, the error converges to a smaller value, and obstacle avoidance in unknown environments is better. The method also overcomes the limitation that the DQN algorithm can only make the robot output a finite set of actions, enabling the robot to output actions over continuous intervals of speed and steering angle.
Fig. 2 is a schematic diagram of an embodiment of a deep reinforcement learning-based unmanned vehicle navigation method according to the present invention. In this embodiment, the method comprises the steps of:
Step S1, obtaining a depth image through an RGBD depth camera on the unmanned vehicle, sampling the acquired depth image to a resolution of 160 × 120, then applying bilinear interpolation to obtain a depth image of size 80 × 80 × 1, and forming a depth image matrix from all the 80 × 80 × 1 depth images;
In a specific embodiment, the process of sampling the acquired depth image to a resolution of 160 × 120 and then applying bilinear interpolation to obtain an 80 × 80 × 1 depth image is as follows. The image is smoothed with a Gaussian pyramid algorithm while retaining all its boundary feature values, and a 160 × 120 image is obtained by stepwise down-sampling. The Gaussian pyramid is an existing algorithm commonly used for image down-sampling; it smooths the image while preserving its features well. Boundary feature values are feature points in the computer-vision sense: corners, abrupt texture changes and the like, specifically pixels with large first derivatives in the pixel matrix; see the SIFT detection algorithm. The down-sampled 160 × 120 image is then processed by bilinear interpolation into an 80 × 80 × 1 depth image used as the observed state. The larger this size, the more GPU memory and learning time the deep learning consumes; the smaller it is, the less boundary information the image retains, which degrades the learning result. In one embodiment, this value can be set or changed according to the GPU capability of the computer.
Specifically, the bilinear interpolation of the down-sampled 160 × 120 image linearly interpolates the pixels along one direction of the image matrix according to the following formula, and then along the other direction:
f(x, y) ≈ f(0,0)(1 − x)(1 − y) + f(1,0)x(1 − y) + f(0,1)(1 − x)y + f(1,1)xy
where x is the coordinate coefficient of the pixel along the x axis of the image matrix, y is the coordinate coefficient along the y axis, and f(i, j) are the pixel values at the four neighbouring grid points.
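As an illustration of this pipeline, the sketch below chains OpenCV's Gaussian-pyramid down-sampling with a bilinear resize. The 640 × 480 raw resolution and the two pyrDown steps are assumptions; the patent fixes only the 160 × 120 intermediate and 80 × 80 × 1 output sizes.

```python
import cv2
import numpy as np

def preprocess_depth(depth_raw):
    """Gaussian-pyramid down-sampling followed by a bilinear resize to 80x80.

    A sketch under assumed input size 640x480; only the 160x120 and
    80x80x1 sizes come from the patent.
    """
    img = depth_raw.astype(np.float32)
    # Each pyrDown halves both dimensions while smoothing: 640x480 -> 320x240 -> 160x120
    for _ in range(2):
        img = cv2.pyrDown(img)
    # Bilinear interpolation down to the 80x80 network input
    img = cv2.resize(img, (80, 80), interpolation=cv2.INTER_LINEAR)
    return img.reshape(80, 80, 1)
```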
Step S2, computing, in the depth image matrix, the relative position with respect to the starting point from the unmanned vehicle's wheel-speed odometer, taking the x coordinate of this position as the first row of a second depth image and the y coordinate as its first column, and further assembling the second depth images into a second depth image matrix representing the state of the unmanned vehicle. A coordinate transformation is established that unifies the coordinates of the actual environment with the acquired image coordinates; the camera pose is adjusted according to the unmanned vehicle's position in real space and converted into the global coordinates of the map. Finally, the positioning accuracy of the wheel-speed odometer is calibrated, unifying the real-space position with the position in the image and improving precision.
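A minimal sketch of this state construction, under one reading of the embodiment in which the odometry coordinates fill the first row and first column of the 80 × 80 matrix; all names are illustrative.

```python
import numpy as np

def build_state_matrix(depth_80x80, x_odom, y_odom):
    """Embed relative odometry into the depth matrix to form the state.

    One possible reading of the patent: the x coordinate is written into
    the first row and the y coordinate into the first column.
    """
    state = depth_80x80.copy().reshape(80, 80)
    state[0, :] = x_odom   # first row carries the x coordinate
    state[:, 0] = y_odom   # first column carries the y coordinate
    return state
```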
Step S3, comparing the values in the second depth image matrix one by one, computing the minimum value of the second depth pixel matrix with a quick-sort algorithm, and comparing the minimum with a set threshold; when the minimum is greater than the threshold, controlling the motion of the unmanned vehicle kinematically; when it is smaller, inputting the second depth image into the deep learning network, constructing a Markov state space, deciding the next action randomly or by the deep learning network, and repeatedly comparing the minimum with the threshold until the minimum exceeds it.
In a specific embodiment, the foregoing image processing yields the unmanned vehicle state, which comprises the position and the depth image. The chosen training scheme based on the minimum of the depth image speeds up model training. As shown in Fig. 3, the minimum value of the second depth pixel matrix is computed by value-by-value comparison using an existing mature algorithm; quick sort, for example, may be consulted for the concrete method. When the minimum is greater than the preset threshold (specifically 0.7 m in this embodiment), the motion of the robot is controlled by point-to-point kinematic constraints so that it moves smoothly to the target point. During the motion, as soon as the minimum of the depth image falls below the threshold, the depth image is input into the deep learning network, a Markov state space is constructed, and the next action is decided randomly or by the network; once the minimum exceeds the threshold again, the robot's next action is again constrained kinematically, and the process repeats in a loop;
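The switching logic of a single decision cycle might look like the following sketch, where `kinematic_action` and `learned_action` stand in for the point-to-point controller and the trained network; only the 0.7 m threshold comes from the embodiment.

```python
def navigation_step(d_min, threshold, kinematic_action, learned_action):
    """One decision cycle of the hybrid scheme: kinematic tracking when the
    path ahead is clear, learned obstacle avoidance otherwise."""
    if d_min > threshold:        # minimum depth above threshold: clear ahead
        return kinematic_action()
    return learned_action()      # obstacle within threshold: learned policy acts
```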
specifically, the controlling the motion of the unmanned vehicle in the kinematic constraint mode is to control the robot to move smoothly from the current point to the target point in the kinematic constraint mode, and further calculate the motion parameters from the target point to the current point according to the following formula:
Figure BDA0002493097060000051
wherein x isgAnd ygThe coordinate of the target point in a Cartesian coordinate system is defined, K is a first scale coefficient, the parameter is used for calibration due to different kinematic parameters of the unmanned vehicle platform, specific values can be adjusted and calibrated according to specific application conditions, and v is the movement speed of the unmanned vehicle;
ω=KωgΘθ)
wherein, thetagIs the direction of the target point, theta is the direction of the current point, theta is the difference between two angles of the target point and the current point, and belongs to (-pi, pi)]A value of (A), KωAnd omega is the angular velocity of the unmanned vehicle motion for a second proportionality coefficient calibrated in a specific experiment or an embodiment.
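A sketch of this controller following the two formulas above; the gains K and K_omega are arbitrary placeholders that a real platform would calibrate.

```python
import math

def kinematic_control(x_g, y_g, theta_g, theta, K=0.5, K_omega=1.0):
    """Point-to-point kinematic controller per the formulas above.

    K and K_omega are placeholder gains, not values from the patent.
    """
    v = K * math.hypot(x_g, y_g)              # speed proportional to distance to target
    d = theta_g - theta
    d = math.atan2(math.sin(d), math.cos(d))  # wrap heading error into (-pi, pi]
    omega = K_omega * d
    return v, omega
```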
Specifically, the minimum-depth threshold must be set according to the actual vehicle speed: it should be increased appropriately when the turning radius of the unmanned vehicle grows, and vice versa. Setting it too large leads to longer training time, while setting it too small can lead to collisions with obstacles, which is also a very important point to check later on.
In this embodiment, the second depth image is input into the deep learning network so that learning can be performed with the A3C algorithm; the neural network built by the invention fits the policy function πθ(s, a) and the evaluation function V(s, θv) in order to judge whether the decisions made by the invention are reasonable.
The deep learning network is a convolutional neural network comprising four convolutional layers and two fully connected layers; parameters such as the number of layers, the learning rate, and the greedy rate of the neural network must be controlled. The number of convolutional layers is determined by the size of the processed image, i.e. the 80 × 80 × 1 depth image, and a four-layer convolutional network extracts the image details of each level well. The learning rate can be set neither too low nor too high: too low, and learning takes too long; too high, and the network converges to a local optimum. Based on the actual learning process, the learning rate is adjusted to 10⁻⁶.
The deep learning network applies gradient descent to the policy function πθ(s, a) according to the following formula:
dθ = ∇θ log πθ(s, a) · A(s)
where θ denotes the parameters of the neural network, πθ(s, a) is the policy function, and A(s) is the advantage function used to evaluate the policy-gradient update;
the deep learning network applies gradient descent to the evaluation function V(s, θv) according to the following formula:
dθv = ∂(R + γV(s′, θv) − V(s, θv))² / ∂θv
where R is the corresponding reward value, γ is the discount (greedy) coefficient, V is the state-value function, s′ is the next state, and v is the speed value of the unmanned vehicle.
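A possible PyTorch realization of such an actor-critic network; the patent fixes only the four convolutional layers, the two fully connected layers, and the 80 × 80 × 1 input, so the kernel sizes, strides, channel widths, and the two-dimensional action head (speed and steering) are assumptions.

```python
import torch
import torch.nn as nn

class A3CNet(nn.Module):
    """Actor-critic network: four conv layers and two fully connected layers,
    as the embodiment specifies; all layer hyperparameters are assumptions."""
    def __init__(self, n_actions=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),   # 80x80 -> 38x38
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),  # -> 17x17
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),  # -> 8x8
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),  # -> 3x3
            nn.Flatten(),
        )
        self.fc = nn.Sequential(nn.Linear(64 * 3 * 3, 256), nn.ReLU())
        self.policy = nn.Linear(256, n_actions)  # actor head: pi_theta(s, a)
        self.value = nn.Linear(256, 1)           # critic head: V(s, theta_v)

    def forward(self, x):
        h = self.fc(self.features(x))
        return self.policy(h), self.value(h)

# usage: logits, value = A3CNet()(torch.zeros(1, 1, 80, 80))
```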
In this embodiment, a reward/penalty value is introduced when learning in the deep learning network; the specific rules are shown in Fig. 4, where v is the robot's speed, with range [0.1, 0.6], and ω is the robot's angular velocity, with range [-1, 1]. The reward is larger when the unmanned vehicle drives straight and smaller when it turns. When the minimum x of the depth image falls below 0.7 m, a reward/penalty value R for approaching the obstacle is applied; the final value of a single episode is the sum of all penalty terms, which specifically comprise a collision penalty, a straight-driving or turning term, a term for driving toward the target point, a penalty for deviating from the target point, and a penalty for approaching an obstacle.
The collision penalty is -20, the reward for driving toward the target point is 4, and the penalty for deviating from the target point is -2;
the straight-driving or turning term is calculated according to the following formula:
(0.1*v)/(|ω|+0.1)
where v is the speed of the unmanned vehicle and ω is its angular velocity;
the penalty for approaching an obstacle is calculated according to the following formula:
-1/(x-0.4)
where x is the minimum value within the second depth image matrix.
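Collecting the terms above into a single function gives a sketch like the following; the boolean flags are illustrative stand-ins for the geometric tests against the target point, and only the numeric values come from the embodiment.

```python
def step_reward(v, omega, d_min, collided, toward_target, away_from_target):
    """Sum the reward/penalty terms listed above (values from Fig. 4)."""
    r = (0.1 * v) / (abs(omega) + 0.1)   # straight driving pays more than turning
    if collided:
        r += -20.0                       # collision penalty
    if toward_target:
        r += 4.0                         # reward for driving toward the target
    if away_from_target:
        r += -2.0                        # penalty for deviating from the target
    if d_min < 0.7:                      # proximity penalty only near obstacles
        r += -1.0 / (d_min - 0.4)
    return r
```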
In this embodiment, the Markov state space is composed of a plurality of arrays; a single array contains at least the current state data of the unmanned vehicle, its current action data, the corresponding reward value data, and its next state data. The Markov state space provides the data set for model training.
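A minimal sketch of one such array as a Python tuple type; only the four kinds of data are taken from the patent, the names are illustrative.

```python
from collections import namedtuple

# One element of the Markov state space described above.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])

def record(state_space, state, action, reward, next_state):
    """Append one interaction step; `state_space` is a plain Python list."""
    state_space.append(Transition(state, action, reward, next_state))
```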
In the navigation method for an unmanned vehicle based on deep reinforcement learning, the simulated training environment to be designed must have a certain complexity so that the mobile robot can acquire good obstacle-avoidance ability. The environment should include narrow passable road sections, walls, edged obstacles, and smooth obstacles, as shown in Fig. 5. Model learning therefore needs to be carried out in the training environment to accumulate sufficient data, which improves decision speed in practical applications; navigation can, however, still be achieved if the training strategy is used during the actual navigation process. Concretely, cumulative training was performed with a training amount of about 30,000 steps and a training time of about 6 hours for each scheme; the final error curves are compared in Fig. 6, where the horizontal axis is the number of iterations and the vertical axis is the error value, and curve 1 (direct training) and curve 2 (the training rule of the invention) are the results after median-average filtering.
In this embodiment, in order to verify the training effect, 7 points can be designated in the test environment for navigation: the robot passes through positions 1 to 7 in sequence under kinematic constraints, and whenever it gets too close to an obstacle (less than 0.6 m), obstacle avoidance is taken over by the model in the deep reinforcement learning network of the invention, so as to explore the unknown environment and compare the path-planning ability of the two training schemes.
In the unmanned vehicle navigation method based on deep reinforcement learning, if navigation is performed in an indoor environment, the depth image acquired by a real depth camera contains some alternating bright and dark point noise, so to improve the effectiveness and feasibility of the algorithm the depth image must be preprocessed in step S1; step S2 then establishes the coordinate transformation, converts the camera pose into the global coordinates of the map, and calibrates the positioning accuracy of the wheel-speed odometer; finally, the trained model is put on a real unmanned vehicle for navigation in a real environment.
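A preprocessing sketch using OpenCV, assuming a uint16 depth frame in millimetres; "fast-marching restoration" is read here as Telea fast-marching inpainting over invalid (zero-depth) pixels, and the crop margin and kernel size are placeholders.

```python
import cv2
import numpy as np

def denoise_depth(depth_mm):
    """Median filter, crop, and fast-marching inpainting of a raw depth frame.

    `depth_mm` is assumed to be a uint16 depth image in millimetres, as
    RGBD cameras commonly output; margins and radii are placeholders.
    """
    d = cv2.medianBlur(depth_mm, 5)                  # median filter removes isolated specks
    d = d[10:-10, 10:-10]                            # crop unreliable border pixels
    mask = (d == 0).astype(np.uint8)                 # zero depth = missing measurement
    d = cv2.inpaint(d, mask, 3, cv2.INPAINT_TELEA)   # fast-marching hole filling
    return d
```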
In summary, the embodiments of the invention have the following beneficial effects:
In the navigation method for an unmanned vehicle based on deep reinforcement learning, the early-stage construction of the robot's state space is optimized by combining a kinematic constraint model. For the same training time, the state space constructed with the proposed training scheme is more reasonable and effective, so the network learns more efficiently, the error converges to a smaller value, and obstacle avoidance in unknown environments is better;
The method solves the problem of unmanned vehicle navigation in unknown environments through an end-to-end motion-decision navigation mode that dispenses with maps; at the same time, the invention can be used for map building in unknown environments, avoiding the labour of manually steering equipment to collect the map and improving map-collection efficiency.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A navigation method for an unmanned vehicle based on deep reinforcement learning, characterized by comprising the following steps:
Step S1, obtaining a first depth image through an RGBD depth camera on the unmanned vehicle, and obtaining a first depth image matrix after sampling and bilinear interpolation of the acquired first depth image;
Step S2, in the first depth image matrix, computing the relative position with respect to the starting point from the wheel-speed odometer of the unmanned vehicle, taking the x coordinate of this position as the first row of a second depth image and the y coordinate as its first column, and further assembling the second depth images into a second depth image matrix representing the state of the unmanned vehicle;
Step S3, traversing the second depth image matrix to compute its minimum value and comparing the minimum with a set threshold; when the minimum is greater than the threshold, controlling the motion of the unmanned vehicle kinematically; when the minimum is smaller than the threshold, inputting the second depth image into a deep learning network, constructing a Markov state space, deciding the next action randomly or by the deep learning network, and repeating the comparison of the minimum with the threshold until the minimum is greater than the set threshold.
2. The method of claim 1, characterized in that, in step S1, the sampling of the acquired depth image followed by bilinear interpolation specifically comprises: smoothing the image with a Gaussian pyramid algorithm while retaining all boundary feature values of the image, obtaining an image of resolution 160*120 by stepwise down-sampling, and then processing the down-sampled 160*120 image by bilinear interpolation to obtain a depth image of size 80*80*1.
3. The method of claim 2, characterized in that the bilinear interpolation linearly interpolates the pixels along one direction of the image matrix according to the following formula and then along the other direction:
f(x, y) ≈ f(0,0)(1 − x)(1 − y) + f(1,0)x(1 − y) + f(0,1)(1 − x)y + f(1,1)xy
where x is the coordinate coefficient of the pixel along the x axis of the image matrix and y is the coordinate coefficient along the y axis.
4. The method of claim 1, characterized in that, in step S3, the set threshold is adjusted according to the actual vehicle speed: when the turning radius of the unmanned vehicle becomes larger the threshold is increased, and when the turning radius becomes smaller the threshold is decreased.
5. The method of claim 4, characterized in that controlling the motion of the unmanned vehicle under kinematic constraints specifically means controlling the robot to move smoothly from the current point to the target point, the motion parameters toward the target point being calculated according to the following formulas:
v = K√(xg² + yg²)
ω = Kω(θg Θ θ)
where xg and yg are the coordinates of the target point in a Cartesian coordinate system, K is a first scale coefficient, and v is the speed of the unmanned vehicle; θg is the heading of the target point, θ is the current heading, Θ denotes the difference between the two angles of the target point and the current point, Kω is a second scale coefficient, and ω is the angular velocity of the unmanned vehicle.
6. The method of claim 5, characterized in that, in step S3, the deep learning network is a convolutional neural network comprising four convolutional layers and two fully connected layers, and the deep learning network applies gradient descent to the policy function πθ(s, a) according to the following formula:
dθ = ∇θ log πθ(s, a) · A(s)
where θ denotes the parameters of the neural network and A(s) is the advantage function used to evaluate the policy-gradient update;
the deep learning network applies gradient descent to the evaluation function V(s, θv) according to the following formula:
dθv = ∂(R + γV(s′, θv) − V(s, θv))² / ∂θv
where R is the corresponding reward value, γ is the discount (greedy) coefficient, V is the state-value function, and v is the speed value of the unmanned vehicle.
7. The method of claim 6, characterized in that the reward/penalty value R includes a penalty for approaching an obstacle; the final value of a single episode is the sum of all penalty terms, which specifically comprise a collision penalty, a straight-driving or turning term, a term for driving toward the target point, a penalty for deviating from the target point, and a penalty for approaching an obstacle.
8. The method of claim 7, characterized in that the straight-driving or turning term is calculated according to the following formula:
(0.1*v)/(|ω|+0.1)
where v is the speed of the unmanned vehicle and ω is its angular velocity;
and the penalty for approaching an obstacle is calculated according to the following formula:
-1/(x-0.4)
where x is the minimum value within the second depth image matrix.
9. The method of claim 1, characterized in that, in step S3, the Markov state space is composed of a plurality of arrays, a single array containing at least the current state data of the unmanned vehicle, its current action data, the corresponding reward value data, and its next state data.
10. The method of claim 1, characterized in that, in step S1, the method further comprises preprocessing the depth image to reduce the alternating bright and dark point noise in the image, the preprocessing comprising at least median filtering, image cropping, and fast-marching inpainting.
CN202010416877.4A 2020-05-15 2020-05-15 Unmanned vehicle navigation method based on deep reinforcement learning Pending CN113670306A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010416877.4A CN113670306A (en) 2020-05-15 2020-05-15 Unmanned vehicle navigation method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010416877.4A CN113670306A (en) 2020-05-15 2020-05-15 Unmanned vehicle navigation method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN113670306A true CN113670306A (en) 2021-11-19

Family

ID=78537863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010416877.4A Pending CN113670306A (en) 2020-05-15 2020-05-15 Unmanned vehicle navigation method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113670306A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867288A (en) * 2011-07-07 2013-01-09 三星电子株式会社 Depth image conversion apparatus and method
CN103854257A (en) * 2012-12-07 2014-06-11 山东财经大学 Depth image enhancement method based on self-adaptation trilateral filtering
CN104537627A (en) * 2015-01-08 2015-04-22 北京交通大学 Depth image post-processing method
CN107703945A (en) * 2017-10-30 2018-02-16 洛阳中科龙网创新科技有限公司 A kind of intelligent farm machinery paths planning method of multiple targets fusion
CN107817798A (en) * 2017-10-30 2018-03-20 洛阳中科龙网创新科技有限公司 A kind of farm machinery barrier-avoiding method based on deep learning system
CN109212973A (en) * 2018-11-05 2019-01-15 北京交通大学 A kind of avoidance obstacle method of the Human Simulating Intelligent Control based on intensified learning
CN110632931A (en) * 2019-10-09 2019-12-31 哈尔滨工程大学 Collision avoidance planning method for mobile robot based on deep reinforcement learning in dynamic environment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102867288A (en) * 2011-07-07 2013-01-09 三星电子株式会社 Depth image conversion apparatus and method
CN103854257A (en) * 2012-12-07 2014-06-11 山东财经大学 Depth image enhancement method based on self-adaptation trilateral filtering
CN104537627A (en) * 2015-01-08 2015-04-22 北京交通大学 Depth image post-processing method
CN107703945A (en) * 2017-10-30 2018-02-16 洛阳中科龙网创新科技有限公司 A kind of intelligent farm machinery paths planning method of multiple targets fusion
CN107817798A (en) * 2017-10-30 2018-03-20 洛阳中科龙网创新科技有限公司 A kind of farm machinery barrier-avoiding method based on deep learning system
CN109212973A (en) * 2018-11-05 2019-01-15 北京交通大学 A kind of avoidance obstacle method of the Human Simulating Intelligent Control based on intensified learning
CN110632931A (en) * 2019-10-09 2019-12-31 哈尔滨工程大学 Collision avoidance planning method for mobile robot based on deep reinforcement learning in dynamic environment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
卜祥津: "基于深度强化学习的未知环境下机器人路径规划的研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》, no. 1, 15 January 2019 (2019-01-15), pages 140 - 1640 *
陈爱军等: "《数字图像处理及其MATLAB实现》", vol. 2008, 31 July 2020, 东北林业大学出版社, pages: 26 - 27 *

Similar Documents

Publication Publication Date Title
KR101372482B1 (en) Method and apparatus of path planning for a mobile robot
CN114237235B (en) Mobile robot obstacle avoidance method based on deep reinforcement learning
CN111469127B (en) Cost map updating method and device, robot and storage medium
CN112835333A (en) A method and system for multi-AGV obstacle avoidance and path planning based on deep reinforcement learning
CN113984080B (en) A Hierarchical Local Path Planning Method Applicable to Large and Complex Scenes
WO2022165614A1 (en) Path construction method and apparatus, terminal, and storage medium
CN115592324A (en) Automatic welding robot control system based on artificial intelligence
CN116203973B (en) Intelligent control system of track AI inspection robot
CN113433937A (en) Heuristic exploration-based layered navigation obstacle avoidance system and layered navigation obstacle avoidance method
CN116337045A (en) High-speed map building navigation method based on karto and teb
CN115755888A (en) AGV obstacle detection system with multi-sensor data fusion and obstacle avoidance method
CN112612267A (en) Automatic driving path planning method and device
CN114661054A (en) Mobile robot path planning and optimizing method based on image processing
CN116954212B (en) Improved D X Lite unmanned ship path planning method facing complex environment
CN111123953A (en) Particle-based mobile robot group under artificial intelligence big data and control method thereof
CN118279876A (en) Automatic obstacle avoidance method and system for cleaning vehicle based on image processing
CN113538620A (en) A SLAM mapping result evaluation method for 2D grid map
CN119085676A (en) A vision-based photovoltaic panel cleaning path planning method
CN114879660B (en) Robot environment sensing method based on target drive
CN119148163B (en) Autonomous navigation method, device and medium of unmanned vehicle in unknown environment
CN120143677A (en) A control method for an underwater cleaning robot
CN116052116A (en) Automatic parking method based on multi-source information perception and end-to-end deep learning
CN115690343A (en) Robot laser radar scanning and mapping method based on visual following
CN113670306A (en) Unmanned vehicle navigation method based on deep reinforcement learning
CN117075158A (en) Position and orientation estimation method and system of unmanned deformable motion platform based on lidar

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211119