
CN112949634B - A method for detecting bird nests in railway contact network - Google Patents


Info

Publication number
CN112949634B
CN112949634B (application CN202110249738.1A)
Authority
CN
China
Prior art keywords
picture
region
template
interest
bird
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110249738.1A
Other languages
Chinese (zh)
Other versions
CN112949634A (en)
Inventor
武斯全
田震
廖开沅
赵宏伟
许华婷
徐嘉勃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN202110249738.1A priority Critical patent/CN112949634B/en
Publication of CN112949634A publication Critical patent/CN112949634A/en
Application granted granted Critical
Publication of CN112949634B publication Critical patent/CN112949634B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for detecting bird nests on a railway contact network. The method comprises the following steps: obtaining region-of-interest pictures containing a bird-nest region through reverse reasoning from the pictures that contain bird nests, using each region-of-interest picture as a template picture, building a template library from all template pictures, and training a second-stage YOLO detector with the template library; matching the pictures that do not contain a bird nest against each template picture in the template library in turn to obtain a region-of-interest picture dataset, and training a first-stage YOLO detector with that dataset; inputting a picture to be detected into the trained first-stage YOLO detector, which outputs a region-of-interest picture, and inputting that region-of-interest picture into the trained second-stage YOLO detector, which outputs the bird-nest detection result for the picture to be detected. The invention addresses the recognition difficulty caused by the small amount of bird-nest information and the lack of distinctive features in the overhead contact system, and enables effective automatic bird-nest recognition and detection on the railway contact network.

Description

A method for detecting bird nests in a railway contact network

Technical Field

The invention relates to the technical field of foreign-object detection for railway contact networks, and in particular to a method for detecting bird nests in a railway contact network.

Background Art

At present, research on target recognition has become one of the most active topics in computer vision. Targets in dynamic video are marked mainly by analyzing the image sequence collected by an image sensor, extracting the target scenes of interest from the sequence, marking the pixel regions belonging to the same target, and identifying the target's position, size, and contour. A typical target recognition method comprises target feature description, feature extraction, and feature matching: the feature information of the target to be represented, such as position, color, contour, and texture, is extracted, and detected candidates are then evaluated against this feature information to decide whether a candidate matches it, thereby completing the labeling of the target.

At present, prior-art methods for detecting foreign objects on high-speed railway contact networks include detection based on the Faster R-CNN model, detection based on relative-position invariance, and detection of bird nests on the contact network using HOG (Histogram of Oriented Gradients) features. Detection based on Faster R-CNN introduces an RPN (Region Proposal Network) to generate candidate regions for the target object; Faster R-CNN can be regarded as consisting of an RPN that generates target candidate regions and a detection head that classifies these candidate regions. First, an image is fed forward to the last shared convolutional layer; the resulting feature map is passed to the RPN, while forward propagation also continues to produce a higher-dimensional feature map. After a series of processing steps, the RPN produces multiple candidate regions with scores, and non-maximum suppression is applied to the candidate boxes to reduce their number. Candidate regions exceeding a set threshold, together with the high-dimensional feature map, are fed into an RoI pooling layer to extract the features of the corresponding regions; finally, these regional features are passed through fully connected layers to output the target class and score, as well as the bounding-box regression values.

Detection based on relative-position invariance uses machine-vision processing. After a preliminary analysis of the image's color, texture, and shape features, combined with the characteristics of the platforms on which birds build nests, the preprocessed image is filtered with the Sobel horizontal-edge operator to obtain image edges; the probabilistic Hough line transform is then used to correct the image angle, the foremost rigid cross-beam is detected from the length relationships of the line segments in the image, and finally the image is binarized and the white-pixel area between the rigid cross-beams is counted to decide whether a bird nest is present on the beam.
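The first step of this classical pipeline, the Sobel horizontal-edge operator, can be illustrated with a minimal pure-Python sketch. The tiny grayscale grid, the constant kernel, and the function name are hypothetical, chosen only for illustration:

```python
# Minimal Sobel horizontal-edge filter (the Gy kernel responds to
# horizontal edges). Illustrative only: grid values are toy data.

GY = [(-1, -2, -1),
      ( 0,  0,  0),
      ( 1,  2,  1)]

def sobel_horizontal(img):
    """Return |Gy * img| for each interior pixel of a 2-D list."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for dy in range(3):
                for dx in range(3):
                    acc += GY[dy][dx] * img[y + dy - 1][x + dx - 1]
            out[y][x] = abs(acc)
    return out

# A horizontal intensity step (dark above, bright below):
img = [[0, 0, 0, 0],
       [0, 0, 0, 0],
       [9, 9, 9, 9],
       [9, 9, 9, 9]]
edges = sobel_horizontal(img)
print(edges)  # strongest responses on the rows around the step
```

A production implementation would of course use a vectorized filter (e.g. an OpenCV or NumPy convolution) rather than nested loops, but the arithmetic is identical.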

Detection of bird nests on the railway contact network using HOG features first coarsely extracts, based on prior knowledge, the regions of the image where a nest may appear, then computes the HOG features of the extracted regions, and finally uses a support vector machine (SVM) to recognize the nests precisely from those HOG features. Because neural networks have advantages over other algorithms in image processing, some researchers have combined neural networks with traditional image-processing techniques to detect targets of interest, which can effectively improve detection speed and accuracy.

The above prior-art methods for detecting foreign objects on high-speed railway contact networks have the following drawbacks. In actual bird-nest detection, the train's operating environment varies widely and nests of diverse shapes sit within a cluttered contact network, so the precision and recall of recognition systems built on HOG and DPM models fall far short of the expected standard. This is because traditional recognition models such as HOG and DPM use hand-crafted features as detection templates and then perform sliding-window matching; such methods are easily affected by shape and texture variation, so it is difficult to extract standard detection features for nests in this environment.

In addition, because features must be extracted manually, accurate recognition is even harder when data are scarce. These methods therefore suffer from narrow applicability and poor generality, which causes many problems for bird-nest recognition on high-speed railway contact networks.

Summary of the Invention

Embodiments of the present invention provide a method for detecting bird nests on a railway contact network, so as to achieve effective automatic bird-nest recognition and detection on the railway contact network.

To achieve the above object, the present invention adopts the following technical scheme.

A method for detecting bird nests in a railway contact network, comprising:

obtaining, by reverse reasoning from the pictures in a railway contact network picture dataset that contain bird nests, region-of-interest pictures containing the bird-nest region; using the region-of-interest pictures as template pictures; building a template library from all the template pictures; and training a second-stage YOLO detector with the template library;

matching the pictures in the railway contact network picture dataset that do not contain bird nests against each template picture in the template library in turn to obtain a region-of-interest picture dataset, and training a first-stage YOLO detector with that dataset;

inputting a picture to be detected into the trained first-stage YOLO detector, which outputs a region-of-interest picture; and inputting the region-of-interest picture into the trained second-stage YOLO detector, which outputs the bird-nest detection result for the picture to be detected.
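The inference flow of the cascade can be sketched as follows. This is a hypothetical illustration: both stage functions are stand-ins returning fixed boxes (the method itself uses trained YOLO networks), and the only real logic shown is mapping the second stage's box from ROI-local coordinates back to the original picture:

```python
# Sketch of the two-stage cascade inference flow. Both detectors are
# hypothetical stand-ins; the patent uses trained YOLO networks.

def first_stage(picture):
    """Stand-in for the first-stage detector: returns a region-of-interest
    box (x, y, w, h) in original-picture coordinates."""
    return (100, 50, 200, 150)  # hypothetical ROI

def second_stage(roi_crop):
    """Stand-in for the second-stage detector: returns a nest box
    (x, y, w, h) in the ROI's local coordinates."""
    return (20, 30, 40, 25)  # hypothetical nest box inside the ROI

def detect_nest(picture):
    rx, ry, rw, rh = first_stage(picture)   # step 1: locate the ROI
    bx, by, bw, bh = second_stage(picture)  # step 2: detect within the ROI crop
    # Map the nest box from ROI-local coordinates back to the full picture.
    return (rx + bx, ry + by, bw, bh)

print(detect_nest(None))  # (120, 80, 40, 25)
```

The coordinate mapping is the only glue the cascade needs: the second-stage detector never sees the full image, which is what lets it work at a higher effective resolution on the nest.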

Preferably, obtaining the region-of-interest pictures containing the bird-nest region by reverse reasoning from the pictures in the railway contact network picture dataset that contain bird nests, using the region-of-interest pictures as template pictures, and building a template library from all the template pictures comprises:

performing a preliminary segmentation of the pictures that contain bird nests to obtain basic regions of a set similarity; preliminarily merging the basic regions according to inter-region differences to obtain a series of preliminary candidate regions; enclosing the preliminary candidate regions with rectangular boxes; merging preliminary candidate regions according to the similarity between the rectangular boxes to obtain the final candidate regions; manually annotating the bird-nest positions in the final candidate regions, each annotated bird-nest region being represented by a rectangle; taking the final candidate regions that contain a bird-nest region as regions of interest; using the region-of-interest pictures as template pictures; and building a template library from all the template pictures.

Preferably, performing the preliminary segmentation of the pictures containing bird nests to obtain basic regions of a set similarity, and preliminarily merging the basic regions according to inter-region differences to obtain a series of preliminary candidate regions, comprises:

representing a picture containing a bird nest by an undirected graph G = <V, E>, where each vertex of the graph represents one pixel of the picture, the weight of an edge e = (vi, vj) represents the dissimilarity of the adjacent vertex pair (i, j), and the dissimilarity w(e) between two pixels is measured by their color distance; a basic region is a set of vertices with minimal internal dissimilarity;

The intra-class difference of a basic region C is defined as the largest edge weight in its minimum spanning tree: Int(C) = max_{e ∈ MST(C, E)} w(e).

The inter-class difference between two basic regions C1 and C2 is defined as the minimum-weight edge connecting the two regions: Diff(C1, C2) = min_{vi ∈ C1, vj ∈ C2, (vi, vj) ∈ E} w((vi, vj)).

If no edge connects the two basic regions, Diff(C1, C2) = ∞.

When the condition Diff(C1, C2) ≤ min(Int(C1) + τ(C1), Int(C2) + τ(C2)) is satisfied, the two basic regions C1 and C2 can be merged;

where τ(C) is a threshold function that gives regions consisting of isolated points a weight:

τ(C) = k/||C||, with k a constant and ||C|| the size of region C.

The basic regions are preliminarily merged in this way to obtain a series of preliminary candidate regions.
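The merging rule above follows the standard graph-based segmentation criterion (Felzenszwalb–Huttenlocher style, which the formulas Int, Diff, and τ(C) = k/||C|| match). It can be sketched in pure Python with a union-find over pixel vertices; the toy grid and the constant k below are hypothetical:

```python
# Graph-based preliminary segmentation sketch: merge components C1, C2 when
# Diff(C1, C2) <= min(Int(C1) + k/|C1|, Int(C2) + k/|C2|).
# Pure-Python illustration; the test images and k are toy values.

def segment(img, k=1.0):
    """Return the number of basic regions found in a 2-D intensity grid."""
    h, w = len(img), len(img[0])
    n = h * w
    parent = list(range(n))
    size = [1] * n
    internal = [0.0] * n  # Int(C): max MST edge weight inside the component

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # 4-neighbour edges, weight = absolute intensity difference (a simple
    # stand-in for the colour distance in the text).
    edges = []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                edges.append((abs(img[y][x] - img[y][x+1]), y*w + x, y*w + x + 1))
            if y + 1 < h:
                edges.append((abs(img[y][x] - img[y+1][x]), y*w + x, (y+1)*w + x))
    edges.sort()  # process in order of increasing dissimilarity

    for wgt, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb and wgt <= min(internal[ra] + k / size[ra],
                                   internal[rb] + k / size[rb]):
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = wgt  # edges arrive sorted, so this is the MST max

    return len({find(i) for i in range(n)})

# Two flat regions separated by a strong intensity step -> 2 basic regions
print(segment([[0, 0, 9, 9],
               [0, 0, 9, 9]], k=1.0))  # 2
```

Processing edges in ascending weight order is what makes `internal[ra] = wgt` correct: the merging edge is always the largest edge in the component's minimum spanning tree so far.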

Preferably, enclosing the preliminary candidate regions with rectangular boxes and merging the preliminary candidate regions according to the similarity between the rectangular boxes to obtain the final candidate regions comprises:

enclosing each preliminary candidate region with a rectangular box, the position of a rectangular box C being represented by the four-tuple (x, y, w, h), where x, y are the coordinates of the box's upper-left corner and w, h are its width and height;

The color similarity between the rectangular box ci of preliminary candidate region ri and the rectangular box cj of preliminary candidate region rj is the histogram intersection S_colour(ri, rj) = Σ_k min(c_i^k, c_j^k), where c_i^k denotes the proportion of pixels in the k-th bin of the color histogram of box ci;

The texture similarity between the rectangular boxes ci and cj is S_texture(ri, rj) = Σ_k min(t_i^k, t_j^k), where t_i^k denotes the proportion of pixels in the k-th dimension of the texture histogram;

For the preliminary candidate regions ri and rj, the size similarity is S_size(ri, rj) = 1 − (size(ri) + size(rj))/size(im), where size(ri) denotes the size of the rectangular box of region ri and size(im) denotes the size of the original picture to be segmented;

For the preliminary candidate regions ri and rj, the fill similarity is S_fill(ri, rj) = 1 − (size(BBij) − size(ri) − size(rj))/size(im), where size(BBij) denotes the size of the bounding rectangle of regions ri and rj;

The total similarity between the preliminary candidate regions ri and rj is:

S(ri, rj) = a1·S_colour(ri, rj) + a2·S_texture(ri, rj) + a3·S_size(ri, rj) + a4·S_fill(ri, rj)

where a1, a2, a3, a4 are the corresponding weight values;

When the total similarity S(ri, rj) between the preliminary candidate regions ri and rj is greater than the set merging threshold, ri and rj are merged; this yields the final candidate regions.
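The four similarity terms and their weighted sum can be sketched directly. This is a toy illustration: the histograms and region sizes are made-up numbers, and equal weights a1 = a2 = a3 = a4 = 1 are an assumption (the text leaves the weights as parameters):

```python
# Similarity terms used to merge candidate regions (sketch).
# Histograms are assumed normalized (bins sum to 1); all values are toy data.

def s_colour(hist_i, hist_j):
    """Histogram intersection of the two boxes' colour histograms."""
    return sum(min(a, b) for a, b in zip(hist_i, hist_j))

def s_texture(hist_i, hist_j):
    """Histogram intersection of the two boxes' texture histograms."""
    return sum(min(a, b) for a, b in zip(hist_i, hist_j))

def s_size(size_i, size_j, size_im):
    """Encourages small regions to merge early."""
    return 1 - (size_i + size_j) / size_im

def s_fill(size_i, size_j, size_bb, size_im):
    """Encourages merges that leave little empty space in the bounding box."""
    return 1 - (size_bb - size_i - size_j) / size_im

def total_similarity(colour, texture, size_i, size_j, size_bb, size_im,
                     a=(1, 1, 1, 1)):
    return (a[0] * s_colour(*colour) + a[1] * s_texture(*texture)
            + a[2] * s_size(size_i, size_j, size_im)
            + a[3] * s_fill(size_i, size_j, size_bb, size_im))

# Toy example: two small neighbouring regions with identical histograms.
s = total_similarity(colour=([0.5, 0.5], [0.5, 0.5]),
                     texture=([0.25, 0.75], [0.25, 0.75]),
                     size_i=100, size_j=100, size_bb=220, size_im=10000)
print(round(s, 3))  # merge ri and rj if s exceeds the merging threshold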

Preferably, manually annotating the bird-nest positions in the final candidate regions, representing each annotated bird-nest region with a rectangle, taking the final candidate regions containing a bird-nest region as regions of interest, using the region-of-interest pictures as template pictures, and building a template library from all the template pictures comprises:

representing the final candidate region C by a rectangle whose position attribute is the four-tuple (x, y, w, h);

annotating the bird-nest position in the final candidate region, the annotated bird-nest region being represented by a rectangle with position attribute (bx, by, bw, bh); a region of interest is a candidate region that contains the bird-nest region, i.e., its position coordinates satisfy x ≤ bx, y ≤ by, bx + bw ≤ x + w and by + bh ≤ y + h, and the proportion of the nest rectangle within the region of interest meets a set threshold condition.

The region-of-interest pictures are used as template pictures, and a template library is built from all the template pictures.
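The containment test for selecting regions of interest can be written as a small helper. This is a sketch: the exact threshold expression is not reproduced above, so the nest-to-ROI area ratio used below is an assumed form:

```python
# Sketch: a final candidate region (x, y, w, h) becomes a region of interest
# when it fully contains the annotated nest rectangle (bx, by, bw, bh).
# The area-ratio threshold is an ASSUMED form of the set threshold condition.

def is_region_of_interest(candidate, nest, min_ratio=0.05):
    x, y, w, h = candidate
    bx, by, bw, bh = nest
    contains = (x <= bx and y <= by and
                bx + bw <= x + w and by + bh <= y + h)
    # Assumed threshold: the nest must occupy a minimum share of the ROI,
    # so the ROI is not vastly larger than the nest it contains.
    big_enough = (bw * bh) >= min_ratio * (w * h)
    return contains and big_enough

print(is_region_of_interest((0, 0, 100, 100), (10, 10, 30, 20)))  # True
print(is_region_of_interest((0, 0, 100, 100), (90, 90, 30, 20)))  # False: nest spills out
```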

Preferably, matching the pictures in the railway contact network picture dataset that do not contain bird nests against each template picture in the template library in turn to obtain the region-of-interest picture dataset comprises:

matching each picture to be matched that does not contain a bird nest against every template picture in the template library in turn. Let the template picture be T, the picture to be matched be I, and the width and height of the template be w and h; R denotes the matching result. The match may be scored, for example, by normalized cross-correlation over each (w, h) window of I:

R(x, y) = Σ_{x′,y′} T(x′, y′)·I(x + x′, y + y′) / sqrt(Σ_{x′,y′} T(x′, y′)² · Σ_{x′,y′} I(x + x′, y + y′)²)

The larger R is, the more similar the rectangular region of size (w, h) at position (x, y) of the picture to be matched is to the template; the maximum template similarity is taken as the template-matching result, and the matching value is required to exceed a threshold parameter.

Let Rs(T, I) = max_{x,y ∈ I} R(x, y).

Each template picture yields one best matching value Rs, and the rectangular matching box corresponding to it has position (x, y, w, h); the initial matching results whose Rs exceeds the matching threshold parameter c constitute the result set S.

The result set S is sorted in descending order of Rs. Two rectangular matching boxes s and t intersect when:

max(x(s), x(t)) ≤ min(x(s) + w(s), x(t) + w(t))

max(y(s), y(t)) ≤ min(y(s) + h(s), y(t) + h(t))

The result set S is traversed in order; if the current rectangular matching box intersects an already-annotated matching box, it is discarded, otherwise it is annotated in VOC format. All annotated rectangular matching boxes constitute the region-of-interest dataset.
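The matching-and-annotation procedure above can be sketched end to end in pure Python: a normalized cross-correlation scan over a tiny grayscale grid, the intersection test from the two inequalities, and the greedy discard pass. All numeric values (image, template, threshold c) are toy data, and the real method operates on full contact-network photographs:

```python
# Template-matching sketch: slide template T over image I, score each window
# with normalized cross-correlation, keep the best match per template, then
# greedily discard matches whose boxes intersect an already-kept one.
import math

def match_template(I, T):
    """Return (best_R, (x, y, w, h)) for template T over image I."""
    h_t, w_t = len(T), len(T[0])
    t_norm = math.sqrt(sum(v * v for row in T for v in row))
    best = (-1.0, None)
    for y in range(len(I) - h_t + 1):
        for x in range(len(I[0]) - w_t + 1):
            num = sum(T[j][i] * I[y + j][x + i]
                      for j in range(h_t) for i in range(w_t))
            i_norm = math.sqrt(sum(I[y + j][x + i] ** 2
                                   for j in range(h_t) for i in range(w_t)))
            r = num / (t_norm * i_norm)
            if r > best[0]:
                best = (r, (x, y, w_t, h_t))
    return best

def rects_intersect(s, t):
    # The intersection test from the method: overlap along both axes.
    return (max(s[0], t[0]) <= min(s[0] + s[2], t[0] + t[2]) and
            max(s[1], t[1]) <= min(s[1] + s[3], t[1] + t[3]))

def annotate(matches, c=0.9):
    """Keep matches with R > c, best first, discarding intersecting boxes."""
    kept = []
    for r, box in sorted(matches, reverse=True):
        if r > c and not any(rects_intersect(box, k) for k in kept):
            kept.append(box)
    return kept

I = [[1, 1, 1, 1, 1],
     [1, 5, 1, 5, 1],
     [1, 1, 5, 1, 1],
     [1, 1, 1, 1, 1]]
T = [[5, 1], [1, 5]]       # toy template embedded exactly at (1, 1)
best = match_template(I, T)
print(best[1])              # the window most similar to the template
print(annotate([best]))
```

Because the template occurs verbatim at (1, 1), the correlation there is exactly 1.0, while windows of uniform background score noticeably lower, so the thresholded, overlap-filtered result contains only that box.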

Preferably, the first-stage YOLO detector and the second-stage YOLO detector include YOLOv3-spp, YOLOv4, and Faster R-CNN.

Preferably, the confidence and expectation of the first-stage YOLO detector are as follows: the confidence of a grid cell is confidence(Zone) = Pr(Zone) × IOU(pred, truth), where Pr(Zone) is the probability that the current grid cell contains the object to be detected (the region of interest); during training, Pr(Zone) is 1 if the cell contains the region of interest and 0 otherwise; IOU(pred, truth) is the intersection-over-union between the annotation box predicted by the cell and the rectangle where the region of interest actually lies; B is the number of annotation boxes predicted per cell; S² is the total number of grid cells into which the picture is divided; the mean IOU is the average IOU of all prediction boxes made by the cells covering the object; I(Zone) is the size of the region of interest; I(image) is the size of the original picture; and the expectation E(Zone) is the sum of all the IOUs the picture yields.

The confidence of the second-stage YOLO detector is confidence(Birdnest) = Pr(Birdnest) × IOU(pred, truth) × P(distribution), where Pr(Birdnest) is the probability that the current grid cell contains a bird nest (1 if the cell to be tested contains a nest, 0 otherwise), IOU(pred, truth) is the intersection-over-union between the annotation box predicted by the cell and the rectangle where the nest actually lies, and P(distribution) is the probability that a nest in the picture lies within a region of interest; since all nests lie within regions of interest, this term is 1.

The expected nest prediction within a region of interest is obtained by summing, over the grid cells of the region-of-interest sub-image, the contributions Pr_i(birdnest) × IOU_ij, where Pr_i(birdnest) is the probability that the i-th cell of the sub-image contains a bird nest, IOU_ij is the intersection-over-union between the j-th box predicted by the i-th cell of the sub-image and the nest, and confidence(Zone) is the confidence with which a box predicted on the original image's grid marks out the region of interest, i.e., the degree of certainty with which a prediction box of the original image's grid can mark out the nest.

The expectation of the cascaded prediction is the product of the expectations of the two stages, where the first factor is the average intersection-over-union between the annotation boxes predicted for the nest-containing cells of the region-of-interest sub-image and the rectangle where the nest lies, i.e., the average IOU of the grid-predicted anchor boxes.

The precision of the cascaded prediction satisfies:

P = F(birdnest, Zone, N) × F(Zone, image, M) > F(birdnest, image, N),

i.e., detecting the region of interest in the full image and then the nest inside the region of interest is more precise than detecting the nest in the full image directly.

As can be seen from the technical solutions provided by the above embodiments, the method for automatic recognition and fast tracking of bird nests on high-speed railway contact networks proposed by the embodiments of the present invention can effectively solve the problem of accurately and rapidly recognizing and tracking nests on the contact network, as well as the recognition difficulty caused by the small amount of nest information and the lack of distinctive shape or texture features, thereby enabling effective automatic bird-nest recognition and detection on the railway contact network.

Additional aspects and advantages of the present invention will be given in part in the following description; they will become apparent from the description or may be learned through practice of the invention.

Brief Description of the Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

图1为本发明实施例提供的一种铁路接触网鸟窝检测方法的实现原理图;FIG1 is a schematic diagram showing a method for detecting bird nests in a railway overhead contact network according to an embodiment of the present invention;

图2为本发明实施例提供的一种铁路接触网鸟窝检测方法的处理流程图;FIG2 is a processing flow chart of a method for detecting bird nests in a railway contact network provided by an embodiment of the present invention;

图3为本发明实施例提供的一种对图片进行初步分割和合并生成基本区域的处理流程图;FIG3 is a flowchart of a process for performing preliminary segmentation and merging of images to generate basic regions provided by an embodiment of the present invention;

图4(a)为本发明实施例提供的一种原始图片示意图,图4(b)为进行初步分割和合并后得到一系列基本区域的示意图。FIG. 4( a ) is a schematic diagram of an original image provided by an embodiment of the present invention, and FIG. 4( b ) is a schematic diagram of a series of basic regions obtained after preliminary segmentation and merging.

图5为本发明实施例提供的一种根据区域间差异合并基本区域,得到一系列候选区域的处理流程图;FIG5 is a flowchart of a process for merging basic regions according to differences between regions to obtain a series of candidate regions provided by an embodiment of the present invention;

图6为本发明实施例提供的一种用矩形框标注出兴趣域的示意图;FIG6 is a schematic diagram of marking a region of interest with a rectangular frame according to an embodiment of the present invention;

图7为本发明实施例提供的一种待检测图片与模板图片进行模板匹配的处理流程图;FIG7 is a processing flow chart of template matching between a to-be-detected image and a template image provided by an embodiment of the present invention;

图8(a)为本发明实施例提供的一种原始图片示意图,图8(b)为原始图片的模板匹配的示意图;FIG8(a) is a schematic diagram of an original picture provided by an embodiment of the present invention, and FIG8(b) is a schematic diagram of template matching of the original picture;

图9为本发明实施例提供的一种将一幅图片分割成S×S个网格的示意图;FIG9 is a schematic diagram of dividing a picture into S×S grids provided by an embodiment of the present invention;

图10为本发明实施例提供的一种单级网络预测示意图;FIG10 is a schematic diagram of a single-stage network prediction provided by an embodiment of the present invention;

图11为本发明实施例提供的一种级联网络的第一级网络预测示意图;FIG11 is a schematic diagram of a first-level network prediction of a cascade network provided by an embodiment of the present invention;

图12为本发明实施例提供的一种级联网络的第二级网络预测示意图;FIG12 is a schematic diagram of a second-level network prediction of a cascade network provided by an embodiment of the present invention;

图13为本发明实施例提供的一种YOLOv3-GIOU训练过程中的IOU曲线示意图;FIG13 is a schematic diagram of an IOU curve in a YOLOv3-GIOU training process provided by an embodiment of the present invention;

图14为本发明实施例提供的一种YOLOv4-CIOU训练过程中的IOU曲线示意图;FIG14 is a schematic diagram of an IOU curve in a YOLOv4-CIOU training process provided by an embodiment of the present invention;

图15为本发明实施例提供的一种YOLOv3-SPP第二级检测与直接检测IOU对比示意图;FIG15 is a schematic diagram of an IOU comparison between a YOLOv3-SPP second-level detection and a direct detection provided by an embodiment of the present invention;

图16为本发明实施例提供的一种YOLOv4第二级检测与直接检测IOU对比示意图。FIG16 is a schematic diagram of an IOU comparison between a YOLOv4 second-level detection and a direct detection provided by an embodiment of the present invention.

具体实施方式DETAILED DESCRIPTION

下面详细描述本发明的实施方式,所述实施方式的示例在附图中示出,其中自始至终相同或类似的标号表示相同或类似的元件或具有相同或类似功能的元件。下面通过参考附图描述的实施方式是示例性的,仅用于解释本发明,而不能解释为对本发明的限制。The embodiments of the present invention are described in detail below, examples of which are shown in the accompanying drawings, wherein the same or similar reference numerals throughout represent the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain the present invention, and cannot be interpreted as limiting the present invention.

本技术领域技术人员可以理解,除非特意声明,这里使用的单数形式“一”、“一个”、“所述”和“该”也可包括复数形式。应该进一步理解的是,本发明的说明书中使用的措辞“包括”是指存在所述特征、整数、步骤、操作、元件和/或组件,但是并不排除存在或添加一个或多个其他特征、整数、步骤、操作、元件、组件和/或它们的组。应该理解,当我们称元件被“连接”或“耦接”到另一元件时,它可以直接连接或耦接到其他元件,或者也可以存在中间元件。此外,这里使用的“连接”或“耦接”可以包括无线连接或耦接。这里使用的措辞“和/或”包括一个或更多个相关联的列出项的任一单元和全部组合。It will be understood by those skilled in the art that, unless expressly stated, the singular forms "one", "said", and "the" used herein may also include plural forms. It should be further understood that the term "comprising" used in the specification of the present invention refers to the presence of the features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when we refer to an element as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or there may be intermediate elements. In addition, the "connection" or "coupling" used herein may include wireless connection or coupling. The term "and/or" used herein includes any unit and all combinations of one or more associated listed items.

本技术领域技术人员可以理解,除非另外定义,这里使用的所有术语(包括技术术语和科学术语)具有与本发明所属领域中的普通技术人员的一般理解相同的意义。还应该理解的是,诸如通用字典中定义的那些术语应该被理解为具有与现有技术的上下文中的意义一致的意义,并且除非像这里一样定义,不会用理想化或过于正式的含义来解释。It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as those generally understood by those skilled in the art in the art to which the present invention belongs. It should also be understood that terms such as those defined in common dictionaries should be understood to have meanings consistent with the meanings in the context of the prior art, and will not be interpreted with idealized or overly formal meanings unless defined as herein.

为便于对本发明实施例的理解,下面将结合附图以几个具体实施例为例做进一步的解释说明,且各个实施例并不构成对本发明实施例的限定。To facilitate understanding of the embodiments of the present invention, several specific embodiments will be further explained below with reference to the accompanying drawings, and each embodiment does not constitute a limitation on the embodiments of the present invention.

本发明基于卷积神经网络的方法使用卷积核作为特征提取器,图片可以直接作为网络的输入,通过训练得到的鸟窝特征,避免了传统识别算法中复杂的特征提取和数据重建过程,因此在准确率和召回率上能有明显的提升。The method based on convolutional neural network of the present invention uses convolution kernel as feature extractor, and the picture can be directly used as the input of the network. The bird nest features obtained through training avoid the complicated feature extraction and data reconstruction process in the traditional recognition algorithm, so the accuracy and recall rate can be significantly improved.

本发明首先使用Selective Search算法聚类得到多个候选区域,根据鸟窝的位置从候选区域中找出兴趣域。将得到的所有兴趣域在图片集中进行模板匹配,标注出所有图片的兴趣域。构建YOLO网络对所有图片的兴趣域进行识别训练,同时构建YOLO网络对兴趣域中的鸟窝进行识别训练。对未知的图片样本进行第一级识别找出兴趣域,再进行第二级识别从兴趣域中找出鸟窝。这种级联识别可以大大提高识别的准确率和提高计算效率。The present invention first uses the Selective Search algorithm to cluster multiple candidate regions, and finds the domain of interest from the candidate regions according to the location of the bird's nest. Template matching is performed on all the obtained domains of interest in the picture set, and the domains of interest of all pictures are marked. A YOLO network is constructed to perform recognition training on the domains of interest of all pictures, and a YOLO network is constructed to perform recognition training on the bird's nest in the domain of interest. The first level recognition is performed on the unknown picture sample to find the domain of interest, and then the second level recognition is performed to find the bird's nest from the domain of interest. This cascade recognition can greatly improve the recognition accuracy and improve the calculation efficiency.

实施例一Embodiment 1

本发明实施例提供的一种铁路接触网鸟窝检测方法的实现原理图如图1所示,处理流程如图2所示,包括如下的处理步骤:The implementation principle diagram of a railway contact network bird nest detection method provided by an embodiment of the present invention is shown in FIG1 , and the processing flow is shown in FIG2 , which includes the following processing steps:

步骤S210:根据铁路接触网图片数据集中包含鸟窝的图片通过逆向推理得到包含鸟窝区域的兴趣域图片,将所述兴趣域图片作为模板图片,根据所有模板图片构成模板库。Step S210: obtaining a domain of interest image containing a bird nest area through reverse reasoning based on images containing bird nests in the railway contact network image dataset, using the domain of interest image as a template image, and constructing a template library based on all template images.

兴趣域是一个区域,而区域是一个具有较高相似度的整体,首先我们要在图片中找出所有具有较高相似度的区域集合。而寻找区域就是对图片进行分割与合并的过程。The domain of interest is a region, and a region is a whole with high similarity. First, we need to find all the regions with high similarity in the image. Finding regions is the process of segmenting and merging images.

首先对铁路接触网图片数据集中包含鸟窝的图片进行初步分割,得到大量具有相似度的基本区域,然后对基本区域进行合并得到一系列候选区域。本发明实施例提供的一种对图片进行初步分割和合并生成基本区域的处理流程图如图3所示,处理过程包括:一幅图片可以由无向图G=<V,E>表示,其中无向图的顶点表示图片的一个像素点,边e=(vi,vj)的权重表示相邻顶点对i,j的不相似度,可以用像素的颜色距离等像素属性表示两个像素间的不相似度w(e)。一个基本区域为具有最小不相似度的点集,所以基本区域是包含点集的最小生成树,图片的初步分割即找出图片的最小生成树构成的森林。First, the images containing bird nests in the railway contact network image data set are preliminarily segmented to obtain a large number of basic regions with internal similarity, and the basic regions are then merged to obtain a series of candidate regions. A processing flow chart for preliminary segmentation and merging of images to generate basic regions provided by an embodiment of the present invention is shown in FIG3. The process is as follows: a picture can be represented by an undirected graph G=<V,E>, where a vertex of the undirected graph represents a pixel of the picture, and the weight of an edge e=(vi,vj) represents the dissimilarity of the adjacent vertex pair i,j. Pixel attributes such as the colour distance between pixels can be used to represent the dissimilarity w(e) between two pixels. A basic region is a point set with minimum dissimilarity, so a basic region is a minimum spanning tree containing the point set, and the preliminary segmentation of the picture is to find the forest composed of the minimum spanning trees of the picture.

差异决定了基本区域间是否合并,基本区域的类内差异定义为:The difference determines whether the basic regions are merged. The intra-class difference of a basic region is defined as:

Int(C)=max_{e∈MST(C,E)} w(e)

式中C表示合并过程的一个基本区域,e表示区域内部最小生成树MST(C,E)的连接边,其权重代表了像素间的不相似度,类内差异Int(C)为区域内部连接边的最大值。Where C represents a basic region in the merging process, e represents a connecting edge of the minimum spanning tree MST(C,E) inside the region, its weight represents the dissimilarity between pixels, and the intra-class difference Int(C) is the maximum edge weight inside the region.

类间差异定义为两个基本区域的最小连接边:The inter-class difference is defined as the minimum connecting edge between two basic regions:

Diff(C1,C2)=min_{vi∈C1,vj∈C2,(vi,vj)∈E} w(vi,vj)

式中C1与C2代表两个不同的区域,vi与vj代表两个不同区域连接边的两个顶点,类间差异Diff(C1,C2)即为连接两个区域的连接边的最小值。In the formula, C1 and C2 represent two different regions, vi and vj represent the two vertices of a connecting edge between the two regions, and the inter-class difference Diff(C1,C2) is the minimum weight of the edges connecting the two regions.

特别地,如果两个基本区域没有边相连,Diff(C1,C2)=∞In particular, if two basic regions are not connected by edges, Diff(C 1 , C 2 ) = ∞

于是得到了两个基本区域合并的依据,当满足条件:So we get the basis for merging two basic regions when the conditions are met:

Diff(C1,C2)≤min(Int(C1)+τ(C1),Int(C2)+τ(C2))Diff(C 1 , C 2 )≤min(Int(C 1 )+τ(C 1 ), Int(C 2 )+τ(C 2 ))

则判断基本区域C1和C2能够合并Then it is judged that the basic areas C1 and C2 can be merged

其中τ(C)为阈值函数,使孤立点构成的区域具有权重:Where τ(C) is the threshold function, which makes the area composed of isolated points have weight:

τ(C)=k/||C||τ(C)=k/||C||

式中k为人为设定的参数,||C||为基本区域C的顶点个数,通过调整k的大小可以控制算法分割区域的大小。In the formula, k is a manually set parameter, ||C|| is the number of vertices in the basic area C, and the size of the algorithm segmentation area can be controlled by adjusting the size of k.
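上述合并判据可以用如下Python草图直接表达(函数名与k的默认值为示意性假设,并非专利原文内容):The merge criterion above can be sketched directly in Python (the function names and the default value of k are illustrative assumptions, not from the patent):

```python
def tau(k, region_size):
    # Threshold term tau(C) = k / ||C||: gives small regions extra slack
    return k / region_size

def can_merge(diff, int_c1, size_c1, int_c2, size_c2, k=300):
    # Merge C1 and C2 iff
    # Diff(C1,C2) <= min(Int(C1)+tau(C1), Int(C2)+tau(C2))
    return diff <= min(int_c1 + tau(k, size_c1),
                       int_c2 + tau(k, size_c2))
```

k越大,合并进行得越充分,得到的分割区域越大。The larger k is, the longer merging continues and the larger the resulting segments are.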

如此进行图片点集的合并,得到多个基本区域。图4(a)为本发明实施例提供的一种原始图片示意图,图4(b)为进行初步分割和合并后得到一系列基本区域的示意图。In this way, the image point sets are merged to obtain multiple basic regions. FIG4(a) is a schematic diagram of an original image provided by an embodiment of the present invention, and FIG4(b) is a schematic diagram of a series of basic regions obtained after preliminary segmentation and merging.

图5为本发明实施例提供的一种根据区域间差异合并基本区域,得到一系列候选区域的处理流程图。具体处理过程包括:对以上形成的基本区域进行合并,推理出鸟窝所在的兴趣域。以矩形表示基本区域及其合并后的结果,矩形的位置可用(x,y,w,h)的四元组表示。式中x,y代表矩形框左上角的坐标,w,h代表矩形框的宽度和高度。FIG5 is a flowchart of a process for merging basic regions according to differences between regions to obtain a series of candidate regions provided by an embodiment of the present invention. The specific processing process includes: merging the basic regions formed above, and inferring the domain of interest where the bird's nest is located. The basic regions and their merged results are represented by rectangles, and the position of the rectangle can be represented by a four-tuple of (x, y, w, h). Where x, y represent the coordinates of the upper left corner of the rectangular box, and w, h represent the width and height of the rectangular box.

对于区域C,其对应的表示矩形位置属性计算方法如下:For a region C, the position attributes of its corresponding rectangle are calculated as follows:

x=min{x(v):v∈C},y=min{y(v):v∈C}

w=max{x(v):v∈C}−x,h=max{y(v):v∈C}−y

即取区域点集的最小外接矩形。That is, the minimum bounding rectangle of the point set of the region is taken.

首先计算区域间的差异,区域间差异可由四项评价指标评估:First, the differences between regions are calculated. The differences between regions can be evaluated by four evaluation indicators:

颜色距离,取决于两个区域颜色直方图各个bins的最小值:Colour distance, determined by the minimum value of each bin of the colour histograms of the two regions:

Scolour(ri,rj)=Σ_{k=1}^{n} min(ci^k, cj^k)

统计区域ri对应矩形图像不同颜色通道的各个bins中的像素点个数得到n维颜色直方图Ci={ci^1,…,ci^n},式中ci^k表示颜色直方图的第k个bin的像素点比例。The n-dimensional colour histogram Ci={ci^1,…,ci^n} is obtained by counting the pixels in each bin of the different colour channels of the rectangular image corresponding to region ri, where ci^k represents the pixel proportion of the k-th bin of the colour histogram.

纹理距离,取决于两个区域的快速sift特征直方图各个bins的最小值:Texture distance, determined by the minimum value of each bin of the fast SIFT feature histograms of the two regions:

Stexture(ri,rj)=Σ_{k=1}^{n} min(ti^k, tj^k)

统计区域ri对应矩形图像不同颜色通道的各个bins中的每个sift特征的像素点数得到n维纹理直方图Ti={ti^1,…,ti^n},式中ti^k表示纹理直方图第k维像素点比例。The n-dimensional texture histogram Ti={ti^1,…,ti^n} is obtained by counting the pixels of each SIFT feature in each bin of the different colour channels of the rectangular image corresponding to region ri, where ti^k represents the pixel proportion of the k-th dimension of the texture histogram.

优先进行小区域间的合并,对小区域给予更高的合并权重:Prioritize the merging of small regions by giving them a higher merging weight:

Ssize(ri,rj)=1−(size(ri)+size(rj))/size(im)

式中size(ri)表示区域ri对应矩形图像大小,size(im)表示原始待分割图片的大小。Where size(ri) represents the size of the rectangular image corresponding to region ri, and size(im) represents the size of the original image to be segmented.

优先合并外接矩形的重合面积大的区域:Prioritize merging regions whose bounding rectangle leaves little empty space:

Sfill(ri,rj)=1−(size(BBij)−size(ri)−size(rj))/size(im)

式中size(BBij)表示区域ri和rj的外接矩形大小,其余参数意义同上。Where size(BBij) represents the size of the bounding rectangle of regions ri and rj; the other parameters have the same meaning as above.

加权以上差异,得出区域间总差异:Weighting the above differences, we get the total difference between regions:

S(ri,rj)=a1Scolour(ri,rj)+a2Stexture(ri,rj)+a3Ssize(ri,rj)+a4Sfill(ri,rj)S (r i , r j ) = a 1 S color (r i , r j ) + a 2 S texture (r i , r j ) + a 3 S size (r i , r j ) + a 4 S fill (r i , rj )

a1,a2,a3,a4为对应的权重值。a 1 , a 2 , a 3 , a 4 are the corresponding weight values.
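四项相似度及其加权和可以用如下Python草图表示(直方图交并同时代表颜色与纹理距离,函数名为示意性假设):The four similarity terms and their weighted sum can be sketched in Python as follows (histogram intersection stands in for both the colour and texture distances; function names are illustrative):

```python
def hist_similarity(h1, h2):
    # Histogram intersection: sum of per-bin minima; used for both the
    # colour histograms and the texture (SIFT-feature) histograms
    return sum(min(a, b) for a, b in zip(h1, h2))

def size_similarity(size_i, size_j, size_im):
    # S_size = 1 - (size(ri) + size(rj)) / size(im): small regions first
    return 1.0 - (size_i + size_j) / size_im

def fill_similarity(size_bb, size_i, size_j, size_im):
    # S_fill = 1 - (size(BBij) - size(ri) - size(rj)) / size(im)
    return 1.0 - (size_bb - size_i - size_j) / size_im

def total_similarity(s_colour, s_texture, s_size, s_fill,
                     a=(1.0, 1.0, 1.0, 1.0)):
    # S(ri,rj) = a1*S_colour + a2*S_texture + a3*S_size + a4*S_fill
    return (a[0] * s_colour + a[1] * s_texture
            + a[2] * s_size + a[3] * s_fill)
```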

前面的条件分割是对原始待分割图片进行一次基于像素点差异的分割,它给出了每个像素点属于哪个基本区域,而基本区域是一个连续的像素点集,它的形状是不规则的。之后判定条件进行的分割是基于第一次基本分割的结果,第二次分割首先用矩形框包围出第一次分割的不规则点集,得到一系列矩形区域,矩形区域的范围大于被包围的基本区域,需要对这些矩形区域再次进行更加严格的相似度判定,然后根据相似度合并这些矩形区域,得到了最终的候选区域,含有鸟窝的候选区域即为寻找的兴趣域。第一次分割只根据像素点是否“相似”进行了一次粗略的分割,而第二次分割在第一次的基础上,加入了“大小”,“特征”,“形状”等方面的考量,得到了最终含有鸟窝的矩形分割区域。The previous conditional segmentation is a segmentation of the original image to be segmented based on pixel differences. It gives which basic area each pixel belongs to, and the basic area is a continuous set of pixels with irregular shapes. The segmentation performed by the judgment condition is based on the result of the first basic segmentation. The second segmentation first uses a rectangular frame to enclose the irregular point set of the first segmentation to obtain a series of rectangular areas. The range of the rectangular area is larger than the enclosed basic area. These rectangular areas need to be judged more strictly for similarity again, and then these rectangular areas are merged according to the similarity to obtain the final candidate area. The candidate area containing the bird's nest is the area of interest to be found. The first segmentation only performs a rough segmentation based on whether the pixels are "similar", while the second segmentation adds "size", "features", "shape" and other considerations on the basis of the first segmentation to obtain the final rectangular segmentation area containing the bird's nest.

对输入数据集的鸟窝位置进行手工标注,标注的鸟窝区域属性仍然用矩形表示,设其位置属性为(bx,by,bw,bh)。兴趣域为包含鸟窝区域的候选区域,其位置坐标应满足:The bird nest locations in the input data set are manually annotated. The annotated bird nest area is still represented by a rectangle, with position attributes (bx,by,bw,bh). The region of interest is a candidate region that contains the bird nest area, and its position coordinates should satisfy:

x≤bx,y≤by,x+w≥bx+bw,y+h≥by+bh

同时防止过度合并,兴趣域同时应满足阈值条件:To prevent over-merging, the domain of interest should also meet the threshold conditions:

鸟窝所在的候选区域即为推理得出的兴趣域,如图6所示,矩形框标注出即为推理得到的兴趣域。The candidate area where the bird's nest is located is the inferred interest area, as shown in Figure 6, where the rectangular box marks the inferred interest area.
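候选区域与鸟窝标注框之间的包含关系可以按如下方式检查(草图,未包含防止过度合并的阈值判断):The containment relation between a candidate region and the annotated nest rectangle can be checked as follows (a sketch; the over-merge threshold test is omitted):

```python
def contains(region, nest):
    # Both rectangles are (x, y, w, h); True iff the candidate region
    # fully encloses the annotated bird-nest rectangle
    x, y, w, h = region
    bx, by, bw, bh = nest
    return x <= bx and y <= by and x + w >= bx + bw and y + h >= by + bh
```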

步骤S220:将铁路接触网图片数据集中不包含鸟窝的图片与所述模板库中的每个模板图片依次进行匹配,得到兴趣域图片数据集。Step S220: matching the pictures that do not contain bird nests in the railway contact network picture dataset with each template picture in the template library in sequence to obtain a domain of interest picture dataset.

图7为本发明实施例提供的一种待检测图片与模板图片进行模板匹配的处理流程图,具体处理过程包括:将上述推理得出的兴趣域的图片作为模板元素图片,所有模板元素图片构成模板库。利用上述模板库对铁路接触网图片数据集中不包含鸟窝的图片进行模板匹配,即遍历每一张图片,采用归一化相关系数匹配标注出待检测图片的所有兴趣域。FIG7 is a processing flow chart of template matching between a to-be-detected image and a template image provided by an embodiment of the present invention. The specific processing process includes: using the image of the interest domain derived by the above inference as a template element image, and all template element images constitute a template library. The above template library is used to perform template matching on the images that do not contain bird nests in the railway contact network image dataset, that is, traversing each image, and using the normalized correlation coefficient to match and mark all interest domains of the to-be-detected image.

设模板图片为T,不包含鸟窝的待匹配图片为I,设模板图片的宽为w,高为h,R表示匹配结果,则匹配方法可由如下公式表述:Let the template image be T, the image to be matched (containing no bird nest) be I, the width of the template image be w, its height be h, and R represent the matching result. The matching method can then be expressed as follows:

R(x,y)=Σ_{x',y'}(T'(x',y')·I'(x+x',y+y'))/sqrt(Σ_{x',y'}T'(x',y')²·Σ_{x',y'}I'(x+x',y+y')²)

式中:Where:

T'(x',y')=T(x',y')−mean(T),I'(x+x',y+y')=I(x+x',y+y')−mean(I_{x,y}),mean(I_{x,y})为以(x,y)为左上角、大小为(w,h)的待匹配子图的均值。T' and I' are the template and the candidate window with their respective means subtracted, mean(I_{x,y}) being the mean of the (w,h) window whose top-left corner is (x,y).

R值越大,代表待匹配图片在(x,y)位置大小为(w,h)的矩形区域与模板相似度越高,取模板相似度最大值为模板匹配的结果,并且要求模板匹配值高于阈值参数。The larger the R value, the higher the similarity between the (w,h) rectangular window of the image at position (x,y) and the template; the maximum similarity is taken as the template matching result, and it is required to be higher than a threshold parameter.

记Rs(T,I)=max_{(x,y)∈I} R(x,y)。Denote Rs(T,I)=max_{(x,y)∈I} R(x,y).

模板匹配首先将待匹配图片与模板库中的每个模板图片依次进行匹配,每个模板图片都对应一个最佳匹配值Rs,Rs对应的矩形匹配框的位置为(x,y,w,h)。模板初次匹配结果构成结果集S:Template matching first matches the image to be matched with each template image in the template library in turn. Each template image corresponds to a best matching value Rs, and the position of the corresponding rectangular matching box is (x,y,w,h). The initial template matching results constitute the result set S:

S={(x,y,w,h,Rs(T,I)):T∈模板库,Rs(T,I)≥c}

式中c为匹配的阈值参数。Where c is the matching threshold parameter.
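归一化相关系数模板匹配可以用如下纯Python草图表示(实际系统通常调用优化过的库函数;此处图像用灰度值行列表表示):The normalized-correlation template matching can be sketched in pure Python as follows (a real system would call an optimized library routine; images are lists of gray-value rows here):

```python
import math

def ncc(patch, template):
    # Normalized correlation coefficient of two equal-length flat lists
    mp = sum(patch) / len(patch)
    mt = sum(template) / len(template)
    dp = [p - mp for p in patch]
    dt = [t - mt for t in template]
    num = sum(a * b for a, b in zip(dt, dp))
    den = math.sqrt(sum(a * a for a in dt) * sum(b * b for b in dp))
    return num / den if den else 0.0

def match_template(image, tpl):
    # Slide tpl over image and return the best match Rs together with
    # the top-left position (x, y) of the best matching window
    H, W = len(image), len(image[0])
    h, w = len(tpl), len(tpl[0])
    flat_t = [v for row in tpl for v in row]
    best = (-2.0, 0, 0)
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            flat_p = [image[y + dy][x + dx]
                      for dy in range(h) for dx in range(w)]
            r = ncc(flat_p, flat_t)
            if r > best[0]:
                best = (r, x, y)
    return best
```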

对结果集S按Rs值降序排列,结果集中各个矩形匹配框可能相交。The result set S is sorted in descending order by Rs value. The rectangular matching boxes in the result set may intersect.

对于两矩形匹配框s,t矩形相交的条件为:For two rectangular matching boxes s and t, the condition for the intersection of rectangles is:

max(x(s),x(t))≤min(x(s)+w(s),x(t)+w(t))max(x(s),x(t))≤min(x(s)+w(s),x(t)+w(t))

max(y(s),y(t))≤min(y(s)+h(s),y(t)+h(t))max(y(s),y(t))≤min(y(s)+h(s),y(t)+h(t))

依次遍历结果集S,若当前矩形匹配框与已标注矩形匹配框相交则放弃标注,否则对当前矩形匹配框进行VOC格式的标注,所有标注的矩形匹配框构成兴趣域数据集。模板匹配算法得到的最佳匹配结果为矩形框,其匹配结果为(x,y,w,h);对于单类物体,其label.txt文件的格式为“0 x y w h”(一行),一行代表一个矩形标注框。之后使用txttoxml脚本把txt文件标注格式转化为深度学习训练的VOC格式的xml文件。Traverse the result set S in sequence. If the current rectangular matching box intersects an already annotated box, the annotation is abandoned; otherwise the current box is annotated in VOC format. All annotated boxes constitute the interest-domain data set. The best match obtained by the template matching algorithm is a rectangular box (x,y,w,h); for a single-class object, the format of the label.txt file is "0 x y w h" (one line), where each line represents one annotated rectangle. The txttoxml script is then used to convert the txt annotations into xml files in the VOC format used for deep learning training.
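按Rs降序遍历并结合上述相交判定,可以写成如下贪心筛选草图(名称为示意性假设):Combining the descending-Rs traversal with the intersection test above gives the following greedy-selection sketch (names are illustrative):

```python
def intersects(s, t):
    # Intersection test for two (x, y, w, h) rectangles,
    # matching the conditions stated in the text
    xs, ys, ws, hs = s
    xt, yt, wt, ht = t
    return (max(xs, xt) <= min(xs + ws, xt + wt) and
            max(ys, yt) <= min(ys + hs, yt + ht))

def select_matches(results):
    # results: [(Rs, (x, y, w, h)), ...]; keep mutually disjoint boxes,
    # visiting them in descending order of the match value Rs
    kept = []
    for rs, box in sorted(results, reverse=True):
        if all(not intersects(box, b) for b in kept):
            kept.append(box)
    return kept
```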

图8(a)为本发明实施例提供的一种原始图片示意图,图8(b)为原始图片的模板匹配的示意图;FIG8(a) is a schematic diagram of an original picture provided by an embodiment of the present invention, and FIG8(b) is a schematic diagram of template matching of the original picture;

步骤S230:构建基于级联神经网络的接触网鸟窝检测模型,并对接触网鸟窝检测模型进行训练。上述接触网鸟窝检测模型包括第一级YOLO检测器和第二级YOLO检测器,利用上述模板库对第二级YOLO检测器进行训练,利用上述兴趣域图片数据集对第一级YOLO检测器进行训练。Step S230: construct a contact network bird nest detection model based on a cascade neural network, and train the contact network bird nest detection model. The contact network bird nest detection model includes a first-level YOLO detector and a second-level YOLO detector. The second-level YOLO detector is trained using the template library, and the first-level YOLO detector is trained using the domain of interest image dataset.

第一级YOLO检测器使用的是模板库模板匹配之后的3900张图片,每张图片的label文件在进行模板匹配过程中得出,使用yolov3-spp神经网络进行训练。The first-level YOLO detector uses 3900 images after template matching in the template library. The label file of each image is obtained during the template matching process and trained using the yolov3-spp neural network.

第二级YOLO检测器的图片为逆向推理算法得到的兴趣域(里面含有鸟窝),鸟窝的位置为相对兴趣域的相对位置,在逆向推理算法过程就可以得出label.txt。使用yolov3-spp神经网络进行训练。The image of the second-level YOLO detector is the interest domain obtained by the reverse reasoning algorithm (which contains the bird's nest). The position of the bird's nest is the relative position relative to the interest domain. The label.txt can be obtained in the reverse reasoning algorithm process. The yolov3-spp neural network is used for training.

步骤S240:将待检测图片输入训练后的第一级YOLO检测器,所述第一级YOLO检测器输出兴趣域图片,将所述兴趣域图片输入训练后的第二级YOLO检测器,所述第二级YOLO检测器输出所述待检测图片的鸟窝检测结果。Step S240: input the image to be detected into the trained first-level YOLO detector, the first-level YOLO detector outputs the domain of interest image, input the domain of interest image into the trained second-level YOLO detector, the second-level YOLO detector outputs the bird nest detection result of the image to be detected.

接触网鸟窝的体积较小,缺乏显著的形状和纹理特征,采用已有的人工设计特征对条纹图片进行分类难以得到理想的结果。对此,深度学习提供了一种可行的解决方案,YOLO神经网络作为一种常见的目标检测网络具有强大的检测性能。本发明利用两级预测网络级联的检测网络进行接触网鸟窝检测,采用YOLOv3-SPP网络结构作为分级检测器。The size of the bird's nest in the contact network is small and lacks significant shape and texture features. It is difficult to obtain ideal results by using existing artificially designed features to classify stripe images. In this regard, deep learning provides a feasible solution. As a common target detection network, the YOLO neural network has powerful detection performance. The present invention uses a detection network with a two-level prediction network cascade to detect the bird's nest in the contact network, and uses the YOLOv3-SPP network structure as a hierarchical detector.

YOLO神经网络把整张图片作为输入,输出预测的边界框和其所属的类别。The YOLO neural network takes the entire image as input and outputs the predicted bounding box and the category it belongs to.

算法开始将一幅图片分割成S×S个网格,如图9所示,标注物体的中心所在的网格负责预测对应的标注物体。每个网格需要预测B个边界框,每个边界框需要回归自身位置(x,y,w,h)以及置信度。The algorithm starts by dividing an image into S×S grids, as shown in Figure 9. The grid where the center of the labeled object is located is responsible for predicting the corresponding labeled object. Each grid needs to predict B bounding boxes, and each bounding box needs to regress its own position (x, y, w, h) and confidence.

YOLOv3通过改变卷积核的步长来改变网络中传播张量的尺寸。YOLOv3通过提前计算出边界框来提高模型的预测速度。YOLOv3通过预测边界框中心点与对应网格左上角位置的相对偏移量来决定边界框的位置。并且对tx、ty做归一化处理,使得边界框预测值在0和1之间,这样可以保证边界框的中心点一定在划分的网格中。YOLOv3 changes the size of the propagation tensor in the network by changing the step size of the convolution kernel. YOLOv3 improves the prediction speed of the model by calculating the bounding box in advance. YOLOv3 determines the position of the bounding box by predicting the relative offset between the center point of the bounding box and the upper left corner of the corresponding grid. And normalize t x and ty so that the bounding box prediction value is between 0 and 1, which can ensure that the center point of the bounding box is definitely in the divided grid.

bx=σ(tx)+cx

by=σ(ty)+cy

bw=pw·e^(tw)

bh=ph·e^(th)

tx、ty、tw、th是模型的预测输出。cx和cy表示网格的坐标,pw和ph表示先验锚框(预测前边界框)的大小,bx、by、bw和bh是预测得到的边界框的中心坐标和大小。tx, ty, tw, and th are the predicted outputs of the model. cx and cy represent the grid coordinates, pw and ph represent the size of the prior anchor box (the bounding box before prediction), and bx, by, bw, and bh are the centre coordinates and size of the predicted bounding box.
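该解码步骤可以用如下Python草图表示(宽高解码式bw=pw·e^tw、bh=ph·e^th为标准YOLOv3形式,此处作为假设给出):The decoding step can be sketched as follows (the width/height equations bw=pw·e^tw and bh=ph·e^th are the standard YOLOv3 form, assumed here):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    # Centre offsets are squashed by sigmoid so the centre stays inside
    # the responsible grid cell; width/height scale the prior anchor box
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)
    bh = ph * math.exp(th)
    return bx, by, bw, bh
```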

置信度表示网格是否正确预测待检测物体以及边界框与物体真实位置的偏差,置信度可用如下公式表达:The confidence indicates whether the grid correctly predicts the object to be detected and the deviation of the bounding box from the true position of the object. The confidence can be expressed as follows:

confidence=Pr(Object)×IOU(truth,pred)

式中IOU为边界框与物体标注框的交并比,其计算方法如下:Where IOU is the intersection-over-union of the bounding box and the object annotation box, calculated as follows:

IOU(truth,pred)=area(truth∩pred)/area(truth∪pred)

其中(tx,ty,tw,th)代表标注框truth位置属性(x,y,w,h)。Where (tx,ty,tw,th) represents the position attributes (x,y,w,h) of the annotation box truth.

(px,py,pw,ph)代表边界框pred位置属性(x,y,w,h)。(px, py, pw, ph) represents the bounding box pred position attributes (x, y, w, h).

Pr(Object)表示网格存在待预测的物体的概率。在训练过程中,如果没有待测物体物体落入网格中,那么Pr(Object)为0,否则为1。Pr(Object) represents the probability that the object to be predicted exists in the grid. During the training process, if no object to be predicted falls into the grid, then Pr(Object) is 0, otherwise it is 1.
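置信度中用到的两个(x,y,w,h)矩形的交并比可按如下方式计算:The intersection-over-union of two (x,y,w,h) rectangles used in the confidence can be computed as follows:

```python
def iou(truth, pred):
    # IOU of two axis-aligned rectangles given as (x, y, w, h)
    tx, ty, tw, th = truth
    px, py, pw, ph = pred
    ix = max(0.0, min(tx + tw, px + pw) - max(tx, px))
    iy = max(0.0, min(ty + th, py + ph) - max(ty, py))
    inter = ix * iy
    union = tw * th + pw * ph - inter
    return inter / union if union else 0.0
```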

在训练过程,每个网格对应一个像素矩阵,每个像素矩阵作为神经网络的输入。对于每个网格,网络对于每个边界框给出的输出为(x,y,w,h,conf,c1......cn),其中(x,y,w,h)给出了边界框的位置,conf为边界框的置信度,c1......cn表示物体的类别概率。During the training process, each grid corresponds to a pixel matrix, and each pixel matrix is used as the input of the neural network. For each grid, the network outputs (x, y, w, h, conf, c 1 ... c n ) for each bounding box, where (x, y, w, h) gives the position of the bounding box, conf is the confidence of the bounding box, and c 1 ... c n represents the class probability of the object.

YOLOv3采用多尺度特征对目标进行检测。经过32倍下采样得到的特征图感受野比较大,适合检测图片中尺寸比较大的目标;经过16倍下采样得到具有中等尺度感受野的特征图,适合检测图片中中等尺寸的目标;经过8倍下采样得到的特征图感受野比较小,适合检测图片中尺寸比较小的目标。YOLOv3 uses multi-scale features to detect targets. The feature map obtained after 32 times downsampling has a relatively large receptive field, which is suitable for detecting larger targets in the image; the feature map obtained after 16 times downsampling has a medium-scale receptive field, which is suitable for detecting medium-sized targets in the image; the feature map obtained after 8 times downsampling has a relatively small receptive field, which is suitable for detecting smaller targets in the image.

YOLOv3采用K-means聚类方法得到先验框的尺寸,并且为每一种下采样尺度设定3种不同大小的先验框,一共聚类出9种尺寸的先验框。当输入图片的分辨率是416*416时,得到的9个先验框在不同大小的特征图上的分配如表1所示。YOLOv3 uses the K-means clustering method to obtain the size of the prior frame, and sets 3 prior frames of different sizes for each downsampling scale, clustering a total of 9 prior frames of different sizes. When the resolution of the input image is 416*416, the distribution of the 9 prior frames on feature maps of different sizes is shown in Table 1.

表1 特征图与先验框Table 1 Feature maps and prior boxes

直接使用YOLO网络进行接触网鸟窝的检测效果并不理想,因为鸟窝在图片中占比很小,导致大量的网格进行无效运算,浪费计算资源。Directly using the YOLO network to detect bird nests on the contact network is not ideal, because the bird nests account for a very small proportion of the image, resulting in invalid calculations on a large number of grids and a waste of computing resources.

而且YoloV3神经网络使用在数据集上聚类得到的先验锚框。Moreover, the YoloV3 neural network uses the prior anchor boxes obtained by clustering on the dataset.

对于小尺度物体,大尺寸锚框与中等尺寸锚框的IOU值不会很高,整体精度不高。For small-scale objects, the IOU values of the large and medium anchor boxes will not be very high, so the overall accuracy is not high.

图10为本发明实施例提供的一种单级网络预测示意图,如图10所示,左上角一个网格进行了有效计算。FIG10 is a schematic diagram of a single-stage network prediction provided by an embodiment of the present invention. As shown in FIG10 , a grid in the upper left corner performs effective calculation.

单网络的预测期望为:The prediction expectation of a single network is:

记:为有效预测框的平均IOU。Denote: the average IOU of the valid prediction boxes.

式中为鸟窝图片占比。In the formula The proportion of bird nest pictures.

在接触网环境中,受物理因素影响鸟窝的环境分布具有连续性,即鸟窝在接触网中的位置有界。上述得出的兴趣域即为鸟窝的全部可能出现区域。可以用公式表达如下:In the overhead line environment, the environmental distribution of bird nests is continuous due to physical factors, that is, the location of bird nests in the overhead line is bounded. The domain of interest obtained above is the total possible area where bird nests may appear. It can be expressed as follows:

P(distribution)=P(鸟窝存在于图片的兴趣域|图片含有鸟窝)=1P(distribution) = P(the bird's nest exists in the interest domain of the image | the image contains the bird's nest) = 1

基于鸟窝分布的先验条件,可用级联的YOLO神经网络先识别兴趣域,再从网络给出的兴趣域预测子图片中识别鸟窝的位置。Based on the prior condition of bird nest distribution, a cascaded YOLO neural network can be used to first identify the region of interest, and then identify the location of the bird nest from the predicted sub-image of the region of interest given by the network.
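两级级联推理的流程可以用如下草图表示;其中detect_zone、detect_nest与crop为假设的接口,分别代表训练好的第一级、第二级YOLO检测器与子图裁剪:The two-stage cascade inference can be sketched as follows; detect_zone, detect_nest and crop are assumed interfaces standing in for the trained first- and second-level YOLO detectors and sub-image cropping:

```python
def cascade_detect(image, detect_zone, detect_nest, crop):
    # Stage 1 proposes interest-region boxes on the full image; stage 2
    # finds nests inside each cropped sub-image, in sub-image coordinates
    nests = []
    for zx, zy, zw, zh in detect_zone(image):
        sub = crop(image, zx, zy, zw, zh)
        for nx, ny, nw, nh in detect_nest(sub):
            # Map the sub-image box back to full-image coordinates
            nests.append((zx + nx, zy + ny, nw, nh))
    return nests
```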

(1)第一级预测:(1) First level prediction:

图11为本发明实施例提供的一种级联网络的第一级网络预测示意图。分割效果如图11所示,黑色框为模板匹配给出的兴趣域,黑色分割线为yolo网络划分的网格。只有左上角的四个网格负责预测兴趣域,具有置信度,其它网格的置信度为0。Figure 11 is a schematic diagram of the first-level network prediction of a cascade network provided by an embodiment of the present invention. The segmentation effect is shown in Figure 11, the black box is the domain of interest given by template matching, and the black segmentation line is the grid divided by the yolo network. Only the four grids in the upper left corner are responsible for predicting the domain of interest and have confidence, and the confidence of other grids is 0.

第一级网络检测的置信度和期望如下:The confidence and expectation of the first-level network detection are as follows:

confidence(Zone)=Pr(Zone)×IOU(Zone,pred)

E(Zone)=Σ_{i=1}^{S²}Σ_{j=1}^{B} Pr_i(Zone)·IOU(Zone,pred_ij)

式中Pr(Zone)为当前网格具有待测物体(兴趣域)的概率,在训练过程中,待测网格包含兴趣域,Pr(Zone)为1,否则为0,为网格预测的标注框与兴趣域真实所在的矩形框的交并比。B为每个网格预测的标注框数,S2为图片的总划分网格数,为物体所在的所有网格做出预测的所有预测框IOU的平均值。i(Zone)为兴趣域区域大小,I(image)为原始图片的大小。E(Zone)为图片给出的总IOU之和,体现了对物体预测正确的整体把握程度。Where Pr(Zone) is the probability that the current grid has the object to be tested (domain of interest). During the training process, if the grid to be tested contains the domain of interest, Pr(Zone) is 1, otherwise it is 0. is the intersection-over-union ratio of the grid predicted annotation box and the rectangular box where the interest area actually is. B is the number of annotation boxes predicted for each grid, S 2 is the total number of grids divided into the image, The average IOU of all predicted boxes for all grids where the object is located. i(Zone) is the size of the region of interest, and I(image) is the size of the original image. E(Zone) is the sum of the total IOU given by the image, which reflects the overall degree of accuracy of the object prediction.

(2)第二级预测:(2) Second level prediction:

图12为本发明实施例提供的一种级联网络的第二级网络预测示意图。第二级Yolo网络以第一级网络预测的兴趣域子图片集作为输入,对子图集进行YOLO级联检测,其本质为网格的二次划分,尽可能地在训练过程中增大边界框的IOU来提高训练的准确度并且进行尽可能多的有效计算。Figure 12 is a schematic diagram of the second-level network prediction of a cascade network provided by an embodiment of the present invention. The second-level Yolo network takes the sub-image set of the domain of interest predicted by the first-level network as input, and performs YOLO cascade detection on the sub-image set, which is essentially a secondary division of the grid, and increases the IOU of the bounding box as much as possible during the training process to improve the accuracy of the training and perform as many effective calculations as possible.

第二级网络的置信度为:The confidence of the second-level network is:

confidence(birdnest)=Pr(Birdnest)×IOU(birdnest,pred)×P(distribution)

式中Pr(Birdnest)为当前网格具有待测物体(鸟窝)的概率,在训练过程中,待测网格包含鸟窝,Pr(Birdnest)为1,否则为0,为网格预测的标注框与鸟窝真实所在的矩形框的交并比。P(distribution)为图片的鸟窝存在于兴趣域中的概率,显然所有的鸟窝都存在于兴趣域中,所以此项为1。Where Pr(Birdnest) is the probability that the current grid has the object to be tested (bird nest). During the training process, if the grid to be tested contains a bird nest, Pr(Birdnest) is 1, otherwise it is 0. It is the intersection-over-union ratio of the grid predicted annotation box and the rectangular box where the bird’s nest actually is. P(distribution) is the probability that the bird’s nest in the image exists in the domain of interest. Obviously, all bird’s nests exist in the domain of interest, so this item is 1.

兴趣域中的鸟窝预测期望为:The expected bird nest prediction in the domain of interest is:

E(birdnest)=Σ_{i=1}^{S²}Σ_{j=1}^{B} Pr_i(birdnest)·IOU_ij(birdnest,pred)·confidence(Zone)

式中Pri(birdnest)为兴趣域子图像中,子图像中第i个网格具有包含鸟窝的概率,为子图像第i个网格预测的第j个矩形框与鸟窝图片的交并比,confidence(Zone)为原始图像网格预测的一个矩形框标注出兴趣域的置信度,表示原始图像的划分网格做出预测的一个预测框能够标注出鸟窝的把握程度。Where Pr i (birdnest) is the probability that the i-th grid in the sub-image contains a bird nest in the sub-image of the interest domain. is the intersection-over-union ratio of the jth rectangular box predicted by the i-th grid of the sub-image and the bird's nest image, confidence(Zone) is the confidence of the interest region marked by a rectangular box predicted by the original image grid, A prediction box representing the divided grid of the original image can mark the degree of certainty of the bird's nest.

级联预测的期望为:The expectation of the cascade prediction is:

式中的平均交并比一项为兴趣域子图像中包含鸟窝的网格做出的预测标注框与鸟窝所在矩形的平均交并比。(预测标注框由先验聚类算法给出,其大小与子图像大小成正比,所以子图像中的预测框比原始大图像中的预测框更适合鸟窝的形状,因此其平均IOU值更大。)其余参数意义同上文。In the formula, the average intersection-over-union term is the average IOU between the prediction boxes made by the grid cells containing the bird nest in the region-of-interest sub-image and the rectangle where the bird nest is located. (The prediction boxes are given by the prior clustering algorithm and their size is proportional to the sub-image size, so the prediction boxes in the sub-image fit the shape of the bird nest better than those in the original large image, and their average IOU value is therefore larger.) The meanings of the other parameters are the same as above.

式中的平均IOU项代表网格预测锚框的平均IOU值。锚框大小由数据集聚类先验给出,与输入图片的大小成正相关:待检测物体与锚框的比例越相近,平均IOU值越大。因此级联两级网络给出的平均IOU均大于原始单级检测的平均IOU,级联预测精度大于原始预测精度。The average-IOU terms in the formula represent the average IOU of the anchor boxes predicted by the grid. The anchor-box sizes are given by the clustering prior of the dataset and are positively correlated with the size of the input image: the closer the ratio of the object to be detected to the anchor box, the larger the average IOU. The average IOUs given by the two cascade stages are therefore both greater than the original single-stage average IOU, and the cascade prediction accuracy is greater than the original prediction accuracy.

另一方面,单级神经网络的平均预测精度与训练样本数据集的数量成正相关。设神经网络在以Base图片标注Object、数量为n的训练样本上训练时的平均预测精度为F(Object,Base,n)。On the other hand, the average prediction accuracy of a single-stage neural network is positively correlated with the size of the training dataset. Let F(Object, Base, n) denote the average prediction accuracy of a neural network trained on n samples in which Object is annotated in Base pictures.

由以上讨论,级联期望精度高于单级网络期望精度:From the above discussion, the expected accuracy of the cascade is higher than that of the single-stage network:

F(birdnest,Zone,n)*F(Zone,image,n)>F(birdnest,image,n)F(birdnest, Zone, n)*F(Zone, image, n)>F(birdnest, image, n)

接触网鸟窝的数据集中,带有鸟窝标注的样本数据集很少,大部分图片不含鸟窝。但兴趣域数据集样本数量极大,与待测物体鸟窝无关的兴趣域对鸟窝的分布具有信息增益,兴趣域的训练增益可以增高整体检测器的精度。设数据集样本数量为M,包含鸟窝的样本数量为N。In the dataset of bird nests in the contact network, there are few sample datasets with bird nest annotations, and most images do not contain bird nests. However, the number of samples in the interest domain dataset is extremely large. The interest domain that is not related to the bird nest to be detected has information gain on the distribution of bird nests. The training gain of the interest domain can increase the accuracy of the overall detector. Let the number of dataset samples be M, and the number of samples containing bird nests be N.

则训练器的整体精度为:Then the overall accuracy of the trainer is:

P=F(birdnest,Zone,N)*F(Zone,image,M)>F(birdnest,image,N)P=F(birdnest,Zone,N)*F(Zone,image,M)>F(birdnest,image,N)

兴趣域数据集比较容易得到,M>>N,当样本足够多时,第一级检测器可以分辨出待测图片的所有兴趣域。The interest domain dataset is relatively easy to obtain, M>>N. When there are enough samples, the first-level detector can distinguish all interest domains of the image to be tested.

F(Zone,image,M)→1F(Zone, image, M)→1

经过第一级检测器的极限放大,检测器的精度达到极值:After the extreme amplification of the first-stage detector, the accuracy of the detector reaches its extreme value:

Pmax=F(birdnest,Zone,N)P max =F (birdnest, Zone, N)

检测器的精度由第二级检测器的精度决定,第一级检测器起到了物体放大的作用,第二级检测器在兴趣域中检测待测物体鸟窝,物体的平均IoU值明显大于单级检测器,检测器的检测精度明显提高。The accuracy of the detector is determined by the accuracy of the second-stage detector. The first-stage detector plays the role of object magnification. The second-stage detector detects the bird's nest in the domain of interest. The average IoU value of the object is significantly greater than that of the single-stage detector, and the detection accuracy of the detector is significantly improved.

实施例二Embodiment 2

实验环境与数据Experimental environment and data

本发明的实验环境和系统配置如下:The experimental environment and system configuration of the present invention are as follows:

(1)硬件配置: CoreTM i9-10900K@3.70GHz+NVIDIA Geforce RTX 3090+64GB内存(1) Hardware configuration: Core TM i9-10900K@3.70GHz+NVIDIA Geforce RTX 3090+64GB memory

(2)操作系统:Windows10(2) Operating system: Windows 10

(3)深度学习框架:CUDA 11.0+Pytorch1.7.0(3) Deep learning framework: CUDA 11.0+Pytorch1.7.0

逆向推理Reverse reasoning

实验数据来自于检测车在某重载铁路采集的接触网检测视频。人工处理视频帧得到图片集,挑选出共400张含有鸟窝的图片,采用逆向推理得到130张兴趣域中的鸟窝图片用于第二级深度学习检测器训练。经过人工校验筛选出58张兴趣域图片作为模板库,对视频集3900张无鸟窝图片进行模板匹配,对匹配之后的数据集进行人工核验修正,用于第一级深度学习检测器训练。采用其它深度学习模型进行对比训练:原始400张图片中的240张作为深度学习训练集,80张作为验证集,对其余80张图片采用数据增强方法得到126张包含鸟窝的图片用于测试。实验数据组织结构如表1所示:The experimental data come from overhead-contact-line inspection video collected by an inspection vehicle on a heavy-haul railway. The video frames were processed manually to obtain a picture set, from which a total of 400 pictures containing bird nests were selected; reverse reasoning then yielded 130 region-of-interest bird-nest pictures for training the second-level deep learning detector. After manual verification, 58 region-of-interest pictures were selected as the template library and used for template matching against the 3900 nest-free pictures of the video set; the matched dataset was manually checked and corrected for training the first-level deep learning detector. For comparative training with other deep learning models, 240 of the original 400 pictures were used as the training set and 80 as the validation set, and data augmentation of the remaining 80 pictures produced 126 nest-containing pictures for testing. The organization of the experimental data is shown in Table 1:

表1 实验数据组织结构Table 1 Experimental data organization structure

实验数据集中含鸟窝的待测图片经逆向推理算法筛选,得到130张图片,为鸟窝所在接触网的兴趣域;图片集中各图片间接触网区域结构相似,能够有效地反映出鸟窝存在的空间特征。得到的130张图片作为第二级检测器训练的数据集,但部分图片具有较多的环境背景因素,环境噪声较大,对模板匹配过程具有较强的干扰,需要人工剔除。经过人工剔除后的图片构成了模板匹配的模板库。In the experimental dataset, the nest-containing pictures to be detected were filtered by the reverse-reasoning algorithm, yielding 130 pictures of the contact-network regions of interest where the nests are located; the contact-network region structure is similar across the pictures in the set and effectively reflects the spatial characteristics of where bird nests occur. These 130 pictures serve as the training dataset of the second-level detector, but some of them contain many environmental background factors and heavy environmental noise, which strongly interferes with the template-matching process and must be removed manually. The pictures remaining after manual removal constitute the template library for template matching.

模板匹配Template Matching

对数据集中3900张包含接触网的图片根据模板匹配算法进行模板匹配,标注出兴趣域。由于模板库的环境误差,部分样本存在标注偏差,需要人工校验核对:对匹配错误的检测框进行剔除,并对匹配数量较少的图片进行补充标注。Template matching is performed on the 3900 contact-network pictures in the dataset according to the template-matching algorithm, marking out the regions of interest. Owing to environmental error in the template library, some samples carry annotation deviations and need manual verification: detection boxes that were matched incorrectly are removed, and pictures with few matches are given supplementary annotations.

也正是由于模板匹配的局限性,对未知样本进行模板匹配很可能不能准确地标注出所有兴趣域。经人工核对后的模板匹配结果作为第一级检测器训练的数据集,利用深度学习算法学习兴趣域的图像特征对未知样本进行兴趣域识别,具有较强的泛化能力,能够较准确地针对铁路沿线所有可能存在鸟害的接触网区域进行识别。Due to the limitations of template matching, it is very likely that template matching of unknown samples cannot accurately mark all interest areas. The template matching results after manual verification are used as the data set for the first-level detector training. The deep learning algorithm is used to learn the image features of the interest area to identify the interest area of unknown samples. It has strong generalization ability and can accurately identify all contact network areas along the railway where bird damage may exist.
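模板匹配的相似度计算可用如下纯Python玩具代码示意(假设采用归一化互相关作为匹配度量,与OpenCV的TM_CCORR_NORMED形式一致;图像用二维灰度列表表示,仅为示例):The similarity computation of template matching can be sketched with the following toy pure-Python code (assuming normalized cross-correlation as the matching measure, in the same form as OpenCV's TM_CCORR_NORMED; images are 2-D grayscale lists, for illustration only):

```python
import math

def match_template(image, tmpl):
    """Normalized cross-correlation at every offset:
    R = sum(T * I) / sqrt(sum(T^2) * sum(I^2)).
    image, tmpl: 2-D lists of grayscale values.
    Returns (best_R, (x, y, w, h)) of the best-matching window."""
    H, W = len(image), len(image[0])
    h, w = len(tmpl), len(tmpl[0])
    t_sq = sum(v * v for row in tmpl for v in row)
    best = (0.0, None)
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            num = i_sq = 0.0
            for dy in range(h):
                for dx in range(w):
                    i_v = image[y + dy][x + dx]
                    num += tmpl[dy][dx] * i_v
                    i_sq += i_v * i_v
            r = num / math.sqrt(t_sq * i_sq) if i_sq else 0.0
            if r > best[0]:
                best = (r, (x, y, w, h))
    return best

img = [[0, 0, 0, 0],
       [0, 9, 8, 0],
       [0, 7, 9, 0],
       [0, 0, 0, 0]]
tpl = [[9, 8],
       [7, 9]]
r, box = match_template(img, tpl)
print(round(r, 3), box)   # exact match at offset (1, 1): 1.0 (1, 1, 2, 2)
```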

一些主流的深度学习模型检测Some mainstream deep learning model detection

通过正确预测的样本数与总预测样本数的比值来反映预测的准确率,准确率越高,说明检测模型越精确。而检出率是指正确预测的样本数与真实样本总数的比值,检出率越高说明检测模型越可靠。The accuracy of the prediction is reflected by the ratio of the number of correctly predicted samples to the total number of predicted samples. The higher the accuracy, the more accurate the detection model. The detection rate refers to the ratio of the number of correctly predicted samples to the total number of true samples. The higher the detection rate, the more reliable the detection model.
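准确率与检出率的计算可示意如下(Python;其中预测框总数205为假设值,222为下文测试集的鸟窝总数):The computation of accuracy and detection rate can be sketched as follows (Python; the total predicted-box count 205 is a hypothetical value, 222 is the total nest count of the test set below):

```python
def accuracy_and_detection_rate(true_positives, predicted, ground_truth):
    """准确率 accuracy = correct predictions / total predictions;
    检出率 detection rate = correct predictions / total true objects."""
    return true_positives / predicted, true_positives / ground_truth

# e.g. 188 nests correctly found, 205 boxes predicted, 222 real nests:
acc, det = accuracy_and_detection_rate(188, 205, 222)
print(round(acc, 4), round(det, 4))  # -> 0.9171 0.8468
```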

为了实现实时检测,在保证模型具有比较高的检测精度的同时,还要尽可能提高模型的检测速度。一般使用每秒内处理图像的数量(秒帧率FPS)反映检测速度的快慢,对于单网络检测器,其检测速度为检测网络处理图像的速度;而对于级联网络,采用第一级检测器与第二级检测器串行计算,且并行检测每个兴趣域的计算方法。In order to achieve real-time detection, while ensuring that the model has a relatively high detection accuracy, the detection speed of the model should be increased as much as possible. Generally, the number of images processed per second (frame rate per second FPS) is used to reflect the speed of detection. For a single network detector, its detection speed is the speed at which the detection network processes images; for a cascade network, the first-level detector and the second-level detector are calculated in series, and the calculation method of parallel detection of each domain of interest is adopted.
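该"两级串行、兴趣域并行"计算方式下的整体FPS可示意如下(Python;时间数值均为假设):Under this "two stages in series, ROIs in parallel" scheme the overall FPS can be sketched as follows (Python; the timing numbers are hypothetical):

```python
def cascade_fps(t_stage1, t_stage2_per_roi, n_rois):
    """Overall frames per second of the cascade: the two stages run in
    series, while the second stage detects all n ROIs in parallel, so the
    per-frame latency is t_stage1 plus the time of ONE second-stage pass."""
    latency = t_stage1 + (t_stage2_per_roi if n_rois > 0 else 0.0)
    return 1.0 / latency

# e.g. 20 ms for stage 1 and 15 ms per ROI, 4 ROIs detected in parallel:
print(round(cascade_fps(0.020, 0.015, 4), 1))  # -> 28.6
```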

当数据集规模较小时,无论YOLO系列等one-stage算法还是Faster R-CNN等two-stage算法,对小目标物体的识别能力均有限;并且接触网鸟窝形体特征较为单一,难以与环境样本区分,训练难度很大,在测试集上很难有较好的表现。When the dataset is small, both one-stage algorithms such as the YOLO series and two-stage algorithms such as Faster R-CNN have limited ability to recognize small targets; moreover, the shape features of contact-network bird nests are relatively uniform and hard to distinguish from environmental samples, making training very difficult, so good performance on the test set is hard to achieve.

Faster R-CNN在训练样本较小的情况下,由于鸟窝形体较小,难以很好地学习到鸟窝样本的整体特征,表达能力欠佳,容易出现漏标的情况,影响检测器的检出率。When the training sample size is small, Faster R-CNN has difficulty learning the overall features of bird nest samples due to their small size. It has poor expression ability and is prone to missing labels, which affects the detection rate of the detector.

YOLO系列网络,能够对物体进行多尺度识别,检测能力强于Faster R-CNN网络,但鸟窝的形体特征较为单一,容易与环境样本相混淆,容易出现误标的情况,影响检测器的准确率。The YOLO series of networks can identify objects at multiple scales and have stronger detection capabilities than the Faster R-CNN network. However, the shape features of bird nests are relatively simple and can be easily confused with environmental samples, which can easily lead to mislabeling and affect the accuracy of the detector.

使用YOLOv3-spp、YOLOv4与Faster R-CNN等识别检测模型进行识别,结果如下:Recognition was performed with detection models such as YOLOv3-spp, YOLOv4 and Faster R-CNN; the results are as follows:

由表可以看出,无论是YOLO系列等one-stage网络还是Faster R-CNN等two-stage网络,当鸟窝数据集较少且鸟窝体积较小、形体特征单一时,网络的学习效果不佳,难以学习到鸟窝存在的整体性特征,均无法准确地标注出铁路沿线接触网的鸟窝。It can be seen from the table that for both one-stage networks such as the YOLO series and two-stage networks such as Faster R-CNN, when the bird-nest dataset is small and the nests themselves are small with uniform shape features, the networks learn poorly, fail to capture the holistic characteristics of bird-nest occurrence, and cannot accurately mark the bird nests on the contact network along the railway.

级联检测Cascade detection

(1)第一级检测器(1) First-stage detector

第一级检测器使用Faster R-CNN、yolov3-spp与yolov4检测模型进行兴趣域识别任务训练。The first-level detector uses Faster R-CNN, yolov3-spp and yolov4 detection models for training domain of interest recognition tasks.

使用不同网络结构对兴趣域进行检测。Use different network structures to detect the domain of interest.

YOLOv3-SPP网络层数为225层,参数量为6250万,而YOLOv4网络层数为327层,参数量为6400万。YOLOv3和YOLOv4的主干网络结构大致相同,YOLOv4在训练过程方面做了优化改进:YOLOv3的预测框回归采用GIOU_Loss,它无法区分物体的相对位置关系,而YOLOv4采用了CIOU_Loss,它在GIOU_Loss的基础上考虑了边界框中心点距离与边界框宽高比等尺度信息,大大提高了预测框回归的精度。YOLOv3-SPP has 225 layers and 62.5 million parameters, while YOLOv4 has 327 layers and 64 million parameters. The backbone network structures of YOLOv3 and YOLOv4 are roughly the same, and YOLOv4 has made optimizations and improvements in the training process: YOLOv3's prediction box regression uses GIOU_Loss, which cannot distinguish the relative position relationship of objects, while YOLOv4 uses CIOU_Loss, which takes into account scale information such as the distance between the center points of the bounding box and the aspect ratio of the bounding box on the basis of GIOU_Loss, greatly improving the accuracy of the prediction box regression.
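GIOU与CIOU的文献标准定义(非本专利的训练代码)可用如下示意实现对比(Python):The standard definitions of GIOU and CIOU from the literature (not training code of this patent) can be compared with the following sketch (Python):

```python
import math

def iou_and_union(a, b):
    """IoU of two (x1, y1, x2, y2) boxes, plus their union area."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union, union

def giou(a, b):
    """GIoU = IoU - area(C minus A-union-B) / area(C), C the enclosing box."""
    iou, union = iou_and_union(a, b)
    c_area = ((max(a[2], b[2]) - min(a[0], b[0]))
              * (max(a[3], b[3]) - min(a[1], b[1])))
    return iou - (c_area - union) / c_area

def ciou(a, b):
    """CIoU = IoU - rho^2/c^2 - alpha*v: adds centre-distance and
    aspect-ratio penalty terms on top of IoU."""
    iou, _ = iou_and_union(a, b)
    cw = max(a[2], b[2]) - min(a[0], b[0])
    ch = max(a[3], b[3]) - min(a[1], b[1])
    c2 = cw * cw + ch * ch                       # enclosing-box diagonal^2
    rho2 = (((a[0] + a[2]) - (b[0] + b[2])) ** 2
            + ((a[1] + a[3]) - (b[1] + b[3])) ** 2) / 4.0  # centre dist^2
    v = (4 / math.pi ** 2) * (math.atan((b[2] - b[0]) / (b[3] - b[1]))
                              - math.atan((a[2] - a[0]) / (a[3] - a[1]))) ** 2
    alpha = v / (1 - iou + v) if iou < 1 else 0.0
    return iou - rho2 / c2 - alpha * v

# Two disjoint boxes: plain IoU is 0 for both, but GIoU/CIoU still grade
# them by enclosing area and centre distance respectively.
a, b = (0, 0, 2, 2), (3, 0, 5, 2)
print(giou(a, b), ciou(a, b))
```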

本发明实施例提供的一种YOLOv3-GIOU训练过程中的IOU曲线示意图如图13所示,本发明实施例提供的一种YOLOv4-CIOU训练过程中的IOU曲线示意图如图14所示。由图13和图14可以看出,在训练进度相同时,YOLOv4的CIOU远大于YOLOv3的GIOU,它更多地包含了兴趣域的中心位置和区域大小信息,能够更好地学习到兴趣域的区域特征。A schematic diagram of the IOU curve during YOLOv3-GIOU training provided by an embodiment of the present invention is shown in Figure 13, and a schematic diagram of the IOU curve during YOLOv4-CIOU training provided by an embodiment of the present invention is shown in Figure 14. It can be seen from Figures 13 and 14 that, at the same training progress, the CIOU of YOLOv4 is much larger than the GIOU of YOLOv3; it carries more information about the centre position and size of the region of interest and can better learn its regional characteristics.

Faster R-CNN对比YOLO系列采用two-stage检测:它首先产生候选区域,生成的预测框数量多于YOLO系列网络,然后对候选框进行分类和回归;而YOLO系列的one-stage算法直接对输入图片进行分类和回归,不产生候选区域。因此Faster R-CNN具有更低的错误率和漏识别率,但two-stage算法的识别速度较慢。In contrast to the YOLO series, Faster R-CNN uses two-stage detection: it first generates candidate regions, producing more prediction boxes than the YOLO-series networks, and then classifies and regresses the candidate boxes; the one-stage algorithms of the YOLO series classify and regress the input picture directly without generating candidate regions. Faster R-CNN therefore has lower error and missed-detection rates, but as a two-stage algorithm its recognition speed is slower.

对铁路沿线接触网测试数据集共126张图片进行测试,不同网络模型的预测结果如下:A total of 126 images of the railway overhead contact network test data set were tested, and the prediction results of different network models are as follows:

第一级网络模型兴趣域检测数量对比Comparison of the number of interest area detections in the first-level network model

(2)第二级检测器(2) Second stage detector

对模板库共130张鸟窝图片进行第二级检测器的模型训练。The model of the second-level detector is trained on a total of 130 bird nest images in the template library.

第二级检测器的检测任务为在兴趣域中识别鸟窝。数据集中的图片经双线性插值放大,即使体积很小的鸟窝也能在图片中占有很大的面积,并且有效地排除了环境噪声的干扰因素,具有很强的抗干扰能力。由于鸟窝随原图片经双线性插值放大,其训练的IOU值也得到了提高。对比相同的检测模型,在训练进度相同的情况下IOU值对比如下:图15为本发明实施例提供的一种YOLOv3-SPP第二级检测与直接检测IOU对比示意图,图16为本发明实施例提供的一种YOLOv4第二级检测与直接检测IOU对比示意图。由图15和图16可以看出,相同YOLO系列下的检测模型对经双线性插值放大后的鸟窝图像具有更高的IOU值,能够更精确地定位鸟窝的位置。The detection task of the second-level detector is to identify bird nests within the region of interest. The pictures in the dataset are enlarged by bilinear interpolation, so even a very small bird nest occupies a large area in the picture, and the interference of environmental noise is effectively excluded, giving strong anti-interference ability. Since the bird nest is enlarged along with the original picture by bilinear interpolation, the IOU value during training is also improved. For the same detection model at the same training progress, the IOU values compare as follows: Figure 15 is a schematic comparison of IOU for YOLOv3-SPP second-level detection versus direct detection provided by an embodiment of the present invention, and Figure 16 is the corresponding comparison for YOLOv4. It can be seen from Figures 15 and 16 that, within the same YOLO series, the detection model achieves a higher IOU value on bird-nest images enlarged by bilinear interpolation and can locate the nest position more precisely.
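双线性插值放大可用如下纯Python玩具实现示意(思路上等价于常用图像库的bilinear resize,仅为示例):Bilinear-interpolation enlargement can be sketched with the following toy pure-Python implementation (equivalent in spirit to the bilinear resize of common image libraries, for illustration only):

```python
def bilinear_resize(img, out_h, out_w):
    """Bilinear interpolation of a 2-D grayscale list (align-corners style):
    each output pixel is a weighted mix of its 4 nearest input pixels."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for oy in range(out_h):
        fy = oy * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = min(int(fy), in_h - 2)
        ty = fy - y0
        row = []
        for ox in range(out_w):
            fx = ox * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = min(int(fx), in_w - 2)
            tx = fx - x0
            top = img[y0][x0] * (1 - tx) + img[y0][x0 + 1] * tx
            bot = img[y0 + 1][x0] * (1 - tx) + img[y0 + 1][x0 + 1] * tx
            row.append(top * (1 - ty) + bot * ty)
        out.append(row)
    return out

small = [[0, 10],
         [20, 30]]
big = bilinear_resize(small, 3, 3)
print(big[1])   # middle row interpolated -> [10.0, 15.0, 20.0]
```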

对测试集126张图片共222个鸟窝进行检测,使用不同的第一级检测器检测兴趣域,测试第二级检测器的检测性能,采用第一、二级检测器串行计算,第二级检测器并行检测的计算方法。A total of 222 bird nests in 126 pictures of the test set were detected. Different first-level detectors were used to detect the domain of interest, and the detection performance of the second-level detector was tested. The calculation method of serial calculation of the first and second-level detectors and parallel detection of the second-level detector was adopted.

其检测结果如下:The test results are as follows:

不同目标检测算法性能对比Performance comparison of different object detection algorithms

由表结果可以看出,三种检测模型级联结果的最差检出率为84.68%,仍然高于直接检测的最好结果77.48%。级联网络识别,通过第一级网络诱导识别先验的兴趣域,而第二级在兴趣域中识别待测物体鸟窝,兴趣域中鸟窝的分布情况较为单一,可以减少其它环境样本的干扰,提高了鸟窝的区分度,降低训练的难度。因此,级联网络对第二级待测物体的数据集规模依赖较小,只需要很少的图片就可以学习到兴趣域中鸟窝的分布特征。As can be seen from the results in the table, the worst detection rate of the cascade results of the three detection models is 84.68%, which is still higher than the best result of direct detection, 77.48%. Cascade network recognition, through the first-level network to induce recognition of the prior interest domain, and the second level to identify the object to be tested, the bird's nest, in the interest domain. The distribution of the bird's nest in the interest domain is relatively simple, which can reduce the interference of other environmental samples, improve the distinction of the bird's nest, and reduce the difficulty of training. Therefore, the cascade network is less dependent on the size of the data set of the second-level object to be tested, and only a few pictures are needed to learn the distribution characteristics of the bird's nest in the interest domain.

由于第一级检测器使用Faster-RCNN检测出的兴趣域数量最多,它能够达到级联网络的极值条件,因此检出率最佳的检测器为Faster-RCNN级联YOLOv3-SPP网络,但Faster-RCNN的检测速率较低,整体检测器的检测速率受限于第一级检测器,无法满足实时检测的性能。YOLOv4网络的表达能力强于YOLOv3,其在样本较多、待测物体特征容易区分的兴趣域识别训练中表现优异,但在样本数量较少、样本特征不明显的训练中表现准确率不高。级联的YOLOv3-SPP网络与YOLOv4级联YOLOv3-SPP网络均具有较高的检出率与准确率,并且具有较高的FPS。能够对铁路沿线视频进行实时检测。Since the first-level detector uses Faster-RCNN to detect the largest number of interest domains, it can reach the extreme value condition of the cascade network, so the detector with the best detection rate is the Faster-RCNN cascade YOLOv3-SPP network, but the detection rate of Faster-RCNN is low. The detection rate of the overall detector is limited by the first-level detector and cannot meet the performance of real-time detection. The YOLOv4 network has stronger expression ability than YOLOv3. It performs well in interest domain recognition training with a large number of samples and easy-to-distinguish features of the objects to be tested, but the accuracy is not high in training with a small number of samples and unclear sample features. The cascaded YOLOv3-SPP network and the YOLOv4 cascaded YOLOv3-SPP network both have high detection rates and accuracy rates, and have high FPS. It can perform real-time detection of videos along the railway.

综上所述,本发明实施例提出的高速铁路接触网鸟窝自动识别和快速追踪的方法,可以有效解决高速铁路接触网上鸟窝的准确快速识别和追踪问题;可以有效解决由于接触网中鸟窝信息量较少,缺乏显著的形状或纹理特征造成的识别困难;可以在表示目标远小于场景范围的情况下将搜寻范围缩小从而转换为大目标追踪的情形,能够显著提高准确率。从而能够对铁路接触网进行有效的自动鸟窝识别检测。In summary, the method for automatic identification and rapid tracking of bird nests on high-speed railway contact networks proposed in the embodiments of the present invention can effectively solve the problem of accurate and rapid identification and tracking of bird nests on high-speed railway contact networks; can effectively solve the identification difficulties caused by the small amount of bird nest information in the contact network and the lack of significant shape or texture features; can narrow the search range when the target is much smaller than the scene range, thereby converting it to a large target tracking situation, which can significantly improve the accuracy rate. Thus, effective automatic identification and detection of bird nests on railway contact networks can be performed.

本发明提出了一种基于级联神经网络的追踪方法:由第一级神经网络表示出鸟窝出现的兴趣域,再将其用于第二级神经网络的训练,从而在追求准确度的同时保持较高的追踪速度。The present invention proposes a tracking method based on a cascaded neural network: the first-level neural network identifies the regions of interest where bird nests appear, which are then used for training the second-level neural network, thereby maintaining a high tracking speed while pursuing accuracy.

已有的深度学习模型直接检测鸟窝的难度很大,效果不佳。我们据此发明了一种检测方法:在数据集很少、检测目标很小的情况下,通过结合已有的算法对数据集进行处理,并设计一种级联架构的检测网络,以优秀的检测能力完成复杂的检测任务。本文着重论述级联检测结构对比传统单网络检测的优势,以及它具有优秀检测能力的原理。It is very difficult for existing deep learning models to detect bird nests directly, and the results are poor. We therefore invented a detection method for the case of a small dataset and very small detection targets: the dataset is processed by combining existing algorithms and a cascade-architecture detection network is designed, completing complex detection tasks with excellent detection capability. This text focuses on the advantages of the cascade detection structure over traditional single-network detection, and on the principle behind its excellent detection capability.

本领域普通技术人员可以理解:附图只是一个实施例的示意图,附图中的模块或流程并不一定是实施本发明所必须的。Those skilled in the art can understand that the accompanying drawings are only schematic diagrams of one embodiment, and the modules or processes in the accompanying drawings are not necessarily required to implement the present invention.

通过以上的实施方式的描述可知,本领域的技术人员可以清楚地了解到本发明可借助软件加必需的通用硬件平台的方式来实现。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品可以存储在存储介质中,如ROM/RAM、磁碟、光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本发明各个实施例或者实施例的某些部分所述的方法。It can be known from the description of the above implementation methods that those skilled in the art can clearly understand that the present invention can be implemented by means of software plus a necessary general hardware platform. Based on such an understanding, the technical solution of the present invention can be essentially or partly contributed to the prior art in the form of a software product, which can be stored in a storage medium such as ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods described in the various embodiments of the present invention or certain parts of the embodiments.

本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于装置或系统实施例而言,由于其基本相似于方法实施例,所以描述得比较简单,相关之处参见方法实施例的部分说明即可。以上所描述的装置及系统实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性劳动的情况下,即可以理解并实施。Each embodiment in this specification is described in a progressive manner, and the same or similar parts between the embodiments can refer to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the device or system embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant parts can refer to the partial description of the method embodiment. The device and system embodiments described above are merely schematic, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment. Ordinary technicians in this field can understand and implement it without paying creative labor.

以上所述,仅为本发明较佳的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应该以权利要求的保护范围为准。The above is only a preferred specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or substitutions that can be easily thought of by a person skilled in the art within the technical scope disclosed by the present invention should be included in the protection scope of the present invention. Therefore, the protection scope of the present invention should be based on the protection scope of the claims.

Claims (4)

1. The method for detecting the bird nest of the railway contact net is characterized by comprising the following steps of:
obtaining an interest domain picture containing a bird nest region through reverse reasoning according to a picture containing a bird nest in a railway contact net picture data set, taking the interest domain picture as a template picture, forming a template library according to all the template pictures, and training a second-level YOLO detector by using the template library;
sequentially matching the pictures which do not contain bird nest in the railway contact net picture data set with each template picture in the template library to obtain an interest domain picture data set, and training a first-stage YOLO detector by using the interest domain picture data set;
Inputting a picture to be detected into a trained first-stage YOLO detector, outputting a region-of-interest picture by the first-stage YOLO detector, inputting the region-of-interest picture into a trained second-stage YOLO detector, and outputting a bird nest detection result of the picture to be detected by the second-stage YOLO detector;
The method comprises the steps that a picture containing a bird nest in a railway contact net picture data set is subjected to reverse reasoning to obtain a region-of-interest picture containing a bird nest region, the region-of-interest picture is used as a template picture, and a template library is formed according to all the template pictures, and the method comprises the following steps:
Performing preliminary segmentation on a picture containing a bird nest in a railway contact net picture dataset to obtain a basic region with set similarity, performing preliminary merging on the basic region according to the difference between the regions to obtain a series of preliminary candidate regions, surrounding the preliminary candidate regions by rectangular frames, merging the preliminary candidate regions according to the similarity between the rectangular frames to obtain a final candidate region, manually marking the bird nest position in the final candidate region, using a rectangle to represent the marked bird nest region attribute, using the final candidate region containing the bird nest region as an interest region, using the interest region picture as a template picture, and forming a template library according to all the template pictures;
The method for primarily dividing the pictures containing the bird nest in the railway contact net picture data set to obtain basic areas with set similarity, primarily combining the basic areas according to the difference between the areas to obtain a series of primary candidate areas, and comprises the following steps:
A picture containing a bird nest is represented by an undirected graph G=<V, E>, wherein the vertices of the undirected graph represent pixel points of the picture, and the weight of an edge e=(vi, vj) represents the dissimilarity of the adjacent vertex pair i, j; the colour distance of the pixels, w(e) = sqrt((ri − rj)² + (gi − gj)² + (bi − bj)²), is used to represent the dissimilarity w(e) between two pixels, one basic region being a set of points with minimum dissimilarity;
the intra-class differences of the base region are defined as:
The inter-class difference between the two base areas C1, C2 is defined by the minimum connecting edge of the two base areas:
Diff(C1, C2) = min_{vi ∈ C1, vj ∈ C2, (vi, vj) ∈ E} w((vi, vj))
If two base regions are not edge-connected, diff (C 1,C2) = infinity
When the condition Diff (C 1,C2)≤min(Int(C1)+τ(C1),Int(C2)+τ(C2) is satisfied), it is judged that the two basic areas C 1、C2 can be merged;
Where τ (C) is a threshold function that weights the region of isolated points:
τ(C)=k/‖C‖
Preliminary merging is carried out on each basic region to obtain a series of preliminary candidate regions;
The preliminary candidate areas are surrounded by rectangular frames, the preliminary candidate areas are combined according to the similarity between the rectangular frames, and a final candidate area is obtained, and the method comprises the following steps:
Surrounding the preliminary candidate region by a rectangular frame, wherein the position of the rectangular frame C is represented by a quadruple of (x, y, w, h), wherein x, y represents the coordinate of the upper left corner of the rectangular frame, and w, h represents the width and the height of the rectangular frame;
The colour distance between the rectangular box of the preliminary candidate region ri and the rectangular box of the preliminary candidate region rj is as follows:
S_colour(ri, rj) = Σ_{k=1..n} min(ci^k, cj^k)
wherein ci^k represents the pixel proportion of the k-th bin of the colour histogram of region ri;
The texture distance between the rectangular box of the preliminary candidate region ri and the rectangular box of the preliminary candidate region rj is as follows:
S_texture(ri, rj) = Σ_{k=1..n} min(ti^k, tj^k)
wherein ti^k represents the pixel proportion of the k-th dimension of the texture histogram of region ri;
for the preliminary candidate regions ri and rj, the size similarity is:
S_size(ri, rj) = 1 − (size(ri) + size(rj)) / size(im)
Wherein size(ri) represents the size of the rectangular frame corresponding to the region ri, size(rj) represents the size of the rectangular frame corresponding to the region rj, size(im) represents the size of the original picture to be segmented, and S_size(ri, rj) is the size similarity score of the rectangular regions ri and rj;
for the preliminary candidate regions ri and rj, the fill similarity is:
S_fill(ri, rj) = 1 − (size(BBij) − size(ri) − size(rj)) / size(im)
Wherein size(BBij) represents the size of the circumscribed rectangle of the regions ri and rj, and S_fill(ri, rj) is the fill similarity score of the rectangular regions ri and rj, measuring the intersection degree of different regions;
The total difference between the preliminary candidate regions r i and r j is:
S(ri,rj)=a1Scolour(ri,rj)+a2Stexture(ri,rj)+a3Ssize(ri,rj)+a4Sfill(ri,rj)
a 1,a2,a3,a4 is a corresponding weight value;
When the total difference S (r i,rj) between the preliminary candidate areas r i and r j is larger than a set merging threshold, merging the preliminary candidate areas r i and r j to obtain a final candidate area;
The step of sequentially matching the pictures which do not contain bird nest in the railway contact net picture data set with each template picture in the template library to obtain an interest domain picture data set comprises the following steps:
Sequentially matching a picture to be matched, which does not contain a bird nest, in the railway contact net picture data set with each template picture in the template library, wherein the template picture is denoted T, the picture to be matched, which does not contain a bird nest, is denoted I, the width of the template picture is w, the height of the template picture is h, and R represents the matching result, expressed by the following general expression:
R(x, y) = Σ_{x', y'} T(x', y')·I(x + x', y + y') / sqrt(Σ_{x', y'} T(x', y')² · Σ_{x', y'} I(x + x', y + y')²)
Wherein the sums run over the template coordinates 0 ≤ x' < w, 0 ≤ y' < h;
The larger the R value is, the higher the similarity between a rectangular area of size (w, h) at the (x, y) position of the picture to be matched and the template; the maximum value of the template similarity is taken as the template matching result, and the template matching value is required to be higher than a threshold parameter;
Denote:
Each template picture corresponds to an optimal matching value R, the positions of rectangular matching frames corresponding to the R are (x, y, w and h), and a template primary matching result forms a result set S:
Wherein c is a matched threshold parameter;
The result set S is arranged in descending order of Rs value, and the condition for two rectangular matching frames s and t to intersect is as follows:
max(x(s),x(t))≤min(x(s)+w(s),x(t)+w(t))
max(y(s),y(t))≤min(y(s)+h(s),y(t)+h(t))
And traversing the result set S in sequence, discarding the labeling if the current rectangular matching frame is intersected with the labeled rectangular matching frame, otherwise, labeling the current rectangular matching frame in the VOC format, and forming the interest domain data set by all the labeled rectangular matching frames.
2. The method according to claim 1, wherein manually labeling the bird nest position in the final candidate region, representing the labeled bird nest region attribute with a rectangle, taking the final candidate region containing the bird nest region as the interest region, taking the interest region picture as the template picture, and constructing the template library from all the template pictures comprises:
representing the final candidate region C by a rectangle whose position attribute is given by the quadruple (x, y, w, h);
labeling the bird nest position in the final candidate region and representing the labeled bird nest region attribute by a rectangle whose position attribute is (bx, by, bw, bh), wherein the interest region is a candidate region containing the bird nest region, and the position coordinates of the interest region satisfy:
and satisfy a threshold condition:
And taking the interest region picture as a template picture, and forming the template library from all the template pictures.
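The containment relationship of claim 2 can be sketched as a simple coordinate check; `nest_inside_region` is an illustrative name, and the claim's additional numeric threshold condition, stated by a formula not reproduced in this text, is omitted:

```python
def nest_inside_region(region, nest):
    """region and nest are (x, y, w, h) rectangles. The containment
    condition implied by the claim: the interest region must fully
    enclose the labeled bird-nest box (bx, by, bw, bh)."""
    x, y, w, h = region
    bx, by, bw, bh = nest
    return x <= bx and y <= by and bx + bw <= x + w and by + bh <= y + h
```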
3. The method according to claim 1 or 2, wherein the first-stage YOLO detector and the second-stage YOLO detector comprise: YOLOv3-spp, YOLOv4 and Faster R-CNN.
4. The method of claim 3, wherein the confidence and expectation of the first-stage YOLO detector are as follows:
wherein Pr(Zone) is the probability that the current grid contains an object to be detected (an interest domain); during training, Pr(Zone) is 1 if the grid under test contains an interest domain and 0 otherwise; IOU^truth_pred is the intersection-over-union of the annotation frame predicted by the grid and the rectangular frame in which the interest domain is actually located; B is the number of annotation frames predicted by each grid; S^2 is the total number of grids into which the picture is divided; the averaged IOU term is the mean IOU over all prediction frames made by the grids in which objects are located; I(Zone) is the size of the interest domain; I(image) is the size of the original picture; and E(Zone) is the sum of the total IOUs given by the picture;
The confidence of the second-stage YOLO detector is:
wherein Pr(Birdnest) is the probability that the current grid contains a bird nest; Pr(Birdnest) is 1 if the grid under test contains a bird nest and 0 otherwise; IOU^truth_pred is the intersection-over-union of the annotation frame predicted by the grid and the rectangular frame in which the bird nest is actually located; and P(distribution) is the probability that a bird nest in the picture lies within an interest domain, which is 1 since all bird nests lie within interest domains;
The expectation of bird nest prediction within the interest domain is:
wherein Pr_i(Birdnest) is the probability that the i-th grid of the sub-image contains a bird nest; IOU^j_i is the intersection-over-union of the j-th rectangular frame predicted by the i-th grid of the sub-image and the bird nest picture; Confidence(Zone) is the confidence with which a rectangular frame predicted by a grid of the original image labels the interest domain; and the matrix element expresses the degree of certainty with which a prediction frame, predicted by the divided grids of the original image, can label the bird nest;
The expectation of the cascaded prediction is:
wherein the first average-IOU term is the average intersection-over-union between the prediction annotation frames made by the grids containing the bird nest in the interest-domain sub-image and the rectangle in which the bird nest is located, and the second term represents the average IOU value of the anchor frames predicted by the grids;
The precision of the cascaded prediction is:
P = F(birdnest, Zone, N) * F(Zone, image, M) > F(birdnest, image, N).
CN202110249738.1A 2021-03-08 2021-03-08 A method for detecting bird nests in railway contact network Active CN112949634B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110249738.1A CN112949634B (en) 2021-03-08 2021-03-08 A method for detecting bird nests in railway contact network

Publications (2)

Publication Number Publication Date
CN112949634A (en) 2021-06-11
CN112949634B (en) 2024-04-26

Family

ID=76230073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110249738.1A Active CN112949634B (en) 2021-03-08 2021-03-08 A method for detecting bird nests in railway contact network

Country Status (1)

Country Link
CN (1) CN112949634B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114005137B (en) * 2021-10-23 2024-07-05 吉首大学 Method for establishing giant salamander characteristic sample database to identify individual identity of giant salamander
CN114494756B (en) * 2022-01-05 2025-05-23 西安电子科技大学 An improved clustering algorithm based on Shape-GIoU
CN114612778A (en) * 2022-03-14 2022-06-10 成都唐源电气股份有限公司 Contact net bird nest identification method and model based on vector neural network
CN115457324A (en) * 2022-09-16 2022-12-09 云南电网有限责任公司电力科学研究院 Labeling method, device, electronic equipment and storage medium for transmission and distribution line data
CN115511876B (en) * 2022-11-01 2025-07-11 北京交通大学 High-speed railway contact network bird thorn prevention identification method and system based on relative position perception
CN118692028B (en) * 2024-08-26 2024-11-19 成都考拉悠然科技有限公司 Iron tower bird nest monitoring method and system based on multi-mode large model

Citations (5)

Publication number Priority date Publication date Assignee Title
CN102254303A (en) * 2011-06-13 2011-11-23 河海大学 Methods for segmenting and searching remote sensing image
CN107844770A (en) * 2017-11-03 2018-03-27 东北大学 A video-based automatic recognition system for abnormal operating conditions of an electro-fused magnesia furnace
CN110458102A (en) * 2019-08-12 2019-11-15 深圳市商汤科技有限公司 A face image recognition method and device, electronic device and storage medium
AU2020101011A4 (en) * 2019-06-26 2020-07-23 Zhejiang University Method for identifying concrete cracks based on yolov3 deep learning model
CN111597939A (en) * 2020-05-07 2020-08-28 西安电子科技大学 High-speed rail line nest defect detection method based on deep learning


Non-Patent Citations (1)

Title
"Multi-type cooperative target detection using an improved YOLOv2 convolutional neural network"; Wang Jianlin et al.; Optics and Precision Engineering; 1-10 *

Also Published As

Publication number Publication date
CN112949634A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
CN112949634B (en) A method for detecting bird nests in railway contact network
CN111178197B (en) Instance Segmentation Method of Cohesive Pigs in Group Breeding Based on Mask R-CNN and Soft-NMS Fusion
Yang et al. Learning object bounding boxes for 3d instance segmentation on point clouds
Agarwal et al. Learning to detect objects in images via a sparse, part-based representation
CN109684922B (en) A multi-model recognition method for finished dishes based on convolutional neural network
CN114648665B (en) Weak supervision target detection method and system
CN111814902A (en) Target detection model training method, target recognition method, device and medium
CN105005764B (en) The multi-direction Method for text detection of natural scene
CN111783576A (en) Person re-identification method based on improved YOLOv3 network and feature fusion
CN101178770B (en) Image detection method and apparatus
CN102682287A (en) Pedestrian detection method based on saliency information
CN111898432A (en) A pedestrian detection system and method based on improved YOLOv3 algorithm
CN110188763B (en) Image significance detection method based on improved graph model
CN109165658B (en) A strong negative sample underwater target detection method based on Faster-RCNN
CN116092179A (en) Improved Yolox fall detection system
CN113221956A (en) Target identification method and device based on improved multi-scale depth model
CN106682681A (en) Recognition algorithm automatic improvement method based on relevance feedback
CN103679187A (en) Image identifying method and system
CN118609121B (en) A cell detection method and electronic device based on YOLOv8
CN115082776A (en) Electric energy meter automatic detection system and method based on image recognition
CN111881803A (en) Livestock face recognition method based on improved YOLOv3
CN118379589A (en) Photovoltaic panel abnormal state detection method based on multi-mode fusion and related equipment
CN117333796A (en) Ship target automatic identification method and system based on vision and electronic equipment
Shishkin et al. Implementation of yolov5 for detection and classification of microplastics and microorganisms in marine environment
CN106339665A (en) Fast face detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
OL01 Intention to license declared