Tearing detection method based on an encoder-decoder neural network
Technical Field
The invention belongs to the technical field of transportation monitoring of belt conveyors, and relates to a tearing detection method based on an encoder-decoder neural network.
Background
The belt conveyor is an important mechanical device in coal mine transportation, with advantages including low transportation energy consumption, long conveying distance, and high efficiency. Conveyor belt tearing is one of the most common failures of belt conveyors. The main causes of conveyor belt tearing are: 1) hard foreign matter mixed into the transported coal presses, smashes, and scratches the conveyor belt; 2) design defects or incorrect installation of the belt conveyor allow sharp objects to damage the conveyor belt; and 3) the conveyor belt deviates, so that a carrier roller or the steel frame scratches it. When tearing occurs, loose coal accumulates, and transportation must be stopped and the fault handled promptly; otherwise the entire conveyor belt may fracture, coal may spill, transportation equipment may be damaged, and the lives of underground personnel may be endangered.
At present, detection technologies for tearing faults of belt conveyors fall into three types: sensor detection, multispectral detection, and computer vision detection. With the rapid development of artificial intelligence and computer hardware, longitudinal tear detection based on computer vision not only offers higher detection speed and accuracy than the other methods, but also requires less installation and maintenance work. Therefore, adopting computer vision technology to solve the tearing detection problem is one of the directions for the intelligent development of coal mines. Such a method generally relies on an industrial camera to collect tearing image data, transmits the data to a processing end for image recognition, judges whether tearing exists, and thereby completes tearing fault diagnosis.
However, existing computer vision detection methods have the following defects: 1) the illumination conditions in underground coal mines are poor, so the visible-light images acquired by the image acquisition system are of low quality, which reduces the precision of subsequent image processing; and 2) the tearing image features are constructed by manual statistics, so when field conditions in the mine change, the previous tearing features are no longer applicable and researchers must re-extract them, resulting in poor generalization capability.
Disclosure of Invention
In view of the above, the invention aims to provide a tearing detection method based on an encoder-decoder neural network, which addresses the poor illumination conditions in underground coal mines and the resulting low quality of visible-light images without requiring additional illumination equipment for light supplementing, and which has a simple structure, low cost, and convenient installation and maintenance.
In order to achieve the above purpose, the present invention provides the following technical solutions:
The tearing detection method based on the encoder-decoder neural network relies on an image acquisition system comprising a line laser emitter, an industrial camera, and an embedded development board, and comprises the following steps:
S1, the line laser emitter emits a line laser that irradiates the lower surface of the belt; the industrial camera acquires an image I(y, x) and transmits it to the embedded development board for processing;
S2, taking the image I(y, x) as input, performing image preprocessing to obtain an output feature F(x);
The image preprocessing method comprises the following steps:
traversing x and, for each column, calculating the top k gray values along the y direction together with their y index values: [V, D] = Topk(I(y, x)), where V and D both have a size of k×w, V denotes the top k gray values, D denotes the y index values of those top k values, and w is the image width;
obtaining the key stripe coordinate and stripe gray level through a weighted-average algorithm, the stripe coordinate serving as the output feature:
F(x) = ( Σ_{i=1}^{k} V(i, x)·D(i, x) ) / ( Σ_{i=1}^{k} V(i, x) ), x ∈ [0, w−1];
calculating the size of F(x) from the preset resolution of the industrial camera;
S3, performing semantic segmentation with F(x) as the input of the encoder-decoder neural network to obtain output data P;
S4, performing image post-processing: calculating the circumscribed rectangle R of the tear in the image according to the P obtained in S3;
the specific method comprises the following steps:
when P(1, x) ≥ P(0, x), x is judged to belong to the tearing pixels, and runs of continuous tearing pixels are grouped into tearing intervals [x_s, x_e], from which the circumscribed rectangle R is obtained; a sketch of this step follows.
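The following is a minimal sketch of the S4 interval extraction, assuming P is the 2×w network output held as a NumPy array; the function name and the choice of NumPy are illustrative, not part of the patent.

import numpy as np

def tearing_intervals(P: np.ndarray) -> list[tuple[int, int]]:
    """Group columns where P(1, x) >= P(0, x) into intervals [x_s, x_e]."""
    mask = P[1] >= P[0]                        # per-column tearing decision
    intervals, start = [], None
    for x, torn in enumerate(mask):
        if torn and start is None:
            start = x                          # open an interval at x_s
        elif not torn and start is not None:
            intervals.append((start, x - 1))   # close the interval at x_e
            start = None
    if start is not None:                      # interval running to the edge
        intervals.append((start, len(mask) - 1))
    return intervals

The circumscribed rectangle R would then combine each interval with the stripe coordinates F(x) over that interval to obtain the vertical extent; this last step is an inference about the intended geometry, as the text does not spell it out.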
Further, in S2, k is 16, and the image is scaled to width w using an image bilinear interpolation algorithm.
Further, in S3, the size of P is 2×w.
Further, in the step S3, the semantic segmentation includes the steps of:
input feature map F 1×w is subjected to convolution operation of c×3 twice successively to obtain an output feature map
Inputting a feature mapThe pooling with the step length of 2 is firstly carried out, and then the convolution operation of 2c multiplied by 3 is carried out twice continuously, thus obtaining an output characteristic diagram
Inputting a feature mapThe pooling with the step length of 2 is firstly carried out, and then the convolution operation of 4c multiplied by 3 is carried out twice continuously, thus obtaining an output characteristic diagram
Inputting a feature mapPooling with step length of 2 is firstly carried out, then convolution operation of 8c multiplied by 3 is carried out twice continuously, and finally transposition convolution operation with step length of 2,4c multiplied by 3 is carried out, thus obtaining an output characteristic diagram
Inputting a feature mapAndAccording to the characteristic diagram of channel splicing
Inputting a feature mapThe convolution operation of 4c multiplied by 3 is continuously carried out twice, and then the transposition convolution operation with the step length of 2,2c multiplied by 3 is carried out, so that an output characteristic diagram is obtained
Inputting a feature mapAndSpliced into according to the channel
Inputting a feature mapThe convolution operation of 2c multiplied by 3 is carried out twice in succession, then the transposition convolution operation with the step length of 2 and c multiplied by 3 is carried out, and an output characteristic diagram is obtained
Inputting a feature mapAndSpliced into according to the channel
Inputting a feature mapThe convolution operation of c multiplied by 3 is continuously carried out twice, and then the convolution operation of 2 multiplied by 3 is carried out, so that an output characteristic diagram P 2×w is obtained;
where c represents the channel parameter.
Further, in S3, w=1024 and c=64. Regarding these parameters: w is the image width and should be divisible by 8; it should not be too large, since a large w consumes excessive hardware resources, so w=1024 is recommended, and an image bilinear interpolation algorithm can be used to adjust w into a suitable range. c is the channel parameter, with a value interval of [32, 128]; a larger c likewise consumes more hardware resources, so c=64 is recommended.
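As a worked example derived directly from the operations listed above, with w=1024 and c=64 the feature-map sizes evolve as follows:

F: 1×1024 → F_L1: 64×1024 → F_L2: 128×512 → F_L3: 256×256 → F_L4: 256×256 (passing through 512×128 inside the L4 module) → F_C3: 512×256 → F_R3: 128×512 → F_C2: 256×512 → F_R2: 64×1024 → F_C1: 128×1024 → P: 2×1024.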
The invention has the beneficial effects that:
According to the technical scheme, a neural network operates on the line-structured-light image, the circumscribed rectangle of the tear is detected, and whether a belt tearing fault has occurred is judged, so that tearing faults are detected and alarmed in time. Compared with the prior art, combining line-structured-light technology with the encoder-decoder neural network solves the problem of poor illumination conditions in underground coal mines and the resulting low quality of visible-light images, and overcomes the technical prejudice that structured-light techniques underground require additional light supplementing: judgment accuracy is ensured without arranging additional illumination equipment, and the system is simple in structure, low in cost, and convenient to install and maintain.
Meanwhile, the scheme adopts a semantic segmentation network together with image post-processing, so training can be completed with only a small number of samples; the requirement on the size of the data sample library is low, the generalization capability is strong, and adaptability to changing underground conditions in coal mines is good. The model is small in scale and computationally simple, so the requirements on hardware equipment are low, further reducing cost.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
For the purpose of making the objects, technical solutions, and advantages of the present invention more apparent, the present invention is described in preferred detail below with reference to the accompanying drawings, in which:
FIG. 1 is a system block diagram of the tearing detection method based on an encoder-decoder neural network according to the present invention;
FIG. 2 is a schematic diagram of an input image for the tearing detection method based on an encoder-decoder neural network according to the present invention;
FIG. 3 is a schematic structural diagram of the encoder-decoder neural network of the tearing detection method according to the present invention.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the present invention with reference to specific examples. The invention may also be practiced or carried out in other embodiments, and the details of the present description may be modified or varied without departing from the spirit and scope of the present invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the invention by way of example, and the following embodiments and the features in them may be combined with each other in the absence of conflict.
The drawings are for illustrative purposes only and are not to be construed as limiting the invention. To better illustrate the embodiments, certain elements of the drawings may be omitted, enlarged, or reduced, and do not represent actual product dimensions; it will be understood by those skilled in the art that certain well-known structures and their descriptions may be omitted from the drawings.
In the description of the present invention, terms such as "upper", "lower", "left", "right", "front", and "rear", if present, indicate directions or positional relationships based on those shown in the drawings. They are used only for convenience and simplicity of description, and do not indicate or imply that the referenced devices or elements must have a specific orientation or be constructed and operated in a specific orientation; such terms are therefore exemplary, are not to be construed as limiting the invention, and their specific meanings can be understood by those skilled in the art according to the circumstances.
Please refer to FIGS. 1-3, which illustrate a tearing detection method based on an encoder-decoder neural network. The image acquisition system is based on line structured light and mainly comprises a line laser emitter, an industrial camera, and an embedded development board.
S1, the line laser emitter projects a line laser onto the lower surface of the belt; the industrial camera captures the reflected image and transmits it to the embedded development board.
S2, image preprocessing
The image I(y, x) acquired by the industrial camera is taken as input for image preprocessing, yielding the output feature F(x).
The method specifically comprises the following steps:
Firstly, x is traversed and, for each column, the top k gray values along the y direction and their y index values are calculated: [V, D] = Topk(I(y, x)), where V and D both have a size of k×w, V denotes the top k gray values, D denotes the y index values of those top k values, and w is the image width.
Then, the key stripe coordinate and stripe gray level are obtained using a weighted-average algorithm, the stripe coordinate serving as the output feature:
F(x) = ( Σ_{i=1}^{k} V(i, x)·D(i, x) ) / ( Σ_{i=1}^{k} V(i, x) ), x ∈ [0, w−1];
The image width w is determined by the preset resolution of the industrial camera; together with the value of k, it fixes the size of F(x) through the above formula. In this embodiment, k is preferably 16 and the image width is preferably w=1024, so the size of F(x) is w, extended to 1×w.
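The following is a minimal sketch of this preprocessing step, assuming the image is held as a NumPy array of shape (h, w) and has already been scaled to width w (e.g., by bilinear interpolation); the function name and the choice of NumPy are illustrative, not part of the patent.

import numpy as np

def preprocess(I: np.ndarray, k: int = 16) -> np.ndarray:
    """Extract the laser-stripe centroid F(x) for every column x."""
    I = I.astype(np.float32)
    # Top-k gray values along y for each column: D holds the y indices,
    # V the gray values; both have shape (k, w), matching [V, D] = Topk(I).
    D = np.argsort(I, axis=0)[-k:, :]
    V = np.take_along_axis(I, D, axis=0)
    # Weighted average of the y indices, weighted by gray value, gives
    # the sub-pixel stripe coordinate per column (the guard avoids 0/0).
    F = (V * D).sum(axis=0) / np.maximum(V.sum(axis=0), 1e-6)
    return F.reshape(1, -1)                    # size 1 x w, fed to the network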
S3, encoder-decoder neural network
According to the structure of the encoder-decoder neural network shown in FIG. 3, the input data of this step is F(x), which enters the network for semantic segmentation and finally yields the output data P. In this embodiment, the modules are as follows (a code sketch is given after the key points below):
The L1 module inputs the feature map F (size 1×w) and performs two successive c×3 convolution operations to obtain the output feature map F_L1 (size c×w);
The L2 module inputs F_L1, first performs pooling with a step length of 2, and then two successive 2c×3 convolution operations to obtain the output feature map F_L2 (size 2c×(w/2));
The L3 module inputs F_L2, first performs pooling with a step length of 2, and then two successive 4c×3 convolution operations to obtain the output feature map F_L3 (size 4c×(w/4));
The L4 module inputs F_L3, first performs pooling with a step length of 2, then two successive 8c×3 convolution operations, and finally a 4c×3 transposed convolution with a step length of 2, obtaining the output feature map F_L4 (size 4c×(w/4));
The Cat module appears three times: (1) it splices the feature maps F_L4 and F_L3 along the channel dimension into F_C3 (size 8c×(w/4)); (2) it splices F_R3 and F_L2 into F_C2 (size 4c×(w/2)); (3) it splices F_R2 and F_L1 into F_C1 (size 2c×w);
The R3 module inputs F_C3, performs two successive 4c×3 convolution operations, and then a 2c×3 transposed convolution with a step length of 2, obtaining the output feature map F_R3 (size 2c×(w/2));
The R2 module inputs F_C2, performs two successive 2c×3 convolution operations, and then a c×3 transposed convolution with a step length of 2, obtaining the output feature map F_R2 (size c×w);
The R1 module inputs F_C1, performs two successive c×3 convolution operations, and then a 2×3 convolution operation, obtaining the output feature map P (size 2×w).
The parameter considerations are as stated above: w is the image width, should be divisible by 8, and should not be too large lest excessive hardware resources be consumed (w=1024 is recommended, and a bilinear interpolation algorithm can adjust w into a suitable range); c is the channel parameter with a value interval of [32, 128], where a larger c likewise consumes more resources (c=64 is recommended).
The key points are as follows:
Quadrangle: represents data; F is the output of the image preprocessing, with a size of 1×w (1 channel, length w), and P is the output of the neural network, with a size of 2×w (2 channels, length w);
Rectangle: represents an operation module, the first row of which gives the input size of the feature map in the form "number of channels × length";
Conv: represents a convolution operation; the following figures give the operation size in the form "number of channels × convolution kernel length";
Maxpool: represents a maximum pooling operation; the following number is the operation step length, by which the feature map length is contracted (halved for a step length of 2) while the number of channels is unchanged;
Deconv: represents a transposed convolution operation; the following figures give the operation size in the form "number of channels × convolution kernel length", where the number of channels determines the number of channels of the output feature map and the kernel itself leaves the feature map length unchanged; Stride denotes the operation step length, and the value 2 means the feature map length is expanded 2 times after the operation.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.