Disclosure of Invention
The invention mainly aims to provide a bolt loosening identification method and system based on image feature point matching, so as to solve the technical problem that prior-art bolt loosening identification methods based on computer vision can only identify loosening angles within a range of 60 degrees.
S10, obtaining two initial bolt group pictures at different moments, wherein the initial bolt group pictures comprise a plurality of fastening bolts; S20, determining the row-column ratio of the fastening bolt arrangement in the initial bolt group pictures, performing perspective transformation according to the row-column ratio to obtain corrected bolt group pictures after visual angle correction, and performing image cutting and super-resolution reconstruction on each frame of corrected bolt group picture to obtain a plurality of bolt subgraphs corresponding to each frame; S30, performing semantic segmentation on the bolt subgraphs, eliminating non-target areas, and determining target areas so as to segment out the fastening bolts; S40, matching the segmented fastening bolts at the different moments to determine target bolt matching pairs, wherein a target bolt matching pair pairs the same fastening bolt in the front picture and the rear picture; generating a first feature descriptor and a second feature descriptor for each fastening bolt in the matching pair by using the SIFT algorithm, wherein the first feature descriptor is the feature description of the feature points of the target area in the front picture, the second feature descriptor is the feature description of the feature points of the target area in the rear picture, and the SIFT algorithm generates the feature description of a feature point through scale space extremum detection, key point positioning, direction distribution and feature point description; and matching the first feature descriptor and the second feature descriptor, calculating a homography matrix, and determining, through matrix transformation of the homography matrix, the rotation angle of the corresponding fastening bolt in the rear picture relative to the fastening bolt in the front picture.
Further, in step S40, the homography matrix satisfies, through matrix transformation, the form of the rigid transformation matrix:

$$H=\begin{bmatrix}h_{11}&h_{12}&h_{13}\\h_{21}&h_{22}&h_{23}\\h_{31}&h_{32}&h_{33}\end{bmatrix}\longrightarrow\begin{bmatrix}\cos\theta&-\sin\theta&t_{x}\\\sin\theta&\cos\theta&t_{y}\\0&0&1\end{bmatrix}$$

wherein $h_{13}$, $h_{23}$ and $h_{33}$ represent the horizontal displacement, vertical displacement and overall translation of the projective transformation between the two images; $h_{11}$, $h_{12}$, $h_{21}$ and $h_{22}$ represent the rotation and scaling effects of the projective transformation; $h_{31}$ and $h_{32}$ represent perspective effects, and their being zero guarantees an affine transformation; $t_{x}$ and $t_{y}$ are the translation amounts in the horizontal and vertical directions, respectively; and $\theta$ is the rotation angle.
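Once the homography has been reduced to the rigid form above, the loosening angle follows directly from the rotation block. A minimal NumPy sketch, using an illustrative hand-built homography rather than the patent's OpenCV pipeline:

```python
import numpy as np

def rotation_angle_from_homography(H):
    """Extract the rotation angle (degrees) from a homography reduced to
    the rigid form [[cos t, -sin t, tx], [sin t, cos t, ty], [0, 0, 1]]."""
    H = np.asarray(H, dtype=float)
    H = H / H[2, 2]  # normalise so that h33 = 1
    return np.degrees(np.arctan2(H[1, 0], H[0, 0]))

# Homography built from a pure 30-degree rotation plus a translation:
t = np.radians(30.0)
H = np.array([[np.cos(t), -np.sin(t), 5.0],
              [np.sin(t),  np.cos(t), 2.0],
              [0.0,        0.0,       1.0]])
angle = rotation_angle_from_homography(H)
```

Because `atan2` uses both the sine and cosine entries, the recovered angle covers the full (-180, 180] range rather than only a 60-degree sector.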
In step S20, the corrected bolt group picture is obtained specifically by: performing bolt target detection on the initial bolt group pictures by using a trained YOLOv8-P6 algorithm to obtain front and rear groups of bolt detection data, wherein the bolt detection data comprise the bolt type and the center coordinates and size of the detection frame, the bolt types comprise outer sleeve bolts, embedded bolts and missing bolts, an outer sleeve bolt corresponds to an inner circular screw, the center coordinates of the detection frame comprise the center coordinate x and the center coordinate y, and the size of the detection frame comprises the length l and the width w; traversing the center coordinates of the detection frame corresponding to each fastening bolt in one group of bolt detection data and calculating the Euclidean distance between the center coordinates of any two detection frames; traversing all Euclidean distances in the obtained bolt detection data and determining the four vertexes of the bolt group based on the two largest groups of Euclidean distances; sequentially connecting the four vertexes into a quadrilateral; determining the numbers of rows and columns of the outermost fastening bolts in the initial bolt group picture according to the number of detection frames passed through by the four edges of the quadrilateral, thereby determining the row-column ratio of the bolt arrangement in the initial bolt group picture; and performing perspective transformation according to the row-column ratio to obtain the corrected bolt group picture after visual angle correction.
Further, the specific manner of determining the rotation angle of the corresponding fastening bolt in the rear picture relative to the fastening bolt in the front picture is: if the fastening bolt is an outer sleeve bolt, the rotation angles of the outer sleeve and the inner circular screw are respectively obtained through target bolt matching, and the relative rotation angle calculated from these two rotation angles is taken as the rotation angle; if the fastening bolt is an embedded bolt, the rotation angle of the embedded bolt itself is taken as the rotation angle.
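A small sketch of the stated rule for outer sleeve bolts, assuming angles in degrees; wrapping the relative angle to (-180, 180] is an assumed convention, not stated in the patent:

```python
def sleeve_bolt_rotation(outer_angle_deg, inner_angle_deg):
    """For an outer-sleeve bolt, take the loosening angle as the rotation
    of the outer sleeve relative to the inner circular screw, wrapped to
    the interval (-180, 180] degrees."""
    rel = (outer_angle_deg - inner_angle_deg) % 360.0
    return rel - 360.0 if rel > 180.0 else rel
```

For an embedded bolt the measured angle would be used directly, as the text above states.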
Further, the specific manner of obtaining the plurality of bolt subgraphs corresponding to each frame through image cutting and super-resolution reconstruction of the corrected bolt group picture is: performing target detection on the corrected bolt group picture to obtain detection frames; cutting out the bolts detected in the corrected bolt group picture according to the coordinates of the detection frames to obtain the bolt subgraphs; and performing super-resolution reconstruction on the cut bolt subgraphs through the SRGAN algorithm to improve the definition and detail of the images.
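The cutting step can be sketched as plain array slicing around each detection frame. The (center_x, center_y, l, w) box convention follows step S20, with l taken along the horizontal axis (an assumption); the SRGAN reconstruction stage is omitted here:

```python
import numpy as np

def crop_bolt_subimages(image, boxes):
    """Cut each detected bolt out of the corrected bolt-group picture.
    Boxes are (center_x, center_y, length_l, width_w); crops are clamped
    to the image bounds."""
    h, w = image.shape[:2]
    subs = []
    for cx, cy, l, bw in boxes:
        x0, x1 = max(0, int(cx - l / 2)), min(w, int(cx + l / 2))
        y0, y1 = max(0, int(cy - bw / 2)), min(h, int(cy + bw / 2))
        subs.append(image[y0:y1, x0:x1].copy())
    return subs

img = np.arange(100 * 100).reshape(100, 100)
subs = crop_bolt_subimages(img, [(50, 50, 20, 20), (5, 5, 20, 20)])
```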
Further, in step S40, the specific steps by which the SIFT algorithm generates the feature description of the feature points through scale space extremum detection, key point positioning, direction distribution and feature point description are as follows. S421, a series of blurred images with different blur degrees is generated by convolving the initial bolt group picture with Gaussian kernels of different standard deviations, obtaining the scale space, wherein each blurred image represents the blur degree at a different scale, and the Gaussian convolution kernel is as shown in the following formula:

$$G(x,y)=\frac{1}{2\pi\sigma^{2}}\exp\!\left(-\frac{(x-m/2)^{2}+(y-n/2)^{2}}{2\sigma^{2}}\right)$$

wherein $x$ and $y$ represent the abscissa and ordinate of a pixel point respectively, $m$ and $n$ represent the length and width of the initial bolt group picture respectively, $\sigma$ is the variance parameter (Gaussian radius), and $\sigma$ takes a value of 1.2 to 1.6;
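The reconstructed kernel above can be sketched directly in NumPy; m, n and sigma are as defined there, and the 9x9 size below is purely illustrative:

```python
import numpy as np

def gaussian_kernel(m, n, sigma=1.4):
    """Centred Gaussian convolution kernel over an m-by-n grid, following
    the formula above; sigma is taken in the stated 1.2-1.6 range."""
    x = np.arange(m)[:, None]   # abscissa of each pixel, as a column
    y = np.arange(n)[None, :]   # ordinate of each pixel, as a row
    g = np.exp(-((x - m / 2) ** 2 + (y - n / 2) ** 2) / (2 * sigma ** 2))
    return g / (2 * np.pi * sigma ** 2)

G = gaussian_kernel(9, 9)
```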
S422, adjacent images in the scale space are subtracted (the more blurred minus the less blurred) to generate difference-of-Gaussian images; S423, candidate key points are marked based on the difference-of-Gaussian images; S424, threshold screening is performed on the detected candidate key points, removing low-contrast candidates and edge response points, to obtain the preliminarily detected key point set, as shown in the following formula:

$$\left|D(x,y)\right|\ge\frac{T}{n}$$

wherein $T$ is the threshold value, $n$ is the number of images from which features are to be extracted, and $D(x,y)$ is the pixel value of the difference-of-Gaussian image;
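A sketch of the reconstructed contrast-screening rule, keeping a candidate only when its absolute difference-of-Gaussian response reaches T/n; the T/n division follows the common OpenCV SIFT convention and is an assumption about the patent's exact formula:

```python
import numpy as np

def contrast_filter(dog_values, T=0.5, n=3):
    """Discard low-contrast candidate key points: keep a candidate only
    when its absolute difference-of-Gaussian response reaches T / n."""
    dog_values = np.asarray(dog_values, dtype=float)
    return np.abs(dog_values) >= T / n

# T / n = 0.5 / 3, so responses 0.30 and -0.40 survive, 0.05 and 0.10 do not.
keep = contrast_filter([0.30, 0.05, -0.40, 0.10], T=0.5, n=3)
```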
S425, on the preliminarily detected key point set, the position and scale of each key point are precisely located by fitting a second-order Gaussian function to determine the precise position of the key point; S426, the main direction of each key point is determined to ensure its rotation invariance; S427, the feature description of the feature point is obtained by performing a mathematical-level feature description of the key point, processed as in the following formula:

$$\hat{d}_{i}=\frac{d_{i}}{\lVert \mathbf{d}\rVert_{2}}=\frac{d_{i}}{\sqrt{\sum_{j=1}^{n} d_{j}^{2}}}$$

wherein $\lVert \mathbf{d}\rVert_{2}$ is the L2 norm of the feature descriptor, $\mathbf{d}$ is the feature descriptor vector, $d_{i}$ is a component of the feature descriptor vector, $n$ is the vector dimension, and $\hat{d}_{i}$ is the corresponding component of the normalized feature descriptor vector.
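The normalization formula above is a plain L2 division, sketched here on a toy 2-component descriptor (real SIFT descriptors have 128 components):

```python
import numpy as np

def normalize_descriptor(d):
    """L2-normalise a feature descriptor, matching the formula above:
    each component is divided by the descriptor's L2 norm."""
    d = np.asarray(d, dtype=float)
    return d / np.linalg.norm(d)

d_hat = normalize_descriptor([3.0, 4.0])
```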
Further, T is 0.5, or T is a local threshold obtained by adjustment with an adaptive Gaussian threshold method: a local Gaussian weight is calculated according to the local pixel distribution of each small block, and the threshold is adjusted accordingly, as shown in the formula:

$$T(x,y)=\mu(x,y)+k\cdot s(x,y)$$

wherein $T(x,y)$ represents the local threshold at pixel $(x,y)$, $\mu(x,y)$ is the average gray value of the pixels in the adjacent local area, $s(x,y)$ is the standard deviation of the pixels in the adjacent local area, and $k$ is the local Gaussian weighting coefficient.
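A sketch of the local threshold above for a single patch; the weighting factor k is an assumed parameter, since the original text names only the local mean and standard deviation:

```python
import numpy as np

def local_gaussian_threshold(patch, k=0.5):
    """Local threshold for one image patch: the mean gray value plus k
    times the local standard deviation, per the formula above."""
    patch = np.asarray(patch, dtype=float)
    return patch.mean() + k * patch.std()

# A flat patch has zero deviation, so the threshold equals its mean.
t = local_gaussian_threshold([[10, 10], [10, 10]])
```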
Further, in step S40, the first feature descriptor and the second feature descriptor are matched in the following specific manner: the first feature descriptors and second feature descriptors are compared one by one; the square of the difference between each pair of corresponding components of the two feature vectors is calculated, the squared differences of all corresponding components are summed, and the square root is taken to obtain the vector distance; feature points whose vector distance is smaller than a preset distance threshold are determined to be suitable matching points. The preset distance threshold is determined by a dynamic adjustment method: an initial distance threshold is set, and if the number of feature point matching pairs is smaller than the minimum number of feature points of the two images, the preset distance threshold is increased by a preset step length S until the number of matching points meets the quantity requirement of the matcher. After the number of matching points meets this requirement, the matching points are screened by the RANSAC algorithm to further eliminate wrong matching pairs, wherein the filtering threshold of the RANSAC algorithm is also dynamically adjusted, being reduced step by step until the obtained homography matrix satisfies, through matrix transformation, the form of the rigid transformation matrix.
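The dynamic distance-threshold loop described above can be sketched as follows, in pure NumPy rather than OpenCV's BFMatcher; the initial threshold, step, and minimum pair count are illustrative values, not from the original text:

```python
import numpy as np

def dynamic_match(desc1, desc2, thresh=0.5, step=0.5, min_pairs=2, max_thresh=1e6):
    """Brute-force matching with a dynamically adjusted distance threshold:
    nearest neighbours by Euclidean distance are accepted when closer than
    the threshold, which grows by a preset step until enough matching
    pairs are found."""
    d1 = np.asarray(desc1, dtype=float)
    d2 = np.asarray(desc2, dtype=float)
    # All pairwise Euclidean distances between the two descriptor sets.
    dists = np.linalg.norm(d1[:, None, :] - d2[None, :, :], axis=2)
    while thresh <= max_thresh:
        pairs = [(i, int(np.argmin(dists[i])))
                 for i in range(len(d1)) if dists[i].min() < thresh]
        if len(pairs) >= min_pairs:
            return pairs, thresh
        thresh += step
    return [], thresh

pairs, used = dynamic_match([[0, 0], [1, 1]], [[0.1, 0], [1, 1.1]])
```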
Further, a pretension force variation value of the fastening bolt is determined based on the rotation angle of the fastening bolt and the rotation angle-pretension force variation relationship data.
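A sketch of this lookup, assuming the rotation angle versus pretension variation relationship data is given as a calibration table; the table values below are entirely hypothetical:

```python
import numpy as np

# Hypothetical angle -> pretension-loss calibration table; the original
# text only states that such relationship data is used, not its values.
angles_deg = [0.0, 90.0, 180.0, 360.0]
preload_loss_kN = [0.0, 5.0, 9.0, 15.0]

def pretension_change(rotation_deg):
    """Look up the pretension variation for a measured loosening angle by
    linear interpolation over the calibration table."""
    return float(np.interp(rotation_deg, angles_deg, preload_loss_kN))

loss = pretension_change(45.0)
```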
The invention also provides a bolt looseness identification system based on image feature point matching, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the method when executing the computer program.
Compared with the prior art, the bolt loosening identification method based on image feature point matching has the following beneficial effects:
According to the bolt loosening identification method based on image feature point matching, two initial bolt group pictures at two different moments are first obtained; corrected bolt group pictures after visual angle correction are obtained by performing perspective transformation according to the row-column ratio of the fastening bolt arrangement; image cutting is performed to obtain all bolt subgraphs in the front and rear corrected bolt group pictures; semantic segmentation is performed after super-resolution reconstruction to determine the target areas needing feature matching; matching is performed within the target areas to pair the same fastening bolt in the front picture and the rear picture; the first feature descriptor and second feature descriptor of each fastening bolt in a matching pair are generated through the SIFT algorithm and matched through a Brute-Force matcher; and a homography matrix is calculated, from which, through matrix transformation, the rotation angle of the corresponding fastening bolt in the rear picture relative to the fastening bolt in the front picture is determined. By combining the SIFT algorithm with the Brute-Force matcher to calculate the rotation angle of a fastening bolt, the method achieves 360-degree full-period detection of fastening bolt loosening, solving the problem that existing computer-vision-based bolt loosening identification methods have a detection range of only 60 degrees.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that all directional indicators (such as up, down, left, right, front, and rear) used in the embodiments of the present invention are merely for explaining the relative positional relationship, movement conditions, and the like between the components in a certain specific posture (as shown in the drawings); if the specific posture changes, the directional indicators change accordingly.
Furthermore, the descriptions "first," "second," etc. in this disclosure are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when the technical solutions are contradictory or cannot be realized, the combination should be considered absent and outside the scope of protection claimed in the present invention.
Referring to figs. 1 to 9, the invention provides a bolt loosening identification method based on image feature point matching, comprising the following steps. S10, obtaining two initial bolt group pictures at different moments, wherein the initial bolt group pictures comprise a plurality of fastening bolts. S20, determining the row-column ratio of the fastening bolt arrangement in the initial bolt group pictures, performing perspective transformation according to the row-column ratio to obtain corrected bolt group pictures after visual angle correction, and performing image cutting and super-resolution reconstruction on each frame of corrected bolt group picture to obtain a plurality of bolt subgraphs corresponding to each frame. S30, performing semantic segmentation on the bolt subgraphs, eliminating non-target areas, and determining target areas so as to segment out the fastening bolts. S40, matching the segmented fastening bolts at the different moments to determine target bolt matching pairs, wherein a target bolt matching pair pairs the same fastening bolt in the previous picture and the subsequent picture; generating a first feature descriptor and a second feature descriptor for each fastening bolt in a matching pair by using the SIFT algorithm, wherein the first feature descriptor is the feature description of the feature points of the target area in the previous picture, the second feature descriptor is the feature description of the feature points of the target area in the subsequent picture, and the SIFT algorithm generates the feature description of a feature point through scale space extremum detection, key point positioning, direction distribution and feature point description; and matching the first feature descriptor and the second feature descriptor through a Brute-Force matcher, calculating a homography matrix, and determining, through matrix transformation of the homography matrix, the rotation angle of the corresponding fastening bolt in the rear picture relative to the fastening bolt in the front picture.
According to the bolt looseness identification method based on image feature point matching, two initial bolt group pictures at two different moments are first obtained; perspective transformation is performed according to the row-column ratio of the fastening bolt arrangement to obtain corrected bolt group pictures after visual angle correction; image cutting is performed to obtain all bolt subgraphs in the front and rear corrected bolt group pictures; semantic segmentation is performed after super-resolution reconstruction to determine the target areas needing feature matching; matching is performed within the target areas to pair the same fastening bolt in the front picture and the rear picture; the first feature descriptor and second feature descriptor of each fastening bolt in a matching pair are generated through the SIFT algorithm and matched through a Brute-Force matcher; a homography matrix is calculated; and the rotation angle of the corresponding fastening bolt in the rear picture relative to the fastening bolt in the front picture is determined by transforming the homography matrix. By combining the SIFT algorithm with the Brute-Force matcher to calculate the rotation angle of a fastening bolt, the method achieves 360-degree full-period detection of fastening bolt loosening, breaking through the 60-degree detection range of existing machine-vision-based loosening identification methods.
It can be understood that the two initial bolt group pictures are bolt group pictures taken at a front moment and a rear moment, and the shooting angles at the two moments may or may not be consistent; the bolt group may consist of two rows and three columns of fastening bolts, of three rows and four columns, or of other numbers of rows and columns, which are not enumerated herein.
It can be appreciated that, in order to calculate the bolt rotation angle more accurately on the basis of the SIFT approach, the method further optimizes the above steps.
It will be appreciated that, in a preferred embodiment of the present invention, the first and second feature descriptors of each fastening bolt in the matching pair are generated using the SIFT algorithm (Scale-Invariant Feature Transform, a scale-invariant feature extraction method), and the first and second feature descriptors are matched by a Brute-Force matcher.
The invention adopts the fusion of multiple technologies such as computer vision, neural network, deep learning, target detection, visual angle correction, super-resolution reconstruction and the like.
Computer vision is the process of enabling computer systems to understand and interpret image or video content through computing technology and algorithms; it involves tasks such as extracting information from images or videos, identifying objects, detecting motion, and measuring the size and shape of objects. Computer vision typically uses digital image processing and pattern recognition techniques, including feature extraction, pattern matching, machine learning, and deep learning.
A neural network is a computational model imitating the working mode of the human nervous system, used in the fields of machine learning and artificial intelligence. It consists of a large number of artificial neurons (or nodes); the neurons are connected to transmit information, and each connection has an adjustable weight that influences how information propagates through the network. A neural network is typically divided into multiple layers: an input layer that receives data from the external environment, hidden layers that process information between input and output, and an output layer that generates the predictions or results of the model. Each node weights its input data and generates an output through an activation function, and the neural network adjusts the connection weights by learning from the training data set to optimize the performance of the model.
Deep learning is a branch of machine learning that attempts to achieve abstract learning of data by simulating the neural network structure of the human brain. The core of deep learning is to learn feature representations of data through multi-level neural networks so as to process and analyze complex data efficiently; each layer of the network performs some transformation and feature extraction on the data, gradually forming a more abstract, higher-level representation of the input.
Object detection is a task in the field of computer vision aimed at identifying specific objects in an image or video and determining their positions; unlike a simple image classification task, object detection requires determining the bounding box (detection frame) of an object and classifying it into a predefined class.
The detection frame is a common representation mode in the target detection task and is used for identifying the position of a target object in an image. It is typically a rectangular frame consisting of four borders for enclosing the target object.
Viewing angle correction is an image processing technique for correcting the viewing angle of an object in an image so that it appears straight or aligned with a specific angle; it is commonly used in applications such as image correction, three-dimensional reconstruction, and virtual reality.
In image processing, sometimes, due to equipment limitations or other factors, we can only obtain low-resolution images, while super-resolution reconstruction techniques can improve the resolution of images through some algorithms and techniques, so that the images can be seen more clearly and in rich detail.
Back propagation (Backpropagation) is an optimization algorithm for training neural networks and a variant of the gradient descent algorithm: it computes the gradient of the loss function with respect to each parameter in the network and then updates the parameters in the direction opposite to the gradient, thereby gradually reducing the loss. The back propagation algorithm mainly comprises the following steps. First, forward propagation: the input data is propagated layer by layer through the neural network until an output result is obtained; in this process, the input of each layer is weighted and passed through an activation function to obtain the output of the next layer. Second, loss calculation: the network output is compared with the real label, and the value of the loss function is calculated to evaluate the accuracy of the model prediction. Third, backward propagation of gradients: starting from the output layer, the gradient of the loss with respect to each parameter is calculated layer by layer according to the chain rule, by computing the error gradient of each layer and propagating it to the previous layer. Fourth, parameter update: according to the calculated gradients, each parameter is updated in the direction opposite to its gradient so as to reduce the loss function, generally using the gradient descent algorithm or one of its variants. Fifth, iteration: the back propagation algorithm generally needs to iterate many times, each iteration forward-propagating one mini-batch of data through the network and updating the parameters, until the convergence condition or the maximum number of iterations is reached.
In the back propagation process, the gradient may gradually decrease layer by layer and finally tend to zero (the vanishing gradient problem), so that parameters close to the input layer cannot be effectively updated, which in turn affects the training effect of the network.
Momentum stochastic gradient descent makes parameter updates more stable and efficient: it can jump out of local minima, accelerate convergence, and reduce oscillation in parameter updates. The momentum stochastic gradient descent algorithm generally converges faster than ordinary stochastic gradient descent and can train deep neural networks more effectively.
The Brute-Force Matcher is a feature matching method commonly used in the field of computer vision for finding similar feature points in two images. Its basic idea is that, for each feature point in one image, its feature descriptor is compared with those of all feature points in the other image, and the most similar feature point is selected as the matching result.
RANSAC (Random Sample Consensus) is an iterative method for fitting a model and removing outliers in the data. In the RANSAC algorithm, a threshold needs to be set to determine which data points are considered interior points of the model and which are considered outliers (exterior points). The selection of the threshold is usually determined according to the specific application scenario and data characteristics: the threshold determines the degree of fit between the data points and the fitted model, and data points exceeding the threshold are regarded as outliers. In general, the smaller the threshold, the better the robustness of the model, but more interior points may be misjudged as outliers; conversely, the larger the threshold, the worse the robustness of the model, but more interior points are retained.
A rigid body transformation is characterized by not changing the shape and size of an object, only translating and rotating it in space, thereby preserving the object's rigidity; it can be described as a Euclidean transformation that keeps the distances and angles between points in space unchanged.
The homography matrix (Homography Matrix), also called a projective transformation matrix, is an important concept in the field of computer vision for describing the projective transformation relationship between two images. In two-dimensional space, the homography matrix describes the projective transformation between one image and another and is commonly used in tasks such as image registration, image correction, and image stitching. Suppose there are two images A and B; the homography matrix H between them defines the projective transformation between a point in image A and the corresponding point in image B. Specifically, for a two-dimensional point (x, y) in image A, the corresponding point (x', y') in image B can be expressed as the multiplication of the homography matrix H with the homogeneous coordinates:

$$\begin{bmatrix}x'\\y'\\1\end{bmatrix}\sim H\begin{bmatrix}x\\y\\1\end{bmatrix}$$

wherein H is a 3x3 matrix, called the homography matrix, which contains transformation parameters such as rotation, translation, scaling, and projection; registration, alignment, stitching, etc. between images can be realized by solving the homography matrix. In practical applications, a common method is to estimate the homography matrix through feature point matching and then perform the projective transformation of the image using the estimated matrix. The homography matrix can be estimated using various methods, such as Direct Linear Transformation (DLT), Least Squares, and Random Sample Consensus (RANSAC).
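The homogeneous-coordinate mapping above can be sketched as a single matrix product followed by division by the third component; the translation-only homography below is an illustrative example:

```python
import numpy as np

def apply_homography(H, x, y):
    """Map a point (x, y) in image A to image B via the homogeneous
    multiplication above: (x', y', w')^T = H (x, y, 1)^T, then divide
    by w' to return to Cartesian coordinates."""
    H = np.asarray(H, dtype=float)
    xp, yp, w = H @ np.array([x, y, 1.0])
    return xp / w, yp / w

# A pure translation by (3, -2):
H = np.array([[1.0, 0.0, 3.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]])
pt = apply_homography(H, 10.0, 10.0)
```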
The bolt pretightening force (Preload Force) is the force applied to a bolt during installation, used to generate friction and pressure so as to form a tight connection between the bolt and the connected parts; the pretightening force is a fastening force that guarantees the tightness and stability of the bolted joint and prevents loosening and failure.
Further, the homography matrix satisfies, through matrix transformation, the form of the rigid transformation matrix:

$$H=\begin{bmatrix}h_{11}&h_{12}&h_{13}\\h_{21}&h_{22}&h_{23}\\h_{31}&h_{32}&h_{33}\end{bmatrix}\longrightarrow\begin{bmatrix}\cos\theta&-\sin\theta&t_{x}\\\sin\theta&\cos\theta&t_{y}\\0&0&1\end{bmatrix}$$

wherein $h_{13}$, $h_{23}$ and $h_{33}$ represent the horizontal displacement, vertical displacement and overall translation of the projective transformation between the two images; $h_{11}$, $h_{12}$, $h_{21}$ and $h_{22}$ represent the rotation and scaling effects of the projective transformation; $h_{31}$ and $h_{32}$ represent perspective effects, and their being zero guarantees an affine transformation; $t_{x}$ and $t_{y}$ are the translation amounts in the horizontal and vertical directions, respectively; and $\theta$ is the rotation angle. In the present invention, $h_{11}$ through $h_{33}$ are the 9 elements of the homography matrix.
Referring to fig. 9 again, in an alternative embodiment of the present invention, the bolt loosening identification method based on image feature point matching mainly includes steps of feature point generation, feature point matching, matching pair screening, and rotation angle calculation.
In a specific embodiment of the invention, the two bolts whose bolt-subgraph center coordinates in the front and rear pictures have the shortest Euclidean distance are paired into a group, thereby matching the same bolt in the front and rear pictures. The distance threshold is adjusted dynamically: the current feature descriptor is first compared with the feature descriptors of the second group; the square of the difference between each pair of corresponding components of the two feature vectors is calculated, all squared differences are summed, and the square root is taken to obtain the vector distance; a distance threshold is set, and only feature descriptors whose distance is smaller than the threshold are considered suitable for matching.
At this time, matching pairs between the first set of feature descriptors and the second set of feature descriptors are obtained. Under this working condition, the front and rear bolt pictures after visual angle correction involve only a rigid transformation, which allows the obtained homography matrix to be reduced by matrix transformation to the rigid form given above. The homography matrix is calculated using OpenCV, and the matching points are screened by the RANSAC algorithm. However, the matching points obtained under digital image processing conditions are not necessarily true matching points, so to enhance the robustness of the algorithm, in the scheme of the invention the threshold of the RANSAC algorithm is dynamically adjusted: the number of interior points is increased, mismatched point pairs are deleted, and the optimal interior-point subset is selected to calculate the homography matrix, until the obtained homography matrix can, through matrix transformation, satisfy the form of the rigid transformation matrix and thus the real conditions of the working condition, thereby determining the correct rotation angle of the bolt. Since the rotation angle is determined from the feature points, the prior-art limit of only being able to calculate rotation angles within 60 degrees is eliminated.
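The stopping condition of the dynamic RANSAC loop (the homography must reduce to the rigid form) can be sketched as a numeric check: no perspective terms, and an orthonormal rotation block with unit determinant (no scaling). The tolerance is an assumed parameter:

```python
import numpy as np

def is_rigid_homography(H, tol=1e-3):
    """Check whether an estimated homography already satisfies the rigid
    transformation form used as the stopping condition above."""
    H = np.asarray(H, dtype=float)
    H = H / H[2, 2]                       # normalise so that h33 = 1
    R = H[:2, :2]
    no_perspective = np.all(np.abs(H[2, :2]) < tol)
    orthonormal = np.allclose(R.T @ R, np.eye(2), atol=tol)
    return bool(no_perspective and orthonormal
                and abs(np.linalg.det(R) - 1.0) < tol)

t = np.radians(20.0)
H_rigid = np.array([[np.cos(t), -np.sin(t), 4.0],
                    [np.sin(t),  np.cos(t), 1.0],
                    [0.0,        0.0,       1.0]])
ok = is_rigid_homography(H_rigid)
```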
In step S20, the corrected bolt group picture is obtained specifically by performing bolt target detection on the initial bolt group pictures with a trained YOLOv8-P6 algorithm to obtain a front set and a rear set of bolt detection data, wherein the bolt detection data comprise the bolt type and the center coordinates and size of the detection frame, the bolt types comprise outer sleeve bolts, embedded bolts and missing bolts, an outer sleeve bolt corresponds to an inner circular screw, the center coordinates of the detection frame comprise the center coordinate x and the center coordinate y, and the size of the detection frame comprises the length l and the width w; traversing the center coordinates of the detection frame corresponding to each fastening bolt in one set of bolt detection data, calculating the Euclidean distance between the center coordinates of any two detection frames, traversing all Euclidean distances obtained from the bolt detection data, determining the four vertexes of the bolt group based on the two largest Euclidean distances (the diagonals), sequentially connecting the four vertexes into a quadrilateral, determining the numbers of rows and columns of the outermost fastening bolts in the initial bolt group picture according to the number of detection frames passed through by the four edges of the quadrilateral, determining the row-column ratio of the bolt arrangement in the initial bolt group picture, and performing perspective transformation according to the row-column ratio to obtain the corrected bolt group picture after visual angle correction.
The YOLOv8 algorithm is a target detection algorithm based on deep learning and is widely used for real-time object detection tasks; the name "YOLO" stands for "You Only Look Once", emphasizing its rapid detection speed and efficient calculation mode. In the YOLOv8-P6 network model architecture adopted by the invention, the Backbone network is responsible for extracting features from the input image and converting the image into a feature representation with rich semantic information; the Neck (connecting/neck module) is an intermediate layer used to fuse the features from the Backbone to improve the performance of the model; and the Head (task head module) is the last layer of the model, whose structure can differ according to the task. In the target detection task of the invention, a bounding box regressor and a classifier are used as the Head. YOLOv8-P6 enables fast and accurate target detection by dividing the entire image into smaller grid cells and predicting the target in each grid cell simultaneously. The network training of YOLOv8-P6 is as follows: collected photographs of target bolts are taken as the data set, comprising three classes, namely embedded bolts (bolt_embedded), sleeved bolts (bolt_jacketed) and missing bolts (missing), typical examples of which are shown in fig. 4. Data enhancement techniques such as cropping, adding noise, rotating and changing color channels are applied so that the target object is still contained in the enhanced image, finally generating a bolt detection data set of 18779 images. The training set, validation set and test set are divided in the proportion 7:1.5:1.5, so that the training set is 18779 x 70% ≈ 13145 images and the validation set and test set are each 18779 x 15% ≈ 2817 images. The training of YOLOv8-P6 is realized through back propagation and stochastic gradient descent with momentum (SGD), with a learning rate of 0.01, a mini-batch size of 8, and a maximum number of training epochs of 500.
Further, the specific manner of determining the rotation angle of a fastening bolt in the rear image relative to the corresponding fastening bolt in the front image is as follows: if the fastening bolt is an outer sleeve bolt, the rotation angles of the outer sleeve and the inner circular screw in the target bolt matching pair are respectively calculated by the SIFT algorithm combined with a Brute-Force matcher, and the relative rotation angle of the two is calculated from these two rotation angles; if the fastening bolt is an embedded bolt, the rotation angle of the embedded bolt is calculated by the SIFT algorithm combined with a Brute-Force matcher.
The development of YOLOv8-BoltSeg includes, among other things, understanding the basic principles of, and training, YOLOv8-BoltSeg. The proposal of the invention improves the YOLOv8-Seg model based on the Gather-and-Distribute (GD) mechanism, with the network architecture shown in the figure. First, image features are extracted by the Backbone; then the GD mechanism is added into the Neck structure to improve the feature fusion mechanism. The GD mechanism is realized through convolution and self-attention operations and enhances the multi-scale feature fusion capability. It includes a low-level gather-and-distribute branch (Low-GD) and a high-level gather-and-distribute branch (High-GD), which extract and fuse feature information through convolution-based and attention-based blocks respectively. The gather-and-distribute process corresponds to three modules: a Feature Alignment Module (FAM), an Information Fusion Module (IFM), and an information injection module (Inject). The gathering process involves two steps: first, the FAM collects and aligns the features from the various levels; second, the IFM fuses the aligned features to generate global information. In the distribution branch, after the fused global information is obtained from the gathering process, the injection module distributes the information to each level and injects it using a simple attention operation, thereby enhancing the segmentation capability of the branches. Finally, the Head part adopts a bounding box regressor, a classifier and a cross-entropy loss. The network training of YOLOv8-BoltSeg is as follows: 2000 close-up bolt photographs from the target detection data set are used as the data set, labeled with Labelme to generate json files, which are converted into the txt files required for training.
The label types are classified into outer hexagon (hexagon) and inner circle (circle), wherein the outer hexagon label represents the head of an embedded bolt and the nut portion of an outer sleeve bolt, and the inner circle label represents the shank portion of an outer sleeve bolt; a typical example is shown in fig. 6. Meanwhile, data enhancement techniques such as cropping, adding noise and changing color channels are applied, finally generating a bolt segmentation data set of 6000 images, and the training set, validation set and test set are likewise divided in the proportion 7:1.5:1.5, so that the training set is 6000 x 70% = 4200 images and the validation set and test set are each 6000 x 15% = 900 images. The training of YOLOv8-BoltSeg is realized through back propagation and stochastic gradient descent with momentum (SGD), with a learning rate of 0.01, a mini-batch size of 8, and a maximum number of training epochs of 500.
The specific manner of obtaining the plurality of bolt subgraphs corresponding to each frame through image cropping and super-resolution reconstruction of the corrected bolt group picture is as follows: target detection is performed on the corrected bolt group picture to obtain detection frames, the bolt regions detected in the corrected bolt group picture are cropped according to the coordinates of the detection frames to obtain the bolt subgraphs, and super-resolution reconstruction is performed on the cropped bolt subgraphs by the SRGAN algorithm to improve the sharpness and detail of the images. In practice, the cropping is based on the result of YOLOv8-P6 target detection, and segmentation follows the cropping: semantic segmentation is performed on the cropped pictures by YOLOv8-BoltSeg, and the bolt area is matted out according to the semantic segmentation result, thereby removing the redundant background. In the invention, semantic segmentation is performed after the bolt category has been determined.
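The cropping step can be sketched as follows, using the center-plus-size detection data format described in step S20. This is a minimal illustration; the function name and the clamping to image bounds are assumptions:

```python
import numpy as np

def crop_bolt_subimages(image, boxes):
    """Crop one sub-image per detection box.
    Each box is (cx, cy, l, w): the detection frame center coordinates
    plus its length and width, clamped to the image boundaries."""
    h, w_img = image.shape[:2]
    subs = []
    for cx, cy, l, w in boxes:
        x0 = max(int(round(cx - l / 2)), 0)
        x1 = min(int(round(cx + l / 2)), w_img)
        y0 = max(int(round(cy - w / 2)), 0)
        y1 = min(int(round(cy + w / 2)), h)
        subs.append(image[y0:y1, x0:x1].copy())
    return subs
```

Each cropped sub-image would then be passed to the SRGAN super-resolution step and to YOLOv8-BoltSeg for segmentation.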
In a specific embodiment, the SRGAN algorithm (Super-Resolution Generative Adversarial Network, i.e. a super-resolution reconstruction algorithm based on a generative adversarial network) is used for super-resolution reconstruction of the cropped sub-images to improve the sharpness and detail of the images. This process helps to accurately restore the detail information of the bolt and facilitates the subsequent feature point generation and matching. Meanwhile, the YOLOv8-BoltSeg algorithm is used for semantic segmentation, a coordinate text document is generated from the segmentation result, and the pixels of the segmented area are matted out through the segmentation result, so that the subsequently generated feature points are accurately located in the target area, further improving detection accuracy and reliability. At this point, the category of the bolt is read from the text document corresponding to the bolt subgraph. The bolt subgraph coordinates stored in the text documents of all groups in the front and rear photographs are read by traversal, and the two bolts with the shortest Euclidean distance are taken as a group pair; these two bolts are considered to be the same bolt in the front and rear photographs. It should be noted that if there is a missing class in a bolt subgraph group pair, the subsequent steps are skipped and missing is taken as the output value. The bolt group pairs and the corresponding bolt types are taken as inputs. For an outer sleeve bolt, two classes, outer hexagon and inner circle, are segmented; the SIFT-Bolt proposed by the invention is used to obtain the rotation angles of the outer hexagon and the inner circle respectively, and the relative rotation angle of the two is then obtained as the loosening angle of the outer sleeve bolt.
If the bolt is an embedded bolt, only the outer hexagon class is segmented; the outer hexagon rotation angle obtained by SIFT-Bolt is then used directly as the loosening angle of the embedded bolt. Finally, the obtained angle value result is appended to the coordinate document corresponding to the bolt subgraph, which is saved and output.
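The relative loosening angle of an outer sleeve bolt described above reduces to a wrapped angle difference. A minimal sketch (the function name is an assumption):

```python
def relative_loosening_angle(outer_hex_deg, inner_circle_deg):
    """Loosening angle of an outer sleeve bolt: the rotation of the nut
    (outer hexagon) relative to the shank (inner circle), wrapped to [0, 360)."""
    return (outer_hex_deg - inner_circle_deg) % 360.0
```

For an embedded bolt the outer hexagon angle is used directly, so no difference is taken.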
Further, in step S40, the specific steps by which the SIFT algorithm generates the feature points and the feature descriptions of the feature points through scale space extremum detection, key point positioning, direction assignment and feature point description are as follows:
S421, the initial bolt group picture is convolved with Gaussian kernels of different standard deviations to generate a series of blurred images of different blur degrees, forming a scale space, wherein each blurred image represents the blur degree at a different scale; the Gaussian convolution kernel is shown in the following formula:

$$G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{(x - m/2)^{2} + (y - n/2)^{2}}{2\sigma^{2}}\right)$$

wherein $x$ and $y$ represent the abscissa and ordinate of the pixel point respectively, $m$ and $n$ represent the length and width of the initial bolt group picture respectively, and the parameter $\sigma$ is the scale parameter (Gaussian radius), taking a value of 1.2 to 1.6;
S422, two adjacent images in the scale space are subtracted to generate a Gaussian difference image;
S423, candidate key points are marked based on the Gaussian difference images;
S424, threshold screening is performed on the detected candidate key points, and low-contrast candidate key points and edge response points are removed; a key point is retained as shown in the following formula:

$$\left|D(\hat{x})\right| \geq \frac{T}{n}$$

wherein $T$ is the threshold, $n$ is the number of images from which features are to be extracted, and $D(\hat{x})$ is the pixel value at the candidate key point;
Processing all candidate key points to obtain a key point set;
S425, on the preliminarily detected key point set, the position and scale of each key point are precisely located by fitting a second-order function, determining the accurate position of the key point;
S426, the main direction of the key point is determined, ensuring rotation invariance of the key point;
S427, mathematical feature description is performed on the key points to obtain the feature description of the feature points, processed by the following formulas to acquire the feature description (descriptor) of each feature point:

$$\|d\|_{2} = \sqrt{\sum_{i=1}^{n} d_i^{2}}, \qquad \hat{d} = \frac{d}{\|d\|_{2}}$$

wherein $\|d\|_{2}$ is the L2 norm of the feature descriptor, $d$ is the feature descriptor vector, $d_i$ is a component of the feature descriptor vector, $n$ is the vector dimension, and $\hat{d}$ is the normalized feature descriptor vector, the length of the descriptor being 1.
In the invention, the $n$ in the threshold formula corresponds to the pair of pictures used for matching, so its value is 2.
Further, T is 0.5, or T is a local threshold obtained by an adaptive Gaussian threshold method: a local Gaussian weight is calculated according to the local pixel distribution of each small block, and the threshold is adjusted as shown in the formula

$$T(x, y) = \mu(x, y) + k \cdot s(x, y)$$

wherein $T(x, y)$ represents the local threshold at pixel $(x, y)$, $\mu(x, y)$ is the average gray value of the pixels in the adjacent local area, $s(x, y)$ is the standard deviation of the pixels in the adjacent local area, and $k$ is the local weight coefficient.
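A minimal NumPy sketch of the adaptive local threshold described above, computing a mean-plus-weighted-deviation value over a neighbourhood. The window half-size and the weight `k` are assumptions, not values fixed by the invention:

```python
import numpy as np

def local_threshold(image, x, y, half=7, k=1.0):
    """Local contrast threshold T(x, y) = mu + k * s over a neighbourhood,
    where mu and s are the mean and standard deviation of the pixels in
    the window around (x, y), clamped to the image boundaries."""
    h, w = image.shape
    patch = image[max(y - half, 0):min(y + half + 1, h),
                  max(x - half, 0):min(x + half + 1, w)]
    return float(patch.mean() + k * patch.std())
```

On a perfectly uniform patch the standard deviation vanishes and the threshold equals the local mean, so the method adapts automatically to illumination changes.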
In a specific embodiment of the invention, SIFT generates the feature description of the feature points through four steps: scale space extremum detection, key point positioning, direction assignment and feature point description.
In the first step, the scale space extremum detection step, a series of Gaussian kernels with different standard deviations are convolved with the original image to generate a series of images of different blur degrees, i.e. a scale space, wherein each image represents the blur degree of the original image at a different scale. The Gaussian kernel is shown in formula (1):

$$G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{(x - m/2)^{2} + (y - n/2)^{2}}{2\sigma^{2}}\right) \tag{1}$$

wherein $x$ and $y$ represent the abscissa and ordinate of the pixel respectively, $m$ and $n$ represent the length and width of the image respectively, and the parameter $\sigma$ is the scale parameter, also known as the Gaussian radius. In SIFT, $\sigma$ is usually taken as 1.6, but considering that in practice the camera has already blurred the image, in the present invention $\sigma$ takes the value 1.2. At this point, a Gaussian difference operation is performed on images of adjacent scales: two adjacent images in the scale space are subtracted to produce a Gaussian difference image. The purpose of this step is to find local extremum points in the scale space; such points have a stable response at different scales and are invariant to scale changes of the image. In the Gaussian difference image, each pixel is compared with its 26 neighbours (the 8 neighbouring pixels in the same scale and the 9 corresponding pixels in each of the two adjacent scales) to determine whether it is a local extremum point. If a pixel value (i.e. gray value) is greater (or less) than the values of all its neighbours, it is marked as a candidate key point. Threshold screening is then performed on the detected candidate key points, and low-contrast key points and edge response points are removed, giving the final key point set of this step as shown in formula (2):

$$\left|D(\hat{x})\right| \geq \frac{T}{n} \tag{2}$$
wherein $T$ in SIFT is conventionally taken as 0.4, $n$ is the number of images from which features are to be extracted, and $D(\hat{x})$ is the pixel value. It should be noted that, in order to screen the key points more accurately and better handle images under various illumination conditions, the selection of the threshold is optimized in the invention: an adaptive Gaussian threshold method is adopted, a local Gaussian weight is calculated according to the local pixel distribution of each small block, and the adjusted threshold is calculated as shown in formula (3):

$$T(x, y) = \mu(x, y) + k \cdot s(x, y) \tag{3}$$

wherein $T(x, y)$ represents the local threshold at pixel $(x, y)$, $\mu(x, y)$ is the average gray value of the pixels in the adjacent local area, $s(x, y)$ is the standard deviation of the pixels in the adjacent local area, and $k$ is the local weight coefficient.
Second, key point positioning. On the preliminarily detected key point set, the position and scale of each key point are precisely located by fitting a second-order expansion. First, the magnitude and direction of the pixel gradients around the key point are calculated with the Sobel operator; then a second-order Taylor expansion is constructed near the extremum position, as in formula (4), to fit the surface shape of the local image:

$$D(x) = D + \nabla D^{T} x + \frac{1}{2} x^{T} H x \tag{4}$$

wherein $D$ is the gray value at the extremum, $\nabla D$ is the gradient vector, and $H$ is the Hessian matrix. Finally, quadratic interpolation is performed by differentiating the expansion and setting the derivative to zero, i.e. solving for the extremum point of the fitted function, thereby determining the accurate sub-pixel position of the key point.
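Setting the derivative of the expansion in formula (4) to zero gives the linear system $H\,\hat{x} = -\nabla D$, whose solution is the sub-pixel offset. A minimal sketch (the function name is an assumption):

```python
import numpy as np

def refine_offset(grad, hessian):
    """Sub-pixel key-point offset from the second-order Taylor expansion
    D(x) = D + g^T x + 0.5 x^T H x: setting the derivative to zero gives
    H x = -g, solved here with a dense linear solver."""
    return np.linalg.solve(hessian, -np.asarray(grad, dtype=float))
```

In practice the offset is computed in (x, y, scale) and the key point is discarded or re-fitted if the offset exceeds half a pixel in any dimension.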
Third, to ensure rotation invariance of the key points, a histogram of the gradient directions obtained in the second step is accumulated in the neighbourhood around each key point. Each bin of the histogram represents an interval of gradient direction, and its value represents the accumulation of gradient magnitudes in that direction. The histogram is divided into 36 bins, one bin every 10 degrees. The direction histogram is then traversed to find the direction with the greatest accumulated gradient magnitude as the principal direction of the key point; the principal direction selected in this way is typically the most pronounced direction of change at that location. Meanwhile, to enhance the robustness of the key points, several directions may be selected within the 360 degrees around the key point: in addition to the principal direction, the direction with the next largest gradient magnitude is selected as a secondary direction. This makes the key points more robust to rotation variation and increases the diversity of the feature points.
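The orientation-histogram step above can be sketched as follows. The 36-bin layout follows the text; keeping secondary directions above 80% of the peak follows common SIFT practice and is an assumption here:

```python
import numpy as np

def dominant_orientations(angles_deg, magnitudes, n_bins=36, second_ratio=0.8):
    """36-bin gradient-orientation histogram (10 degrees per bin). The main
    direction is the bin with the largest accumulated magnitude; bins whose
    accumulation exceeds second_ratio of the peak are kept as secondary
    directions. Returns the lower edge (degrees) of each selected bin."""
    hist = np.zeros(n_bins)
    for a, m in zip(angles_deg, magnitudes):
        hist[int(a % 360) // (360 // n_bins)] += m
    peak = hist.max()
    return [b * (360 // n_bins) for b in range(n_bins)
            if hist[b] >= second_ratio * peak]
```

A key point with one returned direction gets one descriptor; each extra direction duplicates the key point, increasing feature diversity as described above.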
Fourth, after the key points are found and their principal directions determined, a mathematical feature description of each key point, i.e. a feature descriptor, is constructed to facilitate subsequent feature matching. First, the 16x16 pixel region around the key point is divided into 16 sub-regions of 4x4 pixels each, and the gradient magnitude and direction of every pixel in each sub-region are calculated. The gradient directions of the pixels in each sub-region are then assigned to a gradient direction histogram of 8 bins, one bin every 45 degrees. These gradient direction histograms are concatenated to form a long feature vector, i.e. the key point descriptor: for each sub-region, its 8 gradient direction values are connected in turn to form a sub-vector of length 8, and all sub-vectors are connected to obtain a feature vector of length 16 x 8 = 128. Finally, to increase the robustness of the descriptor, the invention performs L2 norm normalization on the whole descriptor vector. First, for each element of the descriptor vector, the squares are calculated and summed as in formula (5) to obtain the L2 norm of the descriptor:

$$\|d\|_{2} = \sqrt{\sum_{i=1}^{n} d_i^{2}} \tag{5}$$

wherein $d$ is the descriptor vector, $d_i$ is a component of the descriptor vector, and $n$ is the vector dimension, with value 128. Each element of the descriptor vector is then divided by the L2 norm, as in formula (6):

$$\hat{d} = \frac{d}{\|d\|_{2}} \tag{6}$$

yielding the normalized descriptor vector. This step makes the length of the descriptor a fixed value of 1, thereby further enhancing the rotation invariance and scale invariance of the descriptor.
Further, in step S40, the first feature descriptor and the second feature descriptor are matched in the following specific manner: the first and second feature descriptors are compared one by one; for each pair of feature points, the square of the difference between each corresponding component of the two feature vectors is calculated, the squares of the differences over all components are summed, and the square root is taken to obtain the vector distance; feature descriptors (feature points) whose vector distance is smaller than a preset distance threshold are determined as suitable matching points. The preset distance threshold is determined by a dynamic adjustment method: an initial distance threshold is set, and if the number of feature point matching pairs is smaller than the minimum number of feature points of the two images, the preset distance threshold is increased by a preset step length S until all feature points are matched one by one. After the number of matching points meets the requirement of the matcher, the matching points are screened with the RANSAC algorithm to further eliminate erroneous matches, wherein the filtering threshold of the RANSAC algorithm is dynamically adjusted step by step until the obtained homography matrix, through matrix transformation, takes the form of the rigid transformation matrix.
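The dynamically adjusted distance matching described above can be sketched as follows with a NumPy brute-force matcher. The initial threshold and step values are assumptions, not the invention's fixed parameters:

```python
import numpy as np

def match_descriptors(desc1, desc2, t0=0.3, step=0.05, min_pairs=None):
    """Brute-force matching with a dynamically adjusted distance threshold:
    start from an initial threshold t0 and raise it by `step` until enough
    matching pairs are found. Distances are Euclidean, i.e. the square root
    of the summed squared component differences."""
    # full pairwise Euclidean distance matrix between the two descriptor sets
    d = np.sqrt(((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(-1))
    if min_pairs is None:
        min_pairs = min(len(desc1), len(desc2))
    t = t0
    while True:
        pairs = [(i, int(d[i].argmin())) for i in range(len(desc1))
                 if d[i].min() < t]
        if len(pairs) >= min_pairs or t > d.max():
            return pairs
        t += step
```

The returned pairs would then be passed to the RANSAC screening stage, which tightens its own filtering threshold until the homography reduces to rigid form.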
In the invention, the number of erroneous matching pairs is largest in the initial state, when the remaining correct matching pairs are fewest; as the filtering threshold becomes smaller and smaller, the number of erroneous matching pairs decreases and the remaining correct matching pairs increase, until the rigid body transformation matrix form is satisfied.
Further, a pretension force variation value of the fastening bolt is determined based on the rotation angle of the fastening bolt and the rotation angle-pretension force variation relationship data.
In one embodiment of the present invention, after the rotation angle is determined, the operator is prompted as to whether to perform the pretension estimation. When the pretension is estimated, the operator selects the relation between rotation angle and pretension proposed by Shoberg [32], input in advance, as shown in formula (7): when the rotation angle of the bolt is $\theta$, the pretension change $\Delta F$ is:

$$\Delta F = \frac{\theta}{360^{\circ}} \cdot P \cdot \frac{K_b K_c}{K_b + K_c} \tag{7}$$

wherein $K_b$ is the stiffness of the bolt, $K_c$ is the connection stiffness, and $P$ is the pitch. Alternatively, the operator inputs a formula, together with the bolt type, the stiffness of the connecting piece and the initial pretension value of the bolt. The current pretension change value can then be calculated from the obtained bolt rotation angle and the rotation angle-pretension change relation. It should be noted that when the output pretension loss $\Delta F$ is larger than the input initial pretension value $F_0$, the output is $F_0$, i.e. the full pretension loss value; the output value is appended to the text document corresponding to the bolt subgraph and then saved, so that the document information of the bolt subgraph at this moment is as shown in Table 1.
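A Shoberg-type angle-to-pretension relation of the kind described above can be sketched as follows: a rotation of theta degrees advances the nut by (theta/360) of one pitch, and the series stiffness of bolt and joint converts that displacement into a force change. The exact formula is reconstructed from the text and should be treated as an assumption:

```python
def pretension_change(theta_deg, pitch, k_bolt, k_joint):
    """Pretension change from the rotation angle: axial displacement
    (theta/360) * pitch times the series stiffness of bolt and joint,
    K_b * K_c / (K_b + K_c)."""
    k_series = k_bolt * k_joint / (k_bolt + k_joint)
    return theta_deg / 360.0 * pitch * k_series
```

Per the embodiment above, the reported loss would be capped at the initial pretension value $F_0$ before being appended to the bolt subgraph document.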
TABLE 1
The beneficial effects of the invention are as follows:
1. The labor cost of detection can be reduced, and quick, efficient and accurate bolt loosening detection is realized; the detection is non-contact, and no sensor needs to be arranged;
2. The visual angle correction of the photographs taken of the bolt group is carried out automatically, without auxiliary means;
3. Super-resolution reconstruction of the picture is added, so that blurred pictures can still yield accurate detection results;
4. The YOLOv8-BoltSeg network is developed to perform quick and accurate bolt semantic segmentation, so that the matching of feature points stays in the target area and not on the background;
5. SIFT-Bolt is proposed to calculate the rotation angle of the bolt, realizing 360-degree full-period detection of the bolt and breaking through the bottleneck of the 0 to 60 degree detection range;
6. The change value of the bolt pretension can be obtained based on computer vision, providing a more critical bolt loosening reference value.
Further, the bolt looseness identification system based on image feature point matching comprises a memory, a processor and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements any of the above bolt looseness identification methods based on image feature point matching.
Referring to fig. 10, the invention provides a bolt loosening identification unit based on image feature point matching, which comprises a data input module for executing step S10, a target detection and super-resolution reconstruction module for executing step S20, a semantic segmentation module for executing step S30, a corner and prestress calculation module for executing step S40, and a result output and display module for outputting and displaying the calculation result.
The foregoing examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.