Disclosure of Invention
The invention aims to provide a chip surface defect detection method and system based on an improved YOLOv8 model, so that higher detection accuracy and speed can be achieved on industrial chip defect detection equipment, the method and system can be deployed more easily in a defect diagnosis and analysis system, and an inspector can rapidly identify defective chips from the detection results.
The invention provides a chip surface defect detection method based on an improved YOLOv8 model, which comprises the following specific steps:
S1, constructing a chip surface defect picture dataset;
S2, dividing the obtained dataset into a training set, a verification set and a test set;
S3, improving the YOLOv8 model by fusing DCN and C2f to form a new C2f-DCN deformable feature fusion module for feature extraction, introducing the dynamic upsampling operator Dysample, and adopting the MPDIoU boundary loss function, thereby constructing an improved chip surface defect detection model, namely the improved YOLOv8 model;
S4, inputting the chip surface defect dataset into the improved YOLOv8 model for training, and optimizing its detection precision and generalization capability by improving the extraction of chip surface defect features, reducing the loss of detail feature information in the upsampling stage, and enhancing the localization of target defects;
S5, testing the original YOLOv8 model and the improved YOLOv8 model with the test set, comparing the experimental results, and verifying the accuracy and efficiency of the two models in chip surface defect detection, where the test results comprise defect positions, defect types and confidence.
In step S1, a chip surface defect picture dataset is constructed, comprising:
(1.1) obtaining chip surface defect images, namely collecting chip images with defects during chip production and dividing them into three types according to defect type: pin missing, surface scratch and pin bending;
(1.2) image data enhancement, namely expanding the original dataset with image enhancement methods to increase its diversity;
(1.3) image labeling, namely marking each defect area with a bounding box using the LabelImg image annotation tool.
In step S3, an improved chip surface defect detection model is constructed, comprising:
(3.1) configuring the environment, namely establishing a virtual environment with Anaconda and debugging the YOLOv8 model source code;
(3.2) inputting data, namely inputting the constructed chip surface defect dataset into the YOLOv8 network structure model, where the input comprises the chip surface defect images, the chip surface defect image labels, the YOLOv8 network structure configuration file and the YOLOv8 model pre-training weights;
(3.3) configuring hyperparameters, namely setting the network input image size, learning rate, batch size and number of iterations;
(3.4) taking the original YOLOv8 model and improving its network structure.
In step (3.4), the improvement of the YOLOv8 network structure model includes:
(3.4.1) fusing DCN and C2f to form a novel C2f-DCN deformable feature fusion module for feature extraction;
(3.4.2) modifying the original upsampling module to Dysample;
(3.4.3) replacing the original loss function with the MPDIoU boundary loss function.
In step (3.4.1), DCN and C2f are fused to form a new C2f-DCN deformable feature fusion module for feature extraction. DCNv4_Net generates a modulation mask by a depth-wise computation and calculates the sampling offsets by a point-wise operation. The network can thus output, for each pixel, its displacement in the horizontal and vertical directions as well as the shape variation parameters. The parameter weights can be adaptively adjusted to better match the feature locations in the input data, thereby significantly improving the detection performance of the model on irregular defects.
Further, the step of calculating the offset and the shape change parameter of the pixel point in the step (3.4.1) is as follows:
(3.4.1.1) The Bottleneck of the C2f module in YOLOv8 contains two 3×3 convolution kernels that leave the size of the output feature map unchanged. The DCNv4_Net structure also contains a 3×3 convolution kernel and can compute the required offsets and shape-change parameters without changing the feature map size, so the two 3×3 convolutions in the C2f module are replaced with DCNv4_Net to obtain C2f-DCNv4;
(3.4.1.2) inputting features into C2f-DCNv4, where the features first pass through a 1×1 convolution layer and a Split feature segmentation layer that divides them into two parts along the channel dimension: one part waits for the Concat feature splicing, and the other part is fed into the DCNv4 structure. There, the features pass through two DCNv4 convolution layers, each a separable convolution formed by combining a 3×3 convolution with a linear projection: the original convolution weight wk is separated into a depth-wise part, responsible for the location-aware modulation scalar mk, and a point-wise part, the projection weight w shared among sampling points. Passing the features through this separable convolution yields the coordinate offsets and modulation scales of the sampling points:

y(p0) = Σ_{g=1}^{G} Σ_{k=1}^{K} w_g · m_gk · x_g(p0 + p_k + Δp_gk)

In the formula, G is the number of aggregation groups; for the g-th group, wg ∈ R^{C×C′}, C′ = C/G, is the location-independent projection weight; K is the number of sampling points; and m_gk ∈ R is the modulation scalar of the k-th sampling point. Unlike DCNv3, DCNv4 discards the softmax normalization of the modulation scalars, which accelerates the model.
(3.4.1.3) applying a BatchNorm2d normalization operation to the feature output, containing the offsets and modulation scales, obtained from DCNv4_Net:

y = (x − E[x]) / √(Var[x] + ε) · γ + β

In the formula, x is the input feature value, E[x] its mean and Var[x] its variance; these two statistics are computed directly during forward propagation. γ and β adjust the variance and mean, are initialized to 1 and 0 respectively, and are continuously learned and updated during backward propagation of the model. ε guarantees numerical stability by preventing a zero denominator and defaults to 1e-5.
(3.4.1.4) activating the output with the ReLU function, which passes input values greater than 0 unchanged and sets input values less than 0 to 0.
In step (3.4.2), the original upsampling module is replaced with Dysample: a linear projection generates s² offset sets, new sampling points are constructed, and the input feature map is resampled by bilinear interpolation.
Further, the processing of the Dysample upsampling module in step (3.4.2) includes:
First, given an upsampling scale factor s and a feature map X of size C×H×W, bilinear interpolation is performed on the input feature map X. A linear layer with C input channels and 2s² output channels generates an offset O of size 2s²×H×W, which is controlled by a dynamic range factor. The offset O is then reshaped to 2×sH×sW by pixel shuffling and added to the original sampling grid G to output a sampling set S. Finally, a grid sampling function resamples the feature map X at the positions of the sampling set S, generating the upsampled feature map X′ of size C×sH×sW.
In step (3.4.3), the original loss function is replaced with the MPDIoU boundary loss function. The MPDIoU loss function jointly considers the intersection-over-union, the corner-point distances and the width-height deviation of the prediction box and the ground-truth box. Taking the top-left and bottom-right corner coordinates of the boxes as input, it minimizes the distances between these two corner points of the prediction box and of the ground-truth box, which accelerates the regression convergence of the network toward the prediction box and thus yields more accurate predictions. The MPDIoU loss is defined by the following formula:

MPDIoU = IoU − d1²/(w² + h²) − d2²/(w² + h²)
d1² = (x1^pre − x1^gt)² + (y1^pre − y1^gt)²
d2² = (x2^pre − x2^gt)² + (y2^pre − y2^gt)²
L_MPDIoU = 1 − MPDIoU

where B^pre denotes the coordinates of the prediction bounding box, B^gt the coordinates of the ground-truth bounding box, (x1, y1) the top-left corner coordinates of a box, (x2, y2) its bottom-right corner coordinates, d1 and d2 the distances between the top-left and the bottom-right corners of the prediction bounding box and the ground-truth bounding box respectively, and w, h the width and height of the chip picture.
In another aspect, the present invention further provides a chip surface defect detection system based on the improved YOLOv8 model, which comprises the following modules:
(1) The data set establishing module is used for establishing a chip surface defect picture data set;
(2) The data set dividing module is used for dividing the obtained data set into a training set, a verification set and a test set;
(3) The model building module is used for building the improved YOLOv8 network model;
(4) The training and prediction module is used for inputting the dataset into the improved YOLOv8 network model for training and prediction, obtaining chip images containing surface defects.
The invention also provides computer equipment comprising a camera, an internal memory, an external memory and a processor, wherein the camera acquires chip images with surface defects, the internal memory provides an operating environment for the operating system and application programs, the external memory stores the operating system and the computer programs, and the processor runs the computer programs implementing steps S1-S5.
Compared with the prior art, the method has the following notable advantages: the improved YOLOv8 model can identify defects of different shapes and sizes through DCNv4 deformable convolution, enhancing the feature extraction capability of the network; the Dysample upsampling operator reduces the loss of defect edges and detail information, retaining more feature information; and replacing the original loss function with the MPDIoU boundary loss function accelerates the convergence of network training and improves the model's ability to localize defects. Compared with the original model, the detection performance is improved, and the method is better suited to chip surface defect detection tasks.
Detailed Description
So that the above-recited objects, features and advantages of the present invention can be understood in detail, a more particular description of the invention is given below with reference to the appended drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The invention may, however, be embodied in many different forms, and modifications may be made by one skilled in the art without departing from its spirit or essential characteristics; it is therefore not limited to the specific embodiments disclosed herein.
As shown in fig. 1, the chip surface defect detection method based on the improved YOLOv8 model according to the embodiment of the present invention includes:
step 1, constructing a chip surface defect picture data set, which comprises the following specific steps:
(1.1) acquiring chip surface defect images. The chip surface defect images used in the invention were photographed during the chip production process of a chip enterprise. The dataset contains 9880 images in total, with a resolution of 1280 × 1080 in RGB format, covering three types of defects: pin missing, surface scratches and pin bending.
(1.2) image data enhancement. To enhance the robustness and generalization ability of the model, image flipping, random scaling and Mosaic data enhancement techniques are applied to expand the original dataset and enrich its diversity. These data enhancement techniques help the model learn a broader feature representation, reduce overfitting, and improve the model's ability to detect chip surface defects in different scenes, thereby improving industrial production efficiency and quality.
(1.3) image labeling. The images are annotated with the LabelImg image annotation tool, drawing a bounding box around each actual defect area; images containing several defects are marked with multiple anchor boxes. The missing-pin label is 0 (queshi), the surface-scratch label is 1 (huaheng), and the bent-pin label is 2 (wanqu). After annotation, the generated json-format data is saved and converted into the txt format that the YOLO network can process.
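As an illustrative sketch of the json-to-txt conversion in step (1.3) (not part of the claimed invention): the json field names ("shapes", "points", "imageWidth", "imageHeight") and the rectangle representation below are assumptions about the annotation export format, and the class map follows the labels given above.

```python
# Hypothetical annotation layout; adapt field names to the real export format.
CLASS_MAP = {"queshi": 0, "huaheng": 1, "wanqu": 2}  # missing pin / scratch / bent pin

def box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert absolute corner coordinates to normalized YOLO (cx, cy, w, h)."""
    cx = (xmin + xmax) / 2.0 / img_w
    cy = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cx, cy, w, h

def annotation_to_txt(ann):
    """Turn one annotation dict into YOLO txt lines: 'cls cx cy w h'."""
    img_w, img_h = ann["imageWidth"], ann["imageHeight"]
    lines = []
    for shape in ann["shapes"]:
        (x1, y1), (x2, y2) = shape["points"]  # rectangle: two opposite corners
        cls = CLASS_MAP[shape["label"]]
        cx, cy, w, h = box_to_yolo(min(x1, x2), min(y1, y2),
                                   max(x1, x2), max(y1, y2), img_w, img_h)
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

Each output line matches the one-box-per-line YOLO label convention, with coordinates normalized to the image size.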
Step 2, dataset division: the sample-expanded and data-enhanced chip surface defect images and their corresponding label files are divided into a training set, a test set and a verification set in the ratio 8:1:1.
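The 8:1:1 split can be sketched as follows; the fixed seed is an illustrative choice for reproducibility, not something specified above.

```python
import random

def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle and split a list of (image, label) items into train/test/val
    according to the 8:1:1 ratio described above."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle for reproducibility
    n = len(items)
    n_train = int(n * ratios[0])
    n_test = int(n * ratios[1])
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]
    return train, test, val
```

In practice the same split is applied to image files and their label files together so each image keeps its annotation.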
Step 3, constructing the chip surface defect detection model based on the improved YOLOv8 model.
(3.1) environment configuration: download the official YOLOv8 source code from the Ultralytics GitHub repository, build a virtual environment with Anaconda, and install the following libraries according to the requirements file in the YOLOv8 source code: matplotlib>=3.3.0, numpy>=1.22.2, opencv-python>=4.6.0, pillow>=7.1.2, pyyaml>=5.3.1, requests>=2.23.0, scipy>=1.4.1, tqdm>=4.64.0, torch>=1.8.0, torchvision>=0.9.0.
(3.2) data input: the divided chip surface defect training set and test set are input into the YOLOv8 network model; the data comprise the chip surface defect images, the corresponding labels, the YOLOv8 network model pre-training weights and the YOLOv8 network structure configuration file. The network structure of YOLOv8 is shown in fig. 2.
(3.3) hyperparameter settings: the learning rate, number of batch samples, number of iterations, number of image channels, image input size and optimizer are set as follows: initial learning rate 0.01, minimum learning rate 0.001, weight decay 0.0005, batch size 16, 300 iterations, 3 image channels, image input size 640 × 640, optimizer AdamW.
(3.4) model improvement: the original model adopted by the invention is YOLOv8, and its network structure is improved in the following three ways.
(3.4.1) feature extraction is performed by fusing DCN and C2f to form a new C2f-DCN deformable feature fusion module:
(3.4.1.1) introducing DCNv4 deformable convolution into the Bottleneck of the C2f module to replace the original plain convolution, forming C2f-DCNv4;
(3.4.1.2) fig. 3 shows the structure of the C2f-DCN module formed by fusing DCNv4 and C2f. The input features first pass through a 1×1 convolution layer, after which a Split feature segmentation layer divides them into two parts along the channel dimension: one part waits for the Concat feature splicing, and the other part enters the DCNv4 structure. There, the features pass through a 3×3 convolution layer and a DCNv4 convolution layer, the latter being a separable convolution formed by combining a 3×3 convolution with a linear projection: the original convolution weight wk is separated into a depth-wise part, responsible for the location-aware modulation scalar mk, and a point-wise part, the projection weight w shared among sampling points. The output feature at pixel p0 is obtained through this separable convolution as:

y(p0) = Σ_{g=1}^{G} Σ_{k=1}^{K} w_g · m_gk · x_g(p0 + p_k + Δp_gk)

In the formula, G is the number of aggregation groups; for the g-th group, wg ∈ R^{C×C′}, C′ = C/G, is the location-independent projection weight; K is the number of sampling points; and m_gk ∈ R is the modulation scalar of the k-th sampling point. Unlike DCNv3, DCNv4 removes the final softmax normalization, cutting unnecessary computation and accelerating the model;
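The aggregation formula above can be illustrated with a minimal single-group (G = 1), single-channel sketch: each sampling point of the kernel grid is displaced by a learned offset, sampled with bilinear interpolation, weighted by its modulation scalar (no softmax, mirroring DCNv4), and summed. This is a didactic sketch, not the optimized DCNv4 operator.

```python
import numpy as np

def bilinear(x, py, px):
    """Bilinearly sample a 2-D map x at fractional location (py, px), zero-padded."""
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < h and 0 <= xx < w:
                val += (1 - abs(py - yy)) * (1 - abs(px - xx)) * x[yy, xx]
    return val

def deform_sample(x, p0, grid, offsets, m, w_g=1.0):
    """Aggregate K deformable sampling points at reference location p0:
    y(p0) = w_g * sum_k m_k * x(p0 + p_k + dp_k), with a single group G = 1.
    The modulation scalars m_k are applied directly, without softmax."""
    y0, x0 = p0
    out = 0.0
    for (ky, kx), (oy, ox), mk in zip(grid, offsets, m):
        out += mk * bilinear(x, y0 + ky + oy, x0 + kx + ox)
    return w_g * out
```

With zero offsets and unit modulation the operator reduces to an ordinary (rigid) convolution sample; the learned offsets are what let the kernel follow irregular defect shapes.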
(3.4.1.3) applying a BatchNorm2d normalization operation to the feature output from DCNv4:

y = (x − E[x]) / √(Var[x] + ε) · γ + β

In the formula, x denotes the input feature value, E[x] its mean and Var[x] its variance; these two statistics are computed directly during forward propagation. γ and β adjust the variance and mean, are initialized to 1 and 0 respectively, and are continuously learned and updated during the backward propagation of training. ε guarantees numerical stability by preventing a zero denominator and defaults to 1e-5.
(3.4.2) Modifying the original upsampling Module to Dysample
As shown in fig. 4, to improve the ability of the chip surface defect detection model to capture target defect features and details, the invention uses the efficient Dysample upsampling, which generates s² offset sets through a linear projection, constructs new sampling points, and resamples the input feature map by bilinear interpolation, thereby reducing the inference cost and improving the reliability of the upsampling process.
The processing procedure of the Dysample up-sampling module comprises:
In the grid sampling process, a feature map X of size C×H1×W1 and a sampling set S of size 2×H2×W2 are first given, where the two channels of S denote the x and y coordinates. Assuming bilinear interpolation, the grid sampling function resamples X to X′ of size C×H2×W2 using the positions in the sampling set S, where the sampling function is defined as follows:
X′=grid_sample(X,S),
In a specific implementation, the upsampling scale factor s and the feature map X of size C×H×W are given first. A linear layer with C input channels and 2gs² output channels generates an offset O of size 2gs²×H×W, controlled by a dynamic range factor. The offset O is then reshaped to 2×sH×sW by pixel shuffling and added to the original sampling grid G to output a sampling set S, and the grid sampling function resamples the feature map X at the positions of the sampling set S, generating the upsampled feature map X′ of size C×sH×sW.
Dysample upsampling generates the dynamic range factor point by point through a linear projection of the input features; by introducing a sigmoid function and a static factor of 0.5, the dynamic range is confined to [0, 0.5], further increasing the flexibility of the offset, which is computed by the following formula:
O=0.5sigmoid(linear1(X))*linear2(X)
In the present invention, instead of reconstructing the input features with content-aware upsampling kernels, Dysample performs upsampling by generating sampling point locations: each input position is expanded into s² sampling points, so the whole upsampling process reduces to point sampling. In addition, the kernel of Dysample upsampling is conditioned only on the x and y positions and requires only a 2-channel feature map, making the sampling more efficient.
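The Dysample procedure above (offset + base grid, then grid resampling by bilinear interpolation) can be sketched for a single channel as follows. This is a simplified illustration: the offsets are taken as a given array rather than produced by the linear layers, and border clamping stands in for grid_sample's padding handling.

```python
import numpy as np

def sigmoid(z):
    """Used by Dysample to bound the dynamic range factor: O = 0.5*sigmoid(l1)*l2."""
    return 1.0 / (1.0 + np.exp(-z))

def dysample_like(x, s, offset):
    """Offset-based upsampling of a (H, W) map by integer factor s.

    offset has shape (2, s*H, s*W) holding per-output-pixel (dy, dx)
    displacements (in Dysample these come from linear projections of X).
    Each output pixel resamples x at base-grid position + offset,
    mirroring X' = grid_sample(X, S)."""
    h, w = x.shape
    sh, sw = s * h, s * w
    out = np.zeros((sh, sw))
    for i in range(sh):
        for j in range(sw):
            # base sampling grid plus learned offset
            py = min(max(i / s + offset[0, i, j], 0.0), h - 1.0)
            px = min(max(j / s + offset[1, i, j], 0.0), w - 1.0)
            # bilinear interpolation between the four neighbours
            y0, x0 = int(py), int(px)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            wy, wx = py - y0, px - x0
            out[i, j] = ((1 - wy) * (1 - wx) * x[y0, x0]
                         + (1 - wy) * wx * x[y0, x1]
                         + wy * (1 - wx) * x[y1, x0]
                         + wy * wx * x[y1, x1])
    return out
```

With zero offsets this degenerates to plain bilinear-style resampling; the learned offsets are what allow the operator to preserve defect edges and fine details.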
(3.4.3) replacing the original loss function with the MPDIoU boundary loss function
The MPDIoU loss function jointly considers the intersection-over-union, the corner-point distances and the width-height deviation of the prediction box and the ground-truth box. Taking the top-left and bottom-right corner coordinates of the boxes as input, it minimizes the distances between these two corner points of the prediction box and of the ground-truth box, which accelerates the regression convergence of the network toward the prediction box and thus yields more accurate predictions. The MPDIoU loss is defined by the following formula:

MPDIoU = IoU − d1²/(w² + h²) − d2²/(w² + h²)
d1² = (x1^pre − x1^gt)² + (y1^pre − y1^gt)²
d2² = (x2^pre − x2^gt)² + (y2^pre − y2^gt)²
L_MPDIoU = 1 − MPDIoU

where B^pre denotes the coordinates of the prediction bounding box, B^gt the coordinates of the ground-truth bounding box, (x1, y1) the top-left corner coordinates of a box, (x2, y2) its bottom-right corner coordinates, d1 and d2 the distances between the top-left and the bottom-right corners of the prediction bounding box and the ground-truth bounding box respectively, and w, h the width and height of the chip picture.
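The MPDIoU formula above can be sketched directly; boxes are given as (x1, y1, x2, y2) with top-left and bottom-right corners, and eps is a small constant to avoid division by zero.

```python
def mpdiou(box_pred, box_gt, img_w, img_h, eps=1e-7):
    """MPDIoU between two (x1, y1, x2, y2) boxes, per the formula above:
    MPDIoU = IoU - d1^2/(w^2+h^2) - d2^2/(w^2+h^2), where d1/d2 are the
    top-left / bottom-right corner distances. The bounding-box loss
    would then be L = 1 - MPDIoU."""
    px1, py1, px2, py2 = box_pred
    gx1, gy1, gx2, gy2 = box_gt
    # intersection over union
    ix1, iy1 = max(px1, gx1), max(py1, gy1)
    ix2, iy2 = min(px2, gx2), min(py2, gy2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((px2 - px1) * (py2 - py1)
             + (gx2 - gx1) * (gy2 - gy1) - inter + eps)
    iou = inter / union
    # squared corner distances, normalized by the squared image diagonal
    d1 = (px1 - gx1) ** 2 + (py1 - gy1) ** 2
    d2 = (px2 - gx2) ** 2 + (py2 - gy2) ** 2
    norm = img_w ** 2 + img_h ** 2
    return iou - d1 / norm - d2 / norm
```

A perfect prediction gives MPDIoU = 1 (loss 0); any corner displacement lowers the value, which is what drives the corner points of the prediction box toward those of the ground-truth box.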
Step 4, training with the improved YOLOv8 model, comprising:
(4.1) the hardware environment used by the invention: CPU Intel Core i9-12900K with a 3.2 GHz base frequency, 64 GB memory, GPU NVIDIA GeForce RTX 3080 with 24 GB GDDR6X video memory; the software environment: the deep-learning framework PyTorch 1.12.1, Python 3.8, CUDA 11.3. The hyperparameters are set as follows: initial learning rate 0.01, minimum learning rate 0.001, weight decay 0.0005, batch size 16, 300 iterations, 3 image channels, image input size 640 × 640, optimizer AdamW.
(4.2) during model training, YOLOv8 pre-training weights are used to speed up network training; Ultralytics officially provides several pre-training weights, and different versions can be selected according to different requirements. In the present invention, the selected pre-training weight is yolov8s.pt. The model is validated on the verification set every 50 iterations. The learning rate is then adjusted dynamically according to the validation results to prevent overfitting and underfitting, being gradually reduced with a learning rate decay strategy. Finally, training proceeds until the preset number of iterations is reached or the performance on the verification set stops improving, completing the training of the network model.
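The learning rate decay described above can be sketched as follows. The linear shape from the initial rate 0.01 down to the minimum rate 0.001 over 300 iterations is an illustrative assumption; other schedules (e.g. cosine decay) fit the same description.

```python
def decayed_lr(epoch, total_epochs=300, lr0=0.01, lr_min=0.001):
    """Linearly decay the learning rate from lr0 to lr_min over training,
    matching the initial/minimum learning rates stated above."""
    t = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)  # progress in [0, 1]
    return lr0 + (lr_min - lr0) * t
```

The optimizer's learning rate would be set to `decayed_lr(epoch)` at the start of each epoch.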
Step 5, testing and evaluating the improved YOLOv8 model.
Model testing and evaluation: the weight file generated after model training is loaded into the test script for model testing, obtaining the test results of the improved chip surface defect detection model on the test set, including defect positions, defect types and confidence, which are compared with the results of the original YOLOv8 model.
The improved chip surface defect detection model is built on the YOLOv8 model: DCN and C2f are fused into a novel C2f-DCN deformable feature fusion module for feature extraction, the original upsampling module is replaced with Dysample, and the original loss function is replaced with the MPDIoU boundary loss function.
The improved algorithm not only performs better in overall detection precision, but also markedly improves the recognition of fine defects against complex backgrounds; the introduction of the DCN operator strengthens the detection of irregular target defects, and MPDIoU strengthens the localization of target defects.
The embodiment of the invention also provides a chip surface defect detection system based on the improved YOLOv8 model, which comprises the following modules:
(1) The data set establishing module is used for establishing a chip surface defect picture data set;
(2) The data set dividing module is used for dividing the obtained data set into a training set, a verification set and a test set;
(3) The model building module is used for building the improved YOLOv8 network model;
(4) The training and prediction module is used for inputting the dataset into the improved YOLOv8 network model for training and prediction, obtaining chip images containing surface defects.
In this embodiment, a computer device is provided, which may be a terminal; its internal structure may be as shown in fig. 5. The computer device includes a processor, an internal memory, an external memory and a camera connected by a system bus. The processor provides computing and control capability, the internal memory provides an operating environment for the operating system and application programs, and the external memory is a nonvolatile storage medium storing the operating system and the computer programs. The camera of the computer device collects chip images with surface defects, and the processed results are shown on a display screen. The communication interface of the computer device can communicate with external terminals via WiFi, and the display screen may be a liquid crystal display.
It should be noted that the architecture depicted in fig. 5 merely represents some examples of hardware architectures related to the subject matter of this application and is not intended to limit the type or configuration of computer device to which the invention may be applied. In practice, the above components may be added, removed or combined in different ways, or even laid out differently, according to different requirements. Many variations will thus be apparent to those skilled in the art, and the invention is not limited to the specific examples described herein.
The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention in any way. Modifications and variations of the disclosed embodiments are possible in light of the above teachings, as will be recognized by those skilled in the art. However, any simple modification, equivalent variation and adaptation of the above embodiments made according to the technical substance of the present invention, without departing from that technical substance, still falls within the scope of the technical solution of the present invention.