Disclosure of Invention
Aiming at overcoming the defects in the prior art, the invention provides a deep learning tobacco grading method based on expert experience guidance.
The technical scheme adopted by the invention is as follows:
A deep learning tobacco grading method based on expert experience guidance comprises the following steps:
Obtaining a deep network feature vector of a tobacco leaf image by using a preset deep feature extraction network.
Obtaining a manual feature vector of the tobacco leaf image by using a traditional computer vision method.
Concatenating the deep network feature vector and the manual feature vector into a joint feature vector.
Inputting the joint feature vector into a preset grading judgment network, and outputting a tobacco grading result.
Having an expert re-grade the incorrectly graded tobacco leaf images to obtain updated grading labels, and training the grading judgment network on the updated grading labels to obtain an optimized grading judgment network.
The deep feature extraction network outputs an attention map. The expert corrects the attention maps corresponding to the incorrectly graded tobacco leaf images to obtain attention guide maps, and the deep feature extraction network is trained on the attention guide maps to obtain an optimized deep feature extraction network. An optimized deep network feature vector is obtained from the optimized deep feature extraction network, and an optimized joint feature vector is obtained from the optimized deep network feature vector. The grading judgment network is then trained on the optimized joint feature vector and the updated grading labels to obtain the optimized grading judgment network.
As a preferred solution, the preset deep feature extraction network is obtained by the following steps:
Determining the basic neural network parameters for the neural network architecture search, wherein the basic parameters comprise the image resolution and the network depth.
Defining a search space for the neural network architecture search, wherein the search space comprises the basic convolution module, the convolution kernel size, the expansion coefficient and the number of channels.
Sampling network parameters from the search space by random sampling, and calculating the search reward of the model corresponding to the sampled network parameters.
Repeating the network parameter sampling and the search reward calculation until the search reward meets the requirement or the number of samples reaches the limit, then selecting the network parameters with the highest search reward to construct the corresponding model, and taking that model as the deep feature extraction network.
Preferably, the manual feature vector comprises at least one of the following: the length, width, aspect ratio, perimeter, area, circularity and breakage rate of the tobacco leaf, the mean values of the red, green and blue (RGB) channels, and the mean values of the hue, saturation and brightness (HSV) channels.
Preferably, the search space of the basic convolution module is {MBConv, Fused-MBConv}, the search space of the convolution kernel size is {3x3, 5x5}, the search space of the expansion coefficient is {1, 2, 4, 6}, and the search space of the number of channels is {32, 64, 128, 256, 512}.
As a preferred solution, the grading judgment network comprises an input layer, a hidden layer and an output layer, wherein the number of neurons in the input layer equals the length of the joint feature vector output by the joint feature extraction module, and the number of neurons in the output layer equals the number of tobacco grade categories.
As a preferred solution, inputting the joint feature vector into the preset grading judgment network and outputting the tobacco grading result specifically comprises the following steps:
Inputting the joint feature vector into the grading judgment network, and outputting a predicted probability vector.
Calculating the probability entropy from the predicted probability vector.
Comparing the calculated probability entropy with a set threshold.
When the probability entropy is smaller than the set threshold, ending the grading and outputting the tobacco grading result.
When the probability entropy is greater than the set threshold, expert intervention is needed to check the tobacco grading result.
The beneficial effects of the deep learning tobacco grading method based on expert experience guidance are as follows. Existing tobacco grading is strongly subjective, whereas the method grades tobacco leaves objectively and consistently and can achieve higher grading accuracy. For occasional grading errors, the invention adopts expert guidance and uses the expert's domain knowledge to help improve grading accuracy. The method can be used not only for grading tobacco leaves but also in other classification fields, such as agricultural products.
Detailed Description
The invention will be further described with reference to specific examples.
As shown in FIG. 1, the deep learning tobacco leaf grading device based on expert experience guidance comprises three modules: a joint feature extraction module, a grading judgment module and an expert guidance module. The joint feature extraction module uses tobacco grading experience to extract several effective manual features from the tobacco leaf image, and at the same time uses a deep neural network to extract deep network features. To address the low model flexibility of existing methods, a neural network architecture search is used to obtain the deep neural network model best suited to the tobacco data. The manual features and the deep features are finally fused into joint features for grading. The joint features are input into the neural network of the grading judgment module, which predicts the final grading result, calculates the probability of a grading error, and prompts an expert to re-examine uncertain tobacco leaves. The expert guidance module uses expert knowledge to continuously improve the grading accuracy of the whole model. When the grading judgment module flags tobacco leaves with a high probability of grading error, or the expert independently finds misgraded tobacco leaves, the expert can give the system the correct grade and provide an attention guide map as the basis for the judgment, helping the model improve its grading accuracy.
The invention provides a deep learning tobacco grading method based on expert experience guidance, which comprises the following steps:
S0, obtaining the deep feature extraction network by neural network architecture search, as follows:
S01, determining the basic neural network parameters for the neural network architecture search, wherein the basic parameters comprise the image resolution and the network depth.
S02, defining a search space for the neural network architecture search, wherein the search space comprises the basic convolution module, the convolution kernel size, the expansion coefficient and the number of channels.
S03, sampling network parameters from the search space by random sampling, and calculating the search reward of the model corresponding to the sampled network parameters.
S04, repeating the network parameter sampling and the search reward calculation until the search reward meets the requirement or the number of samples reaches the limit, then selecting the network parameters with the highest search reward to construct the corresponding model, and taking that model as the deep feature extraction network.
S1, extracting a manual feature vector by a traditional computer vision method, extracting a deep network feature vector by the deep feature extraction network, and obtaining a joint feature vector from the two, as follows:
S11, segmenting the tobacco leaf image by a traditional computer vision method, and obtaining a 13-dimensional manual feature vector $f_m$ from the segmentation mask and the tobacco leaf image, wherein $f_m$ comprises the length, width, aspect ratio, perimeter, area, circularity and breakage rate of the tobacco leaf and the mean values of six color channels (red, green, blue, hue, saturation and brightness);
S12, extracting features from the tobacco leaf image by the deep feature extraction network to obtain a deep network feature vector $f_d$;
S13, concatenating the manual feature vector $f_m$ and the deep network feature vector $f_d$ to obtain a joint feature vector $f_t$.
S2, grading the tobacco leaves from the joint feature vector by the grading judgment network, as follows:
S21, inputting the joint feature vector into the grading judgment network, and outputting a predicted probability vector.
S22, calculating the probability entropy from the predicted probability vector.
S23, comparing the calculated probability entropy with a set threshold.
S24, when the probability entropy is smaller than the set threshold, ending the grading.
S25, when the probability entropy is greater than the set threshold, expert intervention is needed to check the grading result.
S3, having an expert re-grade the incorrectly graded tobacco leaf images to obtain updated grading labels, and training the grading judgment network on the updated grading labels to obtain an optimized grading judgment network.
S4, the deep feature extraction network outputs an attention map. The expert corrects the attention region of the attention maps corresponding to the incorrectly graded tobacco leaf images to obtain attention guide maps, and the deep feature extraction network is trained on the attention guide maps to obtain an optimized deep feature extraction network. An optimized deep network feature vector is obtained from the optimized deep feature extraction network, and an optimized joint feature vector is obtained from the optimized deep network feature vector. The grading judgment network is then trained on the optimized joint feature vector and the updated grading labels to obtain the optimized grading judgment network.
Example 1:
A deep learning tobacco grading method based on expert experience guidance comprises the following steps:
As shown in fig. 2, manual features are first extracted by a traditional computer vision method. Tobacco leaf images are captured by the acquisition equipment, and an existing image segmentation technique is used to obtain a mask of the tobacco leaf region.
The tobacco leaf contour is then located in the binarized mask: leaf edges are computed from pixel neighborhoods, and positions where the mask pixel value changes are taken as the contour. To calculate the length and width of the tobacco leaf, its minimum bounding rectangle is required. The minimum convex hull enclosing the leaf is obtained from the contour, and the minimum bounding rectangle is found by enumerating the rectangles that enclose the convex hull and comparing their areas. The length of this rectangle represents the leaf length, its width represents the leaf width, and the ratio of the two represents the leaf aspect ratio. Once the contour is known, the perimeter and area of the tobacco leaf can be calculated directly from pixel counts based on the contour.
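The bounding-rectangle step can be illustrated with a minimal sketch. It assumes the contour is available as a non-empty list of (x, y) points, builds the convex hull with Andrew's monotone chain (one possible choice, not stated in the description), and enumerates the rectangles aligned with each hull edge. The function names `convex_hull` and `min_area_rect` and the demo contour are hypothetical.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(contour):
    """Enumerate rectangles aligned with each hull edge and keep the smallest.

    Returns (length, width) with length >= width.
    """
    hull = convex_hull(contour)
    best = None
    for i in range(len(hull)):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(theta), math.sin(theta)
        # Rotate the hull by -theta so the current edge is axis-aligned.
        us = [x * c + y * s for x, y in hull]
        vs = [-x * s + y * c for x, y in hull]
        w, h = max(us) - min(us), max(vs) - min(vs)
        if best is None or w * h < best[0]:
            best = (w * h, max(w, h), min(w, h))
    return best[1], best[2]


# Hypothetical contour: a 4 x 2 rectangle rotated by 30 degrees, plus interior points.
_a = math.radians(30)
_raw = [(0, 0), (4, 0), (4, 2), (0, 2), (1, 1), (3, 0.5)]
demo_contour = [(x * math.cos(_a) - y * math.sin(_a),
                 x * math.sin(_a) + y * math.cos(_a)) for x, y in _raw]
length, width = min_area_rect(demo_contour)
aspect_ratio = length / width
```

The enumeration relies on the known property that a minimum-area enclosing rectangle has one side collinear with an edge of the convex hull, so only as many candidates as hull edges need to be compared.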
Using the calculated perimeter and area of the tobacco leaf, the circularity of the tobacco leaf is expressed as:
$$E = \frac{4\pi A}{P^{2}}$$
where $E$ is the circularity of the tobacco leaf, $A$ is the area of the tobacco leaf, and $P$ is the perimeter of the tobacco leaf.
In general, the calculated leaf area is slightly smaller than the area enclosed by the leaf contour, because damaged regions lie inside the contour. The tobacco breakage rate $R$ is obtained as the ratio of the area $S$ of the damaged regions inside the tobacco leaf contour to the total area inside the tobacco leaf contour.
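As a rough illustration, the perimeter, area, circularity and breakage rate can be computed from a binary mask as sketched below. Several points are assumptions rather than details from the description: the mask is a non-empty list of rows with 1 for leaf pixels, contour pixels are leaf pixels with a non-leaf 4-neighbour, circularity uses the common form 4πA/P², damaged regions are background pixels not connected to the image border, and the total area inside the contour is taken as leaf pixels plus hole pixels. The name `shape_features` is hypothetical.

```python
import math
from collections import deque


def shape_features(mask):
    """Perimeter, area, circularity and breakage rate from a 0/1 mask."""
    h, w = len(mask), len(mask[0])
    nbrs = ((1, 0), (-1, 0), (0, 1), (0, -1))

    def is_leaf(r, c):
        return 0 <= r < h and 0 <= c < w and mask[r][c] == 1

    area = sum(row.count(1) for row in mask)
    # Contour pixels: leaf pixels where the mask value changes in the neighbourhood.
    perimeter = sum(
        1
        for r in range(h)
        for c in range(w)
        if mask[r][c] == 1
        and any(not is_leaf(r + dr, c + dc) for dr, dc in nbrs)
    )

    # Flood-fill the background from the image border; unreached background is a hole.
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    for r in range(h):
        for c in range(w):
            if (r in (0, h - 1) or c in (0, w - 1)) and mask[r][c] == 0:
                outside[r][c] = True
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in nbrs:
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and mask[rr][cc] == 0 and not outside[rr][cc]:
                outside[rr][cc] = True
                queue.append((rr, cc))
    holes = sum(
        1 for r in range(h) for c in range(w)
        if mask[r][c] == 0 and not outside[r][c]
    )

    return {
        "perimeter": perimeter,
        "area": area,
        "circularity": 4 * math.pi * area / perimeter ** 2,  # assumed form
        "breakage_rate": holes / (area + holes),              # assumed denominator
        "hole_area": holes,
    }


# Hypothetical 7 x 7 mask: a 5 x 5 leaf block with a single damaged pixel in the centre.
demo_mask = [[0] * 7] + [[0, 1, 1, 1, 1, 1, 0] for _ in range(5)] + [[0] * 7]
demo_mask[3][3] = 0
features = shape_features(demo_mask)
```

A production pipeline would more likely take these measurements from an image-processing library; the pure-Python version only makes the definitions explicit.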
In general, an image captured by a visible-light camera has three channels (R, G and B). The three channels can be separated directly and the mean of each channel computed, giving the color mean feature of each channel. In addition to the RGB means, the tobacco leaf image is converted from the RGB color model to the HSV color model, and the mean of each channel is computed to obtain the color mean features of the three HSV channels.
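The six color means can be sketched with Python's standard `colorsys` module. The sketch assumes the input is a non-empty list of 8-bit (r, g, b) tuples already restricted to leaf pixels, and that channels are scaled to [0, 1]; both choices, and the name `color_means`, are illustrative assumptions.

```python
import colorsys


def color_means(pixels):
    """Return [mean R, mean G, mean B, mean H, mean S, mean V], each in [0, 1]."""
    sums = [0.0] * 6
    count = 0
    for r, g, b in pixels:
        rn, gn, bn = r / 255.0, g / 255.0, b / 255.0
        hue, sat, val = colorsys.rgb_to_hsv(rn, gn, bn)
        for i, value in enumerate((rn, gn, bn, hue, sat, val)):
            sums[i] += value
        count += 1
    return [total / count for total in sums]


# Hypothetical input: one pure red and one pure green pixel.
means = color_means([(255, 0, 0), (0, 255, 0)])
```

Hue is an angular quantity, so an arithmetic mean of hue can be misleading for colors near the red wrap-around; the sketch follows the plain per-channel mean described in the text.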
The features extracted at this stage comprise the length, width, aspect ratio, perimeter, area, circularity and breakage rate of the tobacco leaf and the mean values of the six color channels (red, green, blue, hue, saturation and brightness), 13 in total. All of the manual feature values are combined into a 13-dimensional vector $f_m$.
As shown in fig. 3, the method uses neural network architecture search to determine the best model. The classification accuracy of traditional neural networks on tobacco data sets is limited, so neural network architecture search is used to determine the optimal model on a small amount of tobacco data.
In the neural network architecture search, the basic parameters of the neural network, namely the input image resolution and the network depth, are determined first.
The search space of the basic convolution module is {MBConv, Fused-MBConv}, the search space of the convolution kernel size is {3x3, 5x5}, the search space of the expansion coefficient is {1, 2, 4, 6}, and the search space of the number of channels is {32, 64, 128, 256, 512}.
To compare the performance of different structures during the search and select the most suitable model, a search reward function must be defined according to the design requirements. In general, the design goal is to improve model accuracy while reducing the model's computation and parameter count.
The search reward function may be defined as:
$$R = \mathrm{Acc} \cdot S^{w} \cdot P^{v}$$
where $\mathrm{Acc}$ is the accuracy, $S$ is the model computation time, $P$ is the number of model parameters, and $w$, $v$ are hyperparameters that weigh the computation time and the parameter count.
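A small sketch of this reward is given below. The default exponents are hypothetical values chosen only for illustration; negative exponents are assumed here so that slower or larger models receive a lower reward, which matches the stated design goal but is not specified in the description.

```python
def search_reward(acc, compute_time, num_params, w=-0.07, v=-0.05):
    """R = Acc * S**w * P**v; w and v are illustrative, assumed-negative exponents."""
    return acc * compute_time ** w * num_params ** v


# Hypothetical candidates with equal accuracy but different cost.
fast_small = search_reward(0.90, compute_time=1.0, num_params=1e6)
slow_large = search_reward(0.90, compute_time=2.0, num_params=4e6)
```

With negative exponents, `fast_small` exceeds `slow_large`, so the search prefers the cheaper model when accuracy is equal.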
The model with the largest reward gives the optimal model structure. During the search, the accuracy, computation time and parameter count of each model are recorded, the search reward of each model is calculated with the search reward function, and the model with the highest search reward is selected as the deep feature extraction network.
The neural network architecture search uses random sampling: model parameters are generated at random, the search reward of the model is calculated on the corresponding data set and recorded, and the next set of model parameters is then drawn at random.
Each search iteration draws model parameters at random and calculates the search reward, until the search reward meets the requirement or the number of sampled models reaches the limit.
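The random search loop can be sketched as follows. The search space mirrors the sets listed above; sampling one choice per layer, the `evaluate` callback (standing in for training a candidate and computing its search reward) and the toy reward are all assumptions made for illustration.

```python
import random

SEARCH_SPACE = {
    "block": ["MBConv", "Fused-MBConv"],
    "kernel_size": [3, 5],
    "expansion": [1, 2, 4, 6],
    "channels": [32, 64, 128, 256, 512],
}


def sample_architecture(rng, depth):
    """Draw one random choice per layer (per-layer sampling is an assumption)."""
    return [{key: rng.choice(options) for key, options in SEARCH_SPACE.items()}
            for _ in range(depth)]


def random_search(evaluate, depth=4, max_samples=50, target_reward=None, seed=0):
    """Sample until the reward target is met or the sampling budget is used up."""
    rng = random.Random(seed)
    best_arch, best_reward = None, float("-inf")
    for _ in range(max_samples):
        arch = sample_architecture(rng, depth)
        reward = evaluate(arch)
        if reward > best_reward:
            best_arch, best_reward = arch, reward
        if target_reward is not None and best_reward >= target_reward:
            break
    return best_arch, best_reward


def toy_evaluate(arch):
    """Placeholder reward that simply favours narrow layers; not a real training run."""
    return -sum(layer["channels"] * layer["expansion"] for layer in arch)


best_arch, best_reward = random_search(toy_evaluate, depth=3, max_samples=20)
```

In a real search, `evaluate` would train the sampled network on the tobacco data and return the search reward; the loop itself would stay the same.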
The tobacco leaf image is input into the deep feature extraction network obtained by the neural network architecture search, which extracts the deep features of the image and outputs a deep network feature vector $f_d$.
The manual feature vector $f_m$ and the deep network feature vector $f_d$ are concatenated to obtain a joint feature vector $f_t$. The features computed by the joint feature extraction module are input into the grading judgment network as the basis for tobacco grading.
As shown in fig. 4, the grading judgment network comprises an input layer, a hidden layer and an output layer, wherein the number of neurons in the input layer equals the length of the joint feature vector output by the joint feature extraction module, the number of neurons in the output layer equals the number of tobacco grade categories, and the number of neurons in the hidden layer is set empirically.
The computation of the grading judgment network can be expressed as:
$$P = \mathrm{softmax}\left(b^{(2)} + w^{(2)} \cdot \mathrm{Swish}\left(b^{(1)} + w^{(1)} \cdot f_t\right)\right)$$
where $P$ is the predicted probability vector output by the grading judgment module, $b^{(1)}, b^{(2)}$ are the bias parameters of the neural network, $w^{(1)}, w^{(2)}$ are the weight parameters of the neural network, $\mathrm{softmax}(\cdot)$ is the softmax function, and $\mathrm{Swish}(\cdot)$ is the Swish activation function, namely:
$$\mathrm{Swish}(x) = x \cdot \mathrm{sigmoid}(x) = \frac{x}{1 + e^{-x}}$$
The predicted probability vector output by the neural network is $P = [p_1, p_2, \ldots, p_N]^{T}$, where $p_i$ is the probability that the current tobacco leaf belongs to the $i$-th grade and $N$ is the number of tobacco grading categories.
The predicted tobacco category is the one with the highest predicted probability, that is, $\arg\max_{i} p_i$.
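The forward pass of the grading judgment network can be sketched in plain Python. The weights below are arbitrary illustrative numbers, not trained parameters, and the helper names (`affine`, `grade`) are hypothetical. The sketch assumes the standard Swish form x·sigmoid(x) and moderate input magnitudes.

```python
import math


def swish(x):
    """Swish activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def affine(weights, bias, x):
    """Compute bias + weights @ x for a row-major weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def grade(f_t, w1, b1, w2, b2):
    """Return the predicted probability vector and the index of the top category."""
    hidden = [swish(h) for h in affine(w1, b1, f_t)]
    probs = softmax(affine(w2, b2, hidden))
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return probs, predicted


# Hypothetical 3-dimensional joint feature, 2 hidden neurons, 3 grade categories.
W1, B1 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]], [0.0, 0.0]
W2, B2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0]
probs, predicted = grade([1.0, 2.0, 3.0], W1, B1, W2, B2)
```

In practice the joint feature vector would have 13 manual entries plus the deep feature entries, and the parameters would come from training.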
From the probability distribution given by the predicted probability vector $P$, the probability entropy of the current tobacco leaf prediction is calculated as:
$$E = -\sum_{i=1}^{N} p_i \log p_i$$
The calculated probability entropy is used to judge the probability of a grading error, and a prompt is issued according to a preset probability entropy threshold.
Let $\eta$ denote the threshold parameter set by the system. During grading, the predicted probability entropy is compared with the threshold: when $E < \eta$, the confidence of the grading result is high and no expert check is needed for the time being; when $E \ge \eta$, the confidence of the grading result is low, and expert intervention is needed to check the grading result.
In practice, the threshold can be chosen flexibly according to the situation: a lower threshold $\eta$ means more tobacco leaves need their grading results checked, and a higher threshold $\eta$ means fewer.
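The entropy check can be illustrated as below. The natural logarithm and the example threshold of 0.5 are assumptions; the description does not fix the log base or a threshold value.

```python
import math


def probability_entropy(probs):
    """Shannon entropy of the predicted probability vector (natural log assumed)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def needs_expert_review(probs, eta):
    """True when the entropy E is at or above the threshold eta."""
    return probability_entropy(probs) >= eta


# Hypothetical predictions over four grades.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
flag_confident = needs_expert_review(confident, eta=0.5)
flag_uncertain = needs_expert_review(uncertain, eta=0.5)
```

The uniform prediction has the maximum entropy log N and is flagged, while the confident prediction falls below the threshold and is accepted without review.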
The expert guidance module makes full use of human-machine interaction: through expert intervention, the important regions are manually marked on the attention maps of misjudged and uncertain samples, and the updated attention guide maps increase the attention that the deep feature extraction network pays to those regions, further improving feature extraction.
For misgraded tobacco leaves, the expert can review them and provide the correct grading result to guide the training of the grading judgment network, improving the grading performance.
As shown in fig. 5, when the region attended to by the deep feature extraction network is incorrect, the expert can provide an attention guide map that directs the network to the important regions, improving feature extraction accuracy. After the expert has corrected the grading result and the features, the parameters of the deep feature extraction network and the grading judgment network are updated with the corrected grading result and features.
To display the regions that the deep feature extraction network attends to when grading, and to let the expert modify those regions, an attention branch must be added to the deep feature extraction network obtained by the neural network architecture search.
The computation of the deep feature extraction network is expressed as:
$$\mathrm{map}_1 = F(I)$$
where $F(\cdot)$ is the convolutional feature extractor, $\mathrm{map}_1$ is the feature map output by the convolutional feature extractor, and $I$ is the input tobacco leaf image data.
The attention branch added to the deep feature extraction network is expressed as:
$$M_{\mathrm{net}}(I) = \mathrm{sigmoid}(\mathrm{conv}_1(\mathrm{map}_1))$$
$$\mathrm{map}_2 = \mathrm{conv}_2(\mathrm{map}_1)$$
$$v_1 = \mathrm{GAP}(\mathrm{map}_2)$$
where $M_{\mathrm{net}}(I)$ is the single-channel attention map that is output, $\mathrm{sigmoid}(\cdot)$ is the sigmoid activation function, $\mathrm{conv}_1(\cdot)$ is a convolution layer whose number of input channels equals that of the feature map and whose number of output channels is 1, $\mathrm{conv}_2(\cdot)$ is a convolution layer whose number of input channels equals that of the feature map and whose number of output channels equals the deep feature vector length, and $\mathrm{GAP}(\cdot)$ is the global average pooling operation. $v_1$ is the feature vector obtained from $\mathrm{map}_2$ by global average pooling.
The attention map $M_{\mathrm{net}}(I)$ is used to compute a new feature $v_2$, expressed as:
$$\mathrm{map}_3 = \mathrm{conv}_3(\mathrm{map}_1 + M_{\mathrm{net}}(I) \cdot \mathrm{map}_1)$$
$$v_2 = \mathrm{GAP}(\mathrm{map}_3)$$
where $\mathrm{conv}_3(\cdot)$ is a convolution layer whose number of input channels equals that of the feature map and whose number of output channels equals the deep feature vector length, $\mathrm{map}_3$ is the feature map output by $\mathrm{conv}_3(\cdot)$, and $v_2$ is the feature vector obtained from $\mathrm{map}_3$ by global average pooling.
The deep feature vector can then be expressed as:
$$f_d = v_1 + v_2$$
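The data flow of the attention branch can be traced with a small pure-Python sketch. It assumes all three convolutions use 1x1 kernels (the kernel sizes are not given in the description), stores the feature map as one flattened list per channel, and applies the single-channel attention map to every channel. The weights and the helper names are hypothetical.

```python
import math


def conv1x1(weights, fmap):
    """1x1 convolution: weights is [C_out][C_in], fmap is [C_in][H*W]."""
    positions = len(fmap[0])
    return [[sum(w * fmap[c][i] for c, w in enumerate(row))
             for i in range(positions)]
            for row in weights]


def gap(fmap):
    """Global average pooling: one mean per channel."""
    return [sum(channel) / len(channel) for channel in fmap]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_branch(map1, w_conv1, w_conv2, w_conv3):
    """Return the attention map M_net and the deep feature vector f_d = v1 + v2."""
    m_net = [sigmoid(x) for x in conv1x1(w_conv1, map1)[0]]   # single output channel
    v1 = gap(conv1x1(w_conv2, map1))
    # map1 + M_net * map1, with M_net shared across channels.
    reweighted = [[x + a * x for x, a in zip(channel, m_net)] for channel in map1]
    v2 = gap(conv1x1(w_conv3, reweighted))
    f_d = [a + b for a, b in zip(v1, v2)]
    return m_net, f_d


# Hypothetical feature map: 2 channels over a flattened 2 x 2 grid.
MAP1 = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
W_CONV1 = [[0.5, -1.0]]                              # 2 -> 1 channel
W_CONV2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # 2 -> 3 channels
W_CONV3 = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]     # 2 -> 3 channels
m_net, f_d = attention_branch(MAP1, W_CONV1, W_CONV2, W_CONV3)
```

A real implementation would express the same structure with a deep learning framework's convolution and pooling layers so that the branch can be trained end to end.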
The attention map output by the attention branch can be used to analyze why the model made its prediction.
Usually only small regions of the tobacco leaf image are decisive for grading, so the expert can judge whether the model's attention region is correct by inspecting the attention map. If the model attends to the wrong region, the expert needs to correct the attention region of the attention map.
When the expert is required to check the attention region, the module asks the expert, through a pop-up window, to mark the region on which the grading of the sample is based, forming an attention guide map $M_{\mathrm{expert}}(I)$. The error between the attention guide map provided by the expert and the attention map is calculated to obtain the attention loss, expressed as:
$$L_{att} = \alpha \left| M_{\mathrm{expert}}(I) - M_{\mathrm{net}}(I) \right|$$
where $\alpha$ is the attention loss weight.
When the expert does not provide an attention guide map and the model is trained with the tobacco grading labels only, the grading loss is calculated from the output of the grading judgment network and the correct tobacco grading label, expressed as:
$$L_c = \mathrm{CE}(P, \mathrm{label}_e)$$
where $\mathrm{label}_e$ is the correct grading label provided by the expert and $\mathrm{CE}(\cdot)$ is the cross-entropy loss function.
When the expert provides an attention guide map, the model training loss function is expressed as:
$$L = L_{att} + L_c$$
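The two training cases can be sketched together. Averaging the absolute difference over pixels is an assumed reduction (the description only gives the absolute error scaled by α), the label is assumed to be a class index, and the function names are hypothetical.

```python
import math


def attention_loss(m_expert, m_net, alpha=1.0):
    """alpha * mean absolute error between guide map and attention map (mean is assumed)."""
    return alpha * sum(abs(e - n) for e, n in zip(m_expert, m_net)) / len(m_net)


def cross_entropy(probs, label, eps=1e-12):
    """Cross-entropy for a single sample with an integer class label."""
    return -math.log(max(probs[label], eps))


def training_loss(probs, label, m_net=None, m_expert=None, alpha=1.0):
    """L_c alone when no guide map is given, otherwise L_att + L_c."""
    loss = cross_entropy(probs, label)
    if m_expert is not None:
        loss += attention_loss(m_expert, m_net, alpha)
    return loss


# Hypothetical sample: two grades, a two-pixel attention map.
label_only = training_loss([0.5, 0.5], label=0)
with_guide = training_loss([0.5, 0.5], label=0,
                           m_net=[0.5, 0.5], m_expert=[1.0, 0.0], alpha=2.0)
```

In this example the guide map adds an attention term of 1.0 to the grading loss, so the gradient would also push the attention map toward the expert's marking.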
Online incremental learning is performed on the deep feature extraction network and the grading judgment network, respectively, using the correct grading results and the attention guide maps provided by the expert.
During operation of the method, the grading results and the attention regions are corrected under expert guidance, so that the tobacco grading accuracy improves continuously.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are intended to fall within the scope of the present invention.