
CN119360060A - Asset matching detection method and device - Google Patents

Asset matching detection method and device

Info

Publication number
CN119360060A
CN119360060A (application number CN202411391864.0A)
Authority
CN
China
Prior art keywords
asset
pictures
detection model
similarity
matching detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411391864.0A
Other languages
Chinese (zh)
Inventor
张莉
苗紫菀
王宁宁
李卓松
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Information Science and Technology University
Original Assignee
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Information Science and Technology University filed Critical Beijing Information Science and Technology University
Priority to CN202411391864.0A priority Critical patent/CN119360060A/en
Publication of CN119360060A publication Critical patent/CN119360060A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract


The present invention discloses an asset matching detection method and device. The method includes: initializing an asset false matching detection model and loading the weights of the model; when the asset data input to the model is two pictures, normalizing the two pictures respectively, using the model to predict the similarity of the two pictures, and outputting the predicted similarity score; when the asset data input to the model is two videos, normalizing the two videos respectively, selecting frame pairs from the two normalized videos to obtain two pictures corresponding to the frame pairs, using the model to predict the similarity of the two pictures, and outputting the predicted similarity score; and judging, based on the similarity score, whether the asset data matches. The present invention solves the technical problem of inaccurate asset matching.

Description

Asset matching detection method and device
Technical Field
The invention relates to the field of artificial intelligence, in particular to an asset matching detection method and device.
Background
With the continuous strengthening of asset management and supervision, requirements for data security, asset security, and information security continue to rise. However, owing to remote management and diverse investment principals (for example, power plants and large railroad equipment), the matching identification of fixed assets faces challenges such as inefficiency, high cost, and false matching. There is therefore an urgent need for an intelligent, automated method to identify, match, and monitor fixed assets to address evolving business needs.
Asset matching verification can be assisted by image similarity detection, which refers to judging the similarity between two images. In this process, features of the two images are extracted and compared to calculate the similarity between them, typically by computing feature vectors or feature descriptors of the two images. In research on similarity calculation, image similarity detection can be classified into conventional learning methods and deep learning methods.
Conventional learning methods generally include two steps, feature extraction and similarity measurement, and feature extraction can be further classified into similarity detection based on local features and based on global features. Representative algorithms are those based on the Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Oriented FAST and Rotated BRIEF (ORB). Similarity detection based on global features generally uses color histograms, gray level co-occurrence matrices, and the like to describe the global features of an image directly. The similarity measure is another important research direction; its objective is to map two feature vectors to a real number through a function so as to determine the degree of similarity between two images. Common similarity measures include the Euclidean distance, cosine similarity, Hamming distance, and so on. These methods identify assets by extracting shallow, surface-level features of the image, but their feature extraction capability is weak: they work well only for identification and verification in simple environments, cannot extract deep image features, and lose precision and accuracy under complex conditions such as illumination or viewing-angle changes.
In recent years, the application of deep learning to image similarity detection has advanced greatly, and it has become the mainstream approach in the field. Deep learning methods realize end-to-end learning of feature representations and similarity measurement through a neural network. For example, the twin neural network (Siamese Neural Network, SNN) performs well in many image similarity detection tasks and is increasingly used in a wide variety of image similarity comparison and metric learning tasks. However, existing deep-learning-based image similarity measurement methods still suffer from the technical problem of inaccurate measurement.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the invention provides an asset matching detection method and device, which are used for at least solving the technical problem of inaccurate asset matching.
According to one aspect of the embodiment of the invention, an asset matching detection method is provided, which comprises the steps of initializing an asset false matching detection model, loading weights of the asset false matching detection model, carrying out normalization processing on two pictures respectively when asset data input into the asset false matching detection model are the two pictures, carrying out similarity prediction on the two pictures by using the asset false matching detection model, outputting predicted similarity scores, carrying out normalization processing on the two videos respectively when asset data input into the asset false matching detection model are the two videos, selecting a frame pair from the two videos subjected to normalization processing to obtain two pictures corresponding to the frame pair, carrying out similarity prediction on the two pictures by using the asset false matching detection model, and outputting the predicted similarity scores, and judging whether the asset data are matched or not based on the similarity scores.
According to another aspect of the embodiment of the invention, an asset matching detection device is further provided, comprising an initialization module, a detection module, and a determination module. The initialization module is configured to initialize an asset false matching detection model and load the weights of the asset false matching detection model. The detection module is configured to: normalize two pictures respectively when the asset data input into the asset false matching detection model is two pictures, perform similarity prediction on the two pictures using the asset false matching detection model, and output a predicted similarity score; and normalize two videos respectively when the asset data input into the asset false matching detection model is two videos, select frame pairs from the two normalized videos to obtain two pictures corresponding to the frame pairs, perform similarity prediction on the two pictures using the asset false matching detection model, and output a predicted similarity score. The determination module is configured to judge whether the asset data matches based on the similarity score.
In the embodiment of the invention, an asset false match detection model is initialized, the weight of the asset false match detection model is loaded, normalization processing is respectively carried out on two pictures when asset data input into the asset false match detection model is two pictures, similarity prediction is carried out on the two pictures by using the asset false match detection model, the predicted similarity score is output, normalization processing is respectively carried out on the two videos when the asset data input into the asset false match detection model is two videos, two pictures corresponding to the frame pairs are obtained by selecting frame pairs from the two videos subjected to normalization processing, similarity prediction is carried out on the two pictures by using the asset false match detection model, and whether the asset data are matched or not is judged on the basis of the similarity score. Through the scheme, the technical problem of inaccurate asset matching is solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of an asset match detection method according to an embodiment of the invention;
FIG. 2 is a flow chart of a training method of an asset false match detection model according to an embodiment of the invention;
FIG. 3 is a flow chart of a method of validating an asset false match detection model according to an embodiment of the invention;
FIG. 4 is a flow chart of another training method for an asset false match detection model based on deep learning according to an embodiment of the invention;
FIG. 5 is an architecture diagram of an asset false match detection model according to an embodiment of the invention;
FIG. 6 is a block diagram of a twin neural network according to an embodiment of the present invention;
FIG. 7 is a diagram of a residual network architecture according to an embodiment of the invention;
FIG. 8 is a CBAM attention mechanism block diagram according to an embodiment of the invention;
FIG. 9 is a ResNet network block diagram of a fused attention mechanism according to an embodiment of the invention;
FIG. 10 is a block diagram of a model of a fused convolution block attention module and twin network in accordance with an embodiment of the present invention;
FIG. 11 is a flow chart of another asset match detection method according to an embodiment of the invention;
FIG. 12 is a flow chart of another method of validating an asset false match detection model according to an embodiment of the invention;
FIG. 13 is a sample diagram of a self-constructing asset data set according to an embodiment of the invention;
FIG. 14 is a diagram of an example of data enhancement according to an embodiment of the present invention;
FIG. 15 is a graph of training loss and accuracy as a function of iteration number in accordance with an embodiment of the present invention;
FIG. 16 is a schematic diagram of a training apparatus based on a deep learning asset false match detection model according to an embodiment of the invention;
fig. 17 shows a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In accordance with an embodiment of the present invention, there is provided a method embodiment of a training method for an asset false match detection model based on deep learning, it being noted that the steps illustrated in the flowchart of the figures may be performed in a computer system such as a set of computer executable instructions and, although a logical sequence is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in a different order than that illustrated herein.
SUMMARY
To address the low efficiency, high labor cost, and false matching of overseas asset matching identification under remote management and diversified investment subjects, an asset false matching detection method based on a twin neural network and a convolutional attention module is proposed. A twin neural network (Siamese neural network) serves as the basic framework, a convolutional neural network model is constructed, a CBAM (Convolutional Block Attention Module) hybrid attention mechanism is introduced to form the feature extraction module, and pre-trained weights are migrated into the training model through transfer learning. To give the model better generalization capability, data enhancement is applied to the samples. Experiments on collected asset data of different types on asset matching tasks show that the proposed model achieves 92.42% accuracy and a 92.48% F1 measure, improving the identification accuracy of false asset matching and bringing a more efficient and intelligent solution to the field of asset management.
Example 1
The application provides an asset matching detection method, as shown in figure 1, which comprises the following steps:
Step S102, initializing an asset false matching detection model, and loading the weight of the asset false matching detection model;
The asset false match detection model is obtained by constructing a convolutional neural network by using a twin neural network as a basic framework and introducing a mixed attention mechanism. The specific training method, as shown in fig. 2, includes the following steps:
step S1022, fusing a convolutional neural network and a twin network combined with an attention mechanism to construct the asset false matching detection model;
The convolutional neural network is constructed using a twin neural network as the basic framework, and the attention mechanism is introduced in the convolutional neural network to construct the asset false matching detection model. The pre-trained weights are migrated to the asset false matching detection model through transfer learning.
Step S1024, inputting training data into the convolutional neural network, and performing feature extraction on two sample pictures forming a positive sample or a negative sample in the training data by utilizing the convolutional neural network to obtain two feature vectors;
First, asset data is collected and divided into a training data set, a validation data set and a test data set, and the training data set is preprocessed, wherein the preprocessing includes at least one of normalization of an image, resizing, and data enhancement.
Feature extraction is performed on the two sample pictures respectively using the convolutional neural network to obtain two high-dimensional feature maps serving as the two feature vectors, where a high-dimensional feature map is a feature map whose dimension is larger than a preset dimension.
Step S1026, comparing the feature differences between the two feature vectors, and predicting the similarity of the two sample pictures based on the feature differences;
This comprises calculating the L1 distance between the two feature vectors with a mapping function to serve as the feature difference, comparing and fusing the L1 distance using fully connected layers to measure the similarity of the two sample pictures, and outputting the predicted similarity through an activation function based on that similarity.
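To make this concrete, below is a minimal PyTorch sketch of such a comparison head; it is not taken from the patent, and while the L1 distance, two fully connected layers, and Sigmoid output follow the description here and in Example 2, the feature and hidden dimensions are assumed placeholders:

```python
import torch
import torch.nn as nn

class SimilarityHead(nn.Module):
    """Compares two feature vectors: element-wise L1 distance, fused by
    two fully connected layers, mapped to a [0, 1] similarity score."""

    def __init__(self, feat_dim: int, hidden_dim: int = 512):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),  # first FC layer: compare/fuse features
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 1),         # second FC layer: scalar score
        )

    def forward(self, v1: torch.Tensor, v2: torch.Tensor) -> torch.Tensor:
        l1 = torch.abs(v1 - v2)            # L1 distance as the feature difference
        return torch.sigmoid(self.fc(l1))  # activation outputs predicted similarity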
Step S1028, comparing the predicted similarity with the actual similarity using a loss function to obtain the loss between them, and updating the network parameters of the convolutional neural network and the twin network based on the loss to obtain the trained asset false matching detection model.
Step S104, respectively carrying out normalization processing on the two pictures under the condition that the asset data input into the asset false matching detection model are two pictures, carrying out similarity prediction on the two pictures by using the asset false matching detection model, and outputting a predicted similarity score;
First, the convolutional neural network is used to extract features from the two pictures: multiple filters scan the two pictures respectively to generate feature maps, downsampling reduces the size of the feature maps, an activation function introduces nonlinearity into the reduced feature maps, and the resulting feature maps serve as the extracted features.
Then, the weights of the feature map are adjusted: a channel attention module captures the importance of each channel through global average pooling and global max pooling and generates channel weights with a multi-layer perceptron; a spatial attention module pools the feature map along the channel dimension and generates spatial weights through a convolution operation; and the weights of the feature map are dynamically adjusted based on the channel weights and the spatial weights.
Finally, similarity measurement is performed on the extracted features using the metric function of the twin neural network, and the predicted similarity score is output; for example, the distance between the two feature vectors corresponding to the extracted features is computed with the metric function, the similarity is measured based on that distance, and the predicted similarity score is output.
Step S106, respectively carrying out normalization processing on two videos under the condition that the asset data input into the asset false matching detection model is the two videos, selecting a frame pair from the two videos subjected to normalization processing to obtain two pictures corresponding to the frame pair, carrying out similarity prediction on the two pictures by using the asset false matching detection model, and outputting a predicted similarity score;
First, two pictures are extracted from the videos. The two videos are normalized respectively; the normalization aims to eliminate interference factors caused by shooting conditions, equipment, or environmental differences, so that the video data are consistent in the same feature space and the detection accuracy of the model improves. In particular, normalization may include operations such as resizing, color correction, and brightness and contrast adjustment of the video frames, ensuring that the two videos are compared against the same reference. After normalization, multiple frame pairs are selected from the two normalized videos according to a chosen strategy. The frame-pair selection policy may be based on time intervals, key-frame extraction algorithms, or other feature matching algorithms, to ensure the selected frame pairs represent the primary content and features of the videos. A selected frame pair typically comprises two temporally corresponding frames, one from each of the two videos. Finally, the selected frame pairs are converted into picture form for further comparative analysis. Extracting video frames as still pictures makes it more convenient to analyze them with image processing algorithms to determine whether there is a false match between the two videos. This process ensures that the model receives unified, representative image data, improving detection precision and reliability.
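The selection strategy is left open by the method (time intervals, key frames, or feature matching); the sketch below shows one possible time-interval strategy using OpenCV, with the frame count, target size, and file names as assumptions:

```python
import cv2

def sample_frames(path: str, num_frames: int = 8, size=(105, 105)):
    """Uniformly sample frames from one video and resize them (a simple
    size-normalization step). Returns a list of BGR frame images."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // num_frames)
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.resize(frame, size))
    cap.release()
    return frames

# Pair temporally corresponding frames from the two normalized videos;
# each pair is then scored by the asset false matching detection model.
pairs = list(zip(sample_frames("asset_a.mp4"), sample_frames("asset_b.mp4")))
```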
Then, similarity prediction is performed. The similarity prediction method is the same as step S104, and will not be described here again.
Step S108, judging whether the asset data is matched or not based on the similarity score.
In some embodiments, the asset false match detection model may also be validated after it has been trained. As shown in fig. 3, the verification method of the asset false match detection model includes the following steps:
step S302, acquiring asset data, and dividing the asset data into a training data set, a verification data set and a test data set;
Step S304, based on the test data set, a quantitative analysis method is adopted, and the detection performance of the asset false match detection model on the similarity of the asset image is evaluated by using the accuracy, the precision, the recall and the F1 measured value, so as to verify the asset false match detection model.
The training dataset is preprocessed, where the preprocessing includes data enhancement parameterized by the number of operations and the magnitude of the transformation. The data enhancement includes at least one of brightness adjustment, contrast adjustment, rotation, scaling, flipping, and occlusion.
In some embodiments, a binary confusion matrix is used to determine, for test data whose true class is positive, the samples predicted positive (TP) and predicted negative (FN), and, for test data whose true class is negative, the samples predicted positive (FP) and predicted negative (TN). The accuracy, precision, recall, and F1 measure are then determined from at least part of TP, FN, FP, and TN, and the detection performance of the asset false match detection model on asset image similarity is evaluated based on these metrics.
In some embodiments, the backbone network, the improvement method, and the data enhancement of the asset false match detection model are adjusted separately with the other variables controlled, and the accuracy, precision, recall, and F1 measures are calculated separately for each adjusted configuration to evaluate the impact of the backbone network, the improvement method, and the data enhancement on the performance of the asset false match detection model.
In some embodiments, under the same test dataset, the color features of pictures in the test dataset are extracted using a color histogram method and the texture features using a gray level co-occurrence matrix method; the similarity of the pictures is calculated from the extracted color and texture features; and the accuracy, precision, recall, and F1 measures of the color histogram and gray level co-occurrence matrix methods are calculated from the computed similarities and compared with those of the asset false match detection model to verify the asset false match detection model.
Example 2
FIG. 4 is another training method for an asset false match detection model based on deep learning, according to an embodiment of the invention, as shown in FIG. 4, the method comprising:
Step S402, data preprocessing.
First, asset data is collected and the data set is divided into a training set, a validation set and a test set, and a portion of the data set is preprocessed. The preprocessing step comprises normalization, size adjustment, data enhancement and the like of the image so as to improve generalization capability and robustness of the model.
Step S404, model training.
The asset false matching detection model provided by the application is obtained by fusing a convolutional neural network and a twin network which are combined with an attention mechanism as shown in fig. 5.
The twin network performs matching evaluation by calculating the similarity of the input asset images; given that two images are input, the twin neural network (Siamese Neural Network) is selected as the network framework for asset similarity matching. As shown in fig. 6, the twin neural network is an architecture for comparing the similarity of two input samples. It consists of two sub-networks sharing weights; each sub-network receives one input sample and maps it into a feature space, and the similarity of the samples is determined by computing the distance between the two feature vectors. Here the input pictures are X1 and X2, GW(X1) and GW(X2) are the extracted feature vectors, and the mapping function EW is computed from the norm of the difference between the two vectors; the closer the resulting similarity value is to 1, the higher the similarity.
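A minimal PyTorch sketch of this weight-sharing structure follows; the encoder and head are placeholders for the modules described elsewhere in this document, and the class names are illustrative:

```python
import torch.nn as nn

class SiameseNetwork(nn.Module):
    """Both inputs pass through the SAME encoder instance, so the two
    branches share weights; a head scores the embedding similarity."""

    def __init__(self, encoder: nn.Module, head: nn.Module):
        super().__init__()
        self.encoder = encoder  # shared sub-network computing GW(X)
        self.head = head        # e.g., the SimilarityHead sketched earlier

    def forward(self, x1, x2):
        g1 = self.encoder(x1).flatten(1)  # GW(X1)
        g2 = self.encoder(x2).flatten(1)  # GW(X2)
        return self.head(g1, g2)          # similarity score in [0, 1]
```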
Compared with other network models, a twin network only needs to learn a universal embedding function from a dataset containing enough samples in the initial stage, after which it can be used directly to identify test data without retraining. Because of its unique structure, an effective model can be trained even from small image datasets. The sub-networks of a twin network are typically convolutional neural networks, which makes it easy to select a suitable network structure for asset matching identification according to the image identification task at hand.
Convolutional neural networks are used for feature extraction. Feature extraction and description are key issues for computer vision, which primarily involve extracting useful, distinguishable features from images that can be used to perform object detection, image recognition, and other related tasks. Feature extraction and description are vital links in solving the problem of false matching detection of assets. The core of asset false match detection is to identify false matches by analyzing and comparing the characteristics of different assets. Convolutional neural networks (Convolutional Neural Network, CNN) are powerful tools for solving image feature extraction.
A convolutional neural network consists of five layer types: an input layer, convolutional layers, pooling layers, activation layers, and fully connected layers. The input layer receives the original image data; a convolutional layer scans the image with multiple filters to generate feature maps; a pooling layer reduces the size of the feature maps through downsampling to cut computation and parameter counts; an activation layer typically uses an activation function such as ReLU (Rectified Linear Unit) to introduce nonlinearity; and a fully connected layer takes the feature input from the previous layer and maps it to an output vector of fixed dimension. As the number of network layers increases, convolutional neural networks can extract higher-level, more abstract features. However, when the network depth grows beyond a certain point, vanishing or exploding gradients appear, making deep networks difficult to converge during training and hurting model performance. The ResNet (residual network) model was developed for this problem. The core idea of ResNet is to introduce residual blocks that solve the degradation problem in deep network training through residual connections (skip connections). Fig. 7 shows the structure of a Residual Block, the basic unit of the ResNet model.
Here x is the input to the block, and the output H(x) is obtained after a series of weight layers and nonlinear activation functions. The conventional forward propagation would be expressed as H(x) = F(x). The key to the residual block is the skip connection (also called a residual connection), which passes the input x directly to the output and adds it to the output F(x) produced by the convolutional layers and activation functions, i.e. H(x) = F(x) + x. With this design, the module learns the residual F(x) = H(x) - x rather than the whole mapping H(x), hence the term residual network.
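A minimal PyTorch residual block implementing H(x) = F(x) + x might look as follows; the specific layer composition (two 3×3 convolutions with batch normalization) is the common basic-block form and is an assumption with respect to this document:

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output H(x) = F(x) + x via a skip connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(  # F(x): the learned residual branch
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)  # add the identity skip x to F(x)
```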
The application uses ResNet model structure, which can extract multi-layer characteristic information, and is suitable for the false matching model of assets, and the network structure is shown in table 1.
Table 1 ResNet network architecture
In order to process influence factors such as irrelevant background noise in an asset image, a CBAM module is introduced into a ResNet model, so that the feature extraction capability can be effectively enhanced, and the accuracy of false matching detection of the asset is improved.
CBAM (Convolutional Block Attention Module) is an attention mechanism module for enhancing the performance of convolutional neural networks. It weights the feature map in the channel and spatial dimensions by introducing two sub-modules of channel attention (Channel Attention) and spatial attention (Spatial Attention), respectively. The channel attention module captures the importance of each channel through global average pooling and global maximum pooling, and generates channel weights through a multi-layer perceptron (MLP). The spatial attention module pools the feature map in the channel dimension and generates spatial weights through convolution operation, so that important spatial areas are highlighted. CBAM dynamically adjusts the weight of the feature map through the two steps, so that the network can be focused on key features more accurately, the representation capacity and performance of the model are improved, and the method is widely applied to tasks such as image classification, target detection, similarity matching and the like.
The CBAM structure is shown in fig. 8. Given an input feature map F, the overall computation is given by formulas (1) and (2):

F′ = Mc(F) ⊗ F (1)

F″ = Ms(F′) ⊗ F′ (2)

where ⊗ denotes element-wise multiplication, F′ is the output of the input feature map F after the channel attention module, and F″ is the output of F′ after the spatial attention module. The channel attention module and the spatial attention module are computed by formulas (3) and (4), where σ denotes the Sigmoid function:

Mc(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) (3)

Ms(F′) = σ(Conv([AvgPool(F′); MaxPool(F′)])) (4)
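A PyTorch sketch of CBAM consistent with formulas (1)-(4) is given below; the channel reduction ratio of 16 and the 7×7 spatial convolution follow the original CBAM design and are assumptions with respect to this patent:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # Mc(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F)))  -- formula (3)
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, f):
        return torch.sigmoid(self.mlp(self.avg_pool(f)) + self.mlp(self.max_pool(f)))

class SpatialAttention(nn.Module):
    # Ms(F') = sigmoid(Conv([AvgPool(F'); MaxPool(F')]))  -- formula (4)
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, f):
        avg = torch.mean(f, dim=1, keepdim=True)   # channel-wise average pooling
        mx, _ = torch.max(f, dim=1, keepdim=True)  # channel-wise max pooling
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    # F' = Mc(F) * F  (1);  F'' = Ms(F') * F'  (2)
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, f):
        f = self.ca(f) * f       # channel-weighted feature map F'
        return self.sa(f) * f    # spatially weighted feature map F''
```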
Transfer learning helps the target domain train a reliable decision function by migrating auxiliary source knowledge, solving the learning problem when sample data in the target domain are unlabeled or only a few labeled samples exist. During training, a pre-trained ResNet50 model is loaded via transfer learning. ResNet50 is a deep convolutional neural network pre-trained on the ImageNet dataset, with rich feature representation capability. By fine-tuning its weight parameters on the source training set, asset feature vectors can be extracted effectively, reducing resource consumption and improving training efficiency. The feature extraction part of this application combines ResNet50 and CBAM, and fig. 9 shows the ResNet network structure incorporating the attention mechanism.
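A minimal sketch of this transfer-learning setup with torchvision follows; the patent does not state which stages are frozen during fine-tuning, so the freezing policy shown is an assumption:

```python
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained ResNet50 as the shared feature extractor.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()  # drop the classifier; expose 2048-d features

# Fine-tune only the deeper stages (freezing policy is an assumption).
for name, p in backbone.named_parameters():
    if not name.startswith(("layer3", "layer4")):
        p.requires_grad = False
```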
The ResNet network combined with the attention mechanism is fused with the twin network; the structural diagram of the network model is shown in fig. 10. The two branches of the ResNet50_CBAM network each receive one asset picture of a positive- or negative-sample pair, and feature extraction maps the samples into a low-dimensional space. The extracted feature vectors are then flattened into one-dimensional vectors, the L1 distance between the two vectors is computed, two fully connected layers compare and fuse the features to better measure similarity, and finally the similarity is output through a Sigmoid function to judge whether the assets match.
The application updates the asset false matching detection network in reverse through the loss function, adjusting the parameters of the feature extraction layer and the fully connected layer so that the network more accurately predicts the similarity of the input asset image pair. The fully connected network outputs a similarity score between 0 and 1, and the cross entropy is then computed against the label y of the input image pair (where y = 1 means similar and y = 0 dissimilar). The cross-entropy loss function measures the difference between the model output and the true label. The computation comprises determining the binary cross-entropy loss of the current training data based on its actual similarity and predicted similarity, accumulating all the binary cross-entropy losses over the training dataset, and averaging them to obtain the final loss. The specific calculation process is as follows:
First, the output of the model is compressed to [0, 1] through the Sigmoid activation function, representing the probability of being predicted as the positive class. The Sigmoid function is shown in formula (5):

σ(z) = 1 / (1 + e^(−z)) (5)

Then, the prediction probability is compared with the true label, and the binary cross-entropy loss is calculated, as shown in formula (6):

BCE = −[yi·log(σ(zi)) + (1 − yi)·log(1 − σ(zi))] (6)

Finally, the losses of all samples are averaged to obtain the final loss value, as shown in formula (7):

L = −(1/N)·Σi [yi·log(σ(zi)) + (1 − yi)·log(1 − σ(zi))] (7)

where N is the number of image pairs, yi is the true label of the i-th image pair, i.e., the actual similarity (0 or 1), and zi is the predicted output of the model.
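Formulas (5)-(7) can be computed directly in PyTorch as sketched below; in practice, torch.nn.BCEWithLogitsLoss computes the same quantity with better numerical stability:

```python
import torch

def siamese_bce_loss(z: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Formulas (5)-(7): sigmoid, per-pair BCE, then the mean over N pairs.
    z: raw model outputs (logits); y: labels in {0, 1}."""
    p = torch.sigmoid(z)                                    # formula (5)
    bce = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))  # formula (6)
    return bce.mean()                                       # formula (7)
```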
In this scheme, the asset false matching detection network is updated in reverse through the loss function, and the parameters of the feature extraction layer and the fully connected layer are adjusted to improve detection accuracy. Specifically, the model output is compressed to [0, 1] by the Sigmoid activation function, representing the probability of being predicted as the positive class. The output similarity score then enters a cross-entropy computation with the labels of the input image pair (y = 1 for similar, y = 0 for dissimilar). The cross-entropy loss function measures the difference between the model output and the true label, and back-propagation optimizes the network parameters to gradually reduce the error, improving the predictive capability of the model. The computation consists of comparing the prediction probability with the true label, calculating the binary cross-entropy loss, and averaging the losses of all samples to obtain the final loss value. By using the cross-entropy loss, the model parameters are learned adaptively so that the model predicts the similarity of asset image pairs more accurately, the training process is stabilized, and an excessive influence of any single sample on the overall training is avoided. Through this optimization process, the method significantly improves the effect of asset false matching detection, with good interpretability and ease of implementation.
In other embodiments, the binary cross entropy loss of the current training data may also be determined based on the actual similarity and the predicted similarity of the current training data in the current training data set, and the corresponding weights, all the binary cross entropy losses in the training data set are accumulated and averaged to obtain an average loss, a regularization term Lreg of the asset false match detection model is calculated based on the network parameters and regularization parameters λ of the asset false match detection model, and a final loss is calculated based on the average loss and regularization term of the binary cross entropy losses.
First, a binary cross entropy loss is calculated:
BCE = −[α·yi·log(σ(zi)) + β·(1 − yi)·log(1 − σ(zi))]

where α and β are the weight parameters of the positive and negative samples, respectively, used to adjust the influence of the positive and negative samples on the overall loss so as to cope with class imbalance. Setting α + β = 1 keeps the scale of the total loss consistent, ensuring the stability of training and the interpretability of the loss value. Then, the regularization term is calculated:

Lreg = λ·Σj θj²

where λ is the regularization parameter and θj are the model parameters. Finally, the total loss function combining the binary cross-entropy loss and the regularization term is:

L = (1/N)·Σi BCEi + Lreg

where N is the number of samples, yi is the true label (0 or 1) of the i-th image pair, zi is the predicted output of the model, and σ(zi) is the output of the Sigmoid function.
By introducing the weight parameters α and β, the class imbalance problem can be adjusted effectively, so the model performs better on class-imbalanced datasets. Adding the regularization term weighted by λ constrains the model from overfitting the training data and strengthens its generalization capability. In summary, the improved loss function is more stable when processing complex data, reducing the excessive impact of a single sample on the overall training process.
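A sketch of this weighted loss with the L2 regularization term follows; the default values of α, β, and λ are illustrative assumptions, not values from the patent:

```python
import torch

def weighted_bce_with_l2(z, y, params, alpha=0.5, beta=0.5, lam=1e-4):
    """Class-weighted BCE (with alpha + beta = 1) plus the L2
    regularization term Lreg = lam * sum_j theta_j**2."""
    p = torch.sigmoid(z)
    bce = -(alpha * y * torch.log(p) + beta * (1 - y) * torch.log(1 - p))
    l_reg = lam * sum((theta ** 2).sum() for theta in params)
    return bce.mean() + l_reg

# Usage: loss = weighted_bce_with_l2(logits, labels, model.parameters())
```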
In some embodiments, to prevent gradient explosion, the gradient norm is clipped to a preset threshold, the model parameters θj are updated based on the clipped gradients, the regularization term is then calculated from the model parameters, and the total loss is computed from the binary cross-entropy loss and the regularization term.
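A minimal training-step sketch using PyTorch's built-in gradient-norm clipping is shown below; the threshold value is an assumption:

```python
import torch

def training_step(model, optimizer, loss, max_norm: float = 1.0):
    """One parameter update with gradient clipping to prevent explosion."""
    optimizer.zero_grad()
    loss.backward()
    # Clip the global gradient norm to the preset threshold before the
    # parameters theta_j are updated.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
    optimizer.step()
```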
The images are respectively input into a pre-trained convolution attention network to train a feature extractor, the network adopts a weight sharing mechanism, then similarity measurement is carried out on the extracted features through a measurement function, and finally an asset image similarity score is output through a Sigmoid function.
And step S406, evaluating the model.
The performance of the asset false match detection model is comprehensively analyzed by a series of evaluation methods. The specific method of model evaluation will be described in detail below and will not be described in detail here.
Example 3
The embodiment of the application provides an asset matching detection method, as shown in fig. 11, which comprises the following steps:
Step S1102, initializing a similarity calculation model and loading model weights;
Step S1104, judging whether the input is pictures or videos. If the input is pictures, picture normalization is performed, similarity prediction is carried out on the two images using the pre-trained asset false matching detection model, and a similarity score is output.
Step S1106, if the input is videos, video normalization is performed, frame pairs are selected from the videos, similarity prediction is carried out on the two images using the pre-trained asset false matching detection model, and a similarity score is output. Frame pairs are selected repeatedly, the frame with the highest similarity is recorded, and the highest similarity score is output. Finally, the matching condition of the assets is evaluated according to the similarity scores output by the model.
The training method of the asset false match detection model is as described above and will not be described in detail here.
Example 4
The embodiment of the application also provides a verification method of the asset false matching detection model, as shown in fig. 12, comprising the following steps:
Step S1202, a verification environment is set.
All experiments were completed on the same computer platform, using the Python language, with development based on the PyTorch deep learning framework and the PyCharm integrated development environment. The specific configuration of the experimental computer is shown in Table 2.
Table 2 experimental environment configuration
In step S1204, data is collected.
The dataset used in the experiments is a custom dataset collected and produced from major video websites for asset image similarity matching, expanded through various data enhancement techniques. The selected asset videos need to cover the various details and features of each asset, capturing the full view of the asset from different perspectives, which helps to build a more comprehensive and accurate asset matching model. The asset videos are then converted into frame images, with each category representing one asset, for a total of 195 assets and 9636 asset pictures. A dataset sample is shown in fig. 13.
Due to the time-varying and visual angle-varying nature of the asset photographing process, the present application employs an automatic enhancement strategy RandAugment that achieves data enhancement by simplified parameter adjustment, consisting essentially of two parameters, the number of operations (N) and the magnitude of the transformation (M). It randomly selects a predefined transformation operation and uniformly applies the transformation magnitudes for combinatorial enhancement. Enhancement techniques including random occlusion, rotation, scaling, flipping, brightness, contrast adjustment, etc., and the simulation results for the different enhancement techniques are described in Table 3.
Table 3 description of simulation effect of enhanced technique
Shooting of the same asset in different periods and different visual angles is simulated through a data enhancement technology, so that complex images can be matched during model training and testing, and overfitting is avoided. An example of data enhancement is shown in fig. 14.
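With torchvision, the RandAugment strategy described above reduces to exactly the two parameters N (num_ops) and M (magnitude); in the sketch below the resize to 105×105 follows the setup in this example, while the specific N and M values are assumptions, since the experiments do not state them:

```python
from torchvision import transforms

# RandAugment exposes the two knobs described above:
# num_ops = N (number of operations), magnitude = M (transform magnitude).
train_transform = transforms.Compose([
    transforms.Resize((105, 105)),                   # image size used here
    transforms.RandAugment(num_ops=2, magnitude=9),  # N, M values assumed
    transforms.ToTensor(),
])
```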
Step S1206, an evaluation criterion is set.
First, in the self-constructed asset dataset, all original images are uniformly resized to 105×105 JPG pictures, and the images are then divided into a training set, a validation set, and a test set at a ratio of 7:2:1; the weight parameters of the model are trained on the training set, the hyperparameters are tuned on the validation set, and final testing is done on the test set. The similarity detection performance on asset images is evaluated with quantitative analysis, using the accuracy, precision, recall, and F1 measures as evaluation indices. The pictures in the test set were organized into similar and dissimilar picture pairs, 2310 image pairs in total. Taking successful matching of the same asset image from the test set as an example, a matching target asset image is input, and a similarity greater than 0.85 is judged to indicate the same asset; otherwise, the images are judged not to be the same asset. As shown in Table 4, TP (True Positive) is the number of samples whose real label is positive and which the model predicts as positive, FP (False Positive) is the number whose real label is negative but which the model predicts as positive, TN (True Negative) is the number whose real label is negative and which the model predicts as negative, and FN (False Negative) is the number whose real label is positive but which the model predicts as negative.
TABLE 4 two classification confusion matrix
The evaluation index calculation formulas are as follows:
1) Accuracy, as shown in formula (8):

Accuracy = (TP + TN) / (TP + FP + TN + FN) (8)

2) Precision, as shown in formula (9):

Precision = TP / (TP + FP) (9)

3) Recall, as shown in formula (10):

Recall = TP / (TP + FN) (10)

4) F1 measure, as shown in formula (11):

F1 = 2 × Precision × Recall / (Precision + Recall) (11)
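These four indices follow directly from the confusion matrix in Table 4, as in the sketch below:

```python
def evaluate(tp: int, fp: int, tn: int, fn: int):
    """Formulas (8)-(11), computed from the confusion matrix of Table 4."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)          # formula (8)
    precision = tp / (tp + fp)                          # formula (9)
    recall = tp / (tp + fn)                             # formula (10)
    f1 = 2 * precision * recall / (precision + recall)  # formula (11)
    return accuracy, precision, recall, f1
```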
Step S1208, training and analysis are performed.
Model training parameters are shown in table 5:
table 5 model training parameters
During training, the batch size is set to 32 and the number of iterations to 200; the optimizer is SGD with cosine learning-rate decay, an initial learning rate of 1e-2, and a minimum learning rate of 1e-4. The loss and accuracy curves during training are shown in fig. 15. As can be seen from fig. 15, the loss flattens after about the 150th epoch, and the model is effectively trained within 200 epochs.
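A sketch of this training configuration in PyTorch follows, assuming a model and train_loader (batch size 32) have already been constructed; the momentum value and the train_one_epoch helper are assumptions not stated in the document:

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=200, eta_min=1e-4)  # cosine decay from 1e-2 to 1e-4

for epoch in range(200):                 # 200 training epochs
    train_one_epoch(model, train_loader, optimizer)  # hypothetical helper
    scheduler.step()
```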
To remotely manage overseas diversified assets, an asset dataset was built and an asset false matching detection method based on a twin neural network and a convolutional attention module is proposed, realizing more accurate and efficient matching audits. Experimental results show that the method remarkably improves the accuracy and efficiency of asset similarity calculation. It achieves notable results not only in asset identification and matching but also excellent efficiency in intelligent asset inventory, providing reliable support for asset management.
In order to ensure the scientificity and rationality of the experiment, the application introduces an ablation experiment in the experimental process. In the case of control variables, the three aspects of the backbone network, the improvement method and the data set processing are respectively adjusted, and the influence of each part on the overall system performance is evaluated. The experimental results are shown in table 6:
table 6 comparison of ablation experimental results
As can be seen from the data in Table 6, the model combining the CBAM module and data enhancement techniques shows higher accuracy and F1 values while maintaining high precision and recall. This suggests that the CBAM module and data enhancement both contribute to the performance improvement. Moreover, the residual connections introduced by ResNet help alleviate the vanishing gradient problem, making the network easier to train and able to learn more complex feature representations. In an asset matching task, this complex feature representation capability better distinguishes similar from dissimilar assets.
Step S1210, similarity matching result comparison.
Conventional methods for similarity matching of images generally extract color features of the images using histograms and texture features of the images using gray level co-occurrence matrices, and calculate the similarity of the images from the extracted feature vectors. In order to further verify the effectiveness of the network, the method of the application is compared with the traditional method. The comparison tests are carried out under the same test set, and the performance comparison of different similarity calculation methods in the task of detecting the similarity of the asset images is shown in table 7:
Table 7 comparison of performance of different methods in an asset image similarity detection task
Among the traditional methods, the histogram method reaches 52.64% accuracy with a recall of only 7.76%, showing the most severe missed detections and the weakest asset matching capability; the gray level co-occurrence matrix method reaches 51.08% accuracy with 93.25% recall, and although its matching capability is strong, it is prone to misjudgment, classifying many dissimilar assets as similar. Neither method therefore meets practical application scenarios. In contrast, the accuracy of the proposed method reaches 92.42%, improvements of 39.78% and 41.34% over the histogram and gray level co-occurrence matrix methods respectively, and its F1 measure is 92.48%, improvements of 78.08% and 26.31% over the two traditional methods. These results indicate that the deep-learning-based asset similarity matching method significantly outperforms the conventional methods.
Example 5
The application also provides a training device for the deep-learning-based asset false matching detection model. As shown in fig. 16, the device comprises a construction module 192, a feature extraction module 194, a similarity comparison module, and a training module 196. The construction module 192 is configured to fuse a convolutional neural network combined with an attention mechanism with a twin network to construct the asset false matching detection model. The feature extraction module 194 is configured to input training data into the convolutional neural network and perform feature extraction on the two sample pictures forming a positive or negative sample in the training data, obtaining two feature vectors. The similarity comparison module is configured to compare the feature difference between the two feature vectors and predict the similarity of the two sample pictures based on that difference. The training module 196 is configured to compare the predicted similarity with the actual similarity to obtain the loss between them, and to update the network parameters of the convolutional neural network and the twin network based on the loss, yielding the trained asset false matching detection model.
It should be noted that, in the training device provided in the above embodiment, only the division of the above functional modules is used as an example, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to perform all or part of the functions described above. In addition, the training device and the training method provided in the foregoing embodiments belong to the same concept, and detailed implementation processes of the training device and the training method are shown in the method embodiments, which are not described herein.
Example 6
Fig. 17 shows a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure. It should be noted that the electronic device shown in fig. 17 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present disclosure.
As shown in fig. 17, the electronic apparatus includes a Central Processing Unit (CPU) 1001 that can execute various appropriate actions and processes in accordance with a program stored in a Read-Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. The RAM 1003 also stores various programs and data required for system operation. The CPU 1001, the ROM 1002, and the RAM 1003 are connected to one another by a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card or a modem. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as needed, so that a computer program read from it is installed into the storage section 1008 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 1009, and/or installed from the removable medium 1011. When executed by the Central Processing Unit (CPU) 1001, the computer program performs the various functions defined in the method and apparatus of the present application. In some embodiments, the electronic device may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
It should be noted that the computer-readable medium shown in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by software or by hardware, and the described units may also be provided in a processor. In some cases, the names of the units do not constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium that may be included in the electronic device described in the above embodiment, or may exist alone without being incorporated into the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments. For example, the electronic device may implement the steps of the method embodiments described above.
As another aspect, the application also provides an electronic device comprising at least one processor and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor; when executed by the at least one processor, the computer program causes the electronic device to perform the method of the embodiments of the application.
As another aspect, the present application also provides a computer program product comprising a computer program, wherein the computer program, when executed by a processor of a computer, causes the computer to perform the method of the embodiments of the present application.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present application, each embodiment is described with its own emphasis; for any part not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described apparatus embodiments are merely exemplary. For example, the division of the units may be merely a logical function division, and there may be other divisions in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings or direct couplings or communication connections shown or discussed may be implemented through some interfaces, units, or modules, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations shall also fall within the scope of protection of the present application.

Claims (10)

1. An asset matching detection method, comprising:
initializing an asset false matching detection model, and loading the weights of the asset false matching detection model;
under the condition that asset data input into the asset false matching detection model are two pictures, respectively performing normalization processing on the two pictures, performing similarity prediction on the two pictures using the asset false matching detection model, and outputting a predicted similarity score;
under the condition that the asset data input into the asset false matching detection model are two videos, respectively performing normalization processing on the two videos, selecting a frame pair from the two normalized videos to obtain two pictures corresponding to the frame pair, performing similarity prediction on the two pictures using the asset false matching detection model, and outputting a predicted similarity score; and
determining, based on the similarity score, whether the asset data match.
2. The method of claim 1, wherein the asset false matching detection model is obtained by constructing a convolutional neural network using a twin neural network as a basic framework and introducing a mixed attention mechanism.
3. The method of claim 2, wherein performing similarity prediction on the two pictures using the asset false matching detection model and outputting a predicted similarity score comprises:
extracting features of the two pictures using the convolutional neural network; and
performing similarity measurement on the extracted features using a metric function of the twin neural network, and outputting the predicted similarity score.
4. The method of claim 3, wherein performing similarity measurement on the extracted features using the metric function of the twin neural network comprises:
calculating, using the metric function, the distance between the two feature vectors respectively corresponding to the extracted features; and
performing the similarity measurement based on the distance.
5. The method of claim 3, wherein extracting features of the two pictures using the convolutional neural network comprises:
scanning the two pictures respectively using a plurality of filters to generate a feature map, and reducing the size of the feature map through downsampling; and
introducing nonlinear features into the reduced-size feature map using an activation function, and taking the feature map with the nonlinear features introduced as the extracted features.
6. The method of claim 5, wherein after generating the feature map, the method further comprises:
capturing the importance of each channel through global average pooling and global maximum pooling by utilizing a channel attention module, and generating channel weights through a multi-layer perceptron;
pooling the feature map in the channel dimension using a spatial attention module, and generating spatial weights through a convolution operation; and
dynamically adjusting the weights of the feature map based on the channel weights and the spatial weights.
7. An asset matching detection device, comprising:
an initialization module configured to initialize an asset false matching detection model and load the weights of the asset false matching detection model;
a detection module configured to: under the condition that asset data input into the asset false matching detection model are two pictures, respectively perform normalization processing on the two pictures, perform similarity prediction on the two pictures using the asset false matching detection model, and output a predicted similarity score; and under the condition that the asset data input into the asset false matching detection model are two videos, respectively perform normalization processing on the two videos, select a frame pair from the two normalized videos to obtain two pictures corresponding to the frame pair, perform similarity prediction on the two pictures using the asset false matching detection model, and output a predicted similarity score; and
a determination module configured to determine, based on the similarity score, whether the asset data match.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein, when the program runs, a device in which the computer-readable storage medium is located is controlled to perform the method according to any one of claims 1 to 6.
9. A computer device, characterized by comprising a memory and a processor, wherein:
the memory stores a computer program; and
the processor is configured to execute the computer program stored in the memory, and the computer program, when run, causes the processor to perform the method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN202411391864.0A 2024-10-07 2024-10-07 Asset matching detection method and device Pending CN119360060A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411391864.0A CN119360060A (en) 2024-10-07 2024-10-07 Asset matching detection method and device


Publications (1)

Publication Number Publication Date
CN119360060A true CN119360060A (en) 2025-01-24

Family

ID=94305297


Country Status (1)

Country Link
CN (1) CN119360060A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination