
CN110175615A - Model training method, domain-adaptive visual place recognition method and device - Google Patents

Model training method, domain-adaptive visual place recognition method and device Download PDF

Info

Publication number
CN110175615A
CN110175615A
Authority
CN
China
Prior art keywords
image
feature extraction
layer
training
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910350741.5A
Other languages
Chinese (zh)
Other versions
CN110175615B (en)
Inventor
桑农
刘耀华
高常鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201910350741.5A priority Critical patent/CN110175615B/en
Publication of CN110175615A publication Critical patent/CN110175615A/en
Application granted granted Critical
Publication of CN110175615B publication Critical patent/CN110175615B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F 18/24133 Distances to prototypes
    • G06F 18/24137 Distances to cluster centroïds
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a model training method, a domain-adaptive visual place recognition method, and a device, belonging to the technical field of computer vision. The method comprises: establishing an image feature extraction model based on a deep neural network; constructing a training set from a standard data set, where each training sample comprises a target image together with its positive sample and s negative samples; and training the image feature extraction model with the training set. In the image feature extraction model, the feature extraction network comprises a plurality of cascaded first networks; each first network is formed by connecting one or more second networks and one max-pooling layer in sequence, the max-pooling layer being used for feature selection; each second network comprises, connected in sequence, a convolutional layer for feature extraction, a batch normalization layer for zero-mean normalization, and an activation function layer for activation; a local feature aggregation network aggregates the local features to obtain the feature vector of the image. The invention can improve the robustness of visual place recognition.

Description

Model training method, domain-adaptive visual place recognition method and device

Technical Field

The invention belongs to the technical field of computer vision, and more specifically relates to a model training method, a domain-adaptive visual place recognition method, and a device.

Background Art

Visual place recognition refers to extracting features from an image and then identifying the geographic location of the image from the extracted features. With the rapid development of autonomous driving, the growing demand for autonomously navigating mobile robots, and the increasing prevalence of virtual and augmented reality, research on visual place recognition has attracted wide attention in the computer vision field, the robotics community, and other related fields.

In the early period of computer vision research, image features were mainly extracted with carefully hand-crafted feature point detectors, such as scale-invariant feature transform (SIFT) feature points. The design of such features depends heavily on experience; some experts and scholars spent decades designing a single good feature. Moreover, these hand-crafted feature point extraction algorithms perform very poorly under drastic illumination changes (such as day to night) and scene changes (such as changes in the pedestrians and vehicles present in the scene), and the performance of visual place recognition methods that rely on them, such as the visual bag-of-words model (V-BOW), drops sharply as well. In recent years, with the rise of deep learning and its wide application in object recognition, object detection, object tracking, semantic segmentation, and other fields, several visual place recognition methods based on deep learning have been proposed. One example is convolutional neural network-based place recognition, which extracts image features with a deep convolutional neural network; because such a network can be trained end-to-end for a specific task, the extracted features are more robust. Another example is NetVLAD (NetVLAD: CNN architecture for weakly supervised place recognition), which exploits the strengths of the traditional vector of locally aggregated descriptors (VLAD) method to effectively aggregate the local features of an image into a compact image representation vector, further improving the robustness of the features extracted by the deep neural network.

Compared with traditional visual place recognition methods based on hand-crafted image feature points, methods based on deep neural networks extract more robust image features and recognize places more accurately. However, a deep neural network must be trained before use, and because of factors such as viewpoint and illumination, the feature distribution of the training images often differs considerably from that of the images actually to be recognized; in that case, the accuracy of visual place recognition cannot be guaranteed. In general, the robustness of existing visual place recognition methods is low.

Summary of the Invention

In view of the defects and improvement needs of the prior art, the present invention provides a model training method, a domain-adaptive visual place recognition method, and a device, with the purpose of improving the robustness of visual place recognition.

To achieve the above object, according to a first aspect of the present invention, an image feature extraction model training method is provided, comprising:

(1) establishing an image feature extraction model based on a deep neural network, used to obtain the feature vector of an image;

the image feature extraction model comprises a feature extraction network and a local feature aggregation network;

the feature extraction network comprises a plurality of cascaded first networks; each first network is formed by connecting one or more second networks and one max-pooling layer in sequence, the max-pooling layer being used to perform feature selection on the image output by the second network before it; each second network comprises, connected in sequence, a convolutional layer, a batch normalization layer, and an activation function layer, where the convolutional layer performs feature extraction on the image, the batch normalization layer performs zero-mean normalization on the image output by the convolutional layer, and the activation function layer performs activation on the image output by the batch normalization layer;
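For illustration, the PyTorch sketch below shows one way such a feature extraction network could be assembled; the block counts and channel widths are assumptions for the example, not values fixed by the patent.

```python
import torch.nn as nn

class ConvBNReLU(nn.Sequential):
    """The 'second network': convolution -> batch normalization -> activation."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__(
            nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2),
            nn.BatchNorm2d(out_ch),   # zero-mean normalization of the conv output
            nn.ReLU(inplace=True),    # activation
        )

def make_first_network(in_ch, out_ch, num_blocks):
    """A 'first network': one or more second networks followed by max pooling."""
    layers = [ConvBNReLU(in_ch if i == 0 else out_ch, out_ch) for i in range(num_blocks)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # feature selection
    return nn.Sequential(*layers)

# Cascade of first networks (channel widths are illustrative assumptions)
feature_extractor = nn.Sequential(
    make_first_network(3, 64, 2),
    make_first_network(64, 128, 2),
    make_first_network(128, 256, 3),
)
```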

the local feature aggregation network is used to aggregate all the local features in the image output by the feature extraction network, thereby obtaining the feature vector of the image;

(2) obtaining, in a standard data set, a positive sample and s negative samples of each target image, so that one target image together with its positive sample and negative samples constitutes one training sample, thereby obtaining a training set composed of all the training samples;

the positive sample of a target image is, among its neighboring images, the image with the smallest feature distance to it; the position distance d between the target image and a neighboring image satisfies $T_{NL} \le d < T_{NH}$; the position distance between the target image and each of its negative samples satisfies $d \ge T_F$;

(3) training the image feature extraction model with the training set, thereby obtaining the model parameters;

wherein the position information of each image in the standard data set is known, and the target images are a plurality of images pre-selected from the standard data set; the feature distance between images is the distance between their feature vectors; $T_{NL}$, $T_{NH}$, and $T_F$ are preset thresholds with $0 < T_{NL} < T_{NH}$ and $T_{NH} \le T_F$; and $s \ge 1$.
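As a sketch of how such a training set could be assembled under these threshold rules (the position array, feature array, and target indices are placeholder inputs assumed for the example):

```python
import numpy as np

def build_training_set(positions, features, targets,
                       t_nl=1.0, t_nh=10.0, t_f=25.0, s=4):
    """Build (target, positive, negatives) samples per the threshold rules."""
    train_set = []
    for q in targets:
        pos_d = np.linalg.norm(positions - positions[q], axis=1)   # position distances
        neighbors = np.where((pos_d >= t_nl) & (pos_d < t_nh))[0]  # T_NL <= d < T_NH
        far = np.where(pos_d >= t_f)[0]                            # d >= T_F
        if len(neighbors) == 0 or len(far) < s:
            continue
        feat_d = np.linalg.norm(features[neighbors] - features[q], axis=1)
        positive = neighbors[np.argmin(feat_d)]   # nearest neighbor in feature space
        negatives = np.random.choice(far, size=s, replace=False)
        train_set.append((q, positive, negatives))
    return train_set
```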

In the image feature extraction model established by the above training method, every convolutional layer used for feature extraction is followed by a batch normalization layer that performs zero-mean normalization on the image the convolutional layer outputs. This accelerates model training while giving the features extracted by the model similar distributions, which effectively prevents the poor training results caused by large differences in the feature distributions of the training images, and thereby alleviates the low robustness of visual place recognition under large differences in image feature distribution.

Further, the local feature aggregation network comprises: a dimensionality-reduction convolutional layer, a soft-max layer, an aggregation layer, an intra-normalization layer, and an overall normalization layer;

the dimensionality-reduction convolutional layer is a single convolutional layer used to reduce the channel dimension of the image to be aggregated to the number of preset cluster centers, so that each channel of the image to be aggregated represents the weights of the differences between the local features and one cluster center;

the soft-max layer is used to normalize the weights of the differences between the local features and each cluster center;

the aggregation layer is used to aggregate the local features, the cluster centers, and the normalized weights into a VLAD (vector of locally aggregated descriptors) vector; the VLAD vector consists of N D-dimensional vectors, where N is the number of cluster centers and D is the dimension of a cluster center;

the intra-normalization layer is used to normalize each of the N D-dimensional vectors in the VLAD vector, so that the D-dimensional vectors are distributed on the same order of magnitude;

the overall normalization layer is used to concatenate the D-dimensional vectors processed by the intra-normalization layer into one column vector and then normalize that column vector, so that every local feature of the image to be aggregated is distributed on the same order of magnitude; this improves the convergence speed and the accuracy of the neural network model;

wherein the image to be aggregated is the image output by the feature extraction network.

Further, s > 1; selecting multiple negative samples improves the training accuracy of the model, so that visual place recognition using the image feature vectors obtained by the image feature extraction model is more robust.

Further, in step (3), when the image feature extraction model is trained with the training set, the loss function used is:

$$L = \sum_{k=1}^{n} \max\left( d(q_k, p_k) - \min_{1 \le i \le s} d(q_k, n_{ki}) + m,\ 0 \right)$$

where n is the total number of training samples, k indexes the training samples, i indexes the negative samples, $q_k$, $p_k$, and $n_{ki}$ denote the target image, the positive sample, and the i-th negative sample of the k-th training sample respectively, $d(q_k, p_k)$ denotes the feature distance between the target image $q_k$ and its positive sample $p_k$, $d(q_k, n_{ki})$ denotes the feature distance between the target image $q_k$ and its negative sample $n_{ki}$, m is a predefined hyper-parameter (the margin), max denotes taking the maximum, and min denotes taking the minimum;

The above loss function is based on the idea of triplet loss: through training, the feature distance between the target image and the positive sample is minimized while the feature distance to the negative samples is maximized. The term $\min_{1 \le i \le s} d(q_k, n_{ki})$ selects the negative sample with the largest loss; based on the idea of hard-example mining, this makes the model pay more attention during training to negative samples that are difficult to distinguish, which in turn avoids, when the image feature extraction model is used for visual place recognition, interference from negative samples that are similar to the image to be recognized.
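A minimal PyTorch sketch of this hinge loss with hardest-negative selection (the batch shapes are assumptions for the example):

```python
import torch

def triplet_hard_loss(q, p, negs, m=0.1):
    """q, p: (B, D) feature vectors; negs: (B, s, D); m: margin.

    Implements L = sum_k max(d(q_k, p_k) - min_i d(q_k, n_ki) + m, 0).
    """
    d_pos = torch.norm(q - p, dim=1)                   # d(q_k, p_k)
    d_neg = torch.norm(q.unsqueeze(1) - negs, dim=2)   # d(q_k, n_ki), shape (B, s)
    hardest = d_neg.min(dim=1).values                  # hardest negative per sample
    return torch.clamp(d_pos - hardest + m, min=0).sum()
```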

According to a second aspect of the present invention, there is further provided a domain-adaptive visual place recognition method based on the image feature extraction model training method provided in the first aspect of the present invention, comprising:

determining the target domain to which the image to be recognized belongs, obtaining multiple images at different positions in the target domain, and taking both the obtained images and the image to be recognized as images to be retrieved;

taking the images to be retrieved as input and using the image feature extraction model to obtain the feature vector of each image to be retrieved; when the feature vectors are obtained, for each convolutional layer, the mean and standard deviation of the feature maps produced after all the images to be retrieved pass through that convolutional layer are computed and used as the parameters of the batch normalization layer that follows it; the remaining model parameters of the image feature extraction model are the parameters obtained by training;
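In PyTorch this can be realized by resetting the batch normalization running statistics and re-estimating them with a forward pass over the target-domain images, as in the sketch below (the data loader is a placeholder assumed for the example):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def adapt_bn_statistics(model, target_loader):
    """Re-estimate BN means/variances from target-domain images.

    All other (trained) parameters are left untouched.
    """
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d):
            module.reset_running_stats()   # forget training-set statistics
            module.momentum = None         # use a cumulative moving average
    model.train()                          # BN updates running stats in train mode
    for images in target_loader:
        model(images)                      # forward pass only; no gradient step
    model.eval()
```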

using the image feature extraction model to obtain the feature vector of each image in a test data set;

obtaining, from the feature vectors, the image in the test data set whose feature distance to the image to be recognized is the smallest, and determining the position information of that image as the position information of the image to be recognized, thereby completing visual place recognition of the image to be recognized;
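This final step is a nearest-neighbor search in feature space; a NumPy sketch (variable names are assumed):

```python
import numpy as np

def recognize_place(query_vec, gallery_vecs, gallery_positions):
    """Return the position of the gallery image nearest in feature space."""
    dists = np.linalg.norm(gallery_vecs - query_vec, axis=1)
    return gallery_positions[np.argmin(dists)]
```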

wherein the position information of each image in the test data set is known, and a domain is a set of factors that influence the distribution of image features;

In practical applications, domains are delineated according to how factors such as illumination, viewpoint, and season influence the distribution of image features; within the same domain, the feature distributions of images are similar. For example, if only illumination strongly affects the feature distribution, images taken during the day share a similar feature distribution, and images taken at night share another, then two domains can be obtained by partitioning according to lighting conditions;

In the above domain-adaptive visual place recognition method, when the image feature extraction model is used to obtain the feature vector of the image to be recognized, the parameters of the batch normalization layers in the model do not depend on the training set; instead, the corresponding parameters are obtained from multiple images belonging to the same domain as the image to be recognized. Since images in the same domain have similar feature distributions, the invention achieves domain adaptation: even when the feature distribution of the training images differs greatly from that of the image to be recognized, visual place recognition can still be completed accurately. In other words, the invention improves the robustness of visual place recognition.

Further, when the image feature extraction model is used to obtain the feature vectors of the images in the test data set, the model parameters are set in either of the following ways:

using the model parameters obtained by training to set all the model parameters;

or, for each convolutional layer, computing the mean and standard deviation of the feature maps produced after all the images in the test data set pass through that convolutional layer and using them as the parameters of the batch normalization layer that follows it, while the remaining model parameters of the image feature extraction model are the parameters obtained by training.

According to a third aspect of the present invention, an image feature extraction model training device is provided, comprising: a model building module, a training set construction module, and a model training module;

the model building module is used to build an image feature extraction model based on a deep neural network, the image feature extraction model being used to obtain the feature vector of an image;

the training set construction module is used to obtain, in a standard data set, a positive sample and s negative samples of each target image, so that one target image together with its positive sample and negative samples constitutes one training sample, thereby obtaining a training set composed of all the training samples;

the model training module is used to train the image feature extraction model with the training set, thereby obtaining the model parameters;

wherein the image feature extraction model comprises a feature extraction network and a local feature aggregation network;

the feature extraction network comprises a plurality of cascaded first networks; each first network is formed by connecting one or more second networks and one max-pooling layer in sequence, the max-pooling layer being used to perform feature selection on the image output by the second network before it; each second network comprises, connected in sequence, a convolutional layer, a batch normalization layer, and an activation function layer, where the convolutional layer performs feature extraction on the image, the batch normalization layer performs zero-mean normalization on the image output by the convolutional layer, and the activation function layer performs activation on the image output by the batch normalization layer;

the local feature aggregation network is used to aggregate all the local features in the image output by the feature extraction network, thereby obtaining the feature vector of the image;

the positive sample of a target image is, among its neighboring images, the image with the smallest feature distance to it; the position distance d between the target image and a neighboring image satisfies $T_{NL} \le d < T_{NH}$; the position distance d between the target image and each of its negative samples satisfies $d \ge T_F$;

the position information of each image in the standard data set is known, and the target images are a plurality of images pre-selected from the standard data set; the feature distance between images is the distance between their feature vectors; $T_{NL}$, $T_{NH}$, and $T_F$ are preset thresholds with $0 < T_{NL} < T_{NH}$ and $T_{NH} \le T_F$; and $s \ge 1$.

According to a fourth aspect of the present invention, there is further provided a domain-adaptive visual place recognition device based on the image feature extraction model training method provided in the first aspect of the present invention, comprising: a retrieval set acquisition module, a first feature extraction module, a second feature extraction module, and a recognition module;

the retrieval set acquisition module is used to determine the target domain to which the image to be recognized belongs, obtain multiple images at different positions in the target domain, and take both the obtained images and the image to be recognized as images to be retrieved;

the first feature extraction module is used to take the images to be retrieved as input and obtain the feature vector of each image to be retrieved with the image feature extraction model; when the feature vectors are obtained, for each convolutional layer, the mean and standard deviation of the feature maps produced after all the images to be retrieved pass through that convolutional layer are computed and used as the parameters of the batch normalization layer that follows it; the remaining model parameters of the image feature extraction model are the parameters obtained by training;

the second feature extraction module is used to obtain the feature vector of each image in the test data set with the image feature extraction model;

the recognition module is used to obtain, from the feature vectors extracted by the first feature extraction module and the second feature extraction module, the image in the test data set whose feature distance to the image to be recognized is the smallest, and to determine the position information of that image as the position information of the image to be recognized, thereby completing visual place recognition of the image to be recognized;

wherein the position information of each image in the test data set is known, and a domain is a set of factors that influence the distribution of image features.

In general, the above technical solutions conceived by the present invention can achieve the following beneficial effects:

(1) In the image feature extraction model established by the training method provided by the invention, every convolutional layer used for feature extraction is followed by a batch normalization layer that performs zero-mean normalization on the image the convolutional layer outputs. This accelerates model training while giving the extracted features similar distributions, which effectively prevents the poor training results caused by large differences in the feature distributions of the training images and thereby alleviates the low robustness of visual place recognition under such differences.

(2) In a preferred scheme of the image feature extraction model training method provided by the invention, the training samples are constructed with multiple selected negative samples, which improves the training accuracy of the model, so that visual place recognition using the image feature vectors obtained by the model is more robust.

(3) In a preferred scheme of the image feature extraction model training method provided by the invention, the loss function is constructed based on the ideas of both triplet loss and hard-example mining, so that the model pays more attention during training to negative samples that are difficult to distinguish, which in turn avoids interference from negative samples similar to the image to be recognized when the model is used for visual place recognition.

(4) In the domain-adaptive visual place recognition method provided by the invention, when the image feature extraction model is used to obtain the feature vector of the image to be recognized, the parameters of the batch normalization layers in the model do not depend on the training set but are obtained from multiple images belonging to the same domain as the image to be recognized, thereby achieving domain adaptation and improving the robustness of visual place recognition.

Brief Description of the Drawings

Fig. 1 is a schematic diagram of the image feature extraction model provided by an embodiment of the present invention;

Fig. 2 is a flowchart of the domain-adaptive visual place recognition method provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of the domain-adaptive visual place recognition method provided by an embodiment of the present invention.

Detailed Description of the Embodiments

In order to make the object, technical solution, and advantages of the present invention clearer, the present invention is further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with one another as long as they do not conflict.

The image feature extraction model training method provided by the present invention comprises:

(1) establishing an image feature extraction model based on a deep neural network, used to obtain the feature vector of an image;

as shown in Fig. 1, the image feature extraction model comprises a feature extraction network and a local feature aggregation network;

the feature extraction network comprises a plurality of cascaded first networks; each first network is formed by connecting one or more second networks and one max-pooling layer (Pool) in sequence, the max-pooling layer being used to perform feature selection on the image output by the second network before it; each second network comprises, connected in sequence, a convolutional layer (Conv), a batch normalization layer (BN), and an activation function layer (ReLU), where the convolutional layer performs feature extraction on the image, the batch normalization layer performs zero-mean normalization on the image output by the convolutional layer, and the activation function layer performs activation on the image output by the batch normalization layer. In this embodiment, the convolution kernel size of the convolutional layer in each second network is 3x3; the number of second networks contained in each first network may be the same or different;

the local feature aggregation network is used to aggregate all the local features in the image output by the feature extraction network, thereby obtaining the feature vector of the image;

In an optional embodiment, as shown in Fig. 1, the local feature aggregation network comprises: a dimensionality-reduction convolutional layer (Conv), a soft-max layer (Soft-max), an aggregation layer (VLAD), an intra-normalization layer (Intra-normalization), and an overall normalization layer (L2-normalization);

the dimensionality-reduction convolutional layer is a single convolutional layer with a 1x1 kernel, used to reduce the channel dimension of the image to be aggregated to the number of preset cluster centers, so that each channel of the image to be aggregated represents the weights of the differences between the local features and one cluster center; the image to be aggregated is the image output by the feature extraction network;

the soft-max layer is used to normalize the weights of the differences between the local features and each cluster center;

the aggregation layer is used to aggregate the local features, the cluster centers, and the normalized weights into a VLAD vector; the VLAD vector consists of N D-dimensional vectors, where N is the number of cluster centers and D is the dimension of a cluster center;

suppose there are N cluster centers, denoted CluCenter $= [c_1, c_2, \dots, c_j, \dots, c_N]$, where each cluster center has dimension D and $c_j$ ($j \in \{1, 2, \dots, N\}$) denotes the j-th cluster center;

the feature extraction network outputs n local features for each image, denoted Features $= [f_1, f_2, \dots, f_i, \dots, f_n]$, where $f_i$ ($i \in \{1, 2, \dots, n\}$) denotes the i-th local feature;

denoting the weight of the difference between the i-th local feature and the j-th cluster center by $a_{ij}$, the j-th D-dimensional vector of the VLAD vector (i.e., the j-th element of the VLAD vector) is obtained as:

$$\mathrm{VLADvector}_j = \sum_{i=1}^{n} a_{ij}\,(f_i - c_j)$$

the intra-normalization layer is used to normalize each of the D-dimensional vectors in the VLAD vector, so that the D-dimensional vectors are distributed on the same order of magnitude;

the overall normalization layer is used to concatenate the D-dimensional vectors processed by the intra-normalization layer into one column vector and then normalize that column vector, so that every local feature of the image to be aggregated is distributed on the same order of magnitude; this improves the convergence speed and the accuracy of the neural network model;

In this embodiment, both the intra-normalization layer and the overall normalization layer perform normalization with L2-norm normalization;
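Putting these layers together, a compact PyTorch sketch of such a NetVLAD-style local feature aggregation network could look as follows; the random initialization of the cluster centers is an assumption for the example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalFeatureAggregation(nn.Module):
    """1x1 conv -> soft-max -> VLAD aggregation -> intra-norm -> L2-norm."""
    def __init__(self, dim_d, num_clusters_n):
        super().__init__()
        self.conv = nn.Conv2d(dim_d, num_clusters_n, kernel_size=1)  # dimensionality reduction
        self.centers = nn.Parameter(torch.randn(num_clusters_n, dim_d))

    def forward(self, x):                # x: (B, D, H, W) feature map
        b, d, h, w = x.shape
        weights = F.softmax(self.conv(x).view(b, -1, h * w), dim=1)  # a_ij, (B, N, HW)
        feats = x.view(b, d, h * w)                                  # f_i, (B, D, HW)
        # VLADvector_j = sum_i a_ij * (f_i - c_j)
        vlad = torch.einsum('bnk,bdk->bnd', weights, feats) \
             - weights.sum(dim=2).unsqueeze(2) * self.centers.unsqueeze(0)
        vlad = F.normalize(vlad, p=2, dim=2)             # intra-normalization per cluster
        return F.normalize(vlad.flatten(1), p=2, dim=1)  # concatenate and L2-normalize
```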

(2) obtaining, in the standard data set, a positive sample and s negative samples of each target image, so that one target image together with its positive sample and negative samples constitutes one training sample, thereby obtaining a training set composed of all the training samples;

wherein the position information of each image in the standard data set is known, and the target images are a plurality of images pre-selected from the standard data set;

the positive sample of a target image is, among its neighboring images, the image with the smallest feature distance to it; the position distance d between the target image and a neighboring image satisfies $T_{NL} \le d < T_{NH}$; the position distance between the target image and each of its negative samples satisfies $d \ge T_F$; where $T_{NL}$, $T_{NH}$, and $T_F$ are preset thresholds, $0 < T_{NL} < T_{NH}$, $T_{NH} \le T_F$, and $s \ge 1$; the feature distance between images is the distance between their feature vectors;

In this embodiment, the standard data set used for model training is the TokyoTimeMachine Google Street View data set, which contains images collected at many different positions, each position captured from 12 viewing directions, about 47,000 images in total, each carrying geographic coordinate information. In this data set, the target images are 10,000 randomly selected images, i.e., the total number of training samples is n = 10000. In other applications, other data sets may also be selected as the standard data set according to actual requirements;

The thresholds $T_{NL}$, $T_{NH}$, and $T_F$ can be set according to the standard data set adopted and the actual application scenario; in general, $T_{NH} \le 25$ and $T_F \ge 25$. In this embodiment the thresholds are set to $T_{NL} = 1$, $T_{NH} = 10$, and $T_F = 25$. Setting lower and upper bounds on the position distance between a target image and its positive sample via $T_{NL}$ and $T_{NH}$ ensures that the positive sample is similar to but distinct from the target image, which avoids overfitting and thereby guarantees a good training result;

In this embodiment, the number of negative samples in each training sample is set to s = 4; selecting multiple negative samples improves the training accuracy of the model, so that visual place recognition using the image feature vectors obtained by the model is more robust;

The training set trainSet constructed in this embodiment can be expressed as:

$$\mathrm{trainSet} = \{S_k\}_{k=1}^{n}, \qquad S_k = (q_k,\ p_k,\ n_{k1}, n_{k2}, n_{k3}, n_{k4})$$

where, for any k-th training sample $S_k$, $q_k$, $p_k$, and $n_{ki}$ ($i \in \{1, 2, 3, 4\}$) denote the target image, the positive sample, and the i-th negative sample of that training sample, respectively;

(3) training the image feature extraction model with the training set, thereby obtaining the model parameters.
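A schematic training loop tying the pieces together (the model, data iterator, and optimizer settings are assumptions for the example, and triplet_hard_loss refers to the loss sketched earlier):

```python
import torch

def train(model, train_loader, epochs=30, lr=1e-4, m=0.1):
    """Step (3): fit the feature extraction model on the triplet training set."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        # each batch: target images, positive images, and a list of s negative batches
        for q_img, p_img, neg_imgs in train_loader:
            q = model(q_img)                                          # (B, D)
            p = model(p_img)                                          # (B, D)
            negs = torch.stack([model(n) for n in neg_imgs], dim=1)   # (B, s, D)
            loss = triplet_hard_loss(q, p, negs, m=m)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```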

In the image feature extraction model established by the above training method, every convolutional layer used for feature extraction is followed by a batch normalization layer that performs zero-mean normalization on the image the convolutional layer outputs. This accelerates model training while giving the features extracted by the model similar distributions, which effectively prevents the poor training results caused by large differences in the feature distributions of the training images, and thereby alleviates the low robustness of visual place recognition under large differences in image feature distribution.

To further improve the robustness of visual place recognition, in step (3) of the above image feature extraction model training method, the loss function used when training the image feature extraction model with the training set is:

$$L = \sum_{k=1}^{n} \max\left( d(q_k, p_k) - \min_{1 \le i \le s} d(q_k, n_{ki}) + m,\ 0 \right)$$

where $d(q_k, p_k)$ denotes the feature distance between the target image $q_k$ and its positive sample $p_k$, $d(q_k, n_{ki})$ denotes the feature distance between the target image $q_k$ and its negative sample $n_{ki}$, m is a predefined hyper-parameter (the margin), max denotes taking the maximum, and min denotes taking the minimum;

The above loss function is based on the idea of triplet loss: through training, the feature distance between the target image and the positive sample is minimized while the feature distance to the negative samples is maximized. The term $\min_{1 \le i \le s} d(q_k, n_{ki})$ selects the negative sample with the largest loss; based on the idea of hard-example mining, this makes the model pay more attention during training to negative samples that are difficult to distinguish, which in turn avoids, when the image feature extraction model is used for visual place recognition, interference from negative samples that are similar to the image to be recognized.

The present invention also provides a domain-adaptive visual place recognition method based on the above image feature extraction model training method; as shown in Fig. 2, the method comprises:

determining the target domain to which the image to be recognized belongs, obtaining multiple images at different positions in the target domain, and taking both the obtained images and the image to be recognized as images to be retrieved;

taking the images to be retrieved as input and using the image feature extraction model to obtain the feature vector of each image to be retrieved; when the feature vectors are obtained, for each convolutional layer, the mean and standard deviation of the feature maps produced after all the images to be retrieved pass through that convolutional layer are computed and used as the parameters of the batch normalization layer that follows it; the remaining model parameters of the image feature extraction model are the parameters obtained by training;

using the image feature extraction model to obtain the feature vector of each image in a test data set; in this embodiment, the test data set used for visual place recognition is the tokyo247 data set, in which each image carries geographic coordinate information;

obtaining, from the feature vectors, the image in the test data set whose feature distance to the image to be recognized is the smallest, and determining the position information of that image as the position information of the image to be recognized, thereby completing visual place recognition of the image to be recognized;

wherein the position information of each image in the test data set is known, and a domain is a set of factors that influence the distribution of image features;

In practical applications, domains are delineated according to how factors such as illumination, viewpoint, and season influence the distribution of image features; within the same domain, the feature distributions of images are similar. For example, if only illumination strongly affects the feature distribution, images taken during the day share a similar feature distribution, and images taken at night share another, then two domains can be obtained by partitioning according to lighting conditions. Which factors are used to delineate the domains, and how similar the image feature distributions within one domain must be, can be determined according to the actual application requirements, as long as the accuracy of the final visual place recognition meets the requirements;

In the above domain-adaptive visual place recognition method, when the image feature extraction model is used to obtain the feature vector of the image to be recognized, the parameters of the batch normalization layers in the model do not depend on the training set; instead, the corresponding parameters are obtained from multiple images belonging to the same domain as the image to be recognized. Since images in the same domain have similar feature distributions, the invention achieves domain adaptation: even when the feature distribution of the training images differs greatly from that of the image to be recognized, visual place recognition can still be completed accurately. In other words, the invention improves the robustness of visual place recognition.

In the above visual place recognition method, since the images in the test data set tokyo247 and in the standard data set TokyoTimeMachine used for model training have similar feature distributions, in this embodiment, when the image feature extraction model is used to obtain the feature vector of each image in the test data set, all model parameters are set directly to the parameters obtained by the above image feature extraction model training method;

In other application scenarios, to minimize the dependence on the training set, when the image feature extraction model is used to obtain the feature vectors of the images in the test data set, the model parameters can also be set as follows: for each convolutional layer, the mean and standard deviation of the feature maps produced after all the images in the test data set pass through that convolutional layer are computed and used as the parameters of the batch normalization layer that follows it; the remaining model parameters of the image feature extraction model are the parameters obtained by training.

Fig. 3 shows an example of visual place recognition using the proposed method, in which the training-set images represent the standard data set used for model training, the query image is the image to be retrieved, and the gallery images are the test-set database images.

The present invention also provides an image feature extraction model training device for implementing the above image feature extraction model training method; the device comprises: a model building module, a training set construction module, and a model training module;

the model building module is used to build an image feature extraction model based on a deep neural network, the image feature extraction model being used to obtain the feature vector of an image;

the training set construction module is used to obtain, in a standard data set, a positive sample and s negative samples of each target image, so that one target image together with its positive sample and negative samples constitutes one training sample, thereby obtaining a training set composed of all the training samples;

the model training module is used to train the image feature extraction model with the training set, thereby obtaining the model parameters;

wherein the image feature extraction model comprises a feature extraction network and a local feature aggregation network;

the feature extraction network comprises a plurality of cascaded first networks; each first network is formed by connecting one or more second networks and one max-pooling layer in sequence, the max-pooling layer being used to perform feature selection on the image output by the second network before it; each second network comprises, connected in sequence, a convolutional layer, a batch normalization layer, and an activation function layer, where the convolutional layer performs feature extraction on the image, the batch normalization layer performs zero-mean normalization on the image output by the convolutional layer, and the activation function layer performs activation on the image output by the batch normalization layer;

the local feature aggregation network is used to aggregate all the local features in the image output by the feature extraction network, thereby obtaining the feature vector of the image;

the positive sample of a target image is, among its neighboring images, the image with the smallest feature distance to it; the position distance d between the target image and a neighboring image satisfies $T_{NL} \le d < T_{NH}$; the position distance d between the target image and each of its negative samples satisfies $d \ge T_F$;

the position information of each image in the standard data set is known, and the target images are a plurality of images pre-selected from the standard data set; the feature distance between images is the distance between their feature vectors; $T_{NL}$, $T_{NH}$, and $T_F$ are preset thresholds with $0 < T_{NL} < T_{NH}$ and $T_{NH} \le T_F$; and $s \ge 1$;

In this embodiment, for the specific implementation of each module, reference may be made to the description in the above method embodiments, which will not be repeated here.

The present invention also provides a domain-adaptive visual place recognition device for implementing the above domain-adaptive visual place recognition method; the device comprises: a retrieval set acquisition module, a first feature extraction module, a second feature extraction module, and a recognition module;

the retrieval set acquisition module is used to determine the target domain to which the image to be recognized belongs, obtain multiple images at different positions in the target domain, and take both the obtained images and the image to be recognized as images to be retrieved;

the first feature extraction module is used to take the images to be retrieved as input and obtain the feature vector of each image to be retrieved with the image feature extraction model; when the feature vectors are obtained, for each convolutional layer, the mean and standard deviation of the feature maps produced after all the images to be retrieved pass through that convolutional layer are computed and used as the parameters of the batch normalization layer that follows it; the remaining model parameters of the image feature extraction model are the parameters obtained by training;

the second feature extraction module is used to obtain the feature vector of each image in the test data set with the image feature extraction model;

the recognition module is used to obtain, from the feature vectors extracted by the first feature extraction module and the second feature extraction module, the image in the test data set whose feature distance to the image to be recognized is the smallest, and to determine the position information of that image as the position information of the image to be recognized, thereby completing visual place recognition of the image to be recognized;

wherein the position information of each image in the test data set is known, and a domain is a set of factors that influence the distribution of image features;

In this embodiment, for the specific implementation of each module, reference may be made to the description in the above method embodiments, which will not be repeated here.

Those skilled in the art will readily understand that the above are only preferred embodiments of the present invention and are not intended to limit it; any modifications, equivalent replacements, and improvements made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. An image feature extraction model training method is characterized by comprising the following steps:
(1) establishing an image feature extraction model based on a deep neural network, the model being used for obtaining the feature vector of an image;
the image feature extraction model comprises a feature extraction network and a local feature aggregation network;
the feature extraction network comprises a plurality of cascaded first networks; each first network is formed by sequentially connecting one or more second networks and a max-pooling layer, the max-pooling layer being used for performing feature selection on the images output by the preceding second networks; each second network comprises a convolutional layer, a batch normalization layer and an activation function layer connected in sequence, wherein the convolutional layer is used for performing feature extraction on images, the batch normalization layer is used for performing zero-mean normalization on the images output by the convolutional layer, and the activation function layer is used for performing activation processing on the images output by the batch normalization layer;
the local feature aggregation network is used for aggregating all local features in the image output by the feature extraction network so as to obtain a feature vector of the image;
(2) obtaining a positive sample and s negative samples of each target image in a standard data set, so that each target image together with its positive sample and negative samples forms a training sample, thereby obtaining a training set formed by all the training samples;
the positive sample of the target image is the image, among the neighboring images, whose feature distance to the target image is smallest, a neighboring image being one whose position distance d to the target image satisfies T_NL ≤ d < T_NH; the position distance d between the target image and each of its negative samples satisfies d ≥ T_F;
(3) training the image feature extraction model by using the training set, so as to obtain each model parameter;
the position information of each image in the standard data set is known, and the target images are a plurality of images pre-screened from the standard data set; the feature distance between images is the distance between their feature vectors; T_NL, T_NH and T_F are all preset thresholds, with 0 < T_NL < T_NH and T_NH ≤ T_F; s ≥ 1.
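For illustration, a minimal NumPy sketch of the sample mining in step (2) under the thresholds above. Names are ours; drawing the negatives at random is one possible reading, since the claim only constrains their position distance:

```python
import numpy as np

def build_training_sample(t, positions, features, T_NL, T_NH, T_F, s, rng=np.random):
    """Select the positive and s negatives for target image index t.

    positions: (M, 2) known image positions; features: (M, D) feature vectors.
    Assumes 0 < T_NL < T_NH <= T_F and s >= 1, as required by claim 1.
    """
    pos_dist = np.linalg.norm(positions - positions[t], axis=1)
    # Neighbouring images: position distance in [T_NL, T_NH)
    neigh = np.where((pos_dist >= T_NL) & (pos_dist < T_NH))[0]
    feat_dist = np.linalg.norm(features[neigh] - features[t], axis=1)
    positive = neigh[np.argmin(feat_dist)]   # closest neighbour in feature space
    # Negative candidates: position distance >= T_F
    far = np.where(pos_dist >= T_F)[0]
    negatives = rng.choice(far, size=s, replace=False)
    return positive, negatives
```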
2. The image feature extraction model training method of claim 1, wherein the local feature aggregation network comprises a dimensionality-reduction convolutional layer, a soft-max layer, an aggregation layer, an intra-normalization layer and an overall normalization layer; the dimensionality-reduction convolutional layer is a convolutional layer used for reducing the number of channels of the image to be aggregated to the preset number of cluster centers, so that each channel of the image to be aggregated represents the weights of the differences between the local features and one cluster center;
the soft-max layer is used for normalizing, across cluster centers, the weights of the differences between the local features and the cluster centers;
the aggregation layer is used for aggregating the local features with the cluster centers according to the normalized weights, to obtain a VLAD vector;
the VLAD vector consists of N D-dimensional vectors, where N is the number of cluster centers and D is the dimension of each cluster center;
the intra-normalization layer is used for normalizing each D-dimensional vector in the VLAD vector, so that the distributions of the D-dimensional vectors are on the same order of magnitude;
the overall normalization layer is used for concatenating the D-dimensional vectors processed by the intra-normalization layer into one column vector and then normalizing this column vector, so that all local features of the image to be aggregated are distributed on the same order of magnitude;
the image to be aggregated is the image output by the feature extraction network.
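For illustration, a PyTorch-style sketch of a NetVLAD-like aggregation network matching claim 2; the 1×1 kernel, feature dimension, cluster count and random centroid initialization are assumptions (in practice the centroids would be initialized by clustering):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalFeatureAggregation(nn.Module):
    """Dim-reduction conv -> soft-max -> aggregation -> intra/overall normalization."""

    def __init__(self, dim=512, num_clusters=64):
        super().__init__()
        # Dimensionality-reduction conv: one output channel per cluster centre
        self.conv = nn.Conv2d(dim, num_clusters, kernel_size=1)
        self.centroids = nn.Parameter(torch.randn(num_clusters, dim))

    def forward(self, x):                                            # x: (B, D, H, W)
        B, D, H, W = x.shape
        # Soft-assignment weights, normalized across clusters by soft-max
        assign = F.softmax(self.conv(x).view(B, -1, H * W), dim=1)   # (B, N, HW)
        feats = x.view(B, D, H * W)                                  # (B, D, HW)
        # Aggregate weighted residuals between local features and centroids
        vlad = torch.einsum('bnl,bdl->bnd', assign, feats) \
             - assign.sum(-1).unsqueeze(-1) * self.centroids         # (B, N, D)
        vlad = F.normalize(vlad, p=2, dim=2)   # intra-normalization of each D-dim vector
        vlad = vlad.view(B, -1)                # concatenate into one column vector
        return F.normalize(vlad, p=2, dim=1)   # overall normalization
```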
3. The method of training an image feature extraction model of claim 1, wherein s > 1.
4. The image feature extraction model training method of claim 3, wherein in step (3), when the training set is used to train the image feature extraction model, the loss function used is:
L = Σ_(k=1..n) max( d(q_k, p_k) + m − min_(1≤i≤s) d(q_k, n_ki), 0 )
wherein n is the total number of training samples, k is the index of a training sample and i is the index of a negative sample; q_k, p_k and n_ki respectively denote the target image, the positive sample and the i-th negative sample in the k-th training sample; d(q_k, p_k) denotes the feature distance between the target image q_k and its positive sample p_k, and d(q_k, n_ki) the feature distance between q_k and its negative sample n_ki; m is a predefined hyper-parameter; max denotes taking the maximum value, and min denotes taking the minimum value.
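Since the formula above is reconstructed from the symbol definitions, the following PyTorch-style sketch should be read as one plausible implementation of a hinge loss over the hardest negative, not as the patent's exact code:

```python
import torch
import torch.nn.functional as F

def triplet_loss(q, p, negs, margin):
    """Hinge loss over the hardest negative, using the symbols of claim 4.

    q, p:  (B, D) feature vectors of the target images and their positives
    negs:  (B, s, D) feature vectors of the s negatives per target image
    """
    d_pos = F.pairwise_distance(q, p)                     # d(q_k, p_k)
    d_neg = torch.cdist(q.unsqueeze(1), negs).squeeze(1)  # (B, s): d(q_k, n_ki)
    hardest = d_neg.min(dim=1).values                     # min over the s negatives
    return F.relu(d_pos + margin - hardest).sum()         # max(·, 0), summed over k
```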
5. A domain-adaptive visual position recognition method based on the image feature extraction model training method of any one of claims 1 to 4, comprising:
determining the target domain to which the image to be recognized belongs, acquiring a plurality of images at different positions in the target domain, and taking both the acquired images and the image to be recognized as images to be retrieved;
taking the images to be retrieved as input, and obtaining the feature vector of each image to be retrieved with the image feature extraction model; when the feature vectors are obtained, for each convolutional layer, counting the mean and standard deviation of the feature maps obtained after all images to be retrieved pass through that convolutional layer, and using them as the parameters of the batch normalization layer following that convolutional layer, the remaining model parameters of the image feature extraction model being the parameters obtained by training;
obtaining the feature vector of each image in a test data set with the image feature extraction model;
obtaining, from the obtained feature vectors, the image in the test data set whose feature distance to the image to be recognized is smallest, and determining the position information of that image as the position information of the image to be recognized, thereby completing visual position recognition of the image to be recognized;
wherein the position information of each image in the test data set is known, and a domain is a set of factors that affect the distribution of image features.
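For illustration, a PyTorch-style sketch of the batch-normalization recalibration described in claim 5 (an AdaBN-style procedure; identifiers are ours). Only the running statistics of the BN layers are replaced; all trained weights are kept:

```python
import torch

@torch.no_grad()
def recalibrate_bn(model, retrieval_loader):
    """Re-estimate BN statistics from the images to be retrieved."""
    for m in model.modules():
        if isinstance(m, torch.nn.BatchNorm2d):
            m.reset_running_stats()
            m.momentum = None      # None => plain cumulative average over all batches
    model.train()                  # BN layers update running stats only in train mode
    for images in retrieval_loader:
        model(images)              # forward passes only; no parameters are updated
    model.eval()
```

Applying the same routine to the test data set gives the second parameter-setting option of claim 6 below.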
6. The method of claim 5, wherein, when the image feature extraction model is used to obtain the feature vector of each image in the test data set, the model parameters are set in either of the following ways:
setting the model parameters to the model parameters obtained by training;
or, for each convolutional layer, counting the mean and standard deviation of the feature maps obtained after all images in the test data set pass through that convolutional layer, and using them as the parameters of the batch normalization layer following that convolutional layer, the remaining model parameters of the image feature extraction model being the parameters obtained by training.
7. An image feature extraction model training device, comprising: a model establishing module, a training set construction module and a model training module;
the model establishing module is used for establishing an image feature extraction model based on a deep neural network, and the image feature extraction model is used for acquiring a feature vector of an image;
the training set construction module is used for obtaining a positive sample and s negative samples of each target image in the standard data set so as to form a training sample by one target image and the positive sample and the negative sample thereof, thereby obtaining a training set formed by all the training samples;
the model training module is used for training the image feature extraction model by using the training set so as to obtain each model parameter;
the image feature extraction model comprises a feature extraction network and a local feature aggregation network;
the feature extraction network comprises a plurality of cascaded first networks; each first network is formed by sequentially connecting one or more second networks and a max-pooling layer, the max-pooling layer being used for performing feature selection on the images output by the preceding second networks; each second network comprises a convolutional layer, a batch normalization layer and an activation function layer connected in sequence, wherein the convolutional layer is used for performing feature extraction on images, the batch normalization layer is used for performing zero-mean normalization on the images output by the convolutional layer, and the activation function layer is used for performing activation processing on the images output by the batch normalization layer;
the local feature aggregation network is used for aggregating all local features in the image output by the feature extraction network so as to obtain a feature vector of the image;
the positive sample of the target image is the image, among the neighboring images, whose feature distance to the target image is smallest, a neighboring image being one whose position distance d to the target image satisfies T_NL ≤ d < T_NH; the position distance d between the target image and each of its negative samples satisfies d ≥ T_F;
the position information of each image in the standard data set is known, and the target images are a plurality of images pre-screened from the standard data set; the feature distance between images is the distance between their feature vectors; T_NL, T_NH and T_F are all preset thresholds, with 0 < T_NL < T_NH and T_NH ≤ T_F; s ≥ 1.
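For illustration, a PyTorch-style sketch of the feature extraction network of claims 1 and 7; kernel sizes, channel widths and the number of blocks per stage are assumptions:

```python
import torch.nn as nn

def second_network(c_in, c_out):
    """One 'second network': convolution -> batch normalization -> activation."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),   # zero-mean normalization of the conv output
        nn.ReLU(inplace=True),   # activation function layer
    )

def first_network(c_in, c_out, n_blocks=2):
    """One 'first network': second networks followed by a max-pooling layer."""
    blocks = [second_network(c_in if i == 0 else c_out, c_out) for i in range(n_blocks)]
    blocks.append(nn.MaxPool2d(kernel_size=2))   # feature selection by max pooling
    return nn.Sequential(*blocks)

# A cascade of first networks (channel sizes are illustrative)
feature_extractor = nn.Sequential(
    first_network(3, 64),
    first_network(64, 128),
    first_network(128, 256),
    first_network(256, 512),
)
```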
8. A domain-adaptive visual position recognition device based on the image feature extraction model training method of any one of claims 1 to 4, comprising: a retrieval set acquisition module, a first feature extraction module, a second feature extraction module and a recognition module;
the retrieval set acquisition module is used for determining the target domain to which the image to be recognized belongs, acquiring a plurality of images at different positions in the target domain, and taking both the acquired images and the image to be recognized as images to be retrieved;
the first feature extraction module is used for taking the images to be retrieved as input and obtaining the feature vector of each image to be retrieved with the image feature extraction model; when the feature vectors are obtained, for each convolutional layer, the mean and standard deviation of the feature maps obtained after all images to be retrieved pass through that convolutional layer are counted and used as the parameters of the batch normalization layer following that convolutional layer, the remaining model parameters of the image feature extraction model being the parameters obtained by training;
the second feature extraction module is used for obtaining the feature vector of each image in the test data set with the image feature extraction model;
the recognition module is used for obtaining, from the feature vectors extracted by the first feature extraction module and the second feature extraction module, the image in the test data set whose feature distance to the image to be recognized is smallest, and determining the position information of that image as the position information of the image to be recognized, thereby completing visual position recognition of the image to be recognized;
wherein the position information of each image in the test data set is known, and a domain is a set of factors that affect the distribution of image features.
CN201910350741.5A 2019-04-28 2019-04-28 Model training method, domain-adaptive visual position identification method and device Expired - Fee Related CN110175615B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910350741.5A CN110175615B (en) 2019-04-28 2019-04-28 Model training method, domain-adaptive visual position identification method and device

Publications (2)

Publication Number Publication Date
CN110175615A (en) 2019-08-27
CN110175615B (en) 2021-01-01

Family

ID=67690216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910350741.5A Expired - Fee Related CN110175615B (en) 2019-04-28 2019-04-28 Model training method, domain-adaptive visual position identification method and device

Country Status (1)

Country Link
CN (1) CN110175615B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180181804A1 * 2016-12-28 2018-06-28 Konica Minolta Laboratory U.S.A., Inc. Data normalization for handwriting recognition
CN107122712A * 2017-03-27 2017-09-01 Dalian University Palmprint image recognition method based on convolutional neural networks and bidirectional aggregation of local feature descriptor vectors
WO2018184195A1 * 2017-04-07 2018-10-11 Intel Corporation Joint training of neural networks using multi-scale hard example mining
CN107767378A * 2017-11-13 2018-03-06 Zhejiang Chinese Medical University Multi-modal magnetic resonance image segmentation method for GBM based on deep neural networks
CN107967457A * 2017-11-27 2018-04-27 Global Energy Interconnection Research Institute Co., Ltd. Place recognition and relative positioning method and system adapting to visual feature changes
CN108647577A * 2018-04-10 2018-10-12 Huazhong University of Science and Technology Pedestrian re-identification model, method and system with adaptive hard example mining
CN109684977A * 2018-12-18 2019-04-26 成都三零凯天通信实业有限公司 Visual landmark retrieval method based on end-to-end deep learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
RELJA ARANDJELOVIC et al.: "NetVLAD: CNN architecture for weakly supervised place recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence *
YANGHAO LI et al.: "Adaptive Batch Normalization for practical domain adaptation", Pattern Recognition *
QIU Xiaosong et al.: "Visual place recognition method based on convolutional neural networks", Computer Engineering and Design *
WANG Lijun et al.: "Place recognition based on convolutional neural networks", Electronic Science and Technology *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541515A (en) * 2019-09-23 2021-03-23 北京京东乾石科技有限公司 Model training method, driving data processing method, device, medium and equipment
CN112906724A (en) * 2019-11-19 2021-06-04 华为技术有限公司 Image processing device, method, medium and system
CN110866140A (en) * 2019-11-26 2020-03-06 腾讯科技(深圳)有限公司 Image feature extraction model training method, image searching method and computer equipment
CN110866140B (en) * 2019-11-26 2024-02-02 腾讯科技(深圳)有限公司 Image feature extraction model training method, image searching method and computer equipment
CN111627065B (en) * 2020-05-15 2023-06-20 Oppo广东移动通信有限公司 A visual positioning method, device, and storage medium
CN111627065A (en) * 2020-05-15 2020-09-04 Oppo广东移动通信有限公司 Visual positioning method and device and storage medium
CN111914712A (en) * 2020-07-24 2020-11-10 合肥工业大学 A method and system for target detection in railway ground track scene
CN111914712B (en) * 2020-07-24 2024-02-13 合肥工业大学 Railway ground track scene target detection method and system
WO2021204014A1 (en) * 2020-11-12 2021-10-14 平安科技(深圳)有限公司 Model training method and related apparatus
CN112328891A (en) * 2020-11-24 2021-02-05 北京百度网讯科技有限公司 Method for training search model, method for searching target object and device thereof
CN112733701A (en) * 2021-01-07 2021-04-30 中国电子科技集团公司信息科学研究院 Robust scene recognition method and system based on capsule network
CN115345930A (en) * 2021-05-12 2022-11-15 阿里巴巴新加坡控股有限公司 Model training method, visual positioning method, device and equipment
CN113591771A (en) * 2021-08-10 2021-11-02 武汉中电智慧科技有限公司 Training method and device for multi-scene power distribution room object detection model
CN113591771B (en) * 2021-08-10 2024-03-08 武汉中电智慧科技有限公司 Training method and equipment for object detection model of multi-scene distribution room
CN119359802A (en) * 2024-09-26 2025-01-24 浙江大学 A method and device for image position recognition based on basic visual model

Also Published As

Publication number Publication date
CN110175615B (en) 2021-01-01

Similar Documents

Publication Publication Date Title
CN110175615B (en) Model training method, domain-adaptive visual position identification method and device
CN107679078B (en) Bayonet image vehicle rapid retrieval method and system based on deep learning
CN110209859B (en) Method and device for recognizing places and training models of places and electronic equipment
CN107067020B (en) Image recognition method and device
CN107515895B (en) A visual target retrieval method and system based on target detection
Jin Kim et al. Learned contextual feature reweighting for image geo-localization
Lynen et al. Placeless place-recognition
CN114255403B (en) Optical remote sensing image data processing method and system based on deep learning
CN109948425A (en) A pedestrian search method and device based on structure-aware self-attention and online instance aggregation matching
CN110070066A Video pedestrian re-identification method and system based on pose key frames
CN104615986B Pedestrian detection method for video images with scene changes using multiple detectors
CN112258580B (en) Visual SLAM loop detection method based on deep learning
CN103324677B (en) Hierarchical fast image global positioning system (GPS) position estimation method
Xia et al. Loop closure detection for visual SLAM using PCANet features
CN107291855A Image retrieval method and system based on salient objects
CN115830531A (en) Pedestrian re-identification method based on residual multi-channel attention multi-feature fusion
CN111460980A (en) Multi-scale detection method for small-target pedestrian based on multi-semantic feature fusion
CN104820718A (en) Image classification and searching method based on geographic position characteristics and overall situation vision characteristics
CN111709313B (en) Person Re-identification Method Based on Local and Channel Combination Features
CN107977660A (en) Region of interest area detecting method based on background priori and foreground node
CN109034035A Pedestrian re-identification method based on saliency detection and feature fusion
CN111582178B (en) Method and system for vehicle re-identification based on multi-directional information and multi-branch neural network
CN108764018A Multi-task vehicle re-identification method and device based on convolutional neural networks
CN107958255A (en) Target detection method and device based on image
CN114863302B (en) A small target detection method for UAV based on improved multi-head self-attention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210101