
CN108416389B - Image classification method based on denoising sparse autoencoder and density space sampling - Google Patents


Info

Publication number: CN108416389B (application number CN201810212714.7A)
Authority: CN (China)
Prior art keywords: image, training, data set, fisher vector, image data
Inventor: 张辉
Original and current assignee: Yancheng Teachers University
Other versions: CN108416389A (zh)
Legal status: Active (application granted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413: Classification techniques relating to the classification model, based on distances to training or reference patterns


Abstract

The invention discloses an image classification method based on a denoising sparse autoencoder and density space sampling. The steps are: construct a training set of image patches; build a denoising sparse autoencoder with a single hidden layer, feed it the patch training set, and train the autoencoder; perform density space sampling on every image in the training and test image data sets; use the trained denoising sparse autoencoder to extract local feature set information from the spatial regions obtained by density space sampling of each image; encode the feature set information with a two-layer stacked Fisher Vector to obtain each image's final Fisher vector; and train a classifier on the Fisher vectors to perform image classification. The invention captures image information accurately, improves image classification accuracy, and can be used to build large-scale image classification and retrieval systems.

Description

Image classification method based on denoising sparse autoencoder and density space sampling

Technical Field

The invention belongs to the technical field of image classification, and in particular relates to an image classification method based on a denoising sparse autoencoder and density space sampling.

Background Art

With the development of multimedia technology, image classification has become a hot research topic in computer vision. Image classification assigns images to preset categories according to attributes they possess. Effective image representation is the key to improving classification accuracy, and feature selection and extraction remain the central and difficult problems in current image classification. Hand-crafted feature methods such as Gabor filters, SIFT, LBP and HOG have achieved some success in image classification, but they must be carefully designed and do not apply well to specific problems. In recent years, deep learning has achieved great success in many fields such as image recognition and speech recognition. Among deep models, convolutional neural networks (CNNs) have made breakthroughs in feature extraction, but CNNs require large amounts of manually labeled data, whereas a denoising sparse autoencoder can learn features describing image content from large amounts of unlabeled data. The traditional spatial pyramid is widely used in image classification, but its grid proportions are fixed and its partitioning scheme is relatively rigid, whereas a sliding-window-based density space sampling method can capture more image spatial information.

Summary of the Invention

The invention provides an image classification method based on a denoising sparse autoencoder and density space sampling, which aims to solve the problems of image feature extraction and encoding, overcome the defects of existing image classification methods, reduce computational cost, and improve classification accuracy.

To achieve the above technical objective, the technical scheme of the present invention is as follows:

An image classification method based on a denoising sparse autoencoder and density space sampling, comprising the following steps:

(1) Construct a training set of image patches;

(2) Construct a denoising sparse autoencoder with a single hidden layer, input the patch training set obtained in step (1), and train the autoencoder until the stopping criterion is met;

(3) Perform density space sampling on each image in the training image data set M1 and the test image data set M2, where the number of images in M1 is m1 and the number of images in M2 is m2;

(4) Use the trained denoising sparse autoencoder as a feature extractor to extract local feature set information from the spatial regions obtained by density space sampling of each image;

(5) Encode the feature set information extracted in step (4) with a two-layer stacked Fisher Vector to obtain the final Fisher vector, which serves as the feature vector for image classification;

(6) Train a classifier with the Fisher vectors of the training image data set M1 and their corresponding labels, then input the Fisher vectors of the test image data set M2 into the trained classifier to classify the images.

Further, the specific process of step (1) is as follows:

(1a) Obtain an unlabeled image set {I(i)}, i = 1,...,N, where I(i) denotes the i-th image and N is the number of unlabeled images;

(1b) From each image I(i), randomly extract M image patches of size r*r, obtaining MN patches in total, which form the patch training set X = {x_j ∈ R^n}, j = 1,...,MN, with n = r*r*3.
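As a minimal illustration of steps (1a) and (1b), the patch extraction can be sketched as follows. Python is used purely for illustration (the embodiment itself is written in Matlab), and the toy image size and helper names are assumptions:

```python
import random

def sample_patches(image, num_patches, r, seed=0):
    """Randomly crop num_patches r*r patches from an H x W x 3 image
    (nested lists) and flatten each into a length r*r*3 vector."""
    rng = random.Random(seed)
    H, W = len(image), len(image[0])
    patches = []
    for _ in range(num_patches):
        top = rng.randrange(H - r + 1)
        left = rng.randrange(W - r + 1)
        vec = [image[top + i][left + j][c]
               for i in range(r) for j in range(r) for c in range(3)]
        patches.append(vec)
    return patches

# Toy 16x16 RGB "image" stands in for a 96*96 STL-10 image.
img = [[[0, 0, 0] for _ in range(16)] for _ in range(16)]
X = sample_patches(img, num_patches=10, r=8)
print(len(X), len(X[0]))  # 10 patches, each of dimension 8*8*3 = 192
```

Repeating this over all N unlabeled images yields the MN-patch training set X.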

Further, the specific process of step (2) is as follows:

(2a) Add noise to the patch training set X: randomly corrupt the patch data x_j in X at ratio q, giving x_j' ~ q(x_j'|x_j), where x_j' is obtained by setting a fraction q of the components of the vector x_j to 0, j ∈ {1,...,MN};

(2b) Randomly initialize the model parameters W1, W2, b1, b2, and set the number of input-layer neurons of the denoising sparse autoencoder to n and the number of hidden-layer neurons to s, where W1 ∈ R^(s×n) is the encoding weight matrix, W2 ∈ R^(n×s) the decoding weight matrix, b1 ∈ R^s the encoding bias, and b2 ∈ R^n the decoding bias;

Let α_j = g(W1 x_j' + b1), where g(·) is the sigmoid activation function g(z) = 1/(1 + e^(−z)), and let the reconstruction be x̂_j = g(W2 α_j + b2);

(2c) Define the overall cost function Loss of the denoising sparse autoencoder:

Loss = (1/(MN)) Σ_{j=1}^{MN} (1/2)‖x̂_j − x_j‖² + (λ/2)(‖W1‖² + ‖W2‖²) + β Σ_{t=1}^{s} KL(ρ‖ρ̂_t)

where λ is the weight decay coefficient, β is the sparsity penalty weight, ρ is the target sparsity value, ρ̂_t = (1/(MN)) Σ_{j=1}^{MN} α_j(t) is the average response of all training samples on the t-th hidden neuron, and KL(ρ‖ρ̂_t) = ρ log(ρ/ρ̂_t) + (1−ρ) log((1−ρ)/(1−ρ̂_t));

(2d) Compute the gradient of the overall cost function Loss with the error back-propagation algorithm, and solve the minimization of Loss with the limited-memory quasi-Newton method L-BFGS, thereby obtaining the trained model parameters W1, b1, W2, b2.
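The cost of steps (2a) through (2c) can be sketched as follows: a minimal pure-Python sketch, assuming the logistic sigmoid activation and the reconstruction x̂_j = g(W2 α_j + b2); toy sizes and names are illustrative, not the patent's implementation:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def kl(rho, rho_hat):
    # KL divergence between Bernoulli means: the per-unit sparsity penalty
    return rho * math.log(rho / rho_hat) + (1 - rho) * math.log((1 - rho) / (1 - rho_hat))

def dsae_loss(X, W1, b1, W2, b2, q=0.5, lam=0.03, beta=3.0, rho=0.1, seed=0):
    rng = random.Random(seed)
    m, n, s = len(X), len(X[0]), len(b1)
    recon_err, rho_hat = 0.0, [0.0] * s
    for x in X:
        x_noisy = [v if rng.random() > q else 0.0 for v in x]  # masking corruption
        a = [sigmoid(sum(W1[t][k] * x_noisy[k] for k in range(n)) + b1[t])
             for t in range(s)]                                 # hidden response
        x_hat = [sigmoid(sum(W2[k][t] * a[t] for t in range(s)) + b2[k])
                 for k in range(n)]                             # reconstruction
        recon_err += sum((xh - xv) ** 2 for xh, xv in zip(x_hat, x)) / 2
        for t in range(s):
            rho_hat[t] += a[t] / m
    decay = sum(w * w for row in W1 for w in row) + sum(w * w for row in W2 for w in row)
    return recon_err / m + lam / 2 * decay + beta * sum(kl(rho, rh) for rh in rho_hat)

random.seed(1)
X = [[random.random() for _ in range(4)] for _ in range(3)]
W1 = [[random.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(2)]
W2 = [[random.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(4)]
loss = dsae_loss(X, W1, [0.0, 0.0], W2, [0.0, 0.0, 0.0, 0.0])
print(loss)
```

In step (2d) this scalar would be handed, together with its back-propagated gradient, to an L-BFGS solver.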

Further, the specific process of step (3) is as follows:

(3a) Obtain each image in the training image data set M1 and the test image data set M2, respectively;

(3b) Using a scale-variable sliding window, perform density space sampling on each image obtained in step (3a) iteratively, from left to right and from top to bottom; each image is sampled into R spatial regions. Define the initial sliding window size as [w, h] and the step of each slide as t, and define the spatial region captured by each window placement as a quadruple Area(m, n, w, h), where m and n are the coordinates of the window's top-left corner in the image, and w and h are the width and height of the initial sliding window.
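The sliding-window enumeration of step (3b) can be sketched as follows (an illustrative sketch; function and variable names are assumptions):

```python
def density_space_sampling(img_w, img_h, window_sizes, step):
    """Enumerate Area(m, n, w, h) quadruples by sliding each window size
    left-to-right, top-to-bottom with the given step."""
    regions = []
    for w, h in window_sizes:
        n = 0
        while n + h <= img_h:
            m = 0
            while m + w <= img_w:
                regions.append((m, n, w, h))
                m += step
            n += step
    return regions

# Toy 8x8 image with two window scales and step 2.
regions = density_space_sampling(8, 8, [(4, 4), (6, 6)], step=2)
print(len(regions))  # 3*3 + 2*2 = 13 regions
```

Each returned quadruple corresponds to one spatial region from which step (4) later draws patches.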

Further, the specific process of step (4) is as follows:

(4a) From each spatial region of each image obtained in step (3b), randomly extract a fixed number of image patches of size r*r;

(4b) Use the trained denoising sparse autoencoder as a feature extractor to obtain the local feature set A_l of the r*r patches in the l-th spatial region of image i; each image yields R local feature sets, corresponding to its R spatial regions.

Further, the specific steps of step (5) are as follows:

(5a) Model all local feature sets of the training image data set M1 with a Gaussian mixture model containing K Gaussian components, and solve for the model parameters θ;

(5b) On the basis of step (5a), encode the local feature set of the l-th spatial region of image i with the first-layer Fisher vector, obtaining the Fisher vector b_l; after first-layer Fisher vector encoding, the Fisher vector set B_i = {b_l}, l = 1,...,R, of image i is obtained;

(5c) Apply PCA dimensionality reduction to the Fisher vector sets of all images in the training image data set M1 and the test image data set M2; after reduction, the Fisher vector set of image i is B_i';

(5d) Model the dimension-reduced Fisher vector sets of all images of the training image data set M1 with a Gaussian mixture model containing K' Gaussian components, and solve for the model parameters θ';

(5e) On the basis of step (5d), encode the dimension-reduced Fisher vector set B_i' of image i with the second-layer Fisher vector to obtain the Fisher vector f_i, yielding the final Fisher vector of every image in the training image data set M1 and the test image data set M2.

Further, in step (5a), the Gaussian mixture model with K Gaussian components is written p(a_t|θ) = Σ_{i=1}^{K} π_i p_i(a_t), where a_t is a single local feature, p_i(a_t) denotes the i-th Gaussian component, D is the dimension of the local feature a_t, and π_i, μ_i, Σ_i denote the weight, mean and covariance matrix of the i-th Gaussian component, respectively; all parameters of the Gaussian mixture model, θ = {π_i, μ_i, Σ_i, i = 1,...,K}, are estimated from all local feature sets of the training image data set M1.
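The first-layer Fisher vector encoding of steps (5a) and (5b) can be sketched for the mean-gradient terms of a diagonal-covariance mixture. This is a simplified illustration: a full Fisher vector also stacks the covariance gradients, and the function names, toy features and GMM parameters below are assumptions:

```python
import math

def fv_mean_gradients(features, pis, mus, sigmas):
    """Fisher vector sketch: normalized gradient statistics w.r.t. the GMM
    means for a diagonal-covariance mixture (weights pis, means mus,
    per-dimension standard deviations sigmas). Returns a length K*D vector."""
    K, D, T = len(pis), len(mus[0]), len(features)
    grad = [[0.0] * D for _ in range(K)]
    for a in features:
        # component log-likelihoods, then posteriors gamma_i(a)
        logp = []
        for i in range(K):
            ll = math.log(pis[i])
            for d in range(D):
                ll += -0.5 * math.log(2 * math.pi * sigmas[i][d] ** 2) \
                      - (a[d] - mus[i][d]) ** 2 / (2 * sigmas[i][d] ** 2)
            logp.append(ll)
        mx = max(logp)
        post = [math.exp(l - mx) for l in logp]
        z = sum(post)
        post = [p / z for p in post]
        for i in range(K):
            for d in range(D):
                grad[i][d] += post[i] * (a[d] - mus[i][d]) / sigmas[i][d]
    # normalize by T * sqrt(pi_i) and flatten to a K*D vector
    return [grad[i][d] / (T * math.sqrt(pis[i]))
            for i in range(K) for d in range(D)]

feats = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.0]]
fv = fv_mean_gradients(feats, pis=[0.5, 0.5],
                       mus=[[0.0, 0.0], [0.5, 0.5]],
                       sigmas=[[1.0, 1.0], [1.0, 1.0]])
print(len(fv))  # K*D = 4
```

In the method, one such vector b_l is computed per spatial region, and the second layer repeats the same encoding over the PCA-reduced set {b_l}.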

Further, in step (6), the classifier is a support vector machine.

The beneficial effects of adopting the above technical scheme are:

(1) The present invention adopts a denoising sparse autoencoder as the feature extractor. The model is an unsupervised feature learning method that can be trained with unlabeled samples, addressing the current need for large numbers of manually labeled samples. Moreover, compared with an ordinary sparse autoencoder, a denoising sparse autoencoder learns more robust feature representations.

(2) The present invention replaces the traditional spatial pyramid structure with a density space sampling method. The traditional spatial pyramid partitions an image in a fixed way, whereas density space sampling partitions the image more flexibly and thus captures more image spatial information.

(3) The present invention uses stacked Fisher vectors for feature encoding. Compared with the standard single-layer structure, the stacked structure extracts the semantic information of an image hierarchically and achieves better accuracy and performance in image classification.

Description of Drawings

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

The technical solution of the present invention is described in detail below with reference to the accompanying drawings.

The software and hardware environment of the test experiments in this embodiment is as follows:

Hardware:

Computer type: desktop;

CPU: Intel(R) Core(TM) i5-5200U CPU @ 2.20 GHz

Memory: 8.00 GB

System type: 64-bit operating system

Development language: Matlab

The embodiment takes the STL-10 database as an example. The database contains 10 classes of RGB images, each of size 96*96. The total number of training samples for supervised training is 5000; these 5000 training samples are divided into ten folds, with 1000 training samples used for supervised training each time, and the number of test samples is 8000.

The present invention provides an image classification method based on a denoising sparse autoencoder and density space sampling; with reference to FIG. 1, the specific steps are as follows.

Step 1, construct the patch training set:

1a. Obtain the unlabeled image set {I(i)}, i = 1,...,N, of STL-10, where I(i) denotes the i-th image; the number of unlabeled images is 100000;

1b. From each image I(i), randomly extract 10 patches of size 8*8, forming the patch training set X = {x_j ∈ R^192}, j = 1,...,1000000; the dimension is 8*8*3 = 192 because of the three channels R, G, B.

Step 2, construct a denoising sparse autoencoder with a single hidden layer, input the patch training set obtained in step 1b, and train the autoencoder until the stopping criterion is met:

2a. Add noise to the patch training set X: randomly corrupt the patch vector data x_j in X at ratio q, giving x_j' ~ q(x_j'|x_j), where x_j' is obtained by setting a fraction q of the components of the vector x_j to 0. Here q is set to 0.5, j ∈ {1,...,1000000};
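The masking corruption with q = 0.5 used in step 2a can be sketched as follows (an illustrative sketch; the all-ones toy vector stands in for a real 192-dimensional patch):

```python
import random

def corrupt(x, q, rng):
    """Masking noise: independently zero each component with probability q."""
    return [0.0 if rng.random() < q else v for v in x]

rng = random.Random(0)
x = [1.0] * 1000
x_noisy = corrupt(x, 0.5, rng)
kept = sum(1 for v in x_noisy if v != 0.0)
print(0.4 < kept / len(x) < 0.6)  # roughly half the entries survive
```

The autoencoder is then trained to reconstruct the clean x from the corrupted x', which is what makes the learned features robust.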

2b. Randomly initialize the model parameters W1, W2, b1, b2, and set the number of input-layer neurons of the denoising sparse autoencoder to 192 and the number of hidden-layer neurons to 500, where W1 ∈ R^(500×192) is the encoding weight matrix, W2 ∈ R^(192×500) the decoding weight matrix, b1 ∈ R^500 the encoding bias, and b2 ∈ R^192 the decoding bias. Suppose α_j = g(W1 x_j' + b1), where g(z) = 1/(1 + e^(−z));

2c. Define the overall cost function Loss of the DSAE:

Loss = (1/(MN)) Σ_{j=1}^{MN} (1/2)‖x̂_j − x_j‖² + (λ/2)(‖W1‖² + ‖W2‖²) + β Σ_{t=1}^{s} KL(ρ‖ρ̂_t)

where x̂_j = g(W2 α_j + b2) is the reconstruction, λ is the weight decay coefficient, β is the sparsity penalty weight, ρ is the target sparsity value, and ρ̂_t is the average response of all training samples on the t-th hidden neuron; preferably, λ = 0.03, β = 3, ρ = 0.1;

2d. Compute the gradient of the overall cost function Loss with the error back-propagation algorithm, and solve the minimization of Loss with the limited-memory quasi-Newton method L-BFGS, thereby obtaining the trained model parameters W1, b1, W2, b2.

Step 3, perform density space sampling on each image in the training image data set M1 and the test image data set M2, where the number of images in M1 is 1000 and the number of images in M2 is 8000:

3a. Obtain each image in the training image data set M1 and the test image data set M2, respectively;

3b. Using a scale-variable sliding window, perform density space sampling on each image from step 3a iteratively, from left to right and from top to bottom. Define the initial sliding window size as [w, h] and the step of each slide as 10, and define the spatial region captured by each window placement as a quadruple Area(m, n, w, h), where m and n are the coordinates of the window's top-left corner in the image, and w and h are the width and height of the initial sliding window. The initial values of m and n are 0, the initial values of w and h are set to 46, and each image is sampled into 90 spatial regions.
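The embodiment fixes the step at 10 and the initial window at 46 on 96*96 images and reports 90 regions per image. One scale schedule consistent with that count, assumed here since the exact schedule is not spelled out, grows the square window from 46 to 86 in increments of 10:

```python
def count_regions(img, sizes, step):
    """Count sliding-window placements for each square window size."""
    total = 0
    for w in sizes:
        k = (img - w) // step + 1  # window positions per axis
        total += k * k
    return total

# Assumed schedule: square windows 46, 56, 66, 76, 86 on a 96*96 image, step 10.
print(count_regions(96, range(46, 96, 10), 10))  # 36 + 25 + 16 + 9 + 4 = 90
```

Under this assumption the per-scale counts are 6², 5², 4², 3² and 2² placements, which sum to exactly the 90 regions stated above.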

Step 4, use the trained denoising sparse autoencoder as a feature extractor to extract local feature set information from the spatial regions obtained by density space sampling of each image:

4a. Randomly extract 100 patches of size 8*8 from each spatial region of each image obtained in step 3b;

4b. Use the trained denoising sparse autoencoder as a feature extractor to obtain the local feature set A_l of the 100 8*8 patches in the l-th spatial region of image i; 90 feature sets are obtained per image.

Step 5, encode the feature set information from step 4b with a two-layer stacked Fisher Vector to obtain the final Fisher vector, used as the feature vector for image classification:

5a. Model all local feature sets of the training image data set M1 with a Gaussian mixture model containing 128 Gaussian components, and solve for the model parameters θ. The mixture with 128 Gaussian components is written p(a_t|θ) = Σ_{i=1}^{128} π_i p_i(a_t), where a_t is a single local feature, p_i(a_t) denotes the i-th Gaussian component, D is the dimension of the local feature a_t, D = 192, and π_i, μ_i, Σ_i denote the weight, mean and covariance matrix of the i-th Gaussian component, respectively. All parameters of the Gaussian mixture model, θ = {π_i, μ_i, Σ_i, i = 1,...,128}, are estimated from all local feature sets of the training image data set M1;

5b. On the basis of step 5a, encode the local feature set A_l of the l-th spatial region of image i with the first-layer Fisher vector to obtain the Fisher vector b_l. After first-layer Fisher vector encoding, the Fisher vector set B_i = {b_l}, l = 1,...,90, of image i is obtained;

5c. Apply PCA dimensionality reduction to the Fisher vector sets of all images in the training image data set M1 and the test image data set M2; after reduction, the Fisher vector set of image i is B_i';

5d. Model the dimension-reduced Fisher vector sets of all images of the training image data set M1 with a Gaussian mixture model containing 32 Gaussian components, and solve for the model parameters θ';

5e. On the basis of step 5d, encode the dimension-reduced Fisher vector set B_i' of image i with the second-layer Fisher vector to obtain the Fisher vector f_i, thus yielding the final Fisher vector f of every image in the training image data set M1 and the test image data set M2.

Step 6, train the support vector machine with the Fisher vectors of the training image data set M1 from step 5e and the corresponding labels, then input the Fisher vectors of the test image data set M2 into the trained classifier to classify the images.

The embodiment is only intended to illustrate the technical idea of the present invention and does not limit its scope of protection; any change made on the basis of the technical solution according to the technical idea proposed by the present invention falls within the scope of protection of the present invention.

Claims (4)

1.基于降噪稀疏自动编码器和密度空间采样的图像分类方法,其特征在于,包括以下步骤:1. The image classification method based on noise reduction sparse autoencoder and density space sampling, is characterized in that, comprises the following steps: (1)构建图像块训练集;该步骤的具体过程:(1) Build an image block training set; the specific process of this step: (1a)获取无标签的图像集
Figure FDA0002390211910000011
其中,
Figure FDA0002390211910000012
表示第i张图像,N为无标签图像数目;
(1a) Obtain an unlabeled image set
Figure FDA0002390211910000011
in,
Figure FDA0002390211910000012
Indicates the i-th image, and N is the number of unlabeled images;
(1b)从每张图像
Figure FDA0002390211910000013
中随机抽取M个r*r大小的图像块,一共得到MN个图像块,构成图像块训练集
Figure FDA0002390211910000014
(1b) from each image
Figure FDA0002390211910000013
Randomly extract M image blocks of size r*r, and obtain a total of MN image blocks to form the image block training set
Figure FDA0002390211910000014
(2) Construct a denoising sparse autoencoder with a single hidden layer, input the image-block training set obtained in step (1), and train the denoising sparse autoencoder until the stop-iteration condition is met;

(3) Perform density space sampling on each image in the training image data set M1 and the test image data set M2, the number of images in M1 being m1 and the number of images in M2 being m2; the specific process of this step is:

(3a) acquire each image in the training image data set M1 and the test image data set M2 respectively;

(3b) using a scale-variable sliding window, perform density space sampling on each image obtained in step (3a) iteratively from left to right and from top to bottom, so that each image is sampled into R spatial regions; define the initial sliding-window size as [w, h] and the step of each slide as t; define the spatial region captured at each window position as a quadruple Area(m, n, w, h), where m and n are the coordinates of the upper-left corner of the sliding window in the image, and w and h are the width and height of the initial sliding window;

(4) Use the trained denoising sparse autoencoder as a feature extractor to extract local feature-set information from the spatial regions obtained by density space sampling of each image; the specific process of this step is:

(4a) randomly extract a preset number of r*r image blocks from each spatial region of each image obtained in step (3b);

(4b) using the trained denoising sparse autoencoder as a feature extractor, obtain the local feature set of the r*r image blocks in the l-th spatial region of image i; each image yields R local feature sets, corresponding to its R spatial regions;

(5) Encode the feature-set information extracted in step (4) with a two-layer stacked Fisher Vector to obtain the final Fisher vector, which is used as the feature vector for image classification; the specific process of this step is:

(5a) model all local feature sets of the training image data set M1 with a Gaussian mixture model containing K Gaussian components, and solve for the model parameters θ;

(5b) on the basis of step (5a), encode the local feature set of the l-th spatial region of image i with the first-layer Fisher vector to obtain the Fisher vector b_l; after first-layer Fisher vector encoding, the Fisher vector set B_i = {b_1, b_2, ..., b_R} of image i is obtained;

(5c) perform PCA dimensionality reduction on the Fisher vector sets of all images in the training image data set M1 and the test image data set M2; the Fisher vector set of image i after reduction is B_i';

(5d) model the reduced Fisher vector sets of all images in the training image data set M1 with a Gaussian mixture model containing K' Gaussian components, and solve for the model parameters θ';

(5e) on the basis of step (5d), encode the reduced Fisher vector set B_i' of image i with the second-layer Fisher vector to obtain the Fisher vector f_i, thereby obtaining the final Fisher vector of each image in the training image data set M1 and the test image data set M2;

(6) Train the classifier with the Fisher vectors of the training image data set M1 and their corresponding labels, then input the Fisher vectors of the test image data set M2 into the trained classifier for image classification.
2. The image classification method based on the denoising sparse autoencoder and density space sampling according to claim 1, wherein the specific process of step (2) is as follows:

(2a) add noise to the image-block training set X: randomly corrupt the image-block data x_j in X at the ratio q, giving x_j' ~ q(x_j'|x_j), where x_j' is obtained by setting a fraction q of the entries of the vector x_j to 0, j ∈ {1, ..., MN};

(2b) randomly initialize the model parameters W1, W2, b1, b2, and set the number of input-layer neurons of the denoising sparse autoencoder to n and the number of hidden-layer neurons to s, where W1 ∈ R^(s×n) is the encoding weight, W2 ∈ R^(n×s) is the decoding weight, b1 ∈ R^s is the encoding bias term, and b2 ∈ R^n is the decoding bias term;

let α_j = g(W1 x_j' + b1) be the hidden-layer response, where g(z) = 1/(1 + exp(-z)) is the sigmoid activation applied element-wise, and let x̂_j = g(W2 α_j + b2) be the reconstruction of x_j;

(2c) define the overall cost function Loss of the denoising sparse autoencoder:

Loss = (1/MN) Σ_{j=1}^{MN} (1/2) ||x̂_j − x_j||² + (λ/2) (||W1||² + ||W2||²) + β Σ_{t=1}^{s} KL(ρ || ρ̂_t),

where λ is the weight-decay coefficient, β is the sparsity penalty weight, ρ is the target sparsity value, ρ̂_t = (1/MN) Σ_{j=1}^{MN} α_{j,t} is the average response of all training samples on the t-th hidden neuron, and KL(ρ || ρ̂_t) = ρ log(ρ/ρ̂_t) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_t)) is the Kullback-Leibler divergence;

(2d) compute the gradient of the overall cost function Loss with the error back-propagation algorithm, and solve the minimization of Loss with the limited-memory quasi-Newton method L-BFGS, thereby obtaining the trained model parameters W1, b1, W2, b2.
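A minimal NumPy sketch of the corruption, forward pass, and cost of steps (2a)-(2c); the layer sizes, noise ratio q, and hyper-parameters λ, β, ρ below are illustrative, and the L-BFGS minimization of step (2d) is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)

def dsae_loss(X, W1, b1, W2, b2, q=0.3, lam=3e-3, beta=3.0, rho=0.05):
    """Forward pass and cost of a denoising sparse autoencoder (steps 2a-2c)."""
    g = lambda z: 1.0 / (1.0 + np.exp(-z))   # sigmoid activation
    mask = rng.random(X.shape) > q           # zero roughly a fraction q of entries
    Xn = X * mask                            # corrupted inputs x_j'
    A = g(Xn @ W1.T + b1)                    # hidden responses alpha_j
    Xhat = g(A @ W2.T + b2)                  # reconstructions
    rho_hat = A.mean(axis=0)                 # average response per hidden neuron
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return (0.5 * np.mean(np.sum((Xhat - X) ** 2, axis=1))   # reconstruction term
            + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))  # weight decay
            + beta * np.sum(kl))                               # sparsity penalty

n, s, MN = 64, 32, 100                       # toy layer sizes and sample count
W1 = rng.normal(0, 0.1, (s, n)); b1 = np.zeros(s)
W2 = rng.normal(0, 0.1, (n, s)); b2 = np.zeros(n)
X = rng.random((MN, n))
loss = dsae_loss(X, W1, b1, W2, b2)
```

In practice the gradient of this cost would be fed to an L-BFGS routine (e.g. via back-propagation) rather than evaluated once.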
3. The image classification method based on the denoising sparse autoencoder and density space sampling according to claim 1, wherein in step (5a) the Gaussian mixture model with K Gaussian units is written as

p(a_t | θ) = Σ_{i=1}^{K} π_i p_i(a_t),

where a_t is a single local feature and p_i(a_t) denotes the i-th Gaussian unit,

p_i(a_t) = (2π)^(−D/2) |Σ_i|^(−1/2) exp(−(1/2)(a_t − μ_i)ᵀ Σ_i^(−1) (a_t − μ_i)),

where D is the dimension of the local feature a_t, and π_i, μ_i, Σ_i are the weight, mean, and covariance matrix of the i-th Gaussian unit, respectively; all parameters θ = {π_i, μ_i, Σ_i, i = 1, ..., K} of the Gaussian mixture model are estimated from all the local feature sets of the training image data set M1.
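The Gaussian unit density and the mixture of claim 3 can be written directly in NumPy; the two-dimensional example parameters at the end are illustrative, not values estimated from the patent's data:

```python
import numpy as np

def gaussian_unit(a, mu, cov):
    """Density p_i(a_t) of one D-dimensional Gaussian unit of the mixture."""
    D = a.shape[0]
    diff = a - mu
    norm = (2 * np.pi) ** (D / 2) * np.linalg.det(cov) ** 0.5
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)

def gmm_density(a, pis, mus, covs):
    """Mixture density p(a_t | theta) = sum_i pi_i * p_i(a_t)."""
    return sum(p * gaussian_unit(a, m, c) for p, m, c in zip(pis, mus, covs))

# two equally weighted 2-D unit-covariance components
a = np.zeros(2)
val = gmm_density(a, [0.5, 0.5], [np.zeros(2), np.ones(2)], [np.eye(2), np.eye(2)])
```

In the patent's setting, the parameters θ would instead come from EM estimation over all local feature sets of M1.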
4. The image classification method based on the denoising sparse autoencoder and density space sampling according to claim 1, wherein in step (6) the classifier is a support vector machine.
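Claim 4 fixes the classifier of step (6) as a support vector machine. As a dependency-free sketch, a linear SVM trained by hinge-loss subgradient descent on toy "Fisher vectors" (the data, learning rate, and regularization below are illustrative, not the patent's configuration):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Tiny linear SVM trained by subgradient descent on the regularized
    hinge loss (a minimal stand-in for the classifier of step (6)); y in {-1, +1}."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                      # violated margin: hinge subgradient
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:                               # satisfied margin: only shrink w
                w -= lr * lam * w
    return w, b

# linearly separable toy "Fisher vectors" with labels
X = np.array([[2.0, 2.0], [1.5, 1.8], [-2.0, -2.0], [-1.2, -1.7]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
```

A production system would more likely use an off-the-shelf SVM implementation with the high-dimensional Fisher vectors of step (5) as inputs.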
CN201810212714.7A 2018-03-15 2018-03-15 Image classification method based on denoising sparse autoencoder and density space sampling Active CN108416389B (en)

Publications (2)

Publication Number Publication Date
CN108416389A CN108416389A (en) 2018-08-17
CN108416389B true CN108416389B (en) 2020-10-30
