CN109002821B

CN109002821B - A digital identification method of online banking shield based on connected domain and tangent slope

Info

Publication number: CN109002821B
Application number: CN201810795864.5A
Authority: CN
Inventors: 姜燕; 饶刚; 梅文浩; 张颖
Original assignee: Wuhan University of Science and Technology WHUST
Current assignee: Wuhan University of Science and Technology WHUST
Priority date: 2018-07-19
Filing date: 2018-07-19
Publication date: 2021-11-02
Anticipated expiration: 2038-07-19
Also published as: CN109002821A

Abstract

The invention discloses a digit recognition method for online banking shields based on connected domains and tangent slopes. A dilation structuring element is used to eliminate breakpoints in the laser-printed digit string. To ensure that the breakpoints are eliminated completely, a relatively large dilation factor is set, which causes some digits in the string to stick together. An extremum segmentation method is then applied to the stuck digit string to separate the digits, and finally 6 geometric and topological features are used to construct digit prototypes that distinguish, and thereby recognize, the digits. Recognition experiments on actual online banking shields show that the method recognizes digit strings with an accuracy of up to 99% and an average recognition time of 0.15 s. The system has high recognition accuracy and a degree of real-time performance. With further improvement, the system could be applied to online monitoring of laser printing production lines and to verifying that products match their packaging before they leave the factory.

Description

A digital identification method of online banking shield based on connected domain and tangent slope

Technical field

The invention belongs to the technical field of automatic identification and detection, and relates to a digit recognition method for online banking shields, in particular to a digit recognition method for online banking shields based on connected domains and tangent slopes.

Background

Compared with conventionally printed digit strings, which have higher print quality, laser-printed digit strings suffer from breakpoints and rough edges, so traditional algorithms cannot segment and recognize them accurately.

Summary of the invention

To solve the above technical problem, the present invention provides a digit recognition method for online banking shields based on connected domains and tangent slopes.

The technical solution adopted by the present invention is a digit recognition method for online banking shields based on connected domains and tangent slopes, characterized in that it comprises the following steps:

Step 1: Preprocess the image;

Step 2: Segment the digit string;

This includes joining broken characters and extracting stuck characters;

Step 3: Recognize the digit string;

This includes selecting recognition features and classifying the characters.

Compared with the prior art, the beneficial effects of the present invention are:

(1) An algorithm for segmenting and recognizing laser-printed digit strings is proposed. Compared with existing digit recognition methods, it introduces a new approach for poorly printed laser digits: dilation removes the breakpoints, and an extremum method then performs the segmentation. For recognition, 6 geometric and topological features are used to construct the digit prototypes in the database, and characters are distinguished and recognized by comparing the segmentation results with the digit prototypes.

(2) The experimental results show that the algorithm has high recognition accuracy, is fast, and has a degree of real-time performance. It recognizes broken and stuck laser-printed characters well and can recognize slightly tilted characters. Its recognition accuracy on 12-digit strings is as high as 99%, and the average recognition time is 0.15 s.

Description of drawings

FIG. 1 is a flowchart of an embodiment of the present invention;

FIG. 2 is a schematic diagram of the image binarization result of an embodiment of the present invention;

FIG. 3 shows the horizontal and vertical projections of an image in an embodiment of the present invention;

FIG. 4 is an experimental comparison of digit string dilation in an embodiment of the present invention;

FIG. 5 shows the upper and lower contour features of an image in an embodiment of the present invention;

FIG. 6 shows the Code 128C encoding format used in an embodiment of the present invention.

Detailed description

To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the accompanying drawings and an embodiment. It should be understood that the embodiment described here only illustrates and explains the present invention and does not limit it.

Compared with conventionally printed digit strings, which have higher print quality, laser-printed digit strings suffer from breakpoints and rough edges, so traditional algorithms cannot segment and recognize them accurately. The present invention therefore proposes a method for segmenting and recognizing laser-printed digit strings. The method uses a dilation structuring element to eliminate breakpoints in the laser-printed digit string. To ensure that the breakpoints are eliminated completely, a relatively large dilation factor is set, which causes some digits to stick together. An extremum segmentation method is applied to the stuck digit string to separate the digits, and finally 6 geometric and topological features are used to construct digit prototypes that distinguish, and thereby recognize, the digits.

Because the laser-printed serial number on the online banking shield matches the barcode serial number on its packaging box, the barcode is also recognized. If the two recognition results agree, digit recognition has succeeded and the shield is packaged correctly; otherwise the shield is wrongly packaged or the serial number was recognized incorrectly, and the shield must be returned to the factory for processing.

Referring to FIG. 1, the digit recognition method for online banking shields based on connected domains and tangent slopes provided by the present invention comprises the following steps:

Step 1: Preprocess the image;

Step 1.1: Image binarization;

Because the number of products is large and the images are affected by ambient light, acquisition device parameters and other factors, the gray values of different images fall into different ranges. This embodiment therefore binarizes the image with Otsu's method, a dynamic thresholding method.
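Otsu's method picks the threshold that maximizes the between-class variance of the gray-level histogram. The following is a minimal pure-Python sketch of that idea, not the patent's implementation; the function names `otsu_threshold` and `binarize`, the 8-bit gray range, and the `foreground_is_bright` flag are assumptions made for illustration.

```python
def otsu_threshold(gray):
    """Return the Otsu threshold of a 2-D list of 8-bit gray values."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the class at or below the candidate threshold
    sum0 = 0  # gray-level sum of that class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray, foreground_is_bright=True):
    """Map pixels to 1 (foreground) or 0 (background) using the Otsu threshold."""
    t = otsu_threshold(gray)
    return [[1 if (v > t) == foreground_is_bright else 0 for v in row]
            for row in gray]
```

Because the threshold is recomputed for every image, changes in lighting between products shift the threshold instead of corrupting the binary result.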

The image processing result is shown in FIG. 2.

Step 1.2: Digit string extraction;

A statistical method is applied to the binarized region, where 0 represents the background and 1 represents primitive boundaries. The number of pixels with value 1 is counted row by row (horizontal direction); rows whose count is below the average are removed, and the region with the largest number of pixels is extracted. Pixel counts are then taken column by column (vertical direction) within this region; columns whose count is below the average are removed, and the region with the largest number of pixels is extracted, with runs separated by fewer than 10 columns treated as the same region. This is the region containing the target digit string. The horizontal and vertical statistics are shown in FIG. 3. The height ah of the final region is recorded as the character height, and the region boundary is enlarged appropriately to avoid losing boundary information.
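The projection-based localization above can be sketched as follows. This is an illustrative reading of the text, not the patent's code: the function names are hypothetical, a row or column is kept when its count is at least the mean, and the final boundary enlargement is left out.

```python
def runs_above_mean(counts, max_step):
    """Group indices whose count is >= the mean into runs.

    Two kept indices belong to the same run when they are at most
    max_step apart (1 means strictly adjacent).
    """
    mean = sum(counts) / len(counts)
    runs = []
    for i, c in enumerate(counts):
        if c < mean:
            continue
        if runs and i - runs[-1][1] <= max_step:
            runs[-1][1] = i
        else:
            runs.append([i, i])
    return runs


def densest_run(counts, max_step):
    """Return the (start, end) run that contains the most foreground pixels."""
    runs = runs_above_mean(counts, max_step)
    start, end = max(runs, key=lambda r: sum(counts[r[0]:r[1] + 1]))
    return start, end


def locate_digit_string(binary):
    """Return (top, bottom, left, right) of the digit string in a 0/1 image."""
    row_counts = [sum(row) for row in binary]
    top, bottom = densest_run(row_counts, 1)
    band = binary[top:bottom + 1]
    col_counts = [sum(col) for col in zip(*band)]
    # Gaps of fewer than 10 removed columns are treated as the same region.
    left, right = densest_run(col_counts, 10)
    return top, bottom, left, right
```

The character height ah in the text would then correspond to `bottom - top + 1` under these assumptions.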

Step 2: Segment the digit string;

Joining broken characters comprises the following sub-steps:

Step 2.1.1: Filter and denoise the digit string;

Digital images are often corrupted by noise from the sensor or the transmission channel during sampling and transmission, and the dilation algorithm amplifies noise. Median filtering is therefore applied before dilation to remove isolated noise points while keeping the edges well preserved.

This embodiment uses median filtering with a 3×3 window to remove isolated noise points. The median filter is defined as:

$G(m,n) = \operatorname{Med}_A\{ f_A(m+k,\, n+l),\ (k,l) \in w \}$;

where $f_A(m,n)$ is the original image, $G(m,n)$ is the processed image, $m$ is the row index and $n$ is the column index; $k \in [-1,1]$, $l \in [-1,1]$; and $w$ is the two-dimensional 3×3 template;
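A direct pure-Python rendering of this 3×3 median filter is sketched below. Leaving the border pixels unchanged is an assumption, since the text does not say how the image edge is handled, and the function name is hypothetical.

```python
def median_filter_3x3(img):
    """Apply a 3x3 median filter to a 2-D list; border pixels are copied as-is."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for m in range(1, rows - 1):
        for n in range(1, cols - 1):
            window = [img[m + k][n + l]
                      for k in (-1, 0, 1) for l in (-1, 0, 1)]
            out[m][n] = sorted(window)[4]  # middle of the 9 sorted values
    return out
```

On a binary image the median of 9 values is a majority vote, so an isolated foreground pixel is removed while solid strokes survive.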

Step 2.1.2: Remove the breakpoints in the digit string;

To eliminate the breakpoints that poor print quality leaves in the digits, this embodiment dilates the digit string. Dilation merges the background points that touch a primitive into that primitive, so that the primitives in the image become thicker. The dilation of A by B is defined as:

$A \oplus B = \{\, z \mid (\hat{B})_z \cap A \neq \varnothing \,\}$

where $\varnothing$ is the empty set and $B$ is the structuring element; that is, the dilation of A by B is the set of all structuring-element origin positions $z$ at which the reflected and translated element $(\hat{B})_z$ overlaps A in at least one element.

The dilation structuring element used in this embodiment is as follows:

[structuring element not reproduced in this text]

Dilation effectively removes the breakpoints in the digit string and completes the preprocessing for digit string segmentation. The digit string before and after processing is shown in FIG. 4.
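Binary dilation can be sketched in a few lines. The structuring element used in the embodiment is not reproduced in this text, so the cross-shaped element `CROSS` below is purely an assumption for illustration, as is the function name `dilate`.

```python
# Hypothetical structuring element: a 3x3 cross given as (row, col) offsets
# relative to its origin. The patent's own element is not available here.
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def dilate(img, offsets):
    """Binary dilation of a 0/1 image by a structuring element.

    A pixel z is set when some offset b of the element satisfies z - b in A,
    which is the set definition of dilation written pixel by pixel.
    """
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                rr, cc = r - dr, c - dc
                if 0 <= rr < rows and 0 <= cc < cols and img[rr][cc]:
                    out[r][c] = 1
                    break
    return out
```

A one-pixel break in a horizontal stroke, such as `[1, 1, 0, 1, 1]`, is closed by a single pass. A larger element closes wider breaks but also makes neighbouring digits touch, which is why the segmentation step follows.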

Extracting stuck characters means segmenting the stuck digit string, extracting the individual characters, and then normalizing them. It comprises the following sub-steps:

Step 2.2.1: Character segmentation;

Because some digits in the string are stuck together, contour extremum analysis is used to obtain reasonable character segmentation points and thereby segment the digit string.

The upper and lower contour features are obtained by a tracking search, as shown in FIG. 5.

The upper contour function of the image is defined as $f(x)$, with domain $X = (x_1, x_2, \ldots, x_n)^T$, where $n$ is the right boundary of the domain, i.e. the total number of points. For $X^* \in R$, if there exists some $\varepsilon > 0$ such that every $X_a \in X$ with $|X_a - X^*| < \varepsilon$ satisfies $f(X_a) \geq f(X^*)$, then $X^*$ is called a local minimum point of $f(x)$ on $R$, and the local minimum value is denoted $f_T(X^*)$. The vector of local minima of the upper contour is thus $X_T = (X_{T1}, X_{T2}, \ldots, X_{Tm})$;

Similarly, the vector of local maxima of the lower contour is $X_B = (X_{B1}, X_{B2}, \ldots, X_{Bm})$, and the local maximum value is denoted $f_B(X^*)$;

For each $X_{Ti} \in X_T$, if [condition not reproduced in this text] and $X_{Bj} \in [X_{Ti}-5,\ X_{Ti}+5]$ and [condition not reproduced in this text], then [expression not reproduced in this text] is a segmentation point of the digit string; the height ah of the final region is recorded as the character height;

After traversing $X_T$, if $D_y = [D_1, D_2, \ldots, D_{13}]$, the digit string has been segmented successfully; otherwise segmentation has failed;
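The extremum pairing can be illustrated with a small sketch. It covers only the part of the rule that survives in this text: local minima of the upper contour that have a local maximum of the lower contour within 5 columns. The two further conditions of the original rule are not available and are not modelled. The function names, the neighbourhood radius `eps`, and the exclusion of flat plateaus are all assumptions.

```python
def local_minima(f, eps=1):
    """Indices x where f[x] <= every neighbour within eps and < at least one."""
    found = []
    for x in range(eps, len(f) - eps):
        neighbours = f[x - eps:x] + f[x + 1:x + eps + 1]
        if all(v >= f[x] for v in neighbours) and any(v > f[x] for v in neighbours):
            found.append(x)
    return found


def local_maxima(f, eps=1):
    """Local maxima are the local minima of the negated sequence."""
    return local_minima([-v for v in f], eps)


def candidate_cuts(upper, lower, window=5):
    """Upper-contour minima with a lower-contour maximum within +/- window."""
    maxima = local_maxima(lower)
    return [xt for xt in local_minima(upper)
            if any(abs(xb - xt) <= window for xb in maxima)]
```

Under these assumptions a segmentation would count as successful when exactly 13 cut positions are found for a 12-digit string, matching the $D_1$ to $D_{13}$ check in the text.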

Step 2.2.2: Character normalization;

The segmented characters differ in size, and some digits are slightly tilted because of machining errors in the outer packaging. The images are therefore normalized with nearest-neighbor interpolation into standard images of uniform size, so that every character has maximum width W and maximum height H. This reduces the recognition errors caused by differing character sizes and improves recognition accuracy.
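Nearest-neighbor resizing to a fixed W×H grid can be sketched as below. The function name and the floor-based index mapping are assumptions; the text does not give the values of W and H.

```python
def resize_nearest(img, W, H):
    """Resize a 2-D list to H rows by W columns with nearest-neighbor sampling."""
    h, w = len(img), len(img[0])
    return [[img[r * h // H][c * w // W] for c in range(W)]
            for r in range(H)]
```

Nearest-neighbor sampling only copies existing pixel values, so a binary character image stays strictly binary after resizing.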

Step 2.2.3: Extract the contour mutation points;

A contour slope mutation detection algorithm is used to recognize the digits;

First, the slope between each pair of adjacent points on the projected contour is calculated to determine whether the slope changes abruptly. The number of abrupt slope changes (mutations) is recorded as a character recognition feature; mutations at the head and tail ends of the contour are excluded from the calculation;

The slope between two adjacent points of the bottom contour is:

$B(i) = (\mathrm{Bottom}(i+1) - \mathrm{Bottom}(i))/1$;

where $i = 4, 5, \ldots, W-4$;

When $B(i) > 0.15 \times H$, the slope is considered to have mutated once; the number of mutations is recorded as bp;

The slope between two adjacent points of the right contour is:

$R(i) = (\mathrm{Right}(i+1) - \mathrm{Right}(i))/1$;

where $i = 4, 5, \ldots, H-4$;

To improve recognition accuracy, the left contour is divided into an upper part and a lower part;

The slope between two adjacent points of the upper-left contour is:

$Lu(i) = (\mathrm{Left}(i+1) - \mathrm{Left}(i))/1$;

where $i = 4, 5, \ldots, H/2+1$;

The slope between two adjacent points of the lower-left contour is:

$Ld(i) = (\mathrm{Left}(i+1) - \mathrm{Left}(i))/1$;

where $i = H/2+1, \ldots, H-4$;

The slope threshold is set to $0.3 \times W$. Whenever a value of $R(i)$, $Lu(i)$ or $Ld(i)$ exceeds $0.3 \times W$, the slope is considered to have mutated once; the numbers of contour mutation points in $R(i)$, $Lu(i)$ and $Ld(i)$ are recorded as rp, lup and ldp respectively.
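The four mutation counts can be computed with one helper. This sketch follows the formulas literally: it uses the signed difference rather than its absolute value, and the index $H/2+1$ falls into both the upper-left and lower-left ranges, as written in the text. Treating the index ranges as positions into 0-based Python lists and using integer division for $H/2$ are assumptions, and the function names are hypothetical.

```python
def count_jumps(contour, start, stop, threshold):
    """Count indices i in [start, stop] with contour[i+1] - contour[i] > threshold."""
    return sum(1 for i in range(start, stop + 1)
               if contour[i + 1] - contour[i] > threshold)


def contour_features(bottom, right, left, W, H):
    """Return (bp, rp, lup, ldp) for a character normalized to W x H.

    bottom has W entries; right and left have H entries.
    """
    bp = count_jumps(bottom, 4, W - 4, 0.15 * H)
    rp = count_jumps(right, 4, H - 4, 0.3 * W)
    lup = count_jumps(left, 4, H // 2 + 1, 0.3 * W)
    ldp = count_jumps(left, H // 2 + 1, H - 4, 0.3 * W)
    return bp, rp, lup, ldp
```

Starting at index 4 and stopping 4 short of the end skips the contour ends, where the outline of any character changes abruptly.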

Step 3: Recognize the digit string;

For recognition feature selection, characters are distinguished by extracted numerical features. To classify and recognize the digits accurately, 6 geometric and topological features are selected to construct the digit prototypes in the database and thereby distinguish the characters. The selected features are as follows:

(1) Number of connected domains cd = {1, 2, 3};

This embodiment uses the 4-connectivity search method: starting from any point in a region, any pixel of the region can be reached through a combination of moves in the four directions up, down, left and right, where the pixels in those four directions are denoted Ii(x,y+1), Ii(x,y-1), Ii(x-1,y) and Ii(x+1,y) respectively.

The connected domain algorithm is as follows:

1) In the binary image, foreground pixels are 1 and background pixels are 0. Let the maximum number of connected domains in the current image be Lmax = 1. Scan the image from top to bottom, and do not process pixels Ii(x,y) whose value is 0;

2) At the first pixel whose value is 1, check whether at least one of Ii(x,y+1), Ii(x,y-1), Ii(x-1,y) and Ii(x+1,y) has the value Lmax; if so, set Ii(x,y) = Lmax, otherwise set Ii(x,y) = Lmax+1;

3) Take each of Ii(x,y+1), Ii(x,y-1), Ii(x-1,y) and Ii(x+1,y) in turn as the current point and perform the search in 2) on it;

4) Repeat 2) and 3) until all points have been traversed.

After all points of character Ii have been traversed, all connected parts of Ii and its number of connected domains Lmax are obtained, so the digits are preliminarily classified by number of connected domains: 8 has 3 connected domains; 0, 4, 6 and 9 have 2; and 1, 2, 3, 5 and 7 have 1.
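A 4-connected region count can be sketched with a breadth-first flood fill, which replaces the recursive neighbour search in the text with an explicit queue. The quoted counts (3 for 8, 2 for 0, 1 for 1) are consistent with counting the stroke plus each enclosed hole; that reading is an assumption of this sketch, as are the function names.

```python
from collections import deque


def count_regions(img, value, skip_border=False):
    """Count 4-connected regions of pixels equal to value.

    With skip_border=True, regions touching the image edge are ignored,
    which leaves only enclosed regions such as the holes of a digit.
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if img[r0][c0] != value or seen[r0][c0]:
                continue
            touches_border = False
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                if r in (0, rows - 1) or c in (0, cols - 1):
                    touches_border = True
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= rr < rows and 0 <= cc < cols
                            and not seen[rr][cc] and img[rr][cc] == value):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            if not (skip_border and touches_border):
                count += 1
    return count


def connected_domain_count(img):
    """Foreground regions plus enclosed background holes (assumed meaning of cd)."""
    return count_regions(img, 1) + count_regions(img, 0, skip_border=True)
```

For a character image padded with at least one background pixel on every side, a ring shape gives 2 and a figure-eight shape gives 3 under this interpretation.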

(2) Ratio of the maximum width within 3/4 of the character height to the total character height, $wd = \max\{\mathrm{Right}(i) - \mathrm{Left}(i)\}/H$;

(3) Number of bottom contour mutation points bp = {0, ≥1};

(4) Number of right contour mutation points rp = {0, ≥1};

(5) Number of upper-left contour mutation points lup = {0, ≥1};

(6) Number of lower-left contour mutation points ldp = {0, ≥1}.

Let the digit prototypes be $S_j$; there are 10 digits, so $j \in [0,9]$. Let $k$ be the number of features used by a prototype. There are 6 features in total, but the prototype that uses the most features, that of digit 0, uses 4, and the prototype that uses the fewest, that of digit 8, uses 1, so $k \in [1,4]$;

For character classification, each character $s_i$ obtained after scanning and segmentation is matched against the digit prototypes $S_i$ in the database to recognize the character;

where $S_j = (f_{1j}, f_{2j}, \ldots, f_{mj})$, $j = \{1, 2, \ldots, k\}$, $m \leq 6$, and each $f_{mj}$ is one of the 6 features of step 3; $s_j = (\{cd_j\}, \{wd_j\}, \{bp_j\}, \{rp_j\}, \{lup_j\}, \{ldp_j\})$;

If [condition not reproduced in this text] with $p = \{1, 2, \ldots, m\}$ and $c = \{1, 2, \ldots, 8\}$, then [expression not reproduced in this text], i.e. character $s_j$ matches prototype $S_j$; here $f_{pj}$ and $f_{cj}$ are each one of the 6 features of step 3;

For a digit string $C = (s_1, s_2, \ldots, s_{12})$, if there exists a prototype set $P = (S_1, S_2, \ldots, S_k)$ such that [condition not reproduced in this text], the digit string is recognized successfully.

The digit prototype vectors of this embodiment are as follows:

S₀ = {cd = 2, bp = 0, rp = 0, lbp = 0};

S₁ = {cd = 1, wd < 0.35};

S₃ = {cd = 1, lup ≥ 1, ldp ≥ 1};

S₄ = {cd = 2, bp ≥ 1};

S₅ = {cd = 1, ldp ≥ 1, lup ≥ 1};

S₆ = {cd = 2, rp ≥ 1};

S₇ = {cd = 1, bp ≥ 1};

S₈ = {cd = 3};

S₉ = {cd = 2, lbp ≥ 1}.
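Prototype matching can be sketched as a table of per-feature predicates. The table below copies the prototypes exactly as listed: there is no entry for digit 2, S₃ and S₅ carry the same conditions, and the name `lbp` is kept as a separate key because the text does not say whether it stands for lup or ldp. The helper names `eq`, `ge`, `lt`, `matches` and `classify` are hypothetical.

```python
def eq(x):
    return lambda v: v == x


def ge(x):
    return lambda v: v >= x


def lt(x):
    return lambda v: v < x


# Prototypes transcribed from the list above; digit 2 is absent there.
PROTOTYPES = {
    0: {"cd": eq(2), "bp": eq(0), "rp": eq(0), "lbp": eq(0)},
    1: {"cd": eq(1), "wd": lt(0.35)},
    3: {"cd": eq(1), "lup": ge(1), "ldp": ge(1)},
    4: {"cd": eq(2), "bp": ge(1)},
    5: {"cd": eq(1), "ldp": ge(1), "lup": ge(1)},
    6: {"cd": eq(2), "rp": ge(1)},
    7: {"cd": eq(1), "bp": ge(1)},
    8: {"cd": eq(3)},
    9: {"cd": eq(2), "lbp": ge(1)},
}


def matches(features, prototype):
    """True when every condition of the prototype holds for the feature dict."""
    return all(test(features[name]) for name, test in prototype.items())


def classify(features):
    """Return every digit whose prototype matches; may be empty or ambiguous."""
    return [digit for digit, proto in PROTOTYPES.items()
            if matches(features, proto)]
```

Returning all matches instead of the first one makes ambiguity visible: with the table as listed, a character with cd = 1 and both left-contour counts at least 1 matches both 3 and 5.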

Step 4: Verify the digit string recognition result;

Step 4.1: Acquire the barcode image of the online banking shield;

The barcode on the online banking shield uses the Code 128C format, whose encoding is shown in FIG. 6. Each symbol in the start code, the data area and the check code consists of 3 black bars and 3 white bars, has length 11 and contains two digits; the stop code consists of 4 black bars and 3 white bars and has length 13. The barcode structure is therefore:

$X = [S, A_1, A_2, A_3, A_4, A_5, A_6, T, E]$

where S and E are the start and stop codes, T is the check code, and the $A_i$ are the character codes.

Step 4.2: Binarize the image with Otsu's algorithm, extract the barcode region, and then denoise with median filtering;

Step 4.3: Use the averaging method and the normalization method to eliminate the detection error caused by tilt;

Deformation during packaging and transport, and errors in applying the barcode label, may leave the barcode tilted. This embodiment therefore uses averaging and normalization to eliminate the detection error caused by tilt.

First, the widths of the black and white bars are measured row by row, and rows in which the total number of black and white bars is not 55 are discarded, which eliminates the statistical error caused by barcode tilt. This gives:

$x_i = [w_1, w_2, \ldots, w_{55}],\ i \in [1, L]$;

where $x_i$ holds the widths $w_k$ of the 55 black and white bars measured in row $i$;

During counting, some rows of a tilted barcode yield fewer than 55 bars (rows b); only rows whose scan yields exactly 55 black and white bars (rows a) are counted, and L is the total number of rows that satisfy this condition.

The results are averaged to reduce the statistical error:

[formula not reproduced in this text]

where [symbol not reproduced in this text] is the average measured width of each black or white bar;

The unit length of the barcode is calculated:

[formula not reproduced in this text]

The actual encoding of the barcode is then obtained as:

[formula not reproduced in this text]

where [symbol not reproduced in this text] is the standard width of each black or white bar after normalization and final binarization;
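The row-wise counting, averaging and normalization steps can be sketched as follows. The formulas themselves are missing from this text, so the sketch rests on assumptions: black pixels are 1, the average is a plain mean over the kept rows, and the unit length is the total averaged width divided by 101 modules (8 symbols of length 11 plus a stop code of length 13, from the structure given earlier). All function names are hypothetical.

```python
def bar_widths(row):
    """Widths of the alternating black/white runs of one binary scan row.

    Leading and trailing white runs (the quiet zones) are dropped.
    """
    runs = []
    for v in row:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    if runs and runs[0][0] == 0:
        runs = runs[1:]
    if runs and runs[-1][0] == 0:
        runs = runs[:-1]
    return [n for _, n in runs]


def mean_widths(rows, expected=55):
    """Average each bar width over the rows that contain exactly expected bars."""
    kept = [w for w in map(bar_widths, rows) if len(w) == expected]
    if not kept:
        return []
    return [sum(col) / len(kept) for col in zip(*kept)]


def to_modules(widths, total_modules=101):
    """Convert averaged pixel widths to whole module counts (at least 1 each)."""
    unit = sum(widths) / total_modules
    return [max(1, round(w / unit)) for w in widths]
```

Rows that cut through a tilted barcode at its edge produce fewer runs and are dropped by the length check, so only complete scan rows contribute to the averages.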

The start and stop codes are checked: if [condition not reproduced in this text] and [condition not reproduced in this text], the data area is decoded against the Code 128C encoding table to obtain $[A_1, A_2, A_3, A_4, A_5, A_6, T]$;

Finally the barcode is verified:

$T^* = (105 + 1 \times A_1 + 2 \times A_2 + 3 \times A_3 + 4 \times A_4 + 5 \times A_5 + 6 \times A_6) \bmod 103$;

If $T^* = T$, the barcode has been recognized successfully;

If the recognition above yields $[A_1, A_2, A_3, A_4, A_5, A_6] = (s_1, s_2, \ldots, s_{12})$, the online banking shield is qualified and verification passes; otherwise the shield is wrongly packaged or the serial number was recognized incorrectly.
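The check value and the final comparison can be sketched as below. The constant 105 is the value of the Code 128 Start C symbol, and each data symbol carries two digits, so the 12 recognized digits are compared pair by pair with the 6 decoded values. The function names are hypothetical.

```python
def code128c_checksum(values):
    """Check value for Code 128C data symbol values A1..An (each 0 to 99)."""
    weighted = sum(pos * v for pos, v in enumerate(values, start=1))
    return (105 + weighted) % 103


def digits_to_values(digits):
    """Split a digit string into the two-digit values carried by each symbol."""
    if len(digits) % 2:
        raise ValueError("Code 128C encodes digits in pairs")
    return [int(digits[i:i + 2]) for i in range(0, len(digits), 2)]


def verify(serial_digits, barcode_values, check_value):
    """True when the barcode checksum holds and it matches the recognized serial."""
    if code128c_checksum(barcode_values) != check_value:
        return False
    return digits_to_values(serial_digits) == list(barcode_values)
```

For the digit string "123456789012" the symbol values are 12, 34, 56, 78, 90, 12, the weighted sum is 1082, and (105 + 1082) mod 103 = 54.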

In this embodiment, 100 online banking shields, with 1200 characters and 100 barcodes in total, were recognized. For each shield, the recognized serial number $[s_1, s_2, \ldots, s_{12}]$ was compared with the barcode number $[A_1, A_2, A_3, A_4, A_5, A_6]$ scanned from the packaging box. If the numbers agree, the result is judged correct; otherwise it is judged to be a packaging error or a serial number recognition error, and an audible alarm indicates that the current shield needs to be re-verified. The shield numbers are also checked for continuity; if the numbers are not consecutive, a recognition error or production line error has occurred and the system raises an alarm.

In the experiment the algorithm reached a recognition accuracy of 99%. The only shield that was recognized incorrectly carried a small amount of metal dust left after machining, which covered the digit 9 and invalidated the recognition; after the dust was wiped off, recognition was correct. In other words, in the absence of contamination the algorithm's recognition accuracy on the shields can reach 100%. The average recognition time was 0.15 s. Finally, an incorrect-product verification experiment was carried out: after products and packaging boxes were deliberately mismatched, the system detected the errors and raised the alarm.

It should be understood that the parts not described in detail in this specification belong to the prior art.

It should be understood that the above description of the preferred embodiment is relatively detailed and should not therefore be regarded as limiting the scope of patent protection of the present invention. Under the teaching of the present invention, and without departing from the scope protected by the claims, those of ordinary skill in the art may make substitutions or modifications, all of which fall within the protection scope of the present invention. The claimed protection scope of the present invention shall be determined by the appended claims.

Claims

1. A digit recognition method for online banking shields based on connected domains and tangent slopes, characterized by comprising the following steps:

step 1: preprocessing the image;

step 2: segmenting the digit string;

including joining broken characters and extracting stuck characters;

wherein extracting the stuck characters specifically comprises the following sub-steps:

step 2.2.1: character segmentation;

obtaining upper and lower contour features through tracking search;

the upper contour function of the image is defined as $f(x)$, with domain $X = (x_1, x_2, \ldots, x_n)^T$, where $n$ is the right boundary of the domain, namely the total number of points; for $X^* \in R$, if there exists some $\varepsilon > 0$ such that every $X_a \in X$ with $|X_a - X^*| < \varepsilon$ satisfies the inequality $f(X_a) \geq f(X^*)$, then $X^*$ is called a local minimum point of $f(x)$ on $R$, and the local minimum value is denoted $f_T(X^*)$; the vector of local minima of the upper contour is thereby obtained as $X_T = (X_{T1}, X_{T2}, \ldots, X_{Tm})$;

the vector of local maxima of the lower contour is obtained in the same way as $X_B = (X_{B1}, X_{B2}, \ldots, X_{Bm})$; the local maximum value is denoted $f_B(X^*)$;

for $X_{Ti} \in X_T$, if [condition not reproduced in this text] and $X_{Bj} \in [X_{Ti}-5,\ X_{Ti}+5]$ and [condition not reproduced in this text], then [expression not reproduced in this text] is a segmentation point of the digit string; the final region height ah is recorded as the character height value;

after traversing $X_T$, if $D_y = [D_1, D_2, \ldots, D_{13}]$, the digit string is segmented successfully; otherwise the digit string segmentation fails;

step 2.2.2: character segmentation;

processing the image into a standard image with uniform size, ensuring that the maximum width of all characters is W and the maximum height is H;

step 2.2.3: extracting contour mutation points;

identifying the numbers by adopting a contour slope mutation detection algorithm;

firstly, a left-right scanning method is adopted to obtain a left end outline, then a right-left scanning method is adopted to obtain a right end outline, and the abscissa of the left outline and the abscissa of the right outline are respectively recorded by using one-dimensional arrays left (x) and right (x); obtaining a bottom contour by adopting a bottom-up scanning method, and recording the vertical coordinate of the bottom contour by using a one-dimensional array bottom (i);

then calculating the slope between two adjacent points of the projection contour, judging whether the slope generates mutation, recording the frequency of the slope mutation as a character recognition characteristic, and removing the head end and the tail end of the mutation during calculation;

the slope between two adjacent points of the bottom contour is as follows:

B(i)＝(Bottom(i+1)-Bottom(i))/1；

wherein $i = 4, 5, \ldots, W-4$;

when B (i) > 0.15 × H, recording that the slope generates one mutation, and recording the mutation frequency as bp;

the slope between two adjacent points of the right contour is:

R(i)＝(Right(i+1)-Right(i))/1；

wherein $i = 4, 5, \ldots, H-4$;

in order to improve the identification precision, the left outline is divided into an upper part and a lower part;

calculating the slope between two adjacent points of the upper left contour of the image as follows:

Lu(i)＝(Left(i+1)-Left(i))/1；

wherein $i = 4, 5, \ldots, H/2+1$;

calculating the slope between two adjacent points of the lower left contour of the image as follows:

Ld(i)＝(Left(i+1)-Left(i))/1；

wherein $i = H/2+1, \ldots, H-4$;

setting the slope threshold to $0.3 \times W$, recording one slope mutation whenever a value of $R(i)$, $Lu(i)$ or $Ld(i)$ exceeds $0.3 \times W$, and recording the numbers of contour mutation points in $R(i)$, $Lu(i)$ and $Ld(i)$ as rp, lup and ldp respectively;

step 3: recognizing the digit string;

including recognition feature selection and character classification.

2. The method for identifying the internet banking shield number based on the connected component and the tangent slope according to claim 1, wherein the detailed implementation of the step 1 comprises the following sub-steps:

step 1.1: carrying out image binarization;

carrying out binarization processing on the image by adopting a dynamic threshold processing method Otsu's method;

step 1.2: extracting a digit string;

adopting a statistical method for the image after binarization processing, wherein 0 represents a background and 1 represents a primitive boundary; counting the number of pixels with the gray level of 1 in the horizontal direction, removing the rows with the pixels lower than the average value, extracting the region with the largest number of pixels, then carrying out pixel counting in the vertical direction on the region, removing the columns with the pixels lower than the average value, then extracting the region with the largest number of pixels, namely the region where the target digit string is located, and recording the final region height ah as a character height value.

3. The online banking shield digit identification method based on connected domains and tangent slopes according to claim 1, wherein the splicing of broken characters in step 2 comprises the following sub-steps:

step 2.1.1: filtering and denoising the digit string;

eliminating isolated noise points by median filtering with a 3 × 3 filter window, the median filter being defined as:

G(m,n)＝MedA{fA(m+k,n+l),(k,l)∈w}；

wherein fA(m,n) represents the original image, G(m,n) represents the processed image, m is the row index, and n is the column index; k ∈ [-1, 1], l ∈ [-1, 1]; w is the 3 × 3 two-dimensional template;

step 2.1.2: removing the breakpoints of the digit string;

dilating the digit string; dilation is an operation that merges the background points in contact with a primitive into that primitive, making the primitives in the image "thicker"; the dilation of A by B is defined as:

A⊕B＝{z | (B̂)_z∩A≠Φ}；

in the formula, Φ is the empty set and B is the structuring element with reflection B̂; that is, the dilation of A by B is the set of all origin positions z at which the reflected and translated structuring element overlaps A.
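Steps 2.1.1 and 2.1.2 can be sketched as follows. This is a simplified illustration, not the patented implementation: border pixels are copied unchanged by the median filter (an assumption, since border handling is not specified), and the structuring element is assumed to be symmetric, so that reflecting it makes no difference. The function names and the 1 × 3 element in the example are hypothetical.

```python
from typing import List

Image = List[List[int]]


def median_filter_3x3(img: Image) -> Image:
    """G(m,n) = median of f(m+k, n+l) for k, l in {-1, 0, 1}; borders are copied."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for m in range(1, h - 1):
        for n in range(1, w - 1):
            window = [img[m + k][n + l] for k in (-1, 0, 1) for l in (-1, 0, 1)]
            out[m][n] = sorted(window)[4]     # middle of the 9 sorted values
    return out


def dilate(img: Image, se: Image) -> Image:
    """Binary dilation: a pixel becomes 1 if the centred element touches any 1."""
    h, w = len(img), len(img[0])
    sh, sw = len(se), len(se[0])
    ch, cw = sh // 2, sw // 2                 # origin at the centre of the element
    out = [[0] * w for _ in range(h)]
    for m in range(h):
        for n in range(w):
            out[m][n] = int(any(
                se[a][b]
                and 0 <= m + a - ch < h
                and 0 <= n + b - cw < w
                and img[m + a - ch][n + b - cw]
                for a in range(sh)
                for b in range(sw)
            ))
    return out


# A stroke with a one-pixel break is closed by a horizontal 1 x 3 element.
broken = [[0, 0, 0, 0, 0], [1, 1, 0, 1, 1], [0, 0, 0, 0, 0]]
closed = dilate(broken, [[1, 1, 1]])
```

A larger element closes wider breaks but also makes neighbouring digits touch, which is why a segmentation step is needed afterwards.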

4. The online banking shield digit identification method based on connected domains and tangent slopes according to claim 1, wherein: in the identification feature selection of step 3, characters are distinguished by digit feature extraction; to classify and identify the digits accurately, 6 geometric and topological features are selected to construct digit prototypes in a database so as to distinguish the characters, the selected features being as follows:

(1) the number of connected domains cd = {1, 2, 3};

(2) the ratio of the maximum width within 3/4 of the character height to the total character height, wd = max{Right(i) - Left(i)}/H;

(3) the number of bottom contour mutation points bp = {0, ≥1};

(4) the number of right contour mutation points rp = {0, ≥1};

(5) the number of upper left contour mutation points lup = {0, ≥1};

(6) the number of lower left contour mutation points ldp = {0, ≥1}.
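Features (1) and (2) can be sketched as below. Several points are assumptions rather than statements of the claimed method: cd is taken to count 4-connected background regions (the outer background plus enclosed holes), which would yield values from 1 to 3; the "3/4 of the character height" is taken to be the top three quarters of the rows; and Left(i) and Right(i) are taken to be the leftmost and rightmost foreground columns of row i. The function names are hypothetical.

```python
from collections import deque
from typing import List

Image = List[List[int]]


def count_regions(img: Image, value: int) -> int:
    """Count 4-connected regions of pixels equal to value (breadth-first fill)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] != value or seen[y][x]:
                continue
            regions += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and img[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


def width_ratio(img: Image) -> float:
    """wd = max(Right(i) - Left(i)) over the top 3/4 of the rows, divided by H."""
    h = len(img)
    best = 0
    for i in range((3 * h) // 4):
        cols = [x for x, v in enumerate(img[i]) if v]
        if cols:
            best = max(best, cols[-1] - cols[0])
    return best / h


# Hypothetical 5 x 5 ring shape with a one-pixel background border.
ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
cd = count_regions(ring, 0)   # outer background plus one hole
wd = width_ratio(ring)
```

The background border matters here: without it the outer background could be split into several regions by a character that touches the image edge.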

5. The online banking shield digit identification method based on connected domains and tangent slopes according to claim 4, wherein: the character classification and recognition in step 3 matches each character s_i obtained by scanning and segmentation against the digit prototype S_i in the database to recognize the character;

there are 10 digit prototypes S_j, j ∈ [0, 9]; k is the number of features; there are 6 features in total, but the prototype with the most features, S_j for the digit 0, uses 4 features, and the prototype with the fewest features, S_j for the digit 8, uses 1 feature, so k ∈ [1, 4];

wherein S_j＝(f_1j,f_2j,...,f_mj)，j＝{1,2,...,k}，m≤6, and f_mj is one of the 6 features of step 3; s_j＝({cd_j},{wd_j},{bp_j},{rp_j},{lup_j},{ldp_j})；

if

and p＝{1,2,...,m}, c＝{1,2,...,6}, then

i.e. the character s_j matches the prototype S_j; wherein f_pj and f_cj are each one of the 6 features of step 3;

for the digit string C＝(s₁,s₂,…,s₁₂), if there is a prototype set P＝(S₁,S₂,...,S_k) such that, for

the digit string identification is successful.
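One possible reading of the prototype matching is sketched below: a character matches a prototype when every feature the prototype lists has the same value in the character, and a digit string is accepted only if every character matches exactly one prototype. This is an assumption about the matching rule, and the prototype values in the example are placeholders chosen for illustration, not values taken from the patent. The mutation features are encoded as 0 or 1, with 1 standing for "≥1".

```python
from typing import Dict, List, Optional

Features = Dict[str, int]


def matches(char: Features, prototype: Features) -> bool:
    """True when every feature listed in the prototype agrees with the character."""
    return all(char.get(name) == value for name, value in prototype.items())


def recognise(char: Features, prototypes: Dict[int, Features]) -> Optional[int]:
    """Return the digit whose prototype is the only match, else None."""
    hits = [digit for digit, proto in prototypes.items() if matches(char, proto)]
    return hits[0] if len(hits) == 1 else None


def recognise_string(chars: List[Features],
                     prototypes: Dict[int, Features]) -> Optional[str]:
    """Recognise every character; fail as a whole if any character is ambiguous."""
    digits = [recognise(c, prototypes) for c in chars]
    if any(d is None for d in digits):
        return None
    return "".join(str(d) for d in digits)


# Placeholder prototypes (hypothetical): one feature for 8, four features for 0.
PROTOTYPES: Dict[int, Features] = {
    8: {"cd": 3},
    0: {"cd": 2, "bp": 0, "rp": 0, "lup": 0},
}

eight = {"cd": 3, "bp": 0, "rp": 0, "lup": 0, "ldp": 0}
zero = {"cd": 2, "bp": 0, "rp": 0, "lup": 0, "ldp": 0}
result = recognise_string([eight, zero], PROTOTYPES)
```

Listing only the features a prototype needs keeps the comparison short for easily separated digits, which fits the statement that between 1 and 4 features are used per prototype.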

6. The online banking shield digit identification method based on connected domains and tangent slopes according to any one of claims 1 to 5, wherein: after the digit string recognition of step 3 is finished, the digit string recognition result is further verified;

the verification specifically comprises the following sub-steps:

step 4.1: acquiring a barcode image of the online banking shield;

the online banking shield barcode is in Code 128C format and is encoded, in sequence, as a start code, a coding region, a check code and a stop code; each symbol in the start code, the coding region and the check code consists of 3 black bars and 3 white bars and has a length of 11, each symbol of the coding region representing two digits; the stop code consists of 4 black bars and 3 white bars and has a length of 13; the online banking shield barcode structure is as follows:

X＝[S,A₁,A₂,A₃,A₄,A₅,A₆,T,E]；

wherein S and E are the start code and the stop code, T is the check code, and A_i is a character code, i = 1, 2, …, 6;

step 4.2: binarizing the image with Otsu's algorithm, extracting the barcode region, and then denoising it by median filtering;

step 4.3: eliminating detection errors caused by tilt by using an averaging method and a normalization method.
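The barcode structure of step 4.1 fixes how many bars, unit widths and digits one barcode contains. The short calculation below only restates the figures given in that step; the constant and function names are hypothetical.

```python
from typing import List, Tuple

SYMBOL_BARS, SYMBOL_LENGTH = 6, 11   # 3 black + 3 white bars, length 11
STOP_BARS, STOP_LENGTH = 7, 13       # 4 black + 3 white bars, length 13

LAYOUT: List[str] = ["S", "A1", "A2", "A3", "A4", "A5", "A6", "T", "E"]


def barcode_totals(layout: List[str]) -> Tuple[int, int, int]:
    """Return (number of bars, total length in unit widths, number of digits)."""
    regular = len(layout) - 1                        # every symbol except the stop code
    bars = regular * SYMBOL_BARS + STOP_BARS
    length = regular * SYMBOL_LENGTH + STOP_LENGTH
    digits = 2 * sum(1 for name in layout if name.startswith("A"))
    return bars, length, digits


bars, length, digits = barcode_totals(LAYOUT)        # 55 bars, 101 units, 12 digits
```

These totals give a quick sanity check for a scanned row: a row that does not contain 55 alternating bars cannot be a clean read of this layout.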

7. The online banking shield digit identification method based on connected domains and tangent slopes according to claim 6, wherein step 4.3 is implemented as follows:

first, the widths of the black and white bars are counted row by row, and the rows in which the total number of black and white bars is not 55 are discarded, thereby eliminating the counting errors caused by tilt of the barcode; the counting yields:

x_i＝[w₁,w₂,...,w₅₅]，i∈[1,L]；

wherein x_i represents the widths w_k of the 55 black and white bars counted in row i; L is the total number of rows that satisfy the condition;

the results are averaged to reduce statistical error:

wherein,

represents the average width of each black bar or white bar;

calculating the unit length of the bar code:

the actual bar code is thus obtained as:

wherein,

represents the final standard width, corresponding to the binarization, obtained by normalizing each black bar or white bar;
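The averaging and normalization can be sketched as follows. The exact formulas are not reproduced here, so this is a guess at a reasonable form rather than the claimed one: the widths are averaged over the valid rows, the unit length is taken as the summed average width divided by the total barcode length in unit widths (101 for eight symbols of length 11 plus a stop code of length 13), and each bar is rounded to a whole number of units. The function and parameter names are hypothetical.

```python
from typing import List


def normalise_bar_widths(rows: List[List[int]],
                         n_bars: int = 55,
                         n_units: int = 101) -> List[int]:
    """Average the bar widths over valid rows and express them in unit widths."""
    valid = [r for r in rows if len(r) == n_bars]    # drop rows distorted by tilt
    if not valid:
        raise ValueError("no row contains the expected number of bars")
    count = len(valid)
    mean = [sum(r[k] for r in valid) / count for k in range(n_bars)]
    unit = sum(mean) / n_units                       # pixels per unit width
    return [round(w / unit) for w in mean]


# Small example with a single 6-bar symbol of length 11, scanned at about
# 3 pixels per unit; the third row has a wrong bar count and is ignored.
rows = [
    [6, 3, 3, 6, 9, 6],
    [7, 3, 3, 5, 9, 6],
    [6, 3, 3, 6, 9],
]
pattern = normalise_bar_widths(rows, n_bars=6, n_units=11)
```

Averaging before rounding means that a one-pixel error in a single row does not change the resulting unit widths.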

checking the start code and the stop code: if

and

then the coding region is identified by looking it up in the Code 128C code table to obtain [A₁,A₂,A₃,A₄,A₅,A₆,T]；

finally, checking the barcode:

T^*＝(105+1*A₁+2*A₂+3*A₃+4*A₄+5*A₅+6*A₆)％103；

if T^*＝T, the barcode identification is successful;

if the result obtained by the above recognition satisfies [A₁,A₂,A₃,A₄,A₅,A₆]＝(s₁,s₂,...,s₁₂), the online banking shield is qualified and the verification is passed; otherwise, either the online banking shield is mispackaged or the serial number was misidentified.
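The check-code formula and the final comparison can be sketched as follows. The checksum follows the formula T^* given above; the function names are hypothetical, and the assumption is that each character code A_i is an integer from 0 to 99 that stands for two decimal digits, as in Code 128C.

```python
from typing import Sequence


def code128c_check(values: Sequence[int]) -> int:
    """T* = (105 + 1*A1 + 2*A2 + ... + n*An) % 103."""
    return (105 + sum((k + 1) * a for k, a in enumerate(values))) % 103


def digits_from_codes(values: Sequence[int]) -> str:
    """Each character code 0..99 stands for two decimal digits."""
    return "".join(f"{a:02d}" for a in values)


def verify(values: Sequence[int], check_code: int, recognised_digits: str) -> bool:
    """Pass only if the check code is valid and the digits equal the laser string."""
    return (code128c_check(values) == check_code
            and digits_from_codes(values) == recognised_digits)


# Hypothetical barcode content: six character codes, i.e. twelve digits.
codes = [12, 34, 56, 78, 90, 12]
t_star = code128c_check(codes)
ok = verify(codes, t_star, "123456789012")
```

A failed comparison does not by itself say which side is wrong, which matches the two outcomes named above: a packaging mismatch or a misread serial number.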