
CN113723362A - Method and device for detecting table line in image - Google Patents

Method and device for detecting table line in image

Info

Publication number
CN113723362A
CN113723362A
Authority
CN
China
Prior art keywords
line
lines
image
unit
semantic segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111134050.5A
Other languages
Chinese (zh)
Inventor
龙伟
郭丰俊
丁凯
龙腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Linguan Data Technology Co ltd
Shanghai Shengteng Data Technology Co ltd
Shanghai Yingwuchu Data Technology Co ltd
Intsig Information Co Ltd
Shanghai Hehe Information Technology Development Co Ltd
Original Assignee
Shanghai Linguan Data Technology Co ltd
Shanghai Shengteng Data Technology Co ltd
Shanghai Yingwuchu Data Technology Co ltd
Shanghai Hehe Information Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Linguan Data Technology Co ltd, Shanghai Shengteng Data Technology Co ltd, Shanghai Yingwuchu Data Technology Co ltd, Shanghai Hehe Information Technology Development Co Ltd filed Critical Shanghai Linguan Data Technology Co ltd
Priority to CN202111134050.5A priority Critical patent/CN113723362A/en
Publication of CN113723362A publication Critical patent/CN113723362A/en
Priority to PCT/CN2022/085400 priority patent/WO2023045298A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method for detecting table lines in an image. Step S10: input the image into a semantic segmentation network to obtain the set of pixels near potential table lines. Step S20: fit line segments to this pixel set to obtain candidate table lines. Step S30: remove false table lines to obtain the real table lines. Step S40: classify all table lines into row groups and column groups. Step S50: obtain the complete structured spreadsheet. Step S60: if structuring in step S50 fails because of a table-line detection error, extract the typical features of the failure scene, generate difficult samples from those features, and retrain the semantic segmentation network. By repeatedly training the semantic segmentation network, the method improves the accuracy of table-line detection and helps raise the success rate of spreadsheet structuring.

Description

Method and device for detecting table line in image
Technical Field
The present application relates to a method for detecting table lines in an image (picture).
Background
Tables are widely used in daily life and office work, and there is great demand for converting tables in pictures into spreadsheets; automatic conversion techniques generally depend heavily on the detection of table lines. Table lines include the outer border lines that separate the inside of the table from its outside, and the inner separation lines that divide rows and columns within the table.
Poor image quality, shooting angle, uneven lighting, bent or folded paper, misaligned text regions, stamp and watermark interference, and the diversity of table-line colors, thicknesses, and styles all pose great challenges to table-line detection, and in turn affect the accuracy of table structure restoration.
Disclosure of Invention
The technical problem to be solved by the present application is to provide a method for detecting table lines in an image that is highly accurate and can effectively assist table structure restoration.
To solve the above technical problem, the method for detecting table lines in an image proposed by the present application includes the following steps. Step S10: input the image into a semantic segmentation network to obtain the set of pixels near potential table lines; this pixel set refers to isolated pixels in regions where table lines may exist. Step S20: fit line segments to the pixel set near the table lines to obtain candidate table lines. Step S30: filter the table lines obtained in step S20 according to text-line information obtained by optical character recognition of the image, and remove false table lines to obtain real table lines. Step S40: according to the positional relationships among the table lines, classify all table lines into row groups and column groups. Step S50: construct cells according to the groups to which the table lines belong, and store the optical character recognition result within each cell's range as that cell's text, finally obtaining a complete structured spreadsheet. Step S60: if spreadsheet structuring in step S50 fails because of a table-line detection error, extract the typical features of the failure scene, generate difficult samples from those features, retrain the semantic segmentation network, and repeat steps S10 to S50 with the retrained network until structuring in step S50 succeeds. By repeatedly training the semantic segmentation network, the method improves the accuracy of table-line detection and helps raise the success rate of spreadsheet structuring.
Further, in step S10, semantic segmentation of the image classifies each pixel in the image to determine its category and thereby divide the image into regions; the semantic segmentation network is based on a deep learning algorithm and comprises one or more of a convolutional neural network, a deep convolutional neural network, and a fully convolutional network. This is a detailed description of step S10.
Further, in step S30, the text-line information includes any one or more of the height of a text line, the width of a single character, and the angle of a text line.
Further, in step S40, the horizontal lines are sorted by their starting end points and processed in a loop; when horizontal lines are close in vertical distance and overlap horizontally, they are merged and deduplicated, so that horizontal lines that logically belong to one line but were detected as several are assembled into a single horizontal line. Finally, the horizontal lines of each table row are gathered into a group, which contains one or more horizontal lines depending on whether cells are merged. Vertical lines are processed in a similar way. This is a specific explanation of step S40.
Optionally, in step S40, the process is accelerated by a union-find algorithm.
Further, step S60 includes the following sub-steps. Step S61: prepare a general sample synthesis tool with multiple adjustable parameters, through which samples and labels with various features can be generated. Step S62: collect and analyze the typical features of scenes in which spreadsheet structuring fails because of table-line detection errors. Step S63: according to the typical features of the failure scene obtained in step S62, adjust the parameters of the general sample synthesis tool to generate difficult samples and labels with the same features. Step S64: retrain, with the generated difficult samples, the semantic segmentation network used to obtain the set of pixels near potential table lines in the image. This is a specific explanation of step S60.
Further, in step S61, the sample synthesis tool abstracts sample generation into five parts: basic background texture, table structure, text content and style, table-line position and style, and stamp/watermark synthesis. The parameters of the basic background texture part include any one or more of background picture, background color, texture pattern, and texture color; the parameters of the table structure part include any one or more of the number, size, position, row and column counts, and merged-cell configuration of the tables; the parameters of the text content and style part include any one or more of font size, font style, color, position, and alignment; the parameters of the table-line position and style part include any one or more of the line type, thickness, and pixel area of the table lines; the parameters of the stamp/watermark synthesis part include any one or more of the number, position, angle, and color of stamps and watermarks.
Further, in step S62, the typical features of the failure scene include any one or more of: text overlapping table lines because of printing misalignment or handwriting; false lines caused by long-stroke Chinese characters repeated vertically; missing lines caused by stamp occlusion; stamp edges mistakenly recognized as table lines; table lines hard to distinguish from the background because of strong-light shooting; cells separated by colored lines or color blocks in complex-texture samples; adjacent cells separated by two parallel lines; and very short table lines missed in dense cells.
Further, in step S63, the general sample synthesis tool generates a base image from the parameters of the basic background texture part, a table structure from the parameters of the table structure part, text content and style from the parameters of the text content and style part, and table lines and their style from the parameters of the table-line position and style part, then superimposes stamps and watermarks according to the parameters of the stamp/watermark synthesis part, and finally composes the image, table structure, text, table lines, and stamps/watermarks into a labeled picture.
The application also provides a device for detecting table lines in an image, comprising a semantic segmentation unit, a line segment fitting unit, a table line filtering unit, a table line grouping unit, a spreadsheet structuring unit, and a retraining unit. The semantic segmentation unit uses a semantic segmentation network to obtain the set of pixels near potential table lines in an input image. The line segment fitting unit fits line segments to this pixel set to obtain the table lines. The table line filtering unit filters the table lines according to text-line information obtained by optical character recognition of the image, removing false table lines to obtain real table lines. The table line grouping unit classifies all table lines into row groups and column groups according to their positional relationships. The spreadsheet structuring unit constructs cells according to the groups to which the table lines belong, and stores the optical character recognition result within each cell's range as that cell's text, finally obtaining a complete structured spreadsheet. The retraining unit, when the spreadsheet structuring unit fails because of a table-line detection error, extracts the typical features of the failure scene, generates difficult samples from those features, and retrains the semantic segmentation network; the retrained network is sent to the semantic segmentation unit, and the semantic segmentation, line segment fitting, table line filtering, table line grouping, and spreadsheet structuring units are executed again until spreadsheet structuring succeeds.
The device improves the accuracy of table line detection by repeatedly training the semantic segmentation network, and is beneficial to improving the success rate of electronic table structuring.
The technical effect obtained by this application is as follows: table lines are obtained by combining a semantic segmentation network with line segment fitting, which effectively reduces false and missing lines in table-line detection; for table-line detection in difficult scenes such as text overlapping lines, false lines from repeated characters, stamp occlusion, faint lines, colored lines, color blocks, dashed lines, double-line separation, and ultra-short lines, the typical features of failure scenes are extracted and difficult samples are generated to repeatedly train the semantic segmentation network, improving the accuracy of table-line detection.
Drawings
Fig. 1 is a schematic flowchart of a method for detecting a table line in an image according to the present application.
Fig. 2 is a schematic view of a sub-flow of step S60 in fig. 1.
Fig. 3 is a schematic structural diagram of an apparatus for detecting a table line in an image according to the present application.
The reference numbers in the figures denote: 10, semantic segmentation unit; 20, line segment fitting unit; 30, table line filtering unit; 40, table line grouping unit; 50, spreadsheet structuring unit; 60, retraining unit.
Detailed Description
Referring to fig. 1, the method for detecting a table line in an image according to the present application includes the following steps.
Step S10: the image is input into a Semantic Segmentation (Semantic Segmentation) network to obtain a set of pixels in the vicinity of potential table lines, namely isolated pixel points in some areas where table lines may exist. The semantic segmentation of an image is to classify each pixel point in the image, determine the category of each point, and thus perform region division, which is a prior art. Common semantic segmentation networks are based on deep learning algorithms, such as Convolutional Neural Networks (CNN), deep convolutional neural networks (dtn), Full Convolutional Networks (FCN), and the like. The step can effectively remove non-table lines in the image, remove character or background stripe interference and effectively reduce the problems of false lines and missing lines in table line detection.
Step S20: and performing line segment fitting on the pixel set in the area adjacent to the table line to obtain the table line, namely connecting the isolated pixel points predicted in the previous step into a line segment by adopting a traditional line segment fitting method.
Step S30: the table lines obtained in step S20 are filtered according to the character line information obtained by performing Optical Character Recognition (OCR) on the image, and the false table lines are removed to obtain clean real table lines. The character row information includes the height of the character row, the width of a single character, the angle of the character row, and the like.
For example, some character strokes are long, or adjacent character strokes join together, and may be detected as a table line in step S20; such false table lines can be filtered out according to the height of the text line and the width of a single character. For another example, if a vertical table line detected in step S20 is shorter than the height of the text line, it is judged to be a false table line. For another example, if the angle of the text line is regarded as horizontal, the perpendicular direction is regarded as vertical; if a table line detected in step S20 falls outside both the allowable angle range of horizontal lines and that of vertical lines, it is judged to be a false table line. The allowable angle range of a horizontal line is, for example, within plus or minus 15 degrees of horizontal; that of a vertical line is, for example, within plus or minus 15 degrees of vertical.
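These filtering rules can be sketched as follows, using the plus-or-minus-15-degree tolerance and the text-line-height test described above; the function name and the exact combination of rules are assumptions for illustration:

```python
import math

def is_real_table_line(p0, p1, text_line_height, angle_tol=15.0):
    """Reject candidate lines per step S30: a line must be roughly
    horizontal or roughly vertical (within angle_tol degrees), and a
    vertical line shorter than one text-line height is treated as a
    long character stroke rather than a table line."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    angle = abs(math.degrees(math.atan2(dy, dx))) % 180
    horizontal = angle <= angle_tol or angle >= 180 - angle_tol
    vertical = abs(angle - 90) <= angle_tol
    if not (horizontal or vertical):
        return False  # oblique: neither a row nor a column separator
    if vertical and math.hypot(dx, dy) < text_line_height:
        return False  # too short: likely a character stroke
    return True
```

A usage example: `is_real_table_line((0, 0), (100, 2), 20)` accepts a nearly horizontal ruling, while a 10-pixel vertical stub next to 20-pixel-tall text is rejected.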
Step S40: and according to the position relation among the table lines, respectively classifying all the table lines into groups of each row and each column. There is an inevitable case where the same table line is detected as a plurality of table lines due to factors such as poor image quality. Meanwhile, the table has the condition that the table lines belonging to the same row and the same column are divided into a plurality of table lines for format requirement. The step is to classify the horizontal lines into different row groups according to the position relation among the horizontal lines in the table lines in order to accurately restore the rows and columns of the cells; and classifying the vertical lines into different column groups according to the position relation among the vertical lines in the table lines.
For example, horizontal and vertical lines are distinguished by computing the angle of each table line. The horizontal lines are sorted by their starting end points and processed in a loop; horizontal lines that are close in vertical distance and overlap horizontally are merged and deduplicated, so that horizontal lines that logically belong to one line but were detected as several are assembled into a single line. This process can be accelerated with a Union-Find algorithm. Finally, the horizontal lines of each table row are gathered into a group, which contains one or more horizontal lines depending on whether cells are merged. Vertical lines are processed in a similar way.
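A minimal union-find sketch of the merge-and-deduplicate pass described above; the `(x0, x1, y)` segment encoding and the 5-pixel vertical tolerance are illustrative choices, not from the patent:

```python
def merge_horizontal_lines(segments, y_tol=5):
    """Group horizontal segments that are vertically close and overlap
    horizontally, using union-find as suggested in step S40, then
    merge each group into one spanning segment.  Segments are
    (x0, x1, y) tuples with x0 <= x1."""
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            (ax0, ax1, ay), (bx0, bx1, by) = segments[i], segments[j]
            if abs(ay - by) <= y_tol and ax0 <= bx1 and bx0 <= ax1:
                union(i, j)  # same logical ruling

    groups = {}
    for i, seg in enumerate(segments):
        groups.setdefault(find(i), []).append(seg)
    out = []
    for segs in groups.values():
        x0 = min(s[0] for s in segs)
        x1 = max(s[1] for s in segs)
        y = sum(s[2] for s in segs) / len(segs)
        out.append((x0, x1, y))
    return sorted(out)

# Two fragments of the same ruling, plus a separate lower ruling.
segs = [(0, 50, 100), (45, 120, 102), (0, 120, 200)]
merged = merge_horizontal_lines(segs)
```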
Step S50: and constructing cells according to the group to which the table lines belong, and storing the optical character recognition result in each cell range as character information in the cell to finally obtain the complete structured spreadsheet. This allows the layout of the spreadsheet to be consistent with the layout of the table in the original image.
Step S60: if the electronic form structuring in the step S50 fails and is caused by a form line detection error, extracting the typical features of the failure scene, generating a difficult sample according to the typical features, retraining the semantic segmentation network, and repeating the steps S10 to S50 by using the retrained semantic segmentation network until the electronic form structuring in the step S50 succeeds.
Referring to fig. 2, the step S60 further includes the following sub-steps.
Step S61: a general sample synthesis tool is prepared which can control the presence, size, position, style, etc. of the graphic elements in the generated sample by parameters. Therefore, when the sample is generated, the sample and the label of the corresponding characteristic can be generated only by adjusting the parameters according to the expected sample characteristic, and the data collection and data label process with higher cost is avoided.
By way of example, the sample synthesis tool abstracts sample generation into five parts: basic background texture, table structure, text content and style, table-line position and style, and stamp/watermark synthesis; various samples can be generated by flexibly configuring the parameters of these parts. The parameters of the table structure part include the number, size, position, row and column counts, merged-cell configuration, and so on of the tables. The parameters of the text content and style part include font size, font style, color, position, alignment, and so on. The parameters of the table-line position and style part include the line type, thickness, pixel area, and so on of the table lines. The parameters of the stamp/watermark synthesis part include the number, position, angle, color, and so on of stamps and watermarks.
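A hedged sketch of what such a parameterized synthesis tool's configuration might look like; every field name below is an assumption, since the patent lists only the categories of adjustable parameters:

```python
from dataclasses import dataclass, field
import random

@dataclass
class SynthesisParams:
    """Illustrative parameter set for the five-part sample synthesis
    tool of step S61 (field names are hypothetical)."""
    background_color: str = "white"
    texture_pattern: str = "none"
    n_rows: int = 5
    n_cols: int = 4
    merged_cells: list = field(default_factory=list)
    font_size: int = 12
    line_style: str = "solid"   # e.g. solid, dashed, double
    line_thickness: int = 1
    n_stamps: int = 0
    stamp_angle: float = 0.0

def sample_failure_case(rng):
    """Bias parameters toward one known failure mode (step S63):
    here, stamp occlusion over thin dashed rulings."""
    return SynthesisParams(
        line_style="dashed",
        line_thickness=1,
        n_stamps=rng.randint(1, 3),
        stamp_angle=rng.uniform(-30, 30),
    )

p = sample_failure_case(random.Random(0))
```

Each failure feature collected in step S62 would map to its own biased sampler like `sample_failure_case`.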
Step S62: typical features in the scenario of a spreadsheet structuring failure due to a form line detection error are collected and analyzed. Typical features of the failure scene include, for example, a character line caused by printing misalignment or handwriting, a false line caused by a long-stroke Chinese character longitudinal repeated arrangement, a missing line caused by stamp shielding, mistakenly identifying a stamp edge as a table line, making it difficult to distinguish the table line from the background due to strong light shooting, separating cells in a complex texture sample by color lines or color blocks, separating adjacent cells by two parallel lines, identifying a missing table line in a short dense cell, and the like.
Step S63: according to the typical characteristics of the failure scenario obtained in step S62, parameters in the generic sample synthesis tool are adjusted to generate difficult samples and labels with the same characteristics. The universal sample synthesis tool generates difficult samples and labels the generated difficult samples. Data annotation refers to the act of processing the learning data of an artificial intelligence algorithm by a data processing personnel marking tool.
As an example, the general sample synthesis tool generates a base image from the parameters of the basic background texture part, a table structure from the parameters of the table structure part, text content and style from the parameters of the text content and style part, and table lines and their style from the parameters of the table-line position and style part, then superimposes stamps and watermarks according to the parameters of the stamp/watermark synthesis part, and finally composes the image, table structure, text, table lines, stamps, and watermarks into one picture, labeled with the table structure, table lines, and other contents.
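The layer-by-layer compositing order described above can be sketched as follows. A real implementation would rasterize each layer; this illustrative stand-in merely records each stage's contribution, which shows why the label comes for free when the sample is synthesized rather than collected:

```python
def synthesize_sample(params):
    """Compose one labeled sample in the layer order of step S63:
    background -> table structure -> text -> rulings -> stamp.
    The params keys are hypothetical names for the patent's
    parameter categories."""
    layers = []
    layers.append(("background", params["texture"]))
    layers.append(("structure", (params["n_rows"], params["n_cols"])))
    layers.append(("text", params["font"]))
    layers.append(("lines", params["line_style"]))
    if params.get("n_stamps", 0):
        layers.append(("stamp", params["n_stamps"]))
    label = {name: value for name, value in layers}
    return layers, label

layers, label = synthesize_sample(
    {"texture": "paper", "n_rows": 3, "n_cols": 2,
     "font": "songti-12", "line_style": "dashed", "n_stamps": 1})
```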
Step S64: retraining the semantic segmentation network for obtaining a set of pixels in the vicinity of a potential form line in the image with the generated difficult samples. The retrained semantic segmentation network is used for repeating the steps S10 to S50, and can bring more accurate segment fitting results, so that the success rate of structuring the whole spreadsheet is improved.
Referring to fig. 3, the apparatus for detecting table lines in an image according to the present application includes a semantic segmentation unit 10, a line fitting unit 20, a table line filtering unit 30, a table line grouping unit 40, an electronic table structuring unit 50, and a retraining unit 60.
The semantic segmentation unit 10 is configured to obtain, with a semantic segmentation network, the set of pixels near potential table lines in an input image, i.e., isolated pixels in regions where table lines may exist.
The line segment fitting unit 20 is configured to fit line segments to the pixel set near the table lines to obtain the table lines, i.e., to connect the isolated pixels predicted by the previous unit into line segments with a traditional line-fitting method.
The form line filtering unit 30 is configured to filter the form lines according to the character line information obtained by performing optical character recognition on the image, and remove the false form lines to obtain real form lines.
The table line grouping unit 40 is configured to group all table lines into groups of rows and columns according to the position relationship between the table lines.
The spreadsheet structuring unit 50 is configured to construct cells according to the groups to which the table lines belong, and store the optical character recognition result in each cell range as the text information in the cell, so as to finally obtain a complete structured spreadsheet.
The retraining unit 60 is configured to, when the spreadsheet structuring unit 50 fails to structure the spreadsheet because of a table-line detection error, extract the typical features of the failure scene, generate difficult samples from those features, and retrain the semantic segmentation network. The retrained network is sent to the semantic segmentation unit 10, and the semantic segmentation unit 10, line segment fitting unit 20, table line filtering unit 30, table line grouping unit 40, and spreadsheet structuring unit 50 are executed again until the spreadsheet structuring unit 50 succeeds.
The method and device for detecting table lines in an image combine a data-driven approach (training and using a semantic segmentation network, then generating difficult samples from failure scenes for retraining) with line segment fitting, and are therefore highly robust.
The above are merely preferred embodiments of the present application and are not intended to limit it. Various modifications and changes may occur to those skilled in the art; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within its protection scope.

Claims (10)

1. A method for detecting table lines in an image, comprising the following steps:
step S10: inputting the image into a semantic segmentation network to obtain a pixel set of a region adjacent to a potential table line; the potential table line neighborhood pixel set refers to isolated pixel points of some regions where table lines may exist;
step S20: performing line segment fitting on the pixel set in the area adjacent to the table line to obtain the table line;
step S30: filtering the table lines obtained in the step S20 according to character line information obtained by carrying out optical character recognition on the image, and removing false table lines to obtain real table lines;
step S40: according to the positional relationships among the table lines, classifying all table lines into row groups and column groups;
step S50: constructing cells according to the groups to which the table lines belong, and storing the optical character recognition result in the range of each cell as character information in the cell to finally obtain a complete structured spreadsheet;
step S60: if the electronic form structuring in the step S50 fails and is caused by a form line detection error, extracting the typical features of the failure scene, generating a difficult sample according to the typical features, retraining the semantic segmentation network, and repeating the steps S10 to S50 by using the retrained semantic segmentation network until the electronic form structuring in the step S50 succeeds.
2. The method for detecting table lines in an image as claimed in claim 1, wherein in the step S10, the semantic segmentation of the image is to classify each pixel point in the image, determine the category of each point, and thereby perform region division; the semantic segmentation network is based on a deep learning algorithm and comprises one or more of a convolutional neural network, a deep convolutional neural network and a full convolutional network.
3. A method for detecting form lines in an image according to claim 1, wherein in step S30, the text line information includes any one or more of a height of a text line, a width of a single text, and an angle of a text line.
4. The method of claim 1, wherein in step S40, the horizontal lines are processed in a loop after being sorted according to the starting end points, and when the horizontal lines with close vertical distances and overlapping horizontal parts are encountered, the horizontal lines are merged and deduplicated, so that the horizontal lines which logically belong to the same horizontal line but are actually detected as a plurality of horizontal lines are assembled into one horizontal line; finally, the horizontal lines of each table row are grouped into a group, and one or more horizontal lines are contained in the group according to the condition that whether the cells are combined or not; a similar approach is used for the processing of vertical lines.
5. The method of claim 4, wherein in step S40, the process is accelerated by a union-find algorithm.
6. A method for detecting table lines in an image according to claim 1, wherein said step S60 further comprises the following sub-steps:
step S61: preparing a general sample synthesis tool having a plurality of adjustable parameters by which samples and labels of various features can be generated;
step S62: collecting and analyzing typical characteristics under the scene of spreadsheet structuralization failure caused by wrong detection of the spreadsheet lines;
step S63: according to the typical characteristics of the failure scene obtained in the step S62, adjusting parameters in the general sample synthesis tool to generate difficult samples and labels with the same characteristics;
step S64: retraining the semantic segmentation network for obtaining a set of pixels in the vicinity of a potential form line in the image with the generated difficult samples.
7. The method of claim 6, wherein in step S61, the difficult sample synthesis tool abstracts the sample generation process into five parts, namely, basic background texture, table structure, text content and style, table line position and style, and seal watermark synthesis; the parameters of the basic background texture part comprise any one or more of background pictures, background colors, texture patterns and texture colors; the parameters of the table structure part comprise any one or more of the number, the size, the position, the row and column number and the condition of merging cells of the table; the parameters of the text content and the style part comprise any one or more of a font size, a font style, a color, a position and an alignment mode; the parameters of the form line position and style part comprise any one or more of type style, thickness and pixel area of the form line; the parameters of the stamp watermark composition portion include any one or more of the number, position, angle, color of the stamp watermark.
8. The method of claim 6, wherein in step S62 the typical characteristics of the failure scenes include any one or more of: character lines caused by printing misalignment or handwriting; false lines caused by Chinese characters with long strokes being repeated vertically; missing lines caused by stamp occlusion; a stamp edge mistakenly recognized as a table line; table lines hard to distinguish from the background in images shot under strong light; cells separated by colored lines or color blocks in samples with complex textures; neighboring cells separated by two parallel lines; and very short table lines in small, dense cells being missed.
9. The method of claim 7, wherein in step S63 the general sample synthesis tool generates a base image according to the parameters of the basic background texture part, generates a table structure according to the parameters of the table structure part, generates text content and styles according to the parameters of the text content and style part, generates table lines and their styles according to the parameters of the table line position and style part, and superimposes stamp watermarks according to the parameters of the stamp watermark composition part; finally, the base image, table structure, text, table lines and stamp watermarks of the respective parts are combined into a labeled picture.
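A toy illustration of the key property of this composition step: because the label is derived from the same line geometry used to draw the table lines, every synthesized picture comes with a pixel-accurate label for free. Image size, grid positions and grey levels below are arbitrary choices for the sketch.

```python
# Draw a 2x2 table grid into a grey-level "image" and produce the matching
# line mask from the same geometry (1 = table-line pixel).
H, W = 64, 96
image = [[255] * W for _ in range(H)]  # white background layer
label = [[0] * W for _ in range(H)]    # line mask, generated alongside

rows = [8, 32, 56]   # y positions of the horizontal rules
cols = [8, 48, 88]   # x positions of the vertical rules
for y in rows:
    for x in range(cols[0], cols[-1] + 1):
        image[y][x] = 0   # draw the rule
        label[y][x] = 1   # and record it in the label at the same time
for x in cols:
    for y in range(rows[0], rows[-1] + 1):
        image[y][x] = 0
        label[y][x] = 1

line_pixels = sum(v for row in label for v in row)
```

Real synthesis would rasterize textures, text and stamps on top in the order the claim describes, but the image/label pairing shown here is what makes the generated samples directly usable for retraining the segmentation network.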
10. A device for detecting table lines in an image, characterized by comprising a semantic segmentation unit, a line fitting unit, a table line filtering unit, a table line grouping unit, a spreadsheet structuring unit and a retraining unit;
the semantic segmentation unit is used for obtaining, with a semantic segmentation network, the set of pixels near potential table lines in an input image;
the line fitting unit is used for performing line fitting on the set of pixels near the table lines to obtain the table lines;
the table line filtering unit is used for filtering the table lines according to character line information obtained by performing optical character recognition on the image, removing false table lines and keeping the real table lines;
the table line grouping unit is used for grouping all the table lines into row groups and column groups according to the positional relationships among the table lines;
the spreadsheet structuring unit is used for constructing cells according to the groups to which the table lines belong, and storing the optical character recognition result within each cell's extent as the text of that cell, finally obtaining a complete structured spreadsheet;
the retraining unit is used for, when the spreadsheet structuring unit fails because table lines were detected incorrectly, extracting the typical characteristics of the failure scene, generating difficult samples according to those characteristics, and retraining the semantic segmentation network; the retrained semantic segmentation network is then fed back to the semantic segmentation unit, and the semantic segmentation unit, line fitting unit, table line filtering unit, table line grouping unit and spreadsheet structuring unit are executed again until the spreadsheet structuring unit succeeds.
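The cooperation of the six units, including the retrain-and-repeat behavior of the retraining unit, can be sketched as follows. Every unit here is a placeholder stub; the real device's interfaces are not specified by this sketch.

```python
# Hypothetical wiring of the six units of claim 10.

class TableLineDetector:
    def __init__(self, segment, fit, filter_lines, group, structure, retrain):
        self.segment = segment            # semantic segmentation unit
        self.fit = fit                    # line fitting unit
        self.filter_lines = filter_lines  # table line filtering unit
        self.group = group                # table line grouping unit
        self.structure = structure        # spreadsheet structuring unit
        self.retrain = retrain            # retraining unit

    def run(self, image, max_rounds=3):
        for _ in range(max_rounds):
            pixels = self.segment(image)                 # pixels near lines
            lines = self.filter_lines(self.fit(pixels))  # fit, then filter
            sheet = self.structure(self.group(lines))    # group, then build
            if sheet is not None:                        # structuring succeeded
                return sheet
            # Failure: swap in a retrained segmentation network and repeat.
            self.segment = self.retrain(self.segment)
        return None


# Demo stubs: the first segmentation "network" fails, and its retrained
# replacement succeeds on the second round.
def bad_segment(img): return "noisy"
def good_segment(img): return "clean"
identity = lambda x: x
def structure(g): return {"cells": 4} if g == "clean" else None

detector = TableLineDetector(bad_segment, identity, identity, identity,
                             structure, lambda seg: good_segment)
result = detector.run("page.png")
```

The loop terminates either when the structuring unit returns a complete spreadsheet or when a round limit is reached, mirroring the "repeat until structuring succeeds" behavior claimed for the retraining unit.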
CN202111134050.5A 2021-09-27 2021-09-27 Method and device for detecting table line in image Pending CN113723362A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111134050.5A CN113723362A (en) 2021-09-27 2021-09-27 Method and device for detecting table line in image
PCT/CN2022/085400 WO2023045298A1 (en) 2021-09-27 2022-04-06 Method and apparatus for detecting table lines in image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111134050.5A CN113723362A (en) 2021-09-27 2021-09-27 Method and device for detecting table line in image

Publications (1)

Publication Number Publication Date
CN113723362A 2021-11-30

Family

ID=78685034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111134050.5A Pending CN113723362A (en) 2021-09-27 2021-09-27 Method and device for detecting table line in image

Country Status (2)

Country Link
CN (1) CN113723362A (en)
WO (1) WO2023045298A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114255346A (en) * 2021-12-29 2022-03-29 科大讯飞股份有限公司 Form image processing method, related device and readable storage medium
CN114782968A (en) * 2022-03-31 2022-07-22 上海云从企业发展有限公司 Form identification method, device and electronic equipment
WO2023045298A1 (en) * 2021-09-27 2023-03-30 上海合合信息科技股份有限公司 Method and apparatus for detecting table lines in image

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
CN116912863A (en) * 2022-12-07 2023-10-20 中国移动通信有限公司研究院 Form identification method and device and related equipment
CN116386071A (en) * 2023-04-18 2023-07-04 湖南星汉数智科技有限公司 Image table structure recognition method, device, computer equipment and storage medium
CN116311310A (en) * 2023-05-19 2023-06-23 之江实验室 Universal form identification method and device combining semantic segmentation and sequence prediction
CN117475459B (en) * 2023-12-28 2024-04-09 杭州恒生聚源信息技术有限公司 Table information processing method and device, electronic equipment and storage medium

Citations (9)

Publication number Priority date Publication date Assignee Title
CN101676930A (en) * 2008-09-17 2010-03-24 北大方正集团有限公司 Method and device for recognizing table cells in scanned image
CN107943956A (en) * 2017-11-24 2018-04-20 北京金堤科技有限公司 Conversion of page method, apparatus and conversion of page equipment
CN110163198A (en) * 2018-09-27 2019-08-23 腾讯科技(深圳)有限公司 A kind of Table recognition method for reconstructing, device and storage medium
US20190294661A1 (en) * 2018-03-21 2019-09-26 Adobe Inc. Performing semantic segmentation of form images using deep learning
CN110569489A (en) * 2018-06-05 2019-12-13 北京国双科技有限公司 Form data analysis method and device based on PDF file
CN110796031A (en) * 2019-10-11 2020-02-14 腾讯科技(深圳)有限公司 Table identification method and device based on artificial intelligence and electronic equipment
CN112396047A (en) * 2020-10-30 2021-02-23 北京文思海辉金信软件有限公司 Training sample generation method and device, computer equipment and storage medium
CN112528863A (en) * 2020-12-14 2021-03-19 中国平安人寿保险股份有限公司 Identification method and device of table structure, electronic equipment and storage medium
CN113221743A (en) * 2021-05-12 2021-08-06 北京百度网讯科技有限公司 Table analysis method and device, electronic equipment and storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US11200413B2 (en) * 2018-07-31 2021-12-14 International Business Machines Corporation Table recognition in portable document format documents
CN110363095B (en) * 2019-06-20 2023-07-04 华南农业大学 Identification method for form fonts
CN111860502B (en) * 2020-07-15 2024-07-16 北京思图场景数据科技服务有限公司 Picture form identification method and device, electronic equipment and storage medium
CN112507876B (en) * 2020-12-07 2024-10-15 数地工场(南京)科技有限公司 Wired form picture analysis method and device based on semantic segmentation
CN113723362A (en) * 2021-09-27 2021-11-30 上海合合信息科技股份有限公司 Method and device for detecting table line in image

Non-Patent Citations (1)

Title
唐皓瑾 (Tang Haojin): "Research and Implementation of a Table Data Extraction Method for PDF Files", China Master's Theses Full-text Database, Information Science and Technology, no. 08, pages 138 *

Also Published As

Publication number Publication date
WO2023045298A1 (en) 2023-03-30

Similar Documents

Publication Publication Date Title
CN113723362A (en) Method and device for detecting table line in image
CN102332096B (en) Video caption text extraction and identification method
CA2113751C (en) Method for image segmentation and classification of image elements for document processing
Alberti et al. Labeling, cutting, grouping: an efficient text line segmentation method for medieval manuscripts
Dongre et al. Devnagari document segmentation using histogram approach
CN110363095A (en) A kind of recognition methods for table font
CN101777124A (en) Method for extracting video text message and device thereof
CN105260751B (en) A kind of character recognition method and its system
CN113688795A (en) Method and device for converting table in image into electronic table
CN108052955B (en) High-precision Braille identification method and system
WO2009070032A1 (en) A method for processing optical character recognition (ocr) data, wherein the output comprises visually impaired character images
Harit et al. Table detection in document images using header and trailer patterns
CN101162506A (en) Seal imprint image search method of circular stamp
Kaundilya et al. Automated text extraction from images using OCR system
Bijalwan et al. Automatic text recognition in natural scene and its translation into user defined language
CN116824608A (en) Answer sheet layout analysis method based on target detection technology
KR101937398B1 (en) System and method for extracting character in image data of old document
RU2436156C1 (en) Method of resolving conflicting output data from optical character recognition system (ocr), where output data include more than one character image recognition alternative
CN115661183A (en) Intelligent scanning management system and method based on edge calculation
CN106503706A (en) The method of discrimination of Chinese character pattern cutting result correctness
Chen et al. Model-based tabular structure detection and recognition in noisy handwritten documents
JP4492258B2 (en) Character and figure recognition and inspection methods
JP4867894B2 (en) Image recognition apparatus, image recognition method, and program
Barna et al. Segmentation of heterogeneous documents into homogeneous components using morphological operations
CN107886808B (en) Braille square auxiliary labeling method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination