
CN110991279B - Document Image Analysis and Recognition Method and System - Google Patents

Info

Publication number: CN110991279B (granted; published as application CN110991279A)
Application number: CN201911143272.6A
Authority: CN (China)
Legal status: Active
Other languages: Chinese (zh)
Inventors: 豆浩斌, 陈博, 朱风云, 庞在虎
Original and current assignee: Beijing Lingban Future Technology Co ltd
Prior art keywords: document image, network, recognition, prediction, image analysis

Classifications

    • G06V30/40 Document-oriented image-based pattern recognition (G06V: image or video recognition or understanding; G06V30/00: character recognition, recognising digital ink, document-oriented image-based pattern recognition)
    • G06N3/045 Combinations of networks (G06N: computing arrangements based on specific computational models; G06N3/02: neural networks; G06N3/04: architecture, e.g. interconnection topology)
    • G06V10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion (G06V10/20: image preprocessing)
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a document image analysis and recognition system comprising a user operation end, an interaction center, a flow control end, a machine engine management end, a manual annotation management end, a machine terminal cluster, and a manual terminal cluster. The user operation end, flow control end, machine engine management end, and manual annotation management end are each connected to the interaction center; the machine engine management end is connected to the machine terminal cluster; and the manual annotation management end is connected to the manual terminal cluster. The invention also discloses a corresponding document image analysis and recognition method. The system combines the efficiency of the machine with the accuracy of human annotators, giving users simple operation steps and reliable processing results. Through continuous iteration, the human-machine coupling also teaches the machine, gradually improving machine performance and reducing the degree of human participation.

Description

Document image analysis and recognition method and system
Technical Field
The invention relates to the technical field of document image analysis and recognition, in particular to a document image analysis and recognition method and system.
Background
Optical character recognition (Optical Character Recognition, abbreviated OCR) is a technique that optically scans the text of a paper document into a pixel-lattice image file and then uses recognition software to convert the text in the image into an editable text format for further processing by word-processing software.
Document image analysis and recognition (Document Image Analysis And Recognition, DIAR for short) is a technology that uses computer vision to analyze the physical and logical structure of a document image, locate and recognize elements (e.g., text, tables, images, graphics, etc.) within the document, and thereby form a complete description of the document.
A distributed software system is a software system that supports distributed processing, i.e. that executes tasks on multiple processors interconnected by a communications network.
In the prior art, document image analysis and recognition grew out of traditional optical character recognition, which mainly processes and recognizes the text portions of a document image. As computer software and hardware capabilities improved and users demanded higher-level and broader document image processing, further techniques such as page segmentation, layout analysis, and chart analysis were studied in depth, enabling complete analysis and description of a document image at different levels and thereby supporting advanced functions such as document retrieval, abstract generation, and knowledge extraction. Current document image analysis and recognition systems generally include the following processing steps:
1. Image preprocessing, including noise removal and distortion correction, to obtain a regular, easy-to-process document image;
2. Page segmentation, dividing the document image into consistent regions such as text, graphics, images, and tables;
3. Layout analysis, determining the hierarchical structure of the document image, including the relative positions and spatial layout of the physical layer and the semantic labels of the logical layer, such as headers, footers, titles, chapters, paragraphs, and notes;
4. Chart analysis. A chart is a structured, visual way of presenting information; chart analysis extracts the presented structural information by analyzing the chart's internal structure;
5. Text localization and recognition, determining the position and content of text in the document; depending on the algorithm, this is performed at the level of text lines or of single characters;
6. Structural description and format conversion, in which the analyzed document structure is described, stored, and transmitted in a specific format and can be converted into common document formats such as MS Word, PDF, and HTML.
However, the inventors have found that prior-art document image analysis and recognition systems mainly have the following problems: 1. incomplete functionality: they provide only some of the functions of document image analysis and recognition and recognize only a few types of objects, so they cannot form a complete description of the document image's hierarchical structure; 2. limited accuracy: high recognition accuracy cannot be guaranteed for document images of poor quality or complex layout; 3. a lack of mature manual proofreading tools and services, which results in a poor user experience.
In addition, for reasons of cost or efficiency, prior-art solutions do not provide a complete processing flow and omit some processing steps, making it difficult to obtain a complete description of the document information. They also provide only software tools and lack functions such as subsequent verification, which users must arrange separately, increasing the difficulty of use.
Disclosure of Invention
To address these problems, the invention provides a complete document image analysis and recognition solution that generates a full description of the hierarchical structure of a document image and ensures the efficiency and accuracy of the whole processing flow through a distributed human-machine coupling approach.
Based on the above, a document image analysis and recognition method is specifically provided, which comprises the following steps:
Step 1: the message communication end of the document image analysis and recognition system receives a task initiation message from the user operation end, and the system starts a document image analysis and recognition processing task;
Step 2: the document image to be processed is obtained and input into the system, together with basic information about the document;
Step 3: page segmentation is performed on the document image. Segmentation tasks for all page images are generated via a message queue and dispatched by the machine engine management end to the machine engine terminals that execute them; the initial page segmentation results produced by the machine engine terminals are forwarded to manual annotation terminals, and the final, manually corrected page segmentation results are returned to the flow control end of the system, which updates the stored page segmentation results;
Step 4: after page segmentation is complete, the initial information for table analysis is obtained. All table analysis tasks for the document image are likewise generated via a message queue and dispatched by the machine engine management end to machine engine terminals; the initial table analysis results are forwarded to manual annotation terminals, and the final, manually corrected table analysis results are returned to the flow control end, which updates the table analysis results;
Step 5: after page segmentation and table analysis are complete, the initial information for text detection is obtained. All text detection tasks for the document image are generated via a message queue and dispatched by the machine engine management end to machine engine terminals; the initial text detection results are forwarded to manual annotation terminals, and the final, manually corrected text detection results are returned to the flow control end, which updates the text detection results;
Step 6: after text detection is complete, the initial information for text recognition is obtained. All text recognition tasks for the document image are generated via a message queue and dispatched by the machine engine management end to machine engine terminals; the initial text recognition results are forwarded to manual annotation terminals, and the final, manually corrected text recognition results are returned to the flow control end, which updates the text recognition results;
Step 7: when the page segmentation, table analysis, text detection, and text recognition tasks for the document image are all complete, the system integrates the annotation results of the different layers and exports an electronic document file.
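Steps 3 through 6 all follow the same man-machine coupled pattern: tasks are queued, a machine engine produces an initial result, a human annotator corrects it, and the final result is returned to flow control. A minimal sketch of that pattern (the function and variable names are hypothetical, not from the patent):

```python
import queue

def run_step(tasks, machine_fn, human_fn, flow_control):
    """One man-machine coupled processing step: machine initial judgment,
    then human correction, then the final result is stored by flow control."""
    q = queue.Queue()            # stand-in for the message queue
    for t in tasks:
        q.put(t)
    while not q.empty():
        task = q.get()
        initial = machine_fn(task)   # machine engine terminal: initial prediction
        final = human_fn(initial)    # manual annotation terminal: correction
        flow_control.append(final)   # flow control end: update stored results
    return flow_control

# Demonstration with string-tagging stand-ins for the two processing roles.
results = run_step(
    ["page-1", "page-2"],
    machine_fn=lambda t: t + ":machine",
    human_fn=lambda r: r + ":verified",
    flow_control=[],
)
```

The final results of one step then serve as the initial conditions of the next step, as described above.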
In one embodiment, acquiring the document image to be processed includes obtaining page images of the document by scanning or photographing and recording basic information about the document, including its name, author, publisher, and publication date.
In one embodiment, the machine engine terminal runs a document image analysis and recognition model based on a deep neural network, determines model output required to be invoked according to the current document image analysis and recognition processing steps, and returns a processing result to a machine engine management end.
In one embodiment, the deep neural network-based document image analysis and recognition model comprises an input layer, a feature extraction network, a multi-task prediction network and a multi-task output layer; the input layer is connected to the feature extraction network, the feature extraction network is connected to the multi-task prediction network, and the multi-task prediction network is connected to the multi-task output layer;
The input layer receives an input page image, which is a page image of the document currently being processed; the feature extraction network is a stacked multi-layer convolutional neural network; the multi-task prediction network consists of task-specific multi-layer prediction networks, one constructed for each prediction task; and the multi-task output layer outputs the results of the different prediction networks.
In one embodiment, the feature extraction network is a stacked multi-layer convolutional neural network in which each layer applies a nonlinear mapping to the output of the previous layer; through these successive nonlinear mappings the input page image is represented and described, and its representative features are extracted and output. The representative features of the page image obtained through the feature extraction network are shared features used jointly by the multiple prediction tasks, which comprise page segmentation, table analysis, text detection, and text recognition;
the multi-task prediction network comprises a page segmentation prediction network, a table analysis prediction network, a text detection prediction network, and a text recognition prediction network, which respectively realize the prediction tasks of page segmentation, table analysis, text detection, and text recognition. These prediction networks share their input features, i.e. they all use the representative features output by the feature extraction network; the structure of each prediction network is determined by its own prediction task;
The multi-task output layer comprises a page segmentation result output by the page segmentation prediction network, a table analysis result output by the table analysis prediction network, a text detection result output by the text detection prediction network and a text recognition result output by the text recognition prediction network.
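The feature-sharing arrangement described above can be sketched as follows: the backbone features are computed once and reused by all four prediction heads. All shapes, the random weights, and the tanh backbone are illustrative assumptions for demonstration, not the patent's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the shared feature extraction network (computed once per image).
W_backbone = rng.standard_normal((64, 32))

# One task-specific prediction head per task named in the text.
HEADS = ["page_segmentation", "table_analysis",
         "text_detection", "text_recognition"]
W_heads = {h: rng.standard_normal((32, 8)) for h in HEADS}

def multi_task_forward(x: np.ndarray) -> dict:
    """Compute the shared representative features once, then apply each
    task-specific prediction head to the same features."""
    shared = np.tanh(x @ W_backbone)                    # shared features
    return {h: shared @ W for h, W in W_heads.items()}  # multi-task outputs

outputs = multi_task_forward(rng.standard_normal((1, 64)))
```

Because the backbone runs only once per image, adding a head costs only that head's computation, which is the efficiency argument made for feature sharing.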
In addition, to solve the above technical problems in the prior art, a document image analysis and recognition system is provided, comprising a user operation end, an interaction center, a flow control end, a machine engine management end, a manual annotation management end, a machine terminal cluster, and a manual terminal cluster;
the user operation end, the flow control end, the machine engine management end and the manual annotation management end are respectively connected to the interaction center; the machine engine management end is connected with the machine terminal cluster; the manual annotation management end is connected with the manual terminal cluster;
the interaction center comprises a data storage end and a message communication end; the data storage end is used for storing data uploaded by a user, results obtained after analysis and identification processing of the document image and data required by interaction between different modules and terminals in the document image analysis and identification system; the message communication terminal is used for establishing and completing message communication between each module and each terminal in the document image analysis and recognition system;
The user operation end is used by the user for system login, data uploading, task initiation, progress checking, result downloading, and account recharge and payment operations;
the flow control end is a processing center of the document image analysis and recognition system and is used for controlling a human-computer coupled document image analysis and recognition processing flow and storing key data in the document image analysis and recognition processing process;
wherein the machine terminal cluster comprises a plurality of machine engine terminals; the machine engine management end is used for managing and scheduling the machine engine terminals, determining operation steps by receiving information sent by the flow control end, and issuing operation tasks to corresponding execution terminals according to the current running states of the machine engine terminals; when the operation is completed, returning a corresponding message to the flow control end;
the manual terminal cluster comprises a plurality of manual annotation terminals; the manual annotation management end is used for managing and scheduling the manual annotation terminals, determining operation steps from the messages sent by the flow control end, and issuing operation tasks to the appropriate execution terminals according to the current running states of the manual annotation terminals; when an operation is completed, it returns a corresponding message to the flow control end.
In one embodiment, the human-computer coupled document image analysis and recognition processing flow includes that the flow control end receives a task initiation message of a user operation end through the message communication end, so that a document image analysis and recognition processing flow is started; the flow control end obtains the completed step of the current task, decides the next processing step and sends the next processing step to the machine engine management end or the manual annotation management end through the message communication end; after the processing flow is finished, the flow control end sends a message to the user operation end, so that the user operation end updates the current task completion state.
In one embodiment, the cluster of machine terminals includes a plurality of machine engine terminals; uniformly numbering all machine engine terminals in the system, and uniformly managing and allocating by a machine engine management end; the machine engine terminal runs a document image analysis and recognition model based on the deep neural network, determines model output required to be called according to the current document image analysis and recognition processing steps, and returns a processing result to the machine engine management terminal;
the manual terminal cluster comprises a plurality of manual annotation terminals, each corresponding to a human annotator; all annotators and their terminals are uniformly numbered and are uniformly managed and allocated by the manual annotation management end. An annotator checks and modifies the current annotation result at the manual annotation terminal; the terminal determines which operation page to call up according to the current processing step and returns the annotation result to the manual annotation management end.
In one embodiment, the deep neural network-based document image analysis and recognition model comprises an input layer, a feature extraction network, a multi-task prediction network and a multi-task output layer; the input layer is connected to the feature extraction network, the feature extraction network is connected to the multi-task prediction network, and the multi-task prediction network is connected to the multi-task output layer;
the input layer receives an input page image, which is a page image of the document currently being processed; the feature extraction network is a stacked multi-layer convolutional neural network; the multi-task prediction network consists of task-specific multi-layer prediction networks, one constructed for each prediction task; and the multi-task output layer outputs the results of the different prediction networks.
In one embodiment, the feature extraction network is a stacked multi-layer convolutional neural network in which each layer applies a nonlinear mapping to the output of the previous layer; through these successive nonlinear mappings the input page image is represented and described, and its representative features are extracted and output. The representative features of the page image obtained through the feature extraction network are shared features used jointly by the multiple prediction tasks, which comprise page segmentation, table analysis, text detection, and text recognition;
the multi-task prediction network comprises a page segmentation prediction network, a table analysis prediction network, a text detection prediction network, and a text recognition prediction network, which respectively realize the prediction tasks of page segmentation, table analysis, text detection, and text recognition. These prediction networks share their input features, i.e. they all use the representative features output by the feature extraction network; the structure of each prediction network is determined by its own prediction task;
the multi-task output layer comprises a page segmentation result output by the page segmentation prediction network, a table analysis result output by the table analysis prediction network, a text detection result output by the text detection prediction network and a text recognition result output by the text recognition prediction network.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
In each step of the disclosed document image analysis and recognition method, a machine makes an initial judgment and a human verifies it, and the final result of each step serves as the initial condition of the next, so the overall system attains both the efficiency of the machine and the accuracy of human annotators. In the disclosed system, the many human and machine nodes distributed across the network are organically integrated and coordinated through the flow control end, the data storage end, and the message communication end, and are ultimately offered to users as a distributed network service, giving users simple operation steps and reliable processing results. Through continuous iteration, the human-machine coupling also teaches the machine, gradually improving its performance and reducing the degree of human participation.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required for the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; other diagrams may be derived from them by a person skilled in the art without inventive effort;
wherein:
FIG. 1 is a schematic diagram of a document image analysis and recognition model based on a deep neural network in the present invention;
FIG. 2 is a schematic diagram of a system for analyzing and recognizing human-machine depth coupled distributed document images according to the present invention;
FIG. 3 is a flowchart of a document image analysis and recognition method according to the present invention.
Detailed Description
The following describes the embodiments of the present invention clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention; all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the invention.
In the invention, a document image analysis and recognition model based on a deep neural network is first constructed. The model has multi-task outputs and can simultaneously output the results of several different processing stages. To avoid the increase in model complexity and computation that multi-task outputs would otherwise cause, the model uses a feature-sharing scheme to improve operating efficiency;
the deep neural network comprises a full-connection layer, a convolution layer, a cyclic connection layer, a pooling layer and a normalization layer;
the full connection layer is used for realizing the integral transformation of the input characteristics by connecting each output node with all input nodes; the fully connected layer can be expressed as Fc (x; ch; o, g (∙)) = g (Wx+b), where x ε R; (as ch; i x 1) is the input feature vector, W ε R; (as ch; o x;) ch; i) is the weight, b ε R; (as per; ch; o x 1) is the bias, as per, # of channels as input features, as per ch;, as per channel number as output features, and g (∙) is the activation function; the types of the activation functions include five types, linear, sigmoid, tanh, reLU and SoftMax.
The convolution layer realizes a local transformation of the input features through shared local connections. It can be expressed as Conv(x; h_k, w_k, ch_i, ch_o, sx_k, sy_k, g(·)) = g(W∗x + b), where ∗ is the convolution operator, x ∈ R^(h_i×w_i×ch_i) is the input feature map, W ∈ R^(h_k×w_k×ch_o×ch_i) contains the convolution kernel weights, b ∈ R^(ch_o×1) is the bias, ch_i is the number of channels of the input features, ch_o is the number of channels of the output features, h_k and w_k are the kernel height and width, sx_k is the horizontal stride of the kernel, sy_k is the vertical stride of the kernel, and g(·) is the activation function.
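A naive single-channel NumPy sketch of the convolution layer (implemented, as is conventional in deep learning, as cross-correlation; multi-channel handling and padding are omitted for brevity):

```python
import numpy as np

def conv2d(x, W, b, sx=1, sy=1, g=np.tanh):
    """Sketch of Conv(x; h_k, w_k, ch_i, ch_o, sx_k, sy_k, g) for a single
    input and output channel: slide the kernel over the map with the given
    strides, take a weighted sum plus bias, then apply the activation."""
    h_k, w_k = W.shape
    h_o = (x.shape[0] - h_k) // sy + 1
    w_o = (x.shape[1] - w_k) // sx + 1
    out = np.zeros((h_o, w_o))
    for i in range(h_o):
        for j in range(w_o):
            patch = x[i * sy:i * sy + h_k, j * sx:j * sx + w_k]
            out[i, j] = np.sum(patch * W) + b
    return g(out)

# Demonstration: 2x2 averaging kernel over a 4x4 map, identity activation.
x = np.arange(16.0).reshape(4, 4)
W = np.ones((2, 2)) / 4.0
y = conv2d(x, W, b=0.0, g=lambda z: z)  # y[0, 0] == 2.5
```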
The recurrent connection layer feeds the output of the network back as part of its input, realizing feature extraction and transformation of sequential signals. It can be expressed as Rnn(x_t, h_(t-1); ch_i, ch_o, g(·)) = g(W x_t + U h_(t-1) + b), where x_t ∈ R^(ch_i×1) is the input feature vector at time t, h_(t-1) is the output feature vector at the previous time step, W ∈ R^(ch_o×ch_i) and U ∈ R^(ch_o×ch_o) are the mapping weights of the current input and the previous output respectively, b ∈ R^(ch_o×1) is the bias, ch_i is the number of channels of the input features, ch_o is the number of channels of the output features, and g(·) is the activation function.
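A minimal NumPy sketch of the recurrent layer, feeding each output back as the next step's state (the weights here are illustrative):

```python
import numpy as np

def rnn_step(x_t, h_prev, W, U, b, g=np.tanh):
    """One recurrent step: h_t = g(W x_t + U h_(t-1) + b)."""
    return g(W @ x_t + U @ h_prev + b)

def rnn_run(xs, W, U, b):
    """Process a sequence, feeding each output back as the next step's state."""
    h = np.zeros(W.shape[0])  # initial state
    for x_t in xs:
        h = rnn_step(x_t, h, W, U, b)
    return h

# Tiny demonstration with ch_i = ch_o = 2 and identity-like weights.
xs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
h = rnn_run(xs, np.eye(2), 0.5 * np.eye(2), np.zeros(2))
```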
The pooling layer comes in two types, the max pooling layer and the average pooling layer, and can be expressed as Pool(x; h_k, w_k, sx_k, sy_k). It aggregates local regions of the input features: the max pooling layer takes the maximum of each local region, and the average pooling layer takes its mean. Here h_k is the height of the pooled local region, w_k is its width, sx_k is the horizontal stride of the pooling window, and sy_k is the vertical stride.
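Both pooling variants in a single NumPy sketch (single channel, example values arbitrary):

```python
import numpy as np

def pool2d(x, h_k, w_k, sx, sy, mode="max"):
    """Pool(x; h_k, w_k, sx, sy): aggregate each local region of the input
    by its maximum ("max") or its mean ("avg")."""
    h_o = (x.shape[0] - h_k) // sy + 1
    w_o = (x.shape[1] - w_k) // sx + 1
    agg = np.max if mode == "max" else np.mean
    out = np.zeros((h_o, w_o))
    for i in range(h_o):
        for j in range(w_o):
            out[i, j] = agg(x[i * sy:i * sy + h_k, j * sx:j * sx + w_k])
    return out

# Demonstration: 2x2 windows with stride 2 over a 2x4 map.
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, 7.0, 8.0]])
y_max = pool2d(x, 2, 2, 2, 2, "max")  # -> [[6.0, 8.0]]
y_avg = pool2d(x, 2, 2, 2, 2, "avg")  # -> [[3.5, 5.5]]
```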
The normalization layer realizes a normalizing transformation of the input features, for example by mean-variance normalization or batch normalization; in particular, batch normalization may be used.
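A minimal NumPy sketch of batch normalization over a batch of feature vectors (gamma and beta are the learnable scale and shift, left here at their defaults; the running statistics used at inference time are omitted):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature to zero mean and unit variance over the batch,
    then apply a learnable scale (gamma) and shift (beta).

    x: (batch, features) array.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Demonstration: two features on very different scales become comparable.
batch = np.array([[1.0, 10.0],
                  [3.0, 30.0]])
normed = batch_norm(batch)
```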
As shown in fig. 1, the deep neural network-based document image analysis and recognition model includes an input layer 11, a feature extraction network 12, a multi-task prediction network 13, and a multi-task output layer 14; the input layer 11 is connected to the feature extraction network 12, the feature extraction network 12 is connected to the multi-tasking prediction network 13, and the multi-tasking prediction network 13 is connected to the multi-tasking output layer 14;
The input layer 11 receives an input page image, wherein the input page image is a page image in a document to be processed currently;
the feature extraction network 12 is a stacked multi-layer convolutional neural network in which each layer applies a nonlinear mapping to the output of the previous layer; through these successive nonlinear mappings an effective representation and description of the input page image is obtained, and the shared features are extracted and output;
in particular, the feature extraction network 12 employs a convolutional neural network comprising 13 convolutional layers and 1 pooling layer. The first layer of the feature extraction network is a convolutional layer with parameters Conv(5, 5, 1, 16, 1, ReLU), followed by a pooling layer with parameters Pool(3, 3, 2, 2); then 6 residual modules are connected, each adding a cross-layer connection directly from input to output on top of an ordinary multi-layer serial connection, and each residual module consists of 2 convolutional layers. The 2 convolutional layers of the first residual module have parameters Conv(3, 3, 16, 32, 2, ReLU) and Conv(3, 3, 32, 32, 1, ReLU), respectively; those of the second residual module are Conv(3, 3, 32, 64, 2, ReLU) and Conv(3, 3, 64, 64, 1, ReLU); those of the third residual module are Conv(3, 3, 64, 128, 2, ReLU) and Conv(3, 3, 128, 128, 1, ReLU); and those of the fourth residual module are Conv(3, 3, 128, 256, 2, ReLU) and Conv(3, 3, 256, 256, 1, ReLU). The shared feature is a representative feature of the page image acquired through the feature extraction network 12, and this representative feature is shared by multiple prediction tasks; the multiple prediction tasks comprise page segmentation, form analysis, text detection and text recognition;
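The residual-module idea — two convolutional layers plus a cross-layer connection from input directly to output — can be sketched as below. The plain-loop convolution, the strided shortcut, and the zero-padded channel doubling are illustrative choices for making the shapes match, not the patented implementation:

```python
import numpy as np

def conv3x3(x, w, stride=1):
    """Minimal 'same'-padded 3x3 convolution over a (C_in, H, W) feature map;
    w has shape (C_out, C_in, 3, 3). Plain loops, illustrative only."""
    c_in, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out_h, out_w = (H - 1) // stride + 1, (W - 1) // stride + 1
    out = np.zeros((w.shape[0], out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = xp[:, i * stride:i * stride + 3, j * stride:j * stride + 3]
            out[:, i, j] = np.tensordot(w, patch, axes=3)
    return out

def residual_module(x, w1, w2):
    """Two convolutional layers plus a cross-layer connection from input to
    output; the shortcut is downsampled and zero-padded to match shapes."""
    y = np.maximum(conv3x3(x, w1, stride=2), 0)   # e.g. Conv(3,3,16,32,2,ReLU)
    y = np.maximum(conv3x3(y, w2, stride=1), 0)   # e.g. Conv(3,3,32,32,1,ReLU)
    short = x[:, ::2, ::2]                        # spatial downsample
    short = np.concatenate([short, np.zeros_like(short)], axis=0)
    return y + short

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 32, 32))             # 16-channel input map
w1 = rng.standard_normal((32, 16, 3, 3)) * 0.05
w2 = rng.standard_normal((32, 32, 3, 3)) * 0.05
out = residual_module(x, w1, w2)
print(out.shape)  # (32, 16, 16)
```

Each module halves the spatial resolution (stride 2 in its first convolution) while doubling the channel count, matching the 16 → 32 → 64 → 128 → 256 progression above.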
Specifically, in order to realize a multi-scale feature description of the input image, the output features of different layers of the feature extraction network are scaled to a uniform size through image interpolation, and a convolutional layer with parameters Conv(1, 1, Ch_i, Ch_o, 1, ReLU) is added to each to convert the number of output channels to a specific size, where Ch_i is the number of channels of the input feature and Ch_o is the number of channels of the output feature. For the output features of the 6 residual modules in the feature extraction network, the output feature of each residual module is converted to Ch_o = 32 output channels, and the results are then concatenated along the channel dimension, finally yielding a shared feature with 6 × 32 = 192 output channels;
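The multi-scale fusion step above (interpolate to a uniform size, 1×1 convolution to 32 channels, concatenate to 192 channels) can be sketched as follows; the per-module resolutions and the channel counts of the last two residual modules are assumptions for illustration:

```python
import numpy as np

def resize_nearest(feat, out_h, out_w):
    """Nearest-neighbour interpolation of a (C, H, W) feature map."""
    C, H, W = feat.shape
    rows = np.arange(out_h) * H // out_h
    cols = np.arange(out_w) * W // out_w
    return feat[:, rows][:, :, cols]

def conv1x1(feat, weights):
    """1x1 convolution, i.e. Conv(1, 1, Ch_i, Ch_o, 1, ·) without the
    nonlinearity: a per-pixel linear map over the channel dimension."""
    return np.tensordot(weights, feat, axes=([1], [0]))  # -> (Ch_o, H, W)

rng = np.random.default_rng(1)
# Stand-ins for the outputs of 6 residual modules (shapes are assumptions).
feats = [rng.standard_normal((c, s, s))
         for c, s in [(32, 64), (64, 32), (128, 16), (256, 8), (256, 8), (256, 8)]]
# Scale all to a common 16x16 grid, map each to 32 channels, concatenate.
fused = np.concatenate(
    [conv1x1(resize_nearest(f, 16, 16),
             rng.standard_normal((32, f.shape[0])) * 0.01)
     for f in feats], axis=0)
print(fused.shape)  # (192, 16, 16)
```

The resulting 192-channel tensor is the shared feature that all four prediction networks consume.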
the multi-task prediction network 13 is a task-specific multi-layer prediction network respectively constructed for different tasks, and comprises a page segmentation prediction network, a form analysis prediction network, a text detection prediction network and a text recognition prediction network, which are respectively used for realizing the prediction tasks of page segmentation, form analysis, text detection and text recognition; the page segmentation prediction network, the form analysis prediction network, the text detection prediction network and the text recognition prediction network share the same input features, namely share the shared features output by the feature extraction network; determining different prediction network structures according to the respective characteristics of the tasks;
In particular, the multi-layer prediction network of the multi-tasking prediction network 13 shares the same input features.
Wherein the page segmentation prediction network comprises one convolutional layer Conv(3, 3, 192, 5, 1, SoftMax), where Ch_o = 5 indicates that the page area comprises 5 classes: background, text, image, table, and segmentation line;
the table analysis prediction network predicts the position and orientation of the table grid lines, and adopts one convolutional layer with structure Conv(3, 3, 192, 2, 1, Sigmoid), where Ch_o = 2 represents the 2 predicted values of the position and orientation of the table grid lines;
the text detection prediction network predicts the position and orientation information of text lines, and its convolutional structure can be expressed as Conv(3, 3, 192, 6, 1, Sigmoid), where Ch_o = 6 represents 6 predicted values: the probability score of a text line, the positions of the four borders (top, bottom, left, and right), and the overall orientation.
The text recognition prediction network first converts the features of the corresponding text line region into sequence features of a unified feature dimension through a spatial transformation network, according to the position and orientation information of the text lines; it then describes and characterizes the sequence relationship by adding a recurrent network Rnn(192, 256, Tanh), and finally obtains the prediction result by adding a convolutional layer with structure Conv(1, 1, 256, CharNum, 1, SoftMax), where Ch_o = CharNum is the number of character categories to be recognized;
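The text above leaves the sequence readout unspecified; a common choice for such per-frame SoftMax outputs is a CTC-style greedy decoder (argmax each frame, merge repeats, drop a blank symbol), sketched here with a toy character set and a blank at index 0 — both assumptions:

```python
import numpy as np

def greedy_decode(frame_probs, charset, blank=0):
    """Collapse per-frame SoftMax outputs of shape (T, CharNum) into a
    string: argmax each frame, merge consecutive repeats, drop the blank."""
    best = frame_probs.argmax(axis=1)
    out, prev = [], blank
    for idx in best:
        if idx != blank and idx != prev:
            out.append(charset[idx])
        prev = idx
    return "".join(out)

charset = ["<blank>", "c", "a", "t"]
# Hand-built frame scores spelling "c c a <blank> t" over 5 time steps.
probs = np.eye(4)[[1, 1, 2, 0, 3]]
print(greedy_decode(probs, charset))  # cat
```

The repeated "c" frames collapse to one character, which is what lets a per-frame classifier read variable-width glyphs.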
The multitasking output layer 14 outputs output results of different task prediction networks, namely the page segmentation prediction network outputs page segmentation results, the form analysis prediction network outputs form analysis results, the text detection prediction network outputs text detection results, and the text recognition prediction network outputs text recognition results; the output result is a final result or an intermediate result of the prediction task, and the intermediate result is subjected to post-processing to obtain a final result; there is a constraint relationship between the different output results.
During operation, the shared feature needs to be calculated only once and cached in the data storage. Under the condition of given sharing characteristics, different task prediction networks are relatively independent, the task prediction network to be operated is determined according to the task information of the flow control end, and a corresponding prediction result is obtained.
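The compute-once-and-cache behaviour can be sketched as follows; `SharedFeatureCache` and the stand-in extractor are illustrative names, not part of the patent:

```python
import numpy as np

class SharedFeatureCache:
    """Compute the shared feature of a page once, then reuse it for every
    prediction task (page segmentation, form analysis, detection, recognition)."""
    def __init__(self, extractor):
        self.extractor = extractor
        self.store = {}          # page_id -> cached shared feature
        self.computations = 0    # how many times the extractor actually ran

    def get(self, page_id, image):
        if page_id not in self.store:
            self.store[page_id] = self.extractor(image)
            self.computations += 1
        return self.store[page_id]

# Stand-in extractor: any function image -> feature tensor would do here.
cache = SharedFeatureCache(lambda img: img.mean(axis=0, keepdims=True))
page = np.ones((3, 8, 8))
for task in ["segmentation", "form", "detection", "recognition"]:
    feat = cache.get("page-001", page)   # four tasks, one extraction
print(cache.computations)  # 1
```

All four task networks read the same cached tensor, so the expensive feature extraction runs once per page rather than once per task.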
In order to sufficiently train the deep neural network-based document image analysis and recognition model, an automatic document image generation method based on program synthesis is adopted, which generates images together with the annotation information for the multiple corresponding outputs required for model training. The model trained in this way, and the machine engine running it, can quickly and completely analyze the input document image to be processed.
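A toy stand-in for such program-synthesis generation might look like the following: dark bars act as "text lines", and the generator emits the matching detection labels alongside the image. Real generation would render actual glyphs, tables, and images; every name and parameter here is illustrative:

```python
import numpy as np

def synthesize_page(height=200, width=150, n_lines=5, seed=0):
    """Render a toy page image and return it with its line bounding boxes,
    i.e. the annotation comes for free from the generating program."""
    rng = np.random.default_rng(seed)
    page = np.full((height, width), 255, dtype=np.uint8)   # white background
    boxes = []
    y = 10
    for _ in range(n_lines):
        h = int(rng.integers(8, 14))                       # line height
        w = int(rng.integers(60, width - 20))              # line width
        page[y:y + h, 10:10 + w] = 0                       # draw a "text line"
        boxes.append((10, y, 10 + w, y + h))               # (x0, y0, x1, y1)
        y += h + int(rng.integers(5, 12))                  # inter-line gap
    return page, boxes

page, boxes = synthesize_page()
print(page.shape, len(boxes))  # (200, 150) 5
```

Because the generator knows exactly what it drew, the labels are perfectly aligned with the pixels — the key advantage of synthetic training data for this model.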
In the invention, a distributed document image analysis and recognition system based on man-machine depth coupling is shown in fig. 2.
The document image analysis and recognition system comprises a user operation end 21, an interaction center 22, a flow control end 23, a machine engine management end 24, a manual annotation management end 25, a machine terminal cluster 26 and a manual terminal cluster 27;
the user operation end 21, the flow control end 23, the machine engine management end 24 and the manual annotation management end 25 are respectively connected to the interaction center 22; the machine engine management end 24 is connected with the machine terminal cluster 26; the manual annotation management end 25 is connected with the manual terminal cluster 27;
wherein, the interaction center 22 includes a data storage end 221 and a message communication end 222; the data storage 221 is configured to store data uploaded by a user, a processed result, and data required for interaction between different modules and terminals; the message communication end 222 is used for establishing and completing message communication between each module and terminal in the document image analysis and recognition system;
the user operation end 21 is used for performing operations of system login, data uploading, task initiation, progress checking, result downloading and recharging payment by a user;
The process control end 23 is a processing center of the document image analysis and recognition system, and is used for controlling a human-computer coupled document image analysis and recognition processing process and storing key data in a processing process;
specifically, the process control end 23 receives the task initiation message of the user operation end 21 through the message communication end 222, so as to start a document image analysis and recognition processing process; the flow control end 23 obtains the completion step of the current task and decides the next processing step, and sends the next processing step to the machine engine management end 24 or the manual annotation management end 25 through the message communication end 222; when the whole processing flow is completed, the flow control end 23 sends a message to the user operation end 21 to enable the user operation end 21 to update the current task completion state;
wherein the cluster of machine terminals 26 includes a plurality of machine engine terminals; the machine engine terminals are uniformly numbered, and are uniformly managed and allocated by the machine engine management end 24; each machine engine terminal runs the deep neural network-based document image analysis and recognition model, determines the model output to be invoked according to the current processing step, and returns the processing result to the machine engine management end;
The machine engine management end 24 is used for managing and scheduling the machine engines; it determines the next operation by receiving messages sent by the flow control end 23, and issues operation tasks to the corresponding execution terminals according to the current running states of the machine engine terminals; when an operation is completed, a corresponding message is returned to the flow control end 23;
wherein the artificial terminal cluster 27 comprises a plurality of manual labeling terminals; the manual labeling terminals correspond to manual labeling operators, all manual labeling operators and the corresponding manual labeling terminals are numbered uniformly, and the manual labeling management end 25 performs uniform management and allocation; the manual labeling operator checks and modifies the current labeling result in the manual labeling terminal, determines the operation page to be invoked according to the current processing step, and returns the labeling result to the manual labeling management end 25;
the manual labeling management end 25 is used for managing and scheduling the manual labeling terminals; it determines the next operation by receiving messages sent by the flow control end 23, and issues operation tasks to the corresponding execution terminals according to the current running states of the manual labeling terminals; when an operation is completed, a corresponding message is returned to the flow control end 23.
As shown in fig. 3, in order to improve the stability of the system, the invention further provides a human-machine deeply coupled document image analysis and recognition method, so that the system combines the efficiency of machines with the accuracy of manual work; the document image analysis and recognition method comprises the following steps:
step 1, a message communication terminal receives a task initiation message of a user operation terminal, so that a document image analysis and recognition processing task is started;
step 2, obtaining a document image to be processed, inputting the document image to be processed into the document image analysis and recognition system, and obtaining basic information of the document to be processed;
specifically, acquiring an image of a document to be processed comprises acquiring a page image of the document to be processed in a scanning or photographing mode, and recording basic information of the document to be processed, wherein the basic information comprises a name, an author, a publishing mechanism and a publishing date;
step 3, page segmentation is performed on the document to be processed; segmentation tasks for all page images are generated simultaneously in the form of a message queue and sent to the machine engine; the initial results produced by the machine engine are transferred to the manual labeling system, and the manually corrected results are returned to the flow control end to update the page segmentation results;
step 4, after page segmentation is completed, the initial information for form analysis is obtained; all form analysis tasks are generated simultaneously by the message queue method, and the final results, after machine engine preprocessing and manual verification in sequence, are returned to update the form analysis results;
step 5, after page segmentation and form analysis are completed, the initial information for text detection is obtained; all text detection tasks are generated in the form of a message queue, and the final results, after machine engine preprocessing and manual verification in sequence, are returned to update the text detection results;
step 6, after text detection is completed, the initial information for text recognition is obtained; all text recognition tasks are generated in the form of a message queue, and the final results, after machine engine preprocessing and manual verification in sequence, are returned to update the text recognition results;
and 7, when all processing tasks of the document image to be processed are completed, integrating the labeling results of different layers, and exporting the electronic document file in a specific format.
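Steps 3 through 6 share one pattern: queue all tasks of the stage, let the machine engine produce an initial result, have a human annotator correct it, and feed the stage's output into the next stage. A minimal sketch of that loop (all names and the string-tagging results are illustrative):

```python
from queue import Queue

def run_stage(tasks, machine_fn, human_fn):
    """One human-machine coupled stage: every task is queued, the machine
    engine produces an initial result, a human corrects it, and the
    corrected result is returned to the flow controller."""
    q = Queue()
    for t in tasks:
        q.put(t)
    results = {}
    while not q.empty():
        task = q.get()
        initial = machine_fn(task)          # machine pre-processing
        results[task] = human_fn(initial)   # manual verification/correction
    return results

# Toy four-stage flow over two pages; each stage's output seeds the next.
pages = ["page-1", "page-2"]
state = {p: p for p in pages}
for stage in ["segmentation", "form", "detection", "recognition"]:
    state = run_stage(pages,
                      machine_fn=lambda p, s=stage: f"{state[p]}>{s}",
                      human_fn=lambda r: r + "*")   # "*" marks human sign-off
print(state["page-1"])  # page-1>segmentation*>form*>detection*>recognition*
```

The trailing "*" after every stage name mirrors the rule that only a manually verified result is allowed to seed the next processing step.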
The process control end receives a task initiating message of a user operation end through the message communication end, so that a document image analysis and identification process flow is started; the flow control end obtains the completed step of the current task and decides the next processing step, and sends the next processing step to the machine engine management end 24 or the manual annotation management end 25 through the message communication end; after the processing flow is completed, the flow control end sends a message to the user operation end 21, so that the user operation end 21 updates the current task completion state.
Wherein the cluster of machine terminals 26 includes a plurality of machine engine terminals; the machine engine terminals in the system are uniformly numbered, and are uniformly managed and allocated by the machine engine management end 24; the machine engine terminal runs a document image analysis and recognition model based on the deep neural network, determines model output required to be called according to the current document image analysis and recognition processing steps, and returns a processing result to the machine engine management end 24;
wherein the artificial terminal cluster 27 comprises a plurality of artificial labeling terminals, the artificial labeling terminals correspond to artificial labeling operators, all the artificial labeling operators and the corresponding artificial labeling terminals are uniformly numbered, and the artificial labeling management terminal 25 uniformly manages and distributes the artificial labeling operators and the corresponding artificial labeling terminals; the manual labeling operator checks and modifies the current labeling result in the manual labeling terminal, determines an operation page required to be called by the manual labeling terminal according to the current processing step, and returns the labeling result to the manual labeling management end 25.
The document image analysis and recognition model based on the deep neural network comprises an input layer 11, a feature extraction network 12, a multi-task prediction network 13 and a multi-task output layer 14 as shown in fig. 1; the input layer 11 is connected to the feature extraction network 12, the feature extraction network 12 is connected to the multi-tasking prediction network 13, and the multi-tasking prediction network 13 is connected to the multi-tasking output layer 14;
The input layer 11 receives an input page image, wherein the input page image is a page image in a document to be processed currently; the characteristic extraction network is a stacked multilayer convolutional neural network; the multi-task prediction network is a multi-layer prediction network exclusive to corresponding tasks respectively constructed for different prediction tasks; the multi-task output layer outputs output results of different prediction networks.
The feature extraction network 12 is a stacked multi-layer convolutional neural network, in which each layer is a nonlinear mapping of the output of the previous layer; the input page image is represented and described through multiple nonlinear mappings, and the representative features are extracted and output; the representative feature of the page image obtained through the feature extraction network is a shared feature that is shared by multiple prediction tasks; the multiple prediction tasks comprise page segmentation, form parsing, text detection and text recognition.
The multi-task prediction network 13 includes a page segmentation prediction network, a form analysis prediction network, a text detection prediction network, and a text recognition prediction network, which are respectively used for realizing the prediction tasks of page segmentation, form analysis, text detection, and text recognition; the page segmentation prediction network, the form analysis prediction network, the text detection prediction network and the text recognition prediction network share input features, namely different prediction networks share the representation features output by the feature extraction network; the multi-task prediction network 13 determines its different prediction network structures according to different prediction tasks, respectively.
The multitasking output layer 14 includes a page segmentation result output by the page segmentation prediction network, a form analysis result output by the form analysis prediction network, a text detection result output by the text detection prediction network, and a text recognition result output by the text recognition prediction network.
In the document image analysis and recognition processing method of human-machine deep coupling, the human-machine deep coupling refers to the process of machine initial judgment and manual verification for each step in document image analysis and recognition processing, and the final processing result of each step is taken as the initial condition of the next step.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
the invention adopts a deep neural network model based on features shared across multiple tasks as the machine engine of the document image analysis and recognition system; the multiple outputs enable the model to provide analysis and recognition results at every level of the document image, while the feature sharing method allows the whole model to maintain high operating efficiency. In actual operation, the shared feature of a page image is cached in the system, and every task executed at each level on that page image can directly load the cached feature without repeated calculation.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A document image analysis and recognition method, comprising:
step 1, a message communication end of a document image analysis and identification system receives a task initiation message sent by a user operation end, and the document image analysis and identification system starts a document image analysis and identification processing task;
step 2, obtaining a document image to be processed, inputting the document image to be processed into the document image analysis and recognition system, and obtaining basic information of the document to be processed;
step 3, page segmentation is carried out on the document image to be processed, segmentation tasks of all page images in the document image to be processed are generated simultaneously in a message queue mode, the segmentation tasks are sent to a machine engine terminal for executing the tasks through a machine engine management terminal, an initial page segmentation result obtained after preprocessing of the machine engine terminal is forwarded to a manual labeling terminal, and a final page segmentation result after manual correction of a manual labeling person is returned to a flow control terminal of the document image analysis and recognition system for updating the page segmentation result;
Step 4, obtaining initial information of the form analysis processing after the page segmentation processing is completed, simultaneously generating all form analysis tasks of the document image to be processed by adopting a message queue method, sending the form analysis tasks to a machine engine terminal for executing the tasks through a machine engine management end, forwarding an initial form analysis result obtained after the preprocessing of the machine engine terminal to a manual labeling terminal, and returning a final form analysis result after manual correction of a manual labeling person to a flow control end of the document image analysis and recognition system for updating the form analysis result;
step 5, obtaining initial information of text detection after the page segmentation processing and the form analysis processing are completed, simultaneously generating all text detection tasks of a document image to be processed in a message queue mode, sending the text detection tasks to a machine engine terminal for executing the tasks through a machine engine management end, forwarding an initial text detection result obtained after preprocessing of the machine engine terminal to a manual labeling terminal, and returning a final text detection result subjected to manual correction by a manual labeling person to a flow control end of the document image analysis and recognition system for updating the text detection result;
Step 6, obtaining initial information of text recognition after the text detection is completed, generating all text recognition tasks of a document image to be processed in a message queue mode, sending the text recognition tasks to a machine engine terminal for executing the tasks through a machine engine management end, forwarding an initial text recognition result obtained after preprocessing of the machine engine terminal to a manual labeling terminal, and returning a final text recognition result subjected to manual correction by a manual labeling person to a flow control end of the document image analysis and recognition system for updating the text recognition result;
and 7, integrating labeling results of different layers by the document image analysis and recognition system and leading out an electronic document file when the tasks of page segmentation, form analysis, text detection and text recognition of the document image to be processed are completed.
2. The document image analysis and recognition method according to claim 1, wherein,
the page image of the document to be processed is acquired by scanning or photographing, and basic information of the document to be processed is recorded, wherein the basic information comprises the name, author, publishing organization, and publication date.
3. The document image analysis and recognition method according to claim 1, wherein,
the machine engine terminal runs a document image analysis and identification model based on the deep neural network, determines model output required to be invoked according to the current document image analysis and identification processing steps, and returns a processing result to the machine engine management terminal.
4. The document image analysis and recognition method according to claim 3, wherein,
the document image analysis and recognition model based on the deep neural network comprises an input layer, a feature extraction network, a multi-task prediction network and a multi-task output layer; the input layer is connected to the feature extraction network, the feature extraction network is connected to the multi-task prediction network, and the multi-task prediction network is connected to the multi-task output layer;
the input layer receives an input page image, wherein the input page image is a page image in a document to be processed currently; the characteristic extraction network is a stacked multilayer convolutional neural network; the multi-task prediction network is a multi-layer prediction network exclusive to corresponding tasks respectively constructed for different prediction tasks; the multi-task output layer outputs output results of different prediction networks.
5. The document image analysis and recognition method according to claim 4, wherein,
the feature extraction network is a stacked multi-layer convolutional neural network, in which each layer of the convolutional neural network is a nonlinear mapping of the output of the previous layer; the input page image is represented and described through multiple nonlinear mappings, and the representative features are extracted and output; the representative feature of the page image obtained through the feature extraction network is a shared feature which is shared by multiple prediction tasks; the multiple prediction tasks comprise page segmentation, form analysis, text detection and text recognition;
the multi-task prediction network comprises a page segmentation prediction network, a form analysis prediction network, a text detection prediction network and a text recognition prediction network, which are respectively used for realizing the prediction tasks of page segmentation, form analysis, text detection and text recognition; the page segmentation prediction network, the form analysis prediction network, the text detection prediction network and the text recognition prediction network share input features, namely different prediction networks share the representation features output by the feature extraction network; the multi-task prediction network determines different prediction network structures according to different prediction tasks respectively;
The multi-task output layer comprises a page segmentation result output by the page segmentation prediction network, a table analysis result output by the table analysis prediction network, a text detection result output by the text detection prediction network and a text recognition result output by the text recognition prediction network.
6. The document image analysis and recognition system is characterized by comprising a user operation end, an interaction center, a flow control end, a machine engine management end, a manual annotation management end, a machine terminal cluster and a manual terminal cluster;
the user operation end, the flow control end, the machine engine management end and the manual annotation management end are respectively connected to the interaction center; the machine engine management end is connected with the machine terminal cluster; the manual annotation management end is connected with the manual terminal cluster;
the interaction center comprises a data storage end and a message communication end; the data storage end is used for storing data uploaded by a user, results obtained after analysis and identification processing of the document image and data required by interaction between different modules and terminals in the document image analysis and identification system; the message communication terminal is used for establishing and completing message communication between each module and each terminal in the document image analysis and recognition system;
The user operation terminal is used for performing operations of system login, data uploading, task initiation, progress checking, result downloading, recharging and payment by a user;
the flow control end is a processing center of the document image analysis and recognition system and is used for controlling a human-computer coupled document image analysis and recognition processing flow and storing key data in the document image analysis and recognition processing process;
wherein the machine terminal cluster comprises a plurality of machine engine terminals; the machine engine management end is used for managing and scheduling the machine engine terminals, determining operation steps by receiving information sent by the flow control end, and issuing operation tasks to corresponding execution terminals according to the current running states of the machine engine terminals; when the operation is completed, returning a corresponding message to the flow control end;
the artificial terminal cluster comprises a plurality of artificial labeling terminals; the manual annotation management end is used for managing and scheduling the manual annotation terminals, determining operation steps by receiving the information sent by the flow control end, and issuing operation tasks to corresponding execution terminals according to the running states of the current manual annotation terminals; and after the operation is finished, returning a corresponding message to the flow control end.
7. The document image analysis and recognition system of claim 6, wherein,
the human-computer coupled document image analysis and recognition processing flow comprises that the flow control end receives a task initiating message of a user operation end through the message communication end, so that a document image analysis and recognition processing flow is started; the flow control end obtains the completed step of the current task, decides the next processing step and sends the next processing step to the machine engine management end or the manual annotation management end through the message communication end; after the processing flow is finished, the flow control end sends a message to the user operation end, so that the user operation end updates the current task completion state.
8. The document image analysis and recognition system of claim 6, wherein,
wherein the machine terminal cluster comprises a plurality of machine engine terminals; uniformly numbering all machine engine terminals in the system, and uniformly managing and allocating by a machine engine management end; the machine engine terminal runs a document image analysis and recognition model based on the deep neural network, determines model output required to be called according to the current document image analysis and recognition processing steps, and returns a processing result to the machine engine management terminal;
The manual terminal cluster comprises a plurality of manual labeling terminals, wherein the manual labeling terminals correspond to manual labeling operators, all the manual labeling operators and the corresponding manual labeling terminals are uniformly numbered, and the manual labeling management terminals uniformly manage and allocate the manual labeling operators and the corresponding manual labeling terminals; and the manual annotation staff checks and modifies the current annotation result in the manual annotation terminal, determines an operation page required to be called by the manual annotation terminal according to the current processing step, and returns the annotation result to the manual annotation management terminal.
9. The document image analysis and recognition system of claim 8, wherein,
the document image analysis and recognition model based on the deep neural network comprises an input layer, a feature extraction network, a multi-task prediction network and a multi-task output layer; the input layer is connected to the feature extraction network, the feature extraction network is connected to the multi-task prediction network, and the multi-task prediction network is connected to the multi-task output layer;
the input layer receives an input page image, which is a page image of the document currently being processed; the feature extraction network is a stacked multi-layer convolutional neural network; the multi-task prediction network comprises task-specific multi-layer prediction networks constructed separately for the different prediction tasks; and the multi-task output layer outputs the results of the different prediction networks.
10. The document image analysis and recognition system of claim 9, wherein,
the feature extraction network is a stacked multi-layer convolutional neural network in which each convolutional layer applies a nonlinear mapping to the output of the previous layer; through these repeated nonlinear mappings the input page image is represented and described, and representative features are extracted and output; the representative features of the page image obtained through the feature extraction network are shared features common to the multiple prediction tasks; the multiple prediction tasks comprise page segmentation, table analysis, text detection and text recognition;
the multi-task prediction network comprises a page segmentation prediction network, a table analysis prediction network, a text detection prediction network and a text recognition prediction network, which respectively implement the prediction tasks of page segmentation, table analysis, text detection and text recognition; these prediction networks share input features, that is, the different prediction networks share the representative features output by the feature extraction network; and the multi-task prediction network determines a different prediction network structure for each prediction task;
the multi-task output layer outputs the page segmentation result from the page segmentation prediction network, the table analysis result from the table analysis prediction network, the text detection result from the text detection prediction network, and the text recognition result from the text recognition prediction network.
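The shared-backbone, multi-head architecture of claims 9 and 10 can be sketched with plain matrix operations standing in for the convolutional layers. All layer sizes, the ReLU nonlinearity, and the dense (rather than convolutional) mappings are illustrative assumptions; the patent fixes only the structure, not concrete dimensions.

```python
# Minimal sketch: a shared feature extractor (stacked nonlinear mappings,
# standing in for the convolutional layers) feeding four task-specific
# prediction heads that all consume the same representative features.
import numpy as np

rng = np.random.default_rng(0)

def feature_extractor(image, weights):
    # Each "layer" applies a nonlinear mapping to the previous layer's
    # output, mirroring the stacked convolutional network of claim 10.
    h = image
    for w in weights:
        h = np.maximum(h @ w, 0.0)  # linear map + ReLU nonlinearity
    return h

# Shared backbone weights and four task-specific head weights.
backbone = [rng.standard_normal((64, 32)), rng.standard_normal((32, 16))]
heads = {task: rng.standard_normal((16, 4))
         for task in ("page_segmentation", "table_analysis",
                      "text_detection", "text_recognition")}

page = rng.standard_normal((1, 64))         # stand-in for a page image
shared = feature_extractor(page, backbone)  # shared representative features
outputs = {task: shared @ w for task, w in heads.items()}
print(sorted(outputs))  # all four prediction tasks read one shared feature
```

The point of the sketch is the sharing: the backbone is evaluated once per page, and each head only adds its own task-specific layers on top.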
CN201911143272.6A 2019-11-20 2019-11-20 Document Image Analysis and Recognition Method and System Active CN110991279B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911143272.6A CN110991279B (en) 2019-11-20 2019-11-20 Document Image Analysis and Recognition Method and System


Publications (2)

Publication Number Publication Date
CN110991279A CN110991279A (en) 2020-04-10
CN110991279B true CN110991279B (en) 2023-08-22

Family

ID=70085461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911143272.6A Active CN110991279B (en) 2019-11-20 2019-11-20 Document Image Analysis and Recognition Method and System

Country Status (1)

Country Link
CN (1) CN110991279B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898411B (en) * 2020-06-16 2021-08-31 华南理工大学 Text image annotation system, method, computer device and storage medium
CN114677696A (en) * 2020-12-09 2022-06-28 中铁高新工业股份有限公司 A method for identifying quota data of bridge steel structure
CN113936271A (en) * 2021-10-18 2022-01-14 北京有竹居网络技术有限公司 Text recognition method, device, readable medium and electronic device
CN114881992B (en) * 2022-05-24 2023-04-07 北京安德医智科技有限公司 Skull fracture detection method and device and storage medium
CN118135578B (en) * 2024-05-10 2024-09-24 沈阳出版社有限公司 Text learning and proofreading system based on text recognition

Citations (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04124785A (en) * 1990-09-17 1992-04-24 Hitachi Ltd Confirmation and correction method for ocr recognition result
GB0031596D0 (en) * 2000-12-22 2001-02-07 Barbara Justin S A system and method for improving accuracy of signal interpretation
JP2002024758A (en) * 2000-07-07 2002-01-25 Hitachi Ltd Method for deciding input data in form handling device
GB0318214D0 (en) * 2002-08-23 2003-09-03 Hewlett Packard Development Co Systems and methods for processing text-based electronic documents
AU2005201754A1 (en) * 2005-04-27 2006-11-16 Canon Kabushiki Kaisha Method of extracting data from documents
CN101013440A (en) * 2007-01-12 2007-08-08 王宏源 Method for constructing digital library based on book knowledge element
CN101118597A (en) * 2006-07-31 2008-02-06 富士通株式会社 Account form processing method, account form processing device and computer product
CN101441713A (en) * 2007-11-19 2009-05-27 汉王科技股份有限公司 Optical character recognition method and apparatus of PDF document
CN101542504A (en) * 2006-09-08 2009-09-23 谷歌公司 Shape clustering in post-optical character recognition processing
CN101539929A (en) * 2009-04-17 2009-09-23 无锡天脉聚源传媒科技有限公司 Method for indexing TV news by utilizing computer system
CN102289667A (en) * 2010-05-17 2011-12-21 微软公司 User correction of errors arising in a textual document undergoing optical character recognition (OCR) process
CN102663138A (en) * 2012-05-03 2012-09-12 北京大学 Method and device for inputting formula query terms
CN104123550A (en) * 2013-04-25 2014-10-29 魏昊 Cloud computing-based text scanning identification method
CN105159870A (en) * 2015-06-26 2015-12-16 徐信 System and method for accurately converting continuous natural speech to text
CN107369440A (en) * 2017-08-02 2017-11-21 北京灵伴未来科技有限公司 Training method and device for a speaker recognition model for short speech
CN107943937A (en) * 2017-11-23 2018-04-20 杭州源诚科技有限公司 Debtor asset monitoring method and system based on analysis of public trial information
CN108170658A (en) * 2018-01-12 2018-06-15 山西同方知网数字出版技术有限公司 Flexibly configurable and flexibly defined adaptive text recognition review system
CN108537146A (en) * 2018-03-22 2018-09-14 五邑大学 Text line extraction system for mixed printed and handwritten text
CN109165293A (en) * 2018-08-08 2019-01-08 上海宝尊电子商务有限公司 Expert data annotation method and program for the fashion field
CN109255113A (en) * 2018-09-04 2019-01-22 郑州信大壹密科技有限公司 Intelligent review system
CN109543614A (en) * 2018-11-22 2019-03-29 厦门商集网络科技有限责任公司 Full-text difference comparison method and device
CN109685052A (en) * 2018-12-06 2019-04-26 泰康保险集团股份有限公司 Text image processing method, device, electronic equipment and computer-readable medium
CN109840519A (en) * 2019-01-25 2019-06-04 青岛盈智科技有限公司 Adaptive intelligent form recognition and input device and method of use
CN109934227A (en) * 2019-03-12 2019-06-25 上海兑观信息科技技术有限公司 System and method for recognizing characters in images
CN110298032A (en) * 2019-05-29 2019-10-01 西南电子技术研究所(中国电子科技集团公司第十研究所) Text classification corpus labeling and training system
CN110321875A (en) * 2019-07-19 2019-10-11 东莞理工学院 Resume recognition and intelligent classification screening system based on deep learning
CN110378332A (en) * 2019-06-14 2019-10-25 上海咪啰信息科技有限公司 Container terminal container-number and train-number recognition method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7697757B2 (en) * 2005-06-15 2010-04-13 Hewlett-Packard Development Company, L.P. Computer assisted document modification
US20070033118A1 (en) * 2005-08-02 2007-02-08 Taxscan Technologies, Llc Document Scanning and Data Derivation Architecture.
GB0623236D0 (en) * 2006-11-22 2007-01-03 Ibm An apparatus and a method for correcting erroneous image identifications generated by an ocr device
US8311973B1 (en) * 2011-09-24 2012-11-13 Zadeh Lotfi A Methods and systems for applications for Z-numbers


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张文国. Successful development of an OCR digitization processing system, providing advanced technical means for the digitization of books, archives and documents. 电子出版 (Electronic Publishing). 2001, (04), full text. *


Similar Documents

Publication Publication Date Title
CN110991279B (en) Document Image Analysis and Recognition Method and System
CN114092707B (en) Image-text visual question answering method, system and storage medium
JP2022056316A (en) Character structuring extraction method and device, electronic apparatus, storage medium, and computer program
CN113177124A (en) Vertical domain knowledge graph construction method and system
CN115438215B (en) Image-text bidirectional search and matching model training method, device, equipment and medium
CN114743014B (en) Laser point cloud feature extraction method and device based on multi-head self-attention
CN118155214B (en) Prompt learning method, image classification method and related devices
WO2024098524A1 (en) Text and video cross-searching method and apparatus, model training method and apparatus, device, and medium
CN114694158A (en) Extraction method of structured information of bill and electronic equipment
CN118298215A (en) Multi-mode prompt learning-based multi-label image classification method and system
CN115374765B (en) Computing power network 5G data analysis system and method based on natural language processing
CN116974626B (en) Analysis sequence chart generation method, device, equipment and computer readable storage medium
CN114896067A (en) Automatic generation method and device of task request information, computer equipment and medium
CN118351307A (en) Multi-domain attention-enhanced three-dimensional point cloud semantic segmentation method and device
US12354194B2 (en) Neural image compositing with layout transformers
CN115620082A (en) Model training method, head pose estimation method, electronic device and storage medium
CN117634459B (en) Target content generation and model training method, device, system, equipment and medium
CN119226478A (en) Data processing method and device, storage medium and electronic device
CN118038025B (en) Method, device and equipment for foggy target detection based on frequency domain and spatial domain
CN118690250A (en) A method for knob switch state recognition based on multimodal model
CN111768214B (en) Product attribute prediction method, system, device and storage medium
CN118944282A (en) A regional remote intelligent inspection method and system for substations with "storage-computation-iteration-dispatching" collaboration
CN118334663A (en) One-stop artificial intelligence image processing model construction method and device
CN118247686A (en) Training method, recognition method, system, equipment and medium for image detection model
CN114610807A (en) Data import template configuration method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant