
CN113763313B - Text image quality detection method, device, medium and electronic equipment - Google Patents

Text image quality detection method, device, medium and electronic equipment

Info

Publication number
CN113763313B
Authority
CN
China
Prior art keywords
text
text image
image
scale
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110484548.8A
Other languages
Chinese (zh)
Other versions
CN113763313A (en)
Inventor
黄鹏程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110484548.8A
Publication of CN113763313A
Application granted
Publication of CN113763313B


Classifications

    • G PHYSICS
        • G06 COMPUTING OR CALCULATING; COUNTING
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T 7/00 Image analysis
                    • G06T 7/0002 Inspection of images, e.g. flaw detection
                • G06T 3/00 Geometric image transformations in the plane of the image
                    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
                        • G06T 3/4046 Scaling of whole images or parts thereof using neural networks
                • G06T 2207/00 Indexing scheme for image analysis or image enhancement
                    • G06T 2207/20 Special algorithmic details
                        • G06T 2207/20081 Training; Learning
                        • G06T 2207/20084 Artificial neural networks [ANN]
                    • G06T 2207/30 Subject of image; Context of image processing
                        • G06T 2207/30168 Image quality inspection
                        • G06T 2207/30176 Document
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F 18/00 Pattern recognition
                    • G06F 18/20 Analysing
                        • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                            • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N 3/00 Computing arrangements based on biological models
                    • G06N 3/02 Neural networks
                        • G06N 3/04 Architecture, e.g. interconnection topology
                            • G06N 3/045 Combinations of networks
                        • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract


The present application belongs to the field of computer technology, and specifically relates to a text image quality detection method, device, computer-readable medium, and electronic device. The method comprises: detecting the character scale of a text image; when the character scale of the text image is less than a first preset scale, enlarging the text image, and when the character scale of the text image is greater than a second preset scale, reducing the text image; inputting the text image into a first neural network to detect one or more text regions in the text image; predicting a quality score for each text region; and performing weighted summation on the quality scores of the text regions based on preset weights to obtain the quality score of the text image. The embodiments of the present application can improve the accuracy of the predicted quality scores of text regions and of the text image as a whole.

Description

Text image quality detection method, device, medium and electronic equipment
Technical Field
The application belongs to the technical field of computers, and particularly relates to a text image quality detection method and device, a computer readable medium and electronic equipment.
Background
As OCR (Optical Character Recognition) technology is applied more and more widely, the quality of acquired text images is receiving more attention, and text image quality evaluation methods are attracting broader interest in both academia and industry.
Most existing image quality evaluation methods target natural scene images and are not suitable for evaluating text image quality, so an evaluation method specifically for text image quality is needed.
It should be noted that the information disclosed in the background section above is only intended to enhance understanding of the background of the application, and may therefore include information that does not constitute prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
The application aims to provide a text image quality detection method and device, a computer-readable medium, and an electronic device, which at least to some extent solve technical problems in the related art such as inaccurate text line recognition.
Other features and advantages of the application will be apparent from the following detailed description, or may be learned by the practice of the application.
According to an aspect of an embodiment of the present application, there is provided a quality detection method of a text image, including:
Detecting the character scale of the text image, wherein the character scale of the text image is the average scale of characters of the text image;
When the character scale of the text image is smaller than a first preset scale, amplifying the text image to enable the character scale of the text image to be in a preset scale range, and when the character scale of the text image is larger than a second preset scale, reducing the text image to enable the character scale of the text image to be in a preset scale range, wherein the preset scale range is a scale range which is larger than the first preset scale and smaller than the second preset scale;
Inputting the text image into a first neural network, and performing feature extraction and mapping processing on the text image through the first neural network to detect one or more text areas in the text image, wherein a sensing window is configured in the first neural network, the sensing window is in a preset scale range, and the sensing window is used for moving on the text image to perform feature extraction on the text image;
predicting the quality score of a text region in the text image to obtain the quality score of the text region;
And acquiring preset weights corresponding to the text areas, and carrying out weighted summation on the quality scores of the text areas based on the preset weights to obtain the quality scores of the text images.
According to an aspect of an embodiment of the present application, there is provided a quality detection apparatus of a text image, the quality detection apparatus including:
a character scale detection module configured to detect a character scale of the text image, the character scale of the text image being an average scale of characters of the text image;
The scaling module is configured to amplify the text image so that the character scale of the text image is in a preset scale range when the character scale of the text image is smaller than a first preset scale, and to reduce the text image so that the character scale of the text image is in a preset scale range when the character scale of the text image is larger than a second preset scale, wherein the preset scale range is a scale range which is larger than the first preset scale and smaller than the second preset scale;
The text region detection module is configured to input the text image into a first neural network, and perform feature extraction and mapping processing on the text image through the first neural network so as to detect one or more text regions in the text image, wherein a sensing window is configured in the first neural network, the sensing window is in a preset scale range, and the sensing window is used for moving on the text image so as to perform feature extraction on the text image;
the quality score prediction module is configured to predict the quality score of a text region in the text image to obtain the quality score of the text region;
And the weighted summation module is configured to acquire preset weights corresponding to the text areas, and perform weighted summation processing on the quality scores of the text areas based on the preset weights to obtain the quality scores of the text images.
In some embodiments of the present application, based on the above technical solution, the weighted summation module includes:
A word number obtaining unit configured to obtain a word number of each text region, and perform a summation operation on the word number of each text region to obtain a total word number of the text image;
And the preset weight calculation unit is configured to acquire the ratio of the word number of each text region to the total word number of the text image, and take the ratio as the preset weight corresponding to each text region.
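As a minimal sketch of the two units above, each text region's preset weight can be computed as the ratio of its word count to the image's total word count, and the image score as the corresponding weighted sum (Python; the function and variable names are illustrative, not from the patent):

```python
def image_quality_score(region_scores, region_word_counts):
    """Weighted sum of per-region quality scores, where each
    region's preset weight is its word count divided by the total
    word count of the text image (so the weights sum to 1)."""
    total_words = sum(region_word_counts)
    weights = [n / total_words for n in region_word_counts]
    return sum(w * s for w, s in zip(weights, region_scores))

# Two text lines: 8 words scored 0.9 and 2 words scored 0.5
# give an image score of 0.8 * 0.9 + 0.2 * 0.5 = 0.82.
score = image_quality_score([0.9, 0.5], [8, 2])
```

A region containing more words thus contributes proportionally more to the final image score.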
In some embodiments of the present application, based on the above technical solution, the text region is a text line, the text line comprising a continuous image region of one or more serially arranged characters in the text image, and the quality score prediction module includes:
A length-to-height ratio detection unit configured to detect a text line in the text image and a length-to-height ratio of the text line, the length of the text line being a length of an extension line of the text line extending along a character arrangement direction in the text line, the height of the text line being a height of the text line perpendicular to the extension line;
a text line dividing unit configured to divide a text line having a length-to-height ratio greater than a preset value into a plurality of text lines having a length-to-height ratio less than or equal to the preset value;
And the quality score prediction unit is configured to perform feature extraction and quality score prediction on the text lines with the length-height ratios smaller than or equal to a preset value respectively to obtain the quality scores of the text lines with the length-height ratios smaller than or equal to the preset value.
In some embodiments of the present application, based on the above technical solution, the text line segmentation unit includes:
A text line projection subunit, configured to project the text line with the length-to-height ratio greater than a preset value onto the length direction of the text line to form a one-dimensional projection point set, wherein the projection of the pixel point at the position of the character in the length direction of the text line forms a real point projection, the projection of the pixel point at the position other than the position of the character in the length direction of the text line forms a virtual point projection, and the real point projection and the virtual point projection form a projection point set;
And the text line segmentation subunit is configured to acquire segmentation points on the line segments aggregated by the virtual point projection, segment the text line according to the segmentation points, and segment the text line with the length-to-height ratio larger than a preset value into a plurality of text lines with the length-to-height ratio smaller than or equal to the preset value.
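The projection-and-segmentation procedure of the two subunits above can be sketched as follows, assuming the text line is given as a binary image in which text pixels are non-zero; the ratio limit and the strategy of picking the gap nearest each evenly spaced target are illustrative assumptions:

```python
import numpy as np

def split_long_line(line_img, max_ratio=10.0):
    """Split a binary text-line crop whose length-to-height ratio
    exceeds max_ratio. Columns with no text pixels form the
    "virtual point" projection; cut points are the midpoints of
    such gap runs, chosen nearest to evenly spaced targets."""
    h, w = line_img.shape[:2]
    if w <= max_ratio * h:
        return [line_img]
    col_has_text = (line_img > 0).any(axis=0)
    # Collect midpoints of runs of empty columns as cut candidates.
    gaps, start = [], None
    for x, has_text in enumerate(col_has_text):
        if not has_text:
            start = x if start is None else start
        elif start is not None:
            gaps.append((start + x) // 2)
            start = None
    if start is not None:
        gaps.append((start + w) // 2)
    if not gaps:
        return [line_img]
    # Pick the gap nearest each ideal, evenly spaced cut position.
    n_pieces = int(np.ceil(w / (max_ratio * h)))
    ideal = [w * i // n_pieces for i in range(1, n_pieces)]
    cuts = sorted({min(gaps, key=lambda g: abs(g - t)) for t in ideal})
    bounds = [0] + cuts + [w]
    return [line_img[:, a:b] for a, b in zip(bounds[:-1], bounds[1:])]
```

Cutting only inside gaps guarantees no character is split in half, while the ratio check keeps every resulting piece within the preset length-to-height limit.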
In some embodiments of the present application, based on the above technical solution, the quality score prediction module may include:
an input unit configured to input a text region in the text image into a second neural network;
The feature extraction unit is configured to extract features of the text region through a convolution layer of the second neural network to obtain plane features;
The dimension reduction processing unit is configured to perform dimension reduction processing on the plane features through a pooling layer of the second neural network to obtain feature vectors;
And the full-connection calculation unit is configured to perform full-connection calculation on the feature vector through a full-connection layer of the second neural network to obtain the prediction quality score of the text region.
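The four units above (convolution for plane features, pooling for dimension reduction to a feature vector, fully connected calculation for the score) can be sketched in plain NumPy; the kernel size, single-channel setup, and sigmoid output range are illustrative assumptions, not values from the patent:

```python
import numpy as np

def conv2d(x, k):
    """Valid-mode 2-D correlation of a single-channel region with
    one kernel (the feature-extraction step of the conv layer)."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def predict_quality(region, kernel, w_fc, b_fc):
    """conv -> ReLU -> global average pooling (dimension reduction
    to a feature vector) -> fully connected -> sigmoid score."""
    feat = np.maximum(conv2d(region, kernel), 0.0)  # plane features
    vec = np.array([feat.mean()])                   # feature vector
    return float(1.0 / (1.0 + np.exp(-(w_fc @ vec + b_fc))))

rng = np.random.default_rng(0)
region = rng.random((32, 128))                      # text-region crop
score = predict_quality(region, rng.random((3, 3)),
                        np.array([1.0]), 0.0)       # score in (0, 1)
```

The global pooling step is what lets text regions of different sizes map to a fixed-length feature vector before the fully connected layer.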
In some embodiments of the present application, based on the above technical solution, the quality score prediction module may further include:
A data set acquisition unit configured to acquire a text recognition data set including a text image and a recognition accuracy of the text image, wherein the recognition accuracy of the text image is the ratio of the number of recognizable characters of the text image to the actual number of characters, the number of recognizable characters being the number of characters in the text image that can be correctly recognized by a character recognition model, and the actual number of characters being the number of characters actually contained in the text image;
The data set labeling unit is configured to proportionally convert the recognition accuracy of the text image into the quality score of the text image, and to label the text recognition data set with the quality score of the text image;
A neural network training unit configured to input the text recognition data set into the second neural network, training the second neural network.
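The labeling step above reduces to a proportional conversion of recognition accuracy into a training label; a minimal sketch, where the 0-100 target scale is an assumption for illustration:

```python
def quality_label(recognizable_chars, actual_chars, scale=100.0):
    """Convert recognition accuracy (recognizable characters over
    actual characters) proportionally into a quality-score label.
    The 0-100 scale is illustrative, not fixed by the text."""
    accuracy = recognizable_chars / actual_chars
    return accuracy * scale

# 45 of 50 characters recognized correctly -> accuracy 0.9, label 90.0
label = quality_label(45, 50)
```

Labels produced this way tie the quality score directly to how well a character recognition model can read the image, rather than to generic visual quality.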
In some embodiments of the present application, based on the above technical solution, the text region is a single word region, and the single word region is a continuous image region including one character in the text image.
According to an aspect of the embodiments of the present application, there is provided a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements a quality detection method of a text image as in the above technical solution.
According to an aspect of an embodiment of the present application, there is provided an electronic device including a processor, and a memory for storing executable instructions of the processor, wherein the processor is configured to perform the quality detection method of a text image as in the above technical solution via execution of the executable instructions.
According to an aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the quality detection method of a text image as in the above technical solution.
In the technical solution provided by the embodiments of the present application, a text image whose character scale falls outside the preset scale range is enlarged or reduced so that its character scale lies within the preset scale range, which is also the scale range of the sensing window of the first neural network. The sensing window of the first neural network therefore matches the character scale of the text image more closely, which improves the accuracy of detecting text regions in the text image, improves the accuracy of the quality scores predicted for the text regions and for the text image, and improves the robustness of the text image quality detection scheme of the embodiments of the present application.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is evident that the drawings in the following description are only some embodiments of the present application and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
Fig. 1 schematically shows a block diagram of an exemplary system architecture to which the technical solution of the present application is applied.
Fig. 2 schematically illustrates a flow chart of the steps of a quality detection method of some embodiments of the present application.
Fig. 3 schematically illustrates a comparison of detection effects of an embodiment of the present application and related art.
Fig. 4 schematically shows a flowchart of the steps for quality score prediction of text regions in a text image to obtain quality scores of the text regions in an embodiment of the application.
Fig. 5 schematically illustrates a process of performing feature extraction, dimension reduction processing and full-connection calculation on a text region by using a second neural network according to an embodiment of the present application, and obtaining a quality score of the text region.
Fig. 6 schematically shows a partial step flow diagram of a quality detection method according to an embodiment of the application, before entering text regions in a text image into a second neural network.
Fig. 7 schematically shows a flowchart of the steps for predicting the quality of a text region in a text image to obtain the quality of the text region according to an embodiment of the present application.
Fig. 8 schematically illustrates a flowchart of the steps for dividing a text line with a length to height ratio greater than a predetermined value into a plurality of text lines with a length to height ratio less than or equal to the predetermined value in an embodiment of the present application.
Fig. 9 schematically illustrates a flowchart of the steps for obtaining preset weights corresponding to respective text regions in an embodiment of the present application.
Fig. 10 schematically illustrates a scene of detecting text lines in a text image and quality scores and numbers of text lines according to an embodiment of the present application.
Fig. 11 schematically illustrates a scene diagram for detecting text lines in a text image and quality scores and aspect ratios of the text lines in accordance with an embodiment of the application.
Fig. 12 schematically shows a block diagram of a text image quality detecting apparatus provided by an embodiment of the present application.
Fig. 13 schematically shows a block diagram of a computer system of an electronic device for implementing an embodiment of the application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many different forms and should not be construed as limited to the examples set forth herein, but rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the exemplary embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the application may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
Artificial Intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision-making.
Computer Vision (CV) is a science that studies how to make a machine "see"; more specifically, it replaces human eyes with cameras and computers to perform machine vision tasks such as recognition, tracking, and measurement on a target, and further performs graphic processing so that the computer produces an image more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric techniques such as face recognition and fingerprint recognition.
Fig. 1 schematically shows a block diagram of an exemplary system architecture to which the technical solution of the present application is applied.
As shown in fig. 1, system architecture 100 may include a terminal device 110, a network 120, and a server 130. Terminal device 110 may include various electronic devices such as smart phones, tablet computers, notebook computers, desktop computers, and the like. The server 130 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. Network 120 may be a communication medium of various connection types capable of providing a communication link between terminal device 110 and server 130, and may be, for example, a wired communication link or a wireless communication link.
The system architecture in embodiments of the present application may have any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 130 may be a server group composed of a plurality of server devices. In addition, the technical solution provided in the embodiment of the present application may be applied to the terminal device 110, or may be applied to the server 130, or may be implemented by the terminal device 110 and the server 130 together, which is not limited in particular.
For example, the server 130 may implement the text image quality detection method of the embodiments of the present application: after a user uploads a text image through the terminal device 110, the server executes the quality detection method on that image. In this way, the sensing window of the first neural network matches the character scale of the text image more closely, which improves the accuracy of detecting text regions in the text image, improves the accuracy of the quality scores predicted for the text regions and for the text image, and improves the robustness of the text image quality detection scheme of the embodiments of the present application.
In addition, in the technical scheme provided by the embodiment of the application, the quality score prediction is performed on the text region in the text image to obtain the quality score of the text region, then the preset weight corresponding to each text region is obtained, and the quality score of the text region is weighted and summed based on the preset weight to obtain the quality score of the text image, so that the quality score prediction problem of the text image is converted into the quality score prediction problem of the text region and the weight relative to each text region is obtained. Therefore, compared with the method for directly carrying out quality score prediction on the text image, the method for carrying out quality score prediction on the text region can eliminate the influence of the region which does not contain characters in the text image on the quality score prediction result, and the quality score prediction accuracy is higher. Meanwhile, different text regions can have different preset weights, and the weighted summation processing is carried out on the quality scores of the text regions based on the preset weights, so that the robustness of the text image quality detection scheme of the embodiment of the application is further improved.
The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, smart watch, or vehicle-mounted device. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the present application.
With the widespread use of smart devices in daily life, text images captured on mobile devices often need to be submitted in company business processes, so the number of text images is growing rapidly, and intelligent document recognition is becoming increasingly important for business process automation. Intelligent document recognition is very sensitive to text image quality: distortions that are unavoidable during image capture can lower text image quality and reduce recognition accuracy, which may severely hamper subsequent business processes. For example, in an online insurance underwriting service, if a low-quality document image submitted for a claim is not detected as soon as possible, immediate recapture is required; once the paper document is lost or the user does not cooperate in providing it again, the underwriting document image can no longer be obtained, and key information is lost from the business process. Since the quality of text images uploaded by users varies, it is necessary to evaluate their quality in advance so that low-quality text images can be rejected.
The quality detection method provided by the application is described in detail below with reference to the specific embodiments.
Fig. 2 schematically illustrates a flow chart of the steps of a quality detection method of some embodiments of the present application. The execution subject of the quality detection method may be a terminal device, a server, or the like, and the present application is not limited to this. As shown in FIG. 2, the quality detection method mainly comprises the following steps S210 to S250.
S210, detecting the character scale of the text image, wherein the character scale of the text image is the average scale of characters of the text image;
S220, when the character scale of the text image is smaller than a first preset scale, amplifying the text image to enable the character scale of the text image to be in a preset scale range, and when the character scale of the text image is larger than a second preset scale, reducing the text image to enable the character scale of the text image to be in the preset scale range, wherein the preset scale range is a scale range which is larger than the first preset scale and smaller than the second preset scale;
S230, inputting the text image into a first neural network, and performing feature extraction and mapping processing on the text image through the first neural network to detect one or more text regions in the text image, wherein a sensing window is configured in the first neural network, the sensing window is in the preset scale range and is used for moving over the text image to perform feature extraction on it, and a text region is a continuous image region consisting of some or all of the characters in the text image;
S240, predicting the quality of a text region in the text image to obtain the quality of the text region;
S250, obtaining preset weights corresponding to the text areas, and carrying out weighted summation processing on the quality scores of the text areas based on the preset weights to obtain the quality scores of the text images.
Wherein the text image is an image containing text, i.e. an image containing characters. The character scale of the text image is the average scale of the characters of the text image; specifically, the scale may be a height, an area, or the like. In some embodiments, the average scale of the characters of the text image may be the average of the character heights of the characters in the text image; in other embodiments, it may be the average of the character areas of the characters in the text image. In a specific embodiment, an MSER (Maximally Stable Extremal Regions) detector may be employed as a single-word detector to detect the individual characters in the text image. The average height of all characters in the text image is then obtained, and the original image is adaptively scaled according to the calculated average height, as described in step S220, so that the character scale of the text image falls within the preset scale range. This adapts the image to the scale of the sensing window of the first neural network and improves the accuracy of the first neural network's text region detection. The sensing window is the matrix with which a convolution layer of a convolutional neural network convolves the data, i.e., the receptive field of the convolutional neural network.
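The adaptive scaling decision of step S220 can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the function name is hypothetical, and the choice of the midpoint of the preset scale range as the rescaling target is an assumption the patent does not state.

```python
def adaptive_rescale_factor(avg_char_scale, first_preset, second_preset):
    """Return the factor by which to scale the text image so that its
    average character scale falls inside (first_preset, second_preset).
    A factor of 1.0 leaves the image unchanged; > 1.0 enlarges, < 1.0 reduces."""
    if first_preset < avg_char_scale < second_preset:
        return 1.0
    # Assumption: aim for the midpoint of the preset scale range.
    target = (first_preset + second_preset) / 2.0
    return target / avg_char_scale
```

For example, with a preset range of (16, 48), an image whose average character height is 8 would be enlarged 4x, while one whose average height is 64 would be reduced to half size.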
In some embodiments, after the text image is input to the first neural network, the feature extraction unit of the first neural network performs feature extraction on the text image, and the mapping processing unit of the first neural network then performs mapping processing, so that one or more text regions in the text image can be detected. In a specific embodiment, the first neural network may be a PSE (Progressive Scale Expansion) segmentation network, i.e., PSENet. This enables multi-direction, multi-angle and multi-scale detection of the text image: large-scale and small-scale characters can be detected and distinguished even when character scales in the image differ by several times, as can text images whose character arrangement direction is oblique, curved, reversed and so on. Referring to fig. 3, fig. 3 schematically illustrates a comparison of the detection effects of an embodiment of the present application and the related art. As shown in fig. 3, text region detection (in this embodiment, text line detection) is performed on the same text image using the related art and using some embodiments of the present application. The text image 310 shows that the related art can detect only text lines in a single direction (the lateral direction in fig. 3), cannot detect text lines in the longitudinal direction, and detects larger-scale characters incompletely, with some edge loss. The text image 320 shows that the technical scheme of the present application achieves detection in multiple directions (lateral and longitudinal), at multiple angles and at multiple scales (large and small), detects text lines completely, and greatly reduces the edge loss of text lines and characters during detection.
In addition to the PSE segmentation network, the first neural network may also be another CNN (Convolutional Neural Network)-based neural network for detecting text regions in text images; the present application is not limited in this respect.
It will be appreciated that unlike a scene image, a text image is essentially more text-focused. Therefore, in the embodiment of the application, the quality score of the text image is obtained by carrying out weighted summation processing on the quality score of the text region based on the preset weight, so that the quality of the text image can be accurately reflected.
Fig. 4 schematically shows a flowchart of the steps for quality score prediction of text regions in a text image to obtain quality scores of the text regions in an embodiment of the application. As shown in fig. 4, based on the above embodiment, the quality score prediction is performed on the text region in the text image in step S240 to obtain the quality score of the text region, which may further include the following steps S410 to S440.
S410, inputting a text region in the text image into a second neural network;
S420, extracting features of the text region through a convolution layer of the second neural network to obtain plane features;
S430, performing dimension reduction processing on the plane characteristics through a pooling layer of the second neural network to obtain characteristic vectors;
S440, performing full-connection calculation on the feature vector through a full-connection layer of the second neural network to obtain the prediction quality score of the text region.
In some embodiments, the structure of the second neural network may include a convolutional layer, a pooling layer, and a fully-connected layer.
Specifically, fig. 5 schematically illustrates a process of performing feature extraction, dimension reduction processing and full-connection calculation on a text region by using the second neural network according to an embodiment of the present application, and obtaining a predicted quality score of the text region. Referring to fig. 5, the convolution layer of the second neural network performs feature extraction on the text region through convolution, maximum pooling and convolution sub-steps to obtain the first plane feature. Then, the pooling layer of the second neural network obtains the corresponding feature vector by pooling the maximum value and the minimum value of the first plane feature. And then, the full connection layer of the second neural network obtains the quality score corresponding to the text region through full connection calculation of the feature vector.
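The pooling step of fig. 5, which turns the plane features into a feature vector via their maximum and minimum values, can be illustrated with a minimal single-channel sketch. This is an assumption-laden simplification: the actual second neural network operates on multi-channel convolutional features, and the function name is hypothetical.

```python
def max_min_pool(feature_map):
    """Reduce a 2-D plane feature to a fixed-length vector by taking its
    global maximum and global minimum (single-channel illustration of the
    pooling step that produces the feature vector)."""
    flat = [v for row in feature_map for v in row]
    return [max(flat), min(flat)]
```

Whatever the spatial size of the plane feature, the output vector has a fixed length, which is what allows the subsequent fully connected layer to operate on text regions of varying sizes.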
In some implementations, the text region can be a text line, and the second neural network may be a text-line-based DIQA (Deep CNN-Based Blind Image Quality Predictor) framework. In a specific embodiment, the second neural network may be constructed based on ResNet (residual network); since ResNet has excellent feature representation capability, it can extract features from the text image more accurately, which improves the accuracy of quality score prediction. The second neural network may also be another CNN such as a VGG (Visual Geometry Group) network.
In some embodiments, a text line whose length-to-height ratio is greater than a preset value may be segmented into a plurality of text lines whose length-to-height ratios are less than or equal to the preset value. Notably, the size of a text image is typically much larger than the input size accepted by a deep convolutional neural network. If, to satisfy the relatively fixed input size of the network, the text image were resized by undersampling before detection, text with smaller characters might become blurred or even illegible after undersampling, making text line detection, and hence quality score prediction for the text image, inaccurate.
Therefore, in order to avoid the degradation caused by undersampling text lines, in the embodiment of the present application the first neural network performs feature extraction and mapping processing on the text image, and after a text region in the text image is detected, that text region is input into the second neural network, i.e., the deep convolutional neural network, for quality score prediction. By detecting the text regions in the text image and inputting them into the second neural network for quality prediction, the text image is decomposed into one or more text regions before entering the second neural network, which avoids both the problem of an overly large input caused by feeding the whole text image into the network directly and the image degradation caused by resizing the text image through undersampling before detection. In addition, the input images are text regions, in which most of the information is text information, which facilitates the second neural network's feature extraction and analysis and improves the accuracy and robustness of its image quality prediction. Based on the above effects, it can be understood that the text image quality detection method of the embodiment of the application yields more accurate quality detection for business document images such as insurance underwriting documents and contracts, and also in natural text image quality evaluation scenarios with larger interference and harder-to-recognize characters, such as photographed images of bank cards, medical record sheets and invoices.
In the second neural network, the L2 loss may be employed as the estimated loss describing the difference between the predicted and actual quality scores. Specifically, the estimated loss L2 is defined as:

L2 = (q - q_gt)²

where q is the predicted quality score and q_gt is the quality score labeled on the text image in the text recognition dataset; q_gt equals the text recognition accuracy of the text image when the conversion ratio of the equal-ratio conversion is 1.
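A minimal sketch of this estimated loss (the function name is illustrative):

```python
def l2_loss(q_pred, q_gt):
    """Squared-error (L2) estimated loss between the predicted quality
    score q_pred and the labeled quality score q_gt."""
    return (q_pred - q_gt) ** 2
```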
Prior to training the second neural network, the parameters of the fully connected layer may be randomly initialized with a uniform distribution over the range (-0.1, 0.1).
Fig. 6 schematically shows a partial step flow diagram of a quality detection method according to an embodiment of the application, before entering text regions in a text image into a second neural network. As shown in fig. 6, before inputting the text region in the text image into the second neural network in step S410, the following steps S610 to S630 may be further included on the basis of the above embodiments.
S610, acquiring a text recognition data set, wherein the text recognition data set comprises a text image and the recognition accuracy of the text image, the recognition accuracy of the text image is the ratio of the number of recognizable characters of the text image to the actual number of characters, the number of recognizable characters is the number of characters of the text image which can be correctly recognized by a character recognition model, and the actual number of characters is the number of characters actually included in the text image;
S620, performing equal ratio conversion on the recognition accuracy of the text image to obtain the quality score of the text image, and labeling the quality score of the text image into the text recognition data set;
S630, inputting the text recognition data set into a second neural network, and training the second neural network.
Specifically, the quality score of the text image is obtained by performing equal-ratio conversion on the recognition accuracy of the text image, which may be obtained by converting the recognition accuracy of the text image into a percentage quality score, and labeling the quality score of the text image into the text recognition data set. The text recognition dataset is used to train the second neural network to improve accuracy of the second neural network's predictions of quality scores.
The conversion ratio of the equal ratio conversion may be 0.1, 1, 10, 100, or the like, and the present application is not limited thereto.
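The equal-ratio conversion of step S620 amounts to a one-line scaling; as a sketch (the function name is illustrative, and `ratio` corresponds to the conversion ratio mentioned above):

```python
def accuracy_to_quality(accuracy, ratio):
    """Equal-ratio conversion: multiply the recognition accuracy by a fixed
    conversion ratio to obtain the quality-score label (ratio=100 yields a
    percentage score; ratio=1 leaves the accuracy unchanged)."""
    return accuracy * ratio
```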
In some embodiments, the text images in the text recognition dataset may be text line images, i.e., images each containing one or more serially arranged characters. Serially arranged characters are characters arranged one after another in sequence; specifically, they may form a single row or a single column of characters. Training the second neural network on text line images reduces background clutter and noise in the input, which reduces their influence on the quality score and improves the accuracy of quality score prediction. Moreover, if the second neural network is trained on single-line text images, the properties of its training inputs closely match the properties of its inputs at prediction time, which are likewise images containing one or more serially arranged characters; this helps improve the accuracy with which the second neural network predicts the quality scores of text regions.
In some embodiments, the text image in the text recognition dataset may be derived from an artificial image synthesized using an algorithm such as fuzzy noise, and the quality score of the text image in the text recognition dataset may be automatically generated during fuzzy noise synthesis.
In other embodiments, the text image in the text recognition dataset may be derived from a real text image and the quality score of the text image in the text recognition dataset may be derived from a manual score.
In still other embodiments, the text images in the text recognition dataset may be derived from the recognition results of an OCR (Optical Character Recognition) model on real text images in a text recognition task. In this embodiment, the recognition accuracy of each text image is converted in equal ratio into a quality score, which is labeled into the text recognition dataset. Compared with manually labeled quality scores, this is more objective; and because the accuracy is a continuous real number rather than a manually labeled discrete score, it helps the training of the second neural network's parameters converge better. Moreover, subjective scoring of text images is difficult and datasets of image quality scores are scarce, whereas datasets recording the recognition accuracy of text recognition tasks on images are widely available; the scheme therefore derives the quality score of a text image from its recognition accuracy. At the same time, real text images match actual scenarios better than artificial images synthesized with algorithms such as fuzzy noise, which improves the training of the second neural network's parameters and hence the accuracy of its quality score predictions for text regions.
Fig. 7 schematically shows a flowchart of the steps for predicting the quality score of a text region in a text image to obtain the quality score of the text region according to an embodiment of the present application. As shown in fig. 7, based on the above embodiment, the text region is a text line, i.e., a continuous image region of the text image containing one or more serially arranged characters, and the quality score prediction performed on the text region in the text image in step S240 to obtain the quality score of the text region may further include the following steps S710 to S730.
S710, detecting text lines in a text image and the length-to-height ratio of the text lines, wherein the length-to-height ratio is the ratio of the length to the height of the text lines, the length of the text lines is the length of extension lines of the text lines extending along the character arrangement direction in the text lines, and the height of the text lines is the height of the text lines perpendicular to the extension lines;
S720, dividing a text line with the length-height ratio larger than a preset value into a plurality of text lines with the length-height ratio smaller than or equal to the preset value;
S730, respectively carrying out feature extraction and quality score prediction on the plurality of text lines with length-to-height ratios smaller than or equal to the preset value to obtain the quality scores of the text lines with length-to-height ratios smaller than or equal to the preset value.
By dividing a text line whose length-to-height ratio is greater than the preset value into a plurality of text lines whose length-to-height ratios are less than or equal to the preset value, a long text line containing many characters can be divided into shorter text lines. Long text is thereby adaptively segmented, overlong text lines are prevented from being input into the second neural network, and the prediction accuracy of the second neural network can be greatly improved.
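One simple way to realize such a split is to cut the line into equal-width pieces. This is only a sketch under an assumption the patent does not make (the embodiment of fig. 8 instead chooses split points from the gaps between characters); the function name is illustrative.

```python
import math

def split_positions(line_w, line_h, max_ratio):
    """Cut x-positions that split a text line whose length-to-height ratio
    exceeds max_ratio into equal-width pieces whose ratio is <= max_ratio."""
    n_pieces = math.ceil(line_w / (max_ratio * line_h))
    piece_w = line_w / n_pieces
    return [round(i * piece_w) for i in range(1, n_pieces)]
```

A line of length 100 and height 10 with a preset ratio of 4 needs three pieces, so it is cut at two positions; a line already within the ratio is returned uncut.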
In some embodiments, the text image in the text recognition dataset may be a text line image, i.e. an image comprising one or more serially arranged characters; more specifically, the length-to-height ratio of the text line image may be less than or equal to the preset value. In this way, the images processed both in the training stage that uses the text recognition dataset and in the stage of predicting quality scores for text have length-to-height ratios less than or equal to the preset value, which facilitates the optimization of the second neural network, reduces its amount of computation, and improves the efficiency of quality score prediction for text.
Fig. 8 schematically illustrates a flowchart of the steps for dividing a text line with a length to height ratio greater than a predetermined value into a plurality of text lines with a length to height ratio less than or equal to the predetermined value in an embodiment of the present application. As shown in fig. 8, on the basis of the above embodiment, the step S720 of dividing the text line with the length-height ratio greater than the preset value into a plurality of text lines with the length-height ratio less than or equal to the preset value may further include the following steps S810 to S820.
S810, projecting a text line with a length-to-height ratio larger than a preset value onto the length direction of the text line to form a one-dimensional projection point set, wherein projection of pixel points at positions of characters on the length direction of the text line forms real point projection, projection of pixel points at positions except for the positions of the characters on the length direction of the text line forms virtual point projection, and the real point projection and the virtual point projection form a projection point set;
S820, dividing points are taken from the line segments gathered by the virtual point projection, and the text line is divided according to the dividing points, so that the text line with a length-to-height ratio larger than the preset value is divided into a plurality of text lines with length-to-height ratios smaller than or equal to the preset value.
Specifically, the real-point projection may be black pixels, the virtual-point projection may be white pixels, and the one-dimensional projection point set may be a line segment composed of black pixels, of white pixels, or of both. Division points are taken from the line segments formed by the virtual-point projection, and the text line is divided at those points. It can be understood that the line segments formed by the virtual-point projection are the projections, onto the length direction of the text line, of pixels at positions other than where the characters are; taking the division points from these segments therefore divides the text line only at positions away from the characters. This avoids splitting a single character across two different text lines, which would make character recognition harder and reduce the accuracy of quality score prediction.
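The projection-and-gap splitting of steps S810 and S820 can be sketched in one dimension as follows. This is illustrative only: the input records, per x position, whether the column contains any character ("ink") pixel, and choosing the midpoint of each gap as the division point is an assumption, since the patent only requires the point to lie within a gap.

```python
def gap_split_points(column_has_ink):
    """Given, for each x position of a text line, whether that column contains
    any character pixel (real-point projection), return one candidate division
    point per interior gap: the midpoint of each maximal run of character-free
    columns (virtual-point projection), skipping the leading/trailing margins."""
    points, start = [], None
    for x, ink in enumerate(column_has_ink):
        if not ink and start is None:
            start = x                          # a gap begins
        elif ink and start is not None:
            if start > 0:                      # ignore the leading margin
                points.append((start + x - 1) // 2)
            start = None
    return points                              # a trailing margin is ignored
```

Because every candidate point lies inside a character-free run, cutting the line at these points never bisects a character.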
Fig. 9 schematically illustrates a flowchart of the steps for obtaining preset weights corresponding to respective text regions in an embodiment of the present application. As shown in fig. 9, on the basis of the above embodiment, the acquiring of the preset weights corresponding to the respective text regions in step S250 may further include the following steps S910 to S920.
S910, obtaining the word numbers of all the text areas, and carrying out summation operation on the word numbers of all the text areas to obtain the total word numbers of the text images;
S920, respectively obtaining the ratio of the number of words of each text region to the total number of words of the text image, and taking the ratio as a preset weight corresponding to each text region.
It can be understood that longer text lines occupy more of the content of a text image, so their quality has a larger influence on the quality of the text image. By taking, for each text region, the ratio of its word count to the total word count of the text image as its preset weight, longer text lines receive larger preset weights. The quality detection method of the embodiment of the application thus conforms to the judgment logic of actual quality assessment, which improves the accuracy of text image quality detection as well as the efficiency of the detection method.
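Steps S910 and S920 amount to normalizing the word counts; as a minimal sketch (the function name is illustrative):

```python
def word_count_weights(word_counts):
    """Preset weight of each text region = its word count divided by the
    total word count of the text image (steps S910 and S920)."""
    total = sum(word_counts)
    return [c / total for c in word_counts]
```

The weights always sum to 1, so the weighted sum of the regions' quality scores stays on the same scale as the individual scores.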
In a specific embodiment, the quality score q̄ of the text image is:

q̄ = Σ_j w_j · q_j    (1)

where q_j is the predicted quality score of the jth text line and w_j is the preset weight of the jth text line in the text image. In some embodiments, q̄ may be equal to the sum, over all text lines, of each line's predicted quality score multiplied by its corresponding preset weight. w_j may be defined as follows:

w_j = R(j) / Σ_k R(k)

where R(j) is the number of characters in the jth text line and Σ_k R(k) is the sum of the character counts of the k text lines in the text image. In some embodiments, the number of characters in a text line may be detected by a single-word detector, such as an MSER detector. In other embodiments, the number of characters R(k) in a text line is approximated by the line's length-to-height ratio, for example:

R(k) ≈ line_w / line_h
Where line_w is the length of the text line and line_h is its height; the length of a text line is the length of its extension line along the character arrangement direction, and its height is the height of the text line perpendicular to that extension line. Approximating the character count of a text line by its length-to-height ratio simplifies the detection flow of the quality detection method of the embodiment of the application and reduces the amount of detection required.
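Combining the weighted sum of formula (1) with the length-to-height-ratio approximation of the character count gives, as a sketch (the function name and the (quality, line_w, line_h) tuple layout are illustrative):

```python
def image_quality_score(lines):
    """Weighted quality score of a text image from (quality, line_w, line_h)
    triples, approximating each line's character count by its
    length-to-height ratio R(k) = line_w / line_h."""
    ratios = [w / h for _, w, h in lines]
    total = sum(ratios)
    return sum(q * r / total for (q, _, _), r in zip(lines, ratios))
```

A long line (ratio 4) at quality 0.8 and a short line (ratio 1) at quality 0.4 yield an image score of 0.72, dominated by the longer line as intended.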
Or, in some embodiments, acquiring the preset weights corresponding to the text regions comprises: acquiring the length-to-height ratio of each text region; summing the length-to-height ratios of the text regions to obtain a total length-to-height ratio; and taking the ratio of each text region's length-to-height ratio to the total as that region's preset weight. Computing the preset weights directly from the length-to-height ratios of the text lines reduces the amount of computation and improves the detection efficiency of the quality detection method.
Fig. 10 schematically illustrates a scene of detecting text lines in a text image together with the quality scores and character counts of the text lines according to an embodiment of the present application. Referring to fig. 10, text line detection is performed on the text image, then quality score prediction and character count detection are performed on each text line, and the results are displayed below the corresponding text line. For example, the character count displayed below the text line "women" is 2, and its predicted quality score is 0.8. The predicted quality score of the text image may then be calculated from formula (1) for the quality score of the text image using the quality score prediction and character count detection results.
Fig. 11 schematically illustrates a scene of detecting text lines in a text image together with the quality scores and length-to-height ratios of the text lines according to an embodiment of the application. Referring to fig. 11, text line detection is performed on the text image, then quality score prediction and length-to-height ratio detection are performed on each text line, and the results are displayed below the corresponding text line. For example, the text line "ultrasound description:" has a length-to-height ratio of 4.4 and a predicted quality score of 0.848. The predicted quality score of the text image may then be calculated from formula (1) for the quality score of the text image using the quality scores and length-to-height ratios of the text lines.
In some embodiments, the text region is a single-word region, i.e., a continuous image region of the text image containing one character. Quality score prediction is performed on the single-word regions in the text image to obtain the quality score of each single-word region; then the preset weight corresponding to each single-word region is acquired, and the quality scores of the single-word regions are weighted and summed based on the preset weights to obtain the quality score of the text image. In one case, the preset weights of the single-word regions are all 1. Alternatively, the preset weight of a single-word region may be determined by the ratio of its area to the sum of the areas of all single-word regions, for example:
The quality score q̄ of the text image is:

q̄ = Σ_j v_j · q_j

where q_j is the predicted quality score of the jth single-word region and v_j is the preset weight of the jth single-word region in the text image. In some embodiments, q̄ may be equal to the sum, over all single-word regions, of each region's predicted quality score multiplied by its corresponding preset weight. v_j may be defined as follows:

v_j = S(j) / Σ_k S(k)
where S(j) is the area of the jth single-word region and Σ_k S(k) is the sum of the areas of the k single-word regions in the text image. It can be understood that characters occupying a larger area in a text image are more important, and their quality has a larger influence on the quality of the text image. By acquiring the preset weight corresponding to each single-word region and obtaining the quality score of the text image as the weighted sum of the single-word regions' quality scores based on those weights, regions with larger areas receive larger preset weights. The quality detection method of the text image in the embodiment of the application thus conforms to the judgment logic of actual quality assessment and improves the accuracy of quality detection of text images.
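The area-weighted variant can be sketched the same way (the function name and the (quality, area) pair layout are illustrative):

```python
def area_weighted_score(regions):
    """Quality score of a text image from (quality, area) pairs of its
    single-word regions, weighting each region by its share of total area."""
    total_area = sum(a for _, a in regions)
    return sum(q * a / total_area for q, a in regions)
```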
In some embodiments, the first neural network and the second neural network may be combined into the same deep neural network, and feature values obtained by feature extraction of the text image by the first neural network are shared into the second neural network. Specifically, the quality detection method of the text image may further include:
Inputting, into the second neural network, the feature values obtained by the first neural network's feature extraction on the text image, where the feature values assist the second neural network in obtaining, through its convolution layer, one or more of the plane features, the feature vector and the predicted quality score of the text region.
Therefore, the calculated amount of the quality detection method of the text image in the embodiment of the application can be reduced, and the quality detection efficiency of the text image can be improved.
It should be noted that although the steps of the methods of the present application are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
The following describes an embodiment of the apparatus of the present application, which can be used to perform the quality detection method of a text image in the above-described embodiment of the present application. Fig. 12 schematically shows a block diagram of a text image quality detecting apparatus provided by an embodiment of the present application. As shown in fig. 12, the quality detection apparatus 1200 of a text image includes:
a character scale detection module 1210 configured to detect a character scale of the text image, the character scale of the text image being an average scale of characters of the text image;
a scaling module 1220 configured to, when the character scale of the text image is smaller than a first preset scale, perform scaling processing on the text image so that the character scale of the text image is in a preset scale range, and when the character scale of the text image is larger than a second preset scale, perform scaling processing on the text image so that the character scale of the text image is in a preset scale range, the preset scale range being a scale range that is larger than the first preset scale and smaller than the second preset scale;
The text region detection module 1230 is configured to input a text image into a first neural network, and perform feature extraction and mapping processing on the text image through the first neural network to detect a text region in the text image, wherein a sensing window is configured in the first neural network, the sensing window is in a preset scale range, and the sensing window is used for moving on the text image to perform feature extraction on the text image;
a quality score prediction module 1240 configured to perform quality score prediction on the text regions in the text image to obtain a quality score for each text region;
a weighted summation module 1250 configured to obtain the preset weight corresponding to each text region, and perform weighted summation processing on the quality scores of the text regions based on the preset weights to obtain the quality score of the text image.
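The scaling module's rule admits a compact sketch. The bounds of 16 and 48 below stand in for the first and second preset scales, and targeting the middle of the range is an illustrative choice; neither value comes from the application itself.

```python
def rescale_to_char_range(image_size, char_scale, lower=16.0, upper=48.0):
    """Return (zoom_factor, new_size) bringing the average character
    scale into the preset range (lower, upper).

    `lower`/`upper` are stand-ins for the first/second preset scales;
    aiming at the middle of the range is an illustrative choice.
    """
    if lower < char_scale < upper:
        factor = 1.0  # already in range: no scaling needed
    else:
        # enlarge small characters / reduce large ones toward mid-range
        factor = (lower + upper) / 2.0 / char_scale
    new_size = tuple(round(d * factor) for d in image_size)
    return factor, new_size
```

The application leaves the concrete preset scales to the implementation; any bounds appropriate to the downstream recognizer could be substituted.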
In some embodiments of the application, based on the above embodiments, the weighted summation module comprises:
a word count acquisition unit configured to acquire the word count of each text region and sum the word counts of the text regions to obtain the total word count of the text image;
and a preset weight calculation unit configured to acquire, for each text region, the ratio of its word count to the total word count of the text image, and use the ratio as the preset weight corresponding to that text region.
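The word count acquisition and preset weight calculation units combine into a short weighted sum. Representing each region as a `(word_count, quality_score)` pair is an assumption made for illustration:

```python
def image_quality_score(regions):
    """Weighted sum of per-region quality scores, each region weighted
    by its share of the image's total word count.

    `regions`: list of (word_count, quality_score) pairs -- this pair
    representation is illustrative, not prescribed by the application.
    """
    total_words = sum(count for count, _ in regions)
    if total_words == 0:
        return 0.0  # degenerate case: no text detected
    # each region's preset weight is word_count / total_words
    return sum((count / total_words) * score for count, score in regions)
```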
In some embodiments of the present application, based on the above embodiments, the text region is a text line, a text line being a continuous image region of the text image that contains one or more serially arranged characters, and the quality score prediction module includes:
a length-to-height ratio detection unit configured to detect the text lines in the text image and the length-to-height ratio of each text line, the length of a text line being the length of its extension line along the character arrangement direction of the text line, and the height of the text line being its height perpendicular to that extension line;
a text line dividing unit configured to divide a text line having a length-to-height ratio greater than a preset value into a plurality of text lines having a length-to-height ratio less than or equal to the preset value;
and a quality score prediction unit configured to perform feature extraction and quality score prediction on each of the text lines whose length-to-height ratio is less than or equal to the preset value, to obtain the quality scores of those text lines.
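The ratio constraint handled by the text line dividing unit can be illustrated with an equal-length split. The embodiment actually picks cut points from a projection of the characters, so this sketch only shows how the number of segments is chosen; the preset ratio of 10 is an assumed value.

```python
import math

def split_text_line(length, height, max_ratio=10.0):
    """Split a text line whose length-to-height ratio exceeds
    `max_ratio` into equal segments that each satisfy the ratio.

    Equal segments are a simplification: the embodiment cuts at
    character gaps found by projection. `max_ratio` is an assumed
    preset value.
    """
    ratio = length / height
    if ratio <= max_ratio:
        return [length]  # already within the preset value
    n = math.ceil(ratio / max_ratio)  # fewest segments that fit
    return [length / n] * n
```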
In some embodiments of the present application, based on the above embodiments, the text line segmentation unit includes:
A text line projection subunit configured to project a text line with a length-to-height ratio greater than a preset value onto a length direction of the text line to form a one-dimensional projection point set, wherein projections of pixels at positions where characters are located on the length direction of the text line form real point projections, projections of pixels at positions other than the positions where the characters are located on the length direction of the text line form virtual point projections, and the real point projections and the virtual point projections form the projection point set;
and a text line segmentation subunit configured to take split points on the line segments where the virtual point projections gather, and split the text line at those split points, so as to split a text line whose length-to-height ratio is greater than the preset value into a plurality of text lines whose length-to-height ratio is less than or equal to the preset value.
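A sketch of the projection step: columns containing character pixels yield "real point" projections, empty columns yield "virtual point" projections, and a split point is taken in the middle of each sufficiently wide interior run of empty columns. The binary column representation and the `min_gap` threshold are assumptions for illustration.

```python
def gap_split_points(column_ink, min_gap=2):
    """Pick split points at the middle of each interior run of empty
    columns at least `min_gap` wide.

    `column_ink[i]` is truthy when column i of the text line contains
    character pixels (a "real point" projection); empty columns are
    the "virtual point" projections. `min_gap` is an assumed threshold.
    """
    splits, run_start = [], None
    for i, ink in enumerate(column_ink):
        if not ink:
            if run_start is None:
                run_start = i  # a run of virtual points begins
        else:
            if run_start is not None and i - run_start >= min_gap:
                splits.append((run_start + i - 1) // 2)  # middle of gap
            run_start = None
    return splits
```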
In some embodiments of the present application, based on the above embodiments, the quality score prediction module may include:
an input unit configured to input a text region in the text image into the second neural network;
the feature extraction unit is configured to extract features of the text region through a convolution layer of the second neural network to obtain plane features;
the dimension reduction processing unit is configured to perform dimension reduction processing on the plane characteristics through a pooling layer of the second neural network to obtain characteristic vectors;
and a fully connected calculation unit configured to perform a fully connected calculation on the feature vector through the fully connected layer of the second neural network to obtain the predicted quality score of the text region.
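The convolution, pooling, and fully connected steps of the second neural network can be mirrored in a toy forward pass. The single-channel shapes, global average pooling, and sigmoid output below are illustrative assumptions, not the trained architecture from the application:

```python
import math

def predict_quality(patch, kernel, fc_weight, fc_bias):
    """Toy forward pass: valid convolution -> global average pooling
    -> one fully connected unit -> sigmoid quality score in (0, 1).

    `patch` is a 2-D grayscale text region, `kernel` a square filter;
    all weights are illustrative, not the trained second network.
    """
    k = len(kernel)
    rows, cols = len(patch), len(patch[0])
    # convolution layer: produce the "plane feature" map
    feat = [[sum(patch[i + a][j + b] * kernel[a][b]
                 for a in range(k) for b in range(k))
             for j in range(cols - k + 1)]
            for i in range(rows - k + 1)]
    # pooling layer: reduce the plane feature to a single value
    pooled = sum(map(sum, feat)) / (len(feat) * len(feat[0]))
    # fully connected layer: scalar product plus bias, then sigmoid
    z = pooled * fc_weight + fc_bias
    return 1.0 / (1.0 + math.exp(-z))
```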
In some embodiments of the present application, based on the above embodiments, the quality score prediction module may further include:
The data set acquisition unit is configured to acquire a text recognition data set, wherein the text recognition data set comprises a text image and the recognition accuracy of the text image, the recognition accuracy of the text image is the ratio of the number of recognizable characters of the text image to the actual number of characters, the number of recognizable characters is the number of characters of the text image which can be correctly recognized by the character recognition model, and the actual number of characters is the number of characters actually included in the text image;
a data set labeling unit configured to perform proportional (equal-ratio) conversion on the recognition accuracy of the text image to obtain the quality score of the text image, and to label the quality score of the text image into the text recognition data set;
And a neural network training unit configured to input the text recognition data set into a second neural network, and train the second neural network.
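The equal-ratio conversion from recognition accuracy to a training label is just a proportional rescale; the 0-100 score range below is an assumed choice:

```python
def accuracy_to_quality_label(recognizable_chars, actual_chars, scale=100.0):
    """Map OCR recognition accuracy to a quality-score label by
    proportional (equal-ratio) conversion.

    `scale=100.0` (a 0-100 score range) is an assumed choice; the
    application only requires the conversion to be proportional.
    """
    if actual_chars <= 0:
        raise ValueError("actual character count must be positive")
    return (recognizable_chars / actual_chars) * scale
```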
In some embodiments of the present application, based on the above embodiments, the text region is a single character region, and the single character region is a continuous image region containing one character in the text image.
Specific details of the text image quality detection device provided in each embodiment of the present application have been described in the corresponding method embodiments, and are not described herein.
Fig. 13 schematically shows a block diagram of a computer system of an electronic device for implementing an embodiment of the application.
It should be noted that, the computer system 1300 of the electronic device shown in fig. 13 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 13, the computer system 1300 includes a central processing unit 1301 (Central Processing Unit, CPU), which can perform various appropriate actions and processes according to a program stored in a read-only memory 1302 (Read-Only Memory, ROM) or a program loaded from a storage portion 1308 into a random access memory 1303 (Random Access Memory, RAM). Various programs and data necessary for system operation are also stored in the random access memory 1303. The central processing unit 1301, the read-only memory 1302, and the random access memory 1303 are connected to each other via a bus 1304. An input/output interface 1305 (I/O interface) is also connected to the bus 1304.
Connected to the input/output interface 1305 are: an input section 1306 including a keyboard, a mouse, and the like; an output section 1307 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage section 1308 including a hard disk and the like; and a communication section 1309 including a network interface card such as a local area network card or a modem. The communication section 1309 performs communication processing via a network such as the Internet. A drive 1310 is also connected to the input/output interface 1305 as needed. A removable medium 1311, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1310 as needed, so that a computer program read therefrom is installed into the storage section 1308 as needed.
In particular, the processes described in the various method flowcharts may be implemented as computer software programs according to embodiments of the application. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 1309 and/or installed from the removable medium 1311. The computer programs, when executed by the central processor 1301, perform the various functions defined in the system of the present application.
It should be noted that, the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), a flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present application.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A text image quality detection method, comprising:

detecting the character scale of the text image, the character scale of the text image being the average scale of the characters in the text image;

when the character scale of the text image is smaller than a first preset scale, enlarging the text image so that the character scale of the text image falls within a preset scale range, and when the character scale of the text image is larger than a second preset scale, reducing the text image so that the character scale of the text image falls within the preset scale range, the preset scale range being the range of scales larger than the first preset scale and smaller than the second preset scale;

inputting the text image into a first neural network, and performing feature extraction and mapping processing on the text image through the first neural network to detect one or more text regions in the text image, wherein a receptive window is configured in the first neural network, the receptive window lies within the preset scale range, and the receptive window is moved over the text image to extract features from the text image, a text region being a continuous image region of the text image composed of some or all of its characters;

performing quality score prediction on the text regions in the text image to obtain a quality score for each text region;

acquiring the word count of each text region, and summing the word counts of the text regions to obtain the total word count of the text image; and

acquiring, for each text region, the ratio of its word count to the total word count of the text image, using the ratio as the preset weight corresponding to that text region, and performing a weighted summation of the quality scores of the text regions based on the preset weights to obtain the quality score of the text image.

2. The text image quality detection method according to claim 1, wherein the text region is a text line, a text line being a continuous image region of the text image containing one or more serially arranged characters, and performing quality score prediction on the text regions in the text image to obtain the quality score of each text region comprises:

detecting the text lines in the text image and the length-to-height ratio of each text line, the length-to-height ratio being the ratio of the length to the height of the text line, the length of a text line being the length of its extension line along the character arrangement direction of the text line, and the height of the text line being its height perpendicular to that extension line;

splitting a text line whose length-to-height ratio is greater than a preset value into a plurality of text lines whose length-to-height ratio is less than or equal to the preset value; and

performing feature extraction and quality score prediction on each of the plurality of text lines whose length-to-height ratio is less than or equal to the preset value, to obtain the quality scores of those text lines.

3. The text image quality detection method according to claim 2, wherein splitting a text line whose length-to-height ratio is greater than the preset value into a plurality of text lines whose length-to-height ratio is less than or equal to the preset value comprises:

projecting the text line whose length-to-height ratio is greater than the preset value onto the length direction of the text line to form a one-dimensional set of projection points, wherein projections of pixels at the positions of characters onto the length direction of the text line form real point projections, projections of pixels at positions other than those of the characters form virtual point projections, and the real point projections and the virtual point projections together form the set of projection points; and

taking split points on the line segments where the virtual point projections gather, and splitting the text line at those split points, so that the text line whose length-to-height ratio is greater than the preset value is split into a plurality of text lines whose length-to-height ratio is less than or equal to the preset value.

4. The text image quality detection method according to claim 1, wherein performing quality score prediction on the text regions in the text image to obtain the quality score of each text region comprises:

inputting a text region of the text image into a second neural network;

performing feature extraction on the text region through a convolution layer of the second neural network to obtain a plane feature;

performing dimension reduction on the plane feature through a pooling layer of the second neural network to obtain a feature vector; and

performing a fully connected calculation on the feature vector through a fully connected layer of the second neural network to obtain the predicted quality score of the text region.

5. The text image quality detection method according to claim 4, wherein before inputting the text region of the text image into the second neural network, the method further comprises:

acquiring a text recognition data set, the text recognition data set comprising text images and the recognition accuracy of each text image, wherein the recognition accuracy of a text image is the ratio of the number of recognizable characters of the text image to its actual number of characters, the number of recognizable characters being the number of characters of the text image that a character recognition model can correctly recognize, and the actual number of characters being the number of characters actually contained in the text image;

performing proportional (equal-ratio) conversion on the recognition accuracy of each text image to obtain its quality score, and labeling the quality score of the text image into the text recognition data set; and

inputting the text recognition data set into the second neural network to train the second neural network.

6. The text image quality detection method according to claim 1, wherein the text region is a single-character region, the single-character region being a continuous image region of the text image containing one character.

7. A text image quality detection apparatus, comprising:

a character scale detection module configured to detect the character scale of the text image, the character scale of the text image being the average scale of the characters in the text image;

a scaling module configured to, when the character scale of the text image is smaller than a first preset scale, enlarge the text image so that the character scale of the text image falls within a preset scale range, and, when the character scale of the text image is larger than a second preset scale, reduce the text image so that the character scale of the text image falls within the preset scale range, the preset scale range being the range of scales larger than the first preset scale and smaller than the second preset scale;

a text region detection module configured to input the text image into a first neural network and perform feature extraction and mapping processing on the text image through the first neural network to detect the text regions in the text image, wherein a receptive window is configured in the first neural network, the receptive window lies within the preset scale range, and the receptive window is moved over the text image to extract features from the text image, a text region being a continuous image region of the text image composed of some or all of its characters;

a quality score prediction module configured to perform quality score prediction on the text regions in the text image to obtain a quality score for each text region; and

a weighted summation module configured to acquire the preset weight corresponding to each text region and perform a weighted summation of the quality scores of the text regions based on the preset weights to obtain the quality score of the text image;

wherein the weighted summation module comprises:

a word count acquisition unit configured to acquire the word count of each text region and sum the word counts of the text regions to obtain the total word count of the text image; and

a preset weight calculation unit configured to acquire, for each text region, the ratio of its word count to the total word count of the text image, and use the ratio as the preset weight corresponding to that text region.

8. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the text image quality detection method according to any one of claims 1 to 6.

9. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform, by executing the executable instructions, the text image quality detection method according to any one of claims 1 to 6.

10. A computer program product comprising computer instructions stored in a computer-readable storage medium, wherein a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the text image quality detection method according to any one of claims 1 to 6.
CN202110484548.8A 2021-04-30 2021-04-30 Text image quality detection method, device, medium and electronic equipment Active CN113763313B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110484548.8A CN113763313B (en) 2021-04-30 2021-04-30 Text image quality detection method, device, medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN113763313A CN113763313A (en) 2021-12-07
CN113763313B (en) 2025-05-30

Family

ID=78786949


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114898409B (en) * 2022-07-14 2022-09-30 深圳市海清视讯科技有限公司 Data processing method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914834A (en) * 2020-06-18 2020-11-10 绍兴埃瓦科技有限公司 Image recognition method and device, computer equipment and storage medium
CN111967456A (en) * 2020-08-06 2020-11-20 赖明钟 OCV detection method for code-spraying characters

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9183224B2 (en) * 2009-12-02 2015-11-10 Google Inc. Identifying matching canonical documents in response to a visual query
US20140267438A1 (en) * 2013-03-13 2014-09-18 Apple Inc. Scaling an image having text
CN107644415B (en) * 2017-09-08 2019-02-22 众安信息技术服务有限公司 A kind of text image quality assessment method and device




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant