CN112395874B - Order information correction method, device, equipment and storage medium - Google Patents

Info

Publication number
CN112395874B
Authority
CN
China
Prior art keywords
information
order
corrected
target
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011339777.2A
Other languages
Chinese (zh)
Other versions
CN112395874A (en)
Inventor
张斌
彭佳玮
陈凯歌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sensetime International Pte Ltd
Original Assignee
Sensetime International Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sensetime International Pte Ltd
Priority to CN202011339777.2A
Publication of CN112395874A
Priority to PCT/IB2021/055848 (WO2022112857A1)
Application granted
Publication of CN112395874B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0633 Managing shopping lists, e.g. compiling or processing purchase lists
    • G06Q30/0635 Managing shopping lists, e.g. compiling or processing purchase lists replenishment orders; recurring orders
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/14 Travel agencies

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Remote Sensing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The method comprises the steps of obtaining order information to be corrected according to a text recognition result of an order, determining target search information from the text recognition result, obtaining order reference information matched with the target search information in a preset search mode, and correcting the order information to be corrected by utilizing the order reference information to obtain target order information.

Description

Order information correction method, device, equipment and storage medium
Technical Field
The present disclosure relates to computer vision, and in particular, to a method, apparatus, device, and storage medium for correcting order information.
Background
OCR (Optical Character Recognition) technology is now widely used in a variety of fields and industries, and it can recognize most of the text in an image of a text document. However, because OCR results are not fully accurate, errors may occur in the information extracted from them. How to obtain accurate information from an OCR result remains to be studied further.
Disclosure of Invention
The embodiment of the disclosure provides a correction scheme of order information.
According to one aspect of the disclosure, a correction method of order information is provided, and the method comprises the steps of obtaining order information to be corrected according to a text recognition result of an order, determining target search information from the text recognition result, obtaining order reference information matched with the target search information in a preset search mode, and correcting the order information to be corrected by utilizing the order reference information to obtain target order information.
In combination with any one of the embodiments provided in the present disclosure, the target search information includes a part of content of the order information to be corrected, and the part of content includes at least one of a subject name and at least one element.
In combination with any one embodiment of the disclosure, the acquiring order reference information matched with the target search information through a preset search mode includes at least one of: accessing a setting database to acquire order reference information matched with the target search information from the setting database; and acquiring order reference information matched with the target search information through the Internet.
In connection with any one of the embodiments provided in this disclosure, the setting database includes reference unit information of a plurality of levels, and reference unit information of a lowest level of the plurality of levels corresponds to a plurality of reference subject names.
In combination with any one embodiment of the disclosure, the setting database stores first reference information corresponding to a reference subject name, the determining of target search information from the text recognition result includes obtaining unit information of a lowest level in the order information to be corrected according to level division in the setting database, and the obtaining of order reference information matched with the target search information from the setting database includes determining target unit information matched with unit information of a lowest level in the order information to be corrected in the reference unit information of the lowest level in the setting database, determining target subject names meeting preset conditions in a plurality of reference subject names corresponding to the target unit information, and obtaining order reference information matched with the target search information according to the first reference information corresponding to the target subject names.
In combination with any one of the embodiments provided in the disclosure, the setting database stores second reference information corresponding to a reference subject name, the acquiring order reference information matched with the target search information from the setting database includes acquiring unit information of a lowest level in the order information to be corrected according to level division in the setting database, determining target unit information matched with the unit information of the lowest level in the order information to be corrected in the reference unit information of the lowest level in the setting database, determining target subject names meeting preset conditions in a plurality of reference subject names corresponding to the target unit information, and acquiring order reference information matched with the target search information according to the reference unit information of each level corresponding to the target subject name and the second reference information corresponding to the target subject name.
In combination with any one embodiment of the disclosure, the determining a target subject name that meets a preset condition from among a plurality of reference subject names corresponding to the target unit information includes respectively matching the subject name corresponding to the to-be-corrected order information with the plurality of reference subject names corresponding to the target unit information, and determining the reference subject name with the highest matching score and exceeding a first set threshold as the target subject name.
In combination with any one of the embodiments provided in the present disclosure, the acquiring order reference information matched with the target search information through the Internet includes searching the Internet according to part of the content of the order information to be corrected to obtain at least one piece of reference information matched with the target search information, matching the reference information corresponding to the target search information with the order information to be corrected, and obtaining the order reference information that has the highest matching score and exceeds a second set threshold.
In combination with any one of the embodiments provided in the present disclosure, the method further includes adding the order reference information obtained from the internet and a subject name corresponding to the order information to be corrected to information corresponding to reference unit information of a lowest level in the setting database.
In combination with any one of the embodiments provided in the present disclosure, the method further includes updating information corresponding to reference unit information of a lowest level in the setting database according to the order reference information obtained from the internet and a subject name corresponding to the order information to be corrected.
In combination with any one of the embodiments provided in the disclosure, the order information to be corrected at least includes address information, at least one element included in the order information to be corrected includes at least one of an administrative area and a postal code, and the reference unit information of multiple levels included in the setting database includes reference administrative area information or postal code information.
In combination with any one of the embodiments provided in the present disclosure, the obtaining order information to be corrected according to the text recognition result of the order includes obtaining the text recognition result of the order, wherein the text recognition result comprises a plurality of text boxes; determining a first text box containing key information from the text boxes, wherein the key information comprises partial content of the order information to be corrected, and the partial content comprises at least one element in the order information to be corrected and at least one keyword indicating the order information to be corrected; merging at least part of the text boxes according to the first text box to obtain a merged text box; and obtaining the order information to be corrected from the merged text box.
According to one aspect of the disclosure, a correction device of order information is provided, and the correction device comprises an acquisition unit, a determination unit, a matching unit and a correction unit, wherein the acquisition unit is used for acquiring order information to be corrected according to a text recognition result of an order, the determination unit is used for determining target search information from the text recognition result, the matching unit is used for acquiring order reference information matched with the target search information in a preset search mode, and the correction unit is used for correcting the order information to be corrected according to the order reference information so as to obtain target order information.
In combination with any one of the embodiments provided in the present disclosure, the target search information includes a part of content of the order information to be corrected, and the part of content includes at least one of a subject name and at least one element.
In combination with any one of the embodiments provided in the present disclosure, the matching unit is specifically configured to access a setting database to obtain order reference information matched with the target search information from the setting database, and obtain order reference information matched with the target search information through the internet.
In connection with any one of the embodiments provided in this disclosure, the setting database includes reference unit information of a plurality of levels, and reference unit information of a lowest level of the plurality of levels corresponds to a plurality of reference subject names.
In combination with any one embodiment of the disclosure, the setting database stores first reference information corresponding to a reference subject name, the determining unit is specifically configured to obtain unit information of a lowest level in the order information to be corrected according to level division in the setting database, and the matching unit is specifically configured to determine target unit information matched with the unit information of the lowest level in the order information to be corrected in the reference unit information of the lowest level in the setting database, determine a target subject name meeting preset conditions among a plurality of reference subject names corresponding to the target unit information, and obtain order reference information matched with the target search information according to the first reference information corresponding to the target subject name.
In combination with any one of the embodiments provided by the disclosure, the setting database stores second reference information corresponding to a reference subject name, the matching unit is specifically configured to obtain unit information of a lowest level in the order information to be corrected according to level division in the setting database, determine target unit information matched with the unit information of the lowest level in the order information to be corrected in the reference unit information of the lowest level in the setting database, determine target subject names corresponding to the target unit information and meeting preset conditions in a plurality of reference subject names corresponding to the target unit information, and obtain order reference information matched with the target search information according to the reference unit information of each level corresponding to the target subject name and the second reference information corresponding to the target subject name.
In combination with any one of the embodiments provided in the present disclosure, when determining a target subject name that meets a preset condition from among a plurality of reference subject names corresponding to the target unit information, the matching unit is specifically configured to match the subject name corresponding to the order information to be corrected with each of the plurality of reference subject names corresponding to the target unit information, and determine, as the target subject name, the reference subject name that has the highest matching score and exceeds a first set threshold.
In combination with any one of the embodiments provided in the present disclosure, the matching unit is specifically configured to search in the internet according to a part of content of the order information to be corrected to obtain at least one reference information matched with the target search information, match the reference information corresponding to the target search information with the order information to be corrected, and obtain order reference information with a highest matching score and exceeding a second set threshold.
In combination with any one of the embodiments provided in the present disclosure, the apparatus further includes an adding unit, configured to add the order reference information obtained from the internet and a subject name corresponding to the to-be-corrected order information to information corresponding to reference unit information at a lowest level in the setting database.
In combination with any one of the embodiments provided in the present disclosure, the apparatus further includes an updating unit, configured to update information corresponding to reference unit information of a lowest level in the setting database according to the order reference information obtained from the internet and a subject name corresponding to the order information to be corrected.
In combination with any one of the embodiments provided in the disclosure, the order information to be corrected at least includes address information, at least one element included in the order information to be corrected includes at least one of an administrative area and a postal code, and the reference unit information of multiple levels included in the setting database includes reference administrative area information or postal code information.
In combination with any one of the embodiments provided in the present disclosure, the acquisition unit is specifically configured to obtain a text recognition result of the order, wherein the text recognition result comprises a plurality of text boxes; determine a first text box containing key information from the text boxes, wherein the key information comprises partial content of the order information to be corrected, and the partial content comprises at least one element in the order information to be corrected and at least one keyword indicating the order information to be corrected; merge at least part of the text boxes according to the first text box to obtain a merged text box; and obtain the order information to be corrected from the merged text box.
According to an aspect of the disclosure, there is provided an electronic device, the device comprising a memory and a processor, the memory being used to store computer instructions executable on the processor, and the processor being used to implement a method of correcting order information according to any embodiment of the disclosure when executing the computer instructions.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of correcting order information according to any of the embodiments of the present disclosure.
According to the correction method, device, equipment and storage medium of order information of one or more embodiments of the present disclosure, order information to be corrected is obtained according to a text recognition result of an order, target search information is determined from the text recognition result, order reference information matched with the target search information is obtained in a preset search mode, the order information to be corrected is corrected by using the order reference information to obtain target order information, and accurate target order information can be obtained rapidly from the text recognition result of the order.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the specification and together with the description, serve to explain the principles of the specification.
FIG. 1 is a flow chart of a method of correcting order information in accordance with at least one embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a setting database in a method for correcting order information according to at least one embodiment of the present disclosure;
FIGS. 3A, 3B, and 3C are schematic diagrams of an information extraction method according to at least one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an order information correction device according to at least one embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of an electronic device according to at least one embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
Fig. 1 illustrates a flow chart of a method of correcting order information according to some embodiments of the present disclosure. As shown in FIG. 1, the method includes steps 101-104.
In step 101, order information to be corrected is obtained according to the text recognition result of the order.
In an embodiment of the present disclosure, the order on which text recognition is performed includes at least one of an order image and an order in the form of an electronic document, such as a PDF document. Those skilled in the art will appreciate that the order may also be of other types suitable for text recognition.
In one example, text detection may be performed on the order to obtain the text boxes it contains, and text recognition may then be performed on those text boxes to identify the text in each of them, yielding the text recognition result. Alternatively, text recognition such as OCR may be performed directly on the order to be processed to obtain the text recognition result of each text box contained in the order. The embodiment of the disclosure does not limit the specific method for acquiring the text recognition result.
The order information to be corrected is order information extracted from the text recognition result of the order according to a set rule. For example, in the case where the order information to be corrected contains address information, the address information to be corrected may be acquired from the text recognition result according to a rule describing address information.
In step 102, target search information is determined from the text recognition result. The target search information is information related to the order information to be corrected or capable of reflecting the characteristics of the order information to be corrected.
In one example, the target search information includes partial content of the order information to be corrected, the partial content including at least one of a subject name and at least one element. Taking address information as an example, the partial content indicated by the target search information may include the subject name to which the address information belongs (for example, a person's name or a place name) and at least one element included in the address information (for example, an administrative area at any level, or the postal code corresponding to an administrative area at any level).
In step 103, order reference information matched with the target search information is acquired through a preset search mode.
In the embodiment of the disclosure, the order reference information matched with the target search information can be acquired from a setting database by accessing the setting database. The setting database stores a plurality of reference subject names and corresponding reference information. For example, when the information to be processed is address information, the setting database stores a plurality of subject names and their corresponding address information. Given the subject name (such as "XX hotel") and the postal code corresponding to the order information to be corrected, the matching "XX hotel" entry can be found in the setting database, and its address information is used as the order reference information.
In the embodiment of the disclosure, the order reference information matched with the target search information can also be acquired through the internet. Still taking address information as an example, according to the subject name and the postal code corresponding to the order information to be corrected, a search engine is utilized to search in the internet, and the information corresponding to the searched matched subject name is used as order reference information.
In the embodiment of the disclosure, order reference information matched with the target search information can be acquired from a setting database and the internet at the same time. In the case of order reference information obtained from both the setting database and the internet, either one of them, or a designated one may be used as target order reference information, and in the case of order reference information obtained only from the internet, the setting database may be updated with the search result of the internet.
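The source-selection behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name `find_reference`, the `(subject name, postal code)` key, the in-memory dictionary standing in for the setting database, and the `prefer` parameter are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str]  # (subject name, postal code); an assumed key shape


def find_reference(
    key: Key,
    database: Dict[Key, str],
    internet_search: Callable[[Key], Optional[str]],
    prefer: str = "database",
) -> Optional[str]:
    """Query both sources and pick one result (illustrative policy only)."""
    from_db = database.get(key)
    from_web = internet_search(key)
    if from_db is not None and from_web is not None:
        # Both sources answered: use the designated one.
        return from_db if prefer == "database" else from_web
    if from_web is not None:
        # Only the Internet answered: update the setting database with it.
        database[key] = from_web
        return from_web
    # Either only the database answered, or neither did (None).
    return from_db
```

Passing the Internet search as a callable keeps the sketch independent of any particular search engine, which the disclosure does not name.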
In step 104, the order information to be corrected is corrected by using the order reference information, so as to obtain target order information.
In the embodiments of the present disclosure, order information to be corrected is obtained according to a text recognition result of an order, target search information is determined from the text recognition result, order reference information matched with the target search information is obtained in a preset search mode, and the order information to be corrected is corrected by using the order reference information to obtain target order information, so that accurate target order information can be obtained quickly from the text recognition result of the order.
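Steps 101 to 104 can be summarized in a short sketch for the address case. Everything named here (`OrderInfo`, `correct_order_info`, `toy_search`, and the sample addresses) is a hypothetical placeholder chosen for illustration; the disclosure does not prescribe these structures, and the search step is injected as a function because the preset search mode may be a database, the Internet, or both.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class OrderInfo:
    subject_name: str   # e.g. a hotel name read from the order
    postal_code: str    # one element of the address
    address: str        # the order information to be corrected


def correct_order_info(
    recognized: OrderInfo,
    search: Callable[[dict], Optional[str]],
) -> OrderInfo:
    """Walk through steps 101-104 for address information."""
    # Step 101: `recognized` holds the order information to be corrected.
    # Step 102: choose target search information from the recognition result.
    target = {"subject_name": recognized.subject_name,
              "postal_code": recognized.postal_code}
    # Step 103: look up order reference information in a preset search mode.
    reference = search(target)
    # Step 104: correct with the reference; keep the original if nothing matched.
    if reference is None:
        return recognized
    return OrderInfo(recognized.subject_name, recognized.postal_code, reference)


def toy_search(target: dict) -> Optional[str]:
    """Stand-in for the preset search mode, backed by a one-entry table."""
    table = {("XX hotel", "100000"): "1 Example Road, Example District"}
    return table.get((target["subject_name"], target["postal_code"]))


result = correct_order_info(
    OrderInfo("XX hotel", "100000", "1 Exarnple Rd"), toy_search)
```

In this toy run the misrecognized address is replaced by the reference address stored for the matching subject name and postal code.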
Address databases in the related art generally support only queries from subject name to address, and tolerate errors only at the beginning and end of an input word. By contrast, the correction method provided in the embodiments of the disclosure obtains the matched order reference information according to target search information determined from the text recognition result, and the target search information may be part of the content of the order information to be corrected, such as a subject name or an element of the order information to be corrected. Therefore, even if the order information contains errors, including an erroneous subject name, other information in the order information can serve as the target search information to obtain order reference information and correct the order information to be corrected, so the method has higher fault tolerance.
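One way to tolerate a misrecognized character in the middle of a subject name is to score the recognized name against each candidate reference subject name and accept the best candidate only if its score exceeds a set threshold. The sketch below assumes `difflib.SequenceMatcher.ratio()` from the Python standard library as the scoring function and 0.8 as the threshold; the disclosure specifies neither, and `best_subject` is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def best_subject(recognized: str, candidates: List[str],
                 threshold: float = 0.8) -> Optional[str]:
    """Return the candidate with the highest score above `threshold`, else None."""
    best_name, best_score = None, 0.0
    for name in candidates:
        # SequenceMatcher.ratio() is a stand-in similarity score in [0, 1].
        score = SequenceMatcher(None, recognized.lower(), name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score > threshold else None
```

Restricting `candidates` to the reference subject names under one lowest-level unit, such as one postal code, keeps the comparison set small and lets a correct element compensate for an erroneous subject name.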
In addition, since the acquisition of the target search information is irrelevant to the text arrangement mode of the order, the correction method of the order information provided by at least one embodiment of the present disclosure is applicable to orders with different layouts.
In some embodiments, from the text recognition result of the order, a subject name corresponding to the order information to be corrected may be obtained as target search information. The subject name and the order information to be corrected are, for example, key value pair information, wherein the subject name indicates an attribute, and the order information to be corrected indicates a value of the attribute.
In one example, the order information to be corrected may be address information, where the subject corresponding to the address information is the object to which the address information belongs, and the corresponding subject name is the name of that object. For example, the subject name is a person's name in the case where the object to which the address information belongs is a person, and a place name in the case where the object to which the address information belongs is a place. The order information to be corrected can also be identity information, in which case the subject name corresponding to the identity information is a person's name. Those skilled in the art will appreciate that the order information to be corrected may also be other types of information, and this disclosure is not limited in this regard.
In some embodiments, the setting database may include a plurality of levels of reference unit information, and each piece of reference unit information at the lowest level corresponds to a plurality of reference subject names. In the setting database, the reference unit information is organized and stored by level from top to bottom; the lower the level, the narrower the range or the lower the authority corresponding to the reference unit information. The lowest level holds the reference unit information with the smallest range or the lowest authority. Taking a setting database storing address information as an example, the reference unit information of multiple levels included in the setting database includes reference administrative area information and/or postal code information, and the reference unit information of the lowest level includes the name of the administrative area with the smallest range and/or the postal code corresponding to that smallest administrative area.
In one example, the reference unit information in the setting database may be stored in a tree structure, in which the non-leaf nodes at different levels store the reference unit information of different levels, and the leaf nodes store the reference subject names belonging to the node one level above them.
In some embodiments, the setting database further stores first reference information corresponding to each reference subject name. The first reference information is usually the complete information corresponding to the reference subject name, and includes the reference unit information of each level and the specific reference information corresponding to the reference subject name. Taking address information as an example, the first reference information may be complete address information, including administrative area information of each level and a specific address corresponding to a reference subject name, for example, a street and/or a unit. The first reference information is obtained in advance and is reference information of relatively high reliability and accuracy for the reference subject name.
Taking the to-be-corrected order information as address information in the hotel order as an example, the reference unit information of multiple levels in the setting database may be administrative areas of multiple levels. The tree structure for storing address information can be a root tree structure, the root node has no practical meaning, the child node of the root can be used for storing travel agents (such as XX travel agencies) of orders, the rest of non-leaf nodes can be used for storing administrative district components or postal codes of countries, each leaf node can store an object name, and each leaf node can also store complete address information corresponding to the object name. In the subtree corresponding to the same traveller, all non-leaf nodes are unique, and the parent node of the non-leaf node represents its own direct high administrative district.
Fig. 2 is a schematic diagram of a configuration of a setting database in a method for correcting order information according to at least one embodiment of the present disclosure. As shown in fig. 2, the subtree of a travel agency may be structured from top to bottom (from shallow to deep) as country-province-city-district. In some cases the level below the district may also include sub-districts, and any administrative area may be replaced with a postal code, for example giving the structure country-province-postal code-district. It will be appreciated by those skilled in the art that the above is merely an example, that the postal code may be substituted for any administrative area, and that this disclosure is not limited in this regard.
For a setting database storing address information, the reference administrative area information of each level stored in the tree structure can be obtained from the administrative division tables and the postal-code-to-administrative-area correspondence tables that each country publishes on the Internet, and the reference subject names and corresponding first reference information stored in the leaf nodes can be obtained by manual labeling.
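The tree organization described above can be illustrated with a short sketch. This is only one possible in-memory layout, not the structure used by the disclosure; the class name `Node`, the helper `add_subject`, and all sample names and addresses are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One tree node; only leaf nodes carry a complete address."""
    label: str                                   # e.g. a country, a postal code, or a hotel name
    children: Dict[str, "Node"] = field(default_factory=dict)
    full_address: Optional[str] = None           # first reference information (leaves only)

    def child(self, label: str) -> "Node":
        # Non-leaf nodes are unique within a subtree, so an existing child is reused.
        return self.children.setdefault(label, Node(label))


def add_subject(root: Node, agency: str, levels: List[str], name: str, address: str) -> Node:
    """Insert a leaf under root -> agency -> levels[0] -> ... -> levels[-1]."""
    node = root.child(agency)
    for level in levels:
        node = node.child(level)
    leaf = node.child(name)
    leaf.full_address = address
    return leaf


root = Node("ROOT")  # the root node itself has no practical meaning
# Hypothetical sample data; a postal code stands in for the lowest administrative level.
add_subject(root, "XX Travel Agency", ["Thailand", "Bangkok", "10110"],
            "Hotel A", "1 Example Road, Bangkok 10110, Thailand")
add_subject(root, "XX Travel Agency", ["Thailand", "Bangkok", "10110"],
            "Hotel B", "2 Example Road, Bangkok 10110, Thailand")
```

In this sketch each lowest-level node ("10110") ends up with several leaf children, mirroring the statement that each piece of lowest-level reference unit information corresponds to a plurality of reference subject names.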
In the case that the first reference information corresponding to each reference subject name is also stored in the setting database, the order reference information corresponding to the order information to be corrected may be obtained in the following manner.
First, the unit information of the lowest hierarchy in the order information to be corrected may be acquired as target search information according to the hierarchy division in the setting database.
Taking the order information to be corrected as the address information of a hotel order as an example, the unit information of each level contained in the order information to be corrected can be obtained according to the hierarchical division of addresses in the setting database, that is, the tree structure of the database. For example, splitting the order information to be corrected according to the tree structure "country-province-city-district" in the database yields the administrative area information of each level contained in the address information. The administrative area information of the lowest level can then be used as the target search information. For example, when the smallest administrative area contained in the address information is a sub-district, the sub-district information may be used as the target search information; when the smallest administrative area contained in the address information is a district, the district information may be used as the target search information. Other cases are similar and are not described again.
Next, among the lowest-level reference unit information in the setting database, the target unit information that matches the lowest-level unit information in the order information to be corrected is determined. That is, the storage position of the lowest-level unit information of the order information to be corrected is located in the tree structure of the setting database, and the (matched) reference unit information found at that position is taken as the target unit information.
And then, determining the target subject names which meet preset conditions in a plurality of reference subject names corresponding to the target unit information.
The reference unit information of each lowest level in the setting database corresponds to a plurality of reference subject names, and thus among the plurality of reference subject names, the target subject name can be determined according to a preset condition.
In one example, the subject names corresponding to the order information to be corrected can be respectively matched with a plurality of reference subject names corresponding to the target unit information, and the reference subject name with the highest matching score and exceeding a first set threshold value is determined as the target subject name.
And finally, acquiring order reference information matched with the target search information according to the first reference information corresponding to the target subject name.
When the reference information corresponding to each reference subject name is stored in the setting database, the order reference information of the order information to be corrected can be obtained from the first reference information corresponding to the determined target subject name. The reference information stored in the setting database has relatively high credibility and accuracy, so correcting the order information to be corrected with it yields more accurate target order information.
In some embodiments, the setting database stores second reference information corresponding to each reference subject name. The second reference information is reference information other than the reference unit information of each level, and is generally more specific than the reference unit information of each level. Taking the order information to be corrected as the address information contained in a hotel order as an example, the second reference information may be, for example, the street and/or unit where the hotel is located. The second reference information is obtained in advance and is reference information of relatively high reliability and accuracy for the reference subject name.
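The lookup steps above (locate the lowest-level unit, match subject names, return the stored reference information) can be sketched as follows. The disclosure does not specify the matching function or the threshold value; `difflib.SequenceMatcher`, the threshold 0.8, the flat dictionary `SETTING_DB`, and the sample entries are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical setting database, flattened to:
# lowest-level unit -> {reference subject name: first reference information}
SETTING_DB = {
    "10110": {
        "Grand Example Hotel": "1 Example Road, Khlong Toei, Bangkok 10110, Thailand",
        "Example Inn": "9 Sample Lane, Khlong Toei, Bangkok 10110, Thailand",
    },
}


def name_score(a: str, b: str) -> float:
    """Similarity in [0, 1]; stands in for the unspecified matching score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_order_reference(lowest_unit: str, subject_name: str, first_threshold: float = 0.8):
    """Return (target subject name, reference information), or None if nothing qualifies."""
    candidates = SETTING_DB.get(lowest_unit)       # locate the target unit information
    if not candidates:
        return None
    best = max(candidates, key=lambda ref: name_score(subject_name, ref))
    if name_score(subject_name, best) <= first_threshold:
        return None                                # the best score must exceed the threshold
    return best, candidates[best]
```

For instance, a hotel name with an OCR error such as "Grand Examp1e Hotel" under postal code "10110" would resolve to the stored entry "Grand Example Hotel" in this sketch, while a name with no close match returns `None` and could then be looked up elsewhere.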
In the case that the second reference information corresponding to each reference subject name is stored in the setting database, the manner of determining the target subject name is similar to the above method, except that after determining the target subject name, order reference information corresponding to the order information to be corrected is obtained according to the reference unit information of each level corresponding to the target subject name and the second reference information corresponding to the target subject name.
According to the reference unit information of each level corresponding to the target subject name and the second reference information corresponding to the target subject name, the complete information of the target subject name can be obtained, the order information to be corrected is corrected according to the complete information, and more accurate and complete target order information can be obtained.
In some embodiments, order reference information corresponding to the to-be-corrected order information may also be obtained from the internet according to the target search information.
In one example, partial content of the order information to be corrected, such as the subject name or at least one element, may be searched for on the Internet to obtain at least one piece of order reference information corresponding to the subject name. Each piece is matched against the order information to be corrected, and the one with the highest matching score that exceeds a second set threshold is taken as the reference information.
Still taking the order information to be corrected as the address information of a hotel order as an example, the target search information may include a postal code contained in the address information and/or administrative area information of one level. Using the subject name corresponding to the address information, namely the hotel name, and at least one element contained in the order information to be corrected, a plurality of candidate hotel addresses can be obtained from the Internet. The address information obtained from the Internet is fuzzy-matched against the order information to be corrected component by component, and the address information with the highest matching score that exceeds the second set threshold is used as the order reference information for correcting the order information to be corrected, thereby obtaining more accurate hotel address information.
In the case where the matching scores of two or more address information are the same, any one of the address information may be retained and the other address information may be deleted.
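A minimal sketch of component-wise fuzzy matching against candidate addresses is given below. The scoring rule (mean of the best per-component similarity), the comma-based splitting, the threshold 0.7, and the function names are assumptions; the disclosure only states that matching is performed by address component and that the best candidate must exceed a second set threshold.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def components(address: str) -> List[str]:
    """Split an address into lower-cased, comma-separated components."""
    return [part.strip().lower() for part in address.split(",") if part.strip()]


def address_score(order_address: str, candidate: str) -> float:
    """Mean, over the order's components, of the best similarity to any candidate component."""
    order_parts, cand_parts = components(order_address), components(candidate)
    if not order_parts or not cand_parts:
        return 0.0
    best = [max(SequenceMatcher(None, o, c).ratio() for c in cand_parts) for o in order_parts]
    return sum(best) / len(best)


def pick_reference(order_address: str, candidates: List[str],
                   second_threshold: float = 0.7) -> Optional[str]:
    """Return the highest-scoring candidate above the threshold, else None."""
    scored = [(address_score(order_address, c), c) for c in candidates]
    if not scored:
        return None
    # max() returns the first maximal item, so on a tie one candidate is kept
    # and the others are effectively discarded.
    top_score, top = max(scored, key=lambda pair: pair[0])
    return top if top_score > second_threshold else None
```

Keeping the first of several equally scored candidates is one simple way to realize the tie rule stated above; any other deterministic choice would serve equally well.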
In the embodiment of the disclosure, the address database organizes and stores the administrative areas of each level and their corresponding postal codes, and can be configured according to the conventions of the target country, so the correction method is easily extended to correcting travel itinerary information for destinations in any country.
In some embodiments, the target search information may be retrieved first in the setting database and then on the Internet.
When the target subject name matched with the subject name corresponding to the order information to be corrected does not exist in the setting database, the reference information corresponding to the order information to be corrected and the subject name corresponding to the order information to be corrected, which are acquired from the internet, can be added to the information corresponding to the reference unit information of the lowest level in the setting database, that is, the subject name is added to the reference subject name corresponding to the reference unit information of the corresponding lowest level. For the setting database of the tree structure, namely, the subject name corresponding to the order information to be corrected and the order reference information are stored in leaf nodes of the tree structure to become newly added reference subject names and corresponding first reference information.
When the reference information obtained from the setting database is inconsistent with the reference information obtained from the Internet, the information corresponding to the lowest-level reference unit information in the setting database can be updated according to the reference information and the subject name, obtained from the Internet, that correspond to the order information to be corrected. That is, the reference information of the target subject name under the lowest-level reference unit information in the setting database is replaced by the reference information obtained from the Internet for the order to be corrected. For a setting database with a tree structure, the reference information originally stored in the leaf node for the reference subject name is replaced with the reference information corresponding to the order information to be corrected, so that the reference information of the reference subject name is updated.
In one example, before the reference information of the reference subject name is updated, the latest update time of the reference information acquired from the Internet for the order information to be corrected may be obtained, and whether to update the reference information of the reference subject name may be determined based on that time. For example, the update may be performed if the latest update time falls within a set time range, such as within the last year or the last 6 months. Conversely, if the latest update time falls outside the set time range, a prompt message may be output so that a technician can decide whether to perform the update, thereby avoiding an erroneous update.
In the embodiment of the disclosure, supplementing and updating the setting database with reference information acquired from the Internet improves the credibility and accuracy of the reference information obtained from the setting database, so that the order information to be corrected acquired from the order to be processed can be corrected more accurately.
When applying for a travel visa, hotel information must be filled in and a hotel travel itinerary provided for review. Text recognition and information extraction on the hotel travel itinerary can spare the user tedious form filling and simplify the review process. However, because OCR results are not always accurate, the information extracted from them may contain errors.
In the related art, an N-gram model is usually used to correct the text recognition result. However, training an N-gram model depends on a lexicon, and lexicons of address information, especially of foreign place names, are usually incomplete, so the N-gram model corrects the text recognition results of hotel orders poorly.
By applying the method for correcting order information provided by at least one embodiment of the present disclosure to automatic visa processing, the hotel address information in the text recognition result of a hotel travel itinerary can be corrected, for example by correcting erroneous information in a hotel address or completing an incomplete hotel address. This improves the accuracy and reliability of automatic visa information filling, improves user experience, and helps speed up the approval process. In addition, the correction method can use reference information acquired from the Internet for correction and can supplement or update the setting database with it, which addresses the problem of incomplete lexicons and yields a better correction effect.
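The update decision can be sketched as below, reading the passage as "update automatically when the Internet record is recent, otherwise prompt a technician". The function name, the return values, and the 365-day default are hypothetical.

```python
from datetime import datetime, timedelta


def decide_update(last_updated: datetime, now: datetime, max_age_days: int = 365) -> str:
    """Return "update" if the Internet record is recent enough, else "prompt"."""
    if now - last_updated <= timedelta(days=max_age_days):
        return "update"   # recent enough: replace the stored reference information
    return "prompt"       # stale: ask a technician to confirm before updating


# Example: a record last updated seven months before the check.
decision = decide_update(datetime(2020, 6, 1), datetime(2021, 1, 1))
```

Passing `now` explicitly rather than reading the system clock keeps the check reproducible; a shorter window such as six months corresponds to `max_age_days=180`.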
In the embodiment of the disclosure, the order information to be corrected at least includes address information and identity information, and in this case, the order information to be corrected may be obtained from a text recognition result of the order to be processed by the following method.
First, a text recognition result of the order is obtained, wherein the text recognition result comprises a plurality of text boxes.
Next, a first text box containing key information is determined from the plurality of text boxes. The key information may include at least one of: partial content of the order information to be corrected, the partial content including at least one element of the order information to be corrected; and a keyword indicating the order information to be corrected.
In the case where the order information to be corrected is address information, the key information may include the element "postal code" of the address information, and when the region to which the address information belongs is known, the number of digits of the postal code can be determined. Taking the order information to be corrected as a Thailand address as an example, since the Thailand postal code is a 5-digit number, the key information can be determined to be a 5-digit number. In this step, a text box containing a 5-digit number is determined as the first text box. Since the recognized content may contain numbers of more than 5 digits, for example a text box containing an 8-digit number, only a text box containing exactly 5 digits may be determined as the first text box in practice, in order to reduce additional discrimination operations.
In some embodiments, a found postal code may also be looked up in the list of postal codes of the region concerned, to confirm that it is indeed a postal code of that region.
When the region to which the address information belongs is unknown, the digit counts of postal codes in all regions of the world can be taken together, and the key information can be determined to be a number of 4 to 9 digits. In this step, each text box containing a number of 4 to 9 digits is determined as a first text box. In one possible implementation, to reduce additional discrimination operations, only text boxes containing a number of 4 to 9 digits are determined as first text boxes, that is, text boxes containing numbers of 10 digits or more are not considered.
The key information may further include another element of the address information, namely administrative area information such as "xx"; among the plurality of text boxes, a text box containing text content such as "xx" is determined as a first text box.
The key information may further include a keyword indicating the order information to be corrected. Taking the order information to be corrected as an address as an example, the keyword includes "address" and keywords representing an address in other languages. The present application does not limit the form of the keyword, which may be a full name, an abbreviation, or another expression.
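The digit-count test for postal codes can be illustrated with a regular expression. This sketch interprets "only 5 digits" as "a run of exactly that many digits that is not part of a longer number", which is an assumption; the function name and defaults are hypothetical.

```python
import re
from typing import Optional


def find_postal_code(text: str, min_digits: int = 4, max_digits: int = 9) -> Optional[str]:
    """Return the first digit run whose length lies in [min_digits, max_digits], else None.

    The lookbehind and lookahead reject runs embedded in a longer number,
    so an 8-digit number is not mistaken for a 5-digit postal code.
    """
    pattern = rf"(?<![0-9])[0-9]{{{min_digits},{max_digits}}}(?![0-9])"
    match = re.search(pattern, text)
    return match.group(0) if match else None
```

For a known region such as Thailand, the call would be `find_postal_code(text, 5, 5)`; with the defaults it accepts any 4 to 9 digit number and ignores numbers of 10 digits or more. A text box for which the function returns a value would be a candidate first text box.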
And combining at least part of the text boxes according to the first text box to obtain a combined text box.
In an embodiment of the present disclosure, the text box to be merged is determined based on the first text box. For example, a text box to be combined may be determined according to a positional relationship with the first text box, and the text box to be combined is combined to obtain a combined text box.
And finally, acquiring order information to be corrected from the combined text box.
The order information to be corrected can be extracted from the merged text box according to the content contained in the merged text box or the format information of the merged text box or according to the content contained in the merged text box and the format information of the merged text box.
In the embodiment of the disclosure, the first text box containing the key information is determined in the text boxes contained in the text recognition result of the order to be processed, at least part of the text boxes are combined according to the first text box to obtain the combined text box, and the order information to be corrected is acquired from the combined text box, so that efficient information processing can be realized in the order to be processed according to the key information in the order information to be corrected.
In some embodiments, the text boxes may be combined in the following manner to obtain a combined text box.
First, a positional relationship between each of the plurality of text boxes except the first text box and the first text box is acquired. The positional relationship includes an azimuth relationship of other text boxes (i.e., any text box other than the first text box or a specified text box) with the first text box, for example, above or below the first text box, and also includes a distance from the first text box, for example, a pixel distance from the first text box in a vertical direction, and a pixel distance in a horizontal direction. Wherein the distance between the text boxes is determined from the distance between the center points of the two text boxes.
And then, determining the text box, of which the position relation with the first text box belongs to a set range, as a second text box. For example, a text box above the first text box may be determined as a second text box, or a text box whose pixel distance in the vertical direction from the first text box is within a set threshold may be determined as a second text box, or the like.
And then, combining the first text box and the second text box as text boxes to be combined to obtain the combined text box.
In the embodiment of the disclosure, the text boxes to be merged are determined according to the positions of the text boxes in the text recognition result and the first text box containing the key information, and are then merged. This narrows the text boxes being merged to those related to the order information to be corrected, which reduces the amount of information to process and improves information processing efficiency.
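Selecting second text boxes by their positional relationship to the first text box might look like the following sketch. The `Box` type, the use of center points in image coordinates (y growing downward), and the parameters `max_dy` and `above_only` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    text: str
    cx: float  # center x in pixels
    cy: float  # center y in pixels (grows downward, as in image coordinates)


def select_second_boxes(first: Box, others: List[Box], max_dy: float,
                        above_only: bool = True) -> List[Box]:
    """Keep boxes whose vertical center distance to `first` is within max_dy."""
    selected = []
    for box in others:
        dy = box.cy - first.cy
        if above_only and dy > 0:
            continue                  # the box lies below the first text box
        if abs(dy) <= max_dy:
            selected.append(box)
    return selected
```

With a postal code as key information, `above_only=True` reflects the case where the rest of the address sits above the first text box; setting it to `False` keeps boxes on both sides.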
The merging of the text boxes to be merged may be performed on a line basis. That is, according to the row to which each text box in the text boxes to be combined belongs, the text boxes to be combined are combined, and the combined text boxes are obtained.
When a row contains only one of the text boxes to be merged, that text box is itself taken as a merged text box.
When a row contains several of the text boxes to be merged, the text boxes belonging to that row are merged to obtain a merged text box.
Fig. 3A shows an exemplary merging result. As shown in fig. 3A, the merging result comprises a plurality of rows of merged text boxes, including merged text boxes 301-303, wherein each row's merged text box is obtained by merging the one or more text boxes contained in that row.
In the embodiment of the disclosure, the text boxes to be merged are merged by merging the rows to which each text box belongs, so that the merged text box corresponding to each row is obtained, and subsequent information processing is facilitated.
In some embodiments, for a plurality of text boxes belonging to the same row, merging two adjacent text boxes under the condition that the distance between the two adjacent text boxes is smaller than a first threshold value, and merging every two adjacent text boxes meeting the conditions in the same row to obtain one merged text box corresponding to the row. The first threshold may be specifically determined according to a format feature of the order information to be corrected.
For a plurality of text boxes belonging to the same line, in case the distance between adjacent text boxes is greater than or equal to the first threshold value, it is indicated that the two adjacent text boxes may not be related content, not belonging to the order information to be corrected, and therefore the adjacent text boxes are not merged.
And under the condition that adjacent text boxes in the same row are combined to obtain more than one combined text box, determining the combined text box corresponding to the row according to the obtained position relation between the combined text box and the first text box. For example, a merged text box closest to the first text box in the horizontal direction is taken as a final merged text box.
In the embodiment of the disclosure, by limiting the merging condition between adjacent text boxes in the same row, the text boxes with irrelevant contents can be prevented from being merged into the merged text box, and the accuracy of information processing is improved.
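The row-wise merging rule can be sketched as follows. Here the distance between adjacent text boxes is measured as the horizontal gap between their edges, which is an assumption (the disclosure elsewhere measures distances between center points); the tuple layout and function names are hypothetical.

```python
from typing import List, Tuple

RowBox = Tuple[float, float, str]  # (x_left, x_right, text)


def merge_row(boxes: List[RowBox], first_threshold: float) -> List[RowBox]:
    """Merge neighbours in one row whose gap is below first_threshold; return groups left to right."""
    merged: List[RowBox] = []
    for x0, x1, text in sorted(boxes):
        if merged and x0 - merged[-1][1] < first_threshold:
            px0, px1, ptext = merged[-1]
            merged[-1] = (px0, max(px1, x1), ptext + " " + text)
        else:
            merged.append((x0, x1, text))     # gap too large: start a new group
    return merged


def pick_for_row(groups: List[RowBox], first_box_center_x: float) -> RowBox:
    """If a row yields several groups, keep the one horizontally closest to the first text box."""
    return min(groups, key=lambda g: abs((g[0] + g[1]) / 2 - first_box_center_x))
```

A row such as "99 | Example | Road | ... | Page 1" would thus yield the groups "99 Example Road" and "Page 1", and the group nearer to the first text box would be kept as that row's merged text box.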
In some embodiments, the order information to be corrected may be obtained from the consolidated text box according to the format characteristics of the order to be processed.
The format features of the order to be processed comprise distance features among the texts of each row, font features of the texts of each row, position relationship features among the texts and the like.
According to the format characteristics, a target direction for acquiring the order information to be corrected can be determined, and the order information to be corrected is acquired according to the target direction.
For example, in the case where the order information to be corrected is address information and the key information is a zip code, since the zip code is located at the end of the address information in general, it can be determined that the order information to be corrected is located above the first text box, and thus a target direction for extracting the order information to be corrected can be determined, and extraction is performed according to the target direction.
For another example, in the case where the order information to be corrected is address information and the key information is the keyword "address" indicating the address information, since the keyword "address" is generally located at the very beginning of the address information, it can be determined that the order information to be corrected is located below the first text box. A target direction for extracting the order information to be corrected can thus be determined, and extraction is performed according to that target direction.
In the embodiment of the disclosure, the information processing efficiency can be improved by determining the target direction according to the format characteristics of the order to be processed and acquiring the order information to be corrected from the combined text box according to the target direction.
In some embodiments, the target directions include a first target direction for indicating a direction of traversing the merged text box in locating an area in which the order information to be corrected is located, and a second target direction for indicating a direction of reading the order information to be corrected from the area in which the order information to be corrected is located.
In one example, taking the first text box as a starting position, the merged text boxes are traversed in the first target direction until the merged text box containing further key information (for example, a keyword indicating the order information to be corrected) is found. Then, taking that key information as a starting position, the merged text boxes are traversed in the second target direction until the merged text box containing the key information of the first text box is reached, and the content traversed in the second target direction is acquired. The key information may include a keyword indicating the order information to be corrected, at least one element of the order information to be corrected, partial content of the order information to be corrected, and the like. Taking the order information to be corrected as an address as an example, keywords indicating address information include "address" and keywords representing an address in other languages.
Referring to the exemplary merged text boxes shown in fig. 3A, the key information is the postal code 10110. Taking the first text box containing the postal code 10110 as a starting position, that is, starting from the merged text box 301, the merged text boxes are traversed upward until the merged text box 302 containing the key information "Address" is found. Then, taking the key information "Address" as a starting position, the merged text boxes are traversed downward until the merged text box 301 containing the postal code 10110 is reached, and the content traversed downward is acquired as the order information to be corrected. For an English keyword such as "address", the capitalization of some or all of its letters is not limited and may be adjusted to the actual situation. That is, in actual recognition and similar processing, ADDRESS, Address, address and the like can be handled in the same way, all being recognized as "address".
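The two-direction traversal of the fig. 3A example can be sketched as below, assuming the merged text boxes are available as a list of row strings ordered top to bottom. The function name and sample rows are hypothetical; the keyword test is made case-insensitive, in line with the treatment of ADDRESS, Address and address described above.

```python
from typing import List


def extract_between(lines: List[str], first_index: int, keyword: str = "address") -> List[str]:
    """lines: merged text boxes ordered top to bottom.

    Returns the rows from the keyword row down to the row at first_index,
    or an empty list if no row above contains the keyword.
    """
    start = None
    for i in range(first_index, -1, -1):       # first target direction: upward
        if keyword in lines[i].lower():        # case-insensitive keyword test
            start = i
            break
    if start is None:
        return []
    return lines[start:first_index + 1]        # second target direction: downward


rows = ["Booking No. 123", "ADDRESS: 1 Example Road", "Khlong Toei, Bangkok",
        "10110", "Check-in 2020-11-25"]
address_rows = extract_between(rows, first_index=3)
```

Here `address_rows` holds the three rows from the keyword row to the postal code row, leaving out the unrelated rows above and below.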
In one example, the method further includes obtaining a distance between adjacent merged text boxes. Wherein the adjacent merged text box includes two merged text boxes adjacent in a vertical direction. And a plurality of merged text boxes obtained from the text recognition result, including a plurality of pairs of adjacent merged text boxes. As shown in fig. 3B, the merged text boxes 311-314 include adjacent merged text boxes 311-312, adjacent merged text boxes 312-313, and adjacent merged text boxes 313-314.
Taking the first text box as a starting position, the merged text boxes are traversed in the first target direction until a pair of adjacent merged text boxes whose distance satisfies a first setting condition is found. Here, traversing a merged text box means acquiring its text content and acquiring the distance between it and the adjacent merged text box that has already been traversed. Then, taking as a starting position the merged text box that was traversed first of the pair whose distance satisfies the first setting condition, the merged text boxes are traversed in the second target direction until the merged text box containing the key information is found, and the content traversed in the second target direction is acquired. The distance between adjacent merged text boxes satisfies the first setting condition when it is greater than a first inter-box distance threshold.
Referring to the exemplary merged text boxes shown in fig. 3B, the key information is the postal code 10400. Taking the first text box containing "10400" as a starting position, that is, starting from the merged text box 311, the merged text boxes are traversed upward. Taking the traversal of the merged text box 312 as an example, this includes acquiring the content of the merged text box 312 and acquiring the distance between the merged text box 312 and the merged text box 311. The distance between two text boxes may be the pixel distance in the vertical direction between their center points, or the pixel distance between corresponding positions of the two text boxes. For example, when the two text boxes are left-aligned, their upper-left or lower-left corner points may be used as the two vertices for determining the distance, and the pixel distance between the two vertices is taken as the distance between the two text boxes. Other similar ways of determining the distance between two text boxes may of course be used; the present application does not limit the specific implementation, which includes but is not limited to the cases exemplified above.
When the distance between the merged text box 312 and the merged text box 311 does not satisfy the first setting condition, that is, the distance is less than or equal to the first inter-box distance threshold, the upward traversal continues. When it is detected that the distance between the merged text box 314 and the merged text box 313 satisfies the first setting condition, that is, the distance is greater than the first inter-box distance threshold, the upward traversal stops. Next, taking the merged text box 313 as a starting position, that is, the one traversed first of the adjacent merged text boxes 314 and 313, the merged text boxes are traversed downward until the merged text box 311 containing the postal code 10400 is found, and the content traversed downward is acquired as the order information to be corrected.
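The distance-based traversal of the fig. 3B example can be sketched as follows. For simplicity this sketch uses a single fixed threshold and vertical center distances, whereas the disclosure allows the threshold to differ for each pair of adjacent merged text boxes; the function name and sample coordinates are hypothetical.

```python
from typing import List, Tuple


def extract_by_gap(boxes: List[Tuple[float, str]], first_index: int,
                   gap_threshold: float) -> List[str]:
    """boxes: (center_y, text) of merged text boxes ordered top to bottom."""
    top = first_index
    # First target direction (upward): continue while the distance to the next
    # box above does not exceed the threshold, and stop at the first larger gap.
    while top > 0 and boxes[top][0] - boxes[top - 1][0] <= gap_threshold:
        top -= 1
    # Second target direction (downward): read from the stopping box to the first text box.
    return [text for _, text in boxes[top:first_index + 1]]


# Rows playing the roles of boxes 314, 313, 312 and 311, from top to bottom.
sample = [(100, "Hotel voucher"), (200, "1 Example Road"),
          (230, "Ratchathewi, Bangkok"), (260, "10400")]
address_rows = extract_by_gap(sample, first_index=3, gap_threshold=40)
```

The large gap between the top row and the row below it plays the role of the gap between boxes 314 and 313, so the top row is excluded from the result.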
In the embodiments of the present disclosure, the relationship between the first target direction and the second target direction is not limited; the two directions may form any angle with each other. For example, they may be opposite (i.e., 180°) or the same (i.e., 0°).
In one example, when the key information is located at the beginning of the order information to be corrected, the first target direction may indicate traversing the merged text boxes downward. The merged text boxes are traversed downward until the key information is found, or until adjacent merged text boxes whose distance satisfies the first set condition are found. Where the key information is located at the beginning of the order information to be corrected, the first target direction and the second target direction are the same: the traversed area is traversed again in the second target direction, and the traversed content is acquired as the order information to be corrected.
In some embodiments, with adjacent merged text boxes taken as target adjacent merged text boxes, the first inter-box distance threshold corresponding to the target adjacent merged text boxes is determined according to at least one of: the height of the merged text box traversed first among the target adjacent merged text boxes; and the distances between the merged text boxes contained in the adjacent merged text boxes already traversed, together with the height of the first traversed merged text box. Here, the target adjacent merged text boxes are the two adjacent merged text boxes for which the first inter-box distance threshold is to be determined. In the embodiments of the present disclosure, the first inter-box distance threshold may differ for each pair of adjacent merged text boxes.
In one example, the first inter-box distance threshold is determined based on the height of the merged text box traversed first among the target adjacent merged text boxes.
Take the first inter-box distance threshold corresponding to adjacent merged text boxes 311 and 312 in fig. 3B as an example. Since the merged text boxes are traversed from bottom to top when locating the area where the order information to be corrected is located, the threshold for adjacent merged text boxes 311 and 312 may, in this example, be determined from the height of merged text box 311. For example, the first inter-box distance threshold is set to 0.65×mean_height1, where mean_height1 is the height of merged text box 311.
In one example, the first inter-box distance threshold may be determined based on the distances between the merged text boxes contained in the adjacent merged text boxes already traversed and the height of the first traversed merged text box. The first traversed merged text box is the merged text box traversed first in the process of locating the area where the order information to be corrected is located.
Take the first inter-box distance threshold corresponding to adjacent merged text boxes 312 and 313 in fig. 3B as an example. This threshold can be determined from the distance between the traversed adjacent merged text boxes 311 and 312 and the height of the first traversed merged text box 311. For example, the first inter-box distance threshold is set to mean1_distance+standard1_displacement, where mean1_distance is the distance between adjacent merged text boxes 311 and 312, and standard1_displacement is the perturbation value corresponding to merged text boxes 311 and 312, with standard1_displacement=0.25×height1, where height1 is, for example, the height of merged text box 311.
Where more than one pair of adjacent merged text boxes has been traversed, take the first inter-box distance threshold corresponding to adjacent merged text boxes 313 and 314 in fig. 3B as an example: the threshold corresponding to the target adjacent merged text boxes 313 and 314 may be determined according to the distance between the traversed adjacent merged text boxes 311 and 312, the distance between the traversed adjacent merged text boxes 312 and 313, and the height of the first traversed merged text box 311.
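As a worked illustration of the two example thresholds above, the following sketch computes them for hypothetical pixel values. The function names are assumptions introduced here; the coefficients 0.65 and 0.25 are the illustrative values from the text.

```python
def first_pair_threshold(first_box_height: float, coeff: float = 0.65) -> float:
    """Threshold for the first pair (e.g., boxes 311 and 312): coeff x height of box 311."""
    return coeff * first_box_height


def second_pair_threshold(mean1_distance: float, first_box_height: float,
                          perturbation_coeff: float = 0.25) -> float:
    """Threshold for the next pair (e.g., boxes 312 and 313).

    mean1_distance is the distance between the first pair; the perturbation
    value is perturbation_coeff x height of the first traversed box.
    """
    standard1_displacement = perturbation_coeff * first_box_height
    return mean1_distance + standard1_displacement
```

For a first box 20 pixels high and a first-pair distance of 25 pixels, the first threshold is about 13 pixels and the second is 25 + 5 = 30 pixels.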
In one example, the first inter-box distance threshold corresponding to the target adjacent merged text boxes may be determined as follows. First, an updated inter-box distance of the target adjacent merged text boxes is obtained as a weighted sum of the updated inter-box distance of the reference adjacent merged text boxes and the distance between the merged text boxes contained in the reference adjacent merged text boxes, where the reference adjacent merged text boxes are the adjacent merged text boxes nearest to the target adjacent merged text boxes. Second, an updated perturbation value of the target adjacent merged text boxes is obtained as a weighted sum of the perturbation value of the adjacent merged text boxes traversed first and the absolute value of a distance difference, where the distance difference is the difference between the updated inter-box distance of the target adjacent merged text boxes and the distance between the merged text boxes contained in the reference adjacent merged text boxes, and the perturbation value is determined based on the height of the first traversed merged text box. Finally, the first inter-box distance threshold is determined based on the updated inter-box distance and the updated perturbation value.
Still taking the first inter-box distance threshold corresponding to adjacent merged text boxes 313 and 314 in fig. 3B as an example, first, the updated inter-box distance corresponding to adjacent merged text boxes 313 and 314 is obtained as new_mean=0.6×mean_distance+0.4×mean2_distance, where mean_distance is the updated inter-box distance of the reference adjacent merged text boxes 312 and 313, and mean2_distance is the distance between the merged text boxes contained in the reference adjacent merged text boxes. In this example, the updated inter-box distance of every pair of adjacent merged text boxes is obtained in the same manner, except for the pair traversed first, whose updated inter-box distance is simply the distance between the merged text boxes it contains. Next, the updated perturbation value is obtained as new_displacement=0.6×standard1_displacement+0.4×abs(mean2_distance-new_mean), where standard1_displacement is the perturbation value corresponding to merged text boxes 311 and 312 described above (determined, for example, from the height of merged text box 311), and mean2_distance and new_mean are as described above. Finally, the first inter-box distance threshold corresponding to the target adjacent merged text boxes 313 and 314 is determined according to the updated inter-box distance and the updated perturbation value obtained above.
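The update step above can be sketched as a small function. The text only states that the threshold is determined from the updated inter-box distance and the updated perturbation value; combining them as a sum is an assumption made here by analogy with the earlier example mean1_distance+standard1_displacement. The function name and the default weights 0.6 and 0.4 (taken from the example) are likewise illustrative.

```python
def updated_threshold(mean_distance: float, mean2_distance: float,
                      standard1_displacement: float,
                      w_prev: float = 0.6, w_new: float = 0.4):
    """Return (new_mean, new_displacement, threshold) for the target pair.

    mean_distance: updated inter-box distance of the reference pair.
    mean2_distance: distance between the boxes of the reference pair.
    standard1_displacement: perturbation value of the first traversed pair.
    """
    new_mean = w_prev * mean_distance + w_new * mean2_distance
    new_displacement = (w_prev * standard1_displacement
                        + w_new * abs(mean2_distance - new_mean))
    # Assumption: the threshold is the sum of the two updated quantities.
    return new_mean, new_displacement, new_mean + new_displacement
```

For example, with mean_distance = 25, mean2_distance = 30 and standard1_displacement = 5, the sketch gives new_mean = 27, new_displacement = 0.6×5 + 0.4×3 = 4.2, and a threshold of about 31.2 pixels.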
It will be appreciated by those skilled in the art that the values of the various parameters above are for illustration only and not intended to be limiting, and that the values of the various parameters and the weighting coefficient values may be determined as desired.
For the merged text boxes shown in fig. 3B, applying the above method of determining the first inter-box distance threshold, the upward traversal from merged text box 311 detects that the distance between merged text box 314 and merged text box 313 is greater than the corresponding first inter-box distance threshold, so the traversal stops. The merged text boxes are then traversed downward starting from merged text box 313, the one of merged text boxes 314 and 313 that was traversed first, until merged text box 311 containing the key information is reached, and the content traversed downward is acquired.
In the embodiments of the present disclosure, a perturbation value is applied to the distance threshold, and the current distance threshold is updated according to the distances between the adjacent merged text boxes already traversed and the height of the first traversed merged text box. This improves the fault tolerance of the information extraction method provided by the embodiments of the present disclosure, so that the order information to be corrected can be extracted more effectively.
In some embodiments, after the order information to be corrected is extracted, the subject name corresponding to the order information to be corrected is determined from the merged text boxes outside the area where the order information to be corrected is located, according to the target direction and the positional relationship with that area.
In files of many formats, the text box closest to the extracted target area is the text box containing the subject name corresponding to the order information to be corrected. Taking the partial screenshot of the hotel order shown in fig. 3B as an example, the text box above the extracted address information contains the name of the subject of the address information, namely the hotel. The same holds for files such as business cards and shopping orders: the text box closest to the area containing the address information, identity information, or the like is the text box containing the name of the subject of that information.
In one example, the subject name corresponding to the order information to be corrected may be determined by the following method.
First, the merged text box closest, in the first target direction, to the area where the order information to be corrected is located is determined. Then, with that merged text box as the starting position, the merged text boxes are traversed in the first target direction until adjacent merged text boxes whose distance satisfies a second set condition are found. Finally, with the first-traversed one of those adjacent merged text boxes as the starting position, the merged text boxes outside the area where the order information to be corrected is located are traversed in the second target direction, and the content traversed in the second target direction is acquired.
Taking the merged text boxes shown in fig. 3C as an example, the content of merged text boxes 321 to 322 is the order information to be corrected extracted by the order information correction method of any embodiment of the present disclosure, and the area occupied by merged text boxes 321 to 322 may be determined as the area where the order information to be corrected is located. Among the merged text boxes determined from the text recognition result, other than merged text boxes 321 to 322, the merged text box closest to that area in the first target direction (the search traversal direction, upward in this example) is merged text box 323. (The characters in a non-target language between merged text box 322 and merged text box 323, shown in gray, are ignored.) The merged text boxes are traversed upward with merged text box 323 as the starting position. Since the distance between merged text box 323 and the adjacent merged text box above it exceeds the second inter-box distance threshold, the second set condition is satisfied (where there is no other merged text box above merged text box 323, the second set condition is also considered satisfied). Merged text box 323 is therefore taken as the starting position, and the merged text boxes outside the area where the order information to be corrected is located are traversed downward, which in this example covers only merged text box 323. The content "xxxxxxx Hotel" of that merged text box is thus determined as the subject name of the order information to be corrected.
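The subject-name search described above can be sketched as follows, under several simplifying assumptions: the boxes are plain `(text, top, height)` tuples sorted from top to bottom, boxes in a non-target language have already been filtered out, the vertical center distance is used, and the second inter-box distance threshold is passed in as a fixed value. All names are hypothetical.

```python
def center_y(box):
    """Vertical center of a (text, top, height) tuple."""
    _, top, height = box
    return top + height / 2.0


def subject_name(boxes, region_start, second_threshold):
    """Return the subject name located just above the extracted region.

    boxes: (text, top, height) tuples sorted top to bottom.
    region_start: index of the first box of the region to be corrected.
    """
    if region_start <= 0:
        return None  # nothing lies above the region
    nearest = region_start - 1  # box closest to the region, going upward
    start = nearest
    # Traverse upward until the gap exceeds the threshold or no box remains
    # (the absence of a box above also counts as satisfying the condition).
    while start > 0 and center_y(boxes[start]) - center_y(boxes[start - 1]) <= second_threshold:
        start -= 1
    # Traverse downward from the stopping box, staying outside the region.
    return " ".join(text for text, _, _ in boxes[start:nearest + 1])
```

The sketch joins multi-line names with a space; how lines are joined in practice is not specified in the text.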
In some embodiments, when the merged text boxes are traversed in the first target direction from the starting position, merged text boxes that are not above the target area are ignored, i.e., merged text boxes that do not overlap in the horizontal direction with the merged text boxes containing the order information to be corrected are ignored.
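A horizontal-overlap test of this kind can be sketched as an interval intersection check. The representation of a box as a `(left, right)` pair of pixel coordinates and the function names are assumptions for illustration; boxes that merely touch at an edge are treated here as not overlapping.

```python
def overlaps_horizontally(box, region):
    """True if the horizontal extents (left, right) of box and region intersect."""
    box_left, box_right = box
    region_left, region_right = region
    return box_left < region_right and region_left < box_right


def candidates_above(boxes, region):
    """Keep only the boxes whose horizontal extent overlaps the region's."""
    return [b for b in boxes if overlaps_horizontally(b, region)]
```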
In one example, where the traversed merged text box contains ")" but not "(", the distance condition between adjacent merged text boxes may be ignored, and the traversal of the merged text boxes in the first target direction continues until "(" is found; only then is it determined, based on the distance condition between adjacent merged text boxes, whether to stop the traversal.
In one example, where the currently traversed merged text box contains a complete pair of brackets "()" or contains no brackets, the second inter-box distance threshold may be set to 0.6×mean_height, where mean_height is the average height of the adjacent merged text boxes. Those skilled in the art will appreciate that the above coefficient is an example and that the present disclosure is not limited thereto.
The information extraction method proposed by any embodiment of the present disclosure can be applied to images or electronic documents of various formats, including at least images or electronic documents (e.g., PDF documents) of hotel orders, air travel itineraries, passports, identity cards, and the like. By applying the information extraction method to images or electronic documents in these formats, the corresponding type of order information to be corrected contained in them can be extracted; the order information to be corrected includes at least one of address information, itinerary information, identity information, and the like.
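The bracket rule can be sketched as an upward traversal that suspends the distance check while a closing bracket is still unmatched. This is an illustrative reading of the two examples above: the function name, the list-based inputs, and the handling of half-width brackets only are assumptions.

```python
def collect_name_lines(lines, gaps, threshold):
    """Collect subject-name lines while traversing upward.

    lines: texts ordered bottom to top (lines[0] is nearest the region).
    gaps: gaps[i] is the distance between lines[i] and lines[i + 1].
    threshold: second inter-box distance threshold, e.g. 0.6 x mean_height.
    """
    collected = [lines[0]]
    # A ")" without "(" means the opening bracket must be on a line above.
    pending = ")" in lines[0] and "(" not in lines[0]
    i = 0
    while i + 1 < len(lines):
        if not pending and gaps[i] > threshold:
            break  # distance condition applies again and says stop
        i += 1
        collected.append(lines[i])
        if pending and "(" in lines[i]:
            pending = False  # bracket closed: distance condition is back in force
        elif not pending and ")" in lines[i] and "(" not in lines[i]:
            pending = True
    return list(reversed(collected))  # return in reading order, top to bottom
```

In the test values, a name split as "xxxxxxx Hotel (Airport" over "Branch)" is kept together even though the gap between the two lines exceeds the threshold.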
Fig. 4 shows an apparatus for correcting order information according to at least one embodiment of the present disclosure. The apparatus includes: an obtaining unit 401 configured to obtain order information to be corrected according to a text recognition result of an order; a determining unit 402 configured to determine target search information from the text recognition result; a matching unit 403 configured to obtain, by a preset search mode, order reference information matching the target search information; and a correcting unit 404 configured to correct the order information to be corrected by using the order reference information, so as to obtain target order information.
In some embodiments, the target search information includes partial content of the order information to be corrected, the partial content including at least one of a subject name and at least one element of the order information to be corrected.
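The division of work among the four units can be sketched as a function that chains four callables. The callables, their signatures, and the fallback of returning the uncorrected information when no reference is found are all assumptions for illustration; the text does not specify them.

```python
def correct_order(recognition_result, obtain, determine, match, correct):
    """Chain the four steps performed by units 401 to 404."""
    to_correct = obtain(recognition_result)       # obtaining unit 401
    search_info = determine(recognition_result)   # determining unit 402
    reference = match(search_info)                # matching unit 403
    if reference is None:
        # Assumption: with no matching reference, keep the recognized text.
        return to_correct
    return correct(to_correct, reference)         # correcting unit 404
```

Passing the units as callables keeps the sketch independent of how text recognition, the database lookup, or the Internet search are actually implemented.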
In some embodiments, the matching unit is specifically configured to access a setting database to obtain order reference information matched with the target search information from the setting database, and obtain order reference information matched with the target search information through the internet.
In some embodiments, the setting database includes a plurality of levels of reference unit information, and a lowest level of the plurality of levels of reference unit information corresponds to a plurality of reference subject names.
In some embodiments, the setting database stores first reference information corresponding to each reference subject name. The determining unit is specifically configured to obtain the lowest-level unit information in the order information to be corrected according to the level division of the setting database. Obtaining, from the setting database, the order reference information matching the target search information includes: determining, among the lowest-level reference unit information in the setting database, target unit information matching the lowest-level unit information in the order information to be corrected; determining, among the plurality of reference subject names corresponding to the target unit information, a target subject name meeting a preset condition; and obtaining the order reference information matching the target search information according to the first reference information corresponding to the target subject name.
In some embodiments, the setting database stores second reference information corresponding to each reference subject name. The matching unit is specifically configured to: obtain the lowest-level unit information in the order information to be corrected according to the level division of the setting database; determine, among the lowest-level reference unit information in the setting database, target unit information matching the lowest-level unit information in the order information to be corrected; determine, among the reference subject names corresponding to the target unit information, a target subject name meeting a preset condition; and obtain the order reference information matching the target search information according to the reference unit information of each level corresponding to the target subject name and the second reference information corresponding to the target subject name.
In some embodiments, when determining, among the plurality of reference subject names corresponding to the target unit information, the target subject name meeting the preset condition, the matching unit is specifically configured to: match the subject name corresponding to the order information to be corrected against each of the plurality of reference subject names corresponding to the target unit information; and determine, as the target subject name, the reference subject name that has the highest matching score and whose score exceeds a first set threshold.
In some embodiments, the matching unit is specifically configured to: search the Internet according to partial content of the order information to be corrected to obtain at least one piece of reference information matching the target search information; match the reference information corresponding to the target search information against the order information to be corrected; and obtain the order reference information that has the highest matching score and whose score exceeds a second set threshold.
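The database lookup and name matching described above can be sketched with a nested dictionary and a string-similarity score. The database layout, the sample entries, the use of `difflib.SequenceMatcher` as the matching score, and the threshold value 0.8 are all assumptions for illustration; the text does not specify the scoring method.

```python
from difflib import SequenceMatcher


def best_subject(name, candidates, first_threshold=0.8):
    """Return the candidate with the highest score above the threshold, else None."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score > first_threshold else None


def lookup_reference(db, lowest_unit, name, first_threshold=0.8):
    """db maps lowest-level unit info (e.g., a postal code) to
    {reference subject name: first reference information}."""
    subjects = db.get(lowest_unit)
    if not subjects:
        return None  # no target unit information matches
    target = best_subject(name, subjects, first_threshold)
    return subjects[target] if target is not None else None
```

Restricting the comparison to the subject names under the matched lowest-level unit keeps the candidate set small, so a misrecognized name such as "Riverslde Hotel" can still be matched to "Riverside Hotel".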
In some embodiments, the apparatus further includes an adding unit, configured to add the order reference information obtained from the internet and a subject name corresponding to the order information to be corrected to information corresponding to reference unit information of a lowest level in the setting database.
In some embodiments, the device further includes an updating unit, configured to update information corresponding to reference unit information of a lowest level in the setting database according to the order reference information obtained from the internet and a subject name corresponding to the order information to be corrected.
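Selecting the best Internet result and writing it back to the setting database can be sketched as follows. No network access is performed here: the candidate results are passed in as strings. The function names, the `difflib.SequenceMatcher` score, the threshold value 0.6, and the dictionary layout are assumptions for illustration.

```python
from difflib import SequenceMatcher


def pick_reference(to_correct, candidates, second_threshold=0.6):
    """Return the candidate most similar to the order information, if above the threshold."""
    scored = [(SequenceMatcher(None, to_correct, c).ratio(), c) for c in candidates]
    if not scored:
        return None
    score, best = max(scored)
    return best if score > second_threshold else None


def remember(db, lowest_unit, subject_name, reference):
    """Add or update the entry stored under the lowest-level reference unit."""
    db.setdefault(lowest_unit, {})[subject_name] = reference
```

A single write covers both the adding unit and the updating unit in this sketch: a new subject name creates an entry, and an existing one is overwritten.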
In some embodiments, the order information to be corrected includes at least address information; the at least one element included in the order information to be corrected includes at least one of an administrative area and a postal code; and the reference unit information of the plurality of levels included in the setting database includes reference administrative area information or postal code information.
In some embodiments, the obtaining unit is specifically configured to: obtain a text recognition result of an object to be processed, the text recognition result including a plurality of text boxes; determine, from the plurality of text boxes, a first text box containing key information, where the key information includes partial content of the order information to be corrected, the partial content including at least one element of the order information to be corrected and at least one keyword indicating the order information to be corrected; merge at least some of the plurality of text boxes according to the first text box to obtain merged text boxes; and obtain the order information to be corrected from the merged text boxes.
According to an aspect of the present disclosure, there is provided an electronic device including a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to implement the method of correcting order information according to any embodiment of the present disclosure when executing the computer instructions.
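Locating the first text box that contains key information can be sketched with a regular expression and a keyword list. The five-digit postal code pattern (matching the "10400" example), the keyword "address", and the function name are assumptions for illustration; the merging step itself is not sketched here.

```python
import re

# Assumption: postal codes are five-digit numbers such as 10400.
POSTAL_CODE = re.compile(r"\b\d{5}\b")
# Hypothetical keyword indicating the order information to be corrected.
KEYWORDS = ("address",)


def find_first_text_box(texts):
    """Return the index of the first text box containing key information, or None."""
    for index, text in enumerate(texts):
        has_element = POSTAL_CODE.search(text) is not None
        has_keyword = any(keyword in text.lower() for keyword in KEYWORDS)
        if has_element or has_keyword:
            return index
    return None
```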
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of correcting order information according to any of the embodiments of the present disclosure.
According to the method, apparatus, device and storage medium for correcting order information of one or more embodiments of the present disclosure, order information to be corrected is obtained according to a text recognition result of an order; target search information is determined from the text recognition result; order reference information matching the target search information is obtained by a preset search mode; and the order information to be corrected is corrected using the order reference information to obtain target order information. In this way, accurate target order information can be obtained quickly from the text recognition result of the order.
Fig. 5 shows an electronic device provided in at least one embodiment of the present disclosure. The device includes a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to implement the method of correcting order information according to any embodiment of the present disclosure when executing the computer instructions.
At least one embodiment of the present disclosure also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for correcting order information according to any embodiment of the present disclosure.
One skilled in the relevant art will recognize that one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
In this specification, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the data processing apparatus embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the acts or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs include, for example, general purpose and/or special purpose microprocessors, or any other type of central processing unit. Typically, the central processing unit will receive instructions and data from a read only memory and/or a random access memory. The essential elements of a computer include a central processing unit for carrying out or executing instructions and one or more memory devices for storing instructions and data. Typically, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer does not have to have such devices. Furthermore, the computer may be embedded in another device, such as a mobile phone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disk or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features of specific embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Furthermore, the processes depicted in the accompanying drawings are not necessarily required to be in the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The foregoing description of the preferred embodiment(s) is (are) merely intended to illustrate the embodiment(s) of the present invention, and it is not intended to limit the embodiment(s) of the present invention to the particular embodiment(s) described.

Claims (13)

1. A method for correcting order information, characterized in that the method comprises:
obtaining order information to be corrected according to a text recognition result of an order, wherein the order information to be corrected is address information;
determining target search information from the text recognition result, wherein the target search information comprises partial content of the order information to be corrected, the partial content comprising the name of the subject to which the address information belongs and an administrative area included in the address information, or the name of the subject to which the address information belongs and the postal code corresponding to the administrative area included in the address information;
obtaining, by a preset search mode, order reference information matching the target search information, wherein the order reference information is address information corresponding to the target search information, comprising: accessing a setting database to obtain, from the setting database, the order reference information matching the target search information, wherein the setting database comprises reference unit information of a plurality of levels, and the reference unit information of the lowest level among the plurality of levels corresponds to a plurality of reference subject names;
correcting the order information to be corrected by using the order reference information to obtain target order information.
2. The method according to claim 1, characterized in that obtaining, by the preset search mode, the order reference information matching the target search information further comprises:
obtaining, through the Internet, order reference information matching the target search information.
3. The method according to claim 1, characterized in that the setting database stores first reference information corresponding to the reference subject names, the first reference information being the complete information corresponding to a reference subject name;
determining the target search information from the text recognition result comprises:
obtaining the lowest-level unit information in the order information to be corrected according to the level division of the setting database;
obtaining, from the setting database, the order reference information matching the target search information comprises:
determining, among the lowest-level reference unit information of the setting database, target unit information matching the lowest-level unit information in the order information to be corrected;
determining, among the plurality of reference subject names corresponding to the target unit information, a target subject name meeting a preset condition;
obtaining the order reference information matching the target search information according to the first reference information corresponding to the target subject name.
4. The method according to claim 1, characterized in that the setting database stores second reference information corresponding to the reference subject names, the second reference information being reference information other than the reference unit information of each level;
determining the target search information from the text recognition result comprises:
obtaining the lowest-level unit information in the order information to be corrected according to the level division of the setting database;
obtaining, from the setting database, the order reference information matching the target search information comprises:
determining, among the lowest-level reference unit information of the setting database, target unit information matching the lowest-level unit information in the order information to be corrected;
determining, among the plurality of reference subject names corresponding to the target unit information, a target subject name meeting a preset condition;
obtaining the order reference information matching the target search information according to the reference unit information of each level corresponding to the target subject name and the second reference information corresponding to the target subject name.
5. The method according to claim 3 or 4, characterized in that determining, among the plurality of reference subject names corresponding to the target unit information, the target subject name meeting the preset condition comprises:
matching the subject name corresponding to the order information to be corrected against each of the plurality of reference subject names corresponding to the target unit information;
determining, as the target subject name, the reference subject name that has the highest matching score and whose score exceeds a first set threshold.
6. The method according to claim 2, characterized in that obtaining, through the Internet, the order reference information matching the target search information comprises:
searching the Internet according to the partial content of the order information to be corrected to obtain at least one piece of reference information matching the target search information;
matching the reference information corresponding to the target search information against the order information to be corrected;
obtaining the order reference information that has the highest matching score and whose score exceeds a second set threshold.
7. The method according to claim 6, characterized in that the method further comprises:
adding the order reference information obtained from the Internet, together with the subject name corresponding to the order information to be corrected, to the information corresponding to the lowest-level reference unit information in the setting database.
8.
The method according to claim 7, characterized in that the method further comprises: 根据从互联网中获取的所述订单参考信息,以及所述待校正订单信息对应的主体名称,对所述设定数据库中最低层级的参考单元信息所对应的信息进行更新。According to the order reference information obtained from the Internet and the subject name corresponding to the order information to be corrected, the information corresponding to the lowest level reference unit information in the setting database is updated. 9.根据权利要求1所述的方法,其特征在于,所述设定数据库所包括的多个层级的参考单元信息包括参考行政区信息和/或邮政编码信息。9 . The method according to claim 1 , wherein the reference unit information of multiple levels included in the setting database includes reference administrative district information and/or postal code information. 10.根据权利要求9所述的方法,其特征在于,所述根据订单的文本识别结果获得待校正订单信息,包括:10. The method according to claim 9, characterized in that the step of obtaining the order information to be corrected according to the text recognition result of the order comprises: 获取所述订单的文本识别结果,所述文本识别结果包括多个文本框;Obtaining a text recognition result of the order, wherein the text recognition result includes a plurality of text boxes; 从所述多个文本框中确定包含关键信息的第一文本框,所述关键信息包括所述待校正订单信息的部分内容,所述部分内容包括待校正订单信息中的至少一个元素、指示待校正订单信息的关键词中的至少一项;Determine a first text box containing key information from the multiple text boxes, wherein the key information includes a portion of the order information to be corrected, and the portion of the content includes at least one element in the order information to be corrected and at least one of keywords indicating the order information to be corrected; 根据所述第一文本框,对所述多个文本框中的至少部分进行合并,得到合并文本框;According to the first text box, merging at least part of the multiple text boxes to obtain a merged text box; 从所述合并文本框获取所述待校正订单信息。The order information to be corrected is obtained from the merged text box. 11.一种订单信息的校正装置,其特征在于,所述装置包括:11. 
A device for correcting order information, characterized in that the device comprises: 获取单元,用于根据订单的文本识别结果获得待校正订单信息,所述待校正订单信息为地址信息;An acquisition unit, used for acquiring order information to be corrected according to a text recognition result of the order, wherein the order information to be corrected is address information; 确定单元,用于从所述文本识别结果中确定目标搜索信息,所述目标搜索信息包括所述待校正订单信息的部分内容,所述部分内容包括地址信息所属的主体名称和所述地址信息所包括的行政区,或者,地址信息所属的主体名称和所述地址信息所包括的行政区所对应的邮政编码;a determination unit, configured to determine target search information from the text recognition result, wherein the target search information includes part of the content of the order information to be corrected, wherein the part includes the name of the subject to which the address information belongs and the administrative district included in the address information, or the name of the subject to which the address information belongs and the postal code corresponding to the administrative district included in the address information; 匹配单元,用于通过预设搜索方式获取与所述目标搜索信息匹配的订单参考信息,所述订单参考信息是指与所述目标搜索信息对应的地址信息,具体用于:访问设定数据库,以从所述设定数据库中获取与所述目标搜索信息匹配的订单参考信息,所述设定数据库包括多个层级的参考单元信息,且所述多个层级中最低层级的参考单元信息对应于多个参考主体名称;a matching unit, configured to obtain order reference information matching the target search information by a preset search method, wherein the order reference information refers to address information corresponding to the target search information, and is specifically configured to: access a setting database to obtain order reference information matching the target search information from the setting database, wherein the setting database includes reference unit information of multiple levels, and the reference unit information of the lowest level among the multiple levels corresponds to multiple reference subject names; 校正单元,用于利用所述订单参考信息校正所述待校正订单信息,以得到目标订单信息。A correction unit is used to correct the order information to be corrected by using the order reference information to obtain target order information. 
12.一种电子设备,其特征在于,所述设备包括存储器、处理器,所述存储器用于存储可在处理器上运行的计算机指令,所述处理器用于在执行所述计算机指令时实现权利要求1至10任一项所述的方法。12. An electronic device, characterized in that the device comprises a memory and a processor, wherein the memory is used to store computer instructions executable on the processor, and the processor is used to implement the method according to any one of claims 1 to 10 when executing the computer instructions. 13.一种计算机可读存储介质,其上存储有计算机程序,其特征在于,所述程序被处理器执行时实现权利要求1至10任一所述的方法。13. A computer-readable storage medium having a computer program stored thereon, wherein when the program is executed by a processor, the method according to any one of claims 1 to 10 is implemented.
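For illustration only, the database lookup recited in claims 1, 3 and 5 might be sketched as below. This is a minimal sketch under stated assumptions, not the patented implementation: the nested-dictionary layout of `SETTING_DB`, the sample entries, the helper names, the use of Python's `difflib.SequenceMatcher` as the matching score, and the threshold value 0.8 are all hypothetical choices that the claims do not specify.

```python
from difflib import SequenceMatcher

# Hypothetical setting database: administrative levels nested from highest to
# lowest. Each lowest-level unit maps reference subject names to the complete
# address stored for that subject (the "first reference information").
SETTING_DB = {
    "Guangdong": {
        "Shenzhen": {
            "Nanshan": {
                "Acme Trading Co Ltd": "1 Example Road, Nanshan, Shenzhen, Guangdong",
                "Apex Logistics Ltd": "88 Sample Avenue, Nanshan, Shenzhen, Guangdong",
            },
        },
    },
}


def similarity(a, b):
    """Matching score in [0, 1]. SequenceMatcher is an assumed stand-in scorer."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_lowest_unit(db, unit_path):
    """Follow the unit names in unit_path (highest level first) down the
    hierarchy; return the lowest-level unit, or None if any level is missing."""
    node = db
    for unit in unit_path:
        node = node.get(unit)
        if node is None:
            return None
    return node


def correct_address(ocr_address, ocr_subject, unit_path, db=SETTING_DB, threshold=0.8):
    """Return the reference address of the best-matching subject name, or the
    recognized address unchanged when no candidate exceeds the threshold."""
    subjects = find_lowest_unit(db, unit_path)
    if not subjects:
        return ocr_address
    # Score the recognized subject name against every reference subject name
    # in the matched lowest-level unit and keep the highest-scoring one.
    best_name = max(subjects, key=lambda name: similarity(ocr_subject, name))
    if similarity(ocr_subject, best_name) <= threshold:
        return ocr_address
    return subjects[best_name]


if __name__ == "__main__":
    path = ["Guangdong", "Shenzhen", "Nanshan"]
    # "Tradlng" and "Examp1e R0ad" imitate typical text recognition errors.
    print(correct_address("1 Examp1e R0ad, Nanshan", "Acme Tradlng Co Ltd", path))
```

Restricting the comparison to the subject names stored under the matched lowest-level unit keeps the candidate set small, which is the apparent purpose of the multi-level database described in claim 1. When the sketch finds no sufficiently similar name it leaves the recognized address untouched; claim 2 describes an Internet search as a further source of reference information.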
CN202011339777.2A 2020-11-25 2020-11-25 Order information correction method, device, equipment and storage medium Active CN112395874B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011339777.2A CN112395874B (en) 2020-11-25 2020-11-25 Order information correction method, device, equipment and storage medium
PCT/IB2021/055848 WO2022112857A1 (en) 2020-11-25 2021-06-30 Method and apparatus for correcting order information, and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011339777.2A CN112395874B (en) 2020-11-25 2020-11-25 Order information correction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112395874A CN112395874A (en) 2021-02-23
CN112395874B true CN112395874B (en) 2025-04-22

Family

ID=74603919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011339777.2A Active CN112395874B (en) 2020-11-25 2020-11-25 Order information correction method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112395874B (en)
WO (1) WO2022112857A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092684B (en) * 2021-11-17 2024-11-19 中国银联股份有限公司 A text calibration method, device, equipment and storage medium
CN114120322B (en) * 2022-01-26 2022-05-10 深圳爱莫科技有限公司 Order commodity quantity identification result correction method and processing equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110442702A (en) * 2019-08-15 2019-11-12 北京上格云技术有限公司 Searching method, device, readable storage medium storing program for executing and electronic equipment
CN110674396A (en) * 2019-08-28 2020-01-10 北京三快在线科技有限公司 Text information processing method and device, electronic equipment and readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050137991A1 (en) * 2003-12-18 2005-06-23 Bruce Ben F. Method and system for name and address validation and correction
WO2009005492A1 (en) * 2007-06-29 2009-01-08 United States Postal Service Systems and methods for validating an address
JP5043735B2 (en) * 2008-03-28 2012-10-10 インターナショナル・ビジネス・マシーンズ・コーポレーション Information classification system, information processing apparatus, information classification method, and program
CN107239453B (en) * 2016-03-28 2020-10-02 平安科技(深圳)有限公司 Information writing method and device
CN109784235A (en) * 2018-12-29 2019-05-21 广东益萃网络科技有限公司 Method for automatically inputting, device, computer equipment and the storage medium of paper form

Also Published As

Publication number Publication date
CN112395874A (en) 2021-02-23
WO2022112857A1 (en) 2022-06-02

Similar Documents

Publication Publication Date Title
EP3971731B1 (en) Fence address-based coordinate data processing method and apparatus, and computer device
CN113434623B (en) Fusion method based on multi-source heterogeneous space planning data
US11182604B1 (en) Computerized recognition and extraction of tables in digitized documents
US11348330B2 (en) Key value extraction from documents
US7769778B2 (en) Systems and methods for validating an address
US5943443A (en) Method and apparatus for image based document processing
CN111652176B (en) Information extraction method, device, equipment and storage medium
CN112395874B (en) Order information correction method, device, equipment and storage medium
WO2009005492A1 (en) Systems and methods for validating an address
US20220335073A1 (en) Fuzzy searching using word shapes for big data applications
CN117763169B (en) Knowledge extraction method, device, equipment and storage medium in situation analysis field
AU2021364331A1 (en) Systems and methods for enabling relevant data to be extracted from a plurality of documents
CN113642320A (en) Method, device, equipment and medium for extracting document directory structure
Chiang et al. GeoAI for the digitization of historical maps
CN118132759A (en) Method for carrying out knowledge graph analysis on cultural book data
HK40039018A (en) Correction method, device, equipment and storage medium of order information
CN114817186B (en) A structured data conversion system and method
JP2024003769A (en) Character recognition system, method of recognizing character by computer, and character search system
KR102697516B1 (en) Character recognition method and system robust to errors of character recognition that recognize information included in tables
Luft Automatic georeferencing of historical maps by geocoding
JP4521466B2 (en) Form processing device
JP4521377B2 (en) Form processing apparatus, program for executing the apparatus, and form format creation program
JP2655087B2 (en) Character recognition post-processing method
JP5712415B2 (en) Form processing system and form processing method
JP3111524B2 (en) Image information database search method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40039018; Country of ref document: HK)
GR01 Patent grant