
CN114116724B - Data verification method, device, equipment and readable storage medium - Google Patents

Info

Publication number
CN114116724B
Authority
CN
China
Prior art keywords
source, data, target, field, partition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111446609.8A
Other languages
Chinese (zh)
Other versions
CN114116724A (en
Inventor
陈双琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202111446609.8A
Publication of CN114116724A
Application granted
Publication of CN114116724B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the application disclose a data verification method, device, equipment and readable storage medium, relating to the fields of artificial intelligence and medical treatment. The method comprises: acquiring a source data table and a target data table, and determining the table types of the source data table and the target data table; if the table type of the source data table and the table type of the target data table are both full tables, determining whether the data amount in the source data table and the data amount in the target data table are each larger than a data amount threshold; if so, partitioning the source data table to obtain a source partition table, and partitioning the target data table to obtain a target partition table; performing verification based on the source partition field in the source partition table and the target partition field in the target partition table; and if the data verification between the source partition field in any source partition table and the target partition field in the corresponding target partition table fails, determining that the source data table is inconsistent with the target data table. The embodiments of the application can improve data verification efficiency.

Description

Data verification method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a data verification method, apparatus, device, and readable storage medium.
Background
Data verification is an important quality assurance measure in the field of big data. Faced with massive data volumes, rapid data flows and diverse data types, the validity of the data, and the accuracy of the data passed to downstream systems, must be ensured during data cleaning and processing. A data verification method that can quickly verify the accuracy of data is therefore widely applicable in the big data industry: it can help a data warehouse system ensure the validity of massive data and improve the reliability of the data analysis results of downstream systems. In the medical field, for example, data verification is required for large amounts of medical data.
In the prior art, when data verification involves massive and complex data resources, each field in the different tables must be compared and verified manually, so data verification efficiency is low.
Disclosure of Invention
The embodiment of the application provides a data verification method, a device, equipment and a readable storage medium, which can improve the data verification efficiency.
In a first aspect, the present application provides a data verification method, including:
acquiring a source data table and a target data table, and determining the table type of the source data table and the table type of the target data table;
if the table type of the source data table and the table type of the target data table are both full tables, determining whether the data amount in the source data table and the data amount in the target data table are each larger than a data amount threshold;
if the data amount in the source data table and the data amount in the target data table are both larger than the data amount threshold, partitioning the source data table to obtain at least one source partition table, and partitioning the target data table to obtain at least one target partition table;
performing data verification based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table;
if the data verification between the source partition field in any source partition table and the target partition field in the corresponding target partition table fails, determining that the source data table is inconsistent with the target data table, wherein the corresponding target partition table is the partition table, among the at least one target partition table, that corresponds to that source partition table.
In a second aspect, the present application provides a data verification apparatus, comprising:
a data acquisition module, configured to acquire a source data table and a target data table and determine the table type of the source data table and the table type of the target data table;
a quantity determining module, configured to determine whether the data amount in the source data table and the data amount in the target data table are each larger than a data amount threshold if the table type of the source data table and the table type of the target data table are both full tables;
a partition processing module, configured to partition the source data table to obtain at least one source partition table and partition the target data table to obtain at least one target partition table if the data amount in the source data table and the data amount in the target data table are both larger than the data amount threshold;
a data verification module, configured to perform data verification based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table;
a result determining module, configured to determine that the source data table is inconsistent with the target data table if the data verification between the source partition field in any source partition table and the target partition field in the corresponding target partition table fails, wherein the corresponding target partition table is the partition table, among the at least one target partition table, that corresponds to that source partition table.
With reference to the second aspect, in one possible implementation manner, the partition processing module is specifically configured to:
determine an equal-division rule for the source data table based on the data amount in the source data table, and divide the source data table using the equal-division rule to obtain the at least one source partition table; or
determine a field division rule for the source data table based on the field types of the fields in the source data table, and divide the source data table using the field division rule to obtain the at least one source partition table, wherein the field division rule indicates a preset field type.
With reference to the second aspect, in one possible implementation manner, the data verification includes a field check and a field format check; the data verification module comprises:
a first sampling unit, configured to sample the source partition field to obtain at least one source sampling field;
a second sampling unit, configured to sample the target partition field to obtain at least one target sampling field;
a field checking unit, configured to perform a field check on the at least one source sampling field and the at least one target sampling field;
a format checking unit, configured to check the field format of the at least one source sampling field against the field format of the at least one target sampling field if the field check passes;
a region verification unit, configured to perform data verification on the remaining source partition fields in the at least one source partition table and the remaining target partition fields in the at least one target partition table if the field format check passes.
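The sample-first order described by these units can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: a partition is modelled as a list of row dictionaries, the "field check" is taken to mean that the field names match, the "field format check" is taken to mean that the value types match, and the same row positions are sampled on both sides. The names `field_check`, `format_check` and `verify_partition` are hypothetical.

```python
import random


def field_check(src_rows, tgt_rows):
    # Assumed meaning of "field check": paired rows expose the same field names.
    return all(s.keys() == t.keys() for s, t in zip(src_rows, tgt_rows))


def format_check(src_rows, tgt_rows):
    # Assumed meaning of "field format check": each field holds the same Python type.
    # Only call this after field_check has passed, so every name exists in both rows.
    return all(
        type(s[name]) is type(t[name])
        for s, t in zip(src_rows, tgt_rows)
        for name in s
    )


def verify_partition(src_part, tgt_part, sample_size=2, seed=0):
    """Check a small sample first; check the remaining rows only if the sample passes."""
    if len(src_part) != len(tgt_part):
        return False
    rng = random.Random(seed)
    k = min(sample_size, len(src_part))
    sampled = set(rng.sample(range(len(src_part)), k))

    src_sample = [src_part[i] for i in sorted(sampled)]
    tgt_sample = [tgt_part[i] for i in sorted(sampled)]
    if not field_check(src_sample, tgt_sample):
        return False  # sample already disagrees, skip the rest
    if not format_check(src_sample, tgt_sample):
        return False

    rest = [i for i in range(len(src_part)) if i not in sampled]
    src_rest = [src_part[i] for i in rest]
    tgt_rest = [tgt_part[i] for i in rest]
    return field_check(src_rest, tgt_rest) and format_check(src_rest, tgt_rest)
```

Checking the sample first means that an obviously mismatched pair of partitions is rejected after comparing only `sample_size` rows; the full pass over the remaining rows is paid for only when the sample agrees.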
With reference to the second aspect, in one possible implementation manner, the result determining module is specifically configured to:
if the source partition field in any source partition table does not match the target partition field in the corresponding target partition table, determine that the source data table is inconsistent with the target data table; or
if each source partition field in the at least one source partition table matches a target partition field in the at least one target partition table, but the field format of one or more source partition fields in the at least one source partition table does not match the field format of the target partition fields in the at least one target partition table, determine that the source data table is inconsistent with the target data table.
With reference to the second aspect, in one possible implementation manner, the data verification apparatus further includes a second verification module, configured to:
if the data amount in the source data table and the data amount in the target data table are both smaller than or equal to the data amount threshold, perform a field check on each source field in the source data table and each target field in the target data table;
if the field check passes, perform a field format check on the field format of each source field in the source data table and the field format of each target field in the target data table;
if the field formats of one or more source fields in the source data table do not match the field formats of the target fields in the target data table, determine that the source data table is inconsistent with the target data table.
With reference to the second aspect, in one possible implementation manner, the data verification apparatus further includes a third verification module, configured to:
if the table type of the source data table and the table type of the target data table are both incremental tables, perform a field check on each source field in the source data table and each target field in the target data table;
if the field check passes, perform a field format check on the field format of each source field in the source data table and the field format of each target field in the target data table;
if the field formats of one or more source fields in the source data table do not match the field formats of the target fields in the target data table, determine that the source data table is inconsistent with the target data table.
With reference to the second aspect, in one possible implementation manner, the data verification apparatus further includes a hierarchy acquisition module, configured to:
acquire at least one intermediate level data table between the source data table and the target data table, wherein the at least one intermediate level data table comprises a first level data table and a second level data table, the first level data table is obtained by performing data extraction on the source data table, the second level data table is obtained by performing data cleaning on the first level data table, and the target data table is obtained by performing logic processing on the second level data table;
partition the first level fields in the first level data table to obtain at least one first partition table;
partition the second level fields in the second level data table to obtain at least one second partition table;
the result determining module is specifically configured to:
if the data verification between the source partition field and the first partition field fails, determine that the source data table is inconsistent with the target data table; or
if the data verification between the source partition field and the first partition field passes, perform data verification on the first partition field in the at least one first partition table and the second partition field in the at least one second partition table;
if the data verification between the first partition field and the second partition field fails, determine that the source data table is inconsistent with the target data table; or
if the data verification between the first partition field and the second partition field fails, determine that the source data table is inconsistent with the target data table, wherein the data verification comprises a field check and a field format check.
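The level-by-level comparison can be illustrated with a short Python sketch that walks the layers in processing order and stops at the first pair that fails. It is a sketch under assumptions rather than the method of the application: the function name `verify_chain`, the layer labels and the pluggable `check` callable are hypothetical, and extending the same comparison to the final hop between the second level and the target table is an assumption made here for completeness.

```python
def verify_chain(layers, check):
    """Compare each layer with the next one and stop at the first failing pair.

    layers: ordered list of (label, rows) tuples, e.g. source, first level,
            second level, target.
    check:  callable(upstream_rows, downstream_rows) -> bool; what counts as
            "passing" between two layers is left to the caller.
    Returns (True, None) if every adjacent pair passes, otherwise
    (False, (upstream_label, downstream_label)) for the first failing pair.
    """
    for (up_label, up_rows), (down_label, down_rows) in zip(layers, layers[1:]):
        if not check(up_rows, down_rows):
            return False, (up_label, down_label)
    return True, None


def same_fields(up_rows, down_rows):
    # Illustrative check only: both layers expose the same set of field names.
    return {k for r in up_rows for k in r} == {k for r in down_rows for k in r}


layers = [
    ("source", [{"id": 1}]),
    ("first_level", [{"id": 1}]),
    ("second_level", [{"key": 1}]),
    ("target", [{"key": 1}]),
]
result = verify_chain(layers, same_fields)
```

Returning the failing pair of labels shows at which processing stage (extraction, cleaning or logic processing) the inconsistency first appears.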
In a third aspect, the present application provides a computer device comprising: a processor, a memory, a network interface;
the processor is connected to the memory and the network interface, wherein the network interface is used for providing a data communication function, the memory is used for storing a computer program, and the processor is used for calling the computer program to cause the computer device containing the processor to execute the data verification method.
In a fourth aspect, the present application provides a computer readable storage medium having stored therein a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the above-described data verification method.
In a fifth aspect, the present application provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the data verification method provided in the various alternatives in the first aspect of the application.
In the embodiment of the application, a source data table and a target data table are acquired, and the table type of the source data table and the table type of the target data table are determined; if the table type of the source data table and the table type of the target data table are both full tables, it is determined whether the data amount in the source data table and the data amount in the target data table are each larger than a data amount threshold; if both data amounts are larger than the data amount threshold, the source data table is partitioned to obtain at least one source partition table, and the target data table is partitioned to obtain at least one target partition table; data verification is performed based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table, and if the data verification between the source partition field in any source partition table and the target partition field in the corresponding target partition table fails, the source data table is determined to be inconsistent with the target data table. Because the table type of a data table is determined in advance, when the table type is a full table and the data amount in the data table is larger than the data amount threshold, the data in the data table is partitioned into a plurality of partition tables, and the data in each partition table can be checked to determine whether the data in the source partition table is consistent with the data in the target partition table.
If the data in any source partition table is inconsistent with the data in the corresponding target partition table, the source data table and the target data table can be determined to be inconsistent without checking the whole of both tables, which improves data verification efficiency. In addition, because the data in the source data table and the target data table is checked automatically, the fields in the two tables do not need to be compared manually, which improves both the efficiency and the accuracy of data verification.
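The efficiency argument rests on stopping at the first mismatched partition pair. A minimal Python sketch of that early exit is shown below, assuming the source and target partitions are already paired by position; the names `tables_consistent` and `partition_matches` are hypothetical, and the per-partition comparison is supplied by the caller.

```python
def tables_consistent(source_partitions, target_partitions, partition_matches):
    """Return False as soon as one pair of partitions fails verification."""
    if len(source_partitions) != len(target_partitions):
        return False
    for src, tgt in zip(source_partitions, target_partitions):
        if not partition_matches(src, tgt):
            return False  # early exit: later partitions are never examined
    return True


# Usage: count how many partition pairs are actually compared.
compared = []


def equal_and_count(src, tgt):
    compared.append(src)
    return src == tgt


source = [[1, 2], [3, 4], [5, 6]]
target = [[1, 2], [9, 9], [5, 6]]
consistent = tables_consistent(source, target, equal_and_count)
```

In this usage the second pair differs, so the third pair is never compared; when the tables are in fact consistent, every pair still has to be checked.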
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a data verification method according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating another data verification method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a data verification device according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a composition structure of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The technical scheme of the application is suitable for checking the consistency of the data in a source data table and a target data table, that is, for scenarios in which it must be determined whether the source data table and the target data table are consistent. The source data table and the target data table may be data tables related to the medical field, such as doctor-patient data tables, or data tables related to other fields. The source data table and the target data table are acquired, and the table type of the source data table and the table type of the target data table are determined; if the table type of the source data table and the table type of the target data table are both full tables, it is determined whether the data amount in the source data table and the data amount in the target data table are each larger than a data amount threshold; if the data amount in the source data table and the data amount in the target data table are both larger than the data amount threshold, the source data table is partitioned to obtain at least one source partition table, and the target data table is partitioned to obtain at least one target partition table; data verification is performed based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table, and if the data verification between the source partition field in any source partition table and the target partition field in the target partition table corresponding to that source partition table fails, the source data table is determined to be inconsistent with the target data table.
Because the table type of a data table is determined in advance, when the table type is a full table and the data amount in the data table is larger than the data amount threshold, the data in the data table is partitioned into a plurality of partition tables, and the data in each partition table can be checked to determine whether the data in the source partition table is consistent with the data in the target partition table. If the data in a pair of partition tables is inconsistent, the source data table and the target data table can be determined to be inconsistent without checking the whole of both tables, which improves data verification efficiency. In addition, because the data in the source data table and the target data table is checked automatically, the fields in the two tables do not need to be compared manually, which improves both the efficiency and the accuracy of data verification.
Referring to FIG. 1, FIG. 1 is a flowchart of a data verification method according to an embodiment of the present application. The data verification method may be applied to a computer device. The computer device may be an electronic device, including but not limited to a mobile phone, tablet computer, desktop computer, notebook computer, palmtop computer, vehicle-mounted device, augmented reality/virtual reality (AR/VR) device, head-mounted display, wearable device, smart speaker, digital camera, or other mobile internet device (MID) with network access capability; it may also be a stand-alone server, a server cluster composed of several servers, or a cloud computing center. As shown in FIG. 1, the data verification method includes, but is not limited to, the following steps:
S101, acquiring a source data table and a target data table, and determining the table type of the source data table and the table type of the target data table.
In the embodiment of the application, checking the consistency of the source data table and the target data table requires checking the data in the two tables: whether the data in the source data table is consistent with the data in the target data table determines whether the two tables are consistent. The computer device may obtain the source data table and the target data table and determine the table type of each. The target data table may be obtained by processing the source data table, for example by cleaning the data of the source data table or by copying the source data table, or may be obtained in other ways, which is not limited in the embodiment of the present application. In the embodiment of the application, the source data table and the target data table may be data tables related to the medical field, such as a doctor-patient data table, a chronic disease data table or a medical treatment data table; data tables related to the education field, such as a student information data table; or data tables related to other fields.
The computer device may obtain the source data table and the target data table from a data warehouse, which is used for storing various data tables, and determine the table type of the source data table and the table type of the target data table based on the data synchronization mode of the data warehouse. The data synchronization modes of the data warehouse may include a full synchronization mode and an incremental synchronization mode: full synchronization synchronizes all the data in a data table, whereas incremental synchronization synchronizes only the part of the data table that has changed. If the data synchronization mode of the data warehouse is full synchronization, the table types of the source data table and the target data table are full tables. If the data synchronization mode of the data warehouse is incremental synchronization, the table types of the source data table and the target data table are incremental tables. It will be appreciated that the data in the source data table obtained using full synchronization is the same as the data in the source data table obtained using incremental synchronization, and the data in the target data table obtained using full synchronization is the same as the data in the target data table obtained using incremental synchronization.
Alternatively, the computer device may register in advance all the data sources in the data warehouse, including the source data table and the target data table, to form a mapping dictionary, and encapsulate the connection methods of the different data sources in a Python class; the computer device can subsequently access the data warehouse by selecting different dictionary keys to obtain the source data table and the target data table. That is, the computer device integrates the different data tables in the data warehouse in advance and, when the source data table and the target data table are needed, accesses them by inputting the identification of the source data table and the identification of the target data table. For example, the identification of a data table includes, but is not limited to, an account number and a password corresponding to the data table, and the account number and password of each data table are different. The computer device can obtain the source data table by logging in with the account number and password of the source data table, and can obtain the target data table by logging in with the account number and password of the target data table.
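A mapping dictionary wrapped in a Python class could look roughly like the following sketch. It is hypothetical throughout: the class name `DataSources`, its methods and the dictionary key `"source_db"` are illustrative, and an in-memory SQLite database from the standard library stands in for a real data warehouse connection (credentials are omitted).

```python
import sqlite3


class DataSources:
    """Hypothetical registry mapping a dictionary key to a connection factory."""

    def __init__(self):
        self._factories = {}

    def register(self, key, factory):
        # factory: zero-argument callable returning a DB-API style connection.
        self._factories[key] = factory

    def fetch_table(self, key, table):
        """Open the data source selected by key and return the table as row dicts."""
        conn = self._factories[key]()
        try:
            # The table name is interpolated, so it must come from trusted code.
            cur = conn.execute(f'SELECT * FROM "{table}"')
            names = [col[0] for col in cur.description]
            return [dict(zip(names, row)) for row in cur.fetchall()]
        finally:
            conn.close()


def demo_source():
    # Stand-in for a real warehouse connection: a throwaway in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO patients VALUES (1, 'Li')")
    return conn


sources = DataSources()
sources.register("source_db", demo_source)
rows = sources.fetch_table("source_db", "patients")
```

Keeping the connection details behind a key means the verification code only needs the key and the table name, whichever kind of data source sits behind it.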
Alternatively, the computer device may obtain the source data table and the target data table from different file libraries. The computer device may obtain the file name or storage path of the source data table and retrieve the source data table from the corresponding file library based on that file name or storage path, and likewise obtain the file name or storage path of the target data table and retrieve the target data table from the corresponding file library. The embodiment of the application does not limit how the source data table and the target data table are acquired; for example, the computer device may also receive them from other devices by data transmission.
S102, if the table type of the source data table and the table type of the target data table are both full tables, determining whether the data amount in the source data table and the data amount in the target data table are each larger than a data amount threshold.
In the embodiment of the application, when the table type of the source data table and the table type of the target data table are both full tables, it can be determined whether the data amount in the source data table and the data amount in the target data table are each larger than the data amount threshold. The data amount may be the amount of data in a data table. If both tables are full tables and both data amounts are larger than the data amount threshold, the two tables contain a large amount of data. In that case, checking all the data in the source data table against all the data in the target data table would take a long time and make data verification inefficient, so the data in the two tables can be processed first and then checked. If the table type of the source data table is a full table and the data amount in the source data table is larger than the data amount threshold, the source data table is a large table; because a large table contains a large amount of data, its data can be processed before being verified. Generally, the target data table is obtained by processing the source data table, so the difference between the data amounts of the two tables is small: if the data amount in the source data table is larger than the data amount threshold, the data amount in the target data table is also larger than the data amount threshold; if the data amount in the source data table is smaller than the data amount threshold, the data amount in the target data table is also smaller than the data amount threshold.
Optionally, if the data amount in the source data table and the data amount in the target data table are both smaller than or equal to the data amount threshold, data verification is performed on each item of data in the source data table and each item of data in the target data table to determine the consistency of the two tables. That is, if the table type of the source data table is a full table and the data amount in the source data table is smaller than or equal to the data amount threshold, the source data table is a small table; because a small table contains little data, all the data in the table can be checked directly.
Optionally, if the table type of the source data table and the table type of the target data table are both incremental tables, data verification is performed on each item of data in the source data table and each item of data in the target data table to determine the consistency of the two tables. Because both tables are incremental tables, which indicates that the data amount in the source data table and the target data table is small, data verification can be performed on all the data in both tables.
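The branching in steps S101 and S102 can be summarised in a small Python sketch. The function name `choose_strategy` and the string labels are hypothetical; combinations that the text does not describe (for example one full table and one incremental table) are returned as `"unspecified"` rather than guessed.

```python
def choose_strategy(source_type, target_type, source_rows, target_rows, threshold):
    """Pick a verification path from the table types and data amounts.

    source_type / target_type: "full" or "incremental".
    Returns "partition_then_check", "check_all" or "unspecified".
    """
    if source_type == "incremental" and target_type == "incremental":
        return "check_all"  # incremental tables are treated as small
    if source_type == "full" and target_type == "full":
        if source_rows > threshold and target_rows > threshold:
            return "partition_then_check"  # large full tables
        if source_rows <= threshold and target_rows <= threshold:
            return "check_all"  # small full tables
    # Mixed table types, or one table above and one below the threshold:
    # not covered by the description, so no path is assumed here.
    return "unspecified"
```

For instance, two full tables with 10 rows each and a threshold of 5 select the partitioned path, whereas the same tables with a threshold of 10 are checked in full.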
S103, if the data quantity in the source data table and the data quantity in the target data table are both larger than the data quantity threshold, partitioning the source data table to obtain at least one source partition table, and partitioning the target data table to obtain at least one target partition table.
In the embodiment of the application, when the data amount in the source data table and the data amount in the target data table are both greater than the data amount threshold, both tables contain a large amount of data. Checking all the data in the whole source data table against all the data in the whole target data table would take a long time and make data verification inefficient, so the source data table and the target data table can be partitioned first and then checked. Partitioning divides a data table into a plurality of partition tables, and the consistency of the data table can then be verified by checking the partition tables. Because each partition table contains less data, determining whether the source data table and the target data table are consistent by verifying whether the data in the partition tables are consistent can improve data verification efficiency.
Optionally, the computer device may divide the source data table based on a preset partition rule to obtain at least one source partition table, where the preset partition rule includes an equal-division rule or a field partition rule. Specifically, the computer device may determine an equal-division rule for the source data table based on the data amount in the source data table, and divide the source data table by using the equal-division rule to obtain at least one source partition table; or the computer device may determine a field partition rule for the source data table based on the field type of the fields in the source data table, where the field partition rule indicates a preset field type, and divide the source data table by using the field partition rule, so that fields matching the preset field type are divided into the same source partition table, to obtain at least one source partition table. The equal-division rule may refer to dividing a data table into N equal parts to obtain N partition tables, where N is a positive integer and the data table may be the source data table or the target data table. For example, the greater the amount of data in the source data table, the greater the value of N; the smaller the amount of data in the source data table, the smaller the value of N. The field partition rule may include a preset field type, which may be, for example, a name field, a time field, or another field type. For example, when the preset field type is a name field, the source data table includes a plurality of name fields, a name field may refer to the name of a user, and the source data table may include the data corresponding to each user.
For example, if the source data table includes medical data corresponding to 3 users, the computer device may divide the source data table based on the name field, so that data matching the same name field is divided into the same source partition table: all medical data corresponding to the user whose name field is Zhang San is divided into one source partition table, all medical data corresponding to the user whose name field is Li Si is divided into another source partition table, and so on, thereby dividing the source data table into 3 source partition tables. The data table includes a source data table and a target data table, and the partitioning rules of the source data table and the target data table may be the same.
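The two partition rules can be sketched as follows, with a data table modelled as a list of dict rows. The function names partition_equal and partition_by_field are hypothetical, and spreading the remainder over the first partitions when the row count is not divisible by N is an assumption; the description only speaks of N equal parts.

```python
from collections import defaultdict


def partition_equal(rows, n):
    """Equal-division rule: split rows into n contiguous, near-equal parts."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    size, extra = divmod(len(rows), n)
    parts, start = [], 0
    for i in range(n):
        # Assumption: the first `extra` partitions each take one extra row.
        end = start + size + (1 if i < extra else 0)
        parts.append(rows[start:end])
        start = end
    return parts


def partition_by_field(rows, field):
    """Field partition rule: rows with the same value of `field` share a partition."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[field]].append(row)
    return dict(parts)
```

Applying the same rule to the source data table and the target data table yields partitions that can be paired by position (equal division) or by key (field partition), which is one way to obtain the correspondence between source partition tables and target partition tables.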
Optionally, the target data table may be partitioned in the same way as the source data table. Because the partitioning rules of the source data table and the target data table are the same, a correspondence between each source partition table and each target partition table can be established when the at least one source partition table and the at least one target partition table are obtained. When data verification is subsequently performed, the source partition field in a source partition table can be checked against the target partition field in the target partition table corresponding to that source partition table, so as to determine whether the source data table and the target data table are consistent. Because the data amount in a source partition table is smaller than that in the source data table, data verification can be performed partition table by partition table.
S104, data verification is carried out based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table.
In the embodiment of the application, the computer equipment can perform data verification based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table to obtain a data verification result, wherein the data verification result is used for indicating whether the source data table is consistent with the target data table or not. And if the data check between the source partition field in the at least one source partition table and the target partition field in the at least one target partition table passes, determining that the source data table is consistent with the target data table.
S105, if the data check between the source partition field in any source partition table and the target partition field in the corresponding target partition table is not passed, determining that the source data table is inconsistent with the target data table.
The corresponding target partition table is a partition table corresponding to any source partition table in at least one target partition table. Optionally, since the computer device may divide the source data table and the target data table by using the same partition processing manner to obtain at least one source partition table and at least one target partition table, a correspondence between each source partition table and each target partition table may be established, and then a target partition table corresponding to the source partition table in the at least one target partition table may be determined based on the correspondence. That is, the computer device may perform a data check on a source partition field in the source partition table and a target partition field in a target partition table corresponding to the source partition table to determine consistency of the source data table and the target data table. Namely, the computer equipment only needs to carry out data verification on the fields in the two partition tables (namely, one source partition table and the target partition table corresponding to the source partition table), so that the data verification efficiency is improved. Or the target partition field in the at least one target partition table corresponding to the source partition field may also refer to a target partition field in each target partition table in the at least one target partition table, that is, the computer device may perform data verification based on the source partition field in each source partition table and the target partition field in each target partition table in the at least one target partition table, to determine consistency of the source data table and the target data table. 
For example, the source data table is divided into x source partition tables, the target data table is divided into y target partition tables, and x and y are positive integers, so that the computer device can perform data verification on the source partition field in the first source partition table in the x source partition tables and the target partition field in each target partition table in the y target partition tables respectively, thereby improving the accuracy of data verification.
Optionally, the data verification may include a field check and a field format check. The field check may refer to checking the field names in the source data table against the field names in the target data table to determine whether they are consistent; the field format check may refer to checking the data format and field content of the fields in the source data table against those of the fields in the target data table to determine whether the source data table is consistent with the target data table. The field names may include, for example, name, gender, and age; the data format of a field may be numeric, character, or another type; and the field content may include, for example, Zhang San, male, and 30.
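The two checks can be illustrated with the following minimal sketch, in which a table is a list of dict rows keyed by field name. The function names are hypothetical, and comparing rows positionally (row i of the source against row i of the target) is an assumption made for illustration; the description does not state how rows are aligned.

```python
def field_check(source_rows, target_rows):
    """Field check: the two tables use the same set of field names."""
    def names(rows):
        return {name for row in rows for name in row}
    return names(source_rows) == names(target_rows)


def field_format_check(source_rows, target_rows):
    """Field format check: data format (Python type) and content both match."""
    if len(source_rows) != len(target_rows):
        return False
    for s_row, t_row in zip(source_rows, target_rows):
        for name, s_value in s_row.items():
            if name not in t_row:
                return False
            t_value = t_row[name]
            # A numeric 30 and a character "30" differ in data format.
            if type(s_value) is not type(t_value) or s_value != t_value:
                return False
    return True
```

With this sketch, a source row {"name": "Zhang San", "gender": "male", "age": 30} passes both checks against an identical target row, while a target row whose age is the string "30" passes the field check but fails the field format check.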
Alternatively, the computer device may sample the fields in the partition tables and perform data verification based on the sampled fields to obtain a data verification result. Specifically, the computer device may sample the source partition field to obtain at least one source sample field; sample the target partition field to obtain at least one target sample field; perform field verification on the at least one source sample field and the at least one target sample field; if the field verification passes, perform field format verification on the field format of the at least one source sample field and the field format of the at least one target sample field; and if the field format verification passes, perform data verification on the remaining source partition fields in the at least one source partition table and the remaining target partition fields in the at least one target partition table to obtain a partition table verification result, and determine the data verification result based on the partition table verification result.
Wherein the remaining source partition fields in the at least one source partition table may refer to fields in the at least one source partition table other than the at least one source sample field, e.g., the remaining source partition fields in one source partition table refer to source partition fields in the source partition table that are not sampled. The remaining target partition fields in the at least one target partition table may refer to fields in the at least one target partition table other than the at least one target sample field, e.g., the remaining target partition fields in one target partition table refer to target partition fields in the target partition table that are not sampled. That is, in the embodiment of the present application, after a source partition table is obtained by partitioning a source data table, a field in the source partition table may be sampled, and data verification is performed based on a source sampling field obtained by sampling and a target sampling field obtained by sampling, and if the source sampling field and the target sampling field have consistency, data verification is performed on a remaining source partition field in the source partition table except the source sampling field and a remaining target partition field in the target partition table except the target sampling field, so as to obtain a data verification result. If the data check between the source sampling field in any one source partition table and the target sampling field in the target partition table corresponding to the source partition table is not passed or the data check between the residual source partition field and the residual target partition field is not passed, the source data table and the target data table are determined to be inconsistent, the subsequent data check is not needed, and the data check efficiency is improved.
Optionally, when the computer device performs field verification on at least one source sampling field and at least one target sampling field, one-to-one verification can be performed on the source sampling field and the target sampling field, that is, after one source sampling field and one target sampling field are verified, verification is performed on the next source sampling field and the next target sampling field; the source sampling field and the target sampling field can be all checked together, namely, each source sampling field and each target sampling field are checked at the same time; or, field verification may be performed on each source sample field and each target sample field in turn, which is not limited by the embodiment of the present application. It may be understood that, in the embodiments of the present application, all the field check between the fields in the source data table and the fields in the target data table and the field format check between the field format of the fields in the source data table and the field format of the fields in the target data table may refer to the one-to-one check, all the check together or the check in sequence.
Because the data verification includes a field check and a field format check, when data verification is performed on the source sampling field and the target sampling field, the field check may be performed first. If the field check between the source sampling field and the target sampling field does not pass, the data verification result indicates that the source data table and the target data table are inconsistent, and the field format check does not need to be performed on the source sampling field and the target sampling field, which improves data verification efficiency. If the field check between the source sampling field and the target sampling field passes, the field format check is then performed on them; that is, the source sampling field and the target sampling field are checked twice, which improves the accuracy of data verification. If the field format check between the source sampling field and the target sampling field also passes, data verification, including the field check and the field format check, is performed on the remaining source partition fields in the at least one source partition table and the remaining target partition fields in the at least one target partition table to obtain a partition table verification result, and the data verification result is determined based on the partition table verification result.
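The sample-first, stop-on-first-failure flow described above can be sketched as follows. All names are hypothetical. Two assumptions are made for illustration: partitions are dicts that map a shared partition key to a list of dict rows, and the same row positions are sampled in the source partition and its corresponding target partition.

```python
import random


def names_match(src, tgt):
    """Field check on two row lists: same set of field names."""
    return {k for r in src for k in r} == {k for r in tgt for k in r}


def formats_match(src, tgt):
    """Field format check: same type and same content, row by row."""
    return len(src) == len(tgt) and all(
        s.keys() == t.keys()
        and all(type(s[k]) is type(t[k]) and s[k] == t[k] for k in s)
        for s, t in zip(src, tgt)
    )


def verify_partition(src, tgt, sample_size, seed=0):
    """Check the sampled rows first, then the remaining rows; stop at first failure."""
    if len(src) != len(tgt):
        return False
    rng = random.Random(seed)
    picked = set(rng.sample(range(len(src)), min(sample_size, len(src))))
    sample = ([src[i] for i in sorted(picked)], [tgt[i] for i in sorted(picked)])
    rest = (
        [r for i, r in enumerate(src) if i not in picked],
        [r for i, r in enumerate(tgt) if i not in picked],
    )
    for s_rows, t_rows in (sample, rest):
        if not names_match(s_rows, t_rows):
            return False  # field check failed: skip the format check
        if not formats_match(s_rows, t_rows):
            return False
    return True


def verify_tables(source_parts, target_parts, sample_size):
    """Tables are consistent only if every corresponding partition pair passes."""
    if source_parts.keys() != target_parts.keys():
        return False
    # all() stops at the first failing partition, so later ones are not checked.
    return all(
        verify_partition(source_parts[k], target_parts[k], sample_size)
        for k in source_parts
    )
```

Checking a small sample first means an inconsistent pair of tables is often rejected after very little work, while a consistent pair still has every remaining field checked.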
Alternatively, the method of determining that the source data table and the target data table are inconsistent based on the partition table verification result may include the following cases:
In the first case, if the source partition field in any source partition table is not matched with the target partition field in the target partition table corresponding to the source partition table in at least one target partition table, determining that the source data table is inconsistent with the target data table. Wherein, the field matching may refer to the same field name, and the field format matching may refer to the same field format and field content.
In a second case, if each source partition field in the at least one source partition table matches a target partition field in the at least one target partition table, and a field format of one or more source partition fields in the at least one source partition table does not match a target partition field format in the at least one target partition table, determining that the source data table is inconsistent with the target data table.
That is, when the data verification is performed on the source partition field and the target partition field, if the verification between the source partition field and the target partition field is not passed, it is determined that the data verification result indicates that the source data table and the target data table are inconsistent. If the verification between the source partition field and the target partition field passes, and the verification between the field format of the source partition field and the field format of the target partition field does not pass, determining that the data verification result indicates that the source data table and the target data table are inconsistent.
That is, in the embodiment of the present application, a plurality of partition tables are obtained by partitioning a data table, and then sampling processing is performed on each partition table to obtain a sampling field, and the consistency between the source data table and the target data table is determined based on the consistency between the source sampling field and the target sampling field, and the consistency between the field format of the source sampling field and the field format of the target sampling field. By processing each source partition table and each target partition table in this way, consistency between each source partition table and each target partition table can be obtained, thereby determining a final data verification result. If the data verification results among any one of the partition tables are inconsistent, the source data table and the target data table are determined to be inconsistent. When the verification of any previous step fails, the source data table and the target data table can be indicated to be inconsistent, so that the subsequent verification step is stopped, and the data verification efficiency can be improved. If the step verification passes, the subsequent step is executed, so that the accuracy of data verification can be ensured, and the accuracy of data verification can be further improved under the condition of improving the data verification efficiency. 
Optionally, if each source partition field in the at least one source partition table is matched with a target partition field in the at least one target partition table, and a field format of each source partition field is matched with a field format of a target partition field in the at least one target partition table, performing data verification on the remaining source partition fields and the remaining target partition fields, and if the data verification between the remaining source partition fields and the remaining target partition fields passes, determining that the source data table is consistent with the target data table.
In the prior art, when the consistency of a source data table and a target data table is checked, each source field in the source data table is compared with each target field in the target data table. In the technical scheme of the application, the source data table and the target data table are partitioned and sampled, and data verification is performed on the sampled fields. Because the amount of data verified in this way is smaller than the amount of data involved in a one-to-one comparison of every source field in the source data table with every target field in the target data table, the data verification efficiency is higher.
Optionally, if the verification between the field format of the at least one source sampling field and the field format of the at least one target sampling field passes, data verification may be performed on all source partition fields in the at least one source partition table and all target partition fields in the at least one target partition table to obtain a partition table verification result, and the data verification result is determined based on the partition table verification result. Because the data verification is performed after the fields in the partition table are sampled, if the data verification is passed, the data verification can be performed on the fields and the field formats in the whole partition table, and the accuracy of the data verification can be ensured.
Optionally, if the table type of the source data table and the table type of the target data table are full-size tables, and the data amount in the source data table and the data amount in the target data table are smaller than or equal to the data amount threshold; or the table types of the source data table and the target data table are incremental tables, namely when the data amount in the source data table and the data amount in the target data table are smaller, the computer equipment can perform data verification on all data in the source data table and all data in the target data table to determine the consistency of the source data table and the target data table. Specifically, the computer device may perform a field check on each source field in the source data table and each target field in the target data table; if the verification between each source field in the source data table and each target field in the target data table passes, performing field format verification on the field format of each source field in the source data table and the field format of each target field in the target data table, and determining a data verification result between the source data table and the target data table based on the field format verification result.
Optionally, if the target data table is obtained by processing the source data table, an intermediate level data table between the source data table and the target data table may be obtained, and the consistency between the source data table and the target data table may be determined by performing data verification on the source data table and the intermediate level data table and performing data verification on the intermediate level data table and the target data table. In particular, the computer device may obtain at least one intermediate-level data table between the source data table and the target data table; partitioning a first level field in a first level data table to obtain at least one first partitioning table; partitioning the second level field in the second level data table to obtain at least one second partitioning table; performing data verification on a source partition field in at least one source partition table and a first partition field in at least one first partition table; if the data verification between the source partition field and the first partition field passes, performing data verification on the first partition field in at least one first partition table and the second partition field in at least one second partition table; and if the data verification between the first partition field and the second partition field passes, performing data verification on the second partition field in at least one second partition table and the target partition field in at least one target partition table to obtain a data verification result.
The data verification comprises field verification and field format verification, the at least one intermediate level data table comprises a first level data table and a second level data table, the first level data table is obtained by carrying out data extraction processing on the source data table, the second level data table is obtained by carrying out data cleaning processing on the first level data table, and the target data table is obtained by carrying out logic processing on the second level data table. Data extraction refers to the process of extracting data from a data source; data cleaning refers to the process of rechecking and checking data, and aims to delete repeated information, correct existing errors and provide data consistency; the logic processing refers to performing a logic operation on the cleaned data. It can be understood that if the data verification between the source partition field and the first partition field does not pass, it is determined that the source data table and the target data table are inconsistent, and a subsequent data verification process is not required to be executed, so that the data verification efficiency is saved; if the data verification between the source partition field and the first partition field passes, and the data verification between the first partition field and the second partition field does not pass, determining that the source data table and the target data table are inconsistent, and no subsequent data verification process is required to be executed, so that the data verification efficiency is saved.
By acquiring the intermediate level data tables between the source data table and the target data table, data verification can be performed between the source data table and the intermediate level data tables and between the intermediate level data tables and the target data table, thereby determining the consistency between the source data table and the target data table. Because data verification is performed layer by layer, the link whose abnormality caused the inconsistency between the source data table and the target data table can be quickly located, which makes it convenient to modify the data table or perform other operations on it.
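The layer-by-layer verification can be sketched as a walk over adjacent levels that stops at the first failing link. The name verify_chain and its return shape are hypothetical, and the check applied between two adjacent levels is passed in as a function, since the description leaves the concrete per-link check to the field check and field format check.

```python
def verify_chain(levels, check):
    """Verify adjacent levels in order and stop at the first failing link.

    levels: ordered list of (level_name, table) pairs, from source to target.
    check:  callable(table_a, table_b) -> bool, the per-link data verification.
    Returns (True, None) if every link passes, otherwise
    (False, (name_a, name_b)) naming the first link that failed.
    """
    for (name_a, table_a), (name_b, table_b) in zip(levels, levels[1:]):
        if not check(table_a, table_b):
            return False, (name_a, name_b)
    return True, None
```

For a chain of source, first level, second level, and target, a failure reported for the pair (first level, second level) points at the data cleaning link, without the later link ever being checked.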
Optionally, if the data verification result indicates that the source data table is inconsistent with the target data table, the computer device may acquire an abnormal source field in the source data table and an abnormal target field in the target data table, and output the abnormal source field and the abnormal target field.
Specifically, the computer device may output the abnormal source field and the abnormal target field in a visual interface. For example, the data may be converted into a DataFrame and written to an Excel file through pandas, so that all the abnormal source fields and abnormal target fields are clearly displayed in the form of a table and can then be checked manually to ensure the accuracy of data verification.
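A tabular report of abnormal fields can be sketched as follows. To keep the example free of third-party dependencies, it uses the standard-library csv module rather than the pandas and Excel route mentioned above; the function name and the column names are hypothetical.

```python
import csv
import io

# Hypothetical report columns; the description does not fix a layout.
REPORT_COLUMNS = ["partition", "field", "source_value", "target_value"]


def write_mismatch_report(mismatches, out):
    """Write one row per abnormal source/target field pair to a text stream."""
    writer = csv.DictWriter(out, fieldnames=REPORT_COLUMNS)
    writer.writeheader()
    writer.writerows(mismatches)


# Usage: collect mismatches during verification, then write them out.
buffer = io.StringIO()
write_mismatch_report(
    [{"partition": "p1", "field": "age", "source_value": 30, "target_value": 31}],
    buffer,
)
report_lines = buffer.getvalue().splitlines()
```

The resulting file opens as a table in spreadsheet software, which serves the same manual-review purpose as the Excel output.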
In the embodiment of the application, the table type of the source data table and the table type of the target data table are determined by acquiring the source data table and the target data table; if the table type of the source data table and the table type of the target data table are all full-amount tables, determining whether the data amount in the source data table and the data amount in the target data table are larger than a data amount threshold value or not respectively; if the data quantity in the source data table and the data quantity in the target data table are both larger than the data quantity threshold, partitioning the source data table to obtain at least one source partition table, and partitioning the target data table to obtain at least one target partition table; and carrying out data verification based on the source partition field in at least one source partition table and the target partition field in at least one target partition table, and if the data verification between the source partition field in any source partition table and the target partition field in the corresponding target partition table does not pass, determining that the source data table is inconsistent with the target data table. Because the table type of the data table is determined in advance, when the table type of the data table is determined to be a full-quantity table and the data quantity in the data table is larger than the data quantity threshold value, a plurality of partition tables are obtained by partitioning the data in the data table, the data in each partition table can be checked, and whether the data in the source partition table and the data in the target partition table are consistent or not is determined. 
If the data in a certain source partition table and the data in a target partition table are inconsistent, the source data table and the target data table can be determined to be inconsistent, the whole source data table and the whole target data table are not required to be checked, and the data checking efficiency can be improved. In addition, by automatically checking the data in the source data table and the target data table, the fields in the source data table and the target data table do not need to be compared manually, so that the data checking efficiency and the data checking accuracy can be improved.
Optionally, referring to fig. 2, fig. 2 is a flow chart of another data verification method according to an embodiment of the present application. The data verification method can be applied to computer equipment; as shown in fig. 2, the data verification method includes, but is not limited to, the following steps:
S201, acquiring a source data table and a target data table, and determining the table type of the source data table and the table type of the target data table.
S202, determining whether the table types of the source data table and the target data table are all full-quantity tables.
If the table type of the source data table and the table type of the target data table are both full tables, step S203 is executed, that is, whether the data amount in the source data table and the data amount in the target data table are greater than the data amount threshold is determined respectively. If not, that is, the table type of the source data table and the table type of the target data table are both incremental tables, step S206 is executed, that is, field verification is performed on each source field in the source data table and each target field in the target data table.
S203, determining whether the data amount in the source data table and the data amount in the target data table are larger than a data amount threshold respectively.
If yes, that is, the data amount in the source data table and the data amount in the target data table are both greater than the data amount threshold, step S204 is executed. If not, that is, the data amount in the source data table and the data amount in the target data table are both less than or equal to the data amount threshold, step S206 is performed.
S204, partitioning the source data table to obtain at least one source partition table, and partitioning the target data table to obtain at least one target partition table.
S205, data verification is carried out based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table, and a data verification result is obtained.
S206, performing field verification on each source field in the source data table and each target field in the target data table.
S207, if the field verification is passed, performing field format verification on the field format of each source field in the source data table and the field format of each target field in the target data table, and determining a data verification result between the source data table and the target data table based on the field format verification result.
In the embodiment of the application, a source data table and a target data table are acquired, and the table type of each is determined. If the table type of the source data table and the table type of the target data table are both full-quantity tables, it is determined whether the data amount in the source data table and the data amount in the target data table are each greater than a data amount threshold. If both data amounts are greater than the data amount threshold, the source data table is partitioned to obtain at least one source partition table, and the target data table is partitioned to obtain at least one target partition table. Data verification is then performed based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table; if the data verification between the source partition field in any source partition table and the target partition field in the corresponding target partition table does not pass, the source data table is determined to be inconsistent with the target data table. Because the table type is determined in advance, and a full-quantity table whose data amount is greater than the data amount threshold is split into a plurality of partition tables, the data in each partition table can be verified separately to determine whether the data in a source partition table and the data in the corresponding target partition table are consistent.
As soon as the data in one source partition table is found to be inconsistent with the data in its target partition table, the source data table and the target data table can be determined to be inconsistent without verifying the whole source data table and the whole target data table, which can improve data verification efficiency. In addition, because the data in the source data table and the target data table is verified automatically, the fields in the two tables do not need to be compared manually, which can improve both the efficiency and the accuracy of data verification.
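The branching summarized above (table type, data amount threshold, partition-by-partition verification with an early exit) can be sketched as follows. This is a minimal illustration only: the names `choose_strategy` and `partitions_consistent`, the string labels for table types, and the handling of cases the text does not describe (mixed table types, or only one table above the threshold) are assumptions and not part of the application.

```python
from typing import Callable, Iterable, Tuple, TypeVar

P = TypeVar("P")  # whatever object represents one partition table


def choose_strategy(source_type: str, target_type: str,
                    source_rows: int, target_rows: int,
                    threshold: int) -> str:
    """Pick the verification path from table type and data amount."""
    if source_type == "full" and target_type == "full":
        if source_rows > threshold and target_rows > threshold:
            return "partitioned"
        # Assumption: any other size combination is verified as a whole table.
        return "whole_table"
    if source_type == "incremental" and target_type == "incremental":
        return "whole_table"
    # Assumption: mixed table types are not covered by the description.
    return "unspecified"


def partitions_consistent(pairs: Iterable[Tuple[P, P]],
                          check: Callable[[P, P], bool]) -> bool:
    """Verify (source partition, target partition) pairs in order.

    Stops at the first pair that fails, so the remaining partitions
    are never examined.
    """
    for source_part, target_part in pairs:
        if not check(source_part, target_part):
            return False
    return True
```

The early return in `partitions_consistent` is what the efficiency argument relies on: a single failing partition is enough to report the two tables as inconsistent.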
The method of the embodiment of the application is described above, and the device of the embodiment of the application is described below.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a data verification device according to an embodiment of the present application. The data verification device may be a computer program (including program code) running in a computer device; for example, the data verification device is a piece of application software. The data verification device can be used for executing the corresponding steps in the data verification method provided by the embodiment of the application. The data verification device 30 includes:
a data acquisition module 31, configured to acquire a source data table and a target data table, and determine a table type of the source data table and a table type of the target data table;
a quantity determining module 32, configured to determine, if the table type of the source data table and the table type of the target data table are both full-quantity tables, whether the data amount in the source data table and the data amount in the target data table are each greater than a data amount threshold;
a partition processing module 33, configured to, if the data amount in the source data table and the data amount in the target data table are both greater than the data amount threshold, perform partition processing on the source data table to obtain at least one source partition table, and perform partition processing on the target data table to obtain at least one target partition table;
a data verification module 34, configured to perform data verification based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table;
a result determining module 35, configured to determine that the source data table is inconsistent with the target data table if the data check between the source partition field in any source partition table and the target partition field in the corresponding target partition table does not pass, where the corresponding target partition table is the partition table, among the at least one target partition table, that corresponds to that source partition table.
Optionally, the partition processing module 33 is specifically configured to:
determining an equal division rule for the source data table based on the data amount in the source data table, and dividing the source data table by the equal division rule to obtain the at least one source partition table; or
determining a field division rule for the source data table based on the field types of the fields in the source data table, and dividing the source data table by the field division rule to obtain the at least one source partition table, wherein the field division rule indicates a preset field type.
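The two partitioning options can be illustrated with a short sketch. It assumes that a table is held as a list of row dictionaries, that the equal division rule is expressed as a maximum number of rows per partition, and that the field division rule groups rows by the value of one chosen field (for example a date field); the description fixes none of these details, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List

Row = Dict[str, object]


def split_equally(rows: List[Row], max_rows_per_partition: int) -> List[List[Row]]:
    """Equal division: consecutive chunks of at most the given size."""
    if max_rows_per_partition <= 0:
        raise ValueError("max_rows_per_partition must be positive")
    return [rows[i:i + max_rows_per_partition]
            for i in range(0, len(rows), max_rows_per_partition)]


def split_by_field(rows: List[Row], field: str) -> Dict[Hashable, List[Row]]:
    """Field division: one partition per distinct value of `field`."""
    partitions: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[row[field]].append(row)
    return dict(partitions)
```

Applying the same rule with the same parameters to the source data table and the target data table is what makes each source partition table correspond to one target partition table.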
Optionally, the data check includes a field check and a field format check; the data verification module 34 includes:
a first sampling unit 341, configured to sample the source partition field to obtain at least one source sampling field;
a second sampling unit 342, configured to sample the target partition field to obtain at least one target sampling field;
a field checking unit 343, configured to perform field checking on the at least one source sampling field and the at least one target sampling field;
a format checking unit 344, configured to perform a field format check on the field format of the at least one source sampling field and the field format of the at least one target sampling field if the field check passes;
an area checking unit 345, configured to perform data checking on the remaining source partition fields in the at least one source partition table and the remaining target partition fields in the at least one target partition table if the field format check passes.
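The sample-first order of checks performed by units 341 to 345 can be sketched as below. The sketch assumes that a partition table is summarized as a mapping from field name to a field format string, that the field check compares field names, and that a single sample of names is drawn and looked up on both sides (the description samples the source and target fields separately). The helper names are hypothetical.

```python
import random
from typing import Dict, List, Optional

Schema = Dict[str, str]  # field name -> field format, e.g. "varchar(32)"


def fields_match(src: Schema, tgt: Schema, names: List[str]) -> bool:
    """Field check: every listed field exists on both sides."""
    return all(name in src and name in tgt for name in names)


def formats_match(src: Schema, tgt: Schema, names: List[str]) -> bool:
    """Field format check; call only after fields_match has passed."""
    return all(src[name] == tgt[name] for name in names)


def check_partition(src: Schema, tgt: Schema, sample_size: int,
                    rng: Optional[random.Random] = None) -> bool:
    """Check a sample of fields first, then the remaining fields."""
    rng = rng or random.Random()
    names = sorted(set(src) | set(tgt))
    sampled = rng.sample(names, min(sample_size, len(names)))
    remaining = [name for name in names if name not in sampled]
    for batch in (sampled, remaining):
        if not fields_match(src, tgt, batch):
            return False  # field check failed
        if not formats_match(src, tgt, batch):
            return False  # field format check failed
    return True
```

Checking the sample first means that a mismatch, when present in the sampled fields, is reported before the remaining fields are examined; the final verdict does not depend on which fields were sampled.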
Optionally, the result determining module 35 is specifically configured to:
if the source partition field in any source partition table does not match the target partition field in the corresponding target partition table, determining that the source data table is inconsistent with the target data table; or
if each source partition field in the at least one source partition table matches a target partition field in the at least one target partition table, but the field format of one or more source partition fields in the at least one source partition table does not match the field format of the corresponding target partition field in the at least one target partition table, determining that the source data table is inconsistent with the target data table.
Optionally, the data verification device 30 further includes: a second checking module 36 for:
if the data amount in the source data table and the data amount in the target data table are less than or equal to the data amount threshold, performing field verification on each source field in the source data table and each target field in the target data table;
if the field verification is passed, performing field format verification on the field format of each source field in the source data table and the field format of each target field in the target data table;
if the field formats of one or more source fields in the source data table are not matched with the field formats of the target fields in the target data table, determining that the source data table is inconsistent with the target data table.
Optionally, the data verification device 30 further includes: a third verification module 37 for:
If the table type of the source data table and the table type of the target data table are incremental tables, performing field verification on each source field in the source data table and each target field in the target data table;
If the field verification is passed, performing field format verification on the field format of each source field in the source data table and the field format of each target field in the target data table;
if the field formats of one or more source fields in the source data table are not matched with the field formats of the target fields in the target data table, determining that the source data table is inconsistent with the target data table.
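For tables that are not partitioned (full-quantity tables at or below the threshold, and incremental tables), the check runs over every field. A minimal sketch that also reports which of the two checks failed is given below; the `Result` enumeration and the representation of a table as a mapping from field name to field format string are illustrative assumptions.

```python
from enum import Enum
from typing import Dict


class Result(Enum):
    CONSISTENT = "consistent"
    FIELD_MISMATCH = "field mismatch"
    FORMAT_MISMATCH = "field format mismatch"


def check_whole_table(src: Dict[str, str], tgt: Dict[str, str]) -> Result:
    """Field check over all fields, then field format check."""
    # Field check: both tables must expose the same set of field names.
    if set(src) != set(tgt):
        return Result.FIELD_MISMATCH
    # Field format check: runs only when the field check has passed.
    for name, field_format in src.items():
        if tgt[name] != field_format:
            return Result.FORMAT_MISMATCH
    return Result.CONSISTENT
```

Both failure values lead to the same conclusion, namely that the source data table is inconsistent with the target data table; keeping them distinct only helps when reporting the cause.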
Optionally, the data verification device 30 further includes: a hierarchy acquisition module 38 for:
Acquiring at least one intermediate level data table between the source data table and the target data table, wherein the at least one intermediate level data table comprises a first level data table and a second level data table, the first level data table is obtained by carrying out data extraction processing on the source data table, the second level data table is obtained by carrying out data cleaning processing on the first level data table, and the target data table is obtained by carrying out logic processing on the second level data table;
partitioning a first level field in the first level data table to obtain at least one first partition table;
partitioning the second level field in the second level data table to obtain at least one second partitioning table;
the result determining module 35 is specifically configured to:
if the data verification between the source partition field and the first partition field does not pass, determining that the source data table is inconsistent with the target data table; or
if the data verification between the source partition field and the first partition field passes, performing data verification on the first partition field in the at least one first partition table and the second partition field in the at least one second partition table;
if the data verification between the first partition field and the second partition field does not pass, determining that the source data table is inconsistent with the target data table; or
if the data verification between the first partition field and the second partition field passes, performing data verification on the second partition field in the at least one second partition table and the target partition field in the at least one target partition table, and if the data verification between the second partition field and the target partition field does not pass, determining that the source data table is inconsistent with the target data table, wherein the data verification comprises field verification and field format verification.
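The level-by-level comparison (source data table, first level data table, second level data table, target data table) amounts to checking adjacent levels in order and stopping at the first pair that fails. A minimal sketch, in which the function name, the level labels and the `check` callback are assumptions made for illustration:

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")  # whatever object represents the partition fields of one level


def first_inconsistent_stage(
    levels: Sequence[Tuple[str, T]],
    check: Callable[[T, T], bool],
) -> Optional[Tuple[str, str]]:
    """Compare each level with the next one, in processing order.

    Returns the names of the first adjacent pair whose data verification
    fails, or None when every adjacent pair passes.
    """
    for (upstream_name, upstream), (downstream_name, downstream) in zip(levels, levels[1:]):
        if not check(upstream, downstream):
            return (upstream_name, downstream_name)
    return None
```

A non-None result means the source data table is inconsistent with the target data table, and the returned pair indicates at which processing step (extraction, cleaning or logic processing) the inconsistency first appears.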
It should be noted that, in the embodiment corresponding to fig. 3, the content not mentioned may be referred to the description of the method embodiment, and will not be repeated here.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 4, the computer device 40 may include a processor 401, a network interface 404 and a memory 405; in addition, the computer device 40 may further include a user interface 403 and at least one communication bus 402. The communication bus 402 is used to enable connection and communication between these components. The user interface 403 may include a display (Display) and a keyboard (Keyboard); optionally, the user interface 403 may further include a standard wired interface and a wireless interface. The network interface 404 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 405 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. Optionally, the memory 405 may also be at least one storage device located remotely from the processor 401. As shown in fig. 4, the memory 405, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application.
In the computer device 40 shown in fig. 4, the network interface 404 may provide network communication functions, the user interface 403 mainly provides an input interface for a user, and the processor 401 may be used to invoke the device control application stored in the memory 405 to implement:
Acquiring a source data table and a target data table, and determining the table type of the source data table and the table type of the target data table;
If the table type of the source data table and the table type of the target data table are all full-quantity tables, determining whether the data quantity in the source data table and the data quantity in the target data table are larger than a data quantity threshold value or not respectively;
If the data quantity in the source data table and the data quantity in the target data table are both larger than the data quantity threshold, partitioning the source data table to obtain at least one source partition table, and partitioning the target data table to obtain at least one target partition table;
Performing data verification based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table;
if the data verification between the source partition field in any source partition table and the target partition field in the corresponding target partition table is not passed, determining that the source data table is inconsistent with the target data table, wherein the corresponding target partition table is a partition table corresponding to any source partition table in at least one target partition table.
It should be understood that the computer device 40 described in the embodiment of the present application may perform the data verification method described in the embodiments corresponding to fig. 1 and fig. 2, and may also implement the data verification device described in the embodiment corresponding to fig. 3; details are not repeated here. The beneficial effects are the same as those of the method and are likewise not described again.
The embodiments of the present application also provide a computer readable storage medium storing a computer program. The computer program comprises program instructions which, when executed by a computer, cause the computer to perform the method of the foregoing embodiments; the computer may be part of the computer device mentioned above, for example the processor 401. As an example, the program instructions may be executed on one computer device, on multiple computer devices located at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network; the multiple computer devices distributed across multiple sites and interconnected by a communication network may constitute a blockchain network.
Those skilled in the art will appreciate that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing related hardware. The program may be stored on a computer readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (8)

1. A method of data verification, comprising:
acquiring a source data table and a target data table, and determining the table type of the source data table and the table type of the target data table;
If the table type of the source data table and the table type of the target data table are all full-quantity tables, determining whether the data quantity in the source data table and the data quantity in the target data table are larger than a data quantity threshold value or not respectively; if the data amount in the source data table and the data amount in the target data table are both larger than the data amount threshold, partitioning the source data table to obtain at least one source partition table, and partitioning the target data table to obtain at least one target partition table;
Acquiring at least one intermediate level data table between the source data table and the target data table, wherein the at least one intermediate level data table comprises a first level data table and a second level data table, the first level data table is obtained by carrying out data extraction processing on the source data table, the second level data table is obtained by carrying out data cleaning processing on the first level data table, and the target data table is obtained by carrying out logic processing on the second level data table;
partitioning a first level field in the first level data table to obtain at least one first partition table;
partitioning the second level field in the second level data table to obtain at least one second partitioning table;
Performing data verification based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table, comprising:
sampling the source partition field to obtain at least one source sampling field; the data verification comprises field verification and field format verification;
sampling the target partition field to obtain at least one target sampling field;
performing field verification on the at least one source sample field and the at least one target sample field;
if the field verification is passed, performing field format verification on the field format of the at least one source sampling field and the field format of the at least one target sampling field;
If the field format check is passed, performing data check on the remaining source partition fields in the at least one source partition table and the remaining target partition fields in the at least one target partition table;
If the data check between the source partition field in any source partition table and the target partition field in the corresponding target partition table is not passed, determining that the source data table is inconsistent with the target data table, where the corresponding target partition table is a partition table corresponding to any source partition table in the at least one target partition table, and the method includes:
If the data verification between the source partition field and the first partition field is not passed, determining that the source data table is inconsistent with the target data table; or alternatively
If the data verification between the source partition field and the first partition field passes, performing data verification on the first partition field in the at least one first partition table and the second partition field in the at least one second partition table;
If the data verification between the first partition field and the second partition field is not passed, determining that the source data table is inconsistent with the target data table; or alternatively
And if the data verification between the first partition field and the second partition field passes, performing data verification on the second partition field in the at least one second partition table and the target partition field in the at least one target partition table, and if the data verification between the second partition field and the target partition field does not pass, determining that the source data table is inconsistent with the target data table, wherein the data verification comprises field verification and field format verification.
2. The method of claim 1, wherein partitioning the source data table to obtain at least one source partition table comprises:
Determining an equally dividing rule for the source data table based on the data amount in the source data table, and dividing the source data table by adopting the equally dividing rule to obtain the at least one source partition table; or alternatively
Determining a field division rule aiming at the source data table based on the field type of the field in the source data table, dividing the source data table by adopting the field division rule to obtain the at least one source partition table, wherein the field division rule indicates a preset field type.
3. The method of claim 1, wherein determining that the source data table is inconsistent with the target data table if the data check between the source partition field in any source partition table and the target partition field in the corresponding target partition table does not pass, comprises:
If the source partition field in any source partition table is not matched with the target partition field in the corresponding target partition table, determining that the source data table is inconsistent with the target data table; or alternatively
And if each source partition field in the at least one source partition table is matched with a target partition field in the at least one target partition table, and the field formats of one or more source partition fields in the at least one source partition table are not matched with the target partition field formats in the at least one target partition table, determining that the source data table is inconsistent with the target data table.
4. The method of claim 1, wherein after said determining whether the amount of data in the source data table and the amount of data in the target data table are greater than a data amount threshold, the method further comprises:
If the data amount in the source data table and the data amount in the target data table are smaller than or equal to the data amount threshold, performing field verification on each source field in the source data table and each target field in the target data table;
If the field verification is passed, performing field format verification on the field format of each source field in the source data table and the field format of each target field in the target data table;
and if the field formats of one or more source fields in the source data table are not matched with the field formats of the target fields in the target data table, determining that the source data table is inconsistent with the target data table.
5. The method of claim 1, wherein after said determining the table type of the source data table and the table type of the target data table, the method further comprises:
if the table type of the source data table and the table type of the target data table are incremental tables, performing field verification on each source field in the source data table and each target field in the target data table;
If the field verification is passed, performing field format verification on the field format of each source field in the source data table and the field format of each target field in the target data table;
and if the field formats of one or more source fields in the source data table are not matched with the field formats of the target fields in the target data table, determining that the source data table is inconsistent with the target data table.
6. A data verification device for performing the method of any one of claims 1-5, comprising:
the data acquisition module is used for acquiring a source data table and a target data table and determining the table type of the source data table and the table type of the target data table;
The quantity determining module is used for determining whether the data quantity in the source data table and the data quantity in the target data table are larger than a data quantity threshold value or not respectively if the table type of the source data table and the table type of the target data table are all full-quantity tables;
The partition processing module is used for partitioning the source data table to obtain at least one source partition table and partitioning the target data table to obtain at least one target partition table if the data amount in the source data table and the data amount in the target data table are both larger than the data amount threshold;
the data verification module is used for carrying out data verification based on the source partition field in the at least one source partition table and the target partition field in the at least one target partition table;
And the result determining module is used for determining that the source data table is inconsistent with the target data table if the data verification between the source partition field in any source partition table and the target partition field in the corresponding target partition table is not passed, wherein the corresponding target partition table is a partition table corresponding to any source partition table in the at least one target partition table.
7. A computer device, comprising: a processor, a memory, and a network interface;
The processor is connected to the memory, the network interface for providing data communication functions, the memory for storing program code, the processor for invoking the program code to cause the computer device to perform the method of any of claims 1-5.
8. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the method of any of claims 1-5.
CN202111446609.8A 2021-11-29 2021-11-29 Data verification method, device, equipment and readable storage medium Active CN114116724B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111446609.8A CN114116724B (en) 2021-11-29 2021-11-29 Data verification method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN114116724A CN114116724A (en) 2022-03-01
CN114116724B true CN114116724B (en) 2024-08-23

Family

ID=80368790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111446609.8A Active CN114116724B (en) 2021-11-29 2021-11-29 Data verification method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114116724B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant