
CN112463411A - Data processing method, device, server and storage medium - Google Patents


Info

Publication number
CN112463411A
Authority
CN
China
Prior art keywords
data
processed
server
check
offset
Prior art date
Legal status
Pending
Application number
CN202011455610.2A
Other languages
Chinese (zh)
Inventor
贺宁
魏程琛
傅浩
张智鹏
李�诚
张瑞
Current Assignee
Chongqing Unisinsight Technology Co Ltd
Original Assignee
Chongqing Unisinsight Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Chongqing Unisinsight Technology Co Ltd
Priority to CN202011455610.2A
Publication of CN112463411A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/547 Messaging middleware

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract


The present invention relates to the technical field of big data and provides a data processing method, device, server and storage medium applied to a first server. The first server is communicatively connected to a second server running message middleware and includes a data table. The method includes: obtaining data to be processed through the message middleware, where the data to be processed includes an identifier that uniquely characterizes it; if a check table related to the data table exists, using the check table to determine whether the data to be processed already exists in the data table, where the data table includes an identification field and the primary key of the check table is related to the identification field of the data table; and, if the data to be processed does not exist in the data table, processing the data to be processed. Compared with the prior art, the present invention can use the check table to determine whether the data to be processed is duplicated in the data table, which improves the efficiency of checking for duplicate data and ultimately improves data processing efficiency.


Description

Data processing method, device, server and storage medium
Technical Field
The invention relates to the technical field of big data, in particular to a data processing method, a data processing device, a server and a storage medium.
Background
Message middleware uses an efficient and reliable message-passing mechanism to provide platform-independent data communication and to integrate distributed systems. By providing message-passing and message-queuing models, it decouples the producer that generates data from the consumer that processes it, provides a degree of data buffering, and greatly improves the reliability of data processing.
When the message middleware recovers from an exception, the recovery may involve rolling back the affected data, so the consumer may fetch historical data again and process it a second time.
To avoid processing the same data repeatedly, the consumer usually first determines whether the data is a duplicate and, if it is, does not process it again.
Disclosure of Invention
The invention aims to provide a data processing method, a data processing device, a server and a storage medium that improve the efficiency of checking for duplicate data and thereby improve data processing efficiency.
To achieve this aim, the invention adopts the following technical solutions:
In a first aspect, the present invention provides a data processing method applied to a first server, where the first server is communicatively connected to a second server running message middleware and includes a data table. The method includes: acquiring data to be processed through the message middleware, where the data to be processed includes an identifier that uniquely represents it; if a check table related to the data table exists, determining, by using the check table, whether the data to be processed exists in the data table, where the data table includes an identification field and the primary key of the check table is related to the identification field of the data table; and, if the data to be processed does not exist in the data table, processing the data to be processed.
In a second aspect, the present invention provides a data processing apparatus applied to a first server, where the first server is communicatively connected to a second server running message middleware and includes a data table. The apparatus comprises: an acquisition module, configured to acquire data to be processed through the message middleware, where the data to be processed includes an identifier that uniquely represents it; a judging module, configured to determine, if a check table related to the data table exists, whether the data to be processed exists in the data table by using the check table, where the data table includes an identification field and the primary key of the check table is related to the identification field of the data table; and a processing module, configured to process the data to be processed if it does not exist in the data table.
In a third aspect, the present invention provides a server comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the data processing method as described above when executing the computer program.
In a fourth aspect, the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data processing method described above.
Compared with the prior art, the invention determines the primary key of the check table from the identification field of the data table, so the check table can be used to determine whether the data to be processed is duplicated in the data table. This improves the efficiency of checking for duplicate data and ultimately improves data processing efficiency.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present invention and therefore should not be regarded as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 shows a schematic view of an application scenario provided in an embodiment of the present invention.
Fig. 2 is a block diagram illustrating a server according to an embodiment of the present invention.
Fig. 3 is a flowchart illustrating a data processing method according to an embodiment of the present invention.
Fig. 4 is a flowchart illustrating another data processing method according to an embodiment of the present invention.
Fig. 5 is a flowchart illustrating another data processing method according to an embodiment of the present invention.
Fig. 6 is a flowchart illustrating another data processing method according to an embodiment of the present invention.
Fig. 7 is a flowchart illustrating another data processing method according to an embodiment of the present invention.
Fig. 8 is a flowchart illustrating another data processing method according to an embodiment of the present invention.
Fig. 9 is a block diagram illustrating a data processing apparatus according to an embodiment of the present invention.
Reference numerals: 10-first server; 11-processor; 12-memory; 13-bus; 14-communication interface; 20-second server; 30-third server; 100-data processing apparatus; 110-acquisition module; 120-judging module; 130-processing module; 140-recovery module; 150-update module; 160-purge module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that terms such as "upper", "lower", "inside" and "outside", if used, indicate orientations or positional relationships based on those shown in the drawings or on how the product of the invention is normally placed in use. They are used only for convenience and simplicity of description, do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like, if any, are used solely to distinguish one item from another and are not to be construed as indicating or implying relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other where no conflict arises.
At present, in the context of big data and with the development of distributed computing, the drawbacks of synchronous processing have gradually become apparent. Because the producer responsible for generating data and the consumer responsible for processing it do not work in step, synchronous processing often causes data loss; asynchronous processing buffers the data generated by the producer and thereby solves the data-loss problem of the synchronous mode. To decouple the producer from the consumer, message middleware is typically used to temporarily store the data generated by the producer. Referring to fig. 1, which is a schematic view of an application scenario provided by an embodiment of the present invention, the first server 10 is the consumer responsible for data processing, the second server 20 runs the message middleware and is responsible for temporarily storing data, and the third server 30 is the producer responsible for generating data. The producer generates data and sends it to the message middleware for temporary storage, and the consumer fetches the data from the message middleware and processes it, for example by storing it in a preset database.
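The producer/middleware/consumer arrangement of fig. 1 can be sketched in-process as below. This is only an illustration under stated assumptions: Python's standard `queue.Queue` stands in for the message middleware (a real deployment would use Kafka or similar across separate servers), and `produce`, `consume` and the `(identifier, payload)` record layout are hypothetical names chosen for the sketch. The point it shows is that neither side calls the other directly; each only talks to the buffer.

```python
import queue

# Stand-in for the message middleware on the second server 20: a FIFO buffer.
middleware = queue.Queue()

def produce(records):
    """Producer (third server 30): hand generated records to the middleware."""
    for identifier, payload in records:
        middleware.put((identifier, payload))

def consume(data_table):
    """Consumer (first server 10): fetch buffered records and store them."""
    while True:
        try:
            identifier, payload = middleware.get_nowait()
        except queue.Empty:
            break  # nothing left in the buffer
        # Placeholder "processing": append the record to the data table.
        data_table.append((identifier, payload))

table = []
produce([("id-1", "a"), ("id-2", "b")])
consume(table)
print(table)  # [('id-1', 'a'), ('id-2', 'b')]
```

Because the buffer holds records until the consumer asks for them, the producer can keep running even when the consumer is slower, which is the decoupling the paragraph above describes.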
In this embodiment, the message middleware may be, but is not limited to, Kafka, ActiveMQ, RabbitMQ, and the like.
In this embodiment, each of the first server 10, the second server 20, and the third server 30 may be a physical computer or a virtual machine that provides the same functions as a physical computer, and may be a single server or a server cluster composed of multiple servers.
It should be noted that the message middleware may run on the second server 20 independently, or may run on the first server 10 or the third server 30 as a software functional module.
On the basis of fig. 1, an embodiment of the present invention provides a block diagram of the first server 10 in fig. 1; please refer to fig. 2, which shows a block diagram of the first server 10 provided in the embodiment of the present invention.
The first server 10 comprises a processor 11, a memory 12, a bus 13, and a communication interface 14. The processor 11 and the memory 12 are connected by the bus 13, and the processor 11 communicates with the second server 20 through the communication interface 14.
The processor 11 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the method described above may be completed by integrated logic circuits of hardware in the processor 11 or by instructions in the form of software. The processor 11 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, discrete gate or transistor logic, or discrete hardware components.
The memory 12 is used to store programs, such as the data processing apparatus 100 in the embodiment of the present invention. The data processing apparatus 100 includes at least one software functional module that can be stored in the memory 12 in the form of software or firmware, and the processor 11 executes the program after receiving an execution instruction to implement the data processing method in the embodiment of the present invention.
The memory 12 may include high-speed random access memory (RAM) and may also include non-volatile memory. Alternatively, the memory 12 may be a storage device built into the processor 11 or a storage device independent of the processor 11.
The bus 13 may be an ISA bus, a PCI bus, an EISA bus, or the like. Only one double-headed arrow is shown in Fig. 2, but this does not mean that there is only one bus or one type of bus.
Referring to fig. 3, fig. 3 is a flowchart illustrating a data processing method according to an embodiment of the present invention, where the method is applied to the first server 10 in fig. 1 and fig. 2, and the method includes the following steps:
Step S100, obtain data to be processed through the message middleware, where the data to be processed includes an identifier that uniquely represents it.
In this embodiment, when the producer generates data, it also generates an identifier that uniquely characterizes the data and sends the data together with its identifier to the message middleware for temporary storage; the consumer fetches the data from the middleware, processes it, and stores it in the data table. The data table may be stored in advance on the first server 10 or in an external database server communicatively connected to the first server 10.
In this embodiment, the identifier of the data to be processed may be composed of two or more of the generation time, the service name, and the module number.
Step S110, if a check table related to the data table exists, use the check table to determine whether the data to be processed exists in the data table, where the data table includes an identification field and the primary key of the check table is related to the identification field of the data table.
In this embodiment, the check table has a primary key. A primary key is a candidate field chosen to identify records uniquely, and it may consist of one field or several fields. A primary key typically serves four purposes: (1) it guarantees entity integrity; (2) it can speed up database operations; (3) when a new record is added to the table, its primary key value is checked automatically and may not duplicate the primary key value of any other record; and (4) records in the table can be displayed in the order of their primary key values. Therefore, once the check table has a primary key, storing a data identifier in it first triggers a primary key check, that is, a check of whether the primary key value of the data to be stored already exists. If the value already exists, the primary key check fails, which means the data is duplicate data already present in the data table, and storing the identifier in the check table fails; otherwise, the primary key check succeeds, which means the data is not duplicate data, and the identifier is stored in the check table successfully.
In this embodiment, the primary key of the check table is related to the identification field of the data table: for example, the primary key of the check table may be the identification field of the data table itself, or a combination of that identification field and other fields. For example, the data table includes fields A, B, and C, where field A is the identification field, that is, no two records in the data table have the same value of field A; the check table related to this data table includes field A, and field A is set as its primary key.
In this embodiment, the check table may have fewer fields than the data table (for example, the check table contains only field A while the data table contains fields A, B, and C), or fewer records than the data table (for example, the data table stores the last year of data, ten thousand records, while the check table stores only the last day of data, a few thousand records). Because the check table holds less data than the data table, has a primary key (which the data table does not need), and that primary key is related to the identification field of the data table, the check table can be used to determine quickly whether the data to be processed is a duplicate of data in the data table.
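The two-table arrangement in this example can be sketched with Python's built-in `sqlite3` module. The patent does not name a database, so SQLite, the table names `data_table` and `check_table`, and the column types are all assumptions made for illustration. The sketch shows the behaviour the description relies on: the data table, which declares no primary key, accepts a second row with the same value of field A, while the check table, whose primary key is A, rejects it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Data table: field A is the identification field, but no primary key is declared.
conn.execute("CREATE TABLE data_table (a TEXT, b TEXT, c TEXT)")
# Check table: holds only field A, declared as its primary key.
conn.execute("CREATE TABLE check_table (a TEXT PRIMARY KEY)")

conn.execute("INSERT INTO data_table VALUES ('id-1', 'x', 'y')")
conn.execute("INSERT INTO data_table VALUES ('id-1', 'x', 'y')")  # accepted: no key

conn.execute("INSERT INTO check_table VALUES ('id-1')")
try:
    conn.execute("INSERT INTO check_table VALUES ('id-1')")  # same key again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the primary key check failed

print(conn.execute("SELECT COUNT(*) FROM data_table").fetchone()[0])  # 2
print(duplicate_rejected)  # True
```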
Step S120, if the data to be processed does not exist in the data table, process the data to be processed.
In this embodiment, if the data to be processed does not exist in the data table, it is not duplicate data and needs to be processed; after processing, it is stored in the data table. Note that the identifier of the data to be processed is stored correspondingly in the identification field of the data table.
In the method provided by the embodiment of the present invention, setting a check table for the data table ensures the completeness of the data and increases its reliability. The primary key of the check table is determined from the identification field of the data table, and this primary key can be used to determine whether the data to be processed is duplicated in the data table, which improves the efficiency of checking for duplicate data and ultimately improves data processing efficiency.
On the basis of fig. 3, an embodiment of the present invention further provides a specific implementation for determining whether the data to be processed exists in the data table. Please refer to fig. 4, which shows a flowchart of another data processing method provided in the embodiment of the present invention; step S110 includes the following sub-steps:
Sub-step S1101, perform a primary key check on the identifier of the data to be processed by using the check table.
In this embodiment, because the primary key of the check table is related to the identification field of the data table, and that identification field uniquely represents each record, the check table can perform a primary key check on the identifier of the data to be processed, and the result of that check determines whether the data to be processed is duplicate data in the data table.
Sub-step S1102, if the identifier of the data to be processed passes the primary key check, determine that the data to be processed does not exist in the data table, and store its identifier in the check table.
Sub-step S1103, if the identifier of the data to be processed fails the primary key check, determine that the data to be processed exists in the data table.
In this embodiment, if the identifier of the data to be processed fails the primary key check, it is not stored in the check table.
In the method provided by the embodiment of the present invention, a primary key check on the identifier of the data to be processed determines whether the data is a duplicate, which makes the judgment more efficient.
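Sub-steps S1101 to S1103, followed by step S120, can be sketched as below, again with `sqlite3` as an assumed stand-in database. The function names `passes_primary_key_check` and `handle` and the `upper()` "processing" are hypothetical placeholders. The check is implemented by simply attempting the insert and letting the primary key constraint decide, so a passing identifier is stored in the check table in the same operation (S1102) and a failing one is not stored at all (S1103).

```python
import sqlite3

def passes_primary_key_check(conn, identifier):
    """S1101-S1103: try to store the identifier in the check table.

    True: the key was new, so the data is not in the data table (S1102) and the
    identifier is now stored. False: the key already existed (S1103) and nothing
    is stored.
    """
    try:
        conn.execute("INSERT INTO check_table (a) VALUES (?)", (identifier,))
        return True
    except sqlite3.IntegrityError:
        return False

def handle(conn, identifier, payload):
    """S110 + S120: process and store the data only if it is not a duplicate."""
    if not passes_primary_key_check(conn, identifier):
        return False  # duplicate: skip processing
    processed = payload.upper()  # placeholder for the real processing step
    # The identifier is written into the data table's identification field.
    conn.execute("INSERT INTO data_table (a, b) VALUES (?, ?)", (identifier, processed))
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_table (a TEXT, b TEXT)")
conn.execute("CREATE TABLE check_table (a TEXT PRIMARY KEY)")
print(handle(conn, "id-1", "hello"))  # True
print(handle(conn, "id-1", "hello"))  # False
```

In a production system the check-table insert and the data-table insert would normally share one transaction, so that a failure between them cannot leave an identifier in the check table without its data; the sketch leaves that out for brevity.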
In this embodiment, when the system starts, the check table has not yet been created. To allow subsequent duplicate checks against the check table, the check table must first be created and a primary key set for it. Therefore, an embodiment of the present invention further provides another data processing method; please refer to fig. 5, which shows a flowchart of another data processing method provided in the embodiment of the present invention. The method further includes:
Step S130, if no check table related to the data table exists, create the check table and determine its primary key according to the identification field of the data table.
In the method provided by the embodiment of the present invention, the primary key is set at the same time the check table is created, which minimizes the impact of creating the check table on subsequent data processing efficiency.
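Step S130 can be sketched as a helper that looks the check table up in SQLite's `sqlite_master` catalog and, if it is missing, creates it with the primary key declared in the same statement. SQLite, the helper name `ensure_check_table`, and the table and field names are assumptions for illustration; the sketch also assumes the primary key is the data table's identification field alone, which is only one of the options the description allows.

```python
import sqlite3

def ensure_check_table(conn, check_table, id_field):
    """S130: create the check table with its primary key if it does not exist yet.

    Returns True if the table was created, False if it already existed.
    """
    # Table and column names cannot be bound as SQL parameters, so validate them.
    if not (check_table.isidentifier() and id_field.isidentifier()):
        raise ValueError("invalid table or field name")
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (check_table,),
    ).fetchone()
    if exists:
        return False
    # Declare the primary key at creation time, mirroring the identification field.
    conn.execute(f"CREATE TABLE {check_table} ({id_field} TEXT PRIMARY KEY)")
    return True

conn = sqlite3.connect(":memory:")
print(ensure_check_table(conn, "check_table", "a"))  # True
print(ensure_check_table(conn, "check_table", "a"))  # False
```

Declaring the key in `CREATE TABLE` avoids a later schema change (and, in many databases, an index build over existing rows), which is one plausible reading of why setting it at creation time limits the impact on later processing.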
In this embodiment, to record the position at which the consumer fetches data from the message middleware, the consumer usually sets a second offset in the message middleware to indicate the position of the data currently to be processed; after the consumer finishes processing data, it usually sets a first offset to indicate the position of the data just processed. Under normal conditions the first offset and the second offset are synchronized or nearly synchronized. However, when the message middleware encounters an exception, for example the consumer fails to write the second offset, fails to process the data, or fails to write the first offset, the data must be rolled back: the stored data is restored to a previous state, and the first offset and the second offset must also be rolled back to the previous state. To avoid missing data during recovery, an embodiment of the present invention further provides a specific implementation of exception recovery; please refer to fig. 6, which shows a flowchart of another data processing method provided by the embodiment of the present invention. The method includes:
Step S200, when it is detected that the message middleware is performing exception recovery, use the smaller of the first offset and the second offset as the start position.
In this embodiment, the first offset and the second offset both increase as data is processed, so the smaller of the two is used as the start position for recovery. The start position corresponds to earlier data, and starting recovery from it (that is, from the earlier data) avoids omitting any data. For example, if the first offset is 1000 and the second offset is 998, the start position is 998, and recovery begins with the data at position 998.
Step S210, starting from the data at the start position in the message middleware, update the data table and the check table to recover from the data exception.
In this embodiment, recovering data from the start position is similar to steps S100 to S120 above: data to be recovered is fetched in order from the start position, and the check table is used to determine whether it already exists in the data table. If it does, it is skipped and recovery continues with the next item; otherwise, it is processed and stored in the data table.
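Steps S200 and S210 can be sketched with an in-memory list standing in for the middleware's message log and a Python `set` standing in for the check table's primary key index. The function name `recover` and the `id<N>` identifier format are hypothetical. With the offsets from the example above (first offset 1000, second offset 998), replay starts at 998, and records whose identifiers are already in the check table are skipped rather than processed again.

```python
def recover(log, first_offset, second_offset, check_table, data_table):
    """S200 + S210: replay the log from the smaller offset, skipping duplicates.

    log: list of (identifier, payload) tuples, indexed by offset.
    check_table: set of identifiers already stored (stand-in for the primary key).
    data_table: list that receives newly processed records.
    Returns the start position and the identifiers processed during recovery.
    """
    start = min(first_offset, second_offset)  # the earlier position avoids gaps
    processed = []
    for identifier, payload in log[start:]:
        if identifier in check_table:
            continue  # already in the data table: ignore it and move on
        check_table.add(identifier)
        data_table.append((identifier, payload))
        processed.append(identifier)
    return start, processed

log = [(f"id{i}", f"payload{i}") for i in range(1002)]
check = {f"id{i}" for i in range(1000)}  # id0..id999 were stored before the fault
start, processed = recover(log, first_offset=1000, second_offset=998,
                           check_table=check, data_table=[])
print(start, processed)  # 998 ['id1000', 'id1001']
```

Starting too early only costs a few cheap duplicate lookups, whereas starting too late would silently lose records, which is why the smaller offset is the safe choice.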
In this embodiment, so that the first offset and the second offset always point to the correct positions, an embodiment of the present invention further provides a specific implementation for updating them; please refer to fig. 7, which shows a flowchart of another data processing method provided in the embodiment of the present invention. The method includes step S101 and step S121.
Step S101, control the message middleware to update the second offset.
In this embodiment, the second offset may be updated after the first server 10 obtains the data to be processed through the message middleware: the first server 10 controls the second server 20 to update the second offset stored on the second server 20, so that the second offset points to the next data to be fetched.
Step S121, store the data to be processed in the data table and update the first offset.
In this embodiment, the first offset may be updated after the data to be processed has been processed. In application scenarios where the processed data must be stored, it is stored in the data table after processing, and once it has been stored successfully, the first offset is updated to point to the data just processed.
Note that during data recovery, the second offset and the first offset may not be updated in step S101 and step S121, so that both point to the correct positions after data recovery.
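The ordering of steps S101 and S121 can be sketched with a small class that keeps both offsets. The class name and the convention that each offset is a count (the second offset counts records fetched, the first counts records stored) are assumptions made for illustration, not the patent's exact definition. If the consumer fails after fetching a record but before storing it, the first offset lags behind the second, and taking the smaller of the two, as in step S200, points back at the record that was never stored. Other failures named in the description, such as a failed write of the second offset, can instead leave the second offset behind; taking the minimum covers both cases.

```python
class OffsetTrackingConsumer:
    """Hypothetical consumer that tracks the two offsets described above."""

    def __init__(self, log):
        self.log = log            # stand-in for the middleware's message log
        self.second_offset = 0    # next record to fetch (updated in S101)
        self.first_offset = 0     # records fully stored (updated in S121)
        self.data_table = []

    def fetch(self):
        record = self.log[self.second_offset]
        self.second_offset += 1   # S101: advance as soon as the record is fetched
        return record

    def store(self, record):
        self.data_table.append(record)
        self.first_offset += 1    # S121: advance only after a successful store

consumer = OffsetTrackingConsumer(["r0", "r1", "r2"])
consumer.store(consumer.fetch())
consumer.fetch()                  # simulated failure: r1 fetched but never stored
print(consumer.first_offset, consumer.second_offset)       # 1 2
print(min(consumer.first_offset, consumer.second_offset))  # 1 -> replay from r1
```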
In this embodiment, to prevent an excessive amount of data in the check table from slowing down duplicate checks, the data in the check table needs to be cleared periodically so that it holds only a preset amount of data. Therefore, an embodiment of the present invention further provides a method for clearing data from the check table; please refer to fig. 8, which shows a flowchart of another data processing method provided by the embodiment of the present invention. The method includes the following steps:
Step S300, parse the generation time of each data record in the check table from its identification field.
In this embodiment, the check table contains multiple data records. The value of each record's identification field is that record's identifier, which is determined from the record's generation time; for example, the identifier may consist of the generation time and a serial number. Where there are multiple types of services, each service type can also be assigned a unique code, in which case the identifier may consist of the generation time, the service name, the service-type code, and a serial number. For example, each service type is given a unique 2-digit code, such as person: 01 and face: 02. Each instance of a service type is also numbered by its pod with a 2-digit number, for example person-pod-0 is numbered 00 and person-pod-1 is numbered 01, so that instance numbers are unique within a service type. The 4-digit serial number space, 10000 values in total, is partitioned among the instances according to their pod numbers: if there are N instances of the same service type, each instance receives 10000/N serial numbers, and any remainder is assigned to the last instance. The overall identifier generation rule is: time (e.g. 20201024000000) + type code (e.g. 01) + service number (e.g. 01) + 4-digit serial number. The identifier generation rule provided by this embodiment avoids primary key collisions as far as possible and prevents data loss.
In this embodiment, the producer generates an identifier for each piece of data according to the identifier generation rule and sends the identifier and the data to the message middleware for temporary storage; the consumer fetches the data and its identifier from the message middleware, stores the identifier in the check table, and stores the data together with its identifier in the data table.
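The identifier rule can be sketched as follows. The function names are hypothetical, and the sketch assumes the time part is formatted as `YYYYMMDDhhmmss` (as the example value 20201024000000 suggests), followed by a 2-digit type code, a 2-digit service number and a 4-digit serial number. It also shows the serial-range split (10000/N per instance, remainder to the last instance) and how step S300 can parse the generation time back out of an identifier. The sample call uses a record generated at 05:04:02 on 24 October 2020 by instance 00 of service type 01 with serial number 0000.

```python
from datetime import datetime

TOTAL_SERIALS = 10_000  # 4-digit serial numbers: 0000-9999

def serial_range(instance_no, instance_count):
    """Split the serial space evenly; the last instance also takes the remainder."""
    share = TOTAL_SERIALS // instance_count
    start = instance_no * share
    if instance_no == instance_count - 1:
        end = TOTAL_SERIALS - 1
    else:
        end = start + share - 1
    return start, end

def make_identifier(generated_at, type_code, service_no, serial):
    """time (14 digits) + type code (2) + service number (2) + serial (4)."""
    return (generated_at.strftime("%Y%m%d%H%M%S")
            + f"{type_code:02d}{service_no:02d}{serial:04d}")

def generation_time(identifier):
    """Step S300: recover the generation time from the first 14 digits."""
    return datetime.strptime(identifier[:14], "%Y%m%d%H%M%S")

ident = make_identifier(datetime(2020, 10, 24, 5, 4, 2), 1, 0, 0)
print(ident)                   # 2020102405040201000000
print(serial_range(3, 4))      # (7500, 9999)
print(generation_time(ident))  # 2020-10-24 05:04:02
```

Putting the time first means identifiers sort chronologically as plain strings, which makes time-based clean-up of the check table straightforward.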
In step S310, the data record in the check table whose generation time is within the preset time period is cleared.
In this embodiment, a preset time period may be preset, only the data in the preset time period is stored in the check table, and the data in the check table may be cleared according to a preset cycle, so that only the data in the preset time period is stored in the check table. For example, the preset period is one day, the preset time period is the last 3 days, and when each day is 3 am, the generation time of each data record is analyzed from the identification field of each data record in the check table, all data before 2 days are deleted, and it is ensured that only the data of the last 3 days are stored in the check table.
By regularly deleting data from the check table (sub-table), the method of this embodiment keeps its data volume relatively small, which speeds up the checking of duplicate data.
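The daily purge can be sketched as follows, assuming identifiers follow the 22-digit layout described above (generation time in the first 14 digits) and using an in-memory set as a stand-in for the check table. `generation_time` and `purge_check_table` are hypothetical names, and counting retention in calendar days is an assumption chosen to match the "last 3 days, run at 3 a.m." example.

```python
from datetime import datetime, timedelta


def generation_time(identifier: str) -> datetime:
    """Parse the generation time from the first 14 digits (yyyyMMddHHmmss)."""
    return datetime.strptime(identifier[:14], "%Y%m%d%H%M%S")


def purge_check_table(check_ids: set[str], now: datetime, keep_days: int = 3) -> set[str]:
    """Return the identifiers that survive the purge.

    Records generated on the last keep_days calendar days (including today) are
    kept; with keep_days=3 a run on day D keeps D-2, D-1 and D and drops the rest.
    """
    cutoff = now.date() - timedelta(days=keep_days - 1)
    return {i for i in check_ids if generation_time(i).date() >= cutoff}
```

Because the generation time is encoded in the key itself, the purge needs no extra timestamp column in the check table.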
To illustrate the data processing scheme of the above embodiments more clearly, an embodiment of the present invention further provides a specific example, using Kafka as the message middleware.
For example, the type code of the vehicle consumer service is 01, there are 4 instances of this service, the preset cycle of the check table is 1 day, and the current time is 05:04:02 on October 24, 2020.
The data processing procedure is as follows:
First, the built-in initialization data is loaded during service initialization, using the vehicle type code 01. Because there are 4 instances, the service numbers run from 00 to 03, and the serial numbers processed by services 00-03 are: 0000-; if the data is handled by pod0 and it is the only piece of data processed in that second, the identifier generated according to the identifier generation rule of the above embodiment is 2020102405040201000000. After the identifier is generated and the corresponding service processing is completed, the data is stored in Kafka.
Second, when the consumer service detects that the second offset in Kafka has been updated, it consumes the data in Kafka, and the second offset on the Kafka side is updated as soon as the data has been read into the service's memory. The consumer then determines whether a check table for the 24th exists and, if it does not, creates it. If a piece of data has the same identifier as an entry in the current check table, the data is discarded; otherwise it is kept and the comparison continues with the remaining data. After the comparison is complete, the data is stored in the data table in a single batch through a copy method, and the first offset must be updated whether or not any new data was written to the check table.
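The consumer step of this example can be sketched in Python as below. This is not the patent's implementation: SQLite stands in for the database, a batch read from Kafka is modeled as a list of `(offset, identifier, payload)` tuples, and `commit_second_offset` is a hypothetical callback in place of the Kafka consumer's offset commit. The table names (`data_table`, `check_yyyyMMdd`, `offsets`) are also assumptions. The primary key check is expressed as an `INSERT` into the check table, which raises `sqlite3.IntegrityError` when the identifier is already present.

```python
import sqlite3
from typing import Callable


def open_db() -> sqlite3.Connection:
    """Create the hypothetical data table and offset store in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data_table (id TEXT, payload TEXT)")
    conn.execute("CREATE TABLE offsets (name TEXT PRIMARY KEY, value INTEGER)")
    conn.execute("INSERT INTO offsets VALUES ('first', 0)")
    conn.commit()
    return conn


def ensure_check_table(conn: sqlite3.Connection, day: str) -> str:
    """Create the per-day check table if missing; its primary key is the identifier."""
    if not (len(day) == 8 and day.isdigit()):
        raise ValueError("day must be yyyyMMdd")
    name = f"check_{day}"
    conn.execute(f"CREATE TABLE IF NOT EXISTS {name} (id TEXT PRIMARY KEY)")
    return name


def consume_batch(conn: sqlite3.Connection,
                  batch: list[tuple[int, str, str]],
                  commit_second_offset: Callable[[int], None]) -> int:
    """Deduplicate one batch, bulk-store the new rows, and advance the first offset."""
    if not batch:
        return 0
    next_offset = batch[-1][0] + 1
    # The second offset is committed as soon as the batch is held in memory.
    commit_second_offset(next_offset)
    fresh = []
    with conn:  # commits on success, rolls back if an unexpected error escapes
        for _offset, ident, payload in batch:
            table = ensure_check_table(conn, ident[:8])
            try:
                # Primary key check: a duplicate identifier violates the constraint.
                conn.execute(f"INSERT INTO {table} (id) VALUES (?)", (ident,))
            except sqlite3.IntegrityError:
                continue  # already seen: discard the duplicate
            fresh.append((ident, payload))
        # One bulk write for the new rows (stand-in for a COPY-style load).
        conn.executemany("INSERT INTO data_table VALUES (?, ?)", fresh)
        # The first offset advances whether or not anything new was written.
        conn.execute("UPDATE offsets SET value = ? WHERE name = 'first'", (next_offset,))
    return len(fresh)
```

Keeping the check-table inserts, the bulk write, and the first-offset update in one transaction means a crash leaves either all or none of them applied; the second offset, committed earlier, may then run ahead of the first, which is the gap that the exception recovery step covers.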
The check table is cleared as follows. The check table stores only the identification field of the data table, while the data table stores all fields of the data. At 3 a.m. on October 25, 2020, new check tables for the 25th, 26th, and 27th are created; any check table that already exists is skipped. After the new check tables are created, the expired data in the sub-tables is cleaned. Because the retention period is only 1 day, the expired data is everything dated before the 24th, including all data of the 23rd.
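The 3 a.m. maintenance job can be reduced to a planning function that decides which per-day check tables to create and which have expired. This sketch assumes the check tables are named `check_yyyyMMdd` (a hypothetical naming scheme) and reads "retention period of 1 day" as keeping the previous day's table, as in the example; `table_name` and `plan_maintenance` are illustrative names only.

```python
from datetime import date, timedelta


def table_name(day: date) -> str:
    """Hypothetical naming scheme for the per-day check tables."""
    return f"check_{day:%Y%m%d}"


def plan_maintenance(today: date, existing: set[str], days_ahead: int = 3,
                     retention_days: int = 1) -> tuple[list[str], list[str]]:
    """Return (tables to create, tables whose data has expired).

    Tables for today and the next days_ahead - 1 days are created unless they
    already exist; tables dated before today - retention_days have expired.
    """
    wanted = [table_name(today + timedelta(days=d)) for d in range(days_ahead)]
    to_create = [t for t in wanted if t not in existing]
    cutoff = table_name(today - timedelta(days=retention_days))
    # Fixed-width yyyyMMdd names sort chronologically as plain strings.
    to_purge = sorted(t for t in existing if t.startswith("check_") and t < cutoff)
    return to_create, to_purge
```

Creating tables a few days ahead means the consumer rarely has to create one on the hot path, while the plan itself stays a pure function that is easy to test.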
To carry out the steps of the data processing method in the above embodiments and in their possible implementations, an implementation of the data processing apparatus 100 is given below. Referring to fig. 9, fig. 9 is a block diagram of a data processing apparatus 100 according to an embodiment of the invention. Note that the basic principle and technical effects of the data processing apparatus 100 provided in this embodiment are the same as those of the above embodiments; for brevity, anything not mentioned in this embodiment can be found in the corresponding content of the above embodiments.
The data processing apparatus 100 includes an obtaining module 110, a determining module 120, a processing module 130, a recovering module 140, an updating module 150, and a clearing module 160.
The obtaining module 110 is configured to obtain data to be processed through message middleware, where the data to be processed includes an identifier for uniquely characterizing the data to be processed.
The determining module 120 is configured to, if a check table related to the data table exists, use the check table to determine whether the data to be processed exists in the data table, where the data table includes an identification field and the primary key of the check table is related to the identification field of the data table.
As a specific implementation, the determining module 120 is configured to: perform a primary key check on the identifier of the data to be processed by using the check table; if the identifier passes the primary key check, determine that the data to be processed does not exist in the data table and store its identifier in the check table; and if the identifier fails the primary key check, determine that the data to be processed exists in the data table.
As a specific implementation, the determining module 120 is further configured to: if no check table related to the data table exists, create a check table and determine its primary key according to the identification field of the data table.
The processing module 130 is configured to process the data to be processed if the data to be processed does not exist in the data table.
The recovery module 140 is configured to: when it is detected that the message middleware is performing exception recovery, use the smaller of the first offset and the second offset as the starting position; and update the data table and the check table from the data at the starting position in the message middleware, so as to recover from the data exception.
The updating module 150 is configured to control the message middleware to update the second offset.
As a specific implementation, the updating module 150 is further configured to store the data to be processed in the data table and update the first offset.
The clearing module 160 is configured to: parse the generation time of each data record from its identification field in the check table; and clear the data records in the check table whose generation time is within the preset time period.
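The recovery module's behavior can be sketched as replaying the message log from the smaller of the two offsets. In this hypothetical model, `log[i]` is the `(identifier, payload)` message at offset `i`, and a set and a list stand in for the check table and the data table; `recover` is an illustrative name, not from the patent. Because every replayed message goes through the same identifier check, messages already stored before the failure are skipped rather than duplicated.

```python
def recover(log: list[tuple[str, str]], first_offset: int, second_offset: int,
            check_ids: set[str], data_rows: list[tuple[str, str]]) -> int:
    """Replay the log from min(first_offset, second_offset); return the new first offset.

    check_ids and data_rows stand in for the check table and the data table and
    are updated in place.
    """
    start = min(first_offset, second_offset)
    for ident, payload in log[start:]:
        if ident in check_ids:  # stored before the failure: skip, no duplicate
            continue
        check_ids.add(ident)
        data_rows.append((ident, payload))
    return len(log)  # every message in the log has now been applied
```

Starting from the smaller offset may re-read some messages, but the check table makes the replay idempotent, so the design trades a little redundant reading for not losing any message.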
An embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the data processing method described above.
In summary, embodiments of the present invention provide a data processing method, an apparatus, a server, and a storage medium, applied to a first server that is communicatively connected to a second server running message middleware and that includes a data table. The method includes: obtaining data to be processed through the message middleware, where the data to be processed includes an identifier that uniquely characterizes it; if a check table related to the data table exists, using the check table to determine whether the data to be processed already exists in the data table, where the data table includes an identification field and the primary key of the check table is related to that field; and, if the data to be processed does not exist in the data table, processing it. Compared with the prior art, the embodiments of the invention determine the primary key of the check table according to the identification field of the data table and use the check table to determine whether the data to be processed duplicates data already in the data table, which improves the efficiency of duplicate checking and, ultimately, of data processing.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A data processing method, characterized in that it is applied to a first server, the first server is communicatively connected with a second server running message middleware, the first server comprises a data table, and the method comprises:
obtaining data to be processed through the message middleware, wherein the data to be processed includes an identifier for uniquely characterizing the data to be processed;
if there is a check table related to the data table, using the check table to determine whether the data to be processed exists in the data table, wherein the data table includes an identification field, and the primary key of the check table is related to the identification field of the data table;
if the data to be processed does not exist in the data table, processing the data to be processed.

2. The data processing method according to claim 1, wherein the step of determining, by using the check table, whether the data to be processed exists in the data table comprises:
performing a primary key check on the identifier of the data to be processed by using the check table;
if the identifier of the data to be processed passes the primary key check, determining that the data to be processed does not exist in the data table, and storing the identifier of the data to be processed in the check table;
if the identifier of the data to be processed fails the primary key check, determining that the data to be processed exists in the data table.

3. The data processing method according to claim 1, wherein the method further comprises:
if there is no check table related to the data table, creating a check table, and determining the primary key of the check table according to the identification field of the data table.

4. The data processing method according to claim 1, wherein the first server stores a first offset, the first offset being used to represent the position of the currently processed data, and the second server stores a second offset, the second offset being used to represent the position of the current data to be processed, and the method further comprises:
when it is detected that the message middleware performs exception recovery, using the smaller of the first offset and the second offset as the starting position;
updating the data table and the check table from the data at the starting position of the message middleware, so as to perform data exception recovery.

5. The data processing method according to claim 4, wherein after the step of obtaining the data to be processed through the message middleware, the method further comprises:
controlling the message middleware to update the second offset.

6. The data processing method according to claim 4, wherein after the step of processing the data to be processed, the method further comprises:
storing the data to be processed in the data table, and updating the first offset.

7. The data processing method according to claim 1, wherein the primary key of the check table is the identification field of the data table, the check table comprises a plurality of data records, and the value of the identification field of each data record is determined according to the generation time of that data record, and the method further comprises:
parsing the generation time of each data record from the identification field of that data record in the check table;
clearing the data records in the check table whose generation time is within a preset time period.

8. A data processing apparatus, characterized in that it is applied to a first server, the first server is communicatively connected to a second server running message middleware, the first server comprises a data table, and the apparatus comprises:
an obtaining module, configured to obtain data to be processed through the message middleware, wherein the data to be processed includes an identifier for uniquely characterizing the data to be processed;
a determining module, configured to, if there is a check table related to the data table, use the check table to determine whether the data to be processed exists in the data table, wherein the data table includes an identification field, and the primary key of the check table is related to the identification field of the data table;
a processing module, configured to process the data to be processed if the data to be processed does not exist in the data table.

9. A server comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the data processing method according to any one of claims 1-7.

10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the data processing method according to any one of claims 1-7 is implemented.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011455610.2A CN112463411A (en) 2020-12-10 2020-12-10 Data processing method, device, server and storage medium

Publications (1)

Publication Number Publication Date
CN112463411A (en) 2021-03-09

Family

ID=74800755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011455610.2A Pending CN112463411A (en) 2020-12-10 2020-12-10 Data processing method, device, server and storage medium

Country Status (1)

Country Link
CN (1) CN112463411A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130073593A1 (en) * 2003-01-15 2013-03-21 Luke Martin Leonard Porter Bitemporal Relational Databases and Methods of Manufacturing and Use
CN104754036A (en) * 2015-03-06 2015-07-01 合一信息技术(北京)有限公司 Message processing system and processing method based on kafka
CN105812405A (en) * 2014-12-29 2016-07-27 阿里巴巴集团控股有限公司 Method, device and system for processing messages
CN106776811A (en) * 2016-11-23 2017-05-31 李天� data index method and device
CN107633096A (en) * 2017-10-13 2018-01-26 四川长虹电器股份有限公司 Data write duplicate removal treatment method in real time
CN109101528A (en) * 2018-06-21 2018-12-28 深圳市买买提信息科技有限公司 Data processing method, data processing equipment and electronic equipment
CN111626865A (en) * 2020-05-22 2020-09-04 泰康保险集团股份有限公司 Data processing method and device, electronic equipment and storage medium
CN112000489A (en) * 2020-07-29 2020-11-27 新华三大数据技术有限公司 Kafka data processing method and server

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhihu user DI4RHU: "微服务架构之幂等性问题及设计思想" (Idempotency issues and design ideas in microservice architecture), pages 1, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/74046140?utm_id=0> *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113608896A (en) * 2021-08-12 2021-11-05 重庆紫光华山智安科技有限公司 Method, system, medium and terminal for dynamically switching data stream
CN113608896B (en) * 2021-08-12 2023-09-08 重庆紫光华山智安科技有限公司 Method, system, medium and terminal for dynamically switching data streams
CN114564253A (en) * 2022-03-02 2022-05-31 重庆紫光华山智安科技有限公司 Task creation method, system, electronic device and readable storage medium
CN114564253B (en) * 2022-03-02 2023-06-09 重庆紫光华山智安科技有限公司 Task creation method, system, electronic device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210309