CN116383295A - Data processing method, device, electronic equipment and computer readable storage medium - Google Patents
- Publication number
- CN116383295A (application number CN202310658405.3A)
- Authority
- CN
- China
- Prior art keywords
- task
- etl
- target
- etl task
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/302—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/448—Execution paradigms, e.g. implementations of programming paradigms
- G06F9/4482—Procedural
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a data processing method, an apparatus, an electronic device, and a computer readable storage medium, and relates to the technical field of computers. The method comprises: acquiring an ETL task attribute, wherein the ETL task attribute comprises a correspondence between an ETL task and data source information; detecting, based on the ETL task attribute, whether a target ETL task to be executed currently exists; if a target ETL task is detected to currently exist, monitoring a service port of each task execution node to obtain survival information of each task execution node; determining, based on the survival information, a target node for executing the target ETL task, wherein the target node is a task execution node in a surviving state; and controlling the target node to execute the target ETL task based on the data source information corresponding to the target ETL task, to obtain target data. The method and the apparatus can improve the timeliness and stability of ETL task execution.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method, an apparatus, an electronic device, and a computer readable storage medium.
Background
Extract-Transform-Load (ETL) is the process of extracting data from distributed, heterogeneous data sources, such as relational data and flat data files, into a temporary middle layer, then cleaning, transforming, and integrating the data, and finally loading it into a data warehouse or data mart, where it becomes the foundation for online analytical processing and data mining. Through ETL, data of different sources, formats, and characteristics can be logically or physically consolidated, providing comprehensive data sharing for enterprises.
Many ETL tools are available today, such as the open source tool DataX and Informatica. However, when these tools face massive amounts of data, the timeliness and stability of data processing still need improvement.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, and provides a data processing method, a device, electronic equipment and a computer readable storage medium, which can improve the timeliness and stability of ETL task execution.
In order to solve the above technical problems, the present invention provides a data processing method applied to an electronic device. The electronic device is communicatively connected to task execution nodes, and the task execution nodes are configured to execute extract, transform, and load (ETL) tasks. The method includes: acquiring an ETL task attribute, wherein the ETL task attribute comprises a correspondence between an ETL task and data source information; detecting, based on the ETL task attribute, whether a target ETL task to be executed currently exists; if a target ETL task is detected to currently exist, monitoring a service port of each task execution node to obtain survival information of each task execution node; determining, based on the survival information, a target node for executing the target ETL task, wherein the target node is a task execution node in a surviving state; and controlling the target node to execute the target ETL task based on the data source information corresponding to the target ETL task, to obtain target data.
In addition, because the allocated target node is a task execution node in a surviving state, the success rate and stability of ETL task execution are improved, a highly available cluster node service is provided for ETL task execution, and the requirements of massive data scenarios are met.
In some embodiments, the electronic device is further communicatively connected to a server where the service database is located, and the acquiring the ETL task attribute includes: responding to a configuration request of the ETL task attribute, and acquiring data source information from the service database; displaying the data source information on a human-computer interaction interface for ETL task configuration; and acquiring the configured ETL task attribute through the man-machine interaction interface.
According to the embodiments of the present application, an interface for ETL task attribute configuration can be provided. The interface automatically pulls the data source information, so that a user can conveniently configure an ETL task for the data source to be accessed.
In some embodiments, the human-machine interaction interface further comprises any one or a combination of the following: a setting field for the ETL task update logic, a setting field for the task execution node of the ETL task, and a setting field for the timed trigger condition of the ETL task.
In some embodiments, the ETL task attributes further include preset node information; the determining, based on the survival information, a target node for performing the target ETL task includes: if the task execution node identified by the preset node information is in a non-surviving state, selecting a surviving task execution node from the task execution nodes as the target node; and if the task execution node identified by the preset node information is in a surviving state, taking the preset node as the target node.
With this technical solution, the electronic device can preferentially allocate the execution node set by the user to execute the ETL task.
In some embodiments, after the controlling the target node to execute the target ETL task based on the data source information corresponding to the target ETL task to obtain the target data, the method further includes: and controlling the target node to store the execution log of the target data in a log database.
With this technical solution, compared with the Kettle tool storing logs in a cache, the logs are persisted to the log database, so that a user can conveniently trace the ETL task execution process based on the logs.
In some embodiments, the ETL task attribute further includes a timing trigger condition of the ETL task, and the detecting whether the target ETL task to be executed currently exists based on the ETL task attribute includes: determining an ETL task which is triggered currently based on the timing triggering condition of the ETL task; and taking the current triggered ETL task as the target ETL task.
By adopting the technical scheme, the timing execution of the ETL task can be realized.
In some embodiments, determining the currently triggered ETL task based on the timed trigger condition of the ETL task includes: determining the timing time of a preset external timer based on the timing trigger condition of the ETL task, wherein the preset external timer is a time trigger component except a Kettle native timer; determining an ETL task which is triggered currently through the preset external timer; the controlling the target node to execute the target ETL task based on the data source information corresponding to the target ETL task includes: and controlling the target node to execute the target ETL task through a Kettle task automation engine.
With this technical solution, the external timer triggers the Kettle task on schedule by calling the application program interface of the Kettle component, which ensures the execution stability of the ETL task.
An embodiment of the present application further provides a data processing apparatus applied to an electronic device. The electronic device is communicatively connected to each task execution node, and the task execution nodes are configured to execute extract, transform, and load (ETL) tasks. The apparatus includes: an acquisition module, configured to acquire ETL task attributes, wherein the ETL task attributes comprise a correspondence between ETL tasks and data source information; a detection module, configured to detect, based on the ETL task attributes, whether a target ETL task to be executed currently exists; a monitoring module, configured to monitor the service ports of the task execution nodes if a target ETL task currently exists, to obtain survival information of the task execution nodes; and a control module, configured to determine, based on the survival information, a target node for executing the target ETL task, wherein the target node is a task execution node in a surviving state, and to control the target node to execute the target ETL task based on the data source information corresponding to the target ETL task, to obtain target data.
An embodiment of the present application further provides an electronic device, where the electronic device includes a processor and a memory, where the memory is configured to store instructions, and the processor is configured to invoke the instructions in the memory, so that the electronic device executes the data processing method described above.
An embodiment of the present application further provides a computer readable storage medium storing computer instructions that, when executed on an electronic device, cause the electronic device to perform the data processing method described above.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings which are required in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating steps of a data processing method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a human-computer interaction interface according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a Kettle-based data processing system according to one embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present application. The data processing system may comprise an electronic device 101, with data processing means integrated in the electronic device 101.
The data processing system may further include a task execution cluster 102. The task execution cluster 102 includes task execution nodes, which may be used to execute ETL tasks. The task execution cluster 102 may be built based on Kettle, or may be implemented based on other ETL tools for performing data extraction, transformation, and loading.
Kettle is a cross-platform ETL tool written in Java that provides a graphical interface; a user can build the ETL logic for data by dragging components onto the canvas of the client. Kettle is a free, open source ETL tool, so using it can reduce data processing costs.
In the case where the task execution cluster 102 is built based on Kettle, the task execution cluster 102 may also be referred to as a Kettle cluster, and the task execution nodes in the task execution cluster 102 are communicatively connected to the electronic device 101.
The electronic device 101 of the embodiment of the present application may be configured to obtain an ETL task attribute, where the ETL task attribute includes a correspondence between each ETL task and data source information; detecting whether a target ETL task to be executed exists currently or not; if the current existence of the target ETL task is detected, monitoring a service port of each task execution node to obtain survival information of each task execution node; determining a target node for executing the target ETL task based on the survival information, wherein the target node is a task execution node in a survival state; and controlling the target node to execute the target ETL task based on the data source information corresponding to the target ETL task attribute to obtain target data.
In addition, because the allocated target node is a task execution node in a surviving state, the success rate and stability of ETL task execution are improved, a highly available cluster node service is provided for ETL task execution, and the requirements of massive data scenarios are met.
The electronic device 101 is a device capable of automatically performing numerical calculations and/or information processing in accordance with instructions set or stored in advance. Its hardware includes, but is not limited to, a processor, a microprogrammed control unit (MCU), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The electronic device may be a portable electronic device (e.g., a cell phone or tablet computer), a personal computer, a server, etc.
It will be appreciated that the application environment shown in FIG. 1 is only one application scenario of the embodiments of the present application and does not limit the application scenarios of the present application. Other application environments may include more or fewer electronic devices than shown in FIG. 1. For example, only one electronic device is shown in FIG. 1, but the data processing system may also include one or more other services, which is not limited herein.
In addition, the data processing system may further include a server where a service database is located, where the service database stores various types of service data, for example, production data of various parts of a factory, human resource data, and the like.
The electronic device 101 may be communicatively coupled to the server where the service database resides, and the electronic device 101 may have access to the service database. For example, the electronic device may connect to the service database through Java Database Connectivity (JDBC), or through other database connection technologies, which is not limited in the embodiments of the present application.
It should be noted that, the schematic view of the scenario of the data processing system shown in fig. 1 is only an example, and the data processing system and scenario described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided in the embodiments of the present application, and those skilled in the art can know that, with the evolution of the data processing system and the appearance of a new service scenario, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
FIG. 2 is a flowchart illustrating steps of an embodiment of a data processing method according to the present application. The order of the steps in the flow diagrams may be changed, and some steps may be omitted, according to different needs.
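Referring to FIG. 2, the data processing method may include the following steps: step 201, acquiring an ETL task attribute; step 202, detecting, based on the ETL task attribute, whether a target ETL task to be executed currently exists; step 203, if a target ETL task is detected to currently exist, monitoring a service port of each task execution node to obtain survival information of each task execution node; step 204, determining, based on the survival information, a target node for executing the target ETL task; and step 205, controlling the target node to execute the target ETL task based on the data source information corresponding to the target ETL task, to obtain target data.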
The ETL task attribute characterizes the execution configuration information of the ETL task. The ETL task attribute may include a correspondence between the ETL task and the data source information. The data source information may be a data identifier such as a service database name, a service database schema, a table name, or a field name, which is not limited in the embodiments of the present application. The source data identified by the data source information may be converted into target data by performing the ETL task.
The ETL task attribute may further include update logic of the ETL task, trigger conditions of the ETL task, and information of an execution node of the ETL task, but is not limited to this, and a specific configuration item in the ETL task attribute may be set according to actual requirements.
The update logic of the ETL task may include the update mode of the ETL task, such as full update, incremental insertion, or incremental synchronization, and the update frequency of the ETL task.
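The three update modes named above are not defined further in the text. The following Python sketch shows one plausible reading, applied to in-memory rows. The function name `apply_update`, the mode strings, and the semantics of each mode (replace everything, insert only unseen keys, insert or overwrite by key) are assumptions made for illustration and are not taken from the patent.

```python
def apply_update(target, source, mode, key="id"):
    """Return a new target table (list of dict rows) after one ETL run.

    Assumed semantics (not specified by the source text):
      full               - replace the whole target with the source rows
      incremental_insert - add only source rows whose key is not yet present
      incremental_sync   - add new rows and overwrite rows with matching keys
    """
    if mode not in ("full", "incremental_insert", "incremental_sync"):
        raise ValueError(f"unknown update mode: {mode}")
    if mode == "full":
        return [dict(row) for row in source]
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        if mode == "incremental_insert":
            merged.setdefault(row[key], dict(row))  # keep existing rows untouched
        else:
            merged[row[key]] = dict(row)            # overwrite or insert
    return list(merged.values())


target_rows = [{"id": 1, "qty": 5}]
source_rows = [{"id": 1, "qty": 7}, {"id": 2, "qty": 3}]
print(apply_update(target_rows, source_rows, "incremental_insert"))
# [{'id': 1, 'qty': 5}, {'id': 2, 'qty': 3}]
print(apply_update(target_rows, source_rows, "incremental_sync"))
# [{'id': 1, 'qty': 7}, {'id': 2, 'qty': 3}]
```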
The triggering condition of the ETL task limits when the ETL task is triggered. For example, the triggering condition may be that the ETL task is executed once the number of accumulated updated records in the source data reaches a preset count. As another example, the triggering condition may be a timed triggering condition, such as executing the ETL task once every preset time period or executing the ETL task at a preset time point.
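To make the configuration items above concrete, the following Python sketch models an ETL task attribute as a small record and evaluates the record-count trigger. All names here (`EtlTaskAttribute`, its fields, and `count_trigger_fired`) are hypothetical and chosen only for illustration; the patent does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EtlTaskAttribute:
    """Hypothetical record holding the configuration of one ETL task."""
    task_id: str
    source_database: str                     # data source information
    source_table: str
    update_mode: str = "full"                # update logic of the task
    preset_node: Optional[str] = None        # execution node chosen by the user
    interval_seconds: Optional[int] = None   # timed trigger: run every N seconds
    record_threshold: Optional[int] = None   # count trigger: run after N updates


def count_trigger_fired(task: EtlTaskAttribute, updated_records: int) -> bool:
    """Return True once the accumulated updated records reach the preset count."""
    if task.record_threshold is None:
        return False
    return updated_records >= task.record_threshold


orders_task = EtlTaskAttribute("orders_sync", "erp", "orders", record_threshold=1000)
print(count_trigger_fired(orders_task, 999))   # False
print(count_trigger_fired(orders_task, 1000))  # True
```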
In some embodiments, the electronic device of the embodiments of the present application may further be communicatively connected to the server where the service database is located, and step 201 may include the following. In response to a configuration request for the ETL task attribute, the electronic device acquires data source information from the service database and displays it on a human-computer interaction interface for ETL task configuration, for example in a selection box for data source information provided on the interface, where the user can configure the ETL task. The electronic device then acquires the configured ETL task attribute, such as the selected data source, through the human-computer interaction interface.
The human-computer interaction interface for ETL task configuration further comprises one or a combination of the following: a setting field for the ETL task update logic, a setting field for the task execution node of the ETL task, and a setting field for the timed trigger condition of the ETL task. For example, referring to FIG. 3, FIG. 3 is a schematic diagram of a human-computer interaction interface for ETL task configuration.
Further, after the electronic device obtains the configured ETL task attribute from the human-computer interaction interface, the ETL task attribute may be stored in an ETL task attribute table created in advance, which facilitates maintenance of the ETL task attributes of each ETL task. The electronic device may determine the already created ETL tasks from the ETL task attribute table; that is, it may obtain the ETL task attributes of the already created ETL tasks in step 201.
The interface can display the automatically pulled data source information, so that a user can conveniently configure the ETL according to the data source information to be accessed.
Compared with tools such as Kettle, where the corresponding components must be introduced manually and operation is relatively difficult, the embodiments of the present application package the related components together and present them to the user in the form of setting fields. This further simplifies ETL task configuration, lowers its difficulty, and makes it easier for business personnel to build a data warehouse.
In some embodiments, the electronic device may further provide a registration page for the service database, on which the user may configure registration information of the service database. The electronic device obtains the registration information, such as a user name (user), a password (password), and a connection URL (Uniform Resource Locator) for connecting to the service database, from the registration page, and then registers the database based on this information. The user can then configure ETL tasks for the data to be accessed without registering a connection to the service database each time an ETL task is configured, which makes ETL task configuration simpler.
Specifically, based on the triggering condition of each ETL task, whether a target ETL task to be executed exists currently or not is detected.
In some embodiments, step 202 may include: acquiring a timed trigger condition of an ETL task; determining the currently triggered ETL task based on the timed trigger condition of the ETL task; and taking the currently triggered ETL task as the target ETL task. For example, the timed trigger condition may be to perform the ETL task once per hour, or to perform the ETL task daily from 01:00 to 13:00, but is not limited thereto.
Further, determining the currently triggered ETL task based on the timed trigger condition of the ETL task may include: determining the timing time of a preset external timer based on the timed trigger condition of the ETL task, wherein the preset external timer is a time trigger component other than the Kettle native timer; and determining the currently triggered ETL task through the preset external timer. The preset external timer may be, but is not limited to, Quartz, XXJob, etc.
The Kettle tool is constrained by its C/S architecture: the native Kettle timer can start running, and thus guarantee timed execution of the ETL task, only after the graphical interface corresponding to the Kettle tool has been started on the electronic device. In other words, when the graphical interface is not open on the electronic device, the native Kettle timer cannot run normally. Triggering the ETL task through an external timer therefore ensures the execution stability of the ETL task.
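The idea of detecting the currently triggered tasks outside of the Kettle client can be sketched in a few lines of Python. This is a simplified stand-in for what a component such as Quartz would do, not its API: the names `is_due` and `triggered_tasks`, the interval-only schedule format, and the sample task ids are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def is_due(last_run, interval, now):
    """A task is due if it has never run or its interval has fully elapsed."""
    return last_run is None or now - last_run >= interval


def triggered_tasks(schedule, last_runs, now):
    """Return the ids of tasks whose timed trigger condition is met at `now`.

    schedule:  task id -> timedelta between runs (hypothetical format)
    last_runs: task id -> datetime of the previous run, missing if never run
    """
    return [task_id for task_id, interval in schedule.items()
            if is_due(last_runs.get(task_id), interval, now)]


schedule = {"orders_sync": timedelta(hours=1), "hr_sync": timedelta(days=1)}
last_runs = {"orders_sync": datetime(2023, 6, 1, 8, 0),
             "hr_sync": datetime(2023, 6, 1, 0, 0)}
print(triggered_tasks(schedule, last_runs, datetime(2023, 6, 1, 9, 0)))
# ['orders_sync']
```

Because the check runs in its own process, it does not depend on a graphical client being open, which is the property the external timer is meant to provide.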
The survival information is used to characterize whether the task execution node survives.
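One simple way to obtain survival information from a service port is to attempt a TCP connection and treat success as "alive". The Python sketch below does this with the standard `socket` module. The function names `is_alive` and `survival_info`, the node naming, and the one-second timeout are illustrative assumptions; the patent only states that the service ports are monitored.

```python
import socket


def is_alive(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def survival_info(nodes, timeout=1.0):
    """Map each node name to its liveness.

    nodes: node name -> (host, port) of the node's service port
    """
    return {name: is_alive(host, port, timeout)
            for name, (host, port) in nodes.items()}
```

A successful connection only shows that something is listening on the port. A production check might additionally query a status endpoint of the node, which is outside the scope of this sketch.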
The target node is a task execution node in a survival state.
In some embodiments, where the ETL task attribute further includes preset node information, step 204 may include: if the task execution node identified by the preset node information is in a non-surviving state, selecting a surviving task execution node from the task execution nodes as the target node; and if the task execution node identified by the preset node information is in a surviving state, taking the preset node as the target node.
The electronic device of this embodiment may preferentially allocate the task execution node set by the user (i.e., the preset node) to execute the ETL task.
In other embodiments, step 204 may include: and randomly selecting the task execution node in the survival state from the task execution nodes as a target node.
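The two selection strategies for step 204 (prefer the preset node when it is alive, otherwise pick a surviving node at random) can be combined in one small function. The Python sketch below is illustrative: the name `select_target_node` and the choice to return `None` when no node is alive are assumptions, since the text does not say what happens in that case.

```python
import random


def select_target_node(survival, preset_node=None, rng=random):
    """Choose the node that should execute the target ETL task.

    survival:    node name -> True if the node is alive
    preset_node: node configured by the user, or None
    Returns None when no node is alive (an assumption of this sketch).
    """
    if preset_node is not None and survival.get(preset_node, False):
        return preset_node                      # user's choice wins when alive
    alive = [name for name, ok in survival.items() if ok]
    if not alive:
        return None
    return rng.choice(alive)                    # random surviving node


print(select_target_node({"node_a": True, "node_b": True}, preset_node="node_b"))
# node_b
print(select_target_node({"node_a": True, "node_b": False}, preset_node="node_b"))
# node_a
```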
The table name, field name, data type, etc. of the target data may be set according to the user requirement, which is not limited in the embodiment of the present application.
In some embodiments, where the target node is built by a Kettle tool, step 205 may include: and controlling the target node to execute the target ETL task through a Kettle task automation engine.
In some embodiments, after the target node is controlled to execute the target ETL task based on the data source information corresponding to the target ETL task, the target node may further be instructed to store an execution log of the target data in the log database.
Compared with the Kettle tool storing logs in a cache, this embodiment persists the logs to the log database, so that a user can conveniently trace the ETL task execution process based on the logs.
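As a minimal sketch of persisting execution logs so they can be traced later, the following Python code writes log rows to a SQLite database. SQLite stands in for the log database purely for illustration; the table name `etl_execution_log`, its columns, and the helper names are assumptions and do not come from the patent.

```python
import sqlite3
from datetime import datetime, timezone


def open_log_db(path=":memory:"):
    """Open the log database and create the (hypothetical) log table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_execution_log ("
        "task_id TEXT NOT NULL, node TEXT NOT NULL, status TEXT NOT NULL, "
        "message TEXT, logged_at TEXT NOT NULL)"
    )
    return conn


def write_log(conn, task_id, node, status, message=""):
    """Persist one execution log entry with a UTC timestamp."""
    conn.execute(
        "INSERT INTO etl_execution_log VALUES (?, ?, ?, ?, ?)",
        (task_id, node, status, message, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def task_history(conn, task_id):
    """Return (node, status, message) rows for a task in insertion order."""
    return conn.execute(
        "SELECT node, status, message FROM etl_execution_log "
        "WHERE task_id = ? ORDER BY rowid",
        (task_id,),
    ).fetchall()


log_conn = open_log_db()
write_log(log_conn, "orders_sync", "node_a", "started")
write_log(log_conn, "orders_sync", "node_a", "succeeded", "1200 rows loaded")
print(task_history(log_conn, "orders_sync"))
# [('node_a', 'started', ''), ('node_a', 'succeeded', '1200 rows loaded')]
```

Passing a file path instead of `":memory:"` keeps the entries across restarts, which is the point of persisting logs rather than caching them.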
By way of example, the process of building a system based on a Kettle tool to enable automated execution of ETL tasks is described below in connection with FIG. 4.
1. Build a Java project and introduce the runtime dependency package corresponding to the Kettle version, so that the Kettle tool can be operated through external Java code.
2. Introduce an external timer, such as Quartz or XXJob, into the Java project, and then call the application program interface of the Kettle component through the external timer, so that the external timer triggers the ETL task.
3. Create an ETL management system, develop the corresponding back-end Java logic, and create an ETL task attribute table. The ETL management system provides the user with a drag-and-configure entry, that is, a human-computer interaction interface for ETL task configuration, so that the user can configure ETL according to their needs. This decouples ETL task configuration from research and development capability and simplifies the configuration of ETL tasks.
4. Create a Kettle database repository. The database repository is used to persist the ETL task output results of subsequent configuration; it also facilitates access under the distributed architecture of the Kettle cluster and supports concurrent and parallel shared access to task resources.
5. Build a highly available Kettle cluster service. For example, a multi-node Kettle cluster service can be built through a gate command of the Kettle tool. The Kettle cluster service provides ETL node services externally; that is, the Kettle cluster includes task execution nodes for executing ETL tasks. To further ensure high availability of the cluster, the task execution nodes can be divided into master nodes and slave nodes corresponding to the master nodes. The Kettle cluster service has fixed port resources, and the electronic device can monitor the service ports through an operation and maintenance component or a script to obtain the survival status of the task execution nodes. After a task execution node fails, the service it provided can be removed and a new service re-created, so that the services the task execution nodes provide externally remain highly available.
6. Build the Kettle ETL task automation engine. An automated flow is built from Kettle controls such as table input, table output, variable passing, nested flow components, log monitoring, data flow blocking, embedded JavaScript, SQL execution, and logic judgment components, so that the ETL task attributes in the ETL task attribute table are converted into actual ETL execution logic. Source data can then be extracted, transformed, and loaded within the Kettle automation engine, and the build result of the automation engine is stored in the ETL database repository.
Through the ETL management system, a user can perform ETL custom configuration according to actual requirements, and ETL tasks are executed based on the custom configuration. After the external timer is triggered, the electronic device instructs a target node of the target ETL task to call an ETL execution engine based on Kettle, and the execution engine completes the whole flow of the ETL according to the ETL task attribute of the target ETL task.
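The overall control flow described above (detect triggered tasks, probe node liveness, choose a target node, then hand the task to the execution engine) can be summarized in one function. The Python sketch below is a deliberately simplified illustration: `run_once`, the task dictionary layout, and the injected `triggered`, `probe`, and `execute` callables are all hypothetical, and the fallback picks the first surviving node rather than a random one so that the example is deterministic. No Kettle API is called here; `execute` is a placeholder for whatever invokes the real engine.

```python
def run_once(tasks, nodes, triggered, probe, execute):
    """Run one pass of the control flow and return task id -> result.

    tasks:     list of dicts with "id" and optional "preset_node" (assumed layout)
    nodes:     list of node names
    triggered: callable(task) -> bool, e.g. backed by an external timer
    probe:     callable(node) -> bool, e.g. a service port check
    execute:   callable(node, task) -> result, stand-in for the ETL engine
    """
    results = {}
    for task in tasks:
        if not triggered(task):
            continue                               # not a target task right now
        survival = {node: probe(node) for node in nodes}
        preset = task.get("preset_node")
        if preset is not None and survival.get(preset, False):
            target = preset                        # prefer the user's node
        else:
            alive = [n for n in nodes if survival[n]]
            target = alive[0] if alive else None   # first surviving node
        results[task["id"]] = None if target is None else execute(target, task)
    return results


demo_tasks = [{"id": "orders_sync", "preset_node": "node_b"}, {"id": "hr_sync"}]
demo_result = run_once(
    demo_tasks,
    nodes=["node_a", "node_b"],
    triggered=lambda task: task["id"] == "orders_sync",
    probe=lambda node: node == "node_a",           # node_b is treated as down
    execute=lambda node, task: f"{task['id']} ran on {node}",
)
print(demo_result)  # {'orders_sync': 'orders_sync ran on node_a'}
```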
The installation and deployment environment described above may be as follows; in practical applications it may be set according to requirements, and the embodiments of the present application are not limited thereto.
(1) Java: JDK 1.8 or later.
(2) Database services, such as MySQL or another relational database.
(3) Kettle tool.
(4) Kettle cluster service.
(5) Java project, including the following components: Spring Boot, Quartz or XXL-JOB, and Kettle API components.
(6) Front-end web pages, such as Vue.
(7) Container management platforms, such as a container cluster management platform (k8s).
According to the embodiments of the present application, ETL tasks can be executed automatically through the Kettle tool by combining custom code, an external timer, and encapsulated Kettle controls. In the system built on Kettle, an ETL task configuration entry is built first, that is, a human-computer interaction interface for configuring ETL tasks, and the task attributes of each ETL task are recorded. Then, based on the ETL database resource library function of Kettle, a flow engine that automatically executes ETL tasks is built and combined with an external timer to achieve automated batch synchronization of data-warehouse data.
During ETL task configuration, the user can configure ETL tasks directly on the human-computer interaction interface, which simplifies the configuration operation; by setting up a plurality of ETL task execution nodes and assigning a corresponding task execution node to each ETL task, a plurality of ETL tasks can be executed concurrently, which improves the timeliness and high availability of the ETL tasks; by setting an external timer, the execution stability of the ETL tasks is ensured; and the status log of ETL task execution can be persisted in the log database, so that the execution process of an ETL task can be traced.
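Persisting the execution status log so that a task run can be traced later might look like the following sketch. SQLite stands in for the log database, and the table layout, column names, and status values are assumptions made for illustration only.

```python
import sqlite3
from datetime import datetime, timezone
from typing import List, Tuple


def init_log_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) log table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_task_log ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "task_id TEXT NOT NULL, node TEXT NOT NULL, "
        "status TEXT NOT NULL, message TEXT, logged_at TEXT NOT NULL)"
    )


def log_status(conn: sqlite3.Connection, task_id: str, node: str,
               status: str, message: str = "") -> None:
    """Append one status record for a task execution on a given node."""
    conn.execute(
        "INSERT INTO etl_task_log (task_id, node, status, message, logged_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (task_id, node, status, message, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def task_history(conn: sqlite3.Connection, task_id: str) -> List[Tuple[str, str]]:
    """Return (status, node) records for a task in insertion order."""
    return conn.execute(
        "SELECT status, node FROM etl_task_log WHERE task_id = ? ORDER BY id",
        (task_id,),
    ).fetchall()
```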
Based on the same idea as the data processing method in the above embodiments, the present application also provides a data processing apparatus that can be used to perform the above data processing method. For ease of illustration, the structural schematic diagram of the data processing apparatus embodiment shows only the portions relevant to the embodiments of the present application. Those skilled in the art will appreciate that the illustrated structure does not limit the apparatus, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components.
As shown in Fig. 5, the data processing apparatus includes an acquisition module 501, a detection module 502, a monitoring module 503, and a control module 504. In some embodiments, the above modules may be programmable software instructions that are stored in a memory and can be called and executed by a processor. It will be appreciated that in other embodiments, the modules may also be program instructions or firmware resident in the processor.
The acquisition module 501 is configured to acquire ETL task attributes, where the ETL task attributes include a correspondence between ETL tasks and data source information;
the detection module 502 is configured to detect, based on the ETL task attributes, whether a target ETL task to be executed currently exists;
the monitoring module 503 is configured to monitor the service port of each task execution node if the target ETL task is detected to exist, so as to obtain survival information of each task execution node;
the control module 504 is configured to determine, based on the survival information, a target node for executing the target ETL task, where the target node is a task execution node in a survival state, and to control the target node to execute the target ETL task based on the data source information corresponding to the target ETL task, so as to obtain target data.
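The target-node decision made by the control module can be sketched as a small function over the survival information. It assumes the survival information is a mapping from node name to a boolean, and that when no preset node is usable the first surviving node in name order is chosen; that tie-breaking rule and the name `choose_target_node` are assumptions, since the application only requires that the target node be in a survival state.

```python
from typing import Dict, Optional


def choose_target_node(survival: Dict[str, bool], preset: Optional[str] = None) -> Optional[str]:
    """Pick a surviving node for the target ETL task.

    Prefer the preset node when it is alive; otherwise fall back to any
    surviving node (here: the first one in name order). Return None when
    no node is alive.
    """
    if preset is not None and survival.get(preset, False):
        return preset
    for name in sorted(survival):
        if survival[name]:
            return name
    return None
```

For example, with `{"node-a": True, "node-b": False}` and a preset of `"node-b"`, the function falls back to `"node-a"`.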
Fig. 6 is a schematic diagram of an embodiment of an electronic device of the present application.
The electronic device 100 comprises a memory 20, a processor 30 and a computer program 40 stored in the memory 20 and executable on the processor 30. The steps of the above-described embodiments of the data processing method, such as steps 201 to 205 shown in fig. 2, are implemented when the processor 30 executes the computer program 40.
By way of example, the computer program 40 may be partitioned into one or more modules/units that are stored in the memory 20 and executed by the processor 30. The one or more modules/units may be a series of computer program instruction segments capable of performing particular functions, which describe the execution of the computer program 40 in the electronic device 100. For example, the computer program 40 may be divided into the acquisition module 501, the detection module 502, the monitoring module 503, and the control module 504 shown in Fig. 5.
It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the electronic device 100 and does not limit it. The electronic device 100 may include more or fewer components than shown, combine certain components, or use different components; for example, the electronic device 100 may also include input-output devices, network access devices, buses, and the like.
The processor 30 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or a single-chip microcomputer, or the processor 30 may be any conventional processor or the like.
The memory 20 may be used to store the computer program 40 and/or modules/units, and the processor 30 implements various functions of the electronic device 100 by running or executing the computer program and/or modules/units stored in the memory 20 and by invoking data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and application programs required for at least one function (such as a sound playing function or an image playing function); the data storage area may store data (such as audio data) created through use of the electronic device 100. In addition, the memory 20 may include high-speed random access memory, and may also include nonvolatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another nonvolatile solid-state storage device.
The modules/units integrated in the electronic device 100 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as a stand-alone product. Based on this understanding, all or part of the flow of the methods of the above embodiments may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer readable storage medium and, when executed by a processor, may implement the steps of each method embodiment described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
The data processing method, apparatus, electronic device, and computer readable storage medium provided in the embodiments of the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method and core idea of the present invention. Meanwhile, those skilled in the art may, in accordance with the idea of the present invention, make changes to the specific implementations and the scope of application. In view of the above, the content of this description should not be construed as limiting the present invention.
Claims (10)
1. A data processing method, applied to an electronic device, wherein the electronic device is communicatively connected to task execution nodes, and the task execution nodes are configured to execute data extraction, transformation, and loading (ETL) tasks, the method comprising:
acquiring an ETL task attribute, wherein the ETL task attribute comprises a corresponding relation between an ETL task and data source information;
detecting whether a target ETL task to be executed exists currently or not based on the ETL task attribute;
if the current existence of the target ETL task is detected, monitoring a service port of each task execution node to obtain survival information of each task execution node;
determining a target node for executing the target ETL task based on the survival information, wherein the target node is a task execution node in a survival state;
and controlling the target node to execute the target ETL task based on the data source information corresponding to the target ETL task to obtain target data.
2. The data processing method of claim 1, wherein the electronic device is further communicatively connected to a server where a service database is located, and the obtaining the ETL task attribute includes:
responding to a configuration request of the ETL task attribute, and acquiring data source information from the service database;
displaying the data source information on a human-computer interaction interface for ETL task configuration;
and acquiring the configured ETL task attribute through the man-machine interaction interface.
3. The data processing method of claim 2, wherein the human-machine interaction interface further comprises any one or a combination of the following: setting columns of ETL task update logic, setting columns of task execution nodes of ETL tasks, and setting columns of timing trigger conditions of the ETL tasks.
4. The data processing method according to claim 1, wherein the ETL task attribute further includes preset node information; the determining, based on the survival information, a target node for executing the target ETL task comprises:
if the task execution node identified by the preset node information is not in a survival state, selecting a surviving task execution node from the task execution nodes as the target node;
and if the task execution node identified by the preset node information is in a survival state, taking the preset node as the target node.
5. The data processing method according to claim 1, further comprising, after said controlling said target node to execute said target ETL task based on data source information corresponding to said target ETL task, obtaining target data:
and controlling the target node to store the execution log of the target data in a log database.
6. The data processing method of claim 1, wherein the ETL task attributes further include a timing trigger condition of the ETL task, and the detecting whether there is a target ETL task to be executed currently based on the ETL task attributes includes:
determining an ETL task which is triggered currently based on the timing triggering condition of the ETL task;
and taking the current triggered ETL task as the target ETL task.
7. The data processing method of claim 6, wherein the determining the currently triggered ETL task based on the timed trigger condition of the ETL task comprises:
determining the timing time of a preset external timer based on the timing trigger condition of the ETL task, wherein the preset external timer is a time trigger component except a Kettle native timer;
determining an ETL task which is triggered currently through the preset external timer;
the controlling the target node to execute the target ETL task based on the data source information corresponding to the target ETL task includes:
and controlling the target node to execute the target ETL task through a Kettle task automation engine.
8. A data processing apparatus, applied to an electronic device, the electronic device being communicatively connected to task execution nodes for executing data extraction, transformation, and loading (ETL) tasks, the apparatus comprising:
the acquisition module is used for acquiring ETL task attributes, wherein the ETL task attributes comprise the corresponding relation between the ETL tasks and the data source information;
the detection module is used for detecting whether a target ETL task to be executed exists currently or not based on the ETL task attribute;
the monitoring module is used for monitoring the service ports of the task execution nodes if the target ETL task exists currently, and obtaining survival information of the task execution nodes;
the control module is used for determining a target node for executing the target ETL task based on the survival information, wherein the target node is a task execution node in a survival state, and is used for controlling the target node to execute the target ETL task based on the data source information corresponding to the target ETL task to obtain target data.
9. An electronic device comprising a processor and a memory, wherein the memory is configured to store instructions, the processor configured to invoke the instructions in the memory, to cause the electronic device to perform the data processing method of any of claims 1-7.
10. A computer readable storage medium storing computer instructions which, when run on an electronic device, cause the electronic device to perform the data processing method of any one of claims 1 to 7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310658405.3A CN116383295A (en) | 2023-06-06 | 2023-06-06 | Data processing method, device, electronic equipment and computer readable storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116383295A (en) | 2023-07-04 |
Family
ID=86961928
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310658405.3A Pending CN116383295A (en) | 2023-06-06 | 2023-06-06 | Data processing method, device, electronic equipment and computer readable storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116383295A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120030603A (en) * | 2025-04-24 | 2025-05-23 | 杭州安泉数智科技有限公司 | A distributed database desensitization system, method and device based on kettle |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110333940A (en) * | 2019-06-25 | 2019-10-15 | 深圳前海微众银行股份有限公司 | Condition-based task scheduling method, device, equipment and storage medium |
| CN112199432A (en) * | 2020-10-19 | 2021-01-08 | 天翼电子商务有限公司 | High-performance data ETL device based on distribution and control method |
| CN113761005A (en) * | 2021-07-31 | 2021-12-07 | 浪潮电子信息产业股份有限公司 | Metadata configuration method, device, electronic device and storage medium |
| CN115048205A (en) * | 2022-08-15 | 2022-09-13 | 广州粤芯半导体技术有限公司 | ETL scheduling platform, deployment method thereof and computer-readable storage medium |
| CN115687468A (en) * | 2022-09-09 | 2023-02-03 | 上海镁信健康科技有限公司 | System for processing data in distributed service by ETL process button |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20230704 |