CN118363753A - Data processing method and device, equipment and medium applied to distributed system - Google Patents
- Publication number
- CN118363753A (application number CN202410458523.4A)
- Authority
- CN
- China
- Prior art keywords
- node
- operator
- identification
- nodes
- target communication
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Stored Programmes (AREA)
Abstract
The disclosure provides a data processing method, a device, equipment and a medium applied to a distributed system, relates to the field of computer technology, and in particular to the fields of chip technology, data processing and distributed processing. The implementation scheme is as follows: splitting a first parameter set of the first processing into a plurality of sub-parameter sets; determining, for each node, an operator sequence comprising a first processing operator and a communication operator; starting a plurality of processes respectively corresponding to the plurality of nodes; compiling the plurality of operator sequences with a compiler to obtain a plurality of executable files; determining the identification of a target communication domain formed by the plurality of processes; issuing the identification of the target communication domain, the input data and the corresponding executable file through the process of each node, so that each node can execute the first processing operator based on the input data and execute the communication operator based on the output data of the first processing operator and the identification of the target communication domain; and determining a first processing result based on the output data of the communication operators of the plurality of nodes.
Description
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to the field of chip technology, data processing, and distributed processing technology, and in particular, to a data processing method, apparatus, electronic device, computer readable storage medium, and computer program product applied to a distributed system.
Background
Artificial intelligence is the discipline of making computers mimic certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, planning, etc.); it involves both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
With the development of artificial intelligence technology, more and more applications built on it achieve effects far exceeding those of traditional algorithms. Deep learning is both data-intensive and computation-intensive; to increase the training and inference speed of large-scale deep learning models, a distributed system comprising a plurality of nodes can be used to perform data processing and meet the computational power requirement. The deployment efficiency of large-scale deep learning inference tasks in a distributed system directly affects the speed and convenience of model inference.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, the problems mentioned in this section should not be considered as having been recognized in any prior art unless otherwise indicated.
Disclosure of Invention
The present disclosure provides a data processing method, apparatus, electronic device, computer readable storage medium and computer program product for application to a distributed system.
According to an aspect of the present disclosure, there is provided a data processing method applied to a distributed system, including: in response to determining that first processing needs to be performed on input data based on a first parameter set, splitting the first parameter set into a plurality of sub-parameter sets respectively corresponding to a plurality of nodes in the distributed system; determining, for each node of the plurality of nodes, an operator sequence to be executed by the node, wherein the operator sequence includes a first processing operator that performs the first processing based on the node's corresponding sub-parameter set and a communication operator, a parameter item of the communication operator includes an identification of a target communication domain, and the communication operator indicates an operation of aggregating data within the target communication domain; starting a plurality of processes respectively corresponding to the plurality of nodes; compiling a plurality of operator sequences corresponding to the plurality of nodes using a compiler to obtain a plurality of executable files; determining the identification of the target communication domain constituted by the plurality of processes; for each node of the plurality of nodes, issuing the identification of the target communication domain, the input data, and the executable file corresponding to the node using the process corresponding to the node, so that the node can execute the first processing operator in the operator sequence corresponding to the node based on the input data, and can execute the communication operator in the operator sequence based on output data of the first processing operator and the identification of the target communication domain; and determining a first processing result based on output data of the communication operators of the plurality of nodes.
According to an aspect of the present disclosure, there is provided a data processing apparatus applied to a distributed system, including: a splitting unit configured to split a first parameter set into a plurality of sub-parameter sets respectively corresponding to a plurality of nodes in the distributed system in response to determining that first processing needs to be performed on input data based on the first parameter set; a first determining unit configured to determine, for each node of the plurality of nodes, an operator sequence to be executed by the node, wherein the operator sequence includes a first processing operator that performs the first processing based on the node's corresponding sub-parameter set and a communication operator, a parameter item of the communication operator includes an identification of a target communication domain, and the communication operator indicates an operation of aggregating data within the target communication domain; a starting unit configured to start a plurality of processes respectively corresponding to the plurality of nodes; a compiling unit configured to compile a plurality of operator sequences corresponding to the plurality of nodes using a compiler to obtain a plurality of executable files; a second determining unit configured to determine the identification of the target communication domain constituted by the plurality of processes; an execution unit configured to issue, for each node of the plurality of nodes, the identification of the target communication domain, the input data, and the executable file corresponding to the node using the process corresponding to the node, so that the node can execute the first processing operator in the operator sequence corresponding to the node based on the input data, and can execute the communication operator in the operator sequence based on output data of the first processing operator and the identification of the target communication domain; and a third determining unit configured to determine a first processing result based on output data of the communication operators of the plurality of nodes.
According to an aspect of the present disclosure, there is provided an electronic apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data processing method described above.
According to an aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the above-described data processing method.
According to an aspect of the present disclosure, there is provided a computer program product comprising a computer program, wherein the computer program implements the data processing method described above when executed by a processor.
According to one or more embodiments of the present disclosure, distributed data processing may be simply and quickly implemented, and data processing efficiency may be improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The accompanying drawings illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for exemplary purposes only and do not limit the scope of the claims. Throughout the drawings, identical reference numerals designate similar, but not necessarily identical, elements.
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, according to an exemplary embodiment of the present disclosure;
FIG. 2 illustrates a flow chart of a data processing method according to an exemplary embodiment of the present disclosure;
FIG. 3 illustrates a flow chart of a data processing method according to an exemplary embodiment of the present disclosure;
FIG. 4 shows a block diagram of a data processing apparatus according to an exemplary embodiment of the present disclosure;
FIG. 5 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, the use of the terms "first," "second," and the like to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of the elements, unless otherwise indicated, and such terms are merely used to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, they may also refer to different instances based on the description of the context.
The terminology used in the description of the various illustrated examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, the elements may be one or more if the number of the elements is not specifically limited. Furthermore, the term "and/or" as used in this disclosure encompasses any and all possible combinations of the listed items.
In the related art, a universal compiler framework can be used to conveniently implement the compiling and optimizing steps of various data processing tasks, such as inference tasks of a large-scale deep learning model. However, conventional compiler frameworks can deploy a data processing task only on a single hardware node and fail to support distributed processing, which limits the data processing scale and processing speed. As the number of parameters of large models grows, the processing capacity ceiling of a single piece of hardware greatly limits the execution efficiency of large-model inference tasks.
In view of the above, the disclosure provides a data processing method. When a data processing task that performs first processing based on a first parameter set is received, an operator sequence corresponding to each node is obtained by splitting the parameters and inserting operators. The first processing operator in the operator sequence implements the first processing to obtain a partial processing result corresponding to part of the parameters, and the communication operator implements data aggregation to combine the partial processing results of the plurality of nodes into a complete first processing result. Because the processing type of the data processing operator is unchanged, compilation can still be completed conveniently with a compiler; on that basis, starting multiple processes constructs the communication domain in the distributed system and ensures that the communication operator executes correctly. Complete distributed processing is thus realized, and the deployment efficiency of data processing tasks in the distributed system is improved.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented, in accordance with an embodiment of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.
In embodiments of the present disclosure, the server 120 may run one or more services or software applications that enable execution of the data processing methods.
In some embodiments, server 120 may also provide other services or software applications, which may include non-virtual environments and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.
In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof that are executable by one or more processors. A user operating client devices 101, 102, 103, 104, 105, and/or 106 may in turn utilize one or more client applications to interact with server 120 to utilize the services provided by these components. It should be appreciated that a variety of different system configurations are possible, which may differ from system 100. Accordingly, FIG. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The user may send data processing requests or input data using client devices 101, 102, 103, 104, 105, and/or 106. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that the present disclosure may support any number of client devices.
Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and the like. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, Apple iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., GOOGLE Chrome OS); or include various mobile operating systems such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular telephones, smart phones, tablet computers, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head mounted displays (such as smart glasses) and other devices. The gaming system may include various handheld gaming devices, Internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), and Short Message Service (SMS) applications, and may use a variety of communication protocols.
Network 110 may be any of a variety of networks known to those skilled in the art that may support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. For example only, the one or more networks 110 may be a Local Area Network (LAN), an ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a blockchain network, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., bluetooth, WIFI), and/or any combination of these and/or other networks.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-range servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture that involves virtualization (e.g., one or more flexible pools of logical storage devices that may be virtualized to maintain virtual storage devices of the server). In various embodiments, server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems including any of the operating systems described above as well as any commercially available server operating systems. Server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, etc.
In some implementations, server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.
In some implementations, the server 120 may be a server of a distributed system or a server that incorporates a blockchain. The server 120 may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology. The cloud server is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical host and Virtual Private Server (VPS) services.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of databases 130 may be used to store information such as audio files and video files. Database 130 may reside in various locations. For example, the database used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. Database 130 may be of different types. In some embodiments, the database used by server 120 may be, for example, a relational database. One or more of these databases may store, update, and retrieve data in response to commands.
In some embodiments, one or more of databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key value stores, object stores, or conventional stores supported by the file system.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
Fig. 2 illustrates a flowchart of a data processing method 200 applied to a distributed system according to an exemplary embodiment of the present disclosure. As shown in fig. 2, the method 200 includes:
Step S201, in response to determining that the first processing needs to be performed on the input data based on a first parameter set, splitting the first parameter set into a plurality of sub-parameter sets respectively corresponding to a plurality of nodes in the distributed system;
Step S202, determining an operator sequence to be executed by each node in the plurality of nodes, wherein the operator sequence comprises a first processing operator and a communication operator for executing the first processing based on a corresponding sub-parameter set of the node, parameter items of the communication operator comprise identification of a target communication domain, and the communication operator indicates an operation of aggregating data in the target communication domain;
Step S203, starting a plurality of processes respectively corresponding to the plurality of nodes;
Step S204, compiling a plurality of operator sequences corresponding to the plurality of nodes by utilizing a compiler to obtain a plurality of executable files;
Step S205, determining the identification of a target communication domain formed by the processes;
Step S206, for each node in the plurality of nodes, issuing, to the node, an identifier of the target communication domain, the input data, and an executable file corresponding to the node by using a process corresponding to the node, so that the node can execute a first processing operator in an operator sequence corresponding to the node based on the input data, and can execute a communication operator in the operator sequence based on output data of the first processing operator and the identifier of the target communication domain; and
Step S207, determining a first processing result based on output data of communication operators of the plurality of nodes.
By applying the method 200, when a data processing task that performs the first processing based on the first parameter set is received, an operator sequence corresponding to each node is obtained by splitting the parameters and inserting operators. The first processing operator in the operator sequence implements the first processing to obtain a partial processing result corresponding to part of the parameters, and the communication operator implements data aggregation to combine the partial processing results of the plurality of nodes into a complete first processing result. Because the processing type of the data processing operator is unchanged, compilation can be completed conveniently with a compiler; on that basis, starting multiple processes constructs the communication domain in the distributed system and ensures that the communication operator executes correctly. Complete distributed processing is thus realized through parameter splitting, parallel execution of the first processing, and result aggregation by the communication operator.
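The flow of steps S201 to S207 can be sketched as a single-machine simulation. This is only an illustrative stand-in, not the patent's implementation: the first processing is taken to be a dot product, real nodes and compilation are elided, and all names are hypothetical.

```python
def dot(xs, ws):
    # Reference (non-distributed) first processing: a dot product.
    return sum(x * w for x, w in zip(xs, ws))

def run_distributed_dot(input_data, parameter_set, num_nodes):
    # S201: split the first parameter set (and the matching input slices) across nodes.
    chunks = [(input_data[i::num_nodes], parameter_set[i::num_nodes])
              for i in range(num_nodes)]
    # S202-S204: each node's operator sequence is "partial dot product, then AllReduce";
    # compiling the sequences into executable files is elided in this sketch.
    # S205: identification of the target communication domain formed by the processes.
    domain_id = "target-communication-domain"
    # S206: each node executes its first processing operator on its share of the data.
    partials = [dot(xs, ws) for xs, ws in chunks]
    # S206/S207: the communication operator aggregates the partial results within the
    # communication domain; the aggregate is the first processing result.
    return sum(partials)

x = [1.0, 2.0, 3.0, 4.0]
w = [0.5, 1.0, 1.5, 2.0]
# Splitting plus aggregation reproduces the non-distributed result.
assert run_distributed_dot(x, w, num_nodes=2) == dot(x, w)
```

The key property the method relies on is visible in the final assertion: splitting the parameter set, computing in parallel, and aggregating with a communication operator yields the same result as the unsplit computation.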
In some examples, the first set of parameters may be parameters corresponding to different processing types, such as convolution kernel parameters corresponding to a convolution process. The first process may be, for example, an arithmetic operation, a matrix multiplication, a convolution calculation, an activation operation, a tensor operation, or the like, or may be a cascade combination of one or more of the above respective process types.
In some examples, the first set of parameters may be a set of weight parameters of a deep learning model, for example, may be in the form of a weight parameter matrix, a weight parameter tensor, or the like. The first process may be an inference calculation of various types of neural network layers, such as a full-join calculation, a convolution calculation, a pooling calculation, and the like. Therefore, forward or reverse calculation of one or more network layers in the deep learning model can be completed by utilizing the first processing operator, and reasoning or training of the large-scale deep learning model can be realized.
In some examples, the communication operator may be a collective communication operator of various types, such as an all-reduce (AllReduce) operator or an all-gather (AllGather) operator, to achieve a reduction or aggregation of the local processing results (i.e., the results calculated based on a portion of the first parameter set) into a complete first processing result.
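The semantics of the two collectives can be modeled in a few lines of pure Python (illustrative stand-ins only, not the patent's operator implementations): AllReduce combines every node's partial result element-wise, while AllGather concatenates the partial results.

```python
def all_reduce(partials):
    # AllReduce with a sum reduction: element-wise sum of each node's partial result.
    return [sum(vals) for vals in zip(*partials)]

def all_gather(partials):
    # AllGather: concatenate every node's partial result along the first dimension.
    return [x for part in partials for x in part]

# Two nodes each hold a partial result of the first processing operator.
node_outputs = [[1, 2, 3], [4, 5, 6]]
print(all_reduce(node_outputs))  # [5, 7, 9]
print(all_gather(node_outputs))  # [1, 2, 3, 4, 5, 6]
```

Which collective is appropriate depends on how the parameters were split: partial sums call for AllReduce, while disjoint slices of the result call for AllGather.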
In some examples, splitting the first parameter set into a plurality of sub-parameter sets corresponding to a plurality of nodes in the distributed system in step S201 may be implemented according to a preset parameter segmentation rule. For example, when the input data is feature map data and the first processing to be performed is a convolution based on a depth convolution kernel, the convolution kernel data may be split along the depth dimension according to the number of kernel parameters, so that parallel computation is performed on a plurality of nodes and the processing results are then aggregated by a subsequent communication operator in the operator sequence.
In some examples, the first processing is an inference calculation of a large-scale deep learning model. In this case, splitting of large models can be achieved using various existing model tensor splitting approaches. For example, for the lookup-table-based embedded representation (Embedding) operation in a deep learning model, the embedding parameter table may be divided among a plurality of nodes, and an AllReduce operation on the indexed items is then completed in the communication operator to obtain the looked-up embedding result. For another example, for a matrix multiplication operation in a deep learning model, the weight matrix may be split into a plurality of sub-matrices, the input matrix may be multiplied by the sub-matrices in parallel to obtain a plurality of partial result matrices, and an AllGather operation is then completed in the communication operator to obtain the complete result matrix.
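The matrix-multiplication case can be checked concretely. The sketch below (pure Python, hypothetical helper names) splits the weight matrix into column blocks, multiplies each block in parallel as an independent "node", and gathers the partial result matrices along the column dimension; the gathered result equals the unsplit matrix product.

```python
def matmul(a, b):
    # Naive matrix multiply: a is m x k, b is k x n (both lists of rows).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def split_columns(w, num_nodes):
    # Split the weight matrix into num_nodes column blocks (the sub-parameter sets).
    step = len(w[0]) // num_nodes
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(num_nodes)]

def all_gather_columns(parts):
    # AllGather stand-in: concatenate partial result matrices along the columns.
    return [sum((p[r] for p in parts), []) for r in range(len(parts[0]))]

x = [[1, 2], [3, 4]]              # input matrix
w = [[1, 0, 2, 1], [0, 1, 1, 2]]  # first parameter set (weight matrix)
parts = [matmul(x, w_i) for w_i in split_columns(w, 2)]  # parallel first processing
assert all_gather_columns(parts) == matmul(x, w)         # aggregation recovers the result
```

Splitting by columns is what makes AllGather the right collective here; a row-wise split of the weights would instead produce partial sums and call for AllReduce.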
The present disclosure does not limit the manner of splitting the first parameter set, as long as mathematical consistency between the original first parameter set and the split sub-parameter sets is preserved.
It may be appreciated that, after the first parameter set is split into the plurality of sub-parameter sets, it may be determined in step S202, based on that split, which first processing operators are to be executed by the respective nodes, so as to obtain an operator sequence corresponding to each node that includes a first processing operator and a communication operator. In this case, the processing types of the plurality of first processing operators are identical (i.e., they all implement the first processing), but their processing parameters differ.
In some examples, starting the plurality of processes respectively corresponding to the plurality of nodes in step S203 may be implemented using a process-spawning encapsulation function, for example one provided on top of the OpenMPI library. In some examples, other encapsulation functions or custom code blocks may be utilized, which are not limited in this disclosure.
In some examples, the compiling of the plurality of operator sequences corresponding to the plurality of nodes using the compiler in step S204 may be accomplished using the TVM compiler framework. In some examples, the various functions required by the communication operators (e.g., the all_reduce operator and the all_gather operator) may be added using TVM's standard operator-addition workflow, such as Relay registration, attrs attribute addition, shape derivation, type-relation (rel) functions, and the like. In this case, the operator's input data shape parameters and output data shape parameters may be defined according to the particular type of communication operator. For example, the value of the first shape dimension of the output data of the all_gather operator is the product of the value of the first shape dimension of the input data and the number of nodes, with the parameters of the other shape dimensions consistent with those of the input data, whereas the all_reduce operator keeps the output data shape parameters consistent with the input data shape parameters.
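The shape-derivation rule just described can be expressed as a small helper; the function name is an illustrative assumption and only the two communication operators discussed here are covered.

```python
# Illustrative shape derivation for the two communication operators,
# mirroring the TVM-style shape functions described above.
def comm_op_out_shape(op, in_shape, num_nodes):
    if op == "all_gather":
        # the first dimension grows by the number of participating nodes
        return (in_shape[0] * num_nodes,) + tuple(in_shape[1:])
    if op == "all_reduce":
        # element-wise reduction keeps the shape unchanged
        return tuple(in_shape)
    raise ValueError(f"unknown communication operator: {op}")
```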
According to some embodiments, determining in step S205 the identification of the target communication domain constituted by the plurality of processes comprises: generating an identification of the target communication domain using a first process of the plurality of processes, and issuing the identification of the target communication domain, the input data, and the plurality of executable files to the plurality of nodes using the plurality of processes in step S206 includes: transmitting, by the first process, the identification of the target communication domain to at least one other process of the plurality of processes other than the first process; and creating long-lifecycle context variables in the plurality of nodes with the plurality of processes, wherein the value of the context variable of each node includes the identification of the target communication domain. Therefore, the first process can be utilized to generate the identification of the communication domain and broadcast it to the other processes for storage in each node's context variable, ensuring that every node uses the identification correctly during data processing.
It is to be understood that the first process may be any one of the plurality of processes, as long as unification of the identification across the nodes can be achieved based on the above steps. In some examples, the step of generating the identification may be accomplished by calling a get_unique_id function in a collective communication library, and distribution of the unique id may then be achieved by calling init_rank.
According to some embodiments, the generating the identification of the target communication domain with a first process of the plurality of processes comprises: creating a first folder in a storage space shared by the plurality of processes using the first process; and writing the identification of the target communication domain to the first folder, and wherein transmitting the identification of the target communication domain to at least one other process of the plurality of processes other than the first process using the first process comprises: and reading the identification of the target communication domain from the first folder by utilizing the at least one other process. Therefore, broadcasting of the target communication domain identification can be achieved through multi-process shared storage, and convenience is improved.
According to some embodiments, the generating the identification of the target communication domain with a first process of the plurality of processes further comprises: after creating the first folder, creating a first write lock to lock the first folder; and releasing the first write lock in response to determining that the identification of the target communication domain has been written to the first folder. In this way, the file locking mechanism ensures synchronization of the shared resource among the plurality of processes, i.e., that a single correct communication domain identification is used, which in turn ensures the correctness of the data processing result.
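A minimal sketch of this folder-plus-write-lock broadcast, assuming a POSIX-style shared filesystem: an exclusively created lock file stands in for the write lock, and a uuid stands in for the get_unique_id result. File names and the polling scheme are illustrative, not the patent's actual implementation.

```python
import os
import time
import uuid

def publish_id(shared_dir):
    """First process: create the shared folder, lock it, write the domain ID."""
    os.makedirs(shared_dir, exist_ok=True)           # the "first folder"
    lock = os.path.join(shared_dir, "id.lock")
    fd = os.open(lock, os.O_CREAT | os.O_EXCL)       # acquire the write lock
    try:
        domain_id = uuid.uuid4().hex                 # stand-in for get_unique_id
        with open(os.path.join(shared_dir, "id.txt"), "w") as f:
            f.write(domain_id)
    finally:
        os.close(fd)
        os.remove(lock)                              # release the write lock
    return domain_id

def read_id(shared_dir, timeout=5.0):
    """Other processes: wait until the lock clears, then read the domain ID."""
    path = os.path.join(shared_dir, "id.txt")
    lock = os.path.join(shared_dir, "id.lock")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path) and not os.path.exists(lock):
            with open(path) as f:
                return f.read()
        time.sleep(0.01)
    raise TimeoutError("communication-domain ID was never published")
```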
It will be appreciated that the above-described step of achieving identification synchronization via multi-process shared storage applies to a single-machine multi-card distributed system, i.e., one in which multiple processing cores (nodes) physically interconnected within a single machine share storage. When the method 200 is applied to a multi-machine multi-card distributed system, identification synchronization can instead be achieved using the distribution steps described above (for example, distributing the unique id by calling init_rank).
According to some embodiments, the plurality of nodes comprises a communication library function for implementing the communication operator, the value of the context variable of each node further comprising a parameter term of the communication library function. By applying the steps, when each node of the distributed system is provided with the customized communication library function, a plurality of processes can configure the initialization information of each node by using the context variables so as to adapt to the use specification of the communication library function and improve the convenience.
According to some embodiments, starting a plurality of processes respectively corresponding to the plurality of nodes in step S203 includes: starting a plurality of processes respectively corresponding to the plurality of nodes by running a first code block based on a first programming language; and recording process information of the plurality of processes by creating environment variables, and wherein, for each node of the plurality of nodes, issuing the identifier of the target communication domain, the input data, and the executable file corresponding to the node by using the process corresponding to the node in step S206 includes: issuing the identification of the target communication domain, the input data and the executable file to the node by running a second code block based on a second programming language, wherein the second code block is capable of acquiring process information of the plurality of processes based on the environment variable. Therefore, when mixed programming is carried out by utilizing different programming languages (such as Python and C++), process information can be transmitted among different code blocks by utilizing environment variables, so that the flexibility is improved, the characteristics of different programming languages or the characteristics of different types of program products are fully utilized, and the correctness of flow connection is ensured.
In some examples, the foregoing step of starting the distributed processes may be implemented using Python; distributed process information (for example, the process rank and the number of processes) may then be obtained by calling the MPI.COMM_WORLD.Get_rank() and MPI.COMM_WORLD.Get_size() functions of an MPI library binding and passed to the runtime of the TVM compiler framework through environment variables; and C++ may further be used to implement the compilation optimization and deployment execution of the operator sequences based on the TVM compiler framework.
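The environment-variable hand-off between the launching code block and the second code block can be sketched as follows; the variable names DIST_RANK and DIST_WORLD_SIZE are hypothetical, and a child Python process stands in for the C++ runtime.

```python
import os
import subprocess
import sys

# Hedged sketch: the Python launcher records process information in
# environment variables, and a separately-run code block reads it back.
def launch_with_env(rank, world_size, code):
    env = dict(os.environ, DIST_RANK=str(rank), DIST_WORLD_SIZE=str(world_size))
    out = subprocess.run([sys.executable, "-c", code],
                         env=env, capture_output=True, text=True)
    return out.stdout.strip()

# the "second code block" acquires the process info from the environment
child = "import os; print(os.environ['DIST_RANK'], os.environ['DIST_WORLD_SIZE'])"
result = launch_with_env(2, 8, child)
```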
In some examples, aggregate communication among the plurality of nodes (e.g., the all_reduce and all_gather communication computations) may be implemented by invoking the MPI_Send/MPI_Recv/MPI_Barrier communication interfaces of the MPI library together with the aforementioned distributed process information.
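For intuition, the aggregate all_reduce semantics composed from such point-to-point exchanges can be emulated in a single process as below; this is a toy stand-in for illustration, not the MPI implementation itself.

```python
# Toy, single-process emulation of an aggregate all_reduce: after the
# collective, every node holds the element-wise sum of all node buffers
# (real deployments compose this from MPI_Send/MPI_Recv exchanges).
def all_reduce_sum(node_buffers):
    total = [sum(vals) for vals in zip(*node_buffers)]
    return [list(total) for _ in node_buffers]

reduced = all_reduce_sum([[1, 2], [3, 4], [5, 6]])
```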
In some examples, the debug environment variable may be further defined for printing communication rate information for aggregate communication to facilitate debug optimization of the distributed system.
By applying the above mixed-programming means, both the flexibility and convenience of Python and the high runtime efficiency of C++ can be obtained, so that data processing is realized conveniently and efficiently with the distributed system, improving both the data processing scale and the processing efficiency.
Fig. 3 shows a flow chart of a data processing method 300 according to an exemplary embodiment of the present disclosure. As shown in fig. 3, the method 300 includes:
step S301, responding to the determination that the first processing needs to be executed based on the first parameter set, splitting the first parameter set into a plurality of sub-parameter sets respectively corresponding to a plurality of nodes in the distributed system;
step S302, determining an operator sequence to be executed by each node in the plurality of nodes;
step S303, adding a communication library function for realizing the communication operator to a compiler;
step S304, starting a plurality of processes respectively corresponding to the plurality of nodes;
Step S305, completing compiling of a plurality of operator sequences corresponding to a plurality of nodes by utilizing a compiler to obtain a plurality of executable files;
Step S306, generating an identification of the target communication domain by using a first process in the plurality of processes;
step S307, transmitting, by using the first process, the identification of the target communication domain to at least one other process other than the first process among the plurality of processes;
Step S308, for each node, issuing the input data and the executable file corresponding to the node by utilizing the process corresponding to the node, so that the node can execute its operator sequence based on the input data;
Step S309, determining a first processing result based on output data of communication operators of the plurality of nodes.
By applying the method 300, the operator sequence corresponding to each node can be obtained by splitting the parameters and inserting operators: the first processing operator in the operator sequence realizes the first processing to obtain the partial result corresponding to its sub-parameters, and the communication operator realizes data aggregation, combining the partial results of the plurality of nodes into the complete first processing result. By adding the communication operator to the compiler in a customized manner, the communication domain in the distributed system can be constructed through multi-process startup on the basis of conveniently completing compilation with the compiler, so that the communication operator executes correctly and complete distributed processing is realized conveniently.
According to an aspect of the present disclosure, there is also provided a data processing apparatus applied to a distributed system. Fig. 4 shows a block diagram of a data processing apparatus 400 applied to a distributed system according to an exemplary embodiment of the present disclosure. As shown in fig. 4, the apparatus 400 includes:
A splitting unit 401 configured to split a first parameter set into a plurality of sub-parameter sets respectively corresponding to a plurality of nodes in the distributed system in response to determining that the first processing needs to be performed on the input data based on the first parameter set;
A first determining unit 402 configured to determine, for each node of the plurality of nodes, an operator sequence to be executed by the node, wherein the operator sequence includes a first processing operator and a communication operator that execute the first processing based on a corresponding sub-parameter set of the node, a parameter item of the communication operator includes an identification of a target communication domain, and the communication operator indicates an operation of aggregating data within the target communication domain;
A starting unit 403 configured to start a plurality of processes respectively corresponding to the plurality of nodes;
A compiling unit 404 configured to complete compiling of a plurality of operator sequences corresponding to the plurality of nodes with a compiler to obtain a plurality of executable files;
a second determining unit 405 configured to determine an identification of a target communication domain constituted by the plurality of processes;
an execution unit 406 configured to, for each node of the plurality of nodes, issue, to the node, an identification of the target communication domain, the input data, and an executable file corresponding to the node using a process corresponding to the node, so that the node can execute a first processing operator in an operator sequence corresponding to the node based on the input data, and can execute a communication operator in the operator sequence based on output data of the first processing operator and the identification of the target communication domain; and
The third determining unit 407 is configured to determine the first processing result based on output data of communication operators of the plurality of nodes.
According to some embodiments, the second determining unit 405 is configured to: generating an identification of the target communication domain with a first process of the plurality of processes, and wherein the execution unit 406 is configured to: transmitting, by the first process, an identification of the target communication domain to at least one other process of the plurality of processes other than the first process; and creating context variables for the plurality of nodes using the plurality of processes, wherein a value of the context variable for each node includes an identification of the target communication domain.
According to some embodiments, the second determining unit 405 is configured to: creating a first folder in a storage space shared by the plurality of processes using the first process; and writing an identification of the target communication domain to the first folder, and wherein the execution unit 406 is configured to: and reading the identification of the target communication domain from the first folder by utilizing the at least one other process.
According to some embodiments, the second determining unit 405 is further configured to: creating a first write lock to lock the first folder after creating the first folder; and releasing the first write lock in response to determining that the identification of the target communication domain has been written to the first folder.
According to some embodiments, the plurality of nodes comprises a communication library function for implementing the communication operator, the value of the context variable of each node further comprising a parameter term of the communication library function.
According to some embodiments, the initiation unit 403 is configured to: starting a plurality of processes respectively corresponding to the plurality of nodes by running a first code block based on a first programming language; and recording process information of the plurality of processes by creating an environment variable, and wherein the execution unit 406 is configured to: issuing the identification of the target communication domain, the input data and the executable file to the node by running a second code block based on a second programming language, wherein the second code block is capable of acquiring process information of the plurality of processes based on the environment variable.
It should be appreciated that the operation of the various elements of the data processing apparatus 400 shown in fig. 4 may correspond to the various steps of the data processing method 200 described in fig. 2 as being applied to a distributed system. Thus, the operations, features and advantages described above with respect to method 200 are equally applicable to apparatus 400 and the various units it comprises. For brevity, certain operations, features and advantages are not described in detail herein.
According to an aspect of the present disclosure, there is also provided an electronic apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data processing method described above.
According to an aspect of the present disclosure, there is also provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the above-described data processing method.
According to an aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the above-mentioned data processing method.
Referring to fig. 5, a block diagram of an electronic device 500, which may be a server or a client of the present disclosure and is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the apparatus 500 includes a computing unit 501 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, ROM 502, and RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Various components in the device 500 are connected to the I/O interface 505, including: an input unit 506, an output unit 507, a storage unit 508, and a communication unit 509. The input unit 506 may be any type of device capable of inputting information to the device 500; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone, and/or a remote control. The output unit 507 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 508 may include, but is not limited to, magnetic disks and optical disks. The communication unit 509 allows the device 500 to exchange information/data with other devices via a computer network such as the Internet and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth™ devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the respective methods and processes described above, such as a data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When a computer program is loaded into RAM 503 and executed by computing unit 501, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the data processing method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the Internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
While embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the methods, systems, and apparatus described above are merely illustrative embodiments or examples and that the scope of the present disclosure is not limited by these embodiments or examples. Various elements of the embodiments or examples may be omitted or replaced with equivalent elements thereof. Furthermore, the steps may be performed in an order different from that described in the present disclosure. Further, various elements of the embodiments or examples may be combined in various ways. It should be understood that, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.
Claims (15)
1. A data processing method applied to a distributed system, comprising:
responsive to determining that a first process needs to be performed on input data based on a first set of parameters, splitting the first set of parameters into a plurality of sub-sets of parameters respectively corresponding to a plurality of nodes in the distributed system;
Determining, for each node of the plurality of nodes, an operator sequence to be executed by the node, wherein the operator sequence includes a first processing operator and a communication operator that execute the first processing based on a corresponding set of sub-parameters of the node, a parameter item of the communication operator includes an identification of a target communication domain, and the communication operator indicates an operation of aggregating data within the target communication domain;
starting a plurality of processes respectively corresponding to the plurality of nodes;
compiling a plurality of operator sequences corresponding to the plurality of nodes by utilizing a compiler to obtain a plurality of executable files;
Determining the identification of a target communication domain formed by the plurality of processes;
For each node in the plurality of nodes, issuing an identification of the target communication domain, the input data and an executable file corresponding to the node by utilizing a process corresponding to the node, so that the node can execute a first processing operator in an operator sequence corresponding to the node based on the input data, and can execute a communication operator in the operator sequence based on output data of the first processing operator and the identification of the target communication domain; and
And determining a first processing result based on output data of communication operators of the plurality of nodes.
2. The method of claim 1, wherein the determining the identity of the target communication domain comprised of the plurality of processes comprises:
generating an identification of the target communication domain with a first process of the plurality of processes,
And wherein said issuing, with said plurality of processes, said identification of said target communication domain, said input data, and said plurality of executable files to said plurality of nodes comprises:
transmitting, by the first process, an identification of the target communication domain to at least one other process of the plurality of processes other than the first process; and
Creating context variables for the plurality of nodes using the plurality of processes, wherein a value of the context variable for each node includes an identification of the target communication domain.
3. The method of claim 2, wherein the generating the identification of the target communication domain with a first process of the plurality of processes comprises:
creating a first folder in a storage space shared by the plurality of processes using the first process; and
Writing an identification of the target communication domain to the first folder,
And wherein transmitting, with the first process, the identification of the target communication domain to at least one other process of the plurality of processes other than the first process comprises:
And reading the identification of the target communication domain from the first folder by utilizing the at least one other process.
4. The method of claim 3, wherein the generating the identification of the target communication domain with the first process of the plurality of processes further comprises:
After creating the first folder, creating a first write lock to lock the first folder; and
The first write lock is released in response to determining that the identification of the target communication domain has been written to the first folder.
5. The method of any of claims 2-4, wherein the plurality of nodes includes a communication library function for implementing the communication operator, the value of the context variable of each node further including a parameter term of the communication library function.
6. The method of any of claims 1-5, wherein the launching a plurality of processes respectively corresponding to the plurality of nodes comprises:
starting a plurality of processes respectively corresponding to the plurality of nodes by running a first code block based on a first programming language; and
By creating an environment variable, process information of the plurality of processes is recorded,
And wherein for each node of the plurality of nodes, issuing, to the node, the identity of the target communication domain, the input data, and the executable file corresponding to the node using the process corresponding to the node comprises:
Issuing the identification of the target communication domain, the input data and the executable file to the node by running a second code block based on a second programming language, wherein the second code block is capable of acquiring process information of the plurality of processes based on the environment variable.
7. A data processing apparatus for use in a distributed system, comprising:
a splitting unit configured to split a first parameter set into a plurality of sub-parameter sets respectively corresponding to a plurality of nodes in the distributed system in response to determining that first processing needs to be performed on input data based on the first parameter set;
a first determining unit configured to determine, for each node of the plurality of nodes, an operator sequence to be executed by the node, wherein the operator sequence includes a first processing operator that performs the first processing based on the corresponding sub-parameter set of the node and a communication operator, a parameter item of the communication operator includes an identification of a target communication domain, and the communication operator indicates an operation of aggregating data within the target communication domain;
a starting unit configured to start a plurality of processes respectively corresponding to the plurality of nodes;
a compiling unit configured to compile, using a compiler, a plurality of operator sequences corresponding to the plurality of nodes to obtain a plurality of executable files;
a second determining unit configured to determine an identification of the target communication domain constituted by the plurality of processes;
an execution unit configured to issue, for each node of the plurality of nodes, the identification of the target communication domain, the input data, and the executable file corresponding to the node using the process corresponding to the node, so that the node can execute the first processing operator in the operator sequence corresponding to the node based on the input data, and can execute the communication operator in the operator sequence based on the output data of the first processing operator and the identification of the target communication domain; and
a third determining unit configured to determine a first processing result based on the output data of the communication operators of the plurality of nodes.
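The pipeline that claim 7's units describe — split the parameter set, run a first processing operator per node, then aggregate across the target communication domain — can be illustrated with a toy single-process simulation. All function names and the choice of a weighted sum for the first processing and a plain sum for the aggregation are illustrative stand-ins, not the patented operators.

```python
def split_params(params, num_nodes):
    """Splitting unit: one contiguous sub-parameter set per node."""
    shard = len(params) // num_nodes   # assumes an even split for brevity
    return [params[i * shard:(i + 1) * shard] for i in range(num_nodes)]

def first_processing_operator(x, sub_params):
    """Per-node compute; a weighted sum stands in for the first processing."""
    return sum(w * x for w in sub_params)

def communication_operator(partial_outputs):
    """Aggregation within the target communication domain
    (an all-reduce-style sum)."""
    return sum(partial_outputs)

if __name__ == "__main__":
    params = [1.0, 2.0, 3.0, 4.0]      # the first parameter set
    shards = split_params(params, num_nodes=2)
    partials = [first_processing_operator(10.0, s) for s in shards]
    result = communication_operator(partials)
    assert result == 100.0             # 10 * (1 + 2 + 3 + 4)
```

The point of the structure is that the aggregated result equals what a single node would have computed with the unsplit parameter set, which is what the third determining unit relies on.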
8. The apparatus of claim 7, wherein the second determining unit is configured to:
generate an identification of the target communication domain with a first process of the plurality of processes,
and wherein the execution unit is configured to:
transmit, with the first process, the identification of the target communication domain to at least one other process of the plurality of processes other than the first process; and
create context variables for the plurality of nodes using the plurality of processes, wherein the value of the context variable of each node includes the identification of the target communication domain.
9. The apparatus of claim 8, wherein the second determining unit is configured to:
create a first folder in a storage space shared by the plurality of processes using the first process; and
write the identification of the target communication domain to the first folder,
and wherein the execution unit is configured to:
read, with the at least one other process, the identification of the target communication domain from the first folder.
10. The apparatus of claim 9, wherein the second determining unit is further configured to:
create a first write lock to lock the first folder after creating the first folder; and
release the first write lock in response to determining that the identification of the target communication domain has been written to the first folder.
11. The apparatus of any of claims 8-10, wherein the plurality of nodes include a communication library function for implementing the communication operator, and the value of the context variable of each node further includes a parameter item of the communication library function.
12. The apparatus of any of claims 7-11, wherein the starting unit is configured to:
start a plurality of processes respectively corresponding to the plurality of nodes by running a first code block based on a first programming language; and
record process information of the plurality of processes by creating an environment variable,
and wherein the execution unit is configured to:
issue the identification of the target communication domain, the input data, and the executable file to the node by running a second code block based on a second programming language, wherein the second code block is capable of acquiring the process information of the plurality of processes based on the environment variable.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any of claims 1-6.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410458523.4A CN118363753A (en) | 2024-04-16 | 2024-04-16 | Data processing method and device, equipment and medium applied to distributed system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN118363753A (en) | 2024-07-19 |
Family
ID=91885046
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410458523.4A Pending CN118363753A (en) | 2024-04-16 | 2024-04-16 | Data processing method and device, equipment and medium applied to distributed system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118363753A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119918621A (en) * | 2025-04-02 | 2025-05-02 | 之江实验室 | A fault-tolerant training method and device decoupled from a machine learning framework |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111694617B (en) | Processing method of network offline model, artificial intelligence processing device and related products | |
| CN114091672B (en) | Distributed model inference method and device, electronic device and medium | |
| CN114861910B (en) | Compression method, device, equipment and medium of neural network model | |
| CN111966361A (en) | Method, device and equipment for determining model to be deployed and storage medium thereof | |
| CN114924862A (en) | Task processing method, device and medium implemented by integer programming solver | |
| CN118940686A (en) | Chip verification method, device, equipment and medium | |
| KR20250050980A (en) | Image processing method and device, apparatus and medium | |
| CN118363753A (en) | Data processing method and device, equipment and medium applied to distributed system | |
| CN117273107A (en) | Training method and training device for text generation model | |
| CN116306396A (en) | Chip verification method and device, equipment and medium | |
| CN120144455A (en) | Test case generation method, device, electronic device, readable storage medium and computer program product based on artificial intelligence | |
| CN118113225B (en) | Data processing method, device, equipment and medium for training attention calculation | |
| CN118113349A (en) | Data processing method and device, data processing system, device and medium | |
| CN115762515B (en) | Processing and application method, device and equipment for neural network for voice recognition | |
| CN117193647A (en) | Data processing task sending method, data processing device and equipment | |
| CN114090063B (en) | Method, device, electronic equipment and medium for packaging application | |
| CN118821675A (en) | Chip verification method, device, equipment and medium | |
| CN114881235A (en) | Inference service calling method and device, electronic equipment and storage medium | |
| KR102871692B1 (en) | Object recommendation method and device, computer device and medium | |
| CN117744799B (en) | Reasoning methods, devices, equipment and media for symbolic expressions | |
| CN118467159A (en) | Data processing method and device, equipment and medium applied to distributed system | |
| CN115098165B (en) | Data processing method, device, chip, equipment and medium | |
| CN119668820A (en) | Task processing method, task processing system, device and medium | |
| CN118132001A (en) | Data processing method, data processing system, chip, device and medium | |
| CN117196927A (en) | Image processing method, device, equipment and medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | | |
| SE01 | Entry into force of request for substantive examination | | |