
CN118796355A - Data processing method, device, computer readable storage medium and electronic device - Google Patents


Info

Publication number
CN118796355A
Authority
CN
China
Prior art keywords
container group
storage
directory
application
container
Legal status
Pending
Application number
CN202410840016.7A
Other languages
Chinese (zh)
Inventor
程竹江
Current Assignee
Hangzhou Ping Pong Intelligent Technology Co ltd
Original Assignee
Hangzhou Ping Pong Intelligent Technology Co ltd
Application filed by Hangzhou Ping Pong Intelligent Technology Co ltd
Priority to CN202410840016.7A
Publication of CN118796355A


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract


The present invention discloses a data processing method, a device, a computer-readable storage medium and an electronic device, and relates to the field of distributed systems. The method comprises: obtaining, by an application in a first container group, a matching relationship set and the container group information of the first container group, wherein the matching relationship set includes matching relationships between multiple first container groups and multiple storage directories, and different first container groups are matched with different storage directories; determining, by the application in the first container group and according to the container group information and the matching relationship set, the storage directory matched with the first container group, to obtain a target storage directory; and storing, by the application in the first container group, the data that the application has finished processing in the target storage directory. The present invention solves the technical problem in the related art that applications in the various container groups write their processed data into the same file, which easily leads to data loss or garbled data and therefore results in poor data processing reliability.

Description

Data processing method, device, computer-readable storage medium and electronic device

Technical Field

The present invention relates to the field of distributed systems, and in particular to a data processing method, a device, a computer-readable storage medium and an electronic device.

Background Art

With the rapid development of cloud computing, containerization has been widely adopted, especially on container orchestration platforms such as Kubernetes (K8s for short). Kubernetes is an open-source container orchestration system that automates the deployment, scaling and management of containerized applications. In K8s, the container group, or Pod, is the basic deployment unit; it encapsulates one or more containers and is used to deploy and manage applications.

Pods in K8s are often used to process data streams. After processing, a Pod writes the data to a file for use by other services. In the related art, a K8s system includes multiple Pods, each of which processes data independently and writes the processed data into the same file; once the accumulated data reaches a certain size, the file is uploaded to a cloud storage service, as shown in FIG. 1. When multiple Pods write to the same file, write conflicts may occur, resulting in inconsistent or lost data. In high-concurrency scenarios, multiple Pods writing to the same file at the same time may also create an I/O (Input/Output) performance bottleneck, affecting the stability and response speed of the entire system.

In summary, in existing Pod-based data processing methods, multiple Pods writing processed data to the same file may cause data loss or garbled data, thereby reducing the reliability of data processing.

Summary of the Invention

The embodiments of the present invention provide a data processing method, a device, a computer-readable storage medium and an electronic device, so as to at least solve the technical problem in the related art that applications in the various container groups write their processed data into the same file, which easily leads to data loss or garbled data and therefore results in poor data processing reliability.

According to one aspect of the embodiments of the present invention, a data processing method is provided and applied to a distributed system. The distributed system includes multiple first container groups, where a first container group is a container group running a data processing application. The method includes: obtaining, by the application in a first container group, a matching relationship set and the container group information of that first container group, wherein the matching relationship set includes matching relationships between the multiple first container groups and multiple storage directories, and different first container groups are matched with different storage directories; determining, by the application in the first container group and according to the container group information and the matching relationship set, the storage directory matched with the first container group, to obtain a target storage directory; and storing, by the application in the first container group, the data that the application has finished processing in the target storage directory.

Further, the data processing method includes: determining, according to the container group information, whether the matching relationship set includes a matching relationship for the first container group; if it does, taking the storage directory in that matching relationship as the target storage directory; if it does not, determining a first storage directory and taking it as the target storage directory, where a first storage directory is a storage directory that is not yet matched with any first container group.

Further, the data processing method includes: after the first storage directory is determined as the target storage directory, establishing a matching relationship between the first container group and the first storage directory in the matching relationship set.
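The lookup-or-allocate logic of the two preceding paragraphs can be sketched in Python as follows. This is an illustrative sketch, not the applicant's implementation: it assumes the matching relationship set is available as an in-memory dict mapping Pod name to directory path, and the function name `resolve_target_dir` and the `candidate_dirs` pool are hypothetical. How the set is persisted and shared between Pods is not specified in the source.

```python
from typing import Dict, Iterable


def resolve_target_dir(pod_name: str,
                       matches: Dict[str, str],
                       candidate_dirs: Iterable[str]) -> str:
    """Return the storage directory matched with pod_name.

    matches: hypothetical in-memory form of the matching relationship set,
             mapping Pod name -> storage directory path.
    candidate_dirs: hypothetical pool of directories a Pod may be given.
    """
    # Case 1: the set already records a match for this Pod, so reuse it.
    existing = matches.get(pod_name)
    if existing is not None:
        return existing

    # Case 2: pick a "first storage directory", one no Pod is matched with yet.
    in_use = set(matches.values())
    for directory in candidate_dirs:
        if directory not in in_use:
            # Record the new match so later lookups return the same directory.
            matches[pod_name] = directory
            return directory

    raise RuntimeError("no unmatched storage directory is available")
```

For example, with `matches = {"proc-0": "/data/out/dir-0"}` and the pool `["/data/out/dir-0", "/data/out/dir-1"]`, Pod `proc-0` gets its existing directory back while a new Pod `proc-1` is allocated `/data/out/dir-1`. The sketch does not guard against two Pods allocating at the same moment; a shared deployment would need the set to be updated atomically.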

Further, the data processing method includes: obtaining multiple preset directory numbers; for the N-th directory number, determining whether the matching relationship set contains a second storage directory, where the second storage directory is the storage directory that uses the N-th directory number; if the matching relationship set does not contain the second storage directory, determining the second storage directory as the first storage directory; if it does, updating N and repeating the check for the new second storage directory until a second storage directory is found that is not in the matching relationship set, and determining that second storage directory as the first storage directory.
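The directory-number scan above can be illustrated with a short Python sketch. It assumes, hypothetically, that a storage directory "uses" directory number n by being named `<base>/dir-<n>` and that the matching relationship set is a dict from Pod name to directory path; the source does not specify the naming scheme or the base path.

```python
from typing import Dict, Sequence


def first_unmatched_dir(dir_numbers: Sequence[int],
                        matches: Dict[str, str],
                        base: str = "/data/out") -> str:
    """Scan preset directory numbers and return the first unmatched directory.

    The naming scheme f"{base}/dir-{n}" and the default base path are
    hypothetical; the source only says each directory uses a directory number.
    """
    in_use = set(matches.values())
    for n in dir_numbers:
        # "Second storage directory": the directory that uses the N-th number.
        candidate = f"{base}/dir-{n}"
        if candidate not in in_use:
            # Not in the matching set, so it becomes the first storage directory.
            return candidate
        # Otherwise continue with the next number (the "update N" step).
    raise RuntimeError("all preset directory numbers are already matched")
```

Scanning the numbers in a fixed order means that directories freed by the cleanup step are reused before new numbers are consumed, which keeps the set of directory names bounded by the preset list.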

Further, the distributed system also includes a second container group, which is a container group running a directory management application, and the data processing method further includes: determining, by the application in the second container group, the storage directories that already exist in the distributed system, to obtain multiple third storage directories; for each third storage directory, determining, by the application in the second container group, whether the matching relationship set includes that third storage directory; if it does, determining, by the application in the second container group, whether the first container group matched with the third storage directory in the matching relationship set is running; and if that first container group is not running, deleting, by the application in the second container group, the matching relationship for the third storage directory from the matching relationship set.
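The cleanup pass performed by the directory management application can be sketched as follows. This is an assumption-laden illustration: the matching relationship set is modelled as a dict from Pod name to directory, and `running_pods` is a set of names of Pods that are currently running (in practice it might come from listing Pods through the Kubernetes API, which the sketch does not do). The function name `prune_stale_matches` is hypothetical.

```python
from typing import Dict, Iterable, List, Set


def prune_stale_matches(existing_dirs: Iterable[str],
                        matches: Dict[str, str],
                        running_pods: Set[str]) -> List[str]:
    """Delete matches whose Pod is no longer running; return those directories."""
    # Invert Pod -> directory into directory -> Pod. This is safe because
    # different Pods are matched with different directories.
    owner = {d: p for p, d in matches.items()}
    stale: List[str] = []
    for directory in existing_dirs:  # each "third storage directory"
        pod = owner.get(directory)
        if pod is None:
            # Not recorded in the set; handled separately by configuration.
            continue
        if pod not in running_pods:
            # The matched Pod is gone, so remove its matching relationship.
            del matches[pod]
            stale.append(directory)
    return stale
```

The returned list gives the caller the directories whose data still has to be uploaded and removed.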

Further, the data processing method includes: when the first container group matched with the third storage directory is not running, uploading, by the application in the second container group, the data in the third storage directory to a target server; and deleting, by the application in the second container group, the data in the third storage directory and then the third storage directory itself.
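The upload-then-delete step can be sketched with the Python standard library. The `upload` callback is a hypothetical stand-in for whatever client sends a file to the target server; the source does not name a storage service or protocol.

```python
import os
import shutil
from typing import Callable


def drain_and_remove(directory: str, upload: Callable[[str], None]) -> int:
    """Upload every file under directory, then delete the directory.

    upload is a hypothetical callback that sends one file (by path) to the
    target server and raises on failure.
    """
    count = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            upload(os.path.join(root, name))
            count += 1
    # Remove the data and the directory only after every upload has returned,
    # so a failed upload (an exception) leaves the files in place for a retry.
    shutil.rmtree(directory)
    return count
```

Deleting only after all uploads succeed is a deliberate ordering choice in this sketch: it trades a possible duplicate upload on retry for never losing data.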

Further, the data processing method includes: after the application in the second container group determines whether the matching relationship set includes the third storage directory, and if it does not, obtaining, by the application in the second container group, preset configuration information, where the configuration information indicates whether third storage directories not recorded in the matching relationship set may be deleted; and determining, by the application in the second container group and according to the configuration information, how to handle the third storage directory.
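A minimal Python sketch of the configuration-driven decision follows. The flag name `allow_delete_unrecorded`, the `DirectoryGcConfig` class and the two action strings are hypothetical; the source only states that the configuration indicates whether unrecorded directories may be deleted.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class DirectoryGcConfig:
    # Hypothetical flag standing in for the "preset configuration information":
    # may directories that are absent from the matching set be deleted?
    allow_delete_unrecorded: bool = False


def plan_unrecorded(existing_dirs: Iterable[str],
                    matches: Dict[str, str],
                    config: DirectoryGcConfig) -> Dict[str, str]:
    """Return an action ("delete" or "keep") for every unrecorded directory."""
    recorded = set(matches.values())
    action = "delete" if config.allow_delete_unrecorded else "keep"
    return {d: action for d in existing_dirs if d not in recorded}
```

Defaulting the flag to `False` in this sketch makes the conservative behaviour (keep unknown directories) the one that applies when no configuration is supplied.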

According to another aspect of the embodiments of the present invention, a data processing device is further provided and applied to a distributed system. The distributed system includes multiple first container groups, where a first container group is a container group running a data processing application. The device includes: a first acquisition module, configured to obtain, through the application in a first container group, a matching relationship set and the container group information of the first container group, wherein the matching relationship set includes matching relationships between the multiple first container groups and multiple storage directories, and different first container groups are matched with different storage directories; a first determination module, configured to determine, through the application in the first container group and according to the container group information and the matching relationship set, the storage directory matched with the first container group, to obtain a target storage directory; and a storage module, configured to store, through the application in the first container group, the data that the application has finished processing in the target storage directory.

According to another aspect of the embodiments of the present invention, a computer-readable storage medium is further provided. The computer-readable storage medium stores a computer program, wherein the computer program is configured to execute the above data processing method when run.

According to another aspect of the embodiments of the present invention, an electronic device is further provided, including one or more processors and a memory for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors run the programs, and the programs are configured to execute the above data processing method when run.

In the embodiments of the present invention, the data processed by the applications in different container groups is written into different directories. The application in a first container group obtains the matching relationship set and the container group information of the first container group, then determines, according to the container group information and the matching relationship set, the storage directory matched with the first container group to obtain the target storage directory, and finally stores the data it has finished processing in the target storage directory. The matching relationship set includes matching relationships between multiple first container groups and multiple storage directories, and different first container groups are matched with different storage directories.

In the above process, the target storage directory of the current first container group is determined accurately by obtaining the matching relationship set and the container group information of the first container group and determining the matched storage directory from them. Because different first container groups are matched with different storage directories, the target storage directory stores only the data processed by the application in the current first container group. When each application stores its processed data in its own target storage directory, the data processed by the applications in different first container groups is written into different directories, and therefore into different files, which avoids data loss or garbled data and improves data processing reliability.

It can be seen that the solution provided by the present application writes the data processed by the applications in different container groups into different directories, achieving the technical effect of improved data processing reliability and thereby solving the technical problem in the related art that applications in the various container groups write their processed data into the same file, which easily leads to data loss or garbled data and therefore results in poor data processing reliability.

Brief Description of the Drawings

The drawings described herein are provided for a further understanding of the present invention and constitute a part of this application. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of it. In the drawings:

FIG. 1 is a schematic diagram of an optional data processing method in the related art;

FIG. 2 is a first schematic diagram of an optional data processing method according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the workflow of an application in an optional second container group according to an embodiment of the present invention;

FIG. 4 is a second schematic diagram of an optional data processing method according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the workflow of an application in an optional first container group according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of an optional data processing device according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of an optional electronic device according to an embodiment of the present invention.

Detailed Description

To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the scope of protection of the present invention.

It should be noted that the terms "first", "second", etc. in the specification, claims and drawings of the present invention are used to distinguish between similar objects and do not necessarily describe a specific order or sequence. It should be understood that the terms so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such a process, method, product or device.

It should be noted that the user information (including but not limited to user device information and personal information) and data (including but not limited to data used for analysis, stored data and displayed data) involved in this application are authorized by the user or fully authorized by all parties. The collection, use and processing of such data must comply with the relevant laws, regulations and standards of the relevant regions, and corresponding entry points are provided for users to grant or refuse authorization.

For ease of description, some terms used in the embodiments of the present application are explained below:

Kubernetes: Kubernetes is an open-source container orchestration engine that supports automated deployment, large-scale scaling and containerized application management. It is a portable, extensible open-source platform for managing containerized workloads and services that facilitates both declarative configuration and automation. Kubernetes has a large, rapidly growing ecosystem, and its services, support and tools are widely available.

Pod: In a Kubernetes cluster, the Pod is the basis of all workload types and the smallest unit managed by K8s. A Pod is a group of one or more containers that share storage, network and namespaces, together with a specification for how to run them. All containers in a Pod are co-located and co-scheduled and run in a shared context. For a given application, the Pod acts as its logical host and contains the related application containers.

Container: A container is a standardized unit of software that packages code and all of its dependencies so that an application runs quickly and reliably from one computing environment to another. A container image is a lightweight, standalone, executable software package that includes everything needed to run the program: code, runtime dependencies, system tools, system libraries and settings. Containers are an abstraction at the application layer that packages code and dependencies together. Multiple containers can run on the same machine and share the operating system kernel, each running as an isolated process in user space.

Example 1

According to an embodiment of the present invention, an embodiment of a data processing method is provided. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.

FIG. 2 is a first schematic diagram of an optional data processing method according to an embodiment of the present invention. As shown in FIG. 2, the method is applied to a distributed system that includes multiple first container groups, where a first container group is a container group running a data processing application. The method includes the following steps:

Step S201: obtaining, by the application in a first container group, a matching relationship set and the container group information of the first container group, wherein the matching relationship set includes matching relationships between multiple first container groups and multiple storage directories, and different first container groups are matched with different storage directories.

Optionally, in this embodiment, the distributed system may be a Kubernetes system, a container group may be a Pod, and the data processing application is used at least to process data, including but not limited to parsing, format conversion and compression. The multiple first container groups may share the same code and configuration or may differ.

In some embodiments, the data processing application and the data it produces vary with the application scenario. For example, in a financial scenario, the data processing application may be a payment application that processes the data in users' payment requests, and the processed data may include payment information and payment results. As another example, in a big data scenario, the data processing application may be an information analysis application that, according to user requirements, analyzes the corresponding big data (for example, users' browsing history) and generates corresponding analysis results (for example, users' browsing preferences); the processed data is then those analysis results. This embodiment places no specific limitation on the type of the data processing application or on the content of the data it processes.

Optionally, for any first container group, the application in that first container group may obtain the matching relationship set and the container group information of the first container group, for example when the application starts, or when it receives data to be processed.

In some embodiments, the matching relationship set includes matching relationships between multiple first container groups and multiple storage directories. If a first container group is matched with a storage directory, that storage directory is used to store the data processed by the application in that first container group. Different first container groups are matched with different storage directories; that is, each first container group stores its processed data in its own storage directory.
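If the matching relationship set is held, as an illustrative assumption, as a dict from Pod name to storage directory, the requirement that different first container groups are matched with different storage directories becomes a simple one-to-one check. The function name `is_one_to_one` is hypothetical.

```python
from typing import Dict


def is_one_to_one(matches: Dict[str, str]) -> bool:
    """Check that no storage directory is matched with more than one Pod.

    Dict keys (Pod names) are unique by construction, so the set is one-to-one
    exactly when the directory values contain no duplicates.
    """
    return len(set(matches.values())) == len(matches)
```

A management component could run such a check before trusting a loaded set, since a duplicated directory would reintroduce the shared-file problem described in the background.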

In some embodiments, the container group information may be the identifier of the container group. For example, the identifier of the container group is exposed as the environment variable POD_NAME, so that the application can read the value of POD_NAME from its environment and thereby obtain the container group identifier.
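Reading the identifier inside the application can be sketched as below. In Kubernetes such a variable is commonly populated through the downward API, with the container's `env` entry using `valueFrom.fieldRef.fieldPath: metadata.name`; the source does not say how POD_NAME is set, so treat that as an assumption. The function name `current_pod_name` is hypothetical.

```python
import os


def current_pod_name() -> str:
    """Read the container group identifier from the POD_NAME variable."""
    name = os.environ.get("POD_NAME")
    if not name:
        # Fail fast: without an identifier the Pod cannot find its directory.
        raise RuntimeError("POD_NAME is not set in the environment")
    return name
```

Failing at startup, rather than falling back to a shared default directory, keeps a misconfigured Pod from writing into another Pod's directory.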

Step S202: determining, by the application in the first container group and according to the container group information and the matching relationship set, the storage directory matched with the first container group, to obtain the target storage directory.

For example, the matching relationship set already includes a matching relationship for the first container group. In this case, for any first container group, the application in that first container group can find the matching relationship for the first container group in the set and determine the target storage directory from it.

As another example, the matching relationship set may or may not include a matching relationship for the first container group. In this case, for any first container group, the application in that first container group first determines whether the matching relationship set includes a matching relationship for the first container group. If it does, the target storage directory is determined from that matching relationship; if it does not, any storage directory that is not matched with a first container group can be determined as the target storage directory.

Step S203: The application in the first container group stores the data it has finished processing in the target storage directory.

Optionally, for any first container group, after the application in that container group finishes processing the data it received, it stores the processed data in the target storage directory. For example, a target file is kept in the target storage directory, and the processed data is written to that target file. Because each storage directory holds its own target file, the target files in different storage directories are different.
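The write in step S203 might look like the following sketch. The file name `data.log` is taken from the FIG. 4 example. Appending one record per line and the helper name `store_processed_data` are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable

TARGET_FILE_NAME = "data.log"  # file name borrowed from the FIG. 4 example


def store_processed_data(target_dir: str, records: Iterable[str]) -> Path:
    """Append processed records, one per line, to the target file in this Pod's own directory."""
    path = Path(target_dir) / TARGET_FILE_NAME
    path.parent.mkdir(parents=True, exist_ok=True)
    # Only this Pod writes to this file, so appends from different Pods never interleave.
    with path.open("a", encoding="utf-8") as f:
        for record in records:
            f.write(record + "\n")
    return path
```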

Based on the scheme defined by steps S201 to S203, in the embodiments of the present invention, the data processed by applications in different container groups is written into different directories. The application in a first container group obtains the matching relationship set and the container group information of that first container group. It then uses them to determine the storage directory matched with the first container group, which serves as the target storage directory, and stores the data it has processed there. The matching relationship set includes matching relationships between multiple first container groups and multiple storage directories, and different first container groups are matched with different storage directories.

Notably, in the above process, the target storage directory of the current first container group is determined accurately from the matching relationship set and the container group information. Since different first container groups are matched with different storage directories, the target storage directory holds only the data processed by the application in the current first container group. When each application stores its processed data in its own target storage directory, the data from applications in different first container groups ends up in different directories, and therefore in different files. This avoids data loss or corruption and improves the reliability of data processing.

It can be seen that the solution provided by the present application writes the data processed by applications in different container groups into different directories, thereby improving the reliability of data processing. It thus solves a technical problem in the related art: applications in the various container groups write their processed data into the same file, which easily leads to data loss or corruption and results in poor data processing reliability.

In an optional embodiment, to determine the target storage directory, the application in the first container group uses the container group information to determine whether the matching relationship set includes the matching relationship corresponding to the first container group. If it does, the storage directory in that matching relationship is determined as the target storage directory. If it does not, a first storage directory is determined and used as the target storage directory. Here, a first storage directory is a storage directory that is not matched with any first container group.
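The branch described above can be sketched as follows. The sketch assumes the matching relationship set is held as a dictionary from storage-directory identifier to container group identifier, mirroring the "01-Pod name1" entries of FIG. 4. `resolve_target_directory` and the `pick_unmatched` callback are hypothetical names; the callback stands in for whichever first-storage-directory selection strategy is used.

```python
from typing import Callable, Dict


def resolve_target_directory(
    pod_name: str,
    matches: Dict[str, str],
    pick_unmatched: Callable[[Dict[str, str]], str],
) -> str:
    """Return the storage directory matched to pod_name.

    matches maps a storage-directory identifier to the container group that owns it,
    for example {"01": "pod-1"}.
    """
    for directory, owner in matches.items():
        if owner == pod_name:
            # Existing match: the Pod was restarted, so reuse its directory.
            return directory
    # No match: treat the Pod as new and pick a directory not matched to any Pod.
    return pick_unmatched(matches)
```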

Optionally, the matching relationship set may record the matches between multiple first container groups and multiple storage directories as pairs of container group information and storage directory identifiers. The container group information may be the identifier of the container group. The application may obtain the matching relationship set from a first storage area. This area may be a storage area in a cloud storage service, a storage area in a distributed coordination service, or another type of storage area.

Optionally, for any first container group, the application in that container group may check whether the matching relationship set contains the container group information of the first container group. If it does, the set is determined to include the matching relationship corresponding to the first container group. If it does not, the set is determined not to include that matching relationship.

In some embodiments, once the application in the first container group has finished starting up, it obtains the matching relationship set. It then uses the container group information to determine whether the set includes the matching relationship corresponding to the first container group. If it does, the application in the first container group has been restarted abnormally. The storage directory in that matching relationship can therefore be taken directly as the target storage directory and reused. Here, the matching relationship corresponding to the first container group is the matching relationship that contains the container group information of that first container group.

In some embodiments, if the matching relationship set does not include the matching relationship corresponding to the first container group, the application in that container group has probably not processed any data before. For example, the first container group may have just been created. In this case, the application determines a first storage directory and uses it as the target storage directory. Here, a first storage directory is a storage directory that is not matched with any first container group.

In some embodiments, the storage directories are located in a second storage area that is different from the first storage area. The second storage area may be a disk in the distributed system or another type of storage area.

For example, the application in the first container group may identify the storage directories that already exist in the second storage area, select a first storage directory from among them, and use it as the target storage directory. As another example, the application may directly create a new storage directory in the second storage area (that is, create the first storage directory) and use it as the target storage directory.
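Both options can be combined in one sketch. It first looks under the second storage area for an existing directory that no container group is matched with, and creates a new one only if none is found. The local path `storage_root`, the caller-chosen `new_name`, and the function name are assumptions; a real deployment would point `storage_root` at the shared disk.

```python
import os
from typing import Dict


def pick_or_create_directory(storage_root: str, matches: Dict[str, str], new_name: str) -> str:
    """Return an existing directory under storage_root that no Pod is matched to,
    or create storage_root/new_name if every existing directory is taken."""
    taken = set(matches)  # directory identifiers already matched to some Pod
    os.makedirs(storage_root, exist_ok=True)
    with os.scandir(storage_root) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        if entry.is_dir() and entry.name not in taken:
            return entry.name  # reuse an existing, unmatched directory
    os.makedirs(os.path.join(storage_root, new_name), exist_ok=True)
    return new_name
```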

It should be noted that when the matching relationship corresponding to the first container group exists, the storage directory in that matching relationship is used as the target storage directory instead of a newly generated one. This prevents the number of directories from growing without bound and reduces fragmented files as well as upload and maintenance costs. When no such matching relationship exists, a first storage directory is determined and used as the target storage directory. This ensures that different first container groups never write data to the same file, which improves the reliability of data processing.

In an optional embodiment, after the first storage directory is determined as the target storage directory, the application in the first container group may add a matching relationship between the first container group and the first storage directory to the matching relationship set.

For example, the matching relationship set is stored in the first storage area. The application in the first container group adds the matching relationship between the first container group and the first storage directory to the set to obtain an updated matching relationship set. It then uploads the updated set to the first storage area to replace the old one.
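A minimal sketch of this read-modify-write update follows. It uses a local JSON file as a stand-in for the first storage area, which is an assumption; the text mentions cloud storage or a distributed coordination service. The file is replaced atomically via `os.replace`. However, the sketch does not guard against two container groups updating the set at the same time. A real store would need to handle that, for example with a conditional update if the service supports one.

```python
import json
import os
import tempfile
from typing import Dict


def load_matches(path: str) -> Dict[str, str]:
    """Read the matching relationship set; a missing file means no matches yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def record_match(path: str, directory: str, pod_name: str) -> Dict[str, str]:
    """Add directory -> pod_name to the set and replace the stored copy."""
    matches = load_matches(path)
    matches[directory] = pod_name
    # Write to a temporary file and rename it, so readers never see a half-written set.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(matches, f)
    os.replace(tmp, path)
    return matches
```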

It should be noted that the above process shares updates to the matching relationships among the first container groups. Without these shared updates, one storage directory could end up in use by several first container groups at once. Sharing them prevents this and further improves the reliability of data processing.

In an optional embodiment, to determine the first storage directory, the application in the first container group obtains multiple preset directory numbers. For the Nth directory number, it determines whether a second storage directory exists in the matching relationship set, where the second storage directory is the storage directory that uses the Nth directory number. If it does not exist, that second storage directory is determined as the first storage directory. If it does exist, N is updated and the check is repeated for the new second storage directory. This continues until a second storage directory is found that does not exist in the matching relationship set, and that directory is determined as the first storage directory.

Optionally, the preset directory numbers may be stored in the first storage area, in the second storage area, or in some area other than these two.

In some embodiments, different directory numbers serve as the identifiers of different storage directories. That is, the application in the first container group can generate the storage directory corresponding to a directory number from that number. The existence of a directory number does not imply that its storage directory exists. For example, if the directory numbers are 01 to 10, there are 10 directory numbers, but there may be 10 or fewer storage directories.

Optionally, after obtaining the preset directory numbers, the application in the first container group traverses them starting from the Nth directory number. For each Nth directory number, it determines whether the second storage directory exists in the matching relationship set. If it does not, the second storage directory is not occupied by another first container group and is available, so it can be determined as the first storage directory. The preset directory numbers may be sorted by value. For example, if the directory numbers are 01 to 10, then 01 is the first directory number, 02 is the second, and so on. The preset directory numbers may also be sorted in other ways.

If the second storage directory exists in the matching relationship set, it is already occupied by another first container group. In this case, the application in the current first container group updates N; for example, if N is currently 1, it is incremented to 2. The check is then repeated for the new second storage directory, which is the storage directory that uses the new Nth directory number. This continues until a second storage directory is found that does not exist in the matching relationship set, and that directory is determined as the first storage directory.
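The traversal over directory numbers can be sketched as below. The sketch assumes the numbers are zero-padded strings such as "01" to "10" sorted by value, and that the matching relationship set is a dictionary keyed by directory number. The code uses a 0-based index where the text counts N from 1. Returning `None` when every number is taken is an assumption; the text does not say what happens in that case.

```python
from typing import Dict, Optional, Sequence


def first_free_directory(directory_numbers: Sequence[str], matches: Dict[str, str]) -> Optional[str]:
    """Walk the preset directory numbers in order and return the first one
    that no entry in the matching relationship set uses."""
    n = 0  # 0-based index; corresponds to N = n + 1 in the text
    while n < len(directory_numbers):
        candidate = directory_numbers[n]  # the "second storage directory" for this N
        if candidate not in matches:
            return candidate  # unoccupied: becomes the first storage directory
        n += 1  # occupied by another Pod: update N and check again
    return None  # every preset number is in use
```

Usage: `first_free_directory([f"{i:02d}" for i in range(1, 11)], {"01": "pod-1", "02": "pod-2"})` skips the two occupied numbers and selects "03".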

In some embodiments, the storage directories are located in the second storage area. When determining the second storage directory as the first storage directory, the application in the first container group checks whether the second storage directory exists in the second storage area. If it exists, it is directly determined as the first storage directory. If it does not exist, the second storage directory is created in the second storage area according to the current Nth directory number and then determined as the first storage directory.
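The following sketch shows the existence check and on-demand creation. It assumes the second storage area is mounted as a local path `storage_root` and that each directory is named after its directory number. Both are assumptions, and `ensure_directory` is a hypothetical name.

```python
import os


def ensure_directory(storage_root: str, directory_number: str) -> str:
    """Return the path of the directory named after directory_number under the
    second storage area, creating it only if it does not exist yet."""
    path = os.path.join(storage_root, directory_number)
    if not os.path.isdir(path):
        os.makedirs(path, exist_ok=True)  # exist_ok tolerates a concurrent creator
    return path
```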

It should be noted that selecting storage directories not occupied by other first container groups according to their directory numbers makes the selection orderly. This improves the efficiency and accuracy of determining the first storage directory and, in turn, the reliability of data processing.

In an optional embodiment, the distributed system further includes a second container group, which is a container group running a directory management application. Through the application in the second container group, the distributed system determines the storage directories that already exist in the distributed system, obtaining multiple third storage directories. For each third storage directory, the application in the second container group determines whether the matching relationship set includes that directory. If it does, the application determines whether the first container group matched with the third storage directory in the set is running. If that first container group is not running, the application deletes the matching relationship corresponding to the third storage directory from the matching relationship set.

Optionally, the directory management application manages the storage directories of this embodiment. Its management tasks include, but are not limited to, the following: deleting storage directories, compressing the data in storage directories, uploading the data in storage directories to a target server, and deleting the matching relationships of storage directories from the matching relationship set.

In some embodiments, the storage directories are located in the second storage area, which may be a disk in the distributed system. FIG. 3 is a schematic diagram of the operation of an application in an optional second container group according to an embodiment of the present invention. As shown in FIG. 3, the application in the second container group traverses the disk (i.e., the second storage area) to determine the storage directories that already exist in the distributed system, obtaining multiple third storage directories.

Optionally, after the third storage directories are determined, as shown in FIG. 3, the application in the second container group determines for each one whether the matching relationship set includes it. For example, it checks whether the set contains the directory identifier of the third storage directory. If the identifier is present, the set includes the third storage directory; otherwise, it does not. As also shown in FIG. 3, the application in the second container group obtains the matching relationship set before traversing the disk for the third storage directories.
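The traversal and the membership check from FIG. 3 can be sketched together. The sketch assumes the second storage area is reachable as a local path and that the matching relationship set is a dictionary keyed by directory identifier. The function name `classify_directories` is hypothetical, and the sorted output is an illustrative choice.

```python
import os
from typing import Dict, List, Tuple


def classify_directories(storage_root: str, matches: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Traverse storage_root and split its directories (the third storage directories)
    into those recorded in the matching relationship set and those that are not."""
    with os.scandir(storage_root) as entries:
        existing = sorted(e.name for e in entries if e.is_dir())  # ignore plain files
    recorded = [d for d in existing if d in matches]
    unrecorded = [d for d in existing if d not in matches]
    return recorded, unrecorded
```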

Optionally, when the matching relationship set does not include a third storage directory, the application in the second container group may either delete or retain that third storage directory.

Optionally, when the matching relationship set includes a third storage directory, the application in the second container group determines whether the first container group matched with that directory in the set is running. For example, as shown in FIG. 3, before traversing the disk for the third storage directories, the application obtains the container group information of all running first container groups, forming a surviving container list. It then uses this list for the check. If the container group information of the matched first container group is in the surviving container list, that first container group is running; otherwise, it is not running.

In some embodiments, as shown in FIG. 3, if the first container group matched with the third storage directory is running, the third storage directory is left untouched. If it is not running, that first container group ran previously and has since been deleted. Therefore, as shown in FIG. 3, the application in the second container group deletes the matching relationship corresponding to the third storage directory from the matching relationship set, so that other first container groups can later use that directory.
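The following sketches the release step. It assumes the surviving container list has already been fetched and is passed in as an iterable of container group identifiers; fetching it, for instance from the orchestrator's API, is outside the sketch. `release_stale_matches` is a hypothetical name. It modifies the dictionary in place and returns the directories it freed.

```python
from typing import Dict, Iterable, List


def release_stale_matches(
    matches: Dict[str, str],
    recorded_dirs: Iterable[str],
    alive_pods: Iterable[str],
) -> List[str]:
    """Remove matches whose Pod is not in the surviving container list.

    Returns the released directory identifiers; matches is modified in place."""
    alive = set(alive_pods)  # surviving container list
    released = []
    for directory in recorded_dirs:
        owner = matches.get(directory)
        if owner is not None and owner not in alive:
            del matches[directory]  # Pod was deleted: free the directory for reuse
            released.append(directory)
    return released
```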

It should be noted that this process releases the storage directories of first container groups that are no longer working. Otherwise, some storage directories would remain occupied by deleted first container groups, and the number of directories in the distributed system would keep growing over time. Releasing them avoids wasting resources of the distributed system.

In an optional embodiment, when the first container group matched with the third storage directory is not running, the distributed system uses the application in the second container group to upload the data in the third storage directory to a target server. The same application then deletes that data and the third storage directory itself.

Optionally, as shown in FIG. 3, when the first container group matched with the third storage directory is not running, the application in the second container group compresses the target file in the third storage directory into a compressed file. It then uploads the compressed file to the target server. The target server may be a server hosting a cloud storage service or a server providing another service.

Optionally, as shown in FIG. 3, after the compressed file has been uploaded to the target server, the application in the second container group may also delete the data in the third storage directory and the third storage directory itself, for example by removing the third storage directory from the second storage area.
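The compress, upload, and delete sequence can be sketched with the standard library. The sketch assumes `upload` is a caller-supplied function that raises on failure; it is hypothetical, and no particular storage service API is implied. Because deletion runs only after `upload` returns, a failed upload leaves the directory in place. The whole directory is archived here, which simplifies the text's step of compressing only the target file.

```python
import os
import shutil
import tempfile
from typing import Callable


def archive_and_remove(directory: str, upload: Callable[[str], None]) -> None:
    """Compress directory into a .tar.gz, hand the archive to upload(), then delete
    the directory; the temporary archive is always cleaned up."""
    staging = tempfile.mkdtemp()
    try:
        base = os.path.join(staging, os.path.basename(os.path.normpath(directory)))
        archive = shutil.make_archive(base, "gztar", root_dir=directory)
        upload(archive)  # raises on failure, so data is deleted only after a successful upload
        shutil.rmtree(directory)
    finally:
        shutil.rmtree(staging, ignore_errors=True)
```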

It should be noted that this process cleans up storage directories that are no longer needed in the distributed system while also backing up the files they contain. This further avoids wasting resources of the distributed system.

In an optional embodiment, the application in the second container group first determines whether the matching relationship set includes the third storage directory. If the set does not include it, the distributed system uses the application in the second container group to obtain preset configuration information and to decide, according to that information, how to handle the third storage directory. The configuration information indicates whether third storage directories not recorded in the matching relationship set may be deleted.

In some embodiments, the configuration information may be preset in the distributed system and indicates whether third storage directories not recorded in the matching relationship set may be deleted.

When the matching relationship set does not include the third storage directory, the application in the second container group obtains the configuration information and handles the third storage directory accordingly. For example, as shown in FIG. 3, suppose the configuration information indicates "delete third storage directories not recorded in the matching relationship set". In that case, the data in the third storage directory is uploaded to the target server, and then that data and the third storage directory are deleted. Conversely, if the configuration information indicates "do not delete third storage directories not recorded in the matching relationship set", the third storage directory is retained.
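A sketch of the configuration-driven branch follows. It assumes the preset configuration arrives as an environment variable named `DELETE_UNRECORDED_DIRS`; both the name and the accepted values are hypothetical. It also assumes the upload-and-delete routine is passed in as `cleanup`.

```python
import os
from typing import Callable


def delete_unrecorded_allowed(env_var: str = "DELETE_UNRECORDED_DIRS") -> bool:
    """Read the preset configuration (hypothetical variable name; defaults to 'do not delete')."""
    return os.environ.get(env_var, "false").strip().lower() in {"1", "true", "yes"}


def handle_unrecorded_directory(directory: str, allow_delete: bool, cleanup: Callable[[str], None]) -> str:
    """Apply the configuration to a directory missing from the matching relationship set."""
    if allow_delete:
        cleanup(directory)  # e.g. upload the data, then delete it and the directory
        return "deleted"
    return "retained"
```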

It should be noted that this process makes the handling of third storage directories more flexible, which broadens the applicability of the present application.

In an optional embodiment, FIG. 4 is a second schematic diagram of an optional data processing method according to an embodiment of the present invention. An optional working process of the distributed system in this embodiment is described with reference to FIG. 4. As shown in FIG. 4, the distributed system includes multiple first container groups (Pod1, Pod2, and so on in FIG. 4) and a second container group (the management Pod in FIG. 4). The applications in first container groups such as Pod1 and Pod2 each obtain the matching relationship set from the first storage area. FIG. 4 shows the matching relationships in the set as "01-Pod name1", "02-Pod name2", "03-Pod name8", and so on. Here, "01-Pod name1" indicates that storage directory No. 01 is matched with Pod1, and the other entries are read the same way.
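The entries in FIG. 4 may simply be a visual notation. If the set were stored as strings of that form, it could be parsed as in the sketch below; the serialization format and the name `parse_matches` are assumptions. Splitting only on the first hyphen keeps container group names that themselves contain hyphens intact.

```python
from typing import Dict, Iterable


def parse_matches(entries: Iterable[str]) -> Dict[str, str]:
    """Turn entries of the form "<directory number>-<Pod name>" into a dict."""
    matches = {}
    for entry in entries:
        directory, sep, pod_name = entry.partition("-")  # split on the first hyphen only
        if not sep or not directory or not pod_name:
            raise ValueError(f"malformed entry: {entry!r}")
        matches[directory.strip()] = pod_name.strip()
    return matches
```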

Further, as shown in FIG. 4, the applications in Pod1 and Pod2 each determine their matched storage directory from the container group information and the matching relationship set, obtaining the target storage directory. In FIG. 4, for example, the target storage directory of Pod1 is storage directory No. 01 and that of Pod2 is storage directory No. 02. The applications in Pod1 and Pod2 then store their processed data in the target file ("data.log" in FIG. 4) in their respective target storage directories in the second storage area.

Optionally, as shown in FIG. 4, the application in the second container group of the distributed system (the management Pod in FIG. 4) can manage the matching relationship set. For example, it can delete the matching relationships of storage directories whose first container groups have stopped working or become abnormal. The second container group can also upload the data in storage directories that are to be deleted to the target server.

In an optional embodiment, FIG. 5 is a schematic diagram of the operation of an application in an optional first container group according to an embodiment of the present invention. The process by which that application determines the target storage directory is described with reference to FIG. 5. As shown in FIG. 5, the application in each first container group obtains the container group information of its own first container group. It then determines whether the matching relationship set includes the matching relationship corresponding to that first container group. If it does, the storage directory in that matching relationship is reused directly, i.e., determined as the target storage directory.

As shown in FIG. 5, if the matching relationship set does not contain a matching relationship for the first container group, the application in that group obtains a plurality of preset directory numbers and traverses them in order. For the Nth directory number, it checks whether the matching relationship set already contains a storage directory that uses that number, that is, whether a second storage directory exists. If no such directory exists, the storage directory is determined as the target storage directory, and a corresponding matching relationship is established in the set. If one does exist, the traversal continues with the next directory number. This repeats until a directory number is found whose storage directory is not in the matching relationship set, and that storage directory is determined as the target storage directory.
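The complete flow of FIG. 5 (reuse an existing match, otherwise take the first unused preset directory number and record it) can be sketched as follows. Several details here are assumptions: the dictionary model, two-digit directory numbers such as `"01"`, and returning `None` when every number is taken. The patent text does not specify what happens when all preset directories are in use.

```python
from typing import Dict, List, Optional


def allocate_directory(matches: Dict[str, str], pod_name: str,
                       directory_numbers: List[str]) -> Optional[str]:
    """Determine the target storage directory for pod_name (FIG. 5 flow)."""
    # Reuse the existing matching relationship if the Pod already has one.
    if pod_name in matches:
        return matches[pod_name]
    used = set(matches.values())  # directory numbers already matched to some Pod
    for number in directory_numbers:  # traverse the preset numbers in order
        if number not in used:  # no second storage directory uses this number
            matches[pod_name] = number  # establish the new matching relationship
            return number
    return None  # all preset directories are taken (behaviour not specified in the text)
```

With `directory_numbers = ["01", "02", "03"]` and an empty set, `pod-1` receives `"01"`, `pod-2` receives `"02"`, and a second call for `pod-1` returns `"01"` again. The sketch checks and records the match in an in-memory dictionary. In a real cluster, several Pods may start at the same time, so the check and the record would presumably need to be a single atomic update against a shared store. The text here does not describe how that is done.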

It can be seen that the solution provided by the present application writes the data processed by applications in different container groups into different directories, which improves the reliability of data processing. This solves a technical problem in the related art, where applications in all container groups write their processed data into the same file. That approach easily leads to data loss or corruption and therefore to poor data processing reliability.

Example 2

According to an embodiment of the present invention, an embodiment of a data processing apparatus is provided. FIG. 6 is a schematic diagram of an optional data processing apparatus according to an embodiment of the present invention. As shown in FIG. 6, the apparatus is applied to a distributed system that includes a plurality of first container groups, each of which is a container group running a data processing application. The apparatus includes:

a first acquisition module 601, configured to acquire, through the application in the first container group, the matching relationship set and the container group information of the first container group, where the matching relationship set includes matching relationships between a plurality of first container groups and a plurality of storage directories, and different first container groups are matched with different storage directories;

a first determination module 602, configured to determine, through the application in the first container group, the storage directory matched with the first container group according to the container group information and the matching relationship set, to obtain the target storage directory; and

a storage module 603, configured to store, through the application in the first container group, the data that the application has finished processing in the target storage directory.

It should be noted that the first acquisition module 601, the first determination module 602, and the storage module 603 correspond to steps S201 to S203 of the above embodiment. The three modules implement the same examples and application scenarios as the corresponding steps, but are not limited to the content disclosed in Example 1.

Optionally, the first determination module 602 further includes the following submodules.
- A judgment submodule, configured to judge, according to the container group information, whether the matching relationship set includes a matching relationship corresponding to the first container group.
- A first determination submodule, configured to determine the storage directory in that matching relationship as the target storage directory if the set includes it.
- A second determination submodule, configured to determine a first storage directory and take it as the target storage directory if the set does not include it. Here, the first storage directory is a storage directory not yet matched with any first container group.

Optionally, the data processing apparatus further includes a first processing module, configured to establish a matching relationship between the first container group and the first storage directory in the matching relationship set.

Optionally, the second determination submodule further includes the following units.
- An acquisition unit, configured to acquire a plurality of preset directory numbers.
- A judgment unit, configured to judge, for the Nth directory number among them, whether a second storage directory exists in the matching relationship set. The second storage directory is the storage directory that uses the Nth directory number.
- A first determination unit, configured to determine the second storage directory as the first storage directory if it does not exist in the matching relationship set.
- A second determination unit, configured to act if the second storage directory exists in the matching relationship set. In that case, it updates N and repeats the judgment for the new second storage directory until a new second storage directory is found that does not exist in the matching relationship set. It then determines that new second storage directory as the first storage directory.

Optionally, the data processing apparatus further includes the following modules.
- A second determination module, configured to determine, through the application in a second container group, the storage directories that already exist in the distributed system, to obtain a plurality of third storage directories.
- A first judgment module, configured to judge, through the application in the second container group, whether the matching relationship set includes each third storage directory.
- A second judgment module, configured to judge, through the application in the second container group and when the third storage directory is included, whether the first container group matched with it in the matching relationship set is in a running state.
- A second processing module, configured to delete, through the application in the second container group, the matching relationship corresponding to the third storage directory from the matching relationship set when the matched first container group is not in a running state.
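The cleanup check performed by the second container group can be sketched as a pure function over the same dictionary model. The function takes three inputs: the list of existing directories (for example, obtained by listing the storage root), and a caller-supplied `is_running` predicate, which might in practice query the Pod phase from the Kubernetes API. Both inputs are assumptions, and the function name is hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def prune_stale_matches(matches: Dict[str, str],
                        existing_directories: Iterable[str],
                        is_running: Callable[[str], bool]) -> List[str]:
    """Delete matching relationships whose Pod is no longer running.

    Returns the third storage directories whose relationship was removed,
    so that a later step can archive and delete them.
    """
    owner_of = {directory: pod for pod, directory in matches.items()}  # directory -> Pod
    stale = []
    for directory in existing_directories:
        pod = owner_of.get(directory)
        if pod is None:
            continue  # not recorded in the set; handled by the configuration check instead
        if not is_running(pod):
            del matches[pod]  # remove the stale matching relationship
            stale.append(directory)
    return stale
```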

Optionally, the data processing apparatus further includes two further modules.
- A third processing module, configured to upload, through the application in the second container group, the data under the third storage directory to a target server.
- A fourth processing module, configured to delete, through the application in the second container group, the data under the third storage directory and then the third storage directory itself.
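The upload-then-delete order can be sketched with the standard library. The `upload` callable stands in for the transfer to the target server, which the text does not describe. It is a hypothetical placeholder, not a real client API.

```python
import shutil
from pathlib import Path
from typing import Callable


def archive_and_remove(directory: Path, upload: Callable[[Path], None]) -> None:
    """Upload every file under a stale storage directory, then delete the directory."""
    for path in sorted(directory.rglob("*")):
        if path.is_file():
            upload(path)  # hypothetical uploader for the target server
    shutil.rmtree(directory)  # removes the data and the directory itself
```

Because deletion happens only after every upload call has returned, an exception raised during upload leaves the directory and its data in place.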

Optionally, the data processing apparatus further includes two further modules.
- A second acquisition module, configured to acquire, through the application in the second container group, preset configuration information when the matching relationship set does not include the third storage directory. The configuration information indicates whether a third storage directory not recorded in the matching relationship set may be deleted.
- A third determination module, configured to determine, through the application in the second container group, how to handle the third storage directory according to the configuration information.
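Handling of unrecorded directories can be sketched with a boolean flag. The flag name `allow_delete_unrecorded` and the two action labels are assumptions. The text only says that the configuration indicates whether such directories may be deleted.

```python
from typing import Dict, Iterable, List


def find_unrecorded_directories(matches: Dict[str, str],
                                existing_directories: Iterable[str]) -> List[str]:
    """Third storage directories that no matching relationship refers to."""
    recorded = set(matches.values())
    return [d for d in existing_directories if d not in recorded]


def plan_unrecorded(matches: Dict[str, str], existing_directories: Iterable[str],
                    allow_delete_unrecorded: bool) -> Dict[str, str]:
    """Map each unrecorded directory to an action chosen from the configuration flag."""
    action = "delete" if allow_delete_unrecorded else "keep"
    return {d: action for d in find_unrecorded_directories(matches, existing_directories)}
```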

Example 3

According to another aspect of the embodiments of the present invention, a computer-readable storage medium is further provided. A computer program is stored in the medium and is configured to execute the above data processing method when run.

Example 4

According to another aspect of the embodiments of the present invention, an electronic device is further provided. FIG. 7 is a schematic diagram of an optional electronic device according to an embodiment of the present invention. As shown in FIG. 7, the electronic device includes one or more processors and a memory for storing one or more programs. When executed by the one or more processors, the programs cause the processors to perform the above data processing method.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate that any embodiment is better or worse than another.

In the above embodiments of the present invention, each embodiment is described with its own emphasis. For any part not described in detail in one embodiment, refer to the relevant descriptions of the other embodiments.

It should be understood that the technical content disclosed in the embodiments of this application may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and an actual implementation may divide them differently. Multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, units, or modules. These may be electrical or take other forms.

Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units. That is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or may be integrated two or more at a time into one unit. An integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If an integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention may be embodied as a software product. This covers the solution in essence, the part contributing to the prior art, or all or part of the solution. The computer software product is stored in a storage medium. It includes several instructions that cause a computer device (such as a personal computer, a server, or a network device) to perform all or part of the steps of the methods in the embodiments of the present invention. The storage medium may be any medium that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

The above are only preferred embodiments of the present invention. Those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention. Such improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.

Claims (10)

1. A data processing method, applied to a distributed system, the distributed system including a plurality of first container groups, the first container groups being container groups running data processing applications, the method comprising:
acquiring, through an application in the first container group, a matching relationship set and container group information of the first container group, wherein the matching relationship set comprises matching relationships between a plurality of first container groups and a plurality of storage directories, and different first container groups are matched with different storage directories;
determining, through the application in the first container group, a storage directory matched with the first container group according to the container group information and the matching relationship set, to obtain a target storage directory;
and storing, through the application in the first container group, data processed by the application in the target storage directory.
2. The method of claim 1, wherein determining, through the application in the first container group, the storage directory matched with the first container group according to the container group information and the matching relationship set, to obtain the target storage directory, comprises:
judging, according to the container group information, whether the matching relationship set comprises a matching relationship corresponding to the first container group;
if the matching relationship corresponding to the first container group is included, determining the storage directory in that matching relationship as the target storage directory;
and if the matching relationship corresponding to the first container group is not included, determining a first storage directory and determining the first storage directory as the target storage directory, wherein the first storage directory is a storage directory not matched with any first container group.
3. The method of claim 2, wherein after determining the first storage directory as the target storage directory, the method further comprises:
establishing a matching relationship between the first container group and the first storage directory in the matching relationship set.
4. The method of claim 2, wherein determining the first storage directory comprises:
acquiring a plurality of preset directory numbers;
judging, for an Nth directory number of the plurality of directory numbers, whether a second storage directory exists in the matching relationship set, wherein the second storage directory is a storage directory using the Nth directory number;
if the second storage directory does not exist in the matching relationship set, determining the second storage directory as the first storage directory;
and if the second storage directory exists in the matching relationship set, updating N and repeating the step of judging whether a new second storage directory exists in the matching relationship set until the new second storage directory does not exist in the matching relationship set, and determining the new second storage directory as the first storage directory.
5. The method of claim 1, wherein the distributed system further comprises a second container group, the second container group being a container group running a directory management application, and the method further comprises:
determining, through the application in the second container group, the storage directories already existing in the distributed system, to obtain a plurality of third storage directories;
judging, through the application in the second container group and for each third storage directory, whether the matching relationship set comprises the third storage directory;
when the third storage directory is included, judging, through the application in the second container group, whether the first container group matched with the third storage directory in the matching relationship set is in a running state;
and when the first container group matched with the third storage directory is not in a running state, deleting, through the application in the second container group, the matching relationship corresponding to the third storage directory from the matching relationship set.
6. The method of claim 5, wherein, when the first container group matched with the third storage directory is not in a running state, the method further comprises:
uploading, through the application in the second container group, the data under the third storage directory to a target server;
and deleting, through the application in the second container group, the data under the third storage directory, and deleting the third storage directory.
7. The method of claim 5, wherein, after judging through the application in the second container group whether the matching relationship set comprises the third storage directory, the method further comprises:
when the third storage directory is not included, acquiring preset configuration information through the application in the second container group, wherein the configuration information indicates whether deletion of a third storage directory not recorded in the matching relationship set is allowed;
and determining, through the application in the second container group, a processing mode for the third storage directory according to the configuration information.
8. A data processing apparatus for use in a distributed system comprising a plurality of first container groups, the first container groups being container groups having data processing applications running thereon, the apparatus comprising:
a first acquisition module, configured to acquire, through an application in the first container group, a matching relationship set and container group information of the first container group, wherein the matching relationship set comprises matching relationships between a plurality of first container groups and a plurality of storage directories, and different first container groups are matched with different storage directories;
a first determining module, configured to determine, through the application in the first container group, a storage directory matched with the first container group according to the container group information and the matching relationship set, to obtain a target storage directory;
and a storage module, configured to store, through the application in the first container group, data processed by the application in the target storage directory.
9. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, wherein the computer program is arranged to execute the data processing method of any one of claims 1 to 7 when run.
10. An electronic device, comprising: one or more processors; and a memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to run a program, wherein the program is configured to perform the data processing method of any one of claims 1 to 7 when run.
CN202410840016.7A 2024-06-26 2024-06-26 Data processing method, device, computer readable storage medium and electronic device Pending CN118796355A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410840016.7A CN118796355A (en) 2024-06-26 2024-06-26 Data processing method, device, computer readable storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410840016.7A CN118796355A (en) 2024-06-26 2024-06-26 Data processing method, device, computer readable storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN118796355A true CN118796355A (en) 2024-10-18

Family

ID=93031028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410840016.7A Pending CN118796355A (en) 2024-06-26 2024-06-26 Data processing method, device, computer readable storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN118796355A (en)

Similar Documents

Publication Publication Date Title
US7805469B1 (en) Method and apparatus for splitting and merging file systems
JP2019220195A (en) System and method for implementing data storage service
CN112256359A (en) Microservice merging method, apparatus, electronic device and readable storage medium
CN110659259B (en) Database migration method, server and computer storage medium
US10318387B1 (en) Automated charge backup modelling
CN110956269A (en) Data model generation method, device, equipment and computer storage medium
US11150981B2 (en) Fast recovery from failures in a chronologically ordered log-structured key-value storage system
CN110825694A (en) Data processing method, device, equipment and storage medium
CN1552022A (en) File archival
CN111684437B (en) Staggered update key-value storage system ordered by time sequence
CN113157645A (en) Cluster data migration method, device, equipment and storage medium
US9176974B1 (en) Low priority, multi-pass, server file discovery and management
WO2025055766A1 (en) Database recovery method and apparatus, storage medium, and electronic device
CN117991987A (en) Method and device for processing orphan nodes, operating system and electronic equipment
CN118796355A (en) Data processing method, device, computer readable storage medium and electronic device
US11163636B2 (en) Chronologically ordered log-structured key-value store from failures during garbage collection
CN115454491A (en) Version deployment method and related device
CN111796972B (en) File hot-repair method, device, equipment and storage medium
CN114546731A (en) Workflow data recovery method and data recovery system
CN114415950A (en) Storage space allocation method and device
CN112988694A (en) Operation and maintenance method and device for batch management of network file systems by centralized management platform
CN113419743B (en) Comprehensive application script deployment method, device, equipment and storage medium
CN116662303B (en) Application change strategy generation method and device
CN110750259B (en) Component processing method and device
CN118796773A (en) A storage management method, device, storage management equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination