CN118689630A - A GPU service tidal deployment method and device based on K8s
- Publication number: CN118689630A
- Application number: CN202410663011.1A
- Authority: CN (China)
- Prior art keywords: tide, service, worker, gateway, node
- Legal status: Pending
Classifications
- G06F9/505 — Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine (e.g. CPUs, servers, terminals), considering the load
- G06F9/44505 — Configuring for program initiating, e.g. using registry, configuration files
- H04L47/20 — Traffic policing (traffic control in data switching networks; flow control; congestion control)
- H04L67/1004 — Server selection for load balancing (protocols in which an application is distributed across nodes in the network)
Description
Technical Field
The present invention relates to the technical field of distributed systems and resource management, and specifically provides a K8s-based GPU service tidal deployment method and device.
Background Art
In today's artificial intelligence and deep learning applications, GPU computing is one of the key factors in improving model training and inference performance. However, managing and scheduling GPU resources for distributed GPU services remains a challenge. Kubernetes has become the de facto standard for managing containerized applications, but it still faces difficulties in GPU resource management, such as static binding, low resource utilization, and inadequate scaling.
In traditional GPU resource management, inference tasks and training tasks are usually handled separately. Inference tasks generally have lower computational complexity and memory requirements, while training tasks need more compute and memory. To improve GPU utilization, existing solutions statically deploy and scale inference tasks while managing training tasks in a separate resource pool.
In the prior art, static allocation and binding of GPU services is a common approach, but it has the following drawbacks:
Static binding: static allocation and binding cannot satisfy dynamic service demands, resulting in low resource utilization. As models and traffic change, some GPU resources may remain idle while others become overloaded, degrading performance.
Scaling: existing techniques fall short in dynamically scaling GPU services. With static binding, scaling is typically triggered manually or on a schedule, or decided from after-the-fact observations such as GPU load and logs. Moreover, scaling is usually performed at the Pod level, which entails many additional operations, such as mounting and unmounting NFS volumes or downloading and cleaning up models. These operations run synchronously during Pod startup and teardown, slowing Pod creation and recycling, and the frequent mounting and unmounting of file systems caused by frequent scaling may trigger operating-system faults.
Low resource utilization: because inference and training tasks are handled separately, GPU resources may be idle or overloaded at any given moment and cannot be fully utilized.
It can be seen that traditional static deployment often fails to use GPU resources effectively, leading to either waste or shortage. The present invention therefore aims to manage and schedule GPU resources automatically and intelligently in dynamic GPU scenarios, so as to satisfy ever-changing model requirements and traffic conditions.
Summary of the Invention
In view of the above technical problems, the present invention proposes a K8s-based GPU service tidal deployment method and device to solve the problems of resource management and deployment optimization in dynamic GPU scenarios and to improve system efficiency and user experience.
Specifically, the following technical solutions are adopted:
In a first aspect, the present invention provides a K8s-based GPU service tidal deployment method, comprising:
deploying, on the K8s platform, a master-node tide service tide-master, a worker-node tide service tide-worker, and a gateway tide service tide-gateway;
defining the running information of GPU service instances through a custom resource definition (CRD), a schema sketch of which follows this enumeration;
the tide-master listening to the CRD and GPU resource information, and starting all tide-worker nodes of the tide-worker service at initialization according to the GPU resource information;
the tide-master dynamically adjusting the state and traffic-allocation strategy of the tide-worker service and the tide-gateway service according to the actual GPU resources, the GPU usage of the models on each tide-worker node, and the traffic observed by the tide-gateway.
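The patent does not disclose a concrete CRD schema. The following is a minimal Go sketch of what the custom resource's spec might carry (model-running information plus routing information); every field name and value here is an illustrative assumption, not the actual CRD used by the invention.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GPUTideServiceSpec is a hypothetical shape for the CRD described in the
// patent: it bundles the information a tide-worker needs to run a model and
// the routing information the tide-gateway needs to forward traffic.
type GPUTideServiceSpec struct {
	ModelName     string `json:"modelName"`     // model identifier loaded by tide-worker
	ModelURI      string `json:"modelURI"`      // where the model artifact lives (e.g. an NFS path)
	GPUMemoryMB   int    `json:"gpuMemoryMB"`   // GPU memory the model needs
	MinReplicas   int    `json:"minReplicas"`   // minimum tide-worker nodes bound to this model
	MaxReplicas   int    `json:"maxReplicas"`   // maximum tide-worker nodes bound to this model
	RoutePath     string `json:"routePath"`     // routing info: path tide-gateway matches on
	TrafficWeight int    `json:"trafficWeight"` // routing info: relative weight for load balancing
}

func main() {
	spec := GPUTideServiceSpec{
		ModelName:     "resnet50-infer",
		ModelURI:      "nfs://models/resnet50",
		GPUMemoryMB:   8192,
		MinReplicas:   1,
		MaxReplicas:   4,
		RoutePath:     "/v1/models/resnet50",
		TrafficWeight: 100,
	}
	out, _ := json.MarshalIndent(spec, "", "  ")
	fmt.Println(string(out))
}
```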
As an optional embodiment of the present invention, in the K8s-based GPU service tidal deployment method, the step of the tide-master listening to the CRD and GPU resource information and starting all tide-worker nodes at initialization according to the GPU resource information comprises:
starting the master-node tide service tide-master, the worker-node tide service tide-worker, and the gateway tide service tide-gateway;
the tide-master listening to the CRD and GPU resource information and starting all tide-worker nodes according to the GPU resource information, where each tide-worker node asynchronously loads all models defined in the CRD after it starts;
the tide-worker service polling the tide-master to obtain the model information or training tasks it needs to run (a polling sketch follows this list);
the tide-gateway service listening to the tide-master to obtain the mapping between models and the Pod IPs serving them.
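As a rough illustration of the tide-worker polling behaviour described above, the sketch below shows a worker loop that periodically asks an assumed tide-master HTTP endpoint for its current assignment and switches models when the association changes. The endpoint path, payload fields, and polling interval are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// Assignment is a hypothetical payload returned by tide-master describing
// which model (or training task) this tide-worker node should be running.
type Assignment struct {
	ModelName string `json:"modelName"`
	TaskType  string `json:"taskType"` // "inference" or "training"
}

// fetchAssignment polls an assumed tide-master endpoint for this node.
func fetchAssignment(masterURL, nodeID string) (*Assignment, error) {
	resp, err := http.Get(fmt.Sprintf("%s/assignment?node=%s", masterURL, nodeID))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var a Assignment
	if err := json.NewDecoder(resp.Body).Decode(&a); err != nil {
		return nil, err
	}
	return &a, nil
}

func main() {
	masterURL := "http://tide-master:8080" // assumed in-cluster service address
	nodeID := "tide-worker-0"
	current := ""

	for {
		a, err := fetchAssignment(masterURL, nodeID)
		if err == nil && a.ModelName != current {
			// The association changed: stop the old model and start the new one.
			fmt.Printf("switching from %q to %q (%s)\n", current, a.ModelName, a.TaskType)
			current = a.ModelName
		}
		time.Sleep(10 * time.Second) // polling interval is an assumption
	}
}
```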
As an optional embodiment of the present invention, in the K8s-based GPU service tidal deployment method, the step of the tide-master dynamically adjusting the state and traffic-allocation strategy of the tide-worker service and the tide-gateway service according to the actual resources, the GPU usage of the models on each tide-worker node, and the traffic of the tide-gateway comprises:
the tide-master periodically pulling the traffic information of the tide-gateway and the error rate and GPU utilization of the models running under the tide-worker service;
the tide-master determining, against thresholds, whether the association between tide-worker nodes and models, or the number of tide-gateway Pods, needs to be adjusted.
As an optional embodiment of the present invention, in the K8s-based GPU service tidal deployment method, the step of the tide-master determining against thresholds whether the association between tide-worker nodes and models, or the number of tide-gateway Pods, needs to be adjusted comprises:
if the traffic periodically pulled by the tide-master from the tide-gateway exceeds a preset traffic threshold;
or if the error rate of a model under the tide-worker service, as periodically pulled by the tide-master, exceeds a preset error rate;
or if the GPU utilization of a model under the tide-worker service, as periodically pulled by the tide-master, exceeds a preset GPU utilization;
then the tide-master determines that the association between tide-worker nodes and models, or the number of tide-gateway Pods, needs to be adjusted.
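The decision rule above can be captured compactly; the following Go sketch mirrors the stated condition, namely that any one metric exceeding its preset threshold triggers an adjustment. The metric names and threshold values are illustrative assumptions.

```go
package main

import "fmt"

// Metrics is a hypothetical snapshot the tide-master pulls each cycle from the
// tide-gateway (traffic) and from a tide-worker model (error rate, GPU usage).
type Metrics struct {
	RequestsPerSecond float64
	ErrorRate         float64 // fraction of failed requests
	GPUUtilization    float64 // 0.0 to 1.0
}

// Thresholds holds the preset limits mentioned in the patent; the concrete
// values used below are illustrative assumptions only.
type Thresholds struct {
	MaxRequestsPerSecond float64
	MaxErrorRate         float64
	MaxGPUUtilization    float64
}

// needsAdjustment mirrors the claimed decision rule: adjustment is required
// if any one of the three metrics exceeds its preset threshold.
func needsAdjustment(m Metrics, t Thresholds) bool {
	return m.RequestsPerSecond > t.MaxRequestsPerSecond ||
		m.ErrorRate > t.MaxErrorRate ||
		m.GPUUtilization > t.MaxGPUUtilization
}

func main() {
	t := Thresholds{MaxRequestsPerSecond: 500, MaxErrorRate: 0.05, MaxGPUUtilization: 0.85}
	m := Metrics{RequestsPerSecond: 620, ErrorRate: 0.01, GPUUtilization: 0.72}
	if needsAdjustment(m, t) {
		fmt.Println("adjust tide-worker/model association or tide-gateway Pod count")
	}
}
```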
As an optional embodiment of the present invention, the K8s-based GPU service tidal deployment method comprises:
the tide-worker service polling the tide-master to obtain model information or training tasks;
when the tide-worker service perceives that the association between a tide-worker node and a model has been adjusted, the tide-worker service shutting down the previously running model and switching to the adjusted model.
As an optional embodiment of the present invention, the K8s-based GPU service tidal deployment method comprises:
the tide-gateway service listening to the association between tide-worker nodes and models held by the tide-master, and forwarding the traffic requests of each model to the Pods of the corresponding tide-worker nodes;
after detecting that the association between tide-worker nodes and models has changed, the tide-gateway stopping the distribution of that model's traffic to the removed tide-worker nodes.
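A hedged sketch of how such a gateway-side routing view might look: an in-memory table mapping each model to the Pod IPs allowed to serve it, updated whenever the tide-master publishes a new association; Pod IPs that disappear from the list simply stop receiving that model's traffic. The structure and method names are assumptions, not the patent's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// routingTable is a hypothetical in-memory view the tide-gateway keeps of the
// model -> tide-worker Pod IP association published by the tide-master.
type routingTable struct {
	mu     sync.RWMutex
	byName map[string][]string // model name -> Pod IPs allowed to serve it
}

// update replaces the Pod IP list for a model; IPs no longer present stop
// receiving traffic, which corresponds to "removing" a tide-worker node.
func (r *routingTable) update(model string, podIPs []string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.byName[model] = podIPs
}

// pick returns a backend for the model using a trivial rotation over the list.
func (r *routingTable) pick(model string, seq int) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	ips := r.byName[model]
	if len(ips) == 0 {
		return "", false
	}
	return ips[seq%len(ips)], true
}

func main() {
	rt := &routingTable{byName: map[string][]string{}}
	rt.update("resnet50-infer", []string{"10.0.0.5", "10.0.0.6"})

	// Association change observed from tide-master: 10.0.0.6 is removed,
	// so traffic for this model is no longer sent to that node.
	rt.update("resnet50-infer", []string{"10.0.0.5"})

	if ip, ok := rt.pick("resnet50-infer", 1); ok {
		fmt.Println("forwarding request to", ip)
	}
}
```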
As an optional embodiment of the present invention, in the K8s-based GPU service tidal deployment method, defining the running information of GPU service instances through the CRD comprises: defining, through the CRD, the information required to run each GPU service instance's model together with its routing information.
In a second aspect, the present invention provides a K8s-based GPU service tidal deployment device, comprising:
a master-node tide module, which deploys the master-node tide service tide-master on the K8s platform;
a worker-node tide module, which deploys the worker-node tide service tide-worker on the K8s platform;
a gateway tide module, which deploys the gateway tide service tide-gateway on the K8s platform;
a custom resource definition module (CRD), which defines the running information and routing information of GPU service instances;
wherein the master-node tide module listens to the CRD module and GPU resource information, and starts all tide-worker nodes of the tide-worker service at initialization according to the GPU resource information;
and the master-node tide module dynamically adjusts the state and traffic-allocation strategy of the tide-worker service and the tide-gateway service according to the actual GPU resources, the GPU usage of the models on each tide-worker node, and the traffic of the tide-gateway.
In a third aspect, the present invention provides an electronic device comprising a processor and a memory, the memory being configured to store a computer-executable program which, when executed by the processor, causes the processor to perform the K8s-based GPU service tidal deployment method.
In a fourth aspect, the present invention provides a computer-readable recording medium storing a computer-executable program which, when executed, implements the K8s-based GPU service tidal deployment method.
Compared with the prior art, the present invention has the following beneficial effects:
Automation and intelligence: the K8s-based GPU service tidal deployment method of the present invention achieves automated dynamic deployment of GPU services through a custom CRD and a dynamic adjustment mechanism. The tide-master listens to the CRD and GPU resource information and intelligently adjusts the state and traffic-allocation strategy of the tide-worker and tide-gateway services according to actual conditions, realizing automated and intelligent service management.
Real-time responsiveness: through real-time monitoring, prediction, and dynamic adjustment, the method adapts promptly to model requirements and traffic changes, improving service flexibility and efficiency.
Resource optimization and higher utilization: by dynamically adjusting and optimizing GPU resource allocation, the method achieves optimal resource use. It allocates and adjusts resources according to model requirements and traffic conditions, avoiding waste and load imbalance and improving utilization and overall system efficiency. By dynamically scaling training and inference tasks and flexibly allocating GPU resources according to actual load, it further raises utilization; when resources are scarce, training tasks can be shrunk and inference tasks expanded, making more effective use of limited resources.
Scalability and extensibility: the method is based on the Kubernetes platform and scales well. It can quickly expand or reduce the number of running models according to actual demand, accommodating a growing number of models and changing traffic.
High availability and stability: through its dynamic deployment and adjustment mechanism, the method improves the availability and stability of GPU services. Based on real-time monitoring and prediction, it automatically adjusts and optimizes service configuration, avoiding failures or overload and improving system stability and reliability.
Real-time monitoring and control: all traffic is managed and monitored through the tide-gateway, so traffic can be rate-limited and degraded in real time, guaranteeing system stability and allowing timely adjustment to actual demand.
Therefore, whereas in normal static deployment GPU resources are statically assigned to different models, leaving some GPUs idle while others are overloaded, the K8s-based GPU service tidal deployment method of the present invention dynamically and intelligently allocates GPU resources according to actual model requirements and traffic, improving GPU utilization. Owing to the dynamic allocation and optimization of GPU resources together with real-time service load balancing, the technical solution allocates resources reasonably and optimizes system performance and efficiency; it adjusts promptly to model requirements and traffic changes, keeping the system in its best state and improving the execution speed and effectiveness of deep-learning and artificial-intelligence applications.
Brief Description of the Drawings
FIG. 1 is an overall structural diagram of a K8s-based GPU service tidal deployment device according to an embodiment of the present invention;
FIG. 2 is a processing flow chart of a K8s-based GPU service tidal deployment method according to an embodiment of the present invention;
FIG. 3 is a working sequence diagram of a K8s-based GPU service tidal deployment method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a computer-readable recording medium according to an embodiment of the present invention.
Detailed Description
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention.
Therefore, the following detailed description of the embodiments of the present invention is not intended to limit the scope of the claimed invention, but merely represents some embodiments of it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be noted that, where there is no conflict, the embodiments of the present invention and the features and technical solutions therein may be combined with one another.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
In the description of the present invention, it should be noted that terms such as "upper" and "lower" indicate orientations or positional relationships based on those shown in the drawings, or those in which the product of the invention is usually placed when in use, or those commonly understood by persons skilled in the art. Such terms are used only to facilitate and simplify the description; they do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and therefore cannot be understood as limiting the present invention. In addition, terms such as "first" and "second" are used only to distinguish descriptions and cannot be understood as indicating or implying relative importance.
Embodiment 1
The K8s-based GPU service tidal deployment method of this embodiment comprises:
deploying, on the K8s platform, a master-node tide service tide-master, a worker-node tide service tide-worker, and a gateway tide service tide-gateway;
defining the running information of GPU service instances through a custom resource definition (CRD);
the tide-master listening to the CRD and GPU resource information, and starting all tide-worker nodes of the tide-worker service at initialization according to the GPU resource information;
the tide-master dynamically adjusting the state and traffic-allocation strategy of the tide-worker service and the tide-gateway service according to the actual GPU resources, the GPU usage of the models on each tide-worker node, and the traffic of the tide-gateway.
The K8s-based GPU service tidal deployment method of this embodiment implements, on the K8s (Kubernetes) platform, a complete solution for the dynamic deployment of GPU services. The solution contains three services: the master-node tide service tide-master, the worker-node tide service tide-worker, and the gateway tide service tide-gateway, and supports a custom CRD (custom resource definition). The CRD defines the model-running information and routing information of each GPU service instance; the tide-master listens to the CRD and GPU resource information and, according to the actual GPU resources, the GPU usage of the models on each tide-worker node, and the traffic of the tide-gateway, dynamically adjusts the state and traffic-allocation strategy of the tide-worker and tide-gateway services, thereby scaling the running models dynamically and managing which model information each tide-worker Pod should run.
In the method of this embodiment, defining the running information of GPU service instances through the CRD comprises: defining, through the CRD, the information required to run each GPU service instance's model together with its routing information.
The K8s-based GPU service tidal deployment method of this embodiment has the following technical effects:
Automation and intelligence: through the custom CRD and the dynamic adjustment mechanism, the method achieves automated dynamic deployment of GPU services. The tide-master listens to the CRD and GPU resource information and intelligently adjusts the state and traffic-allocation strategy of the tide-worker and tide-gateway services according to actual conditions, realizing automated and intelligent service management.
Real-time responsiveness: through real-time monitoring, prediction, and dynamic adjustment, the method adapts promptly to model requirements and traffic changes, improving service flexibility and efficiency.
Resource optimization and higher utilization: by dynamically adjusting and optimizing GPU resource allocation, the method achieves optimal resource use, allocating and adjusting resources according to model requirements and traffic, avoiding waste and load imbalance, and improving utilization and overall system efficiency. By dynamically scaling training and inference tasks according to actual load, it further raises utilization; when resources are scarce, training tasks can be shrunk and inference tasks expanded, making more effective use of limited resources.
Scalability and extensibility: based on the Kubernetes platform, the method scales well and can quickly expand or reduce the number of running models according to actual demand, accommodating a growing number of models and changing traffic.
High availability and stability: through its dynamic deployment and adjustment mechanism, the method improves the availability and stability of GPU services, automatically adjusting and optimizing service configuration based on real-time monitoring and prediction, avoiding failures or overload, and improving system stability and reliability.
Real-time monitoring and control: all traffic is managed and monitored through the tide-gateway, so traffic can be rate-limited and degraded in real time, guaranteeing system stability and allowing timely adjustment to actual demand.
Therefore, compared with normal static deployment, where GPU resources are statically assigned to different models and some GPUs remain idle while others are overloaded, the method of this embodiment dynamically and intelligently allocates GPU resources according to actual model requirements and traffic, improving GPU utilization. With dynamic allocation and optimization of GPU resources and real-time service load balancing, the technical solution allocates resources reasonably, optimizes system performance and efficiency, and adjusts promptly to model requirements and traffic changes, keeping the system in its best state and improving the execution speed and effectiveness of deep-learning and artificial-intelligence applications.
Further, compared with traditional static deployment and scaling mechanisms, the method of this embodiment starts all Pods as soon as GPU resources are ready and handles operations such as NFS mounting and model downloading asynchronously when the CRD is defined, thereby reducing Pod startup and recycling time and improving overall system performance.
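To illustrate the idea of asynchronous model preparation at worker start-up, so that NFS mounting and model downloads do not block Pod start, here is a minimal Go sketch using goroutines; the model list and the loading step are placeholders, not the patent's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// preloadModels loads all CRD-declared models concurrently after a tide-worker
// Pod starts, so Pod startup is not blocked by NFS mounts or model downloads.
func preloadModels(models []string) {
	var wg sync.WaitGroup
	for _, m := range models {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			loadModel(name)
		}(m)
	}
	wg.Wait()
	fmt.Println("all models preloaded")
}

// loadModel stands in for mounting NFS or downloading weights into a local cache.
func loadModel(name string) {
	time.Sleep(100 * time.Millisecond) // simulated download time
	fmt.Println("loaded", name)
}

func main() {
	// In practice the model names would come from the CRD; these are illustrative.
	preloadModels([]string{"resnet50-infer", "bert-base-infer", "llama-7b-infer"})
}
```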
Specifically, in the method of this embodiment, the step of the tide-master listening to the CRD and GPU resource information and starting all tide-worker nodes at initialization according to the GPU resource information comprises:
starting the master-node tide service tide-master, the worker-node tide service tide-worker, and the gateway tide service tide-gateway;
the tide-master listening to the CRD and GPU resource information and starting all tide-worker nodes according to the GPU resource information, where each tide-worker node asynchronously loads all models defined in the CRD after it starts;
the tide-worker service polling the tide-master to obtain the model information or training tasks it needs to run;
the tide-gateway service listening to the tide-master to obtain the mapping between models and the Pod IPs serving them.
It can thus be seen that the method of this embodiment has the above initialization procedure, in which the CRD defines the model-running information and routing information. This allows the tide-master to obtain model-related information and routing information from the CRD in real time and to dynamically adjust the state and traffic-allocation strategy of the tide-worker and tide-gateway services accordingly.
Further, the method of this embodiment also has a periodic dynamic adjustment procedure. Specifically, the step of the tide-master dynamically adjusting the state and traffic-allocation strategy of the tide-worker and tide-gateway services according to the actual resources, the GPU usage of the models on each tide-worker node, and the traffic of the tide-gateway comprises:
the tide-master periodically pulling the traffic information of the tide-gateway and the error rate and GPU utilization of the models running under the tide-worker service;
the tide-master determining, against thresholds, whether the association between tide-worker nodes and models, or the number of tide-gateway Pods, needs to be adjusted.
GPU utilization refers to the degree to which a GPU device is actually used within a given period, generally expressed as a percentage. By monitoring GPU utilization, the load on GPU devices can be understood and reasonable resource adjustments made.
In the periodic dynamic adjustment procedure of this embodiment, traffic information, error rates, and GPU utilization are pulled periodically and adjustments are made dynamically against thresholds. This mechanism ensures the real-time behaviour and stability of the service and automatically adjusts GPU resource allocation according to the needs of different models.
Further, in the method of this embodiment, the step of the tide-master determining against thresholds whether the association between tide-worker nodes and models, or the number of tide-gateway Pods, needs to be adjusted comprises:
if the traffic periodically pulled by the tide-master from the tide-gateway exceeds a preset traffic threshold;
or if the error rate of a model under the tide-worker service, as periodically pulled by the tide-master, exceeds a preset error rate;
or if the GPU utilization of a model under the tide-worker service, as periodically pulled by the tide-master, exceeds a preset GPU utilization;
then the tide-master determines that the association between tide-worker nodes and models, or the number of tide-gateway Pods, needs to be adjusted.
Further, the method of this embodiment comprises:
the tide-worker service polling the tide-master to obtain model information or training tasks;
when the tide-worker service perceives that the association between a tide-worker node and a model has been adjusted, the tide-worker service shutting down the previously running model and switching to the adjusted model.
The method of this embodiment further comprises:
the tide-gateway service listening to the association between tide-worker nodes and models held by the tide-master, and forwarding the traffic requests of each model to the Pods of the corresponding tide-worker nodes;
after detecting that the association between tide-worker nodes and models has changed, the tide-gateway stopping the distribution of that model's traffic to the removed tide-worker nodes.
Referring to FIG. 2, the periodic dynamic adjustment procedure of the method of this embodiment comprises:
Step 1: the tide-master periodically pulls the traffic information, error rate, and GPU utilization of the models under the tide-gateway and tide-worker services.
Step 2: the tide-master determines, against thresholds, whether the association between workers and models, or the number of tide-gateway Pods, needs to be adjusted.
Step 3: if adjustment is required, the tide-master adjusts the model information run by the tide-worker service.
Step 4: after the tide-gateway detects that the tide-worker-to-model information has changed, it stops distributing that model's traffic to the removed tide-worker nodes.
Step 5: after the tide-worker service observes the new model information, it shuts down the previously running model and switches to the new model.
Referring to FIG. 3, the working sequence diagram of the method of this embodiment, the above flow chart and sequence diagram show that:
the tide-master listens to the CRD and GPU resource information, starts all tide-workers at initialization according to the GPU resource information, and dynamically adjusts the association between tide-workers and models according to actual conditions;
the tide-gateway listens to the tide-worker-to-model association held by the tide-master and forwards the traffic requests of each model to the corresponding tide-worker Pods;
the tide-worker polls the tide-master for the model information or training task it needs and runs the corresponding model or training task based on that information.
Through this flow, the technical solution achieves dynamic scaling and deployment, reduces the time needed to adjust model instances, speeds up model startup, improves the resource utilization and performance of GPU services, and realizes real-time service load balancing and resource optimization.
Scaling: automatically adjusting the resource scale of the system according to load; when load is high, resources are expanded to meet demand, and when load is low, resources are reduced to avoid waste. In static deployment, scaling means Pod scaling. In the present solution, because all tide-worker nodes are created at initialization according to the amount of resources, scaling refers to scaling the running models, i.e. the association between tide-worker nodes and models: scaling up means assigning more tide-worker nodes to run a model, and scaling down means releasing the association between some tide-worker nodes and that model, rather than scaling the tide-worker Pods themselves. A sketch of this association-based scaling is given below.
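Under these definitions, scaling is a change to the worker-to-model association rather than to the Pod count. The sketch below expresses that idea with a hypothetical master-side assignment map; the type and function names are assumptions, not the patent's implementation.

```go
package main

import "fmt"

// assignments is a hypothetical master-side view of which tide-worker nodes
// currently run which model. Scaling in this solution changes this mapping,
// not the number of tide-worker Pods.
type assignments map[string][]string // model name -> tide-worker node names

// scaleUp binds an additional idle worker to the model (expansion).
func (a assignments) scaleUp(model, worker string) {
	a[model] = append(a[model], worker)
}

// scaleDown releases one worker from the model (contraction); the worker Pod
// keeps running and can later be assigned another model or a training task.
func (a assignments) scaleDown(model, worker string) {
	kept := a[model][:0]
	for _, w := range a[model] {
		if w != worker {
			kept = append(kept, w)
		}
	}
	a[model] = kept
}

func main() {
	a := assignments{"resnet50-infer": {"tide-worker-0"}}
	a.scaleUp("resnet50-infer", "tide-worker-1")   // traffic grew: bind another node
	a.scaleDown("resnet50-infer", "tide-worker-0") // later: release a node
	fmt.Println(a)
}
```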
Referring to FIG. 1, this embodiment also provides a K8s-based GPU service tidal deployment device, comprising:
a master-node tide module, which deploys the master-node tide service tide-master on the K8s platform;
a worker-node tide module, which deploys the worker-node tide service tide-worker on the K8s platform;
a gateway tide module, which deploys the gateway tide service tide-gateway on the K8s platform;
a custom resource definition module (CRD), which defines the running information and routing information of GPU service instances;
wherein the master-node tide module listens to the CRD module and GPU resource information, and starts all tide-worker nodes of the tide-worker service at initialization according to the GPU resource information;
and the master-node tide module dynamically adjusts the state and traffic-allocation strategy of the tide-worker service and the tide-gateway service according to the actual GPU resources, the GPU usage of the models on each tide-worker node, and the traffic of the tide-gateway.
The K8s-based GPU service tidal deployment device of this embodiment implements, on the K8s (Kubernetes) platform, a complete solution for the dynamic deployment of GPU services. The solution contains three services: the master-node tide service tide-master, the worker-node tide service tide-worker, and the gateway tide service tide-gateway, and supports a custom CRD. The CRD defines the model-running information and routing information of each GPU service instance; the tide-master listens to the CRD and GPU resource information and, according to the actual GPU resources, the GPU usage of the models on each tide-worker node, and the traffic of the tide-gateway, dynamically adjusts the state and traffic-allocation strategy of the tide-worker and tide-gateway services, thereby scaling the running models dynamically and managing which model information each tide-worker Pod should run.
Therefore, the device of this embodiment has the following technical effects:
Automation and intelligence: through the custom CRD and the dynamic adjustment mechanism, the device achieves automated dynamic deployment of GPU services. The tide-master listens to the CRD and GPU resource information and intelligently adjusts the state and traffic-allocation strategy of the tide-worker and tide-gateway services according to actual conditions, realizing automated and intelligent service management.
Real-time responsiveness: through real-time monitoring, prediction, and dynamic adjustment, the device adapts promptly to model requirements and traffic changes, improving service flexibility and efficiency.
Resource optimization and higher utilization: by dynamically adjusting and optimizing GPU resource allocation, the device achieves optimal resource use, allocating and adjusting resources according to model requirements and traffic, avoiding waste and load imbalance, and improving utilization and overall system efficiency. By dynamically scaling training and inference tasks according to actual load, it further raises utilization; when resources are scarce, training tasks can be shrunk and inference tasks expanded, making more effective use of limited resources.
Scalability and extensibility: based on the Kubernetes platform, the device scales well and can quickly expand or reduce the number of running models according to actual demand, accommodating a growing number of models and changing traffic.
High availability and stability: through its dynamic deployment and adjustment mechanism, the device improves the availability and stability of GPU services, automatically adjusting and optimizing service configuration based on real-time monitoring and prediction, avoiding failures or overload, and improving system stability and reliability.
Real-time monitoring and control: all traffic is managed and monitored through the tide-gateway, so traffic can be rate-limited and degraded in real time, guaranteeing system stability and allowing timely adjustment to actual demand.
Therefore, compared with normal static deployment, where GPU resources are statically assigned to different models and some GPUs remain idle while others are overloaded, the device of this embodiment dynamically and intelligently allocates GPU resources according to actual model requirements and traffic, improving GPU utilization. With dynamic allocation and optimization of GPU resources and real-time service load balancing, the technical solution allocates resources reasonably, optimizes system performance and efficiency, and adjusts promptly to model requirements and traffic changes, keeping the system in its best state and improving the execution speed and effectiveness of deep-learning and artificial-intelligence applications.
实施例二Embodiment 2
下面描述本发明的电子设备实施例,该电子设备可以视为对于上述本发明的方法和装置实施例的具体实体实施方式。对于本发明电子设备实施例中描述的细节,应视为对于上述方法或装置实施例的补充;对于在本发明电子设备实施例中未披露的细节,可以参照上述方法或装置实施例来实现。The electronic device embodiment of the present invention is described below, and the electronic device can be regarded as a specific physical implementation of the method and device embodiments of the present invention described above. The details described in the electronic device embodiment of the present invention should be regarded as a supplement to the above method or device embodiments; details not disclosed in the electronic device embodiment of the present invention can be implemented with reference to the above method or device embodiments.
图4是本发明的一个实施例的电子设备的结构示意图,该电子设备包括处理器和存储器,所述存储器用于存储计算机可执行程序,当所述计算机程序被所述处理器执行时,所述处理器执行实施例一的一种基于K8s的GPU服务潮汐部署方法。Figure 4 is a structural diagram of an electronic device of an embodiment of the present invention, which includes a processor and a memory, wherein the memory is used to store a computer executable program. When the computer program is executed by the processor, the processor executes a GPU service tidal deployment method based on K8s in Example 1.
如图4所示,电子设备以通用计算设备的形式表现。其中处理器可以是一个,也可以是多个并且协同工作。本发明也不排除进行分布式处理,即处理器可以分散在不同的实体设备中。本发明的电子设备并不限于单一实体,也可以是多个实体设备的总和。As shown in FIG. 4 , the electronic device is presented in the form of a general-purpose computing device. The processor may be one or more and work in coordination. The present invention does not exclude distributed processing, that is, the processor may be dispersed in different physical devices. The electronic device of the present invention is not limited to a single entity, but may also be the sum of multiple physical devices.
所述存储器存储有计算机可执行程序,通常是机器可读的代码。所述计算机可读程序可以被所述处理器执行,以使得电子设备能够执行本发明的方法,或者方法中的至少部分步骤。The memory stores a computer executable program, which is usually a machine-readable code. The computer-readable program can be executed by the processor to enable the electronic device to perform the method of the present invention, or at least part of the steps in the method.
所述存储器包括易失性存储器,例如随机存取存储单元(RAM)和/或高速缓存存储单元,还可以是非易失性存储器,如只读存储单元(ROM)。The memory includes a volatile memory, such as a random access memory unit (RAM) and/or a cache memory unit, and may also be a non-volatile memory, such as a read-only memory unit (ROM).
可选的，该实施例中，电子设备还包括有I/O接口，其用于电子设备与外部的设备进行数据交换。I/O接口可以表示几类总线结构中的一种或多种，包括存储单元总线或者存储单元控制器、外围总线、图形加速端口、处理单元或者使用多种总线结构中的任意总线结构的局域总线。Optionally, in this embodiment, the electronic device further includes an I/O interface, which is used for the electronic device to exchange data with external devices. The I/O interface may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, a graphics acceleration port, a processing unit, or a local bus using any of a variety of bus structures.
应当理解，图4显示的电子设备仅仅是本发明的一个示例，本发明的电子设备中还可以包括上述示例中未示出的元件或组件。例如，有些电子设备中还包括有显示屏等显示单元，有些电子设备还包括人机交互元件，例如按钮、键盘等。只要该电子设备能够执行存储器中的计算机可读程序以实现本发明方法或方法的至少部分步骤，均可认为是本发明所涵盖的电子设备。It should be understood that the electronic device shown in FIG. 4 is only an example of the present invention; the electronic device of the present invention may also include elements or components not shown in the above example. For example, some electronic devices further include a display unit such as a display screen, and some further include human-computer interaction elements such as buttons and keyboards. As long as the electronic device can execute the computer-readable program in the memory to implement the method of the present invention or at least part of its steps, it is regarded as an electronic device covered by the present invention.
图5是本发明的一个实施例的计算机可读记录介质的示意图。如图5所示，计算机可读记录介质中存储有计算机可执行程序，所述计算机可执行程序被执行时，实现本发明实施例一的一种基于K8s的GPU服务潮汐部署方法。所述计算机可读记录介质可以包括在基带中或者作为载波一部分传播的数据信号，其中承载了可读程序代码。这种传播的数据信号可以采用多种形式，包括但不限于电磁信号、光信号或上述的任意合适的组合。可读记录介质还可以是可读存储介质以外的任何可读介质，该可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。可读记录介质上包含的程序代码可以用任何适当的介质传输，包括但不限于无线、有线、光缆、RF等等，或者上述的任意合适的组合。FIG. 5 is a schematic diagram of a computer-readable recording medium according to an embodiment of the present invention. As shown in FIG. 5, the computer-readable recording medium stores a computer executable program which, when executed, implements the K8s-based GPU service tidal deployment method of Embodiment 1 of the present invention. The computer-readable recording medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. The readable recording medium may also be any readable medium other than a readable storage medium, and may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the readable recording medium may be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination thereof.
可以以一种或多种程序设计语言的任意组合来编写用于执行本发明操作的程序代码，所述程序设计语言包括面向对象的程序设计语言—诸如Java、C++等，还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算设备上执行、部分地在用户设备上执行、作为一个独立的软件包执行、部分在用户计算设备上部分在远程计算设备上执行、或者完全在远程计算设备或服务器上执行。在涉及远程计算设备的情形中，远程计算设备可以通过任意种类的网络，包括局域网（LAN）或广域网（WAN），连接到用户计算设备，或者，可以连接到外部计算设备（例如利用因特网服务提供商来通过因特网连接）。Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
通过以上对实施方式的描述，本领域的技术人员易于理解，本发明可以由能够执行特定计算机程序的硬件来实现，例如本发明的系统，以及系统中包含的电子处理单元、服务器、客户端、手机、控制单元、处理器等。本发明也可以由执行本发明的方法的计算机软件来实现，例如由微处理器、电子控制单元、客户端、服务器端等执行的控制软件来实现。但需要说明的是，执行本发明的方法的计算机软件并不限于在一个或特定数量的硬件实体中执行，也可以由不特定的具体硬件以分布式的方式来实现。对于计算机软件，软件产品可以存储在一个计算机可读的记录介质（可以是CD-ROM、U盘、移动磁盘等）中，也可以分布式存储于网络上，只要其能使得电子设备执行根据本发明的方法。From the above description of the embodiments, those skilled in the art will readily understand that the present invention can be implemented by hardware capable of executing a specific computer program, such as the system of the present invention and the electronic processing units, servers, clients, mobile phones, control units, processors, etc. contained therein. The present invention can also be implemented by computer software that performs the method of the present invention, for example control software executed by a microprocessor, an electronic control unit, a client, a server, and the like. It should be noted, however, that the computer software performing the method of the present invention is not limited to execution in one or a specific number of hardware entities; it may also be implemented in a distributed manner by unspecified hardware. As computer software, the software product may be stored in a computer-readable recording medium (such as a CD-ROM, a USB flash drive, or a removable disk) or distributed over a network, as long as it enables an electronic device to perform the method according to the present invention.
以上实施例仅用以说明本发明而并非限制本发明所描述的技术方案，尽管本说明书参照上述的各个实施例对本发明已进行了详细的说明，但本发明不局限于上述具体实施方式，因此任何对本发明进行的修改或等同替换，以及一切不脱离本发明精神和范围的技术方案及其改进，均涵盖在本发明的权利要求范围当中。The above embodiments are only intended to illustrate the present invention and not to limit the technical solutions described herein. Although the present invention has been described in detail with reference to the above embodiments, it is not limited to the above specific implementations; therefore, any modification or equivalent replacement of the present invention, as well as all technical solutions and improvements thereof that do not depart from the spirit and scope of the invention, fall within the scope of the claims of the present invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410663011.1A CN118689630A (en) | 2024-05-27 | 2024-05-27 | A GPU service tidal deployment method and device based on K8s |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410663011.1A CN118689630A (en) | 2024-05-27 | 2024-05-27 | A GPU service tidal deployment method and device based on K8s |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN118689630A true CN118689630A (en) | 2024-09-24 |
Family
ID=92765759
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410663011.1A Pending CN118689630A (en) | 2024-05-27 | 2024-05-27 | A GPU service tidal deployment method and device based on K8s |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118689630A (en) |
- 2024-05-27 CN CN202410663011.1A patent/CN118689630A/en active Pending
Similar Documents
| Publication | Title |
|---|---|
| CN111796908B (en) | System and method for automatic elastic expansion and contraction of resources and cloud platform | |
| US7328259B2 (en) | Systems and methods for policy-based application management | |
| CN102646062B (en) | Flexible capacity enlargement method for cloud computing platform based application clusters | |
| US8005956B2 (en) | System for allocating resources in a distributed computing system | |
| CN101207550B (en) | Load balancing system and method for realizing load balancing of multiple services | |
| US20100262695A1 (en) | System and Method for Allocating Resources in a Distributed Computing System | |
| CN108023958B (en) | Resource scheduling system based on cloud platform resource monitoring | |
| CN114598706B (en) | Storage system elastic expansion method based on Serverless function | |
| CN117971499B (en) | Resource allocation method, device, electronic equipment and storage medium | |
| CN105159736A (en) | Establishment method of SaaS software deployment scheme supporting performance analysis | |
| CN114629782B (en) | Anti-destruction succession method among multiple cloud platforms | |
| CN117707758A (en) | Thread pool dynamic self-adaptive adjustment method, system, equipment and storage medium | |
| CN110532060B (en) | Hybrid network environment data acquisition method and system | |
| CN106095581B (en) | Network storage virtualization scheduling method under private cloud condition | |
| US9607275B2 (en) | Method and system for integration of systems management with project and portfolio management | |
| US7912956B1 (en) | Service level agreement based control of a distributed computing system | |
| CN113419842B (en) | Method and device for constructing edge computing microservice based on JavaScript | |
| CN120086003A (en) | Method and device for supporting elastic scalability of computing resources in intelligent computing center | |
| CN119025236A (en) | Task scheduling method, device, equipment and medium | |
| CN114579298A (en) | Resource management method, resource manager, and computer-readable storage medium | |
| CN113254162A (en) | Management method for PowerVM virtualization, readable storage medium and cloud management platform | |
| CN113590294A (en) | Self-adaptive and rule-guided distributed scheduling method | |
| CN118678306A (en) | Short message sending method, device, equipment, storage medium and program product | |
| CN118819825A (en) | Concurrent processing method, concurrent control system, electronic device and storage medium | |
| CN118689630A (en) | A GPU service tidal deployment method and device based on K8s |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |