CN118642844A

CN118642844A - Dynamic and smooth expansion of POD resources based on K8S and rescheduling method under heavy load

Info

Publication number: CN118642844A
Application number: CN202410692031.1A
Authority: CN
Inventors: 何伟; 林海; 乔斌; 陈利东; 范停停
Original assignee: Henan Splendor Science and Technology Co Ltd
Current assignee: Henan Splendor Science and Technology Co Ltd
Priority date: 2024-05-31
Filing date: 2024-05-31
Publication date: 2024-09-13

Abstract

The present invention provides a K8S-based method for dynamic smooth expansion of POD resources and rescheduling under heavy load, the method comprising: pre-deploying a business microservice POD and an application management program on a K8S cluster; pre-configuring an original memory threshold, a memory expansion threshold, a maximum memory threshold, an original CPU threshold, a CPU expansion threshold, and a maximum CPU core count threshold; monitoring the real-time memory usage and real-time CPU core count of the business microservice POD with the application management program; determining whether to update the original memory threshold based on the memory expansion threshold, the real-time memory usage, the original memory threshold, and the maximum memory threshold of the business microservice POD; and determining whether to update the original CPU threshold based on the CPU expansion threshold, the real-time CPU core count, the original CPU threshold, and the maximum CPU core count threshold of the business microservice POD. The present invention can therefore determine quickly and accurately when POD resources should be expanded, and performs the dynamic smooth expansion automatically.

Description

K8S-based method for dynamic smooth expansion of POD resources and rescheduling under heavy load

Technical Field

The present invention relates to the technical field of container cloud platforms, and in particular to a K8S-based method for dynamic smooth expansion of POD resources and rescheduling under heavy load.

Background Art

In a K8S (short for Kubernetes) cluster environment, the POD is the smallest deployment unit and can contain one or more containers. In deploying and managing a railway integrated video surveillance system, a K8S cluster can be used to automate the deployment and scaling of the system's components, improving reliability and scalability. The data storage and persistence functions of the K8S cluster can also give the railway integrated video surveillance system a stable data storage and backup solution. However, some inherent mechanisms of K8S prevent existing K8S clusters from fully meeting the business needs of the railway integrated video surveillance system. The main problems are:

(1) The CPU and memory resources of a POD in a K8S cluster can only be defined statically, from experience, in its yaml configuration file. Once set, they cannot be changed except by manual modification, whether or not they are sufficient. An existing K8S cluster therefore cannot adjust dynamically to the actual resource usage of a POD and its changes, and inaccurate or unreasonable resource definitions easily cause a business POD to stall or even fail to run normally;

The railway integrated video surveillance system, however, must analyze and process a large volume of high-definition video continuously, so its resource requirements are high even under normal conditions. When many parties view video at the same time, for example during emergency command, post-event analysis, hot-spot backtracking, or construction and maintenance, the required resources surge instantly and fluctuate widely. The business POD resources in K8S must therefore be monitored and evaluated automatically and expanded dynamically on demand; otherwise the video service responds slowly or is even interrupted;

(2) In earlier versions of K8S, even after a static capacity expansion the newly added resources could not be used immediately: the business POD had to be restarted before they took effect. This inevitably interrupts the business (repeatedly and frequently, in the case of successive expansions);

Resource expansion in the railway integrated video surveillance system, however, is often sudden, irregular, and unscheduled, and no professional K8S or business operations staff are on site to expand capacity, restart business PODs, or perform other manual operations correctly, effectively, and promptly. POD resource expansion must therefore run automatically without human intervention and without restarting the affected PODs; otherwise the expansion has no practical value;

(3) The Node on which a POD runs is computed by the K8S scheduler component from certain conditions and internal algorithms, and this process is not under manual control. Once a POD has been scheduled, it is not rescheduled according to the real-time resource usage of the nodes unless someone intervenes or a node fails. Resource utilization across the Nodes of the cluster therefore becomes very unbalanced, which wastes resources and may also drive the utilization of some Nodes so high that all services hosted on them stall or stop responding entirely;

To avoid the problems of this "first placement is for life" model, the K8S platform hosting the railway integrated video surveillance system must provide not only dynamic resource expansion at the POD level but also, at a higher level, automatic rescheduling between physical Node nodes, so that PODs can migrate from heavily loaded Nodes to lightly loaded ones and the resource load of the whole cluster is balanced.

To solve the above problems, an ideal technical solution has long been sought.

Summary of the Invention

In view of the above technical problems, it is necessary to provide a K8S-based method for dynamic smooth expansion of POD resources and rescheduling under heavy load, so that POD resources can be expanded dynamically and smoothly while resource utilization across the nodes of the cluster becomes more balanced, thereby improving business POD stability and cluster reliability.

To achieve the above object, a first aspect of the present invention provides a K8S-based method for dynamic smooth expansion of POD resources, comprising: pre-deploying a business microservice POD and an application management program, wherein the business microservice POD is used to install video surveillance management software;

pre-configuring, for the business microservice POD, an original memory threshold, a memory expansion threshold, a maximum memory threshold, an original CPU threshold, a CPU expansion threshold, and a maximum CPU core count threshold, wherein the original memory threshold and the original CPU threshold are both dynamic POD resource parameters;

during operation of the business microservice POD, monitoring its real-time memory usage and real-time CPU core count with the pre-deployed application management program;

determining, based on the memory expansion threshold, the real-time memory usage, the original memory threshold, and the maximum memory threshold of the business microservice POD, whether to update the original memory threshold;

if so, the application management program obtains a new memory threshold from preset step size I and the original memory threshold, and updates the original memory threshold to the new memory threshold, wherein the new memory threshold > the original memory threshold;

determining, based on the CPU expansion threshold, the real-time CPU core count, the original CPU threshold, and the maximum CPU core count threshold of the business microservice POD, whether to update the original CPU threshold;

if so, the application management program obtains a new CPU threshold from preset step size II and the original CPU threshold, and updates the original CPU threshold to the new CPU threshold, wherein the new CPU threshold > the original CPU threshold.

To achieve the above object, a second aspect of the present invention provides a K8S-based method for rescheduling a POD under heavy load, comprising: setting the Node on which the business microservice POD is deployed as a target physical node, and pre-configuring a rescheduling threshold for the target physical node;

while the above K8S-based method for dynamic smooth expansion of POD resources is being executed, monitoring the target physical node with the application management program and determining, based on the rescheduling threshold, whether the target physical node is in a heavy-load state;

if the target physical node is in a heavy-load state, rescheduling the business microservice POD on that Node to another available node in the K8S cluster.

To achieve the above object, a third aspect of the present invention provides a K8S cluster comprising N Master nodes and M Node nodes, wherein the load proxy component Nginx is compiled and installed on the Master nodes, and the configuration file of Nginx includes proxy information for the N Master nodes;

a business microservice POD is pre-deployed on the K8S cluster, wherein the business microservice POD is used to install video surveillance management software;

an application management program is pre-installed on the K8S cluster, wherein the application management program is used to monitor the real-time memory usage and real-time CPU core count of the business microservice POD;

during operation of the business microservice POD, its real-time memory usage and real-time CPU core count are monitored with the pre-installed application management program;

The beneficial effects of the present invention are:

1) An application management program that monitors the business microservice POD is pre-installed on the K8S cluster, and the original memory threshold and the original CPU threshold are set as dynamic POD resource parameters. The timing of dynamic smooth expansion of POD resources is determined from the original memory threshold, the memory expansion threshold, the maximum memory threshold, the original CPU threshold, the CPU expansion threshold, the maximum CPU core count threshold, the real-time memory usage, and the real-time CPU core count, and the application management program expands the POD resources dynamically and smoothly by a preset step size;

2) During dynamic expansion of POD resources, the business microservice POD deployed on the K8S cluster does not need to be restarted, so the business is not interrupted;

3) A rescheduling threshold is pre-configured for the target physical node, the application management program monitors whether the target physical node is in a heavy-load state, and when it is, the business microservice POD is rescheduled to another available node in the K8S cluster, achieving rescheduling of the POD under heavy load;

At the same time, resource utilization across the nodes of the cluster becomes more balanced, improving business POD stability and cluster reliability.

Brief Description of the Drawings

FIG. 1 is a schematic flow chart of the K8S-based method for dynamic smooth expansion of POD resources of the present invention;

FIG. 2 is a schematic flow chart of the K8S-based method for dynamic smooth expansion of POD resources in one embodiment of the present invention;

FIG. 3 is a schematic flow chart of the K8S-based method for dynamic smooth expansion of POD resources in another embodiment of the present invention;

FIG. 4 is a schematic flow chart of the K8S-based method for rescheduling a POD under heavy load of the present invention;

FIG. 5 is a schematic structural diagram of the K8S cluster of the present invention.

Detailed Description

The technical solution of the present invention is described in further detail below through specific embodiments.

For ease of understanding, the interacting parties, terms, and custom words used in the present invention are first explained in connection with its technical solution:

K8S cluster: K8S is short for Kubernetes, an open-source container orchestration system for automating the deployment, scaling, and management of containerized applications. A cluster includes Master nodes and Node nodes: the Master nodes manage and schedule the whole cluster, and the Node nodes run the containerized applications;

For example, in a production environment, four servers are used to build a K8S cluster comprising two Master nodes and four Node nodes, and the business microservice POD is deployed on the Node nodes of the cluster. The K8S version must be 1.27 or later, the four Node nodes use Containerd as the container runtime, and three nodes of the cluster are reused to build the Etcd cluster, as shown in FIG. 5.

Application microservice POD: the POD is the smallest deployment unit in Kubernetes, and different application microservice PODs implement different business functions. In an integrated video surveillance business system, for example, the application microservice POD is used to install the railway integrated video surveillance management software and deliver the video surveillance service.

Application management program: a self-developed management program that periodically calls the K8S API to obtain CPU and memory performance data for the business microservice PODs and the Node nodes, and performs resource calculation, threshold comparison, POD expansion, and related operations. Using the monitored real-time memory usage and real-time CPU core count of the business microservice PODs and of the physical nodes, it either expands POD resources dynamically and smoothly or reschedules PODs under heavy load.
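The periodic monitoring described for the application management program can be sketched as follows. This is a minimal sketch, not the program of the invention: the names `Sample`, `run_monitor`, `fetch`, and `handle` are hypothetical, and the call to the K8S API is abstracted behind a caller-supplied function so that no particular client library is assumed.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Sample:
    """One performance reading for a POD or a Node (hypothetical structure)."""
    target: str            # POD or Node name
    memory_mb: float       # real-time memory usage, in M
    cpu_millicores: float  # real-time CPU usage, in millicores


def run_monitor(fetch: Callable[[], Iterable[Sample]],
                handle: Callable[[Sample], None],
                period_s: float = 5.0,
                cycles: Optional[int] = None) -> None:
    """Call `fetch` once per period and pass every sample to `handle`.

    `fetch` stands in for the K8S API query; `handle` stands in for the
    threshold comparison and expansion logic. `cycles=None` runs forever.
    """
    done = 0
    while cycles is None or done < cycles:
        for sample in fetch():
            handle(sample)
        done += 1
        if cycles is None or done < cycles:
            time.sleep(period_s)  # wait for the next monitoring period
```

Passing the fetch and handling steps in as functions keeps the loop independent of how the metrics are obtained, which also makes it easy to exercise with fixed sample data.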

Original memory threshold and original CPU threshold: both are dynamic POD resource parameters. The application management program determines whether the original memory threshold needs dynamic adjustment from the real-time memory usage, the original memory threshold, the memory expansion threshold, and related values, and determines whether the original CPU threshold needs dynamic adjustment from the real-time CPU core count, the original CPU threshold, the CPU expansion threshold, and related values;

When a business microservice POD is deployed, its initial resource values are defined in advance in its yaml configuration file. These are the memory request (the initial minimum memory usage) and the memory limit (the original memory threshold, that is, the initial maximum memory usage) available to the POD at run time, and the CPU request (the initial minimum number of CPU cores) and the CPU limit (the original CPU threshold, that is, the initial maximum number of CPU cores) available to the POD at run time.
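For illustration, the initial requests and limits and a later limit update can be written as plain data. This is a sketch under assumptions: the numeric values, the container name, and the helper `build_limit_patch` are hypothetical, and the source does not state which K8S mechanism applies the update. One candidate on K8S 1.27 or later is the in-place POD resize feature (feature gate `InPlacePodVerticalScaling`, alpha in 1.27), which accepts a patch of the container's `resources` without recreating the POD.

```python
from typing import Optional

# Initial values as they might appear under `resources` in the POD's yaml file
# (illustrative numbers only; "M" is megabytes, "m" is millicores).
initial_resources = {
    "requests": {"memory": "500M", "cpu": "500m"},
    "limits": {"memory": "2000M", "cpu": "2000m"},
}


def build_limit_patch(container: str,
                      memory_mb: Optional[int] = None,
                      cpu_millicores: Optional[int] = None) -> dict:
    """Build a patch body that raises the limits of one container.

    Only the limits that are passed in are included, so memory and CPU
    can be expanded independently of each other.
    """
    limits = {}
    if memory_mb is not None:
        limits["memory"] = f"{memory_mb}M"
    if cpu_millicores is not None:
        limits["cpu"] = f"{cpu_millicores}m"
    return {
        "spec": {
            "containers": [
                {"name": container, "resources": {"limits": limits}}
            ]
        }
    }
```

For example, `build_limit_patch("video", memory_mb=3000)` yields a body that changes only the memory limit of the hypothetical container `video`.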

Memory expansion threshold and CPU expansion threshold: the thresholds that trigger dynamic expansion of POD resources. They can be adjusted dynamically, and each can be either a percentage or a specific value.

Preset step size I: the pre-configured expansion step for memory resources. It is set in advance and can be adjusted to suit actual conditions.

Preset step size II: the pre-configured expansion step for CPU resources. It is likewise set in advance and can be adjusted to suit actual conditions.

Maximum memory threshold and maximum CPU core count threshold: so that other microservices on the node and the physical node itself continue to run normally, no business microservice POD is allowed to consume resources excessively, and a resource maximum is therefore set for every business microservice POD. POD resources consequently cannot be expanded without limit;

For example, in one production instance, the maximum memory threshold of the business microservice POD is set to 9000M and the maximum CPU core count threshold to 9000 millicores. The CPU dynamic adjustment step is 1000 millicores, so each adjustment raises the CPU threshold by 1000 millicores; the memory dynamic adjustment step is 1000M, so each adjustment raises the memory threshold by 1000M. Because an expansion is still permitted when the current threshold equals the maximum threshold, the memory threshold of the business microservice POD can reach at most 10000M and the CPU threshold at most 10000 millicores;

As shown in FIG. 4, once the application management program detects that the original memory threshold of the business microservice POD > the maximum memory threshold, or that the original CPU threshold > the maximum CPU core count threshold, it no longer performs the corresponding expansion and waits for manual intervention.
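The arithmetic behind the 10000 upper value can be checked with a short sketch. It assumes that every check triggers an expansion and that the starting limit is 2000, which is an illustrative value not given in the source; the function name `walk_limit` is hypothetical.

```python
from typing import List


def walk_limit(initial: int, step: int, ceiling: int) -> List[int]:
    """Return the successive limit values when every check expands the limit.

    An expansion is allowed while the current limit is <= the ceiling
    (the maximum threshold); once the limit exceeds the ceiling, the
    program stops expanding and waits for manual intervention.
    """
    values = [initial]
    limit = initial
    while limit <= ceiling:
        limit += step
        values.append(limit)
    return values
```

With a step of 1000 and a ceiling of 9000, the last permitted expansion starts from 9000 and ends at 10000, after which the limit exceeds the ceiling and no further expansion takes place. The same reasoning applies to memory (in M) and to CPU (in millicores).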

Rescheduling threshold of a physical node: the pre-configured threshold that triggers POD rescheduling. It comprises a memory rescheduling threshold and a CPU rescheduling threshold for the target physical node, each of which can be preset as a percentage or in another form (such as a specific value);

After the business microservice POD has been deployed, the application management program obtains, at a fixed period (typically a few seconds) and through the API of the K8S cluster, the real-time memory usage and real-time CPU core count of the physical node hosting the POD, and checks whether the node is in a heavy-load state. If it is, the load on the node is high: the application management program evicts the PODs on the node, and the K8S kube-scheduler component computes a new placement and reschedules the evicted PODs to other nodes. If it is not, the node is relatively idle and no rescheduling is needed;

The specific scheduling policies include, but are not limited to, resource priority, data locality, node affinity, anti-affinity, taints, and tolerations.
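The heavy-load check on a physical node can be sketched as follows. The sketch makes several assumptions that the source leaves open: both thresholds are expressed as percentages and set to 80%, exceeding either one counts as heavy load, and the names `NodeLoad`, `is_heavily_loaded`, and `nodes_needing_reschedule` are hypothetical. The eviction itself and the new placement are left to K8S and are not modeled.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class NodeLoad:
    """Real-time load of one physical node (hypothetical structure)."""
    name: str
    memory_used_mb: float
    memory_capacity_mb: float
    cpu_used_millicores: float
    cpu_capacity_millicores: float


def is_heavily_loaded(node: NodeLoad,
                      memory_threshold: float = 0.80,
                      cpu_threshold: float = 0.80) -> bool:
    """Return True when memory or CPU utilization exceeds its threshold."""
    memory_ratio = node.memory_used_mb / node.memory_capacity_mb
    cpu_ratio = node.cpu_used_millicores / node.cpu_capacity_millicores
    return memory_ratio > memory_threshold or cpu_ratio > cpu_threshold


def nodes_needing_reschedule(nodes: Iterable[NodeLoad],
                             memory_threshold: float = 0.80,
                             cpu_threshold: float = 0.80) -> List[str]:
    """Return the names of nodes whose PODs should be evicted and rescheduled."""
    return [n.name for n in nodes
            if is_heavily_loaded(n, memory_threshold, cpu_threshold)]
```

Whether the two thresholds should be combined with "or" or with "and" is a design choice: "or" reacts as soon as either resource is saturated, at the cost of more frequent rescheduling.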

Embodiment 1

As shown in FIG. 1, this embodiment provides a specific implementation of a K8S-based method for dynamic smooth expansion of POD resources, the method comprising:

pre-deploying a business microservice POD on a Node of the K8S cluster, wherein the business microservice POD is used to install video surveillance management software;

pre-deploying the business microservice POD and an application management program, wherein the business microservice POD is used to install video surveillance management software;

if the original memory threshold is to be updated, the application management program obtains a new memory threshold from preset step size I and the original memory threshold, and updates the original memory threshold to the new memory threshold, wherein the new memory threshold > the original memory threshold, for example the new memory threshold = the original memory threshold + preset step size I;

if the original CPU threshold is to be updated, the application management program obtains a new CPU threshold from preset step size II and the original CPU threshold, and updates the original CPU threshold to the new CPU threshold, wherein the new CPU threshold > the original CPU threshold, for example the new CPU threshold = the original CPU threshold + preset step size II.

It should be noted that the memory threshold value stored in the configuration file is the original memory threshold, and that "original" and "new" are relative terms. When the memory threshold is updated, the value in the configuration file changes from the original memory threshold to the new memory threshold, and the next dynamic adjustment decision is made against the value then in the configuration file;

Likewise, the CPU threshold value stored in the configuration file is the original CPU threshold, and "original" and "new" are relative terms. When the CPU threshold is updated, the value in the configuration file changes from the original CPU threshold to the new CPU threshold, and the next dynamic adjustment decision is made against the value then in the configuration file.

It should also be noted that in this embodiment an application management program that monitors the business microservice POD is pre-installed on the K8S cluster, the original memory threshold and the original CPU threshold are set as dynamic POD resource parameters, the timing of dynamic smooth expansion is determined from the memory expansion threshold, the CPU expansion threshold, and related values, and the application management program automatically expands the POD resources dynamically and smoothly by a preset step size.

It should also be noted that this embodiment automatically determines whether to update the original memory threshold from the comparison between the original memory threshold and the maximum memory threshold and from the relationship between the real-time memory usage and the memory expansion threshold, and automatically determines whether to update the original CPU threshold from the comparison between the original CPU threshold and the maximum CPU core count threshold and from the relationship between the real-time CPU core count and the CPU expansion threshold.

It should also be noted that in this embodiment the POD is expanded separately for the two resource types, memory and CPU, and that the expansion is dynamic rather than static. Dynamic expansion means that calculation, decision, and automatic expansion are carried out in real time from the actual memory and CPU usage of the POD as monitored by the application management program. Conventional POD resources, by contrast, can only be defined in a static yaml file and, once set, cannot be changed except by manual modification, whether or not they are sufficient.

It should also be noted that this embodiment overcomes the limitations of static POD resource allocation and one-time scheduling in conventional K8S, making resource allocation and utilization more flexible and reasonable.

Embodiment 2

On the basis of Embodiment 1, this embodiment provides two specific implementations.

In one specific implementation, as shown in FIG. 2, the memory expansion threshold is a percentage of the original memory threshold, and the CPU expansion threshold is a percentage of the original CPU threshold;

When determining whether to update the original memory threshold based on the memory expansion threshold, the real-time memory usage, the original memory threshold, and the maximum memory threshold of the business microservice POD, the following is performed:

calculating the real-time memory percentage, where the real-time memory percentage = the real-time memory usage of the business microservice POD / the original memory threshold;

when the real-time memory percentage > the memory expansion threshold and the original memory threshold ≤ the preset maximum memory threshold, the application management program determines that the original memory threshold is to be updated; otherwise, it determines that the original memory threshold is not to be updated;

When determining whether to update the original CPU threshold based on the CPU expansion threshold, the real-time CPU core count, the original CPU threshold, and the maximum CPU core count threshold of the business microservice POD, the following is performed:

calculating the real-time CPU percentage, where the real-time CPU percentage = the real-time CPU core count of the business microservice POD / the original CPU threshold;

when the real-time CPU percentage > the CPU expansion threshold and the original CPU threshold ≤ the preset maximum CPU core count threshold, the application management program determines that the original CPU threshold is to be updated; otherwise, it determines that the original CPU threshold is not to be updated.

For example, with the memory expansion threshold set to 60%, if the real-time memory percentage (real-time memory usage of the business microservice POD / original memory threshold value) > 60% and the original memory threshold value ≤ the preset maximum memory threshold, the application management program dynamically increases the POD's original memory threshold value by the preset step size I: new memory threshold value = original memory threshold value + preset step size I;

Likewise, with the CPU expansion threshold set to 60%, if the real-time CPU percentage (real-time number of CPU cores of the business microservice POD / original CPU threshold value) > 60% and the original CPU threshold value ≤ the preset maximum CPU core threshold, the application management program dynamically increases the POD's original CPU threshold value by the preset step size II: new CPU threshold value = original CPU threshold value + preset step size II.
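
The percentage-based decision above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ResourcePolicy`, `should_expand`, and `expand`, and the numeric values for the step size and maximum, are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass
class ResourcePolicy:
    """Thresholds for one resource (memory or CPU) of a single POD (hypothetical type)."""
    limit: float         # original threshold value, i.e. the current resource limit
    expand_ratio: float  # expansion threshold as a fraction, e.g. 0.60 for 60%
    max_limit: float     # maximum value threshold
    step: float          # preset step size (I for memory, II for CPU)


def should_expand(usage: float, policy: ResourcePolicy) -> bool:
    """Percentage mode: usage / limit > ratio AND limit <= maximum."""
    percentage = usage / policy.limit
    return percentage > policy.expand_ratio and policy.limit <= policy.max_limit


def expand(usage: float, policy: ResourcePolicy) -> float:
    """Return the new threshold value, or the unchanged one when no update is due."""
    if should_expand(usage, policy):
        return policy.limit + policy.step
    return policy.limit


# Memory in MiB (illustrative values): 70 / 100 = 70% > 60%, so one step is added.
mem = ResourcePolicy(limit=100, expand_ratio=0.60, max_limit=400, step=50)
print(expand(70, mem))  # 150
print(expand(50, mem))  # 100
```

Note that the condition as stated (original threshold ≤ maximum) still permits one more step when the threshold already equals the maximum; whether the result should be clamped is not specified at this point in the text.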

In another specific implementation, as shown in FIG. 3, the memory expansion threshold has the same unit as the real-time memory usage, and the CPU expansion threshold has the same unit as the real-time number of CPU cores;

When the real-time memory usage of the business microservice POD > the memory expansion threshold and the original memory threshold value ≤ the preset maximum memory threshold, the application management program determines to update the original memory threshold value; otherwise, the application management program determines not to update the original memory threshold value;

When the real-time number of CPU cores of the business microservice POD > the CPU expansion threshold and the original CPU threshold value ≤ the preset maximum CPU core threshold, the application management program determines to update the original CPU threshold value; otherwise, the application management program determines not to update the original CPU threshold value.

For example, with the memory expansion threshold set to 60% of the original memory threshold value, if the real-time memory usage of the business microservice POD > the memory expansion threshold and the original memory threshold value ≤ the preset maximum memory threshold, the application management program dynamically increases the POD's original memory threshold value by the preset step size I: new memory threshold value = original memory threshold value + preset step size I;

Likewise, with the CPU expansion threshold set to 60% of the original CPU threshold value, if the real-time number of CPU cores of the business microservice POD > the CPU expansion threshold and the original CPU threshold value ≤ the preset maximum CPU core threshold, the application management program dynamically increases the POD's original CPU threshold value by the preset step size II: new CPU threshold value = original CPU threshold value + preset step size II.

It should be noted that, in the implementation shown in FIG. 3, the memory expansion threshold is a dynamic parameter that follows the original memory threshold value, and the CPU expansion threshold is likewise a dynamic parameter that follows the original CPU threshold value. After the original memory threshold value is updated, the memory expansion threshold changes automatically (for example, the new memory expansion threshold is 60% of the new memory threshold value); after the original CPU threshold value is updated, the CPU expansion threshold also changes automatically.

It should also be noted that the two implementations above are alternatives; in practice, either one may be chosen.
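
To illustrate how the expansion threshold in the FIG. 3 variant tracks the limit, the following Python sketch recomputes the absolute threshold from the current limit on every check. The function names and the sample numbers (millicores, step of 50, maximum of 400) are assumptions made for this illustration only.

```python
def expansion_threshold(limit: float, ratio: float = 0.60) -> float:
    """Absolute expansion threshold derived from the current limit (same unit as usage)."""
    return limit * ratio


def step_once(usage: float, limit: float, max_limit: float, step: float,
              ratio: float = 0.60) -> float:
    """Absolute mode: compare usage directly with the derived threshold."""
    if usage > expansion_threshold(limit, ratio) and limit <= max_limit:
        return limit + step
    return limit


# CPU in millicores (illustrative): usage stays at 80 across three checks.
limit = 100
for usage in (80, 80, 80):
    limit = step_once(usage, limit, max_limit=400, step=50)
    print(limit, expansion_threshold(limit))
```

With a constant usage of 80, the first check raises the limit from 100 to 150 because 80 exceeds the threshold of 60. The threshold then moves to 90, so the later checks leave the limit unchanged.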

In some embodiments, the K8S-based POD resource dynamic smooth expansion method further includes:

Adding, in advance, an in-place vertical scaling parameter to the configuration files of the API (Application Programming Interface) server, the key component kubelet, and the default scheduler in the K8S cluster, and pre-configuring that parameter to true to enable smooth expansion.

It should be noted that this embodiment enables smooth expansion by adding the extra parameter InPlacePodVerticalScaling=true in advance to the configuration files of the API server (kube-apiserver), the key component kubelet, and the default scheduler (kube-scheduler) in the K8S cluster. Once triggered, resource expansion proceeds automatically without human intervention. The POD resource expansion in this embodiment is therefore smooth: the POD does not need to be restarted during expansion, and the business is not interrupted.
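
Once the feature gate is enabled, an in-place resize amounts to changing the `resources` of a container in a running POD. The text does not say how the application management program submits that change, so the Python sketch below only builds a patch body with the standard Pod spec fields; the helper name `build_resize_patch` and the container name `cms` are hypothetical.

```python
import json


def build_resize_patch(container: str, cpu: str, memory: str) -> str:
    """Build a JSON patch body that sets requests and limits of one container."""
    resources = {"cpu": cpu, "memory": memory}
    patch = {
        "spec": {
            "containers": [
                {
                    "name": container,  # containers are matched by name when patching
                    "resources": {
                        "requests": dict(resources),
                        "limits": dict(resources),
                    },
                }
            ]
        }
    }
    return json.dumps(patch)


# Raise the POD from 100m / 100Mi to 150m / 150Mi (illustrative values).
print(build_resize_patch("cms", cpu="150m", memory="150Mi"))
```

Such a body could be sent with `kubectl patch pod` or a client library. Depending on the Kubernetes version, the request may have to target the POD's resize subresource, so the exact call should be checked against the cluster in use.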

Example 3

Building on the above embodiments, this embodiment provides a specific implementation of a K8S-based method for rescheduling PODs under heavy load.

In some embodiments, the K8S-based method for rescheduling PODs under heavy load includes:

Setting the Node on which the business microservice POD is deployed as the target physical node, and pre-configuring the rescheduling threshold of the target physical node;

While executing the K8S-based POD resource dynamic smooth expansion method of Example 1 or 2, monitoring the target physical node with the application management program, and determining, based on the rescheduling threshold, whether the target physical node is in a heavy load state;

It should be noted that when the actual resource usage of the target physical node is abnormal, the application management program evicts the business microservice PODs on that Node to relieve the load on the target physical node and thereby keep the cluster available;

It should also be noted that POD rescheduling is dynamic rather than static and is performed automatically, without human intervention.

In some embodiments, before the business microservice PODs on the Node are rescheduled to other available nodes in the K8S cluster, the method further includes:

Determining whether the node hosting the target business microservice POD and the node in the heavy load state are the same physical node; if so, the target business microservice POD is first rescheduled to another available node in the K8S cluster, as shown in FIG. 4;

Here, the target business microservice POD is a business microservice POD whose original memory threshold value and/or original CPU threshold value needs to be updated;

If the node hosting the target business microservice POD and the node in the heavy load state are not the same physical node, the POD resource dynamic smooth expansion operation and the POD rescheduling operation are processed in parallel, as shown in FIG. 4.

It should be noted that, as shown in FIG. 4, when only the original memory threshold value and/or the original CPU threshold value needs to be updated, the relevant steps of Example 1 are executed; when the only condition to handle is a target physical node in the heavy load state, the operation of evicting the PODs on that physical node is executed.
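
The coordination between expansion and rescheduling described for FIG. 4 can be summarized as a small decision function. The Python sketch below is one reading of the text, not the patented implementation; the `Plan` enum and `choose_plan` are names introduced here for illustration.

```python
from enum import Enum
from typing import Optional


class Plan(Enum):
    NONE = "nothing to do"
    EXPAND_ONLY = "expand the target POD in place"
    EVICT_ONLY = "evict PODs from the heavily loaded node"
    RESCHEDULE_FIRST = "reschedule the target POD to another node first"
    PARALLEL = "expand and reschedule in parallel"


def choose_plan(target_pod_node: Optional[str], heavy_node: Optional[str]) -> Plan:
    """Pick an action.

    target_pod_node: node hosting a POD whose threshold needs updating, or None.
    heavy_node: a node currently in the heavy load state, or None.
    """
    if target_pod_node is None and heavy_node is None:
        return Plan.NONE
    if heavy_node is None:
        return Plan.EXPAND_ONLY
    if target_pod_node is None:
        return Plan.EVICT_ONLY
    if target_pod_node == heavy_node:
        return Plan.RESCHEDULE_FIRST
    return Plan.PARALLEL


print(choose_plan("node-1", "node-1").name)  # RESCHEDULE_FIRST
print(choose_plan("node-1", "node-2").name)  # PARALLEL
```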

Example 4

Building on the above embodiments, this embodiment provides a specific implementation of a K8S cluster;

In some embodiments, the K8S cluster includes N Master nodes and M Node nodes, K nodes of the cluster are reused to deploy the Etcd distributed database, and the load proxy component Nginx is compiled and installed on the Master nodes; the configuration file of the load proxy component Nginx includes the proxy information of the N Master nodes;

It should be noted that, in existing practice, a POD's CPU and memory resources can only be defined statically in its YAML file based on experience and cannot be adjusted dynamically according to the POD's actual resource usage and its changes, so inaccurate or unreasonable resource definitions can easily prevent the business from running normally. A railway integrated video surveillance system must analyze and process large volumes of high-definition video without interruption, so its resource requirements are already high under normal conditions. When many parties view video at the same time, for example during emergency command, post-event analysis, hot-spot backtracking, or construction and maintenance, the required resources surge instantly and fluctuate widely. This requires automatic monitoring and judgment of business POD resources in K8S and dynamic expansion on demand; otherwise the video service responds slowly or is even interrupted. To solve this problem, this embodiment provides a K8S cluster capable of dynamic and smooth expansion of POD resources, which meets the above application requirements.

In some embodiments, an in-place vertical scaling parameter is added in advance to the configuration files of the API server, the key components, and the default scheduler in the K8S cluster, and that parameter is pre-configured to true to enable smooth expansion.

It should be noted that in earlier versions of K8S, even after a static expansion, the added resources could not be used immediately: the business POD had to be restarted before they took effect, which inevitably interrupted the business (repeatedly, in the case of successive expansions). Resource expansion in an integrated video surveillance system is usually sudden, irregular, and unscheduled, and there are no dedicated K8S or business operations staff on site to perform correct, effective, and timely manual operations such as expanding resources and restarting business PODs. POD resource expansion must therefore proceed automatically without human intervention and without restarting the relevant PODs; otherwise the expansion has no practical value. To solve this problem, this embodiment enables smooth expansion. During dynamic POD resource expansion, modifying the original memory threshold value and the original CPU threshold value does not affect the business microservice POD, existing application connections are not interrupted, and the modified memory and CPU threshold values take effect in real time. The POD therefore does not need to be restarted during expansion, and the business is not interrupted.

Example 5

Building on the above embodiments, this embodiment provides another specific implementation of a K8S cluster;

In some embodiments, the K8S cluster also sets the Node on which the business microservice POD is deployed as the target physical node, and pre-configures the rescheduling threshold of the target physical node;

The target physical node is monitored with the application management program, and whether the target physical node is in a heavy load state is determined based on the rescheduling threshold;

It should be noted that "other available nodes" refers to Node nodes other than the target physical node that are not currently in a heavy load state.

It is determined whether the node hosting the target business microservice POD and the node in the heavy load state are the same physical node; if so, the target business microservice POD is first rescheduled to another available node in the K8S cluster;

Here, the target business microservice POD is a business microservice POD whose original memory threshold value and/or original CPU threshold value needs to be updated.

In one specific implementation, dynamic smooth expansion of POD resources and rescheduling under heavy load are realized in a K8S cluster through the following steps:

Step 1: the K8S cluster is built from four nodes, namely two combined Master/Node nodes and two Node nodes; three nodes of the cluster are reused to deploy the Etcd distributed database, which runs in cluster mode;

Configure IPv4 addresses on the four servers, adjust system parameters, disable the server firewall and SELinux, turn off swap space, and set the node host names;

Install and deploy the application management program on the Node nodes of the K8S cluster;

Step 2: K8S and Etcd communicate over HTTPS, which requires certificates to be issued, and custom certificates are used in the production environment. Generate a CA certificate with an open-source tool, with a validity period of 10 years, and use it to issue the Etcd certificate and the certificates for the K8S components kube-apiserver, kube-scheduler, kube-proxy, kubelet, kube-controller-manager, and so on;

Step 3: once the certificates are configured, create the installation directory, copy the generated certificates to the other nodes, and deploy the K8S cluster from the binary installation packages. Edit the configuration files required by the K8S components and add the extra parameter InPlacePodVerticalScaling=true to enable the new smooth expansion feature, for example in the kubelet.conf file:

KUBELET_OPTS="--v=4 \
--container-runtime-endpoint=unix:///run/containerd/containerd.sock \
--feature-gates=InPlacePodVerticalScaling=true "  // newly added parameter

The other component configuration files, kube-proxy.conf, kube-scheduler.conf, and kube-apiserver.conf, are configured similarly. After adding the new feature parameter, related settings such as the node IP, port, certificate directory, and log directory must also be added; once configuration is complete, start the relevant services and check the cluster status;

Step 4: deploy the network component Calico in the cluster to ensure that PODs can communicate across nodes;

Step 5: after the network component is deployed, deploy the application microservice POD. The CPU and memory resources must be defined in the POD's YAML as follows:

limits:
  memory: "100Mi"
  cpu: "100m"
requests:
  memory: "100Mi"
  cpu: "100m"

Step 6: after the application is deployed, the application management program periodically calls the cluster API to obtain the CPU and memory information of the PODs and of the cluster nodes, for example through an API endpoint such as https://X.X.X.X:9009/k8s/clusters/c-m-bd7ssk4c/apis/metrics.k8s.io/v1beta1/namespaces/default/pods/cms-bd89844c9-x492r. Calling this endpoint returns the information of the corresponding POD in JSON format, from which the relevant fields are filtered to obtain the CPU and memory information;
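
As an illustration of the field filtering in Step 6, the Python sketch below extracts CPU and memory usage from a `metrics.k8s.io` PodMetrics document, where each container reports `usage.cpu` and `usage.memory` as Kubernetes quantity strings. The sample payload is made up for the example, the helper names are hypothetical, and the parsers cover only the common suffixes (`n`, `u`, `m` for CPU; `Ki`, `Mi`, `Gi`, `Ti` or plain bytes for memory).

```python
import json
from decimal import Decimal

_CPU_SUFFIX = {"n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3")}
_MEM_SUFFIX = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def cpu_to_millicores(quantity: str) -> float:
    """Convert a CPU quantity such as '25000000n' or '100m' to millicores."""
    for suffix, factor in _CPU_SUFFIX.items():
        if quantity.endswith(suffix):
            return float(Decimal(quantity[:-1]) * factor * 1000)
    return float(Decimal(quantity) * 1000)  # no suffix: whole cores


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '61440Ki' to bytes."""
    for suffix, factor in _MEM_SUFFIX.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[:-2]) * factor)
    return int(Decimal(quantity))  # no suffix: plain bytes


def pod_usage(pod_metrics: dict) -> tuple[float, int]:
    """Sum usage over all containers: (millicores, bytes)."""
    containers = pod_metrics["containers"]
    cpu = sum(cpu_to_millicores(c["usage"]["cpu"]) for c in containers)
    mem = sum(memory_to_bytes(c["usage"]["memory"]) for c in containers)
    return cpu, mem


# Illustrative payload in the PodMetrics shape; the usage values are invented.
sample = json.loads("""
{"kind": "PodMetrics",
 "metadata": {"name": "cms-bd89844c9-x492r", "namespace": "default"},
 "containers": [{"name": "cms",
                 "usage": {"cpu": "25000000n", "memory": "61440Ki"}}]}
""")
print(pod_usage(sample))  # (25.0, 62914560)
```

Dividing these values by the POD's current limits (100m and 100Mi in Step 5) gives the real-time percentages that are compared with the expansion thresholds.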

Step 7: the POD performance data obtained is evaluated against the thresholds stored in the cluster. The thresholds are stored in the cluster through a ConfigMap, whose threshold content is defined as follows:

[App]
CpuDrainLimit=60
MemDrainLimit=60

The ConfigMap is created in the cluster with the kubectl create command;

When (real-time memory usage of the business microservice POD / original memory threshold value) is greater than 60%, and/or (real-time number of CPU cores of the business microservice POD / original CPU threshold value) is greater than 60%, the POD's current load is too high, and the program performs an expansion according to the preset step size; this operation is completed automatically in the background by the application management program. Each POD has a configured maximum, however: once expansion reaches the maximum, automatic expansion stops and no further expansion is performed;
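
A minimal Python sketch of Step 7 follows: it reads the threshold content shown above with the standard `configparser` module and applies one expansion check per monitoring cycle. Clamping the result to the maximum is an assumption based on the statement that expansion stops once the maximum is reached; the function names, step size, and usage samples are likewise illustrative.

```python
import configparser

CONFIG_TEXT = """
[App]
CpuDrainLimit=60
MemDrainLimit=60
"""


def load_limits(text: str) -> dict[str, float]:
    """Parse the ConfigMap content and return thresholds as fractions."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    app = parser["App"]
    return {
        "cpu": app.getint("CpuDrainLimit") / 100,
        "mem": app.getint("MemDrainLimit") / 100,
    }


def autoscale(usage_samples, limit, max_limit, step, ratio):
    """Return the limit after each monitoring cycle.

    Assumption: the limit grows by `step` but never beyond `max_limit`.
    """
    history = []
    for usage in usage_samples:
        if usage / limit > ratio and limit < max_limit:
            limit = min(limit + step, max_limit)
        history.append(limit)
    return history


limits = load_limits(CONFIG_TEXT)
# Memory in MiB (illustrative): usage keeps rising, the limit stops at 200.
print(autoscale([70, 100, 130, 160, 200], limit=100, max_limit=200,
                step=50, ratio=limits["mem"]))  # [150, 200, 200, 200, 200]
```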

Step 8: the application management program obtains cluster node performance data through the cluster API and evaluates it against the thresholds stored in the cluster, for example through an endpoint for obtaining Node information such as https://X.X.X.X:9009/k8s/clusters/c-m-jnt7pjdz/v1/metrics.k8s.io.nodes. Calling this endpoint returns the Node information in JSON format, and the program extracts the Node's CPU and memory information by filtering the corresponding fields. The thresholds are stored in the cluster through a ConfigMap, whose threshold content is defined as follows:

[App]
CpuDrainLimit=60
MemDrainLimit=60

The ConfigMap is created in the cluster with the kubectl create command.

When (real-time memory usage of the physical node hosting the business microservice POD / memory threshold value of that physical node) is greater than 60%, and/or (real-time number of CPU cores of the physical node hosting the business microservice POD / original CPU threshold value of that physical node) is greater than 60%, the physical node is overloaded. The program evicts the PODs on that node, and they are automatically rescheduled to other physical nodes to keep the cluster stable and available.
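
The node-level check in Step 8 can be sketched in Python as below. The sketch assumes that a node's "threshold value" is its total capacity and that the "other available nodes" are simply those not classified as heavily loaded; the `NodeLoad` type, the function names, and the sample figures are hypothetical. The eviction itself is left to Kubernetes and is not shown.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:
    """Snapshot of one node (hypothetical type; memory in GiB for readability)."""
    name: str
    cpu_used: float
    cpu_capacity: float
    mem_used: float
    mem_capacity: float


def is_heavy(node: NodeLoad, cpu_limit: float = 0.60, mem_limit: float = 0.60) -> bool:
    """A node is heavily loaded when CPU and/or memory usage exceeds its threshold."""
    return (node.cpu_used / node.cpu_capacity > cpu_limit
            or node.mem_used / node.mem_capacity > mem_limit)


def eviction_plan(nodes: list[NodeLoad]) -> dict[str, list[str]]:
    """Map each heavily loaded node to the nodes its PODs could move to."""
    available = [n.name for n in nodes if not is_heavy(n)]
    return {n.name: available for n in nodes if is_heavy(n)}


nodes = [
    NodeLoad("node-1", cpu_used=7, cpu_capacity=8, mem_used=20, mem_capacity=64),
    NodeLoad("node-2", cpu_used=2, cpu_capacity=8, mem_used=16, mem_capacity=64),
    NodeLoad("node-3", cpu_used=3, cpu_capacity=8, mem_used=50, mem_capacity=64),
]
print(eviction_plan(nodes))  # {'node-1': ['node-2'], 'node-3': ['node-2']}
```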

It should be noted that the Node on which a POD is scheduled to run is computed by the K8S scheduler component according to certain conditions and internal algorithms, and this process is not under manual control. Once a POD has been scheduled, it cannot be rescheduled according to the node's real-time resource usage unless someone intervenes manually or the node fails. As a result, resource utilization across the Nodes of the cluster becomes very uneven, which not only wastes resources but may also drive the utilization of some Nodes so high that every service they carry stalls or stops responding entirely. To avoid the problems of this "first placement is permanent" model, this embodiment proposes a K8S platform capable of carrying a railway integrated video surveillance system. It provides dynamic resource expansion at the POD level and, at a higher level, automatic rescheduling between physical Node nodes, so that PODs can migrate from heavily loaded Nodes to lightly loaded Nodes, balancing and optimizing the resource load of the entire cluster.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, a person of ordinary skill in the art should understand that the specific implementations of the present invention may still be modified, or some technical features replaced by equivalents, without departing from the spirit of the technical solution of the present invention, and all such changes shall fall within the scope of the technical solution claimed by the present invention.

Claims

1. A method for dynamic and smooth expansion of POD resources based on K8S, characterized by comprising:

Pre-deploy business microservice POD and application management program; wherein the business microservice POD is used to install video surveillance management software;

Pre-configure the original memory threshold value, memory expansion threshold value, memory maximum value threshold value, original CPU threshold value, CPU expansion threshold value and CPU core maximum value threshold value corresponding to the business microservice POD; wherein the original memory threshold value and the original CPU threshold value are both POD resource dynamic parameters;

During the operation of the business microservice Pod, the pre-deployed application management program is used to monitor the real-time memory usage and real-time CPU core number of the business microservice Pod;

Based on the memory expansion threshold, real-time memory usage, original memory threshold and maximum memory threshold corresponding to the business microservice POD, determine whether to update the original memory threshold;

If yes, the application management program obtains a new memory threshold value based on the preset step size I and the original memory threshold value, and updates the original memory threshold value based on the new memory threshold value; wherein the new memory threshold value>the original memory threshold value;

Based on the CPU expansion threshold, the real-time number of CPU cores, the original CPU threshold and the maximum value threshold of the number of CPU cores corresponding to the business microservice POD, determine whether to update the original CPU threshold;

If so, the application management program obtains a new CPU threshold value based on the preset step size II and the original CPU threshold value, and updates the original CPU threshold value based on the new CPU threshold value; wherein the new CPU threshold value>the original CPU threshold value.

2. The K8S-based POD resource dynamic and smooth expansion method according to claim 1, characterized in that the memory expansion threshold is a percentage of the original memory threshold value, and the CPU expansion threshold is a percentage of the original CPU threshold value;

When determining whether to update the original memory threshold value based on the memory expansion threshold, real-time memory usage, original memory threshold value, and maximum memory threshold value corresponding to the business microservice POD, execute:

Calculate the real-time memory percentage, where the real-time memory percentage = the real-time memory usage of the business microservice Pod/the original memory threshold value;

When the real-time memory percentage is greater than the memory expansion threshold and the original memory threshold value is less than or equal to the preset maximum memory threshold value, the application management program determines to update the original memory threshold value; otherwise, the application management program determines not to update the original memory threshold value;

When determining whether to update the original CPU threshold value based on the CPU expansion threshold value, the real-time number of CPU cores, the original CPU threshold value, and the maximum threshold value of the number of CPU cores corresponding to the business microservice POD, the following is executed:

Calculate the real-time CPU percentage, where the real-time CPU percentage = the number of real-time CPU cores of the business microservice POD/the original CPU threshold value;

When the real-time CPU percentage is greater than the CPU expansion threshold and the original CPU threshold value is less than or equal to the preset CPU core maximum value threshold, the application management program determines to update the original CPU threshold value; otherwise, the application management program determines not to update the original CPU threshold value.

3. The K8S-based POD resource dynamic and smooth expansion method according to claim 1, characterized in that the unit of the memory expansion threshold is consistent with that of the real-time memory usage, and the unit of the CPU expansion threshold is consistent with that of the real-time number of CPU cores;

When the real-time memory usage of the business microservice POD is greater than the memory expansion threshold, and the original memory threshold value is less than or equal to the preset maximum memory threshold value, the application management program determines to update the original memory threshold value; otherwise, the application management program determines not to update the original memory threshold value;

When the real-time number of CPU cores of the business microservice POD is greater than the CPU expansion threshold, and the original CPU threshold value is less than or equal to the preset maximum value threshold of the number of CPU cores, the application management program determines to update the original CPU threshold value; otherwise, the application management program determines not to update the original CPU threshold value.

4. The method for dynamic and smooth expansion of POD resources based on K8S according to any one of claims 1 to 3, characterized in that it also includes:

In the configuration files of the API server, key components and default scheduler in the K8S cluster, in-place vertical scaling parameters are respectively added, and the in-place vertical scaling parameters are pre-configured to true to enable smooth expansion operations.

5. A rescheduling method for POD under heavy load based on K8S, characterized by comprising:

Set the Node node where the business microservice POD is deployed as the target physical node, and pre-configure the rescheduling threshold of the target physical node;

In the process of executing the K8S-based POD resource dynamic smooth expansion method according to any one of claims 1 to 4, the target physical node is monitored by the application management program, and whether the target physical node is in a heavy load state is determined based on the rescheduling threshold;

If the target physical node is in a heavy load state, the business microservice POD on the Node node will be rescheduled to other available nodes in the K8S cluster.

6. The rescheduling method for POD under heavy load based on K8S according to claim 5, characterized in that, before rescheduling the business microservice POD on the Node node to other available nodes in the K8S cluster, the method further includes:

Determining whether the node where the target business microservice POD is located is the same physical node as the node in a heavy load state, and if so, rescheduling the target business microservice POD to other available nodes in the K8S cluster;

The target business microservice POD refers to a business microservice POD for which the original memory threshold value and/or the original CPU threshold value need to be updated.
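
The selection step in this claim amounts to a filter over the monitored PODs. The sketch below assumes the application management program already knows, for each POD, its node and whether its original memory or CPU threshold value needs updating; the `PodState` record and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class PodState:
    """Hypothetical per-POD record kept by the application management program."""
    name: str
    node: str
    needs_memory_update: bool
    needs_cpu_update: bool


def is_target_pod(pod: PodState) -> bool:
    """A target POD needs its memory and/or CPU threshold value updated."""
    return pod.needs_memory_update or pod.needs_cpu_update


def pods_to_reschedule(pods: List[PodState], heavy_nodes: Set[str]) -> List[str]:
    """Names of target PODs located on a node that is in a heavy load state."""
    return [p.name for p in pods if is_target_pod(p) and p.node in heavy_nodes]


# Illustrative state: only node-1 is heavily loaded.
pods = [
    PodState("video-svc-a", "node-1", needs_memory_update=True, needs_cpu_update=False),
    PodState("video-svc-b", "node-1", needs_memory_update=False, needs_cpu_update=False),
    PodState("video-svc-c", "node-2", needs_memory_update=False, needs_cpu_update=True),
]
selected = pods_to_reschedule(pods, heavy_nodes={"node-1"})
```

Here only `video-svc-a` is selected: `video-svc-b` needs no update, and `video-svc-c` sits on a node that is not heavily loaded, so it could be expanded in place.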

7. A K8S cluster, characterized in that it comprises N Master nodes and M Node nodes, and a load proxy component Nginx is compiled and installed on the Master node; wherein the configuration file of the load proxy component Nginx includes proxy information of the N Master nodes;

A business microservice POD is pre-deployed on the K8S cluster; wherein the business microservice POD is used to install video surveillance management software;

An application management program is pre-installed on the K8S cluster; wherein the application management program is used to monitor the real-time memory usage and the real-time number of CPU cores of the business microservice POD;

During the operation of the business microservice POD, the real-time memory usage and the real-time number of CPU cores of the business microservice POD are monitored using the pre-installed application management program;
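
The claim states only that the Nginx configuration file includes proxy information of the N Master nodes. One plausible form, shown purely as an assumption, is an Nginx `stream` block with one `upstream` entry per Master node. The function `render_nginx_stream`, the upstream name `k8s_masters`, the ports and the IP addresses below are all hypothetical.

```python
from typing import List


def render_nginx_stream(master_ips: List[str], api_port: int = 6443,
                        listen_port: int = 16443) -> str:
    """Render an assumed TCP proxy configuration listing every Master node."""
    # One upstream "server" entry per Master node.
    servers = "\n".join(f"        server {ip}:{api_port};" for ip in master_ips)
    return (
        "stream {\n"
        "    upstream k8s_masters {\n"
        f"{servers}\n"
        "    }\n"
        "    server {\n"
        f"        listen {listen_port};\n"
        "        proxy_pass k8s_masters;\n"
        "    }\n"
        "}\n"
    )


# Illustrative addresses for N = 3 Master nodes.
config = render_nginx_stream(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
```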

8. The K8S cluster according to claim 7, characterized in that an in-place vertical scaling parameter is pre-added to each of the configuration files of the API server, key components and default scheduler in the K8S cluster, and each in-place vertical scaling parameter is pre-configured to true to enable smooth expansion operations.

9. The K8S cluster according to claim 7, characterized in that the Node node where the business microservice POD is deployed in the K8S cluster is set as a target physical node, and the rescheduling threshold of the target physical node is pre-configured;

Monitoring the target physical node using the application management program, and determining whether the target physical node is in a heavy load state based on the rescheduling threshold;

10. The K8S cluster according to claim 9, characterized in that, before rescheduling the business microservice POD on the Node node to other available nodes in the K8S cluster, it further comprises: