CN118509424A

CN118509424A - Cross-multi-version in-place upgrading method and device for k8s cluster

Info

Publication number: CN118509424A
Application number: CN202410524475.4A
Authority: CN
Inventors: 汤波; 沈一帆
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2024-04-29
Filing date: 2024-04-29
Publication date: 2024-08-16

Abstract

The invention provides a cross-multi-version in-place upgrade method and device for a k8s cluster, relating to the field of distributed technology. The method comprises: distributing the new-version k8s installation package and the scripts required for the upgrade to all nodes in the cluster; stopping the kube-apiserver and ETCD services, then upgrading the ETCD in place and upgrading the objects stored in it; stopping the services of the Master node and the Node nodes in sequence and completing their in-place upgrades; starting the services on the Master node and the Node nodes in sequence while synchronously updating the local files of the Node nodes; and starting the kube-controller-manager service to resume k8s management of the cluster. By modifying the k8s stored object data and the local files, the invention makes in-place upgrades of k8s possible across multiple versions, so frequent version-by-version upgrades are no longer needed.

Description

Cross-multi-version in-place upgrading method and device for k8s cluster

Technical Field

The invention relates to the field of distributed technology, and in particular to a cross-multi-version in-place upgrade method and device for a k8s cluster.

Background

In recent years, with the continuous development of the digital economy, cloud computing has come to play a vital role as the core of new infrastructure. Kubernetes (k8s), a core cloud computing technology, maintains a high-frequency release cadence of roughly one version every three months. To benefit from Kubernetes developments, upgrading the k8s version of their clusters is a necessary choice for every cloud vendor.

The official Kubernetes community currently adopts an upgrade mode based on the version skew policy. This ensures that component version differences do not cause problems in a high-availability cluster, but it requires that each upgrade span no more than one minor version. In addition, the compute-node component kubelet does not support in-place upgrades, so all containers must be manually evicted before upgrading. This way of upgrading clearly has many limitations.

To address these limitations, the industry has modified the Kubernetes source code on top of the official Kubernetes in-place upgrade scheme: branches for cluster upgrade are added to the original logic to widen the span between new and old component versions that the Kubernetes version skew policy can support, and at the same time the problem of upgrading the kubelet component in place is solved.

However, this common industry scheme damages the integrity of the Kubernetes product. It invasively modifies the original logic, which affects the security and robustness of the product and introduces considerable risk. It also brings high development and testing costs, and the code left behind by a one-off upgrade process reduces the maintainability and reusability of the product.

Disclosure of Invention

In view of the above, the present invention provides a method and apparatus for in-place upgrade across multiple versions of k8s clusters to solve at least one of the problems mentioned above.

In order to achieve the above purpose, the present invention adopts the following scheme:

According to a first aspect of the present invention, there is provided a cross-multi-version in-place upgrade method for a k8s cluster, the method comprising: distributing the new version of the k8s installation package and the scripts required for the upgrade to all nodes in the cluster; stopping the kube-apiserver and ETCD services of the k8s cluster so that the k8s cluster enters a maintenance phase; upgrading the ETCD in place by using an upgrade script, and upgrading the objects stored in the ETCD after the in-place upgrade of the ETCD is completed; stopping all services of the Master node, and completing the in-place upgrade of the Master node by using an upgrade script; stopping all services of the Node, and completing the in-place upgrade of the Node by using an upgrade script; starting the kube-apiserver and kube-scheduler services on the Master node in sequence; starting the services on the Node nodes, calling the kube-apiserver interface on the host of each Node to acquire container object information, and rewriting the local files per container according to the acquired container object information to ensure compatibility with the new version of k8s; and starting the kube-controller-manager service of the Master node to resume k8s management of the cluster.

As an embodiment of the present invention, after the kube-controller-manager service of the Master node is started to resume the management of k8s on the cluster in the above method, the method further includes: executing cluster health check operation by calling an API interface of K8s, and checking whether a service is unavailable or a container is restarted; and updating the state and version information of the clusters in the K8s cluster management system.

As an embodiment of the present invention, distributing the new version of the k8s installation package and the scripts required for the upgrade to all nodes in the cluster in the above method includes: checking the component state of the k8s cluster, judging whether the k8s cluster meets the upgrade conditions, and, in response to the k8s cluster meeting the upgrade conditions, distributing the new version of the k8s installation package and the scripts required for the upgrade to all nodes in the cluster.

As an embodiment of the present invention, the in-situ upgrade of the ETCD with the upgrade script in the above method includes: executing the ETCD snapshot, creating a backup of current ETCD data, and backing up the configuration file and system configuration of the related k8s cluster; and upgrading the ETCD in situ by using the upgrade script and the new version of k8s installation package.

As an embodiment of the present invention, executing the ETCD snapshot, creating a backup of the current ETCD data, and backing up the related configuration files and system configuration of the k8s cluster in the above method includes: determining the node address, port number and required certificate authentication information of the running ETCD cluster; creating an ETCD snapshot with the etcdctl tool based on the node address, the port number and the certificate path; checking the state of the snapshot file to ensure that the snapshot was created successfully and the data is complete; storing the successfully created snapshot file on another server or a cloud storage service; and backing up the Master node and Node configuration files, static Pod configuration files, network configuration files, service account keys, and encryption configuration files.

As an embodiment of the present invention, after distributing the new version of the k8s installation package and the script required for upgrading to all the nodes in the cluster in the method, the method further includes: evaluating the resources required in the upgrading process and auditing the use condition of the existing resources; performing an upgrade based on the evaluation result and the audit result during a period of low system load to reduce the impact on the service; resource quota and limits are set based on the evaluation results and the audit results to ensure that the upgrade operations do not consume excessive resources.

As an embodiment of the present invention, the evaluating the resources required in the upgrade process and auditing the existing resource usage in the above method includes: evaluating resources required by the upgrade operation itself, including CPU, memory, storage space and network bandwidth; evaluating the change of service load during upgrading operation, and predicting the load condition of the system in different time periods; the monitoring tool is used for auditing the resource utilization condition of the existing k8s cluster, including the resource utilization rate of the node, the resource request and limit setting of the Pod and the traffic mode of the service.

According to a second aspect of the present invention, there is provided a cross-multi-version in-place upgrade apparatus for a k8s cluster, the apparatus comprising: the distributing unit is used for distributing the k8s installation package of the new version and the script required by the upgrade to all nodes in the cluster; a maintenance unit, configured to stop kube-apiserver of a k8s cluster and an ETCD service to make the k8s cluster enter a maintenance phase; the ETCD upgrading unit is used for upgrading the ETCD in situ by utilizing the upgrading script and upgrading the object stored in the ETCD after the in-situ upgrading of the ETCD is completed; the Master upgrading unit is used for stopping all services of the Master node and finishing in-situ upgrading of the Master node by utilizing an upgrading script; the Node upgrading unit is used for stopping all services of the Node and finishing in-situ upgrading of the Node by using an upgrading script; the Master recovery unit is used for starting kube-apiserver and kube-scheduler services on the Master Node in sequence, and starting kube-controller-manager services of the Master Node to recover k8s management of the cluster after the Node recovery unit recovers the services; and the Node recovery unit is used for starting the service on the Node nodes, calling kube-apiserver interfaces on the host computers of each Node to acquire container object information, and rewriting the local file based on container dimensions according to the acquired container object information so as to ensure compatibility with the K8s of the new version.

As an embodiment of the present invention, the above apparatus further includes: the health check unit is used for executing cluster health check operation by calling an API interface of K8s after the Master restoring unit starts kube-controller-manager service of the Master node to restore the management of the K8s to the cluster, and checking whether the service is unavailable or the problem of restarting the container exists or not; and the state updating unit is used for updating the state and version information of the clusters in the K8s cluster management system.

As an embodiment of the present invention, the distributing unit distributing the new version of the k8s installation package and the scripts required for the upgrade to all nodes in the cluster includes: checking the component state of the k8s cluster, judging whether the k8s cluster meets the upgrade conditions, and, in response to the k8s cluster meeting the upgrade conditions, distributing the new version of the k8s installation package and the scripts required for the upgrade to all nodes in the cluster.

For one embodiment of the present invention, the above-mentioned ETCD upgrade unit performs in-situ upgrade on the ETCD using upgrade scripts, including: executing the ETCD snapshot, creating a backup of current ETCD data, and backing up the configuration file and system configuration of the related k8s cluster; and upgrading the ETCD in situ by using the upgrade script and the new version of k8s installation package.

For one embodiment of the present invention, the ETCD upgrade unit executes an ETCD snapshot, creates a backup of current ETCD data, and then backs up a configuration file and a system configuration of a related k8s cluster, where the configuration file and the system configuration include: determining node addresses, port numbers and required certificate authentication information of an operation ETCD cluster; creating an ETCD snapshot with etcdctl tools based on the node address, the port number, and a credential path; checking the state of the snapshot file to ensure that the snapshot is created successfully and the data is complete; storing the snapshot file which is successfully created on other servers or cloud storage services; backup Master Node and Node configuration files, static Pod configuration files, network configuration files, service account keys, and encryption configuration files.

For one embodiment of the present invention, the apparatus further includes: the resource evaluation unit is used for evaluating the resources required in the upgrading process and auditing the use condition of the existing resources; an upgrade period selection unit for performing upgrade in a period of low system load based on the evaluation result and the audit result to reduce influence on the service; and the upgrading resource allocation unit is used for setting resource quota and limit based on the evaluation result and the audit result so as to ensure that excessive resources are not consumed by the upgrading operation.

For one embodiment of the present invention, the resource evaluation unit evaluates the resources required in the upgrade process, and audits the existing resource usage conditions including: evaluating resources required by the upgrade operation itself, including CPU, memory, storage space and network bandwidth; evaluating the change of service load during upgrading operation, and predicting the load condition of the system in different time periods; the monitoring tool is used for auditing the resource utilization condition of the existing k8s cluster, including the resource utilization rate of the node, the resource request and limit setting of the Pod and the traffic mode of the service.

According to a third aspect of the present invention there is provided an electronic device comprising a memory, a processor and a computer program stored on said memory and executable on said processor, the processor implementing the steps of the above method when executing said computer program.

According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above method.

According to the above technical scheme, the cross-multi-version in-place upgrade method and device for a k8s cluster make in-place upgrades of the k8s product possible. While preserving the integrity of the k8s product, they widen the upgrade span of k8s by modifying the k8s stored object data and the local files, so frequent version-by-version upgrades are not needed; they also overcome the drawback that containers must be restarted during a k8s version upgrade, achieving an imperceptible, lossless upgrade of the k8s version. Specifically, data is backed up through an ETCD snapshot; the k8s stored object data is modified to complete the upgrade and conversion of k8s objects, which prepares the data for the local file upgrade; the kube-apiserver interface is then called to obtain the updated container object information, and the local files are upgraded per container according to that information. Throughout the upgrade, the kube-controller-manager service is kept disabled, a defined start-stop order is followed, and the k8s version skew policy is applied appropriately, which resolves interface version compatibility between each component and the kube-apiserver component during the upgrade.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:

FIG. 1 is a flow diagram of a cross-multi-version in-place upgrade method for a k8s cluster provided by an embodiment of the present invention;

FIG. 2 is a diagram of k8s component access relationships provided by an embodiment of the present application;

FIG. 3 is a schematic flow chart of in-situ upgrading of an ETCD using an upgrade script according to an embodiment of the present application;

FIG. 4 is a schematic flow chart of executing an ETCD snapshot and backing up k8s cluster key files according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a cross-multi-version in-place upgrade method for a k8s cluster provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a cross-multi-version in-place upgrade apparatus for a k8s cluster according to an embodiment of the present invention;

FIG. 7 is a schematic block diagram of a system configuration of an electronic device provided in an embodiment of the invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present invention and their descriptions herein are for the purpose of explaining the present invention, but are not to be construed as limiting the invention.

The information collected in this technical scheme is information and data authorized by the user or fully authorized by all parties. The collection, storage, use, processing, transmission, provision, disclosure and application of the related data comply with the relevant laws, regulations and standards of the relevant countries and regions, adopt the necessary security measures, and do not violate public welfare. A corresponding operation entry is provided so that the user can choose to grant or refuse authorization, and so that the user can accept or reject an automated decision result; if the user chooses to reject it, an expert decision flow is entered.

FIG. 1 is a flow chart of a cross-multi-version in-place upgrade method for a k8s cluster according to an embodiment of the present application. This embodiment describes the application from the upgrade-script side, and the method includes the following steps:

Step S101: Distribute the new version of the k8s installation package and the scripts required for the upgrade to all nodes in the cluster.

This step prepares for the cross-multi-version in-place upgrade of the k8s cluster. An in-place upgrade is an upgrade mode in which the new version of the software package directly replaces the old one in place and the service is then re-run on the new software. Therefore, before upgrading in place, the k8s installation package and the scripts required for the upgrade need to be placed on all nodes in the cluster.

Depending on the architecture of the Kubernetes cluster and the roles of its nodes, different types of nodes require different installation packages and upgrade scripts. A Kubernetes cluster is mainly composed of two types of nodes, Master nodes and Node nodes, each of which assumes different responsibilities, so the software packages and configurations they require during the upgrade differ. FIG. 2 shows the access relationships among the k8s components. The k8s component kube-apiserver exposes the API server interface; it not only provides interface services externally but also acts as the bridge through which the other components communicate, and it converts all information into API resource objects and stores them in ETCD. The k8s component kubelet interfaces with the Docker runtime and is responsible for managing and maintaining application containers. For the Docker runtime, a container corresponds to user processes and local files.

The Master node is part of the control plane and is responsible for managing and scheduling the whole cluster. It mainly involves installing and updating the following components:

kube-apiserver: provides the Kubernetes API service and serves as the front end of the control plane.

kube-scheduler: makes scheduling decisions and selects a suitable Node for each newly created Pod.

kube-controller-manager: runs the controllers that handle cluster-level functions, such as nodes joining and leaving.

etcd: a distributed key-value store that holds the state of the whole cluster.

The upgrade installation package and script of the Master node therefore need to contain the updated contents of these components.

The Node is a node that runs the actual application containers. It mainly involves installing and updating the following components:

kubelet: runs on each Node and is responsible for launching Pods and containers (e.g., Docker) and reporting to the Master.

kube-proxy: maintains the network rules on the Node and implements network connectivity for Services.

The upgrade installation package and scripts of the Node are mainly directed to these components.

Therefore, when the present application performs an upgrade, the appropriate installation packages and scripts can be selected for distribution according to the role of each node: corresponding upgrade packages and scripts are prepared for the Master nodes and the Node nodes, and they are checked to contain the correct component versions and the required configuration changes. Careful preparation, together with automation tools for distributing the installation packages and upgrade scripts, helps keep the upgrade of the Kubernetes cluster both efficient and secure.
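The role-based selection described above can be sketched as follows. This is a minimal illustration, not the applicant's script: the component lists come from the description, while the artifact naming scheme, the script names `upgrade-master.sh` / `upgrade-node.sh`, and the example version string are hypothetical.

```python
from dataclasses import dataclass

# Component lists follow the description above; everything else is illustrative.
COMPONENTS_BY_ROLE = {
    "master": ["kube-apiserver", "kube-scheduler", "kube-controller-manager", "etcd"],
    "node": ["kubelet", "kube-proxy"],
}


@dataclass(frozen=True)
class Node:
    name: str
    role: str  # "master" or "node"


def build_distribution_plan(nodes, target_version):
    """Map each node name to the packages and script it should receive."""
    plan = {}
    for node in nodes:
        if node.role not in COMPONENTS_BY_ROLE:
            raise ValueError(f"unknown role {node.role!r} for node {node.name}")
        components = COMPONENTS_BY_ROLE[node.role]
        plan[node.name] = {
            # Hypothetical artifact naming: <component>-<version>
            "packages": [f"{c}-{target_version}" for c in components],
            # Hypothetical script name, one per role
            "script": f"upgrade-{node.role}.sh",
        }
    return plan
```

A distribution tool would then iterate over the returned plan and copy the listed artifacts to each host; rejecting unknown roles up front keeps a mislabeled node from silently receiving the wrong package set.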

Thus, preferably, this step may specifically include: checking the component state of the k8s cluster, judging whether the k8s cluster meets the upgrade conditions, and, in response to the k8s cluster meeting the upgrade conditions, distributing the new version of the k8s installation package and the scripts required for the upgrade to all nodes in the cluster.

Further preferably, in addition to the preparatory work described above, the embodiment of the present application may further include the following preparatory work: executing a cluster health check operation by calling the API interface of k8s, and checking whether any service is unavailable or any container has restarted; and updating the state and version information of the cluster in the k8s cluster management system.

In the cluster health check operation, the items that can be checked include node status, Pod status, system component status, and so on. Special attention is paid to whether any node is in the NotReady state or whether any Pod restarts frequently (an excessive restart count), either of which may indicate a problem with the cluster. In addition, availability checks are performed on critical services within the cluster to confirm that they respond normally; this may be done by calling a service's health check endpoint or by executing an actual service request. Any problems and anomalies found during the health checks are recorded in detail, since this information is critical to diagnosing problems and planning the upgrade strategy.
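As a rough sketch of the node and Pod checks just described, the functions below scan already-fetched Kubernetes API responses for NotReady nodes and frequently restarting Pods. The input dictionaries are assumed to have the shape of a NodeList and a PodList as returned by the core API; how they are fetched is left out, and the restart threshold of 5 is an arbitrary assumption rather than a value from the source.

```python
def find_not_ready_nodes(node_list):
    """Return names of nodes whose Ready condition is not "True"."""
    not_ready = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            not_ready.append(node["metadata"]["name"])
    return not_ready


def find_restarting_pods(pod_list, max_restarts=5):
    """Return (namespace, name, restarts) for Pods above the restart threshold.

    max_restarts is an illustrative default, not a value from the source.
    """
    flagged = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        restarts = sum(s.get("restartCount", 0) for s in statuses)
        if restarts > max_restarts:
            meta = pod["metadata"]
            flagged.append((meta.get("namespace", "default"), meta["name"], restarts))
    return flagged
```

An upgrade script could refuse to proceed when either function returns a non-empty list and write the findings to its log for later diagnosis.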

The present application may also record in detail the configuration, status and version information of the current cluster before the upgrade, including but not limited to the version information of each node, the versions of running services, and snapshots of configuration files; such information may be obtained, for example, with kubectl version or by accessing the Kubernetes API. The current state and configuration information of the cluster are backed up to a safe location, because this information is very important if a problem is encountered during or after the upgrade.

By executing the supplementary preparation work, the current health condition of the cluster can be more comprehensively evaluated, important basic information is provided for upgrading, and the problems in and after the upgrading process can be rapidly positioned and solved, so that the success rate of upgrading can be improved, and the long-term stability and safety of the cluster can be maintained.

Step S102: Stop the kube-apiserver and ETCD services of the k8s cluster so that the k8s cluster enters a maintenance phase.

By stopping kube-apiserver of the k8s cluster and the ETCD service, data inconsistency or state change can be prevented from occurring when a critical operation is performed, and the security of an upgrade or maintenance operation is ensured. Moreover, stopping kube-apiserver and ETCD services of the k8s cluster does not affect the running container instances, which can normally provide services, because the container instances that have been started are managed directly by kubelet on the various Node nodes, rather than being dynamically scheduled through kube-apiserver.

Step S103: Upgrade the ETCD in place by using the upgrade script, and upgrade the objects stored in the ETCD after the in-place upgrade of the ETCD is completed.

Preferably, as shown in fig. 3, in-situ upgrading the ETCD by using the upgrade script in this step includes:

Step S301: Execute the ETCD snapshot to create a backup of the current ETCD data, and back up the related configuration files and system configuration of the k8s cluster.

Further preferably, this step may comprise the substeps as shown in fig. 4:

Step S3011: Determine the node address, port number and required certificate authentication information of the running ETCD cluster.

Step S3012: Create an ETCD snapshot with the etcdctl tool based on the node address, the port number and the certificate path.

Step S3013: Check the state of the snapshot file to ensure that the snapshot was created successfully and the data is complete.

Step S3014: Store the successfully created snapshot file on another server or a cloud storage service.

Step S3015: Back up the Master node and Node configuration files, static Pod configuration files, network configuration files, service account keys, and encryption configuration files.

After the above steps are completed, a complete backup of the ETCD data snapshot and cluster configuration file may be obtained, which provides a solid foundation for the in-place upgrades of ETCDs and any necessary disaster recovery operations.
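Steps S3011 to S3013 can be illustrated with a small helper that assembles the etcdctl invocations without running them. This is a sketch under assumptions: the function name and the example paths in the usage note are illustrative, and on recent etcd releases the snapshot status check may need to be run through etcdutl instead of etcdctl.

```python
import os


def etcd_snapshot_commands(endpoint, cacert, cert, key, snapshot_path):
    """Return (env, save_cmd, status_cmd) for taking and checking an etcd v3 snapshot."""
    # ETCDCTL_API=3 selects the v3 API on older etcdctl builds
    env = dict(os.environ, ETCDCTL_API="3")
    tls_flags = [
        f"--endpoints={endpoint}",
        f"--cacert={cacert}",
        f"--cert={cert}",
        f"--key={key}",
    ]
    # S3012: create the snapshot
    save_cmd = ["etcdctl", "snapshot", "save", snapshot_path, *tls_flags]
    # S3013: inspect the snapshot file (hash, revision, key count, size)
    status_cmd = ["etcdctl", "snapshot", "status", snapshot_path, "--write-out=table"]
    return env, save_cmd, status_cmd
```

A caller might pass `https://127.0.0.1:2379` as the endpoint together with certificate paths such as those kubeadm places under `/etc/kubernetes/pki/etcd/` (assumed values), run each command with `subprocess.run(cmd, env=env, check=True)`, and copy the snapshot off the host only after both commands succeed.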

Step S302: Upgrade the ETCD in place by using the upgrade script and the new version of the k8s installation package.

As can be seen from the above, in the present application the objects stored in the ETCD are upgraded immediately after the ETCD itself is upgraded in place. A cross-version upgrade of a k8s cluster means that the data model stored in the ETCD may also need to be updated to fit the API and resource definitions of the new version. Updating the stored objects right after the ETCD upgrade keeps the data model compatible with the new version of Kubernetes and avoids errors or anomalies caused by version mismatch.

The objects in ETCD are the various k8s resource objects, including but not limited to Pods (the basic deployment unit in k8s; each Pod may contain one or more containers), Services (which define a way to access Pods and are typically used as a load balancer), Deployments and StatefulSets (for managing the deployment and scaling of Pods), ConfigMaps and Secrets (for storing configuration data and sensitive information), Roles and RoleBindings (which define access control rules for Kubernetes resources), NetworkPolicies (which define network access policies between Pods), and the like. When an ETCD upgrade is performed, these objects stored in the ETCD are updated for reasons including, but not limited to, the following:

1) API version update: the Kubernetes API evolves with version updates, and some API versions may be deprecated and then removed in subsequent versions. If the objects in ETCD still use old API versions, they may not be properly identified or handled by the new version of Kubernetes.

2) Data model changes: in some Kubernetes version updates, the data model of the resource object may change, in which case the object stored in the ETCD needs to be updated to a new data model to ensure proper functioning of the cluster.

3) Utilizing new features: new versions of Kubernetes may introduce new features or fields, and updating the objects allows them to take advantage of the improvements and features provided by the new version.
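The first reason above, removed API versions, can be illustrated with a small rewrite over already-decoded objects. This is only a sketch: the source does not describe how the upgrade script reads or writes ETCD, real stored objects are serialized rather than plain dictionaries, and a real migration may also have to change fields (for example, apps/v1 Deployments require spec.selector). The mapping table lists a few well-known removals as examples and is not exhaustive.

```python
import copy

# (old apiVersion, kind) -> new apiVersion; illustrative entries only
API_VERSION_MIGRATIONS = {
    ("extensions/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta2", "Deployment"): "apps/v1",
    ("extensions/v1beta1", "DaemonSet"): "apps/v1",
}


def migrate_api_version(obj):
    """Return (object, changed); the input object is never mutated."""
    key = (obj.get("apiVersion"), obj.get("kind"))
    new_version = API_VERSION_MIGRATIONS.get(key)
    if new_version is None:
        return obj, False
    migrated = copy.deepcopy(obj)
    migrated["apiVersion"] = new_version
    return migrated, True
```

Returning a copy together with a changed flag lets a caller write back only the objects that were actually rewritten and keep the originals for comparison against the snapshot.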

Step S104: stopping all services of the Master node, and finishing in-situ upgrading of the Master node by using an upgrading script.

Specifically, the kube-scheduler and kube-controller-manager services on the Master node are stopped. The method for stopping a service depends on how it is run, for example as a system service or as a Pod.
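The dependence on the running mode can be sketched as a helper that returns the command to execute. The assumptions here are significant: systemd unit names equal to the component names, static Pod manifests under the kubeadm default directory `/etc/kubernetes/manifests`, and a hypothetical parking directory `/etc/kubernetes/manifests.stopped`. Moving a manifest out of the watched directory is one common way to make kubelet stop a static Pod; the source does not say which mechanism its scripts use.

```python
from pathlib import PurePosixPath

MANIFEST_DIR = PurePosixPath("/etc/kubernetes/manifests")        # kubeadm default (assumed)
PARKED_DIR = PurePosixPath("/etc/kubernetes/manifests.stopped")  # hypothetical location


def stop_command(service, run_mode):
    """Return the argv that stops `service` for the given run mode."""
    if run_mode == "systemd":
        # Assumes the unit is named after the component
        return ["systemctl", "stop", service]
    if run_mode == "static-pod":
        # kubelet stops a static Pod once its manifest leaves the watched directory
        manifest = MANIFEST_DIR / f"{service}.yaml"
        return ["mv", str(manifest), str(PARKED_DIR / manifest.name)]
    raise ValueError(f"unsupported run mode: {run_mode!r}")
```

Keeping the function side-effect free makes it easy to log or dry-run the planned commands before anything on the Master node is touched.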

Step S105: stopping all services of the Node, and finishing in-situ upgrading of the Node by using the upgrading script.

Step S106: Start the kube-apiserver and kube-scheduler services on the Master node in sequence.

Start kube-apiserver: the kube-apiserver service on the Master node is started first, because it is the server side of the Kubernetes API, on which the other components rely to communicate. After startup, it is also necessary to confirm that the service started successfully and that normal communication with the cluster is possible through kubectl or another API client.

Start kube-scheduler: after confirming that kube-apiserver has started and is operating properly, the kube-scheduler service is started; it is responsible for scheduling Pods to the appropriate Node. After startup, the log or status needs to be checked to confirm that kube-scheduler is functioning properly.

Step S107: Start the services on the Node nodes, call the kube-apiserver interface on the host of each Node to acquire the updated container object information, and rewrite the local storage files per container according to the acquired container object information to ensure compatibility with the new version of k8s.

When the k8s cluster is upgraded, the local storage files on the host and the data stored in ETCD on the remote Master node are upgraded together, so that the content they represent is consistent and matches the k8s version. After the k8s component services of a different version are started, they can then smoothly take over the modified resource objects. Here a resource object is an object operated on through the API interface provided by k8s, and different resource objects correspond to different functions.
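The per-container rewrite can be pictured with the sketch below. The source does not specify which local files are rewritten or their format, so everything concrete here is hypothetical: the one-directory-per-container layout, the `record.json` file name, the `container_id` key and the added `k8s_version` field. The only point being illustrated is the pattern of regenerating each container's local file from information obtained through kube-apiserver and replacing it atomically.

```python
import json
from pathlib import Path


def rewrite_container_records(containers, base_dir, target_version):
    """Write one JSON record per container under base_dir/<container_id>/.

    `containers` is a list of dicts already obtained from kube-apiserver;
    each must carry a "container_id" key (hypothetical schema).
    """
    written = []
    for info in containers:
        container_dir = Path(base_dir) / info["container_id"]
        container_dir.mkdir(parents=True, exist_ok=True)
        record = dict(info, k8s_version=target_version)
        target = container_dir / "record.json"
        tmp = container_dir / "record.json.tmp"
        tmp.write_text(json.dumps(record, sort_keys=True))
        # Rename over the old file so a reader never sees a half-written record
        tmp.replace(target)
        written.append(target)
    return written
```

Writing to a temporary file and renaming it means a component that starts while the rewrite is running sees either the old record or the new one, never a partial file.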

Step S108: Start the kube-controller-manager service of the Master node to resume k8s management of the cluster.

As can be seen from the above steps S106-S108, the startup sequence after the upgrade of the present application is performed according to a specific sequence, namely, the kube-apiserver and kube-scheduler services on the Master Node are started first, then the services of the Node are started, and finally the kube-controller-manager services of the Master Node are started. This particular start-up sequence may bring about the following benefits:

1. Ensuring node readiness

Starting services on Node nodes (particularly kubelet) enables these nodes to be rejoined into the cluster first and identified by kube-apiserver. This ensures that when kube-controller-manager is started, it can correctly see the latest state of all nodes and make decisions based on this information.

2. Avoiding premature scheduling

If kube-controller-manager starts up too early, it may attempt to perform Pod scheduling or execute other control logic based on incomplete or outdated cluster state information, possibly resulting in unnecessary rescheduling or other conflicts, especially during upgrades in which cluster state changes rapidly.

3. Managing resources and workload

After the services of the Node nodes are started and the nodes are confirmed to be healthy, kube-controller-manager can manage resources and workload more accurately. For example, it may automatically scale the number of Pod copies as needed or reschedule the Pod to a healthy node when needed.
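The startup order of steps S106-S108 can be sketched as a gated sequence in which each service is started only after the previous one reports healthy. The `start` and `healthy` callables are hypothetical stand-ins for whatever service manager and health probe a deployment uses; the source does not name them, and the timeout values are assumptions.

```python
import time
from typing import Callable

# Order taken from steps S106-S108: API server first, controller manager last.
STARTUP_ORDER = [
    "kube-apiserver",
    "kube-scheduler",
    "node-services",            # services on every Node, plus the local file rewrite
    "kube-controller-manager",
]


def start_in_order(
    start: Callable[[str], None],
    healthy: Callable[[str], bool],
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
) -> list[str]:
    """Start each service in STARTUP_ORDER, waiting for it to become healthy."""
    started = []
    for name in STARTUP_ORDER:
        start(name)
        deadline = time.monotonic() + timeout_s
        while not healthy(name):
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"{name} did not become healthy; started so far: {started}"
                )
            time.sleep(poll_s)
        started.append(name)
    return started
```

Because kube-controller-manager is last in the list, it is never started while a Node is still rejoining the cluster, which is the property the three benefits above rely on.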

Preferably, as shown in fig. 4, after the new version of the k8s installation package and the scripts required for upgrading are distributed to all nodes in the cluster in step S101, the method may further include the following steps:

Step S401: evaluate the resources required in the upgrade process and audit the usage of existing resources.

Specifically, this step may further include the substeps as shown in fig. 5:

Step S4011: the resources required for the upgrade operation itself are evaluated, including CPU, memory, storage space, and network bandwidth.

Step S4012: the change in traffic load during the upgrade operation is evaluated, and the load situation of the system during different time periods is predicted.

Step S4013: the resource utilization of the existing k8s cluster is audited with a monitoring tool, including the resource utilization of the nodes, the resource request and limit settings of the Pods, and the traffic patterns of the services.
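The audit in steps S4011-S4013 can be illustrated with a small headroom check. This sketch assumes the monitoring tool has already exported, per node, the allocatable CPU, the sum of Pod CPU requests, and the measured usage, all in millicores. The `NodeUsage` type, the field names, and the use of CPU alone are simplifying assumptions, not part of the source.

```python
from dataclasses import dataclass


@dataclass
class NodeUsage:
    name: str
    allocatable_mcpu: int   # allocatable CPU in millicores
    requested_mcpu: int     # sum of Pod CPU requests on the node
    used_mcpu: int          # measured CPU usage from the monitoring tool


def audit(nodes: list[NodeUsage], upgrade_mcpu: int) -> dict[str, bool]:
    """Report, per node, whether the upgrade job fits in the spare CPU.

    Spare CPU is measured against the larger of requests and actual usage,
    so a node counts as busy if either figure is high.
    """
    result = {}
    for n in nodes:
        headroom = n.allocatable_mcpu - max(n.requested_mcpu, n.used_mcpu)
        result[n.name] = headroom >= upgrade_mcpu
    return result
```

The same comparison can be repeated for memory, storage space, and network bandwidth, the other resources listed in step S4011.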

Step S402: based on the evaluation and audit results, the upgrade is performed during a period of low system load, for example at night, to reduce the impact on the business.

Step S403: resource quotas and limits are set based on the evaluation and audit results to ensure that the upgrade operation does not consume excessive resources.

Through the steps, the upgrading operation of the k8s cluster can be ensured to be successfully executed, the influence on running service is minimized, and meanwhile, the effective utilization of cluster resources and the continuous stability of service are ensured.
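The choice of a low-load period in step S402 can be sketched as a search for the quietest contiguous window in a day of load samples. The 24 hourly samples and the function name `quietest_window` are assumptions for illustration; the source does not say how the load forecast is represented.

```python
def quietest_window(hourly_load: list[float], duration_h: int) -> int:
    """Return the start hour (0-23) of the window with the lowest total load.

    The window is contiguous and may wrap past midnight. Ties go to the
    earliest start hour.
    """
    if len(hourly_load) != 24:
        raise ValueError("expected 24 hourly samples")
    if not 1 <= duration_h <= 24:
        raise ValueError("duration_h must be between 1 and 24")
    best_start, best_total = 0, float("inf")
    for start in range(24):
        total = sum(hourly_load[(start + k) % 24] for k in range(duration_h))
        if total < best_total:
            best_start, best_total = start, total
    return best_start
```

For example, with a predicted load that dips between 01:00 and 04:00, a three-hour upgrade would be scheduled to start at 01:00.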

Referring to fig. 5, which is a schematic diagram of steps S101-S108, the key point of the upgrade in this scheme is that, at appropriate times before and after the k8s components are upgraded, the local storage files on the host and the remote ETCD data are modified in step, so that the two remain consistent with each other and match the k8s version. As a result, after the k8s component services of a different version are started, they can successfully take over the modified resource objects.

Therefore, the cross-multi-version in-place upgrade method for a k8s cluster overcomes the limitations of in-place upgrades of the k8s product: while preserving the integrity of the k8s product, it widens the upgrade span by modifying the k8s stored object data and the local files, so that frequent version-by-version upgrades are unnecessary. It also removes the need to restart containers during a k8s version upgrade, achieving an imperceptible, lossless upgrade. Specifically, data backup is performed through an ETCD snapshot; the k8s stored object data is modified to complete the k8s object upgrade and to prepare the data for the local file upgrade; the kube-apiserver interface is then called to obtain the updated container object information, and the local files are upgraded on a per-container basis according to that information. Throughout the upgrade, the kube-controller-manager service is disabled and a fixed start-stop order is followed, applying the k8s version skew policy appropriately, which resolves interface version compatibility between each component and kube-apiserver during the upgrade.
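The version skew constraint mentioned above can be illustrated with a small check. This is only a sketch: the helper names are hypothetical, both versions are assumed to share the same major number, and the permitted gap `max_older` is passed in by the caller because the allowed skew differs by component and by release.

```python
def minor(version: str) -> int:
    """Extract the minor number from a version string such as 'v1.28.3'."""
    return int(version.lstrip("v").split(".")[1])


def within_skew(apiserver: str, component: str, max_older: int) -> bool:
    """True if the component is not newer than kube-apiserver and is at most
    max_older minor versions behind it."""
    gap = minor(apiserver) - minor(component)
    return 0 <= gap <= max_older
```

A jump across many minor versions fails this check if old and new components run side by side, which is one way to read why the method stops the services, upgrades them, and then starts them in a fixed order instead of rolling them one at a time.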

Fig. 6 is a schematic structural diagram of a cross-multi-version in-place upgrade apparatus for a k8s cluster according to an embodiment of the present application. The apparatus includes a distribution unit 610, a maintenance unit 620, an ETCD upgrade unit 630, a Master upgrade unit 640, a Node upgrade unit 650, a Master recovery unit 660, and a Node recovery unit 670, connected in sequence, wherein:

the distribution unit 610 is configured to distribute the new version of the k8s installation package and the scripts required for upgrading to all nodes in the cluster;

the maintenance unit 620 is configured to stop the kube-apiserver and ETCD services of the k8s cluster so that the k8s cluster enters a maintenance phase;

the ETCD upgrade unit 630 is configured to upgrade the ETCD in place by using an upgrade script, and to upgrade the objects stored in the ETCD after the in-place upgrade of the ETCD is completed;

the Master upgrade unit 640 is configured to stop all services of the Master node and complete the in-place upgrade of the Master node by using an upgrade script;

the Node upgrade unit 650 is configured to stop all services of the Node nodes and complete the in-place upgrade of the Node nodes by using an upgrade script;

the Master recovery unit 660 is configured to sequentially start the kube-apiserver and kube-scheduler services on the Master node and, after the Node recovery unit 670 has resumed the Node services, to start the kube-controller-manager service of the Master node to resume k8s management of the cluster;

the Node recovery unit 670 is configured to start the services on the Node nodes, call the kube-apiserver interface on the host of each Node to acquire container object information, and rewrite the local files on a per-container basis according to the acquired container object information, so as to ensure compatibility with the new version of k8s.

Preferably, the apparatus further comprises: a health check unit, configured to perform a cluster health check by calling the k8s API after the Master recovery unit starts the kube-controller-manager service of the Master node to resume k8s management of the cluster, and to check whether any service is unavailable or any container has been restarted; and a state updating unit, configured to update the state and version information of the cluster in the k8s cluster management system.
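The container-restart part of the health check can be sketched by comparing Pod status taken before and after the upgrade. The sketch assumes both Pod lists were obtained from the k8s API as JSON and relies on the `restartCount` field of each entry in `status.containerStatuses`. The function names are hypothetical, and how the pre-upgrade list is captured and stored is left open.

```python
def restart_counts(pod_list: dict) -> dict[tuple[str, str, str], int]:
    """Map (namespace, pod, container) to restartCount for every container."""
    counts = {}
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        for cs in pod.get("status", {}).get("containerStatuses", []):
            key = (meta["namespace"], meta["name"], cs["name"])
            counts[key] = cs["restartCount"]
    return counts


def restarted_containers(before: dict, after: dict) -> list[tuple[str, str, str]]:
    """Containers whose restartCount grew, or that disappeared, across the upgrade."""
    b, a = restart_counts(before), restart_counts(after)
    return sorted(k for k, n in b.items() if k not in a or a[k] > n)
```

An empty result is consistent with the claim that the upgrade does not restart containers; any entry points at a container that needs investigation.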

Preferably, the distribution unit 610 distributing the new version of the k8s installation package and the scripts required for the upgrade to all nodes in the cluster includes: checking the component state of the k8s cluster, judging whether the k8s cluster meets the upgrade condition, and, in response to the k8s cluster meeting the upgrade condition, distributing the new version of the k8s installation package and the scripts required for the upgrade to all nodes in the cluster.

Preferably, the ETCD upgrade unit 630 upgrading the ETCD in place by using the upgrade script includes: taking an ETCD snapshot to create a backup of the current ETCD data, and backing up the configuration files and system configuration of the k8s cluster; and upgrading the ETCD in place by using the upgrade script and the new version of the k8s installation package.

Preferably, the ETCD upgrade unit 630 taking an ETCD snapshot to create a backup of the current ETCD data and backing up the configuration files and system configuration of the k8s cluster includes: determining the node address, the port number, and the required certificate authentication information of the running ETCD cluster; creating an ETCD snapshot with the etcdctl tool based on the node address, the port number, and the certificate path; checking the state of the snapshot file to ensure that the snapshot was created successfully and the data is complete; storing the successfully created snapshot file on another server or a cloud storage service; and backing up the Master node and Node configuration files, static Pod configuration files, network configuration files, service account keys, and encryption configuration files.
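The snapshot and verification steps can be sketched as the etcdctl command lines they correspond to. The sketch only builds the argument lists and does not run them. The function name is hypothetical, and the endpoint, certificate paths, and output file in the usage note are placeholders. The etcd 3.5 documentation marks `etcdctl snapshot status` as deprecated in favour of `etcdutl snapshot status`, so the second command may need adjusting for the installed release.

```python
import shlex


def snapshot_commands(
    endpoint: str, cacert: str, cert: str, key: str, out: str
) -> list[list[str]]:
    """Build (but do not run) the commands to save and then inspect a snapshot."""
    tls = [
        f"--endpoints={endpoint}",
        f"--cacert={cacert}",
        f"--cert={cert}",
        f"--key={key}",
    ]
    return [
        ["etcdctl", "snapshot", "save", out, *tls],
        # Prints hash, revision, key count, and size for a sanity check.
        ["etcdctl", "snapshot", "status", out, "--write-out=table"],
    ]


def as_shell(commands: list[list[str]]) -> str:
    """Render the commands as a shell-safe script, one command per line."""
    return "\n".join(shlex.join(c) for c in commands)
```

A caller might pass `https://127.0.0.1:2379` and the cluster's own certificate paths, review the output of `as_shell`, and only then execute the commands and copy the snapshot file to another server or cloud storage.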

Preferably, the apparatus further comprises: a resource evaluation unit, configured to evaluate the resources required in the upgrade process and audit the usage of existing resources; an upgrade period selection unit, configured to perform the upgrade during a period of low system load, based on the evaluation and audit results, to reduce the impact on the business; and an upgrade resource allocation unit, configured to set resource quotas and limits based on the evaluation and audit results to ensure that the upgrade operation does not consume excessive resources.

Preferably, the resource evaluation unit evaluating the resources required in the upgrade process and auditing the usage of existing resources includes: evaluating the resources required by the upgrade operation itself, including CPU, memory, storage space, and network bandwidth; evaluating the change in business load during the upgrade operation and predicting the load of the system in different time periods; and auditing the resource utilization of the existing k8s cluster with a monitoring tool, including the resource utilization of the nodes, the resource request and limit settings of the Pods, and the traffic patterns of the services.

According to the above technical solution, the cross-multi-version in-place upgrade apparatus for a k8s cluster provided by the invention overcomes the limitations of in-place upgrades of the k8s product: while preserving the integrity of the k8s product, it widens the upgrade span by modifying the k8s stored object data and the local files, so that frequent version-by-version upgrades are unnecessary. It also removes the need to restart containers during a k8s version upgrade, achieving an imperceptible, lossless upgrade. Specifically, data backup is performed through an ETCD snapshot; the k8s stored object data is modified to complete the k8s object upgrade and to prepare the data for the local file upgrade; the kube-apiserver interface is then called to obtain the updated container object information, and the local files are upgraded on a per-container basis according to that information. Throughout the upgrade, the kube-controller-manager service is disabled and a fixed start-stop order is followed, applying the k8s version skew policy appropriately, which resolves interface version compatibility between each component and kube-apiserver during the upgrade.

The embodiment of the invention also provides electronic equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the method when executing the program.

The embodiment of the invention also provides a computer readable storage medium, and the computer readable storage medium stores a computer program for executing the method.

As shown in fig. 7, the electronic device 600 may further include: a communication module 110, an input unit 120, an audio processor 130, a display 160, and a power supply 170. It is noted that the electronic device 600 need not include all of the components shown in fig. 7; in addition, the electronic device 600 may also include components not shown in fig. 7, for which reference may be made to the prior art.

As shown in fig. 7, the central processor 100, sometimes also referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device, which central processor 100 receives inputs and controls the operation of the various components of the electronic device 600.

The memory 140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device. It may store failure-related information as well as a program for processing that information. The central processor 100 can execute the program stored in the memory 140 to implement information storage, processing, and the like.

The input unit 120 provides an input to the central processor 100. The input unit 120 is, for example, a key or a touch input device. The power supply 170 is used to provide power to the electronic device 600. The display 160 is used for displaying display objects such as images and characters. The display may be, for example, but not limited to, an LCD display.

The memory 140 may be a solid state memory such as a Read Only Memory (ROM), a Random Access Memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered down and that can be selectively erased and provided with further data, an example of which is sometimes referred to as an EPROM or the like. Memory 140 may also be some other type of device. Memory 140 includes a buffer memory 141 (sometimes referred to as a buffer). The memory 140 may include an application/function storage 142, which stores application programs and function programs, or the flows by which the central processor 100 executes operations of the electronic device 600.

The memory 140 may also include a data store 143, the data store 143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage 144 of the memory 140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, address book applications, etc.).

The communication module 110 is a transmitter/receiver 110 that transmits and receives signals via an antenna 111. A communication module (transmitter/receiver) 110 is coupled to the central processor 100 to provide an input signal and receive an output signal, which may be the same as in the case of a conventional mobile communication terminal.

Based on different communication technologies, a plurality of communication modules 110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, etc., may be provided in the same electronic device. The communication module (transmitter/receiver) 110 is also coupled to a speaker 131 and a microphone 132 via an audio processor 130 to provide audio output via the speaker 131 and to receive audio input from the microphone 132 to implement usual telecommunication functions. The audio processor 130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 130 is also coupled to the central processor 100 so that sound can be recorded locally through the microphone 132 and so that sound stored locally can be played through the speaker 131.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The principles and embodiments of the present invention have been described in detail with reference to specific examples, which are provided only to facilitate understanding of the method and core ideas of the present invention. Meanwhile, those skilled in the art may vary the specific embodiments and the scope of application in accordance with the ideas of the present invention. In view of the above, this description should not be construed as limiting the present invention.

Claims

1. A method for cross-multi-version in-place upgrades of a k8s cluster, the method comprising:

distributing a new version of the k8s installation package and the scripts required for the upgrade to all nodes in the cluster;

stopping the kube-apiserver and ETCD services of the k8s cluster to bring the k8s cluster into a maintenance phase;

upgrading the ETCD in place by using an upgrade script, and upgrading the objects stored in the ETCD after the in-place upgrade of the ETCD is completed;

stopping all services of the Master node, and completing the in-place upgrade of the Master node by using an upgrade script;

stopping all services of the Node nodes, and completing the in-place upgrade of the Node nodes by using an upgrade script;

sequentially starting the kube-apiserver and kube-scheduler services on the Master node;

starting the services on the Node nodes, calling the kube-apiserver interface on the host of each Node to acquire updated container object information, and rewriting the local files on a per-container basis according to the acquired container object information, so as to ensure compatibility with the new version of k8s; and

starting the kube-controller-manager service of the Master node to resume k8s management of the cluster.

2. The cross-multi-version in-place upgrade method for a k8s cluster of claim 1, wherein after said starting the kube-controller-manager service of the Master node to resume k8s management of the cluster, the method further comprises:

performing a cluster health check by calling the k8s API, and checking whether any service is unavailable or any container has been restarted; and

updating the state and version information of the cluster in the k8s cluster management system.

3. The method for in-place upgrades across multiple versions of a k8s cluster of claim 1, wherein distributing new versions of the k8s installation package and scripts required for upgrades to all nodes in the cluster comprises:

checking the component state of the k8s cluster, judging whether the k8s cluster meets the upgrade condition, and, in response to the k8s cluster meeting the upgrade condition, distributing the new version of the k8s installation package and the scripts required for the upgrade to all nodes in the cluster.

4. The method for in-place upgrades across multiple versions of a k8s cluster of claim 1, wherein said upgrading the ETCD in place by using an upgrade script comprises:

taking an ETCD snapshot to create a backup of the current ETCD data, and backing up the configuration files and system configuration of the k8s cluster; and

upgrading the ETCD in place by using the upgrade script and the new version of the k8s installation package.

5. The method of claim 4, wherein said taking an ETCD snapshot to create a backup of the current ETCD data and backing up the configuration files and system configuration of the k8s cluster comprises:

determining the node address, the port number, and the required certificate authentication information of the running ETCD cluster;

creating an ETCD snapshot with the etcdctl tool based on the node address, the port number, and the certificate path;

checking the state of the snapshot file to ensure that the snapshot was created successfully and the data is complete;

storing the successfully created snapshot file on another server or a cloud storage service; and

backing up the Master node and Node configuration files, static Pod configuration files, network configuration files, service account keys, and encryption configuration files.

6. The method for in-place upgrades across multiple versions of a k8s cluster of claim 1, wherein after distributing new versions of the k8s installation package and upgrade required scripts to all nodes in the cluster, the method further comprises:

evaluating the resources required in the upgrade process and auditing the usage of existing resources;

performing the upgrade during a period of low system load, based on the evaluation and audit results, to reduce the impact on the business; and

setting resource quotas and limits based on the evaluation and audit results to ensure that the upgrade operation does not consume excessive resources.

7. The method for cross-multi-version in-place upgrades of k8s clusters of claim 6, wherein said evaluating resources required in the upgrade process and auditing existing resource usage comprises:

evaluating the resources required by the upgrade operation itself, including CPU, memory, storage space, and network bandwidth;

evaluating the change in business load during the upgrade operation, and predicting the load of the system in different time periods; and

auditing the resource utilization of the existing k8s cluster with a monitoring tool, including the resource utilization of the nodes, the resource request and limit settings of the Pods, and the traffic patterns of the services.

8. A cross-multi-version in-place upgrade apparatus for a k8s cluster, the apparatus comprising:

a distribution unit, configured to distribute a new version of the k8s installation package and the scripts required for the upgrade to all nodes in the cluster;

a maintenance unit, configured to stop the kube-apiserver and ETCD services of the k8s cluster so that the k8s cluster enters a maintenance phase;

an ETCD upgrade unit, configured to upgrade the ETCD in place by using an upgrade script, and to upgrade the objects stored in the ETCD after the in-place upgrade of the ETCD is completed;

a Master upgrade unit, configured to stop all services of the Master node and complete the in-place upgrade of the Master node by using an upgrade script;

a Node upgrade unit, configured to stop all services of the Node nodes and complete the in-place upgrade of the Node nodes by using an upgrade script;

a Master recovery unit, configured to sequentially start the kube-apiserver and kube-scheduler services on the Master node and, after the Node recovery unit has resumed the Node services, to start the kube-controller-manager service of the Master node to resume k8s management of the cluster; and

a Node recovery unit, configured to start the services on the Node nodes, call the kube-apiserver interface on the host of each Node to acquire container object information, and rewrite the local files on a per-container basis according to the acquired container object information, so as to ensure compatibility with the new version of k8s.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed by the processor.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 7.