
HK40044598A - Method and device for hot upgrading virtual machine monitoring program of security container - Google Patents


Info

Publication number
HK40044598A
Authority
HK
Hong Kong
Prior art keywords
virtual
virtual machine
mode
processor
thread
Prior art date
Application number
HK42021034510.4A
Other languages
Chinese (zh)
Other versions
HK40044598B (en
Inventor
徐权
秦承刚
贺勇
Original Assignee
支付宝(杭州)信息技术有限公司
Filing date
Publication date
Application filed by 支付宝(杭州)信息技术有限公司
Publication of HK40044598A
Publication of HK40044598B


Description

Method and device for hot upgrading of virtual machine monitoring program of security container
Technical Field
The embodiments of the present specification relate generally to the field of computers, and more particularly, to a method and apparatus for hot-upgrading a virtual machine monitor of a secure container.
Background
A secure container is a runtime technique that provides an operating system execution environment for container applications, but isolates the execution of the application from the host operating system, avoiding direct access of the application to host resources, and thus can provide additional protection between containers and hosts or between containers.
In a secure container, a virtual machine (VM) is used as the isolation layer. The virtual machine has a user-mode kernel (Guest Kernel). The Guest Kernel can run container applications but does not contain a complete operating system. For containers, the Guest Kernel has only a container engine, and it further reduces memory overhead by avoiding unnecessary memory use and sharing sharable memory.
A secure container also requires a virtual machine monitor (VMM). The virtual machine monitor provides virtual processors (vCPUs), memory, and a series of hardware virtualizations to run the Guest Kernel. When secure containers are in use, the virtual machine monitor needs to be upgraded. However, it is undesirable for the container application and the secure container to stop working while the virtual machine monitor is upgraded. How to upgrade the virtual machine monitor while the container application and the secure container remain operational is therefore a problem to be solved.
Disclosure of Invention
In view of the foregoing, embodiments of the present specification provide a method and apparatus for hot-upgrading a virtual machine monitor of a secure container. With the method and apparatus, the virtual machine monitor can be upgraded while the container application and the secure container keep working.
According to an aspect of embodiments of the present specification, there is provided a method for hot-upgrading a virtual machine monitor of a secure container, the virtual machine monitor being configured to run a user-mode kernel, the method including: in response to obtaining a virtual machine monitor upgrade request, suspending the running of the user-mode threads in the container instance on the system threads; decoupling binding relationships between the system threads and a first set of virtual processors currently used by the container instance, the first set of virtual processors being created according to a virtual machine device file of a first virtual machine monitor; deleting the first set of virtual processors; creating a second set of virtual processors according to a virtual machine device file of a second virtual machine monitor; and continuing to run each user-mode thread on each system thread using the second set of virtual processors.
Optionally, in an example of the above aspect, the container instance is constructed with virtual machine device files of multiple versions of the virtual machine monitor.
Optionally, in one example of the above aspect, the method is implemented in a Go language programming environment.
Optionally, in one example of the above aspect, decoupling the binding relationship between the respective system threads and the first set of virtual processors currently used by the container instance comprises: bouncing the system thread running in guest ring3 mode back to guest ring0 mode; and bouncing the first virtual processor running in host ring0 mode back to guest ring0 mode.
Optionally, in an example of the above aspect, decoupling the binding relationship between each system thread and the first set of virtual processors currently used by the container instance further comprises: returning the system call from guest ring0 mode to host ring3 mode.
Optionally, in an example of the above aspect, deleting the first set of virtual processors comprises: deleting the mapping relationship between the virtual processor bitmap of each first virtual processor in the first set of virtual processors and the thread identifier of each system thread, and tearing down the virtual machine state of each first virtual processor.
Optionally, in one example of the above aspect, the virtual machine monitor supports process level virtualization.
According to another aspect of embodiments of the present specification, there is provided an apparatus for hot-upgrading a virtual machine monitor of a secure container, the virtual machine monitor being configured to run a user-mode kernel, the apparatus including: an upgrade request acquisition unit that acquires a virtual machine monitor upgrade request; an instance run suspension unit that, in response to the acquired virtual machine monitor upgrade request, suspends the running of the user-mode threads in the container instance on the system threads; a binding relationship decoupling unit that decouples the binding relationship between each system thread and a first set of virtual processors currently used by the container instance, the first set of virtual processors being created according to a first file handle created using a virtual machine device file of a first virtual machine monitor; a virtual processor processing unit that deletes the first set of virtual processors and creates a second set of virtual processors according to a virtual machine device file of a second virtual machine monitor; and a thread rerun unit that continues running each user-mode thread on each system thread using the second set of virtual processors.
Optionally, in an example of the above aspect, the apparatus further comprises: a device file construction unit that constructs, for the container instance, virtual machine device files of multiple versions of the virtual machine monitor.
Optionally, in an example of the above aspect, the apparatus is implemented in a Go language programming environment, and the binding relationship decoupling unit includes: a system thread bounce module that bounces the system thread running in guest ring3 mode back to guest ring0 mode; and a virtual processor bounce module that bounces the first virtual processor running in host ring0 mode back to guest ring0 mode.
Optionally, in an example of the above aspect, the binding relationship decoupling unit further includes: a system call return module that returns the system call from guest ring0 mode to host ring3 mode.
Optionally, in an example of the above aspect, the virtual processor processing unit deletes the mapping relationship between the virtual processor bitmap of each first virtual processor in the first set of virtual processors and the thread identifier of each system thread, and tears down the virtual machine state of each first virtual processor.
According to another aspect of embodiments of the present specification, there is provided an electronic apparatus including: at least one processor, and a memory coupled with the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform a method for hot-upgrading a virtual machine monitor as described above.
According to another aspect of embodiments of the present specification, there is provided a machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method for hot-upgrading a virtual machine monitor as described above.
Drawings
A further understanding of the nature and advantages of the present disclosure may be realized by reference to the following drawings. In the drawings, similar components or features may have the same reference numerals.
Fig. 1 shows an example architectural schematic of a secure container solution in accordance with embodiments of the present description.
FIG. 2 illustrates an operational scenario diagram of CPU virtualization according to an embodiment of the present description.
Fig. 3 illustrates a switching control schematic of Guest and VMM according to an embodiment of the present description.
FIG. 4 shows an architectural diagram of a user-mode kernel according to an embodiment of the present description.
Figure 5 illustrates a framework diagram of a network stack according to embodiments of the present description.
FIG. 6 illustrates a schematic diagram of a host kernel that builds a virtual machine device file with multiple versions of a virtual machine monitor in accordance with an embodiment of the present description.
Fig. 7 shows an overall architectural schematic of a secure container solution provided with an Avirt mode according to an embodiment of the present description.
FIG. 8 illustrates an example schematic of a process for a hot upgrade of a virtual machine monitor according to an embodiment of this specification.
FIG. 9 illustrates a flow diagram of a method for hot-upgrading a virtual machine monitor of a secure container according to an embodiment of the present description.
FIG. 10 illustrates a block diagram of an apparatus for hot-upgrading a virtual machine monitor of a secure container, according to an embodiment of the present description.
Fig. 11 shows a block diagram of an implementation example of a binding relationship decoupling unit according to an embodiment of the present description.
Fig. 12 shows a schematic diagram of an electronic device for implementing a hot upgrade process for a virtual machine monitor of a secure container according to an embodiment of the present description.
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and are not intended to limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as needed. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "include" and its variants are open-ended terms meaning "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different or the same object. Other definitions, whether explicit or implicit, may be included below. The definition of a term is consistent throughout the specification unless the context clearly dictates otherwise.
Fig. 1 illustrates an example architectural schematic of a secure container solution 100 in accordance with embodiments of the present description.
As shown in fig. 1, the secure container solution 100 includes a physical hardware layer 110, a virtual machine monitor 120, and a plurality of container instances, each container instance including a user mode kernel 130, a network stack 140, and a container application 150.
The virtual machine monitor 120 is middleware running between the physical hardware layer 110 and the container instance. The virtual machine monitor 120 may allow multiple container instances to share a set of underlying physical hardware and thus may be viewed as a "meta" operating system in a virtual environment. The virtual machine monitor 120 may coordinate access to all physical devices and virtual machines on the server.
The virtual machine monitor 120 is responsible for managing the resources of the virtual machine and has control over all of them, including switching the CPU context of the virtual machine. When the virtual machine monitor 120 is launched and executed, it provides virtual CPUs, memory, and a series of hardware virtualizations for each container instance, and loads and runs the user-mode kernel 130 for all container instances. In this specification, the user-mode (Guest) kernel 130 may also be referred to as a guest operating system, a Guest OS, or the like.
In this specification, the Guest kernel 130 may be an operating system (OS) or may be a binary program. For the virtual machine monitor 120, the Guest kernel 130 is simply a set of instructions. The virtual machine monitor 120 only needs to know the entry point (the value of the RIP register) to load and run the user-mode kernel 130.
Running the Guest kernel 130 requires a virtual processor (virtual CPU, vCPU). The vCPU supports four operation modes, ring0 through ring3, of which the Linux system uses only two: ring0 and ring3. When the vCPU registers indicate that the vCPU is currently in ring0 mode, the vCPU is running kernel code; when the vCPU is in ring3 mode, it is running user code. When a system call or process switch occurs, the vCPU transitions from ring3 mode to ring0 mode. The ring3 mode does not allow hardware operations to be performed directly; all of them must be completed through system calls provided by the kernel.
VMX modes are introduced in the virtual CPU technology and are divided into root modes and non-root modes. The virtual machine monitor runs in root mode and Guest runs in non-root mode. FIG. 2 illustrates an operational scenario diagram of CPU virtualization according to an embodiment of the present description.
As shown in FIG. 2, the kernel in Guest OS runs in ring0 mode in non-root mode, i.e. GR0 mode. Although the kernel of the Guest OS also runs in ring0 mode, the Guest kernel cannot operate some resources nor can it run sensitive instructions because it is in non-root mode. An application (Guest APP) in Guest OS runs in ring3 mode in non-root mode, i.e. GR3 mode. The kernel and the virtual machine monitor of the Host OS run in a ring0 mode in root mode, i.e., HR0 mode, and the application (Host APP) in the Host OS runs in a ring3 mode in root mode, i.e., HR3 mode.
Guest code is in VMX non-root mode at runtime. In this mode, when a special instruction (such as an out instruction in Demo) is executed, the Guest OS needs to give the vCPU control to the hypervisor, and the hypervisor processes the special instruction to complete the hardware operation. Fig. 3 illustrates a switching control schematic of Guest and VMM according to an embodiment of the present description.
As shown in fig. 3, switching between the Guest OS and the virtual machine monitor includes two processes: container instance entry (Instance Entry) and container instance exit (Instance Exit). When the Guest OS, running in non-root mode, executes a special instruction, it returns control of the vCPU to the virtual machine monitor through an Instance Exit, so that the virtual machine monitor, running in ring0 of root mode, can perform trap-and-emulate handling. After the virtual machine monitor has processed the special instruction, it returns the result and control to the Guest OS through an Instance Entry.
At Instance Exit, the context of the current Guest OS is saved to the VMCS (virtual machine control structure). At Instance Entry, the context stored in the VMCS is restored to the Guest OS. The VMCS is referenced through a 64-bit pointer that points to a real memory address. VMCSs are allocated per vCPU: a Guest OS has as many VMCS pointers as it has vCPUs.
While instructions issued by the Guest OS execute in VMX mode (root or non-root), the Guest OS cannot determine whether the current vCPU is in VMX mode or non-VMX mode. When an Instance Exit occurs, the vCPU saves the exit reason to the MSRs (a special register set in VMX mode). The virtual machine monitor then performs the corresponding processing according to the exit reason.
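The entry/exit cycle described above can be pictured with a small simulation. The following Go sketch is only an illustration under stated assumptions: the toy instruction set, the `ExitReason` values, and the names `runGuest` and `monitorLoop` are all hypothetical and do not correspond to any real VMX or NanoVM interface. It shows the guest running until a special instruction forces an exit, the monitor emulating that instruction, and control returning to the guest.

```go
package main

import "fmt"

// ExitReason is a hypothetical code telling the monitor why the guest exited.
type ExitReason int

const (
	ExitIO   ExitReason = iota // a special instruction such as `out`
	ExitHalt                   // the guest has nothing more to run
)

// Instr is one instruction of a toy guest program (illustrative only).
type Instr struct {
	Op  string
	Arg byte
}

// VCPU holds the saved guest context: a program counter and emulated output.
type VCPU struct {
	PC     int
	Output []byte
}

// runGuest models an Instance Entry: the guest resumes from the saved PC and
// runs until a special instruction forces an Instance Exit.
func runGuest(v *VCPU, prog []Instr) (ExitReason, byte) {
	for v.PC < len(prog) {
		in := prog[v.PC]
		v.PC++ // context is saved so the next entry resumes after this point
		switch in.Op {
		case "out":
			return ExitIO, in.Arg
		case "hlt":
			return ExitHalt, 0
		}
	}
	return ExitHalt, 0
}

// monitorLoop models the monitor side: enter the guest, handle each exit
// according to its reason, and re-enter. It returns the number of exits.
func monitorLoop(v *VCPU, prog []Instr) int {
	exits := 0
	for {
		reason, arg := runGuest(v, prog)
		exits++
		switch reason {
		case ExitIO:
			v.Output = append(v.Output, arg) // emulate the hardware operation
		case ExitHalt:
			return exits
		}
	}
}

func main() {
	v := &VCPU{}
	prog := []Instr{{"nop", 0}, {"out", 'h'}, {"out", 'i'}, {"hlt", 0}}
	n := monitorLoop(v, prog)
	fmt.Println(string(v.Output), n)
}
```

Ordinary instructions (`nop` here) never leave the guest; only the special ones cost an exit, which is why the number of exits is smaller than the number of instructions.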
In the VMCS architecture, when a switch between root mode and non-root mode occurs, the VMCS configures the state and execution environment of the logical processor on which the switch occurs. One logical processor may manage multiple VMCSs, but at any given time only one VMCS of a logical processor is the current VMCS (current-VMCS). The VMCS has three attributes: 1) an activity attribute indicating the active or inactive state; 2) a current attribute indicating the current or non-current state; and 3) a launch attribute indicating the clear or launched state. The physical region in which the VMCS structure resides must be aligned on a 4K boundary. The virtual machine monitor manages the entire VMX operation mode using a region called the "VMXON region". The size and supported cache types of the VMXON region are the same as those of the VMCS region. One virtual machine monitor corresponds to one VMXON pointer. The VMXON pointer does not change unless the virtual machine monitor turns VMX operation off and turns it back on using another VMXON pointer. The VMXOFF instruction also operates on this VMXON pointer to turn off the operation mode managed by the current VMXON region. The VMCS fields must be accessed via the VMREAD and VMWRITE instructions, and each field is identified by a unique ID value.
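The three VMCS attributes can be modeled as a small state machine. The Go sketch below is a simplified illustration, not processor behavior in full: the method names `Load`, `Clear`, and `Launch` are hypothetical stand-ins loosely modeled on what the VMPTRLD, VMCLEAR, and VMLAUNCH instructions do, and the model assumes a single logical processor.

```go
package main

import (
	"errors"
	"fmt"
)

// VMCS tracks the activity and launch attributes; the current attribute is
// represented by the logical processor's pointer.
type VMCS struct {
	Active   bool
	Launched bool
}

// LogicalProcessor may have several active VMCSs but at most one current one.
type LogicalProcessor struct {
	current *VMCS
}

// Load makes v active and current (stand-in for VMPTRLD).
func (p *LogicalProcessor) Load(v *VMCS) {
	v.Active = true
	p.current = v
}

// Clear makes v inactive and resets its launch state (stand-in for VMCLEAR).
func (p *LogicalProcessor) Clear(v *VMCS) {
	v.Active = false
	v.Launched = false
	if p.current == v {
		p.current = nil
	}
}

// Launch enters the guest for the first time on the current VMCS, which must
// be in the clear launch state (stand-in for VMLAUNCH).
func (p *LogicalProcessor) Launch() error {
	if p.current == nil {
		return errors.New("no current VMCS")
	}
	if p.current.Launched {
		return errors.New("VMCS already launched")
	}
	p.current.Launched = true
	return nil
}

func main() {
	var p LogicalProcessor
	a, b := &VMCS{}, &VMCS{}
	p.Load(a)
	p.Load(b) // a stays active but is no longer current
	fmt.Println(p.Launch(), p.Launch())
}
```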
In one example of the present specification, the virtual machine monitor is a secure virtual machine monitor NanoVM that supports process level virtualization. Here, the process level virtualization refers to a virtualization technique for running a piece of code in a CPU Guest (non root) state without heavy device simulation and a complete Guest operating system. The process level virtualization has the characteristics of low resource overhead and high starting speed.
When NanoVM is used as the virtual machine monitor, the Sentry in the gVisor sandbox can serve as the Guest kernel to provide operating system support for container applications in a secure container (Guest).
FIG. 4 shows an architectural diagram of a user-mode kernel according to an embodiment of the present description.
As shown in FIG. 4, an application running in a gVisor sandbox has its own kernel and virtual devices, distinct from the host and from other gVisor sandboxes. By intercepting application system calls and running as a guest kernel, gVisor provides a strong isolation boundary.
gVisor is developed in the Go language and is divided at runtime into two independent processes, Sentry and Gofer. The Sentry process includes a kernel, which is responsible for executing user code and handling system calls. The Gofer process is a file system operation agent: file system operations that reach beyond the gVisor sandbox (that is, operations other than those on internal proc or tmp files, pipes, and the like) are sent to the Gofer process over a 9P connection.
The Sentry process needs a platform to implement the basic context switching and memory mapping functions. Examples of platforms include, but are not limited to, the ptrace platform and the KVM platform.
In NanoVM, each container instance (Container Instance) can be a process. Each container instance may have multiple vCPUs. The upper limit on the number of vCPUs per container instance may be set to 256. If more than 256 vCPUs need to be supported in the Guest, thread scheduling needs to be maintained on top of the vCPUs.
NanoVM provides several interfaces so that business processes can enter the virtualized state with a minimum of steps. Similar to KVM, the interface provided by NanoVM is based on a virtual machine device file: /dev/nanovm. The basic operation functions may include open(), close(), and ioctl().
After the file /dev/nanovm is opened with open(/dev/nanovm), NanoVM by default creates an instance structure for the current process. From the perspective of the process, one file handle fd corresponds to exactly one instance. From the perspective of NanoVM, the instance structure is saved in the private_data of the file handle fd. The file handle is a struct file in the kernel.
When close() is executed, the instance structure opened by the current process and all resources requested while it was running are released. After close() returns, the instance is destroyed.
The parameter of the interface ioctl(NANOVM_CREATE_VCPU) is nanovm_conf. ioctl(NANOVM_CREATE_VCPU) can be used to create a vCPU and initialize its state, VMCS, VPID, and so on according to the configuration parameters passed in by the process. Configuration parameters may include, for example, but are not limited to, the configuration of the CR0/CR4 control registers and the values of the various segment registers. The configuration parameters include the following members:
user_regs stores the general registers in effect after the process enters Guest, such as RIP and RSP. sys_regs stores system-related registers, such as the control registers CR0 to CR4 and the various segment registers.
After calling the function, a vCPU number will be returned. The NanoVM also writes the values of these registers to the VMCS in preparation for entering Guest mode.
The interface ioctl(NANOVM_RUN) is configured to execute VMLAUNCH and enter Guest mode. After the process calls this interface, it formally enters the Guest state.
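The lifecycle of the interface described above (one instance per open file handle, vCPU numbers returned on creation, everything released on close) can be sketched with an in-memory model. This Go sketch makes no real system calls and is not the NanoVM API: `Device`, `Conf`, `Open`, `CreateVCPU`, and `Close` are hypothetical names, the register maps are an assumed representation of the two register groups, and entering the guest is deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
)

// maxVCPUs is the per-instance upper limit mentioned in the text.
const maxVCPUs = 256

// Conf is an assumed shape for the configuration parameters: one group of
// general registers and one group of system registers.
type Conf struct {
	UserRegs map[string]uint64 // e.g. RIP, RSP
	SysRegs  map[string]uint64 // e.g. CR0, CR4, segment registers
}

// instance is the per-file-handle state kept by the modeled device.
type instance struct{ vcpus []Conf }

// Device is an in-memory stand-in for the virtual machine device file.
type Device struct {
	nextFD    int
	instances map[int]*instance
}

var (
	ErrBadFD   = errors.New("unknown file handle")
	ErrTooMany = errors.New("vCPU limit reached")
)

func NewDevice() *Device {
	return &Device{nextFD: 3, instances: make(map[int]*instance)}
}

// Open models open(): each new file handle gets exactly one instance.
func (d *Device) Open() int {
	fd := d.nextFD
	d.nextFD++
	d.instances[fd] = &instance{}
	return fd
}

// CreateVCPU models the vCPU-creation call and returns the new vCPU number.
func (d *Device) CreateVCPU(fd int, c Conf) (int, error) {
	inst, ok := d.instances[fd]
	if !ok {
		return -1, ErrBadFD
	}
	if len(inst.vcpus) >= maxVCPUs {
		return -1, ErrTooMany
	}
	inst.vcpus = append(inst.vcpus, c)
	return len(inst.vcpus) - 1, nil
}

// Close models close(): the instance and everything it owns is destroyed.
func (d *Device) Close(fd int) error {
	if _, ok := d.instances[fd]; !ok {
		return ErrBadFD
	}
	delete(d.instances, fd)
	return nil
}

func main() {
	dev := NewDevice()
	fd := dev.Open()
	id, _ := dev.CreateVCPU(fd, Conf{UserRegs: map[string]uint64{"RIP": 0x1000}})
	fmt.Println("vCPU", id, "on fd", fd)
	_ = dev.Close(fd)
	_, err := dev.CreateVCPU(fd, Conf{})
	fmt.Println(err)
}
```

Tying all state to the file handle is what later makes it possible to build a fresh set of vCPUs from a different device file without touching the process itself.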
The interface ioctl(NANOVM_SET_MEMORY_REGION) is configured to set a memory region. Through this interface, the process can register with NanoVM a mapping between a GPA (guest physical address) and an HVA (host virtual address). The parameters passed by the user-mode kernel are described below.
The mapping relationship is described by struct userspace_memory_region, which describes a host virtual address range corresponding to a physical address range in the Guest. Its size is memory_size; host_virtual_addr and guest_phys_addr give the starting addresses of the virtual address range and the physical address range, respectively. The user-mode process needs to register these mappings with NanoVM. NanoVM maintains them in a linked list.
When an EPT page fault occurs, the GPA that caused the fault is read from the VMCS, and the corresponding HVA is looked up in the linked list described above. Then, through the kernel ABI (kABI) function get_user_pages_fast(), a real physical page is requested in the Host, and the mapping from the HVA to the HPA, that is, the page table relationship, is established.
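The GPA-to-HVA lookup performed on a page fault can be illustrated in a few lines. The Go sketch below is a user-space illustration only: the field names follow the members named in the text, while `RegionList`, `Register`, and `Lookup` are hypothetical names, and the step that obtains a physical page from the host is not modeled.

```go
package main

import "fmt"

// Region describes one registered mapping from a guest physical range to a
// host virtual range of the same size.
type Region struct {
	GuestPhysAddr   uint64
	MemorySize      uint64
	HostVirtualAddr uint64
	next            *Region
}

// RegionList keeps the registered mappings in a singly linked list.
type RegionList struct{ head *Region }

// Register prepends a new mapping to the list.
func (l *RegionList) Register(gpa, size, hva uint64) {
	l.head = &Region{GuestPhysAddr: gpa, MemorySize: size, HostVirtualAddr: hva, next: l.head}
}

// Lookup walks the list and translates a faulting GPA into the matching HVA.
// The second result is false when no registered region contains the GPA.
func (l *RegionList) Lookup(gpa uint64) (uint64, bool) {
	for r := l.head; r != nil; r = r.next {
		if gpa >= r.GuestPhysAddr && gpa-r.GuestPhysAddr < r.MemorySize {
			return r.HostVirtualAddr + (gpa - r.GuestPhysAddr), true
		}
	}
	return 0, false
}

func main() {
	var l RegionList
	l.Register(0x0, 0x1000, 0x7f0000000000)
	l.Register(0x100000, 0x2000, 0x7f0000100000)
	hva, ok := l.Lookup(0x100010)
	fmt.Printf("%#x %v\n", hva, ok)
}
```

The bounds check subtracts before comparing so that a region ending at the top of the address space does not overflow.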
Through the above interfaces provided by NanoVM, a user-mode process can enter Guest mode and run in GR0 mode, that is, ring0 mode of non-root. NanoVM is the foundation of the secure container, which can use VMX for security isolation.
There are two major sources of network latency overhead in gVisor: 1) copying of data in the Host Kernel, and 2) the inefficiency of the protocol stacks in the Host Kernel and Sentry. Thus, in the embodiments of the present specification, as shown in fig. 1, a network stack (network protocol stack) is built into Sentry. Specifically, a network stack scheme based on DPDK plus a user-mode protocol stack is integrated into Sentry. DPDK is mainly used to drive the network device and adopts an interrupt mode.
Figure 5 illustrates a framework diagram of a network stack according to embodiments of the present description. As shown in fig. 5, DPDK may interface downward with several devices, e.g., a VF device, an ENI device, and a veth device. In the implementation, DPDK interfaces with the VF device directly and uses virtio to interface with the ENI device and the veth device. Examples of user-mode protocol stacks used may include FreeBSD and TLDK.
FIG. 6 illustrates a schematic diagram of a host kernel that builds a virtual machine device file with multiple versions of a virtual machine monitor in accordance with an embodiment of the present description. As shown in fig. 6, for each container instance, multiple virtual machine device files, e.g., nanovm.ko, nanovm_n.ko, etc., may be built in the host kernel, each saved under its own file directory, e.g., nanovm.ko under /dev/nanovm and nanovm_n.ko under /dev/nanovm_N. Each virtual machine device file corresponds to one virtual machine monitor version.
Optionally, in one example, as shown in fig. 7, an Avirt module may also be provided on the Guest side. The Avirt module is configured to uniformly manage a plurality of virtual machine device files.
FIG. 8 illustrates an example schematic of a process for a hot upgrade of a virtual machine monitor according to an embodiment of this specification.
As shown in fig. 8, when the virtual machine monitor currently used by the container instance needs to be upgraded, first, the binding relationship between each system thread that runs a user-mode thread in the container instance and the first virtual processor set currently used by the container instance is decoupled, and the first virtual processor set is deleted. Subsequently, a second virtual processor set is created with a file handle of the virtual machine device file of another version of the virtual machine monitor built in the host kernel, and the created second virtual processor set is used to continue running the user-mode threads on the system threads. During the upgrade of the virtual machine monitor, the user-mode threads are only suspended, not stopped; once the second virtual processor set has been created, each user-mode thread resumes on it from the state it was in when suspended, thereby realizing the hot upgrade of the virtual machine monitor.
Fig. 9 illustrates a flow diagram of a method for hot upgrading a virtual machine monitor of a secure container according to an embodiment of the present description.
As shown in FIG. 9, at 910, a virtual machine monitor upgrade request for a container instance is obtained.
At 920, in response to acquiring the virtual machine monitor upgrade request, the running operation of the user mode thread (goroutine) in the Container Instance (Container Instance) on the system thread (Linux thread) is suspended. Here, the suspend operation for the run operation is to suspend or pause (pause) the run operation of each user-mode thread, and save the current processing state or current processing result of each user-mode thread. For example, in the case of a Go-based implementation, the implemented Pause mechanism and corresponding API may be utilized to send a bounce signal to each system thread, which suspends all running user mode threads.
At 930, the binding relationships between the respective system threads and a first set of virtual processors currently used by the container instance are decoupled, the first set of virtual processors being created from the virtual machine device file of the first virtual machine monitor. Here, the first virtual machine monitor is the currently used virtual machine monitor. When the virtual processors are created, a first file handle (VMfd1) may be created from the virtual machine device file of the first virtual machine monitor (e.g., nanovm.ko saved under /dev/nanovm), and the first virtual processor set may then be created from the first file handle.
In one example, decoupling the binding relationship between the respective system threads and the first set of virtual processors currently used by the container instance may comprise: bouncing the system thread running in guest ring3 mode back to guest ring0 mode; and bouncing the first virtual processor running in host ring0 mode back to guest ring0 mode.
For example, in the case where the above-described virtual machine monitor upgrade process is implemented in the Go language programming environment, if a system thread is running in GR3 mode, a bounce signal is sent to the system thread using the already implemented function BounceToKernel. Upon receiving the bounce signal, the system thread bounces back to GR0 mode (the GR0 kernel). If the vCPU is operating in HR0 mode, bounce signals are sent continuously to the system thread using the already implemented function BounceToHost. Upon receiving the bounce signal, the vCPU is bounced back to GR0 mode (the GR0 kernel). Through the above operations, all system threads can be brought into the "user state (HR3)".
In another example, decoupling the binding relationship between the respective system threads and the first set of virtual processors currently used by the container instance may further comprise: returning the system call from guest ring0 mode to host ring3 mode. In this manner, once the execution of all user-mode threads is suspended, the Go scheduler executes the futex system call to park (stop) all system threads.
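The mode transitions involved in decoupling can be summarized as a small state machine. The following Go sketch is a toy model under stated assumptions: the lower-case helpers `bounceToKernel`, `bounceToHost`, and `returnToHostUser` only echo the transitions described in the text and are not the real functions of any platform; no signals are sent and no threads are parked.

```go
package main

import "fmt"

// Mode names the four combinations of guest/host and ring0/ring3.
type Mode string

const (
	GR3 Mode = "GR3" // guest user code
	GR0 Mode = "GR0" // guest kernel
	HR0 Mode = "HR0" // host kernel / monitor
	HR3 Mode = "HR3" // host user state
)

// Thread is a system thread together with the mode it is currently in.
type Thread struct {
	TID  int
	Mode Mode
}

// bounceToKernel models the first transition: GR3 back to GR0.
func bounceToKernel(t *Thread) {
	if t.Mode == GR3 {
		t.Mode = GR0
	}
}

// bounceToHost models the second transition described in the text: a vCPU
// found in HR0 is bounced back to GR0.
func bounceToHost(t *Thread) {
	if t.Mode == HR0 {
		t.Mode = GR0
	}
}

// returnToHostUser models the final step: the system call returns from GR0
// to HR3, after which the thread no longer depends on its vCPU.
func returnToHostUser(t *Thread) {
	if t.Mode == GR0 {
		t.Mode = HR3
	}
}

// decouple applies the three steps to every thread and reports whether all
// threads ended up in HR3.
func decouple(ts []*Thread) bool {
	for _, t := range ts {
		bounceToKernel(t)
		bounceToHost(t)
		returnToHostUser(t)
	}
	for _, t := range ts {
		if t.Mode != HR3 {
			return false
		}
	}
	return true
}

func main() {
	ts := []*Thread{{TID: 1, Mode: GR3}, {TID: 2, Mode: HR0}, {TID: 3, Mode: GR0}}
	fmt.Println(decouple(ts), ts[0].Mode, ts[1].Mode, ts[2].Mode)
}
```

Whatever mode a thread starts in, it passes through GR0 before reaching HR3, which is the common point the text relies on.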
At 940, the first set of virtual processors is deleted. In one example, deleting the first set of virtual processors may comprise: deleting the mapping relationship between the virtual processor bitmap of each first virtual processor in the first set of virtual processors and the thread identifier of each system thread, and tearing down the virtual machine state of each first virtual processor.
In one example, before the first set of virtual processors is deleted, it is also necessary to check that none of the first virtual processors in the set is running in user mode. Once it is confirmed that no first virtual processor is running in user mode, the mapping relationship between the virtual processor bitmap of each first virtual processor in the first set of virtual processors and the thread identifier of each system thread is deleted, and the virtual machine state of each first virtual processor is torn down. If at least one first virtual processor is running in user mode, the current mapping relationship between the virtual processor bitmap of each first virtual processor in the first set of virtual processors and the thread identifier of each system thread is not deleted.
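The check-then-delete rule above is an all-or-nothing operation, which the following Go sketch illustrates. The data layout is an assumption made for the example: `VCPUSet`, its `Bitmap` field, and `Delete` are hypothetical names, and a single 64-bit word is used for the bitmap for brevity, so this model covers at most 64 vCPUs.

```go
package main

import (
	"errors"
	"fmt"
)

// VCPU holds what the example needs to know about one virtual processor.
type VCPU struct {
	TID        int    // thread identifier of the bound system thread
	InUserMode bool   // true while the vCPU is running in user mode
	VMState    []byte // stand-in for the virtual machine state
}

// VCPUSet maps vCPU numbers to vCPUs; bit i of Bitmap is set while vCPU i
// is mapped to a system thread.
type VCPUSet struct {
	Bitmap uint64
	VCPUs  map[int]*VCPU
}

var ErrStillInUserMode = errors.New("a vCPU is still running in user mode")

// Delete first checks every vCPU and changes nothing if any of them is still
// in user mode. Otherwise it removes every mapping and tears down the state.
func (s *VCPUSet) Delete() error {
	for _, v := range s.VCPUs {
		if v.InUserMode {
			return ErrStillInUserMode
		}
	}
	for id, v := range s.VCPUs {
		v.VMState = nil              // tear down the virtual machine state
		s.Bitmap &^= 1 << uint(id)   // clear the vCPU's bit
		delete(s.VCPUs, id)          // drop the vCPU-to-thread mapping
	}
	return nil
}

func main() {
	s := &VCPUSet{Bitmap: 0x3, VCPUs: map[int]*VCPU{
		0: {TID: 100},
		1: {TID: 101, InUserMode: true},
	}}
	fmt.Println(s.Delete()) // refused: vCPU 1 is still in user mode
	s.VCPUs[1].InUserMode = false
	fmt.Println(s.Delete(), s.Bitmap, len(s.VCPUs))
}
```

Running the check as a separate pass before any deletion is what keeps the existing mappings intact when the deletion has to be refused.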
At 950, a second set of virtual processors is created from the virtual machine device file of the second virtual machine monitor. For example, a second file handle (VMfd2) may be created from the virtual machine device file of the second virtual machine monitor (e.g., nanovm_1.ko saved under /dev/nanovm_1), and the second set of virtual processors may then be created from the second file handle. Here, the virtual machine device file of the second virtual machine monitor may be selected from the remaining virtual machine device files constructed in the host kernel; it may be a new virtual machine device file that has not been run in the secure container, or another virtual machine device file that has been run in the secure container.
At 960, the second set of virtual processors is used to continue running the respective user-mode threads on the respective system threads. Since the binding relationship between each system thread and each first virtual processor in the first virtual processor set is decoupled, each system thread can establish a relationship with a second virtual processor created based on the second file handle, thereby implementing an upgrade process for the virtual machine monitor.
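The overall sequence (suspend, decouple, delete at 940, create at 950, resume at 960) can be sketched in Go as follows. This is a simplified in-memory model under stated assumptions, not the implementation of the embodiments: Container, SystemThread, VCPU, and HotUpgrade are hypothetical names, and monitors are identified by plain strings.

```go
package main

import "fmt"

// VCPU is a simulated virtual processor tagged with the monitor that created it.
type VCPU struct {
	ID      int
	Monitor string
}

// SystemThread is a simulated system thread that carries user-mode threads.
type SystemThread struct {
	TID       int
	VCPU      *VCPU // nil while the binding relationship is decoupled
	Suspended bool
}

// Container is a simulated container instance.
type Container struct {
	Monitor string
	Threads []*SystemThread
	VCPUs   []*VCPU
}

// HotUpgrade walks through the steps in order while keeping the system
// threads themselves alive.
func (c *Container) HotUpgrade(newMonitor string) {
	// Suspend the user-mode threads running on each system thread.
	for _, t := range c.Threads {
		t.Suspended = true
	}
	// Decouple each system thread from its first virtual processor.
	for _, t := range c.Threads {
		t.VCPU = nil
	}
	// 940: delete the first set of virtual processors.
	c.VCPUs = nil
	// 950: create the second set from the second monitor.
	for i := range c.Threads {
		c.VCPUs = append(c.VCPUs, &VCPU{ID: i, Monitor: newMonitor})
	}
	c.Monitor = newMonitor
	// 960: bind the second set and continue running.
	for i, t := range c.Threads {
		t.VCPU = c.VCPUs[i]
		t.Suspended = false
	}
}

func main() {
	c := &Container{Monitor: "navm_0"}
	for i := 0; i < 2; i++ {
		v := &VCPU{ID: i, Monitor: "navm_0"}
		c.VCPUs = append(c.VCPUs, v)
		c.Threads = append(c.Threads, &SystemThread{TID: 100 + i, VCPU: v})
	}
	c.HotUpgrade("navm_1")
	for _, t := range c.Threads {
		fmt.Println(t.TID, t.VCPU.Monitor, t.Suspended)
	}
}
```

The point of the sketch is that the SystemThread values survive the upgrade unchanged; only the virtual processors they are bound to are replaced.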
Fig. 10 shows a block diagram of an apparatus 1000 for hot-upgrading a virtual machine monitor of a secure container (hereinafter referred to as "monitor upgrading apparatus") according to an embodiment of the present specification. As shown in fig. 10, the monitor upgrading apparatus 1000 includes an upgrade request acquisition unit 1010, an instance operation suspension unit 1020, a binding relationship decoupling unit 1030, a virtual processor processing unit 1040, and a thread rerun unit 1050.
The upgrade request acquisition unit 1010 is configured to acquire a virtual machine monitor program upgrade request. The instance run suspension unit 1020 is configured to suspend running operations of the user-mode thread in the container instance on the system thread in response to acquiring the virtual machine monitor upgrade request.
The binding relationship decoupling unit 1030 is configured to decouple the binding relationships between the respective system threads and a first set of virtual processors currently used by the container instance, the first set of virtual processors being created from a first file handle created using a virtual machine device file of a first virtual machine monitor.
Fig. 11 shows a block diagram of an implementation example of a binding relationship decoupling unit 1100 according to an embodiment of the present description. As shown in FIG. 11, the binding relationship decoupling unit 1100 includes a system thread bounce module 1110 and a virtual processor bounce module 1120.
The system thread bounce module 1110 is configured to bounce a system thread operating in the guest ring3 mode back to the guest ring0 mode. For example, if the system thread is running in GR3 mode, the system thread bounce module 1110 may send a bounce signal to the system thread using the function BounceToKernel that has been implemented. Upon receiving the bounce signal, the system thread bounces back to the GR0 mode (GR0 kernel).
The virtual processor bounce module 1120 is configured to bounce a first virtual processor running in the host ring0 mode to the guest ring0 mode. For example, if the vCPU is running in HR0 mode, the virtual processor bounce module 1120 may use the function BounceToHost that has been implemented to send a consecutive bounce signal to the system thread. Upon receiving the bounce signal, the vCPU is bounced back to the GR0 mode (GR0 kernel).
Further optionally, in an example, the binding relationship decoupling unit 1100 may further include a system call return module 1130. The system call return module 1130 is configured to return the system call from the guest ring0 mode to the host ring3 mode.
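The mode transitions handled by the three modules can be summarized as a small state model in Go. The sketch below is illustrative only: the functions BounceSystemThread, BounceVCPU, ReturnSyscall, and Decouple are hypothetical names that model the transitions as pure functions, and they do not reproduce the BounceToKernel or BounceToHost functions mentioned above or their signal-based mechanism.

```go
package main

import "fmt"

// Mode names the four modes mentioned in the text.
type Mode int

const (
	HR3 Mode = iota // host ring3
	HR0             // host ring0
	GR0             // guest ring0
	GR3             // guest ring3
)

// String returns the short name of the mode.
func (m Mode) String() string {
	return [...]string{"HR3", "HR0", "GR0", "GR3"}[m]
}

// BounceSystemThread models the system thread bounce: GR3 -> GR0.
func BounceSystemThread(m Mode) Mode {
	if m == GR3 {
		return GR0
	}
	return m
}

// BounceVCPU models the virtual processor bounce: HR0 -> GR0.
func BounceVCPU(m Mode) Mode {
	if m == HR0 {
		return GR0
	}
	return m
}

// ReturnSyscall models the optional system call return: GR0 -> HR3.
func ReturnSyscall(m Mode) Mode {
	if m == GR0 {
		return HR3
	}
	return m
}

// Decouple applies the three transitions in order.
func Decouple(m Mode) Mode {
	return ReturnSyscall(BounceVCPU(BounceSystemThread(m)))
}

func main() {
	for _, m := range []Mode{GR3, HR0, GR0, HR3} {
		fmt.Printf("%v -> %v\n", m, Decouple(m))
	}
}
```

In this model every starting mode ends in host ring3, which is the state in which, per the description, the system threads can be parked.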
Returning to fig. 10, the virtual processor processing unit 1040 is configured to delete the first set of virtual processors and create a second set of virtual processors from the virtual machine device file of the second virtual machine monitor. In one example, virtual processor processing unit 1040 may delete the mapping between the processor bitmap of each first virtual processor in the first set of virtual processors and the thread identification of the system thread and tear down the virtual machine state of each first virtual processor. Further, the virtual processor processing unit 1040 may create a second file handle from the virtual machine device file of the second virtual machine monitor, and create a second virtual processor set from the second file handle.
Thread rerun unit 1050 is configured to continue running the respective user-mode thread on the respective system thread using the second set of virtual processors.
Further, optionally, the monitor upgrading apparatus 1000 may further include a device file construction unit (not shown). The device file construction unit is configured to construct the virtual machine device files of the multi-version virtual machine monitor for the container instance.
The virtual machine monitor upgrading method and the virtual machine monitor upgrading apparatus according to embodiments of the present specification have been described above with reference to Figs. 1 to 11. The virtual machine monitor upgrading apparatus may be implemented in hardware, in software, or in a combination of hardware and software.
Fig. 12 shows a schematic diagram of an electronic device for implementing a hot upgrade process for a virtual machine monitor of a secure container according to an embodiment of the present description. As shown in Fig. 12, the electronic device 1200 may include at least one processor 1210, a storage (e.g., non-volatile storage) 1220, a memory 1230, and a communication interface 1240, which are connected together via a bus 1260. The at least one processor 1210 executes at least one computer-readable instruction (i.e., the elements described above as being implemented in software) stored or encoded in the memory.
In one embodiment, computer-executable instructions are stored in the memory that, when executed, cause the at least one processor 1210 to: in response to obtaining a virtual machine monitor upgrade request, suspend the running operation of the user-mode threads in the container instance on the system threads; decouple the binding relationships between the respective system threads and a first set of virtual processors currently used by the container instance, the first set of virtual processors being created according to a virtual machine device file of a first virtual machine monitor; delete the first set of virtual processors; create a second set of virtual processors according to a virtual machine device file of a second virtual machine monitor; and continue running the respective user-mode threads on the respective system threads using the second set of virtual processors.
It should be appreciated that the computer-executable instructions stored in the memory, when executed, cause the at least one processor 1210 to perform the various operations and functions described above in connection with fig. 1-11 in the various embodiments of the present description.
According to one embodiment, a program product, such as a machine-readable medium (e.g., a non-transitory machine-readable medium), is provided. The machine-readable medium may have instructions (i.e., the elements described above as being implemented in software) that, when executed by a machine, cause the machine to perform the various operations and functions described above in connection with Figs. 1-11 in the various embodiments of the present specification. Specifically, a system or apparatus may be provided that is equipped with a readable storage medium on which software program code implementing the functions of any of the above embodiments is stored, and a computer or processor of the system or apparatus reads out and executes the instructions stored in the readable storage medium.
In this case, the program code itself read from the readable medium can realize the functions of any of the above-described embodiments, and thus the machine-readable code and the readable storage medium storing the machine-readable code form part of the present invention.
Examples of the readable storage medium include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-R, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs), magnetic tapes, nonvolatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or from the cloud via a communications network.
It will be understood by those skilled in the art that various changes and modifications may be made to the various embodiments disclosed above without departing from the spirit of the invention. Accordingly, the scope of the invention should be determined from the following claims.
It should be noted that not all steps and units in the above flows and system structure diagrams are necessary, and some steps or units may be omitted according to actual needs. The execution order of the steps is not fixed, and can be determined as required. The apparatus structures described in the above embodiments may be physical structures or logical structures, that is, some units may be implemented by the same physical entity, or some units may be implemented by a plurality of physical entities, or some units may be implemented by some components in a plurality of independent devices.
In the above embodiments, the hardware units or modules may be implemented mechanically or electrically. For example, a hardware unit, module or processor may comprise permanently dedicated circuitry or logic (such as a dedicated processor, FPGA or ASIC) to perform the corresponding operations. The hardware units or processors may also include programmable logic or circuitry (e.g., a general purpose processor or other programmable processor) that may be temporarily configured by software to perform the corresponding operations. The specific implementation (mechanical, or dedicated permanent, or temporarily set) may be determined based on cost and time considerations.
The detailed description set forth above in connection with the appended drawings describes exemplary embodiments but does not represent all embodiments that may be practiced or fall within the scope of the claims. The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A method for hot-upgrading a virtual machine monitor of a secure container, the virtual machine monitor to run a user-mode kernel, the method comprising:
in response to obtaining a virtual machine monitor upgrade request, suspending the running operation of user-mode threads in a container instance on system threads;
decoupling binding relationships between the respective system threads and a first set of virtual processors currently used by the container instance, the first set of virtual processors being created according to a virtual machine device file of a first virtual machine monitor;
deleting the first set of virtual processors;
creating a second set of virtual processors according to a virtual machine device file of a second virtual machine monitor; and
continuing to run the respective user-mode threads on the respective system threads using the second set of virtual processors.
2. The method of claim 1, wherein the container instance is constructed with a virtual machine device file of a multi-version virtual machine monitor.
3. The method of claim 1, wherein the method is implemented in a Go language programming environment.
4. The method of claim 3, wherein decoupling binding relationships between individual system threads and a first set of virtual processors currently used by the container instance comprises:
bouncing a system thread running in the guest ring3 mode back to the guest ring0 mode; and
bouncing a first virtual processor running in the host ring0 mode to the guest ring0 mode.
5. The method of claim 4, wherein decoupling binding relationships between individual system threads and a first set of virtual processors currently used by the container instance further comprises:
returning the system call from the guest ring0 mode to the host ring3 mode.
6. The method of claim 1, wherein deleting the first set of virtual processors comprises:
deleting the mapping relationship between the virtual processor bitmap of each first virtual processor in the first set of virtual processors and the thread identification of each system thread, and tearing down the virtual machine state of each first virtual processor.
7. The method of claim 1, wherein the virtual machine monitor supports process level virtualization.
8. An apparatus for hot-upgrading a virtual machine monitor of a secure container, the virtual machine monitor to run a user-mode kernel, the apparatus comprising:
an upgrade request acquisition unit configured to acquire a virtual machine monitor upgrade request;
an instance run suspension unit configured to suspend the running operation of user-mode threads in a container instance on system threads in response to acquiring the virtual machine monitor upgrade request;
a binding relationship decoupling unit configured to decouple binding relationships between the respective system threads and a first set of virtual processors currently used by the container instance, the first set of virtual processors being created according to a first file handle created using a virtual machine device file of a first virtual machine monitor;
a virtual processor processing unit configured to delete the first set of virtual processors and create a second set of virtual processors according to a virtual machine device file of a second virtual machine monitor; and
a thread rerun unit configured to continue running the respective user-mode threads on the respective system threads using the second set of virtual processors.
9. The apparatus of claim 8, further comprising:
a device file construction unit configured to construct the virtual machine device files of the multi-version virtual machine monitor for the container instance.
10. The apparatus of claim 8, wherein the apparatus is implemented in a Go language programming environment, and the binding relationship decoupling unit comprises:
a system thread bounce module configured to bounce a system thread running in the guest ring3 mode back to the guest ring0 mode; and
a virtual processor bounce module configured to bounce a first virtual processor running in the host ring0 mode to the guest ring0 mode.
11. The apparatus of claim 10, wherein the binding relationship decoupling unit further comprises:
a system call return module configured to return the system call from the guest ring0 mode to the host ring3 mode.
12. The apparatus of claim 8, wherein the virtual processor processing unit is to:
delete the mapping relationship between the processor bitmap of each first virtual processor in the first set of virtual processors and the thread identification of the system thread, and tear down the virtual machine state of each first virtual processor.
13. An electronic device, comprising:
at least one processor, and
a memory coupled with the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method of any of claims 1-7.
14. A machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method of any one of claims 1 to 7.
HK42021034510.4A 2021-07-06 Method and device for hot upgrading virtual machine monitoring program of security container HK40044598B (en)

Publications (2)

Publication Number Publication Date
HK40044598A (en) 2021-10-08
HK40044598B (en) 2022-10-07
