KVM system supporting a novel large-page framework
Technical Field
The invention belongs to the technical field of cloud services, and particularly relates to a Kernel-based Virtual Machine (KVM) system supporting a novel large-page framework.
Background
Cloud computing is now well developed, and a large number of cloud service providers, such as Amazon, Microsoft and Alibaba Cloud, have emerged on the market. All of them want to provide efficient, high-volume services at an acceptable hardware cost, which requires making efficient and practical use of existing hardware.
Most cloud services today are built on large-scale server clusters, and virtualization is commonly used to multiplex the hardware of a single physical server so that it carries as much workload as possible without affecting operation. Tens or even hundreds of virtual machines therefore often run on one physical server, and how well the hardware is utilized directly affects the quality and quantity of the services that can be run. Among CPU, memory and I/O, physical memory is a particularly good target for optimization: mainstream servers now generally have 256 GB of physical memory or more, so the practical benefit of memory optimization is high. For example, the Linux kernel provides KSM (Kernel Samepage Merging) for merging duplicate pages. In addition, with this much memory the system usually enables a large-page mechanism, because continuing to use conventional 4KB small page frames causes the following bottlenecks: (1) the memory management structures incur a large overhead, e.g. 256 GB of physical memory requires 2^26 struct page structures; (2) the TLB (translation lookaside buffer, also called the fast table) is a cache of page table entries that the CPU can access far faster than memory, but its capacity is very small, so fine-grained paging is very likely to cause TLB misses, which greatly reduces service efficiency.
The existing Linux kernel supports large pages in two ways: the static large-page mechanism and the transparent large-page mechanism. Both mechanisms are extensions of ordinary 4KB small pages, and the management structure of each large page is built on the management structures of small pages: the structure representing a page frame is still struct page, and every 512 contiguous struct page structures are compounded into one large page. This effectively overcomes the TLB problem (2), but the metadata-overhead problem (1) remains unsolved. The extensibility of these large pages is also poor: for example, neither kind supports KSM or data compression, static large pages cannot be swapped or compressed, and transparent large pages must be split into ordinary small pages before they are swapped or compressed.
Therefore, a research group in China has proposed a new large-page framework, PHPA (primary big Page Allocator). The framework manages large pages with a new data structure, is compatible with the hugetlbfs interfaces, and has a metadata overhead of only 1/512 of the original. However, PHPA has a serious problem: it is incompatible with the native KVM module. At a high level, this is because PHPA adopts new page descriptors; these descriptors are the basis of PHPA's good extensibility and cannot be discarded, so the KVM module must be modified for PHPA to operate in a virtualized environment with KVM as the hypervisor.
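The overhead figures quoted in the background can be checked with a short calculation. The following is a minimal sketch in C; the function names are hypothetical, and the 64-byte descriptor size is an assumption (a typical size of struct page on x86-64), not a figure taken from this document. The 1/512 ratio holds when the large-page descriptor is the same size as the small-page descriptor.

```c
#include <stdint.h>

/* Number of page descriptors needed to cover mem_bytes of physical
 * memory when every page is page_bytes in size. */
uint64_t descriptor_count(uint64_t mem_bytes, uint64_t page_bytes)
{
    return mem_bytes / page_bytes;
}

/* Total descriptor metadata, assuming each descriptor occupies
 * desc_bytes (64 is an assumed, typical value for struct page). */
uint64_t metadata_bytes(uint64_t mem_bytes, uint64_t page_bytes,
                        uint64_t desc_bytes)
{
    return descriptor_count(mem_bytes, page_bytes) * desc_bytes;
}
```

With 256 GB of memory, 4KB pages need 2^26 descriptors while 2MB pages need 2^17, which is the factor of 512 mentioned above.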
Disclosure of Invention
In view of the above, the present invention provides a KVM system supporting a novel large-page framework, which modifies the KVM module so that PHPA can operate in a virtualized environment with KVM as the hypervisor, thereby extending the functionality of KVM.
A KVM system supporting a novel large-page framework comprises a memory virtualization unit, a virtual machine and a virtual machine management unit, wherein the memory virtualization unit is responsible for memory management of the virtual machine; the memory virtualization unit comprises an EPT (Extended Page Tables) page fault handling module and a page table entry deletion module, wherein the EPT page fault handling module handles page faults on the EPT page table, and the page table entry deletion module deletes the corresponding page table entries when a page is released;
the EPT page fault handling module comprises:
the Level calculation submodule judges whether the EPT page table entries lacking in the EPT page table are 4KB page table entries or 2M page table entries;
the address conversion submodule converts the virtual machine page frame number GFN into a host machine physical address HPA according to the judgment result of the Level calculation submodule, and further converts the host machine physical address HPA into a physical page frame number PFN by utilizing a relevant API;
and the EPT page table filling submodule utilizes the physical page frame number PFN to reversely calculate the finally required host machine physical address HPA and fills the host machine physical address HPA into the EPT page table entry.
Further, the KVM system runs in the host kernel. When the virtual machine requests the host physical address HPA corresponding to a virtual machine page frame number GFN, the host hardware detects the request and looks up the HPA corresponding to the GFN in the EPT page table in the host kernel. If the mapping is found, the HPA is accessed directly to perform the relevant physical memory operation; if it is not found, the EPT page fault handling module in the KVM system is triggered, and user-mode virtual machine management software (such as QEMU) passes the GFN to the EPT page fault handling module.
Further, the Level calculation submodule determines whether the missing EPT page table entry is a 4KB page table entry or a 2M page table entry as follows: A is compared with host_level and the smaller value is taken as Level; if Level is 4KB, the missing EPT page table entry is judged to be a 4KB page table entry; if Level is 2M, it is judged to be a 2M page table entry. Here A is the page size that KVM internally allows to be allocated, and host_level is the page size allocated to the virtual machine by the host (set by the user when the virtual machine is started).
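The Level calculation amounts to taking the minimum of two page-size levels. A minimal sketch in C follows; the enum, its values and the function name are hypothetical and chosen only for illustration.

```c
/* Hypothetical level encoding: a larger value means a larger page. */
enum page_level {
    LEVEL_4K = 1,
    LEVEL_2M = 2
};

/* Level of the missing EPT entry: the smaller of the level KVM allows
 * internally (A in the text) and the level the host allocated to the
 * virtual machine (host_level in the text). */
enum page_level ept_fault_level(enum page_level a, enum page_level host_level)
{
    return a < host_level ? a : host_level;
}
```

A 2M entry is therefore chosen only when both KVM and the host use 2M pages; in every other combination the missing entry is treated as a 4KB entry.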
Further, the specific operation process of the address translation submodule for translating the virtual machine page frame number GFN into the host physical address HPA is as follows:
(1) preprocessing the virtual machine page frame number GFN: if the page size host_level allocated to the virtual machine by the host is 2M, preprocessing according to the formula GFN' = GFN - GFN % 512, where GFN' is the preprocessed virtual machine page frame number and % is the modulo operator; otherwise, GFN is kept unchanged, i.e. GFN' = GFN;
(2) finding the memslot (memory slot) corresponding to GFN' and calculating from it the host virtual address HVA corresponding to GFN';
(3) walking the host page table with the obtained host virtual address HVA; if the HVA-HPA mapping is found, the host physical address HPA' in large-page mode is obtained directly; if the HVA-HPA mapping is not found, entering the host page fault handling flow, which finally yields the host physical address HPA' in large-page mode;
(4) obtaining the corresponding struct hugepage structure from the host physical address HPA' in large-page mode, wherein if Level is 4KB, the offset pfn_offset between the host physical address HPA and the host physical address HPA' in large-page mode is calculated according to the formula pfn_offset = HVA % 2^21 and stored in the struct hugepage structure; if Level is 2M, the offset pfn_offset is set to 0;
(5) adding the offset pfn_offset to the host physical address HPA' in large-page mode to obtain the finally required host physical address HPA, converting the HPA into a physical page frame number PFN by using the relevant API (application program interface), and returning the PFN to the upper-level function.
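The arithmetic in steps (1), (2), (4) and (5) can be sketched in user-space C as below. This is an illustration under stated assumptions, not the kernel implementation: struct memslot is a hypothetical, simplified slot with a linear GFN-to-HVA mapping, all function names are made up, and step (3) (the host page table walk) is not modeled, so HPA' is passed in as a parameter. The sketch also assumes that the HVA used in step (4) is that of the original, unaligned GFN, which the text does not state explicitly; here pfn_offset is a byte offset, as in the formula of step (4).

```c
#include <stdint.h>

#define PAGE_SHIFT      12
#define HPAGE_SHIFT     21
#define PAGES_PER_HPAGE 512ULL

enum page_level { LEVEL_4K = 1, LEVEL_2M = 2 };

/* Hypothetical, simplified memory slot: guest frames starting at
 * base_gfn map linearly onto host virtual memory at userspace_addr. */
struct memslot {
    uint64_t base_gfn;
    uint64_t npages;
    uint64_t userspace_addr;
};

/* Step (1): align GFN down to a 512-frame boundary when the host
 * backs the guest with 2M pages; otherwise leave it unchanged. */
uint64_t preprocess_gfn(uint64_t gfn, enum page_level host_level)
{
    return host_level == LEVEL_2M ? gfn - gfn % PAGES_PER_HPAGE : gfn;
}

/* Step (2): translate a guest frame number to a host virtual address. */
uint64_t gfn_to_hva(const struct memslot *slot, uint64_t gfn)
{
    return slot->userspace_addr + ((gfn - slot->base_gfn) << PAGE_SHIFT);
}

/* Step (4): byte offset of the faulting address inside its 2M page,
 * or 0 when the missing EPT entry is itself a 2M entry. */
uint64_t compute_pfn_offset(uint64_t hva, enum page_level level)
{
    return level == LEVEL_4K ? hva % (1ULL << HPAGE_SHIFT) : 0;
}

/* Step (5): add the offset to HPA' and convert the result to a PFN. */
uint64_t final_pfn(uint64_t hpa_large, uint64_t pfn_offset)
{
    return (hpa_large + pfn_offset) >> PAGE_SHIFT;
}
```

For example, with a slot whose base_gfn is 0 and whose userspace_addr is 2M-aligned, guest frame 0x305 lies 0x105000 bytes into its large page, so a large page at HPA' 0x40200000 yields PFN 0x40305.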
Further, because the struct hugepage structure is used, one physical large page may correspond to multiple EPT page table entries. If the logic of the page table entry deletion module were left unchanged, residual EPT page table entries might not be deleted, i.e. the corresponding entries would remain in the EPT page table, invalid, after the memory is released. The page table entry deletion module therefore finds and deletes all residual EPT page table entries by traversal: it walks the host virtual address (HVA) range corresponding to the whole large page and attempts the EPT page table entry deletion operation for each HVA.
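The traversal performed by the page table entry deletion module can be sketched as follows. This is a user-space model only: the EPT is represented by a hypothetical flat array of entries keyed by HVA, and both function names are assumptions made for illustration. What it shows is the loop over every 4KB step of the large page's HVA range, with a deletion attempt at each step.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  4096ULL
#define HPAGE_SIZE (512ULL * PAGE_SIZE)

/* Hypothetical stand-in for one EPT leaf entry, keyed by the host
 * virtual address it was created for. */
struct ept_entry {
    uint64_t hva;
    bool     present;
};

/* Try to delete the entry for a single HVA; returns true if one was
 * present and has been removed. */
bool ept_unmap_hva(struct ept_entry *tbl, size_t n, uint64_t hva)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].present && tbl[i].hva == hva) {
            tbl[i].present = false;
            return true;
        }
    }
    return false;
}

/* Walk the whole 2M HVA range of the large page containing hpage_hva
 * in 4KB steps and attempt a deletion for each address. Returns the
 * number of residual entries that were removed. */
size_t ept_unmap_hugepage(struct ept_entry *tbl, size_t n, uint64_t hpage_hva)
{
    uint64_t start = hpage_hva & ~(HPAGE_SIZE - 1);
    size_t removed = 0;

    for (uint64_t hva = start; hva < start + HPAGE_SIZE; hva += PAGE_SIZE) {
        if (ept_unmap_hva(tbl, n, hva))
            removed++;
    }
    return removed;
}
```

Most of the 512 attempts find nothing to delete, which is why the text describes the operation as being "tried" for each HVA.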
The invention provides a specific method and related implementation for modifying the KVM module so that the KVM module and the PHPA framework are organically combined; the combination has passed the related tests. Because the PHPA large-page framework has good extensibility, many problems of Linux hugetlbfs static large pages are thereby solved. In addition, by further modifying the KVM module, the invention implements statistics on hot and cold pages of the virtual machine, finally forming a new KVM system and taking a critical step in moving the PHPA framework from the laboratory to industry.
Drawings
FIG. 1 is a logic diagram illustrating EPT page faults according to the present invention.
FIG. 2 is a diagram illustrating the mapping logic of EPT small page table entries to host large page table entries.
Fig. 3 is a logic diagram of the hva_to_pfn_fast() function.
FIG. 4 is a logic diagram illustrating entry into host large-page fault handling after an EPT page fault.
FIG. 5 is a logic diagram illustrating the residue of an invalid EPT page table entry.
FIG. 6 is a logic diagram illustrating the deletion of the EPT residual item.
FIG. 7 is a physical memory distribution diagram managed by the PHPA system.
Fig. 8 is a PHPA page frame addressing diagram.
Detailed Description
In order to more specifically describe the present invention, the following detailed description is provided for the technical solution of the present invention with reference to the accompanying drawings and the specific embodiments.
There are three main contradictions between native KVM and the PHPA large-page framework: (1) the physical page size of the virtual machine is inconsistent with that of the host; (2) conversion from pfn to struct hugepage; (3) distinguishing small pages from large pages.
The specific implementation of the technical solution mainly solves these three problems, the most important being the first. The first contradiction arises from the EPT page fault logic itself, which is shown in fig. 1.
When a page fault occurs on the EPT page table, the EPT page fault handling logic is entered. In this logic, the virtual machine physical address gpa is first converted into the host virtual address hva, the host page table is then queried with hva to find the corresponding host physical address hpa, and finally hpa is filled back into the EPT page table. However, when the corresponding hpa is found in the host page table, what is returned to the upper-level logic is a page structure, which is converted into the physical page frame number pfn in hva_to_pfn; pfn is then converted into hpa to fill the EPT page table. Consider the case in which the missing EPT page table entry is a small page table entry but the corresponding host page table entry is a large page table entry, as shown in FIG. 2.
The address missing from the EPT page table in FIG. 2 is the physical address of the small page frame Y, but the host page table entry found from hva refers to the large page frame X. In the native system, the offset between X and Y is calculated and the page structure of page frame Y is returned. The new large-page framework PHPA, however, manages physical memory with the new page management structure hugepage, so the page structure of Y cannot be found from the offset; that is, the original logic fails.
The technical solution adopted by the invention is as follows. Attention is focused on the last step of the host page table walk, i.e. the gup_huge_pmd function; in this step, the physical address of the large page can be extracted from the page table entry pte (actually a pmd). Assuming that the corresponding hugepage structure can be obtained from this physical address, the parameters of gup_huge_pmd need essentially no modification and a page structure can still be returned, except that when a PHPA large page has been allocated, the returned page structure is actually a struct hugepage cast to struct page (struct hugepage and struct page can be cast to each other because their internal layouts are consistent). However, returning only the hugepage is not enough: in the example of fig. 2, the hugepage structure yields only the physical address of page frame X, not that of the required page frame Y. The offset between page frame X and page frame Y, ((addr & ~PMD_MASK) >> PAGE_SHIFT), therefore still has to be saved in gup_huge_pmd. This is not difficult: a new member pfn_offset can be added to the struct hugepage structure to hold the offset. In this way, returning only the struct hugepage structure gives the upper-level function (specifically, hva_to_pfn_fast()) complete information about the corresponding pfn; i.e., the following branch is added to gup_huge_pmd:
In the code, the offset is stored in page->debug_flags; debug_flags is pfn_offset, because the two members are at the same offset in their respective structures, as shown in table 1:
TABLE 1
The hugepage structure is stored in the pages[] array, a local variable that is defined in hva_to_pfn_fast() and passed all the way down to gup_huge_pmd(); the relevant flow of hva_to_pfn_fast() is shown in figure 3.
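The idea of returning a hugepage descriptor through a struct page pointer, with the offset carried in a member at the same position, can be illustrated with the user-space sketch below. It is not the kernel code: both structure layouts are hypothetical and reduced to two members, and the function names are assumptions. The offset here is counted in 4KB frames, following the expression ((addr & ~PMD_MASK) >> PAGE_SHIFT). The kernel, which is built with -fno-strict-aliasing, can read the value directly as page->debug_flags; this portable sketch casts the pointer back to struct hugepage instead.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define PMD_MASK   (~((1ULL << PMD_SHIFT) - 1))

/* Hypothetical, reduced layouts; the real structures have many more
 * members. Only the matching position of the second member matters. */
struct page {
    unsigned long flags;
    unsigned long debug_flags;
};

struct hugepage {
    unsigned long flags;
    unsigned long pfn_offset;
};

/* The cast is only meaningful if the two members line up. */
_Static_assert(offsetof(struct page, debug_flags) ==
               offsetof(struct hugepage, pfn_offset),
               "pfn_offset must overlay debug_flags");
_Static_assert(sizeof(struct page) == sizeof(struct hugepage),
               "descriptors must have the same size");

/* Models the added branch: record how many 4KB frames addr lies into
 * its 2M page, then hand the descriptor back as a struct page pointer. */
struct page *gup_record_offset(struct hugepage *hp, uint64_t addr)
{
    hp->pfn_offset = (unsigned long)((addr & ~PMD_MASK) >> PAGE_SHIFT);
    return (struct page *)hp;
}

/* Upper-layer read of the saved offset, via a cast back to the
 * descriptor's real type. */
unsigned long page_pfn_offset(const struct page *page)
{
    return ((const struct hugepage *)page)->pfn_offset;
}
```

Because the pointer itself is unchanged by the cast, the pages[] array can carry it down and back up the call chain without any change to function signatures.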
page[0] is the struct page structure obtained in gup_huge_pmd; as can be seen, after obtaining the page structure, hva_to_pfn_fast() converts it to pfn with page_to_pfn() and returns it. Since PHPA replaces the page structure with the hugepage structure, this part of the code needs to be modified as follows:
First, an interface is needed to determine whether page[0] is a normal page or a PHPA large page; it is named is_hpa_page(), and its implementation is described below. When page[0] is a PHPA large page, i.e. a struct hugepage structure, it is converted into pfn and the offset debug_flags (pfn_offset) previously stored in the hugepage structure is added, so that the resulting pfn is the pfn of page frame Y in fig. 3; when page[0] is a normal 4KB page, it is converted by the conventional method.
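The modified conversion can be sketched in user-space C as follows. Everything here is a simplified model under stated assumptions: mem_map and hpa_map are hypothetical descriptor arrays, small frames are assumed to start at pfn 0, PHPA large pages are assumed to start at pfn 1024 with a stride of 512 frames, and fast_path_pfn() is a made-up name standing in for the relevant part of hva_to_pfn_fast().

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGES_PER_HPAGE 512ULL
#define NR_SMALL 8
#define NR_HUGE  4

/* Hypothetical, reduced descriptor layouts. */
struct page     { unsigned long flags; unsigned long debug_flags; };
struct hugepage { unsigned long flags; unsigned long pfn_offset; };

/* Hypothetical descriptor arrays for small frames and PHPA large pages. */
static struct page     mem_map[NR_SMALL];
static struct hugepage hpa_map[NR_HUGE];
static const uint64_t  hpa_start_pfn = 1024; /* assumed first large-page pfn */

/* A descriptor is a PHPA large page if it lies inside hpa_map. */
bool is_hpa_page(const struct page *page)
{
    uintptr_t p = (uintptr_t)page;
    return p >= (uintptr_t)hpa_map && p < (uintptr_t)(hpa_map + NR_HUGE);
}

/* Conventional conversion for 4KB pages: index into mem_map. */
uint64_t page_to_pfn(const struct page *page)
{
    return (uint64_t)(page - mem_map);
}

/* pfn of the first 4KB frame of a PHPA large page. */
uint64_t hugepage_to_pfn(const struct hugepage *hp)
{
    return hpa_start_pfn + (uint64_t)(hp - hpa_map) * PAGES_PER_HPAGE;
}

/* Models the modified branch: large pages add the saved frame offset,
 * normal pages use the conventional conversion. */
uint64_t fast_path_pfn(const struct page *page)
{
    if (is_hpa_page(page)) {
        const struct hugepage *hp = (const struct hugepage *)page;
        return hugepage_to_pfn(hp) + hp->pfn_offset;
    }
    return page_to_pfn(page);
}
```

With an offset of 0x105 frames saved in the second large-page descriptor, the function returns 1024 + 512 + 0x105, i.e. the pfn of the small frame inside that large page rather than the pfn of the large page itself.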
The same modification needs to be applied in the other branch of EPT page fault handling, shown in FIG. 4. The code logic there is that after the EPT page fault, the host page table is walked to look for the corresponding large page, but the PHPA large page turns out not to have been allocated yet, so the hugetlb_fault() function, i.e. the large-page fault handler, is entered.
After the large page is allocated from the PHPA system, the same problem of returning pfn arises; it is solved in the same way as above, by temporarily storing the offset in the struct hugepage structure, so the description is not repeated.
So far, the technical solution of the invention solves the problem of returning the page-frame physical address during EPT page fault handling, which arises mainly because the EPT page table entries are small page table entries while the host page table entries are large page table entries, i.e. the level problem.
The first contradiction also appears when a PHPA large page is released. A native large page is decomposed into small pages again when it is released, and each small page deletes its corresponding EPT page table entry when it is reclaimed. With the new PHPA large-page framework, however, physical memory is managed with the hugepage structure and a PHPA large page is not decomposed on release, so the EPT page table entries may not all be deleted, as shown in fig. 5. This can be solved by explicitly deleting the EPT page table entries that refer to the PHPA large page, as shown in FIG. 6.
The second contradiction is mainly caused by unsuitable numbering of the page frame number pfn of PHPA large pages. The page frame addressing of the PHPA framework is shown in fig. 7, which gives the physical memory layout managed by the PHPA system and the pfn corresponding to each page frame. The pfn values of adjacent physical pages differ by 1, but since the page size is 2MB, obtaining the physical address by multiplying pfn by 4KB is no longer correct, so the definition or distribution of hugepage pfn values needs to be modified.
To this end, the invention modifies the page frame addressing of the PHPA framework as shown in fig. 8, so that PHPA large pages and normal 4KB pages are consistent with respect to pfn; the advantage is that none of the kernel code that operates on pfn needs to be modified.
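One reading of the modified addressing is that pfn always counts 4KB frames, so consecutive PHPA large pages are 512 apart in pfn and the usual shift by PAGE_SHIFT yields the physical address. The sketch below illustrates that reading; since fig. 8 is not reproduced here, the stride of 512 is an assumption, and the function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT      12
#define HPAGE_SHIFT     21
#define PAGES_PER_HPAGE (1ULL << (HPAGE_SHIFT - PAGE_SHIFT))

/* pfn of the idx-th large page in a region whose first 4KB frame has
 * page frame number start_pfn (assumed stride: 512 frames per page). */
uint64_t hugepage_index_to_pfn(uint64_t start_pfn, uint64_t idx)
{
    return start_pfn + idx * PAGES_PER_HPAGE;
}

/* The ordinary pfn-to-physical-address conversion, unchanged. */
uint64_t pfn_to_phys(uint64_t pfn)
{
    return pfn << PAGE_SHIFT;
}
```

Under this numbering, the third large page after a region starting at pfn 0x100000 has physical address 0x100000000 + 3 * 0x200000, which is exactly what the unmodified 4KB conversion produces.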
The third contradiction is resolved by the above change to PHPA page frame addressing; it is only necessary to implement the two interfaces is_hpa_pfn() and is_hpa_page(), and the implementation is straightforward. For the former, two variables hpa_start_pfn and hpa_end_pfn can be set at boot, representing the pfn values of the first and the last PHPA large page respectively, and whether a pfn belongs to a large page or a small page can be determined by comparing it with these two boundary values; for the latter, the problem can be solved by comparing pointers.
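The boundary comparison for is_hpa_pfn() can be sketched as follows. The names hpa_start_pfn, hpa_end_pfn and is_hpa_pfn() come from the text; hpa_set_pfn_range() is a hypothetical helper standing in for the boot-time setup, and the upper bound assumes that the last large page spans 512 consecutive pfn values, consistent with the modified addressing.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGES_PER_HPAGE 512ULL

/* pfn of the first and of the last PHPA large page, set once at boot. */
static uint64_t hpa_start_pfn;
static uint64_t hpa_end_pfn;

/* Hypothetical boot-time helper that records the two boundary values. */
void hpa_set_pfn_range(uint64_t first_hugepage_pfn, uint64_t last_hugepage_pfn)
{
    hpa_start_pfn = first_hugepage_pfn;
    hpa_end_pfn = last_hugepage_pfn;
}

/* A pfn belongs to a PHPA large page if it falls between the first
 * frame of the first large page and the last frame of the last one
 * (assumed to be hpa_end_pfn + 511). */
bool is_hpa_pfn(uint64_t pfn)
{
    return pfn >= hpa_start_pfn && pfn < hpa_end_pfn + PAGES_PER_HPAGE;
}
```

The check is two integer comparisons, so it can be called on every pfn conversion path without noticeable cost.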
The embodiments described above are presented to enable a person having ordinary skill in the art to make and use the invention. It will be readily apparent to those skilled in the art that various modifications may be made to the above embodiments, and that the generic principles described herein may be applied to other embodiments without inventive effort. Therefore, the present invention is not limited to the above embodiments, and improvements and modifications made by those skilled in the art on the basis of this disclosure shall fall within the protection scope of the present invention.