CN113568734A - Virtualization method and system based on multi-core processor, multi-core processor and electronic equipment - Google Patents

Virtualization method and system based on multi-core processor, multi-core processor and electronic equipment

Info

Publication number
CN113568734A
Authority
CN
China
Prior art keywords
virtual
processing
core processor
container
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010358635.4A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Cambricon Information Technology Co Ltd
Original Assignee
Anhui Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Cambricon Information Technology Co Ltd filed Critical Anhui Cambricon Information Technology Co Ltd
Priority to CN202010358635.4A
Priority to US17/904,824
Priority to PCT/CN2021/077977
Publication of CN113568734A
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 Hypervisors; Virtual machine monitors
    • G06F 9/45558 Hypervisor-specific management and integration aspects

Abstract

The present disclosure describes a multi-core processor based virtualization method, system, electronic device and computing apparatus, wherein the computing apparatus may be included in a combined processing apparatus, which may further include a universal interconnect interface and other processing apparatuses. The computing apparatus interacts with the other processing apparatuses to jointly complete computing operations specified by a user. The combined processing apparatus may further comprise a storage apparatus connected to the computing apparatus and the other processing apparatuses, respectively, for storing data of the computing apparatus and the other processing apparatuses.

Description

Virtualization method and system based on multi-core processor, multi-core processor and electronic equipment
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly, to techniques for virtualization of processors.
Background
In a computer, virtualization (Virtualization) is a resource management technology that abstracts and converts various resources of the computer, such as servers, networks, memory, and storage, so that users can apply those resources more effectively than the original configuration allows.
FIG. 1 shows an exemplary block diagram of a virtualization implementation by time slicing (time slicing) technique.
As shown in FIG. 1, there are four virtual machines VM0-VM3, each performing its own tasks. After passing through the time-slice manager, the tasks are sliced and ordered in time, and the compute engine processes the different time-shared tasks slice by slice. In this mode, when virtual machine VM1 is operating, the other virtual machines cannot operate and must wait. When the time slice is small, users may hardly notice the delay, but if a task of one virtual machine takes a long time (e.g., VM1 shown in FIG. 1), other users experience a significant delay, thereby degrading the user experience.
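The coupling described above can be sketched with a toy model of a shared compute engine that serializes per-VM tasks. All VM names and task durations below are invented for illustration and are not from the patent:

```python
# Toy model of a shared compute engine: each queued task runs to completion
# before the next one starts, so one long task (VM1 below) inflates every
# later VM's completion time -- the blocking behavior described in the text.

def run_serialized(tasks):
    """tasks: list of (vm_name, duration_ms) in queue order.
    Returns each VM's completion time in ms."""
    clock = 0
    finished = {}
    for vm, duration in tasks:
        clock += duration
        finished[vm] = clock
    return finished

light = run_serialized([("VM0", 10), ("VM1", 10), ("VM2", 10), ("VM3", 10)])
heavy = run_serialized([("VM0", 10), ("VM1", 100), ("VM2", 10), ("VM3", 10)])

# VM2 finishes 90 ms later purely because of VM1's long task.
assert heavy["VM2"] - light["VM2"] == 90
```

With hardware-partitioned resources, by contrast, each virtual machine's tasks would run on its own cores and this coupling would disappear.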
In addition, in the prior art, the compute engine is shared by the different virtual machines, and once the compute engine fails because of one virtual machine, all virtual machines are paralyzed, so that all users are affected.
Therefore, the existing virtual machine scheme has defects such as low computational efficiency, head-of-line blocking (HOL blocking), noisy-neighbor interference, and difficulty of expansion.
Disclosure of Invention
It is an object of the present disclosure to provide a method and system for multicore processor based virtualization that overcomes at least one of the deficiencies of the prior art.
According to a first aspect of the present disclosure, there is provided a method of virtualization based on a multi-core processor, wherein the multi-core processor includes a plurality of processing cores, the method comprising: dividing the multi-core processor into a plurality of virtual functions, each virtual function corresponding to one or more processing cores; and corresponding the virtual function to a virtual machine.
According to a second aspect of the present disclosure, there is provided a method of multi-core processor-based virtualization, wherein the multi-core processor includes a plurality of processing cores, the method comprising: partitioning the multi-core processor into a plurality of virtual functions, the plurality of virtual functions sharing the plurality of processing cores; and corresponding the virtual function to a container.
According to a third aspect of the present disclosure, there is provided a virtualization system comprising: a multi-core processor comprising a plurality of processing cores; a plurality of virtual functions sharing the plurality of processing cores; and a virtual machine, the virtual machine corresponding to the virtual function.
According to a fourth aspect of the present disclosure, there is provided a virtualization system comprising: a multi-core processor comprising a plurality of processing cores; a plurality of virtual functions sharing the plurality of processing cores; and a container, the container corresponding to the virtual function.
According to a fifth aspect of the present disclosure, there is provided a multi-core processor comprising a plurality of processing cores, wherein the multi-core processor is partitioned into a plurality of virtual functions, the plurality of virtual functions sharing one or more processing cores.
According to a sixth aspect of the present disclosure, there is provided an electronic device comprising the virtualization system as described above or the multi-core processor as described above.
According to a seventh aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program code which, when executed by a processor, performs the method described above.
The present disclosure can achieve at least one of the following technical effects:
higher quality of service (QoS);
no head of line blocking;
no noisy-neighbor interference;
no context switch overhead;
easy to expand and deploy.
Drawings
The foregoing and other objects, features and advantages of exemplary embodiments of the present disclosure will be readily understood by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the disclosure are illustrated by way of example and not by way of limitation, and like or corresponding reference numerals indicate like or corresponding parts and in which:
FIG. 1 shows an exemplary block diagram of one implementation of virtualization through time slicing (time slicing) techniques;
FIG. 2a shows a schematic diagram of the internal structure of a processing cluster to which the method of the present disclosure may be applied;
FIG. 2b shows a block schematic diagram of an artificial intelligence processor to which the method of the present disclosure may be applied;
FIG. 3 illustrates a multi-core processor based virtualization method according to a first aspect of the present disclosure;
FIG. 4 illustrates a virtualization system according to one embodiment of the present disclosure;
FIG. 5 illustrates a schematic diagram of virtual functions corresponding to a processing cluster, according to one embodiment of the present disclosure;
FIGS. 6a, 6b and 6c exemplarily show the resource occupation of the PCIe card when the multi-core processor is divided into 1, 2 and 4 virtual functions;
FIG. 7 shows a schematic block diagram of a virtualization system according to yet another embodiment of the present disclosure;
FIG. 8 illustrates an architectural diagram of a virtualization system;
FIG. 9 shows a schematic diagram of a combined treatment apparatus according to the present disclosure;
FIG. 10 shows a schematic block diagram of a board card according to the present disclosure;
FIGS. 11a and 11b show schematic diagrams comparing virtual machine mode and Docker mode;
FIG. 12 illustrates a multi-core processor based virtualization method according to a first aspect of the present disclosure;
FIG. 13 illustrates a virtualization system according to one embodiment of the present disclosure; and
FIG. 14 shows a schematic block diagram of a virtualization system according to one embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is to be understood that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by one skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the scope of protection of the present disclosure.
It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, description, and drawings of the present disclosure are used to distinguish between different objects and are not used to describe a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted depending on the context to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Further, in the specification and claims of the present disclosure, correspondence between two parts may be understood as a connection relationship, a response relationship, or a matching relationship between the two parts.
Virtualization is a technique for virtualizing one computer device into a plurality of virtual machines. When a plurality of virtual machines run simultaneously on one computer, each virtual machine can run the same or a different operating system, and the applications running on those operating systems do not influence one another, each in its own independent space, so that the working efficiency of the computer is significantly improved.
Virtualization techniques are distinct from multitasking and hyper-threading techniques. Multitasking refers to multiple programs running simultaneously in one operating system, whereas in virtualization technology multiple operating systems can run simultaneously, each with multiple programs, and each operating system runs on its own virtual machine. Hyper-threading merely lets a single processor simulate two processors to balance program running performance; the two simulated processors cannot be separated and can only work together, whereas in virtualization technology the virtual processors operate independently.
The virtualization technology generally redefines and divides physical resources of a computer by software to realize dynamic allocation, flexible scheduling and cross-domain sharing of the computer resources, thereby improving the resource utilization rate.
Fig. 2a shows a schematic diagram of the internal structure of a processing cluster to which the method of the present disclosure may be applied.
An Artificial Intelligence (AI) chip accelerates data computation and reduces memory access delay. The AI chip adopts a multi-core processor architecture and adds a storage unit core (also called an on-chip storage unit) to accelerate data reading, thereby solving the memory access bottleneck between the processing cores of the AI chip and the DDR (also called the off-chip storage unit). This provides users with stronger computing capability in scenarios such as deep learning and network computing.
The AI chip may have, for example, 16 processing cores for performing computational tasks. Every 4 processing cores form one processing cluster, i.e., 4 processing clusters in total. Within each processing cluster are multiple storage unit cores. A storage unit core mainly handles data exchange between the shared storage unit and the processing cores within the cluster, and data exchange between clusters. When the storage unit core and the processing cores access the DDR at the same time, only one group of buses is guaranteed to access the DDR after arbitration by the multiplexer.
FIG. 2b shows a block diagram of an artificial intelligence processor to which the method of the present disclosure can be applied.
The DDR of the AI chip adopts a Non-Uniform Memory Access (NUMA) architecture, and each processing cluster can Access different DDR channels through the NOC0, but has different delays for accessing different DDR channels. Each processing cluster corresponds to a DDR channel with the lowest access delay, and the access delay of other channels is relatively long. As shown in the structure diagram of the processing cluster and the DDR in fig. 2b, the processing cluster 0, the processing cluster 1, the processing cluster 2, and the processing cluster 3 have the lowest delay when accessing the corresponding DDR0, DDR1, DDR2, and DDR3, respectively. That is, each processing core accesses the DDR channel with the lowest access delay of the respective processing cluster.
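The per-cluster channel affinity just described can be illustrated with a minimal lookup. The latency numbers below are invented; only the local-versus-remote ordering matters:

```python
# Illustrative sketch of the NUMA affinity described above: each processing
# cluster has one lowest-latency DDR channel, and accessing any other
# channel costs more. Numbers are invented for illustration.

# processing cluster i -> its lowest-latency DDR channel
PREFERRED_CHANNEL = {0: "DDR0", 1: "DDR1", 2: "DDR2", 3: "DDR3"}

def access_latency(cluster, channel, local_ns=10, remote_ns=30):
    """Toy latency: low for the cluster's own channel, higher for the rest."""
    return local_ns if PREFERRED_CHANNEL[cluster] == channel else remote_ns

# processing cluster 0 reaches DDR0 faster than any other channel,
# matching the arrangement shown in FIG. 2b
assert access_latency(0, "DDR0") < access_latency(0, "DDR3")
```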
Because the access bandwidth inside the processing cluster is higher than the access bandwidth between the processing core and the DDR, the AI chip can internally access the shared memory unit by adopting the processing cluster so as to reduce the direct access of the processing core to the DDR, thereby improving the data throughput.
When 4-core parallel computing is required, the storage unit core may broadcast data from the shared storage unit to 4 processing cores within the processing cluster simultaneously for data computation by way of data broadcast (via NOC 1). Compared with a mode that all processing cores read data through DDR, under the condition, the memory access delay can be reduced, and the computing performance is optimized.
If virtualization is performed in a conventional manner, all virtual machines share all four processing clusters, and when there are few tasks, some processing clusters will be vacant, thereby causing waste of resources.
Having described the environment in which the technical solutions of the present disclosure are applied, various embodiments of the present disclosure will be described in detail below. An embodiment of the present disclosure is described below with reference to FIGS. 3 and 4.
Fig. 3 illustrates a virtualization method based on a multi-core processor, such as an AI processor, wherein the multi-core processor includes a plurality of processing cores, according to a first aspect of the present disclosure, the method including: in operation S310, dividing the multi-core processor into a plurality of virtual functions, each of the virtual functions corresponding to one or more processing cores; and corresponding the virtual function to the virtual machine in operation S320.
FIG. 4 illustrates a virtualization system according to one embodiment of the present disclosure, the virtualization system comprising: a multi-core processor comprising a plurality of processing cores; a plurality of virtual functions VF0-VF3, each of the virtual functions corresponding to one or more processing cores; and virtual machines (virtual machine 0-virtual machine 3) corresponding to the virtual functions.
The above method and system can be implemented by SR-IOV (Single Root I/O Virtualization) technology. SR-IOV is a hardware-based virtualization solution that provides high performance and scalability. SR-IOV defines a standardized mechanism that enables multiple virtual machines to share one I/O device. Because PCIe (Peripheral Component Interconnect Express) devices are efficiently shared between virtual machines, I/O performance close to that of a native machine can be obtained.
SR-IOV is divided into the following two functional types:
A PF (Physical Function) is a PCI function that supports the SR-IOV capability, as defined in the SR-IOV specification. The PF contains the SR-IOV capability structure used to manage the SR-IOV functionality. The PF is a full-function PCIe function that can be discovered, managed, and processed like any other PCIe device. The PF has full configuration resources that can be used to configure or control the PCIe device.
A VF (Virtual Function) is a function associated with a PF. The VF is a lightweight PCIe function that shares physical resources with the PF and with the other VFs of the same PCIe device. The VF owns only the configuration resources for its own behavior.
Each SR-IOV device may have a PF, and each PF may have multiple VFs associated with it. Each VF may have a PCI memory space to map its register set. The VF device driver operates on that register set to enable the VF's functionality, and the VF appears as an actual PCI device. After a VF is created, it can be assigned directly to a guest virtual machine (VM). The VFs thus share the same physical device and perform data input and output without CPU and hypervisor software overhead.
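The PF/VF relationship described above can be sketched with a toy object model. This is not a real driver API; all class and attribute names here are invented for illustration:

```python
# Toy model of SR-IOV: one PF per device, several VFs associated with it.
# Each VF holds only its own lightweight configuration while sharing the
# PF's underlying physical device.

class PhysicalFunction:
    """Full-function PCIe function; owns the full configuration resources."""
    def __init__(self, device, total_vfs):
        self.device = device        # the one shared physical device
        self.total_vfs = total_vfs
        self.vfs = []

    def create_vf(self):
        if len(self.vfs) >= self.total_vfs:
            raise RuntimeError("no VF slots left on this PF")
        vf = VirtualFunction(pf=self, index=len(self.vfs))
        self.vfs.append(vf)
        return vf                   # can now be assigned to a guest VM

class VirtualFunction:
    """Lightweight function: owns only the config for its own behavior."""
    def __init__(self, pf, index):
        self.pf = pf
        self.index = index
        self.config = {}

pf = PhysicalFunction(device="multi-core AI chip", total_vfs=4)
vf0, vf1 = pf.create_vf(), pf.create_vf()
assert vf0.pf.device == vf1.pf.device   # VFs share the same physical device
```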
It should be understood that the same physical device as described above refers to different hardware resources on the same physical device. For example, the physical device may be a multi-core processor, but the hardware resources may be different processing cores on the physical device.
It follows that there may be a single virtual function or multiple virtual functions. When there is a single virtual function, all the processing cores in the multi-core processor are divided into that one virtual function; when there are multiple virtual functions, the corresponding virtual machines can run independently. Running independently means that each virtual machine is isolated from the others, operating without depending on or being affected by other virtual machines, and, because the isolation of the present disclosure is hardware-based, the virtual machines interfere little with each other. Furthermore, running independently allows each virtual machine to employ a different operating system without affecting the others.
A virtual function, obtained by logically partitioning the multi-core processor, can perform the same work as the multi-core processor itself. A virtual function may include one or more processing cores; the more processing cores it has, the more powerful it is. It is also possible to divide all processing cores into one virtual function.
As shown in fig. 3 and 4, the virtual function may correspond to a virtual machine, e.g., virtual function VF0 corresponds to virtual machine 0, virtual function VF1 corresponds to virtual machine 1, virtual function VF2 corresponds to virtual machine 2, and virtual function VF3 corresponds to virtual machine 3. It should be understood that this correspondence is merely an example, and that other correspondences may be employed in the present disclosure to further facilitate system deployment. This will be described in more detail later. Further, although 4 virtual functions and 4 virtual machines are depicted in FIG. 4, other numbers of fewer or more are possible.
In the present disclosure, the virtual machines can operate independently without interfering with each other. Compared with the virtualization scheme adopting the time slicing technology in the prior art, the technical scheme of the disclosure adopts the virtual machines which run independently, so that the problem of head of line blocking does not exist between the virtual machines, the virtual machines are not influenced by adjacent noise, and the context switching overhead is avoided.
As shown in fig. 2a and 2b, in a multi-core processor, a certain number of processing cores constitute one processing cluster, and thus each virtual function may correspond to one or more processing clusters.
FIG. 5 illustrates a schematic diagram of virtual functions corresponding to a processing cluster, according to one embodiment of the present disclosure. It should be appreciated that although fig. 5 depicts four processing clusters (processing cluster 0-processing cluster 3) as an example, any other number of processing clusters is possible.
In example 1 shown in fig. 5, processing cluster 0, processing cluster 1, processing cluster 2, and processing cluster 3 correspond to virtual function 0, i.e., the multi-core processor is divided into one virtual function.
In example 2 shown in fig. 5, processing cluster 0, processing cluster 1, and processing cluster 2 correspond to virtual function 0, and processing cluster 3 corresponds to virtual function 1, that is, the multi-core processor is divided into two virtual functions, and virtual function 0 has a stronger processing capability than virtual function 1.
In example 3 shown in fig. 5, processing cluster 0 and processing cluster 1 correspond to virtual function 0, and processing cluster 2 and processing cluster 3 correspond to virtual function 1, that is, the multi-core processor is divided into two virtual functions, and virtual function 0 and virtual function 1 have equal processing capabilities.
In example 4 shown in fig. 5, processing cluster 0 and processing cluster 1 correspond to virtual function 0, processing cluster 2 corresponds to virtual function 1, and processing cluster 3 corresponds to virtual function 2, that is, the multi-core processor is divided into three virtual functions, virtual function 0 has a stronger processing power than virtual function 1 and virtual function 2, and virtual function 1 and virtual function 2 have an equivalent processing power.
In example 5 shown in fig. 5, processing cluster 0 corresponds to virtual function 0, processing cluster 1 corresponds to virtual function 1, processing cluster 2 corresponds to virtual function 2, and processing cluster 3 corresponds to virtual function 3, which have equal processing capabilities.
In example 6 shown in fig. 5, processing cluster 0 corresponds to virtual function 0, and processing cluster 1, processing cluster 2, and processing cluster 3 correspond to virtual function 1, with virtual function 0 having a weaker processing capability relative to virtual function 1. This example is equivalent to example 2.
In example 7 shown in fig. 5, processing cluster 0 corresponds to virtual function 0, processing cluster 1 corresponds to virtual function 1, processing cluster 2 corresponds to virtual function 0, and processing cluster 3 corresponds to virtual function 1, and virtual functions 0 and 1 have equivalent processing capabilities. This example is equivalent to example 3.
In example 8 shown in fig. 5, processing cluster 0 corresponds to virtual function 0, processing cluster 1 corresponds to virtual function 1, processing cluster 2 corresponds to virtual function 0, and processing cluster 3 corresponds to virtual function 2. Virtual function 0 has a higher processing power than virtual function 1 and virtual function 2, and virtual function 1 and virtual function 2 have the same processing power. This example is equivalent to example 4.
Therefore, by mapping different processing clusters to different virtual functions, flexible configuration of the virtual functions can be achieved, and the processing capability of each virtual function can be dynamically configured according to different requirements. Thus, relative to the prior art, the technical solution of the present disclosure also has the advantage of simple and flexible configuration.
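The eight examples above are simply mappings from the four processing clusters to virtual-function IDs. A small sketch (labels invented for illustration) shows how each mapping determines the relative processing power of the virtual functions:

```python
# Count how many processing clusters each virtual function owns; more
# clusters means stronger processing capability, as in examples 1-8 above.

from collections import Counter

def vf_capacity(mapping):
    """mapping[i] is the virtual-function ID that owns processing cluster i."""
    return Counter(mapping)

example2 = [0, 0, 0, 1]   # clusters 0-2 -> VF0, cluster 3 -> VF1
example5 = [0, 1, 2, 3]   # one cluster per virtual function
example8 = [0, 1, 0, 2]   # non-adjacent clusters may serve the same VF

assert vf_capacity(example2)[0] > vf_capacity(example2)[1]   # VF0 is stronger
assert set(vf_capacity(example5).values()) == {1}            # all equal
assert vf_capacity(example8)[0] == 2                         # VF0 owns 2 clusters
```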
According to yet another disclosed embodiment, each virtual function has independent hardware resources.
The hardware resources described herein may be processing cores, as well as memory (e.g., DDR), buses, encoders/decoders, video/audio drivers, interface units, and so forth. For example, the resources of a PCIe board include an AI computation unit (IPU), a video codec unit (VPU), a JPEG codec unit (JPU), and memory. The present disclosure does not impose any limitation on the types of hardware resources.
FIGS. 6a, 6b and 6c exemplarily show the resource occupation of the PCIe card when the multi-core processor is divided into 1, 2 and 4 virtual functions. The multi-core processor may be a computing device having a plurality of computing cores together with units such as the VPU and JPU.
As shown in FIG. 6a, when there is 1 virtual function, the virtual function VF0 exclusively occupies all resources, i.e., all compute cores, all channels, all VPUs, and all JPUs.
As shown in FIG. 6b, when the number of virtual functions is 2, the virtual functions VF0 and VF1 each use half of the resources, i.e., VF0 occupies half of the computation cores and VF1 occupies the other half. Assuming there are four DDR channels, VF0 may occupy channel 0 and channel 1, and VF1 may occupy channel 2 and channel 3. Similarly, with four VPUs and four JPUs, VF0 may occupy VPU0 and VPU1 while VF1 occupies VPU2 and VPU3, and VF0 may occupy JPU0 and JPU1 while VF1 occupies JPU2 and JPU3.
As shown in FIG. 6c, when there are 4 virtual functions, the virtual functions VF0-VF3 each occupy one quarter of the compute cores. Similarly, if there are four DDR channels, four VPUs and four JPUs, the virtual functions VF0-VF3 may occupy channels 0-3, VPU0-VPU3, and JPU0-JPU3, respectively.
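The even split shown in FIGS. 6a-6c can be sketched as follows. The resource counts are assumed (16 compute cores, 4 DDR channels, 4 VPUs, 4 JPUs) and the names are invented for illustration:

```python
# Partition every resource list into contiguous, equal shares, one share
# per virtual function, mirroring the 1/2/4-way splits of FIGS. 6a-6c.

def partition(resources, num_vfs):
    """Split each resource list into num_vfs contiguous, equal shares."""
    shares = {f"VF{i}": {} for i in range(num_vfs)}
    for name, items in resources.items():
        per_vf = len(items) // num_vfs
        for i in range(num_vfs):
            shares[f"VF{i}"][name] = items[i * per_vf:(i + 1) * per_vf]
    return shares

resources = {
    "cores":    [f"core{i}" for i in range(16)],
    "channels": ["DDR0", "DDR1", "DDR2", "DDR3"],
    "vpus":     ["VPU0", "VPU1", "VPU2", "VPU3"],
    "jpus":     ["JPU0", "JPU1", "JPU2", "JPU3"],
}

two = partition(resources, 2)
assert two["VF0"]["channels"] == ["DDR0", "DDR1"]   # as in FIG. 6b
assert two["VF1"]["vpus"] == ["VPU2", "VPU3"]

four = partition(resources, 4)
assert len(four["VF3"]["cores"]) == 4               # each VF gets 1/4 of cores
```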
FIG. 7 shows a schematic block diagram of a virtualization system according to yet another embodiment of the present disclosure.
As shown in fig. 7, according to another embodiment of the present disclosure, the virtualization system of the present disclosure further includes: a plurality of drivers, the plurality of virtual functions driven by different drivers.
According to one embodiment of the present disclosure, a corresponding node is established for each virtual function through its driver; that is, each client includes a driver and a directory, so each client loads its own driver and creates a node, i.e., a character device, under the client's directory.
Fig. 8 schematically shows a structure of the virtualization system. In the system of fig. 8, a virtual machine is employed.
As shown in FIG. 8, the framework 800 includes a user space 802, a kernel space 804, and an on-chip system 806, separated by dashed lines. The user space 802 is the operating space of user programs; it performs only simple operations, cannot directly call system resources, and can issue instructions to the kernel only through the system interface. The kernel space 804 is the space where kernel code runs; it can execute any command and call all resources of the system. The system-on-chip 806 is a module of the artificial intelligence chip that cooperates with the user space 802 through the kernel space 804.
Unless otherwise emphasized, this embodiment is illustrated with one component virtualized into four virtual components, but the present disclosure does not limit the number of virtual components.
Before virtualization runs, the user space 802 is controlled by the hardware monitor tool 808, which obtains information from the system-on-chip 806 by invoking an interface. The hardware monitor tool 808 can not only collect information about the system-on-chip 806 but also obtain, in real time, the overhead of upper-layer software on the resources of the system-on-chip 806, giving the user real-time detailed information and status of the current system-on-chip 806, such as: the hardware device model, firmware version number, driver version number, device utilization, storage device overhead state, board power consumption and board peak power consumption, peripheral component interconnect express (PCIe) information, and the like. The content and quantity of monitored information may vary depending on the version and usage scenario of the hardware monitor tool 808.
After the system starts virtualization, the operation of the user space 802 is instead taken over by the user virtual machine 810, the user virtual machine 810 is an abstraction and simulation of the real computing environment, and the system allocates a set of data structures to manage the state of the user virtual machine 810, where the data structures include the use of a full set of registers, physical memory, the state of virtual devices, and so on. The physical space of the user space 802 in this embodiment is virtualized into four virtual spaces 812, 814, 816, 818, the four virtual spaces 812, 814, 816, 818 are independent and do not affect each other, and different guest operating systems, such as guest operating system 1, guest operating system 2, guest operating system 3, and guest operating system 4 shown in the figure, may be respectively loaded on the guest operating systems, which may be Windows, Linux, Unix, iOS, android, and the like, and each guest operating system runs different application programs.
In this embodiment, the user virtual machine 810 is implemented with a Quick Emulator (QEMU). QEMU is open-source virtualization software written in the C language; it virtualizes the interface through dynamic binary translation and provides a series of hardware models that allow guest operating system 1, guest operating system 2, guest operating system 3, and guest operating system 4 to believe that they access the system-on-chip 806 directly. The user space 802 includes processors, memory, I/O devices, and the like; QEMU may virtualize the processors of the user space 802 into four virtual processors, the memory into four virtual memories, and the I/O devices into four virtual I/O devices. Each guest operating system occupies a portion of the resources of the user space 802, for example one quarter, that is, each has access to one virtual processor, one virtual memory, and one virtual I/O device to perform its tasks. In this mode, guest operating systems 1, 2, 3 and 4 can operate independently.
The kernel space 804 carries a kernel virtual machine 820 and a chip driver 822. The kernel virtual machine 820, in conjunction with QEMU, is primarily responsible for virtualizing the kernel space 804 and the system-on-chip 806, so that each guest operating system obtains its own address space when accessing the system-on-chip 806. In more detail, the space on the system-on-chip 806 that is mapped to a guest operating system is actually the virtual component mapped to that guest operating system's process.
From the perspective of the user virtual machine 810, during the running of the virtual machine, QEMU performs kernel setting through a system call interface provided by the kernel virtual machine 820, and uses the virtualization function of the kernel virtual machine 820 to provide hardware virtualization acceleration for its virtual machines, thereby improving virtual machine performance. From the perspective of the kernel virtual machine 820, since a user cannot interact with the kernel space 804 directly, a management tool in the user space 802 is required, and QEMU serves as that tool operating in the user space 802.
The chip driver 822 is used to drive the physical function 826. During the running of the virtual machine, the user space 802 no longer accesses the system-on-chip 806 from the hardware monitor tool 808 through the chip driver 822; instead, guest operating system 1, guest operating system 2, guest operating system 3, and guest operating system 4 are each configured with a kernel space 824 for loading the chip driver 822, so that each guest operating system can still drive the system-on-chip 806.
The system-on-chip 806 performs virtualization through the SR-IOV technique; in more detail, the SR-IOV technique may virtualize each component of the system-on-chip 806, so that each virtual component has its own corresponding, uniquely accessible resources.
The system-on-chip 806 of this embodiment includes hardware and firmware. The hardware includes a read-only memory ROM (not shown) that stores the firmware, and the firmware includes the physical function 826, which supports or cooperates with the PCIe functionality of SR-IOV; the physical function 826 has the authority to fully configure the PCIe resources. In implementing the SR-IOV technique, the physical function 826 virtualizes a plurality of virtual functions 828, in this embodiment four virtual functions 828. A virtual function 828 is a lightweight PCIe function managed by the physical function 826; it may share PCIe physical resources with the physical function 826 and with other virtual functions 828 associated with the same physical function 826. A virtual function 828 is only allowed to control the resources that the physical function 826 configures for it.
Once SR-IOV is enabled in the physical function 826, each virtual function 828 can access its own PCIe configuration space through its bus, device, and function number. Each virtual function 828 has a memory space for mapping its register set. The virtual function 828 driver operates on the register set to enable its functionality, and the virtual function is directly assigned to the corresponding user virtual machine 810. Although virtual, the virtual function appears to the user virtual machine 810 to be an actually present PCIe device.
The hardware of the system-on-chip 806 also includes a computing device 830, a video codec device 832, a JPEG codec device 834, a storage device 836, and PCIe 838. In this embodiment, the computing device 830 is an intelligent processing unit (IPU) used to perform convolution calculations of a neural network; the video codec device 832 is used for encoding and decoding video data; the JPEG codec device 834 is configured to encode and decode still pictures using the JPEG algorithm; and the storage device 836 may be a dynamic random access memory (DRAM) for storing data. PCIe 838 is the aforementioned PCIe; during the operation of the virtual machine, PCIe 838 is virtualized into four virtual interfaces 840, and the virtual functions 828 and the virtual interfaces 840 are in a one-to-one correspondence, that is, the first virtual function corresponds to the first virtual interface, the second virtual function corresponds to the second virtual interface, and so on.
With SR-IOV technology, the computing device 830 is virtualized into four virtual computing devices 842, the video codec device 832 is virtualized into four virtual video codec devices 844, the JPEG codec device 834 is virtualized into four virtual JPEG codec devices 846, and the storage device 836 is virtualized into four virtual storage devices 848.
Each guest operating system is configured with a set of virtual suites, each set comprising a user virtual machine 810, a virtual interface 840, a virtual function 828, a virtual computing device 842, a virtual video codec device 844, a virtual JPEG codec device 846, and a virtual storage device 848. Each set of virtual suites runs independently, without interference, and is used to perform the tasks delivered by the corresponding guest operating system, so that each guest operating system can access its configured virtual computing device 842, virtual video codec device 844, virtual JPEG codec device 846, and virtual storage device 848 via its configured virtual interface 840 and virtual function 828.
In more detail, different tasks may require access to different hardware, so each guest operating system accesses different virtual devices depending on the task it executes. For example: if a task performs an operation, such as a matrix convolution operation, the guest operating system accesses the configured virtual computing device 842 through the configured virtual interface 840 and virtual function 828; if a task performs video encoding or decoding, the guest operating system accesses the configured virtual video codec device 844 through the configured virtual interface 840 and virtual function 828; if a task performs JPEG encoding or decoding, the guest operating system accesses the configured virtual JPEG codec device 846 through the configured virtual interface 840 and virtual function 828; and if a task reads or writes data, the guest operating system accesses the configured virtual storage device 848 through the configured virtual interface 840 and virtual function 828.
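The task-to-device routing just described can be sketched as a small dispatch table. All class, device, and function names below are illustrative assumptions for this sketch, not an interface defined by the present disclosure:

```python
# Sketch: a guest OS routes each task type to the matching virtual device,
# always going through its configured virtual interface and virtual function.
from dataclasses import dataclass

# Map each task type to the kind of virtual device that serves it.
TASK_ROUTES = {
    "convolution": "virtual_compute_device",
    "video_codec": "virtual_video_codec_device",
    "jpeg_codec": "virtual_jpeg_codec_device",
    "data_io": "virtual_storage_device",
}

@dataclass
class VirtualSuite:
    """One guest OS's set of virtual devices, reached via its VF/interface."""
    virtual_interface: int
    virtual_function: int
    devices: dict  # device kind -> device id

    def dispatch(self, task_type: str) -> str:
        kind = TASK_ROUTES[task_type]
        # Access always goes through the configured interface and function.
        return (f"VF{self.virtual_function}/IF{self.virtual_interface}"
                f" -> {kind}{self.devices[kind]}")

# Guest OS 1's suite: interface 0, function 0, device set 0.
suite0 = VirtualSuite(0, 0, {k: 0 for k in TASK_ROUTES.values()})
print(suite0.dispatch("convolution"))  # VF0/IF0 -> virtual_compute_device0
```

Each guest operating system would hold its own `VirtualSuite`, so suites never touch one another's devices.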
In the above, a virtualization method based on a multi-core processor that adopts a virtual machine approach has been described. Hereinafter, a Docker-container approach is described.
As shown in fig. 12, the present disclosure also provides a virtualization method based on a multi-core processor, wherein the multi-core processor includes a plurality of processing cores, the method including: in operation S1210, dividing the multi-core processor into a plurality of virtual functions, the plurality of virtual functions sharing the plurality of processing cores; and in operation S1220, mapping the virtual functions to containers.
FIG. 13 illustrates a virtualization system according to one embodiment of the present disclosure, the virtualization system comprising: a multi-core processor comprising a plurality of processing cores; a plurality of virtual functions sharing the plurality of processing cores; and a container, the container corresponding to the virtual function.
As shown in fig. 12 and 13, the virtual function may correspond to a container, e.g., virtual function VF0 corresponds to container 0, virtual function VF1 corresponds to container 1, virtual function VF2 corresponds to container 2, and virtual function VF3 corresponds to container 3. It is to be understood that this correspondence is merely an example, and that other correspondences may be employed in the present disclosure, thereby facilitating system deployment. This will be described in more detail later. Further, while 4 virtual functions and 4 containers are depicted in FIG. 13, other numbers, fewer or greater, are possible.
In the present disclosure, the containers hold the hardware and software resources required to perform tasks (e.g., task 0 to task 3), and the containers can run independently without interfering with each other. Compared with prior-art virtualization schemes that adopt a time-slicing technique, the technical scheme of the present disclosure adopts independently operating containers, so there is no head-of-line blocking between containers, the containers are not affected by noisy neighbors, and context-switching overhead is avoided.
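Operations S1210 and S1220 can be sketched as follows; the core count, VF count, and all names are illustrative assumptions only:

```python
# Sketch of S1210/S1220: divide a multi-core processor into virtual
# functions that share the processing cores, then map each VF to a container.
NUM_CORES = 16
NUM_VFS = 4

# S1210: divide the processor into VFs; all VFs share the same core pool.
virtual_functions = {f"VF{i}": list(range(NUM_CORES)) for i in range(NUM_VFS)}

# S1220: a one-to-one correspondence between VFs and containers,
# matching the example mapping VF0->container0, ..., VF3->container3.
containers = {f"container{i}": f"VF{i}" for i in range(NUM_VFS)}

print(containers["container2"])  # VF2
```

As noted above, this one-to-one mapping is only one possibility; a container could equally be given several VFs.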
As shown in figs. 2a and 2b, in a multi-core processor, a certain number of processing cores constitute one processing cluster, and thus a plurality of virtual functions may share one or more processing clusters.
FIG. 5 illustrates a schematic diagram of virtual functions corresponding to a processing cluster, according to one embodiment of the present disclosure. It should be appreciated that although fig. 5 depicts four processing clusters (processing cluster 0-processing cluster 3) as an example, any other number of processing clusters is possible.
In example 1 shown in fig. 5, processing cluster 0, processing cluster 1, processing cluster 2, and processing cluster 3 correspond to virtual function 0, i.e., the multi-core processor is divided into one virtual function.
In example 2 shown in fig. 5, processing cluster 0, processing cluster 1, and processing cluster 2 correspond to virtual function 0, and processing cluster 3 corresponds to virtual function 1, that is, the multi-core processor is divided into two virtual functions, and virtual function 0 has a stronger processing capability than virtual function 1.
In example 3 shown in fig. 5, processing cluster 0 and processing cluster 1 correspond to virtual function 0, and processing cluster 2 and processing cluster 3 correspond to virtual function 1, that is, the multi-core processor is divided into two virtual functions, and virtual function 0 and virtual function 1 have equal processing capabilities.
In example 4 shown in fig. 5, processing cluster 0 and processing cluster 1 correspond to virtual function 0, processing cluster 2 corresponds to virtual function 1, and processing cluster 3 corresponds to virtual function 2, that is, the multi-core processor is divided into three virtual functions, virtual function 0 has a stronger processing power than virtual function 1 and virtual function 2, and virtual function 1 and virtual function 2 have an equivalent processing power.
In example 5 shown in fig. 5, processing cluster 0 corresponds to virtual function 0, processing cluster 1 corresponds to virtual function 1, processing cluster 2 corresponds to virtual function 2, and processing cluster 3 corresponds to virtual function 3, which have equal processing capabilities.
In example 6 shown in fig. 5, processing cluster 0 corresponds to virtual function 0, and processing cluster 1, processing cluster 2, and processing cluster 3 correspond to virtual function 1, with virtual function 0 having a weaker processing capability relative to virtual function 1. This example is equivalent to example 2.
In example 7 shown in fig. 5, processing cluster 0 corresponds to virtual function 0, processing cluster 1 corresponds to virtual function 1, processing cluster 2 corresponds to virtual function 0, and processing cluster 3 corresponds to virtual function 1; virtual function 0 and virtual function 1 have equivalent processing capabilities. This example is equivalent to example 3.
In example 8 shown in fig. 5, processing cluster 0 corresponds to virtual function 0, processing cluster 1 corresponds to virtual function 1, processing cluster 2 corresponds to virtual function 0, and processing cluster 3 corresponds to virtual function 2. Virtual function 0 has a higher processing capability than virtual function 1 and virtual function 2, and virtual function 1 and virtual function 2 have the same processing capability. This example is equivalent to example 4.
Therefore, by having a plurality of virtual functions share one or more processing clusters, flexible configuration of the virtual functions can be realized, and the processing capability of each virtual function can be dynamically configured according to different requirements. The technical solution of the present disclosure thus also has the advantage of simple and flexible configuration relative to the prior art.
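The cluster-to-VF examples above can be modeled with a small sketch in which a virtual function's processing capability is taken to be proportional to the number of clusters assigned to it (an assumption made here purely for illustration):

```python
# Sketch: a configuration assigns each processing cluster to a VF; a VF's
# capability is counted as the number of clusters it owns.
from collections import Counter

def vf_capability(assignment):
    """assignment[i] names the VF that owns processing cluster i."""
    return dict(Counter(assignment))

example2 = ["VF0", "VF0", "VF0", "VF1"]  # 3:1 split (VF0 stronger)
example3 = ["VF0", "VF0", "VF1", "VF1"]  # 2:2 split (equal capability)
example7 = ["VF0", "VF1", "VF0", "VF1"]  # interleaved, still a 2:2 split

print(vf_capability(example2))
# Examples 3 and 7 are equivalent: each VF ends up with the same capability.
assert vf_capability(example3) == vf_capability(example7)
```

This also makes the equivalences stated in the text (example 7 vs. example 3, example 8 vs. example 4) mechanical to check: only the per-VF cluster counts matter, not which particular clusters are assigned.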
In the present disclosure, the plurality of virtual functions may share hardware resources, which may be processing cores, or may be memory (e.g., DDR), buses, encoders/decoders, video/audio drivers, interface units, and the like. For example, the resources of a PCIe board include an AI computing unit (IPU), a video codec unit (VPU), a JPEG codec unit (JPU), and memory. The present disclosure does not impose any limitation on the types of hardware resources.
Figs. 6a, 6b and 6c exemplarily show the resource occupation of the PCIe card when the multi-core processor is divided into 1, 2 and 4 virtual functions respectively. The multi-core processor may be a computing device having a plurality of computing cores, and the board also carries devices such as the VPU and JPU.
As shown in FIG. 6a, when there is 1 virtual function, the virtual function VF0 monopolizes all resources, i.e., it occupies all computing cores, all channels, all VPUs, and all JPUs.
As shown in fig. 6b, when the number of virtual functions is 2, the virtual functions VF0 and VF1 each use half of the resources, i.e., VF0 occupies half of the computing cores and VF1 occupies the other half. Assuming there are four DDR channels, VF0 may occupy channel 0 and channel 1, and VF1 may occupy channel 2 and channel 3. Similarly, with four VPUs and four JPUs, VF0 may occupy VPU0 and VPU1 while VF1 occupies VPU2 and VPU3; VF0 may occupy JPU0 and JPU1 while VF1 occupies JPU2 and JPU3.
As shown in FIG. 6c, when there are 4 virtual functions, the virtual functions VF0-VF3 each occupy one quarter of the computing cores. Similarly, if there are four DDR channels, four VPUs, and four JPUs, the virtual functions VF0-VF3 may occupy channels 0-3, VPU0-VPU3, and JPU0-JPU3, respectively.
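The even splits of figs. 6a-6c can be sketched generically; the resource names below and the counts of four channels, VPUs, and JPUs are those assumed in the text:

```python
# Sketch: with N VFs, each VF receives 1/N of a board resource
# (computing cores, DDR channels, VPUs, JPUs alike).
def partition(resources, num_vfs):
    """Split a list of resource ids evenly among num_vfs virtual functions."""
    per_vf = len(resources) // num_vfs
    return {f"VF{i}": resources[i * per_vf:(i + 1) * per_vf]
            for i in range(num_vfs)}

channels = ["ch0", "ch1", "ch2", "ch3"]
print(partition(channels, 1))  # VF0 monopolizes all channels (fig. 6a)
print(partition(channels, 2))  # VF0: ch0, ch1; VF1: ch2, ch3 (fig. 6b)
print(partition(channels, 4))  # one channel per VF (fig. 6c)
```

The same `partition` call would apply unchanged to the VPU, JPU, and core lists.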
FIG. 14 shows a schematic block diagram of a virtualization system according to yet another embodiment of the present disclosure.
As shown in fig. 14, according to another embodiment of the present disclosure, the virtualization system of the present disclosure further includes: a common driver by which the plurality of virtual functions are driven.
The driver may be common to all virtual functions and may be a program installed in the operating system. The driver may, for example, establish a corresponding node for each virtual function VF; each node may be a file stored in a certain directory (e.g., the dev directory) for other applications to run or call. The name of the file may vary from vendor to vendor.
After the nodes are created, one or more of the nodes may be included or mapped into a corresponding container. Each container may contain one or more nodes, which means that each container may correspond to, or contain, one or more virtual functions. In the present disclosure, each container may contain a different number of nodes, whereby the configuration of the containers is more flexible and deployment is more convenient. In addition, since the computing power of each virtual function may differ, the system can be designed very flexibly according to requirements.
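The node-creation and container-mapping flow can be sketched as follows; the device-node path `/dev/vdev` and all other names are hypothetical, since, as noted above, the actual file name varies from vendor to vendor:

```python
# Sketch: the common host driver creates one node per VF under a device
# directory, and each container is then given one or more of those nodes.
def create_nodes(num_vfs, prefix="/dev/vdev"):
    """The common driver creates one device node per virtual function."""
    return [f"{prefix}{i}" for i in range(num_vfs)]

nodes = create_nodes(4)

# Containers may hold different numbers of nodes (flexible deployment):
containers = {
    "container0": nodes[0:1],  # one VF
    "container1": nodes[1:3],  # two VFs -> more computing power
    "container2": nodes[3:4],  # one VF
}
print(containers["container1"])  # ['/dev/vdev1', '/dev/vdev2']
```

Because only the host loads the driver, adding a container is just a matter of handing it a subset of the existing nodes; no driver installation inside the container is needed.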
After the containers are established, the method of the present disclosure may further comprise establishing a one-to-one image for each of the containers, the image being capable of communicating with its container. The image may be established through the Docker-container technique.
The image can be installed remotely at a user end, and a user can run or call the container through the image, so as to call the multi-core processor and the other related resources.
Fig. 11a and 11b show comparative schematic diagrams of a virtual machine mode and a Docker-container mode.
In fig. 11a, in the virtual machine mode, the host passes through (Pass through) the PCIe devices to the guest. The guest includes a driver and a directory, so each guest needs to load the driver itself and create a node, i.e., a character device, under the guest's directory.
In FIG. 11b, in the Docker mode, both the driver and the directory are in the host, so only the host needs to load the driver, which is common to all virtual functions. The driver of the host creates a node, i.e., a character device, under the host directory and then passes the device into the image, i.e., inside the Docker container. Thus, compared with the virtual machine mode, the Docker-container mode of the present disclosure does not require each guest to install and load a driver, thereby simplifying the setup of the system and facilitating use.
Similar to the usage scenarios of a hardware virtual machine, the lightweight Docker-based virtualization solution not only uses the whole card as the granularity of sharing, but also allows a plurality of finer-grained containers to share one or more physical accelerator cards. Within each Docker container, one or more VFs may be used. The VFs of different containers work independently and are safely isolated from each other.
A hardware virtualization scheme using SR-IOV may simultaneously support Docker usage patterns and enable the simultaneous generation of multiple VFs on a physical machine. The system administrator may assign different VFs to different containers as needed. The VFs belonging to different containers work independently without interference, and the robustness and safety isolation between VFs are equal to those between PFs. Compared with the virtual machine mode, Docker starts faster, requires fewer resources, utilizes the system more fully, and is simpler to develop, test, and deploy.
The present disclosure also provides for a multi-core processor including a plurality of processing cores, wherein the multi-core processor is partitioned into a plurality of virtual functions, each of the virtual functions corresponding to one or more processing cores.
The present disclosure also discloses an electronic device comprising the virtualization system or the multi-core processor as described above. The electronic device may be a host; that is, the technical solution of the present disclosure is implemented in the host and communicates with an external image (Docker).
The SR-IOV function has good tenant isolation and application hot-migration characteristics, and can provide safe, high-quality AI computing resources for cloud service providers, thereby fully protecting users' investment in the AI field.
The scheme of the present disclosure addresses a key pain point of users, namely how to utilize AI computing resources efficiently.
The chips, devices, and electronic equipment adopting the scheme of the present disclosure support comprehensive AI inference deployment, including diversified artificial intelligence applications such as vision, speech, and natural language processing. The technical scheme of the present disclosure supports diversified deployment scenarios such as data centers, professional scenarios, and even desktops.
In these deployment scenarios, when performing cloud-oriented deployment, diversified artificial intelligence inference, and application development in cooperation with edge-side boards, how to utilize AI computing resources effectively is a primary concern of users, and is also a core appeal of the SR-IOV virtualization function of the present disclosure:
1) Cloud-oriented deployment: in a cloud deployment environment, a cloud service provider (CSP) provides computing, storage, and network resource services to massive numbers of tenants in a cost-effective and highly available manner, and on this basis can offer a service level of up to 99.99% availability. The basic appeal of an AI cloud service is achieved by efficiently sharing resources from the hypervisor and the underlying hardware while mutually isolating tenants and instances.
2) Complex artificial intelligence inference: when deploying an AI application, a user usually encounters scenarios with complex business logic and needs to construct an AI-assisted decision-making system by means of multiple network models. To guarantee quality of service within a server node, a one-machine-multiple-cards deployment is generally adopted. However, when both computing cost and service quality must be considered, a user may want to run multiple models in parallel on a single board.
3) Edge- and end-side application development: the scheme of the present disclosure can achieve comprehensive coverage in the three dimensions of cloud, edge, and end. In edge- and end-side application development, users are often limited by the CPU, the product form, or the network conditions of the deployment side, and cannot develop directly on the finally deployed equipment. Supporting an end-cloud integrated development environment that helps users quickly bring applications to production, and helping cloud-side computing resources to be allocated efficiently and reasonably to application development groups, is an object of the present disclosure.
The SR-IOV function provided by the present disclosure can make AI cloud services, service deployment, and application development more flexible, efficient, and secure.
The virtualization technology adopted by the present disclosure allows multiple operating systems and application programs to coexist on one physical computing platform, sharing the computing resources of the same chip. It provides good security and isolation for users and also supports highly flexible features such as hot migration. The virtualization technology also helps to improve cloud computing density and makes the IT asset management of data centers more flexible.
In addition to the basic resource-sharing feature of virtualization, the SR-IOV virtualization technology of the present disclosure allows multiple instances running on a cloud server to directly share the hardware resources of an intelligent chip. In a traditional virtualization system, a large amount of resources and time are consumed at the level of the hypervisor or VMM software, so the performance advantage of the PCIe device cannot be fully exploited. The value of SR-IOV lies in eliminating this software bottleneck and helping multiple virtual machines achieve efficient physical-resource sharing.
Unlike the virtualization technology used by the vGPU of a conventional graphics accelerator card, the scheme of the present disclosure uses a "non-time-slice-based sharing" approach. Because there is no performance loss caused by time-slice context switching, the independent service quality of each VF can be fully guaranteed, and the VFs can operate completely independently without affecting each other.
In addition, SR-IOV also avoids the performance overhead brought by time-division-multiplexed application switching. As shown in the figure above, when a virtual function runs in conjunction with a Docker container or a virtual machine (VM), the service performance of a single VF remains above 91% of the hardware performance. This allows the user to make accurate quality-of-service (QoS) predictions for each VF when multiple models run concurrently, without having to consider the performance overhead of blocking or switching among the models.
SR-IOV based virtual functions (e.g., vMLU) can also provide better tenant isolation. Virtualization technology is widely adopted by data centers not only because it provides the ability to share resources (giving better density), but also because virtualization provides better isolation and security relative to other technologies (e.g., Docker). The SR-IOV-based virtualization technology of the present disclosure helps cloud users achieve better isolation, with the following specific advantages:
First, the resources are independent and do not interfere with each other, so quality of service (QoS) can be guaranteed; second, there is no head-of-line blocking when multiple tasks run; third, each VF has independent memory resources, and the VFs are not visible to each other; finally, deployment is relatively simple and requires no modification to open-source software components.
The SR-IOV Flat technology of the present disclosure directed to Docker containers (e.g., as shown in FIGS. 12-14) can provide more efficient deployment. In addition to providing virtualization support for virtual machines (VMs), the disclosed technology also provides an SR-IOV based virtualization extension (SR-IOV Flat mode) for Docker containers, allowing multiple containers to share the computing power of a single board, together with a Kubernetes-based management plug-in. This functionality provides a lighter-weight deployment for data centers with less stringent isolation and security requirements.
Compared with the elastic GPU shared-pool technology (Elastic GPUs), the SR-IOV Flat technology adopted by the present disclosure has obvious advantages in isolation and QoS.
According to different application scenarios, the electronic device or apparatus may further include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or medical equipment. The vehicle includes an airplane, a ship, and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical equipment includes a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus, and/or an electrocardiograph.
The present disclosure also provides a computer readable storage medium having stored thereon computer program code for performing the method as described above when the computer program code is executed by a processor.
The present disclosure can achieve at least one of the following technical effects:
1. By adopting hardware isolation, security is greatly improved; even if one virtual function or container encounters a problem, the normal operation of the other parts is not affected.
2. The Quick Emulator (QEMU) does not need to be modified, thereby reducing the complexity of setting up the system.
3. Since each part is relatively independent, the delay is small, and the service quality (QoS) is high.
4. No head-of-line blocking.
5. No noisy-neighbor interference.
6. No context-switching overhead: unlike the virtualization technology adopted by the traditional vGPU, a non-time-slice-based sharing mode is adopted, thereby eliminating the performance overhead caused by context switching.
7. Easy to expand and deploy.
Fig. 9 illustrates a combined processing device 900 that includes a computing device 902 (e.g., the computing device 830 described in fig. 8, etc.), a universal interconnect interface 904, and other processing devices 906. The computing device according to the present disclosure interacts with the other processing devices to collectively perform operations specified by a user. Fig. 9 is a schematic view of the combined processing device.
The other processing devices include one or more of general-purpose/special-purpose processors such as central processing units (CPUs), graphics processing units (GPUs), neural network processors, and the like. The number of processors included in the other processing devices is not limited. The other processing devices serve as an interface between the machine learning computing device and external data and control; they perform data transfer and basic control of the machine learning computing device, such as starting and stopping it. The other processing devices may also cooperate with the machine learning computing device to complete computing tasks.
The universal interconnect interface is used for transferring data and control instructions between the computing device (including, for example, a machine learning computing device) and the other processing devices. The computing device acquires the required input data from the other processing devices and writes it into a storage device on the computing device chip; it can obtain control instructions from the other processing devices and write them into an on-chip control cache; and it can also read the data in its storage module and transmit the data to the other processing devices.
Optionally, the architecture may further comprise a storage device 908, which is connected to said computing device and said other processing device, respectively. The storage device is used for storing data in the computing device and the other processing devices, and is particularly suitable for storing all data which cannot be stored in the internal storage of the computing device or the other processing devices.
The combined processing device can serve as a system-on-chip (SoC) of equipment such as a mobile phone, a robot, an unmanned aerial vehicle, or video monitoring equipment, effectively reducing the core area of the control part, improving the processing speed, and reducing overall power consumption. In this case, the universal interconnect interface of the combined processing device is connected to certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface.
In some embodiments, the present disclosure also discloses a chip including the above-mentioned computing device or combined processing device.
In some embodiments, the disclosure also discloses a board card comprising the chip. Referring to fig. 10, an exemplary board card is provided that may include other kits in addition to the chip 1002, including but not limited to: a memory device 1004, an interface device 1006, and a control device 1008.
The memory device is connected with the chip in the chip packaging structure through a bus and used for storing data. The memory device may include multiple sets of memory cells 1010. Each group of the storage units is connected with the chip through a bus. It is understood that each group of the memory cells may be a DDR SDRAM (Double Data Rate SDRAM).
DDR can double the speed of SDRAM without increasing the clock frequency, because DDR allows data to be read on both the rising and falling edges of the clock pulse; DDR is thus twice as fast as standard SDRAM. In one embodiment, the storage device may include 4 groups of the storage units, and each group of the storage units may include a plurality of DDR4 chips. In one embodiment, the chip may internally include four 72-bit DDR4 controllers, in which 64 bits are used for data transmission and 8 bits are used for ECC checking. In one embodiment, each group of the storage units includes a plurality of double-data-rate synchronous dynamic random access memories arranged in parallel; DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is arranged in the chip and controls the data transmission and data storage of each storage unit.
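As a worked example of the figures above (64 data bits plus 8 ECC bits per 72-bit controller, two transfers per clock edge pair), peak data bandwidth can be estimated as follows; the 1600 MHz clock is an assumed value for illustration only, not a figure stated in this disclosure:

```python
# Worked example: four 72-bit DDR4 controllers, 64 data bits + 8 ECC bits
# each, with DDR transferring data on both the rising and falling clock edges.
DATA_BITS, ECC_BITS = 64, 8
assert DATA_BITS + ECC_BITS == 72  # one 72-bit DDR4 controller

def peak_bandwidth_gbps(clock_mhz, controllers=4, data_bits=DATA_BITS):
    """Peak data bandwidth in GB/s: two transfers per clock cycle (DDR)."""
    transfers_per_s = clock_mhz * 1e6 * 2       # rising + falling edge
    bytes_per_transfer = controllers * data_bits / 8
    return bytes_per_transfer * transfers_per_s / 1e9

# e.g. at an assumed 1600 MHz clock:
print(round(peak_bandwidth_gbps(1600), 1))  # 102.4 GB/s across 4 controllers
```

Note that only the 64 data bits per controller count toward usable bandwidth; the 8 ECC bits are checking overhead.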
The interface device is electrically connected to the chip in the chip packaging structure and is used for data transfer between the chip and an external device 1012, such as a server or a computer. For example, in one embodiment, the interface device may be a standard PCIe interface: data to be processed is transmitted from the server to the chip through the standard PCIe interface, thereby implementing the data transfer. In another embodiment, the interface device may be another interface; the present disclosure does not limit the specific form of such an interface, as long as the interface unit can implement the transfer function. In addition, the computation results of the chip are transmitted back to the external device (e.g., the server) by the interface device.
The control device is electrically connected to the chip and is used for monitoring the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). Because the chip may include a plurality of processing chips, processing cores, or processing circuits, it may carry multiple loads and therefore be in different working states, such as heavy load and light load. The control device can regulate the working states of the plurality of processing chips, processing cores, and/or processing circuits in the chip.
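The regulation performed by the control device can be sketched as follows; the thresholds, state names, and actions are illustrative assumptions, since the disclosure only distinguishes states such as heavy load and light load:

```python
# Hedged sketch of the control device's role: classify the chip's working
# state from per-core load and choose a regulation action. Thresholds and
# action strings are assumptions, not taken from the disclosure.
from statistics import mean

def working_state(core_loads):
    """core_loads: utilization per processing core, each in [0.0, 1.0]."""
    avg = mean(core_loads)
    if avg >= 0.75:
        return "heavy-load"
    if avg <= 0.25:
        return "light-load"
    return "normal"

# Hypothetical regulation policy applied by the MCU over SPI.
ACTIONS = {
    "heavy-load": "raise frequency / enable all processing circuits",
    "light-load": "lower frequency / gate idle processing circuits",
    "normal": "keep current settings",
}

state = working_state([0.9, 0.8, 0.7, 0.85])
print(state, "->", ACTIONS[state])
```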
It is noted that, while for simplicity of explanation the foregoing method embodiments are described as a series of acts or combinations of acts, those skilled in the art will appreciate that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are exemplary embodiments, and that the acts and modules referred to are not necessarily required by the disclosure.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the units is only one type of logical-function division, and other divisions are possible in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through interfaces, and may be in electrical, optical, acoustic, magnetic, or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may also be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
If the integrated units are implemented in the form of software program modules and sold or used as stand-alone products, they may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present disclosure. The aforementioned memory includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing detailed description of the disclosed embodiments uses specific examples to illustrate the principles and implementations of the present disclosure, and is presented solely to aid understanding of the methods and their core concepts. The disclosure should not be construed as limited to the particular embodiments set forth herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The technical solution of the present disclosure can be better understood by the following clauses:
clause 1. a method of multi-core processor-based virtualization, wherein the multi-core processor includes a plurality of processing cores, the method comprising:
dividing the multi-core processor into a plurality of virtual functions, each virtual function corresponding to one or more processing cores; and
mapping the virtual function to a virtual machine.
Clause 2. the method of clause 1, wherein there are a plurality of virtual machines, and the plurality of virtual machines can operate independently of one another.
Clause 3. the method of clause 1 or 2, wherein a particular number of processing cores form a processing cluster, each virtual function corresponding to one or more processing clusters.
Clause 4. the method of any one of clauses 1-3, wherein one virtual function is mapped to one virtual machine, or a plurality of virtual functions are mapped to one virtual machine.
Clause 5. the method of any of clauses 1-4, wherein each virtual function has independent hardware resources.
Clause 6. the method of any of clauses 1-5, wherein the plurality of virtual functions are driven by different drivers.
Clause 7. the method of clause 6, wherein a corresponding node is established for the respective virtual function by the driver.
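Clauses 1-7 above can be illustrated with a minimal model; the names `VirtualFunction` and `partition` and the 8-core configuration are illustrative assumptions, not terms from the disclosure:

```python
# Minimal sketch of clause 1: partition a multi-core processor into
# virtual functions (VFs), each VF owning one or more processing cores
# as independent hardware resources (clause 5), then map each VF to a
# virtual machine. All names here are hypothetical.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VirtualFunction:
    vf_id: int
    cores: List[int]           # processing cores owned exclusively by this VF
    vm: Optional[str] = None   # virtual machine this VF is mapped to

def partition(num_cores: int, cores_per_vf: int) -> List[VirtualFunction]:
    """Split the cores into disjoint VFs (independent hardware resources)."""
    assert num_cores % cores_per_vf == 0, "cores must divide evenly among VFs"
    return [
        VirtualFunction(i, list(range(i * cores_per_vf, (i + 1) * cores_per_vf)))
        for i in range(num_cores // cores_per_vf)
    ]

# 8 processing cores -> 4 virtual functions of 2 cores each, then one VF
# per virtual machine (clause 4 also allows several VFs per machine).
vfs = partition(num_cores=8, cores_per_vf=2)
for vf, vm in zip(vfs, ["vm0", "vm1", "vm2", "vm3"]):
    vf.vm = vm
print([(vf.vf_id, vf.cores, vf.vm) for vf in vfs])
```

Because each virtual function's core set is disjoint, the virtual machines mapped to different VFs can operate independently of one another, as clause 2 requires.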
Clause 8. a method of multi-core processor-based virtualization, wherein the multi-core processor includes a plurality of processing cores, the method comprising:
partitioning the multi-core processor into a plurality of virtual functions, the plurality of virtual functions sharing the plurality of processing cores; and
mapping the virtual function to a container.
Clause 9. the method of clause 8, wherein there are a plurality of containers, and the plurality of containers can operate independently of one another.
Clause 10. the method of clause 8 or 9, wherein a particular number of the processing cores form a processing cluster, and the plurality of virtual functions share one or more processing clusters.
Clause 11. the method of any one of clauses 8-10, wherein one virtual function is mapped to one container, or a plurality of virtual functions are mapped to one container.
Clause 12. the method of any of clauses 9-11, wherein the plurality of virtual functions are driven by a common driver.
Clause 13. the method of clause 12, wherein a corresponding node is established for each virtual function by the driver, the container corresponding to one or more nodes.
Clause 14. the method of any of clauses 8-13, further comprising establishing, for each container, a mirror in one-to-one correspondence with the container, the mirror being capable of communicating with the container.
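The container path of clauses 8-14 can be sketched similarly; the device-node paths and function names below are hypothetical, standing in for the nodes a common driver would create per virtual function:

```python
# Sketch of clauses 8-13: the virtual functions share the processing
# cores, a common driver exposes one device node per VF, and a container
# is bound to one or more nodes. Paths and names are assumptions.
from typing import Dict, List

def make_nodes(num_vfs: int, driver: str = "common-driver") -> Dict[str, str]:
    """One device node per virtual function, all created by a single driver."""
    return {f"vf{i}": f"/dev/{driver}/vf{i}" for i in range(num_vfs)}

def bind(container: str, vf_ids: List[str], nodes: Dict[str, str]) -> Dict[str, List[str]]:
    """Map a container to the nodes of one or more virtual functions."""
    return {container: [nodes[v] for v in vf_ids]}

nodes = make_nodes(4)
binding = {}
binding.update(bind("container-a", ["vf0"], nodes))          # one VF per container
binding.update(bind("container-b", ["vf1", "vf2"], nodes))   # several VFs per container
print(binding)
```

Unlike the virtual-machine path, the virtual functions here do not own disjoint cores; isolation between containers comes from the per-VF nodes the common driver exposes.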
Clause 15. a virtualization system, comprising:
a multi-core processor comprising a plurality of processing cores;
a plurality of virtual functions sharing the plurality of processing cores; and
a virtual machine, the virtual machine corresponding to the virtual function.
Clause 16. the virtualization system of clause 15, wherein there are a plurality of virtual machines, and the plurality of virtual machines can operate independently of one another.
Clause 17. the virtualization system according to clause 15 or 16, wherein a certain number of the processing cores form one processing cluster, and the plurality of virtual functions share one or more of the processing clusters.
Clause 18. the virtualization system according to any one of clauses 15-17, wherein one virtual function corresponds to one virtual machine; or a plurality of virtual functions correspond to one virtual machine.
Clause 19. the virtualization system of any one of clauses 15-18, wherein each virtual function has independent hardware resources.
Clause 20. the virtualization system of any of clauses 15-19, further comprising: a plurality of drivers, the plurality of virtual functions driven by different drivers.
Clause 21. the virtualization system of clause 20, wherein the driver is configured to establish a corresponding node for the respective virtual function.
Clause 22. a virtualization system, comprising:
a multi-core processor comprising a plurality of processing cores;
a plurality of virtual functions sharing the plurality of processing cores; and
a container, the container corresponding to the virtual function.
Clause 23. the virtualization system of clause 22, wherein there are a plurality of containers, and the plurality of containers can operate independently of one another.
Clause 24. the virtualization system according to clause 22 or 23, wherein a certain number of the processing cores form one processing cluster, and the plurality of virtual functions share one or more of the processing clusters.
Clause 25. the virtualization system of any one of clauses 22-24, wherein one virtual function corresponds to one container; or a plurality of virtual functions correspond to one container.
Clause 26. the virtualization system of any one of clauses 22-25, wherein the plurality of virtual functions share hardware resources.
Clause 27. the virtualization system of any of clauses 22-26, further comprising: a common driver by which the plurality of virtual functions are driven.
Clause 28. the virtualization system of clause 27, wherein the common driver is configured to establish a corresponding node for each virtual function, the container corresponding to one or more nodes.
Clause 29. the virtualization system of any of clauses 22-28, further comprising a mirror, the mirror corresponding one-to-one to the container and being capable of communicating with the container.
Clause 30. a multi-core processor comprising a plurality of processing cores, wherein
the multi-core processor is partitioned into a plurality of virtual functions that share one or more processing cores.
Clause 31. an electronic device comprising the virtualization system of any one of clauses 15-29 or the multi-core processor of clause 30.
Clause 32. a computer-readable storage medium having computer program code stored thereon, which, when executed by a processor, performs the method of any of clauses 1-14.

Claims (32)

1. A method of multi-core processor-based virtualization, wherein the multi-core processor includes a plurality of processing cores, the method comprising:
dividing the multi-core processor into a plurality of virtual functions, each virtual function corresponding to one or more processing cores; and
mapping the virtual function to a virtual machine.
2. The method of claim 1, wherein there are a plurality of virtual machines, and the plurality of virtual machines can run independently of one another.
3. A method according to claim 1 or 2, wherein a certain number of processing cores form one processing cluster, each virtual function corresponding to one or more processing clusters.
4. A method according to any of claims 1-3, wherein one virtual function is mapped to one virtual machine, or a plurality of virtual functions are mapped to one virtual machine.
5. The method of any of claims 1-4, wherein each virtual function has independent hardware resources.
6. The method of any of claims 1-5, wherein the plurality of virtual functions are driven by different drivers.
7. The method of claim 6, wherein a corresponding node is established for a respective virtual function by the driver.
8. A method of multi-core processor-based virtualization, wherein the multi-core processor includes a plurality of processing cores, the method comprising:
partitioning the multi-core processor into a plurality of virtual functions, the plurality of virtual functions sharing the plurality of processing cores; and
mapping the virtual function to a container.
9. The method of claim 8, wherein there are a plurality of containers, and the plurality of containers can operate independently of one another.
10. A method according to claim 8 or 9, wherein a certain number of processing cores form one processing cluster, and a plurality of virtual functions share one or more processing clusters.
11. The method according to any of claims 8-10, wherein one virtual function is mapped to one container, or a plurality of virtual functions are mapped to one container.
12. The method of any of claims 9-11, wherein the plurality of virtual functions are driven by a common driver.
13. The method of claim 12, wherein a corresponding node is established for each virtual function by the driver, the container corresponding to one or more nodes.
14. The method of any of claims 8-13, further comprising establishing, for each container, a mirror in one-to-one correspondence with the container, the mirror being capable of communicating with the container.
15. A virtualization system, comprising:
a multi-core processor comprising a plurality of processing cores;
a plurality of virtual functions sharing the plurality of processing cores; and
a virtual machine, the virtual machine corresponding to the virtual function.
16. The virtualization system according to claim 15, wherein there are a plurality of virtual machines, and the plurality of virtual machines are capable of operating independently of one another.
17. The virtualization system of claim 15 or 16, wherein a certain number of processing cores form one processing cluster, and wherein the plurality of virtual functions share one or more processing clusters.
18. The virtualization system of any one of claims 15-17, wherein one virtual function corresponds to one virtual machine; or a plurality of virtual functions correspond to one virtual machine.
19. The virtualization system of any one of claims 15-18, wherein each virtual function has independent hardware resources.
20. The virtualization system of any one of claims 15-19, further comprising: a plurality of drivers, the plurality of virtual functions driven by different drivers.
21. The virtualization system of claim 20, wherein the driver is configured to establish a corresponding node for the respective virtual function.
22. A virtualization system, comprising:
a multi-core processor comprising a plurality of processing cores;
a plurality of virtual functions sharing the plurality of processing cores; and
a container, the container corresponding to the virtual function.
23. The virtualization system of claim 22, wherein there are a plurality of containers, and the plurality of containers are capable of independent operation.
24. The virtualization system of claim 22 or 23, wherein a certain number of processing cores form one processing cluster, and wherein the plurality of virtual functions share one or more processing clusters.
25. The virtualization system of any one of claims 22-24, wherein one virtual function corresponds to one container; or a plurality of virtual functions correspond to one container.
26. The virtualization system of any one of claims 22-25, wherein said plurality of virtual functions share hardware resources.
27. The virtualization system of any one of claims 22-26, further comprising: a common driver by which the plurality of virtual functions are driven.
28. The virtualization system of claim 27, wherein the common driver is configured to establish a corresponding node for each virtual function, the container corresponding to one or more nodes.
29. The virtualization system of any one of claims 22-28, further comprising a mirror image, said mirror image corresponding one-to-one with said container and capable of communicating with said container.
30. A multi-core processor comprising a plurality of processing cores, wherein
the multi-core processor is partitioned into a plurality of virtual functions that share one or more processing cores.
31. An electronic device comprising the virtualization system of any one of claims 15-29 or the multi-core processor of claim 30.
32. A computer-readable storage medium having stored thereon computer program code which, when executed by a processor, performs the method of any one of claims 1-14.
CN202010358635.4A 2020-02-28 2020-04-29 Virtualization method and system based on multi-core processor, multi-core processor and electronic equipment Pending CN113568734A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010358635.4A CN113568734A (en) 2020-04-29 2020-04-29 Virtualization method and system based on multi-core processor, multi-core processor and electronic equipment
US17/904,824 US20230111884A1 (en) 2020-02-28 2021-02-25 Virtualization method, device, board card and computer-readable storage medium
PCT/CN2021/077977 WO2021170054A1 (en) 2020-02-28 2021-02-25 Virtualization method, device, board card and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010358635.4A CN113568734A (en) 2020-04-29 2020-04-29 Virtualization method and system based on multi-core processor, multi-core processor and electronic equipment

Publications (1)

Publication Number Publication Date
CN113568734A true CN113568734A (en) 2021-10-29

Family

ID=78158874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010358635.4A Pending CN113568734A (en) 2020-02-28 2020-04-29 Virtualization method and system based on multi-core processor, multi-core processor and electronic equipment

Country Status (1)

Country Link
CN (1) CN113568734A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114217997A (en) * 2022-02-22 2022-03-22 苏州浪潮智能科技有限公司 Method, system, equipment and storage medium for improving real-time performance of KVM display data
WO2023159652A1 (en) * 2022-02-28 2023-08-31 华为技术有限公司 Ai system, memory access control method, and related device

Similar Documents

Publication Publication Date Title
US10691363B2 (en) Virtual machine trigger
CA2933712C (en) Resource processing method, operating system, and device
US9798565B2 (en) Data processing system and method having an operating system that communicates with an accelerator independently of a hypervisor
WO2018119952A1 (en) Device virtualization method, apparatus, system, and electronic device, and computer program product
US20120054740A1 (en) Techniques For Selectively Enabling Or Disabling Virtual Devices In Virtual Environments
US20180157519A1 (en) Consolidation of idle virtual machines
CN103034524A (en) Paravirtualized virtual GPU
CN101183315A (en) Paralleling multi-processor virtual machine system
CN115988217B (en) Virtualized video encoding and decoding system, electronic equipment and storage medium
US20120144146A1 (en) Memory management using both full hardware compression and hardware-assisted software compression
CN113326226A (en) Virtualization method and device, board card and computer readable storage medium
US20190205259A1 (en) Exitless extended page table switching for nested hypervisors
CN113568734A (en) Virtualization method and system based on multi-core processor, multi-core processor and electronic equipment
WO2021223744A1 (en) Method for realizing live migration, chip, board, and storage medium
CN114138423A (en) Virtualization construction system and method based on domestic GPU (graphics processing Unit) display card
CN113326118A (en) Virtualization method and system based on multi-core processor, multi-core processor and electronic equipment
CN114281529A (en) Distributed virtualized client operating system scheduling optimization method, system and terminal
WO2021170054A1 (en) Virtualization method, device, board card and computer-readable storage medium
Yang et al. On construction of a virtual GPU cluster with InfiniBand and 10 Gb Ethernet virtualization
KR101435772B1 (en) Gpu virtualizing system
CN113326091A (en) Virtualization method, virtualization device, board card and computer-readable storage medium
CN114816648A (en) Computing device and computing method
US8402191B2 (en) Computing element virtualization
WO2021170055A1 (en) Virtualization method, device, board card and computer readable storage medium
CN113326110A (en) System on chip and board card

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination