WO2021170054A1 - 一种虚拟化的方法、设备、板卡及计算机可读存储介质 - Google Patents

一种虚拟化的方法、设备、板卡及计算机可读存储介质 Download PDF

Info

Publication number
WO2021170054A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual
processing
function
virtual function
container
Prior art date
Application number
PCT/CN2021/077977
Other languages
English (en)
French (fr)
Inventor
鲁海波
符方晓
杨宇
Original Assignee
安徽寒武纪信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202010131483.4A external-priority patent/CN113326118A/zh
Priority claimed from CN202010358635.4A external-priority patent/CN113568734A/zh
Application filed by 安徽寒武纪信息科技有限公司 filed Critical 安徽寒武纪信息科技有限公司
Priority to US17/904,824 priority Critical patent/US20230111884A1/en
Publication of WO2021170054A1 publication Critical patent/WO2021170054A1/zh

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F2209/483 Multiproc

Definitions

  • The present disclosure relates to the field of artificial intelligence, and more specifically, to virtualization technology for a processor.
  • Virtualization is a resource management technology that abstracts various computer resources, such as servers, networks, memory, and storage, and presents them so that users can apply these resources in a better way than their original configuration allows.
  • Figure 1-1 shows a schematic block diagram of implementing virtualization through time slicing technology.
  • As shown in FIG. 1-1, there are four virtual machines VM0-VM3, each performing its own tasks. These tasks pass through a time slice manager, which forms them into time slices and orders them in time. The computing engine then processes the different tasks (time-shared tasks) according to the time slices. In this mode, while virtual machine VM1 is working, the other virtual machines cannot work and must wait. When the time slices are small, users do not easily perceive the delay, but if one virtual machine's task occupies a large amount of time (such as VM1 in Figure 1-1), the other users will perceive an obvious delay, which degrades the user experience.
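To make the latency effect concrete, here is a minimal illustrative sketch (not part of the patent) of a round-robin time-slice scheduler; the task lengths and slice size are invented for illustration:

```python
# Minimal round-robin time-slice scheduler: one long task (VM1) delays
# every other VM's completion, which users perceive as latency.
from collections import deque

def run_time_sliced(tasks, slice_len):
    """tasks: {vm_name: total_time}. Returns {vm_name: finish_time}."""
    queue = deque(tasks.items())
    clock, finish = 0, {}
    while queue:
        vm, remaining = queue.popleft()
        step = min(slice_len, remaining)
        clock += step
        if remaining - step > 0:
            queue.append((vm, remaining - step))  # not done: back of queue
        else:
            finish[vm] = clock
    return finish

# VM0/VM2/VM3 each need only 1 unit, yet VM2 and VM3 still wait behind
# VM1's first slice before finishing.
finish = run_time_sliced({"VM0": 1, "VM1": 10, "VM2": 1, "VM3": 1}, slice_len=1)
```

With a larger `slice_len`, the waiting time of the other VMs grows further, matching the observation above that a long-running VM makes the delay user-visible.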
  • Furthermore, the computing engine is shared by the different virtual machines. Once one virtual machine causes a problem in the computing engine, all virtual machines can be paralyzed, affecting all users.
  • Therefore, existing virtual machine solutions suffer from drawbacks such as low computing efficiency, head-of-line (HOL) blocking, severe noisy-neighbor interference, and difficulty of scaling.
  • the purpose of the present disclosure is to provide a virtualization method and system based on a multi-core processor that can overcome at least one of the defects in the prior art.
  • According to one aspect, a virtualization method based on a multi-core processor is provided, wherein the multi-core processor includes a plurality of processing cores, and the method includes: dividing the multi-core processor into a plurality of virtual functions, each of the virtual functions corresponding to one or more processing cores; and mapping the virtual functions to containers.
  • According to another aspect, a virtualization system is provided, including: a multi-core processor including a plurality of processing cores; a plurality of virtual functions, each of the virtual functions corresponding to one or more processing cores; and a container, the container corresponding to a virtual function.
  • According to another aspect, a multi-core processor including a plurality of processing cores is provided, wherein the multi-core processor is divided into a plurality of virtual functions, and each of the virtual functions corresponds to one or more processing cores.
  • an electronic device including the virtualization system as described above or the multi-core processor as described above.
  • a computer-readable storage medium having computer program code stored thereon, and when the computer program code is run by a processor, the method described above is executed.
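The two operations of the method (dividing the processor's cores into virtual functions, then mapping each virtual function to a container) can be sketched as follows; the function names, core counts, and naming scheme are illustrative assumptions, not the patent's API:

```python
# Hypothetical sketch of the claimed method: partition processing cores
# into virtual functions (hardware-isolated groups), then establish a
# one-to-one correspondence between virtual functions and containers.

def divide_into_vfs(num_cores, cores_per_vf):
    """Partition core IDs into consecutive groups, one group per VF."""
    assert num_cores % cores_per_vf == 0
    return {
        f"VF{i}": list(range(i * cores_per_vf, (i + 1) * cores_per_vf))
        for i in range(num_cores // cores_per_vf)
    }

def map_to_containers(vfs):
    """One-to-one mapping from containers to VFs."""
    return {f"container{i}": vf_name for i, vf_name in enumerate(vfs)}

vfs = divide_into_vfs(num_cores=16, cores_per_vf=4)  # 4 VFs of 4 cores each
mapping = map_to_containers(vfs)                     # container i -> VFi
```

Because each VF owns a disjoint set of cores, the containers mapped onto them are isolated at the hardware level rather than by time slicing.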
  • Through the above solutions, a higher quality of service (QoS) can be provided.
  • the purpose of the present disclosure is to provide a virtualization method and system based on a multi-core processor that can overcome at least one of the defects in the prior art.
  • According to another aspect, a virtualization method based on a multi-core processor is provided, wherein the multi-core processor includes a plurality of processing cores, and the method includes: dividing the multi-core processor into a plurality of virtual functions, each of the virtual functions corresponding to one or more processing cores; and mapping the virtual functions to virtual machines.
  • According to another aspect, a virtualization method based on a multi-core processor is provided, wherein the multi-core processor includes a plurality of processing cores, and the method includes: dividing the multi-core processor into a plurality of virtual functions, the plurality of virtual functions sharing the plurality of processing cores; and mapping the virtual functions to containers.
  • According to another aspect, a virtualization system is provided, including: a multi-core processor including a plurality of processing cores; a plurality of virtual functions, the plurality of virtual functions sharing the plurality of processing cores; and a virtual machine, the virtual machine corresponding to a virtual function.
  • According to another aspect, a virtualization system is provided, including: a multi-core processor including a plurality of processing cores; a plurality of virtual functions, the plurality of virtual functions sharing the plurality of processing cores; and a container, the container corresponding to a virtual function.
  • a multi-core processor including a plurality of processing cores, wherein the multi-core processor is divided into a plurality of virtual functions, and the plurality of virtual functions share one or more processing cores.
  • an electronic device including the virtualization system as described above or the multi-core processor as described above.
  • a computer-readable storage medium having computer program code stored thereon, and when the computer program code is run by a processor, the method described above is executed.
  • Through the above solutions, a higher quality of service (QoS) can be provided.
  • Figure 1-1 shows a schematic block diagram of implementing virtualization through time slicing technology
  • Figure 1-2a shows a schematic diagram of the internal structure of a processing cluster to which the method of the present disclosure can be applied;
  • Figure 1-2b shows a schematic structural diagram of an artificial intelligence processor to which the method of the present disclosure can be applied;
  • Figures 1-3 show a virtualization method based on a multi-core processor according to the first aspect of the present disclosure
  • Figures 1-4 show a virtualization system according to an embodiment of the present disclosure
  • Figures 1-5 show schematic diagrams of correspondence between virtual functions and processing clusters according to an embodiment of the present disclosure
  • Figure 1-6a, Figure 1-6b and Figure 1-6c exemplarily show the resource occupation of the PCIe card when it is divided into 1, 2 and 4 virtual functions;
  • Figures 1-7 show a schematic block diagram of a virtualization system according to yet another embodiment of the present disclosure
  • Figures 1-8 exemplarily show the structure diagram of the virtualization system
  • Figures 1-9 show schematic diagrams of a combined processing device according to the present disclosure
  • Figures 1-10 show a schematic block diagram of a board according to the present disclosure
  • Figure 1-11a and Figure 1-11b show a schematic diagram of the comparison between the virtual machine mode and the Docker mode
  • Figure 2-1 shows a schematic block diagram of realizing virtualization through time slicing technology
  • Figure 2-2a shows a schematic diagram of the internal structure of a processing cluster to which the method of the present disclosure can be applied;
  • Figure 2-2b shows a schematic structural diagram of an artificial intelligence processor to which the method of the present disclosure can be applied;
  • Figures 2-3 show a virtualization method based on a multi-core processor according to the first aspect of the present disclosure
  • Figures 2-4 show a virtualization system according to an embodiment of the present disclosure
  • Figures 2-5 show a schematic diagram of the correspondence between virtual functions and processing clusters according to an embodiment of the present disclosure
  • Figure 2-6a, Figure 2-6b and Figure 2-6c exemplarily show the resource occupation of the PCIe card when it is divided into 1, 2, and 4 virtual functions;
  • FIG. 2-7 show a schematic block diagram of a virtualization system according to another embodiment of the present disclosure.
  • Figures 2-8 exemplarily show the structure diagram of the virtualization system
  • FIGS. 2-9 show schematic diagrams of a combined processing device according to the present disclosure
  • FIGS. 2-10 show schematic block diagrams of boards according to the present disclosure
  • Figure 2-11a and Figure 2-11b show a schematic diagram of the comparison between the virtual machine mode and the Docker mode
  • FIG. 2-12 show a virtualization method based on a multi-core processor according to the first aspect of the present disclosure
  • Figures 2-13 show a virtualization system according to an embodiment of the present disclosure.
  • FIGS. 2-14 show a schematic block diagram of a virtualization system according to an embodiment of the present disclosure.
  • The term "if" can be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context.
  • Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" can be interpreted as "once determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]", depending on the context.
  • Virtualization is a technology that virtualizes a computer device into multiple virtual machines.
  • Each virtual machine can run the same or a different operating system, and the applications running on each operating system run in independent spaces without affecting each other, thereby significantly improving the computer's work efficiency.
  • Virtualization technology is different from multitasking or hyperthreading technology.
  • Multitasking means that multiple programs run simultaneously within one operating system, whereas with virtualization technology multiple operating systems can run simultaneously, each with multiple programs running, and each operating system runs on a virtual machine.
  • Hyper-threading is merely a single processor simulating two processors to balance program performance; the two simulated processors cannot be separated and can only work together. In virtualization technology, the virtual processors operate independently.
  • Virtualization technology usually uses software to redefine and divide the physical resources of a computer to achieve dynamic allocation, flexible scheduling, and cross-domain sharing of computer resources, thereby improving resource utilization.
  • Figure 1-2a shows a schematic diagram of the internal structure of a processing cluster to which the method of the present disclosure can be applied.
  • AI chips accelerate data computing and reduce memory-access latency. To do so, the AI chip adopts a multi-core processor architecture and adds storage unit cores (also called on-chip storage units) to accelerate data reads, which alleviates the memory-access bottleneck between the AI chip's processing cores and DDR (also called off-chip storage), providing users with stronger computing power in scenarios such as deep learning and network computing.
  • the AI chip may have 16 processing cores for performing computing tasks. Every 4 processing cores form a processing cluster, that is, a total of 4 processing clusters. There is a storage unit core in each processing cluster.
  • the storage unit core is mainly used to process the data exchange between the shared storage unit and the processing core within the cluster and the data exchange between the processing clusters.
  • Figure 1-2b shows a schematic structural diagram of an artificial intelligence processor to which the method of the present disclosure can be applied.
  • The DDR of the AI chip adopts a non-uniform memory access (NUMA) architecture. Each processing cluster can access any DDR channel through NOC0, but the latencies of accessing different DDR channels differ: each processing cluster corresponds to one DDR channel with the lowest access latency, while accessing the other channels takes longer. Processing cluster 0, processing cluster 1, processing cluster 2, and processing cluster 3 access DDR0, DDR1, DDR2, and DDR3, respectively, with the lowest latency; that is, each processing core accesses the DDR channel with the lowest latency for its processing cluster.
  • To reduce direct accesses by the processing cores to DDR and thereby increase data throughput, the AI chip lets a processing cluster access its shared storage unit internally. The storage unit core can broadcast data from the shared storage unit (through NOC1) to the 4 processing cores in the processing cluster simultaneously for computation, which reduces memory-access latency and optimizes computing performance.
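The lowest-latency channel preference of the NUMA layout described above can be modeled as follows; the latency values are invented purely for illustration:

```python
# Toy NUMA model: every cluster can reach every DDR channel through
# NOC0, but exactly one channel is "local" and cheapest to access.

# LATENCY[cluster][channel]: access latency in arbitrary units (invented).
LATENCY = [
    [10, 30, 30, 30],   # cluster 0: DDR0 is closest
    [30, 10, 30, 30],   # cluster 1: DDR1
    [30, 30, 10, 30],   # cluster 2: DDR2
    [30, 30, 30, 10],   # cluster 3: DDR3
]

def preferred_channel(cluster):
    """Return the DDR channel with the lowest latency for this cluster."""
    row = LATENCY[cluster]
    return row.index(min(row))

# Each cluster prefers its own channel, as in the description above.
prefs = [preferred_channel(c) for c in range(4)]
```

A scheduler that places each cluster's data on its preferred channel avoids the longer cross-channel accesses.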
  • Figures 1-3 show a virtualization method based on a multi-core processor, such as an AI processor, according to the first aspect of the present disclosure, wherein the multi-core processor includes multiple processing cores. The method includes: in operation 1-S310, dividing the multi-core processor into a plurality of virtual functions, each of the virtual functions corresponding to one or more processing cores; and in operation 1-S320, mapping the virtual functions to containers.
  • FIGS 1-4 show a virtualization system according to an embodiment of the present disclosure.
  • The virtualization system includes: a multi-core processor, the multi-core processor including a plurality of processing cores; a plurality of virtual functions VF0-VF3, each of the virtual functions corresponding to one or more processing cores; and containers (container 0 to container 3), each container corresponding to a virtual function.
  • Single Root I/O Virtualization (SR-IOV) technology is a hardware-based virtualization solution that provides high-performance, scalable virtualization. SR-IOV defines a standardized mechanism that enables multiple virtual machines to share one I/O device, allowing efficient sharing of PCIe (Peripheral Component Interconnect Express) devices between virtual machines with I/O performance close to that of the native machine.
  • SR-IOV defines the following two types of functions:
  • Physical Function (PF): a PCI function that supports the SR-IOV capability, as defined in the SR-IOV specification. The PF contains the SR-IOV capability structure and is used to manage the SR-IOV functionality. A PF is a full-featured PCIe function that can be discovered, managed, and processed like any other PCIe device, and it has full configuration resources that can be used to configure or control the PCIe device.
  • Virtual Function (VF): a function associated with a PF. A VF is a lightweight PCIe function that shares physical resources with the PF and with other VFs of the same PCIe device; a VF only has configuration resources for its own behavior.
  • Each SR-IOV device can have one PF, and each PF can have multiple VFs associated with it.
  • Each VF can have a PCI memory space used to map its register set. The VF device driver operates on this register set to enable its functionality, and the VF presents itself as an actual PCI device. Once created, a VF can be directly assigned to a guest virtual machine (VM). This allows VFs to share the same physical device and perform data input/output without CPU and hypervisor software overhead.
  • The "same physical device" mentioned above refers to different hardware resources on the same physical device. For example, the physical device may be a multi-core processor, while the hardware resources may be different processing cores on that device.
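As a toy model (not real SR-IOV driver code) of the PF/VF relationship just described: one PF per device, with lightweight VFs that share the device's physical resources while each holding only its own configuration. All names are illustrative:

```python
# Toy model of SR-IOV structure: a PhysicalFunction manages the device
# and creates VirtualFunctions, each owning a disjoint slice of the
# device's hardware resources (here, processing cores).
from dataclasses import dataclass, field

@dataclass
class VirtualFunction:
    vf_id: int
    cores: list                                   # resources assigned to this VF
    config: dict = field(default_factory=dict)    # VF-private configuration only

@dataclass
class PhysicalFunction:
    device: str
    vfs: list = field(default_factory=list)

    def create_vf(self, cores):
        """Create a VF associated with this PF and record it."""
        vf = VirtualFunction(vf_id=len(self.vfs), cores=cores)
        self.vfs.append(vf)
        return vf

pf = PhysicalFunction(device="pcie-accelerator")  # hypothetical device name
vf0 = pf.create_vf(cores=[0, 1, 2, 3])
vf1 = pf.create_vf(cores=[4, 5, 6, 7])
```

The two VFs belong to the same physical device but hold disjoint core sets, mirroring the "same device, different hardware resources" point above.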
  • virtual functions can be single or multiple.
  • When there is a single virtual function, all processing cores in the multi-core processor can be assigned to that single virtual function; when there are multiple virtual functions, the corresponding containers can run independently.
  • Independent operation means that the containers are isolated from each other: each container's operation does not depend on other containers and is not affected by them. Moreover, since the isolation in the present disclosure is hardware-based, there is little interference between containers. In addition, independent operation means each container can use a different operating system without affecting the others.
  • A virtual function, which is obtained by logically dividing the multi-core processor, can perform the same kind of work as the multi-core processor itself.
  • a virtual function may include one or more processing cores. The more processing cores, the stronger the computing power of the virtual function. It is also possible to divide all processing cores into one virtual function.
  • virtual functions can correspond to containers.
  • virtual function VF0 corresponds to container 0
  • virtual function VF1 corresponds to container 1
  • virtual function VF2 corresponds to container 2
  • virtual function VF3 corresponds to container 3. It should be understood that this correspondence is only an example; other correspondences may also be used in the present disclosure to facilitate system deployment. This will be described in more detail later.
  • Although 4 virtual functions and four containers are described in Figures 1-4, other numbers, fewer or more, are possible.
  • the container contains the hardware resources and software resources required to execute tasks (for example, task 0-task 3), and they can run independently of each other without interfering with each other.
  • Since the technical solution of the present disclosure adopts independently running containers, there is no head-of-line blocking between containers, no noisy-neighbor interference, and no context-switching overhead.
  • a certain number of processing cores constitute a processing cluster, so each virtual function can correspond to one or more processing clusters.
  • Figures 1-5 show schematic diagrams of the correspondence between virtual functions and processing clusters according to an embodiment of the present disclosure. It should be understood that although Figures 1-5 describe four processing clusters (processing cluster 0-processing cluster 3) as an example, the processing clusters can also be any other number.
  • processing cluster 0, processing cluster 1, processing cluster 2, and processing cluster 3 correspond to virtual function 0, that is, the multi-core processor is divided into a virtual function.
  • processing cluster 0, processing cluster 1, and processing cluster 2 correspond to virtual function 0, and processing cluster 3 corresponds to virtual function 1, that is, the multi-core processor is divided into two virtual functions Compared with virtual function 1, virtual function 0 has stronger processing capabilities.
  • In Example 3, processing cluster 0 and processing cluster 1 correspond to virtual function 0, and processing cluster 2 and processing cluster 3 correspond to virtual function 1; that is, the multi-core processor is divided into two virtual functions with equivalent processing capabilities.
  • In Example 4, processing cluster 0 and processing cluster 1 correspond to virtual function 0, processing cluster 2 corresponds to virtual function 1, and processing cluster 3 corresponds to virtual function 2; that is, the multi-core processor is divided into three virtual functions. Virtual function 0 has stronger processing capability than virtual function 1 and virtual function 2, while virtual function 1 and virtual function 2 have equivalent processing capabilities.
  • In Example 5, processing cluster 0 corresponds to virtual function 0, processing cluster 1 corresponds to virtual function 1, processing cluster 2 corresponds to virtual function 2, and processing cluster 3 corresponds to virtual function 3. The four virtual functions have equivalent processing capabilities.
  • In Example 6, processing cluster 0 corresponds to virtual function 0, while processing cluster 1, processing cluster 2, and processing cluster 3 correspond to virtual function 1, so virtual function 0 has weaker processing capability than virtual function 1. This example is equivalent to Example 2.
  • In Example 7, processing cluster 0 corresponds to virtual function 0, processing cluster 1 corresponds to virtual function 1, processing cluster 2 corresponds to virtual function 0, and processing cluster 3 corresponds to virtual function 1, so virtual function 0 and virtual function 1 have equivalent processing capabilities. This example is equivalent to Example 3.
  • In Example 8 shown in Figures 1-5, processing cluster 0 corresponds to virtual function 0, processing cluster 1 corresponds to virtual function 1, processing cluster 2 corresponds to virtual function 0, and processing cluster 3 corresponds to virtual function 2. Virtual function 0 has stronger processing capability, while virtual function 1 and virtual function 2 have equivalent processing capabilities. This example is equivalent to Example 4.
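The partition examples above reduce to a mapping from processing cluster to virtual function. A small sketch, using the number of owned clusters as a proxy for processing power (an assumption for illustration, since the patent treats clusters as equal-capability units):

```python
# Each example is a list cluster_to_vf where cluster_to_vf[i] is the
# virtual function that owns processing cluster i; counting clusters
# per VF gives each VF's relative processing power.
from collections import Counter

def vf_power(cluster_to_vf):
    """Return {vf_index: number_of_clusters} for a partition."""
    return dict(Counter(cluster_to_vf))

example2 = vf_power([0, 0, 0, 1])   # VF0 strong (3 clusters), VF1 weak (1)
example3 = vf_power([0, 0, 1, 1])   # two equal VFs
example8 = vf_power([0, 1, 0, 2])   # interleaved form of Example 4
```

Note that Example 8 assigns VF0 two non-adjacent clusters yet yields the same power split as Example 4, which is why the text calls them equivalent.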
  • each virtual function has independent hardware resources.
  • the hardware resources mentioned here may be processing cores, or memory (for example, DDR), buses, encoders/decoders, video/audio drivers, interface units, and so on.
  • Taking the resources of a PCIe board card as an example, they include an AI computing unit (IPU), a video codec unit (VPU), a JPEG codec unit (JPU), and memory (for example, DDR).
  • the present disclosure does not impose any restrictions on the types of hardware resources.
  • Figure 1-6a, Figure 1-6b and Figure 1-6c exemplarily show the resource occupation of the PCIe card when it is divided into 1, 2, and 4 virtual functions.
  • The above-mentioned multi-core processor may be a computing device with multiple computing cores, together with units such as JPUs and VPUs.
  • When there is one virtual function, virtual function VF0 exclusively occupies all resources, that is, all computing cores, all channels, all VPUs, and all JPUs.
  • When there are two virtual functions, VF0 occupies half of the computing cores and VF1 occupies the other half. For the channels, VF0 can occupy channel 0 and channel 1, while VF1 can occupy channel 2 and channel 3. For the VPUs and JPUs, VF0 can occupy VPU0, VPU1, JPU0, and JPU1, while VF1 can occupy VPU2, VPU3, JPU2, and JPU3.
  • When there are four virtual functions, VF0-VF3 each occupy 1/4 of the computing cores, and can respectively occupy channel 0 to channel 3, VPU0 to VPU3, and JPU0 to JPU3.
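The board-resource splits of Figures 1-6a to 1-6c can be sketched as an even partition of each resource list among N virtual functions; the resource names follow the 4-channel/4-VPU/4-JPU example above and are otherwise illustrative:

```python
# Evenly partition each board resource (channels, VPUs, JPUs) among
# num_vfs virtual functions, as in the 1-, 2-, and 4-VF figures.

def split_resources(num_vfs, resources):
    """resources: {resource_name: list_of_units}. Returns per-VF shares."""
    shares = {f"VF{i}": {} for i in range(num_vfs)}
    for name, items in resources.items():
        per_vf = len(items) // num_vfs
        for i in range(num_vfs):
            shares[f"VF{i}"][name] = items[i * per_vf:(i + 1) * per_vf]
    return shares

board = {
    "channels": ["ch0", "ch1", "ch2", "ch3"],
    "vpus": ["VPU0", "VPU1", "VPU2", "VPU3"],
    "jpus": ["JPU0", "JPU1", "JPU2", "JPU3"],
}
two_vfs = split_resources(2, board)   # VF0 gets ch0/ch1, VPU0/VPU1, JPU0/JPU1
four_vfs = split_resources(4, board)  # each VF gets one of each unit
```

With `num_vfs=1` the single VF receives every unit, matching the exclusive-occupation case of Figure 1-6a.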
  • Figures 1-7 show a schematic block diagram of a virtualization system according to yet another embodiment of the present disclosure.
  • the virtualization system of the present disclosure further includes: a common driver, and the multiple virtual functions are driven by the common driver.
  • the driver may be common to all virtual functions, and it may be a program installed in the operating system.
  • the driver may, for example, establish a corresponding node for each virtual function VF, and the node may be a file stored in a certain directory (for example, a dev directory) for other applications to run or call.
  • the name of the file can vary from manufacturer to manufacturer.
  • Each container can contain one or more nodes, which means that each container can correspond to or contain one or more virtual functions.
  • each container may correspond to or contain a different number of nodes, so that the configuration of the container will be more flexible and the deployment will be more convenient.
  • Since the computing power of each virtual function may differ, the configuration can be designed very flexibly according to requirements.
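The flexible node-to-container assignment described above can be sketched as follows; the device-node paths and container names are hypothetical, not the driver's actual naming:

```python
# The driver exposes one device node per VF; a container may receive
# one or several nodes, so containers can have different computing power.

def assign_nodes(container_sizes, node_prefix="/dev/vf"):
    """container_sizes: {container_name: number_of_VF_nodes_it_receives}."""
    assignment, next_vf = {}, 0
    for container, count in container_sizes.items():
        assignment[container] = [
            f"{node_prefix}{next_vf + k}" for k in range(count)
        ]
        next_vf += count
    return assignment

# An asymmetric deployment: one powerful container holding two VF nodes,
# plus two smaller single-VF containers.
layout = assign_nodes({"big": 2, "small_a": 1, "small_b": 1})
```

This mirrors the point that each container may correspond to a different number of nodes, making deployment more flexible.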
  • The method of the present disclosure may further include establishing a one-to-one corresponding image for each container, where the image can communicate with the container.
  • the above-mentioned image can be established through docker-container technology.
  • the image can be remotely installed on the user side, and the user can run or call the container through the image, and then call the multi-core processor and other related resources.
  • Figures 1-8 exemplarily show the structure diagram of the virtualization system.
  • a virtual machine is used.
  • the framework 800 includes a user space 802, a kernel space 804, and a system on chip 806, which are separated by a dotted line in the figure.
  • The user space 802 is the running space of user programs; it performs only simple operations, cannot directly call system resources, and must issue instructions to the kernel through the system interface.
  • The kernel space 804 is the space where kernel code runs; it can execute any command and call all resources of the system.
  • the system-on-chip 806 is each module of the artificial intelligence chip, which cooperates with the user space 802 through the kernel space 804.
  • this embodiment is illustrated by virtualizing one component into four virtual components, but the present disclosure does not limit the number of virtual components.
  • Before virtualization is executed, the user space 802 is controlled by the hardware monitor tool 808, which obtains the information of the system-on-chip 806 by calling interfaces.
  • The hardware monitor tool 808 can not only collect the information of the system-on-chip 806, but also allow upper-level software to obtain the resources of the system-on-chip 806 in real time, so that the user can grasp the detailed information and status of the current system-on-chip 806 in real time. The detailed information and status can include dozens of items such as hardware device model, firmware version number, driver version number, device utilization, storage device overhead status, board power consumption, board peak power consumption, and Peripheral Component Interconnect Express (PCIe) status.
  • the user virtual machine 810 is an abstraction and simulation of the real computing environment.
  • the system will allocate a set of data structures to manage the state of the user virtual machine 810. Its data structure includes a full set of registers, physical memory usage, virtual device status, and so on.
  • In this embodiment, the physical space of the user space 802 is virtualized into four virtual spaces 812, 814, 816, and 818. These four virtual spaces are independent of each other and can each be equipped with a different guest operating system: guest operating system 1, guest operating system 2, guest operating system 3, and guest operating system 4. The guest operating systems can be Windows, Linux, Unix, iOS, Android, etc., and each can run different applications.
  • the user virtual machine 810 is implemented by a fast emulator (QEMU).
  • QEMU is open-source virtualization software written in the C language. It virtualizes the interface through dynamic binary translation and provides a series of hardware models, so that guest operating system 1, guest operating system 2, guest operating system 3, and guest operating system 4 each believe that they have direct access to the system-on-chip 806.
  • User space 802 includes processors, memories, I/O devices, etc.
  • QEMU can virtualize the processors in the user space 802 into four virtual processors, the memory into four virtual memories, and the I/O devices into four virtual I/O devices.
  • Each guest operating system occupies a portion of the resources of the user space 802, for example a quarter each; that is, each can access one virtual processor, one virtual memory, and one virtual I/O device to execute its own tasks. Through this mode, guest operating system 1, guest operating system 2, guest operating system 3, and guest operating system 4 can operate independently.
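The quarter-by-quarter allocation described above can be sketched as a toy resource model (all names are hypothetical; the real QEMU/KVM allocation is far more involved):

```python
# Hypothetical sketch: dividing user-space resources evenly among guests.
# Models only the idea that each guest operating system receives its own
# virtual processor, virtual memory, and virtual I/O device.

def partition_user_space(total_memory_gb, num_guests):
    """Return one resource slice per guest operating system."""
    slice_gb = total_memory_gb / num_guests
    return [
        {
            "guest": g,
            "virtual_processor": f"vcpu{g}",
            "virtual_memory_gb": slice_gb,
            "virtual_io_device": f"vio{g}",
        }
        for g in range(num_guests)
    ]

slices = partition_user_space(total_memory_gb=64, num_guests=4)
assert len(slices) == 4
assert slices[1]["virtual_memory_gb"] == 16.0  # each guest gets a quarter
```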
  • the kernel space 804 carries the kernel virtual machine 820 and the chip driver 822.
  • the kernel virtual machine 820 is matched with QEMU, and is mainly responsible for the virtualization of the kernel space 804 and the system on chip 806, so that each guest operating system can obtain its own address space when accessing the system on chip 806.
  • the space on the system-on-chip 806 mapped to the guest operating system is actually a virtual component mapped to this process.
  • From the perspective of the user virtual machine 810, while the virtual machine is running, QEMU performs kernel settings through the system call interface provided by the kernel virtual machine 820, and uses the virtualization capability of the kernel virtual machine 820 to provide hardware virtualization for its own virtual machines and improve their performance. From the perspective of the kernel virtual machine 820, when a user cannot interact with the kernel space 804 directly, a management tool in the user space 802 is needed; therefore, a tool running in the user space 802, namely QEMU, is required.
  • the chip driver 822 is used to drive the physical functions 826.
  • the guest operating systems do not access the system-on-chip 806 through the hardware monitor tool 808 and the chip driver 822 of the user space 802, so guest operating system 1, guest operating system 2, guest operating system 3, and guest operating system 4 are each configured with a kernel space 824 for loading the chip driver 822, so that each guest operating system can still drive the system-on-chip 806.
  • the system-on-chip 806 implements virtualization through the SR-IOV technology.
  • the SR-IOV technology can virtualize the components of the system-on-chip 806. In this way, each virtual component has its own corresponding uniquely accessible resource.
  • the system-on-chip 806 of this embodiment includes hardware and firmware.
  • the hardware includes a read-only memory ROM (not shown in the figure) for storing firmware
  • the firmware includes a physical function 826 for supporting or cooperating with the PCIe function of SR-IOV.
  • the physical function 826 has the right to fully configure PCIe resources.
  • the physical function 826 virtualizes a plurality of virtual functions 828; in this embodiment there are four virtual functions 828.
  • the virtual function 828 is a lightweight PCIe function, managed by the physical function 826, and can share PCIe physical resources with the physical function 826 and other virtual functions 828 associated with the same physical function 826.
  • the virtual function 828 is only allowed to control the resources allocated by the physical function 826 to itself.
  • each virtual function 828 can access its own PCIe configuration space through its own bus, device, and function number.
  • Each virtual function 828 has a memory space for mapping its register set. The virtual function 828 driver operates on the register set to enable its function, and directly assigns it to the corresponding user virtual machine 810. Although it is virtual, it will make the user virtual machine 810 think that it is an actual PCIe device.
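The bus/device/function addressing of each virtual function can be illustrated with a small sketch of SR-IOV routing-ID arithmetic; the offset and stride values below are illustrative, not taken from real hardware:

```python
# Sketch of SR-IOV virtual-function addressing: per the PCIe SR-IOV
# specification, each VF's 16-bit routing ID (bus/device/function) is
# derived from the PF's routing ID plus First VF Offset and VF Stride.
# The offset/stride values used here are hypothetical.

def routing_id(bus, dev, fn):
    """Pack bus (8 bits), device (5 bits), function (3 bits)."""
    return (bus << 8) | (dev << 3) | fn

def vf_routing_id(pf_rid, first_vf_offset, vf_stride, n):
    """Routing ID of the n-th VF (n starts at 1)."""
    return (pf_rid + first_vf_offset + (n - 1) * vf_stride) & 0xFFFF

def unpack(rid):
    return rid >> 8, (rid >> 3) & 0x1F, rid & 0x7

pf = routing_id(bus=0x01, dev=0x00, fn=0)        # PF at 01:00.0
vf1 = vf_routing_id(pf, first_vf_offset=1, vf_stride=1, n=1)
assert unpack(vf1) == (0x01, 0x00, 1)            # VF1 at 01:00.1
```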
  • the hardware of the system-on-chip 806 also includes a computing device 830, a video codec device 832, a JPEG codec device 834, a storage device 836, and PCIe 838.
  • the computing device 830 is an intelligent processing device IPU for performing the convolution calculation of a neural network;
  • the video coding and decoding device 832 is used for coding and decoding video data;
  • the JPEG coding and decoding device 834 is used for coding and decoding pictures using the JPEG algorithm;
  • the storage device 836 can be a dynamic random access memory (DRAM) to store data;
  • PCIe 838 is the aforementioned PCIe.
  • PCIe 838 will be virtualized into four virtual interfaces 840, the virtual function 828 and the virtual interface 840 have a one-to-one correspondence, that is, the first virtual function is connected to the first virtual interface, the second virtual function is connected to the second virtual interface, and so on.
  • the computing device 830 is virtualized into four virtual computing devices 842
  • the video codec device 832 is virtualized into four virtual video codec devices 844
  • the JPEG codec device 834 is virtualized into four virtual JPEG codec devices 846
  • the storage device 836 is virtualized into four virtual storage devices 848.
  • Each guest operating system is configured with a set of virtual kits.
  • Each set of virtual kits includes a user virtual machine 810, a virtual interface 840, a virtual function 828, a virtual computing device 842, a virtual video codec device 844, a virtual JPEG codec device 846, and a virtual storage device 848.
  • Each group of virtual suites runs independently without affecting the others and is used to perform the tasks delivered by the corresponding guest operating system, ensuring that each guest operating system can access its configured virtual computing device 842, virtual video codec device 844, virtual JPEG codec device 846, and virtual storage device 848 through its configured virtual interface 840 and virtual function 828.
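As a rough illustration, the one-to-one binding of a virtual suite to each guest operating system could be modeled as follows (all identifiers are hypothetical, mirroring the reference numerals in the text):

```python
# Sketch of the "virtual suite" grouping described above: each guest
# operating system is bound to one of each virtualized component.

from dataclasses import dataclass

@dataclass(frozen=True)
class VirtualSuite:
    guest_os: int
    virtual_interface: str     # one of the four virtual interfaces 840
    virtual_function: str      # one of the four virtual functions 828
    virtual_compute: str       # virtual computing device 842
    virtual_video_codec: str   # virtual video codec device 844
    virtual_jpeg_codec: str    # virtual JPEG codec device 846
    virtual_storage: str       # virtual storage device 848

def build_suites(n):
    """Build n independent suites, one per guest operating system."""
    return [
        VirtualSuite(g, f"vif{g}", f"vf{g}", f"vipu{g}",
                     f"vvenc{g}", f"vjpeg{g}", f"vmem{g}")
        for g in range(n)
    ]

suites = build_suites(4)
assert suites[2].virtual_function == "vf2"   # one-to-one binding
```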
  • each guest operating system responds to different tasks when performing tasks, and the hardware that needs to be accessed may differ accordingly. For example, if a task is to perform operations, such as matrix convolution operations, the guest operating system will access the configured virtual computing device 842 through the configured virtual interface 840 and virtual function 828; if a task is to perform video encoding and decoding, the guest operating system will access the configured virtual video codec device 844 through the configured virtual interface 840 and virtual function 828; if a task is to perform JPEG encoding and decoding, the guest operating system will access the configured virtual JPEG codec device 846 through the configured virtual interface 840 and virtual function 828; and if a task is to read or write data, the guest operating system will access the configured virtual storage device 848 through the configured virtual interface 840 and virtual function 828.
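The task routing just described can be summarized in a small, hypothetical dispatch table:

```python
# Sketch of the task routing described above: the guest operating system
# selects which virtual device to access based on the task type, always
# going through its configured virtual interface and virtual function.
# The mapping below is an illustration, not a real driver API.

TASK_TO_DEVICE = {
    "matrix_convolution": "virtual_computing_device_842",
    "video_codec":        "virtual_video_codec_device_844",
    "jpeg_codec":         "virtual_jpeg_codec_device_846",
    "data_read_write":    "virtual_storage_device_848",
}

def route_task(task_type):
    """Return the access path for a task issued by a guest OS."""
    device = TASK_TO_DEVICE[task_type]
    return ["virtual_interface_840", "virtual_function_828", device]

assert route_task("jpeg_codec")[-1] == "virtual_jpeg_codec_device_846"
```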
  • Figure 1-11a and Figure 1-11b show a schematic diagram of the comparison between the virtual machine mode and the Docker mode.
  • the host host passes PCIe devices (pass through) to the guest guest.
  • In the virtual machine mode, the guest includes a driver and a directory; therefore, each guest needs to load the driver by itself and create a node, that is, a character device, under the guest's directory.
  • In the Docker mode, the driver and directory are in the host, so only the host needs to load the driver, and the driver is common to all virtual functions. The driver of the host creates a node, that is, a character device, in the host directory, and then passes the device into the image, that is, Docker. Therefore, compared with the virtual machine mode, the Docker-container mode of the present disclosure does not require each client to install and load a driver, thereby simplifying system setup and facilitating the user's use.
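A minimal sketch of this Docker-mode setup, assuming a hypothetical `/dev/accel_vf*` node naming scheme (the `--device` flag is Docker's standard way of passing a host device node into a container):

```python
# Hypothetical sketch of the Docker mode described above: the host loads
# one common driver, which creates one character-device node per virtual
# function; each node is then passed into a container. The /dev names are
# illustrative, not those of any real driver.

def host_device_nodes(num_vfs, prefix="/dev/accel_vf"):
    """Nodes the single host-side driver would create, one per VF."""
    return [f"{prefix}{i}" for i in range(num_vfs)]

def docker_run_command(image, nodes):
    """Assemble a docker run command passing VF nodes into a container."""
    flags = " ".join(f"--device={n}" for n in nodes)
    return f"docker run {flags} {image}"

nodes = host_device_nodes(4)
cmd = docker_run_command("my-ai-image", nodes[:1])
assert cmd == "docker run --device=/dev/accel_vf0 my-ai-image"
```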
  • the lightweight virtualization solution based on Docker no longer takes the entire card as the granularity; instead, multiple containers share one or more physical accelerator cards at a finer granularity.
  • in each container, one or more VFs can be used. VFs belonging to different containers can work independently, safely, and in isolation from each other.
  • the hardware virtualization solution using SR-IOV can also support the Docker usage mode and can generate multiple VFs on the physical machine at the same time.
  • the system administrator can assign different VFs to different containers according to requirements. VFs belonging to different containers can work independently without interfering with each other, and VFs have the same robustness and security isolation as between PFs.
  • Docker has the advantages of faster startup, lower resource usage, and higher system utilization, making development, testing, and deployment all easier.
  • the present disclosure also provides a multi-core processor including a plurality of processing cores, wherein the multi-core processor is divided into a plurality of virtual functions, and each of the virtual functions corresponds to one or more processing cores.
  • the present disclosure also discloses an electronic device, including the above-mentioned virtualization system or the above-mentioned multi-core processor.
  • the electronic device may be a host; that is, the technical solution of the present disclosure is implemented in the host and communicates with an external image (Docker).
  • electronic equipment or devices can also include data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, cameras, servers, cloud servers, camcorders, projectors, watches, headsets, mobile storage, wearable devices, vehicles, household appliances, and/or medical equipment.
  • the transportation means include airplanes, ships, and/or vehicles;
  • the household appliances include TVs, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • the medical equipment includes nuclear magnetic resonance, B-ultrasound and/or electrocardiograph.
  • the present disclosure also provides a computer-readable storage medium having computer program code stored thereon, and when the computer program code is run by a processor, the method described above is executed.
  • FIG. 1-9 shows a combined processing device 900, which includes the aforementioned computing device 902 (for example, the computing device 830 described in FIG. 1-8), a universal interconnect interface 904, and other processing devices 906.
  • the computing device according to the present disclosure interacts with other processing devices to jointly complete operations specified by the user.
  • Figures 1-9 are schematic diagrams of the combined processing device.
  • Other processing devices include one or more types of general-purpose/special processors such as central processing unit CPU, graphics processing unit GPU, neural network processor, etc.
  • the number of processors included in other processing devices is not limited.
  • Other processing devices serve as the interface between the machine learning computing device and external data and control, performing basic control such as data handling and starting or stopping the machine learning computing device; other processing devices can also cooperate with the machine learning computing device to complete computing tasks.
  • the universal interconnection interface is used to transmit data and control commands between computing devices (including, for example, machine learning computing devices) and other processing devices.
  • the computing device obtains the required input data from other processing devices and writes it to the storage device on the computing device chip; it can obtain control instructions from other processing devices and write them to the control buffer on the computing device chip; it can also read the data in the storage module of the computing device and transmit it to other processing devices.
  • the structure may further include a storage device 908, which is respectively connected to the computing device and the other processing device.
  • the storage device is used to store the data in the computing device and the other processing device, and is especially suitable for data that cannot be fully stored in the internal storage of the computing device or other processing device.
  • the combined processing device can be used as an SOC system on chip for mobile phones, robots, unmanned aerial vehicles, video surveillance equipment and other equipment, effectively reducing the core area of the control part, increasing processing speed, and reducing overall power consumption.
  • the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, monitor, mouse, keyboard, network card, or Wi-Fi interface.
  • the present disclosure also discloses a chip, which includes the aforementioned computing device or combined processing device.
  • the present disclosure also discloses a board card, which includes the above-mentioned chip. Referring to FIG. 1-10, an exemplary board card is provided.
  • the board card may also include other supporting components.
  • the supporting components include, but are not limited to: a storage device 1004, an interface device 1006, and a control device 1008.
  • the storage device is connected to the chip in the chip packaging structure through a bus for storing data.
  • the storage device may include multiple groups of storage units 1010. Each group of the storage unit and the chip are connected by a bus. It can be understood that each group of the storage units may be DDR SDRAM (English: Double Data Rate SDRAM, double-rate synchronous dynamic random access memory).
  • the storage device may include 4 groups of the storage units. Each group of the storage units may include a plurality of DDR4 chips. In an embodiment, the chip may include four 72-bit DDR4 controllers; in each 72-bit DDR4 controller, 64 bits are used for data transmission and 8 bits are used for ECC checking. In one embodiment, each group of the storage units includes a plurality of double-rate synchronous dynamic random access memories arranged in parallel. DDR can transmit data twice in one clock cycle. A controller for controlling the DDR is provided in the chip for controlling the data transmission and data storage of each storage unit.
  • the interface device is electrically connected with the chip in the chip packaging structure.
  • the interface device is used to implement data transmission between the chip and an external device 1012 (for example, a server or a computer).
  • the interface device may be a standard PCIE interface.
  • the data to be processed is transferred from the server to the chip through a standard PCIE interface to realize data transfer.
  • the interface device may also be other interfaces.
  • the present disclosure does not limit the specific forms of the other interfaces mentioned above; the interface unit only needs to be able to realize the transfer function.
  • the calculation result of the chip is still transmitted by the interface device back to an external device (such as a server).
  • the control device is electrically connected with the chip.
  • the control device is used to monitor the state of the chip.
  • the chip and the control device may be electrically connected through an SPI interface.
  • the control device may include a single-chip microcomputer (Micro Controller Unit, MCU).
  • the chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. Therefore, the chip can be in different working states such as multi-load and light-load.
  • the control device can regulate and control the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are merely illustrative, for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or may be Integrate into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, optical, acoustic, magnetic or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or in the form of software program modules.
  • If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory.
  • the computer software product is stored in a memory and includes several instructions to enable a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned memory includes: U disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), mobile hard disk, magnetic disk or optical disk and other media that can store program codes.
  • Clause A1. A virtualization method based on a multi-core processor, wherein the multi-core processor includes a plurality of processing cores, and the method includes: dividing the multi-core processor into a plurality of virtual functions, each of the virtual functions corresponding to one or more processing cores; and
  • mapping the virtual function to a container.
  • Clause A2 The method according to Clause A1, wherein there are multiple containers, and multiple containers can run independently.
  • Clause A3 The method according to clause A1 or A2, wherein a certain number of processing cores constitute a processing cluster, and each virtual function corresponds to one or more processing clusters.
  • Clause A4 The method according to any one of clauses A1-A3, wherein one virtual function is mapped to one container; or multiple virtual functions are mapped to one container.
  • Clause A5. The method according to any one of clauses A1-A4, wherein each virtual function has independent hardware resources.
  • Clause A6 The method of any one of clauses A1-A5, wherein the multiple virtual functions are driven by a common driver.
  • Clause A7 The method according to clause A6, wherein a corresponding node is established for each virtual function through the driver, and the container corresponds to one or more nodes.
  • Clause A8 The method according to any one of clauses A1-A7, further comprising establishing a one-to-one corresponding image for each of the containers, the image being able to communicate with the container.
  • Clause A9. A virtualization system, including:
  • a multi-core processor, where the multi-core processor includes a plurality of processing cores;
  • a plurality of virtual functions, each of the virtual functions corresponding to one or more processing cores; and
  • a container, the container corresponding to the virtual function.
  • Clause A10 The virtualization system according to Clause A9, wherein there are multiple containers, and the multiple containers can run independently.
  • Clause A11 The virtualization system according to Clause A9 or A10, wherein a certain number of processing cores constitute a processing cluster, and each virtual function corresponds to one or more processor clusters.
  • Clause A12 The virtualization system according to any one of clauses A9-A11, wherein one virtual function corresponds to one container; or multiple virtual functions correspond to one container.
  • Clause A14 The virtualization system according to any one of clauses A9-A13, further comprising: a common driver, and the plurality of virtual functions are driven by the common driver.
  • Clause A15 The virtualization system according to clause A14, wherein the common driver is configured to establish a corresponding node for each virtual function, and the container corresponds to one or more nodes.
  • Clause A16 The virtualization system according to any one of clauses A9-A15, further comprising an image, the image has a one-to-one correspondence with the container and can communicate with the container.
  • Clause A17. A multi-core processor including multiple processing cores, wherein
  • the multi-core processor is divided into a plurality of virtual functions, and each of the virtual functions corresponds to one or more processing cores.
  • Clause A18 An electronic device comprising a virtualization system as described in any one of clauses A9-A16 or a multi-core processor as described in clause A17.
  • This disclosure relates to the field of artificial intelligence, and more specifically, to the virtualization technology of processors.
  • Virtualization is a resource management technology that abstracts and presents various computer resources, such as servers, networks, memory, and storage, so that users can apply these resources in a better way than under their original configuration.
  • Figure 2-1 shows a schematic block diagram of implementing virtualization through time slicing technology.
  • As shown in FIG. 2-1, there are four virtual machines VM0-VM3. These virtual machines execute their own tasks; after these tasks pass through the time slice manager, they form time slices and are sorted by time.
  • the calculation engine processes the different tasks (time-sharing tasks) according to the time slices. In this mode, when virtual machine VM1 is working, the other virtual machines cannot work and are in a waiting state.
  • if the time slice is small, it is not easy for users to notice the time delay, but if a task of one virtual machine takes up a lot of time (such as VM1 as shown in FIG. 2-1), other users will feel an obvious time delay, thereby affecting the user experience.
  • the computing engine is common to the different virtual machines. Once one virtual machine causes a problem with the computing engine, all virtual machines will be paralyzed, thereby affecting all users.
  • the existing virtual machine solutions therefore have disadvantages such as low computing efficiency, head-of-line (HOL) blocking, significant neighbor noise, and difficulty of expansion.
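The head-of-line effect described above can be demonstrated with a tiny simulation of a single shared compute engine serving time-sliced tasks (the task durations are arbitrary):

```python
# Simulation of the time-sliced sharing of Figure 2-1: while one virtual
# machine holds the shared compute engine for a long task, the others can
# only wait, so short tasks suffer a long completion delay.

def run_time_sliced(tasks):
    """tasks: list of (vm_name, duration). One shared engine, FIFO order.
    Returns the time at which each VM's task completes."""
    clock, finish = 0, {}
    for vm, duration in tasks:
        clock += duration          # engine is exclusive: others wait
        finish[vm] = clock
    return finish

finish = run_time_sliced([("VM0", 1), ("VM1", 10), ("VM2", 1), ("VM3", 1)])
# VM2 finishes at t=12 although its own task only needs 1 time unit:
assert finish["VM2"] == 12
```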
  • the term “if” can be interpreted as “when” or “once” or “in response to determination” or “in response to detection” depending on the context.
  • the phrase "if determined" or "if [the described condition or event] is detected" can be interpreted, depending on the context, as "once determined", "in response to determination", "once [the described condition or event] is detected", or "in response to detection of [the described condition or event]".
  • Virtualization is a technology that virtualizes a computer device into multiple virtual machines.
  • each virtual machine can run the same or a different operating system, and the applications running on each operating system can run in independent spaces without affecting each other, thereby significantly improving the computer's work efficiency.
  • Virtualization technology is different from multitasking or hyperthreading technology.
  • Multitasking means that multiple programs run at the same time in one operating system, while in virtualization technology, multiple operating systems can run at the same time, each operating system having multiple programs running, and each operating system running on a virtual machine.
  • Hyper-threading technology is just a single processor simulating dual processors to balance the performance of the program. The two simulated processors cannot be separated and can only work together. In virtualization technology, the virtual processors operate independently.
  • Virtualization technology usually uses software to redefine and divide the physical resources of a computer to achieve dynamic allocation, flexible scheduling, and cross-domain sharing of computer resources, thereby improving resource utilization.
  • Figure 2-2a shows a schematic diagram of the internal structure of a processing cluster to which the method of the present disclosure can be applied.
  • AI chips accelerate data computing capabilities and reduce memory access latency.
  • the AI chip adopts a multi-core processor architecture and adds a storage unit core (also called an on-chip storage unit) to accelerate data reading, which alleviates the memory access bottleneck between the processing cores of the AI chip and the DDR (also called the off-chip storage unit), and provides users with stronger computing capabilities in scenarios such as deep learning and network computing.
  • the AI chip may have 16 processing cores for performing computing tasks. Every 4 processing cores form a processing cluster, that is, a total of 4 processing clusters. There are multiple storage unit cores in each processing cluster.
  • the storage unit core is mainly used to process the data exchange between the shared storage unit and the processing core within the cluster and the data exchange between the processing clusters.
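The core layout just described (16 processing cores, 4 per cluster) can be sketched as:

```python
# Sketch of the core layout described above: 16 processing cores grouped
# into processing clusters of 4, matching the AI chip organization in the
# text. The grouping function is a hypothetical illustration.

def build_clusters(num_cores=16, cores_per_cluster=4):
    """Group core indices into consecutive processing clusters."""
    return [
        list(range(c, c + cores_per_cluster))
        for c in range(0, num_cores, cores_per_cluster)
    ]

clusters = build_clusters()
assert len(clusters) == 4                  # 4 processing clusters in total
assert clusters[1] == [4, 5, 6, 7]         # cores of processing cluster 1
```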
  • Figure 2-2b shows a schematic structural diagram of an artificial intelligence processor to which the method of the present disclosure can be applied.
  • the DDR of the AI chip adopts a non-uniform memory access (NUMA) architecture.
  • Each processing cluster can access different DDR channels through NOC0, but the delays for accessing different DDR channels are different.
  • Each processing cluster corresponds to a DDR channel with the lowest access delay, and the delay when accessing other channels is relatively long.
  • processing cluster 0, processing cluster 1, processing cluster 2 and processing cluster 3 respectively access the corresponding DDR0, DDR1, DDR2, and DDR3 with the lowest latency. That is, each processing core accesses the DDR channel with the lowest latency of its processing cluster.
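A toy model of this NUMA affinity, with illustrative relative latencies only:

```python
# Sketch of the NUMA affinity described above: each processing cluster has
# one DDR channel with the lowest access latency (DDRi for cluster i);
# accessing the other channels costs more. Latency values are hypothetical.

LOCAL, REMOTE = 1, 3   # illustrative relative latencies

def ddr_latency(cluster, channel):
    """Cluster i's cheapest channel is DDRi; the others are more costly."""
    return LOCAL if cluster == channel else REMOTE

def best_channel(cluster, num_channels=4):
    """Channel a cluster should prefer for lowest-latency access."""
    return min(range(num_channels), key=lambda ch: ddr_latency(cluster, ch))

assert best_channel(2) == 2                   # cluster 2 prefers DDR2
assert ddr_latency(0, 3) > ddr_latency(0, 0)  # remote access costs more
```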
  • inside the AI chip, the shared storage unit can be accessed at the granularity of the processing cluster to reduce the direct access of the processing cores to the DDR, thereby increasing the data throughput.
  • the storage unit core can broadcast data from the shared storage unit to the 4 processing cores in the processing cluster at the same time through data broadcasting (through NOC1) for data calculation.
  • the memory access delay can be reduced and the calculation performance can be optimized.
  • FIG. 2-3 shows a virtualization method based on a multi-core processor, such as an AI processor, according to the first aspect of the present disclosure, wherein the multi-core processor includes multiple processing cores, and the method includes: in operation 2-S310, dividing the multi-core processor into multiple virtual functions, each of the virtual functions corresponding to one or more processing cores; and, in operation 2-S320, mapping the virtual functions to virtual machines.
  • FIGS 2-4 show a virtualization system according to an embodiment of the present disclosure.
  • the virtualization system includes: a multi-core processor, the multi-core processor includes a plurality of processing cores; a plurality of virtual functions VF0-VF3, Each of the virtual functions corresponds to one or more processing cores; and a virtual machine (virtual machine 0-virtual machine 3), the virtual machine corresponding to the virtual function.
  • SR-IOV (Single Root I/O Virtualization) technology is a hardware-based virtualization solution that provides high performance and scalability.
  • SR-IOV defines a standardized mechanism to enable multiple virtual machines to share one I/O device. This enables efficient sharing of PCIe (Peripheral Component Interconnect Express) devices between virtual machines, with I/O performance close to that of the native machine.
  • SR-IOV is divided into the following two types of functions:
  • PF (Physical Function): a PCIe function that supports the SR-IOV capability, as defined in the SR-IOV specification.
  • the PF contains the SR-IOV function structure, which is used to manage the SR-IOV function.
  • PF is a full-featured PCIe function that can be discovered, managed, and processed like any other PCIe device.
  • PF has full configuration resources, which can be used to configure or control PCIe devices.
  • VF Virtual Function: A function associated with PF.
  • VF is a lightweight PCIe function that can share physical resources with the PF and with other VFs of the same PCIe device. A VF only has configuration resources for its own behavior.
  • Each SR-IOV device can have one PF, and each PF can have multiple VFs associated with it.
  • Each VF can have a PCI memory space for mapping its register set.
  • the VF device driver operates on the register set to enable its function, and the VF appears as an actual PCI device. After a VF is created, it can be directly assigned to a guest virtual machine VM. This allows VFs to share the same physical device and perform data input and output without CPU and hypervisor software overhead.
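On Linux, VFs are conventionally enabled by writing the desired count to the PF's `sriov_numvfs` sysfs attribute; the sketch below only constructs the path and value (the PCI address is made up, and an actual write requires SR-IOV hardware and root privileges):

```python
# Hypothetical sketch of how VFs are typically enabled on Linux: the
# number of VFs is written to the PF's sriov_numvfs attribute in sysfs.
# This sketch only builds the path/value pair; it performs no real write.

def sriov_enable_command(pf_bdf, num_vfs):
    """Return the sysfs path and the value that would be written to it."""
    path = f"/sys/bus/pci/devices/{pf_bdf}/sriov_numvfs"
    return path, str(num_vfs)

path, value = sriov_enable_command("0000:01:00.0", 4)
assert path.endswith("sriov_numvfs")
assert value == "4"
```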
  • the same physical device mentioned above refers to different hardware resources on the same physical device.
  • the physical device may be a multi-core processor, but the hardware resources may be different processing cores on the physical device.
  • virtual functions can be single or multiple.
  • the virtual function is single, it means that all processing cores in the multi-core processor can be divided into a single virtual function; when there are multiple virtual functions, the virtual machines can run independently.
  • Independent operation means that each virtual machine is isolated from each other, the operation does not depend on other virtual machines, and will not be affected by other virtual machines. Moreover, since the isolation in the present disclosure is based on hardware isolation, the interference between each other is more few. In addition, independent operation can be that each virtual machine uses a different operating system without affecting each other.
  • The virtual function, which is obtained by logically dividing the multi-core processor, can execute the same work content as the multi-core processor.
  • a virtual function may include one or more processing cores. The more processing cores, the stronger the computing power of the virtual function. It is also possible to divide all processing cores into one virtual function.
  • virtual functions can correspond to virtual machines.
  • virtual function VF0 corresponds to virtual machine 0
  • virtual function VF1 corresponds to virtual machine 1
  • virtual function VF2 corresponds to virtual machine 2
  • virtual function VF3 corresponds to virtual machine 3. It should be understood that this correspondence relationship is only an example, and other correspondence relationships may also be used in the present disclosure to facilitate the deployment of the system. This will be described in more detail later.
  • Although 4 virtual functions and 4 virtual machines are described in Figures 2-4, their numbers may be smaller or larger.
  • virtual machines can run independently without interfering with each other.
  • Because the technical solution of the present disclosure adopts independently running virtual machines, there is no head-of-line blocking between virtual machines, no noise impact from adjacent virtual machines, and no context-switching overhead.
  • each virtual function can correspond to one or more processing clusters.
  • Figures 2-5 show schematic diagrams of the correspondence between virtual functions and processing clusters according to an embodiment of the present disclosure. It should be understood that although Figures 2-5 describe four processing clusters (processing cluster 0-processing cluster 3) as an example, the processing clusters can also be any other number.
  • processing cluster 0, processing cluster 1, processing cluster 2, and processing cluster 3 correspond to virtual function 0, that is, the multi-core processor is divided into one virtual function.
  • processing cluster 0, processing cluster 1, and processing cluster 2 correspond to virtual function 0, and processing cluster 3 corresponds to virtual function 1; that is, the multi-core processor is divided into two virtual functions, and virtual function 0 has stronger processing capability than virtual function 1.
  • processing cluster 0 and processing cluster 1 correspond to virtual function 0
  • processing cluster 2 and processing cluster 3 correspond to virtual function 1; that is, the multi-core processor is divided into two virtual functions, and virtual function 0 and virtual function 1 have equivalent processing capabilities.
  • processing cluster 0 and processing cluster 1 correspond to virtual function 0
  • processing cluster 2 corresponds to virtual function 1
  • processing cluster 3 corresponds to virtual function 2; that is, the multi-core processor is divided into three virtual functions, where virtual function 0 has stronger processing power than virtual function 1 and virtual function 2, and virtual function 1 and virtual function 2 have equivalent processing power.
  • processing cluster 0 corresponds to virtual function 0
  • processing cluster 1 corresponds to virtual function 1
  • processing cluster 2 corresponds to virtual function 2
  • processing cluster 3 corresponds to virtual function 3.
  • the four virtual functions have equivalent processing capabilities.
  • processing cluster 0 corresponds to virtual function 0
  • processing cluster 1, processing cluster 2, and processing cluster 3 correspond to virtual function 1.
  • virtual function 0 has weaker processing power than virtual function 1.
  • This example is equivalent to example 2.
  • processing cluster 0 corresponds to virtual function 0
  • processing cluster 1 corresponds to virtual function 0
  • processing cluster 2 corresponds to virtual function 1
  • processing cluster 3 corresponds to virtual function 1
  • virtual function 0 and virtual function 1 have the same processing capacity.
  • This example is equivalent to example 3.
  • processing cluster 0 corresponds to virtual function 0
  • processing cluster 1 corresponds to virtual function 0
  • processing cluster 2 corresponds to virtual function 1
  • processing cluster 3 corresponds to virtual function 2.
  • virtual function 0 has stronger processing capabilities
  • virtual function 1 and virtual function 2 have equivalent processing capabilities.
  • This example is equivalent to Example 4.
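The examples above all reduce to "which virtual function owns each processing cluster", with relative processing power following from the cluster counts. A minimal sketch of that bookkeeping (the mappings mirror the examples in the text; the helper name is illustrative):

```python
# Sketch of the cluster-to-virtual-function mappings described above.
# mapping[i] = index of the virtual function that owns cluster i.
from collections import Counter

def vf_strength(mapping):
    """Return {vf: number of clusters}, a proxy for processing power."""
    return dict(Counter(mapping))

# Example 1: all four clusters form a single virtual function.
assert vf_strength([0, 0, 0, 0]) == {0: 4}
# Example 2: clusters 0-2 -> VF0, cluster 3 -> VF1 (VF0 is stronger).
assert vf_strength([0, 0, 0, 1]) == {0: 3, 1: 1}
# Example 4: clusters 0,1 -> VF0; cluster 2 -> VF1; cluster 3 -> VF2.
s = vf_strength([0, 0, 1, 2])
assert s[0] > s[1] == s[2]
```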
  • each virtual function has independent hardware resources.
  • the hardware resources mentioned here may be processing cores, or memory (for example, DDR), buses, encoders/decoders, video/audio drivers, interface units, and so on.
  • As for PCIe board card resources, it includes an AI computing unit (IPU), a video codec unit (VPU), a JPEG codec unit (JPU), and memory.
  • the present disclosure does not impose any restrictions on the types of hardware resources.
  • Figures 2-6a, 2-6b, and 2-6c exemplarily show the resource occupation of the PCIe card when it is divided into 1, 2, and 4 virtual functions.
  • the above-mentioned multi-core processor may be a computing device with multiple computing cores, such as JPU and VPU.
  • When there is one virtual function, the virtual function VF0 occupies all resources, that is, all computing cores, all channels, all VPUs, and all JPUs.
  • When there are two virtual functions, virtual function VF0 and virtual function VF1 each use half of the resources; that is, VF0 occupies half of the computing cores and VF1 occupies the other half.
  • VF0 can occupy channel 0 and channel 1
  • VF1 can occupy channel 2 and channel 3.
  • As for VPUs and JPUs, VF0 can occupy VPU0 and VPU1 while VF1 can occupy VPU2 and VPU3; likewise, VF0 can occupy JPU0 and JPU1, and VF1 can occupy JPU2 and JPU3.
  • When there are four virtual functions, the virtual functions VF0-VF3 each occupy 1/4 of the computing cores.
  • the virtual functions VF0-VF3 can occupy channel 0-channel 3 respectively; the virtual functions VF0-VF3 can occupy VPU0-VPU3 respectively; the virtual functions VF0-VF3 respectively Can occupy JPU0-JPU3.
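The even 1-, 2-, and 4-way splits above can be sketched as a simple partitioning of each resource list; the resource names follow the text, and the counts are illustrative:

```python
# Sketch: splitting identical board resources evenly among VFs, as in
# the 1-, 2-, and 4-VF partitions described above.
def partition(resources, num_vfs):
    """Divide each resource list evenly among num_vfs virtual functions.
    Returns a list of per-VF resource dicts."""
    vfs = [{} for _ in range(num_vfs)]
    for name, items in resources.items():
        share = len(items) // num_vfs
        for i in range(num_vfs):
            vfs[i][name] = items[i * share:(i + 1) * share]
    return vfs

board = {"channel": [0, 1, 2, 3], "vpu": [0, 1, 2, 3], "jpu": [0, 1, 2, 3]}
halves = partition(board, 2)
# VF0 takes channels 0-1, VPU0-1, JPU0-1; VF1 takes the rest.
assert halves[0]["channel"] == [0, 1] and halves[1]["vpu"] == [2, 3]
```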
  • FIGS. 2-7 show a schematic block diagram of a virtualization system according to another embodiment of the present disclosure.
  • the virtualization system of the present disclosure further includes: a plurality of drivers, and the plurality of virtual functions are driven by different drivers.
  • A corresponding node is established for each virtual function through the driver. The client includes a driver and a directory, so each client needs to load the driver by itself and create a node, that is, a character device, in the client's directory.
  • Figures 2-8 exemplarily show the structure diagram of the virtualization system.
  • a virtual machine is used.
  • the framework 800 includes a user space 802, a kernel space 804, and a system-on-chip 806, which are separated by a dotted line in the figure.
  • the user space 802 is the running space of the user program. It only performs simple operations and cannot directly call system resources. It must pass through the system interface to issue instructions to the kernel.
  • the kernel space 804 is the space where the kernel code runs. Any command can be executed and all the resources of the system can be called.
  • the system-on-chip 806 is each module of the artificial intelligence chip, which cooperates with the user space 802 through the kernel space 804.
  • this embodiment is illustrated by virtualizing one component into four virtual components, but the present disclosure does not limit the number of virtual components.
  • the user space 802 is controlled by the hardware monitor tool 808 before the virtualization is executed, and the information of the system-on-chip 806 is obtained by calling the interface.
  • the hardware monitor tool 808 can not only collect the information of the system-on-chip 806, but also obtain the resources of the system-on-chip 806 by the upper-level software in real time, so that the user can grasp the detailed information and status of the current system-on-chip 806 in real time.
  • The detailed information and status may include dozens of items of data, such as hardware device model, firmware version number, driver version number, device utilization, storage device overhead status, board power consumption, board peak power consumption, and peripheral component interconnect express (PCIe) information.
  • the user virtual machine 810 is an abstraction and simulation of the real computing environment.
  • The system will allocate a set of data structures to manage the state of the user virtual machine 810. These data structures include a full set of registers, physical memory usage, virtual device status, and so on.
  • the physical space of the user space 802 in this embodiment is virtualized into four virtual spaces 812, 814, 816, and 818. These four virtual spaces 812, 814, 816, and 818 are independent of each other and can be equipped with different guest operating systems.
  • As for guest operating system 1, guest operating system 2, guest operating system 3, and guest operating system 4, the guest operating systems can be Windows, Linux, Unix, iOS, Android, etc., and each guest operating system can separately run different applications.
  • the user virtual machine 810 is implemented by a fast emulator (QEMU).
  • QEMU is open-source virtualization software written in C. It virtualizes the interface through dynamic binary translation and provides a series of hardware models so that guest operating system 1, guest operating system 2, guest operating system 3, and guest operating system 4 each believe they have direct access to the system-on-chip 806.
  • User space 802 includes processors, memories, I/O devices, etc.
  • QEMU can virtualize the processors in user space 802 into four virtual processors, virtualize the memory into four virtual memories, and virtualize the I/O devices into four virtual I/O devices.
  • Each guest operating system occupies a portion of the resources of the user space 802, for example, a quarter each; that is, they can respectively access a virtual processor, a virtual memory, and a virtual I/O device to execute the guest operating system's tasks. In this mode, guest operating system 1, guest operating system 2, guest operating system 3, and guest operating system 4 can operate independently.
  • the kernel space 804 carries the kernel virtual machine 820 and the chip driver 822.
  • the kernel virtual machine 820 is matched with QEMU, and is mainly responsible for the virtualization of the kernel space 804 and the system on chip 806, so that each guest operating system can obtain its own address space when accessing the system on chip 806.
  • the space on the system-on-chip 806 mapped to the guest operating system is actually a virtual component mapped to this process.
  • QEMU performs kernel settings through the system call interface provided by the kernel virtual machine 820.
  • QEMU uses the virtualization function of the kernel virtual machine 820 to provide hardware virtualization for its own virtual machines, thereby improving their performance.
  • Since the kernel virtual machine 820 must be managed from user space 802, a tool running in user space 802, namely QEMU, is required.
  • the chip driver 822 is used to drive the physical functions 826.
  • The guest operating systems in user space 802 do not access the system-on-chip 806 through the hardware monitor tool 808 and the chip driver 822; therefore, guest operating system 1, guest operating system 2, guest operating system 3, and guest operating system 4 are each configured with a kernel space 824 for loading the chip driver 822, so that each guest operating system can still drive the system-on-chip 806.
  • the system-on-chip 806 implements virtualization through the SR-IOV technology.
  • the SR-IOV technology can virtualize the components of the system-on-chip 806. In this way, each virtual component has its own corresponding uniquely accessible resource.
  • the system-on-chip 806 of this embodiment includes hardware and firmware.
  • the hardware includes a read-only memory ROM (not shown in the figure) for storing firmware
  • the firmware includes a physical function 826 for supporting or cooperating with the PCIe function of SR-IOV.
  • the physical function 826 has the right to fully configure PCIe resources.
  • The physical function 826 virtualizes a plurality of virtual functions 828; in this embodiment there are four virtual functions 828.
  • the virtual function 828 is a lightweight PCIe function, managed by the physical function 826, and can share PCIe physical resources with the physical function 826 and other virtual functions 828 associated with the same physical function 826.
  • the virtual function 828 is only allowed to control the resources that the physical function 826 allocates to itself.
  • each virtual function 828 can access its own PCIe configuration space through its own bus, device, and function number.
  • Each virtual function 828 has a memory space for mapping its register set. The virtual function 828 driver operates on the register set to enable its function, and directly assigns it to the corresponding user virtual machine 810. Although it is virtual, it will make the user virtual machine 810 think that it is an actual PCIe device.
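The statement that each virtual function reaches its own PCIe configuration space through its own bus, device, and function number follows from the SR-IOV routing-ID scheme: per the PCIe SR-IOV specification, VF n (n starting at 1) gets routing ID = PF routing ID + First VF Offset + (n - 1) * VF Stride. A sketch with illustrative offset/stride values (not taken from any real device):

```python
# Sketch: deriving a VF's bus/device/function number from its PF's,
# using the SR-IOV First VF Offset / VF Stride formula.
def vf_routing_id(pf_bus, pf_dev, pf_fn, first_offset, stride, n):
    pf_rid = (pf_bus << 8) | (pf_dev << 3) | pf_fn
    rid = pf_rid + first_offset + (n - 1) * stride
    return rid >> 8, (rid >> 3) & 0x1F, rid & 0x7   # bus, device, function

# With offset=1 and stride=1, VF1..VF4 of PF 3b:00.0 occupy
# consecutive function numbers after the PF.
assert vf_routing_id(0x3B, 0, 0, 1, 1, 1) == (0x3B, 0, 1)
assert vf_routing_id(0x3B, 0, 0, 1, 1, 4) == (0x3B, 0, 4)
```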
  • the hardware of the system-on-chip 806 also includes a computing device 830, a video codec device 832, a JPEG codec device 834, a storage device 836, and PCIe 838.
  • the computing device 830 is an intelligent processing device IPU for performing the convolution calculation of a neural network;
  • the video coding and decoding device 832 is used for coding and decoding video data;
  • the JPEG codec device 834 is used for coding and decoding pictures using the JPEG algorithm;
  • the storage device 836 can be a dynamic random access memory (DRAM) to store data;
  • PCIe 838 is the aforementioned PCIe.
  • PCIe 838 will be virtualized into four virtual interfaces 840, the virtual function 828 and the virtual interface 840 have a one-to-one correspondence, that is, the first virtual function is connected to the first virtual interface, the second virtual function is connected to the second virtual interface, and so on.
  • the computing device 830 is virtualized into four virtual computing devices 842
  • the video codec device 832 is virtualized into four virtual video codec devices 844
  • the JPEG codec device 834 is virtualized into four virtual JPEG codec devices 846
  • the storage device 836 is virtualized into four virtual storage devices 848.
  • Each guest operating system is configured with a set of virtual kits.
  • Each set of virtual kits includes a user virtual machine 810, a virtual interface 840, a virtual function 828, a virtual computing device 842, a virtual video codec device 844, a virtual JPEG codec device 846, and a virtual storage device 848.
  • Each set of virtual kits runs independently without affecting the others and is used to perform the tasks delivered by the corresponding guest operating system, ensuring that each guest operating system can access its configured virtual computing device 842, virtual video codec device 844, virtual JPEG codec device 846, and virtual storage device 848 through the configured virtual interface 840 and virtual function 828.
  • When performing tasks, each guest operating system responds to different tasks, and the hardware that needs to be accessed may also differ. For example, if a task performs operations such as matrix convolution, the guest operating system accesses the configured virtual computing device 842 through the configured virtual interface 840 and virtual function 828; if a task performs video encoding and decoding, it accesses the configured virtual video codec device 844; if a task performs JPEG encoding and decoding, it accesses the configured virtual JPEG codec device 846; and if a task reads or writes data, it accesses the configured virtual storage device 848, in each case through the configured virtual interface 840 and virtual function 828.
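The task-to-device routing described above can be sketched as a simple dispatch table; the device names below are placeholders standing in for the virtual computing, video codec, JPEG codec, and storage devices:

```python
# Sketch: routing a guest task to the virtual device configured for it.
DISPATCH = {
    "convolution": "virtual_computing_device",
    "video_codec": "virtual_video_codec_device",
    "jpeg_codec":  "virtual_jpeg_codec_device",
    "read_write":  "virtual_storage_device",
}

def dispatch(task_type, vf_id):
    """Return the (virtual function, device) pair that serves a task."""
    device = DISPATCH.get(task_type)
    if device is None:
        raise ValueError(f"unknown task type: {task_type}")
    return vf_id, device

assert dispatch("convolution", 0) == (0, "virtual_computing_device")
```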
  • The above describes a multi-core processor-based virtualization method that uses a virtual machine; alternatively, the docker-container method can be used.
  • The present disclosure also provides a multi-core processor-based virtualization method, wherein the multi-core processor includes multiple processing cores, and the method includes: in operation 2-S1210, dividing the multi-core processor into a plurality of virtual functions, the plurality of virtual functions sharing the plurality of processing cores; and in operation 2-S1220, mapping the virtual functions to containers.
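The two operations can be sketched as follows; the core counts and container names are illustrative, and the one-to-one VF-to-container mapping is only one of the possible correspondences the text allows:

```python
# Sketch of operations 2-S1210 and 2-S1220: divide the multi-core
# processor into virtual functions, then map virtual functions to
# containers.
def divide_into_vfs(num_cores, num_vfs):
    """Operation 2-S1210: assign each core to a virtual function."""
    return {core: core * num_vfs // num_cores for core in range(num_cores)}

def map_vfs_to_containers(num_vfs):
    """Operation 2-S1220: here, a simple one-to-one VF -> container map."""
    return {vf: f"container{vf}" for vf in range(num_vfs)}

cores = divide_into_vfs(num_cores=8, num_vfs=4)
containers = map_vfs_to_containers(4)
assert cores[0] == 0 and cores[7] == 3          # 2 cores per VF
assert containers[2] == "container2"
```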
  • The virtualization system includes: a multi-core processor including a plurality of processing cores; a plurality of virtual functions sharing the plurality of processing cores; and a container corresponding to a virtual function.
  • virtual functions can correspond to containers.
  • virtual function VF0 corresponds to container 0
  • virtual function VF1 corresponds to container 1
  • virtual function VF2 corresponds to container 2
  • virtual function VF3 corresponds to container 3. It should be understood that this correspondence relationship is only an example, and other correspondence relationships may also be used in the present disclosure to facilitate the deployment of the system. This will be described in more detail later.
  • Although 4 virtual functions and 4 containers are described in Fig. 2-13, their numbers may be smaller or larger.
  • the container contains the hardware resources and software resources required to execute tasks (for example, task 0-task 3), and they can run independently of each other without interfering with each other.
  • Because the technical solution of the present disclosure adopts independently running containers, there is no head-of-line blocking between containers, no noise impact from adjacent containers, and no context-switching overhead.
  • a certain number of processing cores constitute a processing cluster, so multiple virtual functions share one or more processor clusters.
  • Figures 2-5 show schematic diagrams of the correspondence between virtual functions and processing clusters according to an embodiment of the present disclosure. It should be understood that although Figures 2-5 describe four processing clusters (processing cluster 0-processing cluster 3) as an example, the processing clusters can also be any other number.
  • processing cluster 0, processing cluster 1, processing cluster 2, and processing cluster 3 correspond to virtual function 0, that is, the multi-core processor is divided into one virtual function.
  • processing cluster 0, processing cluster 1, and processing cluster 2 correspond to virtual function 0, and processing cluster 3 corresponds to virtual function 1; that is, the multi-core processor is divided into two virtual functions, and virtual function 0 has stronger processing capability than virtual function 1.
  • processing cluster 0 and processing cluster 1 correspond to virtual function 0
  • processing cluster 2 and processing cluster 3 correspond to virtual function 1; that is, the multi-core processor is divided into two virtual functions, and virtual function 0 and virtual function 1 have equivalent processing capabilities.
  • processing cluster 0 and processing cluster 1 correspond to virtual function 0
  • processing cluster 2 corresponds to virtual function 1
  • processing cluster 3 corresponds to virtual function 2; that is, the multi-core processor is divided into three virtual functions, where virtual function 0 has stronger processing power than virtual function 1 and virtual function 2, and virtual function 1 and virtual function 2 have equivalent processing power.
  • processing cluster 0 corresponds to virtual function 0
  • processing cluster 1 corresponds to virtual function 1
  • processing cluster 2 corresponds to virtual function 2
  • processing cluster 3 corresponds to virtual function 3.
  • the four virtual functions have equivalent processing capabilities.
  • processing cluster 0 corresponds to virtual function 0
  • processing cluster 1, processing cluster 2, and processing cluster 3 correspond to virtual function 1.
  • virtual function 0 has weaker processing power than virtual function 1.
  • This example is equivalent to example 2.
  • processing cluster 0 corresponds to virtual function 0
  • processing cluster 1 corresponds to virtual function 0
  • processing cluster 2 corresponds to virtual function 1
  • processing cluster 3 corresponds to virtual function 1
  • virtual function 0 and virtual function 1 have the same processing capacity.
  • This example is equivalent to example 3.
  • processing cluster 0 corresponds to virtual function 0
  • processing cluster 1 corresponds to virtual function 0
  • processing cluster 2 corresponds to virtual function 1
  • processing cluster 3 corresponds to virtual function 2.
  • virtual function 0 has stronger processing capabilities
  • virtual function 1 and virtual function 2 have equivalent processing capabilities.
  • This example is equivalent to Example 4.
  • multiple virtual functions can share hardware resources.
  • the hardware resources can be processing cores, or memory (for example, DDR), buses, encoders/decoders, video/audio drivers, interface units, and so on.
  • As for PCIe board card resources, it includes an AI computing unit (IPU), a video codec unit (VPU), a JPEG codec unit (JPU), and memory.
  • Figures 2-6a, 2-6b, and 2-6c exemplarily show the resource occupation of the PCIe card when it is divided into 1, 2, and 4 virtual functions.
  • the above-mentioned multi-core processor may be a computing device with multiple computing cores, such as JPU and VPU.
  • When there is one virtual function, the virtual function VF0 occupies all resources, that is, all computing cores, all channels, all VPUs, and all JPUs.
  • When there are two virtual functions, virtual function VF0 and virtual function VF1 each use half of the resources; that is, VF0 occupies half of the computing cores and VF1 occupies the other half.
  • VF0 can occupy channel 0 and channel 1
  • VF1 can occupy channel 2 and channel 3.
  • As for VPUs and JPUs, VF0 can occupy VPU0 and VPU1 while VF1 can occupy VPU2 and VPU3; likewise, VF0 can occupy JPU0 and JPU1, and VF1 can occupy JPU2 and JPU3.
  • When there are four virtual functions, the virtual functions VF0-VF3 each occupy 1/4 of the computing cores.
  • the virtual functions VF0-VF3 can occupy channel 0-channel 3 respectively; the virtual functions VF0-VF3 can occupy VPU0-VPU3 respectively; the virtual functions VF0-VF3 respectively Can occupy JPU0-JPU3.
  • FIGS. 2-14 show a schematic block diagram of a virtualization system according to another embodiment of the present disclosure.
  • the virtualization system of the present disclosure further includes: a common driver, and the multiple virtual functions are driven by the common driver.
  • the driver may be common to all virtual functions, and it may be a program installed in the operating system.
  • the driver may, for example, establish a corresponding node for each virtual function VF, and the node may be a file stored in a certain directory (for example, a dev directory) for other applications to run or call.
  • the name of the file can vary from manufacturer to manufacturer.
  • Each container can contain one or more nodes, which means that each container can correspond to or contain one or more virtual functions.
  • each container may correspond to or contain a different number of nodes, so that the configuration of the container will be more flexible and the deployment will be more convenient.
  • Since the computing power of each virtual function may differ, the system can be designed very flexibly according to requirements.
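Because a container may hold one or more VF nodes, and VFs may themselves differ in computing power (cluster counts), per-container capability can be tuned flexibly. A minimal sketch, with illustrative VF strengths and container names:

```python
# Sketch: a container corresponds to one or more VF nodes, so its
# aggregate computing power is the sum of the clusters of its VFs.
VF_CLUSTERS = {"vf0": 2, "vf1": 1, "vf2": 1}   # clusters per VF (illustrative)

def container_power(assignment):
    """assignment: container -> list of VF nodes it contains."""
    return {c: sum(VF_CLUSTERS[vf] for vf in vfs) for c, vfs in assignment.items()}

power = container_power({"heavy": ["vf0", "vf1"], "light": ["vf2"]})
assert power == {"heavy": 3, "light": 1}
```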
  • The method of the present disclosure may further include establishing a one-to-one corresponding image for each container, and the image can communicate with the container.
  • the above-mentioned image can be established through docker-container technology.
  • the image can be remotely installed on the user side, and the user can run or call the container through the image, and then call the multi-core processor and other related resources.
  • Figure 2-11a and Figure 2-11b show a schematic diagram of the comparison between the virtual machine mode and the Docker-container mode.
  • In the virtual machine mode, the host passes the PCIe device through (passthrough) to the guest.
  • The guest includes a driver and a directory; therefore, each guest needs to load the driver by itself and create a node, that is, a character device, under the guest's directory.
  • In the docker-container mode, the driver and directory are in the host, so only the host needs to load the driver, and the driver is common to all virtual functions. The host's driver creates a node, that is, a character device, in the host directory, and then passes the device to the image, that is, Docker. Therefore, compared with the virtual machine mode, the docker-container mode of the present disclosure does not require each client to install and load a driver, thereby simplifying system setup and facilitating use.
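Passing a host-created VF character device into a container can be done with docker's real `--device` flag. The sketch below only builds the command line; the node path `/dev/vf0` and the image name are placeholders, since node names vary by manufacturer as noted above:

```python
# Sketch: in the docker-container mode the host loads the common driver
# once, then maps per-VF character devices into containers via
# `docker run --device`. Nothing is executed here; we only construct
# the command list.
def docker_run_cmd(image, device_nodes):
    cmd = ["docker", "run"]
    for node in device_nodes:
        cmd += ["--device", node]
    cmd.append(image)
    return cmd

cmd = docker_run_cmd("ai-runtime:latest", ["/dev/vf0", "/dev/vf1"])
assert cmd == ["docker", "run", "--device", "/dev/vf0",
               "--device", "/dev/vf1", "ai-runtime:latest"]
```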
  • The lightweight Docker-based virtualization solution no longer takes the entire card as the granularity; instead, multiple containers share one or more physical accelerator cards at a finer granularity.
  • Each container can use one or more VFs, and the VFs of different containers work independently, safely, and separately from each other.
  • the hardware virtualization solution using SR-IOV can simultaneously support the use mode of Docker and can generate multiple VFs on the physical machine at the same time.
  • The system administrator can assign different VFs to different containers according to requirements. VFs belonging to different containers work independently without interfering with each other, and VFs have the same robustness and security isolation as exists between PFs.
  • Docker has the advantages of faster startup, lower resource usage, and higher system utilization, making development, testing, and deployment easier.
  • the present disclosure also provides a multi-core processor including a plurality of processing cores, wherein the multi-core processor is divided into a plurality of virtual functions, and each of the virtual functions corresponds to one or more processing cores.
  • the present disclosure also discloses an electronic device, including the above-mentioned virtualization system or the above-mentioned multi-core processor.
  • The electronic device may be a host; that is, the technical solution of the present disclosure is implemented in the host and communicates with an external image (docker).
  • the SR-IOV function has better tenant isolation and application hot migration characteristics, which can provide cloud service providers with safe and high-quality AI computing resources to fully protect users' investment in the AI field.
  • the solution of the present disclosure aims at a pain point of users, that is, how to efficiently use AI computing resources.
  • Chips, devices, and electronic devices that adopt the solution of the present disclosure support comprehensive AI inference scenario deployment, including diversified artificial intelligence applications such as vision, speech, and natural language processing.
  • the technical solution of the present disclosure supports diversified deployment scenarios such as data centers, professional scenarios, and even desktops.
  • Cloud-oriented deployment: In a cloud deployment environment, cloud service providers (CSPs) provide computing, storage, and network resource services to massive numbers of tenants in a cost-effective and highly available manner, and on this basis can offer service levels of up to 99.99% availability. Efficient sharing of resources by the hypervisor and the underlying hardware, as well as mutual isolation of tenants and instances, have become basic demands of AI cloud services.
  • Edge-oriented and end-side application development: The solution of the present disclosure can achieve comprehensive coverage across the three dimensions of cloud, edge, and end. Due to limitations of product form or network conditions, it is often impossible to develop directly on the finally deployed equipment.
  • the solution of the present disclosure supports the adoption of a terminal-cloud integrated development environment to help users quickly implement applications, and helping cloud-side computing resources to be efficiently and reasonably allocated to application development groups is an objective of the present disclosure.
  • the SR-IOV function provided by the present disclosure can make AI cloud, service deployment, and application development more flexible, efficient, and safer.
  • the virtualization technology adopted in the present disclosure allows multiple operating systems and application programs to coexist on a physical computing platform and share the computing resources of the same chip. It provides users with good security and isolation, and also supports highly flexible features such as hot migration. This virtualization technology also helps to increase the density of cloud computing, and also makes the IT asset management of the data center more flexible.
  • the SR-IOV virtualization technology of the present disclosure supports multiple instances running on a cloud server to directly share hardware resources of smart chips.
  • In traditional virtualization systems, a large amount of resource and time consumption occurs at the hypervisor or VMM software level, so the performance advantages of PCIe devices cannot be fully utilized.
  • the value of SR-IOV is to eliminate this software bottleneck and help multiple virtual machines achieve efficient physical resource sharing.
  • The solution of the present disclosure adopts a "non-time-slice-based sharing" method; because there is no performance loss from time-slice context switching, it can fully guarantee the independent service quality of each VF, and the VFs run completely independently without affecting each other.
  • SR-IOV can also avoid the performance overhead caused by time-division multiplexing and switching applications.
  • The service performance of a single VF remains above 91% of the hardware performance. This allows users to make more accurate quality of service (QoS) expectations for each VF when multiple models run in parallel, without having to consider the performance overhead caused by congestion or switching among the models.
  • Virtual functions based on SR-IOV can also provide better tenant isolation.
  • Virtualization technology is widely adopted by data centers, not only because it provides the ability to share resources (offering better density performance), but also because, compared with other technologies (such as docker), virtualization provides better isolation and security.
  • the virtualization technology based on SR-IOV in this disclosure can help cloud users achieve better isolation characteristics. The specific advantages are as follows:
  • first, the resources are independent and do not interfere with each other, which guarantees quality of service (QoS); second, there is no head-of-line blocking when multitasking; third, each VF has independent memory resources and the VFs are invisible to one another; finally, deployment is relatively simple, and open-source software components do not need to be modified.
  • the SR-IOV flat technology for Docker containers in the present disclosure (for example, as shown in Figures 2-12 to 2-14) can provide a more efficient deployment method.
  • the technology of the present disclosure also provides an SR-IOV-based virtualization extension (SR-IOV flat mode) for docker-container, which allows multiple containers to share one board card. At the same time, it provides a management plug-in based on kubernetes. This feature provides a lighter deployment method for data centers whose demand for isolation and security is not that high.
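As a rough illustration of how a host might expose VFs for containers to share a board, the sketch below uses the standard Linux SR-IOV sysfs attributes (`sriov_totalvfs`, `sriov_numvfs`). This is only a sketch under stated assumptions: the PF sysfs directory path varies by device, and vendor tooling or a kubernetes device plug-in would normally drive this step.

```python
from pathlib import Path

def enable_vfs(pf_dir: str, num_vfs: int) -> int:
    """Enable SR-IOV virtual functions on a physical function.

    pf_dir is the PF's sysfs directory (e.g. /sys/bus/pci/devices/<bdf>).
    The kernel requires writing 0 before setting a new non-zero VF count.
    Returns the number of VFs enabled.
    """
    pf = Path(pf_dir)
    total = int(pf.joinpath("sriov_totalvfs").read_text())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"VF count must be between 0 and {total}")
    numvfs = pf.joinpath("sriov_numvfs")
    numvfs.write_text("0")          # reset first, as the kernel expects
    numvfs.write_text(str(num_vfs))
    return num_vfs
```

Once the VFs exist, each appears as its own PCIe function that a container runtime can be given access to independently.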
  • the SR-IOV-Flat technology used in this disclosure has obvious advantages in isolation and QoS.
  • electronic equipment or devices can also include data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, cameras, video cameras, servers, cloud servers, projectors, watches, earphones, mobile storage, wearable devices, vehicles, household appliances, and/or medical equipment.
  • the transportation means include airplanes, ships, and/or vehicles;
  • the household appliances include TVs, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • the medical equipment includes nuclear magnetic resonance instruments, B-mode ultrasound instruments, and/or electrocardiographs.
  • the present disclosure also provides a computer-readable storage medium having computer program code stored thereon, and when the computer program code is run by a processor, the method described above is executed.
  • Figure 2-9 shows a combined processing device 900, which includes a computing device 902 (for example, the computing device 830 described in Figure 2-8), a universal interconnection interface 904, and other processing devices 906.
  • the computing device according to the present disclosure interacts with other processing devices to jointly complete operations specified by the user.
  • Figures 2-9 are schematic diagrams of the combined processing device.
  • Other processing devices include one or more types of general-purpose/special-purpose processors, such as a central processing unit (CPU), a graphics processing unit (GPU), or a neural network processor.
  • the number of processors included in other processing devices is not limited.
  • Other processing devices serve as the interface between the machine learning computing device and external data and control; they perform data transfer and basic control such as starting and stopping the machine learning computing device, and they can also cooperate with the machine learning computing device to complete computing tasks.
  • the universal interconnection interface is used to transmit data and control commands between computing devices (including, for example, machine learning computing devices) and other processing devices.
  • the computing device obtains the required input data from other processing devices and writes it to a storage device on the computing device chip; it can obtain control instructions from other processing devices and write them to a control buffer on the computing device chip; and it can also read the data in the storage module of the computing device and transmit it to other processing devices.
  • the structure may further include a storage device 908, which is respectively connected to the computing device and the other processing device.
  • the storage device is used to store the data in the computing device and the other processing device, and is especially suitable for data that cannot be fully stored in the internal storage of the computing device or other processing device.
  • the combined processing device can be used as an SoC (system on chip) for mobile phones, robots, drones, video surveillance equipment, and other devices, effectively reducing the core area of the control part, increasing processing speed, and reducing overall power consumption.
  • in this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, a monitor, a mouse, a keyboard, a network card, or a wifi interface.
  • the present disclosure also discloses a chip, which includes the aforementioned computing device or combined processing device.
  • the present disclosure also discloses a board card, which includes the above-mentioned chip.
  • the board may include other supporting components in addition to the above-mentioned chip 1002.
  • the supporting components include, but are not limited to: a storage device 1004, an interface device 1006, and a control device 1008.
  • the storage device is connected to the chip in the chip packaging structure through a bus for storing data.
  • the storage device may include multiple groups of storage units 1010. Each group of storage units is connected to the chip by a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
  • the storage device may include 4 groups of storage units. Each group of storage units may include a plurality of DDR4 granules (chips). In an embodiment, the chip may include four 72-bit DDR4 controllers; of the 72 bits, 64 bits are used for data transmission and 8 bits for ECC checking. In one embodiment, each group of storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transmit data twice in one clock cycle. A controller for controlling the DDR is provided in the chip to control the data transmission and data storage of each storage unit.
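The 72-bit controller split above (64 data bits plus 8 ECC bits) implies a fixed check-bit overhead that can be computed directly. The function below is only an arithmetic illustration of that split, not part of the disclosed design.

```python
def ecc_overhead(total_bits: int = 72, data_bits: int = 64) -> float:
    """Fraction of a DDR4 controller's width used for ECC checking,
    per the 72-bit layout above (64 bits data + 8 bits ECC)."""
    if data_bits > total_bits:
        raise ValueError("data width cannot exceed total width")
    return (total_bits - data_bits) / total_bits
```

For the 72-bit layout, 8/72 (about 11%) of the controller width carries check bits rather than payload data.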
  • the interface device is electrically connected with the chip in the chip packaging structure.
  • the interface device is used to implement data transmission between the chip and an external device 1012 (for example, a server or a computer).
  • the interface device may be a standard PCIE interface.
  • the data to be processed is transferred from the server to the chip through a standard PCIE interface to realize data transfer.
  • the interface device may also be other interfaces.
  • the present disclosure does not limit the specific form of the other interfaces mentioned above, as long as the interface unit can realize the transfer function.
  • the calculation result of the chip is still transmitted by the interface device back to an external device (such as a server).
  • the control device is electrically connected with the chip.
  • the control device is used to monitor the state of the chip.
  • the chip and the control device may be electrically connected through an SPI interface.
  • the control device may include a microcontroller (Micro Controller Unit, MCU).
  • the chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. Therefore, the chip can be in different working states such as heavy-load and light-load.
  • the control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, optical, acoustic, magnetic or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or in the form of software program modules.
  • the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer readable memory.
  • the computer software product is stored in a memory and includes several instructions to enable a computer device (which can be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned memory includes: a USB flash drive, read-only memory (ROM), random access memory (RAM), a mobile hard disk, a magnetic disk, an optical disk, and other media that can store program code.
  • Clause B1. A virtualization method based on a multi-core processor, wherein the multi-core processor includes a plurality of processing cores, and the method includes: dividing the multi-core processor into a plurality of virtual functions, each of the virtual functions corresponding to one or more processing cores; and mapping the virtual function to a virtual machine.
  • Clause B2 The method according to clause B1, wherein there are multiple virtual machines, and multiple virtual machines can run independently.
  • Clause B3 The method according to Clause B1 or B2, wherein a certain number of processing cores constitute a processing cluster, and each virtual function corresponds to one or more processing clusters.
  • Clause B4 The method according to any one of clauses B1-3, wherein one virtual function is mapped to one virtual machine; or multiple virtual functions are mapped to one virtual machine.
  • Clause B5. The method according to any one of clauses B1-4, wherein each virtual function has independent hardware resources.
  • Clause B6 The method of any one of clauses B1-5, wherein the plurality of virtual functions are driven by different drivers.
  • Clause B7 The method according to clause B6, wherein a corresponding node is established for a corresponding virtual function through the driver.
  • Clause B8. A virtualization method based on a multi-core processor, wherein the multi-core processor includes a plurality of processing cores, and the method includes: dividing the multi-core processor into a plurality of virtual functions, the plurality of virtual functions sharing the plurality of processing cores; and mapping the virtual function to a container.
  • Clause B9 The method according to clause B8, wherein there are multiple containers, and multiple containers can run independently.
  • Clause B10 The method according to clause B8 or 9, wherein a certain number of processing cores constitute a processing cluster, and multiple virtual functions share one or more processing clusters.
  • Clause B11 The method according to any one of clauses B8-10, wherein one virtual function is mapped to one container; or multiple virtual functions are mapped to one container.
  • Clause B12 The method of any of clauses B9-11, wherein the multiple virtual functions are driven by a common driver.
  • Clause B13 The method according to clause B12, wherein a corresponding node is established for each virtual function through the driver, and the container corresponds to one or more nodes.
  • Clause B14 The method of any one of clauses B8-13, further comprising establishing a one-to-one corresponding image for each of the containers, the image being able to communicate with the container.
  • Clause B15. A virtualization system, including:
  • a multi-core processor, the multi-core processor including a plurality of processing cores;
  • a plurality of virtual functions, the plurality of virtual functions sharing the plurality of processing cores; and
  • a virtual machine, the virtual machine corresponding to the virtual function.
  • Clause B16 The virtualization system according to Clause B15, wherein there are multiple virtual machines, and multiple virtual machines can run independently.
  • Clause B17 The virtualization system according to Clause B15 or 16, wherein a certain number of processing cores constitute a processing cluster, and multiple virtual functions share one or more processor clusters.
  • Clause B18 The virtualization system according to any one of clauses B15-17, wherein one virtual function corresponds to one virtual machine; or multiple virtual functions correspond to one virtual machine.
  • Clause B19 The virtualization system according to any one of clauses B15-18, wherein each virtual function has independent hardware resources.
  • Clause B20 The virtualization system according to any one of clauses B15-19, further comprising: a plurality of drivers, and the plurality of virtual functions are driven by different drivers.
  • Clause B21 The virtualization system according to Clause B20, wherein the driver is configured to establish a corresponding node for a corresponding virtual function.
  • Clause B22. A virtualization system, including:
  • a multi-core processor, the multi-core processor including a plurality of processing cores;
  • a plurality of virtual functions, the plurality of virtual functions sharing the plurality of processing cores; and
  • a container, the container corresponding to the virtual function.
  • Clause B23 The virtualization system according to clause B22, wherein there are multiple containers, and multiple containers can run independently.
  • Clause B24 The virtualization system according to Clause B22 or 23, wherein a certain number of processing cores constitute a processing cluster, and multiple virtual functions share one or more processor clusters.
  • Clause B25 The virtualization system according to any one of clauses B22-24, wherein one virtual function corresponds to one container; or multiple virtual functions correspond to one container.
  • Clause B26 The virtualization system of any one of clauses B22-25, wherein the multiple virtual functions share hardware resources.
  • Clause B27 The virtualization system according to any one of clauses B22-26, further comprising: a common driver, and the plurality of virtual functions are driven by the common driver.
  • Clause B28 The virtualization system according to clause B27, wherein the common driver is configured to establish a corresponding node for each virtual function, and the container corresponds to one or more nodes.
  • Clause B29 The virtualization system according to any one of clauses B22-28, further comprising an image, the image has a one-to-one correspondence with the container and can communicate with the container.
  • Clause B30. A multi-core processor including multiple processing cores, wherein:
  • the multi-core processor is divided into multiple virtual functions, and the multiple virtual functions share one or more processing cores.
  • Clause B31 An electronic device comprising a virtualization system as described in any one of clauses B15-29 or a multi-core processor as described in clause B30.
  • Clause B32 A computer-readable storage medium with computer program code stored thereon, and when the computer program code is run by a processor, the method described in any one of clauses B1-14 is executed. 202010358635.4

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

A virtualization method, device, board card, and computer-readable storage medium. The specific solution is: dividing a multi-core processor into multiple virtual functions, each virtual function corresponding to one or more processing cores; and mapping the virtual functions to containers. This solves the defects of existing virtual machine solutions, such as low computing efficiency, head-of-line (HOL) blocking, large adjacent noise, and difficulty in scaling.

Description

A virtualization method, device, board card, and computer-readable storage medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202010131483.4, filed on February 28, 2020 and entitled "Virtualization method and system based on a multi-core processor, multi-core processor, and electronic device", and to Chinese patent application No. 202010358635.4, filed on April 29, 2020 and entitled "Virtualization method and system based on a multi-core processor, multi-core processor, and electronic device", the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the field of artificial intelligence, and more specifically to processor virtualization technology.
Background
In computing, virtualization is a resource management technology: various computer resources, such as servers, networks, memory, and storage, are abstracted and transformed before being presented, so that users can apply these resources in a better way than their original configuration allows.
Figure 1-1 shows a schematic block diagram of virtualization implemented through time slicing technology.
As shown in Figure 1-1, there are four virtual machines VM0-VM3, each executing its own tasks. After passing through the time slice manager, these tasks form time slices and are ordered in time. The computing engine processes the different tasks (time-shared tasks) according to the time slices. In this mode, while virtual machine VM1 is working, the other virtual machines cannot work and are kept waiting. When the time slices are small, users barely perceive the delay, but if some virtual machine's task occupies a large amount of time (for example, VM1 as shown in Figure 1-1), the other users will perceive an obvious delay, which degrades the user experience.
In addition, in the prior art, the computing engine is common to the different virtual machines. Once one virtual machine causes a problem in the computing engine, all virtual machines are paralyzed, affecting all users.
As a result, existing virtual machine solutions suffer from defects such as low computing efficiency, head-of-line (HOL) blocking, large adjacent noise, and difficulty in scaling.
Summary of the invention
202010131483.4 The purpose of the present disclosure is to provide a virtualization method and system based on a multi-core processor that can overcome at least one of the defects in the prior art.
According to the first aspect of the present disclosure, a virtualization method based on a multi-core processor is provided, wherein the multi-core processor includes a plurality of processing cores, and the method includes: dividing the multi-core processor into a plurality of virtual functions, each of the virtual functions corresponding to one or more processing cores; and mapping the virtual functions to a container.
According to the second aspect of the present disclosure, a virtualization system is provided, including: a multi-core processor, the multi-core processor including a plurality of processing cores; a plurality of virtual functions, each of the virtual functions corresponding to one or more processing cores; and a container, the container corresponding to the virtual functions.
According to the third aspect of the present disclosure, a multi-core processor including a plurality of processing cores is provided, wherein the multi-core processor is divided into a plurality of virtual functions, each of the virtual functions corresponding to one or more processing cores.
According to the fourth aspect of the present disclosure, an electronic device is provided, including the virtualization system described above or the multi-core processor described above.
According to the fifth aspect of the present disclosure, a computer-readable storage medium is provided, on which computer program code is stored; when the computer program code is run by a processor, the method described above is executed.
The present disclosure can achieve at least one of the following technical effects:
higher quality of service (QoS);
no head-of-line blocking;
no adjacent-noise interference;
no context-switching overhead;
easy to scale and deploy. 202010131483.4
202010358635.4 The purpose of the present disclosure is to provide a virtualization method and system based on a multi-core processor that can overcome at least one of the defects in the prior art.
According to the sixth aspect of the present disclosure, a virtualization method based on a multi-core processor is provided, wherein the multi-core processor includes a plurality of processing cores, and the method includes: dividing the multi-core processor into a plurality of virtual functions, each of the virtual functions corresponding to one or more processing cores; and mapping the virtual functions to a virtual machine.
According to the seventh aspect of the present disclosure, a virtualization method based on a multi-core processor is provided, wherein the multi-core processor includes a plurality of processing cores, and the method includes: dividing the multi-core processor into a plurality of virtual functions, the plurality of virtual functions sharing the plurality of processing cores; and mapping the virtual functions to a container.
According to the eighth aspect of the present disclosure, a virtualization system is provided, including: a multi-core processor, the multi-core processor including a plurality of processing cores; a plurality of virtual functions, the plurality of virtual functions sharing the plurality of processing cores; and a virtual machine, the virtual machine corresponding to the virtual functions.
According to the ninth aspect of the present disclosure, a virtualization system is provided, including: a multi-core processor, the multi-core processor including a plurality of processing cores; a plurality of virtual functions, the plurality of virtual functions sharing the plurality of processing cores; and a container, the container corresponding to the virtual functions.
According to the tenth aspect of the present disclosure, a multi-core processor including a plurality of processing cores is provided, wherein the multi-core processor is divided into a plurality of virtual functions, the plurality of virtual functions sharing one or more processing cores.
According to the eleventh aspect of the present disclosure, an electronic device is provided, including the virtualization system described above or the multi-core processor described above.
According to the twelfth aspect of the present disclosure, a computer-readable storage medium is provided, on which computer program code is stored; when the computer program code is run by a processor, the method described above is executed.
The present disclosure can achieve at least one of the following technical effects:
higher quality of service (QoS);
no head-of-line blocking;
no adjacent-noise interference;
no context-switching overhead;
easy to scale and deploy. 202010358635.4
Brief description of the drawings
The above and other objects, features, and advantages of the exemplary embodiments of the present disclosure will become readily understood by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present disclosure are shown by way of example and not limitation, and the same or corresponding reference numerals indicate the same or corresponding parts, in which:
Figure 1-1 shows a schematic block diagram of virtualization implemented through time slicing technology;
Figure 1-2a shows a schematic diagram of the internal structure of a processing cluster to which the method of the present disclosure can be applied;
Figure 1-2b shows a schematic structural diagram of an artificial intelligence processor to which the method of the present disclosure can be applied;
Figure 1-3 shows a virtualization method based on a multi-core processor according to the first aspect of the present disclosure;
Figure 1-4 shows a virtualization system according to an embodiment of the present disclosure;
Figure 1-5 shows a schematic diagram of the correspondence between virtual functions and processing clusters according to an embodiment of the present disclosure;
Figures 1-6a, 1-6b, and 1-6c exemplarily show the resource occupation of a PCIe card when it is divided into 1, 2, and 4 virtual functions;
Figure 1-7 shows a schematic block diagram of a virtualization system according to yet another embodiment of the present disclosure;
Figure 1-8 exemplarily shows a schematic structural diagram of a virtualization system;
Figure 1-9 shows a schematic diagram of a combined processing device according to the present disclosure;
Figure 1-10 shows a schematic block diagram of a board card according to the present disclosure;
Figures 1-11a and 1-11b show a comparison between the virtual machine mode and the Docker mode;
Figure 2-1 shows a schematic block diagram of virtualization implemented through time slicing technology;
Figure 2-2a shows a schematic diagram of the internal structure of a processing cluster to which the method of the present disclosure can be applied;
Figure 2-2b shows a schematic structural diagram of an artificial intelligence processor to which the method of the present disclosure can be applied;
Figure 2-3 shows a virtualization method based on a multi-core processor according to the first aspect of the present disclosure;
Figure 2-4 shows a virtualization system according to an embodiment of the present disclosure;
Figure 2-5 shows a schematic diagram of the correspondence between virtual functions and processing clusters according to an embodiment of the present disclosure;
Figures 2-6a, 2-6b, and 2-6c exemplarily show the resource occupation of a PCIe card when it is divided into 1, 2, and 4 virtual functions;
Figure 2-7 shows a schematic block diagram of a virtualization system according to yet another embodiment of the present disclosure;
Figure 2-8 exemplarily shows a schematic structural diagram of a virtualization system;
Figure 2-9 shows a schematic diagram of a combined processing device according to the present disclosure;
Figure 2-10 shows a schematic block diagram of a board card according to the present disclosure;
Figures 2-11a and 2-11b show a comparison between the virtual machine mode and the Docker mode;
Figure 2-12 shows a virtualization method based on a multi-core processor according to the first aspect of the present disclosure;
Figure 2-13 shows a virtualization system according to an embodiment of the present disclosure; and
Figure 2-14 shows a schematic block diagram of a virtualization system according to an embodiment of the present disclosure.
Detailed description
202010131483.4 The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, not all of them. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of the present disclosure.
It should be understood that the terms "first", "second", "third", and "fourth" in the claims, specification, and drawings of the present disclosure are used to distinguish different objects, not to describe a specific order. The terms "comprising" and "including" used in the specification and claims of the present disclosure indicate the presence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terms used in this specification are only for the purpose of describing specific embodiments and are not intended to limit the present disclosure. As used in the specification and claims of the present disclosure, unless the context clearly indicates otherwise, the singular forms "a", "an", and "the" are intended to include the plural forms. It should be further understood that the term "and/or" used in the specification and claims of the present disclosure refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the claims, the term "if" can be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" can be interpreted as "once determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]", depending on the context.
Further, in the specification and claims of the present disclosure, a correspondence between two parts can be understood as the existence of a connection, response, or matching relationship between them.
Virtualization is a technology that virtualizes one computer device into multiple virtual machines. When multiple virtual machines run simultaneously on one computer, each virtual machine can run the same or different operating systems, and the applications running on the operating systems can run in independent spaces without affecting one another, thereby significantly improving the computer's working efficiency.
Virtualization technology is different from multitasking or hyper-threading. Multitasking means that multiple programs run simultaneously in one operating system, whereas with virtualization multiple operating systems can run simultaneously, each with multiple programs running in it, and each operating system runs on a virtual machine. Hyper-threading merely lets a single processor simulate two processors to balance program performance; the two simulated processors cannot be separated and can only work together, whereas in virtualization the virtual processors operate independently.
Virtualization usually uses software to redefine and partition the physical resources of a computer, enabling dynamic allocation, flexible scheduling, and cross-domain sharing of computer resources, thereby improving resource utilization.
Figure 1-2a shows a schematic diagram of the internal structure of a processing cluster to which the method of the present disclosure can be applied.
Artificial intelligence (AI) chips accelerate data computing and reduce memory-access latency. An AI chip adopts a multi-core processor architecture and adds storage unit cores (also called on-chip storage units) to accelerate data reads, which resolves the memory-access bottleneck between the AI chip's processing cores and the DDR (also called off-chip storage units). It provides users with stronger computing power in scenarios such as deep learning and network computing.
An AI chip may, for example, have 16 processing cores for executing computing tasks. Every 4 processing cores form a processing cluster, giving 4 processing clusters in total. Each processing cluster contains a storage unit core, which is mainly used for data exchange between the shared storage unit inside the processing cluster and the processing cores, and for data exchange between processing clusters. When a storage core and a processing core access the DDR simultaneously, arbitration by a multiplexer ensures that only one group of buses accesses the DDR.
Figure 1-2b shows a schematic structural diagram of an artificial intelligence processor to which the method of the present disclosure can be applied.
The DDR of the AI chip adopts a Non-Uniform Memory Access (NUMA) architecture. Each processing cluster can access different DDR channels through NOC0, but the latency of accessing different DDR channels differs. Each processing cluster corresponds to one DDR channel with the lowest access latency, while access to the other channels has a relatively longer latency. As shown in the processing cluster and DDR structure diagram of Figure 1-2b, processing cluster 0, processing cluster 1, processing cluster 2, and processing cluster 3 have the lowest latency when accessing their corresponding DDR0, DDR1, DDR2, and DDR3, respectively. That is, each processing core accesses the DDR channel with the lowest memory-access latency for its own processing cluster.
Since the memory-access bandwidth inside a processing cluster is higher than the access bandwidth between a processing core and the DDR, the AI chip can use the processing clusters to access the shared storage unit internally, reducing direct DDR accesses by the processing cores and thereby increasing data throughput.
When 4-core parallel computing is required, the storage unit core can broadcast data (through NOC1) from the shared storage unit to the 4 processing cores in the processing cluster simultaneously for computation. Compared with having all processing cores read data through the DDR, this reduces memory-access latency and optimizes computing performance.
If virtualization is carried out in the traditional way, all virtual machines share all four processing clusters; when there are few tasks, some processing clusters sit idle, wasting resources.
The environment in which the technical solution of the present disclosure is applied has been described above; multiple embodiments of the present disclosure will now be described in detail with reference to Figures 1-3 and 1-4.
Figure 1-3 shows a virtualization method based on a multi-core processor, for example an AI processor, according to the first aspect of the present disclosure, wherein the multi-core processor includes a plurality of processing cores. The method includes: in operation 1-S310, dividing the multi-core processor into a plurality of virtual functions, each of the virtual functions corresponding to one or more processing cores; and in operation 1-S320, mapping the virtual functions to containers.
Figure 1-4 shows a virtualization system according to an embodiment of the present disclosure. The virtualization system includes: a multi-core processor, the multi-core processor including a plurality of processing cores; a plurality of virtual functions VF0-VF3, each of the virtual functions corresponding to one or more processing cores; and containers (container 0 to container 3), the containers corresponding to the virtual functions.
The above method and system can be implemented through SR-IOV (Single Root I/O Virtualization) technology. SR-IOV is a hardware-based virtualization solution that provides high performance and scalability. SR-IOV defines a standardized mechanism for multiple virtual machines to share one I/O device, enabling efficient sharing of PCIe (Peripheral Component Interconnect Express) devices among virtual machines with I/O performance similar to that of the native machine.
SR-IOV defines the following two function types:
PF (Physical Function): a PCI function that supports the SR-IOV capability, as defined in the SR-IOV specification. The PF contains the SR-IOV capability structure and is used to manage the SR-IOV functionality. A PF is a full-featured PCIe function that can be discovered, managed, and processed like any other PCIe device. The PF has full configuration resources and can be used to configure or control the PCIe device.
VF (Virtual Function): a function associated with a PF. A VF is a lightweight PCIe function that can share physical resources with the PF and with other VFs of the same PCIe device. A VF only owns the configuration resources for its own behavior.
Each SR-IOV device can have one PF, and each PF can have multiple VFs associated with it. Each VF can have a PCI memory space used to map its register set. The VF device driver operates on the register set to enable its functionality, and the VF appears as an actually existing PCI device. After a VF is created, it can be directly assigned to a guest virtual machine VM. This allows the VFs to share the same physical device and perform data input and output without the overhead of the CPU and the hypervisor software.
It should be understood that the "same physical device" above refers to different hardware resources on the same physical device. For example, the physical device can be a multi-core processor, while the hardware resources can be different processing cores on that physical device.
It can thus be seen that there may be a single virtual function or multiple virtual functions. When there is a single virtual function, all the processing cores in the multi-core processor are divided into that single virtual function; when there are multiple virtual functions, the containers can run independently of one another. Independent running means that each container is isolated from the others, runs without depending on other containers, and is not affected by other containers; moreover, since the isolation in the present disclosure is hardware-based, there is even less interference between them. In addition, independent running can mean that each container uses a different operating system without affecting the others.
A virtual function can perform the same work as the multi-core processor; it is obtained by logically partitioning the multi-core processor. A virtual function may include one or more processing cores; the more processing cores it has, the stronger its computing power. All processing cores may also be divided into one virtual function.
As shown in Figures 1-3 and 1-4, virtual functions can be mapped to containers; for example, virtual function VF0 is mapped to container 0, VF1 to container 1, VF2 to container 2, and VF3 to container 3. It should be understood that this correspondence is only an example; the present disclosure can also adopt other correspondences, making system deployment more convenient, as will be described in more detail later. In addition, although Figure 1-4 depicts four virtual functions and four containers, there may be fewer or more of each.
In the present disclosure, a container holds the hardware and software resources required to execute tasks (for example, task 0 to task 3); the containers can run independently of one another without interference. Compared with prior-art virtualization solutions that use time slicing, since the technical solution of the present disclosure uses independently running containers, there is no head-of-line blocking between containers, no adjacent-noise interference, and no context-switching overhead.
As shown in Figures 1-2a and 1-2b, in a multi-core processor, a specific number of processing cores constitute a processing cluster, so each virtual function can correspond to one or more processing clusters.
Figure 1-5 shows a schematic diagram of the correspondence between virtual functions and processing clusters according to an embodiment of the present disclosure. It should be understood that although Figure 1-5 is described with four processing clusters (processing cluster 0 to processing cluster 3) as an example, any other number of processing clusters is possible.
In example 1 shown in Figure 1-5, processing clusters 0, 1, 2, and 3 correspond to virtual function 0; that is, the multi-core processor is divided into one virtual function.
In example 2 shown in Figure 1-5, processing clusters 0, 1, and 2 correspond to virtual function 0, and processing cluster 3 corresponds to virtual function 1; that is, the multi-core processor is divided into two virtual functions, and virtual function 0 has stronger processing capability than virtual function 1.
In example 3 shown in Figure 1-5, processing clusters 0 and 1 correspond to virtual function 0, and processing clusters 2 and 3 correspond to virtual function 1; that is, the multi-core processor is divided into two virtual functions, and virtual function 0 and virtual function 1 have equal processing capability.
In example 4 shown in Figure 1-5, processing clusters 0 and 1 correspond to virtual function 0, processing cluster 2 corresponds to virtual function 1, and processing cluster 3 corresponds to virtual function 2; that is, the multi-core processor is divided into three virtual functions, virtual function 0 has stronger processing capability than virtual functions 1 and 2, and virtual functions 1 and 2 have equal processing capability.
In example 5 shown in Figure 1-5, processing cluster 0 corresponds to virtual function 0, processing cluster 1 to virtual function 1, processing cluster 2 to virtual function 2, and processing cluster 3 to virtual function 3; these four virtual functions have equal processing capability.
In example 6 shown in Figure 1-5, processing cluster 0 corresponds to virtual function 0, and processing clusters 1, 2, and 3 correspond to virtual function 1; virtual function 0 has weaker processing capability than virtual function 1. This example is equivalent to example 2.
In example 7 shown in Figure 1-5, processing cluster 0 corresponds to virtual function 0, processing cluster 1 to virtual function 1, processing cluster 2 to virtual function 0, and processing cluster 3 to virtual function 1; virtual function 0 and virtual function 1 have equal processing capability. This example is equivalent to example 3.
In example 8 shown in Figure 1-5, processing cluster 0 corresponds to virtual function 0, processing cluster 1 to virtual function 1, processing cluster 2 to virtual function 0, and processing cluster 3 to virtual function 2. Virtual function 0 has stronger processing capability than virtual functions 1 and 2, and virtual functions 1 and 2 have equal processing capability. This example is equivalent to example 4.
It can thus be seen that by mapping different processing clusters to different virtual functions, flexible configuration of the virtual functions can be achieved, so the processing capability of the virtual functions can be configured dynamically according to different needs. Therefore, compared with the prior art, the technical solution of the present disclosure also has the advantages of simple and flexible configuration.
According to yet another embodiment of the present disclosure, each virtual function has independent hardware resources.
The hardware resources mentioned here can be processing cores, or memory (for example, DDR), buses, encoders/decoders, video/audio drivers, interface units, and so on. For example, for a PCIe board card, the resources include the AI computing unit (IPU), the video encoding/decoding unit (VPU), the JPEG encoding/decoding unit (JPU), and memory. The present disclosure does not impose any restriction on the type of hardware resources.
Figures 1-6a, 1-6b, and 1-6c exemplarily show the resource occupation of a PCIe card when it is divided into 1, 2, and 4 virtual functions. It should be noted that the above-mentioned multi-core processor can be a computing device with multiple computing cores of various kinds, such as a JPU or VPU.
As shown in Figure 1-6a, when there is one virtual function, virtual function VF0 exclusively uses all the resources, occupying all the computing cores, all the channels, all the VPUs, and all the JPUs.
As shown in Figure 1-6b, when there are two virtual functions, virtual functions VF0 and VF1 each use half of the resources: VF0 occupies half of the computing cores and VF1 occupies the other half. Assuming four DDR channels, VF0 can occupy channels 0 and 1, and VF1 can occupy channels 2 and 3. Likewise assuming four VPUs and four JPUs, VF0 can occupy VPU0 and VPU1 while VF1 occupies VPU2 and VPU3; VF0 can occupy JPU0 and JPU1 while VF1 occupies JPU2 and JPU3.
As shown in Figure 1-6c, when there are four virtual functions, virtual functions VF0-VF3 each occupy 1/4 of the computing cores. Likewise, assuming four DDR channels, four VPUs, and four JPUs, virtual functions VF0-VF3 can respectively occupy channels 0-3, VPU0-VPU3, and JPU0-JPU3.
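The even splits described for Figures 1-6a to 1-6c follow a single rule: each resource pool is divided equally among the virtual functions. The following is a minimal sketch of that rule (the pool names are illustrative, not fixed by the disclosure):

```python
def split_resources(num_vfs: int, pools: dict[str, int]) -> list[dict[str, list[int]]]:
    """Evenly divide each resource pool (DDR channels, VPUs, JPUs, ...)
    among num_vfs virtual functions, as in the 1/2/4-VF layouts above.

    Each pool size must be divisible by num_vfs so every VF gets an
    equal, dedicated share (hardware isolation, nothing is shared).
    """
    for name, count in pools.items():
        if count % num_vfs:
            raise ValueError(f"pool {name!r} not divisible by {num_vfs}")
    shares = []
    for vf in range(num_vfs):
        share = {name: list(range(vf * (count // num_vfs),
                                  (vf + 1) * (count // num_vfs)))
                 for name, count in pools.items()}
        shares.append(share)
    return shares
```

With `split_resources(2, {"channel": 4, "vpu": 4, "jpu": 4})`, VF0 receives channels 0-1 and VF1 receives channels 2-3, matching the two-VF layout of Figure 1-6b.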
Figure 1-7 shows a schematic block diagram of a virtualization system according to yet another embodiment of the present disclosure.
As shown in Figure 1-7, according to another embodiment of the present disclosure, the virtualization system of the present disclosure further includes: a common driver, by which the plurality of virtual functions are driven.
The driver can be common to all virtual functions, and it can be a program installed in the operating system. The driver can, for example, create a corresponding node for each virtual function VF. A node can be a file stored in a certain directory (for example, the dev directory) for other applications to run or call. The name of the file can vary by vendor.
After the nodes are created, one or more of these nodes can be included in or mapped to a corresponding container. Each container can include one or more nodes, which means each container can correspond to or include one or more virtual functions. In the present disclosure, each container can correspond to or include a different number of nodes, so container configuration is more flexible and deployment more convenient. In addition, since the computing power of each virtual function may differ, a very flexible design can be made according to demand.
After the containers are established, the method of the present disclosure can further include creating a one-to-one corresponding image for each container, the image being able to communicate with the container. The above image can be created through docker-container technology.
The image can be installed remotely on the user side; through the image, the user can run or call the container and thereby call the multi-core processor and various other related resources.
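Continuing the node description above, the grouping of driver-created device nodes into containers can be sketched as follows. This is an illustration of the bookkeeping only; the device node names used here are hypothetical, since real names vary by vendor.

```python
def group_nodes(nodes: list[str], sizes: list[int]) -> dict[str, list[str]]:
    """Assign driver-created VF device nodes to containers.

    sizes[i] is how many nodes container i receives, so containers
    can hold different numbers of virtual functions, giving each
    container a different share of computing power.
    """
    if sum(sizes) != len(nodes):
        raise ValueError("sizes must consume every node exactly once")
    out, start = {}, 0
    for i, n in enumerate(sizes):
        out[f"container{i}"] = nodes[start:start + n]
        start += n
    return out
```

For example, `group_nodes(["/dev/vf0", "/dev/vf1", "/dev/vf2", "/dev/vf3"], [1, 3])` gives container 0 a single VF and container 1 the remaining three, reflecting the uneven configurations described above.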
Figure 1-8 exemplarily shows a schematic structural diagram of a virtualization system. The system of Figure 1-8 adopts the virtual machine approach.
As shown in Figure 1-8, the framework 800 includes a user space 802, a kernel space 804, and a system-on-chip 806, separated by dashed lines in the figure. The user space 802 is the running space of user programs; it only performs simple operations and cannot directly call system resources, and must issue instructions to the kernel through the system interface. The kernel space 804 is the space where kernel code runs; it can execute any command and call all resources of the system. The system-on-chip 806 comprises the modules of the artificial intelligence chip and cooperates with the user space 802 through the kernel space 804.
Unless otherwise emphasized, this embodiment is illustrated by virtualizing one component into four virtual components, but the present disclosure does not limit the number of virtual components.
Before virtualization is running, the user space 802 is controlled by a hardware monitor tool 808, which obtains information of the system-on-chip 806 by calling interfaces. The hardware monitor tool 808 can not only collect information of the system-on-chip 806 but also obtain in real time the overhead that upper-layer software imposes on the resources of the system-on-chip 806, giving the user real-time detailed information and state of the current system-on-chip 806. These details and states can be dozens of kinds of data, such as: hardware device model, firmware version number, driver version number, device utilization, storage device overhead state, board card power consumption and board card peak power consumption, and Peripheral Component Interconnect Express (PCIe) information. The content and amount of monitored information vary with the version of the hardware monitor tool 808 and the usage scenario.
After the system starts virtualization, operations of the user space 802 are taken over by the user virtual machine 810. The user virtual machine 810 is an abstraction and simulation of the real computing environment; the system allocates a set of data structures to manage the state of the user virtual machine 810, including a full set of registers, physical memory usage, the state of virtual devices, and so on. The physical space of the user space 802 of this embodiment is virtualized into four virtual spaces 812, 814, 816, 818. These four virtual spaces are independent and do not affect one another, and can respectively carry different guest operating systems, such as guest operating system 1, guest operating system 2, guest operating system 3, and guest operating system 4 shown in the figure. The guest operating systems can be Windows, Linux, Unix, iOS, Android, etc., and different applications run on each guest operating system.
In this embodiment, the user virtual machine 810 is implemented with the Quick Emulator (QEMU). QEMU is open-source virtualization software written in C, which virtualizes interfaces through dynamic binary translation and provides a series of hardware models, so that guest operating systems 1, 2, 3, and 4 all believe they access the system-on-chip 806 directly. The user space 802 includes a processor, memory, I/O devices, etc.; QEMU can virtualize the processor of the user space 802 into four virtual processors, the memory into four virtual memories, and the I/O devices into four virtual I/O devices. Each guest operating system occupies a part of the resources of the user space 802, for example a quarter each; that is, each can access one virtual processor, one virtual memory, and one virtual I/O device to perform its tasks. In this mode, guest operating systems 1, 2, 3, and 4 can operate independently.
The kernel space 804 carries a kernel virtual machine 820 and a chip driver 822. The kernel virtual machine 820, together with QEMU, is mainly responsible for the virtualization of the kernel space 804 and the system-on-chip 806, so that each guest operating system obtains its own address space when accessing the system-on-chip 806. In more detail, the space on the system-on-chip 806 mapped to a guest operating system is actually a virtual component mapped to that process.
From the perspective of the user virtual machine 810, during virtual machine operation QEMU performs kernel setup through the system call interface provided by the kernel virtual machine 820, and uses the virtualization capability of the kernel virtual machine 820 to provide hardware virtualization acceleration for its virtual machines to improve their performance. From the perspective of the kernel virtual machine 820, when the user cannot interact directly with the kernel space 804, a management tool of the user space 802 is needed, hence the need for QEMU, a tool running in the user space 802.
The chip driver 822 is used to drive the physical function 826. During virtual machine operation, the user space 802 does not access the system-on-chip 806 through the hardware monitor tool 808 via the chip driver 822; instead, guest operating systems 1 to 4 are each configured with a kernel space 824 for loading the chip driver 822, so that each guest operating system can still drive the system-on-chip 806.
The system-on-chip 806 performs virtualization through SR-IOV technology; in more detail, SR-IOV technology can virtualize each component of the system-on-chip 806, so that each virtual component has its own corresponding, uniquely accessible resources.
The system-on-chip 806 of this embodiment includes hardware and firmware. The hardware includes a read-only memory ROM (not shown in the figure) for storing the firmware, and the firmware includes the physical function 826, which supports or cooperates with the PCIe functionality of SR-IOV; the physical function 826 has the power to fully configure PCIe resources. When implementing SR-IOV, the physical function 826 virtualizes multiple virtual functions 828, four in this embodiment. A virtual function 828 is a lightweight PCIe function managed by the physical function 826, which can share PCIe physical resources with the physical function 826 and with other virtual functions 828 associated with the same physical function 826. A virtual function 828 is only allowed to control the resources that the physical function 826 has allocated to it.
Once SR-IOV is enabled in the physical function 826, each virtual function 828 can access its own PCIe configuration space through its own bus, device, and function number. Each virtual function 828 has one memory space for mapping its register set. The virtual function 828 driver operates on the register set to enable its function, and the virtual function is directly assigned to the corresponding user virtual machine 810. Although virtual, the user virtual machine 810 is made to believe it is an actually existing PCIe device.
The hardware of the system-on-chip 806 also includes a computing device 830, a video codec device 832, a JPEG codec device 834, a storage device 836, and PCIe 838. In this embodiment, the computing device 830 is an intelligent processing unit IPU for performing neural network convolution computations; the video codec device 832 encodes and decodes video data; the JPEG codec device 834 encodes and decodes still pictures using the JPEG algorithm; the storage device 836 can be a dynamic random access memory (DRAM) for storing data; and PCIe 838 is the aforementioned PCIe. During virtual machine operation, PCIe 838 is virtualized into four virtual interfaces 840, and the virtual functions 828 are in one-to-one correspondence with the virtual interfaces 840: the first virtual function connects to the first virtual interface, the second virtual function connects to the second virtual interface, and so on.
Through SR-IOV technology, the computing device 830 is virtualized into four virtual computing devices 842, the video codec device 832 into four virtual video codec devices 844, the JPEG codec device 834 into four virtual JPEG codec devices 846, and the storage device 836 into four virtual storage devices 848.
Each guest operating system is configured with a set of virtual suites; each set includes one user virtual machine 810, one virtual interface 840, one virtual function 828, one virtual computing device 842, one virtual video codec device 844, one virtual JPEG codec device 846, and one virtual storage device 848. Each set of virtual suites runs independently without affecting the others and is used to perform the tasks delivered by the corresponding guest operating system, ensuring that each guest operating system can access its configured virtual computing device 842, virtual video codec device 844, virtual JPEG codec device 846, and virtual storage device 848 through its configured virtual interface 840 and virtual function 828.
In more detail, when performing tasks, each guest operating system may need to access different hardware depending on the task. For example, if a task performs computation, such as matrix convolution, the guest operating system accesses its configured virtual computing device 842 through its configured virtual interface 840 and virtual function 828; if a task performs video encoding/decoding, the guest operating system accesses its configured virtual video codec device 844 through its configured virtual interface 840 and virtual function 828; if a task performs JPEG encoding/decoding, the guest operating system accesses its configured virtual JPEG codec device 846 through its configured virtual interface 840 and virtual function 828; and if a task reads or writes data, the guest operating system accesses its configured virtual storage device 848 through its configured virtual interface 840 and virtual function 828.
Figures 1-11a and 1-11b show a comparison between the virtual machine mode and the Docker mode.
In Figure 1-11a, in the virtual machine mode, the host passes the PCIe device through to the guest. The guest includes the driver and the directory, so each guest needs to load the driver itself and create nodes, i.e. character devices, in the guest's directory.
In Figure 1-11b, in the Docker mode, the driver and the directory both reside on the host, so only the host needs to load the driver, and the driver is common to all virtual functions. The host's driver therefore creates nodes, i.e. character devices, in the host directory and then passes the devices into the image device, i.e. the Docker. Thus, compared with the virtual machine mode, the docker-container mode of the present disclosure does not require each guest to install and load a driver, which simplifies system setup and is more convenient for users.
Similar to the usage scenario of hardware virtual machines, a Docker-based lightweight virtualization solution is not only at the granularity of a whole card; rather, multiple containers need to share one or more physical accelerator cards at a finer granularity. In each Docker container, one or more VFs can be used. The VFs in different containers can work with one another independently and with secure isolation.
The SR-IOV hardware virtualization solution can support the Docker usage mode and generate multiple VFs on a physical machine at the same time. A system administrator can assign different VFs to different containers as needed. VFs belonging to different containers can work independently without interfering with one another, and the VFs have the same robustness and secure isolation between them as between PFs. Compared with the virtual machine mode, Docker's advantages are faster startup, fewer required resources, and higher system utilization; development, testing, and deployment are all simpler.
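As a sketch of how an administrator might hand different VF nodes to different containers, the snippet below builds a `docker run` command line using Docker's `--device` flag for host device passthrough. The device paths and image name are hypothetical placeholders; real node names and images vary by vendor.

```python
def docker_run_command(name: str, vf_nodes: list[str], image: str) -> list[str]:
    """Build a docker run command that passes host VF device nodes
    (character devices created by the host's common driver) into a
    container. --device is Docker's flag for host device passthrough.
    """
    cmd = ["docker", "run", "-d", "--name", name]
    for node in vf_nodes:
        cmd += ["--device", node]
    cmd.append(image)
    return cmd
```

Because the driver lives only on the host, the container image needs no driver of its own; it simply sees the passed-through character devices, which is what makes this deployment lighter than the virtual machine mode.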
The present disclosure also provides a multi-core processor including a plurality of processing cores, wherein the multi-core processor is divided into a plurality of virtual functions, each of the virtual functions corresponding to one or more processing cores.
The present disclosure also discloses an electronic device, including the virtualization system described above or the multi-core processor described above. The electronic device can be a host; that is, the technical solution of the present disclosure is implemented in the host and communicates with an external image (docker).
According to different application scenarios, the electronic device or apparatus can also include a data processing device, robot, computer, printer, scanner, tablet, smart terminal, mobile phone, driving recorder, navigator, sensor, camera, server, cloud server, still camera, video camera, projector, watch, earphone, mobile storage, wearable device, means of transport, household appliance, and/or medical equipment. The means of transport include airplanes, ships, and/or vehicles; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods; the medical equipment includes nuclear magnetic resonance instruments, B-mode ultrasound instruments, and/or electrocardiographs.
The present disclosure also provides a computer-readable storage medium on which computer program code is stored; when the computer program code is run by a processor, the method described above is executed.
The present disclosure can achieve at least one of the following technical effects:
1. Hardware isolation is adopted, which greatly improves security: even if one virtual function or container has a problem, the normal operation of the other parts is not affected.
2. There is no need to modify the Quick Emulator (QEMU), which reduces the complexity of setting up the system.
3. Since the parts are relatively independent, latency is small and quality of service (QoS) is high.
4. No head-of-line blocking.
5. No adjacent-noise interference.
6. No context-switching overhead: unlike the virtualization technology used by traditional vGPUs, a "non-time-slice-based sharing" approach is adopted, eliminating the performance overhead caused by context switching.
7. Easy to scale and deploy.
Figure 1-9 shows a combined processing device 900, which includes the above-mentioned computing device 902 (for example, the computing device 830 described in Figure 1-8), a universal interconnection interface 904, and other processing devices 906. The computing device according to the present disclosure interacts with the other processing devices to jointly complete operations specified by the user. Figure 1-9 is a schematic diagram of the combined processing device.
The other processing devices include one or more types of general-purpose/special-purpose processors, such as a central processing unit (CPU), a graphics processing unit (GPU), or a neural network processor. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the machine learning computing device and external data and control; they perform data transfer and basic control such as starting and stopping the machine learning computing device, and they can also cooperate with the machine learning computing device to complete computing tasks.
The universal interconnection interface is used to transmit data and control instructions between the computing device (including, for example, a machine learning computing device) and the other processing devices. The computing device obtains the required input data from the other processing devices and writes it to a storage device on the computing device chip; it can obtain control instructions from the other processing devices and write them to a control buffer on the computing device chip; and it can also read the data in the storage module of the computing device and transmit it to the other processing devices.
Optionally, the structure can further include a storage device 908, which is connected to the computing device and the other processing devices respectively. The storage device is used to store data of the computing device and the other processing devices, and is especially suitable for data that cannot be fully stored in the internal storage of the computing device or the other processing devices.
The combined processing device can be used as an SoC (system on chip) for devices such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the core area of the control part, increasing processing speed, and reducing overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, a monitor, a mouse, a keyboard, a network card, or a wifi interface.
In some embodiments, the present disclosure also discloses a chip, which includes the above-mentioned computing device or combined processing device.
In some embodiments, the present disclosure also discloses a board card, which includes the above-mentioned chip. Referring to Figure 1-10, which provides an exemplary board card, the board card may include, in addition to the above-mentioned chip 1002, other supporting components, including but not limited to: a storage device 1004, an interface device 1006, and a control device 1008.
The storage device is connected to the chip in the chip packaging structure through a bus and is used for storing data. The storage device may include multiple groups of storage units 1010. Each group of storage units is connected to the chip through a bus. It can be understood that each group of storage units can be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR doubles the speed of SDRAM without increasing the clock frequency. DDR allows data to be read on both the rising and falling edges of the clock pulse, making DDR twice as fast as standard SDRAM. In one embodiment, the storage device may include 4 groups of storage units. Each group of storage units may include multiple DDR4 granules (chips). In one embodiment, the chip may internally include four 72-bit DDR4 controllers; of the 72 bits, 64 bits are used for data transmission and 8 bits for ECC checking. In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transmit data twice in one clock cycle. A controller that controls the DDR is provided in the chip for controlling the data transmission and data storage of each storage unit.
The interface device is electrically connected to the chip in the chip packaging structure. The interface device is used to implement data transmission between the chip and an external device 1012 (for example, a server or a computer). For example, in one embodiment, the interface device can be a standard PCIE interface; for instance, the data to be processed is transferred from the server to the chip through the standard PCIE interface to realize data transfer. In another embodiment, the interface device can also be another interface; the present disclosure does not limit the specific form of the other interfaces, as long as the interface unit can realize the transfer function. In addition, the computing results of the chip are still transmitted back to the external device (for example, a server) by the interface device.
The control device is electrically connected to the chip. The control device is used to monitor the state of the chip. Specifically, the chip and the control device can be electrically connected through an SPI interface. The control device may include a microcontroller (Micro Controller Unit, MCU). Since the chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads, the chip can be in different working states such as heavy-load and light-load. The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip.
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本披露并不受所描述的动作顺序的限制,因为依据本披露,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于可选实施例,所涉及的动作和模块并不一定是本披露所必须的。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
在本披露所提供的几个实施例中,应该理解到,所披露的装置,可通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性、光学、声学、磁性或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本披露各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件程序模块的形式实现。
所述集成的单元如果以软件程序模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储器中。基于这样的理解,本披露的技术方案可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储器中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本披露各个实施例所述方法的全部或部分步骤。而前述的存储器包括:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。
以上对本披露实施例进行了详细介绍,本文中应用了具体个例对本披露的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本披露的方法及其核心思想;同时,对于本领域的一般技术人员,依据本披露的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本披露的限制。
通过以下条款,可以对本公开的技术方案有更好的理解:
条款A1.一种基于多核处理器的虚拟化方法,其中,所述多核处理器包括多个处理核,所述方法包括:
将所述多核处理器划分为多个虚拟功能,每个所述虚拟功能对应于一个或多个处理核;以及
将所述虚拟功能对应到容器。
条款A2.根据条款A1所述的方法,其中,所述容器为多个,多个容器之间能够独立运行。
条款A3.根据条款A1或A2所述的方法,其中,特定数量的处理核构成一个处理集群,每个虚拟功能对应于一个或多个处理集群。
条款A4.根据条款A1-A3中任意一项所述的方法,其中,将一个虚拟功能对应到一个容器;或者将多个虚拟功能对应到一个容器。
条款A5.根据条款A1-A4中任意一项所述的方法,其中,每个虚拟功能具有独立的硬件资源。
条款A6.根据条款A1-A5中任意一项所述的方法,其中,所述多个虚拟功能由共同的驱动器来驱动。
条款A7.根据条款A6所述的方法,其中,通过所述驱动器为每个虚拟功能建立对应的节点,所述容器对应于一个或多个节点。
条款A8.根据条款A1-A7中任意一项所述的方法,进一步包括为每个所述容器建立一一对应的镜像,所述镜像能够与所述容器进行通信。
条款A9.一种虚拟化系统,包括:
多核处理器,所述多核处理器包括多个处理核;
多个虚拟功能,每个所述虚拟功能对应于一个或多个处理核;以及
容器,所述容器对应于所述虚拟功能。
条款A10.根据条款A9所述的虚拟化系统,其中,所述容器为多个,多个容器之间能够独立运行。
条款A11.根据条款A9或A10所述的虚拟化系统,其中,特定数量的处理核构成一个处理集群,每个虚拟功能对应于一个或多个处理器集群。
条款A12.根据条款A9-A11中任意一项所述的虚拟化系统,其中,一个虚拟功能对应到一个容器;或者多个虚拟功能对应到一个容器。
条款A13.根据条款A9-A12中任意一项所述的虚拟化系统,其中,每个虚拟功能具有独立的硬件资源。
条款A14.根据条款A9-A13中任意一项所述的虚拟化系统,进一步包括:公共驱动器,所述多个虚拟功能由所述公共驱动器来驱动。
条款A15.根据条款A14所述的虚拟化系统,其中,所述公共驱动器配置为为每个虚拟功能建立对应的节点,所述容器对应于一个或多个节点。
条款A16.根据条款A9-A15中任意一项所述的虚拟化系统,进一步包括镜像,所述镜像与所述容器一一对应,并且能够与所述容器进行通信。
条款A17.一种包括多个处理核的多核处理器,其中,
所述多核处理器被划分为多个虚拟功能,每个所述虚拟功能对应于一个或多个处理核。
条款A18.一种电子设备,包括如条款A9-A16中任意一项所述的虚拟化系统或者如条款A17所述的多核处理器。
条款A19.一种计算机可读存储介质,其上存储有计算机程序代码,当所述计算机程序代码由处理器运行时,执行条款A1-A8中任意一项所述的方法。
本公开涉及人工智能领域,更具体地,涉及处理器的虚拟化技术。
在计算机中,虚拟化(Virtualization)是一种资源管理技术,是将计算机的各种资源,如服务器、网络、内存及存储等,予以抽象、转换后呈现出来,使用户可以比原本的组态更好的方式来应用这些资源。图2-1示出了一种通过时间切片(time slicing)技术来实现虚拟化的示意性框图。
如图2-1所示,有四个虚拟机VM0-VM3,这些虚拟机分别执行自身的任务,这些任务经过时间切片管理器之后,会形成时间切片并且按照时间进行排序。计算引擎根据时间切片来处理不同的任务(分时任务)。在此模式下,当虚拟机VM1工作时,则其他虚拟机无法工作,处于等待状态。在时间切片很小的时候,用户不太容易察觉时间延迟,但如果有某个虚拟机的任务占用大量时间(例如图2-1所示的VM1)时,则其他用户会感受到明显的时间延迟,从而影响用户体验。
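为直观说明这种排队延迟,下面给出一个轮转分时调度的示意性Python模拟(虚拟机数量与任务工作量均为假设取值,并非对图2-1的精确复现):

```python
# 示意性模拟:四个虚拟机的任务按时间切片轮流执行,
# 某个虚拟机的长任务会推迟其他虚拟机任务的完成时刻。
def finish_times(tasks, slice_len=1):
    """tasks: {虚拟机: 剩余工作量};按轮转调度返回各虚拟机的完成时刻。"""
    remaining = dict(tasks)
    t = 0
    done = {}
    while remaining:
        for vm in list(remaining):
            run = min(slice_len, remaining[vm])
            t += run
            remaining[vm] -= run
            if remaining[vm] == 0:
                done[vm] = t
                del remaining[vm]
    return done

# VM1 的任务远大于其他虚拟机(对应图2-1中 VM1 占用大量时间的情形)
times = finish_times({"VM0": 1, "VM1": 8, "VM2": 1, "VM3": 1})
```

模拟结果中,VM2、VM3各自只有1个单位的工作量,却要分别等到t=3、t=4才完成,这正是分时方案下其他用户感受到明显延迟的来源。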
此外,现有技术中,计算引擎对于不同的虚拟机是公共的,一旦某个虚拟机导致计算引擎出现问题,就会造成全部虚拟机瘫痪,从而影响全部用户。
由此,现有的虚拟机方案存在计算效率低、队首阻塞(HOL Blocking)、相邻噪声较大、难以扩展等缺陷。
下面将结合本披露实施例中的附图,对本披露实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本披露一部分实施例,而不是全部的实施例。基于本披露中的实施例,本领域技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本披露保护的范围。
应当理解,本披露的权利要求、说明书及附图中的术语“第一”、“第二”、“第三”和“第四”等是用于区别不同对象,而不是用于描述特定顺序。本披露的说明书和权利要求书中使用的术语“包括”和“包含”指示所描述特征、整体、步骤、操作、元素和/或组件的存在,但并不排除一个或多个其它特征、整体、步骤、操作、元素、组件和/或其集合的存在或添加。
还应当理解,在此本披露说明书中所使用的术语仅仅是出于描述特定实施例的目的,而并不意在限定本披露。如在本披露说明书和权利要求书中所使用的那样,除非上下文清楚地指明其它情况,否则单数形式的“一”、“一个”及“该”意在包括复数形式。还应当进一步理解,在本披露说明书和权利要求书中使用的术语“和/或”是指相关联列出的项中的一个或多个的任何组合以及所有可能组合,并且包括这些组合。
如在本说明书和权利要求书中所使用的那样,术语“如果”可以依据上下文被解释为“当...时”或“一旦”或“响应于确定”或“响应于检测到”。类似地,短语“如果确定”或“如果检测到[所描述条件或事件]”可以依据上下文被解释为意指“一旦确定”或“响应于确定”或“一旦检测到[所描述条件或事件]”或“响应于检测到[所描述条件或事件]”。
进一步地,在本公开的说明书和权利要求中,两个部分之间的对应,可以理解为两个部分存在连接关系、响应关系或者匹配关系。
虚拟化是一种将一个计算机设备虚拟为多个虚拟机的技术。当在一台计算机上同时运行多个虚拟机时,每个虚拟机可运行相同或不同的操作系统,在操作系统上运行的应用程序可以在独立的空间内互不影响,从而显著提高计算机的工作效率。
虚拟化技术与多任务或是超线程技术是不同的。多任务是指在一个操作系统中多个程序同时运行,而在虚拟化技术中,则可以同时运行多个操作系统,而且每一个操作系统中都有多个程序运行,每一个操作系统都运行在一个虚拟机上。超线程技术只是单处理器模拟双处理器来平衡程序运行性能,这两个模拟出来的处理器是不能分离的,只能协同工作,而在虚拟化技术中,虚拟处理器是独立运作的。
虚拟化技术通常是采用软件重新定义划分计算机的物理资源,以实现计算机资源的动态分配、灵活调度、跨域共享,进而提高资源利用率。
图2-2a示出了本公开的方法可以应用的一个处理集群的内部结构示意图。
人工智能(AI)芯片加速了数据计算能力,降低了访存延时。AI芯片采用多核处理器架构,并加入存储单元核(也可称为片上或片内存储单元)来加速数据读取,解决了AI芯片的处理核与DDR(也可以称为片外存储单元)的访存瓶颈问题,为用户在深度学习、网络计算等场景中提供更强的运算能力。
AI芯片例如可以有16个处理核,用于执行计算任务。每4个处理核组成一个处理集群,即共4个处理集群。每个处理集群内有多个存储单元核。存储单元核主要用于处理集群内部的共享存储单元与处理核的数据交换和处理集群之间的数据交换。当存储单元核和处理核同时访问DDR时,通过多路复用器仲裁后,保证仅一组总线访问DDR。
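上文"每4个处理核组成一个处理集群"的划分关系,可以用如下示意性Python草图表达(核数与集群大小取自上文示例,编号方式为假设):

```python
def make_clusters(num_cores=16, cluster_size=4):
    """把编号 0..num_cores-1 的处理核按 cluster_size 个一组划分为处理集群。"""
    assert num_cores % cluster_size == 0
    return [list(range(i, i + cluster_size))
            for i in range(0, num_cores, cluster_size)]

clusters = make_clusters()  # 共 4 个处理集群,每个集群含 4 个处理核
```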
图2-2b示出了本公开的方法可以应用的人工智能处理器的结构示意图。
AI芯片的DDR采用非统一内存存取(Non-Uniform Memory Access,NUMA)架构,每个处理集群可以通过NOC0访问不同的DDR通道,但访问不同的DDR通道的延时不同。每个处理集群都对应一个访问延时最低的DDR通道,访问其他通道时延时相对较长。如图2-2b中处理集群与DDR结构图所示,处理集群0,处理集群1,处理集群2和处理集群3分别访问对应的DDR0,DDR1,DDR2和DDR3时延时最低。也就是说,每个处理核访问各自处理集群访存延时最低的DDR通道。
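下面用一个假设的延时表来示意上述NUMA访存特性:每个处理集群选取延时最低的DDR通道,恰为同编号的通道(延时数值纯属示例假设,不代表实际硬件参数):

```python
# 假设的访存延时矩阵:latency[i][j] 表示处理集群 i 访问 DDR 通道 j 的相对延时
latency = [
    [1, 2, 2, 3],
    [2, 1, 3, 2],
    [2, 3, 1, 2],
    [3, 2, 2, 1],
]

def best_channel(cluster_id):
    """返回某处理集群访存延时最低的 DDR 通道编号。"""
    row = latency[cluster_id]
    return row.index(min(row))

# 处理集群 0..3 分别对应 DDR0..DDR3
mapping = [best_channel(i) for i in range(4)]
```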
由于处理集群内部的访存带宽高于处理核和DDR之间的访问带宽,所以AI芯片可以采用在处理集群内部存取共享存储单元的方式,减少处理核直接访问DDR,从而提高数据吞吐量。
当需要4核并行计算时,存储单元核可以通过数据广播方式(通过NOC1),将数据由共享存储单元同时广播到处理集群内的4个处理核以进行数据计算。相对于所有处理核通过DDR来读取数据的方式,这种情况下能够降低访存延时,优化计算性能。
如果通过传统方式来进行虚拟化,那么所有的虚拟机将共享全部四个处理集群,当任务比较少的时候,某些处理集群将被空置,从而造成资源浪费。
上面描述了本公开的技术方案所应用的环境,下面将具体描述本公开的多个实施方式。下面结合图2-3和图2-4来描述本公开的具体实施方式。
图2-3示出了根据本公开第一方面的基于多核处理器,例如AI处理器,的虚拟化方法,其中,所述多核处理器包括多个处理核,所述方法包括:在操作2-S310,将所述多核处理器划分为多个虚拟功能,每个所述虚拟功能对应于一个或多个处理核;以及在操作2-S320,将所述虚拟功能对应到虚拟机。
图2-4示出了根据本公开的一个实施方式的一种虚拟化系统,该虚拟化系统包括:多核处理器,所述多核处理器包括多个处理核;多个虚拟功能VF0-VF3,每个所述虚拟功能对应于一个或多个处理核;以及虚拟机(虚拟机0-虚拟机3),所述虚拟机对应于所述虚拟功能。
以上的方法和系统可以通过SR-IOV(Single Root I/O Virtualization)技术来实现。SR-IOV技术是一种基于硬件的虚拟化解决方案,可提供高性能和可伸缩性的虚拟解决方案。SR-IOV制定了标准化机制来实现多个虚拟机共享一个I/O设备,使得PCIe(Peripheral Component Interconnect Express,快速外设组件互连)设备能够在虚拟机之间高效共享,并获得与本机相似的I/O性能。
SR-IOV分为以下两种功能类型:
PF(Physical Function,物理功能):用于支持SR-IOV功能的PCIe功能,如SR-IOV规范中定义。PF包含SR-IOV功能结构,用于管理SR-IOV功能。PF是全功能的PCIe功能,可以像其他任何PCIe设备一样进行发现、管理和处理。PF拥有完全配置资源,可以用于配置或控制PCIe设备。
VF(Virtual Function,虚拟功能):与PF关联的一种功能。VF是一种轻量级PCIe功能,可以与PF以及同一PCIe设备的其他VF共享物理资源。VF仅拥有用于其自身行为的配置资源。
每个SR-IOV设备都可有一个PF,并且每个PF可有多个与其关联的VF。每个VF都可以具有一个PCI内存空间,用于映射其寄存器集。VF设备驱动程序对寄存器集进行操作以启用其功能,并且表现为实际存在的PCIe设备。创建VF后,可以直接将其指定给客户虚拟机VM,使得多个VF可以共享同一物理设备,并在没有CPU和虚拟机管理程序软件开销的情况下执行数据的输入输出。
需要理解的是,上述的同一物理设备是指同一物理设备上的不同硬件资源。例如该物理设备可以是多核处理器,但硬件资源可以是该物理设备上不同的处理核。
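PF派生VF、且各VF拥有独立配置空间的关系,可以用如下纯软件层面的Python草图来示意(类名、字段与编号格式均为说明用的假设,并非真实驱动接口):

```python
class VirtualFunction:
    """示意:VF 仅拥有用于其自身行为的配置资源。"""
    def __init__(self, pf, index):
        self.pf = pf
        self.index = index
        # 各 VF 独立的内存空间,用于映射自己的寄存器集(字段名为假设)
        self.config_space = {"bar0": f"vf{index}_regs"}

class PhysicalFunction:
    """示意:PF 包含 SR-IOV 功能结构,可派生并管理若干 VF。"""
    def __init__(self, bdf):
        self.bdf = bdf  # 总线:设备.功能 编号,示例值
        self.vfs = []

    def enable_sriov(self, num_vfs):
        # 概念上相当于向 PF 的 SR-IOV 配置接口写入 VF 数量
        self.vfs = [VirtualFunction(self, i) for i in range(num_vfs)]
        return self.vfs

pf = PhysicalFunction("0000:01:00.0")
vfs = pf.enable_sriov(4)  # 一个 PF 派生 4 个 VF
```

可以看到,各VF关联同一个PF(即同一物理设备),但各自的配置空间互不相同,对应上文"同一物理设备上的不同硬件资源"。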
由此可见,虚拟功能可以是单个或多个。当虚拟功能为单个时,则意味着可以将多核处理器中所有的处理核划分成单个虚拟功能;当虚拟功能为多个时,虚拟机之间能够独立运行。独立运行是指每个虚拟机相互隔离,运行不依赖于其他虚拟机,并且也不会受到其他虚拟机的影响,而且,由于本公开的隔离是基于硬件的隔离,因此彼此之间的干扰更少。此外,独立运行可以是每个虚拟机采用不同的操作系统,而不相互影响。
虚拟功能可以执行如多核处理器一样的工作内容,其是通过将该多核处理器进行逻辑划分所得到的。虚拟功能中可以包括一个或多个处理核,处理核越多,该虚拟功能的运算能力也越强。也可以将全部处理核划分到一个虚拟功能中。
如图2-3和图2-4所示,虚拟功能可以对应到虚拟机,例如虚拟功能VF0对应到虚拟机0,虚拟功能VF1对应到虚拟机1,虚拟功能VF2对应到虚拟机2,虚拟功能VF3对应到虚拟机3。需要理解的是,这种对应关系仅仅是一种实例,本公开还可以采用其他的对应关系,从而更加便于系统的部署。这将在后文中进行更加详细的描述。此外,图2-4中尽管描述了4个虚拟功能和4个虚拟机,但也可以是更少或更多的其他数量。
在本公开中,虚拟机之间可以独立运行,互相不产生干扰。与现有技术中采用时间切片技术的虚拟化方案相比,本公开的技术方案由于采用了独立运行的虚拟机,所以在虚拟机之间不存在队首阻塞问题,也不会受到相邻的噪声影响,也没有上下文切换开销。
如图2-2a和图2-2b所示,在多核处理器中,特定数量的处理核构成一个处理集群,因此每个虚拟功能可以对应于一个或多个处理集群。
图2-5示出了根据本公开的一个实施方式的虚拟功能与处理集群进行对应的示意图。需要理解的是,尽管图2-5以四个处理集群(处理集群0-处理集群3)为例进行了描述,但处理集群也可以是任何其他数量。
在图2-5所示的示例1中,处理集群0、处理集群1、处理集群2和处理集群3对应到虚拟功能0,即该多核处理器被划分为一个虚拟功能。
在图2-5所示的示例2中,处理集群0、处理集群1和处理集群2对应到虚拟功能0,处理集群3对应到虚拟功能1,即该多核处理器被划分为两个虚拟功能,虚拟功能0相对于虚拟功能1具有较强的处理能力。
在图2-5所示的示例3中,处理集群0和处理集群1对应到虚拟功能0,处理集群2和处理集群3对应到虚拟功能1,即该多核处理器被划分为两个虚拟功能,虚拟功能0和虚拟功能1具有等同的处理能力。
在图2-5所示的示例4中,处理集群0和处理集群1对应到虚拟功能0,处理集群2对应到虚拟功能1,处理集群3对应到虚拟功能2,即该多核处理器被划分为三个虚拟功能,虚拟功能0相对于虚拟功能1和虚拟功能2具有较强的处理能力,虚拟功能1和虚拟功能2具有等同的处理能力。
在图2-5所示的示例5中,处理集群0对应到虚拟功能0,处理集群1对应到虚拟功能1,处理集群2对应到虚拟功能2,处理集群3对应到虚拟功能3,这四个虚拟功能具有等同的处理能力。
在图2-5所示的示例6中,处理集群0对应到虚拟功能0,处理集群1、处理集群2和处理集群3对应到虚拟功能1,相对于虚拟功能1,虚拟功能0具有较弱的处理能力。该示例等效于示例2。
在图2-5所示的示例7中,处理集群0对应到虚拟功能0,处理集群1对应到虚拟功能1,处理集群2对应到虚拟功能0,处理集群3对应到虚拟功能1,虚拟功能0和虚拟功能1具有等同的处理能力。该示例等效于示例3。
在图2-5所示的示例8中,处理集群0对应到虚拟功能0,处理集群1对应到虚拟功能1,处理集群2对应到虚拟功能0,处理集群3对应到虚拟功能2。虚拟功能0相对于虚拟功能1和虚拟功能2具有较强的处理能力,虚拟功能1和虚拟功能2具有等同的处理能力。该示例等效于示例4。
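图2-5中的八种划分可以统一表示为"处理集群编号→虚拟功能编号"的映射,虚拟功能的处理能力与其对应的集群数成正比。以下Python草图仅用于说明这一对应关系(数据结构本身为示例假设):

```python
# 每个示例:列表下标为处理集群编号,元素值为其对应的虚拟功能编号
examples = {
    1: [0, 0, 0, 0],
    2: [0, 0, 0, 1],
    3: [0, 0, 1, 1],
    4: [0, 0, 1, 2],
    5: [0, 1, 2, 3],
    6: [0, 1, 1, 1],
    7: [0, 1, 0, 1],
    8: [0, 1, 0, 2],
}

def capability(mapping):
    """返回 {虚拟功能: 对应的处理集群数},集群数越多处理能力越强。"""
    cap = {}
    for vf in mapping:
        cap[vf] = cap.get(vf, 0) + 1
    return cap

# 示例4:虚拟功能0占两个集群,处理能力强于各占一个集群的虚拟功能1、2
cap4 = capability(examples[4])
```

按此度量,示例7与示例3、示例8与示例4的能力分布相同,这与上文"该示例等效于示例N"的描述一致。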
由此可见,通过将不同的处理集群对应到不同的虚拟功能,能够实现对虚拟功能的灵活配置,从而能够根据不同的需求来动态地配置虚拟功能的处理能力。因此,相对于现有技术,本公开的技术方案还具有配置简单和灵活的优点。
根据本公开的又一个实施方式,每个虚拟功能具有独立的硬件资源。
这里所述的硬件资源,可以是处理核,也可以是存储器(例如DDR)、总线、编码器/解码器、视频/音频驱动器、接口单元等等。例如,对于PCIe板卡资源而言,其包括了AI计算单元(IPU)、视频编解码单元(VPU)、JPEG编解码单元(JPU)和内存。本公开对硬件资源的类型不做任何限制。
图2-6a、图2-6b和图2-6c示例性地示出了分为1个、2个和4个虚拟功能时对PCIe卡的资源占用情况。需要说明的是,上述的多核处理器可以是JPU、VPU等具有多个计算核的计算装置。
如图2-6a所示,当虚拟功能为1个时,该虚拟功能VF0将专用所有的资源,即占用全部的计算核,全部的通道,全部的VPU以及全部的JPU。
如图2-6b所示,当虚拟功能为2个时,虚拟功能VF0和虚拟功能VF1将分别使用一半的资源,即VF0占用一半的计算核,VF1占用另一半计算核。设具有四个DDR通道,则VF0可以占用通道0和通道1,VF1可以占用通道2和通道3。同样设有四个VPU和JPU,则VF0可以占用VPU0和VPU1,VF1可以占用VPU2和VPU3;VF0可以占用JPU0和JPU1,而VF1可以占用JPU2和JPU3。
如图2-6c所示,当虚拟功能为4个时,虚拟功能VF0-VF3各占1/4的计算核。同样,设具有四个DDR通道,四个VPU和四个JPU,则虚拟功能VF0-VF3分别可以占用通道0-通道3;虚拟功能VF0-VF3分别可以占用VPU0-VPU3;虚拟功能VF0-VF3分别可以占用JPU0-JPU3。
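图2-6a至图2-6c所示的资源占用情况,可以概括为N个虚拟功能对计算核、DDR通道、VPU和JPU的均分。以下示意性代码以上文的4通道、4个VPU、4个JPU、16个计算核为例(连续编号的均分方式为示例假设):

```python
def split_resources(num_vfs, channels=4, vpus=4, jpus=4, cores=16):
    """按虚拟功能数量均分板卡资源,返回每个 VF 占用的资源清单。"""
    assert channels % num_vfs == 0 and cores % num_vfs == 0
    per_ch, per_vpu, per_jpu = channels // num_vfs, vpus // num_vfs, jpus // num_vfs
    alloc = {}
    for vf in range(num_vfs):
        alloc[f"VF{vf}"] = {
            "channels": list(range(vf * per_ch, (vf + 1) * per_ch)),
            "vpus": list(range(vf * per_vpu, (vf + 1) * per_vpu)),
            "jpus": list(range(vf * per_jpu, (vf + 1) * per_jpu)),
            "cores": cores // num_vfs,
        }
    return alloc

two = split_resources(2)   # VF0 占通道0、1,VF1 占通道2、3
four = split_resources(4)  # VF0-VF3 各占 1/4 资源
```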
图2-7示出了根据本公开的又一个实施方式的虚拟化系统的示意性框图。
如图2-7所示,根据本公开的另一个实施方式,本公开的虚拟化系统进一步包括:多个驱动器,所述多个虚拟功能由不同的驱动器来驱动。
根据本公开的一个实施方式,通过所述驱动器为相应的虚拟功能建立对应的节点。即客户机包括了驱动器和目录,因此每个客户机需要自己加载驱动器,并在客户机的目录下创建节点,即字符型设备。
图2-8示例性地示出了虚拟化系统的结构示意图。在图2-8的系统中,采用虚拟机的方式。
如图2-8所示,该框架800包括用户空间802、内核空间804及片上系统806,在图上以虚线区隔开。用户空间802为用户程序的运行空间,只执行简单的运算,不能直接调用系统资源,必须通过系统接口,才能向内核发出指令。内核空间804是内核代码运行的空间,可以执行任意命令,调用系统的一切资源。片上系统806为人工智能芯片的各模块,通过内核空间804与用户空间802进行协作。
除非另行强调,此实施例以将一个部件虚拟化为四个虚拟部件来示例说明,但本公开不限制虚拟部件的数量。
用户空间802在未运行虚拟化前,是由硬件监测器工具808所控制,通过调用接口获取片上系统806的信息。硬件监测器工具808不仅可以采集片上系统806的信息,还可以实时获取上层软件对片上系统806资源的开销,便于用户实时掌握当前片上系统806的详细信息和状态,这些详细信息和状态可以是:硬件设备型号、固件版本号、驱动版本号、设备利用率、存储装置开销状态、板卡功耗和板卡峰值功耗、快速外设组件互连(PCIe)等数十种数据。基于硬件监测器工具808的版本及使用场景的不同,所监测的信息内容及数量会有所差异。
在系统启动虚拟化后,用户空间802的操作改由用户虚拟机810接管,用户虚拟机810是对真实计算环境的抽象和模拟,系统会分配一套数据结构来管理用户虚拟机810的状态,其数据结构包括全套寄存器、物理内存的使用情况、虚拟设备的状态等等。此实施例的用户空间802的物理空间虚拟化为四个虚拟空间812、814、816、818,这四个虚拟空间812、814、816、818独立互不影响,可分别搭载不同的客户操作系统,如图中所示的客户操作系统1、客户操作系统2、客户操作系统3及客户操作系统4,客户操作系统可以是Windows、Linux、Unix、iOS、安卓等,每个客户操作系统上分别运行不同的应用程序。
在此实施例中,用户虚拟机810是以快速仿真器(QEMU)来实现。QEMU是一个用C语言编写的开源虚拟化软件,通过动态二进制转换将接口虚拟化,并提供一系列的硬件模型,使得客户操作系统1、客户操作系统2、客户操作系统3、客户操作系统4都认为自己直接访问片上系统806。用户空间802包括处理器、存储器、I/O设备等,QEMU可以将用户空间802的处理器虚拟化为四个虚拟处理器,并将存储器虚拟化为四个虚拟存储器,亦将I/O设备虚拟化为四个虚拟I/O设备。每个客户操作系统各占用一部分用户空间802的资源,例如各占四分之一,也就是分别能访问一个虚拟处理器、一个虚拟存储器及一个虚拟I/O设备,以执行该客户操作系统的任务。通过这种模式,客户操作系统1、客户操作系统2、客户操作系统3、客户操作系统4就能独立运作。
内核空间804载有内核虚拟机820及芯片驱动程序822。内核虚拟机820与QEMU搭配,主要负责内核空间804及片上系统806的虚拟化,使得每个客户操作系统在访问片上系统806时都能获得自己的地址空间。更详细来说,映射给客户操作系统的片上系统806上的空间实际上是映射给这个进程的虚拟部件。
从用户虚拟机810的角度来看,虚拟机运行期间,QEMU通过内核虚拟机820提供的系统调用接口进行内核设置,QEMU使用了内核虚拟机820的虚拟化功能,为自己的虚拟机提供硬件虚拟化加速以提高虚拟机的性能。从内核虚拟机820的角度来看,当用户无法直接跟内核空间804交互时,需要借助用户空间802的管理工具,因此需要借助QEMU这个运行在用户空间802的工具。
芯片驱动程序822用以驱动物理功能826。在虚拟机运行期间,用户空间802不再由硬件监测器工具808经芯片驱动程序822来访问片上系统806,因此客户操作系统1、客户操作系统2、客户操作系统3、客户操作系统4分别配置有内核空间824,用以载入芯片驱动程序822,使得各客户操作系统依然可以驱动片上系统806。
片上系统806是通过SR-IOV技术来执行虚拟化的,更详细来说,SR-IOV技术可以使得片上系统806的各部件虚拟化。这样,每个虚拟部件都有自己对应的唯一可访问的资源。
此实施例的片上系统806包含硬件和固件。硬件包括只读存储器ROM(未显示于图中),用以存储固件,而固件包括物理功能826,用于支持或协作SR-IOV的PCIe功能,物理功能826拥有完全配置PCIe资源的权力。在实施SR-IOV技术时,物理功能826会虚拟化出多个虚拟功能828,在此实施例中为四个虚拟功能828。虚拟功能828是一种轻量级PCIe功能,受物理功能826管理,可与物理功能826以及与同一物理功能826关联的其他虚拟功能828共享PCIe物理资源。虚拟功能828仅允许控制物理功能826配置给自己的资源。
一旦在物理功能826中启用了SR-IOV,各个虚拟功能828就可以通过自身的总线、设备和功能编号去访问自己的PCIe配置空间。每个虚拟功能828都具有一个内存空间,用于映射其寄存器集。虚拟功能828驱动程序对寄存器集进行操作以启用其功能,并直接指定给对应的用户虚拟机810。虽然是虚拟的,但会让用户虚拟机810认为是实际存在的PCIe设备。
片上系统806的硬件还包括计算装置830、视频编解码装置832、JPEG编解码装置834、存储装置836及PCIe 838。在此实施例中,计算装置830为智能处理装置IPU,用以执行神经网络的卷积计算;视频编解码装置832用以对视频数据进行编解码;JPEG编解码装置834用以对采用JPEG算法的静态图片进行编解码;存储装置836可以为动态随机存取存储器(DRAM),用以存储数据;PCIe 838即为前述的PCIe,在虚拟机运行期间,PCIe 838会虚拟化为四个虚拟接口840,虚拟功能828与虚拟接口840为一对一对应关系,也就是第一虚拟功能对接第一虚拟接口,第二虚拟功能对接第二虚拟接口,以此类推。
通过SR-IOV技术,计算装置830虚拟化为四个虚拟计算装置842、将视频编解码装置832虚拟化为四个虚拟视频编解码装置844、将JPEG编解码装置834虚拟化为四个虚拟JPEG编解码装置846、将存储装置836虚拟化为四个虚拟存储装置848。
每个客户操作系统分别配置一组虚拟套件,每组虚拟套件包括一个用户虚拟机810、一个虚拟接口840、一个虚拟功能828、一个虚拟计算装置842、一个虚拟视频编解码装置844、一个虚拟JPEG编解码装置846及一个虚拟存储装置848。每组虚拟套件各自独立运行互不影响,用来执行相对应的客户操作系统所交付的任务,以确保每个客户操作系统能通过所配置的虚拟接口840及虚拟功能828访问所配置的虚拟计算装置842、虚拟视频编解码装置844、虚拟JPEG编解码装置846及虚拟存储装置848。
更详细来说,每个客户操作系统在执行任务时,响应任务的不同,所需访问的硬件可能也不同,例如:某个任务是进行运算,例如矩阵卷积运算,则该客户操作系统会通过所配置的虚拟接口840及虚拟功能828访问所配置的虚拟计算装置842;如某个任务是进行视频编解码,则该客户操作系统会通过所配置的虚拟接口840及虚拟功能828访问所配置的虚拟视频编解码装置844;如某个任务是进行JPEG编解码,则该客户操作系统会通过所配置的虚拟接口840及虚拟功能828访问所配置的虚拟JPEG编解码装置846;如某个任务是读取或写入数据,则该客户操作系统会通过所配置的虚拟接口840及虚拟功能828访问所配置的虚拟存储装置848。
在上文中,描述了一种基于多核处理器的虚拟化方法,采用了虚拟机的方式。而在下文中,可以采用docker-container的方式。
如图2-12所示,本公开还提供一种基于多核处理器的虚拟化方法,其中,所述多核处理器包括多个处理核,所述方法包括:在操作2-S1210,将所述多核处理器划分为多个虚拟功能,所述多个虚拟功能共享所述多个处理核;以及在操作2-S1220,将所述虚拟功能对应到容器。
图2-13示出了根据本公开的一个实施方式的虚拟化系统,该虚拟化系统包括:多核处理器,所述多核处理器包括多个处理核;多个虚拟功能,所述多个虚拟功能共享所述多个处理核;以及容器,所述容器对应于所述虚拟功能。
如图2-12和图2-13所示,虚拟功能可以对应到容器,例如虚拟功能VF0对应到容器0,虚拟功能VF1对应到容器1,虚拟功能VF2对应到容器2,虚拟功能VF3对应到容器3。需要理解的是,这种对应关系仅仅是一种实例,本公开还可以采用其他的对应关系,从而更加便于系统的部署。这将在后文中进行更加详细的描述。此外,图2-13中尽管描述了4个虚拟功能和4个容器,但也可以是更少或更多的其他数量。
在本公开中,容器容纳了执行任务(例如任务0-任务3)所需的硬件资源和软件资源,其相互之间可以独立运行,互相不产生干扰。与现有技术中采用时间切片技术的虚拟化方案相比,本公开的技术方案由于采用了独立运行的容器,所以在容器之间不存在队首阻塞问题,也不会受到相邻的噪声影响,也没有上下文切换开销。
如图2-2a和图2-2b所示,在多核处理器中,特定数量的处理核构成一个处理集群,因此多个虚拟功能共享一个或多个处理集群。
图2-5示出了根据本公开的一个实施方式的虚拟功能与处理集群进行对应的示意图。需要理解的是,尽管图2-5以四个处理集群(处理集群0-处理集群3)为例进行了描述,但处理集群也可以是任何其他数量。
在图2-5所示的示例1中,处理集群0、处理集群1、处理集群2和处理集群3对应到虚拟功能0,即该多核处理器被划分为一个虚拟功能。
在图2-5所示的示例2中,处理集群0、处理集群1和处理集群2对应到虚拟功能0,处理集群3对应到虚拟功能1,即该多核处理器被划分为两个虚拟功能,虚拟功能0相对于虚拟功能1具有较强的处理能力。
在图2-5所示的示例3中,处理集群0和处理集群1对应到虚拟功能0,处理集群2和处理集群3对应到虚拟功能1,即该多核处理器被划分为两个虚拟功能,虚拟功能0和虚拟功能1具有等同的处理能力。
在图2-5所示的示例4中,处理集群0和处理集群1对应到虚拟功能0,处理集群2对应到虚拟功能1,处理集群3对应到虚拟功能2,即该多核处理器被划分为三个虚拟功能,虚拟功能0相对于虚拟功能1和虚拟功能2具有较强的处理能力,虚拟功能1和虚拟功能2具有等同的处理能力。
在图2-5所示的示例5中,处理集群0对应到虚拟功能0,处理集群1对应到虚拟功能1,处理集群2对应到虚拟功能2,处理集群3对应到虚拟功能3,这四个虚拟功能具有等同的处理能力。
在图2-5所示的示例6中,处理集群0对应到虚拟功能0,处理集群1、处理集群2和处理集群3对应到虚拟功能1,相对于虚拟功能1,虚拟功能0具有较弱的处理能力。该示例等效于示例2。
在图2-5所示的示例7中,处理集群0对应到虚拟功能0,处理集群1对应到虚拟功能1,处理集群2对应到虚拟功能0,处理集群3对应到虚拟功能1,虚拟功能0和虚拟功能1具有等同的处理能力。该示例等效于示例3。
在图2-5所示的示例8中,处理集群0对应到虚拟功能0,处理集群1对应到虚拟功能1,处理集群2对应到虚拟功能0,处理集群3对应到虚拟功能2。虚拟功能0相对于虚拟功能1和虚拟功能2具有较强的处理能力,虚拟功能1和虚拟功能2具有等同的处理能力。该示例等效于示例4。
由此可见,通过多个虚拟功能共享一个或多个处理集群,能够实现对虚拟功能的灵活配置,从而能够根据不同的需求来动态地配置虚拟功能的处理能力。因此,相对于现有技术,本公开的技术方案还具有配置简单和灵活的优点。
在本公开中,多个虚拟功能可以共享硬件资源,硬件资源可以是处理核,也可以是存储器(例如DDR)、总线、编码器/解码器、视频/音频驱动器、接口单元等等。例如,对于PCIe板卡资源而言,其包括了AI计算单元(IPU)、视频编解码单元(VPU)、JPEG编解码单元(JPU)和内存。本公开对硬件资源的类型不做任何限制。
图2-6a、图2-6b和图2-6c示例性地示出了分为1个、2个和4个虚拟功能时对PCIe卡的资源占用情况。需要说明的是,上述的多核处理器可以是JPU、VPU等具有多个计算核的计算装置。
如图2-6a所示,当虚拟功能为1个时,该虚拟功能VF0将专用所有的资源,即占用全部的计算核,全部的通道,全部的VPU以及全部的JPU。
如图2-6b所示,当虚拟功能为2个时,虚拟功能VF0和虚拟功能VF1将分别使用一半的资源,即VF0占用一半的计算核,VF1占用另一半计算核。设具有四个DDR通道,则VF0可以占用通道0和通道1,VF1可以占用通道2和通道3。同样设有四个VPU和JPU,则VF0可以占用VPU0和VPU1,VF1可以占用VPU2和VPU3;VF0可以占用JPU0和JPU1,而VF1可以占用JPU2和JPU3。
如图2-6c所示,当虚拟功能为4个时,虚拟功能VF0-VF3各占1/4的计算核。同样,设具有四个DDR通道,四个VPU和四个JPU,则虚拟功能VF0-VF3分别可以占用通道0-通道3;虚拟功能VF0-VF3分别可以占用VPU0-VPU3;虚拟功能VF0-VF3分别可以占用JPU0-JPU3。
图2-14示出了根据本公开的又一个实施方式的虚拟化系统的示意性框图。
如图2-14所示,根据本公开的另一个实施方式,本公开的虚拟化系统进一步包括:公共驱动器,所述多个虚拟功能由所述公共驱动器来驱动。
该驱动器可以是对于所有虚拟功能公用的,其可以是安装在操作系统中的程序。该驱动器例如可以为每个虚拟功能VF建立对应的节点,节点可以是存储在某个目录(例如dev目录)下的文件,以供其他应用运行或调用。文件的名称可以根据厂商的不同而不同。
在创建了节点之后,可以将这些节点中的一个或多个包含或对应到相应的容器中。每个容器可以包含一个或多个节点,这意味着每个容器可以对应或包含一个或多个虚拟功能。在本公开中,每个容器可以对应或包含不同数量的节点,由此容器的配置将更加灵活,部署更加方便。此外,由于每个虚拟功能的运算能力可能不同,因此可以根据需求进行非常灵活的设计。
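上述"公共驱动器为每个虚拟功能创建节点、再将一个或多个节点对应到容器"的过程可以示意如下(节点命名/dev/vfX以及容器划分方案均为假设,实际节点名称如上文所述因厂商而异):

```python
def create_nodes(num_vfs, prefix="/dev/vf"):
    """示意:公共驱动器为每个虚拟功能创建一个字符设备节点(路径为假设)。"""
    return [f"{prefix}{i}" for i in range(num_vfs)]

def assign_to_containers(nodes, plan):
    """plan: {容器名: 节点下标列表};每个容器可对应一个或多个节点。"""
    return {c: [nodes[i] for i in idx] for c, idx in plan.items()}

nodes = create_nodes(4)
containers = assign_to_containers(nodes, {
    "container0": [0],        # 一个虚拟功能对应一个容器
    "container1": [1, 2, 3],  # 多个虚拟功能对应一个容器,运算能力更强
})
```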
容器确立之后,本公开的方法可以进一步包括为每个所述容器建立一一对应的镜像,所述镜像能够与所述容器进行通信。可以通过docker-container技术来建立上述的镜像。
镜像可以远程地安装在用户端,用户可以通过该镜像运行或调用容器,进而调用多核处理器以及其他相关的各种资源。
图2-11a和图2-11b示出了虚拟机模式和Docker-container模式的对比示意图。
在图2-11a中,在虚拟机模式下,主机Host将PCIe设备传递(Pass through)到客户机Guest里面,客户机包括了驱动器和目录,因此每个客户机需要自己加载驱动器,并在客户机的目录下创建节点,即字符型设备。
在图2-11b中,在Docker模式下,驱动器和目录均处于主机中,因此只有主机需要加载驱动器,驱动器对于所有的虚拟功能是公共的。因此,主机的驱动器在主机目录下创建节点,即字符型设备,然后将设备传递到镜像设备,即Docker里面。由此,相对于虚拟机模式,本公开的docker-container模式无需每个客户机安装和加载驱动器,由此简化了系统的设置,方便了用户的使用。
与硬件虚拟机的使用场景类似,基于Docker的轻量化虚拟化解决方案不再以整张卡为粒度,而是让多个容器以更细的粒度共享使用一张或多张物理加速卡。在每个Docker容器里,可以使用一个或者多个VF。不同容器之间的VF可以独立地、安全隔离地各自工作。
使用SR-IOV的硬件虚拟化方案可以同时支持Docker的使用模式并且在物理机上能够同时生成多个VF。系统管理员可以根据需求把不同VF指派给不同容器。隶属不同容器的VF可互不干扰地独立工作,VF间具备与PF间同等的健壮性(又称鲁棒性,robustness)和安全隔离性。相较于虚拟机模式,Docker的优点在于启动更快,需要的资源更少,对系统的利用率更高。无论是开发、测试还是部署都更加简单。
本公开还提供了一种包括多个处理核的多核处理器,其中,所述多核处理器被划分为多个虚拟功能,每个所述虚拟功能对应于一个或多个处理核。
本公开还公开了一种电子设备,包括上所述的虚拟化系统或者如上所述的多核处理器。该电子设备可以是主机,即本公开的技术方案实现在主机中,并与外部镜像(docker)进行通信。
SR-IOV功能具备更好的租户隔离、应用热迁移特性,可为云服务供应商提供安全、优质的AI计算资源,以充分保障用户在AI领域的投资。
本公开的方案瞄准了用户的一个痛点,即如何高效利用AI计算资源。
采用本公开的方案的芯片、装置和电子设备支持全面的AI推断场景部署,包括视觉、语音、自然语言处理等多样化的人工智能应用。本公开的技术方案支撑数据中心、专业场景乃至桌面等多元化部署场景。
在这些部署场景中,面向云端部署、多样化人工智能推断、以及配合边缘侧板卡进行应用开发时,如何有效利用AI计算资源是用户首要关心的问题,也是本公开的SR-IOV虚拟化功能的核心诉求:
1)面向云端部署:在云部署环境下,云服务提供商(CSP)以高性价比、高可用性的方式为海量租户提供计算、存储、网络资源服务,在此基础上还可提供高达99.99%的高可用服务级别。从Hypervisor和底层硬件上对资源进行高效共享以及对多租户、实例进行相互隔离,成为了AI云服务的基本诉求。
2)面向复杂的人工智能推断:在AI应用进行部署时,用户通常会遇到业务逻辑较为复杂的场景,需借助多个网络模型来构建AI辅助决策系统。为保证服务器节点内的服务质量,通常会采用一机多卡的部署方式。但在计算成本和服务质量需要兼顾时,用户会希望用单张板卡并行多个模型来解决问题。
3)面向边缘、端侧应用开发:本公开的方案能够在云、边、端三个维度实现全面覆盖。在面向边缘侧和端侧的应用开发过程中,用户经常会受部署侧的CPU、产品形态或网络条件的限制,无法直接在最终部署的设备上进行开发。本公开的方案支持采用端云一体的开发环境帮助用户快速将应用落地,而帮助云侧计算资源高效、合理地分配给应用开发组,是本公开的一个目标。
本公开所提供的SR-IOV功能能够让AI云、业务部署和应用开发更灵活、高效、安全。
本公开采用的虚拟化技术允许多个操作系统和应用程序共存于一个物理计算平台上,共享同一个芯片的计算资源。它为用户提供良好的安全性和隔离性,还支持如热迁移等高灵活特性。本虚拟化技术还有助于提高云计算密度,也使数据中心的IT资产管理更灵活。
除了虚拟化基本的资源共享特性,本公开的SR-IOV虚拟化技术支持运行在云服务器上的多个实例直接共享智能芯片的硬件资源。传统虚拟化系统中大量的资源和时间损耗在Hypervisor或VMM软件层面,PCIe设备的性能优势无法彻底发挥。而SR-IOV的价值在于消除这一软件瓶颈,助力多个虚拟机实现高效物理资源共享。
与传统图形加速卡的vGPU所采用的虚拟化技术不同,本公开的方案采用「非基于时间片的共享」方式,由于没有因时间片切换上下文带来的性能损失,因此能充分保证各VF独立的服务质量,彼此完全独立运行互不影响。
另外,SR-IOV还可以避免因分时复用切换应用带来的性能开销。如上图所示,虚拟功能搭配Docker或虚拟机(VM)运行时,单个VF业务性能保持在硬件性能的91%以上。这使得用户在多模型并行时,可以对各VF做出更准确的服务质量(QoS)预期,而不必考虑多模型时的拥塞或切换带来的性能开销。
基于SR-IOV的虚拟功能(例如vMLU)还能够提供更好的租户隔离性。虚拟化技术被数据中心广泛采用,除了因为其提供了对资源共享的能力(提供了更好的密度性能),也因为相对于其它技术(如docker),虚拟化提供了更好的隔离性和安全性。本公开中基于SR-IOV的虚拟化技术可以帮助云用户实现更好的隔离特性,具体优势如下:
首先,资源独立,互不干扰,能确保服务质量(QoS);其次,多任务时,没有队首阻塞的烦恼;再次,其具备独立内存资源,各VF之间互不可见;最后,它的部署相对简单,不需要对开源软件成分进行修改。
本公开中面向Docker-container的SR-IOV flat技术(例如图2-12至图2-14所示)能够提供更高效的部署方式。除了对虚拟机(VM)提供虚拟化支持,本公开的技术还对docker-container提供基于SR-IOV的虚拟化扩展(SR-IOV flat模式),用于多个容器(container)共享一块板卡的计算能力,同时,提供了基于kubernetes的管理插件。该功能为那些对隔离性和安全性需求没那么高的数据中心提供更轻量级部署方式。
相对于弹性GPU共享池技术(Elastic GPUs Shared Pools),本公开所采用的SR-IOV-Flat技术在隔离性、QoS上都有明显优势。
根据不同的应用场景,电子设备或装置还可以包括数据处理装置、机器人、电脑、打印机、扫描仪、平板电脑、智能终端、手机、行车记录仪、导航仪、传感器、摄像头、服务器、云端服务器、相机、摄像机、投影仪、手表、耳机、移动存储、可穿戴设备、交通工具、家用电器、和/或医疗设备。所述交通工具包括飞机、轮船和/或车辆;所述家用电器包括电视、空调、微波炉、冰箱、电饭煲、加湿器、洗衣机、电灯、燃气灶、油烟机;所述医疗设备包括核磁共振仪、B超仪和/或心电图仪。
本公开还提供一种计算机可读存储介质,其上存储有计算机程序代码,当所述计算机程序代码由处理器运行时,执行如上所述的方法。
本公开能够实现至少一个如下技术效果:
1.采用硬件隔离,安全性有较大提高,即使一个虚拟功能或容器出现问题,也不会影响其他部分的正常运行。
2.无需修改快速仿真器(QEMU),从而降低了设置系统的复杂度。
3.由于各个部分相对独立,因此延迟较小,具有较高的服务质量(QoS)。
4.无队首阻塞。
5.无相邻噪声影响。
6.无上下文切换开销,与传统的vGPU所采用的虚拟化技术不同,采用“非基于时间片的共享”方式,从而消除了因上下文切换带来的性能开销。
7.易于扩展和部署。
图2-9示出了一种组合处理装置900,其包括计算装置902(例如图2-8所述的计算装置830等),通用互联接口904,和其他处理装置906。根据本公开的计算装置与其他处理装置进行交互,共同完成用户指定的操作。图2-9为组合处理装置的示意图。
其他处理装置,包括中央处理器CPU、图形处理器GPU、神经网络处理器等通用/专用处理器中的一种或以上的处理器类型。其他处理装置所包括的处理器数量不做限制。其他处理装置作为机器学习运算装置与外部数据和控制的接口,包括数据搬运,完成对本机器学习运算装置的开启、停止等基本控制;其他处理装置也可以和机器学习运算装置协作共同完成运算任务。
通用互联接口,用于在计算装置(包括例如机器学习运算装置)与其他处理装置间传输数据和控制指令。该计算装置从其他处理装置中获取所需的输入数据,写入该计算装置片上的存储装置;可以从其他处理装置中获取控制指令,写入计算装置片上的控制缓存;也可以读取计算装置的存储模块中的数据并传输给其他处理装置。
可选的,该结构还可以包括存储装置908,存储装置分别与所述计算装置和所述其他处理装置连接。存储装置用于保存所述计算装置和所述其他处理装置的数据,尤其适用于所需要运算的数据在本计算装置或其他处理装置的内部存储中无法全部保存的情形。
该组合处理装置可以作为手机、机器人、无人机、视频监控设备等设备的SOC片上系统,有效降低控制部分的核心面积,提高处理速度,降低整体功耗。在此情况下,该组合处理装置的通用互联接口与设备的某些部件相连接,某些部件譬如摄像头、显示器、鼠标、键盘、网卡、wifi接口等。
在一些实施例里,本披露还公开了一种芯片,其包括了上述的计算装置或组合处理装置。
在一些实施例里,本披露还公开了一种板卡,其包括了上述芯片。参阅图2-10,其提供了一种示例性的板卡,上述板卡除了包括上述芯片1002以外,还可以包括其他的配套部件,该配套部件包括但不限于:存储器件1004、接口装置1006和控制器件1008。
所述存储器件与所述芯片封装结构内的芯片通过总线连接,用于存储数据。所述存储器件可以包括多组存储单元1010。每一组所述存储单元与所述芯片通过总线连接。可以理解,每一组所述存储单元可以是DDR SDRAM(英文:Double Data Rate SDRAM,双倍速率同步动态随机存储器)。
DDR不需要提高时钟频率就能加倍提高SDRAM的速度。DDR允许在时钟脉冲的上升沿和下降沿读出数据。DDR的速度是标准SDRAM的两倍。在一个实施例中,所述存储装置可以包括4组所述存储单元。每一组所述存储单元可以包括多个DDR4颗粒(芯片)。在一个实施例中,所述芯片内部可以包括4个72位DDR4控制器,上述72位DDR4控制器中64bit用于传输数据,8bit用于ECC校验。在一个实施例中,每一组所述存储单元包括多个并联设置的双倍速率同步动态随机存储器。DDR在一个时钟周期内可以传输两次数据。在所述芯片中设置控制DDR的控制器,用于对每个所述存储单元的数据传输与数据存储的控制。
所述接口装置与所述芯片封装结构内的芯片电连接。所述接口装置用于实现所述芯片与外部设备1012(例如服务器或计算机)之间的数据传输。例如在一个实施例中,所述接口装置可以为标准PCIE接口。比如,待处理的数据由服务器通过标准PCIE接口传递至所述芯片,实现数据转移。在另一个实施例中,所述接口装置还可以是其他的接口,本披露并不限制上述其他的接口的具体表现形式,所述接口装置能够实现转接功能即可。另外,所述芯片的计算结果仍由所述接口装置传送回外部设备(例如服务器)。
所述控制器件与所述芯片电连接。所述控制器件用于对所述芯片的状态进行监控。具体的,所述芯片与所述控制器件可以通过SPI接口电连接。所述控制器件可以包括单片机(Micro Controller Unit,MCU)。例如,所述芯片可以包括多个处理芯片、多个处理核或多个处理电路,可以带动多个负载。因此,所述芯片可以处于多负载和轻负载等不同的工作状态。通过所述控制器件可以实现对所述芯片中多个处理芯片、多个处理核和/或多个处理电路的工作状态的调控。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本披露并不受所描述的动作顺序的限制,因为依据本披露,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于可选实施例,所涉及的动作和模块并不一定是本披露所必须的。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
在本披露所提供的几个实施例中,应该理解到,所披露的装置,可通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性、光学、声学、磁性或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本披露各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件程序模块的形式实现。
所述集成的单元如果以软件程序模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储器中。基于这样的理解,本披露的技术方案可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储器中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本披露各个实施例所述方法的全部或部分步骤。而前述的存储器包括:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。
以上对本披露实施例进行了详细介绍,本文中应用了具体个例对本披露的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本披露的方法及其核心思想;同时,对于本领域的一般技术人员,依据本披露的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本披露的限制。
通过以下条款,可以对本公开的技术方案有更好的理解:
条款B1.一种基于多核处理器的虚拟化方法,其中,所述多核处理器包括多个处理核,所述方法包括:
将所述多核处理器划分为多个虚拟功能,每个所述虚拟功能对应于一个或多个处理核;以及
将所述虚拟功能对应到虚拟机。
条款B2.根据条款B1所述的方法,其中,所述虚拟机为多个,多个虚拟机之间能够独立运行。
条款B3.根据条款B1或B2所述的方法,其中,特定数量的处理核构成一个处理集群,每个虚拟功能对应于一个或多个处理集群。
条款B4.根据条款B1-3中任意一项所述的方法,其中,将一个虚拟功能对应到一个虚拟机;或者将多个虚拟功能对应到一个虚拟机。
条款B5.根据条款B1-4中任意一项所述的方法,其中,每个虚拟功能具有独立的硬件资源。
条款B6.根据条款B1-5中任意一项所述的方法,其中,所述多个虚拟功能由不同的驱动器来驱动。
条款B7.根据条款B6所述的方法,其中,通过所述驱动器为相应的虚拟功能建立对应的节点。
条款B8.一种基于多核处理器的虚拟化方法,其中,所述多核处理器包括多个处理核,所述方法包括:
将所述多核处理器划分为多个虚拟功能,所述多个虚拟功能共享所述多个处理核;以及
将所述虚拟功能对应到容器。
条款B9.根据条款B8所述的方法,其中,所述容器为多个,多个容器之间能够独立运行。
条款B10.根据条款B8或9所述的方法,其中,特定数量的处理核构成一个处理集群,多个虚拟功能共享一个或多个处理集群。
条款B11.根据条款B8-10中任意一项所述的方法,其中,将一个虚拟功能对应到一个容器;或者将多个虚拟功能对应到一个容器。
条款B12.根据条款B9-11中任意一项所述的方法,其中,所述多个虚拟功能由共同的驱动器来驱动。
条款B13.根据条款B12所述的方法,其中,通过所述驱动器为每个虚拟功能建立对应的节点,所述容器对应于一个或多个节点。
条款B14.根据条款B8-13中任意一项所述的方法,进一步包括为每个所述容器建立一一对应的镜像,所述镜像能够与所述容器进行通信。
条款B15.一种虚拟化系统,包括:
多核处理器,所述多核处理器包括多个处理核;
多个虚拟功能,所述多个虚拟功能共享所述多个处理核;以及
虚拟机,所述虚拟机对应于所述虚拟功能。
条款B16.根据条款B15所述的虚拟化系统,其中,所述虚拟机为多个,多个虚拟机之间能够独立运行。
条款B17.根据条款B15或16所述的虚拟化系统,其中,特定数量的处理核构成一个处理集群,多个虚拟功能共享一个或多个处理器集群。
条款B18.根据条款B15-17中任意一项所述的虚拟化系统,其中,一个虚拟功能对应到一个虚拟机;或者多个虚拟功能对应到一个虚拟机。
条款B19.根据条款B15-18中任意一项所述的虚拟化系统,其中,每个虚拟功能具有独立的硬件资源。
条款B20.根据条款B15-19中任意一项所述的虚拟化系统,进一步包括:多个驱动器,所述多个虚拟功能由不同的驱动器来驱动。
条款B21.根据条款B20所述的虚拟化系统,其中,所述驱动器配置为,为相应的虚拟功能建立对应的节点。
条款B22.一种虚拟化系统,包括:
多核处理器,所述多核处理器包括多个处理核;
多个虚拟功能,所述多个虚拟功能共享所述多个处理核;以及
容器,所述容器对应于所述虚拟功能。
条款B23.根据条款B22所述的虚拟化系统,其中,所述容器为多个,多个容器之间能够独立运行。
条款B24.根据条款B22或23所述的虚拟化系统,其中,特定数量的处理核构成一个处理集群,多个虚拟功能共享一个或多个处理器集群。
条款B25.根据条款B22-24中任意一项所述的虚拟化系统,其中,一个虚拟功能对应到一个容器;或者多个虚拟功能对应到一个容器。
条款B26.根据条款B22-25中任意一项所述的虚拟化系统,其中,所述多个虚拟功能共享硬件资源。
条款B27.根据条款B22-26中任意一项所述的虚拟化系统,进一步包括:公共驱动器,所述多个虚拟功能由所述公共驱动器来驱动。
条款B28.根据条款B27所述的虚拟化系统,其中,所述公共驱动器配置为,为每个虚拟功能建立对应的节点,所述容器对应于一个或多个节点。
条款B29.根据条款B22-28中任意一项所述的虚拟化系统,进一步包括镜像,所述镜像与所述容器一一对应,并且能够与所述容器进行通信。
条款B30.一种包括多个处理核的多核处理器,其中,
所述多核处理器被划分为多个虚拟功能,所述多个虚拟功能共享一个或多个处理核。
条款B31.一种电子设备,包括如条款B15-29中任意一项所述的虚拟化系统或者如条款B30所述的多核处理器。
条款B32.一种计算机可读存储介质,其上存储有计算机程序代码,当所述计算机程序代码由处理器运行时,执行条款B1-14中任意一项所述的方法。

Claims (19)

  1. 一种基于多核处理器的虚拟化方法,其中,所述多核处理器包括多个处理核,所述方法包括:
    将所述多核处理器划分为多个虚拟功能,每个所述虚拟功能对应于一个或多个处理核;以及
    将所述虚拟功能对应到容器。
  2. 根据权利要求1所述的方法,其中,所述容器为多个,多个容器之间能够独立运行。
  3. 根据权利要求1或2所述的方法,其中,特定数量的处理核构成一个处理集群,每个虚拟功能对应于一个或多个处理集群。
  4. 根据权利要求1-3中任意一项所述的方法,其中,将一个虚拟功能对应到一个容器;或者将多个虚拟功能对应到一个容器。
  5. 根据权利要求1-4中任意一项所述的方法,其中,每个虚拟功能具有独立的硬件资源。
  6. 根据权利要求1-5中任意一项所述的方法,其中,所述多个虚拟功能由共同的驱动器来驱动。
  7. 根据权利要求6所述的方法,其中,通过所述驱动器为每个虚拟功能建立对应的节点,所述容器对应于一个或多个节点。
  8. 根据权利要求1-7中任意一项所述的方法,进一步包括为每个所述容器建立一一对应的镜像,所述镜像能够与所述容器进行通信。
  9. 一种虚拟化系统,包括:
    多核处理器,所述多核处理器包括多个处理核;
    多个虚拟功能,每个所述虚拟功能对应于一个或多个处理核;以及
    容器,所述容器对应于所述虚拟功能。
  10. 根据权利要求9所述的虚拟化系统,其中,所述容器为多个,多个容器之间能够独立运行。
  11. 根据权利要求9或10所述的虚拟化系统,其中,特定数量的处理核构成一个处理集群,每个虚拟功能对应于一个或多个处理器集群。
  12. 根据权利要求9-11中任意一项所述的虚拟化系统,其中,一个虚拟功能对应到一个容器;或者多个虚拟功能对应到一个容器。
  13. 根据权利要求9-12中任意一项所述的虚拟化系统,其中,每个虚拟功能具有独立的硬件资源。
  14. 根据权利要求9-13中任意一项所述的虚拟化系统,进一步包括:公共驱动器,所述多个虚拟功能由所述公共驱动器来驱动。
  15. 根据权利要求14所述的虚拟化系统,其中,所述公共驱动器配置为,为每个虚拟功能建立对应的节点,所述容器对应于一个或多个节点。
  16. 根据权利要求9-15中任意一项所述的虚拟化系统,进一步包括镜像,所述镜像与所述容器一一对应,并且能够与所述容器进行通信。
  17. 一种包括多个处理核的多核处理器,其中,
    所述多核处理器被划分为多个虚拟功能,每个所述虚拟功能对应于一个或多个处理核。
  18. 一种电子设备,包括如权利要求9-16中任意一项所述的虚拟化系统或者如权利要求17所述的多核处理器。
  19. 一种计算机可读存储介质,其上存储有计算机程序代码,当所述计算机程序代码由处理器运行时,执行权利要求1-8中任意一项所述的方法。
PCT/CN2021/077977 2020-02-28 2021-02-25 一种虚拟化的方法、设备、板卡及计算机可读存储介质 WO2021170054A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/904,824 US20230111884A1 (en) 2020-02-28 2021-02-25 Virtualization method, device, board card and computer-readable storage medium

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010131483.4A CN113326118A (zh) 2020-02-28 2020-02-28 基于多核处理器的虚拟化方法、系统、多核处理器和电子设备
CN202010131483.4 2020-02-28
CN202010358635.4 2020-04-29
CN202010358635.4A CN113568734A (zh) 2020-04-29 2020-04-29 基于多核处理器的虚拟化方法、系统、多核处理器和电子设备

Publications (1)

Publication Number Publication Date
WO2021170054A1 true WO2021170054A1 (zh) 2021-09-02

Family

ID=77489872

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/077977 WO2021170054A1 (zh) 2020-02-28 2021-02-25 一种虚拟化的方法、设备、板卡及计算机可读存储介质

Country Status (2)

Country Link
US (1) US20230111884A1 (zh)
WO (1) WO2021170054A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017028317A1 (en) * 2015-08-20 2017-02-23 Hewlett Packard Enterprise Development Lp Containerized virtual network function
US20170329644A1 (en) * 2016-05-16 2017-11-16 Fujitsu Limited Computer-readable recording medium having stored therein program, information processing apparatus, information processing system, and method for processing information
US20170344391A1 (en) * 2016-05-26 2017-11-30 International Business Machines Corporation Extending trusted hypervisor functions with existing device drivers
CN107463402A (zh) * 2017-07-31 2017-12-12 腾讯科技(深圳)有限公司 虚拟操作系统的运行方法和装置
CN109983438A (zh) * 2016-12-22 2019-07-05 英特尔公司 使用直接存储器访问(dma)重新映射来加速半虚拟化网络接口
CN110569101A (zh) * 2018-06-05 2019-12-13 华为技术有限公司 管理容器服务的方法和装置


Also Published As

Publication number Publication date
US20230111884A1 (en) 2023-04-13

Similar Documents

Publication Publication Date Title
US11669372B2 (en) Flexible allocation of compute resources
US10180843B2 (en) Resource processing method and device for a multi-core operating system
US8930507B2 (en) Physical memory shared among logical partitions in a VLAN
US10275558B2 (en) Technologies for providing FPGA infrastructure-as-a-service computing capabilities
EP4053706A1 (en) Cross address-space bridging
CN113326226A (zh) 一种虚拟化的方法、装置、板卡及计算机可读存储介质
US20120144146A1 (en) Memory management using both full hardware compression and hardware-assisted software compression
CN114900699A (zh) 视频编解码卡虚拟化方法、装置、存储介质及终端
WO2021223744A1 (zh) 实现热迁移的方法、芯片、板卡和存储介质
CN115658586A (zh) 资源管理芯片、方法、电子设备及可读存储介质
TWI616759B (zh) 設備分配控制器以及設備分配方法
CN113568734A (zh) 基于多核处理器的虚拟化方法、系统、多核处理器和电子设备
WO2021170054A1 (zh) 一种虚拟化的方法、设备、板卡及计算机可读存储介质
CN115809158A (zh) 一种车载座舱娱乐系统用双系统多通道共享内存方法
CN113326118A (zh) 基于多核处理器的虚拟化方法、系统、多核处理器和电子设备
CN115202808A (zh) 一种用于虚拟化环境中片上系统的dma方法及系统
US11853798B2 (en) Disaggregated memory pool assignment
CN113326091A (zh) 一种虚拟化的方法、设备、板卡及计算机可读存储介质
WO2021170055A1 (zh) 一种虚拟化的方法、设备、板卡及计算机可读存储介质
CN114816648A (zh) 一种计算装置和计算方法
Yang et al. On construction of a virtual GPU cluster with InfiniBand and 10 Gb Ethernet virtualization
US11620120B1 (en) Configuration of secondary processors
CN113326110A (zh) 一种片上系统及板卡
US20240020174A1 (en) Memory disaggregation in a multi-node environment
CN113326092A (zh) 一种虚拟化的方法、设备、板卡及计算机可读存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21759791

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21759791

Country of ref document: EP

Kind code of ref document: A1