CN113326110A - System on chip and board card

System on chip and board card

Info

Publication number
CN113326110A
Authority
CN
China
Prior art keywords
virtual
chip
video codec
jpeg
codec
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011177174.7A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Cambricon Information Technology Co Ltd
Original Assignee
Anhui Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Cambricon Information Technology Co Ltd filed Critical Anhui Cambricon Information Technology Co Ltd
Priority to CN202011177174.7A priority Critical patent/CN113326110A/en
Publication of CN113326110A publication Critical patent/CN113326110A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The present disclosure relates to a virtualized computing framework. The framework may include a user space, a kernel space, and a system on chip; the system on chip may include a computing device, a video codec device, a JPEG codec device, and a storage device, which interact with the user space and the kernel space to jointly complete designated computing operations.

Description

System on chip and board card
Technical Field
The present disclosure relates generally to the field of computers. More particularly, the present disclosure relates to a virtualized system-on-chip and a board card.
Background
Time slice round robin scheduling is the most widely used scheduling algorithm in the computer field. Each process is assigned a time period, called a time slice, which is the time the process is allowed to run. If the process is still running when its time slice expires, it is suspended and processor resources are allocated to another process. If the process blocks or finishes before its time slice expires, the processor switches immediately. All the scheduler has to do is maintain a list of ready processes; when a process uses up its allocated time slice, it is moved to the end of the queue.
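For background illustration only, the following minimal C sketch captures the time-slice rotation described above; the process_t type, the run_for() helper, and the queue representation are hypothetical and not part of the disclosed system.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { int pid; bool finished; } process_t;

/* Run p for at most slice_ms; returns true if p blocked or finished early. */
extern bool run_for(process_t *p, int slice_ms);

/* Rotate through the ready queue, giving each process one time slice per pass;
 * a process that is still running when its slice expires simply waits for the
 * next pass, i.e. it goes to the back of the queue. */
void round_robin(process_t *queue[], size_t n, int slice_ms)
{
    size_t remaining = n;
    for (size_t i = 0; remaining > 0; i = (i + 1) % n) {
        process_t *p = queue[i];
        if (p == NULL || p->finished)
            continue;
        if (run_for(p, slice_ms)) {   /* blocked or terminated before slice end */
            p->finished = true;
            remaining--;
        }
    }
}
```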
Time slice round robin scheduling has many problems: quality of service (QoS) and isolation cannot be guaranteed, and head-of-line (HOL) blocking can occur. This is especially true in the field of artificial intelligence chips, which require large amounts of computing resources; allocating those resources through time slice round robin scheduling results in low running efficiency. How to reasonably allocate hardware resources therefore remains a problem to be solved in the prior art.
Disclosure of Invention
To at least partially solve the technical problems mentioned in the background, the solution of the present disclosure provides a virtualized system on chip and a board card.
In one aspect, the present disclosure discloses a system on a chip, comprising: virtual computing means for performing convolution calculations of the neural network; the virtual video coding and decoding device is used for coding and decoding videos; the virtual JPEG coding and decoding device is used for carrying out JPEG coding and decoding; and a virtual storage device for storing data.
In another aspect, the present disclosure discloses a board including the above system on chip.
The virtualization technology of the present disclosure divides the resources of the system on chip into a plurality of virtual components for simultaneous use by a plurality of virtual machines in user space, and provides excellent resource sharing, parallelism, isolation, configuration flexibility, and security.
Drawings
The above and other objects, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which like reference numerals refer to similar or corresponding parts, and in which:
FIG. 1 is a framework diagram illustrating an embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating the internal structure of a computing device according to an embodiment of the disclosure;
FIG. 3 is a schematic diagram illustrating a flexible deployment cluster according to an embodiment of the present disclosure;
fig. 4 is a schematic diagram illustrating a board card according to another embodiment of the present disclosure;
FIG. 5 is a flow diagram illustrating user space virtualization of another embodiment of the present disclosure;
FIG. 6 is a flow diagram illustrating system-on-chip virtualization of another embodiment of the present disclosure; and
FIG. 7 is a flow diagram illustrating virtualization of a computing device according to another embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure. The described embodiments are only a subset of the embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, description, and drawings of the present disclosure are used to distinguish between different objects and are not used to describe a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection".
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
Virtualization is a technique for virtualizing one computer device into a plurality of virtual machines. When a plurality of virtual machines run simultaneously on one computer, each virtual machine can run a different operating system, and the applications running on those operating systems execute in independent spaces without affecting one another, so that the working efficiency of the computer is significantly improved.
Virtualization is distinct from multitasking and from hyper-threading. Multitasking means that multiple programs run simultaneously within one operating system, whereas with virtualization multiple operating systems run simultaneously, each on its own virtual machine and each running multiple programs. Hyper-threading is a technique in which a single processor simulates two processors to balance program execution performance; the two simulated processors cannot be separated and can only work together, whereas in virtualization the virtual processors or components operate independently.
The virtualization technology generally redefines and divides physical resources of a computer by software to realize dynamic allocation and flexible scheduling of the computer resources, thereby improving the resource utilization rate.
In the present disclosure, references are made to hardware, software, and firmware: hardware includes various devices, units, and apparatuses; software includes various operating systems, virtual machines, programs, and tools; and firmware includes functions and the like. When hardware, software, and firmware are referred to collectively, the term "component" is used. This arrangement is merely for the purpose of describing the technology of the present disclosure more clearly and is not intended to limit it in any way.
One embodiment of the present disclosure is a framework that applies virtualization technology to an artificial intelligence chip. More particularly, it relates to a machine learning device for neural networks, which may be a convolutional neural network accelerator. FIG. 1 is a block diagram of artificial intelligence chip virtualization; the framework 100 includes a user space 102, a kernel space 104, and a system-on-chip 106, separated by dashed lines. The user space 102 is the operating space of user programs; it performs only simple operations, cannot call system resources directly, and can issue instructions to the kernel only through the system interface. The kernel space 104 is the space where kernel code runs; it can execute any command and call all resources of the system. The system-on-chip 106 is a module of the artificial intelligence chip that cooperates with the user space 102 through the kernel space 104.
In this embodiment, the hardware of the user space 102 is collectively referred to as a device or apparatus, and the hardware of the system-on-chip 106 is collectively referred to as a device or unit, for distinction. Such an arrangement is merely for the purpose of more clearly describing the technology of this embodiment, and does not set any limit to the technology of the present disclosure.
This embodiment is illustrated with one component virtualized into four virtual components unless otherwise emphasized, but the present disclosure does not limit the number of virtual components.
Before virtualization is run, the user space 102 is controlled by the hardware monitor tool 108, which obtains information of the system-on-chip 106 through a call interface. The hardware monitor tool 108 not only collects information of the system-on-chip 106, but also obtains in real time the resource overhead that upper-layer software imposes on the system-on-chip 106, giving the user real-time detailed information and status of the current system-on-chip 106. The detailed information and status may include: the hardware device model, firmware version number, driver version number, device utilization, storage device overhead state, board power consumption and board peak power consumption, peripheral component interconnect express (PCIe), and the like. The content and amount of monitored information may vary with the version and usage scenario of the hardware monitor tool 108.
After the system starts virtualization, operation of the user space 102 is taken over by the user virtual machine 110. The user virtual machine 110 is an abstraction and simulation of the real computing environment, and the system allocates a set of data structures to manage its state, including a complete set of registers, the use of physical memory, the state of virtual devices, and so on. The physical space of the user space 102 in this embodiment is virtualized into four virtual spaces 112, 114, 116, and 118. These four virtual spaces are independent of one another and can each carry a different guest operating system, such as guest operating system 1, guest operating system 2, guest operating system 3, and guest operating system 4 shown in the figure; the guest operating systems may be Windows, Linux, Unix, iOS, Android, and the like, and each guest operating system runs different application programs.
In this embodiment, the user virtual machine 110 is implemented with the Quick Emulator (QEMU). QEMU is open-source virtualization software written in C that virtualizes interfaces through dynamic binary translation and provides a series of hardware models, allowing guest operating systems 1, 2, 3, and 4 to believe they are accessing the system-on-chip 106 directly. The user space 102 includes processors, memory, I/O devices, and the like; QEMU virtualizes the processors of the user space 102 into four virtual processors, the memory into four virtual memories, and the I/O devices into four virtual I/O devices. Each guest operating system occupies a portion of the user space 102, for example one quarter, that is, each has access to one virtual processor, one virtual memory, and one virtual I/O device to perform its tasks. In this mode, guest operating systems 1, 2, 3, and 4 can operate independently of one another.
The kernel space 104 carries the kernel virtual machine 120 and the chip driver 122. The kernel virtual machine 120, in conjunction with QEMU, is primarily responsible for virtualizing the kernel space 104 and the system-on-chip 106, so that each guest operating system obtains its own address space when accessing the system-on-chip 106. In more detail, the space of the system-on-chip 106 that is mapped into a guest operating system is actually the virtual component mapped to that process.
From the perspective of the user virtual machine 110, while a virtual machine is running, QEMU performs kernel setup through the system call interface provided by the kernel virtual machine 120 and uses the virtualization functions of the kernel virtual machine 120 to provide hardware virtualization acceleration for its virtual machines, thereby improving their performance. From the perspective of the kernel virtual machine 120, since a user cannot interact with the kernel space 104 directly, a management tool in the user space 102 is required, and QEMU serves as that tool.
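For illustration of the QEMU/kernel virtual machine interaction described above, the sketch below shows how a user-space virtual machine monitor can drive the Linux KVM system-call interface through /dev/kvm. It is a minimal sketch with error handling omitted, and it does not reproduce the actual implementation of the user virtual machine 110.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Open the KVM interface, create one VM for a guest operating system, and
 * create one virtual processor inside it; the VMM would then map guest
 * memory and drive the vCPU with KVM_RUN. */
int create_guest_vcpu(void)
{
    int kvm  = open("/dev/kvm", O_RDWR | O_CLOEXEC); /* kernel virtual machine */
    int vm   = ioctl(kvm, KVM_CREATE_VM, 0UL);       /* one VM per guest OS    */
    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0UL);      /* one virtual processor  */
    return vcpu;
}
```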
The chip driver 122 is used to drive the physical function (PF) 126. While the virtual machines are running, the user space 102 no longer accesses the system-on-chip 106 through the hardware monitor tool 108 and the chip driver 122; therefore, guest operating systems 1, 2, 3, and 4 are each configured with a kernel space 124 in which the chip driver 122 is loaded, so that each guest operating system can still drive the system-on-chip 106.
The system-on-chip 106 performs virtualization through single root I/O virtualization (SR-IOV) technology, and more particularly, the SR-IOV technology may enable virtualization of various components of the system-on-chip 106. SR-IOV technology is a hardware-based virtualization solution that allows efficient sharing of PCIe resources among virtual machines, and enables a single PCIe resource to be shared by multiple virtual components of the system-on-chip 106, providing dedicated resources for these virtual components. Thus, each virtual component has its own corresponding uniquely accessible resource.
The system-on-chip 106 of this embodiment includes hardware and firmware. The hardware includes a read-only memory (ROM, not shown) for storing the firmware. The firmware includes the physical function 126, which supports or coordinates the PCIe functions of SR-IOV and has the authority to fully configure PCIe resources. When implementing SR-IOV, the physical function 126 virtualizes a plurality of virtual functions (VFs) 128, four in this embodiment. A virtual function 128 is a lightweight PCIe function managed by the physical function 126 that may share PCIe physical resources with the physical function 126 and with other virtual functions 128 associated with the same physical function 126. A virtual function 128 is only allowed to control the resources that the physical function 126 configures for it.
Once SR-IOV is enabled on the physical function 126, each virtual function 128 can access its own PCIe configuration space through its own bus, device, and function number. Each virtual function 128 has a memory space used to map its register set. The virtual function driver operates on this register set to enable its functionality, and the virtual function is assigned directly to the corresponding user virtual machine 110. Although the function is virtual, the user virtual machine 110 regards it as an actually existing PCIe device.
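As an illustration only: on a typical Linux host, the virtual functions of an SR-IOV-capable physical function are enabled by writing the desired count to the device's sriov_numvfs sysfs attribute. The sketch below assumes a hypothetical PCIe address and omits error reporting; it is not part of the disclosed firmware.

```c
#include <stdio.h>

/* Ask the physical function at the (hypothetical) address 0000:01:00.0 to
 * spawn num_vfs virtual functions, e.g. 4 as in this embodiment. */
int enable_virtual_functions(int num_vfs)
{
    FILE *f = fopen("/sys/bus/pci/devices/0000:01:00.0/sriov_numvfs", "w");
    if (f == NULL)
        return -1;
    fprintf(f, "%d\n", num_vfs);
    return fclose(f);
}
```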
The hardware of the system-on-chip 106 also includes a computing device 130, a video codec device 132, a JPEG codec device 134, a storage device 136, and PCIe 138. In this embodiment, the computing device 130 is an intelligent processing unit (IPU) for performing convolution calculations of the neural network; the video codec device 132 is used for encoding and decoding video data; the JPEG codec device 134 is used for encoding and decoding still pictures using the JPEG algorithm; the storage device 136 may be a dynamic random access memory (DRAM) for storing data; and PCIe 138 is the aforementioned PCIe interface. While the virtual machines are running, PCIe 138 is virtualized into four virtual interfaces 140, and the virtual functions 128 and the virtual interfaces 140 are in one-to-one correspondence, that is, the first virtual function corresponds to the first virtual interface, the second virtual function corresponds to the second virtual interface, and so on.
With SR-IOV technology, the computing device 130 is virtualized into four virtual computing devices 142, the video codec device 132 is virtualized into four virtual video codec devices 144, the JPEG codec device 134 is virtualized into four virtual JPEG codec devices 146, and the storage device 136 is virtualized into four virtual storage devices 148.
Each guest operating system is configured with a set of virtual suites, each set including a user virtual machine 110, a virtual interface 140, a virtual function 128, a virtual computing device 142, a virtual video codec device 144, a virtual JPEG codec device 146, and a virtual storage device 148. Each set of virtual suites runs independently without affecting the others and is used to execute the tasks delivered by the corresponding guest operating system, thereby ensuring that each guest operating system can access its configured virtual computing device 142, virtual video codec device 144, virtual JPEG codec device 146, and virtual storage device 148 through its configured virtual interface 140 and virtual function 128.
In more detail, the hardware a guest operating system needs to access depends on the task it is executing. For example: if a task performs a matrix convolution calculation, the guest operating system accesses the configured virtual computing device 142 through the configured virtual interface 140 and virtual function 128; if a task performs video encoding or decoding, the guest operating system accesses the configured virtual video codec device 144 through the configured virtual interface 140 and virtual function 128; if a task performs JPEG encoding or decoding, the guest operating system accesses the configured virtual JPEG codec device 146 through the configured virtual interface 140 and virtual function 128; and if a task reads or writes data, the guest operating system accesses the configured virtual storage device 148 through the configured virtual interface 140 and virtual function 128.
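The routing just described can be summarized by the illustrative sketch below; the task and device enumerations and the route_task() function are hypothetical names introduced for clarity and are not part of the disclosure.

```c
typedef enum {
    TASK_CONVOLUTION,   /* matrix convolution calculation */
    TASK_VIDEO_CODEC,   /* video encoding or decoding     */
    TASK_JPEG_CODEC,    /* JPEG encoding or decoding      */
    TASK_DATA_IO        /* reading or writing data        */
} task_type_t;

typedef enum {
    DEV_VIRTUAL_COMPUTE,      /* virtual computing device 142   */
    DEV_VIRTUAL_VIDEO_CODEC,  /* virtual video codec device 144 */
    DEV_VIRTUAL_JPEG_CODEC,   /* virtual JPEG codec device 146  */
    DEV_VIRTUAL_STORAGE       /* virtual storage device 148     */
} device_id_t;

/* A guest operating system reaches each device through its configured
 * virtual interface 140 and virtual function 128. */
device_id_t route_task(task_type_t t)
{
    switch (t) {
    case TASK_CONVOLUTION: return DEV_VIRTUAL_COMPUTE;
    case TASK_VIDEO_CODEC: return DEV_VIRTUAL_VIDEO_CODEC;
    case TASK_JPEG_CODEC:  return DEV_VIRTUAL_JPEG_CODEC;
    default:               return DEV_VIRTUAL_STORAGE;
    }
}
```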
Fig. 2 shows a schematic diagram of the internal structure of the computing device 130. The computing device 130 has sixteen processing unit cores in total (processing unit core 0 to processing unit core 15) for executing matrix computing tasks, and every four processing unit cores form a processing unit group, i.e., a cluster. In more detail, processing unit core 0 through processing unit core 3 form a first cluster 202, processing unit core 4 through processing unit core 7 form a second cluster 204, processing unit core 8 through processing unit core 11 form a third cluster 206, and processing unit core 12 through processing unit core 15 form a fourth cluster 208. The computing device 130 basically performs computing tasks in units of clusters.
The computing device 130 also includes a storage unit core 210 and a shared storage unit 212. The storage unit core 210 is mainly used for controlling data exchange and serves as the channel through which the computing device 130 communicates with the storage device 136. The shared storage unit 212 is used for temporarily storing intermediate calculation values of the clusters 202, 204, 206, and 208. During virtualization, the storage unit core 210 is split into four virtual storage unit cores, and the shared storage unit 212 is likewise split into four virtual shared storage units.
Each virtual computing device 142 is configured with a virtual storage unit core, a virtual shared storage unit, and a cluster to support the tasks of a particular guest operating system. Similarly, the virtual computing devices 142 operate independently without affecting one another.
The computing device 130 can flexibly allocate clusters to the virtual components according to the number of virtual components and the resources they require. FIG. 3 illustrates possible allocations. The first exemplary allocation 302 is the case of four virtual computing devices 142, each configured with one cluster. The second exemplary allocation 304 is the case of three virtual computing devices 142, where the first virtual computing device needs more hardware resources and is therefore configured with two clusters, while the second and third virtual computing devices are each configured with one cluster. The third exemplary allocation 306 is the case where two virtual computing devices 142 share the cluster resources equally, i.e., each is configured with two clusters. The fourth exemplary allocation 308 is also the case of two virtual computing devices 142, but the first virtual computing device needs more hardware resources and is configured with three clusters, while the second virtual computing device is configured with one cluster.
The number of clusters in the computing device 130 should be at least equal to the number of virtual computing devices 142, so that each virtual computing device 142 can be configured with one cluster; when the number of clusters exceeds the number of virtual computing devices 142, the surplus clusters can be assigned to the virtual computing devices 142 according to actual needs, increasing the flexibility of hardware configuration.
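The allocations of FIG. 3 can be reproduced by a simple greedy assignment, sketched below under the assumption that each virtual computing device declares how many extra clusters it wants; the extra_demand array and the allocate_clusters() function are hypothetical. The same scheme applies to video codec units, JPEG codec units, and DDR channels.

```c
#include <stddef.h>

/* Distribute `clusters` clusters among `devices` virtual computing devices:
 * every device receives one cluster, and the surplus is granted greedily to
 * devices that requested extra resources. Returns -1 if there are fewer
 * clusters than devices. */
int allocate_clusters(int clusters, const int extra_demand[],
                      int granted[], size_t devices)
{
    if (clusters < (int)devices)
        return -1;                          /* at least one cluster per device */
    int surplus = clusters - (int)devices;
    for (size_t i = 0; i < devices; i++) {
        int extra = extra_demand[i] < surplus ? extra_demand[i] : surplus;
        granted[i] = 1 + extra;
        surplus -= extra;
    }
    return 0;
}
```

For example, with four clusters, two devices, and extra demands of {2, 0}, the result is {3, 1}, matching the fourth exemplary allocation 308.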
The video codec device 132 of this embodiment includes six video codec units. The video codec device 132 can flexibly allocate the video codec units according to the number of virtual components and the required resources. For example, when the video codec device 132 is virtualized into four virtual video codec devices 144 and the first and second virtual video codec devices require more video codec resources, two video codec units may be configured for each of the first and second virtual video codec devices, and one video codec unit for each of the remaining virtual video codec devices 144. As another example, when the video codec device 132 is virtualized into three virtual video codec devices 144 and none of them needs extra video codec resources, two video codec units may be configured for each virtual video codec device 144.
The number of video codec units should be at least equal to the number of virtual video codec devices 144, so that each virtual video codec device 144 can be configured with one video codec unit; when the number of video codec units exceeds the number of virtual video codec devices 144, the surplus units can be assigned to the virtual video codec devices 144 according to actual requirements, increasing the flexibility of hardware configuration.
Likewise, the JPEG codec device 134 of this embodiment includes six JPEG codec units. The JPEG codec device 134 can flexibly allocate the JPEG codec units according to the number of virtual components and the required resources; the allocation method is the same as that of the video codec device 132 and is not repeated here.
The storage device 136 may adopt a non-uniform memory access (NUMA) architecture and includes a plurality of DDR channels. The storage device 136 may flexibly allocate the DDR channels according to the number of virtual components and the required resources; the allocation manner is the same as for the computing device 130, the video codec device 132, and the JPEG codec device 134, and is not repeated here.
The foregoing embodiment is configured on the premise that every component of the system is divided into the same number of virtual components; in some special scenarios, however, the number of virtual components per component may differ.
Another embodiment of the present disclosure also employs the framework shown in FIG. 1; it differs from the previous embodiment in that PCIe 138 is virtualized into six virtual interfaces 140 while the other components keep four virtual components each. In this embodiment, the system bases the virtualization operation on the smallest number of virtual components among all components, i.e., four. In this case, PCIe 138 has two idle virtual interfaces 140, and QEMU may choose to shut down or leave unconfigured the two idle virtual interfaces 140, or to include them in the virtualization operation; for example, if the first virtual function and the second virtual function require more interface resources, QEMU may configure two virtual interfaces 140 for each of them and one virtual interface for each of the remaining virtual functions.
Another embodiment of the present disclosure is a board card including the framework shown in FIG. 1. As shown in FIG. 4, the board card 400 includes at least one chip 402 (only two are shown), a memory device 404, an interface device 406, and a control device 408.
The chip 402 of FIG. 4 integrates the computing device 130, the video codec device 132, the JPEG codec device 134, and the like of FIG. 1, and can be in different operating states such as a multi-load state and a light-load state. The control device 408 can control the operating states of the computing device 130, the video codec device 132, and the JPEG codec device 134 in the chip 402.
The memory device 404 is coupled to the chip 402 via a bus 414 and is used for storing data. The memory device 404 is the storage device 136 of FIG. 1 and, as previously mentioned, includes multiple groups of DDR channels 410. Each group of DDR channels 410 may include a plurality of DDR4 chips. Each group of DDR channels 410 is coupled to the chip 402 via the bus 414.
The interface device 406, i.e., PCIe 138 of FIG. 1, is electrically connected to the chip 402 within the chip package structure. The interface device 406 is used to enable data transfer between the chip 402 and an external device 412 (i.e., the user space 102): tasks to be processed are transferred from the external device 412 to the chip 402 through the PCIe interface. The interface device 406 may also be another type of interface capable of implementing the virtualization function; the present disclosure does not limit its specific form. In addition, the calculation results of the chip 402 are transmitted back to the external device 412 by the interface device 406.
Control device 408, i.e., kernel space 104 of FIG. 1, is electrically connected to chip 402 for monitoring the status of chip 402. Specifically, chip 402 and control device 408 may be electrically connected through an SPI interface. The control device 408 may include a Micro Controller Unit (MCU) and store the kernel virtual machine 120 and the chip driver 122.
The present disclosure also discloses an electronic device or apparatus, which includes the above board card 400. According to different application scenarios, the electronic device or apparatus may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device. The vehicles include an airplane, a ship, and/or a land vehicle; the household appliances include a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical equipment includes a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus, and/or an electrocardiograph.
Another embodiment of the present disclosure is a virtualization method, and more particularly a virtualization method for the framework 100 of FIG. 1, applied to a machine learning device for neural networks, which may be a convolutional neural network accelerator.
Before being virtualized, the user space 102 is controlled by the hardware monitor tool 108, and obtains information of the system-on-chip 106 through a call interface. The hardware monitor tool 108 may not only collect information of the system-on-chip 106, but also obtain overhead of upper-layer software on resources of the system-on-chip 106 in real time, and display detailed information and status of the current system-on-chip 106 in real time for the user space 102.
FIG. 5 illustrates a flow diagram for virtualizing user space 102.
During virtualized operation, the user space 102 is instead hosted by the user virtual machine 110. The user virtual machine 110 is an abstraction and simulation of the real computing environment, and the system allocates a set of data structures to manage its state, including a complete set of registers, the usage of physical memory, the state of virtual devices, and so on. The user virtual machine 110 is implemented with QEMU. The user space 102 includes processors, memory, I/O devices, and the like.
In executing step 502, the QEMU virtualizes the processors of user space 102 to produce four virtual processors. In executing step 504, the QEMU virtualizes the memory of the user space 102 to produce four virtual memories. In performing step 506, the QEMU virtualizes the I/O devices of the user space 102 to produce four virtual I/O devices.
In performing step 508, the QEMU individually configures one of the virtual processors to each guest operating system. In performing step 510, the QEMU individually configures one of these virtual memories to each guest operating system. In performing step 512, the QEMU individually configures one of the virtual I/O devices to each guest operating system.
After the foregoing steps are performed, each guest operating system occupies a portion of the resources at the user space 102 end, for example, one fourth of the resources. More specifically, each guest operating system has access to a virtual processor, a virtual memory, and a virtual I/O device, respectively, to perform the tasks of the guest operating system. In this mode, the guest operating systems 1, 2, 3 and 4 can operate independently.
The firmware of the system-on-chip 106 includes the physical function 126, and the hardware includes the computing device 130, the video codec device 132, the JPEG codec device 134, the storage device 136, and PCIe 138. The virtualization of the system-on-chip 106 is realized based on SR-IOV technology; the flow is shown in FIG. 6.
In performing step 602, the PCIe 138 is virtualized to produce at least four virtual interfaces 140. In execution of step 604, the physical function 126 is virtualized to produce four virtual functions 128. In performing step 606, computing device 130 is virtualized to produce four virtual computing devices 142.
In step 608, the video codec device 132 is virtualized to generate four virtual video codec devices 144. In more detail, the video codec device 132 of this embodiment includes six video codec units, one of which is configured for each virtual video codec device 144. The video codec device 132 can flexibly allocate the video codec units according to the number of virtual components and the required resources. The number of the video codec units should be at least the same as the number of the virtual video codec devices 144, so as to ensure that each virtual video codec device 144 can configure one video codec unit, and when the number of the video codec units is greater than the number of the virtual video codec devices 144, the video codec units can be properly configured to the virtual video codec devices 144 according to actual requirements, so as to increase flexibility of hardware configuration.
Next, in step 610, the JPEG codec device 134 is virtualized to generate four virtual JPEG codec devices 146. In more detail, the JPEG codec device 134 of this embodiment includes six JPEG codec units, one of which is allocated to each virtual JPEG codec device 146. The JPEG codec device 134 can flexibly allocate the JPEG codec units according to the number of the virtual components and the required resources. The number of JPEG codec units should be at least the same as the number of virtual JPEG codec devices 146, so as to ensure that each virtual JPEG codec device 146 can configure one JPEG codec unit, and when the number of JPEG codec units is greater than the number of virtual JPEG codec devices 146, the JPEG codec units can be properly configured to the virtual JPEG codec devices 146 according to actual requirements, so as to increase the flexibility of hardware configuration.
In performing step 612, the storage device 136 is virtualized to produce four virtual storage devices 148. In this step, at least one of the DDR channels in the storage device 136 is allocated to each virtual storage device. Similarly, the storage device 136 can be flexibly configured in units of DDR channels according to the number of virtual components and the required resources.
In performing step 614, one of the virtual interfaces 140 is configured for each guest operating system. In performing step 616, one of the virtual functions 128 is configured for each guest operating system. In performing step 618, one of the virtual computing devices 142 is configured for each guest operating system. In performing step 620, one of the virtual video codecs 144 is configured for each guest operating system. In performing step 622, one of the virtual JPEG codec devices 146 is configured for each guest operating system. In performing step 624, one of the virtual storage devices 148 is configured for each guest operating system.
After the steps of FIG. 5 and FIG. 6 are performed, each guest operating system is configured with a set of virtual suites, each set including a processor, a memory, a user virtual machine 110, a virtual interface 140, a virtual function 128, a virtual computing device 142, a virtual video codec device 144, a virtual JPEG codec device 146, and a virtual storage device 148. Each set of virtual suites runs independently without affecting the others and is used to execute the tasks delivered by the corresponding guest operating system, thereby ensuring that each guest operating system can access its configured virtual computing device 142, virtual video codec device 144, virtual JPEG codec device 146, and virtual storage device 148 through its configured virtual interface 140 and virtual function 128.
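As an illustrative summary, the per-guest virtual suite can be pictured as a record binding together the configured virtual resources; the struct below is a hypothetical sketch introduced for clarity, not the data structure actually used by the system.

```c
/* One virtual suite per guest operating system; the four suites run
 * independently of one another. */
typedef struct {
    int guest_os_id;         /* guest operating system served by this suite */
    int virtual_interface;   /* virtual interface 140                       */
    int virtual_function;    /* virtual function 128                        */
    int virtual_compute;     /* virtual computing device 142                */
    int virtual_video_codec; /* virtual video codec device 144              */
    int virtual_jpeg_codec;  /* virtual JPEG codec device 146               */
    int virtual_storage;     /* virtual storage device 148                  */
} virtual_suite_t;

static virtual_suite_t suites[4];   /* one per guest operating system */
```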
FIG. 7 illustrates a flow diagram of the computing device 130 in performing virtualization based on SR-IOV techniques. The computing device 130 has sixteen processing unit cores (processing unit core 0 to processing unit core 15) in total for performing the matrix computation task.
In executing step 702, every four processing unit cores are grouped into a cluster, and the computing device 130 basically executes the computing task in units of clusters.
The computing device 130 also includes the storage unit core 210 and the shared storage unit 212. In performing step 704, the storage unit core 210 is virtualized to produce four virtual storage unit cores. In performing step 706, the shared storage unit 212 is virtualized to produce four virtual shared storage units. In performing step 708, one of the clusters is configured to each virtual computing device 142. In performing step 710, one of the virtual storage unit cores is configured to each virtual computing device 142. In performing step 712, one of the virtual shared storage units is configured to each virtual computing device 142.
In more detail, in step 708, the computing apparatus 130 can flexibly allocate the virtual components in units of clusters according to the number of the virtual components and the required resources. The number of clusters of computing device 130 should be at least the same as the number of virtual computing devices 142 to ensure that each virtual computing device 142 can configure one cluster, and when the number of clusters is greater than the number of virtual computing devices 142, the clusters can be appropriately configured to the virtual computing devices 142 according to actual needs to increase flexibility of hardware configuration.
After the steps of fig. 7 are performed, each virtual computing device 142 is configured with a virtual storage unit core, a virtual shared storage unit and a cluster to support the computing tasks of a specific guest operating system. Similarly, each of the virtual computing devices 142 operates independently and does not affect each other.
Another embodiment of the present disclosure is a computer-readable storage medium on which computer program code for virtualizing a machine learning device is stored; when the computer program code is executed by a processor, it performs the methods of the foregoing embodiments, such as the flows shown in FIG. 5, FIG. 6, and FIG. 7.
The virtualization technology of the present disclosure is based on SR-IOV: the resources of the system on chip are divided into a plurality of virtual components for simultaneous use by a plurality of virtual machines in user space. The technology completely partitions hardware resources (computing resources, storage resources, and the like) rather than sharing them on the basis of time slices, thereby avoiding the quality-of-service and head-of-line blocking problems caused by time slice scheduling, and it provides excellent resource sharing, parallelism, isolation, configuration flexibility, and security.
The foregoing may be better understood in light of the following clauses:
clause a1, a system-on-a-chip, comprising: virtual computing means for performing convolution calculations of the neural network; the virtual video coding and decoding device is used for coding and decoding videos; the virtual JPEG coding and decoding device is used for carrying out JPEG coding and decoding; and a virtual storage device for storing data.
Clause a2, the system-on-chip of clause a1, further comprising a virtual interface and a virtual function through which a guest operating system accesses the virtual computing device, the virtual video codec device, the virtual JPEG codec device, and the virtual storage device.
Clause A3, the system-on-chip of clause a1, wherein the virtual computing device comprises at least one cluster.
Clause a4, the system-on-chip of clause A3, wherein the virtual computing device comprises: and the virtual shared storage unit is used for temporarily storing the calculation intermediate value of the cluster.
Clause a5, the system-on-chip of clause a1, wherein the virtual computing device comprises: and the virtual storage unit core is used for controlling data exchange.
Clause a6, the system-on-chip of clause a1, wherein the virtual video codec device comprises at least one video codec unit.
Clause a7, the system-on-chip of clause a1, wherein the virtual JPEG codec device comprises at least one JPEG codec unit.
Clause A8, the system on a chip of clause a1, wherein the virtual store comprises at least one DDR channel.
Clause a9, a board comprising the system-on-chip of any of clauses a 1-8.

Claims (9)

1. A system on a chip, comprising:
virtual computing means for performing convolution calculations of the neural network;
the virtual video coding and decoding device is used for coding and decoding videos;
the virtual JPEG coding and decoding device is used for carrying out JPEG coding and decoding; and
the virtual storage device is used for storing data.
2. The system on a chip of claim 1, further comprising a virtual interface and a virtual function through which a guest operating system accesses the virtual computing device, the virtual video codec device, the virtual JPEG codec device, and the virtual storage device.
3. The system on a chip of claim 1, wherein the virtual compute device comprises at least one cluster.
4. The system on a chip of claim 3, wherein the virtual computing device comprises:
and the virtual shared storage unit is used for temporarily storing the calculation intermediate value of the cluster.
5. The system on a chip of claim 1, wherein the virtual computing device comprises:
and the virtual storage unit core is used for controlling data exchange.
6. The system on chip of claim 1, wherein the virtual video codec device comprises at least one video codec unit.
7. The system-on-chip as recited in claim 1, wherein the virtual JPEG codec device comprises at least one JPEG codec unit.
8. The system on a chip of claim 1, wherein the virtual storage comprises at least one DDR channel.
9. A board comprising the system on chip of any of claims 1-8.
CN202011177174.7A 2020-02-28 2020-02-28 System on chip and board card Pending CN113326110A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011177174.7A CN113326110A (en) 2020-02-28 2020-02-28 System on chip and board card

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010131485.3A CN113326109A (en) 2020-02-28 2020-02-28 Virtualization method, virtualization device, board card and computer-readable storage medium
CN202011177174.7A CN113326110A (en) 2020-02-28 2020-02-28 System on chip and board card

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202010131485.3A Division CN113326109A (en) 2020-02-28 2020-02-28 Virtualization method, virtualization device, board card and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN113326110A true CN113326110A (en) 2021-08-31

Family

ID=77412939

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202011177207.8A Pending CN113326121A (en) 2020-02-28 2020-02-28 Machine learning equipment and board card
CN202010131485.3A Pending CN113326109A (en) 2020-02-28 2020-02-28 Virtualization method, virtualization device, board card and computer-readable storage medium
CN202011177174.7A Pending CN113326110A (en) 2020-02-28 2020-02-28 System on chip and board card

Family Applications Before (2)

Application Number Title Priority Date Filing Date
CN202011177207.8A Pending CN113326121A (en) 2020-02-28 2020-02-28 Machine learning equipment and board card
CN202010131485.3A Pending CN113326109A (en) 2020-02-28 2020-02-28 Virtualization method, virtualization device, board card and computer-readable storage medium

Country Status (1)

Country Link
CN (3) CN113326121A (en)

Also Published As

Publication number Publication date
CN113326109A (en) 2021-08-31
CN113326121A (en) 2021-08-31

Similar Documents

Publication Publication Date Title
CN113326226A (en) Virtualization method and device, board card and computer readable storage medium
CN109522087B (en) Virtual mechanism building method and system based on domestic Shenwei processor
US10191759B2 (en) Apparatus and method for scheduling graphics processing unit workloads from virtual machines
JP6564838B2 (en) Multi-operating system operating method and apparatus based on Industrial Internet Operating System
EP1674987A2 (en) Systems and methods fro exposing processor topology for virtual machines
Kang et al. ConVGPU: GPU management middleware in container based virtualized environment
CN101887378A (en) Semi-physical real-time simulator and semi-physical real-time simulation system
CN112433823A (en) Apparatus and method for dynamically virtualizing physical card
CN114138423A (en) Virtualization construction system and method based on domestic GPU (graphics processing Unit) display card
WO2021223744A1 (en) Method for realizing live migration, chip, board, and storage medium
CN115904617A (en) GPU virtualization implementation method based on SR-IOV technology
CN113568734A (en) Virtualization method and system based on multi-core processor, multi-core processor and electronic equipment
CN113326118A (en) Virtualization method and system based on multi-core processor, multi-core processor and electronic equipment
CN113326091A (en) Virtualization method, virtualization device, board card and computer-readable storage medium
CN116578416B (en) Signal-level simulation acceleration method based on GPU virtualization
CN114281467A (en) System method, device and storage medium for realizing heat migration
US20230111884A1 (en) Virtualization method, device, board card and computer-readable storage medium
CN113326110A (en) System on chip and board card
CN113326092A (en) Virtualization method, virtualization device, board card and computer-readable storage medium
WO2021170055A1 (en) Virtualization method, device, board card and computer readable storage medium
US8402191B2 (en) Computing element virtualization
CN114816648A (en) Computing device and computing method
CN114281468A (en) Apparatus, associated method and readable storage medium for implementing thermomigration
US20240272927A1 (en) Method for resource allocation for applications and/or application containers in a distributed system of heterogeneous compute nodes
KR101334842B1 (en) Virtual machine manager for platform of terminal having function of virtualization and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination