CN117112165A

CN117112165A - Virtual reality application task processing method and device and virtual reality system

Info

Publication number: CN117112165A
Application number: CN202311012427.9A
Authority: CN
Inventors: 韦茜; 崔恩放; 刘荣凯; 林显成; 黄志兰
Original assignee: China Telecom Technology Innovation Center; China Telecom Corp Ltd
Current assignee: China Telecom Technology Innovation Center; China Telecom Corp Ltd
Priority date: 2023-08-11
Filing date: 2023-08-11
Publication date: 2023-11-24

Abstract

The application relates to a method and a device for processing a virtual reality application task, and to a virtual reality system. The method comprises the following steps: when a computing instruction of the virtual reality application is read, dividing a task to be processed of the virtual reality application into computing subtasks of different types; processing the computing subtasks in parallel, each with a computing accelerator matched to its type, to obtain a processing result of each computing subtask; and dividing the processing result of each computing subtask into vertex data and texture data corresponding to the vertex data, performing operation processing on the vertex data, and fusing the operation processing result of the vertex data with the corresponding texture data to obtain a graphics rendering result of the task to be processed. The method can improve the operating efficiency of virtual reality application tasks.

Description

Virtual reality application task processing method and device and virtual reality system

Technical Field

The present application relates to the field of virtual reality technologies, and in particular, to a method and an apparatus for processing a virtual reality application task, and a virtual reality system.

Background

The metaverse is an immersive virtual space simulated by computers, in which people interact with virtual objects and digital assets. The rise of the metaverse has attracted a great deal of attention in the internet industry and has driven the development of virtual reality (Virtual Reality, VR) and augmented reality (Augmented Reality, AR) technologies, increasing demand for VR/AR devices.

In the conventional art, a VR/AR device is composed of a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), a display, a memory, a communication module, and the like.

However, the computing power of a CPU alone is insufficient to efficiently handle the compute-intensive tasks of VR/AR applications, while a GPU has higher power consumption and higher chip cost. In addition, because of constraints on device size and battery size, the available system-on-chip (System on Chip, SoC) area, computing resources, and memory resources of VR/AR devices are very limited. These factors limit the computing power of current VR/AR devices and lower the operating efficiency of VR/AR application tasks.

Disclosure of Invention

The embodiment of the application provides a processing method, a processing device and a virtual reality system for a virtual reality application task, which can improve the running efficiency of the virtual reality application task.

A method of processing a virtual reality application task, the method comprising:

dividing tasks to be processed of the virtual reality application into different types of computing subtasks under the condition that computing instructions of the virtual reality application are read;

adopting a computing accelerator matched with the type of each computing subtask to process each computing subtask in parallel, and obtaining the processing result of each computing subtask;

dividing the processing results of each calculation subtask into vertex data and texture data corresponding to the vertex data, carrying out operation processing on the vertex data, and carrying out fusion processing on the operation processing results of the vertex data and the texture data corresponding to the vertex data to obtain the graph rendering results of the task to be processed.

In one possible implementation, the method further includes:

and under the condition that a plurality of tasks to be processed exist in the virtual reality application, scheduling the computing subtasks of the tasks to be processed according to the number of the tasks to be processed of the virtual reality application and the priority of each task to be processed.

In one possible implementation, the method further includes:

And determining the priority of the task to be processed according to one or more of the data size of the task to be processed, the arrival sequence of the task to be processed, the delay tolerance of the task to be processed and the time for the task to be processed to be sent to the processor.

In one possible implementation, the type of the computing subtask is determined according to the computing requirements of the computing subtask.

In one possible implementation, the types of computing subtasks include vector computing types, artificial intelligence computing types, and convolutional neural network computing types.

In one possible implementation, the scheduling of the computing sub-tasks of the plurality of tasks to be processed is based on an asynchronous computing instruction set, the parallel processing is based on a parallel computing instruction set, the vertex data operations are based on a graphics rendering instruction set, and the asynchronous computing instruction set, the parallel computing instruction set, and the graphics rendering instruction set are RISC-V extended instruction sets.

In one possible implementation manner, the dividing the processing result of each computing subtask into vertex data and texture data corresponding to the vertex data, performing operation processing on the vertex data, and performing fusion processing on the operation processing result of the vertex data and the texture data corresponding to the vertex data to obtain a graphics rendering result of the task to be processed includes:

inputting the processing results of the computing subtasks into a multiplexer, and outputting vertex data and texture data of each computing subtask;

inputting the vertex data of each computing subtask into a programmable vertex engine for operation processing to obtain transformed and lit vertex data;

and inputting the transformed and lit vertex data and the texture data into a fixed pipeline slicing engine for fusion processing to obtain a graphics rendering result of the task to be processed.

In one possible implementation, the method further includes:

reading instructions from the data storage module;

and decoupling the instruction to obtain a memory access instruction and the calculation instruction.

In one possible implementation, the method further includes:

and acquiring man-machine interaction data and storing the man-machine interaction data into the data storage module, wherein the man-machine interaction data corresponds to one or more tasks to be processed.

A processing apparatus for virtual reality application tasks, comprising:

the task dividing module is used for dividing the task to be processed of the virtual reality application into different types of computing subtasks under the condition that the computing instruction of the virtual reality application is read;

The parallel processing module is used for carrying out parallel processing on each calculation subtask by adopting a calculation accelerator matched with the type of each calculation subtask to obtain the processing result of each calculation subtask;

and the fusion module is used for dividing the processing result of each calculation subtask into vertex data and texture data corresponding to the vertex data, carrying out operation processing on the vertex data, and carrying out fusion processing on the operation processing result of the vertex data and the texture data corresponding to the vertex data to obtain the graph rendering result of the task to be processed.

In one possible implementation, the apparatus further includes:

and the task scheduling module is used for scheduling the computing subtasks of the plurality of tasks to be processed according to the number of the tasks to be processed of the virtual reality application and the priority of each task to be processed under the condition that the plurality of tasks to be processed exist in the virtual reality application.

In one possible implementation, the apparatus further includes:

the priority determining module is used for determining the priority of the task to be processed according to one or more of the data size of the task to be processed, the arrival sequence of the task to be processed, the delay tolerance of the task to be processed and the time of sending the task to be processed to the processor.

In one possible implementation, the fusion module is further configured to:

In one possible implementation, the apparatus further includes:

the instruction reading module is used for reading the instruction from the data storage module;

and the instruction decoupling module is used for decoupling the instruction to obtain a memory access instruction and the calculation instruction.

In one possible implementation, the apparatus further includes:

the data acquisition module is used for acquiring man-machine interaction data and storing the man-machine interaction data into the data storage module, wherein the man-machine interaction data corresponds to one or more tasks to be processed.

A virtual reality system comprises a data storage module, a processor, a calculation acceleration module, a graph rendering module and a display module;

the data storage module is used for storing instructions of the virtual reality application, human-computer interaction data of a task to be processed of the virtual reality application and graphic rendering results;

the processor is used for dividing the task to be processed of the virtual reality application into different types of computing subtasks under the condition that the instruction read from the data storage module is a computing instruction; transmitting each subtask to a computing accelerator matched with the type of each computing subtask; and sending the processing results of the calculation subtasks returned by the calculation accelerator to the graphic rendering module;

The computing accelerator is used for carrying out parallel processing on the received computing subtasks to obtain a processing result of the computing subtasks, and sending the processing result of the computing subtasks to the processor;

the graphics rendering module is used for dividing the processing results of the computing subtasks into vertex data and texture data corresponding to the vertex data, carrying out operation processing on the vertex data, and carrying out fusion processing on the operation processing results of the vertex data and the texture data corresponding to the vertex data to obtain the graphics rendering results of the tasks to be processed;

the display module is used for displaying the graphic rendering result.

A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:

A computer program product, including a computer program, which when executed by a processor implements a method for processing a virtual reality application task provided by an embodiment of the present application, where the method may be:

According to the processing method and device for the virtual reality application task and the virtual reality system, the task is first divided at finer granularity into a plurality of computing subtasks, and the computing subtasks are processed in parallel, which improves their computing efficiency. In addition, each computing subtask is processed by a computing accelerator matched to its type, which further improves the computing efficiency of the subtasks while reducing the computing capacity required of the processor. Finally, separating the vertex data from the texture data improves graphics rendering efficiency. The processing method, the processing device and the virtual reality system therefore effectively improve the operating efficiency of virtual reality application tasks.

Drawings

Fig. 1 is a schematic diagram of a virtual reality system according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a parallel computing instruction set according to an embodiment of the present application;

FIG. 3 is a schematic diagram of an asynchronous computing instruction set according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a graphics rendering instruction set according to an embodiment of the present application;

FIG. 5 is a block diagram illustrating a graphics rendering module according to an embodiment of the present application;

Fig. 6 is a flow chart of a processing method of a virtual reality application task according to an embodiment of the present application;

Fig. 7 is a flow chart of a method for processing a virtual reality application task according to an embodiment of the present application;

Fig. 8 is a block diagram of a processing device for a virtual reality application task according to an embodiment of the present application;

Fig. 9 is a schematic structural diagram of a virtual reality device according to an embodiment of the present application.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of a virtual reality system according to an embodiment of the present application. The processing method of the virtual reality application task provided by the embodiment of the application can be applied to the system. It should be noted that, the virtual reality system according to the embodiment of the present application includes both a VR system and an AR system, which may also be referred to as a VR/AR system. Similarly, the virtual reality device involved in the embodiment of the present application may also be referred to as VR/AR device, and the virtual reality application may also be referred to as VR/AR application. As shown in fig. 1, the virtual reality system may be divided into a system software layer and a system hardware layer.

As shown in fig. 1, the system software layer may include a priority determination module, a task management module, and a device driver module.

The order in which tasks execute affects the operating efficiency of the system, and the urgency and timeliness of tasks affect how efficiently software and hardware cooperate. Tasks may therefore be scheduled according to their priorities. The priority determination module may determine priority based on one or more of the data size of a task, the arrival order of the task, the delay tolerance of the task, and the time at which the task is sent to the processor. For example, the priority determination module may assign a task a higher priority the larger its data size, the earlier it arrives, the lower its delay tolerance, and the earlier it is sent to the processor. The task management module may schedule and manage tasks according to the current number of tasks and the priority of each task, and controls the task scheduling module in the system hardware layer. The device driver module may provide drivers for the communication sensing module, the display module, the computing accelerator, and the graphics rendering module in the system hardware layer.
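The priority rule described above can be sketched as a scoring function. This is only an illustrative sketch: the `Task` fields, the linear combination, the kilobyte normalisation and the equal default weights are all assumptions introduced here, since the application states only the direction in which each factor influences priority.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical task record; field names are illustrative assumptions."""
    name: str
    data_size: int             # amount of task data, in bytes
    arrival_order: int         # 0 means the task arrived first
    delay_tolerance_ms: float  # how long the task can tolerate waiting
    send_time_ms: float        # time at which the task was sent to the processor


def priority_score(task, weights=(1.0, 1.0, 1.0, 1.0)):
    """Return a score where a higher value means a higher priority.

    Larger data raises the score; later arrival, larger delay tolerance and
    later send time lower it. The weights are assumed, not from the source.
    """
    w_size, w_order, w_delay, w_send = weights
    return (w_size * task.data_size / 1024.0
            - w_order * task.arrival_order
            - w_delay * task.delay_tolerance_ms
            - w_send * task.send_time_ms)


def order_by_priority(tasks):
    """Sort tasks so that the highest-priority task comes first."""
    return sorted(tasks, key=priority_score, reverse=True)
```

With these assumed weights, a 4096-byte game task that arrived first with a 10 ms delay tolerance is ordered ahead of a 2048-byte classroom task that arrived second with a 50 ms tolerance. In practice the weights would have to be tuned for the device.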

As shown in fig. 1, the system hardware layer includes: the system comprises a communication sensing module, a data storage module, a processor, a computing accelerator, a graphic rendering module and a display module.

The communication sensing module is used for acquiring man-machine interaction data and storing the man-machine interaction data into the data storage module. The man-machine interaction data may be motion data (e.g., motion data generated when a user turns the body, walks, raises a hand, or turns the head) or audio data (e.g., voice commands issued by the user). The communication sensing module enables communication between a user and the virtual reality system.

The data storage module is used for storing man-machine interaction data, instructions of virtual reality application tasks, graphics rendering results and other data. The man-machine interaction data can trigger tasks of the virtual reality application, and one or more tasks may be triggered. The processor may read instructions from the data storage module for task processing. The graphics rendering module may store the output graphics rendering result in the data storage module. The display module may read the graphics rendering result from the data storage module and display it. The data storage module can be connected with the communication sensing module, the processor, the graphics rendering module, the display module and other modules through a PCIe (Peripheral Component Interconnect Express) bus. Of course, the data storage module may also be connected to other modules through other types of buses, which is not limited in the embodiments of the present application.

The processor may be configured to divide a task to be processed of the virtual reality application into different types of computing subtasks in a case where the instruction read from the data storage module is a computing instruction; each computing sub-task is sent to a computing accelerator that matches the type of each computing sub-task.
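As a rough illustration of dispatching typed subtasks to matching accelerators, the sketch below uses a thread pool in place of real hardware. The three accelerator functions are hypothetical stand-ins with toy arithmetic, and the `(type, payload)` subtask representation is an assumption made for this example only.

```python
from concurrent.futures import ThreadPoolExecutor


def vector_accelerator(payload):
    """Stand-in for a vector computing accelerator: doubles each element."""
    return [2 * x for x in payload]


def ai_accelerator(payload):
    """Stand-in for an AI computing accelerator: picks the largest value."""
    return max(payload)


def cnn_accelerator(payload):
    """Stand-in for a CNN computing accelerator: 1-D convolution with kernel [1, 1]."""
    return [payload[i] + payload[i + 1] for i in range(len(payload) - 1)]


# Map each subtask type to the accelerator that matches it.
ACCELERATORS = {
    "vector": vector_accelerator,
    "ai": ai_accelerator,
    "cnn": cnn_accelerator,
}


def process_task(subtasks):
    """Run (type, payload) subtasks in parallel; results keep subtask order."""
    with ThreadPoolExecutor(max_workers=len(ACCELERATORS)) as pool:
        futures = [pool.submit(ACCELERATORS[kind], payload)
                   for kind, payload in subtasks]
        return [future.result() for future in futures]
```

For example, `process_task([("vector", [1, 2, 3]), ("ai", [4, 9, 2])])` submits both subtasks at once and collects the two results in the original order.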

The processor is the core of the VR/AR system hardware layer and is designed based on the open-source RISC-V instruction set architecture. The processor provided by the embodiment of the application integrates the RISC-V architecture, has a powerful image processing function, and is a control chip well suited to the requirements of virtual reality products. For example, its dynamic frame rate switching function can switch directly from 1 Hz to 240 Hz without additional transition time, which suits fast-paced game scenes; its high dynamic range image (High Dynamic Range, HDR) function can provide users with a wider color gamut and richer detail. Compared with other instruction sets, the open-source RISC-V instruction set may be used freely for any purpose, allowing anyone to design, manufacture, and sell RISC-V chips and software.

As shown in fig. 1, the processor includes an instruction operation module, a task scheduling module, and a cache region, and is extended with three instruction sets.

The instruction operation module may be used for instruction decoupling and instruction mapping. After the processor reads an instruction, the instruction operation module decouples it into a memory access instruction and a calculation instruction, maps the memory access instruction to the data storage module, and maps the calculation instruction to the computing accelerator. A memory access instruction is an instruction that accesses memory, and includes instructions that store data in the memory and instructions that fetch data from the memory. A calculation instruction is an instruction that processes a task of the virtual reality application.
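The decoupling step can be pictured as a simple partition of an instruction stream. The sketch below is only an assumption about how such a split might look: it treats any mnemonic ending in `LOAD` or `STORE` as a memory access instruction and everything else as a calculation instruction, and the textual operand syntax is hypothetical.

```python
def decouple(instructions):
    """Split textual instructions into (memory_access, calculation) lists.

    Classification rule (an assumption for illustration): mnemonics ending
    in LOAD or STORE access memory; all other mnemonics are calculations.
    """
    memory_access, calculation = [], []
    for instruction in instructions:
        mnemonic = instruction.split()[0].upper()
        if mnemonic.endswith(("LOAD", "STORE")):
            memory_access.append(instruction)  # would be mapped to the data storage module
        else:
            calculation.append(instruction)    # would be mapped to a computing accelerator
    return memory_access, calculation
```

The relative order of instructions within each list is preserved, so each destination still sees its instructions in program order.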

The task scheduling module is controlled by the task management module of the system software layer and performs multi-task scheduling according to the number of tasks and the priority of each task.

The cache region may be used to cache data and instructions used in the processor's computation process.

The extended instruction sets include a parallel computing instruction set, an asynchronous computing instruction set, and a graphics rendering instruction set. In the embodiment of the application, the instruction set is customized by means of instruction extension, which improves task execution efficiency and computing performance. Of course, the above is merely an exemplary illustration of the extended instruction sets and is not intended to limit them; the embodiments of the present application may also extend other instruction sets. The three instruction sets are described in detail below.

Since a VR/AR application task can be divided at finer granularity into multiple computing subtasks, the computing subtasks can be computed in parallel to improve computing efficiency. The parallel computing instruction set includes instructions for performing parallel computations, which may be referred to as parallel computing instructions.

FIG. 2 is a schematic diagram of a parallel computing instruction set according to an embodiment of the present application. As shown in fig. 2, the parallel computing instruction set includes 4 parallel computing instructions: a parallel add computing instruction (PACADD instruction), a parallel multiply computing instruction (PACMUL instruction), a parallel store instruction (PACSTORE instruction), and a parallel load instruction (PACLOAD instruction). As shown in fig. 2, each parallel computing instruction is 32 bits long.

The PACADD instruction may be used to perform parallel addition. Bits 0 to 6 are the operation code (opcode), with the value 1001010; bits 7 to 11 specify the destination register, which stores the calculation result; bits 12 to 14 are func3, with the value 001; bits 15 to 19 specify source register 1 and bits 20 to 24 specify source register 2, from which the operands are obtained; bits 25 to 31 are an immediate value, imm[6:0].

The PACMUL instruction may be used to perform parallel multiplication. Bits 0 to 6 are the operation code (opcode), with the value 1001010; bits 7 to 11 specify the destination register, which stores the calculation result; bits 12 to 14 are func3, with the value 001; bits 15 to 19 specify source register 1 and bits 20 to 24 specify source register 2, from which the operands are obtained; bits 25 to 31 are an immediate value, imm[6:0].

The PACSTORE instruction may be used to store a computing subtask address to memory. Bits 0 to 6 are the operation code (opcode), with the value 1001010; bits 7 to 11 specify the destination register; bits 12 to 14 are func3, with the value 000; bits 15 to 19 specify source register 1; bits 20 to 24 specify the state register, which identifies the completion state of the computing subtask so that the task results can be integrated in the next step; bits 25 to 31 are func7, with the value 0000001.

The PACLOAD instruction may be used to load a computing subtask address from memory. Bits 0 to 6 are the operation code (opcode), with the value 1001010; bits 7 to 11 specify the destination register; bits 12 to 14 are func3, with the value 000; bits 15 to 19 specify source register 1; bits 20 to 24 specify the state register, which identifies the completion state of the computing subtask so that the task results can be integrated in the next step; bits 25 to 31 are func7, with the value 0000010.

Of course, the foregoing is merely illustrative of parallel computing instructions, and the parallel computing instruction set provided in the embodiments of the present application may further include other parallel computing instructions, which is not limited to the embodiments of the present application.
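To make the bit layout concrete, the following sketch packs the fields described above into a 32-bit word for the PACADD, PACSTORE and PACLOAD instructions. It assumes that the binary values in the text are written most-significant bit first and that register operands are plain 5-bit indices; the helper names (`encode`, `pacadd`, `pacstore`, `pacload`) are hypothetical.

```python
OPCODE = 0b1001010  # shared opcode from the text, read MSB-first (assumption)


def encode(func3, rd, rs1, rs2, top7):
    """Pack the fields into one 32-bit word.

    Layout per the description: opcode in bits 0-6, rd in bits 7-11,
    func3 in bits 12-14, rs1 in bits 15-19, rs2 (or the state register)
    in bits 20-24, and func7 or imm[6:0] in bits 25-31.
    """
    widths = {"func3": 3, "rd": 5, "rs1": 5, "rs2": 5, "top7": 7}
    values = {"func3": func3, "rd": rd, "rs1": rs1, "rs2": rs2, "top7": top7}
    for name, width in widths.items():
        if not 0 <= values[name] < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return ((top7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (func3 << 12) | (rd << 7) | OPCODE)


def pacadd(rd, rs1, rs2, imm=0):
    """PACADD: func3 = 001, bits 25-31 carry imm[6:0]."""
    return encode(0b001, rd, rs1, rs2, imm)


def pacstore(rd, rs1, state):
    """PACSTORE: func3 = 000, func7 = 0000001, bits 20-24 name the state register."""
    return encode(0b000, rd, rs1, state, 0b0000001)


def pacload(rd, rs1, state):
    """PACLOAD: func3 = 000, func7 = 0000010, bits 20-24 name the state register."""
    return encode(0b000, rd, rs1, state, 0b0000010)
```

For instance, `pacadd(1, 2, 3)` places register index 1 in the destination field and indices 2 and 3 in the two source fields, with the immediate left at zero.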

When tasks are subdivided into computing subtasks, stalls may occur on the computing accelerators because different application tasks take different amounts of time to process. For example, computing subtask B of task 1 on the vector computing accelerator may finish before computing subtask A of task 1 on the AI computing accelerator; subtask B must then wait for subtask A to complete before the next operation can proceed, and the vector computing accelerator sits idle in the meantime. To improve computing efficiency, embodiments of the present application introduce multi-task asynchronous computing with priorities.

FIG. 3 is a schematic diagram of an asynchronous computing instruction set according to an embodiment of the present application. As shown in fig. 3, the asynchronous calculation instruction set includes 2 asynchronous calculation instructions, an asynchronous store instruction (ASCSTORE instruction) and an asynchronous load instruction (ASCLOAD instruction), respectively. As shown in FIG. 3, the asynchronous calculation instruction is 32 bits.

The ASCSTORE instruction may be used to store a task address to memory. Bits 0 to 6 are the operation code (opcode), with the value 1001010; bits 7 to 11 specify the destination register; bits 12 to 14 are func3, with the value 000; bits 15 to 19 specify source register 1; bits 20 to 24 specify the priority register, which identifies the priority state of each task; bits 25 to 31 are func7, with the value 0000010.

The ASCLOAD instruction may be used to load a task address from memory. Bits 0 to 6 are the operation code (opcode), with the value 1001010; bits 7 to 11 specify the destination register; bits 12 to 14 are func3, with the value 000; bits 15 to 19 specify source register 1; bits 20 to 24 specify the state register, which identifies the priority state of each task; bits 25 to 31 are func7, with the value 0000011.
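The benefit of priority-based asynchronous computing can be illustrated with a small simulation. In the sketch below, each accelerator keeps its own priority queue and moves on to the next pending subtask, possibly from another task, as soon as it is free, instead of idling until sibling subtasks finish. The tuple layout, the convention that a smaller number means a higher priority, and the fixed durations are all assumptions made for illustration.

```python
import heapq


def schedule(subtasks):
    """Simulate per-accelerator execution of prioritized subtasks.

    subtasks: list of (priority, task, accelerator, duration), where a
    smaller priority number runs first (an assumed convention).
    Returns {accelerator: [(start, end, task), ...]}.
    """
    queues = {}
    for sequence, (priority, task, accelerator, duration) in enumerate(subtasks):
        # The sequence number breaks ties so equal priorities keep arrival order.
        heapq.heappush(queues.setdefault(accelerator, []),
                       (priority, sequence, task, duration))

    timeline = {}
    for accelerator, heap in queues.items():
        clock = 0
        slots = timeline.setdefault(accelerator, [])
        while heap:
            _, _, task, duration = heapq.heappop(heap)
            slots.append((clock, clock + duration, task))
            clock += duration  # the accelerator starts its next subtask at once
    return timeline
```

If task 1 needs 2 time units on the vector accelerator and 5 on the AI accelerator, and task 2 needs 3 units on the vector accelerator, the vector accelerator runs task 2 during the interval 2 to 5 rather than waiting for the AI subtask of task 1 to finish.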

The graphics rendering instruction set extension is used to accelerate graphics processing, in particular vertex processing, and provides support for the graphics rendering module.

FIG. 4 is a schematic diagram of a graphics rendering instruction set according to an embodiment of the present application. As shown in FIG. 4, the graphics rendering instruction set includes 5 graphics rendering instructions: a vertex addition instruction (GREADD instruction), a vertex third-order operation instruction (GREDP3 instruction), a vertex exponential operation instruction (GREEXP instruction), a vertex logarithm operation instruction (GRELOG instruction), and a vertex trigonometric function operation instruction (GRECOS instruction).

The GREADD instruction may be used to accelerate vertex addition operations. Bits 0 to 6 are the operation code (opcode), with the value 1001010; bits 7 to 11 specify the destination register, which stores the calculation result; bits 12 to 14 are func3, with the value 011; bits 15 to 19 specify source register 1 and bits 20 to 24 specify source register 2, from which the operands are obtained; bits 25 to 31 are an immediate value, imm[6:0].

The GREDP3 instruction may be used to accelerate vertex third-order operations. Bits 0 to 6 are the operation code (opcode), with the value 1001010; bits 7 to 11 specify the destination register, which stores the calculation result; bits 12 to 14 are func3, with the value 100; bits 15 to 19 specify source register 1 and bits 20 to 24 specify source register 2, from which the operands are obtained; bits 25 to 31 are an immediate value, imm[6:0].

The GREEXP instruction may be used to accelerate vertex exponential operations. Bits 0 to 6 are the operation code (opcode), with the value 1001010; bits 7 to 11 specify the destination register; bits 12 to 14 are func3, with the value 000; bits 15 to 19 specify source register 1 and bits 20 to 24 specify source register 2, from which the operands are obtained; bits 25 to 31 are func7, with the value 0000100.

The GRELOG instruction may be used to accelerate vertex logarithmic operations. Bits 0 to 6 are the operation code (opcode), with the value 1001010; bits 7 to 11 specify the destination register; bits 12 to 14 are func3, with the value 000; bits 15 to 19 specify source register 1 and bits 20 to 24 specify source register 2, from which the operands are obtained; bits 25 to 31 are func7, with the value 0000101.

The GRECOS instruction may be used to accelerate vertex trigonometric function operations. Bits 0 to 6 are the operation code (opcode), with the value 1001010; bits 7 to 11 specify the destination register; bits 12 to 14 are func3, with the value 000; bits 15 to 19 specify source register 1 and bits 20 to 24 specify source register 2, from which the operands are obtained; bits 25 to 31 are func7, with the value 0000110.
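The field layout of the five graphics rendering instructions can be checked with a small decoder. The sketch below is illustrative only: it assumes the binary values in the text are written most-significant bit first, it recognizes only the graphics rendering instructions, and the function names `fields` and `decode_graphics` are hypothetical.

```python
OPCODE = 0b1001010  # shared opcode from the text, read MSB-first (assumption)

# Instructions selected by func3 (bits 25-31 hold an immediate).
BY_FUNC3 = {0b011: "GREADD", 0b100: "GREDP3"}

# Instructions with func3 = 000 that are selected by func7.
BY_FUNC7 = {0b0000100: "GREEXP", 0b0000101: "GRELOG", 0b0000110: "GRECOS"}


def fields(word):
    """Extract the bit fields of a 32-bit instruction word."""
    return {
        "opcode": word & 0x7F,       # bits 0-6
        "rd": (word >> 7) & 0x1F,    # bits 7-11
        "func3": (word >> 12) & 0x7,  # bits 12-14
        "rs1": (word >> 15) & 0x1F,  # bits 15-19
        "rs2": (word >> 20) & 0x1F,  # bits 20-24
        "top7": (word >> 25) & 0x7F,  # bits 25-31: func7 or imm[6:0]
    }


def decode_graphics(word):
    """Return the graphics rendering mnemonic for word, or None if it is not one."""
    f = fields(word)
    if f["opcode"] != OPCODE:
        return None
    if f["func3"] in BY_FUNC3:
        return BY_FUNC3[f["func3"]]
    if f["func3"] == 0b000:
        return BY_FUNC7.get(f["top7"])
    return None
```

A word whose func3 is 000 and whose func7 is 0000101 decodes to `GRELOG`, while words that belong to the other extended instruction sets yield `None`.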

The computing accelerator can be used for carrying out parallel processing on the received computing subtasks to obtain processing results of the computing subtasks, and sending the processing results of the computing subtasks to the processor. The processor may be further configured to send the processing results of the respective computing subtasks returned by the computing accelerator to the graphics rendering module. In the embodiment of the application, after the task is divided into the computing sub-tasks, the computing sub-tasks can be processed in parallel by adopting a computing accelerator matched with the type of each computing sub-task. Optionally, the computing accelerators include vector computing accelerators, AI computing accelerators and convolutional neural network (Convolutional Neural Network, CNN) computing accelerators, and each computing accelerator can better adapt to the computing requirements of VR/AR application tasks, so that energy consumption overhead is reduced.

The graphics rendering module may be configured to divide the processing result of each computing subtask into vertex data and texture data corresponding to the vertex data, perform operation processing on the vertex data, and perform fusion processing on the operation processing result of the vertex data and the texture data corresponding to the vertex data to obtain a graphics rendering result of the task to be processed.

Fig. 5 is a block diagram of a graphics rendering module according to an embodiment of the present application. As shown in FIG. 5, the graphics rendering module may include a multiplexer, a programmable vertex engine, and a fixed pipeline slicing engine to accelerate graphics rendering and enhance imaging. The multiplexer may be configured to multiplex the processing results received from each of the computing subtasks to split the processing results into vertex data and texture data. The programmable vertex engine may process the vertex data to obtain transformed and lit vertex data. The programmable vertex engine may include special function units (SPUs) for the purpose of processing input vertex data. The functions of the SPU include, but are not limited to, addition, squaring, logarithmic calculation. The fixed pipeline slicing engine can process the transformed and lightened vertex data and texture data, and output a graphic rendering result after vertex assembly through a vertex assembler, rasterization through a rasterizer and color buffer mixing and Tile buffering.
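The three-stage data flow of the graphics rendering module can be sketched in software as follows. This is a heavily simplified illustration under stated assumptions: each processing result is assumed to be a dictionary holding one vertex (position and normal) and one texture color; the transform is a uniform scale; lighting is a single diffuse term against a fixed light direction; and fusion simply modulates the texture color by the light intensity. Vertex assembly, rasterization and buffering are omitted.

```python
def multiplex(results):
    """Split processing results into vertex data and texture data."""
    vertices = [result["vertex"] for result in results]
    textures = [result["texture"] for result in results]
    return vertices, textures


def vertex_engine(vertices, scale=2.0, light=(0.0, 0.0, 1.0)):
    """Transform and light each (position, normal) pair.

    The uniform scale and the diffuse lighting model are assumptions
    chosen to keep the example short.
    """
    lit = []
    for position, normal in vertices:
        transformed = tuple(scale * component for component in position)
        intensity = max(0.0, sum(n * l for n, l in zip(normal, light)))
        lit.append((transformed, intensity))
    return lit


def fuse(lit_vertices, textures):
    """Combine lit vertices with their texture colors."""
    return [(position, tuple(intensity * channel for channel in color))
            for (position, intensity), color in zip(lit_vertices, textures)]


def render(results):
    """Run the multiplexer, vertex engine and fusion stages in sequence."""
    vertices, textures = multiplex(results)
    return fuse(vertex_engine(vertices), textures)
```

A vertex whose normal faces the light keeps its full texture color, while one facing away is rendered black, which shows how the vertex result and the texture data are combined only in the final stage.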

As shown in fig. 1, the graphics rendering result output by the graphics rendering module may be stored in the data storage module. The display module may obtain the graphics rendering result from the data storage module and display it.

Application scenario 1: single task to be processed

Task to be processed: virtual game acceleration.

The processing procedure of the virtual reality system is as follows. First, the communication sensing module receives man-machine interaction data and stores it in the data storage module. After decoupling, the instructions are mapped to the functional modules. In the RISC-V processor, the virtual game application task is divided at fine granularity into a plurality of computing subtasks, including an AI computing subtask, a CNN computing subtask, and the like; the custom parallel computing instruction set and graphics rendering instruction set are used for acceleration, and the computing demands are scheduled to the computing acceleration module. Then, the processing results of the computing subtasks enter the graphics rendering module to obtain a graphics rendering result. Finally, the graphics rendering result is displayed on the display module.

Application scenario 2: multiple tasks to be processed

Task to be processed: virtual game acceleration and virtual classroom acceleration.

The processing procedure of the virtual reality system is as follows. First, the communication sensing module receives man-machine interaction data and stores it in the data storage module. After decoupling, the instructions are mapped to the functional modules. At the system software layer, priority is determined based on the data size of each task, the arrival order of the tasks, the delay tolerance of each task, and the time at which the task is sent to the processor. The RISC-V processor determines that a plurality of tasks to be processed currently exist and performs task scheduling based on the priority and the number of application tasks. It divides the virtual game application task at fine granularity into a plurality of computing subtasks, including AI computing subtasks, CNN computing subtasks, and the like, and divides the virtual classroom application task at fine granularity into vector computing subtasks, AI computing subtasks, CNN computing subtasks, and the like. At the same time, the custom asynchronous computing instruction set, parallel computing instruction set and graphics rendering instruction set are used for acceleration, and the computing demands are scheduled to the computing acceleration module. Then, the processing results of the computing subtasks enter the graphics rendering module to obtain a graphics rendering result. Finally, the graphics rendering result is displayed on the display module.

According to the virtual reality system, firstly, the task is divided into a plurality of computing subtasks in a finer granularity, and the computing subtasks are processed in parallel, so that the computing efficiency of the computing subtasks is improved; meanwhile, the computing subtasks are processed by adopting a computing accelerator matched with the type of the computing subtasks, so that the computing efficiency of the subtasks is further improved while the requirement on the computing capacity of a processor is reduced; then, through splitting the vertex data, the graphics rendering efficiency is improved. Therefore, the running efficiency of the virtual reality application task is effectively improved.

A processor designed on the ARM architecture has low flexibility and a complex instruction set, is difficult to extend modularly and to use for high-performance computing, and carries a high ARM instruction set licensing cost, which increases the design cost. In the embodiment of the application, the RISC-V processor is designed based on the open source RISC-V instruction set architecture, which greatly reduces the instruction set licensing cost, and a parallel computing instruction set extension, an asynchronous computing instruction set extension and a graphics rendering instruction set extension are customized, thereby effectively improving task execution efficiency.

Furthermore, in the embodiment of the application, optimization and framework updates are carried out at both the software level and the hardware level. At the software level, a priority judging module is added which takes changes in the VR/AR application tasks into account and determines priority based on the data size of each task, the arrival order of the tasks, the delay tolerance of each task, and the time at which the task is sent to the processor, so as to improve the computing efficiency of the tasks. At the hardware level, the RISC-V processor and the dedicated accelerators are designed and a graphics rendering module is added, which effectively improves running speed and computing efficiency.

A VR/AR device configured with multiple GPU chips to perform dedicated image, audio, vector and other computations has a high cost of use. In the embodiment of the application, dedicated computing accelerators are designed to process these dedicated computing demands efficiently. They specifically comprise a vector computing acceleration module, an AI computing acceleration module and a CNN computing acceleration module, so that they can better adapt to the computing demands of VR/AR application tasks and reduce energy consumption overhead. Meanwhile, a graphics rendering module is designed, which uses a programmable vertex engine and a fixed pipeline slicing engine to enhance the graphics imaging effect.

In one embodiment, fig. 6 is a flowchart of a processing method of a virtual reality application task according to an embodiment of the present application. The method is described by taking its application to the virtual reality system in fig. 1 as an example, and includes the following steps:

Step S601, when a computing instruction of a virtual reality application is read, dividing a task to be processed of the virtual reality application into different types of computing subtasks.

After the user operates the virtual reality device through actions or audio, the virtual reality system can acquire man-machine interaction data. These human-machine interaction data may cause the virtual reality application to generate one or more tasks to be processed. The virtual reality system needs to process these tasks to be processed quickly to improve the user experience. The virtual reality system can sequentially read the instructions from the data storage module for processing. When the read instruction is a calculation instruction, the instruction indicates that calculation is required. The virtual reality system can divide the task to be processed of the virtual reality application into different types of computing subtasks to perform finer-granularity computation, so that the computation speed is increased.

The type of a computing subtask is determined according to the computing requirement of the computing subtask. In one example, the types of computing subtasks include a vector computing type, an artificial intelligence computing type, and a convolutional neural network computing type. Different types of computing subtasks have different data characteristics and different computing demands, so different computing accelerators need to be used to process them.

It is contemplated that the virtual reality application may have a plurality of tasks to be processed. When multiple tasks are subdivided into computing subtasks, lag may occur in the computing accelerators because different application tasks take different amounts of time to process. To improve computing efficiency, embodiments of the present application introduce multi-task asynchronous computation with priority.

In one possible implementation, the method may further include: when the virtual reality application has a plurality of tasks to be processed, scheduling the computing subtasks of the plurality of tasks to be processed according to the number of tasks to be processed of the virtual reality application and the priority of each task to be processed.

Optionally, the priority of a task to be processed may be determined according to one or more of the data size of the task, the arrival order of the task, the delay tolerance of the task, and the time at which the task is sent to the processor. Of course, the priority of the task to be processed may also be determined in other manners, which is not limited in the embodiments of the present application.
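The embodiment names the four inputs to the priority decision but does not give a formula for combining them. The following minimal sketch shows one possible combination, a lexicographic ordering in which lower delay tolerance comes first and the remaining factors break ties. The `PendingTask` fields, their units and the ordering of the factors are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class PendingTask:
    name: str
    data_size: int             # assumed unit: bytes
    arrival_order: int         # 0 for the first task to arrive
    delay_tolerance_ms: float  # how long the task can wait
    sent_at_ms: float          # time the task was sent to the processor

def priority_key(task):
    """Assumed rule: the least delay-tolerant task first, then the earliest sent,
    then the earliest arrival, then the smallest data size."""
    return (task.delay_tolerance_ms, task.sent_at_ms, task.arrival_order, task.data_size)

def order_by_priority(tasks):
    """Return the pending tasks from highest to lowest priority."""
    return sorted(tasks, key=priority_key)
```

Under this assumed rule, a virtual game task that tolerates 20 ms of delay would be scheduled ahead of a virtual classroom task that tolerates 100 ms, even if the classroom task arrived first.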

In one possible implementation, the scheduling of the computing sub-tasks of the plurality of pending tasks is implemented based on an asynchronous computing instruction set. The asynchronous calculation instruction set is an extended instruction set of RISC-V, and the specific format may refer to fig. 2, which is not described herein.
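The idea of priority-based asynchronous scheduling can be illustrated in software with Python's `asyncio.PriorityQueue`. This is only an analogy and does not model the RISC-V asynchronous computing instruction set itself; the worker function, the `(priority, name)` subtask tuples and the single-worker default are assumptions made for illustration.

```python
import asyncio

async def accelerator_worker(queue, done):
    """Hypothetical worker: take the highest-priority subtask and 'process' it."""
    while True:
        priority, seq, name = await queue.get()
        await asyncio.sleep(0)  # stand-in for accelerator latency
        done.append(name)
        queue.task_done()

async def schedule(subtasks, workers=1):
    """Process (priority, name) subtasks; a lower number means a higher priority."""
    queue = asyncio.PriorityQueue()
    for seq, (priority, name) in enumerate(subtasks):
        # seq keeps the arrival order among subtasks of equal priority.
        queue.put_nowait((priority, seq, name))
    done = []
    tasks = [asyncio.create_task(accelerator_worker(queue, done)) for _ in range(workers)]
    await queue.join()  # wait until every subtask has been processed
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return done
```

With a single worker, `asyncio.run(schedule([(2, "classroom.vector"), (1, "game.ai"), (1, "game.cnn")]))` processes both game subtasks before the classroom subtask.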

Step S602, performing parallel processing on each computing subtask by using a computing accelerator matched with the type of each computing subtask, to obtain a processing result of each computing subtask.

In one possible implementation, the parallel processing is implemented based on a parallel computing instruction set. The parallel computing instruction set is an extended instruction set of RISC-V, and specifically, reference may be made to fig. 3, which is not described herein.

Step S603, dividing the processing result of each computing subtask into vertex data and texture data corresponding to the vertex data, performing operation processing on the vertex data, and performing fusion processing on the operation processing result of the vertex data and the texture data corresponding to the vertex data to obtain a graphics rendering result of the task to be processed.

In one possible implementation, the vertex data operations are implemented based on a graphics rendering instruction set. The graphics rendering instruction set is an extended instruction set of RISC-V, and may refer to fig. 4, which is not described herein.

In one possible implementation, step S603 may include: inputting the processing results of the computing subtasks into a multiplexer, and outputting vertex data and texture data of each computing subtask; inputting the vertex data of each computing subtask into a programmable vertex engine for operation processing to obtain transformed and lit vertex data; and inputting the transformed and lit vertex data and the texture data into a fixed pipeline slicing engine for fusion processing to obtain a graphics rendering result of the task to be processed. For the programmable vertex engine and the fixed pipeline slicing engine, reference may be made to fig. 5, which is not described again herein.
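The three stages of step S603 can be sketched as a small software pipeline. The data layout (each processing result carrying a `vertices` list and a matching `textures` list), the uniform scaling used in place of real transform and lighting, and all function names are assumptions made for illustration, not the hardware design of the embodiment.

```python
def multiplexer(results):
    """Split the processing results into a vertex stream and a texture stream."""
    vertices, textures = [], []
    for result in results:
        vertices.extend(result["vertices"])
        textures.extend(result["textures"])
    return vertices, textures

def vertex_engine(vertices, scale):
    """Stand-in for transform and lighting: a uniform 2-D scale."""
    return [(x * scale, y * scale) for x, y in vertices]

def fuse(vertices, textures):
    """Stand-in for the fusion stage: pair each processed vertex with its texture."""
    return [{"pos": v, "color": t} for v, t in zip(vertices, textures)]

def render(results, scale=2):
    """Multiplex, process the vertex data, then fuse it with the texture data."""
    vertices, textures = multiplexer(results)
    return fuse(vertex_engine(vertices, scale), textures)
```

Only the vertex stream passes through the vertex engine in this sketch; the texture stream bypasses it and rejoins the vertices at the fusion stage, which mirrors the split described in the step.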

According to the processing method of the virtual reality application task, firstly, the task is divided into the plurality of computing subtasks in a finer granularity, and the computing subtasks are processed in parallel, so that the computing efficiency of the computing subtasks is improved; meanwhile, the computing subtasks are processed by adopting a computing accelerator matched with the type of the computing subtasks, so that the computing efficiency of the subtasks is further improved while the requirement on the computing capacity of a processor is reduced; then, through splitting the vertex data, the graphics rendering efficiency is improved. Therefore, the running efficiency of the virtual reality application task is effectively improved.

In one possible implementation, the method may further include: reading instructions from the data storage module; and decoupling the instruction to obtain a memory access instruction and the calculation instruction.
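The decoupling step can be sketched as a simple classification of instructions. The sketch assumes that instructions are available as RISC-V assembly strings and that memory access instructions are the standard integer loads and stores; the embodiment does not state how decoupling is carried out, so both assumptions and the `decouple` helper are illustrative only.

```python
# Standard RISC-V integer load and store mnemonics (an illustrative subset).
MEMORY_OPS = {"lb", "lh", "lw", "ld", "sb", "sh", "sw", "sd"}

def decouple(instructions):
    """Separate instructions into memory access and computing instructions."""
    memory, compute = [], []
    for instruction in instructions:
        mnemonic = instruction.split()[0]
        if mnemonic in MEMORY_OPS:
            memory.append(instruction)
        else:
            compute.append(instruction)
    return memory, compute
```

For example, `decouple(["lw x1, 0(x2)", "add x3, x1, x1", "sw x3, 4(x2)"])` places the load and the store in the memory access list and the `add` in the computing list.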

In one possible implementation, the method may further include: and acquiring man-machine interaction data and storing the man-machine interaction data into the data storage module, wherein the man-machine interaction data corresponds to one or more tasks to be processed.

In one embodiment, fig. 7 is a flowchart of a method for processing a virtual reality application task according to an embodiment of the present application. The method is described by taking its application to the virtual reality system in fig. 1 as an example, and includes the following steps:

In step S701, the communication sensing module receives the man-machine interaction data.

In step S702, the communication sensing module stores the man-machine interaction data into the data storage module.

In step S703, the processor reads the instruction of the virtual reality application from the data storage module, and decouples the read instruction to obtain the memory access instruction and the calculation instruction.

Step S704, when the processor processes the computing instruction, determining whether there are a plurality of tasks to be processed; if yes, go to step S705; otherwise, step S706 is performed.

In step S705, the processor schedules the computing subtasks of the plurality of tasks to be processed according to the number of tasks to be processed of the virtual reality application and the priority of each task to be processed.

In step S706, the processor divides the task to be processed of the virtual reality application into different types of computing subtasks.

In step S707, the processor processes each computing subtask in parallel by using a computing accelerator matched with the type of each computing subtask, so as to obtain a processing result of each computing subtask.

In step S708, the processor inputs the processing results of the respective computation sub-tasks to the graphics rendering module.

In step S709, the graphics rendering module divides the processing result of each computing subtask into vertex data and texture data corresponding to the vertex data, performs operation processing on the vertex data, and performs fusion processing on the operation processing result of the vertex data and the texture data corresponding to the vertex data to obtain the graphics rendering result of the task to be processed.

In step S710, the graphics rendering module stores the graphics rendering result in the data storage module.

In step S711, the display module acquires the graphic rendering result from the data storage module and displays the graphic rendering result.
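The control flow of steps S704 to S711 can be summarized in a short sketch. Tasks are represented as dictionaries with an assumed `priority` field (a lower value means a higher priority), the accelerators are replaced by Python built-ins, the subtasks run sequentially for brevity, and rendering is reduced to formatting a string. All of these are simplifications made for illustration.

```python
# Hypothetical stand-ins for the vector, AI and CNN accelerators.
ACCEL = {"vector": sum, "ai": max, "cnn": min}

def process_frame(pending_tasks, storage):
    """Walk through S704 to S711 for the tasks currently pending."""
    # S704/S705: schedule by priority only when several tasks are pending.
    if len(pending_tasks) > 1:
        pending_tasks = sorted(pending_tasks, key=lambda task: task["priority"])
    results = []
    for task in pending_tasks:
        # S706: each task is already divided into typed (kind, payload) subtasks.
        for kind, payload in task["subtasks"]:
            # S707: run the subtask on the accelerator matched to its type.
            results.append((task["name"], kind, ACCEL[kind](payload)))
    # S708 to S710: render the results and store them in the data storage.
    storage["frame"] = [f"{name}:{kind}={value}" for name, kind, value in results]
    # S711: the display reads the rendering result back from the data storage.
    return storage["frame"]
```

When only one task is pending, the scheduling branch is skipped and the task goes directly to subtask processing, matching the "otherwise" branch of step S704.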

It should be understood that, although the steps in the flowcharts of fig. 6 and 7 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of the steps is not strictly limited to this order, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 6 and 7 may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, fig. 8 is a block diagram of a processing device for a virtual reality application task according to an embodiment of the present application. As shown in fig. 8, the apparatus 800 includes: a task partitioning module 801, a parallel processing module 802, and a fusion module 803, wherein:

In one possible implementation, the apparatus further includes:

In one possible implementation, the fusion module is further configured to:

In one possible implementation, the apparatus further includes:

For specific limitations of the processing device of the virtual reality application task, reference may be made to the above limitations of the processing method of the virtual reality application task, which are not described again herein. The above modules in the processing device for virtual reality application tasks may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in software form in a memory in the computer device, so that the processor can call and execute the operations corresponding to the above modules.

In one embodiment, a virtual reality device is provided, see fig. 9. Fig. 9 is a schematic structural diagram of a virtual reality device according to an embodiment of the present application. The virtual reality device 700 shown in fig. 9 includes: at least one processor 701, a memory 702, at least one network interface 704, and a user interface 703. The various components in the virtual reality device 700 are coupled together by a bus system 705. It is appreciated that the bus system 705 is used to enable connection and communication between these components. The bus system 705 includes a power bus, a control bus, and a status signal bus in addition to the data bus. For clarity of illustration, however, the various buses are all labeled as bus system 705 in fig. 9. In addition, in embodiments of the present application, a transceiver 706 is also included, which may be a plurality of elements, i.e., a transmitter and a receiver, providing a means for communicating with various other apparatus over a transmission medium.

The user interface 703 may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, a trackball, a touch pad, or a touch screen, etc.).

It is to be appreciated that the memory 702 in embodiments of the invention may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory, among others. The volatile memory may be a random access memory (Random Access Memory, RAM) that acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DRRAM). The memory 702 of the systems and methods described in embodiments of the present invention is intended to comprise, without being limited to, these and any other suitable types of memory.

In some implementations, the memory 702 stores the following elements, executable modules or data structures, or a subset thereof, or an extended set thereof: an operating system 7021 and application programs 7022.

The operating system 7021 contains various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application programs 7022 include various application programs such as a media player (MediaPlayer), a Browser (Browser), and the like for realizing various application services. A program for implementing the method of the embodiment of the present invention may be contained in the application program 7022.

In the embodiment of the present invention, by calling a program or an instruction stored in the memory 702, specifically a program or an instruction stored in the application program 7022, the processor 701 is configured to: divide a task to be processed of a virtual reality application into different types of computing subtasks when a computing instruction of the virtual reality application is read; process each computing subtask in parallel using a computing accelerator matched with the type of each computing subtask, to obtain the processing result of each computing subtask; and divide the processing result of each computing subtask into vertex data and texture data corresponding to the vertex data, perform operation processing on the vertex data, and perform fusion processing on the operation processing result of the vertex data and the texture data corresponding to the vertex data, to obtain the graphics rendering result of the task to be processed.

Some or all of the methods disclosed in the embodiments of the present invention may also be applied to the processor 701, or implemented by the processor 701 in conjunction with other elements (e.g., a transceiver). The processor 701 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 701 or by instructions in the form of software. The processor 701 may be a general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 701 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 702, and the processor 701 reads information in the memory 702 and performs the steps of the method in combination with its hardware.

It is to be understood that the embodiments of the application described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processor, DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field programmable gate arrays (Field-Programmable Gate Array, FPGA), general purpose processors, controllers, microcontrollers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.

For a software implementation, the techniques described in embodiments of the present application may be implemented by modules (e.g., procedures, functions, and so on) that perform the functions described in embodiments of the present application. The software codes may be stored in memory and executed by the processor 701. The memory may be implemented within the processor 701 or external to the processor 701.

In one embodiment, the processor is further configured to:

In one embodiment, the type of the computing subtask is determined based on the computing requirements of the computing subtask.

In one embodiment, the types of computing subtasks include vector computing types, artificial intelligence computing types, and convolutional neural network computing types.

In one embodiment, the scheduling of the computing sub-tasks of the plurality of tasks to be processed is based on an asynchronous computing instruction set, the parallel processing is based on a parallel computing instruction set, the vertex data operations are based on a graphics rendering instruction set, and the asynchronous computing instruction set, the parallel computing instruction set, and the graphics rendering instruction set are RISC-V extended instruction sets.

In one embodiment, the processor is further configured to:

reading instructions from the data storage module;

In one embodiment, the processor is further configured to:

In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:

In one embodiment, the computer program when executed by the processor further performs the steps of:

Reading instructions from the data storage module;

The embodiments of the present application also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of:

Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, or the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like.

The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, the combination should be considered to be within the scope of this description.

The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims

1. A method for processing a virtual reality application task, the method comprising:

2. The method according to claim 1, wherein the method further comprises:

3. The method according to claim 2, wherein the method further comprises:

4. The method of claim 1, wherein the type of computing sub-task is determined based on a computing requirement of the computing sub-task.

5. The method of claim 4, wherein the types of computing subtasks include a vector computation type, an artificial intelligence computation type, and a convolutional neural network computation type.

6. The method of claim 2, wherein the scheduling of the computing sub-tasks of the plurality of tasks to be processed is based on an asynchronous computing instruction set, the parallel processing is based on a parallel computing instruction set, the vertex data operations are based on a graphics rendering instruction set, and the asynchronous computing instruction set, the parallel computing instruction set, and the graphics rendering instruction set are an extended instruction set of RISC-V.

7. The method according to claim 1, wherein dividing the processing result of each computing subtask into vertex data and texture data corresponding to the vertex data, performing operation processing on the vertex data, and performing fusion processing on the operation processing result of the vertex data and the texture data corresponding to the vertex data to obtain the graphics rendering result of the task to be processed, includes:

8. The method according to any one of claims 1 to 7, further comprising:

reading instructions from the data storage module;

9. The method of claim 8, wherein the method further comprises:

10. A processing apparatus for virtual reality application tasks, the apparatus comprising:

11. A virtual reality system, characterized by comprising a data storage module, a processor, a computing accelerator, a graphics rendering module and a display module;

The display module is used for displaying the graphic rendering result.

12. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 9.

13. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 9.