CN114330689A - Data processing method and device, electronic equipment and storage medium

Info

Publication number: CN114330689A
Application number: CN202111644074.5A
Authority: CN (China)
Prior art keywords: target, neural network, data, shader, pipeline
Other languages: Chinese (zh)
Inventor: 范文捷
Current Assignee: Beijing Zitiao Network Technology Co Ltd
Original Assignee: Beijing Zitiao Network Technology Co Ltd
Application filed by Beijing Zitiao Network Technology Co Ltd
Priority to CN202111644074.5A
Publication of CN114330689A
Legal status: Pending

Landscapes

  • Image Generation (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of the disclosure provide a data processing method, a data processing device, an electronic device and a storage medium, wherein the method comprises the following steps: acquiring neural network parameters corresponding to a target neural network model; generating shaders to be used corresponding to each network level according to the neural network parameters, wherein the target neural network model comprises a plurality of network levels; and determining a computing pipeline corresponding to the shaders to be used according to target device parameters of the device to which the target neural network model belongs, and, when data to be processed is received, calling the computing pipeline according to the target device parameters to process the data to be processed and obtain a target processing result. According to the technical scheme of the embodiments of the disclosure, the limitation imposed by the GPU communication bandwidth on the model calculation process is reduced by greatly reducing the interaction between the CPU and the GPU, and the performance of the neural network model is improved.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The embodiments of the present disclosure relate to the field of data processing technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of artificial intelligence technology, neural network models are widely applied in various fields, and can process various kinds of data by relying on characteristics such as large-scale parallel processing and distributed information storage.
However, when a computer performs calculations based on a neural network model, the large amount of communication between the Central Processing Unit (CPU) and the Graphics Processing Unit (GPU) reduces the calculation efficiency of the neural network and creates a performance bottleneck for the model.
Disclosure of Invention
The present disclosure provides a data processing method, an apparatus, an electronic device, and a storage medium, which greatly reduce the interaction between a CPU and a GPU, thereby reducing the limitation imposed by the GPU communication bandwidth on the model calculation process and improving the performance of a neural network model.
In a first aspect, an embodiment of the present disclosure provides a data processing method, applied to a central processing unit, including:
acquiring neural network parameters corresponding to a target neural network model;
generating shaders to be used corresponding to each network level according to the neural network parameters; wherein the target neural network model comprises a plurality of network hierarchies;
and determining a computing pipeline corresponding to the shader to be used according to target equipment parameters of the equipment to which the target neural network model belongs, and calling the computing pipeline to process the data to be processed according to the target equipment parameters when the data to be processed is received to obtain a target processing result.
In a second aspect, an embodiment of the present disclosure further provides a data processing method applied in a graphics processor, including:
loading a predetermined calculation pipeline and neural network parameters when data to be processed is received; wherein the calculation pipeline is determined by a central processing unit based on the neural network parameters of the target neural network model and the target device parameters of the device to which the target neural network model belongs;
determining, according to the target device parameters, a target processing mode in which each shader to be used processes the data to be processed; wherein the shaders to be used are obtained by processing the neural network parameters on a central processing unit;
and processing the data to be processed based on the target processing mode to obtain a target processing result.
In a third aspect, an embodiment of the present disclosure further provides a data processing apparatus, where the apparatus is configured in a central processing unit, and the apparatus includes:
the network parameter determining module is used for acquiring neural network parameters corresponding to a target neural network model;
the shader determining module is used for generating shaders to be used corresponding to each network level according to the neural network parameters; wherein the target neural network model comprises a plurality of network hierarchies;
and the pipeline determining module is used for determining a computing pipeline corresponding to the shader to be used according to the target equipment parameter of the equipment to which the target neural network model belongs, and calling the computing pipeline to process the data to be processed according to the target equipment parameter when the data to be processed is received to obtain a target processing result.
In a fourth aspect, an embodiment of the present disclosure further provides a data processing apparatus, where the data processing apparatus is configured in a graphics processor, and the data processing apparatus includes:
the network parameter loading module is used for loading a predetermined calculation pipeline and neural network parameters when data to be processed is received; wherein the calculation pipeline is determined by a central processing unit based on the neural network parameters of the target neural network model and the target device parameters of the device to which the target neural network model belongs;
the processing mode determining module is used for determining, according to the target device parameters, a target processing mode in which each shader to be used processes the data to be processed; wherein the shaders to be used are obtained by processing the neural network parameters on a central processing unit;
and the processing result determining module is used for processing the data to be processed based on the target processing mode to obtain a target processing result.
In a fifth aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the data processing method according to any one of the embodiments of the present disclosure.
In a sixth aspect, the disclosed embodiments also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are used to perform the data processing method according to any one of the disclosed embodiments.
According to the technical scheme of the embodiments of the present disclosure, neural network parameters corresponding to a target neural network model are obtained, and shaders to be used corresponding to each network level are generated according to the neural network parameters, that is, corresponding shaders to be used are generated for the plurality of network levels in the target neural network model. Furthermore, a computing pipeline corresponding to the shaders to be used is determined according to the target device parameters of the device to which the target neural network model belongs, and when data to be processed is received, the computing pipeline is called according to the target device parameters to process the data to be processed and obtain a target processing result. Because the CPU generates a shader corresponding to each network level of the neural network model, and coarse-grained calculation is performed based on these shaders after the data to be processed is received, the interaction between the CPU and the GPU is greatly reduced, the limitation imposed by the GPU communication bandwidth on the model calculation process is reduced, and the performance of the neural network model is improved.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic flow chart of a data processing method according to a first embodiment of the disclosure;
fig. 2 is a schematic flow chart of a data processing method according to a second embodiment of the disclosure;
fig. 3 is a schematic flow chart of a data processing method according to a third embodiment of the present disclosure;
fig. 4 is a schematic flow chart of a data processing method according to a fourth embodiment of the disclosure;
fig. 5 is a schematic structural diagram of a data processing apparatus according to a fifth embodiment of the disclosure;
fig. 6 is a schematic structural diagram of a data processing apparatus according to a sixth embodiment of the disclosure;
fig. 7 is a schematic structural diagram of an electronic device according to a seventh embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units. It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Example one
Fig. 1 is a schematic flow chart of a data processing method according to a first embodiment of the present disclosure, where the embodiment of the present disclosure is applicable to a central processing unit generating shaders with coarse granularity according to neural network parameters, so that when receiving data to be processed, the method may be executed by a data processing apparatus, where the apparatus may be implemented in a form of software and/or hardware, or alternatively, implemented by an electronic device, where the electronic device may be a mobile terminal, a PC terminal, a server, or the like, based on a situation that the shaders to be used corresponding to network hierarchies are used to perform fast processing on the data.
As shown in fig. 1, the method includes:
and S110, obtaining the neural network parameters corresponding to the target neural network model.
It should be noted that the solution of this embodiment may be executed based on a target device, where the target device may be any terminal device equipped with a CPU and a GPU, and after receiving data to be processed, the target device may process the received data by using the processing capabilities of the CPU and the GPU.
A Neural Network (NN) model is a mathematical model that simulates a biological neural network. Specifically, each neural network is a complex network system formed by widely interconnecting a large number of simple processing units (which may be referred to as neurons); it reflects many basic features of human brain function and is a highly complex nonlinear dynamical learning system. A neural network model therefore has large-scale parallelism, distributed storage and processing, and self-organization, self-adaptation and self-learning capabilities, and is suitable for handling imprecise and fuzzy information processing problems that require many factors and conditions to be considered simultaneously.
In this embodiment, the target neural network model may be one or more neural network models associated with a specific service. For example, when a graphics-related service needs to be processed, a corresponding Convolutional Neural Network (CNN) model may be determined, and when a service involving data over multiple time steps needs to be processed, a corresponding Recurrent Neural Network (RNN) model may be determined.
In the present embodiment, in order for the computer device to perform the calculations corresponding to the target neural network model, it is also necessary to acquire the neural network parameters corresponding to the target neural network model. The neural network parameters are multidimensional algorithm information related to the target neural network model, and may include at least one of an identifier of the target neural network model, the network structure, the number of neural network layers, the tensors of the neural network, the execution order of the layers, and the inputs and outputs of each layer. For example, when the target neural network model is a CNN model, the determined neural network parameters may be the data types of the inputs and outputs of the CNN, and the execution order of the input layer, the convolutional layer, the activation function, the pooling layer, and the fully-connected layer. It is understood that the computer is at least capable of building the target neural network model in memory based on the acquired neural network parameters.
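For illustration only, the following C++ sketch shows one possible in-memory layout for such neural network parameters; all type and field names are hypothetical and are not taken from the present disclosure.

    #include <cstdint>
    #include <string>
    #include <vector>

    // Hypothetical description of one network level (layer) of the target model.
    struct LayerParams {
        std::string type;                 // e.g. "input", "conv", "pool", "fc"
        std::vector<int64_t> inputShape;  // tensor shape consumed by this layer
        std::vector<int64_t> outputShape; // tensor shape produced by this layer
        std::vector<float> weights;       // flattened weight tensor, may be empty
    };

    // Hypothetical container for the neural network parameters described above:
    // model identifier, network structure, number of layers, tensors, and the
    // execution order of the layers with their inputs and outputs.
    struct NeuralNetworkParams {
        std::string modelId;              // identifier of the target neural network model
        std::vector<LayerParams> layers;  // one entry per network level, in execution order
    };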
It can be understood that the structure of the target neural network model can be set according to actual conditions, specific neural network parameters of the target neural network model correspond to network hierarchies of the model, and the CPU can construct corresponding network hierarchies only by acquiring the neural network parameters, so as to obtain the target neural network model.
It should be noted that, in the actual application process, the neural network parameters corresponding to each target service may be obtained in advance and stored in the corresponding server, and the neural network parameters may be called when model-related calculations need to be executed, or the neural network parameters may be obtained in real time in a dynamic manner when heterogeneous calculations related to the target neural network model need to be executed. It will be understood by those skilled in the art that the specific manner of obtaining the neural network parameters may be selected according to actual situations, and the embodiments of the present disclosure are not limited specifically herein.
And S120, generating shaders to be used corresponding to each network level according to the neural network parameters.
In the present embodiment, the heterogeneous computation of the neural network is a special form of parallel and distributed computation. It can be understood that the performance characteristics it requires are actually very similar to those of graphics-related algorithms, and usually involve a large number of buffers of parameters, activation values and gradient values, where each value is updated in every training iteration or data processing pass. These buffers are too large to fit in the cache of a conventional desktop computer, so a GPU with a very high memory bandwidth can be used to perform the computations related to the neural network.
Therefore, in this embodiment, after obtaining the neural network parameters, in order to process the data related to the service by using the GPU when receiving the data related to the service, first, the CPU needs to generate a to-be-used shader that can run in the GPU and corresponds to the target neural network model. The shader to be used can be understood as a compute shader unrelated to conventional graphics rendering, and when the generated compute shader is loaded by the GPU, computations related to the target neural network model can be executed.
Meanwhile, since the target neural network model determined by the CPU includes a plurality of network levels, when generating the shaders to be used, it is generally necessary to generate a corresponding shader to be used for each network level. Specifically, a shader generation module needs to be called to process the neural network parameters, so as to obtain the shader to be used for each network level in the target neural network model. The shader generation module is a processing module deployed on the CPU side that is at least capable of generating the shaders to be used; because the neural network parameters correspond to each network level of the target neural network model, each shader to be used that the CPU generates through the shader generation module also corresponds to a network level of the model.
In the embodiment, the CPU constructs a coarse-grained shader to be used for each network level of the target neural network model, instead of constructing a fine-grained shader for a specific operator in each network level, so that frequent interaction between the CPU and the GPU is avoided in the neural network calculation process, and the limitation of the GPU communication bandwidth on the performance of the neural network is reduced.
Illustratively, when the target neural network model is determined to be the CNN model and the neural network parameters of the CNN model are obtained, the CPU may determine the input layer, the convolutional layer, the pooling layer, and the fully-connected layer of the CNN based on the obtained neural network parameters, and may generate shaders to be used corresponding to each network level.
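For illustration, a minimal C++ sketch of generating one coarse-grained compute shader per network level is given below; the structure names and the GLSL-style skeleton emitted here are assumptions made for this example, not the actual output of the shader generation module.

    #include <string>
    #include <vector>

    struct LayerParams { std::string type; };                       // simplified from the earlier sketch
    struct NeuralNetworkParams { std::vector<LayerParams> layers; };

    // Hypothetical shader generation step: emits one coarse-grained compute
    // shader per network level instead of one shader per operator.
    std::string generateShaderSource(const LayerParams& layer, size_t index) {
        std::string src;
        src += "#version 450\n";
        src += "layout(local_size_x = 64) in;\n";
        src += "// compute shader for network level " + std::to_string(index) +
               " (" + layer.type + ")\n";
        src += "void main() { /* per-level computation generated from the parameters */ }\n";
        return src;
    }

    std::vector<std::string> generateShadersToBeUsed(const NeuralNetworkParams& params) {
        std::vector<std::string> shaders;
        for (size_t i = 0; i < params.layers.size(); ++i)
            shaders.push_back(generateShaderSource(params.layers[i], i));
        return shaders;
    }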
S130, determining a calculation pipeline corresponding to the shader to be used according to target equipment parameters of equipment to which the target neural network model belongs, and calling the calculation pipeline to process the data to be processed according to the target equipment parameters when the data to be processed is received to obtain a target processing result.
In this embodiment, since not all GPUs use the same instruction set, the shaders to be used corresponding to each network level can only be loaded and used by the GPU, based on its own architecture and instruction set, after the program has been converted into a binary file. Therefore, after the CPU generates the shaders to be used corresponding to the network levels of the target neural network model, the driving capability of the GPU needs to be invoked to generate the computing pipelines corresponding to the shaders to be used.
For a computer, a computing pipeline may be understood as the pipeline required by the GPU when running a program, and the process of determining the computing pipeline may be understood as the process in which the CPU invokes the driving capability of the GPU to compile, package, and store each shader to be used to a target cache based on a preset data structure. For example, after the json format is preset as the data structure of the computing pipeline, the CPU may call the driving capability of the GPU to compile, package, and cache each shader to be used. It can be understood that the json-format data finally stored in the target cache, that is, the key data of each shader to be used, may be loaded by the GPU from the target cache as the key data of the computing pipeline when specific service data is subsequently processed, so that the corresponding heterogeneous computations are executed according to the execution order of the network levels of the target neural network model.
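A minimal sketch of this compile-package-store step is shown below; compileForDevice and the file-backed cache are stand-ins chosen for this illustration, and the JSON layout is an assumed example rather than the actual preset data structure.

    #include <fstream>
    #include <string>
    #include <vector>

    // Hypothetical stand-in for the GPU driver's compile step; a real backend
    // would call the graphics API to produce a binary for the device.
    std::vector<unsigned char> compileForDevice(const std::string& shaderSource) {
        return std::vector<unsigned char>(shaderSource.begin(), shaderSource.end());
    }

    // Package the compiled shaders into a JSON-style description and store it in
    // the target cache (modelled here as a file), so that the GPU side can later
    // load the pipeline data without regenerating the shaders.
    void storeComputePipeline(const std::vector<std::string>& shaderSources,
                              const std::string& cachePath) {
        std::ofstream cache(cachePath);
        cache << "{ \"pipelines\": [";
        for (size_t i = 0; i < shaderSources.size(); ++i) {
            std::vector<unsigned char> binary = compileForDevice(shaderSources[i]);
            cache << (i ? ", " : "") << "{ \"level\": " << i
                  << ", \"binarySize\": " << binary.size() << " }";
        }
        cache << "] }";
    }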
In this embodiment, each calculation pipeline may be determined according to a target device parameter of a device to which the target neural network model belongs. The target device may be a device that performs calculation related to the target neural network model based on the GPU, and correspondingly, the target device parameter is attribute information of the GPU carried by the target device.
In the actual application process, the target device parameters include an indirect buffering parameter (IndirectBuffer), which may be understood as information indicating whether the GPU supports the indirect buffering function. Specifically, when the GPU supports indirect buffering, the GPU can itself determine the execution order of the loaded shaders to be used when processing the service data; when the GPU does not support indirect buffering, it needs to determine, according to an instruction sent by the CPU, the next shader to be used after one shader to be used has finished executing.
It can be understood that when the GPU supports indirect caching, the overall heterogeneous computation process of the target neural network model can be completed without the CPU sending instructions, and when the GPU does not support indirect caching, multiple interactions with the CPU are also required in the process of performing the computation related to the target neural network model, and the shader to be used that needs to be executed next is determined based on each instruction sent by the CPU.
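For reference, indirect dispatch in common graphics APIs lets the GPU read its dispatch arguments from a buffer instead of receiving them from the CPU per call; the sketch below uses the three-component work-group layout of, for example, Vulkan's VkDispatchIndirectCommand, with illustrative names for everything else.

    #include <cstdint>
    #include <vector>

    // One indirect dispatch record: the work-group counts the GPU reads on its own
    // when it supports indirect buffering (same layout as Vulkan's
    // VkDispatchIndirectCommand: x, y, z group counts).
    struct DispatchArgs {
        uint32_t groupCountX;
        uint32_t groupCountY;
        uint32_t groupCountZ;
    };

    // With indirect buffering, the CPU fills one record per network level once;
    // the GPU then walks the buffer without further CPU instructions. Without it,
    // the CPU must issue a separate dispatch command for every level.
    std::vector<DispatchArgs> buildIndirectBuffer(const std::vector<uint32_t>& groupsPerLevel) {
        std::vector<DispatchArgs> buffer;
        for (uint32_t groups : groupsPerLevel)
            buffer.push_back({groups, 1u, 1u});
        return buffer;
    }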
Based on this, the manner in which the CPU determines each computing pipeline differs depending on the target device parameters. It can be understood that, according to whether the GPU supports indirect caching, the CPU has two corresponding ways to determine the computing pipeline corresponding to each shader to be used.
The first mode is that if the indirect buffering parameter of the device to which the target neural network model belongs is a first parameter, a scheduling shader corresponding to each shader to be used is generated; and compiling and processing the scheduling shader and the to-be-used shader to obtain the computing pipeline.
Specifically, when the indirect buffering parameter is the first parameter, it indicates that the GPU mounted on the target device supports indirect buffering, and therefore a scheduling shader corresponding to each shader to be used may be generated. The scheduling shader may be understood as program code running in the GPU that controls the execution order of the shaders to be used. For example, when the indirect buffering parameter is determined to be the first parameter, a corresponding scheduling shader may be generated for the input layer, the convolutional layer, the pooling layer and the fully-connected layer in the CNN model, and when related image data is subsequently processed, the GPU may determine the execution order of the layers in the CNN model based on the scheduling shader.
In this embodiment, after the scheduling shaders and the to-be-used shaders are generated, the generated shaders can be compiled based on the CPU, and the computing pipelines corresponding to the shaders are obtained. The computing pipelines include sub-computing pipelines corresponding to each shader to be used, and control pipelines corresponding to each sub-computing pipeline.
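For illustration, the following C++ sketch shows the first mode under hypothetical type names: a scheduling shader encoding the level order is generated alongside the per-level shaders, and both are packaged into one compute pipeline structure. It is a sketch of the described flow, not the disclosure's own implementation.

    #include <string>
    #include <vector>

    // Hypothetical compiled-pipeline handles; real code would hold API objects.
    struct SubComputePipeline { std::string levelShader; };
    struct ControlPipeline    { std::string schedulingShader; };
    struct ComputePipeline {
        std::vector<SubComputePipeline> subPipelines; // one per shader to be used
        ControlPipeline control;                      // drives the execution order on the GPU
    };

    // First mode (indirect buffering supported): generate a scheduling shader that
    // encodes the execution order of the levels, then compile it together with the
    // per-level shaders into one compute pipeline.
    ComputePipeline buildPipelineWithScheduling(const std::vector<std::string>& shadersToBeUsed) {
        ComputePipeline pipeline;
        std::string scheduling = "// scheduling shader: dispatch levels 0..n in order\n";
        for (size_t i = 0; i < shadersToBeUsed.size(); ++i) {
            pipeline.subPipelines.push_back({shadersToBeUsed[i]});
            scheduling += "// then level " + std::to_string(i) + "\n";
        }
        pipeline.control = {scheduling};
        return pipeline;
    }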
The second way is that if the indirect buffering parameter of the device to which the target neural network model belongs is a second parameter, the sub-computing pipelines corresponding to the shaders to be used are determined; based on each sub-computation pipeline, the computation pipeline is determined.
Specifically, when the indirect buffering parameter is the second parameter, it indicates that the GPU mounted on the target device does not support indirect buffering. Therefore, no scheduling shader corresponding to each shader to be used needs to be generated; instead, the sub-computation pipeline corresponding to each shader to be used is generated directly, and the determined sub-computation pipelines are then integrated to obtain the computation pipeline corresponding to the target neural network model. It can be understood that, when the indirect buffering parameter is the second parameter, the GPU cannot itself determine the execution order of the shaders to be used when processing the service data, so each shader to be used needs to be executed based on an instruction sent by the CPU.
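A corresponding sketch for the second mode, under the same hypothetical type names as above: no scheduling shader is produced and only the per-level sub-compute pipelines are returned.

    #include <string>
    #include <vector>

    struct SubComputePipeline { std::string levelShader; };

    // Second mode (indirect buffering not supported): each shader to be used is
    // compiled into its own sub-compute pipeline, and the CPU will later drive
    // the execution order level by level.
    std::vector<SubComputePipeline> buildSubPipelinesOnly(const std::vector<std::string>& shadersToBeUsed) {
        std::vector<SubComputePipeline> subPipelines;
        for (const std::string& shader : shadersToBeUsed)
            subPipelines.push_back({shader});
        return subPipelines;
    }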
In this embodiment, after determining the calculation pipeline, the CPU may cache the neural network parameters and the calculation pipeline of the target neural network model, so as to call the neural network parameters and the calculation pipeline to process the data to be processed when the data to be processed is received. The data to be processed may be data related to a service associated with the target neural network model, and at the same time, the data to be processed is also an input of the target neural network model.
For example, after the CPU determines the neural network parameters of the CNN model, the sub-computation pipelines corresponding to each network level, and the control pipeline corresponding to the scheduling shader, the data may be stored in the target cache. After receiving the service data (i.e., the data to be processed) associated with the CNN model, the GPU of the target device may load the sub-computation pipelines and the control pipeline corresponding to the target neural network model from the target cache to process the data to be processed. That is, under the scheduling of the control pipeline, the service data is first input to the input layer; after the shader to be used corresponding to the input layer finishes executing, the result output by that layer is sequentially passed, still under the scheduling of the control pipeline, to the convolutional layer, the pooling layer, and the fully-connected layer, and the shader to be used corresponding to each network layer is executed, so as to obtain the processing result corresponding to the data to be processed for the service.
According to the technical scheme of the embodiments of the present disclosure, neural network parameters corresponding to a target neural network model are obtained, and shaders to be used corresponding to each network level are generated according to the neural network parameters, that is, corresponding shaders to be used are generated for the plurality of network levels in the target neural network model. Furthermore, a computing pipeline corresponding to the shaders to be used is determined according to the target device parameters of the device to which the target neural network model belongs, and when data to be processed is received, the computing pipeline is called according to the target device parameters to process the data to be processed and obtain a target processing result. Because the CPU generates a shader corresponding to each network level of the neural network model, and coarse-grained calculation is performed based on these shaders after the data to be processed is received, the interaction between the CPU and the GPU is greatly reduced, the limitation imposed by the GPU communication bandwidth on the model calculation process is reduced, and the performance of the neural network model is improved.
Example two
As an alternative implementation of the foregoing embodiment, fig. 2 is a schematic flow chart of a data processing method provided in the second embodiment of the disclosure. To describe the technical solution of the present embodiment clearly, the application scenario in which a shader program is dynamically generated by a central processing unit and, when data to be processed is received, the target device processes that data is taken as an example; however, the solution is not limited to this scenario and may be applied to various scenarios in which data related to a target neural network model needs to be processed.
Referring to fig. 2, in the process of building the neural network by the CPU, first, model parameters of a target neural network model need to be obtained, and the parameters may be determined based on configuration parameters input by a caller, for example, after the caller needing to develop a certain service inputs relevant configuration parameters of the CNN model on a target page, a processing module at the CPU end may obtain the parameters and use the parameters as the neural network parameters.
With continued reference to fig. 2, after the CPU determines the target neural network model and the corresponding neural network parameters, a shader to be used for neural network computation may be generated for each level of the model; further, whether the GPU of the target device supports the indirect cache is determined, when it is determined that the GPU supports the indirect cache, the CPU needs to generate scheduling shaders of each level, and call a driver of the GPU to compile the scheduling shaders and the shaders to be used, so as to obtain corresponding computing pipelines, where it can be understood that the computing pipelines include sub-computing pipelines corresponding to the shaders to be used, and control pipelines corresponding to the scheduling shaders. And when the GPU is determined not to support the indirect cache, the CPU directly generates the sub-computing pipelines corresponding to the shaders to be used.
With continued reference to fig. 2, after the CPU generates the calculation pipeline, the neural network parameters and the calculation pipeline may be cached, so that when the data to be processed is received, the associated GPU retrieves the data from the cache, and processes the data to be processed.
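The overall CPU-side flow of fig. 2 can be summarized with the following runnable C++ sketch; all helper functions are trivial stand-ins invented for this illustration and do not represent the actual modules of the disclosure.

    #include <iostream>
    #include <string>
    #include <vector>

    // Hypothetical CPU-side build flow mirroring Fig. 2. All helpers here are
    // trivial stand-ins for the steps sketched earlier, not the patent's own code.
    static std::vector<std::string> generateShaders(size_t levels) {
        return std::vector<std::string>(levels, "// per-level compute shader");
    }
    static bool deviceSupportsIndirectBuffer() { return true; } // assumed capability query
    static void cachePipelines(const std::vector<std::string>& shaders, bool withControlPipeline) {
        std::cout << "cached " << shaders.size() << " sub-pipelines"
                  << (withControlPipeline ? " plus a control pipeline\n" : "\n");
    }

    int main() {
        // 1. Generate one coarse-grained shader per network level of the model.
        std::vector<std::string> shaders = generateShaders(4); // e.g. input/conv/pool/fc
        // 2. If the GPU supports indirect caching, also build the scheduling path.
        bool indirect = deviceSupportsIndirectBuffer();
        // 3. Compile and cache, so the GPU can load the pipelines when data arrives.
        cachePipelines(shaders, indirect);
        return 0;
    }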
According to the technical scheme of the embodiment of the disclosure, the shaders corresponding to each network level of the neural network model are generated by the CPU, and coarse-grained calculation is performed based on the shaders after the data to be processed are received, so that the interaction between the CPU and the GPU is greatly reduced, the limitation of GPU communication bandwidth on the model calculation process is reduced, and the performance of the neural network model is improved.
EXAMPLE III
Fig. 3 is a schematic flow chart of a data processing method provided by a third embodiment of the present disclosure. The embodiment is applicable to the situation where a graphics processor loads the computing pipeline from the cache to process data to be processed, and the method may be executed by a data processing apparatus, where the apparatus may be implemented in the form of software and/or hardware, optionally by an electronic device, and the electronic device may be a mobile terminal, a PC terminal, a server, or the like.
As shown in fig. 3, the method includes:
s210, loading predetermined calculation pipeline and neural network parameters when the data to be processed is received.
In this embodiment, the computation pipeline is determined by the central processing unit based on the neural network parameters of the target neural network model and the target device parameters of the device to which the target neural network model belongs, and the CPU has already stored the neural network parameters and the computation pipeline of each target neural network model in the target cache. Therefore, when the target device receives data to be processed, the CPU first loads the determined data from the cache, and the GPU installed in the device then loads the target neural network model and each computation pipeline through an Application Programming Interface (API) used to interact with the CPU; that is, the model and the computation pipelines are deployed on the GPU to process the data to be processed.
For example, after the CPU determines a computation pipeline for the neural network parameters of the CNN model and the target device parameter of the target device (i.e., information indicating whether the target device supports the indirect cache), and stores the neural network parameters and the computation pipeline in the target cache in advance, when the target device receives data to be processed of a service associated with the CNN model, the CPU may load the CNN model and the corresponding computation pipelines and simultaneously send a program execution instruction to the GPU, so that the model and the computation pipelines are deployed in the GPU through the corresponding API interfaces. That is, the GPU loads the CNN model corresponding to the service and the computation pipelines corresponding to the input layer, the convolutional layer, the pooling layer, and the fully-connected layer, thereby performing the heterogeneous computation corresponding to these network hierarchies.
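A minimal sketch of the load step on the GPU side is given below, assuming the CPU stored the pipeline description in a file-backed target cache as in the earlier sketch; the path name is hypothetical, and a real implementation would hand the loaded data to the graphics API instead of printing its size.

    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>

    // Read the cached pipeline description prepared by the CPU side.
    std::string loadPipelineCache(const std::string& cachePath) {
        std::ifstream cache(cachePath);
        std::ostringstream contents;
        contents << cache.rdbuf();                 // read the cached JSON description
        return contents.str();
    }

    int main() {
        std::string description = loadPipelineCache("pipeline_cache.json"); // hypothetical path
        std::cout << "loaded " << description.size() << " bytes of pipeline data\n";
        return 0;
    }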
S220, determining a target processing mode for processing the data to be processed by each shader to be used according to the target equipment parameters.
The to-be-used shader is obtained by processing the neural network parameters based on the central processing unit, that is, the CPU may process the neural network parameters according to the first method of the present disclosure, so as to obtain the to-be-used shader corresponding to each network level of the target neural network model, which is not described herein again in the present disclosure.
In this embodiment, when the target device parameters are different, the processing modes of the GPUs are also different. Optionally, if the target device parameter is a first parameter, determining that a target processing mode in which each shader to be used processes the data to be processed is a first target processing mode; and if the target equipment parameter is the second parameter, determining that the target processing mode is the second target processing mode.
It can be understood that when it is determined that the GPU mounted on the target device supports the indirect cache, the first target processing manner is determined so as to process the data to be processed based on each shader to be used, and when it is determined that the GPU mounted on the target device does not support the indirect cache, the second target processing manner is determined so as to process the data to be processed based on each shader to be used.
And S230, processing the data to be processed based on the target processing mode to obtain a target processing result.
In this embodiment, optionally, when the target processing manner is the first target processing manner, determining an execution order of each sub-computation pipeline based on the control pipeline corresponding to the computation pipeline; and sending a program execution instruction to the corresponding to-be-used shaders based on the execution sequence and the corresponding sub-computing pipelines, so that each to-be-used shader processes the to-be-processed data to obtain the target processing result.
For example, when the GPU mounted on the target device supports indirect caching, it may be determined that the target processing manner is the first target processing manner. Meanwhile, the CPU has already constructed the corresponding sub-computation pipelines for each network level of the CNN model used as the target neural network model, and the control pipeline for controlling the execution order of the network levels, and has stored the computation pipelines in the target cache. The GPU can therefore directly load the control pipeline from the cache when processing data to be processed, thereby determining that the execution order is the input layer, the convolutional layer, the pooling layer, and the fully-connected layer. Further, based on the determined execution order, the GPU may send program execution instructions to the shaders to be used corresponding to each network level in sequence by using the input layer sub-computing pipeline, the convolutional layer sub-computing pipeline, the pooling layer sub-computing pipeline, and the fully-connected layer sub-computing pipeline, so as to run each program segment and process the data to be processed.
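A minimal sketch of the first target processing mode is given below; dispatchLevel is a hypothetical stand-in for dispatching one sub-compute pipeline, and the execution order and work-group counts are illustrative values only.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Sketch of the first target processing mode: the execution order comes from
    // the control pipeline, and every level is dispatched without asking the CPU
    // again. dispatchLevel() stands in for an indirect dispatch on the GPU.
    struct DispatchArgs { uint32_t x, y, z; };

    static void dispatchLevel(size_t level, const DispatchArgs& args) {
        std::cout << "dispatch level " << level << " with " << args.x << " work groups\n";
    }

    int main() {
        // Execution order as recorded by the control pipeline: input, conv, pool, fc.
        std::vector<size_t> executionOrder = {0, 1, 2, 3};
        std::vector<DispatchArgs> indirectBuffer = {{64, 1, 1}, {256, 1, 1}, {64, 1, 1}, {16, 1, 1}};
        for (size_t level : executionOrder)
            dispatchLevel(level, indirectBuffer[level]); // no CPU round-trip per level
        return 0;
    }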
Optionally, receiving a program execution instruction sent based on the current sub-computing pipeline; and processing the data to be processed based on the program execution instruction to obtain a data processing result, and feeding the data processing result back to the central processing unit, so that when the central processing unit receives the data processing result, the next sub-computing pipeline of the current sub-computing pipeline is used as the current sub-computing pipeline, and the program execution instruction is sent based on the current sub-computing pipeline until the current sub-computing pipeline is the last sub-computing pipeline, so as to obtain a target processing result corresponding to the data to be processed.
Specifically, when the GPU mounted on the target device does not support indirect caching, it may be determined that the target processing manner is the second target processing manner. In this case, the CPU only constructs corresponding sub-computation pipelines for each network level of the target neural network model, and stores the computation pipelines in the target cache, so that, when processing data to be processed, the GPU cannot determine, based on the GPU itself, an execution sequence of shaders to be used associated with each network level, but needs to process the data to be processed according to program execution instructions sent by the CPU, and it can be understood that the program execution instructions are determined based on the CPU according to the sub-computation pipelines of each neural network level.
For example, when the CPU only constructs a corresponding sub-computation pipeline for each network level of the CNN model used as the target neural network model and stores the computation pipelines in the target cache, the GPU cannot determine the execution order of the shaders to be used when processing the data to be processed. It needs to receive a program execution instruction sent by the CPU and parse the instruction, thereby determining that the first shader to be used that needs to be executed is the shader to be used corresponding to the CNN input layer; the corresponding shader to be used is then loaded and executed based on the sub-computation pipeline corresponding to the input layer, so that the data to be processed is processed. Meanwhile, the processing module on the CPU side may asynchronously query the execution result of the GPU, or receive feedback information sent by the GPU according to the execution result. After the CPU determines that the shader to be used associated with the input layer has executed correctly on the GPU and has produced a valid data processing result, it may determine for the GPU that the next network layer to be executed is the convolutional layer; after taking the sub-computation pipeline corresponding to the convolutional layer as the current sub-computation pipeline, it sends the corresponding message to the GPU again in the form of a program execution instruction, and the shader to be used corresponding to the CNN convolutional layer is executed based on the current sub-computation pipeline to obtain the corresponding processing result. In this manner, the GPU sequentially executes the shaders to be used corresponding to the pooling layer and the fully-connected layer, and obtains the final target processing result.
The above process may be understood as follows: after the CPU constructs the corresponding sub-computation pipelines for the four network levels of the target neural network model, it may send an instruction to the GPU so that the GPU executes the shader to be used of the first layer based on the sub-computation pipeline corresponding to that layer; after the GPU finishes executing, it sends a message indicating that the program execution is completed to the CPU, and after receiving the message, the CPU can send an instruction for executing the shader to be used of the second layer to the GPU, and so on, until the GPU has executed all the shaders to be used of the four network levels. After the GPU obtains the target processing result, the CPU can read the target processing result directly, or the target processing result can be kept on the GPU side for subsequent use by other shaders to be used.
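A minimal sketch of this CPU-driven, level-by-level loop is given below; executeOnGpu is a hypothetical stand-in for sending a program execution instruction and waiting for the GPU's feedback.

    #include <iostream>
    #include <string>
    #include <vector>

    // Sketch of the second target processing mode: the CPU drives the order. After
    // each sub-compute pipeline finishes, the GPU feeds the result back and only
    // then does the CPU issue the instruction for the next level.
    static bool executeOnGpu(size_t level) {
        std::cout << "GPU executes shader to be used for level " << level << "\n";
        return true; // feedback: execution finished and produced a valid result
    }

    int main() {
        std::vector<std::string> levels = {"input", "conv", "pool", "fc"};
        for (size_t current = 0; current < levels.size(); ++current) {
            // CPU sends a program execution instruction for the current sub-pipeline.
            bool done = executeOnGpu(current);
            if (!done) break;                    // stop if the level did not complete
            // CPU receives the feedback and advances to the next sub-pipeline.
        }
        std::cout << "target processing result available for readback\n";
        return 0;
    }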
According to the technical scheme of the embodiment of the disclosure, when data to be processed is received, predetermined calculation pipeline and neural network parameters are loaded, and according to target equipment parameters, a target processing mode for processing the data to be processed by using each shader is determined; and processing the data to be processed based on the target processing mode to obtain a target processing result, so that after the GPU receives the data to be processed, coarse-grained calculation is performed based on each shader in the cache, the interaction between the CPU and the GPU is greatly reduced, the limitation of GPU communication bandwidth to a model calculation process is reduced, and the performance of the neural network model is improved.
Example four
As an alternative implementation of the foregoing embodiment, fig. 4 is a schematic flow chart of a data processing method provided in a fourth embodiment of the present disclosure. To describe the technical solution of the present embodiment clearly, the application scenario in which a graphics processor loads the computing pipeline from the cache and processes the data to be processed is taken as an example; however, the solution is not limited to this scenario and may be applied to various scenarios in which data related to a target neural network model needs to be processed.
Referring to fig. 4, in the process of invoking the neural network by the GPU, the processing module at the CPU end first needs to load the already-constructed computation pipeline and the neural network parameters, and meanwhile, the processing module at the CPU end also needs to load the service data, and the GPU loads the computation pipeline based on the cache.
Continuing to refer to fig. 4, after the data loading is completed, it is determined whether the GPU supports indirect caching, and when the GPU supports indirect caching, the GPU end processing module may directly load the control pipeline into the indirect buffer, and determine an execution sequence of each network level of the target neural network model based on the control pipeline, and further execute corresponding shader programs to be used based on sub-computation pipelines corresponding to each network level in sequence according to the determined execution sequence, thereby completing scheduling of the whole flow of the target neural network model; when the GPU does not support indirect buffering, the CPU-side processing module needs to schedule each network level of the target neural network model, and the GPU can determine the execution order of each layer of the target neural network model under the control of the CPU, thereby executing the corresponding shader to be used according to the order given by the CPU.
With reference to fig. 4, when the GPU processes the data to be processed based on each shader program to be used, the CPU-side processing module may determine whether the execution of the target neural network model is completed in an asynchronous query manner; when it is determined that the execution of the model is completed, the target processing result may be read directly, or the target processing result may be kept on the GPU side for use by other shaders to be used.
According to the technical scheme of the embodiment of the disclosure, after the GPU receives the data to be processed, coarse-grained calculation is carried out on the basis of all shaders in the cache, so that the interaction between the CPU and the GPU is greatly reduced, the limitation of GPU communication bandwidth to the model calculation process is reduced, and the performance of the neural network model is improved.
EXAMPLE five
Fig. 5 is a schematic structural diagram of a data processing apparatus according to a fifth embodiment of the disclosure, and as shown in fig. 5, the apparatus includes: a network parameter determination module 310, a shader determination module 320, and a pipeline determination module 330.
A network parameter determination module 310, configured to obtain neural network parameters corresponding to the target neural network model.
A shader determining module 320, configured to generate a to-be-used shader corresponding to each network hierarchy according to the neural network parameters; wherein the target neural network model comprises a plurality of network hierarchies.
A pipeline determining module 330, configured to determine, according to a target device parameter of a device to which the target neural network model belongs, a computing pipeline corresponding to the shader to be used, so that when the data to be processed is received, the computing pipeline is called to process the data to be processed according to the target device parameter, and a target processing result is obtained.
Optionally, the shader determining module 320 is further configured to invoke a shader generating module to process the neural network parameter, so as to obtain a to-be-used shader of each network level in the target neural network model.
On the basis of the above technical solutions, the target device parameter includes an indirect buffering parameter.
On the basis of the above technical solutions, the pipeline determining module 330 includes a scheduling shader generating unit and a calculating pipeline determining unit.
And the scheduling shader generating unit is used for generating scheduling shaders corresponding to the shaders to be used if the indirect buffering parameter of the device to which the target neural network model belongs is a first parameter.
A calculation pipeline determining unit, configured to obtain a calculation pipeline by compiling and processing the scheduling shader and the to-be-used shader; the computing pipeline comprises sub-computing pipelines corresponding to all shaders to be used and control pipelines corresponding to all the sub-computing pipelines.
On the basis of the above technical solutions, the pipeline determining module 330 further includes a sub-calculation pipeline determining unit.
And the sub-computing pipeline determining unit is used for determining the sub-computing pipelines corresponding to the shaders to be used if the indirect buffering parameter of the device to which the target neural network model belongs is a second parameter.
A calculation pipeline determination unit, further configured to determine the calculation pipeline based on each sub-calculation pipeline.
According to the technical scheme provided by this embodiment, neural network parameters corresponding to a target neural network model are obtained, and shaders to be used corresponding to each network level are generated according to the neural network parameters, that is, corresponding shaders to be used are generated for the plurality of network levels in the target neural network model. Furthermore, a computing pipeline corresponding to the shaders to be used is determined according to the target device parameters of the device to which the target neural network model belongs, and when data to be processed is received, the computing pipeline is called according to the target device parameters to process the data to be processed and obtain a target processing result. Because the CPU generates a shader corresponding to each network level of the neural network model, and coarse-grained calculation is performed based on these shaders after the data to be processed is received, the interaction between the CPU and the GPU is greatly reduced, the limitation imposed by the GPU communication bandwidth on the model calculation process is reduced, and the performance of the neural network model is improved.
The data processing device provided by the embodiment of the disclosure can execute the data processing method provided by any embodiment of the disclosure, and has corresponding functional modules and beneficial effects of the execution method.
It should be noted that, the units and modules included in the apparatus are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the embodiments of the present disclosure.
EXAMPLE six
Fig. 6 is a schematic structural diagram of a data processing apparatus according to a sixth embodiment of the present disclosure, and as shown in fig. 6, the apparatus includes: a network parameter loading module 410, a processing mode determining module 420 and a processing result determining module 430.
A network parameter loading module 410, configured to load a predetermined calculation pipeline and neural network parameters when data to be processed is received; wherein the calculation pipeline is determined by a central processing unit based on the neural network parameters of the target neural network model and the target device parameters of the device to which the target neural network model belongs.
A processing mode determining module 420, configured to determine, according to the target device parameter, a target processing mode for processing the data to be processed by each shader to be used; and the shader to be used is obtained by processing the neural network parameters based on a central processing unit.
And a processing result determining module 430, configured to process the to-be-processed data based on the target processing manner to obtain a target processing result.
Optionally, the processing mode determining module 420 is further configured to determine, if the target device parameter is a first parameter, that a target processing mode in which each shader to be used processes the data to be processed is a first target processing mode; and if the target equipment parameter is the second parameter, determining that the target processing mode is the second target processing mode.
Optionally, the processing result determining module 430 is further configured to determine an execution order of each sub-computation pipeline based on the control pipeline corresponding to the computation pipeline; and sending a program execution instruction to the corresponding to-be-used shaders based on the execution sequence and the corresponding sub-computing pipelines, so that each to-be-used shader processes the to-be-processed data to obtain the target processing result.
Optionally, the processing result determining module 430 is further configured to receive a program execution instruction sent based on the current sub-computing pipeline; wherein the program execution instruction is determined by the central processor according to the sub-compute pipelines of each neural network hierarchy; and process the data to be processed based on the program execution instruction to obtain a data processing result, and feed the data processing result back to the central processing unit, so that when the central processing unit receives the data processing result, the next sub-computing pipeline of the current sub-computing pipeline is taken as the current sub-computing pipeline, and the program execution instruction is sent based on the current sub-computing pipeline until the current sub-computing pipeline is the last sub-computing pipeline, so as to obtain a target processing result corresponding to the data to be processed.
According to the technical scheme provided by the embodiment, when data to be processed is received, predetermined calculation pipeline and neural network parameters are loaded, and a target processing mode for processing the data to be processed by using each shader is determined according to target equipment parameters; and processing the data to be processed based on the target processing mode to obtain a target processing result, so that after the GPU receives the data to be processed, coarse-grained calculation is performed based on each shader in the cache, the interaction between the CPU and the GPU is greatly reduced, the limitation of GPU communication bandwidth to a model calculation process is reduced, and the performance of the neural network model is improved.
The data processing device provided by the embodiment of the disclosure can execute the data processing method provided by any embodiment of the disclosure, and has corresponding functional modules and beneficial effects of the execution method.
It should be noted that, the units and modules included in the apparatus are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the embodiments of the present disclosure.
EXAMPLE seven
Fig. 7 is a schematic structural diagram of an electronic device according to a seventh embodiment of the disclosure. Referring now to fig. 7, a schematic diagram of an electronic device (e.g., the terminal device or the server in fig. 7) 500 suitable for implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, electronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 506 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic device 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, and the like; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 506 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication device 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer means may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 506, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The electronic device provided by the embodiment of the present disclosure and the data processing method provided by the above embodiment belong to the same inventive concept, and technical details that are not described in detail in the embodiment can be referred to the above embodiment, and the embodiment has the same beneficial effects as the above embodiment.
Example eight
The disclosed embodiments provide a computer storage medium on which a computer program is stored, which when executed by a processor implements the data processing method provided by the above-described embodiments.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
acquiring neural network parameters corresponding to the target neural network model;
generating shaders to be used corresponding to each network level according to the neural network parameters; wherein the target neural network model comprises a plurality of network hierarchies;
and determining a computing pipeline corresponding to the shader to be used according to target equipment parameters of the equipment to which the target neural network model belongs, and calling the computing pipeline to process the data to be processed according to the target equipment parameters when the data to be processed is received to obtain a target processing result.
Or, alternatively:
loading the predetermined calculation pipeline and neural network parameters when data to be processed is received; wherein the calculation pipeline is determined by a central processing unit based on the neural network parameters of the target neural network model and the target device parameters of the device to which the target neural network model belongs;
determining, according to the target device parameters, a target processing mode in which each shader to be used processes the data to be processed; wherein the shaders to be used are obtained by the central processing unit by processing the neural network parameters;
and processing the data to be processed based on the target processing mode to obtain a target processing result.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, [ example one ] there is provided a data processing method applied in a central processing unit, the method including:
acquiring neural network parameters corresponding to the target neural network model;
generating shaders to be used corresponding to each network level according to the neural network parameters; wherein the target neural network model comprises a plurality of network hierarchies;
and determining a computing pipeline corresponding to the shader to be used according to target equipment parameters of the equipment to which the target neural network model belongs, and calling the computing pipeline to process the data to be processed according to the target equipment parameters when the data to be processed is received to obtain a target processing result.
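As a concrete, purely illustrative reading of this CPU-side flow (not part of the original disclosure), the following Python sketch walks the three steps above under the assumption that each network level can be described by a small record and that the device reports whether it supports indirect buffering; all names below are placeholders rather than an API from the disclosure.

```python
# Hypothetical CPU-side orchestration of the three steps above. All names
# (Layer, load_network_parameters, generate_layer_shader,
# build_computing_pipeline) are illustrative placeholders.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Layer:
    name: str   # identifier of the network level, e.g. "conv1"
    op: str     # operator computed at this level, e.g. "conv2d"


def load_network_parameters(model_path: str) -> List[Layer]:
    # Placeholder: a real implementation would parse the serialized
    # parameters of the target neural network model.
    return [Layer("conv1", "conv2d"), Layer("relu1", "relu")]


def generate_layer_shader(layer: Layer) -> str:
    # One compute-shader source string per network level (see example two).
    return f"// compute shader for level {layer.name} ({layer.op})"


def build_computing_pipeline(shaders: Dict[str, str], supports_indirect: bool) -> dict:
    # The pipeline layout depends on whether the target device exposes
    # indirect-buffer support (see examples three and four).
    pipeline: dict = {"sub_pipelines": shaders}
    if supports_indirect:
        pipeline["control_pipeline"] = "// scheduling shader"
    return pipeline


layers = load_network_parameters("model.bin")
shaders = {layer.name: generate_layer_shader(layer) for layer in layers}
pipeline = build_computing_pipeline(shaders, supports_indirect=True)
```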
According to one or more embodiments of the present disclosure, [ example two ] there is provided a data processing method, further comprising:
optionally, the generating a to-be-used shader corresponding to each network hierarchy according to the neural network parameter includes:
and calling a shader generating module to process the neural network parameters to obtain a shader to be used of each network level in the target neural network model.
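A minimal sketch of what such a per-level shader generation step could produce is shown below; the GLSL-style template, the fixed buffer bindings, and the size fields are assumptions made for illustration, not the generator described in the disclosure.

```python
# Illustrative only: emit one compute-shader source string for a single
# network level from that level's parameters.
def make_compute_shader_source(level_name: str, input_size: int, output_size: int) -> str:
    return f"""
#version 450
// auto-generated shader for network level: {level_name}
layout(local_size_x = 64) in;
layout(std430, binding = 0) readonly buffer InBuf   {{ float in_data[{input_size}]; }};
layout(std430, binding = 1) writeonly buffer OutBuf {{ float out_data[{output_size}]; }};
layout(std430, binding = 2) readonly buffer Weights {{ float weights[]; }};

void main() {{
    uint i = gl_GlobalInvocationID.x;
    if (i >= {output_size}u) return;
    // the level-specific computation (e.g. a dot product for a fully
    // connected level) would be emitted here
}}
"""

src = make_compute_shader_source("fc1", input_size=1024, output_size=256)
```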
According to one or more embodiments of the present disclosure, [ example three ] there is provided a data processing method, further comprising:
optionally, the determining, according to the target device parameter of the device to which the target neural network model belongs, a calculation pipeline corresponding to the to-be-used shader includes:
if the indirect buffering parameter of the device to which the target neural network model belongs is a first parameter, generating scheduling shaders corresponding to the shaders to be used;
compiling and processing the scheduling shader and the to-be-used shader to obtain a computing pipeline;
the computing pipeline comprises sub-computing pipelines corresponding to all shaders to be used and control pipelines corresponding to all the sub-computing pipelines.
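Purely as an illustration of this branch (the data layout and helper names are assumptions, not the patent's implementation), the sketch below treats each per-level shader as a sub-computing pipeline and compiles one additional scheduling shader into a control pipeline that records the order and dispatch arguments of every level, so the GPU can chain the levels without per-level CPU intervention.

```python
# Hypothetical assembly of the computing pipeline when the indirect
# buffering parameter is the "first parameter" (indirect dispatch available).
def build_indirect_computing_pipeline(layer_shaders: dict) -> dict:
    # One sub-computing pipeline per shader to be used.
    sub_pipelines = {name: {"shader": src} for name, src in layer_shaders.items()}
    # The scheduling shader is assumed to write one set of workgroup counts
    # per network level into an indirect-argument buffer, so the recorded
    # dispatches can be executed back to back on the GPU.
    control_pipeline = {
        "shader": "// scheduling shader: fills the indirect dispatch buffer",
        "order": list(layer_shaders.keys()),  # execution order of the levels
    }
    return {"sub_pipelines": sub_pipelines, "control_pipeline": control_pipeline}


pipeline = build_indirect_computing_pipeline({"conv1": "// src", "relu1": "// src"})
```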
According to one or more embodiments of the present disclosure, [ example four ] there is provided a data processing method, further comprising:
optionally, the determining, according to the target device parameter of the device to which the target neural network model belongs, a calculation pipeline corresponding to the to-be-used shader includes:
if the indirect buffering parameter of the device to which the target neural network model belongs is a second parameter, determining a sub-computing pipeline corresponding to each shader to be used;
based on each sub-computation pipeline, the computation pipeline is determined.
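For the other branch, a correspondingly small sketch (again an assumption, not the disclosed implementation): when indirect buffering is unavailable, the computing pipeline is simply the ordered list of sub-computing pipelines, and the sequencing is deferred to the CPU at execution time (see example nine).

```python
# Hypothetical fallback when the indirect buffering parameter is the
# "second parameter": keep only the ordered sub-computing pipelines and let
# the CPU drive them one network level at a time.
def build_direct_computing_pipeline(layer_shaders: dict) -> dict:
    sub_pipelines = [{"level": name, "shader": src}
                     for name, src in layer_shaders.items()]
    return {"sub_pipelines": sub_pipelines, "control_pipeline": None}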
According to one or more embodiments of the present disclosure, [ example five ] there is provided a data processing method, further comprising:
optionally, the neural network parameters and the calculation pipelines of the target neural network model are cached, so that when the data to be processed is received, the neural network parameters and the calculation pipelines are called to process the data to be processed.
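One way such caching could be organised is sketched below; the in-memory dictionary and the model identifier used as a key are assumptions for illustration only.

```python
# Illustrative cache: once the computing pipeline and neural network
# parameters have been prepared, they are stored so that arriving data to
# be processed can reuse them instead of triggering a rebuild.
_pipeline_cache: dict = {}


def cache_pipeline(model_id: str, pipeline: dict, parameters: bytes) -> None:
    _pipeline_cache[model_id] = {"pipeline": pipeline, "parameters": parameters}


def lookup_pipeline(model_id: str):
    # Returns None if the model has not been prepared yet.
    return _pipeline_cache.get(model_id)
```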
According to one or more embodiments of the present disclosure, [ example six ] there is provided a data processing method applied in a graphics processor, the method including:
loading the predetermined calculation pipeline and neural network parameters when data to be processed is received; wherein the calculation pipeline is determined by a central processing unit based on the neural network parameters of the target neural network model and the target device parameters of the device to which the target neural network model belongs;
determining, according to the target device parameters, a target processing mode in which each shader to be used processes the data to be processed; wherein the shaders to be used are obtained by the central processing unit by processing the neural network parameters;
and processing the data to be processed based on the target processing mode to obtain a target processing result.
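To make the GPU-side flow concrete (hypothetical names only; the indirect buffering parameter is modelled as a boolean for brevity), the sketch below takes the pipeline and parameters prepared on the CPU and selects the processing mode from the target device parameter; the per-mode execution loops themselves are sketched after examples eight and nine.

```python
# Hypothetical GPU-side entry point: take what the CPU prepared, then pick
# the target processing mode from the target device parameter.
def choose_processing_mode(device_supports_indirect: bool) -> str:
    # "first parameter"  -> first target processing mode (example eight)
    # "second parameter" -> second target processing mode (example nine)
    return "first" if device_supports_indirect else "second"


def prepare_processing(pipeline: dict, parameters: bytes,
                       device_supports_indirect: bool) -> dict:
    mode = choose_processing_mode(device_supports_indirect)
    return {"mode": mode, "pipeline": pipeline, "parameters": parameters}
```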
According to one or more embodiments of the present disclosure, [ example seven ] there is provided a data processing method, the method further comprising:
optionally, the determining, according to the target device parameter, a target processing manner in which each shader to be used processes data to be processed includes:
if the target equipment parameter is a first parameter, determining that a target processing mode for processing the data to be processed by each shader to be used is a first target processing mode;
and if the target equipment parameter is the second parameter, determining that the target processing mode is the second target processing mode.
According to one or more embodiments of the present disclosure, [ example eight ] there is provided a data processing method, further comprising:
optionally, the target processing manner is a first target processing manner, and the processing the to-be-processed data based on the target processing manner to obtain a target processing result includes:
determining an execution order of each sub-compute pipeline based on a control pipeline corresponding to the compute pipeline;
and sending a program execution instruction to the corresponding to-be-used shaders based on the execution sequence and the corresponding sub-computing pipelines, so that each to-be-used shader processes the to-be-processed data to obtain the target processing result.
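The following sketch shows one possible shape of this first target processing mode (every name is an assumption, and the command list is only recorded, not submitted): the control pipeline fixes the execution order, every sub-computing pipeline is encoded once, and the whole chain would be handed to the GPU in a single submission, so no per-level round trip to the CPU is required.

```python
# Hypothetical first target processing mode: encode every level's dispatch
# in the order given by the control pipeline and submit them together.
def run_gpu_driven(pipeline: dict) -> list:
    order = pipeline["control_pipeline"]["order"]   # execution order of sub-pipelines
    commands = []
    for level in order:
        sub = pipeline["sub_pipelines"][level]
        # In a real implementation this would record an indirect dispatch
        # whose arguments were produced by the scheduling shader.
        commands.append(("dispatch_indirect", level, sub["shader"]))
    # A real implementation would submit `commands` once and read back only
    # the final target processing result; this sketch returns the plan.
    return commands


plan = run_gpu_driven({
    "sub_pipelines": {"conv1": {"shader": "// src"}, "relu1": {"shader": "// src"}},
    "control_pipeline": {"order": ["conv1", "relu1"]},
})
```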
According to one or more embodiments of the present disclosure, [ example nine ] there is provided a data processing method, further comprising:
optionally, the processing the data to be processed based on the target processing manner to obtain a target processing result includes:
receiving a program execution instruction sent based on a current sub-compute pipeline; wherein the program execution instruction is determined by the central processing unit based on the sub-compute pipeline of each neural network hierarchy;
and processing the data to be processed based on the program execution instruction to obtain a data processing result, and feeding back the data processing result to the central processing unit, so that when the central processing unit receives the data processing result, the next sub-computing pipeline of the current sub-computing pipeline is used as the current sub-computing pipeline, and the program execution instruction is sent based on the current sub-computing pipeline until the current sub-computing pipeline is the last sub-computing pipeline, so as to obtain a target processing result corresponding to the data to be processed.
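A correspondingly small sketch of the second target processing mode follows (helper names are assumptions): the central processing unit walks the sub-computing pipelines one neural network level at a time, issues a program execution instruction for the current one, waits for that level's data processing result, and only then advances, so the last level's output is the target processing result.

```python
# Hypothetical second target processing mode: the CPU drives the
# sub-computing pipelines level by level, with one round trip per level.
def execute_level(level: dict, data: list) -> list:
    # Placeholder for "send the program execution instruction, run the
    # level's shader on the GPU, read the data processing result back".
    return data  # identity stand-in for the level's real computation


def run_cpu_driven(pipeline: dict, input_data: list) -> list:
    data = input_data
    for level in pipeline["sub_pipelines"]:   # ordered sub-computing pipelines
        data = execute_level(level, data)     # per-level CPU <-> GPU round trip
    return data                               # target processing result


result = run_cpu_driven(
    {"sub_pipelines": [{"level": "conv1"}, {"level": "relu1"}]},
    input_data=[0.0] * 16,
)
```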
According to one or more embodiments of the present disclosure, [ example ten ] there is provided a data processing apparatus, configured in a central processor, the apparatus comprising:
the network parameter determining module is used for acquiring neural network parameters corresponding to the target neural network model;
the shader determining module is used for generating shaders to be used corresponding to each network level according to the neural network parameters; wherein the target neural network model comprises a plurality of network hierarchies;
and the pipeline determining module is used for determining a computing pipeline corresponding to the shader to be used according to the target equipment parameter of the equipment to which the target neural network model belongs, and calling the computing pipeline to process the data to be processed according to the target equipment parameter when the data to be processed is received to obtain a target processing result.
According to one or more embodiments of the present disclosure, [ example eleven ] there is provided a data processing apparatus configured in a graphics processor, the apparatus including:
the network parameter loading module is used for loading the predetermined calculation pipeline and neural network parameters when data to be processed is received; wherein the calculation pipeline is determined by a central processing unit based on the neural network parameters of the target neural network model and the target device parameters of the device to which the target neural network model belongs;
the processing mode determining module is used for determining, according to the target device parameters, a target processing mode in which each shader to be used processes the data to be processed; wherein the shaders to be used are obtained by the central processing unit by processing the neural network parameters;
and the processing result determining module is used for processing the data to be processed based on the target processing mode to obtain a target processing result.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combinations of features described above, and is also intended to cover other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (13)

1. A data processing method is applied to a central processing unit and comprises the following steps:
acquiring neural network parameters corresponding to the target neural network model;
generating shaders to be used corresponding to each network level according to the neural network parameters; wherein the target neural network model comprises a plurality of network hierarchies;
and determining a computing pipeline corresponding to the shader to be used according to target equipment parameters of the equipment to which the target neural network model belongs, and calling the computing pipeline to process the data to be processed according to the target equipment parameters when the data to be processed is received to obtain a target processing result.
2. The method of claim 1, wherein generating a to-be-used shader corresponding to each network level according to the neural network parameters comprises:
and calling a shader generating module to process the neural network parameters to obtain a shader to be used of each network level in the target neural network model.
3. The method of claim 1, wherein the target device parameter comprises an indirect buffering parameter, and wherein determining the computing pipeline corresponding to the shader to be used according to the target device parameter of the device to which the target neural network model belongs comprises:
if the indirect buffering parameter of the device to which the target neural network model belongs is a first parameter, generating scheduling shaders corresponding to the shaders to be used;
compiling and processing the scheduling shader and the to-be-used shader to obtain a computing pipeline;
the computing pipeline comprises sub-computing pipelines corresponding to all shaders to be used and control pipelines corresponding to all the sub-computing pipelines.
4. The method of claim 1, wherein the target device parameters comprise indirect buffer parameters, and wherein determining the computing pipeline corresponding to the shader to be used according to the target device parameters of the device to which the target neural network model belongs comprises:
if the indirect buffering parameter of the device to which the target neural network model belongs is a second parameter, determining a sub-computing pipeline corresponding to each shader to be used;
based on each sub-computation pipeline, the computation pipeline is determined.
5. The method of claim 1, further comprising:
caching the neural network parameters and the calculation pipelines of the target neural network model so as to call the neural network parameters and the calculation pipelines to process the data to be processed when the data to be processed is received.
6. A data processing method applied to a graphics processor includes:
loading the predetermined calculation pipeline and neural network parameters when data to be processed is received; wherein the calculation pipeline is determined by a central processing unit based on the neural network parameters of a target neural network model and the target equipment parameters of the equipment to which the target neural network model belongs;
determining a target processing mode for processing the data to be processed by each shader to be used according to the target equipment parameters; the shader to be used is obtained after the neural network parameters are processed based on a central processing unit;
and processing the data to be processed based on the target processing mode to obtain a target processing result.
7. The method of claim 6, wherein determining the target processing mode for processing the data to be processed by each shader according to the target device parameters comprises:
if the target equipment parameter is a first parameter, determining that a target processing mode for processing the data to be processed by each shader to be used is a first target processing mode;
and if the target equipment parameter is the second parameter, determining that the target processing mode is the second target processing mode.
8. The method according to claim 7, wherein the target processing manner is a first target processing manner, and the processing the data to be processed based on the target processing manner to obtain a target processing result includes:
determining an execution order of each sub-compute pipeline based on a control pipeline corresponding to the compute pipeline;
and sending a program execution instruction to the corresponding to-be-used shaders based on the execution sequence and the corresponding sub-computing pipelines, so that each to-be-used shader processes the to-be-processed data to obtain the target processing result.
9. The method according to claim 8, wherein the processing the data to be processed based on the target processing manner to obtain a target processing result comprises:
receiving a program execution instruction sent based on a current sub-compute pipeline; wherein the program execution instructions are determined based on the central processor from sub-compute pipelines of each neural network hierarchy;
and processing the data to be processed based on the program execution instruction to obtain a data processing result, and feeding back the data processing result to the central processing unit, so that when the central processing unit receives the data processing result, the next sub-computing pipeline of the current sub-computing pipeline is used as the current sub-computing pipeline, and the program execution instruction is sent based on the current sub-computing pipeline until the current sub-computing pipeline is the last sub-computing pipeline, so as to obtain a target processing result corresponding to the data to be processed.
10. A data processing apparatus, configured in a central processing unit, comprising:
the network parameter determining module is used for acquiring neural network parameters corresponding to the target neural network model;
the shader determining module is used for generating shaders to be used corresponding to each network level according to the neural network parameters; wherein the target neural network model comprises a plurality of network hierarchies;
and the pipeline determining module is used for determining a computing pipeline corresponding to the shader to be used according to the target equipment parameter of the equipment to which the target neural network model belongs, and calling the computing pipeline to process the data to be processed according to the target equipment parameter when the data to be processed is received to obtain a target processing result.
11. A data processing apparatus, configured in a graphics processor, comprising:
the network parameter loading module is used for loading the predetermined calculation pipeline and neural network parameters when data to be processed is received; wherein the calculation pipeline is determined by a central processing unit based on the neural network parameters of a target neural network model and the target equipment parameters of the equipment to which the target neural network model belongs;
the processing mode determining module is used for determining a target processing mode for processing the data to be processed by each shader to be used according to the target equipment parameters; the shader to be used is obtained after the neural network parameters are processed based on a central processing unit;
and the processing result determining module is used for processing the data to be processed based on the target processing mode to obtain a target processing result.
12. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data processing method of any one of claims 1-5 or 6-9.
13. A storage medium containing computer-executable instructions for performing the data processing method of any one of claims 1-5 or 6-9 when executed by a computer processor.
CN202111644074.5A 2021-12-29 2021-12-29 Data processing method and device, electronic equipment and storage medium Pending CN114330689A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111644074.5A CN114330689A (en) 2021-12-29 2021-12-29 Data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111644074.5A CN114330689A (en) 2021-12-29 2021-12-29 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114330689A true CN114330689A (en) 2022-04-12

Family

ID=81016671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111644074.5A Pending CN114330689A (en) 2021-12-29 2021-12-29 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114330689A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115934768A (en) * 2022-12-01 2023-04-07 摩尔线程智能科技(北京)有限责任公司 Data processing method, display adapter, electronic device and storage medium
CN116756444A (en) * 2023-06-14 2023-09-15 北京百度网讯科技有限公司 Image processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111475298B (en) Task processing method, device, equipment and storage medium
US20210216875A1 (en) Method and apparatus for training deep learning model
CN114330689A (en) Data processing method and device, electronic equipment and storage medium
WO2022151966A1 (en) Processing method and apparatus for language model, text generation method and apparatus, and medium
CN114020470A (en) Resource allocation method, device, readable medium and electronic equipment
CN111400068B (en) Interface control method and device, readable medium and electronic equipment
CN114625536A (en) Video memory allocation method, device, medium and electronic equipment
CN109598344B (en) Model generation method and device
CN111414343B (en) Log writing method, device, electronic equipment and medium
CN112416303A (en) Software development kit thermal restoration method and device and electronic equipment
CN113988992B (en) Order information sending method, order information sending device, electronic equipment and computer readable medium
CN116360971A (en) Processing method, device, equipment and medium based on heterogeneous computing framework
CN109635238A (en) Matrix operation method, apparatus, equipment and readable medium
CN111459893B (en) File processing method and device and electronic equipment
CN113778850A (en) Data processing method and device, electronic equipment and computer readable medium
CN111580890A (en) Method, apparatus, electronic device, and computer-readable medium for processing features
CN111309323A (en) Parameter initialization method and device and electronic equipment
CN116306781A (en) Data processing method and device based on neural network model and electronic equipment
CN115759260B (en) Reasoning method and device of deep learning model, electronic equipment and storage medium
CN115993942B (en) Data caching method, device, electronic equipment and computer readable medium
CN117170986B (en) Chip consistency processing system, method, device, equipment and medium thereof
CN117113727B (en) Interactive numerical simulation equipment configuration method and device and electronic equipment
CN114647472B (en) Picture processing method, apparatus, device, storage medium, and program product
CN112862110B (en) Model generation method and device and electronic equipment
CN112148448A (en) Resource allocation method, device, equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination