CN114330689A - Data processing method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN114330689A (application CN202111644074.5A)
- Authority
- CN
- China
- Prior art keywords
- target
- neural network
- data
- shader
- pipeline
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Generation (AREA)
- Image Processing (AREA)
Abstract
The embodiment of the disclosure provides a data processing method, a data processing device, an electronic device and a storage medium, wherein the method comprises the following steps: acquiring neural network parameters corresponding to a target neural network model; generating shaders to be used corresponding to each network level according to the neural network parameters, wherein the target neural network model comprises a plurality of network hierarchies; and determining a computing pipeline corresponding to the shader to be used according to target device parameters of the device to which the target neural network model belongs, and calling the computing pipeline according to the target device parameters to process data to be processed when the data to be processed is received, so as to obtain a target processing result. According to the technical scheme of the embodiment of the disclosure, the limitation of the GPU communication bandwidth on the model calculation process is reduced by greatly reducing the interaction between the CPU and the GPU, and the performance of the neural network model is improved.
Description
Technical Field
The embodiments of the present disclosure relate to the field of data processing technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of artificial intelligence technology, neural network models are widely applied in various fields, and various data can be processed by depending on the characteristics of large-scale parallel processing, distributed information storage and the like.
However, when a computer performs calculation based on a neural network model, a large amount of communication interaction between a Central Processing Unit (CPU) and a Graphics Processing Unit (GPU) reduces the calculation efficiency of the neural network, and the performance of the model has a bottleneck.
Disclosure of Invention
The present disclosure provides a data processing method, an apparatus, an electronic device, and a storage medium, which reduce the limitation of GPU communication bandwidth on the model calculation process and improve the performance of a neural network model in a manner of greatly reducing the interaction between a CPU and a GPU.
In a first aspect, an embodiment of the present disclosure provides a data processing method, applied to a central processing unit, including:
acquiring neural network parameters corresponding to the target neural network model;
generating shaders to be used corresponding to each network level according to the neural network parameters; wherein the target neural network model comprises a plurality of network hierarchies;
and determining a computing pipeline corresponding to the shader to be used according to target equipment parameters of the equipment to which the target neural network model belongs, and calling the computing pipeline to process the data to be processed according to the target equipment parameters when the data to be processed is received to obtain a target processing result.
In a second aspect, an embodiment of the present disclosure further provides a data processing method applied in a graphics processor, including:
loading a predetermined calculation pipeline and neural network parameters when data to be processed is received; wherein the calculation pipeline is determined by a central processing unit based on the neural network parameters of the target neural network model and the target device parameters of the device to which the target neural network model belongs;
determining a target processing mode for processing the data to be processed by each shader to be used according to the target equipment parameters; the shader to be used is obtained after the neural network parameters are processed based on a central processing unit;
and processing the data to be processed based on the target processing mode to obtain a target processing result.
In a third aspect, an embodiment of the present disclosure further provides a data processing apparatus, where the apparatus is configured in a central processing unit, and the apparatus includes:
the network parameter determining module is used for acquiring neural network parameters corresponding to the target neural network model;
the shader determining module is used for generating shaders to be used corresponding to each network level according to the neural network parameters; wherein the target neural network model comprises a plurality of network hierarchies;
and the pipeline determining module is used for determining a computing pipeline corresponding to the shader to be used according to the target equipment parameter of the equipment to which the target neural network model belongs, and calling the computing pipeline to process the data to be processed according to the target equipment parameter when the data to be processed is received to obtain a target processing result.
In a fourth aspect, an embodiment of the present disclosure further provides a data processing apparatus, where the data processing apparatus is configured in a graphics processor, and the data processing apparatus includes:
the network parameter loading module is used for loading a predetermined calculation pipeline and neural network parameters when data to be processed is received; wherein the calculation pipeline is determined by a central processing unit based on the neural network parameters of the target neural network model and the target device parameters of the device to which the target neural network model belongs;
the processing mode determining module is used for determining a target processing mode for processing the data to be processed by each shader to be used according to the target equipment parameters; the shader to be used is obtained after the neural network parameters are processed based on a central processing unit;
and the processing result determining module is used for processing the data to be processed based on the target processing mode to obtain a target processing result.
In a fifth aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the data processing method according to any one of the embodiments of the present disclosure.
In a sixth aspect, the disclosed embodiments also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are used to perform the data processing method according to any one of the disclosed embodiments.
According to the technical scheme of the embodiment of the disclosure, neural network parameters corresponding to a target neural network model are obtained; generating shaders to be used corresponding to each network level according to the neural network parameters, namely generating corresponding shaders to be used aiming at a plurality of network levels in the target neural network model; furthermore, according to target device parameters of a device to which the target neural network model belongs, a computing pipeline corresponding to a shader to be used is determined, when data to be processed is received, the computing pipeline is called according to the target device parameters to process the data to be processed to obtain a target processing result, the shader corresponding to each network level of the neural network model is generated through the CPU, coarse-grained calculation is performed based on each shader after the data to be processed is received, the limitation of GPU communication bandwidth on the model computing process is reduced in a mode of greatly reducing the interaction between the CPU and the GPU, and the performance of the neural network model is improved.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic flow chart of a data processing method according to a first embodiment of the disclosure;
Fig. 2 is a schematic flow chart of a data processing method according to a second embodiment of the disclosure;
Fig. 3 is a schematic flow chart of a data processing method according to a third embodiment of the present disclosure;
Fig. 4 is a schematic flow chart of a data processing method according to a fourth embodiment of the disclosure;
Fig. 5 is a schematic structural diagram of a data processing apparatus according to a fifth embodiment of the disclosure;
Fig. 6 is a schematic structural diagram of a data processing apparatus according to a sixth embodiment of the disclosure;
Fig. 7 is a schematic structural diagram of an electronic device according to a seventh embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units. It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Example one
Fig. 1 is a schematic flow chart of a data processing method according to a first embodiment of the present disclosure. The embodiment of the present disclosure is applicable to the case where a central processing unit generates coarse-grained shaders according to neural network parameters, so that when data to be processed is received, the data can be processed quickly based on the shaders to be used corresponding to each network hierarchy. The method may be executed by a data processing apparatus, where the apparatus may be implemented in the form of software and/or hardware, optionally by an electronic device, and the electronic device may be a mobile terminal, a PC terminal, a server, or the like.
As shown in fig. 1, the method includes:
and S110, obtaining the neural network parameters corresponding to the target neural network model.
It should be noted that the solution of this embodiment may be executed based on a target device, where the target device may be any terminal device equipped with a CPU and a GPU, and after receiving data to be processed, the target device may process the received data by using the processing capabilities of the CPU and the GPU.
A neural network (NN) model is a mathematical model that simulates the actual human neural network. Specifically, each neural network is a complex network system formed by widely interconnecting a large number of simple processing units (which may be referred to as neurons); it reflects many basic features of human brain function and is a highly complex nonlinear dynamical learning system. Therefore, the neural network model has large-scale parallel, distributed storage and processing, self-organization, self-adaptation and self-learning capabilities, and is suitable for handling imprecise and fuzzy information processing problems that need to consider many factors and conditions simultaneously.
In this embodiment, the target neural network model may be one or more neural network models associated with a specific service. For example, when a graphics-related service needs to be processed, a corresponding Convolutional Neural Network (CNN) model may be determined, and when a service involving sequential (time-series) data needs to be processed, a corresponding Recurrent Neural Network (RNN) model may be determined.
In the present embodiment, in order for the computer device to perform the calculation corresponding to the target neural network model, it is also necessary to acquire neural network parameters corresponding to the target neural network. The neural network parameters are multidimensional algorithm information related to the target neural network model, and may include at least one of an identifier of the target neural network model, a network structure, the number of neural network layers, a tensor of the neural network, an execution sequence of each layer, and an input and an output of each layer. For example, when the target neural network model is a CNN model, the determined neural network parameters may be the data types of the input and output of the CNN, and the execution order of the input layer, the convolutional layer, the activation function, the pooling layer, and the fully-connected layer. It is understood that the computer is at least capable of building a model of the target neural network in the memory based on the acquired neural network parameters.
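To make the shape of these parameters concrete, the following C++ sketch shows one possible in-memory layout for such neural network parameters; the struct and field names are illustrative assumptions and are not part of the present disclosure.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of one network level (layer) of the target model.
struct LayerParams {
    std::string type;                      // e.g. "input", "conv", "pool", "fc"
    std::vector<std::int64_t> inputShape;  // tensor shape consumed by this level
    std::vector<std::int64_t> outputShape; // tensor shape produced by this level
    std::vector<float> weights;            // flattened weight tensor; may be empty
};

// Hypothetical container for the neural network parameters of one model.
struct NeuralNetworkParams {
    std::string modelId;                   // identifier of the target neural network model
    std::vector<LayerParams> layers;       // ordered by the execution sequence of each level
};
```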
It can be understood that the structure of the target neural network model can be set according to actual conditions, specific neural network parameters of the target neural network model correspond to network hierarchies of the model, and the CPU can construct corresponding network hierarchies only by acquiring the neural network parameters, so as to obtain the target neural network model.
It should be noted that, in the actual application process, the neural network parameters corresponding to each target service may be obtained in advance and stored in the corresponding server, and the neural network parameters may be called when model-related calculations need to be executed, or the neural network parameters may be obtained in real time in a dynamic manner when heterogeneous calculations related to the target neural network model need to be executed. It will be understood by those skilled in the art that the specific manner of obtaining the neural network parameters may be selected according to actual situations, and the embodiments of the present disclosure are not limited specifically herein.
And S120, generating shaders to be used corresponding to each network level according to the neural network parameters.
In the present embodiment, the heterogeneous computation of the neural network is a special form of parallel and distributed computation. It can be understood that the performance characteristics required by this heterogeneous computation are actually very similar to those of graphics-related algorithms: it usually involves a large number of buffers of parameters, activation values, and gradient values, where every value is updated in each training iteration or data processing pass. These buffers are too large for the cache of a conventional desktop computer, so a GPU with very high memory bandwidth can be used to perform the computation related to the neural network.
Therefore, in this embodiment, after obtaining the neural network parameters, in order to process the data related to the service by using the GPU when receiving the data related to the service, first, the CPU needs to generate a to-be-used shader that can run in the GPU and corresponds to the target neural network model. The shader to be used can be understood as a compute shader unrelated to conventional graphics rendering, and when the generated compute shader is loaded by the GPU, computations related to the target neural network model can be executed.
Meanwhile, since the target neural network model determined by the CPU includes a plurality of network hierarchies, when generating a to-be-used shader, it is generally necessary to generate a corresponding to-be-used shader for each network hierarchy. Specifically, a shader generation module needs to be called to process the neural network parameters, so as to obtain a to-be-used shader of each network level in the target neural network model. The shader generating module is a processing module deployed at a CPU end and at least has a shader to be used, and because the neural network parameters correspond to each network level of the target neural network model, each shader to be used generated by the CPU through the shader generating module also corresponds to each network level of the model.
In the embodiment, the CPU constructs a coarse-grained shader to be used for each network level of the target neural network model, instead of constructing a fine-grained shader for a specific operator in each network level, so that frequent interaction between the CPU and the GPU is avoided in the neural network calculation process, and the limitation of the GPU communication bandwidth on the performance of the neural network is reduced.
Illustratively, when the target neural network model is determined to be the CNN model and the neural network parameters of the CNN model are obtained, the CPU may determine the input layer, the convolutional layer, the pooling layer, and the fully-connected layer of the CNN based on the obtained neural network parameters, and may generate shaders to be used corresponding to each network level.
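As a hedged illustration of this coarse-grained generation step, the sketch below emits one compute-shader source string per network level; the GLSL-like template body and the function name are assumptions for illustration and do not reproduce the actual shader generation module of the disclosure.

```cpp
#include <string>
#include <vector>

// Minimal sketch of the per-level shader generation step: for every network
// level the CPU emits one compute-shader source string covering the whole
// level (coarse granularity) rather than one shader per operator.
std::vector<std::string> GenerateShadersToBeUsed(const std::vector<std::string>& levelTypes) {
    std::vector<std::string> shaderSources;
    for (const std::string& level : levelTypes) {     // e.g. {"input", "conv", "pool", "fc"}
        std::string src;
        src += "// compute shader for level: " + level + "\n";
        src += "layout(local_size_x = 64) in;\n";     // workgroup size chosen arbitrarily here
        src += "void main() { /* fused operators of this level */ }\n";
        shaderSources.push_back(src);
    }
    return shaderSources;
}
```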
S130, determining a calculation pipeline corresponding to the shader to be used according to target equipment parameters of equipment to which the target neural network model belongs, and calling the calculation pipeline to process the data to be processed according to the target equipment parameters when the data to be processed is received to obtain a target processing result.
In this embodiment, since not all GPUs use the same instruction set, a program must first be converted into a binary file matching the GPU's own architecture before the GPU can load and use the to-be-used shaders corresponding to each network level. Therefore, after the CPU generates the to-be-used shaders corresponding to the network levels of the target neural network model, the driving capability of the GPU needs to be invoked to generate the computing pipelines corresponding to the to-be-used shaders.
For a computer, a computing pipeline may be understood as the pipeline required by the GPU when running a program, and the process of determining the computing pipeline may be understood as the CPU invoking the driving capability of the GPU to compile, package, and store each shader to be used to a target cache based on a preset data structure. For example, after the json format is preset as the data structure of the computing pipeline, the CPU may call the driving capability of the GPU to compile, package, and cache each shader to be used. It can be understood that the json-format data finally stored in the target cache, namely the key data of each shader to be used, may later be loaded by the GPU from the target cache as the key data of the computing pipeline when specific service data is processed, so that the corresponding heterogeneous computations are executed according to the execution sequence of each network level of the target neural network model.
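A minimal sketch of this compile-package-cache step is given below, assuming a json-style target cache file; CompileWithGpuDriver is a stub standing in for the GPU driver invocation, and all names and fields are illustrative assumptions.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Illustrative sketch of the "compile, package and cache" step: each shader to
// be used is compiled through the GPU driver (stubbed here), and the key data
// of the resulting pipelines is serialized as json-style text into a target cache.
std::string CompileWithGpuDriver(const std::string& shaderSource) {
    // Placeholder for invoking the driving capability of the GPU; a real
    // implementation would return the driver-specific binary.
    return "<binary compiled from " + std::to_string(shaderSource.size()) + " source bytes>";
}

void CacheComputePipelines(const std::vector<std::string>& shadersToBeUsed,
                           const std::string& targetCachePath) {
    std::ofstream cache(targetCachePath);
    cache << "{ \"pipelines\": [\n";
    for (std::size_t i = 0; i < shadersToBeUsed.size(); ++i) {
        cache << "  { \"level\": " << i << ", \"binary\": \""
              << CompileWithGpuDriver(shadersToBeUsed[i]) << "\" }"
              << (i + 1 < shadersToBeUsed.size() ? ",\n" : "\n");
    }
    cache << "] }\n";
}
```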
In this embodiment, each calculation pipeline may be determined according to a target device parameter of a device to which the target neural network model belongs. The target device may be a device that performs calculation related to the target neural network model based on the GPU, and correspondingly, the target device parameter is attribute information of the GPU carried by the target device.
In an actual application process, the target device parameter includes an indirect buffering parameter (IndirectBuffer), which may be understood as information indicating whether the GPU supports the indirect buffering function. Specifically, when the GPU supports indirect buffering, the GPU can determine by itself the execution sequence of the loaded shaders to be used when processing the service data; when the GPU does not support indirect buffering, after one shader to be used finishes executing, the next shader to be used must be determined according to an instruction sent by the CPU.
It can be understood that when the GPU supports indirect caching, the overall heterogeneous computation process of the target neural network model can be completed without the CPU sending instructions, and when the GPU does not support indirect caching, multiple interactions with the CPU are also required in the process of performing the computation related to the target neural network model, and the shader to be used that needs to be executed next is determined based on each instruction sent by the CPU.
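To make the role of the indirect buffer concrete, the sketch below fills a buffer of per-level dispatch arguments that the GPU can consume on its own; the struct mirrors the three-component layout of Vulkan's VkDispatchIndirectCommand, and the workgroup counts are arbitrary example values rather than figures from the disclosure.

```cpp
#include <cstdint>
#include <vector>

// Illustration of what an indirect buffer holds: per-dispatch workgroup counts
// that the GPU reads by itself instead of waiting for a CPU command per level.
struct DispatchArgs {
    std::uint32_t x; // workgroup count in X
    std::uint32_t y; // workgroup count in Y
    std::uint32_t z; // workgroup count in Z
};

int main() {
    // One entry per network level; a scheduling shader running on the GPU can
    // walk this buffer and launch the shader to be used of each level in turn.
    std::vector<DispatchArgs> indirectBuffer = {
        {64, 1, 1},  // input layer
        {128, 1, 1}, // convolutional layer
        {32, 1, 1},  // pooling layer
        {16, 1, 1},  // fully-connected layer
    };
    return indirectBuffer.empty() ? 1 : 0;
}
```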
Based on this, the manner in which the CPU determines each computing pipeline differs when the target device parameters are different; it can be understood that, depending on whether the GPU supports indirect caching, the CPU has two corresponding manners to determine the computing pipeline corresponding to each shader to be used.
The first mode is that if the indirect buffering parameter of the device to which the target neural network model belongs is a first parameter, a scheduling shader corresponding to each shader to be used is generated; and compiling and processing the scheduling shader and the to-be-used shader to obtain the computing pipeline.
Specifically, when the indirect buffering parameter is the first parameter, it indicates that the GPU mounted on the target device supports indirect buffering, and therefore, a scheduling shader corresponding to each shader to be used may be generated. The scheduling shader may be understood as program code running in the GPU for controlling an execution sequence of each shader to be used. For example, when the target device parameter is determined to be the first buffer parameter, a corresponding scheduling shader may be generated for the input layer, the convolutional layer, the pooling layer, and the fully-connected layer in the CNN model, and when the related image data is subsequently processed, the GPU may determine an execution sequence of each layer in the CNN model based on the scheduling shader.
In this embodiment, after the scheduling shaders and the to-be-used shaders are generated, the generated shaders can be compiled based on the CPU, and the computing pipelines corresponding to the shaders are obtained. The computing pipelines include sub-computing pipelines corresponding to each shader to be used, and control pipelines corresponding to each sub-computing pipeline.
The second way is that if the indirect buffering parameter of the device to which the target neural network model belongs is a second parameter, the sub-computing pipelines corresponding to the shaders to be used are determined; based on each sub-computation pipeline, the computation pipeline is determined.
Specifically, when the indirect buffering parameter is the second parameter, it indicates that the GPU mounted on the target device does not support indirect buffering. Therefore, no scheduling shader corresponding to each shader to be used needs to be generated; instead, the sub-computing pipeline corresponding to each shader to be used is generated directly, and the determined sub-computing pipelines are then integrated to obtain the computing pipeline corresponding to the target neural network model. It can be understood that, when the indirect buffering parameter is the second parameter, the GPU cannot determine by itself the execution sequence of each shader to be used when processing service data, and therefore each shader to be used needs to be executed based on an instruction sent by the CPU.
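The branch between the two modes can be summarized by the following hedged sketch, in which the types and helper names are assumptions introduced only for illustration: one sub-computing pipeline is built per shader to be used in either case, and a control pipeline is added only when indirect buffering is supported.

```cpp
#include <string>
#include <vector>

// Sketch of the CPU-side branch on the indirect buffering parameter.
struct ComputePipeline {
    std::vector<std::string> subPipelines; // one per shader to be used
    std::string controlPipeline;           // non-empty only when indirect buffering is supported
};

ComputePipeline BuildComputePipeline(const std::vector<std::string>& shadersToBeUsed,
                                     bool supportsIndirectBuffer) {
    ComputePipeline pipeline;
    for (const std::string& shader : shadersToBeUsed) {
        // Both modes: one sub-computing pipeline per shader to be used.
        pipeline.subPipelines.push_back("subPipeline(" + shader + ")");
    }
    if (supportsIndirectBuffer) {
        // First mode only: a scheduling shader is additionally generated and
        // compiled into a control pipeline fixing the execution order on the GPU.
        pipeline.controlPipeline = "controlPipeline(schedulingShader)";
    }
    return pipeline;
}
```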
In this embodiment, after determining the calculation pipeline, the CPU may cache the neural network parameters and the calculation pipeline of the target neural network model, so as to call the neural network parameters and the calculation pipeline to process the data to be processed when the data to be processed is received. The data to be processed may be data related to a service associated with the target neural network model, and at the same time, the data to be processed is also an input of the target neural network model.
For example, after the CPU determines the neural network parameters of the CNN model, the sub-compute pipelines corresponding to each network level, and the control pipeline corresponding to the scheduling shader, the data may be stored in the target cache. After receiving the service data (i.e., the data to be processed) associated with the CNN model, the GPU of the target device may load the sub-computation pipeline and the control pipeline corresponding to the target neural network model based on the target cache to process the data to be processed, that is, under the scheduling of the control pipeline, the service data is first input to the input layer, after the execution of the shader to be used corresponding to the input layer is completed, the result output by the layer is sequentially input to the convolutional layer, the pooling layer, and the full-link layer under the scheduling of the control pipeline, and the shader to be used corresponding to each network layer is executed, so as to obtain the processing result corresponding to the data to be processed for the service.
According to the technical scheme of the embodiment of the disclosure, neural network parameters corresponding to a target neural network model are obtained; generating shaders to be used corresponding to each network level according to the neural network parameters, namely generating corresponding shaders to be used aiming at a plurality of network levels in the target neural network model; furthermore, according to target device parameters of a device to which the target neural network model belongs, a computing pipeline corresponding to a shader to be used is determined, when data to be processed is received, the computing pipeline is called according to the target device parameters to process the data to be processed to obtain a target processing result, the shader corresponding to each network level of the neural network model is generated through the CPU, coarse-grained calculation is performed based on each shader after the data to be processed is received, the limitation of GPU communication bandwidth on the model computing process is reduced in a mode of greatly reducing the interaction between the CPU and the GPU, and the performance of the neural network model is improved.
Example two
As an alternative embodiment of the foregoing embodiment, Fig. 2 is a schematic flow chart of a data processing method provided in the second embodiment of the disclosure. To clearly describe the technical solution of the present embodiment, an application scenario in which a shader program is dynamically generated based on a central processing unit and the target device processes the data to be processed when the data to be processed is received is described as an example; however, the solution is not limited to the above scenario and may be applied to various scenarios in which data related to the target neural network model needs to be processed.
Referring to fig. 2, in the process of building the neural network by the CPU, first, model parameters of a target neural network model need to be obtained, and the parameters may be determined based on configuration parameters input by a caller, for example, after the caller needing to develop a certain service inputs relevant configuration parameters of the CNN model on a target page, a processing module at the CPU end may obtain the parameters and use the parameters as the neural network parameters.
With continued reference to fig. 2, after the CPU determines the target neural network model and the corresponding neural network parameters, a shader to be used for neural network computation may be generated for each level of the model; further, whether the GPU of the target device supports the indirect cache is determined, when it is determined that the GPU supports the indirect cache, the CPU needs to generate scheduling shaders of each level, and call a driver of the GPU to compile the scheduling shaders and the shaders to be used, so as to obtain corresponding computing pipelines, where it can be understood that the computing pipelines include sub-computing pipelines corresponding to the shaders to be used, and control pipelines corresponding to the scheduling shaders. And when the GPU is determined not to support the indirect cache, the CPU directly generates the sub-computing pipelines corresponding to the shaders to be used.
With continued reference to fig. 2, after the CPU generates the calculation pipeline, the neural network parameters and the calculation pipeline may be cached, so that when the data to be processed is received, the associated GPU retrieves the data from the cache, and processes the data to be processed.
According to the technical scheme of the embodiment of the disclosure, the shaders corresponding to each network level of the neural network model are generated by the CPU, and coarse-grained calculation is performed based on the shaders after the data to be processed are received, so that the interaction between the CPU and the GPU is greatly reduced, the limitation of GPU communication bandwidth on the model calculation process is reduced, and the performance of the neural network model is improved.
Example three
Fig. 3 is a schematic flow chart of a data processing method provided by a third embodiment of the present disclosure. The embodiment of the present disclosure is applicable to the case where a graphics processor loads a computing pipeline from a cache to process data to be processed. The method may be executed by a data processing apparatus, where the apparatus may be implemented in the form of software and/or hardware, optionally by an electronic device, and the electronic device may be a mobile terminal, a PC terminal, a server, or the like.
As shown in fig. 3, the method includes:
s210, loading predetermined calculation pipeline and neural network parameters when the data to be processed is received.
In this embodiment, the computation pipeline is determined by the central processing unit based on the neural network parameters of the target neural network model and the target device parameters of the device to which the target neural network model belongs, and the CPU has already stored the neural network parameters and the computation pipeline of each target neural network model into the target cache. Therefore, when the target device receives data to be processed, the CPU is first required to load the determined data from the cache; further, the GPU installed in the device loads the target neural network model and each computation pipeline through an Application Programming Interface (API) for interacting with the CPU, that is, the model and the computation pipelines are deployed on the GPU to process the data to be processed.
For example, after the CPU determines a computation pipeline for a neural network parameter of the CNN model and a target device parameter of the target device (i.e., information indicating whether the target device supports the indirect cache), and stores the neural network parameter and the computation pipeline in the target cache in advance, when the target device receives data to be processed of a service associated with the CNN model, the CPU may load the CNN model and the corresponding computation pipelines, and simultaneously send a program execution instruction to the GPU, so that the model and the computation pipelines are deployed in the GPU by using the corresponding API interfaces, that is, the GPU loads the CNN model corresponding to the service and the computation pipelines corresponding to the input layer, the convolutional layer, the pooling layer, and the full connection layer, thereby performing heterogeneous computation corresponding to the network hierarchies.
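A minimal sketch of this device-side loading step follows, assuming the pipelines were serialized to a target cache file as in the earlier sketch; LoadCachedPipelines and DeployOnGpu are illustrative stand-ins rather than a particular vendor API.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Read back the serialized pipeline description from the target cache.
std::string LoadCachedPipelines(const std::string& targetCachePath) {
    std::ifstream cache(targetCachePath);
    std::ostringstream contents;
    contents << cache.rdbuf();       // read the serialized pipeline data as-is
    return contents.str();
}

void DeployOnGpu(const std::string& serializedPipelines) {
    // Placeholder for the API calls that hand the compiled pipelines to the GPU
    // so the model is deployed before the data to be processed reaches it.
    (void)serializedPipelines;
}
```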
S220, determining a target processing mode for processing the data to be processed by each shader to be used according to the target equipment parameters.
The to-be-used shader is obtained by processing the neural network parameters based on the central processing unit; that is, the CPU may process the neural network parameters according to the method described above in the present disclosure, so as to obtain the to-be-used shader corresponding to each network level of the target neural network model, which is not described herein again.
In this embodiment, when the target device parameters are different, the processing modes of the GPUs are also different. Optionally, if the target device parameter is a first parameter, determining that a target processing mode in which each shader to be used processes the data to be processed is a first target processing mode; and if the target equipment parameter is the second parameter, determining that the target processing mode is the second target processing mode.
It can be understood that when it is determined that the GPU mounted on the target device supports the indirect cache, the first target processing manner is determined so as to process the data to be processed based on each shader to be used, and when it is determined that the GPU mounted on the target device does not support the indirect cache, the second target processing manner is determined so as to process the data to be processed based on each shader to be used.
And S230, processing the data to be processed based on the target processing mode to obtain a target processing result.
In this embodiment, optionally, when the target processing manner is the first target processing manner, determining an execution order of each sub-computation pipeline based on the control pipeline corresponding to the computation pipeline; and sending a program execution instruction to the corresponding to-be-used shaders based on the execution sequence and the corresponding sub-computing pipelines, so that each to-be-used shader processes the to-be-processed data to obtain the target processing result.
For example, when the GPU mounted on the target device supports indirect caching, it may be determined that the target processing manner is the first target processing manner. Meanwhile, the CPU has already constructed corresponding sub-computation pipelines for each network level of the CNN model as the target neural network model, and control pipelines for controlling the execution sequence of each network level, and at the same time, stores the computation pipelines in the target cache, so that the GPU can directly load the control pipelines in the cache when processing data to be processed, thereby determining that the execution sequence is the input layer, the convolutional layer, the pooling layer, and the full connection layer. Further, based on the determined execution sequence, the GPU may send program execution instructions to the to-be-used shaders corresponding to each network level in sequence by using the input layer sub-computing pipeline, the convolution layer sub-computing pipeline, the pooling layer sub-computing pipeline, and the full connection layer sub-computing pipeline, so as to run each segment of programs to process the to-be-processed data.
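The first target processing mode might look like the following sketch, in which the execution order read from the control pipeline drives the dispatch of each sub-computing pipeline without further CPU instructions; DispatchSubPipeline is a stub, and on a real graphics API this step would correspond to an indirect dispatch such as Vulkan's vkCmdDispatchIndirect.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Stub standing in for launching one sub-computing pipeline on the GPU.
void DispatchSubPipeline(const std::string& subPipeline, const std::string& level) {
    std::printf("dispatching %s for level %s on the GPU\n",
                subPipeline.c_str(), level.c_str());
}

void RunWithControlPipeline(const std::vector<std::string>& executionOrder, // from the control pipeline
                            const std::vector<std::string>& subPipelines) {
    for (std::size_t i = 0; i < executionOrder.size() && i < subPipelines.size(); ++i) {
        // The control pipeline already fixes which level runs next, so no
        // per-level program execution instruction from the CPU is needed.
        DispatchSubPipeline(subPipelines[i], executionOrder[i]);
    }
}
```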
Optionally, receiving a program execution instruction sent based on the current sub-computing pipeline; and processing the data to be processed based on the program execution instruction to obtain a data processing result, and feeding the data processing result back to the central processing unit, so that when the central processing unit receives the data processing result, the next sub-computing pipeline of the current sub-computing pipeline is used as the current sub-computing pipeline, and the program execution instruction is sent based on the current sub-computing pipeline until the current sub-computing pipeline is the last sub-computing pipeline, so as to obtain a target processing result corresponding to the data to be processed.
Specifically, when the GPU mounted on the target device does not support indirect caching, it may be determined that the target processing manner is the second target processing manner. In this case, the CPU only constructs corresponding sub-computation pipelines for each network level of the target neural network model, and stores the computation pipelines in the target cache, so that, when processing data to be processed, the GPU cannot determine, based on the GPU itself, an execution sequence of shaders to be used associated with each network level, but needs to process the data to be processed according to program execution instructions sent by the CPU, and it can be understood that the program execution instructions are determined based on the CPU according to the sub-computation pipelines of each neural network level.
Exemplarily, when the CPU only constructs a corresponding sub-computation pipeline for each network level of the CNN model as the target neural network model and stores the computation pipeline in the target cache, and the GPU processes the data to be processed, it is impossible to determine an execution sequence of each shader to be used, and it is necessary to receive a program execution instruction sent by the CPU and analyze the instruction, thereby determining that a first shader to be used that needs to be executed is a shader to be used corresponding to the CNN input layer, and further, the corresponding shader to be used is loaded and executed based on the sub-computation pipeline corresponding to the input layer, so that the data to be processed is processed; meanwhile, the processing module at the CPU side may perform asynchronous query on the execution result of the GPU, or receive feedback information sent by the GPU according to the execution result, and after the CPU determines that the shader to be used, which is executed by the GPU and is associated with the input layer, is correct and obtains an effective data processing result, may determine, for the GPU, that the next network layer to be executed is the convolutional layer, and after taking the sub-computation pipeline corresponding to the convolutional layer as the current sub-computation pipeline, send the corresponding message to the GPU again in the form of a program execution instruction, and execute the shader to be used, which is corresponding to the CNN convolutional layer, based on the current sub-computation pipeline, to obtain the corresponding processing result again. Based on the mode, the GPU sequentially executes the shaders to be used corresponding to the pooling layer and the full connection layer, and obtains a final target processing result.
The above process may be understood as follows: after the CPU constructs corresponding sub-computation pipelines for the four network levels of the target neural network model, an instruction may be sent to the GPU so that the GPU executes the to-be-used shader of the first layer based on the sub-computation pipeline corresponding to the first layer; after execution is finished, the GPU sends a message indicating that the program execution is completed to the CPU, and after receiving the message, the CPU can send an instruction for executing the to-be-used shader of the second layer to the GPU, and so on, until the GPU completes the execution of all the to-be-used shaders of the four network levels. After the GPU obtains the target processing result, the CPU can directly read the target processing result, or the target processing result can be stored on the GPU side for subsequent invocation by other shaders to be used.
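By contrast with the first mode, the second target processing mode requires a per-level handshake, sketched below with stub functions standing in for the real CPU-GPU interaction; the names are assumptions, and the loop simply models "send instruction, wait for feedback, advance to the next sub-computing pipeline".

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Stub: the GPU executes one sub-computing pipeline and feeds the result back.
bool GpuExecuteLevel(const std::string& subPipeline) {
    std::printf("GPU executes %s and feeds the result back to the CPU\n",
                subPipeline.c_str());
    return true; // feedback: the data processing result of this level is valid
}

void CpuDriveExecution(const std::vector<std::string>& subPipelines) {
    for (std::size_t i = 0; i < subPipelines.size(); ++i) {
        // Program execution instruction for the current sub-computing pipeline.
        if (!GpuExecuteLevel(subPipelines[i])) {
            break; // stop if no valid data processing result was obtained
        }
        // On valid feedback the CPU takes the next sub-computing pipeline as
        // the current one and sends the next instruction.
    }
}
```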
According to the technical scheme of the embodiment of the disclosure, when data to be processed is received, predetermined calculation pipeline and neural network parameters are loaded, and according to target equipment parameters, a target processing mode for processing the data to be processed by using each shader is determined; and processing the data to be processed based on the target processing mode to obtain a target processing result, so that after the GPU receives the data to be processed, coarse-grained calculation is performed based on each shader in the cache, the interaction between the CPU and the GPU is greatly reduced, the limitation of GPU communication bandwidth to a model calculation process is reduced, and the performance of the neural network model is improved.
Example four
As an alternative embodiment of the foregoing embodiment, Fig. 4 is a schematic flow chart of a data processing method provided in a fourth embodiment of the present disclosure. To clearly describe the technical solution of the present embodiment, an application scenario in which the graphics processor loads the computing pipeline from the cache and processes the data to be processed is described as an example; however, the solution is not limited to the above scenario and may be applied to various scenarios in which data related to the target neural network model needs to be processed.
Referring to fig. 4, in the process of invoking the neural network by the GPU, the processing module at the CPU end first needs to load the already-constructed computation pipeline and the neural network parameters, and meanwhile, the processing module at the CPU end also needs to load the service data, and the GPU loads the computation pipeline based on the cache.
Continuing to refer to fig. 4, after the data loading is completed, it is determined whether the GPU supports indirect caching, and when the GPU supports indirect caching, the GPU end processing module may directly load the control pipeline into the indirect buffer, and determine an execution sequence of each network level of the target neural network model based on the control pipeline, and further execute corresponding shader programs to be used based on sub-computation pipelines corresponding to each network level in sequence according to the determined execution sequence, thereby completing scheduling of the whole flow of the target neural network model; when the GPU does not support indirect buffering, the CPU-side processing module needs to schedule each network level of the target neural network model, and the GPU can determine the execution order of each layer of the target neural network model under the control of the CPU, thereby executing the corresponding shader to be used according to the order given by the CPU.
With reference to fig. 4, when the GPU processes the data to be processed based on each shader program to be used, the processing module at the CPU end may determine whether the execution of the target neural network model is completed in an asynchronous query manner; when it is determined that the execution of the model is completed, the target processing result may be directly read, or the target processing result may be stored on the GPU side for use by other shaders to be used.
According to the technical scheme of the embodiment of the disclosure, after the GPU receives the data to be processed, coarse-grained calculation is carried out on the basis of all shaders in the cache, so that the interaction between the CPU and the GPU is greatly reduced, the limitation of GPU communication bandwidth to the model calculation process is reduced, and the performance of the neural network model is improved.
Example five
Fig. 5 is a schematic structural diagram of a data processing apparatus according to a fifth embodiment of the disclosure, and as shown in fig. 5, the apparatus includes: a network parameter determination module 310, a shader determination module 320, and a pipeline determination module 330.
A network parameter determination module 310, configured to obtain a neural network parameter corresponding to the target neural network model.
A shader determining module 320, configured to generate a to-be-used shader corresponding to each network hierarchy according to the neural network parameters; wherein the target neural network model comprises a plurality of network hierarchies.
A pipeline determining module 330, configured to determine, according to a target device parameter of a device to which the target neural network model belongs, a computing pipeline corresponding to the shader to be used, so that when the data to be processed is received, the computing pipeline is called to process the data to be processed according to the target device parameter, and a target processing result is obtained.
Optionally, the shader determining module 320 is further configured to invoke a shader generating module to process the neural network parameter, so as to obtain a to-be-used shader of each network level in the target neural network model.
On the basis of the above technical solutions, the target device parameter includes an indirect buffering parameter.
On the basis of the above technical solutions, the pipeline determining module 330 includes a scheduling shader generating unit and a calculating pipeline determining unit.
And the scheduling shader generating unit is used for generating scheduling shaders corresponding to the shaders to be used if the indirect buffering parameter of the device to which the target neural network model belongs is a first parameter.
A calculation pipeline determining unit, configured to obtain a calculation pipeline by compiling and processing the scheduling shader and the to-be-used shader; the computing pipeline comprises sub-computing pipelines corresponding to all shaders to be used and control pipelines corresponding to all the sub-computing pipelines.
On the basis of the above technical solutions, the pipeline determining module 330 further includes a sub-calculation pipeline determining unit.
And the sub-computing pipeline determining unit is used for determining the sub-computing pipelines corresponding to the shaders to be used if the indirect buffering parameter of the device to which the target neural network model belongs is a second parameter.
A calculation pipeline determination unit, further configured to determine the calculation pipeline based on each sub-calculation pipeline.
According to the technical scheme provided by the embodiment, neural network parameters corresponding to a target neural network model are obtained; generating shaders to be used corresponding to each network level according to the neural network parameters, namely generating corresponding shaders to be used aiming at a plurality of network levels in the target neural network model; furthermore, according to target device parameters of a device to which the target neural network model belongs, a computing pipeline corresponding to a shader to be used is determined, when data to be processed is received, the computing pipeline is called according to the target device parameters to process the data to be processed to obtain a target processing result, the shader corresponding to each network level of the neural network model is generated through the CPU, coarse-grained calculation is performed based on each shader after the data to be processed is received, the limitation of GPU communication bandwidth on the model computing process is reduced in a mode of greatly reducing the interaction between the CPU and the GPU, and the performance of the neural network model is improved.
The data processing device provided by the embodiment of the disclosure can execute the data processing method provided by any embodiment of the disclosure, and has corresponding functional modules and beneficial effects of the execution method.
It should be noted that, the units and modules included in the apparatus are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the embodiments of the present disclosure.
Example six
Fig. 6 is a schematic structural diagram of a data processing apparatus according to a sixth embodiment of the present disclosure, and as shown in fig. 6, the apparatus includes: a network parameter loading module 410, a processing mode determining module 420 and a processing result determining module 430.
A network parameter loading module 410, configured to load a predetermined calculation pipeline and neural network parameters when data to be processed is received; wherein the calculation pipeline is determined by a central processing unit based on the neural network parameters of the target neural network model and the target device parameters of the device to which the target neural network model belongs.
A processing mode determining module 420, configured to determine, according to the target device parameter, a target processing mode for processing the data to be processed by each shader to be used; and the shader to be used is obtained by processing the neural network parameters based on a central processing unit.
And a processing result determining module 430, configured to process the to-be-processed data based on the target processing manner to obtain a target processing result.
Optionally, the processing mode determining module 420 is further configured to determine, if the target device parameter is a first parameter, that a target processing mode in which each shader to be used processes the data to be processed is a first target processing mode; and if the target equipment parameter is the second parameter, determining that the target processing mode is the second target processing mode.
Optionally, the processing result determining module 430 is further configured to determine an execution order of each sub-computation pipeline based on the control pipeline corresponding to the computation pipeline; and sending a program execution instruction to the corresponding to-be-used shaders based on the execution sequence and the corresponding sub-computing pipelines, so that each to-be-used shader processes the to-be-processed data to obtain the target processing result.
Optionally, the processing result determining module 430 is further configured to receive a program execution instruction sent based on the current sub-computing pipeline; wherein the program execution instruction is determined by the central processor according to the sub-computing pipelines of each neural network hierarchy; and processing the data to be processed based on the program execution instruction to obtain a data processing result, and feeding back the data processing result to the central processing unit, so that when the central processing unit receives the data processing result, the next sub-computing pipeline of the current sub-computing pipeline is used as the current sub-computing pipeline, and the program execution instruction is sent based on the current sub-computing pipeline until the current sub-computing pipeline is the last sub-computing pipeline, so as to obtain a target processing result corresponding to the data to be processed.
According to the technical scheme provided by the embodiment, when data to be processed is received, predetermined calculation pipeline and neural network parameters are loaded, and a target processing mode for processing the data to be processed by using each shader is determined according to target equipment parameters; and processing the data to be processed based on the target processing mode to obtain a target processing result, so that after the GPU receives the data to be processed, coarse-grained calculation is performed based on each shader in the cache, the interaction between the CPU and the GPU is greatly reduced, the limitation of GPU communication bandwidth to a model calculation process is reduced, and the performance of the neural network model is improved.
The data processing device provided by the embodiment of the disclosure can execute the data processing method provided by any embodiment of the disclosure, and has corresponding functional modules and beneficial effects of the execution method.
It should be noted that, the units and modules included in the apparatus are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the embodiments of the present disclosure.
Example seven
Fig. 7 is a schematic structural diagram of an electronic device according to a seventh embodiment of the disclosure. Referring now to fig. 7, a schematic diagram of an electronic device (e.g., the terminal device or the server in fig. 7) 500 suitable for implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 7, electronic device 500 may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 506 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, and the like; output devices 507 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; storage devices 506 including, for example, a magnetic tape, a hard disk, and the like; and a communication device 509. The communication device 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. While Fig. 7 illustrates an electronic device 500 having various devices, it is to be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 506, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The electronic device provided by this embodiment of the present disclosure belongs to the same inventive concept as the data processing method provided by the above embodiments; for technical details not described in detail in this embodiment, reference may be made to the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.
Example eight
The embodiments of the present disclosure provide a computer storage medium having a computer program stored thereon; when executed by a processor, the program implements the data processing method provided by the above embodiments.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication (e.g., a communication network) in any form or medium. Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
acquiring neural network parameters corresponding to the target neural network model;
generating shaders to be used corresponding to each network level according to the neural network parameters; wherein the target neural network model comprises a plurality of network hierarchies;
and determining a computing pipeline corresponding to the shader to be used according to target equipment parameters of the equipment to which the target neural network model belongs, and calling the computing pipeline to process the data to be processed according to the target equipment parameters when the data to be processed is received to obtain a target processing result.
Or, alternatively:
loading a predetermined calculation pipeline and neural network parameters when data to be processed is received; wherein the calculation pipeline is determined by a central processing unit based on the neural network parameters of the target neural network model and the target device parameters of the device to which the target neural network model belongs;
determining, according to the target device parameters, a target processing mode in which each shader to be used processes the data to be processed; wherein the shader to be used is obtained after the neural network parameters are processed by the central processing unit;
and processing the data to be processed based on the target processing mode to obtain a target processing result.
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including but not limited to an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, [ example one ] there is provided a data processing method applied in a central processing unit, the method including:
acquiring neural network parameters corresponding to the target neural network model;
generating shaders to be used corresponding to each network level according to the neural network parameters; wherein the target neural network model comprises a plurality of network hierarchies;
and determining a computing pipeline corresponding to the shader to be used according to target equipment parameters of the equipment to which the target neural network model belongs, and calling the computing pipeline to process the data to be processed according to the target equipment parameters when the data to be processed is received to obtain a target processing result.
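A minimal sketch of this central-processor-side flow follows; Layer, generate_shader_source, build_compute_pipeline and prepare_model are all assumed placeholder names, and the driver-level shader compilation is stubbed out rather than performed.

```python
# Hypothetical sketch of the CPU-side preparation flow; none of these names are
# defined by this disclosure, and the driver-level compilation is only stubbed.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Layer:
    name: str                                            # one network level, e.g. "fc1"
    weights: List[float] = field(default_factory=list)   # neural network parameters

def generate_shader_source(layer: Layer) -> str:
    # Stub: a real shader generating module would emit compute-shader code here.
    return f"// compute shader for network level {layer.name}"

def build_compute_pipeline(shader_sources: List[str], indirect_buffer_supported: bool) -> dict:
    # Stub: a real implementation would compile the shaders through the GPU driver.
    return {"shaders": shader_sources, "indirect": indirect_buffer_supported}

def prepare_model(layers: List[Layer], indirect_buffer_supported: bool) -> dict:
    """Acquire the parameters, generate one shader to be used per network level,
    then build the computing pipeline according to the target device parameter."""
    sources = [generate_shader_source(layer) for layer in layers]
    return build_compute_pipeline(sources, indirect_buffer_supported)

pipeline = prepare_model([Layer("fc1"), Layer("fc2")], indirect_buffer_supported=True)
```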
According to one or more embodiments of the present disclosure, [ example two ] there is provided a data processing method, further comprising:
optionally, the generating a to-be-used shader corresponding to each network hierarchy according to the neural network parameter includes:
and calling a shader generating module to process the neural network parameters to obtain a shader to be used of each network level in the target neural network model.
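As an illustrative sketch, such a shader generating module could emit one compute-shader source string per network level with that level's dimensions baked into the source. GLSL and the fully-connected layer below are assumptions for concreteness only; the disclosure does not prescribe a shader language or a layer type.

```python
# Assumed example: emit GLSL compute-shader source for one fully-connected level.
def generate_fc_shader(level_index: int, in_size: int, out_size: int) -> str:
    """Return shader source for one network level, with its sizes as constants."""
    return f"""
#version 450
layout(local_size_x = 64) in;
layout(std430, binding = 0) readonly  buffer InBuf  {{ float in_data[];  }};
layout(std430, binding = 1) readonly  buffer Weight {{ float weights[];  }};
layout(std430, binding = 2) writeonly buffer OutBuf {{ float out_data[]; }};
void main() {{
    uint o = gl_GlobalInvocationID.x;            // one invocation per output neuron
    if (o >= {out_size}u) return;
    float acc = 0.0;
    for (uint i = 0u; i < {in_size}u; ++i)
        acc += in_data[i] * weights[o * {in_size}u + i];
    out_data[o] = max(acc, 0.0);                 // ReLU for level {level_index}
}}"""

# One shader to be used per network level of an assumed three-level model:
shaders_to_use = [generate_fc_shader(i, 256, 256) for i in range(3)]
```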
According to one or more embodiments of the present disclosure, [ example three ] there is provided a data processing method, further comprising:
optionally, the determining, according to the target device parameter of the device to which the target neural network model belongs, a calculation pipeline corresponding to the to-be-used shader includes:
if the indirect buffering parameter of the device to which the target neural network model belongs is a first parameter, generating scheduling shaders corresponding to the shaders to be used;
compiling and processing the scheduling shader and the to-be-used shader to obtain a computing pipeline;
the computing pipeline comprises sub-computing pipelines corresponding to all shaders to be used and control pipelines corresponding to all the sub-computing pipelines.
According to one or more embodiments of the present disclosure, [ example four ] there is provided a data processing method, further comprising:
optionally, the determining, according to the target device parameter of the device to which the target neural network model belongs, a calculation pipeline corresponding to the to-be-used shader includes:
if the indirect buffering parameter of the device to which the target neural network model belongs is a second parameter, determining a sub-computing pipeline corresponding to each shader to be used;
based on each sub-computation pipeline, the computation pipeline is determined.
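Examples three and four can be sketched together as below, assuming a boolean device parameter indicating indirect-buffer support; compile_compute_pipeline and make_scheduling_shader are hypothetical stubs rather than calls into any real GPU API.

```python
# Hypothetical stubs; a real implementation would go through the GPU driver.
def make_scheduling_shader(shader_sources):
    # Stub for a scheduling shader that would drive every sub-computing pipeline.
    return "// scheduling shader controlling " + str(len(shader_sources)) + " levels"

def compile_compute_pipeline(source):
    # Stub for compiling one shader into one (sub-)computing pipeline.
    return {"source": source}

def build_computing_pipeline(shader_sources, indirect_buffer_supported: bool):
    sub_pipelines = [compile_compute_pipeline(s) for s in shader_sources]
    if indirect_buffer_supported:
        # First parameter: also build a control pipeline from a scheduling shader,
        # so the GPU can chain the sub-pipelines itself without CPU round trips.
        control = compile_compute_pipeline(make_scheduling_shader(shader_sources))
        return {"sub_pipelines": sub_pipelines, "control_pipeline": control}
    # Second parameter: sub-pipelines only; the CPU drives them level by level.
    return {"sub_pipelines": sub_pipelines, "control_pipeline": None}
```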
According to one or more embodiments of the present disclosure, [ example five ] there is provided a data processing method, further comprising:
optionally, the neural network parameters and the calculation pipelines of the target neural network model are cached, so that when the data to be processed is received, the neural network parameters and the calculation pipelines are called to process the data to be processed.
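A minimal illustration of this caching step, assuming the model identifier is used as the cache key (the key choice is an assumption, not part of the disclosure):

```python
# Build the computing pipeline and parameters once, then reuse them for later inputs.
_pipeline_cache = {}

def get_or_build_pipeline(model_id, build_fn):
    """Return the cached computing pipeline for model_id, building it on first use."""
    if model_id not in _pipeline_cache:
        _pipeline_cache[model_id] = build_fn()
    return _pipeline_cache[model_id]
```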
According to one or more embodiments of the present disclosure, [ example six ] there is provided a data processing method applied in a graphics processor, the method including:
loading a predetermined calculation pipeline and neural network parameters when data to be processed is received; wherein the calculation pipeline is determined by a central processing unit based on the neural network parameters of the target neural network model and the target device parameters of the device to which the target neural network model belongs;
determining, according to the target device parameters, a target processing mode in which each shader to be used processes the data to be processed; wherein the shader to be used is obtained after the neural network parameters are processed by the central processing unit;
and processing the data to be processed based on the target processing mode to obtain a target processing result.
According to one or more embodiments of the present disclosure, [ example seven ] there is provided a data processing method, the method further comprising:
optionally, the determining, according to the target device parameter, a target processing manner in which each shader to be used processes data to be processed includes:
if the target equipment parameter is a first parameter, determining that a target processing mode for processing the data to be processed by each shader to be used is a first target processing mode;
and if the target equipment parameter is the second parameter, determining that the target processing mode is the second target processing mode.
According to one or more embodiments of the present disclosure, [ example eight ] there is provided a data processing method, further comprising:
optionally, the target processing manner is a first target processing manner, and the processing the to-be-processed data based on the target processing manner to obtain a target processing result includes:
determining an execution order of each sub-compute pipeline based on a control pipeline corresponding to the compute pipeline;
and sending a program execution instruction to the corresponding to-be-used shaders based on the execution sequence and the corresponding sub-computing pipelines, so that each to-be-used shader processes the to-be-processed data to obtain the target processing result.
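A rough sketch of this first target processing mode is given below; record_dispatch and submit are hypothetical callables standing in for command recording and a single submission on whatever GPU API the device exposes.

```python
# Hypothetical sketch: all sub-computing pipelines are recorded once, in the
# order taken from the control pipeline, and submitted to the GPU together,
# so no per-level CPU interaction is required.
def run_with_control_pipeline(sub_pipelines, execution_order, data, record_dispatch, submit):
    commands = [record_dispatch(sub_pipelines[i], data) for i in execution_order]
    return submit(commands)                    # single CPU -> GPU submission
```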
According to one or more embodiments of the present disclosure, [ example nine ] there is provided a data processing method, further comprising:
optionally, the processing the data to be processed based on the target processing manner to obtain a target processing result includes:
receiving a program execution instruction sent based on a current sub-compute pipeline; wherein the program execution instructions are determined based on the central processor from sub-compute pipelines of each neural network hierarchy;
and processing the data to be processed based on the program execution instruction to obtain a data processing result, and feeding back the data processing result to the central processing unit, so that when the central processing unit receives the data processing result, the next sub-computing pipeline of the current sub-computing pipeline is used as the current sub-computing pipeline, and the program execution instruction is sent based on the current sub-computing pipeline until the current sub-computing pipeline is the last sub-computing pipeline, so as to obtain a target processing result corresponding to the data to be processed.
According to one or more embodiments of the present disclosure, [ example ten ] there is provided a data processing apparatus, configured in a central processor, the apparatus comprising:
the network parameter determining module is used for acquiring neural network parameters corresponding to the target neural network model;
the shader determining module is used for generating shaders to be used corresponding to each network level according to the neural network parameters; wherein the target neural network model comprises a plurality of network hierarchies;
and the pipeline determining module is used for determining a computing pipeline corresponding to the shader to be used according to the target equipment parameter of the equipment to which the target neural network model belongs, and calling the computing pipeline to process the data to be processed according to the target equipment parameter when the data to be processed is received to obtain a target processing result.
According to one or more embodiments of the present disclosure, [ example eleven ] there is provided a data processing apparatus configured in a graphics processor, the apparatus including:
the network parameter loading module is used for loading a predetermined calculation pipeline and neural network parameters when data to be processed is received; wherein the calculation pipeline is determined by a central processing unit based on the neural network parameters of the target neural network model and the target device parameters of the device to which the target neural network model belongs;
the processing mode determining module is used for determining, according to the target device parameters, a target processing mode in which each shader to be used processes the data to be processed; wherein the shader to be used is obtained after the neural network parameters are processed by the central processing unit;
and the processing result determining module is used for processing the data to be processed based on the target processing mode to obtain a target processing result.
The foregoing description is merely a description of the preferred embodiments of the present disclosure and the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features with similar functions disclosed in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
Claims (13)
1. A data processing method is applied to a central processing unit and comprises the following steps:
acquiring neural network parameters corresponding to the target neural network model;
generating shaders to be used corresponding to each network level according to the neural network parameters; wherein the target neural network model comprises a plurality of network hierarchies;
and determining a computing pipeline corresponding to the shader to be used according to target equipment parameters of the equipment to which the target neural network model belongs, and calling the computing pipeline to process the data to be processed according to the target equipment parameters when the data to be processed is received to obtain a target processing result.
2. The method of claim 1, wherein generating a to-be-used shader corresponding to each network level according to the neural network parameters comprises:
and calling a shader generating module to process the neural network parameters to obtain a shader to be used of each network level in the target neural network model.
3. The method of claim 1, wherein the target device parameter comprises an indirect buffering parameter, and wherein determining the computing pipeline corresponding to the shader to be used according to the target device parameter of the device to which the target neural network model belongs comprises:
if the indirect buffering parameter of the device to which the target neural network model belongs is a first parameter, generating scheduling shaders corresponding to the shaders to be used;
compiling and processing the scheduling shader and the to-be-used shader to obtain a computing pipeline;
the computing pipeline comprises sub-computing pipelines corresponding to all shaders to be used and control pipelines corresponding to all the sub-computing pipelines.
4. The method of claim 1, wherein the target device parameters comprise indirect buffer parameters, and wherein determining the computing pipeline corresponding to the shader to be used according to the target device parameters of the device to which the target neural network model belongs comprises:
if the indirect buffering parameter of the device to which the target neural network model belongs is a second parameter, determining a sub-computing pipeline corresponding to each shader to be used;
based on each sub-computation pipeline, the computation pipeline is determined.
5. The method of claim 1, further comprising:
caching the neural network parameters and the calculation pipelines of the target neural network model so as to call the neural network parameters and the calculation pipelines to process the data to be processed when the data to be processed is received.
6. A data processing method applied to a graphics processor includes:
loading a predetermined calculation pipeline and neural network parameters when data to be processed is received; wherein the calculation pipeline is determined by a central processing unit based on the neural network parameters of a target neural network model and the target equipment parameters of the target neural network model;
determining a target processing mode for processing the data to be processed by each shader to be used according to the target equipment parameters; the shader to be used is obtained after the neural network parameters are processed based on a central processing unit;
and processing the data to be processed based on the target processing mode to obtain a target processing result.
7. The method of claim 6, wherein determining the target processing mode for processing the data to be processed by each shader according to the target device parameters comprises:
if the target equipment parameter is a first parameter, determining that a target processing mode for processing the data to be processed by each shader to be used is a first target processing mode;
and if the target equipment parameter is the second parameter, determining that the target processing mode is the second target processing mode.
8. The method according to claim 7, wherein the target processing manner is a first target processing manner, and the processing the data to be processed based on the target processing manner to obtain a target processing result includes:
determining an execution order of each sub-compute pipeline based on a control pipeline corresponding to the compute pipeline;
and sending a program execution instruction to the corresponding to-be-used shaders based on the execution sequence and the corresponding sub-computing pipelines, so that each to-be-used shader processes the to-be-processed data to obtain the target processing result.
9. The method according to claim 8, wherein the processing the data to be processed based on the target processing manner to obtain a target processing result comprises:
receiving a program execution instruction sent based on a current sub-compute pipeline; wherein the program execution instructions are determined based on the central processor from sub-compute pipelines of each neural network hierarchy;
and processing the data to be processed based on the program execution instruction to obtain a data processing result, and feeding back the data processing result to the central processing unit, so that when the central processing unit receives the data processing result, the next sub-computing pipeline of the current sub-computing pipeline is used as the current sub-computing pipeline, and the program execution instruction is sent based on the current sub-computing pipeline until the current sub-computing pipeline is the last sub-computing pipeline, so as to obtain a target processing result corresponding to the data to be processed.
10. A data processing apparatus, configured in a central processing unit, comprising:
the network parameter determining module is used for acquiring neural network parameters corresponding to the target neural network model;
the shader determining module is used for generating shaders to be used corresponding to each network level according to the neural network parameters; wherein the target neural network model comprises a plurality of network hierarchies;
and the pipeline determining module is used for determining a computing pipeline corresponding to the shader to be used according to the target equipment parameter of the equipment to which the target neural network model belongs, and calling the computing pipeline to process the data to be processed according to the target equipment parameter when the data to be processed is received to obtain a target processing result.
11. A data processing apparatus, configured in a graphics processor, comprising:
the network parameter loading module is used for loading a predetermined calculation pipeline and neural network parameters when data to be processed is received; wherein the calculation pipeline is determined by a central processing unit based on the neural network parameters of a target neural network model and the target equipment parameters of the target neural network model;
the processing mode determining module is used for determining a target processing mode for processing the data to be processed by each shader to be used according to the target equipment parameters; the shader to be used is obtained after the neural network parameters are processed based on a central processing unit;
and the processing result determining module is used for processing the data to be processed based on the target processing mode to obtain a target processing result.
12. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement a data processing method as claimed in any one of claims 1-5 or 6-9.
13. A storage medium containing computer-executable instructions for performing the data processing method of any one of claims 1-5 or 6-9 when executed by a computer processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111644074.5A CN114330689A (en) | 2021-12-29 | 2021-12-29 | Data processing method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111644074.5A CN114330689A (en) | 2021-12-29 | 2021-12-29 | Data processing method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114330689A true CN114330689A (en) | 2022-04-12 |
Family
ID=81016671
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111644074.5A Pending CN114330689A (en) | 2021-12-29 | 2021-12-29 | Data processing method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114330689A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115934768A (en) * | 2022-12-01 | 2023-04-07 | 摩尔线程智能科技(北京)有限责任公司 | Data processing method, display adapter, electronic device and storage medium |
CN116756444A (en) * | 2023-06-14 | 2023-09-15 | 北京百度网讯科技有限公司 | Image processing method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20210216875A1 (en) | Method and apparatus for training deep learning model | |
WO2022151966A1 (en) | Processing method and apparatus for language model, text generation method and apparatus, and medium | |
CN114330689A (en) | Data processing method and device, electronic equipment and storage medium | |
CN114020470A (en) | Resource allocation method, device, readable medium and electronic equipment | |
CN110909527B (en) | Text processing model running method and device, electronic equipment and storage medium | |
CN114625536A (en) | Video memory allocation method, device, medium and electronic equipment | |
CN109598344B (en) | Model generation method and device | |
CN110489219B (en) | Method, device, medium and electronic equipment for scheduling functional objects | |
CN112416303A (en) | Software development kit thermal restoration method and device and electronic equipment | |
CN113064704B (en) | Task processing method, device, electronic equipment and computer readable medium | |
CN113988992B (en) | Order information sending method, order information sending device, electronic equipment and computer readable medium | |
CN116360971A (en) | Processing method, device, equipment and medium based on heterogeneous computing framework | |
CN111459893B (en) | File processing method and device and electronic equipment | |
CN112148448A (en) | Resource allocation method, device, equipment and computer readable medium | |
CN111580890A (en) | Method, apparatus, electronic device, and computer-readable medium for processing features | |
CN111309323A (en) | Parameter initialization method and device and electronic equipment | |
CN116306781A (en) | Data processing method and device based on neural network model and electronic equipment | |
CN115759260B (en) | Reasoning method and device of deep learning model, electronic equipment and storage medium | |
CN115993942B (en) | Data caching method, device, electronic equipment and computer readable medium | |
CN117170986B (en) | Chip consistency processing system, method, device, equipment and medium thereof | |
CN117113727B (en) | Interactive numerical simulation equipment configuration method and device and electronic equipment | |
CN115565607B (en) | Method, device, readable medium and electronic equipment for determining protein information | |
CN114647472B (en) | Picture processing method, apparatus, device, storage medium, and program product | |
CN112862110B (en) | Model generation method and device and electronic equipment | |
CN115705193A (en) | Distributed compiling method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |