WO2021120578A1 - Neural network forward calculation method and apparatus, and computer-readable storage medium - Google Patents

Neural network forward calculation method and apparatus, and computer-readable storage medium

Info

Publication number
WO2021120578A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
texture
storage structure
texture storage
channels
Prior art date
Application number
PCT/CN2020/098799
Other languages
English (en)
French (fr)
Inventor
刘文然
陈洪锐
李昊沅
陈其锋
李峰
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Publication of WO2021120578A1 publication Critical patent/WO2021120578A1/zh
Priority to US17/507,127 priority Critical patent/US20220044104A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G06T 1/60 Memory management
    • G06T 7/00 Image analysis
    • G06T 7/40 Analysis of texture
    • G06T 7/90 Determination of colour characteristics

Definitions

  • This application relates to the field of computer technology, and in particular to a neural network forward calculation method, apparatus, electronic device, and computer-readable storage medium.
  • The neural network forward algorithm needs to be implemented with GPU (Graphics Processing Unit) computation on different platforms such as mobile devices and PCs, and different platforms use different computing libraries or graphics libraries. On the mobile side, APIs such as OpenCL and OpenGL are usually used. On PCs running the Windows operating system, because of the broad availability of Direct3D, the Direct3D graphics library can be used to implement the neural network forward algorithm.
  • The forward calculation of a neural network mainly consists of the computation of each network layer on the GPU: the input data and weights of each layer are uploaded to the GPU, and the result is computed on the GPU.
  • the purpose of this application is to solve at least one of the above-mentioned technical defects.
  • the technical solutions provided by the embodiments of this application are as follows.
  • The embodiment of the present application provides a forward calculation method of a neural network, in which at least one data processing layer in the neural network performs data processing as follows:
  • the input data is stored in a first texture storage structure to obtain first texture data;
  • the weight data is stored in a second texture storage structure to obtain second texture data;
  • multiple data elements in the input data correspond to the same index in the first texture data;
  • multiple data elements in the weight data correspond to the same index in the second texture data;
  • the computing device accesses data in the first texture data and the second texture data using the index as a unit.
  • the embodiment of the application provides a forward computing device of a neural network.
  • The device is used to perform data processing for at least one data processing layer in the neural network, and includes:
  • the data acquisition module is used to acquire the input data and weight data of the data processing layer
  • the data storage module is used to store the input data in the first texture storage structure to obtain the first texture data, and to store the weight data in the second texture storage structure to obtain the second texture data, wherein multiple data elements in the input data correspond to the same index in the first texture data, multiple data elements in the weight data correspond to the same index in the second texture data, and the computing device accesses data in the first texture data and the second texture data using the index as a unit;
  • the data processing module is used to perform data processing of the data processing layer based on the first texture data and the second texture data to obtain output data of the data processing layer.
  • an embodiment of the present application provides an electronic device, including a memory and a processor
  • a computer program is stored in the memory
  • the processor is configured to execute a computer program to implement the method provided in the embodiment of the first aspect or any optional embodiment of the first aspect.
  • An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the method provided in the embodiment of the first aspect or any optional embodiment of the first aspect is implemented.
  • The input data and weight data are stored in corresponding texture storage structures. Because the index of a texture storage structure is simple and convenient and its data storage capacity is large, this saves the time the data processing layer spends reading and storing data during data processing, and greatly improves the forward calculation efficiency of the neural network.
  • FIG. 1 is a schematic diagram of forward calculation of a neural network in the prior art.
  • FIG. 2A is a schematic diagram of an application scenario of a neural network forward calculation method according to an embodiment of the application.
  • FIG. 2B is a schematic flowchart of a neural network forward calculation method according to an embodiment of the application.
  • FIG. 3 is a schematic diagram of a data processing process of a data processing layer in an example of an embodiment of the application
  • FIG. 4 is a schematic diagram of an input data storage process in an example of an embodiment of the application.
  • FIG. 5 is a schematic diagram of a weight data storage process in an example of an embodiment of the application.
  • FIG. 6 is a schematic diagram of the effect of the background blur function in an example of an embodiment of the application.
  • FIG. 7 is a schematic diagram of the effect of the gesture recognition function in an example of an embodiment of the application.
  • FIG. 8 is a schematic diagram of the effect of the image saliency recognition function in an example of an embodiment of the application.
  • FIG. 9 is a structural block diagram of a neural network forward computing device according to an embodiment of the application.
  • FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the application.
  • Artificial Intelligence (AI) uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology of computer science, which attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
  • Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Computer Vision is a science that studies how to make machines "see". More specifically, it refers to using cameras and computers instead of human eyes to identify, track, and measure targets, and to further process the resulting images so that they become more suitable for human observation or for transmission to instruments for detection.
  • Computer vision studies related theories and technologies trying to establish an artificial intelligence system that can obtain information from images or multi-dimensional data.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also includes common biometric recognition technologies such as face recognition and fingerprint recognition.
  • Key speech technologies include automatic speech recognition (ASR), text-to-speech (TTS), and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the future direction of human-computer interaction, and voice has become one of the most promising human-computer interaction methods.
  • Natural Language Processing (NLP) is an important direction in the field of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, that is, the language people use daily, so it is closely related to the study of linguistics. Natural language processing technology usually includes text processing, semantic understanding, machine translation, question answering, knowledge graphs, and other technologies.
  • Machine Learning is a multi-field interdisciplinary subject, involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specializes in studying how computers simulate or implement human learning behaviors in order to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their own performance.
  • Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstrations.
  • Autonomous driving technology usually includes high-precision maps, environment perception, behavior decision-making, path planning, motion control, and other technologies, and has a wide range of application prospects.
  • Artificial intelligence technology has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robotics, intelligent medical care, and intelligent customer service. With the development of technology, artificial intelligence will be applied in more fields and deliver increasingly important value.
  • Existing technical solutions usually use a Buffer (cache) structure as the data storage method: the input and weights of each network layer are uploaded to Buffers in the GPU (corresponding to the layer input Buffer and layer weight Buffer in the figure), the calculation is performed in the GPU using these Buffers, and finally the CPU (Central Processing Unit) reads the Buffer in the GPU to obtain the calculation result (corresponding to the result Buffer in the figure).
  • Because the storage of a Buffer in memory is linear, it takes the GPU a long time to read and store data in this structure, resulting in low forward calculation efficiency.
  • the embodiments of the present application provide a forward calculation method of a neural network.
  • FIG. 2A is a schematic diagram of an application scenario of a neural network forward calculation method according to an embodiment of the application.
  • the server 10 communicates with the terminal devices 30, 40, and 50 through the network 20.
  • the method of each embodiment can be executed by a computing device.
  • The computing device may be, for example, the server 10 shown in FIG. 2A, or the terminal devices 30, 40, 50, etc.
  • the server 10 may be an independent physical server device, or may be a physical server in a server cluster.
  • The terminal devices 30, 40, 50 may be PCs, notebook computers, tablet computers, smart phones, smart TVs, game consoles, and so on.
  • FIG. 2B is a schematic flowchart of a method for at least one data processing layer in the neural network to perform data processing; the execution subject of the method may be a GPU in a computing device. As shown in FIG. 2B, the method may include the following steps.
  • Step S201: Obtain input data and weight data of the data processing layer.
  • the data processing layer is generally a hidden layer of a neural network.
  • a neural network can contain multiple data processing layers.
  • the forward calculation process of the neural network is the process of data processing by each data processing layer in the neural network.
  • the data processing of each data processing layer can also be understood as the forward calculation process of the data processing layer, and the forward calculation process of each data processing layer is each input data of the data processing layer and its corresponding weight The process of performing corresponding operations on the data to obtain the corresponding output data.
  • Step S202: The input data is stored in a first texture storage structure to obtain first texture data, and the weight data is stored in a second texture storage structure to obtain second texture data.
  • multiple data elements in the input data correspond to the same index in the first texture data
  • multiple data elements in the weight data correspond to the same index in the second texture data
  • the computing device uses the index as a unit to access data in the first texture data and the second texture data.
  • the texture storage structure is a structured storage form.
  • the shader in the GPU can read data from it, or write data to it.
  • the data storage mode of the texture storage structure includes two-dimensional (2D) texture storage structure, three-dimensional (3D) texture storage structure, two-dimensional texture storage structure array, etc.
  • the basic unit in each texture storage structure can be called a texel, and each texel can contain multiple channels, such as R single channel, RGB three channels, RGBA four channels, and so on.
  • the texture storage structure mainly has the following two characteristics.
  • A texture storage structure usually uses texture coordinates to map onto a surface, and the texture coordinates can serve as an index of the data stored in the texture storage structure. The GPU reads data from, and stores data into, the texture storage structure based on the texture coordinates; because indexing in the texture storage structure is simple and convenient, the GPU reads and stores data faster than with the Buffer storage structure.
  • one texel of the texture storage structure can contain multiple channels, so one texel can store multiple data.
  • Each time the GPU reads one texel, it can read the data of all channels in that texel, and each time it stores data it can fill all channels of a texel at once. The GPU therefore reads a larger amount of data from the texture storage structure in each access, and stores a larger amount per access as well. Because the GPU has a large data throughput when storing to or reading from the texture storage structure, its data reading and storage speed is faster than with the Buffer storage structure.
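  • As a minimal illustrative sketch (not the patent's implementation; the function name `pack_rgba` and the zero padding are assumptions), the throughput benefit described above can be seen by grouping a flat data array into four-channel texels, so that a single texel access returns four data elements:

```python
import math

def pack_rgba(data):
    """Group a flat list into 4-channel (RGBA-like) texels, padding the tail with 0.0."""
    n_texels = math.ceil(len(data) / 4)
    padded = data + [0.0] * (n_texels * 4 - len(data))
    return [tuple(padded[i * 4:(i + 1) * 4]) for i in range(n_texels)]

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
texels = pack_rgba(data)
# Six values fit in two texels: reading texel 0 yields four elements in one access.
print(texels)  # [(1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 0.0, 0.0)]
```

With a single-channel (Buffer-like) layout, the same six values would require six separate accesses instead of two.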
  • In the process of forward calculation of the neural network, since the data amount or size of the input data and the weight data are often different, the two are stored separately: for each data processing layer, the GPU stores the obtained input data and weight data of that layer in different texture storage structures to obtain the corresponding texture data.
  • Step S203: Perform data processing of the data processing layer based on the first texture data and the second texture data to obtain output data of the data processing layer.
  • The GPU reads the input data from the first texture data, reads the corresponding weight data from the second texture data, and performs forward calculation based on the input data and the corresponding weight data to obtain the corresponding output data. After all input data and corresponding weight data have undergone forward calculation, all output data of this data processing layer are obtained.
  • one or more data processing layers may use the solutions of step S201 to step S203 to perform data processing.
  • The GPU can also execute a step of storing the output data of this data processing layer in a corresponding texture storage structure.
  • The texture storage format used in this example is the RGBA four-channel texture storage format, and each data processing layer in the neural network (i.e., each network layer in the figure) performs data processing according to steps S201 to S203.
  • The GPU saves the input data of each network layer (i.e., the network input in the figure) into a corresponding RGBA four-channel texture, and saves the weight data of each network layer (i.e., the network weights in the figure) into a corresponding RGBA four-channel texture.
  • the GPU shader reads the input data and weight data from the two RGBA four-channel textures.
  • the GPU then obtains the output data of the network layer according to the input data and weight data.
  • The GPU stores the output data into the corresponding RGBA four-channel texture. At this point, the forward calculation of this network layer ends.
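  • The per-layer flow described above can be sketched end to end in plain Python, with lists of four-tuples standing in for RGBA textures and a trivial elementwise multiply standing in for the shader. All names here are illustrative assumptions, not the patent's implementation:

```python
def to_texture(values, channels=4):
    """Pack a flat list into RGBA-like texels, zero-padding the last texel."""
    pad = (-len(values)) % channels
    v = list(values) + [0.0] * pad
    return [tuple(v[i:i + channels]) for i in range(0, len(v), channels)]

def from_texture(texture, count):
    """Unpack the first `count` values from a list of texels."""
    flat = [x for texel in texture for x in texel]
    return flat[:count]

def layer_forward(input_tex, weight_tex, count):
    """Toy 'shader': multiply each input element by its weight."""
    x = from_texture(input_tex, count)
    w = from_texture(weight_tex, count)
    out = [a * b for a, b in zip(x, w)]
    return to_texture(out)  # write the result back into a texture

inp = to_texture([1.0, 2.0, 3.0, 4.0, 5.0])   # network input  -> texture 1
wgt = to_texture([0.5, 0.5, 0.5, 0.5, 0.5])   # layer weights  -> texture 2
out_tex = layer_forward(inp, wgt, count=5)     # output texture of this layer
print(from_texture(out_tex, 5))  # [0.5, 1.0, 1.5, 2.0, 2.5]
```

The output texture of one layer can then serve as the input texture of the next, mirroring the layer-by-layer forward calculation.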
  • The input data and weight data are stored in corresponding texture storage structures. Because the index of a texture storage structure is simple and convenient and its data storage capacity is large, this saves the time the data processing layer spends reading and storing data during data processing, and greatly improves the forward calculation efficiency of the neural network.
  • the first texture storage structure or the second texture storage structure may be any of the following:
  • RGBA four-channel 3D texture storage structure;
  • RGB three-channel 3D texture storage structure;
  • RGBA four-channel 2D texture storage structure;
  • RGB three-channel 2D texture storage structure.
  • The difference between a 3D texture storage structure and a 2D texture storage structure is that a 3D texture storage structure has depth, that is, its texels can be located at different depths, whereas a 2D texture storage structure can be viewed as a 3D texture storage structure with a depth of 1.
  • each texel in an RGBA four-channel 2D or 3D texture storage structure can store 4 data
  • each texel in an RGB three-channel 2D or 3D texture storage structure can store 3 data.
  • The data storage method adopted by the texture storage structure can be determined according to actual needs. For the same texture storage structure, the greater the number of channels in a single texel, the greater the amount of data that can be stored and the greater the data throughput of each GPU read and store operation, and thus the higher the forward calculation efficiency.
  • The first texture data and the second texture data each include at least one texel, and the number of channels of each texel is three or four; the data elements of the corresponding input data or weight data are arranged in order into the channels of the at least one texel.
  • the first texture data includes at least one texel indexed in at least two dimensions, and each texel has a plurality of channels, and each channel is used to store one data element in the input data, then When storing input data, the data elements in the input data can be sequentially stored in each channel of each texel.
  • texels can be understood as the basic unit of texture data.
  • When the GPU reads data from texture data, it can read the data stored in one texel at a time; when storing data into texture data, it can store data into all channels of one texel at a time. The number of channels contained in each texel determines the data throughput per texel. Since the texture storage structures corresponding to the first texture data and the second texture data are three-channel or four-channel, the texels in the first texture data and in the second texture data each contain three or four channels.
  • The multiple channels in each texel can be stacked one above the other in sequence; for example, in a texel containing the three channels R, G, and B, the three channels are stacked in sequence from bottom to top.
  • the input data or weight data is stored in each channel to form the corresponding texture data.
  • The multiple texels contained in 3D texture data can be stacked one above the other in the depth direction, and each texel contains three side-by-side RGB channels or four side-by-side RGBA channels. For example, referring to FIG. 4, the 3D texture data may include texture data of two texels (texel 1 and texel 2) with a depth of 2, where texel 1 and texel 2 are stacked in the depth direction from bottom to top. Each texel can include the four channels RGBA: texel 1 contains the four channels corresponding to the squares labeled "1, 2, 3, 4", and texel 2 contains the four channels corresponding to the squares labeled "5, 6, 7, 8".
  • storing the input data in the first texture storage structure to obtain the first texture data includes:
  • determine the first texture storage structure parameters of the first texture storage structure, i.e., its size in at least two dimensions and the number of channels of each texel;
  • store the input data according to the first texture storage structure parameters to obtain the first texture data.
  • If the input data is picture data, its data amount can be determined from its pixel width, pixel height, and number of channels. If the input data is not picture data, it can be treated by analogy with picture data, and its data amount determined from the pixel width, pixel height, and number of channels obtained by the analogy. For example, input data with one channel consisting of the one-dimensional column (A1, A2, A3, A4, A5) is analogous to picture data with a pixel width of 1, a pixel height of 5, and 1 channel; its data amount is the product of pixel width, pixel height, and number of channels, namely 5. The input data of the data processing layers discussed below are generally treated in the same way.
  • When storing the input data, the parameter types of the first texture storage structure may be determined first. Different data storage modes of the first texture storage structure (for example, the 2D texture storage mode or the 3D texture storage mode) correspond to different parameter types, so the parameter types must be determined according to the data storage mode of the first texture storage structure. After the parameter types are determined, the sizes of the first texture storage structure parameters determine the amount of data that can be stored in the corresponding first texture storage structure. Determining the size of each parameter means ensuring that every data element in the input data can be stored in the first texture storage structure. Once the types and sizes of the first texture storage structure parameters are determined, the first texture storage structure is determined, and the input data need only be stored in the corresponding channels of the corresponding texels of the first texture storage structure to obtain the first texture data.
  • If the first texture storage structure is a 3D texture storage structure, its parameters include the height, width, and depth of the texture storage structure; if it is a 2D texture storage structure, its parameters include the height and width. That is, different data storage methods of a texture storage structure correspond to different parameter types.
  • If the first texture storage structure is a 3D texture storage structure, the first texture storage structure parameters include the height, width, and depth of the texture storage structure, and determining them includes:
  • taking the pixel width of the input data as the width of the first texture storage structure;
  • taking the pixel height of the input data as the height of the first texture storage structure;
  • determining the depth of the first texture storage structure based on the number of channels of the input data and the number of channels of each texel in the first texture storage structure.
  • In the 3D texture storage mode, the corresponding first texture storage structure parameters include height, width, and depth.
  • The height, width, and depth of the texture storage structure should be determined according to the amount of input data and the data storage mode of the texture storage structure. The amount of input data is the product of its pixel width, pixel height, and number of channels, while the data storage capacity of the first texture storage structure is the product of its width, height, depth, and the number of channels of each texel. The pixel width of the input data can be taken as the width of the first texture storage structure, and the pixel height of the input data as its height.
  • Whether the data storage capacity of the first texture storage structure is greater than the amount of input data then depends on the number of channels of the input data, the depth of the first texture storage structure, and the number of channels of each texel. Therefore, whether the first texture storage structure is an RGB three-channel 3D texture storage structure or an RGBA four-channel 3D texture storage structure, its depth can be determined based on the number of channels of the input data and the number of channels of each texel in the first texture storage structure.
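  • The capacity condition can be stated compactly: a texture of width w, height h, and depth d whose texels have s channels holds w*h*d*s values, which must cover the w*h*c input elements. A small hedged sketch (the helper name `capacity_ok` is an assumption for illustration):

```python
def capacity_ok(w, h, d, s, c):
    """True if a w*h*d texture with s channels per texel can hold w*h*c values."""
    return w * h * d * s >= w * h * c

# An input with 6 channels stored in RGBA (s=4) texels needs depth 2:
assert not capacity_ok(w=2, h=2, d=1, s=4, c=6)
assert capacity_ok(w=2, h=2, d=2, s=4, c=6)
```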
  • determining the depth of the first texture storage structure based on the number of channels of the input data and the number of channels of each texel in the first texture storage structure includes:
  • The depth of the first texture storage structure is determined by the following expression:
  • d1 = ⌊(c1 + s1 - 1) / s1⌋
  • where d1 is the depth of the first texture storage structure, c1 is the number of channels of the input data, s1 is the number of channels of each texel in the first texture storage structure, and ⌊ ⌋ is the symbol for the round-down operation (so the expression rounds c1/s1 up to the nearest integer).
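  • As a minimal sketch of this depth rule (assuming, consistent with the depth ranges discussed below, that the depth is the number of input channels divided by the channels per texel and rounded up; the function name is illustrative):

```python
def texture_depth(c, s):
    """Depth of the texture: channels of input data (c) over channels per texel (s), rounded up."""
    return (c + s - 1) // s  # integer form of ceil(c / s)

# RGBA texels (s = 4):
assert texture_depth(4, 4) == 1    # c1 <= 4       -> depth 1
assert texture_depth(6, 4) == 2    # 4 < c1 <= 8   -> depth 2
assert texture_depth(9, 4) == 3    # 8 < c1 <= 12  -> depth 3
# RGB texels (s = 3):
assert texture_depth(3, 3) == 1
assert texture_depth(5, 3) == 2
```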
  • Suppose the pixel height of the input data is w1, the pixel width is b1, and the number of channels is c1. The pixel height w1 of the input data is taken as the height of the first texture storage structure, and the pixel width b1 as its width.
  • If the first texture storage structure is an RGBA four-channel 3D texture storage structure, the data storage capacity of the structure at depth 1 is w1*b1*4, while the data volume of the input data is w1*b1*c1, so the required depth is determined by the number of channels c1 of the input data. When c1 is less than or equal to 4, a depth of at least 1 meets the data volume requirement; when c1 is greater than 4 and less than or equal to 8, a depth of at least 2 is needed; when c1 is greater than 8 and less than or equal to 12, a depth of at least 3 is needed; and so on, the depth value of the first texture storage structure can be determined.
  • The depth of the first texture storage structure can thus be determined by the following expression:
  • d1 = ⌊(c1 + 3) / 4⌋
  • where d1 is the depth of the first texture storage structure and c1 is the number of channels of the input data.
  • If the first texture storage structure is an RGB three-channel texture storage structure, the data storage capacity of the structure at depth 1 is w1*b1*3, while the data volume of the input data is w1*b1*c1, so the depth of the texture storage structure is determined by the number of channels c1 of the input data. When c1 is less than or equal to 3, a depth of at least 1 meets the data volume requirement; when c1 is greater than 3 and less than or equal to 6, a depth of at least 2 is needed; when c1 is greater than 6 and less than or equal to 9, a depth of at least 3 is needed; and so on, the depth value of the first texture storage structure can be determined.
  • the depth of the first texture storage structure in order to make the storage space utilization rate in the first texture storage structure as high as possible, can be determined by the following expression:
  • d 1 is the depth of the first texture storage structure data
  • c 1 is the number of channels of the input data.
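The depth rule above amounts to a ceiling division of the channel count by the number of channels per texel. It can be sketched as follows (a minimal illustration under the assumptions above, not code from this application; the function name is hypothetical):

```python
def texture_depth(num_channels: int, channels_per_texel: int) -> int:
    """Smallest depth whose capacity covers all input channels.

    Equivalent to floor((c + s - 1) / s), i.e. ceil(c / s):
    for an RGBA texture (s = 4), c = 8 channels need a depth of 2.
    """
    return (num_channels + channels_per_texel - 1) // channels_per_texel

# RGBA four-channel texture
print(texture_depth(4, 4))  # 1
print(texture_depth(8, 4))  # 2
# RGB three-channel texture
print(texture_depth(7, 3))  # 3
```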
For example, if the input data of a certain data processing layer has a pixel width of 2, a pixel height of 2, and 4 channels, the data volume of that layer is 16 (the data elements 1, 2, 3, ..., 16 shown in the figure), and the input data can be stored in an RGBA four-channel 3D texture storage structure with a width of 2, a height of 2, and a depth of 1. The storage process can be understood as rearranging the 16 input elements of the data processing layer into that RGBA four-channel 3D texture storage structure. It should be noted that the method for determining the first texture storage structure parameters is not limited to the method described above; any parameters are acceptable as long as the data storage capacity of the corresponding first texture storage structure is not less than the data volume of the input data to be stored. For example, the pixel width of the input data may instead be taken as the height of the texture storage structure and the pixel height of the input data as its width; likewise, after the width and height of the first texture storage structure are determined, the chosen depth value may be greater than the depth value calculated by the expression above.
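The rearrangement described above can be sketched as follows (an illustrative layout, assuming each channel index maps to a depth slice and a channel slot within a texel; the function name and layout are hypothetical, not prescribed by this application):

```python
def pack_texture(data, h, w, c, s=4):
    """Rearrange (h, w, c) input data into texels of s channels each,
    producing a (depth, h, w, s) layout; channel k lands in depth slice
    k // s, channel slot k % s."""
    d = (c + s - 1) // s
    tex = [[[[0.0] * s for _ in range(w)] for _ in range(h)] for _ in range(d)]
    for ch in range(c):
        for y in range(h):
            for x in range(w):
                tex[ch // s][y][x][ch % s] = data[y][x][ch]
    return tex

# A 2x2 input with 4 channels (values 1..16) fits one depth slice of RGBA texels.
data = [[[1, 2, 3, 4], [5, 6, 7, 8]],
        [[9, 10, 11, 12], [13, 14, 15, 16]]]
tex = pack_texture(data, h=2, w=2, c=4)
print(len(tex))      # 1 (depth)
print(tex[0][0][0])  # [1, 2, 3, 4]
```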
In some embodiments, storing the weight data in the second texture storage structure to obtain the second texture data includes: first, determining the second texture storage structure parameters of the second texture storage structure to be used, that is, determining the sizes of the second texture storage structure in at least two dimensions and the number of channels of each texel such that the amount of data that can be stored in the second texture storage structure is not less than the data volume of the weight data; and second, storing the weight data according to the second texture storage structure parameters, that is, sequentially storing the data elements of the weight data into each channel of each texel of the second texture storage structure to obtain the second texture data.
The weight data can be analogized to image data, with a pixel width and a pixel height, and with both the number of input channels and the number of output channels of the data processing layer in place of a single channel number. The parameter data that determines the data volume may differ between data processing layers. For example, the data volume of the weight data of a convolutional layer is related to its pixel width and pixel height as well as to the numbers of input channels and output channels of the convolutional layer, while the data volume of the weight data of a scaling layer is related only to its pixel width and pixel height and to the number of input channels or output channels of the scaling layer. Taking the convolutional layer in a convolutional neural network as an example, if the number of input channels is 3, the pixel width of the convolution kernel is 3, the pixel height of the convolution kernel is 3, and the number of output channels is 2, then the data volume of the convolutional layer's weight data is the product of the pixel width, the pixel height, the number of input channels, and the number of output channels, namely 54.
Specifically, the type of the second texture storage structure parameters used for storing the weight data can be determined first. Since different data storage methods of the second texture storage structure (for example, a 2D texture storage method or a 3D texture storage method) correspond to different parameter types, the type of the second texture storage structure parameters needs to be determined according to the data storage method of the second texture storage structure. The size of the second texture storage structure parameters determines the amount of data that can be stored in the corresponding second texture storage structure, so each parameter must be large enough that every data element of the weight data can be stored in the second texture storage structure. Determining the type and size of the second texture storage structure parameters determines the second texture storage structure, and it then only remains to store the weight data into the corresponding channels of the corresponding texels of the second texture storage structure to obtain the second texture data.
If the second texture storage structure is a 3D texture storage structure, the second texture storage structure parameters include the height, width, and depth of the texture storage structure; if it is a 2D texture storage structure, the parameters include the height and width of the texture storage structure. In other words, different data storage methods of the texture storage structure correspond to different types of texture storage structure parameters.
In some embodiments, if the second texture storage structure is a 3D texture storage structure and the second texture storage structure parameters include the height, width, and depth of the texture storage structure, then, based on the data storage method of the structure, determining the second texture storage structure parameters of the second texture storage structure includes: determining the height and width of the second texture storage structure based on one of the number of input channels and the number of output channels of the data processing layer together with the pixel width and pixel height of the weight data, and determining the depth of the second texture storage structure based on the other one of the number of input channels and the number of output channels and the number of channels of each texel in the second texture storage structure.
Since the parameter data related to the data volume of the weight data includes the pixel width and pixel height of the weight data as well as the numbers of input channels and output channels of the data processing layer, the weight data of the data processing layer can be understood as four-dimensional. The process of determining the second texture storage structure parameters corresponding to the weight data of the data processing layer can therefore adopt a method similar to the method, described above, for determining the first texture storage structure parameters corresponding to the input data: one of the number of input channels and the number of output channels related to the data volume of the weight data is merged into another dimension, the remaining channel number is treated as the parameter corresponding to the channel number of the input data, and the second texture storage structure parameters can then be determined in the manner described above for determining the first texture storage structure parameters.
In some embodiments, determining the height and width of the second texture storage structure based on one of the number of input channels and the number of output channels of the data processing layer and on the pixel width and pixel height of the weight data includes: taking the product of the pixel height of the weight data and that channel number as the height of the second texture storage structure. That is, the weight data is reduced in dimensionality by merging one channel number into the pixel height, with the product of the pixel height of the weight data and that channel number used as the pixel height of the weight data after dimensionality reduction. This can be either the product of the pixel height and the number of input channels or the product of the pixel height and the number of output channels. The pixel height of the weight data after dimensionality reduction is taken as the height of the second texture storage structure, and the pixel width of the weight data after dimensionality reduction is taken as its width. It should be noted that the dimensionality reduction of the weight data can also be performed as follows: the product of the pixel width and the number of output channels, or the product of the pixel width and the number of input channels, is used as the pixel width of the weight data after dimensionality reduction.
In some embodiments, determining the depth of the second texture storage structure includes determining it by the following expression:

d2 = ⌊(c2 + s2 − 1) / s2⌋

where d2 is the depth of the second texture storage structure, c2 is the other channel number (the one not merged during dimensionality reduction), and s2 is the number of channels of each texel in the second texture storage structure. The depth of the second texture storage structure can be determined similarly to the depth of the first texture storage structure described above; in this process, the channel number that is not merged into other parameters during the dimensionality reduction of the weight data corresponds to the channel number of the input data.
Taking the convolutional layer as an example, the data volume of the convolution weight data is input_channel*output_channel*kernel_w*kernel_h, where input_channel is the number of input channels of the convolutional layer, output_channel is the number of output channels of the convolutional layer, kernel_w is the pixel width of the convolution kernel, and kernel_h is the pixel height of the convolution kernel. The width of the texture storage parameters of an RGBA four-channel texture can be set to kernel_w, the height to kernel_h*output_channel (that is, the weight dimension corresponding to the output channels is merged into the pixel height direction to achieve dimensionality reduction), and the depth to the value of (input_channel+3)/4 rounded down.
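This parameter choice can be sketched as follows (an illustrative helper, not code from this application; the function name is hypothetical):

```python
def conv_weight_texture_params(input_channel, output_channel, kernel_w, kernel_h):
    """Texture parameters for convolution weights in an RGBA (4-channel) 3D
    texture, merging the output-channel dimension into the height direction
    (the dimensionality reduction described above)."""
    width = kernel_w
    height = kernel_h * output_channel
    depth = (input_channel + 3) // 4  # (input_channel + 3) / 4, rounded down
    return width, height, depth

# Conv layer: 3 input channels, 3x3 kernel, 2 output channels -> 54 weight values
w, h, d = conv_weight_texture_params(3, 2, 3, 3)
print(w, h, d)              # 3 6 1
print(w * h * d * 4 >= 54)  # texture capacity covers the 54 values: True
```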
In other embodiments, if the second texture storage structure is a 3D texture storage structure and the second texture storage structure parameters include the height, width, and depth of the texture storage structure, then, based on the data storage method of the structure, determining the second texture storage structure parameters of the second texture storage structure includes: taking the pixel width of the weight data as the width of the second texture storage structure, taking the pixel height of the weight data as the height of the second texture storage structure, and determining the depth of the second texture storage structure based on the number of input channels or output channels of the data processing layer and the number of channels of each texel in the second texture storage structure.
In some embodiments, determining the depth of the second texture storage structure based on the number of input channels or output channels of the data processing layer and the number of channels of each texel in the second texture storage structure includes determining it by the following expression:

d3 = ⌊(c3 + s3 − 1) / s3⌋

where d3 is the depth of the second texture storage structure, c3 is the number of input channels or output channels, and s3 is the number of channels of each texel in the second texture storage structure. The weight data in this situation can be analogized to input data, and the second texture storage structure parameters in this situation can be determined using the method for determining the first texture storage structure parameters, where the channel number of the input data corresponds to the number of input channels or the number of output channels.
Taking the scaling layer as an example, the data volume of its weight data is input_channel (or output_channel)*1*1. To arrange the weight data of the scaling layer into an RGBA four-channel texture, the width of the texture storage parameters can be set to 1, the height to 1, and the depth to the value of (input_channel+3)/4 rounded down, or of (output_channel+3)/4 rounded down. For example, the weight parameters (1,2,3,4,5,6,7,8) of a scaling layer whose number of input channels input_channel is 8 are stored in a texture with a height of 1, a width of 1, and a depth of 2.
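The scaling-layer example above can be sketched as follows (a minimal illustration; the function name and zero-padding convention are assumptions, not specified by this application):

```python
def pack_scale_weights(weights, s=4):
    """Pack per-channel scaling weights into 1x1 texels of s channels each,
    stacked along the depth axis (zero-padding the last texel if needed)."""
    depth = (len(weights) + s - 1) // s
    padded = list(weights) + [0.0] * (depth * s - len(weights))
    return [padded[i * s:(i + 1) * s] for i in range(depth)]  # one texel per slice

texels = pack_scale_weights([1, 2, 3, 4, 5, 6, 7, 8])
print(len(texels))  # 2 (depth)
print(texels)       # [[1, 2, 3, 4], [5, 6, 7, 8]]
```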
In some embodiments, at least one computing unit in the GPU is used to perform the data processing of the data processing layer, and performing the data processing of the data processing layer based on the first texture data and the second texture data to obtain the output data of the data processing layer includes: using each computing unit to read the input data stored at a first texture position in the first texture data and to read the weight data stored at the second texture position in the second texture data corresponding to that first texture position; and using each computing unit to perform data processing on the read input data and the corresponding weight data to obtain the output data of the data processing layer.
In practical applications, a texture position can be identified by texture coordinates, and the texture coordinates of the first texture data and the texture coordinates of the second texture data can be stored in association according to the correspondence between the input data and the weight data. It can be understood that, to facilitate this correspondence between the input data and the weight data, the texture storage structures of the first texture data and the second texture data may adopt the same data storage method, for example both using RGBA four-channel textures. Each computing unit in the GPU reads all the data in one texel of a texture storage structure at a time; after a computing unit has read the data in one texel of the first texture data, it reads the corresponding weight data from the second texture according to the associatively stored texture coordinates (that is, the index) and performs the calculation to obtain the corresponding output data. Since each input data element is stored in one channel of one texel of the first texture data, the element can be located by the texture coordinates of that texel in the first texture together with its channel position within the texel; similarly, any weight data element in the second texture can be located by its texture coordinates and channel position.
For example, if the first texture data and the second texture data both adopt the RGBA four-channel 3D texture storage method, the four pieces of input data corresponding to the texel at coordinates (0,0,0) in the first texture data are stored in association with the four pieces of data corresponding to the texel at coordinates (0,0,0) in the second texture data. Taking the scaling layer as an example, a computing unit of the GPU reads the texel at position (0,0,0) of the first texture data, which contains four pieces of input data; the computing unit also reads the four pieces of data in the texel at position (0,0,0) of the second texture data and performs the forward calculation according to the channel correspondence, that is, each input value is multiplied by the corresponding scaling weight and the corresponding offset weight is added, yielding four output results. Further, the four output results can be saved at position (0,0,0) of the output RGBA four-channel texture.
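The per-texel forward step of the scaling layer described above can be sketched as follows (an illustrative helper mirroring the multiply-and-add across the four channels; the function name is hypothetical):

```python
def scale_texel(inputs, scales, offsets):
    """Per-texel forward step of a scaling layer: out = in * scale + offset,
    applied channel by channel across the four RGBA channels."""
    return [x * s + b for x, s, b in zip(inputs, scales, offsets)]

# One texel's four input values, with per-channel scales and offsets
out = scale_texel([1.0, 2.0, 3.0, 4.0], [2.0] * 4, [0.5] * 4)
print(out)  # [2.5, 4.5, 6.5, 8.5]
```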
In some scenarios, the RGBA four-channel texture scheme can reduce the forward calculation time by 30-50% compared with a scheme using R single-channel textures. Table 1 shows the calculation time of running a portrait segmentation model with single-channel texture and with RGBA four-channel texture as the storage structure on different graphics cards; it can be seen that the time of the RGBA four-channel texture scheme is significantly shorter.
The embodiments of this application are mainly applied in a neural network forward calculation library. Since a neural network forward calculation library provides the computing capability for all neural network algorithms, the application scenarios of the embodiments of this application are the same as those of using the neural network forward library. The scope of application mainly includes AI (Artificial Intelligence) algorithm related applications, such as portrait segmentation, gesture recognition, and image saliency recognition. The application scenarios are described below.
Image saliency recognition: as shown in Figure 8, the vehicle and the child in the picture are the content that human eyes focus on when looking at the picture. Such highly salient content can be marked out through the forward calculation of a neural network; shown in the two rectangular boxes in the figure are the marked vehicle and child. If the forward calculation method provided in the embodiments of this application is used, that is, the RGBA four-channel texture storage structure (or the RGB three-channel texture storage structure) is used to store the input data and weight data of the neural network, the efficiency of the forward calculation process is improved and image saliency recognition is accelerated. In general, the neural network forward calculation method provided by this application can significantly improve the implementation efficiency of each function in the above three application scenarios.
FIG. 9 is a structural block diagram of a neural network forward computing device provided by an embodiment of this application. The device 900 is used to perform data processing for at least one data processing layer in a neural network and may include a data acquisition module 901, a data storage module 902, and a data processing module 903. The data acquisition module 901 is used to acquire the input data and weight data of the data processing layer; the data storage module 902 is configured to store the input data using the first texture storage structure to obtain the first texture data, and to store the weight data using the second texture storage structure to obtain the second texture data; the data processing module 903 is configured to perform the data processing of the data processing layer based on the first texture data and the second texture data to obtain the output data of the data processing layer. In this solution, the input data and weight data are stored in the corresponding texture storage structures; because the texture storage structure has a simple and convenient index and a large data storage capacity, the time for the data processing layer to read and store data during data processing is saved, and the forward calculation efficiency of the neural network is greatly improved.
In some embodiments, the first texture storage structure or the second texture storage structure is any one of the following:
an RGBA four-channel 3D texture storage structure;
an RGB three-channel 3D texture storage structure;
an RGBA four-channel 2D texture storage structure;
an RGB three-channel 2D texture storage structure.
Correspondingly, the first texture data and the second texture data each include at least one texel, the number of channels of each texel is three or four, and the data in the at least one texel is arranged in order from the corresponding input data or weight data according to channel.
In some embodiments, the data storage module can be used to: determine the first texture storage structure parameters of the first texture storage structure, and store the input data according to the first texture storage structure parameters to obtain the first texture data. In some embodiments, the first texture storage structure is a 3D texture storage structure, the first texture storage structure parameters include the height, width, and depth of the texture storage structure, and the data storage module can be used to: take the pixel width of the input data as the width of the first texture storage structure, take the pixel height of the input data as the height of the first texture storage structure, and determine the depth of the first texture storage structure based on the number of channels of the input data and the number of channels of each texel in the first texture storage structure. In some embodiments, the data storage module can be used to determine the depth of the first texture storage structure by the following expression:

d1 = ⌊(c1 + s1 − 1) / s1⌋

where d1 is the depth of the first texture storage structure, c1 is the number of channels of the input data, s1 is the number of channels of each texel in the first texture storage structure, and ⌊ ⌋ is the symbol for the round-down operation.
In some embodiments, the data storage module can be used to: determine the second texture storage structure parameters of the second texture storage structure, and store the weight data according to the second texture storage structure parameters to obtain the second texture data. In some embodiments, the second texture storage structure is a 3D texture storage structure, the second texture storage structure parameters include the height, width, and depth of the texture storage structure, and the data storage module can be used to: determine the height and width of the second texture storage structure based on one of the number of input channels and the number of output channels of the data processing layer together with the pixel width and pixel height of the weight data, and determine the depth of the second texture storage structure based on the other one of the number of input channels and the number of output channels and the number of channels of each texel in the second texture storage structure. In some embodiments, the data storage module can be used to determine the depth of the second texture storage structure by the following expression:

d2 = ⌊(c2 + s2 − 1) / s2⌋

where d2 is the depth of the second texture storage structure, c2 is the other channel number, and s2 is the number of channels of each texel in the second texture storage structure.
In other embodiments, the second texture storage structure is a 3D texture storage structure, the second texture storage structure parameters include the height, width, and depth of the texture storage structure, and the data storage module can be used to: take the pixel width of the weight data as the width of the second texture storage structure, take the pixel height of the weight data as the height of the second texture storage structure, and determine the depth of the second texture storage structure based on the number of input channels or output channels of the data processing layer and the number of channels of each texel in the second texture storage structure. In some embodiments, the data storage module can be used to determine the depth of the second texture storage structure by the following expression:

d3 = ⌊(c3 + s3 − 1) / s3⌋

where d3 is the depth of the second texture storage structure, c3 is the number of input channels or output channels, and s3 is the number of channels of each texel in the second texture storage structure.
In some embodiments, at least one computing unit in the GPU is used to perform the data processing of the data processing layer, and the data processing module can be used to: use each computing unit to read the input data stored at a first texture position in the first texture data and to read the weight data stored at the second texture position in the second texture data corresponding to that first texture position; and use each computing unit to perform data processing on the read input data and the corresponding weight data to obtain the output data of the data processing layer.
Based on the same principle, an embodiment of the present application also provides an electronic device. The electronic device includes a memory, a processor, and a computer program stored in the memory and runnable on the processor. When the processor executes the computer program, the method provided in any embodiment of the present application is implemented, specifically the following: a method by which at least one data processing layer in a neural network performs data processing, including: obtaining the input data and weight data of the data processing layer; storing the input data in a first texture storage structure to obtain first texture data, and storing the weight data in a second texture storage structure to obtain second texture data; and performing the data processing of the data processing layer based on the first texture data and the second texture data to obtain the output data of the data processing layer. An embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method shown in any embodiment of the present application; for example, the computer program corresponding to the neural network forward calculation method may be stored in the medium.
  • FIG. 10 shows a schematic structural diagram of an electronic device to which an embodiment of the present application is applicable.
The electronic device 1000 shown in FIG. 10 includes a processor 1001 and a memory 1003, where the processor 1001 and the memory 1003 are connected, for example, through a bus 1002. Further, the electronic device 1000 may also include a transceiver 1004, through which the electronic device 1000 can exchange data with other electronic devices. It should be noted that in actual applications the number of transceivers 1004 is not limited to one, and the structure of the electronic device 1000 does not constitute a limitation on the embodiments of the present application.
  • the processor 1001 is applied in the embodiment of the present application, and can be used to implement the functions of the data acquisition module, the data storage module, and the data processing module shown in FIG. 9.
  • the processor 1001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA, or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof. It can implement or execute various exemplary logical blocks, modules, and circuits described in conjunction with the disclosure of this application.
  • the processor 1001 may also be a combination that implements computing functions, for example, includes a combination of one or more microprocessors, a combination of a DSP and a microprocessor, etc.
  • the bus 1002 may include a path for transferring information between the above-mentioned components.
  • the bus 1002 may be a PCI bus, an EISA bus, or the like.
  • the bus 1002 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used to represent in FIG. 10, but it does not mean that there is only one bus or one type of bus.
The memory 1003 can be a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
  • the memory 1003 is used to store application program codes for executing the solutions of the present application, and is controlled by the processor 1001 to execute.
  • the processor 1001 is configured to execute the application program code stored in the memory 1003 to implement the action of the neural network forward computing device provided in the embodiment shown in FIG. 9.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Generation (AREA)

Abstract

A neural network forward calculation method and device, and a computer-readable storage medium. The method by which at least one data processing layer in a neural network performs data processing includes: obtaining input data and weight data of the data processing layer (S201); storing the input data using a first texture storage structure to obtain first texture data, and storing the weight data using a second texture storage structure to obtain second texture data (S202); and performing the data processing of the data processing layer based on the first texture data and the second texture data to obtain output data of the data processing layer (S203). For at least one data processing layer of a neural network, the input data and the weight data are stored in corresponding texture storage structures. Because a texture storage structure has a simple and convenient index and a large data storage capacity, the time spent by the data processing layer reading and storing data during data processing is saved, greatly improving the forward calculation efficiency of the neural network.

Description

Neural network forward calculation method and device, and computer-readable storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on December 16, 2019 with application number 201911294777.2 and entitled "神经网络的前向计算方法、装置及计算机可读存储介质" (Neural network forward calculation method and device, and computer-readable storage medium), the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of computer technology, and in particular to a neural network forward calculation method and device, an electronic device, and a computer-readable storage medium.
Background of the Invention
The neural network forward algorithm needs to implement GPU (Graphics Processing Unit) computation on different platforms such as mobile devices and PCs, and different platforms use different computing or graphics libraries. On mobile devices, APIs such as OpenCL and OpenGL are typically used, while on PCs running the Windows operating system, the Direct3D graphics library can be used to implement the neural network forward algorithm thanks to its generality. Neural network forward calculation mainly consists of the computation of each layer of the network in the GPU, including uploading each layer's input data and weights to the GPU and computing the result in the GPU. Existing technical solutions usually adopt a Buffer (cache) structure as the data storage method: the input and weights of each layer of the network are uploaded into GPU Buffers, the computation in the GPU is performed with Buffers, and finally the CPU (Central Processing Unit) reads the Buffer in the GPU to obtain the calculation result. However, since a Buffer is stored linearly in memory, the GPU takes a long time to read and store the data in this structure, resulting in low forward calculation efficiency.
Summary of the Invention
The purpose of this application is to solve at least one of the above technical defects. The technical solutions provided by the embodiments of this application are as follows.
An embodiment of this application provides a neural network forward calculation method, in which the method by which at least one data processing layer in the neural network performs data processing includes:
obtaining input data and weight data of the data processing layer;
storing the input data using a first texture storage structure to obtain first texture data, and storing the weight data using a second texture storage structure to obtain second texture data, where multiple data elements of the input data correspond to the same index in the first texture data, multiple data elements of the weight data correspond to the same index in the second texture data, and the computing device accesses data in the first texture data and the second texture data in units of these indexes;
performing the data processing of the data processing layer based on the first texture data and the second texture data to obtain output data of the data processing layer.
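The three steps above can be sketched end to end as follows (a deliberately simplified illustration: the packing helper, a scaling layer as the per-texel operation, and all names are assumptions, not the actual GPU implementation):

```python
def pack_as_texels(values, s=4):
    """Group a flat list of values into texels of s channels (zero-padded)."""
    padded = list(values) + [0.0] * (-len(values) % s)
    return [padded[i:i + s] for i in range(0, len(padded), s)]

def forward_layer(input_data, weight_data):
    # Store input and weights in texture-like structures (lists of texels).
    first_texture = pack_as_texels(input_data)
    second_texture = pack_as_texels(weight_data)
    # Per-texel element-wise processing (a scaling layer, as an illustration).
    return [[x * w for x, w in zip(t_in, t_w)]
            for t_in, t_w in zip(first_texture, second_texture)]

out = forward_layer([1, 2, 3, 4, 5, 6, 7, 8], [2] * 8)
print(out)  # [[2, 4, 6, 8], [10, 12, 14, 16]]
```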
An embodiment of this application provides a neural network forward computing device for performing data processing for at least one data processing layer in a neural network, including:
a data acquisition module, used to obtain input data and weight data of the data processing layer;
a data storage module, used to store the input data using a first texture storage structure to obtain first texture data, and to store the weight data using a second texture storage structure to obtain second texture data, where multiple data elements of the input data correspond to the same index in the first texture data, multiple data elements of the weight data correspond to the same index in the second texture data, and the computing device accesses data in the first texture data and the second texture data in units of these indexes;
a data processing module, used to perform the data processing of the data processing layer based on the first texture data and the second texture data to obtain output data of the data processing layer.
In a third aspect, an embodiment of this application provides an electronic device including a memory and a processor;
a computer program is stored in the memory;
the processor is configured to execute the computer program to implement the method provided in the embodiment of the first aspect or in any optional embodiment of the first aspect.
An embodiment of this application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the method provided in the embodiment of the first aspect or in any optional embodiment of the first aspect is implemented.
In the solutions provided by the embodiments of this application, for at least one data processing layer of the neural network, the input data and the weight data are stored in corresponding texture storage structures. Because the texture storage structure has a simple and convenient index and a large data storage capacity, the time for the data processing layer to read and store data during data processing is saved, and the forward calculation efficiency of the neural network is greatly improved.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments of this application are briefly introduced below.
Fig. 1 is a schematic diagram of neural network forward calculation in the prior art;
Fig. 2A is a schematic diagram of an application scenario of a neural network forward calculation method according to an embodiment of this application;
Fig. 2B is a schematic flowchart of a neural network forward calculation method according to an embodiment of this application;
Fig. 3 is a schematic diagram of the data processing process of one data processing layer in an example of an embodiment of this application;
Fig. 4 is a schematic diagram of the input data storage process in an example of an embodiment of this application;
Fig. 5 is a schematic diagram of the weight data storage process in an example of an embodiment of this application;
Fig. 6 is a schematic diagram of the effect of the background blur function in an example of an embodiment of this application;
Fig. 7 is a schematic diagram of the effect of the gesture recognition function in an example of an embodiment of this application;
Fig. 8 is a schematic diagram of the effect of the image saliency recognition function in an example of an embodiment of this application;
Fig. 9 is a structural block diagram of a neural network forward computing device according to an embodiment of this application;
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of this application.
Modes for Carrying Out the Invention
The embodiments of this application are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are only used to explain this application; they are not to be construed as limiting the invention.
Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms "a", "an", "the", and "said" used herein may also include the plural forms. It should be further understood that the word "include" used in the specification of this application refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intermediate elements may be present. In addition, "connected" or "coupled" as used herein may include a wireless connection or wireless coupling. The term "and/or" as used herein includes all or any unit of, and all combinations of, one or more of the associated listed items.
In order to make the purpose, technical solutions, and advantages of this application clearer, the embodiments of this application are further described in detail below with reference to the drawings.
人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个综合技术,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。
人工智能技术是一门综合学科,涉及领域广泛,既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。
计算机视觉技术(Computer Vision,CV)计算机视觉是一门研究如何使机器“看”的科学,更进一步的说,就是指用摄影机和电脑代替人眼对目标进行识别、跟踪和测量等机器视觉,并进一步做图形处理,使电脑处理成为更适合人眼观察或传送给仪器检测的图像。作为一个科学学科,计算机视觉研究相关的理论和技术,试图建立能够从图像或者多维数据中获取信息的人工智能系统。计算机视觉技术通常包括图像处理、图像识别、图像语义理解、图像检索、OCR、视频处理、视频语义理解、视频内容/行为识别、三维物体重建、3D技术、虚拟现实、增强现实、同步定位与地图构建等技术,还包括常见的人脸识别、指纹识别等生物特征识别技术。
语音技术(Speech Technology)的关键技术有自动语音识别技术(ASR)和语音合成技术(TTS)以及声纹识别技术。让计算机能听、能看、能说、能感觉,是未来人机交互的发展方向,其中语音成为未来最被看好的人机交互方式之一。
自然语言处理(Nature Language processing,NLP)是计算机科学领域与人工智能领域中的一个重要方向。它研究能实现人与计算机之间用自然语言进行有效通信的各种理论和方法。自然语言处理是一门融语言学、计算机科学、数学于一体的科学。因此,这一领域的研究将涉及自然语言,即人们日常使用的语言,所以它与语言学的研究有着密切的联系。自然语言处理技术通常包括文本处理、语义理解、机器翻译、机器人问答、知识图谱等技术。
机器学习（Machine Learning，ML）是一门多领域交叉学科，涉及概率论、统计学、逼近论、凸分析、算法复杂度理论等多门学科。专门研究计算机怎样模拟或实现人类的学习行为，以获取新的知识或技能，重新组织已有的知识结构使之不断改善自身的性能。机器学习是人工智能的核心，是使计算机具有智能的根本途径，其应用遍及人工智能的各个领域。机器学习和深度学习通常包括人工神经网络、置信网络、强化学习、迁移学习、归纳学习、示教学习等技术。
自动驾驶技术通常包括高精地图、环境感知、行为决策、路径规划、运动控制等技术，自动驾驶技术有着广泛的应用前景。
随着人工智能技术研究和进步,人工智能技术在多个领域展开研究和应用,例如常见的智能家居、智能穿戴设备、虚拟助理、智能音箱、智能营销、无人驾驶、自动驾驶、无人机、机器人、智能医疗、智能客服等,相信随着技术的发展,人工智能技术将在更多的领域得到应用,并发挥越来越重要的价值。
本申请实施例提供的方案涉及人工智能的机器学习等技术,具体通过如下实施例进行说明:
如图1所示，在神经网络的前向计算中，现有的技术方案通常采用Buffer(缓存)结构作为数据存储的方式，即把网络每层的输入和权值上传到GPU的Buffer中（对应于图中层输入Buffer和层权值Buffer），在GPU中再用Buffer进行计算，最后再用CPU(Central Processing Unit，中央处理器)读取GPU中的Buffer得到计算结果（对应于图中结果Buffer）。但是，由于Buffer在内存中的存储是线性的，GPU在对其结构中的数据进行读取和存储过程中，需要消耗较长的时间，导致前向计算的效率较低。针对上述问题，本申请实施例提供了一种神经网络的前向计算方法。
本申请实施例提供了一种神经网络的前向计算方法。图2A为本申请实施例的一种神经网络的前向计算方法的应用场景示意图。在该应用场景中，服务器10通过网络20与终端设备30、40、50进行通信。各实施例的方法可以由一计算设备执行。该计算设备可以是，例如，图2A所示的服务器10、或终端设备30、40、50，等。服务器10可以是独立的物理服务器设备，也可以是服务器集群中的一个物理服务器。终端设备30、40、50可以是PC、笔记本电脑、平板电脑、智能手机、智能电视、游戏主机，等。图2B为该神经网络中的至少一个数据处理层进行数据处理的方法的流程示意图，该方法的执行主体可以为计算设备中的GPU。如图2B所示，该方法可以包括以下步骤。
步骤S201,获取数据处理层的输入数据和权重数据。
其中,数据处理层一般为神经网络隐藏层(hidden layer)。神经网络可以包含多个数据处理层。神经网络的前向计算过程是神经网络中各数据处理层进行数据处理的过程。每一数据处理层的数据处理也可以理解为该数据处理层的前向计算过程,而每个数据处理层的前向计算过程,是对该数据处理层的每一输入数据以及与其对应的权重数据做相应的运算,得出对应的输出数据的过程。
步骤S202，将输入数据采用第一纹理存储结构进行存储，得到第一纹理数据，将权重数据采用第二纹理存储结构进行存储，得到第二纹理数据。其中，所述输入数据中的多个数据元素在所述第一纹理数据中对应相同的索引，所述权重数据中的多个数据元素在所述第二纹理数据中对应相同的索引，所述计算设备以所述索引为单位，在所述第一纹理数据和所述第二纹理数据中存取数据。
其中，纹理存储结构是一种结构化的存储形式。利用GPU中的着色器可以从其中读取数据，也可以将数据写入其中。纹理存储结构的数据存储方式包括二维（2D）纹理存储结构、三维（3D）纹理存储结构、二维纹理存储结构数组等。各纹理存储结构中的基本单元可称为纹素，每个纹素可以包含多个通道，例如R单通道、RGB三通道、RGBA四通道等。纹理存储结构主要有以下两方面的特点。一方面，纹理存储结构通常通过纹理坐标应用到一个表面，纹理坐标可以作为纹理存储结构中所存储数据的索引。由于基于纹理坐标的索引简单方便，GPU从纹理存储结构中读取数据以及向纹理存储结构中存储数据的速度都快于Buffer存储结构。另一方面，纹理存储结构的一个纹素中可以包含多个通道，因此一个纹素可以存储多个数据。GPU每次读取一个纹素即可读取该纹素中所有通道的数据；同理，GPU每次可以向一个纹素的所有通道中存储数据。因此，GPU每次从纹理存储结构中读取的数据量较大，向纹理存储结构中存储的数据量也较大。由于GPU对纹理存储结构进行数据存储或读取时数据吞吐量较大，GPU的数据读取和存储速度都快于Buffer存储结构。
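上述“一次读取一个纹素即可取回其所有通道数据”的特点，可以用下面的Python片段示意（其中的纹理布局与函数名均为示意性假设，并非GPU或任何图形API的真实实现）：

```python
def make_texture(width, height, depth, channels=4):
    """创建全零纹理：texture[d][y][x]为一个纹素，含channels个通道。"""
    return [[[[0.0] * channels for _ in range(width)]
             for _ in range(height)] for _ in range(depth)]

def fetch_texel(texture, x, y, d):
    """模拟GPU一次纹理采样：按纹理坐标(x, y, d)一次取回该纹素的全部通道数据。"""
    return texture[d][y][x]

tex = make_texture(width=2, height=2, depth=1)
tex[0][0][0] = [1.0, 2.0, 3.0, 4.0]      # 向一个纹素的4个通道一次性写入4个数据
print(fetch_texel(tex, x=0, y=0, d=0))   # 一次读取即得到该纹素的4个数据
```

该示意仅用于说明以纹素为单位存取多个数据带来的吞吐量优势，真实GPU中的纹理采样由硬件完成。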
一些实施例中,在对神经网络进行前向计算过程中,由于输入数据和权重数据两者的数据量或尺寸大小往往不同,故将两者分开存储,即对于各数据处理层,GPU将获取到的该数据处理层的输入数据和权重数据分别存储在不同的纹理存储结构中,得到对应的纹理数据。
步骤S203,基于第一纹理数据和第二纹理数据进行数据处理层的数据处理,得到数据处理层的输出数据。
一些实施例中,GPU从第一纹理数据读取输入数据,从第二纹理数据读取对应的权重数据,基于输入数据和对应的权重数据进行前向计算得出对应的输出数据,在完成每个输入数据与其对应的权重数据之间的前向计算后,得到该层数据处理层的所有输出数据。
可以理解的是，神经网络在进行前向计算过程中，可以有一个或多个数据处理层采用步骤S201至步骤S203的方案进行数据处理。当连续的两个数据处理层都采用步骤S201至步骤S203的方案进行数据处理时，对于前一个数据处理层，GPU在完成步骤S203后，还可以执行将该层数据处理层的输出数据存储在相应的纹理存储结构中的步骤。
下面通过一个具体示例来对本申请实施例进行进一步说明。如图3所示,该示例中采用的纹理存储格式为RGBA四通道纹理存储格式,且对神经网络中每个数据处理层(即图中网络层)都采用步骤S201至步骤S203的方案进行数据处理。
如图所示，每层网络层的前向计算开始后，GPU将每层网络层的输入数据（即图中网络输入）保存至对应的RGBA四通道纹理中。将每层网络层的权重数据（即图中网络权值）保存至对应的RGBA四通道纹理中，GPU的着色器（Shader）从两个RGBA四通道纹理中分别读取输入数据和权重数据，GPU再根据输入数据和权重数据得出该层网络层的输出数据。一些例子中，为了便于后一网络层的前向计算，GPU将其输出数据存储至对应的RGBA四通道纹理，至此该层网络层的前向计算结束。
本申请实施例提供的方案,对于神经网络的至少一个数据处理层,将输入数据和权重数据分别存储在对应的纹理存储结构中,由于纹理存储结构索引简单方便,且数据存储量大,节省了数据处理层在进行数据处理过程中读取和存储数据的时间,使得神经网络的前向计算效率大大提高。
在本申请实施例中,第一纹理存储结构或第二纹理存储结构可以为以下任一种:
红、绿、蓝、透明RGBA四通道三维3D纹理存储结构;
RGB三通道3D纹理存储结构;
RGBA四通道2D纹理存储结构;
RGB三通道2D纹理存储结构。
一些实施例中,3D纹理存储结构与2D纹理存储结构的区别在于,3D纹理存储结构具有不同的深度,即3D纹理存储结构的纹素可以设置在不同的深度上,而2D纹理存储结构可以视为深度为1的3D纹理存储结构。其中,一个RGBA四通道2D或3D纹理存储结构中的每个纹素能够存储4个数据,一个RGB三通道2D或3D纹理存储结构中的每个纹素能够存储3个数据。可以理解的是,纹理存储结构采用何种数据存储方式可以根据实际需求进行确定。对于同一个纹理存储结构来说,单个纹素中通道数越多其能存储的数据量也就越大,GPU每次读取和存储操作的数据吞吐量也就越大,从而前向计算的效率也越高。
一些实施例中,第一纹理数据和第二纹理数据中分别包含有至少一个纹素,且每个纹素的通道数为三或四,至少一个纹素由对应的输入数据或权重数据按照通道数依序排列而成。
一些实施例中,第一纹理数据包括在至少两个维度上索引的至少一个纹素,且每个纹素具有复数个通道,每个通道用于存储所述输入数据中的一个数据元素,则存储输入数据时,可以将所述输入数据中的数据元素依次存入各纹素的各通道。
其中，纹素可以理解为纹理数据的基本构成单位，GPU在从纹理数据中读取数据时，每次可以读取一个纹素内存储的数据；向纹理数据中存储数据时，每次可以向一个纹素内所有通道存储数据。每个纹素中包含的通道数决定了该纹素的数据吞吐量，由于第一纹理数据对应的纹理存储结构和第二纹理数据对应的纹理存储结构分别为三通道或四通道，那么第一纹理数据中的纹素和第二纹理数据中的纹素分别包含三通道或四通道，即通道数为三或四。
一些实施例中，每个纹素中的多个通道可以依次上下堆叠设置，例如，包含RGB三个通道的纹素，纹素中三个通道R通道、G通道、B通道由下至上依次堆叠，每个通道中存入输入数据或权重数据构成对应的纹理数据。对于3D纹理数据，其所包含的多个纹素可以在深度方向依次上下堆叠设置，每个纹素包含至少RGB三个通道的并排信息，也可以包含RGBA四个通道的并排信息。例如，参考图5，其3D纹理数据可以包含2个纹素（纹素1和纹素2）的深度为2的纹理数据，其中纹素1和纹素2可以在深度方向由下至上依次堆叠设置，而每个纹素可以包括RGBA四通道，具体来说，纹素1包含标号为“1、2、3、4”的方块对应的四个通道，纹素2包含标号为“5、6、7、8”的方块对应的四个通道。
在本申请的一些实施例中,将输入数据采用第一纹理存储结构进行存储,得到第一纹理数据,包括:
获取输入数据的像素宽度、像素高度以及通道数;
基于输入数据的像素宽度、像素高度、通道数以及第一纹理存储结构的数据存储方式(即,索引的维度数量),确定第一纹理存储结构的第一纹理存储结构参数(即,至少两个维度上的尺寸和每个纹素的通道数);
将输入数据根据第一纹理存储结构参数进行存储,得到第一纹理数据。
其中，若输入数据为图片数据，根据其像素宽度、像素高度以及通道数可以确定出其数据量；若输入数据不为图片数据，也可将其类比为图片数据，并根据类比得到的像素宽度、像素高度以及通道数确定出其数据量。例如，通道数为1的输入数据为一维列数据(A1,A2,A3,A4,A5)，可以类比为像素宽为1、像素高为5、通道数为1的图片数据，显然其数据量为像素宽、像素高以及通道数的乘积，即5。可以理解的是，输入数据的通道数与后文中数据处理层的输入通道数一般是相等的。
一些实施例中，在对输入数据进行存储得到第一纹理数据的过程中，首先可以确定用于存储输入数据的第一纹理存储结构的第一纹理存储结构参数种类。第一纹理存储结构的数据存储方式（例如2D纹理存储方式或3D纹理存储方式）不同，对应的第一纹理存储结构参数的种类也不同，故需要根据第一纹理存储结构的数据存储方式确定第一纹理存储结构参数的种类。在确定出第一纹理存储结构参数的种类后，第一纹理存储结构参数的大小决定了对应的第一纹理存储结构能够存储的数据量，那么可根据所需存储的输入数据的数据量，确定出各第一纹理存储结构参数的大小（即保证输入数据中的每个数据都能存储在第一纹理存储结构中）。确定出第一纹理存储结构参数的种类和大小即确定出了第一纹理存储结构，只需将输入数据存入第一纹理存储结构的相应纹素中的相应通道中，即得到第一纹理数据。
在本申请的一些实施例中，若第一纹理存储结构为3D纹理存储结构，则第一纹理存储结构参数包括纹理存储结构的高度、宽度和深度；
若第一纹理存储结构为2D纹理存储结构,则第一纹理存储结构参数包括纹理存储结构的高度和宽度。
可以理解的是,纹理存储结构数据存储方式不同,其对应的纹理存储结构参数的种类也不相同。
在本申请的一些实施例中，第一纹理存储结构为3D纹理存储结构，第一纹理存储结构参数包括纹理存储结构的高度、宽度和深度；
基于输入数据的像素宽度、像素高度、通道数以及第一纹理存储结构的数据存储方式,确定纹理存储结构的第一纹理存储结构参数,包括:
将输入数据的像素宽度作为第一纹理存储结构的宽度,将输入数据的像素高度作为第一纹理存储结构的高度,并基于输入数据的通道数和第一纹理存储结构中每个纹素的通道数,确定第一纹理存储结构的深度。
其中,由前文描述可知,若第一纹理存储结构为3D纹理存储结构,无论是RGB三通道3D纹理存储结构,还是RGBA四通道3D纹理存储结构,对应的第一纹理存储结构参数包括高度、宽度和深度。在对输入数据进行存储前,要根据输入数据的数据量和纹理存储结构的数据存储方式确定纹理存储结构的高度、宽度和深度的值。
一些实施例中，要实现对输入数据的存储，只要保证第一纹理存储结构的数据存储量不小于输入数据的数据量即可。由前文描述可知，输入数据的数据量为其像素宽度、像素高度和通道数的乘积，第一纹理存储结构的数据存储量为其宽度、高度、深度和第一纹理存储结构中每个纹素的通道数的乘积。为了便于各输入数据纹理位置的确定，可以将输入数据的像素宽度作为第一纹理存储结构的宽度，将输入数据的像素高度作为第一纹理存储结构的高度，那么第一纹理存储结构的数据存储量能否不小于输入数据的数据量，取决于输入数据的通道数、第一纹理存储结构的深度以及其中每个纹素的通道数。所以，无论第一纹理存储结构为RGB三通道3D纹理存储结构，还是RGBA四通道3D纹理存储结构，都可以基于输入数据的通道数和第一纹理存储结构中每个纹素的通道数，确定第一纹理存储结构的深度。
在本申请的一些实施例中，基于输入数据的通道数和第一纹理存储结构中每个纹素的通道数，确定第一纹理存储结构的深度，包括：
通过如下表达式确定第一纹理存储结构的深度：
d₁ = ⌊(c₁ + s₁ - 1) / s₁⌋
其中，d₁为第一纹理存储结构的深度，c₁为输入数据的通道数，s₁为第一纹理存储结构中每个纹素的通道数，⌊ ⌋为向下取整运算符号。
一些实施例中，若输入数据的像素高为w1，像素宽度为b1，通道数为c1，将输入数据的像素高度w1作为第一纹理存储结构的高度，将输入数据的像素宽度b1作为第一纹理存储结构的宽度。若第一纹理存储结构为RGBA四通道3D纹理存储结构，那么在该第一纹理存储结构中，深度1对应的结构的数据存储量为w1*b1*4。而输入数据的数据量为w1*b1*c1，要想将所有输入数据存入该第一纹理存储结构中，该纹理存储结构的深度由输入数据的通道数c1确定。当c1小于等于4时，该第一纹理存储结构深度至少为1可满足数据量要求。当c1大于4小于等于8时，该第一纹理存储结构深度至少为2可满足数据量要求。当c1大于8小于等于12时，该第一纹理存储结构深度至少为3可满足数据量要求，以此类推，即可确定出该第一纹理存储结构的深度值。换言之，在此情形下，为了使得该第一纹理存储结构中存储空间利用率尽可能的高，可以通过如下表达式确定第一纹理存储结构的深度：
d₁ = ⌊(c₁ + 3) / 4⌋
其中，d₁为第一纹理存储结构的深度，c₁为输入数据的通道数。
若第一纹理存储结构为RGB三通道纹理存储结构，那么在该第一纹理存储结构中，深度1对应的结构的数据存储量为w1*b1*3，而输入数据的数据量为w1*b1*c1，要想将所有输入数据存入该第一纹理存储结构中，该纹理存储结构的深度由输入数据的通道数c1确定。当c1小于等于3时，该第一纹理存储结构深度至少为1可满足数据量要求。当c1大于3小于等于6时，该第一纹理存储结构深度至少为2可满足数据量要求。当c1大于6小于等于9时，该第一纹理存储结构深度至少为3可满足数据量要求。以此类推，即可确定出第一纹理存储结构的深度值。换言之，在此情形下，为了使得该第一纹理存储结构中存储空间利用率尽可能的高，可以通过如下表达式确定第一纹理存储结构的深度：
d₁ = ⌊(c₁ + 2) / 3⌋
其中，d₁为第一纹理存储结构的深度，c₁为输入数据的通道数。
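上述RGBA四通道与RGB三通道两种情形，可以统一为“深度等于输入通道数除以纹素通道数后向上取整”，与文中的向下取整表达式等价。下面的Python片段是一个最小示意（函数名为说明而设，非原文实现）：

```python
def texture_depth(input_channels, texel_channels):
    """d = ⌊(c + s - 1) / s⌋，等价于ceil(c / s)：保证纹理可容纳全部通道的数据。"""
    return (input_channels + texel_channels - 1) // texel_channels

# RGBA四通道（s=4）：c<=4时深度为1，4<c<=8时深度为2，8<c<=12时深度为3
assert texture_depth(4, 4) == 1
assert texture_depth(5, 4) == 2
assert texture_depth(12, 4) == 3
# RGB三通道（s=3）：c<=3时深度为1，3<c<=6时深度为2
assert texture_depth(3, 3) == 1
assert texture_depth(6, 3) == 2
```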
举例来说,如图4所示,某数据处理层输入数据像素宽度为2,像素高度为2,通道数为4,那么该数据处理层的数据量为16(分别为图中所示1,2,3,…,16),可以采用宽度为2,高度为2,深度为1的RGBA四通道3D纹理存储结构对输入数据进行存储。如图所示,其存储过程可以理解为将该数据处理层的16个输入参数重新排列到宽度为2,高度为2,深度为1的RGBA四通道3D纹理存储结构中。
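图4所示的重新排列过程可以用如下Python代码示意。其中“同一像素位置的各通道按每4个一组打包进同一纹素”的布局是本示意的假设，具体排列顺序以原文附图为准：

```python
def pack_input(data, width, height, channels, texel_channels=4):
    """把data[c][y][x]排列到纹理texture[d][y][x]的第r个通道，
    其中 d = c // texel_channels，r = c % texel_channels。"""
    depth = (channels + texel_channels - 1) // texel_channels
    tex = [[[[0.0] * texel_channels for _ in range(width)]
            for _ in range(height)] for _ in range(depth)]
    for c in range(channels):
        d, r = divmod(c, texel_channels)
        for y in range(height):
            for x in range(width):
                tex[d][y][x][r] = data[c][y][x]
    return tex

# 像素宽2、高2、通道数4的输入数据，共16个数（1~16）
data = [[[c * 4 + y * 2 + x + 1 for x in range(2)]
         for y in range(2)] for c in range(4)]
tex = pack_input(data, width=2, height=2, channels=4)  # 得到宽2、高2、深1的RGBA纹理
```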
可以理解的是，本申请实施例中在确定出第一纹理存储结构参数的种类后，第一纹理存储结构参数的大小的确定方式不限于前文所述方式，只要确定出的第一纹理存储结构参数对应的第一纹理存储结构的数据存储量不小于所要存储的输入数据的数据量即可。例如，在基于输入数据的像素宽度、像素高度、通道数以及第一纹理存储结构的数据存储方式，确定第一纹理存储结构的第一纹理存储结构参数时，也可以将输入数据的像素宽度作为纹理存储结构的高度，将输入数据的像素高度作为纹理存储结构的宽度；同时，在确定出第一纹理存储结构的宽度和高度后，确定出的深度值也可以取大于通过上述表达式计算得到的深度值。
在本申请的一些实施例中，将权重数据采用第二纹理存储结构进行存储，得到第二纹理数据，包括：
获取与权重数据的数据量相关的参数数据;
基于与权重数据的数据量相关的参数数据以及第二纹理存储结构的数据存储方式（即，索引的维度数量），确定第二纹理存储结构的第二纹理存储结构参数，也即，确定使所述第二纹理存储结构所存储的数据量不小于所述权重数据的数据量的所述第二纹理存储结构的至少两个维度上的尺寸和每个纹素的通道数；
将权重数据根据第二纹理存储结构参数进行存储，也即，将所述权重数据中的数据元素依次存入所述第二纹理存储结构各纹素的各通道，得到第二纹理数据。
其中，可将权重数据类比为图片数据，并根据类比得到其像素宽度、像素高度以及通道数，该通道数可分为数据处理层的输入通道数和输出通道数。需要说明的是，不同数据处理层的决定其数据量的参数数据可能不同，例如，卷积层的权重数据的数据量与其像素宽度和像素高度、以及该卷积层的输入通道数和输出通道数都相关，而缩放层的权重数据的数据量只与其像素宽度和像素高度、以及该缩放层的输入通道数或输出通道数相关。举例来说，卷积神经网络中的卷积层，其输入通道数为3，卷积核的像素宽度为3，卷积核的像素高度为3，输出通道数为2，那么该卷积层的权重数据的数据量为像素宽度、像素高度、输入通道数和输出通道数的乘积，即54。
一些实施例中，在对权重数据进行存储得到第二纹理数据的过程中，首先可以确定用于存储权重数据的第二纹理存储结构的第二纹理存储结构参数种类。第二纹理存储结构的数据存储方式（例如2D纹理存储方式或3D纹理存储方式）不同，对应的第二纹理存储结构参数的种类也不同，故需要根据第二纹理存储结构的数据存储方式确定第二纹理存储结构参数的种类。在确定出第二纹理存储结构参数的种类后，第二纹理存储结构参数的大小决定了对应的第二纹理存储结构能够存储的数据量，那么可根据所需存储的权重数据的数据量，确定出各第二纹理存储结构参数的大小（即保证权重数据中的每个数据都能存储在第二纹理存储结构中）。确定出第二纹理存储结构参数的种类和大小即确定出了第二纹理存储结构，只需将权重数据存入第二纹理存储结构的相应纹素中的相应通道中，即得到第二纹理数据。
在本申请的一些实施例中，若第二纹理存储结构为3D纹理存储结构，则第二纹理存储结构参数包括纹理存储结构的高度、宽度和深度；
若第二纹理存储结构为2D纹理存储结构,则第二纹理存储结构参数包括纹理存储结构的高度和宽度。
可以理解的是,纹理存储结构数据存储方式不同,其对应的纹理存储结构参数的种类也不相同。
在本申请的一些实施例中，第二纹理存储结构为3D纹理存储结构，第二纹理存储结构参数包括纹理存储结构的高度、宽度和深度；
若与权重数据的数据量相关的参数数据包括权重数据的像素宽度和像素高度、以及数据处理层的输入通道数和输出通道数,基于与权重数据的数据量相关的参数数据以及第二纹理存储结构的数据存储方式,确定第二纹理存储结构的第二纹理存储结构参数,包括:
基于数据处理层的输入通道数和输出通道数中的一个通道数、以及权重数据的像素宽度和像素高度,确定第二纹理存储结构的高度和宽度;
基于数据处理层的输入通道数和输出通道数中的另一个通道数、以及第二纹理存储结构中每个纹素的通道数,确定第二纹理存储结构的深度。
可以理解的是，若与权重数据的数据量相关的参数数据包括权重数据的像素宽度和像素高度、以及数据处理层的输入通道数和输出通道数，可以理解为该数据处理层的权重数据的维数为四维。在将权重数据的维数从四维降维为与输入数据相同的三维后，该数据处理层的权重数据对应的第二纹理存储结构参数的确定过程，也可以采用类似于前文中所描述的输入数据对应的第一纹理存储结构的纹理存储结构参数的确定方式。
一些实施例中,在对权重参数进行降维后,与权重参数的数据量相关的输入通道数和输出通道数中的一个通道数将合并至其他数据,而另一个通道数则作为与输入数据的通道数相对应的参数数据,然后即可采用前文中确定第一纹理存储结构参数的方式确定第二纹理存储结构参数。
在本申请的一些实施例中，基于数据处理层的输入通道数和输出通道数中的一个通道数、以及权重数据的像素宽度和像素高度，确定第二纹理存储结构的高度和宽度，包括：
将权重数据的像素宽度作为第二纹理存储结构的宽度;
将权重数据的像素高度与一个通道数的乘积作为第二纹理存储结构的高度。
一些实施例中，采用将一个通道数合并至像素高度的方式对权重参数进行降维，换言之，将权重数据的像素高度与一个通道数的乘积作为降维后的权重参数的像素高度。具体来说，可以是将像素高度与输入通道数的乘积作为降维后的权重参数的像素高度，或者是将像素高度与输出通道数的乘积作为降维后的权重参数的像素高度。类似于前文确定第一纹理存储结构参数的方式，将降维后的权重数据的像素高度作为第二纹理存储结构的高度，将降维后的权重数据的像素宽度作为第二纹理存储结构的宽度。
可以理解的是,在实际应用中,权重数据的降维方式还可以包括以下方式:将像素宽度与输出通道数的乘积作为降维后的权重参数的像素宽度,将像素宽度与输入通道数的乘积作为降维后的权重参数的像素宽度。
基于数据处理层的输入通道数和输出通道数中的另一个通道数、以及第二纹理存储结构中每个纹素的通道数,确定第二纹理存储结构的深度,包括:
通过如下表达式确定第二纹理存储结构的深度：
d₂ = ⌊(c₂ + s₂ - 1) / s₂⌋
其中，d₂为第二纹理存储结构的深度，c₂为另一通道数，s₂为第二纹理存储结构中每个纹素的通道数，⌊ ⌋为向下取整运算符号。
一些实施例中，在对权重参数进行降维处理后，无论第二纹理存储结构为RGB三通道3D纹理存储结构，还是RGBA四通道3D纹理存储结构，都可以采用类似于前文描述的第一纹理存储结构深度的确定方式来确定第二纹理存储结构的深度。在此过程中，权重参数降维时没有合并至其他参数的另一通道数对应于输入数据的通道数。
举例来说，对于卷积神经网络中的一个卷积层，若该卷积层的组数（group）为1，则卷积的权重数据的数据量为input_channel*output_channel*kernel_w*kernel_h，其中input_channel为卷积层的输入通道数，output_channel为卷积层的输出通道数，kernel_w为卷积核的像素宽度，kernel_h为卷积核的像素高度。将该卷积层的权重数据排列到RGBA四通道纹理中时，可以将该RGBA四通道纹理的纹理存储参数中宽度设置为kernel_w，高度设置为kernel_h*output_channel（即将输出通道对应的权重参数增加到像素高度方向以实现降维），深度设置为(input_channel+3)/4的值向下取整。
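按上述规则计算卷积层权重纹理参数的过程，可以用如下Python片段示意（函数名为说明而设，非原文实现）：

```python
def conv_weight_texture_shape(input_channel, output_channel,
                              kernel_w, kernel_h, texel_channels=4):
    """返回(宽, 高, 深)：输出通道合并到高度方向实现降维，输入通道决定深度。"""
    width = kernel_w
    height = kernel_h * output_channel
    depth = (input_channel + texel_channels - 1) // texel_channels
    return width, height, depth

# 输入通道3、输出通道2、3x3卷积核：权重数据量为3*2*3*3=54
w, h, d = conv_weight_texture_shape(3, 2, 3, 3)
assert (w, h, d) == (3, 6, 1)
assert w * h * d * 4 >= 3 * 2 * 3 * 3   # 纹理容量不小于权重数据量
```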
在本申请的一些实施例中，第二纹理存储结构为3D纹理存储结构，第二纹理存储结构参数包括纹理存储结构的高度、宽度和深度；
若与权重数据的数据量相关的参数数据包括权重数据的像素宽度和像素高度、以及数据处理层的输入通道数或输出通道数,基于与权重数据的数据量相关的参数数据以及第二纹理存储结构的数据存储方式,确定第二纹理存储结构的第二纹理存储结构参数,包括:
将权重数据的像素宽度作为第二纹理存储结构的宽度,将权重数据的像素高度作为第二纹理存储结构的高度,并基于数据处理层的输入通道数或输出通道数,以及第二纹理存储结构中每个纹素的通道数,确定第二纹理存储结构的深度。
在本申请的一些实施例中，基于数据处理层的输入通道数或输出通道数，以及第二纹理存储结构中每个纹素的通道数，确定第二纹理存储结构的深度，包括：
通过如下表达式确定第二纹理存储结构的深度：
d₃ = ⌊(c₃ + s₃ - 1) / s₃⌋
其中，d₃为第二纹理存储结构的深度，c₃为输入通道数或输出通道数，s₃为第二纹理存储结构中每个纹素的通道数，⌊ ⌋为向下取整运算符号。
一些实施例中，若与权重数据的数据量相关的参数数据包括权重数据的像素宽度和像素高度、以及数据处理层的输入通道数或输出通道数，则可以将此情形下的权重数据类比为输入数据，并采用第一纹理存储结构参数的确定方式确定该情形下的第二纹理存储结构参数，其中输入数据的通道数对应于输入通道数或者输出通道数。
举例来说，神经网络中的缩放层，由于其输入通道数input_channel和输出通道数output_channel相等，其权重数据的数据量为input_channel（或output_channel）*1*1。将该缩放层的权重数据排列到RGBA四通道纹理中时，可以将该RGBA四通道纹理的纹理存储参数中宽度设置为1，高度设置为1，深度设置为(input_channel+3)/4的值向下取整，或者(output_channel+3)/4的值向下取整。如图5所示，将输入通道数input_channel为8的缩放层的权重参数（1，2，3，4，5，6，7，8）存储在高度为1、宽度为1、深度为2的RGBA四通道3D纹理存储结构中。
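图5所示的存储过程可以用如下Python片段示意：8个权重按每4个一组打包为深度方向上的两个RGBA纹素（打包函数为示意性假设）：

```python
def pack_scale_weights(weights, texel_channels=4):
    """把一维权重按每texel_channels个一组，打包为深度方向的纹素序列，不足补0。"""
    depth = (len(weights) + texel_channels - 1) // texel_channels
    texels = []
    for d in range(depth):
        chunk = list(weights[d * texel_channels:(d + 1) * texel_channels])
        texels.append(chunk + [0.0] * (texel_channels - len(chunk)))
    return texels  # texels[d]即深度d处、坐标(0,0)的纹素

texels = pack_scale_weights([1, 2, 3, 4, 5, 6, 7, 8])  # 深度为2：[1,2,3,4]与[5,6,7,8]
```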
在本申请的一些实施例中，利用GPU中的至少一个计算单元进行数据处理层的数据处理；基于第一纹理数据和第二纹理数据进行数据处理层的数据处理，得到数据处理层的输出数据，包括：
利用各计算单元读取第一纹理数据中一个第一纹理位置存储的输入数据,并利用该计算单元读取第二纹理数据中对应于该第一纹理位置的第二纹理位置存储的权重数据;
利用各计算单元对读取到的输入数据和对应的权重数据进行数据处理,得到数据处理层的输出数据。
其中,纹理位置在实际应用中可以用纹理坐标来确定。
其中,由于每个输入数据与对应的权重数据有对应关系,在存储输入数据和权重数据时,可以根据输入数据与权重数据之间的对应关系,对第一纹理数据的纹理坐标和第二纹理数据的纹理坐标进行关联存储。可以理解的是,为了便于输入数据和权重数据的对应,第一纹理数据和第二纹理数据对应的纹理存储结构可以采用相同的数据存储方式,例如都采用RGBA四通道纹理。
一些实施例中,GPU中的每个计算单元每次读取纹理存储结构中的一个纹素里所有数据,那么在一个计算单元读取第一纹理数据中的一个纹素里的数据后,该计算单元会根据关联存储的纹理坐标(也即索引)从第二纹理中读取具有对应关系的权重数据,并进行计算得到对应的输出数据。可以理解的是,每个输入数据存储在一个第一纹理数据的一个纹素的一个通道中,那么通过该纹素在第一纹理中的纹理坐标和该纹素中的通道位置可以定位该输入数据,同理也可以在第二纹理中根据纹理坐标和通道位置定位任一权重数据。
举例来说，对于一个缩放层，其对应的第一纹理数据和第二纹理数据的存储方式都为RGBA四通道3D纹理，第一纹理数据中坐标(0,0,0)的纹素对应的四个输入数据与第二纹理数据中坐标(0,0,0)的纹素对应的四个权重数据关联存储。前向计算过程中，GPU的一个计算单元读取第一纹理数据中位置为(0,0,0)的一个纹素点，该纹素点包含R、G、B、A四个通道中的四个数据，同时该计算单元也读取第二纹理数据中位置为(0,0,0)的一个纹素点中的四个数据，按照对应的通道对应关系进行前向计算，即输入数据乘以对应的缩放权值再加上对应的偏移权值后，得出四个输出结果。进一步地，还可以把四个输出结果保存到位置(0,0,0)的输出RGBA四通道纹理中。
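上述缩放层中单个纹素的前向计算（各通道分别为：输入乘以缩放权值再加上偏移权值）可以用如下Python片段示意（缩放权值与偏移权值分别取自两个纹素为本示意的假设）：

```python
def scale_layer_texel(inputs, scales, biases):
    """对一个纹素的各通道逐元素计算：out[i] = in[i] * scale[i] + bias[i]。"""
    return [i * s + b for i, s, b in zip(inputs, scales, biases)]

# 一个RGBA纹素的四个输入通道，分别乘以对应缩放权值再加上对应偏移权值
out = scale_layer_texel([1.0, 2.0, 3.0, 4.0],
                        [2.0, 2.0, 2.0, 2.0],
                        [0.5, 0.5, 0.5, 0.5])
# 得到的四个输出可再写回输出纹理中坐标(0,0,0)的纹素
```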
需要说明的是，本申请实施例的技术效果主要体现在神经网络前向计算效率上，且所采用的纹理存储结构的通道数越多，其计算效率越高。对于相同的神经网络模型，采用RGBA四通道纹理的方案比采用R单通道纹理的方案，前向计算的时间可减少百分之三十到五十。表1为运行一个人像分割模型采用单通道纹理和RGBA四通道纹理作为存储结构在不同的显卡上的计算时间，可以看到采用RGBA四通道纹理的方案的时间显著缩短。
表1（表中具体数值以附图形式给出）
本申请实施例主要应用在神经网络前向计算库中，由于神经网络前向计算库提供了所有神经网络算法的计算能力，本申请实施例的应用场景与使用神经网络前向计算库的应用场景相同。应用范围主要可以包括AI（Artificial Intelligence，人工智能）算法相关应用，如人像分割算法、手势识别、图像显著性识别等。各应用场景列举如下。
(1)背景虚化，如图6所示，为满足音视频通话中剔除背景的需求，通过神经网络的前向计算识别每帧图像的人像区域，然后将人像以外的背景区域进行模糊处理，从而避免背景进入视频画面。在该场景下，由于通过神经网络的前向计算进行人像区域识别过程中，采用了本申请实施例提供的前向计算方法，即采用RGBA四通道纹理存储结构（或RGB三通道纹理存储结构）对神经网络中的输入数据和权重数据进行了存储，使得前向计算过程效率提高，加快了背景虚化的速度。
(2)手势识别,如图7所示,首先通过神经网络的前向计算将图片中的包含人手的区域(图中矩形框内的区域)分割出来,然后识别分割区域内的手势,图中为识别手势“V”的过程。在该场景下,由于通过神经网络的前向计算进行手势识别过程中,采用了本申请实施例提供的前向计算方法,即采用RGBA四通道纹理存储结构(或RGB三通道纹理存储结构)对神经网络中的输入数据和权重数据进行了存储,使得前向计算过程效率提高,加快了手势识别的速度。
(3)图像显著性识别，如图8所示，图片中的车和小孩为人眼看图片时重点关注的内容，通过神经网络的前向计算可以将这些显著性高的内容标注出来，如图中两个矩形框内所示，为标注出的汽车和小孩。在该场景下，由于通过神经网络的前向计算进行图像显著性识别过程中，采用了本申请实施例提供的前向计算方法，即采用RGBA四通道纹理存储结构（或RGB三通道纹理存储结构）对神经网络中的输入数据和权重数据进行了存储，使得前向计算过程效率提高，加快了图像显著性识别的速度。
本申请提供的神经网络的前向计算方法能够显著提高上述三个应用场景中各功能的实现效率。
图9为本申请实施例提供的一种神经网络的前向计算装置的结构框图,如图9所示,该装置900用于对神经网络中的至少一个数据处理层进行数据处理,该装置900可以包括:数据获取模块901、数据存储模块902以及数据处理模块903。
数据获取模块901用于获取数据处理层的输入数据和权重数据;
数据存储模块902用于将输入数据采用第一纹理存储结构进行存储，得到第一纹理数据，将权重数据采用第二纹理存储结构进行存储，得到第二纹理数据；
数据处理模块903用于基于第一纹理数据和第二纹理数据进行数据处理层的数据处理,得到数据处理层的输出数据。
本申请实施例提供的方案,对于神经网络的至少一个数据处理层,将输入数据和权重数据分别存储在对应的纹理存储结构中,由于纹理存储结构索引简单方便,且数据存储量大,节省了数据处理层在进行数据处理过程中读取和存储数据的时间,使得神经网络的前向计算效率大大提高。
一些实施例中,第一纹理存储结构或第二纹理存储结构为以下任一种:
红、绿、蓝、透明RGBA四通道三维3D纹理存储结构;
RGB三通道3D纹理存储结构;
RGBA四通道2D纹理存储结构;
RGB三通道2D纹理存储结构。
一些实施例中,第一纹理数据和第二纹理数据中分别包含有至少一个纹素,且每个纹素的通道数为三或四,至少一个纹素由对应的输入数据或权重数据按照通道数依序排列而成。
一些实施例中,数据存储模块可以用于:
获取输入数据的像素宽度、像素高度以及通道数;
基于输入数据的像素宽度、像素高度、通道数以及第一纹理存储结构的数据存储方式,确定第一纹理存储结构的第一纹理存储结构参数;
将输入数据根据第一纹理存储结构参数进行存储,得到第一纹理数据。
一些实施例中,第一纹理存储结构为3D纹理存储结构,第一纹理存储结构参数包括纹理存储结构的高度、宽度和深度;数据存储模块可以用于:
将输入数据的像素宽度作为第一纹理存储结构的宽度，将输入数据的像素高度作为第一纹理存储结构的高度，并基于输入数据的通道数和第一纹理存储结构中每个纹素的通道数，确定第一纹理存储结构的深度。
一些实施例中,数据存储模块可以用于:
通过如下表达式确定第一纹理存储结构的深度：
d₁ = ⌊(c₁ + s₁ - 1) / s₁⌋
其中，d₁为第一纹理存储结构的深度，c₁为输入数据的通道数，s₁为第一纹理存储结构中每个纹素的通道数，⌊ ⌋为向下取整运算符号。
一些实施例中,数据存储模块可以用于:
获取与权重数据的数据量相关的参数数据;
基于与权重数据的数据量相关的参数数据以及第二纹理存储结构的数据存储方式,确定第二纹理存储结构的第二纹理存储结构参数;
将权重数据根据第二纹理存储结构参数进行存储,得到第二纹理数据。
一些实施例中,第二纹理存储结构为3D纹理存储结构,第二纹理存储结构参数包括纹理存储结构的高度、宽度和深度;
若与权重数据的数据量相关的参数数据包括权重数据的像素宽度和像素高度、以及数据处理层的输入通道数和输出通道数,数据存储模块可以用于:
基于数据处理层的输入通道数和输出通道数中的一个通道数、以及权重数据的像素宽度和像素高度,确定第二纹理存储结构的高度和宽度;
基于数据处理层的输入通道数和输出通道数中的另一个通道数、以及第二纹理存储结构中每个纹素的通道数,确定第二纹理存储结构的深度。
一些实施例中,数据存储模块可以用于:
将权重数据的像素宽度作为第二纹理存储结构的宽度;
将权重数据的像素高度与一个通道数的乘积作为第二纹理存储结构的高度;
基于数据处理层的输入通道数和输出通道数中的另一个通道数、以及第二纹理存储结构中每个纹素的通道数,确定第二纹理存储结构的深度,包括:
通过如下表达式确定第二纹理存储结构的深度：
d₂ = ⌊(c₂ + s₂ - 1) / s₂⌋
其中，d₂为第二纹理存储结构的深度，c₂为另一通道数，s₂为第二纹理存储结构中每个纹素的通道数，⌊ ⌋为向下取整运算符号。
一些实施例中,第二纹理存储结构为3D纹理存储结构,第二纹理存储结构参数包括纹理存储结构的高度、宽度和深度;
若与权重数据的数据量相关的参数数据包括权重数据的像素宽度和像素高度、以及数据处理层的输入通道数或输出通道数,数据存储模块可以用于:
将权重数据的像素宽度作为第二纹理存储结构的宽度,将权重数据的像素高度作为第二纹理存储结构的高度,并基于数据处理层的输入通道数或输出通道数,以及第二纹理存储结构中每个纹素的通道数,确定第二纹理存储结构的深度。
一些实施例中,数据存储模块可以用于:
通过如下表达式确定第二纹理存储结构的深度：
d₃ = ⌊(c₃ + s₃ - 1) / s₃⌋
其中，d₃为第二纹理存储结构的深度，c₃为输入通道数或输出通道数，s₃为第二纹理存储结构中每个纹素的通道数，⌊ ⌋为向下取整运算符号。
一些实施例中,利用GPU中的至少一个计算单元进行数据处理层的数据处理;数据处理模块可以用于:
利用各计算单元读取第一纹理数据中一个第一纹理位置存储的输入数据,并利用该计算单元读取第二纹理数据中对应于该第一纹理位置的第二纹理位置存储的权重数据;
利用各计算单元对读取到的输入数据和对应的权重数据进行数据处理,得到数据处理层的输出数据。
基于相同的原理,本申请实施例还提供了一种电子设备,该电子设备包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,处理器执行该计算机程序时,实现本申请任一实施例中所提供的方法,具体可实现如下情况:
神经网络中的至少一个数据处理层进行数据处理的方法包括:获取数据处理层的输入数据和权重数据;将输入数据采用第一纹理存储结构进行存储,得到第一纹理数据,将权重数据采用第二纹理存储结构进行存储,得到第二纹理数据;基于第一纹理数据和第二纹理数据进行数据处理层的数据处理,得到数据处理层的输出数据。
本申请实施例提供了一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该程序被处理器执行时实现本申请任一实施例所示的方法。
可以理解的是,介质中存储的可以是神经网络的前向计算方法对应的计算机程序。
图10中示出了本申请实施例所适用的一种电子设备的结构示意图,如图10所示,图10所示的电子设备1000包括:处理器1001和存储器1003。其中,处理器1001和存储器1003相连,如通过总线1002相连。进一步地,电子设备1000还可以包括收发器1004,电子设备1000可以通过收发器1004与其他电子设备进行数据的交互。需要说明的是,实际应用中收发器1004不限于一个,该电子设备1000的结构并不构成对本申请实施例的限定。
其中,处理器1001应用于本申请实施例中,可以用于实现图9所示的数据获取模块、数据存储模块以及数据处理模块的功能。
处理器1001可以是CPU,通用处理器,DSP,ASIC,FPGA或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本申请公开内容所描述的各种示例性的逻辑方框,模块和电路。处理器1001也可以是实现计算功能的组合,例如包含一个或多个微处理器组合,DSP和微处理器的组合等。
总线1002可包括一通路,在上述组件之间传送信息。总线1002可以是PCI总线或EISA总线等。总线1002可以分为地址总线、数据总线、控制总线等。为便于表示,图10中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
存储器1003可以是ROM或可存储静态信息和指令的其他类型的静态存储设备，RAM或者可存储信息和指令的其他类型的动态存储设备，也可以是EEPROM、CD-ROM或其他光盘存储、光碟存储（包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等）、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质，但不限于此。
存储器1003用于存储执行本申请方案的应用程序代码,并由处理器1001来控制执行。处理器1001用于执行存储器1003中存储的应用程序代码,以实现图9所示实施例提供的神经网络的前向计算装置的动作。
应该理解的是,虽然附图的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,其可以以其他的顺序执行。而且,附图的流程图中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,其执行顺序也不必然是依次进行,而是可以与其他步骤或者其他步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。
以上仅是本申请的部分实施方式,应当指出,对于本技术领域的普通技术人员来说,在不脱离本申请原理的前提下,还可以做出若干改进和润饰,这些改进和润饰也应视为本申请的保护范围。

Claims (19)

  1. 一种神经网络的前向计算方法,由一计算设备执行,所述神经网络中的至少一个数据处理层进行数据处理的方法包括:
    获取所述数据处理层的输入数据和权重数据;
    将所述输入数据采用第一纹理存储结构进行存储,得到第一纹理数据,将所述权重数据采用第二纹理存储结构进行存储,得到第二纹理数据,其中,所述输入数据中的多个数据元素在所述第一纹理数据中对应相同的索引,所述权重数据中的多个数据元素在所述第二纹理数据中对应相同的索引,所述计算设备以所述索引为单位,在所述第一纹理数据和所述第二纹理数据中存取数据;
    基于所述第一纹理数据和所述第二纹理数据进行所述数据处理层的数据处理,得到所述数据处理层的输出数据。
  2. 根据权利要求1所述的方法,其中,所述第一纹理数据包括在至少两个维度上索引的至少一个纹素,且每个纹素具有复数个通道,每个通道用于存储所述输入数据中的一个数据元素,
    将所述输入数据采用第一纹理存储结构进行存储,得到第一纹理数据包括:将所述输入数据中的数据元素依次存入各纹素的各通道。
  3. 根据权利要求2所述的方法,进一步包括:
    获取所述输入数据的像素宽度、像素高度以及通道数;
    基于所述输入数据的像素宽度、像素高度、通道数以及所述第一纹理存储结构的索引的维度数量,确定所述第一纹理存储结构的所述至少两个维度上的尺寸和每个纹素的通道数,其中,所确定的各个维度上的尺寸和所述通道数的乘积不小于所述输入数据的像素宽度、像素高度以及通道数的乘积。
  4. 根据权利要求3所述的方法,其中,所述第一纹理存储结构为3D纹理存储结构,所述至少两个维度包括高度、宽度和深度;
    确定所述第一纹理存储结构的所述至少两个维度上的尺寸和每个纹素的通道数,包括:
    将所述输入数据的像素宽度作为所述第一纹理存储结构的宽度,将所述输入数据的像素高度作为所述第一纹理存储结构的高度;以及
    基于所述输入数据的通道数和所述第一纹理存储结构中每个纹素的通道数,确定所述第一纹理存储结构的深度。
  5. 根据权利要求4所述的方法,其中,所述基于所述输入数据的通道数和所述第一纹理存储结构中每个纹素的通道数,确定所述第一纹理存储结构的深度,包括:
    通过如下表达式确定所述第一纹理存储结构的深度：
    d₁ = ⌊(c₁ + s₁ - 1) / s₁⌋
    其中，d₁为所述第一纹理存储结构的深度，c₁为所述输入数据的通道数，s₁为所述第一纹理存储结构中每个纹素的通道数，⌊ ⌋为向下取整运算符号。
  6. 根据权利要求1所述的方法,其中,所述将所述权重数据采用第二纹理存储结构进行存储,得到第二纹理数据,包括:
    获取与所述权重数据的数据量相关的参数数据;
    基于所述与所述权重数据的数据量相关的参数数据以及所述第二纹理存储结构的索引的维度数量,确定使所述第二纹理存储结构所存储的数据量不小于所述权重数据的数据量的所述第二纹理存储结构的至少两个维度上的尺寸和每个纹素的通道数;
    将所述权重数据中的数据元素依次存入所述第二纹理存储结构各纹素的各通道,得到所述第二纹理数据。
  7. 根据权利要求6所述的方法,其中,所述第二纹理存储结构为3D纹理存储结构,所述第二纹理存储结构的至少两个维度包括纹理存储结构的高度、宽度和深度;
    若所述与所述权重数据的数据量相关的参数数据包括所述权重数据的像素宽度和像素高度、以及所述数据处理层的输入通道数和输出通道数,确定使所述第二纹理存储结构所存储的数据量不小于所述权重数据的数据量的所述第二纹理存储结构的至少两个维度上的尺寸和每个纹素的通道数,包括:
    基于所述数据处理层的输入通道数和输出通道数中的一个通道数、以及所述权重数据的像素宽度和像素高度,确定所述第二纹理存储结构的高度和宽度;
    基于所述数据处理层的输入通道数和输出通道数中的另一个通道数、以及所述第二纹理存储结构中每个纹素的通道数,确定所述第二纹理存储结构的深度。
  8. 根据权利要求7所述的方法,其中,所述基于所述数据处理层的输入通道数和输出通道数中的一个通道数、以及所述权重数据的像素宽度和像素高度,确定所述第二纹理存储结构的高度和宽度,包括:
    将所述权重数据的像素宽度作为所述第二纹理存储结构的宽度;
    将所述权重数据的像素高度与所述一个通道数的乘积作为所述第二纹理存储结构的高度;
    基于所述数据处理层的输入通道数和输出通道数中的另一个通道数、以及所述第二纹理存储结构中每个纹素的通道数,确定所述第二纹理存储结构的深度,包括:
    通过如下表达式确定所述第二纹理存储结构的深度：
    d₂ = ⌊(c₂ + s₂ - 1) / s₂⌋
    其中，d₂为所述第二纹理存储结构的深度，c₂为所述另一通道数，s₂为所述第二纹理存储结构中每个纹素的通道数，⌊ ⌋为向下取整运算符号。
  9. 根据权利要求6所述的方法,其中,所述第二纹理存储结构为3D纹理存储结构,所述第二纹理存储结构的至少两个维度包括纹理存储结构的高度、宽度和深度;
    若所述与所述权重数据的数据量相关的参数数据包括所述权重数据的像素宽度和像素高度、以及所述数据处理层的输入通道数或输出通道数,确定使所述第二纹理存储结构所存储的数据量不小于所述权重数据的数据量的所述第二纹理存储结构的至少两个维度上的尺寸和每个纹素的通道数,包括:
    将所述权重数据的像素宽度作为所述第二纹理存储结构的宽度,将所述权重数据的像素高度作为所述第二纹理存储结构的高度,并基于所述数据处理层的输入通道数或输出通道数,以及所述第二纹理存储结构中每个纹素的通道数,确定所述第二纹理存储结构的深度。
  10. 根据权利要求9所述的方法,其中,所述基于所述数据处理层的输入通道数或输出通道数,以及所述第二纹理存储结构中每个纹素的通道数,确定所述第二纹理存储结构的深度,包括:
    通过如下表达式确定所述第二纹理存储结构的深度：
    d₃ = ⌊(c₃ + s₃ - 1) / s₃⌋
    其中，d₃为所述第二纹理存储结构的深度，c₃为所述输入通道数或所述输出通道数，s₃为所述第二纹理存储结构中每个纹素的通道数，⌊ ⌋为向下取整运算符号。
  11. 根据权利要求1所述的方法,其中,利用GPU中的至少一个计算单元进行所述数据处理层的数据处理;所述基于所述第一纹理数据和所述第二纹理数据进行所述数据处理层的数据处理,得到所述数据处理层的输出数据,包括:
    利用各计算单元读取所述第一纹理数据中一个第一索引存储的输入数据,并利用该计算单元读取所述第二纹理数据中对应于该第一索引对应的第二索引存储的权重数据;
    利用各计算单元对读取到的输入数据和对应的权重数据进行数据处理,得到所述数据处理层的输出数据。
  12. 根据权利要求1所述的方法,其中,所述第一纹理存储结构或所述第二纹理存储结构为以下任一种:
    红、绿、蓝、透明RGBA四通道三维3D纹理存储结构;
    RGB三通道3D纹理存储结构;
    RGBA四通道2D纹理存储结构;
    RGB三通道2D纹理存储结构。
  13. 一种神经网络的前向计算装置,其中,该装置用于对所述神经网络中的至少一个数据处理层进行数据处理,包括:
    数据获取模块,用于获取所述数据处理层的输入数据和权重数据;
    数据存储模块，用于将所述输入数据采用第一纹理存储结构进行存储，得到第一纹理数据，将所述权重数据采用第二纹理存储结构进行存储，得到第二纹理数据，其中，所述输入数据中的多个数据元素在所述第一纹理数据中对应相同的索引，所述权重数据中的多个数据元素在所述第二纹理数据中对应相同的索引，所述计算设备以所述索引为单位，在所述第一纹理数据和所述第二纹理数据中存取数据；
    数据处理模块,用于基于所述第一纹理数据和所述第二纹理数据进行所述数据处理层的数据处理,得到所述数据处理层的输出数据。
  14. 根据权利要求13所述的前向计算装置,其中,所述第一纹理数据包括在至少两个维度上索引的至少一个纹素,且每个纹素具有复数个通道,每个通道用于存储所述输入数据中的一个数据元素,
    所述数据存储模块用于：将所述输入数据中的数据元素依次存入各纹素的各通道。
  15. 根据权利要求13所述的前向计算装置,其中,所述数据存储模块进一步用于:
    获取所述输入数据的像素宽度、像素高度以及通道数;
    基于所述输入数据的像素宽度、像素高度、通道数以及所述第一纹理存储结构的索引的维度数量,确定所述第一纹理存储结构的所述至少两个维度上的尺寸和每个纹素的通道数,其中,所确定的各个维度上的尺寸和所述通道数的乘积不小于所述输入数据的像素宽度、像素高度以及通道数的乘积。
  16. 根据权利要求13所述的前向计算装置,其中,所述数据存储模块用于:
    获取与所述权重数据的数据量相关的参数数据;
    基于所述与所述权重数据的数据量相关的参数数据以及所述第二纹理存储结构的索引的维度数量,确定使所述第二纹理存储结构所存储的数据量不小于所述权重数据的数据量的所述第二纹理存储结构的至少两个维度上的尺寸和每个纹素的通道数;
    将所述权重数据中的数据元素依次存入所述第二纹理存储结构各纹素的各通道,得到所述第二纹理数据。
  17. 根据权利要求13所述的前向计算装置,其中,所述数据处理模块用于:
    利用GPU中的至少一个计算单元读取所述第一纹理数据中一个第一索引存储的输入数据,并利用该计算单元读取所述第二纹理数据中对应于该第一索引的第二索引存储的权重数据;
    利用各计算单元对读取到的输入数据和对应的权重数据进行数据处理,得到所述数据处理层的输出数据。
  18. 一种电子设备,其中,包括存储器和处理器;
    所述存储器中存储有计算机程序;
    所述处理器,用于执行所述计算机程序以实现权利要求1至12中任一项所述的方法。
  19. 一种计算机可读存储介质,其中,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现权利要求1至12中任一项所述的方法。
PCT/CN2020/098799 2019-12-16 2020-06-29 神经网络的前向计算方法、装置及计算机可读存储介质 WO2021120578A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/507,127 US20220044104A1 (en) 2019-12-16 2021-10-21 Method and apparatus for forward computation of neural network, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911294777.2A CN111091188B (zh) 2019-12-16 2019-12-16 神经网络的前向计算方法、装置及计算机可读存储介质
CN201911294777.2 2019-12-16

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/507,127 Continuation US20220044104A1 (en) 2019-12-16 2021-10-21 Method and apparatus for forward computation of neural network, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021120578A1 true WO2021120578A1 (zh) 2021-06-24

Family

ID=70395080

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/098799 WO2021120578A1 (zh) 2019-12-16 2020-06-29 神经网络的前向计算方法、装置及计算机可读存储介质

Country Status (3)

Country Link
US (1) US20220044104A1 (zh)
CN (1) CN111091188B (zh)
WO (1) WO2021120578A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091188B (zh) * 2019-12-16 2022-03-25 腾讯科技(深圳)有限公司 神经网络的前向计算方法、装置及计算机可读存储介质
US11368016B2 (en) 2020-03-18 2022-06-21 Mavagail Technology, LLC ESD protection for integrated circuit devices

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106471545A (zh) * 2014-06-25 2017-03-01 高通股份有限公司 作为图像处理引擎的纹理单元
CN107836001A (zh) * 2015-06-29 2018-03-23 微软技术许可有限责任公司 硬件加速器上的卷积神经网络
US20180239992A1 (en) * 2017-02-22 2018-08-23 Arm Limited Processing artificial neural network weights
CN110555793A (zh) * 2018-06-04 2019-12-10 北京亮亮视野科技有限公司 高效的深度卷积实现方法及包括该方法的视觉处理方法
CN111091188A (zh) * 2019-12-16 2020-05-01 腾讯科技(深圳)有限公司 神经网络的前向计算方法、装置及计算机可读存储介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7916149B1 (en) * 2005-01-04 2011-03-29 Nvidia Corporation Block linear memory ordering of texture data
US9508185B2 (en) * 2011-05-02 2016-11-29 Sony Interactive Entertainment Inc. Texturing in graphics hardware
KR102258100B1 (ko) * 2014-11-18 2021-05-28 삼성전자주식회사 텍스쳐 처리 방법 및 장치
US10055810B2 (en) * 2016-03-04 2018-08-21 Samsung Electronics Co., Ltd. Cache architecture for efficiently accessing texture data using buffers
US10339443B1 (en) * 2017-02-24 2019-07-02 Gopro, Inc. Systems and methods for processing convolutional neural network operations using textures
CN108572593B (zh) * 2018-04-27 2020-12-18 北京源码矩阵科技有限公司 跨平台卷积神经网络控制系统及方法、信息数据处理终端
CN110147880A (zh) * 2019-05-22 2019-08-20 苏州浪潮智能科技有限公司 一种神经网络数据处理结构、方法、系统及相关装置


Also Published As

Publication number Publication date
US20220044104A1 (en) 2022-02-10
CN111091188A (zh) 2020-05-01
CN111091188B (zh) 2022-03-25


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20903531

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20903531

Country of ref document: EP

Kind code of ref document: A1