WO2021114904A1 - Data processing method and apparatus, computer device and storage medium

Info

Publication number: WO2021114904A1
Application number: PCT/CN2020/123836
Authority: WO
Inventors: 刘道福; 黄迪; 周诗怡
Original assignee: 中科寒武纪科技股份有限公司
Priority date: 2019-12-09
Filing date: 2020-10-27
Publication date: 2021-06-17
Also published as: CN113033761A; CN113033761B

Abstract

A data processing method and apparatus, a computer device and a storage medium. The computer device comprises a control module, and the control module comprises: an instruction caching unit, an instruction processing unit and a queue storage unit, wherein the instruction caching unit is used for storing a calculation instruction associated with the computation of an artificial neural network; the instruction processing unit is used for parsing the calculation instruction to obtain a plurality of computation instructions; and the queue storage unit is used for storing an instruction queue, and the instruction queue comprises: a plurality of computation instructions or calculation instructions to be executed in the sequential order of the queue. By means of the data processing method and apparatus, the computer device and the storage medium, the computation efficiency of a related product during the computation of a neural network model is improved.

Description

Data processing method, device, computer equipment and storage medium

This application claims priority to Chinese patent application No. 201911252885.3, entitled "Data processing method, device, computer equipment and storage medium", filed with the Chinese Patent Office on December 9, 2019, the entire contents of which are incorporated herein by reference.

Technical field

The present disclosure relates to the field of data processing technology, and in particular to a data processing method, device, computer equipment, and storage medium.

Background

In the field of artificial intelligence, neural network algorithms have recently become very popular machine learning algorithms and have achieved very good results in fields such as image recognition, speech recognition, and natural language processing. As neural network algorithms develop, their complexity keeps increasing, and the scale of the models is gradually growing in order to improve recognition accuracy.

Summary of the invention

Based on this, the embodiments of the present disclosure provide a data processing method, device, computer equipment, and storage medium that can improve hardware energy efficiency ratio, reduce computing time, and improve computing efficiency.

According to an aspect of the present disclosure, there is provided a data processing method applied to a processor, and the method includes:

Splitting first data according to a preset splitting manner to obtain a plurality of second data;

Performing a convolution operation on each of the second data with a weight to obtain a plurality of first convolution results;

Merging the plurality of first convolution results according to a preset merging manner to obtain a hole convolution result of the first data and the weight,

Wherein, the preset merge mode is the reverse process of the preset split mode.

According to another aspect of the present disclosure, there is provided a data processing device applied to a processor, and the device includes:

The splitting module is used to split the first data according to a preset splitting method to obtain multiple second data;

The convolution module is configured to perform the convolution operation of the second data and the weight respectively to obtain multiple first convolution results;

A merging module, configured to merge the multiple first convolution results according to a preset merging manner to obtain a hole convolution result of the first data and the weight,

Wherein, the preset merge mode is the reverse process of the preset split mode.

According to another aspect of the present disclosure, an artificial intelligence chip is provided, and the chip includes the data processing device as described in any one of the foregoing.

According to another aspect of the present disclosure, there is provided an electronic device including the aforementioned artificial intelligence chip.

According to another aspect of the present disclosure, there is provided a board card, the board card comprising: a storage device, an interface device, a control device, and the aforementioned artificial intelligence chip;

Wherein, the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;

The storage device is used to store data;

The interface device is used to implement data transmission between the artificial intelligence chip and external equipment;

The control device is used to monitor the state of the artificial intelligence chip.

According to another aspect of the present disclosure, there is provided an electronic device including:

processor;

A memory for storing processor executable instructions;

Wherein, the processor is configured to call instructions stored in the memory to execute the method described in any one of the foregoing.

According to another aspect of the present disclosure, there is provided a computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method described in any one of the foregoing.

In this way, after the first data is split according to the preset splitting manner, the resulting plurality of second data are each convolved with the weight, and the resulting plurality of first convolution results are merged according to the preset merging manner. Since the preset merging manner is the inverse process of the preset splitting manner, the hole convolution result of the first data and the weight is obtained after merging. According to the data processing method, device, computer equipment, and storage medium provided by the present disclosure, the energy efficiency ratio of hardware can be improved, computing time can be reduced, and computing efficiency can be improved.

According to the following detailed description of exemplary embodiments with reference to the accompanying drawings, other features and aspects of the present disclosure will become clear.

Description of the drawings

The drawings included in the specification and constituting a part of the specification together with the specification illustrate exemplary embodiments, features, and aspects of the present disclosure, and are used to explain the principle of the present disclosure.

Fig. 1 shows a schematic diagram of an exemplary hole convolution according to the present disclosure;

Fig. 2 shows a flowchart of a data processing method according to an embodiment of the present disclosure;

Fig. 3 shows a schematic diagram of a data processing method according to an embodiment of the present disclosure;

Fig. 4 shows a schematic diagram of a data processing method according to an embodiment of the present disclosure;

Fig. 5 shows a block diagram of a data processing device according to an embodiment of the present disclosure;

Fig. 6 shows a structural block diagram of a board card according to an embodiment of the present disclosure;

Fig. 7 shows a block diagram of an electronic device 800 according to an embodiment of the present disclosure;

Fig. 8 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure.

Detailed description

The technical solutions in the embodiments of the present disclosure will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are part of the embodiments of the present disclosure, rather than all of the embodiments. Based on the embodiments in the present disclosure, all other embodiments obtained by those skilled in the art without creative work shall fall within the protection scope of the present disclosure.

It should be understood that the terms "first", "second", "third" and "fourth" in the claims, specification and drawings of the present disclosure are used to distinguish different objects, rather than to describe a specific order. The terms "comprising" and "including" used in the specification and claims of the present disclosure indicate the existence of the described features, wholes, steps, operations, elements and/or components, but do not exclude the existence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the terms used in this specification of the present disclosure are only for the purpose of describing specific embodiments, and are not intended to limit the present disclosure. As used in the specification and claims of the present disclosure, unless the context clearly indicates otherwise, the singular forms "a", "an" and "the" are intended to include plural forms. It should be further understood that the term "and/or" used in the specification and claims of the present disclosure refers to any combination of one or more of the associated listed items and all possible combinations, and includes these combinations.

As used in this specification and claims, the term "if" can be interpreted as "when" or "once" or "in response to determination" or "in response to detection" depending on the context. Similarly, the phrase "if determined" or "if [described condition or event] is detected" can be interpreted as meaning "once determined" or "in response to determination" or "once [described condition or event] is detected" or "in response to detection of [described condition or event]" depending on the context.

For a hole convolution (also known as dilated convolution) with a dilation rate (number of holes) of 2, as shown in Fig. 1, when convolving with a 3*3 weight, one element is taken from the feature map at intervals of one element, and the weight applied to each skipped element lying between two taken elements is set to 0. As a result, a 5*5 kernel is multiplied with the weight, which increases the receptive field of the convolution.

Hole convolution can increase the receptive field of the convolution, but it also lowers the hardware energy efficiency ratio and increases computing time.
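
As a minimal sketch of the rate-2 case described above, the following Python (NumPy) snippet compares two readings of it: sampling the feature map every other element under a 3*3 weight, and sliding a 5*5 weight whose inserted positions are 0. The function names `conv2d_valid`, `expand_weight` and `hole_conv2d`, the 7*7 input, and the absence of padding and stride are illustrative assumptions, not part of the disclosure.

```python
import numpy as np

def conv2d_valid(x, w):
    """Plain 'valid' 2-D convolution in the neural-network sense (no kernel flip)."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float64)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def expand_weight(w, rate=2):
    """Insert (rate - 1) zeros between neighbouring weight elements."""
    kh, kw = w.shape
    big = np.zeros(((kh - 1) * rate + 1, (kw - 1) * rate + 1), dtype=np.float64)
    big[::rate, ::rate] = w
    return big

def hole_conv2d(x, w, rate=2):
    """Hole convolution computed directly by sampling x every `rate` elements."""
    kh, kw = w.shape
    eh, ew = (kh - 1) * rate + 1, (kw - 1) * rate + 1  # effective window size
    oh, ow = x.shape[0] - eh + 1, x.shape[1] - ew + 1
    out = np.zeros((oh, ow), dtype=np.float64)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + eh:rate, j:j + ew:rate] * w)
    return out

x = np.arange(49, dtype=np.float64).reshape(7, 7)     # assumed 7*7 feature map
w = np.arange(1, 10, dtype=np.float64).reshape(3, 3)  # assumed 3*3 weight
big_w = expand_weight(w)          # 5*5 weight with zeros in the holes
direct = hole_conv2d(x, w)        # sample every other element
via_zeros = conv2d_valid(x, big_w)
```

Under these assumptions `direct` and `via_zeros` hold the same 3*3 values, which is why the zero-filled positions are wasted work on hardware that multiplies every position of the window.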

The present disclosure provides a data processing method. The data processing method of the embodiments of the present disclosure can be applied to a processor. The processor may be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations. Artificial intelligence operations may include machine learning operations, brain-like operations, and so on, where machine learning operations include neural network operations, k-means operations, support vector machine operations, and so on. The artificial intelligence processor may include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processing unit), and a Field-Programmable Gate Array (FPGA) chip. The present disclosure does not limit the specific type of processor.

In a possible implementation manner, the processor mentioned in the present disclosure may include multiple processing units, and each processing unit can independently run the tasks assigned to it, such as convolution computing tasks, pooling tasks, or fully connected tasks. The present disclosure does not limit the processing units or the tasks run by the processing units.

Fig. 2 shows a flowchart of a data processing method according to an embodiment of the present disclosure. The method may be applied to a processor. As shown in Fig. 2, the method may include:

In step S21, the first data is split according to a preset split mode to obtain a plurality of second data.

For example, the foregoing preset splitting manner may be a manner set in advance for splitting the first data. The preset splitting manner may split the first data into four second data. For example, along the rows and columns of the first data, the first data may be split by taking every other element to obtain four second data. After splitting, the elements in the first convolution results of all the second data with the weight are consistent with the elements in the hole convolution result of the first data with the weight.

In a possible implementation manner, the plurality of second data may include first sub-data, second sub-data, third sub-data, and fourth sub-data, and splitting the first data according to the preset splitting manner to obtain the plurality of second data may include:

Traversing the elements in the first data, and determining the elements in the odd rows and odd columns of the first data to form the first sub-data;

Determining the elements in the odd rows and even columns of the first data to form the second sub-data;

Determining the elements in the even rows and odd columns of the first data to form the third sub-data;

Determining the elements in the even rows and even columns of the first data to form the fourth sub-data.

For example, the odd rows of the first data may be determined; the elements in the odd columns of each odd row form the first sub-data, and the elements in the even columns of each odd row form the second sub-data. Likewise, the even rows of the first data may be determined; the elements in the odd columns of each even row form the third sub-data, and the elements in the even columns of each even row form the fourth sub-data.

Exemplarily, referring to Fig. 3, the elements in the odd rows and odd columns of the first data (identified as "1" in the first data shown in Fig. 3) form the first sub-data; the elements in the odd rows and even columns (identified as "2") form the second sub-data; the elements in the even rows and odd columns (identified as "3") form the third sub-data; and the elements in the even rows and even columns (identified as "4") form the fourth sub-data.
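
The parity split can be sketched with strided slicing in Python (NumPy). This is only an illustration: the function name `split_by_parity` and the 7*7 input are assumptions made here. Rows and columns are counted from 1 in the text, so "odd" rows correspond to NumPy indices 0, 2, 4, ... and "even" rows to indices 1, 3, 5, ....

```python
import numpy as np

def split_by_parity(first_data):
    """Split a 2-D array into four sub-arrays by row/column parity (1-based)."""
    first = first_data[0::2, 0::2]   # odd rows, odd columns   ("1" in Fig. 3)
    second = first_data[0::2, 1::2]  # odd rows, even columns  ("2")
    third = first_data[1::2, 0::2]   # even rows, odd columns  ("3")
    fourth = first_data[1::2, 1::2]  # even rows, even columns ("4")
    return first, second, third, fourth

first_data = np.arange(49).reshape(7, 7)   # assumed 7*7 first data
subs = split_by_parity(first_data)
shapes = [s.shape for s in subs]           # NumPy reports (rows, columns)
```

Every element of the first data lands in exactly one of the four sub-data, so the split loses no information and can be undone by writing the sub-arrays back to the same strided positions.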

In step S22, the convolution operation of the second data and the weight is performed respectively to obtain multiple first convolution results.

After the first data is split into a plurality of second data, an ordinary convolution operation may be performed on each of the second data with the weight to obtain a plurality of first convolution results. Taking the example shown in Fig. 3, the first sub-data is convolved with the weight to obtain the first convolution result corresponding to the first sub-data; the second sub-data is convolved with the weight to obtain the first convolution result corresponding to the second sub-data; the third sub-data is convolved with the weight to obtain the first convolution result corresponding to the third sub-data; and the fourth sub-data is convolved with the weight to obtain the first convolution result corresponding to the fourth sub-data.
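
A short Python (NumPy) sketch of step S22 follows, assuming a 7*7 first data, an all-ones 3*3 weight, and no padding or stride; the helper `conv2d_valid` is a hypothetical name for an ordinary convolution without holes.

```python
import numpy as np

def conv2d_valid(x, w):
    """Ordinary 'valid' 2-D convolution in the neural-network sense (no kernel flip)."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float64)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

first_data = np.arange(49, dtype=np.float64).reshape(7, 7)  # assumed input
weight = np.ones((3, 3))                                    # assumed weight

# Order: first, second, third, fourth sub-data (p = row offset, q = column offset).
second_data = [first_data[p::2, q::2] for p in (0, 1) for q in (0, 1)]
first_results = [conv2d_valid(s, weight) for s in second_data]
result_shapes = [r.shape for r in first_results]  # (rows, columns)
```

The four convolutions are independent of one another, which is what allows them to be dispatched together rather than window by window.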

In step S23, the multiple first convolution results are merged according to a preset merging manner to obtain a hole convolution result of the first data and the weight value,

Wherein, the preset merge mode is the reverse process of the preset split mode.

For example, the foregoing preset merging manner is the inverse process of the foregoing preset splitting manner. That is, if the hole convolution result of the first data and the weight obtained after merging is split according to the preset splitting manner, each of the first convolution results is obtained.

In a possible implementation manner, merging the plurality of first convolution results according to the preset merging manner to obtain the hole convolution result of the first data and the weight may include:

Taking the elements in the first convolution result corresponding to the first sub-data, in sequence, as the elements in the odd rows and odd columns of the hole convolution result;

Taking the elements in the first convolution result corresponding to the second sub-data as the elements corresponding to the odd-numbered rows and even-numbered columns of the hole convolution result in sequence;

Taking the elements in the first convolution result corresponding to the third sub-data as the elements corresponding to the odd columns of the even rows in the hole convolution result in sequence;

The elements in the first convolution result corresponding to the fourth sub-data are sequentially used as the elements corresponding to the even-numbered rows and the even-numbered columns in the hole convolution result.

For example, the rows of the first convolution result corresponding to the first sub-data are taken in turn as odd rows of the hole convolution result, and the elements of each row are taken in turn as the odd columns of those odd rows. The rows of the first convolution result corresponding to the second sub-data are taken in turn as odd rows of the hole convolution result, and the elements of each row are taken in turn as the even columns of those odd rows. The rows of the first convolution result corresponding to the third sub-data are taken in turn as even rows of the hole convolution result, and the elements of each row are taken in turn as the odd columns of those even rows. The rows of the first convolution result corresponding to the fourth sub-data are taken in turn as even rows of the hole convolution result, and the elements of each row are taken in turn as the even columns of those even rows.

Continuing the above example, as shown in Fig. 4, the first convolution result corresponding to the first sub-data is 2*2, that corresponding to the second sub-data is 1*2, that corresponding to the third sub-data is 2*1, and that corresponding to the fourth sub-data is 1*1 (sizes are written here as columns*rows).

The rows of the first convolution result corresponding to the first sub-data are taken in turn as the odd columns of the odd rows of the hole convolution result (that is, the two elements in the first row of the first convolution result corresponding to the first sub-data are used as the elements in the first and third columns of the first row of the hole convolution result, and the two elements in its second row are used as the elements in the first and third columns of the third row of the hole convolution result; these are identified as "1" in Fig. 4).

The elements of the first convolution result corresponding to the second sub-data are taken in turn as the even columns of the odd rows of the hole convolution result (that is, the element in the first row of the first convolution result corresponding to the second sub-data is used as the element in the first row and second column of the hole convolution result, and the element in its second row is used as the element in the third row and second column of the hole convolution result; these are identified as "2" in Fig. 4).

The elements of the first convolution result corresponding to the third sub-data are taken in turn as the odd columns of the even rows of the hole convolution result (that is, the two elements of the first convolution result corresponding to the third sub-data are used as the elements in the first and third columns of the second row of the hole convolution result; these are identified as "3" in Fig. 4).

The element of the first convolution result corresponding to the fourth sub-data is taken as the even column of the even row of the hole convolution result (that is, the single element of the first convolution result corresponding to the fourth sub-data is used as the element in the second row and second column of the hole convolution result; it is identified as "4" in Fig. 4).
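
The merge described above can be sketched as writing each first convolution result back to the strided positions its sub-data came from. The Python (NumPy) snippet below is an illustration under assumptions (dilation rate 2, no padding or stride, hypothetical function names, random 7*7 input); it also checks the merged result against a hole convolution computed directly.

```python
import numpy as np

def conv2d_valid(x, w):
    """Ordinary 'valid' 2-D convolution in the neural-network sense (no kernel flip)."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float64)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def hole_conv2d_direct(x, w):
    """Reference: rate-2 hole convolution, sampling every other element."""
    eh, ew = 2 * (w.shape[0] - 1) + 1, 2 * (w.shape[1] - 1) + 1
    oh, ow = x.shape[0] - eh + 1, x.shape[1] - ew + 1
    out = np.zeros((oh, ow), dtype=np.float64)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + eh:2, j:j + ew:2] * w)
    return out

def hole_conv2d_by_split(x, w):
    """Rate-2 hole convolution via split (S21), plain convolution (S22), merge (S23)."""
    oh = x.shape[0] - 2 * (w.shape[0] - 1)
    ow = x.shape[1] - 2 * (w.shape[1] - 1)
    out = np.zeros((oh, ow), dtype=np.float64)
    for p in (0, 1):          # row parity offset
        for q in (0, 1):      # column parity offset
            # Merge is the inverse of the split: same strided positions.
            out[p::2, q::2] = conv2d_valid(x[p::2, q::2], w)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((7, 7))   # assumed first data
w = rng.standard_normal((3, 3))   # assumed weight
merged = hole_conv2d_by_split(x, w)
reference = hole_conv2d_direct(x, w)
```

Here `merged` and `reference` agree because output element (i, j) only ever reads input elements whose row and column parities match those of i and j.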

In this way, after the first data is split according to the preset splitting manner, the resulting plurality of second data are each convolved with the weight, and the resulting plurality of first convolution results are merged according to the preset merging manner. Since the preset merging manner is the inverse process of the preset splitting manner, the hole convolution result of the first data and the weight is obtained after merging. According to the data processing method provided by the present disclosure, the energy efficiency ratio of hardware can be improved, computing time can be reduced, and computing efficiency can be improved.

In order to enable those skilled in the art to better understand the beneficial effects of the present disclosure, the following uses specific examples to illustrate the beneficial effects of the present disclosure.

Assume that the convolution scale the processor can handle is 5*5 neurons with a 3*3 weight. If the hole convolution of the first data shown in Fig. 3 with the 3*3 weight is performed directly, the processor can process only one convolution of a 5*5 kernel with the 3*3 weight at a time and output one convolution result, so the processor needs to perform 9 operations to complete this hole convolution.

However, after the first data is split into four second data, the scales of the second data are 4*4, 3*4, 4*3, and 3*3, and the processor can obtain 9 results through one operation, thereby completing the hole convolution of the first data and the weight.

It can be seen that the data processing method provided by the present disclosure improves the energy efficiency ratio of hardware, reduces computing time, and improves computing efficiency.
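
The counts in this example can be reproduced with a few lines of arithmetic. The Python sketch below assumes a 7*7 first data (the size implied by the four sub-data scales above), a 3*3 weight, and no padding or stride; the names `output_count`, `ROWS`, `COLS` and `LIMIT` are hypothetical.

```python
def output_count(rows, cols, k=3, rate=1):
    """Number of outputs of a 'valid' k*k convolution with the given dilation rate."""
    span = (k - 1) * rate + 1            # extent covered by the (dilated) weight
    return (rows - span + 1) * (cols - span + 1)

ROWS = COLS = 7   # assumed size of the first data
LIMIT = 5         # the processor handles at most 5*5 neurons per operation

# Direct hole convolution: one 5*5 window per operation.
direct_windows = output_count(ROWS, COLS, rate=2)

# After the parity split: (rows, columns) of the four second data.
sub_shapes = [((ROWS - p + 1) // 2, (COLS - q + 1) // 2)
              for p in (0, 1) for q in (0, 1)]
sub_outputs = [output_count(r, c) for r, c in sub_shapes]
all_fit = all(r <= LIMIT and c <= LIMIT for r, c in sub_shapes)
```

Under these assumptions `direct_windows` is 9, each of the four second data fits within the 5*5 limit, and their outputs (4, 2, 2 and 1) add up to the same 9 results.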

In a possible implementation manner, the foregoing first data may include neurons and/or gradients.

In the back propagation process of the hole convolution, a hole convolution may be performed on the first gradient and the weight of the current convolutional layer to determine the second gradient of the next convolutional layer. In this process, the first gradient of the current convolutional layer may be split according to the preset splitting manner to obtain four first sub-gradients, and the four first sub-gradients may each be convolved with the weight to obtain four convolution results. Merging the four convolution results according to the preset merging manner yields the second gradient of the next convolutional layer.

In a possible implementation manner, the foregoing first data may include a first neuron and a first gradient, and the splitting of the first data according to a preset splitting manner to obtain a plurality of second data may include:

Splitting the first neuron according to the preset splitting manner to obtain a plurality of second neurons;

The first gradient is split according to the preset split mode to obtain a plurality of second gradients.

For example, the first neuron can be split according to the preset splitting manner to obtain a plurality of second neurons. Exemplarily, the second neurons may include a first sub-neuron, a second sub-neuron, a third sub-neuron, and a fourth sub-neuron. In that case, the elements in the odd rows and odd columns of the first neuron are determined to form the first sub-neuron, the elements in the odd rows and even columns of the first neuron are determined to form the second sub-neuron, the elements in the even rows and odd columns of the first neuron are determined to form the third sub-neuron, and the elements in the even rows and even columns of the first neuron are determined to form the fourth sub-neuron.

Correspondingly, the first gradient can be split according to the preset splitting manner to obtain a plurality of second gradients. Exemplarily, the second gradients may include a first sub-gradient, a second sub-gradient, a third sub-gradient, and a fourth sub-gradient. In that case, the elements in the odd rows and odd columns of the first gradient are determined to form the first sub-gradient, the elements in the odd rows and even columns of the first gradient are determined to form the second sub-gradient, the elements in the even rows and odd columns of the first gradient are determined to form the third sub-gradient, and the elements in the even rows and even columns of the first gradient are determined to form the fourth sub-gradient.

In a possible implementation manner, the above method may further include:

For any of the second neurons, perform a convolution operation on the second neuron and the corresponding second gradient to obtain a third convolution result;

Determining that the sum of the third convolution results corresponding to each of the second neurons is the residual of the weight;

Wherein, the parity of the row and column that an element of the second neuron occupies in the first neuron is consistent with the parity of the row and column that an element of the corresponding second gradient occupies in the first gradient.

For example, if all elements of a second neuron are located in odd rows and odd columns of the first neuron, then all elements of the second gradient corresponding to that second neuron are located in odd rows and odd columns of the first gradient; if all elements of a second neuron are located in odd rows and even columns of the first neuron, then all elements of the corresponding second gradient are located in odd rows and even columns of the first gradient; if all elements of a second neuron are located in even rows and odd columns of the first neuron, then all elements of the corresponding second gradient are located in even rows and odd columns of the first gradient; and if all elements of a second neuron are located in even rows and even columns of the first neuron, then all elements of the corresponding second gradient are located in even rows and even columns of the first gradient.

Exemplarily, continuing the above example, the second neurons include a first sub-neuron, a second sub-neuron, a third sub-neuron, and a fourth sub-neuron, and the second gradients include a first sub-gradient, a second sub-gradient, a third sub-gradient, and a fourth sub-gradient. The first sub-neuron corresponds to the first sub-gradient, and the first sub-neuron is convolved with the first sub-gradient to obtain the convolution result corresponding to the first sub-neuron; the second sub-neuron corresponds to the second sub-gradient, and the second sub-neuron is convolved with the second sub-gradient to obtain the convolution result corresponding to the second sub-neuron; the third sub-neuron corresponds to the third sub-gradient, and the third sub-neuron is convolved with the third sub-gradient to obtain the convolution result corresponding to the third sub-neuron; the fourth sub-neuron corresponds to the fourth sub-gradient, and the fourth sub-neuron is convolved with the fourth sub-gradient to obtain the convolution result corresponding to the fourth sub-neuron.

After the third convolution result corresponding to each second neuron is obtained, the third convolution result corresponding to each second neuron is added, and the obtained sum is determined as the residual of the weight.
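
The following Python (NumPy) sketch illustrates the summation of the third convolution results. It rests on assumptions that the text does not spell out: the first neuron is taken to be the 7*7 input of a rate-2 hole convolution with a 3*3 weight, the first gradient is taken to be the 3*3 gradient with respect to that layer's output, and there is no padding or stride. All function names are hypothetical.

```python
import numpy as np

def conv2d_valid(x, w):
    """Ordinary 'valid' 2-D convolution in the neural-network sense (no kernel flip)."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float64)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def weight_residual_by_split(neuron, grad):
    """Sum of the third convolution results over the four parity classes."""
    return sum(conv2d_valid(neuron[p::2, q::2], grad[p::2, q::2])
               for p in (0, 1) for q in (0, 1))

def weight_residual_direct(neuron, grad, k=3, rate=2):
    """Reference: residual[a, b] = sum over (i, j) of grad[i, j] * neuron[i + rate*a, j + rate*b]."""
    gh, gw = grad.shape
    res = np.zeros((k, k), dtype=np.float64)
    for a in range(k):
        for b in range(k):
            res[a, b] = np.sum(grad * neuron[a * rate:a * rate + gh,
                                             b * rate:b * rate + gw])
    return res

rng = np.random.default_rng(0)
neuron = rng.standard_normal((7, 7))  # assumed first neuron (layer input)
grad = rng.standard_normal((3, 3))    # assumed first gradient (w.r.t. layer output)
by_split = weight_residual_by_split(neuron, grad)
direct = weight_residual_direct(neuron, grad)
```

Under these assumptions each sub-neuron convolved with the sub-gradient of matching parity gives a 3*3 array, and `by_split` equals `direct`, which is the role the parity-consistency condition plays.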

In this way, the energy efficiency ratio of the hardware can be improved, the calculation time can be reduced, and the calculation efficiency can be improved.

In a possible implementation manner, the above method may further include:

Adjusting the weight according to the residual of the weight.

For example, after determining the residual of the weight, the weight of the current convolutional layer can be adjusted according to the residual of the weight. For example, the sum of the residual of the weight and the weight can be determined as the new weight.

It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should know that the present disclosure is not limited by the described order of actions, because according to the present disclosure certain steps may be performed in other orders or simultaneously. In addition, those skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present disclosure.

It should be further noted that although the steps in the flowcharts of Figs. 1-4 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and these steps can be executed in other orders. Moreover, at least some of the steps in Figs. 1-4 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but can be executed at different times, and they are not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.

Fig. 5 shows a block diagram of a data processing device according to an embodiment of the present disclosure. As shown in FIG. 5, the data processing device may include:

The splitting module 501 can be used to split the first data according to a preset splitting manner to obtain multiple second data;

The convolution module 502 may be used to perform the convolution operation of the second data and the weight respectively to obtain multiple first convolution results;

The merging module 503 may be used to merge the multiple first convolution results according to a preset merging manner to obtain a hole convolution result of the first data and the weight value,

Wherein, the preset merge mode is the reverse process of the preset split mode.

In this way, after the first data is split according to the preset splitting manner, the multiple second data obtained are respectively convolved with the weight, and the multiple first convolution results obtained are merged according to the preset merging manner. Since the preset merging manner is the inverse process of the preset splitting manner, the hole convolution result of the first data and the weight can be obtained after merging. According to the data processing device provided by the present disclosure, the energy efficiency ratio of hardware can be improved, computing time can be reduced, and computing efficiency can be improved.
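The split, convolve, and merge flow described above can be illustrated with a minimal NumPy sketch. It assumes a single-channel 2-D input, a hole (dilated) convolution with a dilation rate of 2, stride 1, and no padding; the function names `conv2d_valid` and `hole_conv_by_split` are hypothetical and are not taken from the disclosure. With 1-based counting, parity index 0 corresponds to the odd rows or columns and parity index 1 to the even ones.

```python
import numpy as np


def conv2d_valid(x, w, dilation=1):
    """Reference 'valid' 2-D convolution (cross-correlation form, stride 1)."""
    kh, kw = w.shape
    out_h = x.shape[0] - dilation * (kh - 1)
    out_w = x.shape[1] - dilation * (kw - 1)
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for a in range(kh):
        for b in range(kw):
            r, c = a * dilation, b * dilation
            out += w[a, b] * x[r:r + out_h, c:c + out_w]
    return out


def hole_conv_by_split(x, w):
    """Dilation-2 convolution assembled from four ordinary convolutions."""
    out_h = x.shape[0] - 2 * (w.shape[0] - 1)
    out_w = x.shape[1] - 2 * (w.shape[1] - 1)
    result = np.zeros((out_h, out_w), dtype=np.float64)
    for p in (0, 1):        # row parity: 0 = odd rows (1-based), 1 = even rows
        for q in (0, 1):    # column parity, same convention
            second_data = x[p::2, q::2]                  # split
            first_result = conv2d_valid(second_data, w)  # ordinary convolution
            result[p::2, q::2] = first_result            # merge (inverse of split)
    return result


# Illustrative data only: a 7x8 input and a 3x3 weight.
rng = np.random.default_rng(0)
x = rng.standard_normal((7, 8))
w = rng.standard_normal((3, 3))

# The merged result should match a direct dilation-2 convolution.
assert np.allclose(hole_conv_by_split(x, w), conv2d_valid(x, w, dilation=2))
```

In this sketch each of the four sub-convolutions is a dense convolution with no gaps between the sampled elements, which is the property the paragraph above relies on for its efficiency claim.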

In a possible implementation manner, the plurality of second data includes first sub-data, second sub-data, third sub-data, and fourth sub-data. The aforementioned splitting module may also be used for:

In a possible implementation manner, the above-mentioned merging module can also be used for:

Taking the elements in the first convolution result corresponding to the first sub-data as the elements corresponding to the odd columns of the odd rows of the hole convolution result in sequence;

The elements in the first convolution result corresponding to the fourth sub-data are sequentially used as the elements corresponding to the even-numbered rows and even-numbered columns in the hole convolution result.

In a possible implementation manner, the first data includes neurons and/or gradients.

In a possible implementation manner, the first data may include a first neuron and a first gradient, and the above-mentioned splitting module may also be used for:

In a possible implementation manner, the foregoing apparatus may further include:

A processing module, configured to perform a convolution operation on any of the second neuron and the corresponding second gradient to obtain a third convolution result;

A determining module, configured to determine that the sum of the third convolution results corresponding to each of the second neurons is the residual of the weight;

The adjustment module is configured to adjust the weight value according to the residual error of the weight value.
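The weight residual computation performed by the processing, determining, and adjustment modules can be sketched as follows. This is a minimal illustration under stated assumptions: the first gradient is taken to be the gradient with respect to the output of a dilation-2 hole convolution (stride 1, no padding), the data are single-channel 2-D arrays, and the helper names and the learning rate `lr` are hypothetical rather than part of the disclosure.

```python
import numpy as np


def correlate_valid(x, k):
    """Ordinary 'valid' 2-D convolution (cross-correlation form, stride 1)."""
    kh, kw = k.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float64)
    for a in range(kh):
        for b in range(kw):
            out += k[a, b] * x[a:a + out_h, b:b + out_w]
    return out


def weight_residual_direct(x, g, kh, kw):
    """Reference: weight gradient of a dilation-2 convolution, computed directly."""
    out_h, out_w = g.shape
    res = np.zeros((kh, kw), dtype=np.float64)
    for a in range(kh):
        for b in range(kw):
            res[a, b] = np.sum(x[2 * a:2 * a + out_h, 2 * b:2 * b + out_w] * g)
    return res


def weight_residual_by_split(x, g):
    """Sum of the third convolution results over the four parity classes."""
    total = 0.0
    for p in (0, 1):
        for q in (0, 1):
            second_neuron = x[p::2, q::2]    # same row/column parity ...
            second_gradient = g[p::2, q::2]  # ... as the paired gradient
            total = total + correlate_valid(second_neuron, second_gradient)
    return total


# Illustrative data only: 7x8 neurons, 3x3 weight, hence a 3x4 output gradient.
rng = np.random.default_rng(0)
x = rng.standard_normal((7, 8))
g = rng.standard_normal((3, 4))
w = rng.standard_normal((3, 3))

residual = weight_residual_by_split(x, g)
assert np.allclose(residual, weight_residual_direct(x, g, 3, 3))

lr = 0.01                    # hypothetical learning rate
w_adjusted = w - lr * residual
```

Each second neuron is paired with the second gradient of the same row and column parity, so every term in the sum is again an ordinary convolution.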

In some embodiments of the present disclosure, the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments. For the specific implementation and technical effects, reference may be made to the description of the above method embodiments; for the sake of brevity, they are not repeated here.

It should be understood that the foregoing device embodiments are only illustrative, and the device of the present disclosure may also be implemented in other ways. For example, the division of units/modules in the above-mentioned embodiments is only a logical function division, and there may be other division methods in actual implementation. For example, multiple units, modules or components may be combined or integrated into another system, or some features may be omitted or not implemented.

In addition, unless otherwise specified, the functional units/modules in the various embodiments of the present disclosure may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together. The above-mentioned integrated unit/module can be realized in the form of hardware or in the form of a software program module.

If the integrated unit/module is implemented in the form of hardware, the hardware may be a digital circuit, an analog circuit, and so on. The physical realization of the hardware structure includes but is not limited to transistors, memristors and so on. Unless otherwise specified, the artificial intelligence processor may be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, and so on. Unless otherwise specified, the storage unit may be any suitable magnetic storage medium or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), hybrid memory cube (HMC), and so on.

If the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a memory and includes a number of instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned memory includes: a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media that can store program code.

In a possible implementation manner, an artificial intelligence chip is also disclosed, which includes the above-mentioned data processing device.

In a possible implementation manner, a board card is also disclosed, which includes a storage device, an interface device, a control device, and the above-mentioned artificial intelligence chip; wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively; the storage device is used to store data; the interface device is used to realize data transmission between the artificial intelligence chip and an external device; and the control device is used to monitor the state of the artificial intelligence chip.

Fig. 6 shows a structural block diagram of a board card according to an embodiment of the present disclosure. Referring to Fig. 6, the board card may include other supporting components in addition to the chip 389 described above. The supporting components include, but are not limited to: a storage device 390, an interface device 391, and a control device 392.

The storage device 390 is connected to the artificial intelligence chip through a bus and is used for storing data. The storage device may include multiple groups of storage units 393. Each group of storage units is connected to the artificial intelligence chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).

DDR doubles the data rate of SDRAM without increasing the clock frequency, because it allows data to be transferred on both the rising and falling edges of the clock pulse; its speed is therefore twice that of standard SDRAM. In an embodiment, the storage device may include 4 groups of the storage units. Each group of storage units may include a plurality of DDR4 chips. In one embodiment, the artificial intelligence chip may include four 72-bit DDR4 controllers. In each 72-bit DDR4 controller, 64 bits are used for data transmission and 8 bits are used for ECC verification. It can be understood that when DDR4-3200 chips are used in each group of the storage units, the theoretical data transmission bandwidth can reach 25600 MB/s.
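The 25600 MB/s figure follows from the DDR4-3200 transfer rate (3200 million transfers per second) multiplied by the 64 data bits of one controller. The short calculation below is only a sketch of that arithmetic; the function and variable names are illustrative assumptions.

```python
def ddr_bandwidth_mb_s(mega_transfers_per_s, data_bits):
    """Theoretical peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * data_bits // 8


controller_bits = 72                     # width of one DDR4 controller
ecc_bits = 8                             # bits reserved for ECC verification
data_bits = controller_bits - ecc_bits   # 64 bits carry payload data

per_group = ddr_bandwidth_mb_s(3200, data_bits)
print(per_group)  # 25600
```

Only the 64 data bits count toward the payload bandwidth; the 8 ECC bits carry check information.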

In one embodiment, each group of the storage unit includes a plurality of double-rate synchronous dynamic random access memories arranged in parallel. DDR can transmit data twice in one clock cycle. A controller for controlling the DDR is provided in the chip for controlling the data transmission and data storage of each storage unit.

The interface device is electrically connected with the artificial intelligence chip. The interface device is used to implement data transmission between the artificial intelligence chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIe interface: the data to be processed is transferred from the server to the chip through the standard PCIe interface. Preferably, when a PCIe 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface. The present disclosure does not limit the specific form of such other interfaces, as long as the interface unit can realize the transfer function. In addition, the calculation result of the artificial intelligence chip is likewise transmitted by the interface device back to the external device (such as a server).
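The 16000 MB/s figure corresponds to the raw signalling rate of a PCIe 3.0 x16 link: 8 GT/s per lane across 16 lanes. As a hedged aside that is not stated in the disclosure, PCIe 3.0 uses 128b/130b line encoding, so the bandwidth available after encoding is slightly lower. The sketch below, with an illustrative function name, shows both numbers.

```python
def pcie_bandwidth_mb_s(giga_transfers_per_s, lanes, payload_bits=128, encoded_bits=130):
    """Return (raw, after-encoding) link bandwidth in MB/s for one direction."""
    raw = giga_transfers_per_s * 1000 * lanes / 8     # one bit per transfer per lane
    effective = raw * payload_bits / encoded_bits     # 128b/130b line encoding
    return raw, effective


raw, effective = pcie_bandwidth_mb_s(8, 16)
print(raw)               # 16000.0
print(round(effective))  # 15754
```

Protocol overhead above the line encoding (packet headers, flow control) reduces achievable throughput further and is not modelled here.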

The control device is electrically connected with the artificial intelligence chip. The control device is used to monitor the state of the artificial intelligence chip. Specifically, the artificial intelligence chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). For example, the artificial intelligence chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. Therefore, the artificial intelligence chip can be in different working states such as multi-load and light-load. The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the artificial intelligence chip.

In a possible implementation manner, an electronic device is disclosed, which includes the aforementioned artificial intelligence chip. The electronic device includes data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, webcams, servers, cloud servers, cameras, video cameras, projectors, watches, headsets, mobile storage, wearable devices, vehicles, household appliances, and/or medical equipment. The vehicles include airplanes, ships, and/or automobiles; the household appliances include TVs, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods; the medical equipment includes nuclear magnetic resonance instruments, B-mode ultrasound scanners, and/or electrocardiographs.

The embodiments of the present disclosure also provide a computer-readable storage medium on which computer program instructions are stored, and the computer program instructions implement the above-mentioned method when executed by a processor. The computer-readable storage medium may be a non-volatile computer-readable storage medium.

An embodiment of the present disclosure also provides an electronic device, including: a processor; a memory for storing executable instructions of the processor; wherein the processor is configured to call the instructions stored in the memory to execute the above method.

The electronic device can be provided as a terminal, server or other form of device.

FIG. 7 shows a block diagram of an electronic device 800 according to an embodiment of the present disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and other terminals.

Referring to FIG. 7, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls the overall operations of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the foregoing method. In addition, the processing component 802 may include one or more modules to facilitate the interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate the interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operations in the electronic device 800. Examples of these data include instructions for any application or method to operate on the electronic device 800, contact data, phone book data, messages, pictures, videos, etc. The memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.

The power supply component 806 provides power for various components of the electronic device 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with the generation, management, and distribution of power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), and when the electronic device 800 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal. The received audio signal may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module. The above-mentioned peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.

The sensor component 814 includes one or more sensors for providing the electronic device 800 with state evaluation in various aspects. For example, the sensor component 814 can detect the on/off status of the electronic device 800 and the relative positioning of components, such as the display and the keypad of the electronic device 800. The sensor component 814 can also detect a change in position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of contact between the user and the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a temperature change of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, so as to perform the above methods.

In an exemplary embodiment, there is also provided a non-volatile computer-readable storage medium, such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the foregoing method.

FIG. 8 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure. For example, the electronic device 1900 may be provided as a server. Referring to FIG. 8, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and a memory resource represented by the memory 1932, for storing instructions that can be executed by the processing component 1922, such as application programs. The application program stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute instructions to perform the above-described methods.

The electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.

In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the foregoing method.

In the above-mentioned embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in an embodiment, reference may be made to related descriptions of other embodiments. The technical features of the above-mentioned embodiments can be combined arbitrarily. In order to make the description concise, not all possible combinations of the technical features in the above-mentioned embodiments are described. However, as long as there is no contradiction in a combination of these technical features, the combination should be considered to be within the scope described in this specification.

The foregoing can be better understood according to the following clauses:

Clause A1, a data processing method, applied to a processor, the method including:

Wherein, the preset merge mode is the reverse process of the preset split mode.

Clause A2, the method according to clause A1, wherein the plurality of second data includes first sub-data, second sub-data, third sub-data, and fourth sub-data, and the splitting of the first data according to a preset splitting manner to obtain multiple second data includes:

Clause A3, the method according to clause A2, wherein the merging of the multiple first convolution results according to a preset merging manner to obtain the hole convolution result of the first data and the weight includes:

Clause A4, the method according to any one of clauses A1 to A3, wherein the first data includes neurons and/or gradients.

Clause A5, the method according to any one of clauses A1 to A4, wherein the first data includes a first neuron and a first gradient, and the splitting of the first data according to a preset splitting manner to obtain multiple second data includes:

Clause A6, the method according to clause A5, the method further includes:

Clause A7, the method according to clause A6, the method further comprising:

Clause A8, a data processing device applied to a processor, the device comprising:

Wherein, the preset merge mode is the reverse process of the preset split mode.

Clause A9, the device according to clause A8, wherein the plurality of second data includes first sub-data, second sub-data, third sub-data, and fourth sub-data, and the splitting module is further configured to:

Clause A10, the device according to clause A9, the merging module is further used for:

Clause A11, the device according to any one of clauses A8 to A10, the first data includes neurons and/or gradients.

Clause A12, the device according to any one of clauses A8 to A11, the first data includes a first neuron and a first gradient, and the splitting module is further used for:

Clause A13, the device according to clause A12, the device further comprising:

Clause A14, the device according to clause A13, the device further comprising:

Clause A15, an artificial intelligence chip including the data processing device as described in Clause A8.

Clause A16, an electronic device including the artificial intelligence chip as described in Clause A15.

Clause A17, a board card, the board card includes: a storage device, an interface device, a control device, and the artificial intelligence chip as described in clause A15;

The storage device is used to store data;

Clause A18, the board according to clause A17, the storage device includes: multiple groups of storage units, each group of the storage unit is connected to the artificial intelligence chip through a bus, and the storage unit is: DDR SDRAM;

The chip includes: a DDR controller, which is used to control the data transmission and data storage of each storage unit;

The interface device is: a standard PCIE interface.

Clause A19, an electronic device, including:

a processor;

a memory for storing processor-executable instructions;

Wherein, the processor is configured to call instructions stored in the memory to execute the method described in any one of clauses A1 to A7.

Clause A20, a computer-readable storage medium with computer program instructions stored thereon, characterized in that, when the computer program instructions are executed by a processor, the method described in any one of clauses A1 to A7 is implemented.

The embodiments of the present disclosure are described in detail above, and specific examples are used in this article to illustrate the principles and implementations of the present disclosure. The descriptions of the above embodiments are only used to help understand the methods and core ideas of the present disclosure. At the same time, changes or modifications made by those skilled in the art based on the ideas of the present disclosure, the specific embodiments and the scope of application of the present disclosure, are all within the protection scope of the present disclosure. In summary, the content of this specification should not be construed as a limitation of this disclosure.

Claims

1. A data processing method, characterized in that it is applied to a processor, and the method includes:

Splitting the first data according to a preset splitting manner to obtain multiple second data;

Performing a convolution operation of the second data and the weight respectively to obtain multiple first convolution results;

Combining the multiple first convolution results according to a preset merging manner to obtain a hole convolution result of the first data and the weight,

Wherein, the preset merge mode is the reverse process of the preset split mode.
2. The method according to claim 1, wherein the plurality of second data includes a first sub-data, a second sub-data, a third sub-data, and a fourth sub-data, and the splitting of the first data according to a preset splitting manner to obtain multiple second data includes:

Traversing the elements in the first data, and determining elements corresponding to odd-numbered rows and odd-numbered columns in the first data to form the first sub-data;

Determining elements corresponding to odd-numbered rows and even-numbered columns in the first data to form the second sub-data;

Determining elements corresponding to even-numbered rows and odd-numbered columns in the first data to form the third sub-data;

Determining elements corresponding to even-numbered rows and even-numbered columns in the first data to form the fourth sub-data.
3. The method according to claim 2, wherein the merging of the multiple first convolution results according to a preset merging manner to obtain a hole convolution result of the first data and the weight includes:

Taking the elements in the first convolution result corresponding to the first sub-data as the elements corresponding to the odd columns of the odd rows of the hole convolution result in sequence;

Taking the elements in the first convolution result corresponding to the second sub-data as the elements corresponding to the odd-numbered rows and even-numbered columns of the hole convolution result in sequence;

Taking the elements in the first convolution result corresponding to the third sub-data as the elements corresponding to the odd columns of the even rows in the hole convolution result in sequence;

The elements in the first convolution result corresponding to the fourth sub-data are sequentially used as the elements corresponding to the even-numbered rows and the even-numbered columns in the hole convolution result.
4. The method according to any one of claims 1 to 3, wherein the first data includes neurons and/or gradients.
5. The method according to any one of claims 1 to 4, wherein the first data includes a first neuron and a first gradient, and the splitting of the first data according to a preset splitting manner to obtain multiple second data includes:

Splitting the first neuron according to the preset splitting manner to obtain a plurality of second neurons;

The first gradient is split according to the preset split mode to obtain a plurality of second gradients.
6. The method according to claim 5, wherein the method further comprises:

For any of the second neurons, perform a convolution operation on the second neuron and the corresponding second gradient to obtain a third convolution result;

Determining that the sum of the third convolution results corresponding to each of the second neurons is the residual of the weight;

Wherein, the parity of the row and column of the position, in the first neuron, of the elements in the second neuron is consistent with the parity of the row and column of the position, in the first gradient, of the elements in the corresponding second gradient.
7. The method according to claim 6, wherein the method further comprises:

The weight value is adjusted according to the residual error of the weight value.
8. A data processing device, characterized in that it is applied to a processor, and the device includes:

The splitting module is used to split the first data according to a preset splitting method to obtain multiple second data;

The convolution module is configured to perform the convolution operation of the second data and the weight respectively to obtain multiple first convolution results;

A merging module, configured to merge the multiple first convolution results according to a preset merging manner to obtain a hole convolution result of the first data and the weight,

Wherein, the preset merge mode is the reverse process of the preset split mode.
9. An artificial intelligence chip, characterized in that the chip includes the data processing device according to claim 8.
10. An electronic device, wherein the electronic device comprises the artificial intelligence chip according to claim 9.
11. A board card, characterized in that the board card comprises: a storage device, an interface device, a control device, and the artificial intelligence chip according to claim 9;

Wherein, the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;

The storage device is used to store data;

The interface device is used to implement data transmission between the artificial intelligence chip and external equipment;

The control device is used to monitor the state of the artificial intelligence chip.
12. The board card according to claim 11, characterized in that,

The storage device includes: multiple groups of storage units, each group of the storage unit is connected to the artificial intelligence chip through a bus, and the storage unit is: DDR SDRAM;

The chip includes: a DDR controller, which is used to control the data transmission and data storage of each storage unit;

The interface device is: a standard PCIE interface.
13. An electronic device, characterized in that it comprises:

a processor;

a memory for storing processor-executable instructions;

Wherein, the processor is configured to call instructions stored in the memory to execute the method according to any one of claims 1 to 7.
14. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 7.