CN113033761A - Data processing method, data processing device, computer equipment and storage medium - Google Patents


Info

Publication number
CN113033761A
CN113033761A (application CN201911252885.3A)
Authority
CN
China
Prior art keywords
data
convolution
elements
convolution result
splitting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911252885.3A
Other languages
Chinese (zh)
Other versions
CN113033761B (en)
Inventor
Inventor not disclosed
Current Assignee
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd
Priority to CN201911252885.3A (patent CN113033761B)
Priority claimed from CN201911252885.3A
Priority to PCT/CN2020/123836 (WO2021114904A1)
Publication of CN113033761A
Application granted
Publication of CN113033761B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation of neural networks, neurons or parts of neurons using electronic means

Abstract

The present disclosure relates to a data processing method, apparatus, computer device, and storage medium. The computer device includes a control module comprising an instruction cache unit, an instruction processing unit, and a storage queue unit. The instruction cache unit is used for storing calculation instructions associated with artificial neural network operations; the instruction processing unit is used for parsing a calculation instruction to obtain a plurality of operation instructions; the storage queue unit is configured to store an instruction queue, where the instruction queue includes a plurality of operation instructions or calculation instructions to be executed in queue order. Through this method, the efficiency with which related products perform neural network model operations can be improved.

Description

Data processing method, data processing device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, a computer device, and a storage medium.
Background
In the technical field of artificial intelligence, neural network algorithms have become very popular machine learning algorithms in recent years and perform very well in many fields, such as image recognition, speech recognition, and natural language processing. As neural network algorithms develop, their complexity keeps increasing, and model sizes grow steadily in order to improve recognition accuracy.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data processing method, an apparatus, a computer device, and a storage medium, which can improve the hardware energy efficiency ratio, reduce the computation time, and improve the computation efficiency.
According to an aspect of the present disclosure, there is provided a data processing method applied to a processor, the method including:
splitting the first data according to a preset splitting mode to obtain a plurality of second data;
performing convolution operation on the second data and the weight respectively to obtain a plurality of first convolution results;
merging the plurality of first convolution results according to a preset merging manner to obtain a dilated convolution (hole convolution) result of the first data and the weight,
wherein the preset merging manner is the inverse process of the preset splitting manner.
According to another aspect of the present disclosure, there is provided a data processing apparatus applied to a processor, the apparatus including:
the splitting module is used for splitting the first data according to a preset splitting mode to obtain a plurality of second data;
the convolution module is used for performing convolution operation on the second data and the weight respectively to obtain a plurality of first convolution results;
a merging module, configured to merge the plurality of first convolution results according to a preset merging manner to obtain a dilated convolution result of the first data and the weight,
wherein the preset merging manner is the inverse process of the preset splitting manner.
According to another aspect of the present disclosure, there is provided an artificial intelligence chip comprising a data processing apparatus according to any one of the preceding claims.
According to another aspect of the present disclosure, there is provided an electronic device including the artificial intelligence chip as described above.
According to another aspect of the present disclosure, a board card is provided, which includes: memory devices, interface devices and control devices and artificial intelligence chips as previously described;
wherein, the artificial intelligence chip is respectively connected with the storage device, the control device and the interface device;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the artificial intelligence chip and external equipment;
and the control device is used for monitoring the state of the artificial intelligence chip.
According to another aspect of the present disclosure, there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any of the preceding.
According to another aspect of the present disclosure, there is provided a computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method of any one of the preceding.
In this way, after the first data is split according to the preset splitting mode, the obtained plurality of second data are respectively convolved with the weight, and the obtained plurality of first convolution results are combined according to the preset combining mode. According to the data processing method, the data processing device, the computer equipment and the storage medium, the hardware energy efficiency ratio can be improved, the operation time can be reduced, and the operation efficiency can be improved.
By deriving the technical features recited in the claims, beneficial effects addressing the technical problems in the background art can be achieved. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a schematic diagram of an exemplary dilated convolution according to the present disclosure;
FIG. 2 shows a flow diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 5 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure;
fig. 6 shows a block diagram of a board card according to an embodiment of the present disclosure;
FIG. 7 shows a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure;
fig. 8 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, description, and drawings of the present disclosure are used to distinguish between different objects and are not used to describe a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
In a dilated convolution (also called hole or atrous convolution) with a dilation rate of 2, as shown in fig. 1, the 3 × 3 weight is expanded into a 5 × 5 kernel: one zero is inserted between every two adjacent weight elements, so that the kernel samples every other element of the feature map, and the expanded kernel is multiplied element-wise with the feature map at each output position. This enlarges the receptive field of the convolution.
Dilated convolution can thus enlarge the receptive field of a convolution, but at a cost to the hardware energy efficiency ratio and the operation time: the energy efficiency ratio decreases and the operation time increases.
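As a concrete sketch (not part of the patent text), the kernel expansion described above can be written in a few lines of NumPy; the helper name `dilate_kernel` and the array sizes are illustrative assumptions:

```python
import numpy as np

def dilate_kernel(w, rate=2):
    """Expand a kernel for dilated (hole) convolution: insert
    (rate - 1) zeros between adjacent weight elements."""
    kh, kw = w.shape
    out = np.zeros(((kh - 1) * rate + 1, (kw - 1) * rate + 1))
    out[::rate, ::rate] = w  # original weights land on a strided grid
    return out

w = np.arange(1, 10).reshape(3, 3)   # a 3 x 3 weight
wd = dilate_kernel(w)                # 5 x 5 kernel with zeros inserted
print(wd.shape)  # (5, 5)
```

Sliding this zero-padded 5 × 5 kernel over the feature map with an ordinary convolution is numerically identical to the dilated convolution.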
In order to solve the above technical problem, the present disclosure provides a data processing method.
The data processing method according to the embodiments of the present disclosure may be applied to a processor. The processor may be a general-purpose processor, such as a Central Processing Unit (CPU), or an artificial Intelligence Processing Unit (IPU) for performing artificial intelligence operations. The artificial intelligence operations may include machine learning operations, brain-like operations, and the like, where the machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like. The artificial intelligence processor may include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processor), and a Field-Programmable Gate Array (FPGA) chip. The present disclosure is not limited to a particular type of processor.
In one possible implementation, the processor referred to in this disclosure may include multiple processing units, each of which may independently run various tasks assigned thereto, such as: a convolution operation task, a pooling task, a full connection task, or the like. The present disclosure is not limited to processing units and tasks executed by processing units.
Fig. 2 shows a flow chart of a data processing method according to an embodiment of the present disclosure, which may be applied to a processor, as shown in fig. 2, which may include:
in step S21, the first data is split according to a preset splitting manner, so as to obtain a plurality of second data.
For example, the preset splitting manner is a predetermined way of splitting the first data; it may, for example, split the first data into four second data by taking every other element along both the rows and the columns of the first data. The elements appearing in the first convolution results of all the split second data with the weight are exactly the elements of the dilated convolution result of the first data with the weight.
In a possible implementation manner, the aforementioned plurality of second data may include first sub data, second sub data, third sub data, and fourth sub data, and splitting the first data according to a preset splitting manner to obtain a plurality of second data, which may include:
traversing elements in first data, determining elements corresponding to odd columns of odd rows in the first data, and forming first subdata;
determining elements corresponding to even columns of odd rows in the first data to form the second subdata;
determining elements corresponding to odd columns of even rows in the first data to form the third subdata;
and determining elements corresponding to even columns of even rows in the first data to form the fourth subdata.
For example, odd rows in the first data may be determined, elements corresponding to odd columns in each odd row may be determined to constitute first sub-data, and elements corresponding to even columns in each odd row may be determined to constitute second sub-data; even rows in the first data may be determined, elements corresponding to odd columns in each even row may be determined to constitute third sub-data, and elements corresponding to even columns in each even row may be determined to constitute fourth sub-data.
For example, referring to fig. 3, elements corresponding to odd columns of odd rows in the first data (identified as "1" in the first data shown in fig. 3) may be grouped into the first sub-data; determining elements (marked as '2' in the first data shown in FIG. 3) corresponding to even columns of odd rows in the first data, and composing the second subdata; determining elements (marked as '3' in the first data shown in fig. 3) corresponding to odd columns of even rows in the first data to compose the third sub-data; elements corresponding to even columns of even rows in the first data (identified as "4" in the first data shown in fig. 3) are determined, and the fourth sub-data is composed.
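The parity split of fig. 3 can be sketched with strided slicing; the 7 × 7 input size is an illustrative assumption consistent with the result sizes in the later example:

```python
import numpy as np

# Illustrative 7 x 7 "first data"; "odd" rows/columns in the text are
# 1-based, so they correspond to 0-based indices 0, 2, 4, ... here.
x = np.arange(49).reshape(7, 7)

first  = x[0::2, 0::2]  # odd rows,  odd columns  -> first sub-data,  4 x 4
second = x[0::2, 1::2]  # odd rows,  even columns -> second sub-data, 4 x 3
third  = x[1::2, 0::2]  # even rows, odd columns  -> third sub-data,  3 x 4
fourth = x[1::2, 1::2]  # even rows, even columns -> fourth sub-data, 3 x 3

print(first.shape, second.shape, third.shape, fourth.shape)
# (4, 4) (4, 3) (3, 4) (3, 3)
```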
In step S22, convolution operations are performed on the second data and the weights, respectively, to obtain a plurality of first convolution results.
After the first data is split into the plurality of second data, the plurality of second data may be respectively subjected to a common convolution operation with the weight to obtain a plurality of first convolution results. Taking the example shown in fig. 3 as an example, the first sub-data and the weight may be convolved to obtain a first convolution result corresponding to the first sub-data, the second sub-data and the weight may be convolved to obtain a first convolution result corresponding to the second sub-data, the third sub-data and the weight may be convolved to obtain a first convolution result corresponding to the third sub-data, and the fourth sub-data and the weight may be convolved to obtain a first convolution result corresponding to the fourth sub-data.
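Step S22 amounts to four ordinary (non-dilated) convolutions with the same weight. A minimal sketch, assuming a 7 × 7 input and a "valid", stride-1 convolution implemented as cross-correlation (the usual neural-network convention):

```python
import numpy as np

def conv2d_valid(x, w):
    """Ordinary 'valid' stride-1 convolution (cross-correlation)."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

x = np.arange(49, dtype=float).reshape(7, 7)  # assumed first data
w = np.ones((3, 3))                           # assumed weight

# the four second data from the parity split, each convolved with w
sub_data = [x[0::2, 0::2], x[0::2, 1::2], x[1::2, 0::2], x[1::2, 1::2]]
first_results = [conv2d_valid(s, w) for s in sub_data]
print([r.shape for r in first_results])  # [(2, 2), (2, 1), (1, 2), (1, 1)]
```

Note that the four sub-convolutions are independent, so hardware with enough parallelism can evaluate them in a single pass.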
In step S23, the plurality of first convolution results are merged according to a preset merging manner to obtain the dilated convolution result of the first data and the weight,
wherein the preset merging manner is the inverse process of the preset splitting manner.
For example, since the preset merging manner is the inverse process of the preset splitting manner, splitting the merged dilated convolution result of the first data and the weight according to the preset splitting manner recovers each of the first convolution results.
In a possible implementation manner, merging the plurality of first convolution results according to the preset merging manner, and determining the second convolution result obtained after merging as the dilated convolution result of the first data and the weight, includes:
sequentially using the elements in the first convolution result corresponding to the first sub-data as the elements in the odd columns of the odd rows of the dilated convolution result;
sequentially using the elements in the first convolution result corresponding to the second sub-data as the elements in the even columns of the odd rows of the dilated convolution result;
sequentially using the elements in the first convolution result corresponding to the third sub-data as the elements in the odd columns of the even rows of the dilated convolution result;
and sequentially using the elements in the first convolution result corresponding to the fourth sub-data as the elements in the even columns of the even rows of the dilated convolution result.
For example, each row in the first convolution result corresponding to the first sub-data is used in turn as an odd row of the dilated convolution result, with its elements placed in turn in the odd columns of that row; each row in the first convolution result corresponding to the second sub-data is used in turn as an odd row of the dilated convolution result, with its elements placed in the even columns of that row. Likewise, each row in the first convolution result corresponding to the third sub-data is used in turn as an even row of the dilated convolution result, with its elements placed in the odd columns of that row, and each row in the first convolution result corresponding to the fourth sub-data is used in turn as an even row of the dilated convolution result, with its elements placed in the even columns of that row.
Still taking the above example, as shown in fig. 4, the first convolution result corresponding to the first sub-data is 2 × 2, the first convolution result corresponding to the second sub-data is 2 × 1, the first convolution result corresponding to the third sub-data is 1 × 2, and the first convolution result corresponding to the fourth sub-data is 1 × 1.
Each row in the first convolution result corresponding to the first sub-data is used in turn to fill the odd columns of an odd row of the dilated convolution result (that is, the two elements in the first row of that result become the elements in the first and third columns of the first row of the dilated convolution result, and the two elements in the second row become the elements in the first and third columns of the third row, marked "1" in fig. 4).
The elements in the first convolution result corresponding to the second sub-data are used in turn as the even columns of the odd rows of the dilated convolution result (that is, the single element in the first row of that result becomes the element in the second column of the first row of the dilated convolution result, and the single element in the second row becomes the element in the second column of the third row, marked "2" in fig. 4).
The elements in the first convolution result corresponding to the third sub-data are used in turn as the odd columns of the even rows of the dilated convolution result (that is, the first element in its single row becomes the element in the first column of the second row of the dilated convolution result, and the second element becomes the element in the third column of the second row, marked "3" in fig. 4).
The element in the first convolution result corresponding to the fourth sub-data is used as the even column of the even row of the dilated convolution result (i.e., the single element becomes the element in the second column of the second row of the dilated convolution result, marked "4" in fig. 4).
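The whole split–convolve–merge pipeline can be checked end to end against a direct dilated (hole) convolution. Everything below is an illustrative sketch with assumed sizes (7 × 7 input, 3 × 3 weight, dilation rate 2):

```python
import numpy as np

def conv2d_valid(x, w):
    """Ordinary 'valid' stride-1 convolution (cross-correlation)."""
    kh, kw = w.shape
    return np.array([[np.sum(x[i:i + kh, j:j + kw] * w)
                      for j in range(x.shape[1] - kw + 1)]
                     for i in range(x.shape[0] - kh + 1)])

def dilated_conv2d(x, w, rate=2):
    """Direct dilated convolution via a zero-expanded kernel."""
    wd = np.zeros(((w.shape[0] - 1) * rate + 1, (w.shape[1] - 1) * rate + 1))
    wd[::rate, ::rate] = w
    return conv2d_valid(x, wd)

rng = np.random.default_rng(0)
x, w = rng.random((7, 7)), rng.random((3, 3))

# step S21/S22: split by row/column parity, convolve each with w
r1 = conv2d_valid(x[0::2, 0::2], w)  # first sub-data
r2 = conv2d_valid(x[0::2, 1::2], w)  # second sub-data
r3 = conv2d_valid(x[1::2, 0::2], w)  # third sub-data
r4 = conv2d_valid(x[1::2, 1::2], w)  # fourth sub-data

# step S23: interleave the four first convolution results by parity
merged = np.empty((r1.shape[0] + r3.shape[0], r1.shape[1] + r2.shape[1]))
merged[0::2, 0::2] = r1
merged[0::2, 1::2] = r2
merged[1::2, 0::2] = r3
merged[1::2, 1::2] = r4

assert np.allclose(merged, dilated_conv2d(x, w))  # identical results
```

The final assertion confirms that the merged second convolution result matches the dilated convolution computed directly with a zero-expanded 5 × 5 kernel.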
In this way, after the first data is split according to the preset splitting mode, the obtained plurality of second data are respectively convolved with the weight, and the obtained plurality of first convolution results are combined according to the preset combining mode. According to the data processing method provided by the disclosure, the energy efficiency ratio of hardware can be improved, the operation time is reduced, and the operation efficiency is improved.
In order to better understand the benefits of the present disclosure for those skilled in the art, the following description is given by way of specific examples.
Assume that when the first data shown in fig. 3 (a 7 × 7 input) is convolved directly with the 3 × 3 weight as a dilated convolution, the processor can process only one convolution of a 5 × 5 kernel with the 3 × 3 weight at a time, outputting a single convolution result; the processor then needs 9 operations to complete the dilated convolution.
After the first data is split into four second data, however, the second data have sizes 4 × 4, 4 × 3, 3 × 4 and 3 × 3, and the processor can obtain all 9 results in one operation, completing the dilated convolution of the first data with the weight.
Therefore, the data processing method provided by the disclosure improves the energy efficiency ratio of hardware, reduces the operation time and improves the operation efficiency.
In one possible implementation, the first data may include neurons and/or gradients.
In the back propagation of a dilated convolution, the second gradient of the next convolutional layer can be determined by a dilated convolution of the first gradient of the current convolutional layer with the weight. In this process, the gradient of the current convolutional layer may be split according to the preset splitting manner to obtain four first sub-gradients; the four first sub-gradients are respectively convolved with the weight to obtain four convolution results, and the four convolution results are merged according to the preset merging manner to obtain the second gradient of the next convolutional layer.
In a possible implementation manner, the splitting the first data according to a preset splitting manner to obtain a plurality of second data may include:
splitting the first neuron according to the preset splitting mode to obtain a plurality of second neurons;
and splitting the first gradient according to the preset splitting mode to obtain a plurality of second gradients.
For example, the first neuron may be split according to the preset splitting manner to obtain a plurality of second neurons. Illustratively, the second neurons may include a first sub-neuron, a second sub-neuron, a third sub-neuron, and a fourth sub-neuron; then the elements corresponding to odd columns of odd rows of the first neuron may be determined to form the first sub-neuron, the elements corresponding to even columns of odd rows to form the second sub-neuron, the elements corresponding to odd columns of even rows to form the third sub-neuron, and the elements corresponding to even columns of even rows to form the fourth sub-neuron.
Correspondingly, the first gradient can be split according to a preset splitting mode to obtain a plurality of second gradients. Illustratively, the second gradient may include: a first sub-gradient, a second sub-gradient, a third sub-gradient, and a fourth sub-gradient, elements corresponding to odd columns of odd rows of the first gradient may be determined, the first sub-gradient is composed, elements corresponding to even columns of odd rows of the first gradient are determined, the second sub-gradient is composed, elements corresponding to odd columns of even rows of the first gradient are determined, the third sub-gradient is composed, elements corresponding to even columns of even rows of the first gradient are determined, and the fourth sub-gradient is composed.
In a possible implementation manner, the method may further include:
for any second neuron, performing convolution operation on the second neuron and the corresponding second gradient to obtain a third convolution result;
determining the sum of the third convolution results corresponding to each second neuron as the residual error of the weight;
wherein the parity properties of the rows and columns corresponding to the positions of the elements in the second neuron in the first neuron and the parity properties of the rows and columns corresponding to the positions of the elements in the corresponding second gradient in the first gradient are consistent.
For example, the parity of the rows and columns at which the elements of a second gradient sit in the first gradient matches the parity of the rows and columns at which the elements of the corresponding second neuron sit in the first neuron. For instance: all elements of a second neuron are located in odd rows and odd columns of the first neuron, and all elements of its corresponding second gradient are located in odd rows and odd columns of the first gradient; or all elements of the second neuron are located in odd rows and even columns of the first neuron, and all elements of the corresponding second gradient are located in odd rows and even columns of the first gradient; or all elements of the second neuron are located in even rows and odd columns of the first neuron, and all elements of the corresponding second gradient are located in even rows and odd columns of the first gradient; or all elements of the second neuron are located in even rows and even columns of the first neuron, and all elements of the corresponding second gradient are located in even rows and even columns of the first gradient.
Illustratively, continuing the above example, the second neurons include the first, second, third, and fourth sub-neurons, and the second gradients include the first, second, third, and fourth sub-gradients. The first sub-neuron corresponds to the first sub-gradient, and convolving them yields the convolution result corresponding to the first sub-neuron; the second sub-neuron corresponds to the second sub-gradient, and convolving them yields the convolution result corresponding to the second sub-neuron; the third sub-neuron corresponds to the third sub-gradient, and convolving them yields the convolution result corresponding to the third sub-neuron; and the fourth sub-neuron corresponds to the fourth sub-gradient, and convolving them yields the convolution result corresponding to the fourth sub-neuron.
And after the third convolution results corresponding to the second neurons are obtained, adding the third convolution results corresponding to the second neurons, and determining the obtained sum as the residual error of the weight.
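The same parity trick applies to the weight residual. The sketch below (assumed sizes, `conv2d_valid` as a stand-in for the processor's ordinary convolution) checks that summing the convolutions of parity-matched sub-neurons and sub-gradients reproduces the direct weight gradient of a rate-2 dilated convolution:

```python
import numpy as np

def conv2d_valid(x, w):
    """Ordinary 'valid' stride-1 convolution (cross-correlation)."""
    kh, kw = w.shape
    return np.array([[np.sum(x[i:i + kh, j:j + kw] * w)
                      for j in range(x.shape[1] - kw + 1)]
                     for i in range(x.shape[0] - kh + 1)])

def split(x):
    # first, second, third, fourth parity components (row, column)
    return [x[0::2, 0::2], x[0::2, 1::2], x[1::2, 0::2], x[1::2, 1::2]]

rng = np.random.default_rng(1)
x  = rng.random((7, 7))   # first neuron (layer input), assumed size
gy = rng.random((3, 3))   # first gradient dL/dY of the 3 x 3 output

# direct weight residual of the rate-2 dilated convolution:
# dW[m, n] = sum_ij x[i + 2m, j + 2n] * gy[i, j]
dw_direct = np.array([[np.sum(x[2 * m:2 * m + 3, 2 * n:2 * n + 3] * gy)
                       for n in range(3)] for m in range(3)])

# convolve parity-matched sub-neurons and sub-gradients, then sum
dw_split = sum(conv2d_valid(xs, gs)
               for xs, gs in zip(split(x), split(gy)))

assert np.allclose(dw_direct, dw_split)
```

Each of the four sub-convolutions already has the 3 × 3 shape of the weight, so the third convolution results can be added element-wise to obtain the residual.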
Thus, the hardware energy efficiency ratio can be improved, the operation time can be reduced, and the operation efficiency can be improved.
In a possible implementation manner, the method may further include:
and adjusting the weight according to the residual error of the weight.
For example, after determining the residual of the weight, the weight of the current convolutional layer may be adjusted according to the residual of the weight, for example: and determining the sum of the residual error of the weight and the weight as a new weight.
It is noted that while for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders and concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that acts and modules referred to are not necessarily required by the disclosure.
It is further noted that, although the various steps in the flowcharts of fig. 1-4 are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not performed in the exact order shown and described, and may be performed in other orders, unless explicitly stated otherwise. Moreover, at least some of the steps in fig. 1-4 may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performance of the sub-steps or stages is not necessarily sequential, but may be performed in turn or alternating with other steps or at least some of the sub-steps or stages of other steps.
Fig. 5 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure. As shown in fig. 5, the data processing apparatus may include:
the splitting module 501 may be configured to split the first data according to a preset splitting manner to obtain a plurality of second data;
a convolution module 502, configured to perform convolution operations on the second data and the weight respectively to obtain a plurality of first convolution results;
a merging module 503, configured to merge the plurality of first convolution results according to a preset merging manner to obtain a dilated convolution result of the first data and the weight,
wherein the preset merging manner is the inverse process of the preset splitting manner.
In this way, the first data is split according to the preset splitting manner, the obtained plurality of second data are each convolved with the weight, and the obtained plurality of first convolution results are merged according to the preset merging manner to obtain the dilated convolution result. The data processing apparatus provided by the present disclosure can thereby improve the energy efficiency of the hardware, reduce the operation time, and improve the operation efficiency.
In a possible implementation manner, the plurality of second data includes first sub-data, second sub-data, third sub-data, and fourth sub-data, and the splitting module may be further configured to:
traverse the elements in the first data, determine the elements located at odd rows and odd columns of the first data, and form the first sub-data from them;
determine the elements located at odd rows and even columns of the first data to form the second sub-data;
determine the elements located at even rows and odd columns of the first data to form the third sub-data;
and determine the elements located at even rows and even columns of the first data to form the fourth sub-data.
In a possible implementation manner, the merging module may be further configured to:
sequentially use the elements in the first convolution result corresponding to the first sub-data as the elements at odd rows and odd columns of the dilated convolution result;
sequentially use the elements in the first convolution result corresponding to the second sub-data as the elements at odd rows and even columns of the dilated convolution result;
sequentially use the elements in the first convolution result corresponding to the third sub-data as the elements at even rows and odd columns of the dilated convolution result;
and sequentially use the elements in the first convolution result corresponding to the fourth sub-data as the elements at even rows and even columns of the dilated convolution result.
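The split-convolve-merge path above can be sketched end to end in numpy. This is a hedged illustration rather than the patented implementation: the function names are hypothetical, a dilation rate of 2 is assumed (the case the four-way parity split corresponds to), the convolutions are 'valid'-mode, the input dimensions are even, and rows/columns are counted from 1 so that index 0 is an "odd" row/column.

```python
import numpy as np

def conv2d_valid(x, w):
    """Plain 'valid'-mode convolution (cross-correlation), no dilation."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def dilated_conv_via_split(first_data, weight):
    """Dilation-2 convolution computed by splitting the first data into
    four parity sub-grids (the second data), running an ordinary
    convolution on each (the first convolution results), and interleaving
    those results back together (the merge)."""
    parts = [conv2d_valid(first_data[p::2, q::2], weight)
             for p in (0, 1) for q in (0, 1)]
    h, w = parts[0].shape
    out = np.empty((2 * h, 2 * w))
    out[0::2, 0::2] = parts[0]  # odd rows,  odd columns  (1-based)
    out[0::2, 1::2] = parts[1]  # odd rows,  even columns
    out[1::2, 0::2] = parts[2]  # even rows, odd columns
    out[1::2, 1::2] = parts[3]  # even rows, even columns
    return out

x = np.arange(36.0).reshape(6, 6)      # toy 6x6 first data
w = np.eye(2)                          # toy 2x2 weight
result = dilated_conv_via_split(x, w)  # 4x4 dilated convolution result
```

Each sub-convolution is dense, so hardware that is efficient at ordinary convolutions never touches the zero gaps a dilated kernel would otherwise introduce, which is the source of the efficiency gain described above.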
In one possible implementation, the first data includes neurons and/or gradients.
In a possible implementation manner, the first data may include a first neuron and a first gradient, and the splitting module is further configured to:
splitting the first neuron according to the preset splitting mode to obtain a plurality of second neurons;
and splitting the first gradient according to the preset splitting mode to obtain a plurality of second gradients.
In a possible implementation manner, the apparatus may further include:
a processing module, configured to, for any second neuron, perform a convolution operation on the second neuron and the corresponding second gradient to obtain a third convolution result;
a determining module, configured to determine the sum of the third convolution results corresponding to the second neurons as the residual of the weight;
wherein, for each second neuron and its corresponding second gradient, the row and column parities of the positions that the elements of the second neuron occupy in the first neuron are consistent with the row and column parities of the positions that the elements of the second gradient occupy in the first gradient.
In a possible implementation manner, the apparatus may further include:
and the adjusting module is used for adjusting the weight according to the residual error of the weight.
It should be understood that the above-described apparatus embodiments are merely illustrative and that the apparatus of the present disclosure may be implemented in other ways. For example, the division of the units/modules in the above embodiments is only one logical function division, and there may be another division manner in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented.
In addition, unless otherwise specified, each functional unit/module in each embodiment of the present disclosure may be integrated into one unit/module, each unit/module may exist alone physically, or two or more units/modules may be integrated together. The integrated units/modules may be implemented in the form of hardware or software program modules.
If the integrated unit/module is implemented in hardware, the hardware may be digital circuits, analog circuits, and so on. Physical implementations of the hardware structures include, but are not limited to, transistors, memristors, and the like. Unless otherwise specified, the artificial intelligence processor may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the storage unit may be any suitable magnetic storage medium or magneto-optical storage medium, such as resistive random-access memory (RRAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), enhanced dynamic random-access memory (eDRAM), high-bandwidth memory (HBM), or hybrid memory cube (HMC).
The integrated units/modules, if implemented in the form of software program modules and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned memory includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program codes.
In a possible implementation manner, an artificial intelligence chip is also disclosed, which comprises the data processing device.
In a possible implementation manner, a board card is further disclosed, which comprises a storage device, an interface device, a control device and the artificial intelligence chip; wherein, the artificial intelligence chip is respectively connected with the storage device, the control device and the interface device; the storage device is used for storing data; the interface device is used for realizing data transmission between the artificial intelligence chip and external equipment; and the control device is used for monitoring the state of the artificial intelligence chip.
Fig. 6 shows a block diagram of a board card according to an embodiment of the present disclosure. Referring to fig. 6, the board card may include other supporting components in addition to the chip 389, including but not limited to: a storage device 390, an interface device 391, and a control device 392;
the storage device 390 is connected to the artificial intelligence chip through a bus for storing data. The memory device may include a plurality of groups of memory cells 393. Each group of the storage units is connected with the artificial intelligence chip through a bus. It is understood that each group of the memory cells may be a DDR SDRAM (Double Data Rate SDRAM).
DDR can double the speed of SDRAM without increasing the clock frequency, because it allows data to be read out on both the rising and falling edges of the clock pulse, making it twice as fast as standard SDRAM. In one embodiment, the storage device may include four groups of storage units. Each group of storage units may include a plurality of DDR4 chips. In one embodiment, the artificial intelligence chip may include four 72-bit DDR4 controllers, of which 64 bits are used for data transmission and 8 bits for ECC checking. It can be understood that when DDR4-3200 chips are adopted in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
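The 25600 MB/s figure follows directly from the quoted transfer rate and data-bus width; a quick arithmetic check (variable names are illustrative only):

```python
# DDR4-3200 performs 3200 mega-transfers per second; of each 72-bit
# controller, 64 bits carry data (the other 8 bits carry the ECC check).
transfers_per_second_mt = 3200
data_bus_bits = 64
bytes_per_transfer = data_bus_bits // 8                       # 8 bytes
theoretical_bandwidth_mb_s = transfers_per_second_mt * bytes_per_transfer
print(theoretical_bandwidth_mb_s)  # -> 25600, per group of storage units
```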
In one embodiment, each group of storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is arranged in the chip and is used for controlling the data transmission and data storage of each storage unit.
The interface device is electrically connected to the artificial intelligence chip and is used for realizing data transmission between the artificial intelligence chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIe interface, and the data to be processed is transmitted from the server to the chip through the standard PCIe interface, thereby realizing data transfer. Preferably, when a PCIe 3.0 ×16 interface is adopted for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface; the present disclosure does not limit the specific form of the other interface, as long as the interface unit can realize the transfer function. In addition, the calculation result of the artificial intelligence chip is transmitted back to the external device (e.g., the server) by the interface device.
The control device is electrically connected to the artificial intelligence chip and is used for monitoring the state of the artificial intelligence chip. Specifically, the artificial intelligence chip and the control device may be electrically connected through an SPI interface. The control device may include a single-chip microcomputer (MCU). Since the artificial intelligence chip may include a plurality of processing chips, a plurality of processing cores, or a plurality of processing circuits, it can drive a plurality of loads. Therefore, the artificial intelligence chip can be in different working states such as heavy load and light load. The control device can regulate the working states of the plurality of processing chips, processing cores, and/or processing circuits in the artificial intelligence chip.
In one possible implementation, an electronic device is disclosed that includes the artificial intelligence chip described above. The electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a still camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device. The vehicles include aircraft, ships, and/or automobiles; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, electric rice cookers, humidifiers, washing machines, electric lamps, gas stoves, and range hoods; and the medical devices include nuclear magnetic resonance apparatuses, B-mode ultrasound scanners, and/or electrocardiographs.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 7 illustrates a block diagram of an electronic device 800 according to an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or another such terminal.
Referring to fig. 7, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focus and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing state assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component thereof, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 8 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 8, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments. The technical features of the embodiments may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The foregoing may be better understood in light of the following clauses:
clause a1, a data processing method applied to a processor, the method comprising:
splitting the first data according to a preset splitting mode to obtain a plurality of second data;
performing convolution operation on the second data and the weight respectively to obtain a plurality of first convolution results;
merging the plurality of first convolution results according to a preset merging mode to obtain a dilated convolution result of the first data and the weight,
wherein the preset merging mode is the inverse process of the preset splitting mode.
Clause A2, the method according to clause A1, wherein the plurality of second data includes first sub-data, second sub-data, third sub-data, and fourth sub-data, and splitting the first data according to the preset splitting mode to obtain the plurality of second data includes:
traversing the elements in the first data, determining the elements located at odd rows and odd columns of the first data, and forming the first sub-data from them;
determining the elements located at odd rows and even columns of the first data to form the second sub-data;
determining the elements located at even rows and odd columns of the first data to form the third sub-data;
and determining the elements located at even rows and even columns of the first data to form the fourth sub-data.
Clause A3, the method according to clause A2, wherein merging the plurality of first convolution results according to the preset merging mode, and determining the second convolution result obtained after merging as the dilated convolution result of the first data and the weight, includes:
sequentially using the elements in the first convolution result corresponding to the first sub-data as the elements at odd rows and odd columns of the dilated convolution result;
sequentially using the elements in the first convolution result corresponding to the second sub-data as the elements at odd rows and even columns of the dilated convolution result;
sequentially using the elements in the first convolution result corresponding to the third sub-data as the elements at even rows and odd columns of the dilated convolution result;
and sequentially using the elements in the first convolution result corresponding to the fourth sub-data as the elements at even rows and even columns of the dilated convolution result.
Clause a4, the method of any one of clauses a 1-A3, the first data comprising neurons and/or gradients.
Clause a5, the method of any one of clauses a 1-a 4, the first data comprising a first neuron and a first gradient, the splitting the first data according to a preset splitting pattern to obtain a plurality of second data, comprising:
splitting the first neuron according to the preset splitting mode to obtain a plurality of second neurons;
and splitting the first gradient according to the preset splitting mode to obtain a plurality of second gradients.
Clause a6, the method of clause a5, further comprising:
for any second neuron, performing a convolution operation on the second neuron and the corresponding second gradient to obtain a third convolution result;
determining the sum of the third convolution results corresponding to the second neurons as the residual of the weight;
wherein, for each second neuron and its corresponding second gradient, the row and column parities of the positions that the elements of the second neuron occupy in the first neuron are consistent with the row and column parities of the positions that the elements of the second gradient occupy in the first gradient.
Clause a7, the method of clause a6, further comprising:
and adjusting the weight according to the residual error of the weight.
Clause A8, a data processing apparatus applied to a processor, the apparatus comprising:
the splitting module is used for splitting the first data according to a preset splitting mode to obtain a plurality of second data;
the convolution module is used for performing convolution operation on the second data and the weight respectively to obtain a plurality of first convolution results;
a merging module, configured to merge the plurality of first convolution results according to a preset merging mode to obtain a dilated convolution result of the first data and the weight,
wherein the preset merging mode is the inverse process of the preset splitting mode.
Clause A9, the apparatus according to clause A8, wherein the plurality of second data includes first sub-data, second sub-data, third sub-data, and fourth sub-data, and the splitting module is further configured to:
traverse the elements in the first data, determine the elements located at odd rows and odd columns of the first data, and form the first sub-data from them;
determine the elements located at odd rows and even columns of the first data to form the second sub-data;
determine the elements located at even rows and odd columns of the first data to form the third sub-data;
and determine the elements located at even rows and even columns of the first data to form the fourth sub-data.
Clause A10, the apparatus according to clause A9, wherein the merging module is further configured to:
sequentially use the elements in the first convolution result corresponding to the first sub-data as the elements at odd rows and odd columns of the dilated convolution result;
sequentially use the elements in the first convolution result corresponding to the second sub-data as the elements at odd rows and even columns of the dilated convolution result;
sequentially use the elements in the first convolution result corresponding to the third sub-data as the elements at even rows and odd columns of the dilated convolution result;
and sequentially use the elements in the first convolution result corresponding to the fourth sub-data as the elements at even rows and even columns of the dilated convolution result.
Clause a11, the apparatus of any one of clauses A8-a 10, the first data comprising neurons and/or gradients.
Clause a12, the apparatus of any one of clauses A8-a 11, the first data comprising a first neuron and a first gradient, the split module further to:
splitting the first neuron according to the preset splitting mode to obtain a plurality of second neurons;
and splitting the first gradient according to the preset splitting mode to obtain a plurality of second gradients.
Clause a13, the apparatus of clause a12, further comprising:
a processing module, configured to, for any second neuron, perform a convolution operation on the second neuron and the corresponding second gradient to obtain a third convolution result;
a determining module, configured to determine the sum of the third convolution results corresponding to the second neurons as the residual of the weight;
wherein, for each second neuron and its corresponding second gradient, the row and column parities of the positions that the elements of the second neuron occupy in the first neuron are consistent with the row and column parities of the positions that the elements of the second gradient occupy in the first gradient.
Clause a14, the apparatus of clause a13, further comprising:
and the adjusting module is used for adjusting the weight according to the residual error of the weight.
Clause a15, an artificial intelligence chip, the chip comprising the data processing apparatus of clause A8.
Clause a16, an electronic device comprising the artificial intelligence chip of clause a 15.
Clause A17, a board card, comprising: a storage device, an interface device, a control device, and the artificial intelligence chip according to clause A15;
wherein, the artificial intelligence chip is respectively connected with the storage device, the control device and the interface device;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the artificial intelligence chip and external equipment;
and the control device is used for monitoring the state of the artificial intelligence chip.
Clause A18, the board card according to clause A17, wherein the storage device includes: a plurality of groups of storage units, each group of storage units being connected to the artificial intelligence chip through a bus, the storage units being: DDR SDRAM;
the chip includes: a DDR controller, for controlling the data transmission and data storage of each storage unit;
and the interface device is: a standard PCIe interface.
Clause A19, an electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any one of clauses A1 to A7.
Clause A20, a computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method of any one of clauses A1 to A7.
The embodiments of the present disclosure have been described in detail above, and specific examples are used herein to explain the principles and implementations of the present disclosure; the above description of the embodiments is provided only to help understand the method and core idea of the present disclosure. Meanwhile, persons skilled in the art may, based on the idea of the present disclosure, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present disclosure.

Claims (14)

1. A data processing method, applied to a processor, the method comprising:
splitting the first data according to a preset splitting mode to obtain a plurality of second data;
performing convolution operation on the second data and the weight respectively to obtain a plurality of first convolution results;
merging the plurality of first convolution results according to a preset merging mode to obtain a dilated convolution result of the first data and the weight,
wherein the preset merging mode is the inverse process of the preset splitting mode.
2. The method according to claim 1, wherein the plurality of second data includes first sub-data, second sub-data, third sub-data, and fourth sub-data, and splitting the first data according to the preset splitting mode to obtain the plurality of second data includes:
traversing the elements in the first data, determining the elements located at odd rows and odd columns of the first data, and forming the first sub-data from them;
determining the elements located at odd rows and even columns of the first data to form the second sub-data;
determining the elements located at even rows and odd columns of the first data to form the third sub-data;
and determining the elements located at even rows and even columns of the first data to form the fourth sub-data.
3. The method according to claim 2, wherein the merging the plurality of first convolution results according to a preset merging manner to obtain a dilated convolution result of the first data and the weight comprises:
sequentially using elements in the first convolution result corresponding to the first sub-data as elements at odd rows and odd columns of the dilated convolution result;
sequentially using elements in the first convolution result corresponding to the second sub-data as elements at odd rows and even columns of the dilated convolution result;
sequentially using elements in the first convolution result corresponding to the third sub-data as elements at even rows and odd columns of the dilated convolution result; and
sequentially using elements in the first convolution result corresponding to the fourth sub-data as elements at even rows and even columns of the dilated convolution result.
4. The method according to any one of claims 1 to 3, wherein the first data comprise neurons and/or gradients.
5. The method according to any one of claims 1 to 4, wherein the first data comprise a first neuron and a first gradient, and the splitting the first data according to a preset splitting manner to obtain a plurality of second data comprises:
splitting the first neuron according to the preset splitting manner to obtain a plurality of second neurons; and
splitting the first gradient according to the preset splitting manner to obtain a plurality of second gradients.
6. The method according to claim 5, further comprising:
for any second neuron, performing a convolution operation on the second neuron and the corresponding second gradient to obtain a third convolution result; and
determining a sum of the third convolution results corresponding to the second neurons as a residual of the weight,
wherein the row and column parity of the positions, in the first neuron, of the elements in each second neuron is consistent with the row and column parity of the positions, in the first gradient, of the elements in the corresponding second gradient.
7. The method according to claim 6, further comprising:
adjusting the weight according to the residual of the weight.
8. A data processing apparatus, applied to a processor, the apparatus comprising:
a splitting module configured to split first data according to a preset splitting manner to obtain a plurality of second data;
a convolution module configured to perform a convolution operation on each of the second data and a weight, respectively, to obtain a plurality of first convolution results; and
a merging module configured to merge the plurality of first convolution results according to a preset merging manner to obtain a dilated convolution result of the first data and the weight,
wherein the preset merging manner is an inverse process of the preset splitting manner.
9. An artificial intelligence chip, characterized in that the chip comprises the data processing apparatus according to claim 8.
10. An electronic device, characterized in that the electronic device comprises the artificial intelligence chip according to claim 9.
11. A board card, characterized in that the board card comprises: a storage device, an interface device, a control device, and an artificial intelligence chip according to claim 9;
wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively;
the storage device is configured to store data;
the interface device is configured to implement data transmission between the artificial intelligence chip and an external device; and
the control device is configured to monitor a state of the artificial intelligence chip.
12. The board card according to claim 11, wherein
the storage device comprises a plurality of groups of storage units, each group of storage units being connected to the artificial intelligence chip via a bus, wherein the storage units are DDR SDRAM;
the chip comprises a DDR controller configured to control data transmission to and data storage in each storage unit; and
the interface device is a standard PCIE interface.
13. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any of claims 1 to 7.
14. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 7.
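The weight-residual computation of claims 5 to 7 can be sketched in the same spirit: split the first neuron and the first gradient into parity phases whose row/column parity matches, convolve each matching pair, and sum the results. This is an illustrative NumPy sketch under assumed conditions (single channel, even input sizes, dilation rate 2); the function names are hypothetical.

```python
import numpy as np

def conv2d_valid(x, w):
    """Plain 'valid' convolution (cross-correlation)."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def weight_residual(neuron, grad, kh, kw):
    """Residual of a kh-by-kw weight: sum of the per-phase
    "third convolution results" over the four parity phases."""
    residual = np.zeros((kh, kw))
    for p in (0, 1):
        for q in (0, 1):
            # The parity of the phase taken from the neuron matches the
            # parity of the phase taken from the gradient (claim 6).
            residual += conv2d_valid(neuron[p::2, q::2], grad[p::2, q::2])
    return residual
```

Each phase pair contributes one third convolution result of kernel size, and their sum equals the gradient of the dilated convolution with respect to the weight.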
CN201911252885.3A 2019-12-09 2019-12-09 Data processing method, device, computer equipment and storage medium Active CN113033761B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911252885.3A CN113033761B (en) 2019-12-09 Data processing method, device, computer equipment and storage medium
PCT/CN2020/123836 WO2021114904A1 (en) 2019-12-09 2020-10-27 Data processing method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911252885.3A CN113033761B (en) 2019-12-09 Data processing method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113033761A true CN113033761A (en) 2021-06-25
CN113033761B CN113033761B (en) 2024-05-14

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107924300A (en) * 2015-08-13 2018-04-17 Microsoft Technology Licensing LLC Data reordering using buffers and memory
US20190065896A1 (en) * 2017-08-23 2019-02-28 Samsung Electronics Co., Ltd. Neural network method and apparatus
CN110135556A (en) * 2019-04-04 2019-08-16 Ping An Technology (Shenzhen) Co., Ltd. Systolic array-based neural network acceleration method and apparatus, computer device and storage medium
CN110309837A (en) * 2019-07-05 2019-10-08 Beijing Megvii Technology Co., Ltd. Data processing method and image processing method based on convolutional neural network feature maps


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KIM, MINSIK et al.: "Efficient Dilated-Winograd Convolutional Neural Networks", 2019 IEEE International Conference on Image Processing (ICIP), pages 2711-2714 *

Also Published As

Publication number Publication date
WO2021114904A1 (en) 2021-06-17

Similar Documents

Publication Publication Date Title
CN110889503B (en) Data processing method, data processing device, computer equipment and storage medium
US20190295569A1 (en) Processing voice
CN111443917B (en) Neural network operation optimization method and device and related products
WO2021036893A1 (en) Data processing method and apparatus, computer device, and storage medium
WO2019085749A1 (en) Application program control method and apparatus, medium, and electronic device
CN110851787B (en) Merging instruction processing method and device, electronic equipment and storage medium
CN106095544B (en) Central processing unit control method and device
WO2021114903A1 (en) Data processing method and apparatus, computer device, and storage medium
WO2021114904A1 (en) Data processing method and apparatus, computer device and storage medium
CN113033761B (en) Data processing method, device, computer equipment and storage medium
CN113298223B (en) Data processing method, device, computer equipment and storage medium
CN113297128B (en) Data processing method, device, computer equipment and storage medium
CN111783969A (en) Data processing method, data processing device, computer equipment and storage medium
CN110633105B (en) Instruction sequence processing method and device, electronic equipment and storage medium
WO2021083100A1 (en) Data processing method and device, computer equipment and storage medium
WO2021082654A1 (en) Data processing method and apparatus, and computer device and storage medium
CN113626079A (en) Data processing method and device and related product
WO2021083097A1 (en) Data processing method and apparatus, and computer device and storage medium
CN113762488B (en) Processor, data processing method, computer device, and storage medium
CN113589882A (en) Clock frequency adjusting device and related product
CN113835990B (en) Detection method, detection device, computer equipment and storage medium
CN113762518A (en) Data processing method, data processing device, computer equipment and storage medium
WO2021017546A1 (en) Neural network quantization method and apparatus, chip, electronic device and board card
CN113537476A (en) Arithmetic device and related product
CN112306949B (en) Data processing method and device and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant