CN113298223B - Data processing method, device, computer equipment and storage medium - Google Patents

Data processing method, device, computer equipment and storage medium

Info

Publication number
CN113298223B
Authority
CN
China
Prior art keywords
weight
data
neural network
gradient
quantization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010111742.7A
Other languages
Chinese (zh)
Other versions
CN113298223A (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd filed Critical Cambricon Technologies Corp Ltd
Priority to CN202010111742.7A priority Critical patent/CN113298223B/en
Publication of CN113298223A publication Critical patent/CN113298223A/en
Application granted granted Critical
Publication of CN113298223B publication Critical patent/CN113298223B/en


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent

Abstract

The present disclosure relates to a data processing method, a device, a computer device and a storage medium that can train a neural network. When quantization-related processing is performed on input data, output data, weights and weight gradients during training, a parameter server implements the weight update, the calculation of the weight quantization parameters, the quantization of the weights, and the data layout of the quantized weights, and then broadcasts the weight quantization parameters, the forward quantized weights and the inverse quantized weights to a plurality of working nodes, which implement the other processing. By dividing the processing reasonably between the parameter server and the working nodes, the number of weight quantization and data layout computations performed during processing is reduced, which lowers the calculation overhead, transmission bandwidth, memory access amount and communication amount, and reduces the energy consumption of the device.

Description

Data processing method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method, apparatus, computer device, and storage medium.
Background
With the development of computer technology, neural networks (NNs) are increasingly used in data processing such as image processing, video processing, and speech recognition. Through training on sample data, a neural network continuously corrects its network weights and thresholds so that the error function descends along the negative gradient direction and approaches the expected output. It is a widely applied recognition and classification model, used for function approximation, model recognition and classification, data compression, time-series prediction, and the like. Neural networks are applied in fields such as image recognition, speech recognition, and natural language processing. However, as the complexity of neural networks increases, the data volume and data dimensionality keep growing, which poses great challenges to the data processing efficiency of computing devices and to the storage capacity and memory access efficiency of storage devices. In the related art, the operation data of the neural network are quantized with a fixed bit width, that is, floating-point operation data are converted into fixed-point operation data, so that the operation data of the neural network are compressed; however, the calculation overhead, transmission bandwidth, memory access, communication amount, and device energy consumption occupied by quantization in the related art are large.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data processing method, apparatus, computer device, and storage medium that can solve the foregoing technical problems.
According to an aspect of the present disclosure, there is provided a data processing apparatus for training a neural network, the neural network comprising a plurality of neural network layers, the apparatus comprising a parameter server and a plurality of working nodes, wherein the parameter server is configured to
Receiving weight gradients corresponding to the current neural network layer sent by each working node, and updating the weight of the current neural network layer according to a plurality of weight gradients and weight updating operators to obtain updated weight;
carrying out quantization parameter calculation according to the weight parameter operator and the updated weight to obtain a corresponding weight quantization parameter;
carrying out quantization processing on the updated weight according to the determined weight quantization parameter and the weight quantization operator to obtain a quantized weight;
respectively carrying out data layout processing on the quantized weights by using a forward layout operator and an inverse layout operator to obtain forward quantized weights and inverse quantized weights;
broadcasting the weight quantization parameter, the forward quantized weight and the inverse quantized weight to the plurality of working nodes so that a target working node can perform corresponding data processing operation according to received data, wherein the target working node is any one of the plurality of working nodes.
According to another aspect of the present disclosure, there is provided a data processing method for a data processing apparatus for training a neural network, the neural network including a plurality of neural network layers, the data processing apparatus including a parameter server and a plurality of working nodes, the method comprising:
controlling the parameter server to receive the weight gradient of the current neural network layer sent by each working node, and updating the weight of the current neural network layer according to a plurality of weight gradients and weight updating operators to obtain updated weight;
controlling the parameter server to calculate quantization parameters according to the weight parameter operator and the updated weight to obtain corresponding weight quantization parameters;
controlling the parameter server to quantize the updated weight according to the determined weight quantization parameter and the weight quantization operator to obtain a quantized weight;
controlling the parameter server to respectively carry out data layout processing on the quantized weights by using a forward layout operator and an inverse layout operator to obtain forward quantized weights and inverse quantized weights;
and controlling the parameter server to broadcast the weight quantization parameter, the forward quantized weight and the inverse quantized weight to the plurality of working nodes so that a target working node can perform corresponding data processing operation according to the received data, wherein the target working node is any one of the plurality of working nodes.
According to another aspect of the present disclosure, there is provided an artificial intelligence chip comprising the above data processing apparatus.
According to another aspect of the present disclosure, there is provided an electronic device including the artificial intelligence chip described above.
According to another aspect of the present disclosure, there is provided a board including: a memory device, an interface device, and a control device, and an artificial intelligence chip as described above;
wherein the artificial intelligent chip is respectively connected with the storage device, the control device and the interface device;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the artificial intelligent chip and external equipment;
the control device is used for monitoring the state of the artificial intelligent chip,
wherein the memory device includes a plurality of groups of storage units, each group of storage units being connected to the artificial intelligence chip through a bus, and the storage units being DDR SDRAM;
the chip includes a DDR controller for controlling data transmission and data storage of each storage unit;
the interface device is a standard PCIE interface.
According to another aspect of the present disclosure, there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored by the memory to perform the data processing method described above.
According to another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described data processing method.
The data processing method, device, computer equipment and storage medium provided by the present disclosure can train a neural network. When quantization-related processing is performed on input data, output data, weights and weight gradients during training, the parameter server is used to update the weights, calculate the weight quantization parameters, quantize the weights, and perform the data layout of the quantized weights; the weight quantization parameters, the forward quantized weights and the inverse quantized weights are then broadcast to the plurality of working nodes, and the other processing is implemented by the working nodes. The processing performed by the parameter server and by the working nodes is thus divided reasonably, which reduces the number of weight quantization and data layout computations performed during processing, lowers the calculation overhead, transmission bandwidth, memory access amount and communication amount, and reduces the energy consumption of the device.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a schematic diagram of a processor utilized in a data processing apparatus according to an embodiment of the present disclosure.
FIG. 2 illustrates a block diagram of a data processing apparatus according to an embodiment of the present disclosure.
Fig. 3 and 4 are schematic diagrams illustrating an operation principle of a data processing apparatus according to an embodiment of the present disclosure.
Fig. 5 shows a flow chart of a data processing method according to an embodiment of the present disclosure.
Fig. 6 shows a block diagram of a board according to an embodiment of the present disclosure.
Fig. 7 illustrates a block diagram of an electronic device 800, according to an embodiment of the disclosure.
Fig. 8 illustrates a block diagram of an electronic device 1900 according to an embodiment of the disclosure.
Detailed Description
The following description of the technical solutions in the embodiments of the present disclosure will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are some embodiments of the present disclosure, but not all embodiments. Based on the embodiments in this disclosure, all other embodiments that a person skilled in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present disclosure is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in this disclosure and in the claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the present disclosure and claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the claims, the term "if" may be interpreted as "when", "once", "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
The data processing apparatus according to the embodiments of the present disclosure may be composed of a plurality of processors, wherein one of the plurality of processors may serve as the parameter server and the remaining processors may serve as the working nodes. The processor used by the parameter server may be the same as or different from the processors used by the working nodes. A processor may be a general-purpose processor such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations. The artificial intelligence operations may include machine learning operations, brain-like operations, and the like, and the machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like. The artificial intelligence processor may include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processor), and a Field-Programmable Gate Array (FPGA) chip. The present disclosure is not limited to a specific type of processor.
In one possible implementation, the processors referred to in this disclosure may include multiple processing units, each of which may independently execute various tasks assigned thereto, such as: convolution operation task, pooling task or full connection task, etc. The present disclosure is not limited to the tasks that the processing unit operates on.
Fig. 1 shows a schematic diagram of a processor utilized in a data processing apparatus according to an embodiment of the present disclosure. As shown in Fig. 1, the processor 700 includes a plurality of processing units 101 and a memory unit 102. The plurality of processing units 101 are configured to execute sequences of instructions, and the memory unit 102 is configured to store data and may include a random access memory (RAM) and a register file. The plurality of processing units 101 in the processor 700 may share part of the memory space, e.g. share part of the RAM memory space and the register file, or may each have their own memory space.
FIG. 2 illustrates a block diagram of a data processing apparatus according to an embodiment of the present disclosure. As shown in fig. 2, the apparatus is used for training a neural network, the neural network comprises a plurality of neural network layers, the apparatus comprises a parameter server 100 and a plurality of working nodes (i.e. target working nodes) 200, wherein the parameter server 100 is used for
Receiving weight gradients corresponding to the current neural network layer sent by each working node, and updating the weight of the current neural network layer according to a plurality of weight gradients and weight updating operators to obtain updated weight;
carrying out quantization parameter calculation according to the weight parameter operator and the updated weight to obtain a corresponding weight quantization parameter;
carrying out quantization processing on the updated weight according to the determined weight quantization parameter and the weight quantization operator to obtain a quantized weight;
and respectively carrying out data layout processing on the quantized weights by using a forward layout operator and an inverse layout operator to obtain forward quantized weights and inverse quantized weights, wherein the data layout processing may be setting the arrangement mode of the data, i.e. the way the data are laid out in memory;
Broadcasting the weight quantization parameter, the forward quantized weight and the inverse quantized weight to the plurality of working nodes so that a target working node can perform corresponding data processing operation according to received data, wherein the target working node is any one of the plurality of working nodes.
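As an aid to understanding this division of work, the following is a minimal sketch of the parameter server's per-layer flow (weight update, quantization parameter calculation, weight quantization, forward/inverse data layout, broadcast payload). It is not the patented implementation: the operator internals, the symmetric linear quantization scheme, the learning rate, the 8-bit width, and the transpose used as a stand-in for the layout operators are all assumptions made for illustration.

```python
import numpy as np

def parameter_server_step(weight, weight_grads, lr=0.01, bits=8):
    # Weight update operator: combine the weight gradients received from all
    # working nodes and update the weight of the current neural network layer.
    avg_grad = np.mean(weight_grads, axis=0)
    updated = weight - lr * avg_grad

    # Weight parameter operator: derive a weight quantization parameter (scale)
    # from the updated weight (symmetric per-tensor scaling assumed).
    max_abs = float(np.max(np.abs(updated)))
    scale = max_abs / (2 ** (bits - 1) - 1) if max_abs > 0 else 1.0

    # Weight quantization operator: quantize the updated weight to low precision.
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    quantized = np.clip(np.round(updated / scale), qmin, qmax).astype(np.int8)

    # Forward/inverse layout operators: lay out the same quantized weight in the
    # arrangement each pass prefers (a transpose is used here as a stand-in).
    forward_quantized_weight = np.ascontiguousarray(quantized)
    inverse_quantized_weight = np.ascontiguousarray(quantized.T)

    # This payload is broadcast once to all working nodes.
    return {"weight_quantization_parameter": scale,
            "forward_quantized_weight": forward_quantized_weight,
            "inverse_quantized_weight": inverse_quantized_weight}
```

Because these steps run once on the parameter server per update rather than once per working node, the weight quantization and layout computations are not repeated across nodes, which is the saving the embodiment describes.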
In this embodiment, the output data, input data, weight, output data gradient, input data gradient, and weight gradient before quantization are data represented by a high-precision data format, and the output data, input data, weight, output data gradient, input data gradient, and weight gradient after quantization are data represented by a low-precision data format.
In this embodiment, according to the division of data processing operations between the parameter server and the working nodes of the disclosed device, the updated weights and the quantized weights are converted from user-visible data in the related art into user-invisible data, so as to provide a friendlier data presentation for the user.
The data processing device provided by the present disclosure can train a neural network. When quantization-related processing is performed on input data, output data, weights and weight gradients during training, the parameter server is used to update the weights, calculate the weight quantization parameters, quantize the weights, and perform the data layout of the quantized weights; the weight quantization parameters, the forward quantized weights and the inverse quantized weights are then broadcast to the plurality of working nodes, and the other processing is implemented by the working nodes. The processing performed by the parameter server and by the working nodes is thus divided reasonably, which reduces the number of weight quantization and data layout computations performed by the device during processing, lowers the calculation overhead, transmission bandwidth, memory access amount and communication amount, and reduces the energy consumption of the device.
In this embodiment, each target working node may implement a different data processing operation, and the data processing operations may include: convolution forward operation, input data processing (input data quantization, input data quantization parameter calculation), output data gradient calculation, output data gradient processing (output data gradient quantization parameter calculation, output data gradient quantization), weight gradient calculation, convolution inverse operation, error operation, and the like. Fig. 3 and Fig. 4 are schematic diagrams illustrating the operation principle of a data processing apparatus according to an embodiment of the present disclosure. As shown in Fig. 3 and Fig. 4, the operation procedure and principle of the data processing apparatus are described taking one data processing operation per target working node as an example; for simplicity of understanding, Fig. 3 and Fig. 4 only show the data processing apparatus implementing the data processing operations of a single layer of the neural network.
In one possible implementation, as shown in fig. 3 and 4, the target working node 200 is used to implement a convolution forward operation. The target working node 200 is configured to perform convolution forward operation according to the received weight quantization parameter, the forward quantized weight, quantized input data, an input data quantization parameter, and a forward convolution network operator, so as to obtain output data corresponding to the current neural network layer.
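As an illustrative sketch only (not the patented operator), the forward operation on quantized operands can be pictured as integer accumulation followed by rescaling with the two quantization parameters; the example below reduces the convolution to a fully connected (1x1 convolution) case, and the rescaling scheme is an assumption.

```python
import numpy as np

def forward_conv_node(q_input, input_scale, q_weight_fwd, weight_scale):
    # Forward convolution network operator (fully connected stand-in):
    # accumulate the quantized operands in a wide integer type, then rescale
    # back to high precision with the input and weight quantization parameters.
    acc = q_input.astype(np.int32) @ q_weight_fwd.astype(np.int32)
    return acc.astype(np.float32) * (input_scale * weight_scale)
```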
In a possible implementation manner, the target working node 200 is further configured to determine, when the current neural network layer is the last layer of the neural network, an error of the output data according to preset target output data, so that the device determines whether a training end condition is met according to the error.
In this implementation, the training end condition may be set according to the accuracy requirement, the time requirement, and the like of the training, which is not limited by the present disclosure. For example, when the error of the output data is smaller than a preset error threshold value, it may be determined that the training end condition is satisfied. Therefore, the device can meet the training requirements of different neural networks, and the application range of the device is enlarged.
In one possible implementation, as shown in fig. 3 and 4, the target working node 200 is further configured to implement input data processing. The target working node 200 is configured to perform quantization parameter calculation according to input data and an input data parameter operator, so as to obtain an input data quantization parameter; and carrying out quantization processing on the input data according to the determined input data quantization parameters and the input data quantization operator to obtain quantized input data. In the implementation manner, the related processing operations (input data quantization and input data quantization parameter calculation) performed on the input data are controlled to be performed in the same target working node, so that the occupation condition of communication quantity and transmission bandwidth can be reduced.
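A minimal sketch of the two input-data operators running on the same working node is shown below; the symmetric per-tensor scaling, the 8-bit width and the function name are assumptions made for illustration.

```python
import numpy as np

def quantize_input(x, bits=8):
    # Input data parameter operator: compute the input data quantization
    # parameter (a scale) from the data range.
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / (2 ** (bits - 1) - 1) if max_abs > 0 else 1.0
    # Input data quantization operator: quantize the input data with that scale.
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), qmin, qmax).astype(np.int8)
    return q, scale
```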
In one possible implementation, as shown in fig. 3 and 4, the target working node 200 is further configured to implement output data gradient calculation. The target working node 200 is configured to perform an error operation on the output data, so as to obtain an output data gradient of the output data.
In this implementation, the target working node performing the "error operation" may be the same as or different from the target working node performing the "convolution forward operation", which is not limited by the present disclosure.
In one possible implementation, as shown in fig. 3 and 4, the target working node 200 is further configured to implement output data gradient processing. The target working node 200 is further configured to perform quantization parameter calculation according to the output data gradient and the output data gradient parameter operator, so as to obtain an output data gradient quantization parameter; and to carry out quantization processing on the output data gradient according to the determined output data gradient quantization parameter and the output data gradient quantization operator to obtain a quantized output data gradient. In this implementation, the related processing operations performed on the output data gradient (output data gradient quantization parameter calculation and output data gradient quantization) are controlled to be performed in the same target working node, which can reduce the occupation of communication amount and transmission bandwidth.
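The output data gradient processing mirrors the input data processing above; a usage sketch that reuses the illustrative quantize_input routine (an assumption, not the patented operator pair) is:

```python
import numpy as np

output_data_gradient = np.random.randn(4, 16).astype(np.float32)  # example gradient
# Output data gradient parameter operator + quantization operator, sketched by
# reusing the illustrative quantize_input() routine defined above.
q_output_grad, output_grad_scale = quantize_input(output_data_gradient)
```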
In one possible implementation, as shown in fig. 3 and fig. 4, the target working node 200 is further configured to implement weight gradient calculation. The target working node 200 is further configured to perform a weight gradient operation according to the quantized output data gradient, the output data gradient quantization parameter, the quantized input data, the input data quantization parameter, and a weight gradient operator, obtain a new weight gradient corresponding to the current neural network layer, and send the new weight gradient to the parameter server. Therefore, the weight is updated through the parameter server, so that the operation of the device can be simplified, the calculation cost is reduced, and the energy consumption of the device is reduced.
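For the weight gradient operator, a sketch in the same fully connected reduction as above (the shapes and rescaling scheme are assumptions):

```python
import numpy as np

def weight_gradient_node(q_output_grad, output_grad_scale, q_input, input_scale):
    # Weight gradient operator: accumulate the quantized input data against the
    # quantized output data gradient, rescale with the two quantization
    # parameters, and return the new weight gradient for the parameter server.
    acc = q_input.astype(np.int32).T @ q_output_grad.astype(np.int32)
    return acc.astype(np.float32) * (input_scale * output_grad_scale)
```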
In one possible implementation, as shown in fig. 3 and 4, the target working node 200 is further configured to implement a convolution inverse operation. The target working node 200 is further configured to perform convolution inverse operation according to the quantized output data gradient, the output data quantization parameter, the inverse quantized weight, the weight quantization parameter, and an inverse convolution network operator, so as to obtain an input data gradient corresponding to the current neural network layer.
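The inverse convolution network operator can be sketched the same way: the inverse quantized weight (the backward-pass layout, a transpose in the earlier sketch) maps the quantized output data gradient back to an input data gradient. Names and rescaling scheme are assumptions.

```python
import numpy as np

def backward_conv_node(q_output_grad, output_grad_scale, q_weight_inv, weight_scale):
    # Inverse convolution network operator (fully connected stand-in): produce
    # the input data gradient of the current layer, rescaled to high precision.
    acc = q_output_grad.astype(np.int32) @ q_weight_inv.astype(np.int32)
    return acc.astype(np.float32) * (output_grad_scale * weight_scale)
```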
In a possible implementation manner, the target working node 200 is further configured to, when determining that the current neural network layer is the middle layer and/or the first layer of the neural network, use the output data and/or the input data gradient as the input data of the next neural network layer. Thus, the interaction of the interlayer data can be realized, and the smooth proceeding of the data processing is ensured.
In one possible implementation, corresponding operator identifiers may also be set for different operators, so that the parameter server and the target working node can invoke the corresponding operator according to its operator identifier. Table 1 below provides an example of the operator identifiers provided by the present disclosure; an operator identifier may be a combination of one or more of letters, numbers, operation symbols, punctuation marks, other symbols, and some functional symbols, such as a combination of multiple letters, a combination of multiple operation symbols, etc., which is not limited by the present disclosure. (An illustrative identifier-based lookup is sketched after Table 1.)
Table 1 operator identification example
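Since the concrete identifiers of Table 1 are not reproduced here, the sketch below uses purely hypothetical identifier strings to illustrate how the parameter server and working nodes might look up operators by identifier; none of these names come from the patent.

```python
OPERATOR_REGISTRY = {}

def register_operator(operator_id):
    # Associate an operator implementation with its (hypothetical) identifier.
    def decorator(fn):
        OPERATOR_REGISTRY[operator_id] = fn
        return fn
    return decorator

@register_operator("W_UPDATE")  # hypothetical identifier for the weight update operator
def weight_update(weight, grads, lr=0.01):
    return weight - lr * sum(grads) / len(grads)

def invoke(operator_id, *args, **kwargs):
    # The parameter server and the target working nodes call operators by
    # identifier instead of hard-coding them.
    return OPERATOR_REGISTRY[operator_id](*args, **kwargs)

# Example: invoke("W_UPDATE", 1.0, [0.2, 0.4]) returns 0.997
```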
It should be understood that the above-described apparatus embodiments are merely illustrative and that the apparatus of the present disclosure may be implemented in other ways. For example, the division of the units/modules in the above embodiments is merely a logic function division, and there may be another division manner in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be omitted or not performed.
In addition, each functional unit/module in the embodiments of the present disclosure may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together, unless otherwise specified. The integrated units/modules described above may be implemented either in hardware or in software program modules.
The integrated units/modules, if implemented in hardware, may be digital circuits, analog circuits, etc. Physical implementations of hardware structures include, but are not limited to, transistors, memristors, and the like. The artificial intelligence processor may be any suitable hardware processor, such as CPU, GPU, FPGA, DSP and ASIC, etc., unless otherwise specified. The Memory unit may be any suitable magnetic or magneto-optical storage medium, such as resistive Random Access Memory RRAM (Resistive Random Access Memory), dynamic Random Access Memory DRAM (Dynamic Random Access Memory), static Random Access Memory SRAM (Static Random-Access Memory), enhanced dynamic Random Access Memory EDRAM (Enhanced Dynamic Random Access Memory), high-Bandwidth Memory HBM (High-Bandwidth Memory), hybrid Memory cube HMC (Hybrid Memory Cube), etc., unless otherwise indicated.
The integrated units/modules may be stored in a computer readable memory if implemented in the form of software program modules and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present disclosure may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a memory, comprising several instructions for causing a computer device (which may be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the method described in the various embodiments of the present disclosure. And the aforementioned memory includes: a U-disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a removable hard disk, a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Fig. 5 shows a flow chart of a data processing method according to an embodiment of the present disclosure. As shown in Fig. 5, the method is used for a data processing apparatus for training a neural network, the neural network including a plurality of neural network layers and the data processing apparatus including a parameter server and a plurality of working nodes, and the method includes steps S11 to S15.
In step S11, the parameter server is controlled to receive the weight gradients of the current neural network layer sent by each working node, and update the weight of the current neural network layer according to a plurality of weight gradients and weight update operators to obtain updated weight.
In step S12, the parameter server is controlled to perform quantization parameter calculation according to the weight parameter operator and the updated weight, so as to obtain a corresponding weight quantization parameter.
In step S13, the parameter server is controlled to quantize the updated weight according to the determined weight quantization parameter and the weight quantization operator, so as to obtain a quantized weight.
In step S14, the parameter server is controlled to perform data layout processing on the quantized weights by using a forward layout operator and an inverse layout operator, respectively, so as to obtain a forward quantized weight and an inverse quantized weight.
In step S15, the parameter server is controlled to broadcast the weight quantization parameter, the forward quantized weight and the inverse quantized weight to the plurality of working nodes, so that a target working node can perform a corresponding data processing operation according to the received data, where the target working node is any one of the plurality of working nodes.
In one possible implementation, the method may further include:
controlling the target working node to perform convolution forward operation according to the received weight quantization parameter, the forward quantized weight, quantized input data, the input data quantization parameter and a forward convolution network operator, so as to obtain output data corresponding to the current neural network layer.
In one possible implementation, the method may further include:
when the current neural network layer is the last layer of the neural network, controlling the target working node to determine the error of the output data according to preset target output data, so that the device judges whether the training end condition is met according to the error.
In one possible implementation, the method may further include:
controlling the target working node to perform quantization parameter calculation according to the input data and the input data parameter operator to obtain an input data quantization parameter;
and controlling the target working node to perform quantization processing on the input data according to the determined input data quantization parameter and the input data quantization operator to obtain quantized input data.
In one possible implementation, the method may further include:
controlling the target working node to perform an error operation on the output data to obtain an output data gradient of the output data.
In one possible implementation, the method may further include:
controlling the target working node to perform quantization parameter calculation according to the output data gradient and the output data gradient parameter operator to obtain an output data gradient quantization parameter;
and controlling the target working node to perform quantization processing on the output data gradient according to the determined output data gradient quantization parameter and the output data gradient quantization operator to obtain a quantized output data gradient.
In one possible implementation, the method may further include:
controlling the target working node to perform a weight gradient operation according to the quantized output data gradient, the output data gradient quantization parameter, the quantized input data, the input data quantization parameter and a weight gradient operator to obtain a new weight gradient corresponding to the current neural network layer, and to send the new weight gradient to the parameter server.
In one possible implementation, the method may further include:
controlling the target working node to perform convolution inverse operation according to the quantized output data gradient, the output data quantization parameter, the inverse quantized weight, the weight quantization parameter and an inverse convolution network operator to obtain the input data gradient corresponding to the current neural network layer.
In one possible implementation, the method may further include:
when it is determined that the current neural network layer is an intermediate layer and/or the first layer of the neural network, controlling the target working node to take the output data and/or the input data gradient as the input data of the next neural network layer.
The data processing method provided by the present disclosure can train a neural network. When quantization-related processing is performed on input data, output data, weights and weight gradients during training, the parameter server is used to update the weights, calculate the weight quantization parameters, quantize the weights, and perform the data layout of the quantized weights; the weight quantization parameters, the forward quantized weights and the inverse quantized weights are then broadcast to the plurality of working nodes, and the other processing is implemented by the working nodes. This reduces the number of weight quantization and data layout computations performed by the device during processing, lowers the calculation overhead, transmission bandwidth, memory access amount and communication amount, and reduces the energy consumption of the device.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present disclosure is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present disclosure. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the acts and modules referred to are not necessarily required by the present disclosure.
It should be further noted that, although the steps in the flowchart of Fig. 5 are shown sequentially as indicated by the arrows, the steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and the steps may be performed in other orders. Moreover, at least some of the steps in Fig. 5 may include multiple sub-steps or stages; these are not necessarily performed at the same moment but may be performed at different moments, and they are not necessarily performed sequentially, but may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps.
In one possible implementation, an artificial intelligence chip is also disclosed, which includes the above-described data processing apparatus.
In one possible implementation, a board is also disclosed, which includes a memory device, an interface device, and a control device, and the artificial intelligence chip described above; wherein the artificial intelligent chip is respectively connected with the storage device, the control device and the interface device; the storage device is used for storing data; the interface device is used for realizing data transmission between the artificial intelligent chip and external equipment; the control device is used for monitoring the state of the artificial intelligent chip.
Fig. 6 shows a block diagram of a board according to an embodiment of the present disclosure, and referring to fig. 6, the board may include other mating components in addition to the chip 389, including but not limited to: a memory device 390, an interface device 391 and a control device 392;
the memory device 390 is connected to the artificial intelligence chip through a bus for storing data. The memory device may include multiple sets of memory cells 393. Each group of storage units is connected with the artificial intelligent chip through a bus. It is understood that each set of memory cells may be DDR SDRAM (English: double Data Rate SDRAM, double Rate synchronous dynamic random Access memory).
DDR can double the speed of SDRAM without increasing the clock frequency: DDR allows data to be read out on both the rising and falling edges of the clock pulse, so DDR is twice as fast as standard SDRAM. In one embodiment, the memory device may include 4 groups of the storage units, and each group of the storage units may include a plurality of DDR4 particles (chips). In one embodiment, the artificial intelligence chip may include 4 72-bit DDR4 controllers, where 64 bits of each 72-bit DDR4 controller are used to transfer data and 8 bits are used for ECC checking. It is understood that the theoretical bandwidth of data transfer can reach 25600 MB/s when DDR4-3200 particles are employed in each group of storage units.
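As a quick check of the quoted figure (assuming the 64 data bits of the 72-bit controller carry the payload):

$$ 3200\ \text{MT/s} \times \frac{64\ \text{bit}}{8\ \text{bit/byte}} = 25600\ \text{MB/s} $$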
In one embodiment, each set of memory cells includes a plurality of double rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. And a controller for controlling DDR is arranged in the chip and is used for controlling data transmission and data storage of each storage unit.
The interface device is electrically connected to the artificial intelligence chip. The interface device is used for realizing data transmission between the artificial intelligence chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface, and the data to be processed are transferred from the server to the chip through the standard PCIE interface to implement data transfer. Preferably, when a PCIE 3.0 x16 interface is adopted for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may be another interface, and the present disclosure does not limit the specific form of the other interface, as long as the interface unit can implement the switching function. In addition, the results of the computation of the artificial intelligence chip are still transmitted back to the external device (e.g. a server) by the interface device.
The control device is electrically connected to the artificial intelligence chip. The control device is used for monitoring the state of the artificial intelligence chip. Specifically, the artificial intelligence chip and the control device may be electrically connected through an SPI interface. The control device may comprise a single-chip microcomputer (Micro Controller Unit, MCU). The artificial intelligence chip may comprise a plurality of processing chips, a plurality of processing cores or a plurality of processing circuits, and may drive a plurality of loads; therefore, the artificial intelligence chip can be in different working states such as multi-load and light-load. The control device can regulate the working states of the plurality of processing chips, the plurality of processing cores and/or the plurality of processing circuits in the artificial intelligence chip.
In one possible implementation, an electronic device is disclosed that includes the artificial intelligence chip described above. The electronic device includes a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, an intelligent terminal, a cell phone, a vehicle recorder, a navigator, a sensor, a camera, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device. The vehicle comprises an aircraft, a ship and/or a vehicle; the household appliances comprise televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas cookers and range hoods; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasonic apparatus, and/or an electrocardiograph apparatus.
The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. The computer readable storage medium may be a non-volatile computer readable storage medium.
The embodiment of the disclosure also provides an electronic device, which comprises: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.
Fig. 7 illustrates a block diagram of an electronic device 800, according to an embodiment of the disclosure. For example, electronic device 800 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like, for use as a working node.
Referring to fig. 7, an electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen between the electronic device 800 and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 814 includes one or more sensors for providing status assessment of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect an on/off state of the electronic device 800, a relative positioning of the components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in position of the electronic device 800 or a component of the electronic device 800, the presence or absence of a user's contact with the electronic device 800, an orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communication between the electronic device 800 and other devices, either wired or wireless. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi,2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 804 including computer program instructions executable by processor 820 of electronic device 800 to perform the above-described methods.
Fig. 8 illustrates a block diagram of an electronic device 1900 according to an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server for serving as a parameter server. Referring to fig. 8, electronic device 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments. The technical features of the foregoing embodiments may be arbitrarily combined, and for brevity, all of the possible combinations of the technical features of the foregoing embodiments are not described, however, all of the combinations of the technical features should be considered as being within the scope of the disclosure.
The embodiments of the present disclosure have been described above in detail, and specific examples have been applied herein to explain the principles and embodiments of the present disclosure; the description of the above examples is only intended to help understand the method of the present disclosure and its core ideas. Meanwhile, those skilled in the art may make modifications or variations on the basis of the specific embodiments and the application scope according to the ideas of the present disclosure, and such modifications or variations are within the scope of protection of the present disclosure. In view of the foregoing, this description should not be construed as limiting the disclosure.

Claims (15)

1. A data processing apparatus for training a neural network, the neural network comprising a plurality of neural network layers, the apparatus comprising a parameter server and a plurality of working nodes, wherein the parameter server is configured to
Receiving weight gradients corresponding to the current neural network layer sent by each working node, and updating the weight of the current neural network layer according to a plurality of weight gradients and weight updating operators to obtain updated weight;
carrying out quantization parameter calculation according to the weight parameter operator and the updated weight to obtain a corresponding weight quantization parameter;
carrying out quantization processing on the updated weight according to the determined weight quantization parameter and the weight quantization operator to obtain a quantized weight;
respectively carrying out data layout processing on the quantized weights by using a forward layout operator and an inverse layout operator to obtain forward quantized weights and inverse quantized weights;
broadcasting the weight quantization parameter, the forward quantized weight and the inverse quantized weight to the plurality of working nodes so that a target working node can perform corresponding data processing operation according to received data, wherein the target working node is any one of the plurality of working nodes.
2. The apparatus of claim 1, wherein
the target working node is used for carrying out convolution forward operation according to the received weight quantization parameter, the forward quantized weight, quantized input data, the input data quantization parameter and a forward convolution network operator to obtain output data corresponding to the current neural network layer.
3. The apparatus of claim 2, wherein
and the target working node is further configured to determine an error of the output data according to preset target output data when the current neural network layer is the last layer of the neural network, so that the device determines whether the training end condition is met according to the error.
4. The apparatus of claim 1, wherein
the target working node is used for carrying out quantization parameter calculation according to input data and an input data parameter operator to obtain input data quantization parameters;
and carrying out quantization processing on the input data according to the determined input data quantization parameters and the input data quantization operator to obtain quantized input data.
5. The apparatus of claim 2, wherein
and the target working node is used for carrying out error operation on the output data to obtain an output data gradient of the output data.
6. The apparatus of claim 5, wherein
the target working node is further used for carrying out quantization parameter calculation according to the output data gradient and the output data gradient parameter operator to obtain an output data gradient quantization parameter;
And carrying out quantization processing on the output data gradient according to the determined output data gradient quantization parameter and the output data gradient quantization operator to obtain a quantized output data gradient.
7. The apparatus of claim 6, wherein
the target working node is further configured to perform a weight gradient operation according to the quantized output data gradient, the output data gradient quantization parameter, the quantized input data, the input data quantization parameter and a weight gradient operator, to obtain a new weight gradient corresponding to the current neural network layer, and to send the new weight gradient to the parameter server.
8. The apparatus of claim 6, wherein
the target working node is further configured to perform an inverse convolution operation according to the quantized output data gradient, the output data gradient quantization parameter, the inverse quantized weight, the weight quantization parameter and an inverse convolution network operator, to obtain an input data gradient corresponding to the current neural network layer.
9. The apparatus of claim 2 or 8, wherein
the target working node is further configured to, when it is determined that the current neural network layer is an intermediate layer and/or the first layer of the neural network, use the output data and/or the input data gradient as input data of a next neural network layer.
10. A data processing method, applied to a data processing apparatus for training a neural network, the neural network comprising a plurality of neural network layers, the data processing apparatus comprising a parameter server and a plurality of working nodes, the method comprising:
controlling the parameter server to receive weight gradients of a current neural network layer sent by each working node, and to update a weight of the current neural network layer according to the plurality of weight gradients and a weight update operator to obtain an updated weight;
controlling the parameter server to perform quantization parameter calculation according to a weight parameter operator and the updated weight to obtain a corresponding weight quantization parameter;
controlling the parameter server to perform quantization processing on the updated weight according to the determined weight quantization parameter and a weight quantization operator to obtain a quantized weight;
controlling the parameter server to perform data layout processing on the quantized weight using a forward layout operator and an inverse layout operator, respectively, to obtain a forward quantized weight and an inverse quantized weight; and
controlling the parameter server to broadcast the weight quantization parameter, the forward quantized weight and the inverse quantized weight to the plurality of working nodes so that a target working node performs a corresponding data processing operation according to received data, wherein the target working node is any one of the plurality of working nodes.
11. An artificial intelligence chip, wherein the chip comprises the data processing apparatus according to any one of claims 1 to 9.
12. An electronic device comprising the artificial intelligence chip of claim 11.
13. A board, wherein the board comprises: a storage device, an interface device, a control device, and the artificial intelligence chip according to claim 11;
wherein the artificial intelligence chip is connected to the storage device, the control device and the interface device, respectively;
the storage device is configured to store data;
the interface device is configured to implement data transmission between the artificial intelligence chip and an external device;
the control device is configured to monitor a state of the artificial intelligence chip,
wherein the storage device comprises a plurality of groups of storage units, each group of storage units is connected to the artificial intelligence chip through a bus, and the storage units are DDR SDRAM;
the chip comprises a DDR controller configured to control data transmission to and data storage of each storage unit;
and the interface device is a standard PCIE interface.
14. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method of claim 10.
15. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of claim 10.
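As a concrete illustration of the parameter-server-side processing recited in claims 1 and 10 above, the Python sketch below walks through one possible per-layer training step: updating the weight from the collected gradients, computing a weight quantization parameter, quantizing the weight, laying it out for the forward and backward kernels, and broadcasting the results to the working nodes. All names (ParameterServer, step, receive), the symmetric per-tensor scale, the int8 format, the SGD-style update and the memory-order layouts are assumptions for illustration only, not the operators defined by the disclosure.

```python
# Illustrative sketch only; every identifier and numeric choice here is an assumption.
import numpy as np


class ParameterServer:
    """One possible parameter-server-side flow for a single neural network layer."""

    def __init__(self, weight, lr=0.01, bits=8):
        self.weight = weight            # full-precision weight of the current layer
        self.lr = lr                    # learning rate used by the assumed weight update operator
        self.qmax = 2 ** (bits - 1) - 1

    def update_weight(self, weight_gradients):
        # Weight update operator: a plain averaged-gradient SGD step is assumed here.
        avg_grad = np.mean(weight_gradients, axis=0)
        self.weight -= self.lr * avg_grad
        return self.weight

    def compute_weight_quant_param(self):
        # Weight parameter operator: a symmetric per-tensor scale is one common choice.
        return np.max(np.abs(self.weight)) / self.qmax

    def quantize_weight(self, scale):
        # Weight quantization operator: round-to-nearest fixed-point quantization to int8.
        q = np.round(self.weight / scale)
        return np.clip(q, -self.qmax - 1, self.qmax).astype(np.int8)

    def layout(self, q_weight):
        # Forward / inverse layout operators: the values are identical, only the
        # in-memory arrangement differs (row-major for the forward kernel,
        # column-major for the backward kernel, as an assumed example).
        forward_w = np.ascontiguousarray(q_weight)
        inverse_w = np.asfortranarray(q_weight)
        return forward_w, inverse_w

    def step(self, weight_gradients, workers):
        # One training step for the current layer: update, quantize, lay out, broadcast.
        self.update_weight(weight_gradients)
        scale = self.compute_weight_quant_param()
        q_weight = self.quantize_weight(scale)
        forward_w, inverse_w = self.layout(q_weight)
        for worker in workers:
            worker.receive(scale, forward_w, inverse_w)
```

With this split, only the weight quantization parameter and the two laid-out copies of the quantized weight travel to the working nodes; the quantization and layout work itself is performed once per step on the parameter server rather than once per node.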
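A matching sketch of the working-node side described in claims 2 to 8 above: each node quantizes its input data, runs a quantized forward operation with the broadcast forward weight, quantizes the output-data gradient during backpropagation, and derives both the new weight gradient (returned to the parameter server) and the input-data gradient (passed to the adjacent layer). A fully connected product stands in for the convolution operators, and the quantization helper mirrors the assumptions made in the server sketch; none of these choices are prescribed by the disclosure.

```python
# Illustrative sketch only; a dense matmul stands in for the claimed convolution operators.
import numpy as np


class WorkingNode:
    """One possible working-node-side flow for a single layer."""

    def __init__(self, bits=8):
        self.qmax = 2 ** (bits - 1) - 1
        self.scale_w = self.forward_w = self.inverse_w = None

    def receive(self, scale_w, forward_w, inverse_w):
        # Store what the parameter server broadcast for this layer.
        self.scale_w, self.forward_w, self.inverse_w = scale_w, forward_w, inverse_w

    def _quantize(self, x):
        # Stand-in for the input-data / gradient parameter and quantization operators:
        # symmetric per-tensor scale plus round-to-nearest, as assumed on the server.
        scale = np.max(np.abs(x)) / self.qmax + 1e-12
        q = np.clip(np.round(x / scale), -self.qmax - 1, self.qmax).astype(np.int8)
        return q, scale

    def forward(self, x):
        # Quantize the input data, then run the forward operation with the broadcast
        # forward weight (shape (out_features, in_features) assumed).
        q_x, scale_x = self._quantize(x)
        y = (q_x.astype(np.int32) @ self.forward_w.astype(np.int32).T) * (scale_x * self.scale_w)
        self._q_x, self._scale_x = q_x, scale_x   # cached for the weight-gradient step
        return y

    def backward(self, grad_y):
        # Quantize the output-data gradient, then compute the new weight gradient and
        # the input-data gradient entirely from quantized operands.
        q_gy, scale_gy = self._quantize(grad_y)
        weight_grad = (q_gy.astype(np.int32).T @ self._q_x.astype(np.int32)) * (scale_gy * self._scale_x)
        input_grad = (q_gy.astype(np.int32) @ self.inverse_w.astype(np.int32)) * (scale_gy * self.scale_w)
        # weight_grad is what this node sends back to the parameter server;
        # input_grad is handed to the adjacent layer as its output-data gradient.
        return weight_grad, input_grad
```

Under this division of work the weight is quantized and laid out exactly once per training step on the parameter server, while every working node only consumes the broadcast results, which matches the reduction in repeated quantization and layout computation that the claims aim at.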
CN202010111742.7A 2020-02-24 2020-02-24 Data processing method, device, computer equipment and storage medium Active CN113298223B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010111742.7A CN113298223B (en) 2020-02-24 2020-02-24 Data processing method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010111742.7A CN113298223B (en) 2020-02-24 2020-02-24 Data processing method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113298223A CN113298223A (en) 2021-08-24
CN113298223B true CN113298223B (en) 2023-12-26

Family

ID=77317837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010111742.7A Active CN113298223B (en) 2020-02-24 2020-02-24 Data processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113298223B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11574164B2 (en) * 2017-03-20 2023-02-07 International Business Machines Corporation Neural network cooperation
US11630994B2 (en) * 2018-02-17 2023-04-18 Advanced Micro Devices, Inc. Optimized asynchronous training of neural networks using a distributed parameter server with eager updates
US10832139B2 (en) * 2018-06-22 2020-11-10 Moffett Technologies Co. Limited Neural network acceleration and embedding compression systems and methods with activation sparsification

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463324A (en) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 Convolution neural network parallel processing method based on large-scale high-performance cluster
CN108335228A (en) * 2017-12-26 2018-07-27 南京海兴电网技术有限公司 A kind of workload equilibrium worksheet processing method in the power distribution network based on improved BP
CN108491928A (en) * 2018-03-29 2018-09-04 腾讯科技(深圳)有限公司 Model parameter training method, device, server and storage medium
CN110489567A (en) * 2019-08-26 2019-11-22 重庆邮电大学 A kind of node information acquisition method and its device based on across a network Feature Mapping

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Evolutionary Neural Network Learning Methods and Their Applications; Peng Zhenming, An Hongwei, Zhang Shuqin, Long Yufeng, Zhang Xingyan; Oil Geophysical Prospecting (02); full text *

Also Published As

Publication number Publication date
CN113298223A (en) 2021-08-24

Similar Documents

Publication Publication Date Title
CN110889503B (en) Data processing method, data processing device, computer equipment and storage medium
CN108510987B (en) Voice processing method and device
US20210133563A1 (en) Method and apparatus for training neural network, and storage medium
CN109065989B (en) Charging method and charging device
CN111443917B (en) Neural network operation optimization method and device and related products
CN111553464B (en) Image processing method and device based on super network and intelligent equipment
CN112416352A (en) Data processing method, data processing device, computer equipment and storage medium
CN108804684B (en) Data processing method and device
WO2021114903A1 (en) Data processing method and apparatus, computer device, and storage medium
CN113298223B (en) Data processing method, device, computer equipment and storage medium
CN113297128B (en) Data processing method, device, computer equipment and storage medium
CN110489177B (en) Application control method and device, storage medium and terminal equipment
WO2021114904A1 (en) Data processing method and apparatus, computer device and storage medium
CN113033761B (en) Data processing method, device, computer equipment and storage medium
US20210117199A1 (en) Method, device and storage medium for processing overhead of memory access
CN111783969A (en) Data processing method, data processing device, computer equipment and storage medium
CN112367428A (en) Electric quantity display method and system, storage medium and mobile terminal
EP3786852A1 (en) Method for subnetwork sampling, and method and device for building a hypernetwork topology
CN115173495A (en) Charging control method, charging control device and storage medium
CN112765541B (en) Data processing method, device, computer equipment and storage medium
CN113762518A (en) Data processing method, data processing device, computer equipment and storage medium
WO2021082654A1 (en) Data processing method and apparatus, and computer device and storage medium
CN113762488B (en) Processor, data processing method, computer device, and storage medium
WO2021083097A1 (en) Data processing method and apparatus, and computer device and storage medium
CN115086232B (en) Task processing and data stream generating method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant