CN110555522B - Data processing method, data processing device, computer equipment and storage medium - Google Patents


Info

Publication number
CN110555522B
CN110555522B (application CN201910899897.9A)
Authority
CN
China
Prior art keywords
data
information
node
dynamic
tag information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910899897.9A
Other languages
Chinese (zh)
Other versions
CN110555522A (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd filed Critical Cambricon Technologies Corp Ltd
Priority to CN201910899897.9A priority Critical patent/CN110555522B/en
Publication of CN110555522A publication Critical patent/CN110555522A/en
Application granted granted Critical
Publication of CN110555522B publication Critical patent/CN110555522B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a data processing method, apparatus, computer device, and storage medium. The disclosed board includes: a memory device, an interface device, a control device, and a data processing device, where the data processing device is connected to the memory device, the control device, and the interface device respectively; the memory device stores data; the interface device transmits data between the data processing device and external equipment; and the control device monitors the state of the data processing device. According to the data processing method, apparatus, computer device, and storage medium, data in the neural network is marked with tag information, which simplifies processing such as data storage, reduces the occupation of hardware resources, and increases the operation speed of the neural network.

Description

Data processing method, data processing device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus, a computer device, and a storage medium.
Background
With the development of computer technology, neural networks have also advanced significantly, and neural network operations can be performed by special-purpose or general-purpose processors. In the related art, the operation speed of a neural network is greatly limited by factors such as the many data types involved, the large amount of computation, and hardware constraints.
Disclosure of Invention
In view of the above, it is necessary to provide a data processing method, an apparatus, a computer device, and a storage medium that solve the above technical problems.
According to an aspect of the present disclosure, there is provided a data processing method applied to a processor, the method including:
acquiring first data, wherein the first data is used for carrying out neural network operation;
determining tag information of the first data;
storing the first data into a data storage space according to the tag information,
wherein the tag information includes static tag information, and the static tag information is used for representing information related to the participation of the first data in the neural network operation.
According to another aspect of the present disclosure, there is provided a data processing apparatus for a processor, the apparatus comprising:
the data acquisition module is used for acquiring first data, and the first data is used for carrying out neural network operation;
the tag information determining module is used for determining tag information of the first data;
a data storage module for storing the first data into a data storage space according to the tag information,
wherein the tag information includes static tag information, and the static tag information is used for representing information related to the participation of the first data in the neural network operation.
According to another aspect of the present disclosure, there is provided a neural network arithmetic device including the above data processing apparatus.
According to another aspect of the present disclosure, an electronic device is provided, which includes the above data processing apparatus.
According to another aspect of the present disclosure, a board card is provided, which includes the above data processing apparatus.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described data processing method.
In some embodiments, the electronic device comprises a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a driving recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, a headset, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
In some embodiments, the vehicle comprises an aircraft, a ship, and/or a car; the household appliance comprises a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and/or a range hood; the medical device comprises a nuclear magnetic resonance apparatus, a B-mode ultrasound machine, and/or an electrocardiograph.
The embodiments of the present disclosure provide a data processing method, apparatus, computer device, and storage medium in which data in a neural network is marked with tag information that is determined automatically. This simplifies data processing such as storage and operation, provides a more user-friendly API, improves software performance, reduces the occupation of hardware resources, and increases the operation speed of the neural network.
The technical features recited in the claims can achieve the beneficial effects corresponding to the technical problems described in the background. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a schematic diagram of a processor of a data processing method according to an embodiment of the present disclosure.
Fig. 2a shows a flow diagram of a data processing method according to an embodiment of the present disclosure.
Fig. 2b shows a flowchart of determining a data category in a data processing method according to an embodiment of the present disclosure.
Fig. 2c is a schematic diagram illustrating determining a data category in a data processing method according to an embodiment of the present disclosure.
Fig. 2d shows a schematic diagram of a data category generation module in a data processing method according to an embodiment of the present disclosure.
Fig. 2e is a schematic diagram of a dynamic tag information generation module in the data processing method according to an embodiment of the disclosure.
Fig. 3a shows a schematic diagram of an association list in a data processing method according to an embodiment of the present disclosure.
Fig. 3b shows a schematic diagram of a computation graph in a data processing method according to an embodiment of the present disclosure.
Fig. 4a shows a schematic diagram of data conversion in a data processing method according to an embodiment of the present disclosure.
Fig. 4b shows a schematic diagram of an apparatus designed for neural network operations in a data processing method according to an embodiment of the present disclosure.
Fig. 4c shows a schematic diagram of the use of the neural network arithmetic device in the data processing method according to the embodiment of the present disclosure.
Fig. 5 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure.
Fig. 6 shows a block diagram of a board card according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be understood that the terms "first," "second," and the like in the claims, the description, and the drawings of the present disclosure are used for distinguishing between different objects and not for describing a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon", "in response to a determination", or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
In the related art, research on neural network accelerators has achieved remarkable results and provides powerful hardware support for many deep learning algorithms. To improve the performance of a neural network accelerator, algorithm optimization and data layout (including data-related processing such as storage, deformation, and transfer of data) in a Neural Network Development Kit (NDK) are indispensable. The rich data types in neural network algorithms result in a variety of data layout information. How to add complex data layout information to the NDK so as to guide the software in all respects, keep the layout transparent to the user, provide a user-friendly API, improve the performance of the software, and increase the speed of neural network operation is a problem to be solved urgently. The embodiments of the present disclosure provide a data processing method, apparatus, computer device, and storage medium in which data in a neural network is marked with tag information that is determined automatically, which simplifies data processing such as storage and operation, provides a more user-friendly API, improves software performance, reduces the occupation of hardware resources, and increases the operation speed of the neural network.
The data processing method according to the embodiments of the present disclosure may be applied to a processor. The processor may include a general-purpose processor, such as a central processing unit (CPU); the general-purpose processor may perform operations such as preprocessing on the data and instructions received by the electronic device, and may also implement functions such as instruction compiling. The processor may also include an artificial intelligence processor (IPU) that performs artificial intelligence operations. The artificial intelligence operations may include machine learning operations, brain-like operations, and the like; machine learning operations include neural network operations, k-means operations, support vector machine operations, and so on. The artificial intelligence processor may include, for example, one or a combination of a GPU (graphics processing unit), an NPU (neural-network processing unit), a DSP (digital signal processor), and a field-programmable gate array (FPGA) chip. The present disclosure is not limited to a particular type of artificial intelligence processor.
In one possible implementation, the artificial intelligence processor referred to in this disclosure may include a plurality of processing units, each of which may independently run various tasks assigned thereto, such as: a convolution operation task, a pooling task, a full connection task, or the like. The present disclosure is not limited to processing units and tasks executed by processing units. The artificial intelligence processor can execute machine learning operation according to the compiled instruction transmitted by the general processor, for example, the artificial intelligence processor can execute operations such as neural network reasoning or training according to the compiled instruction so as to realize image recognition, voice recognition and the like.
Of course, in other possible embodiments, the artificial intelligence processor may also implement the above-described instruction compiling function. Further optionally, the electronic device of an embodiment of the present disclosure may include the processor described above, i.e., the electronic device may include the general-purpose processor and the artificial intelligence processor described above.
Fig. 1 shows a schematic diagram of a processor of a data processing method according to an embodiment of the present disclosure. As shown in fig. 1, the processor 100 includes a plurality of processing units 101 and a storage unit 102. The processing units 101 are used for executing instruction sequences, and the storage unit 102 is used for storing data; the storage unit 102 may include a random access memory (RAM) and a register file. The processing units 101 in the processor 100 may share part of the storage space, for example a portion of the RAM and the register file, while also having their own separate storage spaces.
Fig. 2a shows a flow diagram of a data processing method according to an embodiment of the present disclosure. As shown in fig. 2a, the method is applied to a processor, and includes step S10, step S11, and step S12.
In step S10, first data for performing neural network operations is acquired.
In step S11, tag information of the first data is determined. The label information comprises static label information, and the static label information is used for representing information related to the participation of the first data in the neural network operation.
In this embodiment, the static tag information may include information such as a data type, a dimension, and a dimension value describing the nature of the first data itself, and further include information related to a neural network operation involved based on the first data. The static tag information may be determined after the neural network is established, and the static tag information of the first data may be applicable to any processor operating the neural network, i.e., the static tag information of the first data is not changed in different processors. The static tag information of the same first data may be different in different neural networks. Alternatively, the static tag information of the first data may be determined by automatic detection of the processor during the process of acquiring the first data (the process of inputting the first data by the user), or may be determined according to the information input by the user, which is not limited by the present disclosure.
In step S12, the first data is stored in the data storage space according to the tag information.
In this embodiment, the processor may determine the size of the data storage space required for storing the first data according to the tag information, apply for the required data storage space, and then store the first data in the applied data storage space. Specifically, the processor may directly apply for a data storage space according to the static tag information, and store the first data. For example, the data volume of the first data is determined according to the static tag information of the first data, and then a data storage space for storing the first data is applied according to the determined data volume, so as to store the first data.
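As a concrete illustration of this sizing step, the following Python sketch computes the data volume implied by static tag information and could back a storage request; the function name and dtype table are assumptions for illustration, not part of the disclosure:

```python
# Hypothetical sketch: deriving the required storage size from the
# static tag information (static data type + dimension values).
from math import prod

DTYPE_BYTES = {"Float32": 4, "Float16": 2, "Int8": 1}  # assumed type table

def required_bytes(static_type: str, dim_values: list) -> int:
    """Data volume implied by the static tag: product of the dimension
    values times the element size of the static data type."""
    return prod(dim_values) * DTYPE_BYTES[static_type]

# A 10 x 4 matrix of 32-bit floats needs 160 bytes of storage space.
print(required_bytes("Float32", [10, 4]))  # → 160
```

The processor would then request a data storage space of at least this size before storing the first data.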
Optionally, the static tag information may include at least one of: a data category, a static data type, a static data dimension order, and a dimension value corresponding to each static data dimension.
In this implementation, the data category represents what kind of data the first data is in the neural network, and is determined based on information such as whether the data is visible to the user and the operations it participates in. The static data type represents the type and bit width of the first data; for example, the static data type may be a 32-bit floating point number. The static data dimensions may be one-dimensional, two-dimensional, multi-dimensional, and so on, and the static data dimension order may represent the dimension order in which the first data is stored and/or read. The dimension value of each static data dimension represents the length or size of that dimension. For example, if the first data is a matrix, the static data dimensions include rows and columns, the static data dimension order is row-first, the dimension value of the rows is 10, and the dimension value of the columns is 4. If the first data is three-dimensional, the static data dimensions include a first dimension, a second dimension, and a third dimension; the static data dimension order is third dimension > second dimension > first dimension; and the dimension values of the first, second, and third dimensions are 10, 4, and 8 respectively.
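One possible way to model these four static-tag fields in code is a small record type; all names here are illustrative assumptions, not from the disclosure:

```python
# Illustrative model of the static tag fields: data category, static
# data type, static data dimension order, and per-dimension values.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class StaticTag:
    category: str                 # e.g. "IW" for input weight
    data_type: str                # e.g. "Float32"
    dim_order: Tuple[str, ...]    # storage/read order, e.g. ("H", "W")
    dim_values: Tuple[int, ...]   # length of each dimension, in order

# The row-first 10 x 4 matrix example from the text:
matrix_tag = StaticTag("IW", "Float32", ("H", "W"), (10, 4))
assert len(matrix_tag.dim_order) == len(matrix_tag.dim_values)
```

Keeping the dimension order and dimension values as parallel tuples makes the "order" part of the tag explicit rather than implied by field names.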
In a possible implementation manner, the obtaining of the tag information of the first data may include: and determining static label information of the first data according to the neural network corresponding to the first data and/or the acquired input information of the first data.
In this implementation, the input information may be determined from user input, or by the processor detecting the first data when it is obtained, and may describe the dimensions, size, data type, and other data states of the first data when it is input into the processor. Static tag information such as the static data type, the static data dimensions, the static data dimension order, and the dimension value of each static data dimension may be determined from the input information. For example, when the processor receives certain first data input by the user, it may determine the static tag information from input information supplied by the user, or it may detect the first data upon receiving it to determine the input information. The processor may determine the data category of the first data according to the neural network to which the first data corresponds.
In a possible implementation manner, determining the static tag information of the first data according to the neural network corresponding to the first data and/or the acquired input information of the first data may include: determining the data category of the first data according to the out-degree and in-degree of the first data in the neural network and the operations in the neural network in which the first data participates.
In this implementation, the in-degree represents the number of preceding operation nodes connected to the first data as a data node (the first data is the output of those preceding operation nodes), and the out-degree represents the number of subsequent operation nodes connected to the first data as a data node (the first data is the input of those subsequent operation nodes). For example, if a first data cc is the output of 1 preceding operation node and the input of 3 subsequent operation nodes, then the out-degree and in-degree of cc are 3 and 1 respectively. Different codes may be set to distinguish different data categories. Table 1 below describes the characteristics and corresponding identifications of the different data categories.
Optionally, the data categories may include any of: Instruction, Input Neuron, Output Neuron, Hidden Neuron, Constant Neuron, Input Weight, Output Weight, Constant Weight, and Auxiliary data.
TABLE 1 data categories, corresponding identifications and data characteristics
The out-degree and in-degree of an instruction are both zero, and the instruction is used to trigger the neural network operation. For input neurons, constant neurons, input weights, constant weights, and auxiliary data, the out-degree may be greater than or equal to 1 and the in-degree may be 0. For output neurons and output weights, the out-degree may be 0 and the in-degree may be greater than or equal to 1. For hidden neurons, both the out-degree and the in-degree may be greater than or equal to 1.
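Under the degree definitions above, a data node's in-degree and out-degree can be counted from the inputs and outputs of the operation nodes. A hypothetical sketch (function and variable names are assumptions):

```python
# Count a data node's degrees from operation-node connectivity:
# in-degree  = number of operation nodes that output this data,
# out-degree = number of operation nodes that take it as input.
def degrees(data_node, op_inputs, op_outputs):
    in_deg = sum(data_node in outs for outs in op_outputs.values())
    out_deg = sum(data_node in ins for ins in op_inputs.values())
    return in_deg, out_deg

# The first-data "cc" example from the text: output of 1 preceding
# operation node, input of 3 subsequent operation nodes.
op_outputs = {"op0": ["cc"]}
op_inputs = {"op1": ["cc"], "op2": ["cc"], "op3": ["cc"]}
print(degrees("cc", op_inputs, op_outputs))  # → (1, 3)
```

These degree counts are what the category rules above (instruction, input/output neuron, hidden neuron, and so on) are evaluated against.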
In one possible implementation, the static tag information of the first data may be represented as:
Static: classification, type1, DIM_A1…An, {x1…xn}
where Static is an identifier indicating that the tag information is static tag information, classification indicates the data category, and type1 indicates the static data type. DIM_A1…An indicates n static data dimensions whose order is A1…An; the dimension value of A1 is x1, …, and the dimension value of An is xn. In the present disclosure, "{ }" is used only to separate parameters within the static tag information and is not a required part of it; in practical applications it may be absent or replaced by other identifiers, and the present disclosure does not limit this.
It should be understood that, those skilled in the art can set the static tag information, the identification of the data category, and the location of each parameter in the static tag information according to actual needs, and the disclosure is not limited thereto.
For example, the static tag information of a certain first data is: Static: IW, Float32, DIM_HW, {10,4}, indicating that the first data is an input weight (data category), a 32-bit floating point number (static data type), two-dimensional and stored row-first (static data dimensions and static data dimension order), with a row dimension value of 10 and a column dimension value of 4 (dimension values).
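A small parser for tag strings in this format might look as follows; the exact string layout is inferred from the example above, and the helper name and returned field names are assumptions:

```python
def parse_static_tag(tag: str) -> dict:
    """Split a 'Static: ...' tag string into its named fields, per the
    format Static: classification, type1, DIM_A1...An, {x1...xn}."""
    head, body = tag.split(":", 1)
    assert head.strip() == "Static"
    category, data_type, dims, values = [p.strip() for p in body.split(",", 3)]
    dim_order = dims[len("DIM_"):] if dims.startswith("DIM_") else dims
    dim_values = [int(v) for v in values.strip("{} ").split(",")]
    return {"category": category, "type": data_type,
            "dims": list(dim_order), "values": dim_values}

print(parse_static_tag("Static: IW, Float32, DIM_HW, {10,4}"))
```

On the input-weight example this yields category "IW", type "Float32", dimensions ["H", "W"], and dimension values [10, 4].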
Fig. 2b shows a flowchart of determining a data category in a data processing method according to an embodiment of the present disclosure. In a possible implementation manner, as shown in fig. 2b, determining the static tag information of the first data according to the neural network corresponding to the first data and/or the acquired input information of the first data may include steps S31 to S34.
In step S31, a computation graph to be marked corresponding to the neural network is acquired. The computation graph to be marked includes non-category data nodes, operation nodes, and the connection relationships between the non-category data nodes and the operation nodes. Non-category data nodes may be used to represent data involved in the neural network, and operation nodes may be used to represent the operation processing performed on data in the neural network.
In step S32, the computation graph to be labeled is traversed to obtain information of all the non-category-data nodes and information of the operation nodes in the computation graph to be labeled.
The information of the non-category data node may include data (including the first data and the second data mentioned below) participating in the neural network operation, remaining static tag information of the data other than the data category, a previous operation node and a subsequent operation node connected to the data node. The information of the operation node may include parameters corresponding to the operation of the operation node, and an input data node and an output data node (i.e., the above-mentioned non-category data node) connected to the operation node.
In this implementation, the computation graph to be marked may be a graph structure that stores the neural network structure, or a data graph structure used only to store data; the present disclosure is not limited in this respect.
In this implementation, the processor may perform traversal from a certain non-category data node (e.g., an input neuron) of the computational graph to be marked, determine a previous operation node and a subsequent operation node of the non-category data node, then acquire a new non-category data node, and continue to determine the previous operation node and the subsequent operation node of the new non-category data node until all nodes (including the non-category data node and the operation node) are traversed.
In step S33, the information of the traversed non-category data node is sequentially stored in the data node queue, and the information of the traversed operation node is sequentially stored in the operation node queue.
Each time the processor traverses a non-category data node, it first checks whether the information of that node is already stored in the data node queue; if it is, the information is not stored again, and if it is not, the information is stored. Likewise, before storing a traversed operation node, the processor checks whether the information of that operation node is already in the operation node queue; if it is, the information is not stored again, and if it is not, it is stored. This saves storage space and speeds up the traversal.
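The traversal with its store-once checks can be sketched as follows; the node structure and names are assumptions for illustration:

```python
# Breadth-first scan of the computation graph that stores each data
# node's and operation node's information at most once, mirroring the
# duplicate check described in the text.
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    is_data: bool                              # data node vs. operation node
    links: list = field(default_factory=list)  # preceding + subsequent nodes

def scan(start: Node):
    data_queue, op_queue, seen = [], [], set()
    pending = deque([start])
    while pending:
        node = pending.popleft()
        if node.name in seen:                  # information already stored: skip
            continue
        seen.add(node.name)
        (data_queue if node.is_data else op_queue).append(node.name)
        pending.extend(node.links)             # continue to connected nodes
    return data_queue, op_queue

# x -> conv -> y: two data nodes, one operation node, no duplicates.
x, conv, y = Node("x", True), Node("conv", False), Node("y", True)
x.links, conv.links, y.links = [conv], [x, y], [conv]
print(scan(x))  # → (['x', 'y'], ['conv'])
```

The `seen` set plays the role of the "already stored" check, so a node reached from several neighbors is enqueued only once.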
In step S34, the data category of the first data in the corresponding non-category data node is determined based on the information of the non-category data node, and the non-category data node whose data category has been determined is deleted from the data node queue.
In this implementation, node scanning and data category determination finish when the data categories of all non-category data nodes in the computation graph to be marked have been determined and the data node queue is empty. Deleting non-category data nodes whose data categories have been determined from the data node queue saves the storage space occupied by the queue, avoids repeating the data category determination for the same node, and speeds up the processor's determination of all non-category data nodes in the computation graph to be marked.
In a possible implementation manner, determining static tag information of the first data according to a neural network corresponding to the first data and/or the acquired input information of the first data, may further include:
storing the determined data categories into the corresponding non-category data nodes in the computation graph to be marked to form data nodes with static tag information, thereby obtaining the marked computation graph.
In this implementation, the data category may be stored in the static tag information of the corresponding non-category data node. The non-category data node in the computation graph to be marked then becomes a data node with complete content, because its data category has been stored. Node scanning and data category determination finish when the data categories of all non-category data nodes in the computation graph to be marked have been determined and the data node queue is empty, yielding the marked computation graph, in which every data node has a determined data category. The marked computation graph can be used in data storage or data handling processes.
In one possible implementation, step S34 may include:
determining the in-degree of the non-category data node according to the number of preceding operation nodes connected to it;
determining the out-degree of the non-category data node according to the number of subsequent operation nodes connected to it;
determining the undetermined data categories of the first data in the non-category data node according to its preceding and subsequent operation nodes;
and determining the data category of the first data in the non-category data node from the undetermined data categories according to the node's in-degree and out-degree.
In this implementation, the one or more data categories of data that can participate in the operations of the preceding and subsequent operation nodes may be determined as the candidate data categories of the first data. After the candidate data categories are determined, the data category of the first data in the non-category data node can be determined by combining the in-degree and out-degree of the non-category data node. For example, if the operations corresponding to both the preceding and subsequent operation nodes of a non-category data node are convolution operations, the candidate data categories may include hidden neurons, which can serve as both convolution inputs and convolution outputs. If a non-category data node only has subsequent operation nodes and the operation of each subsequent node is a convolution operation, the corresponding candidate data categories may include input weight, input neuron, and constant neuron; if the out-degree of that non-category data node is 2, the data category of the first data in the node may be determined to be input neuron.
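As a rough sketch of this narrowing step (the category tables, names, and degree rule below are hypothetical illustrations of the convolution example; the disclosure does not fix a concrete implementation), the candidate categories could be intersected over the neighboring operations and then filtered by in-degree and out-degree:

```python
# Hypothetical category tables for the convolution example in the text:
# a datum consumed by a conv may be a weight, input/constant neuron, or hidden
# neuron; a datum produced by a conv is a hidden or output neuron.
CONSUMES = {"conv": {"input weight", "input neuron", "constant neuron", "hidden neuron"}}
PRODUCES = {"conv": {"hidden neuron", "output neuron"}}

def candidate_categories(prev_ops, next_ops):
    """Intersect what the preceding ops can produce with what the
    subsequent ops can consume."""
    cats = None
    for op in prev_ops:
        cats = PRODUCES[op] if cats is None else cats & PRODUCES[op]
    for op in next_ops:
        cats = CONSUMES[op] if cats is None else cats & CONSUMES[op]
    return cats or set()

def resolve_category(prev_ops, next_ops):
    """Narrow the candidates using in-degree / out-degree, as in step S34.
    Returns a single category, or None if still ambiguous."""
    in_deg, out_deg = len(prev_ops), len(next_ops)
    cats = set(candidate_categories(prev_ops, next_ops))
    if in_deg == 0:
        # Nothing produces this datum, so it cannot be a hidden/output neuron.
        cats -= {"hidden neuron", "output neuron"}
        if out_deg == 2:
            # Degree rule from the text's example: such a node is resolved
            # to an input neuron.
            cats &= {"input neuron"}
    return cats.pop() if len(cats) == 1 else None
```

A node sitting between two convolutions resolves to a hidden neuron, while a producer-less node feeding two convolutions resolves to an input neuron, matching the worked example above.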
For example, fig. 2c is a schematic diagram illustrating the determination of a data category in a data processing method according to an embodiment of the disclosure. As shown in fig. 2c, determining the data category includes a node scanning process and a data category determination process. A data category generation module for determining the data category may be provided in the processor; fig. 2d shows a schematic diagram of the data category generation module in a data processing method according to an embodiment of the present disclosure. As shown in fig. 2d, the data category generation module may include a computation graph node traverser and a static tag setter. The computation graph node traverser includes a node scanner, an operation node buffer, and a data node buffer; the static tag setter includes an operator feature memory, an in/out-degree counter, and a static tag holder. The processor determines the data category using the data category generation module as follows:
and (3) node scanning process:
After acquiring the computation graph to be marked, the node scanner traverses it. The node scanner may first acquire non-category data node 1 and send its information to the data node buffer. The data node buffer checks whether the information of non-category data node 1 is already stored in its data node queue; if not, it stores the node at the head of the data node queue. The node scanner then scans the information of the subsequent operation node 1' of non-category data node 1 and sends it to the operation node buffer. The operation node buffer checks whether the information of subsequent operation node 1' is already stored in its operation node queue; if not, it stores the information at the tail of the operation node queue.
The node scanner continues scanning and obtains non-category data node 2 and its subsequent operation node 1', sending their information to the data node buffer and the operation node buffer respectively. After determining that the information of non-category data node 2 is not stored in the data node queue, the data node buffer stores it at the tail of the queue. The operation node buffer, by querying the operation node queue, finds that the information of operation node 1' is already stored and does not store it again.
Following the same process, the node scanner continues scanning and sends the information of non-category data node 3 and of its corresponding subsequent operation node 1' to the data node buffer and the operation node buffer respectively. The data node buffer, after checking, stores the information of non-category data node 3 at the tail of the data node queue. The node scanner continues scanning and sends the information of non-category data node 4 and of its corresponding preceding operation node 1' and subsequent operation node 2' to the two buffers. After determining that the information of non-category data node 4 is not stored in the data node queue, the data node buffer stores it at the tail of the queue. The operation node buffer checks whether the information of preceding operation node 1' and subsequent operation node 2' is stored in the operation node queue; since operation node 1' is already stored, only the information of operation node 2' is stored at the tail of the queue. The node scanner continues scanning and sends the information of non-category data node 5 and of its corresponding preceding operation node 2' to the two buffers. After determining that the information of non-category data node 5 is not stored in the data node queue, the data node buffer stores it at the tail of the queue. The operation node buffer checks whether the information of the preceding operation node 2' of non-category data node 5 is stored in the operation node queue; since it is already stored, it is not stored again.
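The scanning pass above can be sketched as follows (a simplified model with hypothetical names; `graph_edges` stands in for the scan order of the computation graph, and the deduplicating queue checks mirror the buffer behavior described in the text):

```python
from collections import deque

def scan(graph_edges):
    """Hypothetical node-scanning pass.  graph_edges lists, in scan order,
    (data_node, preceding_ops, subsequent_ops) triples; each data node and
    each operation node is queued at most once."""
    data_queue, op_queue = deque(), deque()
    seen_ops = set()
    for data_node, prev_ops, next_ops in graph_edges:
        if data_node not in data_queue:        # data node buffer check
            data_queue.append(data_node)
        for op in list(prev_ops) + list(next_ops):
            if op not in seen_ops:             # operation node buffer check
                seen_ops.add(op)
                op_queue.append(op)
    return data_queue, op_queue

# The fig. 2c walk-through: five data nodes, two operation nodes.
edges = [("d1", [], ["op1'"]), ("d2", [], ["op1'"]), ("d3", [], ["op1'"]),
         ("d4", ["op1'"], ["op2'"]), ("d5", ["op2'"], [])]
data_queue, op_queue = scan(edges)
# data_queue holds d1..d5 once each; op_queue holds op1', op2' once each
```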
Data category determination procedure:
While the node scanner scans nodes, the data node buffer stores non-category data nodes, and the operation node buffer stores operation nodes, the in/out-degree counter determines each non-category data node's in-degree according to the number of its preceding operation nodes and its out-degree according to the number of its subsequent operation nodes, sends the information of the non-category data node together with its in-degree and out-degree to the static tag holder, and sends the non-category data node together with its preceding and subsequent operation nodes to the operator feature memory.
The operator feature memory determines, according to the received non-category data node and its preceding and subsequent operation nodes, the candidate data categories that the data in the node may correspond to, and sends the candidate data categories to the static tag holder.
The static tag holder determines the data category of the non-category data node from the candidate data categories according to the node's out-degree and in-degree, stores the data category into the data node, and controls the data node buffer to delete the information of the data node with the determined data category from the data node queue.
The node scanner is also used to scan the data node queue after the scan of the computation graph to be marked finishes. When it determines that the data node queue is empty, it ends the data category determination process, outputs the marked computation graph, and controls the operation node buffer to clear the operation node queue. If the data node queue is not empty, the data category determination process continues.
For example, the data category of non-category data node 1 at the head of the data node queue may be determined to be "input neuron" according to the node's information (for the determination process, see the description of step S34). The identifier "input neuron" is then added to non-category data node 1 in the computation graph to be marked, e.g., as the static tag information of the first data in data node 1; at the same time, the information of non-category data node 1, whose data category has been determined, is deleted from the data node queue. The head of the data node queue is then queried again: if it holds a non-category data node, the data category determination process is executed; if the head of the queue is empty, the process ends, the data categories of all data nodes in the computation graph to be marked have been determined, and the marked computation graph is formed.
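This head-of-queue loop might be sketched as follows (hypothetical names; `resolver` stands in for the step-S34 category determination):

```python
from collections import deque

def determine_all(data_queue, resolver):
    """Take the node at the head of the queue, determine its category,
    record it as static tag information, and delete the queue entry;
    stop when the queue is empty (hypothetical sketch)."""
    static_tags = {}
    while data_queue:                      # empty queue => process finished
        name, prev_ops, next_ops = data_queue.popleft()   # head of queue
        static_tags[name] = resolver(prev_ops, next_ops)
    return static_tags

queue = deque([("node1", [], ["op1'", "op2'"])])
tags = determine_all(queue, lambda prev, nxt: "input neuron")
# queue is now empty and node1 carries the category "input neuron"
```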
In one possible implementation, the tag information may further include dynamic tag information characterizing information of the first data related to the processor running the neural network. In operation S11, obtaining the tag information of the first data may include: generating the dynamic tag information of the first data according to the processor and the static tag information. Alternatively, the processor may obtain the dynamic tag information of the first data according to the hardware characteristics of the processor and the marked computation graph. Further, the processor may generate a computation graph labeled with both the dynamic tag information and the static tag information, based on the dynamic tag information of the first data and the marked computation graph; this computation graph can be used for data storage, data transfer, and the like.
In this implementation, once the processor running the neural network is determined, the dynamic tag information is determined from the static tag information and the computing power, performance, and other characteristics of that processor, so that the first data with the dynamic tag information can be adapted to the processor's operation. When the neural network is run on different processors, the dynamic tag information of the first data may differ; when the performance, computing power, and other parameters of two processors are the same, the dynamic tag information of the first data may be the same.
In one possible implementation, the dynamic tag information may include at least one of: dynamic data type, dynamic data dimension order, fragmentation parameter, padding parameter, and data size.
In this implementation, the dynamic data type may be determined according to the type, precision, etc. of the data that the processor running the neural network can process. If a processor can process 16-bit floating point numbers, the dynamic data type of the first data is 16-bit floating point when that processor runs the neural network. The dynamic data dimension order may be determined by the way the processor running the neural network needs to read or store data. The slicing parameter may be determined according to the computing power of the processor running the neural network; for example, if a processor can perform 8 operations at a time, the slicing parameter may be set to 8. The padding parameter may be determined according to the dimension value of the static data dimension of the first data and the slicing parameter. The data size, also called the size or amount of the data, is determined from the dimension values of the static data dimensions, the slicing parameter, and the padding parameter.
In a possible implementation manner, generating the dynamic tag information of the first data according to the processor and the static tag information may include:
determining, from the processor, an operation module for executing the target operation according to the target operation in which the first data participates, wherein the target operation includes a preceding operation and/or a subsequent operation in which the first data participates;
and determining the dynamic data type of the first data according to the operation module.
In this implementation, the processor may determine the target operation in which the first data participates through the data node corresponding to the first data (carrying only static tag information) and the corresponding computation graph or the connection pointers of the data node. For example, as shown in fig. 2c, the target operations in which non-category data node 4 participates include the operations corresponding to operation node 1' and operation node 2'. Alternatively, the processor may determine the target operation through the static tag information of the first data, or in any other manner; the present disclosure does not limit this.
In this implementation, an operation module capable of executing the target operation is determined according to the target operation, and the dynamic data type of the first data is then determined according to the hardware computation requirements of that operation module. The operation module may include a corresponding operator, or the operation module may itself be an operator. For example, if the target operation is a convolution operation and the corresponding operation module can only perform floating-point convolution, the dynamic data type of the first data may be determined to be floating point. The processor may determine, among all modules capable of executing the target operation, the one with the highest computational performance as the operation module for the first data. For example, if the first data is a matrix, the target operation is matrix multiplication, and the modules capable of matrix multiplication include an integer operator and a floating-point operator, the integer operator is determined to be the operation module because it computes matrix multiplication with higher performance than the floating-point operator.
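A toy illustration of picking the highest-performance capable module (the module records and their fields are invented for illustration; the disclosure does not specify how capability or performance is represented):

```python
def pick_operation_module(target_op, modules):
    """Among the modules able to execute the target operation, choose the
    one with the highest computational performance."""
    capable = [m for m in modules if target_op in m["ops"]]
    return max(capable, key=lambda m: m["perf"])

modules = [
    {"name": "float_unit", "ops": {"matmul", "conv"}, "perf": 1.0, "dtype": "float32"},
    {"name": "int_unit",   "ops": {"matmul"},         "perf": 2.5, "dtype": "int32"},
]
chosen = pick_operation_module("matmul", modules)
# the integer unit wins for matmul, so the dynamic data type would follow it
```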
In a possible implementation manner, generating dynamic tag information of the first data according to the processor and the static tag information may further include:
and determining the dynamic data dimension sequence of the first data according to the algorithm characteristic corresponding to the target operation.
In this implementation, the processor may determine, according to the algorithm features, multiple implementations of the algorithm, and determine the data dimension order corresponding to the most efficient implementation as that of the first data, so that a specific dimension order more favorable to subsequent operations is used. For example, assuming the first data includes a first matrix and a second matrix that need to be multiplied, and the two matrices are easier to multiply when they use different dimension orders, the dynamic data dimension order of the first matrix may be determined to be column-first and that of the second matrix row-first, so that the multiplication of the first matrix and the second matrix is efficient. Row-first data is stored row by row, and column-first data is stored column by column. For example, for the matrix
[[1, 2], [3, 4]]
row-first storage stores the matrix in the order 1, 2, 3, 4, and column-first storage stores it in the order 1, 3, 2, 4.
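The two storage orders can be illustrated with a small sketch (a hypothetical helper, not part of the disclosure):

```python
def flatten(matrix, order):
    """Flatten a 2-D matrix row-first ('row') or column-first ('col')."""
    rows, cols = len(matrix), len(matrix[0])
    if order == "row":
        return [matrix[r][c] for r in range(rows) for c in range(cols)]
    return [matrix[r][c] for c in range(cols) for r in range(rows)]

m = [[1, 2], [3, 4]]
row_first = flatten(m, "row")   # stored as 1, 2, 3, 4
col_first = flatten(m, "col")   # stored as 1, 3, 2, 4
```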
In a possible implementation manner, the generating of the dynamic tag information of the first data according to the processor and the static tag information may further include at least one of:
determining the slicing parameter of the first data according to the number of data the operation module can compute at a time;
determining the padding parameter of the first data according to the slicing parameter and the dimension value of the lowest dimension in the dynamic data dimension order;
and determining the data size of the first data according to all the dimension values of the first data and the padding parameter.
Wherein the dimension value of the lowest dimension in the dynamic data dimension order is determined according to the static tag information.
In this implementation, the processor may directly use the number of data the operation module can compute at a time as the slicing parameter, or the determined slicing parameter may be smaller than that number. For example, if the operation module can compute 8 data at a time, the slicing parameter may be 8, 6, etc.
In this implementation, the processor may determine the dimension value of the lowest dimension in the dynamic data dimension order from the dimension value of the lowest dimension in the static data dimension order. For example, if the dimension value of the lowest dimension in the dynamic data dimension order is 10 and the slicing parameter is 8, the dimension value 10 is first divided by the slicing parameter 8 to obtain the remainder 2, and the remainder 2 is then subtracted from the slicing parameter 8 to obtain the padding parameter: padding = 8 - (10 % 8) = 6, where % denotes the remainder operation.
In this implementation, the processor may determine, from the dimension value of the lowest dimension in the dynamic data dimension order and the padding parameter, the post-padding dimension value of the lowest dimension after the first data is padded, and then determine the data size from the post-padding dimension value and the dimension values of the other dimensions. For example, if the dimension value of the lowest dimension in the dynamic data dimension order is 10, the padding parameter is 6, and the dimension value of the remaining dimension is 4, then the post-padding dimension value is 10 + 6 = 16, the number of data after padding is 16 × 4 = 64, and the final data size is the product of the number of bytes occupied by each datum (4) and the data count (64), that is, 64 × 4 = 256 Bytes.
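The padding and data-size arithmetic above can be captured in a short sketch (function names are hypothetical; the formulas follow the worked example):

```python
def padding_param(lowest_dim, tiling):
    """padding = tiling - (lowest_dim % tiling); zero if already aligned."""
    rem = lowest_dim % tiling
    return tiling - rem if rem else 0

def data_size_bytes(dims, tiling, bytes_per_elem):
    """Pad the lowest dimension (dims[0]) up to a multiple of the slicing
    parameter, multiply all dimension values, then multiply by the number
    of bytes each datum occupies."""
    lowest, *rest = dims
    count = lowest + padding_param(lowest, tiling)
    for d in rest:
        count *= d
    return count * bytes_per_elem

# The text's example: lowest dim 10, tiling 8 -> padding 6;
# dims (10, 4) with 4-byte data -> (10 + 6) * 4 * 4 = 256 Bytes.
```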
In one possible implementation, the dynamic tag information of the first data may be expressed as:
dynamic:type2,DIM_B1…Bn,tiling,padding,size
wherein dynamic is an identifier indicating that the tag information is dynamic tag information; type2 represents the dynamic data type; DIM_B1…Bn indicates that the dynamic data dimension order is B1…Bn; tiling is the slicing parameter; padding is the padding parameter; and size is the data size. The separators between the fields are only used in the present disclosure to distinguish the different parameters of the dynamic tag information and are not a necessary part of it; in practical applications they may be absent or replaced by other identifiers, which the present disclosure does not limit.
For example, suppose the static tag information of a first data is static:IW,Float32,DIM_HW,{10,4}, and the processor running the neural network in which the first data participates stores data column-first, can process 16-bit floating point numbers, and can compute at most 8 numbers at a time. Then the dynamic tag information of the first data includes tiling = 8, padding = 8 - (10 % tiling) = 6, and size = (10 + 6) × 4 × 2 = 128 Bytes (a 16-bit floating point number occupies 2 bytes). The dynamic tag information of the first data may thus be dynamic:Float16,DIM_WH,8,6,128Bytes.
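Rendering the tag fields into the layout shown above might look like the following (the separator choice is illustrative, as the text notes):

```python
def format_dynamic_tag(dtype, dim_order, tiling, padding, size_bytes):
    """Render the fields in the dynamic:type2,DIM_..,tiling,padding,size layout."""
    return f"dynamic:{dtype},DIM_{dim_order},{tiling},{padding},{size_bytes}Bytes"

tag = format_dynamic_tag("Float16", "WH", 8, 6, 128)
# -> "dynamic:Float16,DIM_WH,8,6,128Bytes"
```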
In this embodiment, the positions of the parameters in the dynamic tag information and the static tag information may be adjusted according to actual needs, which is not limited by this disclosure.
For example, a dynamic tag information generation module for determining dynamic tag information may be provided in the processor, and it may include an operation node extractor, an operation module querier, an operation querier, a fragment querier, a data replenisher, and a data size calculator. Fig. 2e is a schematic diagram of the dynamic tag information generation module in the data processing method according to an embodiment of the disclosure. As shown in fig. 2e, the dynamic tag information generation module works as follows:
And the operation node extractor is used for determining the corresponding target operation node according to the connection pointers of the data node (carrying only static tag information) corresponding to the first data, to obtain the information of the target operation node.
And the operation module querier is used for determining the corresponding target operation according to the information of the target operation node, and further determining the corresponding operation module according to the target operation. And determining the dynamic data type of the first data according to the operation module, and recording the dynamic data type of the first data in the dynamic label information of the first data in the data node.
And the operation querier is used for determining the target operation performed by the target operation node according to the information of the target operation node, determining the dynamic data dimension sequence of the first data according to the algorithm characteristic corresponding to the target operation, and recording the dynamic data dimension sequence in the dynamic label information of the first data in the data node.
And the fragment querier is used for determining the slicing parameter of the first data according to the number of data the operation module can compute at a time, and recording the slicing parameter in the dynamic tag information of the first data in the data node.
And the data replenisher is used for determining the filling parameter of the first data according to the slicing parameter and the dimension value of the lowest dimension in the dimension sequence of the dynamic data, and recording the filling parameter in the dynamic label information of the first data in the data node.
And the data size calculator is used for determining the data size of the first data according to all the dimension values and the filling parameters of the first data and recording the data size in the dynamic label information of the first data in the data node.
In one possible implementation, the processor may store the tag information of the first data. Optionally, the tag information of the first data may be stored separately from the first data; that is, the first data has a corresponding data storage space, and the tag information of the first data has a corresponding tag storage space, which may not overlap with the data storage space. Specifically, the method may further include: storing the tag information of the first data into the tag storage space. The tag storage space may further include a static tag storage space and a dynamic tag storage space: the static tag information is stored in the static tag storage space in the tag storage space, and the dynamic tag information is stored in the dynamic tag storage space in the tag storage space. Optionally, the processor may instead store the static tag information and the dynamic tag information of the first data in the same tag storage space, which the present disclosure does not limit.
In this implementation, the static tag information and the dynamic tag information of the first data may be stored in different tag storage spaces, which facilitates the processor's management of the static and dynamic tag information. In addition, because the amount of first data is huge, multiple first data may have the same tag information; storing the first data, the static tag information, and the dynamic tag information separately facilitates reuse of the tag information and saves storage space.
In one possible implementation, the method may further include: and correspondingly storing the tag identification of the tag information and the data identification of the first data into an association list, wherein the association list is used for recording the corresponding relation between the first data and the tag information. Further optionally, the association list may be stored in the memory, and the processor may determine the tag information corresponding to the first data by querying the association list, or determine the corresponding first data by the corresponding tag information.
In this implementation, the data identifier of the first data and the tag identifier of its tag information may be identical or matched identifiers, such as numbers or symbols. The data identifier may also be the storage address of the corresponding first data (e.g., the first address of a physical address, a pointer indicating the storage address, or any other information representing where the first data is stored), and the corresponding tag identifier may be the storage address of the tag information of the first data (likewise, a first address, a pointer, or other information representing where the tag information is stored). Those skilled in the art can set the tag identifier and the data identifier according to actual needs; the present disclosure does not limit this.
Fig. 3a is a schematic diagram of an association list in a data processing method according to an embodiment of the present disclosure, and as shown in fig. 3a, it may be determined that tag information of the first data a is static tag information a and dynamic tag information 1 according to a corresponding relationship 1 in the association list. According to the correspondence 2, it can be determined that the tag information of the first data B is static tag information B and dynamic tag information 1. According to the correspondence 3, it can be determined that the tag information of the first data C is static tag information C and dynamic tag information 2. According to the correspondence 4, it can be determined that the tag information of the first data D is the static tag information c and the dynamic tag information 3. The first data a and the first data B share the same dynamic tag information 1. The first data C and the first data D share the same static tag information C.
In this way, the processor can accurately and quickly determine the correspondence between the first data and the tag information according to the association list. In other optional implementations, the correspondence between the first data and the tag information may also be stored in manners other than the association list.
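A minimal model of such an association list (identifiers invented, mirroring the sharing relationships of fig. 3a):

```python
# Each record maps a data identifier to the identifiers of its static and
# dynamic tag information; tags can be shared between data, as in fig. 3a.
association_list = {
    "A": ("static_a", "dyn_1"),
    "B": ("static_b", "dyn_1"),   # shares dynamic tag 1 with A
    "C": ("static_c", "dyn_2"),
    "D": ("static_c", "dyn_3"),   # shares static tag c with C
}

def tags_for(data_id):
    """Forward lookup: tag identifiers for a given data identifier."""
    return association_list[data_id]

def data_for(static_id=None, dynamic_id=None):
    """Reverse lookup: all data whose tags match the given identifiers."""
    return sorted(d for d, (s, dy) in association_list.items()
                  if (static_id is None or s == static_id)
                  and (dynamic_id is None or dy == dynamic_id))
```

Both directions of the correspondence described in the text are then single lookups over the list.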
In one possible implementation, storing the tag information of the first data in the tag storage space may include: and adding a first identifier for the tag information, and storing the tag information with the first identifier into a tag storage space. Wherein, according to the tag information, storing the first data into the data storage space may include: and adding a second identifier matched with the first identifier to the first data, and storing the first data with the second identifier into the data storage space.
In this implementation, the first identifier and the second identifier may be the same or uniquely corresponding identifiers, for example, the same numbers, codes, and the like. The first identifier may be an identifier indicating a storage address of the corresponding first data, and the second identifier may be an identifier indicating a storage address of tag information of the first data. In this way, the processor may quickly and accurately determine the first data and corresponding tag information based on the first identifier and the second identifier.
Further alternatively, the processor may store the neural network according to the tag information and the structural data of the neural network, for example, the processor may store the neural network according to the tag information and a structural diagram of the neural network, wherein the structural diagram of the neural network may include a data node and an operation node, wherein the data node includes the tag information of the data.
In one possible implementation, the method may further include:
acquiring information of data nodes in a neural network, wherein the information of the data nodes comprises second data participating in neural network operation, label information of the second data, the out-degree and the in-degree of the second data and the connection relation between the data nodes and operation nodes, and the second data can comprise first data;
acquiring information of an operation node in the neural network, wherein the information of the operation node includes the parameters corresponding to the operation performed by the operation node, and the corresponding input data nodes and output data nodes;
acquiring a calculation graph corresponding to the neural network, wherein the calculation graph comprises operation nodes contained in the neural network, corresponding data nodes and connection relations between the data nodes and the operation nodes;
the data nodes, the operation nodes and the calculation graph are stored.
In this implementation, fig. 3b shows a schematic diagram of a computation graph in the data processing method according to an embodiment of the present disclosure. As shown in fig. 3b, after the neural network is built, the corresponding data nodes, operation nodes, and computation graph may be determined from it. A data node records the structure or marks of the second data (including its tag information); according to the out-degree and in-degree of the second data and its connection relations with operation nodes, the data node can determine which preceding operation nodes the second data is the output of and which subsequent operation nodes it is the input of. An operation node records the parameters required by the operation performed on its one or more input data nodes and the one or more output data nodes produced after the operation. For example, if an operation node performs a convolution operation on the input data of 3 input data nodes to obtain the data of one output data node, the parameters to be recorded in the operation node include the convolution kernel size, the stride, the padding, and so on. The computation graph represents the neural network in the form of a graph and includes the data nodes, the operation nodes, and directed "edges" representing their connection relations.
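The node records described above might be modeled as follows (hypothetical field names; the disclosure does not prescribe a data layout):

```python
from dataclasses import dataclass, field

@dataclass
class DataNode:
    name: str
    tag_info: dict = field(default_factory=dict)  # static / dynamic tags
    prev_ops: list = field(default_factory=list)  # ops this data is output of
    next_ops: list = field(default_factory=list)  # ops this data is input of

@dataclass
class OperationNode:
    name: str
    params: dict    # e.g. convolution kernel size, stride, padding
    inputs: list    # names of input data nodes
    outputs: list   # names of output data nodes

# The convolution example: three input data nodes, one output data node.
conv = OperationNode("conv1", {"kernel": (3, 3), "stride": 1, "padding": 0},
                     ["x", "w", "b"], ["y"])
x = DataNode("x", tag_info={"static": "input neuron"}, next_ops=["conv1"])
```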
In one possible implementation, the processor may generate one or more corresponding instructions according to the computational graph to implement the neural network operations by executing the one or more instructions. Optionally, a general-purpose processor included in the processor may perform compiling processing on the computation graph according to the computation graph and the label information included in the computation graph, so as to obtain one or more instructions, and then, the general-purpose processor may execute the instructions to implement operations such as neural network training or reasoning. Optionally, a general-purpose processor included in the processor may compile the obtained instruction, and an artificial intelligence processor included in the processor may perform operations such as neural network training or inference according to the compiled instruction.
In one possible implementation manner, storing the first data in the data storage space according to the tag information includes:
when the tag information contains dynamic tag information, applying a first data storage space for storing first data from the data storage space according to the data size in the dynamic tag information;
and storing the first data into the first data storage space.
In this implementation, when there is dynamic tag information in the tag information of the first data, the processor may apply for a first data storage space for the first data according to a request of a user or automatically according to a data size in the dynamic tag information, and store the first data in the first data storage space.
For example, the application for the first data storage space may be implemented by an interface such as "labelMalloc(&wp, weight)", where the size of the first data storage space required by weight is determined according to the data size in the dynamic tag information: it may be input by a user according to the data size in the dynamic tag information of the first data provided by the processor, or determined by the processor automatically recognizing the data size in the dynamic tag information of the first data. In this way, the first data storage space for the first data can be applied for from a designated data storage space wp (where wp is an identification of the data storage space, such as a number or a code).
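As a rough illustration of this allocation step (a hedged sketch: the pool class and method names are invented here, and a real implementation would manage device memory rather than host byte arrays):

```python
# Illustrative labelMalloc-style request: the size of the first data storage
# space is taken from the "data size" field of the dynamic tag information.
class DataStoragePool:
    def __init__(self):
        self.spaces = {}          # handle -> allocated buffer
        self.next_handle = 0

    def label_malloc(self, dynamic_tag: dict) -> int:
        """Apply for a storage space whose size comes from the dynamic tag."""
        size = dynamic_tag["data_size"]       # e.g. 128 (bytes)
        handle = self.next_handle
        self.spaces[handle] = bytearray(size)
        self.next_handle += 1
        return handle

wp = DataStoragePool()                        # the designated data storage space
weight_tag = {"data_size": 128}               # dynamic tag info of the first data
h = wp.label_malloc(weight_tag)               # space applied for the first data
```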
Further, storing the first data in the first data storage space may include:
judging whether the current data state of the first data is consistent with the dynamic label information or not, wherein the data state comprises the data type of the first data, the sequence of data dimensionality and a dimensionality value; alternatively, the data state may be determined from static tag information of the first data.
When the current data state of the first data is inconsistent with the dynamic tag information, converting the first data according to the static tag information and the dynamic tag information to obtain converted first data, wherein the data state of the converted first data is consistent with the dynamic tag information;
and storing the converted first data into the first data storage space.
In this implementation, before the first data is stored in the data storage space, when it is determined that the current data state of the first data is consistent with the dynamic tag information of the first data, the first data may be directly stored in the first data storage space. The current data state of the first data is consistent with the dynamic tag information of the first data when: the current data type of the first data is the same as the dynamic data type; the order of the current data dimensions is the same as the order of the dynamic data dimensions; and the dimension value of each current dimension is the same as the dimension value calculated according to the slicing parameter, the filling parameter and the dimension value of the corresponding static data dimension.
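The consistency judgment can be sketched as a field-by-field comparison (an illustrative sketch; the field names and the dictionary representation are assumptions):

```python
# Compare the current data state against the dynamic tag information:
# data type, order of data dimensions, and dimension values must all match.
def state_consistent(state: dict, dynamic_tag: dict) -> bool:
    """True when type, dimension order and dimension values all match."""
    return (state["dtype"] == dynamic_tag["dtype"]
            and state["dim_order"] == dynamic_tag["dim_order"]
            and state["dims"] == dynamic_tag["dims"])

# Hypothetical values in the spirit of the Fig. 4a example
current = {"dtype": "Float32", "dim_order": "DIM_HW", "dims": (10, 4)}
target = {"dtype": "Float16", "dim_order": "DIM_WH", "dims": (4, 16)}
needs_conversion = not state_consistent(current, target)
```

When `needs_conversion` holds, the conversion according to the static and dynamic tag information described next is applied before storing.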
In a possible implementation manner, converting the first data according to the static tag information and the dynamic tag information to obtain the converted first data may include at least one of the following processes:
converting the data type of the first data into a dynamic data type;
adjusting the order of the data dimensions of the first data;
filling the first data according to the filling parameters;
and segmenting the first data according to the slicing parameters.
In a possible implementation manner, the processor may first determine whether the current data type of the first data is consistent with the dynamic data type, and if not, may first convert the data type of the first data into the dynamic data type. The order of the data dimensions of the first data after the type conversion is then adjusted to be the same as the order of the dynamic data dimensions. The first data with the adjusted dimension order is filled according to the filling parameters, and the filled first data is sliced according to the slicing parameters to obtain the converted first data, wherein the converted first data comprises a plurality of data slices, and the size of each data slice corresponds to the slicing parameters. The processor may also first slice the first data with the adjusted dimension order, and then fill one or more slices whose sizes do not correspond to the slicing parameters, so as to obtain the converted first data.
For example, fig. 4a shows a schematic diagram of data conversion in a data processing method according to an embodiment of the present disclosure. As shown in fig. 4a, the static tag information of the first data is "Static: IW, Float32, DIM_HW, {10,4}". The current data state of the first data is consistent with the static tag information. The dynamic tag information of the first data is "Dynamic: Float16, DIM_WH, 8, 6, 128 Bytes". The processor may first convert the data type of the first data from a 32-bit floating point number to a 16-bit floating point number, and then transpose the first data so that the dimension order of the data is consistent with DIM_WH. Based on the slicing parameter "8", the first data is sliced into two data slices. Then, the data slice whose size does not correspond to the slicing parameter is filled according to the filling parameter "6", the filled positions being set to 0. Finally, the converted first data is obtained.
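The Fig. 4a conversion can be reproduced with NumPy as follows (a hedged sketch, assuming the slicing parameter applies along the lowest dimension and the partial slice is zero-filled by the filling parameter "6"; note that the resulting total size, 2 slices of 4 x 8 Float16 elements, is 128 Bytes, matching the data size in the dynamic tag):

```python
import numpy as np

# Static tag: "Static: IW, Float32, DIM_HW, {10,4}"
first = np.arange(40, dtype=np.float32).reshape(10, 4)

# 1. Convert data type: 32-bit float -> 16-bit float
data = first.astype(np.float16)
# 2. Adjust dimension order: DIM_HW -> DIM_WH (transpose), shape (4, 10)
data = data.T
# 3. Slice along the lowest dimension with slicing parameter 8
slices = [data[:, :8], data[:, 8:]]        # widths 8 and 2
# 4. Fill the slice whose size does not correspond to the slicing
#    parameter, appending filling parameter 6 zeros (width 2 -> 8)
slices[1] = np.pad(slices[1], ((0, 0), (0, 6)))
```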
When the method is actually applied, that is, when neural network operations are performed using the first data with the tag information, the method needs to be embodied in the form of software for a user to use. The API provided for the user in the software needs to provide an input interface for the user to input the first data. The user knows, or can see, that the first data input through the input interface corresponds to the data amount (or size) of the static tag information, while the data state of the first data during storage and operation in the actual application process corresponds to the dynamic tag information, which is not visible to the user. If the processor needs to apply for a first data storage space, for storing first data whose data state corresponds to the dynamic tag information, according to a data size input by the user, another input interface for inputting the data size may be added to the API provided for the user, and the processor displays the data size in the dynamic tag information of the first data for the user to input, thereby enabling the processor to apply for the first data storage space. Alternatively, the processor can directly apply for the first data storage space according to the data size in the dynamic tag information of the first data, and no input interface needs to be added to the API provided for the user. Both modes are very simple and convenient for the user to operate, and the API provided for the user is very friendly. Moreover, since the storage space is applied for in real time according to the dynamic tag information of the first data, storage space can be saved and resource occupation can be reduced.
In this way, when the first data has the dynamic tag information, the first data is stored according to the dynamic tag information, and the first data can be matched with the performance of the processor. When the processor reuses the first data, the first data can be directly processed without data deformation or conversion, so that the process of using the first data by the processor is simplified, the operation speed of the processor can be increased, and the operation time is saved. In a possible implementation manner, in the process that the first data participates in the neural network operation, the first data may need to be transferred between different data storage spaces (for example, access operations such as Load or Store), and the processor may implement the data access operation according to the tag information of the first data. Specifically, the method may further include: and according to the label information, the first data is transferred from the current data storage space to the second data storage space.
Specifically, the processor performs the transfer of the first data when receiving a transfer request for transferring (or dumping) the first data. When receiving the transfer request for the first data, the processor can obtain the current data storage space address, the second data storage space address, the data size and the transfer direction of the first data, and then transfer the first data. The current data storage space address, the second data storage space address and the transfer direction may be determined according to a user input or according to the received transfer request. The data size may be determined from the dynamic tag information of the first data, or, when the first data has only static tag information, from the static data dimensions and the corresponding dimension values. The transfer direction may be from one memory to another memory, or internal to a memory. When the processor includes a master processor and one or more slave processors, "transferring from one memory to another memory" includes: transferring from the memory corresponding to the master processor to the memory corresponding to a slave processor, transferring from the memory corresponding to a slave processor to the memory corresponding to the master processor, and transferring from the memory corresponding to one slave processor to the memory corresponding to another slave processor. The master processor may be a general-purpose processor such as a CPU, and the slave processor may be an artificial intelligence processor.
For example, the first data may be transferred by means of an interface such as "labelMemcpy(wp, wp_CPU, weight, HostToDevice)", which represents fetching the first data from the current data storage space wp_CPU (which may be a physical address of the current data storage space) in the memory of the host (e.g., the CPU side), and storing the first data into the second data storage space wp with the size of weight applied for in the device (e.g., the artificial intelligence processor side). During the data transfer process, before the first data is stored in the second data storage space wp, the processor needs to apply for the second data storage space wp with the size of weight. When the master processor and the slave processor included in the processor are both CPUs, either the master processor or the slave processor may apply for the data storage space during the data transfer process. When the master processor included in the processor is a CPU and the slave processor is an IPU, the master processor may apply for the data storage space during data transfer.
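The transfer described above can be sketched as follows (illustrative only; the direction enumeration and function signature are assumptions modeled on the example, and host/device memories are mimicked with byte buffers):

```python
# Sketch of a labelMemcpy-style transfer: copy `size` bytes from the current
# data storage space to the second data storage space, with an explicit
# transfer direction.
from enum import Enum

class Direction(Enum):
    HOST_TO_DEVICE = 0
    DEVICE_TO_HOST = 1
    DEVICE_TO_DEVICE = 2

def label_memcpy(dst: bytearray, src: bytes, size: int, direction: Direction):
    """Copy `size` bytes from src to dst; direction selects which memories."""
    assert len(dst) >= size and len(src) >= size
    dst[:size] = src[:size]
    return direction

wp_cpu = bytes(range(16))        # current data storage space (host side)
wp = bytearray(16)               # second data storage space (device side)
label_memcpy(wp, wp_cpu, 16, Direction.HOST_TO_DEVICE)
```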
In a possible implementation manner, the transferring the first data from the current data storage space to the second data storage space according to the tag information may include:
judging whether the current data state of the first data is consistent with the dynamic label information of the first data, wherein the data state can comprise the data type of the first data, the sequence of data dimensionality and a dimensionality value;
when the current data state of the first data is inconsistent with the dynamic tag information, performing data conversion on the first data according to the static tag information and the dynamic tag information to obtain converted first data, wherein the data state of the converted first data is consistent with the dynamic tag information;
and transferring the converted first data into a second data storage space.
In this implementation, when transferring the first data, if the first data has static tag information and dynamic tag information, it is first determined whether the data state of the first data is consistent with the dynamic tag information of the first data. If consistent, the first data can be directly stored in the second data storage space without processing; if inconsistent, the first data can be subjected to data conversion according to the static tag information and the dynamic tag information, and the obtained converted first data is stored in the second data storage space. If the first data only contains static tag information, the first data can be directly transferred to the second data storage space without processing.
Through the above mode, in the process of data transfer, the data state of the first data is first judged according to the dynamic tag information of the first data, so as to ensure that the first data (containing the dynamic tag information) transferred to the second data storage space matches the performance of the processor. When the processor uses the first data again, the first data can be processed directly without data deformation, which simplifies the process of using the first data by the processor, can increase the operation speed of the processor, and saves operation time.
In this embodiment, since the tag information and the first data are stored in different storage spaces, it is required to ensure that the processor can determine the corresponding tag information according to the first data through a corresponding policy.
In one possible implementation, the method may further include setting up a device for neural network development and execution, such that the device can implement the following steps. The device may include a processor (the processor may include a general-purpose processor such as a CPU, and an artificial intelligence processor), wherein the CPU is used for executing the development creation step, the development compilation step, and the development execution step. The specific steps are as follows:
and a development and creation step, namely creating data nodes and operation nodes required by neural network operation, generating a calculation graph according to the connection relation between the data nodes and the operation nodes, and storing the data nodes, the operation nodes and the calculation graph.
And developing and compiling, namely scanning the calculation graph, determining the data type and the dynamic label information corresponding to the data in each data node, and generating one or more corresponding instructions according to the calculation graph. Optionally, the general-purpose processor may compile the computation graph according to the computation graph and the tag information therein, to obtain the instruction.
And a development operation step, namely converting the data in the data nodes according to the dynamic label information of the data in the data nodes to obtain the converted data, wherein the dynamic tag information of the converted data may be consistent with the hardware information of the device executing the instructions, such as an artificial intelligence processor.
In one possible implementation, the developing and running step may further include: managing a device (e.g., a processor such as an IPU) includes at least one of: the control device performs the start and end of the neural network operation, configures the device, sets the registers, and performs processing related to the operation of the device, such as checking for an interrupt, which is not limited by the present disclosure.
In one possible implementation, the developing and running step may further include: the memory of a device (such as a processor of an IPU) is managed, and the association performed on the memory of the device includes association related to the use of the memory, such as memory application and memory release. For example, the development operation step includes applying for a first data storage space for storing corresponding data according to dynamic tag information of the data in the data node. The development operation step includes transferring the data from the current data storage space to the second data storage space according to the tag information of the data in the data node. The developing and running step comprises releasing idle memory resources.
In a possible implementation manner, fig. 4b shows a schematic diagram of an apparatus designed for neural network operation in the data processing method according to the embodiment of the disclosure. As shown in fig. 4b, the processor may include a development creation module 41, a development compiling module 42, and a development running module 43, and these modules may be integrated in a general-purpose processor of the processor.
the development creation module 41 includes: the data node sub-module is used for creating and storing data nodes, the operation node sub-module is used for creating and storing operation nodes, and the calculation graph sub-module is used for creating and storing a calculation graph. The development creation module 41 is configured to send the computation graph to the development compilation module 42.
The development compiling module 42 includes: the computation graph scanning submodule is used for scanning the computation graph to generate label information (including data categories and dynamic label information) of the data nodes, and the instruction generating submodule is used for generating the dynamic label information of the data nodes and generating instructions according to the computation graph. The development compiling module 42 is configured to send instructions and tag information to the development execution module 43.
The development execution module 43 includes: the device comprises a data transformation submodule for performing data transformation, a device management submodule for managing the device, and a memory management submodule for managing the memory of the device.
In this implementation, the user can simply and quickly design the apparatus for performing the neural network operation according to the above steps for developing the neural network.
In a possible implementation manner, fig. 4c shows a schematic usage diagram of a device for neural network operation in a data processing method according to an embodiment of the present disclosure, and as shown in fig. 4c, the method may further include the following steps of performing creation, compilation and execution of a neural network, through which the device for neural network operation may be used to implement neural network operation. The device for neural network operation can comprise a CPU, an IPU and other processors.
In executing the creating step (as in step S61), a general-purpose processor such as a CPU may be used to create data nodes and operation nodes required for performing the neural network operation, and generate a computation graph according to the connection relationship between the data nodes and the operation nodes.
In executing the compiling step (as in step S62), a general-purpose processor such as a CPU may be used to scan the computation graph to generate the label information (including the data type and the dynamic label information) of the data nodes, and to compile the computation graph to generate the hardware instructions corresponding to the computation graph.
In executing the operation step (as in step S63), an artificial intelligence processor such as an IPU is used to run the hardware instructions and implement neural network operations such as training or reasoning, calling corresponding functions based on the hardware instructions to operate on the data and complete the operation process of the neural network. Of course, a general-purpose processor such as a CPU may also be used to run the hardware instructions to implement operations such as neural network training or reasoning.
In one possible implementation, the step of performing operations may further include: a processor such as an IPU or a CPU is used to apply for data storage space (e.g., applying for the first data storage space as described above) and tag storage space, and to move data (e.g., transferring data to the second data storage space as described above) based on the hardware instructions.
In one possible implementation, the step of performing operations may further include: processors such as IPUs or CPUs are used to free up resources that are no longer used.
In this implementation, the user can simply and quickly perform the neural network operation using a corresponding device such as a processor according to the above steps for performing the neural network.
It is noted that while for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders and concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that acts and modules referred to are not necessarily required by the disclosure.
It should be further noted that, although the steps in the flowcharts of fig. 2a and 2b are shown in sequence as indicated by the arrows, the steps are not necessarily executed in the sequence indicated by the arrows. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2a and 2b may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times; the order of performing the sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
Fig. 5 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure. As shown in fig. 5, the apparatus may be applied to a processor 100, and includes a data acquisition module 50, a tag information determination module 51, and a data storage module 52. The data obtaining module 50, the tag information determining module 51, and the data storing module 52 may be disposed in any one of the processing units 101 of the processor 100, or the data obtaining module 50, the tag information determining module 51, and the data storing module 52 may be disposed in different processing units 101, respectively. Any two of the data acquisition module 50, the tag information determination module 51 and the data storage module 52 may also be disposed in the same processing unit, and the remaining one may be disposed in another processing unit 101, which is not limited in this disclosure.
A data acquisition module 50, configured to acquire first data, where the first data is used for performing neural network operations;
a tag information determination module 51 that determines tag information of the first data;
a data storage module 52 for storing the first data into a data storage space according to the tag information,
the label information comprises static label information, and the static label information is used for representing information related to participation of the first data in the neural network operation.
In a possible implementation manner, the tag information determining module 51 includes:
the first determining submodule determines static label information of the first data according to the neural network corresponding to the first data and/or the acquired input information of the first data,
wherein the static tag information comprises at least one of: a data category, a static data type, a static data dimension order, and a dimension value corresponding to each static data dimension.
In one possible implementation manner, the first determining sub-module includes:
the data category determination sub-module is used for determining the data category of the first data according to the out-degree and the in-degree of the first data in the corresponding neural network and the operation in the neural network in which the first data participates;
wherein the data categories include any of: instructions, input neurons, output neurons, hidden neurons, constant neurons, input weights, output weights, constant weights, and auxiliary data.
In one possible implementation, the data category determining sub-module includes:
the calculation graph acquisition sub-module is used for acquiring a calculation graph to be marked corresponding to the neural network, wherein the calculation graph to be marked comprises operation nodes, non-category data nodes, and connection relations between the non-category data nodes and the operation nodes;
the traversal submodule is used for traversing the to-be-marked calculation graph to obtain information of all non-category data nodes and information of operation nodes in the to-be-marked calculation graph, wherein the information of the non-category data nodes comprises the first data and the previous operation nodes and subsequent operation nodes corresponding to the first data, and the information of the operation nodes comprises parameters corresponding to the operation of the operation nodes, and corresponding input data nodes and output data nodes;
the information storage submodule is used for sequentially storing the traversed information of the non-category data nodes in a data node queue and sequentially storing the traversed information of the operation nodes in an operation node queue;
and the class determining submodule determines the data class of the first data in the corresponding non-class data node according to the information of the non-class data node, and deletes the non-class data node of which the corresponding data class is determined from the data node queue.
In one possible implementation, the category determination submodule is configured to:
determining the in-degree of the non-category data node according to the number of the previous operation nodes connected with the non-category data node;
determining the out-degree of the non-category data node according to the number of subsequent operation nodes connected with the non-category data node;
determining the undetermined data type of the first data in the non-type data node according to the previous operation node and the subsequent operation node of the non-type data node;
and determining the data type of the first data in the non-category data node from the undetermined data type according to the in-degree and out-degree of the non-category data node.
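The in-degree/out-degree based determination can be sketched as follows (a hedged sketch: the specific mapping rules from degrees to categories are illustrative assumptions, and the embodiment also distinguishes weights and auxiliary data, which this sketch omits):

```python
# Determine a data category from the numbers of previous and subsequent
# operation nodes connected to a data node (illustrative rules only).
def determine_category(prev_ops: list, next_ops: list) -> str:
    in_degree = len(prev_ops)     # number of previous operation nodes
    out_degree = len(next_ops)    # number of subsequent operation nodes
    if in_degree == 0 and out_degree > 0:
        return "input neuron"     # only consumed by operations, never produced
    if in_degree > 0 and out_degree == 0:
        return "output neuron"    # only produced by operations, never consumed
    if in_degree > 0 and out_degree > 0:
        return "hidden neuron"    # produced and consumed inside the graph
    return "constant neuron"      # connected to no operation node
```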
In a possible implementation manner, the data category determining sub-module further includes:
and the computation graph marking submodule stores the determined data types into corresponding non-type data nodes in the computation graph to be marked to form data nodes with static label information and obtain the marked computation graph.
In one possible implementation, the tag information further includes dynamic tag information, the dynamic tag information being used to characterize information of the first data related to a processor running the neural network;
wherein, the tag information determining module 51 includes:
a dynamic label generating module for generating dynamic label information of the first data according to the processor and the static label information,
wherein the dynamic tag information comprises at least one of: dynamic data type, dynamic data dimension order, fragmentation parameter, padding parameter, and data size.
In one possible implementation manner, the dynamic tag generation module includes:
a target operation determining sub-module, configured to determine, from the processor, an operation module for executing a target operation according to the target operation involved in the first data, where the target operation includes a previous operation and/or a subsequent operation involved in the first data;
and the dynamic data type determining submodule determines the dynamic data type of the first data according to the operation module.
In a possible implementation manner, the dynamic tag generation module further includes:
and the dimension order determining submodule is used for determining the dimension order of the dynamic data of the first data according to the algorithm characteristic corresponding to the target operation.
In a possible implementation manner, the dynamic tag generation module further includes at least one of the following sub-modules:
the fragmentation parameter determination submodule determines the fragmentation parameter of the first data according to the number of the data which can be calculated by the operation module for one time;
a filling parameter determining submodule for determining a filling parameter of the first data according to the slicing parameter and the dimension value of the lowest dimension in the dimension sequence of the dynamic data;
a data size determination submodule for determining the data size of the first data according to all the dimension values of the first data and the filling parameter,
wherein the dimension value of the lowest dimension in the dynamic data dimension order is determined according to the static tag information.
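Under the assumption that the filling parameter pads the lowest dimension up to a full multiple of the slicing parameter, the filling parameter and data size sub-modules can be sketched as follows. These formulas are an assumption, though they reproduce the Fig. 4a values (slicing parameter 8, lowest dimension 10, filling parameter 6, data size 128 Bytes for Float16 data of dimensions (4, 10)):

```python
def padding_parameter(slice_param: int, lowest_dim: int) -> int:
    """Filling needed so the lowest dimension becomes a multiple of the slice size."""
    rem = lowest_dim % slice_param
    return 0 if rem == 0 else slice_param - rem

def data_size(dim_values: list, fill: int, bytes_per_elem: int) -> int:
    """Data size from all dimension values, with the lowest dimension padded."""
    padded = list(dim_values)
    padded[-1] += fill              # lowest dimension is the last entry
    n = 1
    for v in padded:
        n *= v
    return n * bytes_per_elem

fill = padding_parameter(8, 10)     # slicing parameter 8, lowest dim 10
size = data_size([4, 10], fill, 2)  # DIM_WH (4, 10), Float16 = 2 bytes
```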
In one possible implementation, the apparatus further includes:
the tag information storage module stores the tag information of the first data into a tag storage space;
the static label information is stored in a static label storage space in the label storage space;
and the dynamic label information is stored in a dynamic label storage space in the label storage space.
In one possible implementation, the data storage module includes:
the space application sub-module is used for applying a first data storage space for storing the first data from the data storage space according to the dynamic tag information when the tag information comprises the dynamic tag information;
and storing the first data into the first data storage space.
In one possible implementation, the data storage submodule includes:
the state determining submodule is used for judging whether the current data state of the first data is consistent with the dynamic label information or not, wherein the data state comprises the data type of the first data, the sequence of data dimensions and dimension values;
the data conversion sub-module is used for converting the first data according to the static tag information and the dynamic tag information when the current data state of the first data is inconsistent with the dynamic tag information to obtain converted first data, and the data state of the converted first data is consistent with the dynamic tag information;
a storage submodule for storing the converted first data into the first data storage space.
In one possible implementation, the apparatus may further include:
and the data transfer module transfers the first data from the current data storage space to the second data storage space according to the label information.
In a possible implementation manner, the transferring of the first data from the current data storage space to the second data storage space according to the tag information may include:
judging whether the current data state of the first data is consistent with the dynamic label information of the first data, wherein the data state comprises the data type of the first data, the order of the data dimensions, and the dimension values;
when the current data state of the first data is inconsistent with the dynamic tag information, performing data conversion on the first data according to the static tag information and the dynamic tag information to obtain converted first data, wherein the data state of the converted first data is consistent with the dynamic tag information;
and transferring the converted first data into a second data storage space.
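A minimal sketch of the consistency check described above, assuming (purely for illustration) that both the data state and the dynamic tag information are represented as dictionaries with the keys named below:

```python
def needs_conversion(state, dynamic_tag):
    """Return True when the current data state of the first data is
    inconsistent with the dynamic tag information.

    Both arguments are dicts with the (assumed) keys 'dtype',
    'dim_order', and 'dim_values' -- the data type, the order of the
    data dimensions, and the dimension values.
    """
    return any(state[k] != dynamic_tag[k]
               for k in ('dtype', 'dim_order', 'dim_values'))

state = {'dtype': 'float32', 'dim_order': 'NCHW', 'dim_values': (1, 3, 8, 8)}
tag   = {'dtype': 'float16', 'dim_order': 'NHWC', 'dim_values': (1, 8, 8, 3)}
print(needs_conversion(state, tag))  # True -> conversion is required
```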
In one possible implementation, the apparatus may further include:
and the list storage module is used for storing, in correspondence, the tag identifier of the tag information and the data identifier of the first data into an association list, wherein the association list is used for recording the correspondence between the first data and the tag information.
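The association list above can be sketched, for illustration only, as a mapping from a hypothetical data identifier to a tag identifier (both names below are assumptions, not identifiers from the disclosure):

```python
# The association list simply records the correspondence between a
# data identifier and the tag identifier of its tag information.
association_list = {}

def associate(data_id, tag_id):
    association_list[data_id] = tag_id

def tag_for(data_id):
    return association_list.get(data_id)

associate('conv1_weight', 'tag_007')
print(tag_for('conv1_weight'))  # tag_007
```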
In one possible implementation manner, the tag information storage module may include:
the first marking submodule adds a first identifier to the label information and stores the label information with the first identifier into the label storage space,
a data storage module, which may include:
and the second marking submodule adds a second identifier matched with the first identifier to the first data and stores the first data with the second identifier in the data storage space.
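A sketch of the matched first and second identifiers, under the simplifying assumption (this sketch's own, not the disclosure's) that both identifiers are the same integer so that either store can locate the matching entry in the other:

```python
import itertools

_marks = itertools.count(1)

def store_with_marks(tag_info, data, tag_space, data_space):
    """Store tag information with a first identifier and the first data
    with a matching second identifier (here the same integer)."""
    mark = next(_marks)
    tag_space[mark] = tag_info    # tag information with first identifier
    data_space[mark] = data       # first data with matching second identifier
    return mark

tags, datas = {}, {}
m = store_with_marks({'dtype': 'float32'}, [1, 2, 3], tags, datas)
print(tags[m], datas[m])
```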
In one possible implementation, the apparatus may further include:
the data node acquisition module is used for acquiring information of data nodes in the neural network, the information of the data nodes comprises second data participating in the operation of the neural network, label information of the second data, the out-degree and in-degree of the second data and the connection relation between the data nodes and the operation nodes, and the second data comprises first data;
the operation node acquisition module is used for acquiring information of operation nodes in the neural network, wherein the information of the operation nodes comprises parameters corresponding to the operation of the operation nodes, corresponding input data nodes and output data nodes;
the calculation graph acquisition module is used for acquiring a calculation graph corresponding to the neural network, and the calculation graph comprises operation nodes, corresponding data nodes and a connection relation between the data nodes and the operation nodes;
and the storage module is used for storing the data nodes, the operation nodes and the calculation graph.
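For illustration only, the stored data nodes, operation nodes, and computation graph might be represented as follows (all names and fields are assumptions, not the claimed storage format):

```python
# Data nodes carry the second data's tag information and out-/in-degree;
# operation nodes carry operation parameters with their input and output
# data nodes; the edges record the connection relations.
data_nodes = {
    'x': {'tag': {'category': 'input neuron'},  'in_degree': 0, 'out_degree': 1},
    'w': {'tag': {'category': 'input weight'},  'in_degree': 0, 'out_degree': 1},
    'y': {'tag': {'category': 'output neuron'}, 'in_degree': 1, 'out_degree': 0},
}
op_nodes = {
    'conv': {'params': {'stride': 1}, 'inputs': ['x', 'w'], 'outputs': ['y']},
}
edges = [('x', 'conv'), ('w', 'conv'), ('conv', 'y')]
print(sorted(data_nodes), sorted(op_nodes), len(edges))
```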
Embodiments of the present disclosure also provide a non-volatile computer-readable storage medium on which computer program instructions are stored, which when executed by a processor implement the above-mentioned data processing method.
It should be understood that the above-described apparatus embodiments are merely illustrative and that the apparatus of the present disclosure may be implemented in other ways. For example, the division of the units/modules in the above embodiments is only one logical function division, and there may be another division manner in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented.
In addition, unless otherwise specified, each functional unit/module in each embodiment of the present disclosure may be integrated into one unit/module, each unit/module may exist alone physically, or two or more units/modules may be integrated together. The integrated units/modules may be implemented in the form of hardware or software program modules.
If the integrated unit/module is implemented in hardware, the hardware may be digital circuits, analog circuits, and the like. Physical implementations of hardware structures include, but are not limited to, transistors, memristors, and the like. Unless otherwise specified, the artificial intelligence processor may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the memory unit may be any suitable magnetic storage medium or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (eDRAM), high-bandwidth memory (HBM), or hybrid memory cube (HMC), and so on.
The integrated units/modules, if implemented in the form of software program modules and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned memory includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.
In a possible implementation manner, a board card is further disclosed, which comprises a storage device, an interface device, a control device and the data processing device; wherein the data processing device is connected with the storage device, the control device and the interface device respectively; the storage device is used for storing data; the interface device is used for realizing data transmission between the data processing device and external equipment; the control device is used for monitoring the state of the data processing device.
Fig. 6 shows a block diagram of a board card according to an embodiment of the present disclosure. Referring to Fig. 6, the board card may include, in addition to the data processing device 389, other supporting components, which include, but are not limited to: a memory device 390, an interface device 391, and a control device 392;
the memory device 390 is connected to the data processing device through a bus and is used for storing data. The memory device may include a plurality of groups of storage units 393. Each group of storage units is connected to the data processing device through a bus. It is understood that each group of storage units may be DDR SDRAM (double data rate synchronous dynamic random access memory).
DDR can double the speed of SDRAM without increasing the clock frequency: DDR allows data to be transferred on both the rising and falling edges of the clock pulse, making it twice as fast as standard SDRAM. In one embodiment, the memory device may include four groups of storage units. Each group of storage units may include a plurality of DDR4 chips. In one embodiment, the data processing device may internally include four 72-bit DDR4 controllers, where 64 bits of each 72-bit DDR4 controller are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
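The 25600 MB/s figure follows directly from the DDR4-3200 transfer rate and the 64-bit payload width:

```python
# 3200 mega-transfers per second over a 64-bit (8-byte) data path;
# the 8 ECC bits of a 72-bit DDR4 controller carry no payload data.
transfer_rate_mt_s = 3200
data_width_bytes = 64 // 8
bandwidth_mb_s = transfer_rate_mt_s * data_width_bytes
print(bandwidth_mb_s)  # 25600
```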
In one embodiment, each group of storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice per clock cycle. A controller for controlling the DDR is provided in the data processing device and is used to control the data transmission and data storage of each storage unit.
The interface device is electrically connected to the data processing device. The interface device is used to enable data transfer between the data processing device and an external device, such as a server or a computer. For example, in one embodiment, the interface device may be a standard PCIe interface: the data to be processed is transmitted from the server to the data processing device through the standard PCIe interface, completing the data transfer. Preferably, when a PCIe 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface; the present disclosure does not limit the specific form of the other interface, as long as the interface unit can implement the transfer function. In addition, the calculation results of the data processing device are transmitted back to the external device (e.g., the server) by the interface device.
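The 16000 MB/s figure is the rounded theoretical bandwidth of PCIe 3.0 x16 (8 GT/s per lane with 128b/130b encoding, 16 lanes):

```python
# 8 GT/s per lane, 128b/130b line encoding, 16 lanes, 8 bits per byte.
gt_per_s = 8
lanes = 16
efficiency = 128 / 130
bandwidth_mb_s = gt_per_s * 1000 * efficiency / 8 * lanes
print(round(bandwidth_mb_s))  # 15754, commonly rounded to 16000 MB/s
```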
The control device is electrically connected to the data processing device. The control device is used to monitor the state of the data processing device. Specifically, the data processing device and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). Since the data processing device may include a plurality of processing chips, processing cores, or processing circuits, it may carry a plurality of loads, and can therefore be in different working states such as heavy load and light load. The control device can regulate the working states of the processing chips, processing cores, and/or processing circuits in the data processing device.
In one possible implementation, an electronic device is disclosed that includes the above-described data processing apparatus. The electronic device may be a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device. The vehicle includes an airplane, a ship, and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus, and/or an electrocardiograph.
In the foregoing embodiments, the descriptions of the respective embodiments each have their own emphasis; for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the above embodiments may be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this specification.
The foregoing may be better understood in light of the following clauses:
clause a1. a data processing method, applied to a processor, the method comprising:
acquiring first data, wherein the first data is used for carrying out neural network operation;
determining tag information of the first data;
storing the first data into a data storage space according to the tag information,
the label information comprises static label information, and the static label information is used for representing information related to participation of the first data in the neural network operation.
Clause a2. the method of clause a1, the determining tag information for the first data, comprising:
determining static label information of the first data according to the neural network corresponding to the first data and/or the acquired input information of the first data,
wherein the static tag information comprises at least one of: a data category, a static data type, a static data dimension order, and a dimension value corresponding to each static data dimension.
Clause a3. according to the method of clause a2, determining static tag information of the first data according to the neural network corresponding to the first data and/or the acquired input information of the first data, including:
determining a data category of the first data according to the out-degree and in-degree of the first data with respect to the neural network and the operation in the neural network in which the first data participates;
wherein the data categories include any of: instructions, input neurons, output neurons, hidden neurons, constant neurons, input weights, output weights, constant weights, and auxiliary data.
Clause a4. according to the method of clause a3, determining a data category of the first data according to the out-degree and in-degree of the first data with respect to the neural network and the operation in the neural network in which the first data participates, including:
acquiring a to-be-marked calculation graph corresponding to the neural network, wherein the to-be-marked calculation graph comprises operation nodes, non-category data nodes, and connection relations between the non-category data nodes and the operation nodes;
traversing the to-be-marked calculation graph to obtain information of all non-category data nodes and information of operation nodes in the to-be-marked calculation graph, wherein the information of the non-category data nodes comprises the first data, and a previous operation node and a subsequent operation node corresponding to the first data, and the information of the operation nodes comprises parameters corresponding to the operation of the operation nodes, corresponding input data nodes and output data nodes;
sequentially storing the traversed information of the non-category data nodes in a data node queue, and sequentially storing the traversed information of the operation nodes in an operation node queue;
and determining the data category of the first data in the corresponding non-category data node according to the information of the non-category data node, and deleting, from the data node queue, the non-category data node whose corresponding data category has been determined.
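As a non-authoritative sketch of the traversal in clause A4, assuming the graph to be marked is given as (source, target) edges, and using the convention (an assumption of this sketch only) that operation-node names start with `op`:

```python
from collections import deque

def traverse(edges, op_params):
    """Build the data node queue and operation node queue of clause A4.

    edges: (source, target) connection relations of the graph to be
    marked.  Operation-node names start with 'op' (assumed convention);
    all other nodes are non-category data nodes.
    """
    nodes = []
    for s, t in edges:                 # traversal order follows the edges
        for n in (s, t):
            if n not in nodes:
                nodes.append(n)
    data_queue, op_queue = deque(), deque()
    for n in nodes:
        if n.startswith('op'):
            op_queue.append({'name': n,
                             'params': op_params.get(n, {}),
                             'inputs':  [s for s, t in edges if t == n],
                             'outputs': [t for s, t in edges if s == n]})
        else:
            data_queue.append({'name': n,
                               'prev_ops': [s for s, t in edges if t == n],
                               'next_ops': [t for s, t in edges if s == n]})
    return data_queue, op_queue

edges = [('x', 'op_conv'), ('w', 'op_conv'), ('op_conv', 'y')]
dq, oq = traverse(edges, {'op_conv': {'stride': 1}})
print([d['name'] for d in dq], [o['name'] for o in oq])
# ['x', 'w', 'y'] ['op_conv']
```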
Clause a5. according to the method of clause a4, determining the data category of the first data in the corresponding non-category-data node according to the information of the non-category-data node, includes:
determining the degree of entry of the non-category data node according to the number of the previous operation nodes connected with the non-category data node;
determining the out-degree of the non-category data node according to the number of subsequent operation nodes connected with the non-category data node;
determining the undetermined data categories of the first data in the non-category data node according to the previous operation node and the subsequent operation node of the non-category data node;
and determining the data category of the first data in the non-category data node from the undetermined data categories according to the in-degree and out-degree of the non-category data node.
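Clause A5's selection of a data category from the undetermined categories using in-degree and out-degree might look like the following simplified sketch (the rule set here is an assumption for illustration, not taken from the disclosure):

```python
def data_category(in_degree, out_degree, candidates):
    """Pick the data category of a non-category data node from its
    undetermined candidate categories using in-degree and out-degree."""
    if in_degree == 0 and out_degree > 0:
        preferred = {'input neuron', 'input weight', 'constant neuron',
                     'constant weight', 'instruction', 'auxiliary data'}
    elif in_degree > 0 and out_degree == 0:
        preferred = {'output neuron', 'output weight'}
    else:
        preferred = {'hidden neuron'}
    for c in candidates:
        if c in preferred:
            return c
    return None

# A node produced by an operation and consumed by no further operation:
print(data_category(1, 0, ['hidden neuron', 'output neuron']))  # output neuron
```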
Clause a6. according to the method of clause a4, determining a data category for the first data according to the out-degree and in-degree of the first data with respect to the neural network and the operation in the neural network in which the first data participates, further comprising:
and storing the determined data types into corresponding non-type data nodes in the calculation graph to be marked to form data nodes with static label information, so as to obtain the marked calculation graph.
Clause A7. the method of any one of clauses a1 to a6, the tag information further comprising dynamic tag information characterizing information of the first data relating to a processor running the neural network;
wherein determining the tag information of the first data comprises:
generating dynamic tag information for the first data based on the processor and the static tag information,
wherein the dynamic tag information comprises at least one of: dynamic data type, dynamic data dimension order, slicing parameter, padding parameter, and data size.
Clause A8. generating dynamic tag information for the first data from the processor and the static tag information according to the method of clause a7, including:
determining an operation module for executing the target operation from the processor according to the target operation participated by the first data, wherein the target operation comprises a previous operation and/or a subsequent operation participated by the first data;
and determining the dynamic data type of the first data according to the operation module.
Clause A9. generating dynamic tag information for the first data from the processor and the static tag information according to the method of clause a8, including:
and determining the dynamic data dimension sequence of the first data according to the algorithm characteristic corresponding to the target operation.
Clause a10. generating, from the processor and the static tag information, dynamic tag information for the first data according to the method of clause A8 or clause a9, including at least one of:
determining the slicing parameter of the first data according to the number of data elements that the operation module can compute in a single operation;
determining a filling parameter of the first data according to the slicing parameter and the dimension value of the lowest dimension in the dimension sequence of the dynamic data;
determining the data size of the first data according to all dimension values of the first data and the filling parameter,
wherein the dimension value of the lowest dimension in the dynamic data dimension order is determined according to the static tag information.
Clause a11. the method of clause a7, further comprising:
storing the label information of the first data into a label storage space;
the static label information is stored in a static label storage space in the label storage space;
and the dynamic label information is stored in a dynamic label storage space in the label storage space.
Clause a12. the method of clause a7, storing the first data in a data storage space according to the tag information, comprising:
when the tag information contains dynamic tag information, applying for a first data storage space for storing the first data from the data storage space according to the dynamic tag information;
and storing the first data into the first data storage space.
Clause a13. storing the first data into the first data storage space according to the method of clause a12, including:
judging whether the current data state of the first data is consistent with the dynamic label information or not, wherein the data state comprises the data type of the first data, the sequence of data dimensions and a dimension value;
when the current data state of the first data is inconsistent with the dynamic tag information, converting the first data according to the static tag information and the dynamic tag information to obtain converted first data, wherein the data state of the converted first data is consistent with the dynamic tag information;
and storing the converted first data into the first data storage space.
Clause a14. a data processing apparatus applied to a processor, the apparatus comprising:
the data acquisition module is used for acquiring first data, and the first data is used for carrying out neural network operation;
the tag information determining module is used for determining tag information of the first data;
a data storage module for storing the first data into a data storage space according to the tag information,
the label information comprises static label information, and the static label information is used for representing information related to participation of the first data in the neural network operation.
Clause a15. the apparatus of clause a14, the tag information determination module, comprising:
the first determining submodule determines static label information of the first data according to the neural network corresponding to the first data and/or the acquired input information of the first data,
wherein the static tag information comprises at least one of: a data category, a static data type, a static data dimension order, and a dimension value corresponding to each static data dimension.
Clause a16. the apparatus of clause a15, the first determination submodule comprising:
the data category determination sub-module is used for determining the data category of the first data according to the out-degree and in-degree of the first data with respect to the neural network and the operation in the neural network in which the first data participates;
wherein the data categories include any of: instructions, input neurons, output neurons, hidden neurons, constant neurons, input weights, output weights, constant weights, and auxiliary data.
Clause a17. the apparatus of clause a16, the data category determination submodule, comprising:
the calculation graph acquisition sub-module is used for acquiring a calculation graph to be marked corresponding to the neural network, wherein the calculation graph to be marked comprises operation nodes, non-category data nodes, and connection relations between the non-category data nodes and the operation nodes;
the traversal submodule is used for traversing the to-be-marked calculation graph to obtain information of all non-category data nodes and information of operation nodes in the to-be-marked calculation graph, wherein the information of the non-category data nodes comprises the first data, and the previous operation nodes and subsequent operation nodes corresponding to the first data, and the information of the operation nodes comprises parameters corresponding to the operation of the operation nodes, corresponding input data nodes and output data nodes;
the information storage submodule is used for sequentially storing the traversed information of the non-category data nodes in a data node queue and sequentially storing the traversed information of the operation nodes in an operation node queue;
and the category determining submodule determines the data category of the first data in the corresponding non-category data node according to the information of the non-category data node, and deletes, from the data node queue, the non-category data node whose corresponding data category has been determined.
Clause a18. the apparatus of clause a17, the category determination submodule configured to:
determining the degree of entry of the non-category data node according to the number of the previous operation nodes connected with the non-category data node;
determining the out-degree of the non-category data node according to the number of subsequent operation nodes connected with the non-category data node;
determining the undetermined data categories of the first data in the non-category data node according to the previous operation node and the subsequent operation node of the non-category data node;
and determining the data category of the first data in the non-category data node from the undetermined data categories according to the in-degree and out-degree of the non-category data node.
Clause a19. the apparatus of clause a17, the data category determination submodule, further comprising:
and the computation graph marking submodule stores the determined data types into corresponding non-type data nodes in the computation graph to be marked to form data nodes with static label information and obtain the marked computation graph.
Clause a20. the apparatus of any one of clauses a14 to a19, the tag information further comprising dynamic tag information characterizing information of the first data relating to a processor running the neural network;
wherein, the tag information determination module includes:
a dynamic label generating module for generating dynamic label information of the first data according to the processor and the static label information,
wherein the dynamic tag information comprises at least one of: dynamic data type, dynamic data dimension order, slicing parameter, padding parameter, and data size.
Clause a21. the apparatus of clause a20, the dynamic tag generation module, comprising:
a target operation determining sub-module, configured to determine, from the processor, an operation module for executing a target operation according to the target operation involved in the first data, where the target operation includes a previous operation and/or a subsequent operation involved in the first data;
and the dynamic data type determining submodule determines the dynamic data type of the first data according to the operation module.
Clause a22. the apparatus of clause a21, the dynamic tag generation module, further comprising:
and the dimension order determining submodule is used for determining the dimension order of the dynamic data of the first data according to the algorithm characteristic corresponding to the target operation.
Clause a23. the apparatus of clause a21 or clause a22, the dynamic tag generation module further comprising at least one of the following sub-modules:
the slicing parameter determination submodule determines the slicing parameter of the first data according to the number of data elements that the operation module can compute in a single operation;
a filling parameter determining submodule for determining a filling parameter of the first data according to the slicing parameter and the dimension value of the lowest dimension in the dimension sequence of the dynamic data;
a data size determination submodule for determining the data size of the first data according to all the dimension values of the first data and the filling parameter,
wherein the dimension value of the lowest dimension in the dynamic data dimension order is determined according to the static tag information.
Clause a24. the apparatus of clause a20, further comprising:
the tag information storage module stores the tag information of the first data into a tag storage space;
the static label information is stored in a static label storage space in the label storage space;
and the dynamic label information is stored in a dynamic label storage space in the label storage space.
Clause a25. the apparatus of clause a20, the data storage module, comprising:
the space application sub-module is used for applying for, from the data storage space, a first data storage space for storing the first data according to the dynamic tag information, when the tag information comprises the dynamic tag information;
and storing the first data into the first data storage space.
Clause a26. the apparatus of clause a25, the data storage submodule, comprising:
the state determining submodule is used for judging whether the current data state of the first data is consistent with the dynamic label information or not, wherein the data state comprises the data type of the first data, the sequence of data dimensions and dimension values;
the data conversion sub-module is used for converting the first data according to the static tag information and the dynamic tag information when the current data state of the first data is inconsistent with the dynamic tag information to obtain converted first data, and the data state of the converted first data is consistent with the dynamic tag information;
and the storage submodule stores the converted first data into the first data storage space.
Clause a27. an electronic device comprising the data processing apparatus of any one of clauses a14 to clause a26.
Clause a28. a board card, the board card comprising: a storage device, an interface device, a control device, and a data processing apparatus as set forth in any one of clauses a14 to a26;
wherein the data processing device is connected with the storage device, the control device and the interface device respectively;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the data processing device and external equipment;
the control device is used for monitoring the state of the data processing device.
Clause a29. the board of clause a28,
the memory device includes: a plurality of groups of storage units, each group of storage units connected with the data processing device through a bus, the storage units being: DDR SDRAM;
the data processing apparatus includes: the DDR controller is used for controlling data transmission and data storage of each memory unit;
the interface device is as follows: a standard PCIE interface.
Clause a30. a non-transitory computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the data processing method of any one of clauses a1 to a13.
The embodiments of the present disclosure have been described in detail above, and the principles and implementations of the present disclosure are explained herein using specific examples, which are provided only to help understand the method and core idea of the present disclosure. Meanwhile, a person skilled in the art may, based on the idea of the present disclosure, make changes to the specific implementations and the application scope. In view of the above, the contents of this description should not be construed as limiting the present disclosure.

Claims (16)

1. A data processing method, applied to a processor, the method comprising:
acquiring first data, wherein the first data is used for carrying out neural network operation;
determining label information of the first data, and storing the label information to a data node corresponding to the first data in a computational graph of the neural network;
storing the first data into a data storage space according to the tag information,
wherein the label information comprises static label information used for representing information related to the participation of the first data in the neural network operation,
the tag information further includes dynamic tag information, the dynamic tag information including at least one of: dynamic data type, dynamic data dimension order, slicing parameter, padding parameter, and data size,
the method further comprises the following steps:
and determining the dynamic data dimension sequence of the first data according to the algorithm characteristics corresponding to the target operation participated by the first data.
2. The method of claim 1, wherein the determining the tag information of the first data comprises:
determining static label information of the first data according to the neural network corresponding to the first data and/or the acquired input information of the first data,
wherein the static tag information comprises at least one of: a data category, a static data type, a static data dimension order, and a dimension value corresponding to each static data dimension.
3. The method of claim 2, wherein determining the static tag information of the first data according to the neural network corresponding to the first data and/or the acquired input information of the first data comprises:
determining a data category of the first data according to the out-degree and in-degree of the first data in the neural network and the operation in the neural network in which the first data participates,
wherein the data category comprises any one of: an instruction, an input neuron, an output neuron, a hidden neuron, a constant neuron, an input weight, an output weight, a constant weight, and auxiliary data.
4. The method of claim 3, wherein determining the data category of the first data according to the out-degree and in-degree of the first data in the neural network and the operation in the neural network in which the first data participates comprises:
acquiring a computational graph to be marked that corresponds to the neural network, wherein the computational graph to be marked comprises operation nodes, non-category data nodes, and connection relations between the non-category data nodes and the operation nodes;
traversing the computational graph to be marked to obtain information of all non-category data nodes and of all operation nodes therein, wherein the information of a non-category data node comprises the first data and the preceding and subsequent operation nodes corresponding to the first data, and the information of an operation node comprises the parameters corresponding to the operation of that node and its corresponding input and output data nodes;
sequentially storing the traversed information of the non-category data nodes in a data node queue, and sequentially storing the traversed information of the operation nodes in an operation node queue; and
determining the data category of the first data in the corresponding non-category data node according to the information of that non-category data node, and deleting from the data node queue each non-category data node whose data category has been determined.
5. The method of claim 4, wherein determining the data category of the first data in the corresponding non-category data node according to the information of that non-category data node comprises:
determining the in-degree of the non-category data node according to the number of preceding operation nodes connected to it;
determining the out-degree of the non-category data node according to the number of subsequent operation nodes connected to it;
determining candidate data categories of the first data in the non-category data node according to the preceding and subsequent operation nodes of that node; and
determining the data category of the first data in the non-category data node from the candidate data categories according to the in-degree and out-degree of that node.
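The queue traversal of claim 4 and the in-degree/out-degree rule of claim 5 can be sketched as follows. The class name `GraphDataNode`, the function names, and the particular degree-to-category mapping are illustrative assumptions; the claims leave the concrete mapping and data structures open.

```python
from collections import deque

class GraphDataNode:
    """Hypothetical record for a non-category data node (claim 4)."""
    def __init__(self, name, prev_ops=(), next_ops=()):
        self.name = name
        self.prev_ops = list(prev_ops)   # preceding operation nodes
        self.next_ops = list(next_ops)   # subsequent operation nodes
        self.category = None             # data category, filled in below

def determine_category(node):
    """Derive a data category from in-degree and out-degree (claim 5).
    This mapping is illustrative only; the claims do not fix it."""
    in_degree = len(node.prev_ops)       # number of preceding op nodes
    out_degree = len(node.next_ops)      # number of subsequent op nodes
    if in_degree == 0 and out_degree > 0:
        return "input neuron"            # consumed but never produced
    if in_degree > 0 and out_degree == 0:
        return "output neuron"           # produced but never consumed
    if in_degree > 0 and out_degree > 0:
        return "hidden neuron"           # both produced and consumed
    return "constant neuron"             # isolated data

def mark_graph(data_nodes):
    """Work through the data node queue, deleting each node from the
    queue once its category has been determined (claim 4)."""
    queue = deque(data_nodes)
    marked = []
    while queue:
        node = queue.popleft()           # removed from the queue ...
        node.category = determine_category(node)
        marked.append(node)              # ... once it is categorised
    return marked
```

In a fuller implementation, `determine_category` would first narrow the candidates using the kinds of preceding and subsequent operations (claim 5) before applying the degree test.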
6. The method of claim 4, wherein determining the data category of the first data according to the out-degree and in-degree of the first data in the neural network and the operation in the neural network in which the first data participates further comprises:
storing each determined data category into the corresponding non-category data node in the computational graph to be marked to form data nodes carrying static tag information, thereby obtaining a marked computational graph.
7. The method of any one of claims 1-6, wherein the dynamic tag information is used to characterize information of the first data related to a processor running the neural network,
and wherein determining the tag information of the first data comprises:
generating the dynamic tag information of the first data according to the processor and the static tag information.
8. The method of claim 7, wherein generating the dynamic tag information of the first data according to the processor and the static tag information comprises:
determining, from the processor, an operation module for executing a target operation in which the first data participates, wherein the target operation comprises a preceding operation and/or a subsequent operation in which the first data participates; and
determining the dynamic data type of the first data according to the operation module.
9. The method of claim 8, wherein generating the dynamic tag information of the first data according to the processor and the static tag information comprises at least one of:
determining the slicing parameter of the first data according to the number of data that the operation module can compute at a time;
determining the padding parameter of the first data according to the slicing parameter and the dimension value of the lowest dimension in the dynamic data dimension order; and
determining the data size of the first data according to all dimension values of the first data and the padding parameter,
wherein the dimension value of the lowest dimension in the dynamic data dimension order is determined according to the static tag information.
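A minimal sketch of the computations in claim 9, under two stated assumptions: the padding rounds the lowest dimension up to the next multiple of the slicing parameter, and the data size is the padded element count times the element width in bytes. The function names and the exact formulas are illustrative, not taken from the patent.

```python
def padding_parameter(lowest_dim_value, slicing_param):
    """Pad the lowest dimension up to a multiple of the slicing parameter
    (claim 9; rounding rule assumed)."""
    remainder = lowest_dim_value % slicing_param
    return 0 if remainder == 0 else slicing_param - remainder

def data_size(dim_values, padding_param, bytes_per_element):
    """Data size from all dimension values plus the padding applied to
    the lowest dimension (illustrative byte-count formula)."""
    padded = list(dim_values)
    padded[-1] += padding_param          # pad only the lowest dimension
    count = 1
    for value in padded:
        count *= value                   # total element count after padding
    return count * bytes_per_element
```

For example, with dimension values (2, 3, 5), a slicing parameter of 4, and 4-byte elements, the padding parameter is 3 and the stored size is 2 × 3 × 8 × 4 = 192 bytes.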
10. The method of claim 7, further comprising:
storing the tag information of the first data into a tag storage space,
wherein the static tag information is stored in a static tag storage space within the tag storage space,
and the dynamic tag information is stored in a dynamic tag storage space within the tag storage space.
11. The method of claim 7, wherein storing the first data into a data storage space according to the tag information comprises:
when the tag information contains dynamic tag information, requesting from the data storage space, according to the dynamic tag information, a first data storage space for storing the first data; and
storing the first data into the first data storage space.
12. The method of claim 11, wherein storing the first data into the first data storage space comprises:
determining whether the current data state of the first data is consistent with the dynamic tag information, wherein the data state comprises the data type, the data dimension order, and the dimension values of the first data;
when the current data state of the first data is inconsistent with the dynamic tag information, converting the first data according to the static tag information and the dynamic tag information to obtain converted first data, wherein the data state of the converted first data is consistent with the dynamic tag information; and
storing the converted first data into the first data storage space.
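The consistency check and conversion in claim 12 can be sketched as below. The dictionary layout, function names, and the conversion rule (adopt the dynamic data type and permute the dimension values into the dynamic dimension order) are illustrative assumptions; for brevity the check compares only the data type and dimension order, although the claim also compares dimension values.

```python
def matches_dynamic_tag(state, dyn_tag):
    """Claim 12 check on the current data state (abbreviated: compares
    data type and dimension order only)."""
    return (state["dtype"] == dyn_tag["dtype"]
            and state["order"] == dyn_tag["order"])

def convert(state, dyn_tag):
    """Illustrative conversion: adopt the dynamic data type and permute
    the dimension values into the dynamic dimension order."""
    perm = [state["order"].index(axis) for axis in dyn_tag["order"]]
    return {
        "dtype": dyn_tag["dtype"],
        "order": list(dyn_tag["order"]),
        "dims": [state["dims"][i] for i in perm],
    }

def store_first_data(state, dyn_tag, storage_space):
    """Convert the data when its state disagrees with the dynamic tag
    information, then store it in the first data storage space."""
    if not matches_dynamic_tag(state, dyn_tag):
        state = convert(state, dyn_tag)
    storage_space["first_data"] = state
    return state
```

For instance, NCHW float32 data with dimension values (1, 3, 4, 4) stored against an NHWC float16 dynamic tag would be converted to float16 with dimension values (1, 4, 4, 3) before being written.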
13. A data processing apparatus, applied to a processor, the apparatus comprising:
a data acquisition module configured to acquire first data, wherein the first data is used for performing a neural network operation;
a tag information determining module configured to determine tag information of the first data and store the tag information to a data node corresponding to the first data in a computational graph of the neural network; and
a data storage module configured to store the first data into a data storage space according to the tag information,
wherein the tag information comprises static tag information used for representing information related to the participation of the first data in the neural network operation,
and the tag information further comprises dynamic tag information, the dynamic tag information comprising at least one of: a dynamic data type, a dynamic data dimension order, a slicing parameter, a padding parameter, and a data size,
the apparatus further comprising:
a dimension order determining submodule configured to determine the dynamic data dimension order of the first data according to algorithm characteristics corresponding to a target operation in which the first data participates.
14. An electronic device, characterized in that the electronic device comprises the data processing apparatus of claim 13.
15. A board card, comprising: a storage device, an interface device, a control device, and the data processing apparatus of claim 13,
wherein the data processing apparatus is connected to the storage device, the control device, and the interface device, respectively;
the storage device is configured to store data;
the interface device is configured to implement data transmission between the data processing apparatus and an external device; and
the control device is configured to monitor a state of the data processing apparatus.
16. A non-transitory computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the data processing method of any one of claims 1 to 12.
CN201910899897.9A 2019-09-23 2019-09-23 Data processing method, data processing device, computer equipment and storage medium Active CN110555522B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910899897.9A CN110555522B (en) 2019-09-23 2019-09-23 Data processing method, data processing device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN110555522A CN110555522A (en) 2019-12-10
CN110555522B true CN110555522B (en) 2021-05-14

Family

ID=68741050

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910899897.9A Active CN110555522B (en) 2019-09-23 2019-09-23 Data processing method, data processing device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110555522B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112650775B (en) * 2020-12-30 2024-01-05 深圳云天励飞技术股份有限公司 Data searching method and device, electronic equipment and storage medium
CN114091085B (en) * 2022-01-10 2022-04-15 北京一流科技有限公司 Data access control system for binary operation and method thereof

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650922B (en) * 2016-09-29 2019-05-03 清华大学 Hardware neural network conversion method, computing device, software and hardware cooperative system
CN108268466B (en) * 2016-12-30 2020-11-06 广东精点数据科技股份有限公司 Webpage ordering method and device based on neural network model
US11551064B2 (en) * 2018-02-08 2023-01-10 Western Digital Technologies, Inc. Systolic neural network engine capable of forward propagation
CN110196735A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 A kind of computing device and Related product
US10996929B2 (en) * 2018-03-15 2021-05-04 Regents Of The University Of Minnesota High quality down-sampling for deterministic bit-stream computing
CN109657782B (en) * 2018-12-14 2020-10-27 安徽寒武纪信息科技有限公司 Operation method, device and related product
CN109635948A (en) * 2018-12-19 2019-04-16 北京达佳互联信息技术有限公司 On-line training method, apparatus, system and computer readable storage medium
CN110187965B (en) * 2019-05-08 2021-02-12 深圳大学 Operation optimization and data processing method and device of neural network and storage medium
CN110263925B (en) * 2019-06-04 2022-03-15 电子科技大学 Hardware acceleration implementation device for convolutional neural network forward prediction based on FPGA
CN110458285B (en) * 2019-08-14 2021-05-14 中科寒武纪科技股份有限公司 Data processing method, data processing device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110555522A (en) 2019-12-10

Similar Documents

Publication Publication Date Title
CN110096310B (en) Operation method, operation device, computer equipment and storage medium
CN110119807B (en) Operation method, operation device, computer equipment and storage medium
CN110555522B (en) Data processing method, data processing device, computer equipment and storage medium
CN110647981B (en) Data processing method, data processing device, computer equipment and storage medium
CN110458285B (en) Data processing method, data processing device, computer equipment and storage medium
CN112084023A (en) Data parallel processing method, electronic equipment and computer readable storage medium
CN111047005A (en) Operation method, operation device, computer equipment and storage medium
CN110458286B (en) Data processing method, data processing device, computer equipment and storage medium
CN111260070B (en) Operation method, device and related product
CN109542837B (en) Operation method, device and related product
CN109558565B (en) Operation method, device and related product
CN111061507A (en) Operation method, operation device, computer equipment and storage medium
CN115373646A (en) Information expansion method, device and related product
CN112395008A (en) Operation method, operation device, computer equipment and storage medium
CN111047030A (en) Operation method, operation device, computer equipment and storage medium
CN111353124A (en) Operation method, operation device, computer equipment and storage medium
CN112395009A (en) Operation method, operation device, computer equipment and storage medium
CN111353125B (en) Operation method, operation device, computer equipment and storage medium
CN111382850A (en) Operation method, device and related product
CN111339060B (en) Operation method, device, computer equipment and storage medium
CN111290788B (en) Operation method, operation device, computer equipment and storage medium
CN112396169B (en) Operation method, device, computer equipment and storage medium
CN112395002B (en) Operation method, device, computer equipment and storage medium
CN111124497B (en) Operation method, operation device, computer equipment and storage medium
CN111290789B (en) Operation method, operation device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant after: Zhongke Cambrian Technology Co., Ltd

Address before: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.

GR01 Patent grant