CN117610623A - Data processing method and device, electronic equipment and storage medium - Google Patents

Data processing method and device, electronic equipment and storage medium

Info

Publication number
CN117610623A
CN117610623A (application CN202311475799.5A)
Authority
CN
China
Prior art keywords
shape
output
data
unit
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311475799.5A
Other languages
Chinese (zh)
Inventor
帅晋
林砚
张亚林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Suiyuan Technology Co ltd
Original Assignee
Shanghai Suiyuan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Suiyuan Technology Co ltd filed Critical Shanghai Suiyuan Technology Co ltd
Priority to CN202311475799.5A priority Critical patent/CN117610623A/en
Publication of CN117610623A publication Critical patent/CN117610623A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a data processing method and apparatus, an electronic device and a storage medium, relating to the technical field of artificial intelligence. The method comprises: in response to a first computing unit acquiring first input data, a shape deriving unit acquires a first output shape according to the first input shape of the first input data and the data calculation mode of the first computing unit; the first output shape is sent to a second computing unit, so that the second computing unit calculates second output data according to the first output shape and the first output data of the first computing unit. In this technical scheme, scalar calculation is completed by an independent hardware design, which reduces the design complexity of each computing unit's hardware structure and optimizes the pipeline design of the calculation process; at the same time, executing tensor calculation and scalar calculation in parallel reduces the calculation time consumed by each computing unit, thereby greatly improving the data processing speed of the neural network model.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a data processing method, apparatus, electronic device, and storage medium.
Background
With the continuous development of deep learning and neural network technology, the diversity of input data shapes requires a deep-learning-based neural network model to provide more flexible data processing capability, that is, to adapt to input data of different shapes.
In the prior art, a deep-learning-based neural network model is generally adapted to dynamic shapes by constructing a dynamic graph: after a computing unit in the model obtains a data calculation result through tensor calculation (i.e., obtains output data), it must also perform shape derivation for the downstream output through scalar calculation (i.e., obtain the output shape), so that the output data and the output shape can be sent together to the next computing unit.
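For illustration only, the prior-art serial flow inside each computing unit can be sketched as follows; all names are hypothetical and not part of the prior art being described:

```python
# A minimal sketch of the prior-art serial flow (all names hypothetical).
def tensor_compute(data, shape):
    return data                          # stand-in for the real tensor calculation

def infer_shape(shape):
    return tuple(shape)                  # stand-in for the scalar shape derivation

def prior_art_compute_unit(input_data, input_shape):
    # Serial: shape derivation only starts after the tensor result is ready.
    output_data = tensor_compute(input_data, input_shape)
    output_shape = infer_shape(input_shape)
    return output_data, output_shape     # both are sent to the next unit together

print(prior_art_compute_unit("tensor-bytes", (64, 64)))
```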
However, such a shape derivation greatly increases the design complexity of the hardware structure of the computation unit in the model, and at the same time increases the computation time consumption of each computation unit, thereby reducing the data processing efficiency of the neural network model based on deep learning.
Disclosure of Invention
The invention provides a data processing method, a data processing device, electronic equipment and a storage medium, which are used for solving the problem of high design complexity of a hardware structure of a computing unit in a neural network model based on deep learning.
According to an aspect of the present invention, there is provided a data processing method applied to a shape deriving unit, including:
responding to first input data acquired by a first computing unit, and acquiring a first output shape according to a first input shape of the first input data and a data computing mode of the first computing unit;
transmitting the first output shape to a second computing unit, so that the second computing unit calculates second output data according to the first output shape and the first output data of the first computing unit; wherein the second computing unit is a downstream computing unit adjacent to the first computing unit; the shape deriving unit is in heterogeneous relation with the first calculating unit and the second calculating unit.
The obtaining a first output shape according to the first input shape of the first input data and the data calculation mode of the first calculation unit specifically includes: and acquiring a first output shape according to the data calculation mode of the first calculation unit and a first input shape of the first input data stored locally, and storing the first output shape locally.
The obtaining a first output shape according to the first input shape of the first input data and the data calculation mode of the first calculation unit further includes: acquiring predicted calculation time consumption of the first calculation unit according to a first input shape of the first input data and a data calculation mode of the first calculation unit, and acquiring matched target hardware resources according to the predicted calculation time consumption; acquiring a first output shape based on a first input shape of the first input data and a data calculation mode of the first calculation unit through the target hardware resource; wherein the target hardware resource comprises a processor or processor core.
After the first computing unit acquires the first input data, the method further includes: and acquiring a matched first output shape through a shape mapping table according to the first input shape of the first input data and the sequence identifier of the first computing unit.
The obtaining, according to the first input shape of the first input data and the sequence identifier of the first computing unit, a matched first output shape through a shape mapping table specifically includes: obtaining a matched shape mapping table according to the task type of the current computing task; wherein the task type includes at least one of a video processing task, a picture processing task, a text processing task, and a voice processing task.
The data processing method further comprises the following steps: respectively constructing a matched first type shape mapping table according to each alternative input shape; wherein, the first type shape mapping table records the mapping relation between the sequence identification of the computing unit and the output shape; or respectively constructing a matched second type shape mapping table according to the sequence identification of each calculation unit; wherein the second type shape mapping table records the mapping relation between the input shape and the output shape.
The obtaining a first output shape according to the first input shape of the first input data and the data calculation mode of the first calculation unit specifically includes: judging whether a first output shape matched with a first input shape of first input data and a sequence identifier of a first computing unit exists in a shape mapping table; if the first output shape exists in the shape mapping table, acquiring the first output shape through the shape mapping table; if it is determined that the first output shape does not exist in the shape mapping table, acquiring the first output shape according to the first input shape of the first input data and the data calculation mode of the first calculation unit, and updating the first output shape into the shape mapping table.
According to an aspect of the present invention, there is provided a data processing apparatus applied to a shape deriving unit, including:
the output shape acquisition module is used for responding to the first input data acquired by the first calculation unit and acquiring a first output shape according to the first input shape of the first input data and the data calculation mode of the first calculation unit;
an output shape transmitting module, configured to transmit the first output shape to a second computing unit, so that the second computing unit calculates second output data according to the first output shape and first output data of the first computing unit; wherein the second computing unit is a downstream computing unit adjacent to the first computing unit; the shape deriving unit is in heterogeneous relation with the first calculating unit and the second calculating unit.
According to another aspect of the present invention, there is provided an electronic apparatus including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data processing method according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute a data processing method according to any one of the embodiments of the present invention.
According to the technical scheme, when the first computing unit acquires the first input data, the shape deriving unit sends the first output shape to the second computing unit according to the first input shape of the first input data and the data computing mode of the first computing unit, so that the second computing unit calculates the second output data according to the first output shape and the first output data. The scalar calculation is completed through independent hardware design, the design complexity of the hardware structure of each calculation unit is reduced, the pipeline design of the calculation process is optimized, meanwhile, the tensor calculation and scalar calculation which are executed in series are changed into parallel execution, the calculation time consumption of each calculation unit is reduced, the data processing speed of the neural network model is greatly improved, and the idle time between calculation tasks caused by dynamic shapes is compressed to the greatest extent.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a data processing method according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a data processing method according to a second embodiment of the present invention;
FIG. 3 is a flow chart of a data processing method according to a third embodiment of the present invention;
FIG. 4 is a schematic diagram of a process flow of a neural network model for dynamically inputting shapes according to a first embodiment of the present invention;
FIG. 5 is a schematic diagram of a processing flow of another neural network model for dynamically inputting shapes according to a first embodiment of the present invention;
FIG. 6 is a schematic diagram of a data processing apparatus according to a fourth embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device implementing a data processing method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a data processing method according to a first embodiment of the present invention, where the method may be performed by a data processing apparatus according to the first embodiment of the present invention, the data processing apparatus may be implemented in hardware and/or software, and the data processing apparatus may be configured in a chip, and the chip is configured in an electronic device. As shown in fig. 1, the method includes:
s101, responding to first input data acquired by a first computing unit, and acquiring a first output shape according to a first input shape of the first input data and a data computing mode of the first computing unit.
A deep learning algorithm consists of a plurality of computing units, each of which is also called an operator (OP). In a neural network model under a deep learning framework, each layer corresponds to the computing logic of a computing unit; for example, the convolution layer (Convolution Layer) is a computing unit (i.e., an operator), and the weighted summation process in the fully-connected layer (FC Layer) is also a computing unit. Optionally, the embodiment of the present invention does not specifically limit the number or the types of computing units included in the neural network model.
The shape deriving unit is a functional unit composed of one or more processors and used to derive the shape of each computing unit's output data (i.e., the output shape). In terms of hardware structure, the hardware component where the shape deriving unit is located and the hardware components where the computing units are located are in a heterogeneous relationship. When the current neural network model receives a data processing task, the computing units needed this time and the calculation order among them can be determined, i.e., a computing unit sequence is obtained; the output data of the former computing unit serves as the input data of the latter computing unit.
In the embodiment of the present invention, the first computing unit may be any computing unit in the sequence except the last one: the last computing unit has no downstream computing unit, its output data is the output data of the neural network model, and the shape deriving unit does not need to derive its output shape. Take the case where the first computing unit is computing unit No. 3 in the sequence: before computing unit No. 2 passes its own output data to computing unit No. 3, the shape deriving unit has already sent the derived output shape of computing unit No. 2 to computing unit No. 3.
Computing unit No. 3 calculates output data (i.e., the first output data) based on the input data (the first input data, i.e., the output data of computing unit No. 2) and the input shape (the first input shape, i.e., the output shape of computing unit No. 2). Meanwhile, the shape deriving unit derives the output shape of operator No. 3 (i.e., the first output shape) from the first input shape and the data calculation mode of operator No. 3 through a dynamic shape inference (Dynamic Shape Inference) technique. For example, if the first input shape is a three-dimensional shape of 64×64×64 pixels and the data calculation mode of operator No. 3 is a matrix sum, the output shape of operator No. 3 is thus known to also be a three-dimensional shape of 64×64×64 pixels.
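For illustration only, that derivation rule can be sketched as follows; the function and operator names are hypothetical and not part of the claimed method:

```python
# Hypothetical per-operator shape derivation: output shape from input shape alone.
def infer_output_shape(input_shape, op_kind, **attrs):
    if op_kind == "matrix_sum":           # elementwise addition preserves the shape
        return tuple(input_shape)
    if op_kind == "matmul":               # (m, k) x (k, n) -> (m, n)
        m, k = input_shape
        return (m, attrs["rhs_cols"])
    raise ValueError(f"unknown operator kind: {op_kind}")

# The example from the text: a 64x64x64 input through operator No. 3 (matrix sum).
assert infer_output_shape((64, 64, 64), "matrix_sum") == (64, 64, 64)
```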
Particularly, in the embodiment of the present invention, the initial input data acquired by the neural network model and the intermediate calculation data generated by each computing unit may be two-dimensional data, or multi-dimensional data of three or more dimensions. Correspondingly, the shape of multi-dimensional data with three or more dimensions is a three-dimensional shape, and the shape of two-dimensional data is a planar shape. For planar shapes, if the dimensional information (i.e., length and width) of two shapes differs, they are regarded as different shapes; for example, if planar shape A is 64×64 pixels and planar shape B is 32×32 pixels, the two are different shapes.
S102, sending the first output shape to a second computing unit, so that the second computing unit calculates second output data according to the first output shape and the first output data of the first computing unit; wherein the second computing unit is a downstream computing unit adjacent to the first computing unit; the shape deriving unit is in heterogeneous relation with the first calculating unit and the second calculating unit.
Taking the above technical solution as an example, after deriving and acquiring the first output shape of the number 3 computing unit, the shape deriving unit sends the first output shape to the number 4 computing unit (i.e. the second computing unit) in a parameter transmission manner; the No. 3 calculation unit calculates and acquires first output data according to the acquired first input data and the first input shape, and then sends the first output data to the No. 4 calculation unit; the calculation unit No. 4 calculates output data (i.e., second output data) based on the first output data and the first output shape.
While the number 4 computing unit computes output data, the shape deriving unit derives and obtains the output shape of the number 4 computing unit according to the input shape of the number 4 computing unit and the data computing mode of the number 4 computing unit, and sends the output shape of the number 4 computing unit to the number 5 computing unit; and the No. 5 calculation unit is used for continuously executing data calculation according to the output data and the output shape of the No. 4 calculation unit until the end calculation unit in the calculation unit sequence completes the data calculation, and the neural network model completes the data calculation task.
In particular, each computing unit in the neural network model obtains its output data through tensor calculation, while the shape deriving unit obtains the output shape through scalar calculation. Because the amount of computation in scalar calculation is smaller than that in tensor calculation, the shape deriving unit's derivation of a computing unit's output shape (from the input shape and the calculation mode) and that computing unit's computation of its output data (from the input data and the input shape) start at the same time, but the former finishes earlier than the latter.
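For illustration only, the overlap can be pictured with two concurrent workers; this is a sketch using Python threads with invented names, whereas the claimed design is a heterogeneous hardware arrangement:

```python
import threading
import queue
import time

shape_q = queue.Queue()   # shape deriving unit -> computing unit No. 4
data_q = queue.Queue()    # computing unit No. 3 -> computing unit No. 4

def shape_deriving_unit(input_shape):
    # Scalar work is light, so the shape arrives before the tensor result.
    shape_q.put(tuple(input_shape))

def compute_unit_3(input_data):
    time.sleep(0.01)      # stand-in for the (slower) tensor calculation
    data_q.put(input_data)

t1 = threading.Thread(target=shape_deriving_unit, args=((64, 64, 64),))
t2 = threading.Thread(target=compute_unit_3, args=("tensor-bytes",))
t1.start()
t2.start()

# Computing unit No. 4 typically receives the shape first, then the data.
first_output_shape = shape_q.get()
first_output_data = data_q.get()
t1.join()
t2.join()
print(first_output_shape, first_output_data)
```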
In the conventional technical scheme, the scalar computing function for shape derivation is configured inside the computing units, which is equivalent to configuring a shape deriving module in each computing unit; each computing unit first performs tensor calculation and then performs scalar calculation serially to complete the output shape derivation. Every computing unit must therefore possess both a tensor computing function and a scalar computing function, which makes the hardware structure design of the computing units more complex. By configuring the shape deriving unit under a heterogeneous structure, the scalar calculation can adopt an independent hardware design, which reduces the design complexity of each computing unit and optimizes the pipeline design of the calculation process.
Meanwhile, in the conventional technical scheme, each computing unit needs to perform data calculation and shape derivation serially, which makes each computing unit time-consuming. By executing the shape derivation asynchronously through the shape deriving unit, the time consumed by each computing unit is reduced from the original sum of the tensor calculation time and the scalar calculation time to the tensor calculation time alone, which greatly improves the data processing speed of the neural network model and maximally compresses the idle time between calculation tasks caused by dynamic shapes.
In addition, compared with tensor calculation, scalar calculation carries a relatively large calculation cost, and in the conventional technical scheme a shape deriving unit must be configured in every computing unit, making the calculation cost of the whole neural network model extremely large. Configuring the shape deriving unit heterogeneously avoids placing a shape deriving unit inside each computing unit and greatly reduces the overall calculation cost of the neural network model. In particular, by abstracting a two-level heterogeneous computing mode, the scalar computing function and the tensor computing function are separated, which simplifies the programming model and brings programming convenience for subsequently supporting AOT (ahead-of-time) execution of operator computation.
Optionally, in an embodiment of the present invention, the obtaining a first output shape according to the first input shape of the first input data and the data calculation mode of the first calculation unit specifically includes: and acquiring a first output shape according to the data calculation mode of the first calculation unit and a first input shape of the first input data stored locally, and storing the first output shape locally.
Specifically, in the conventional technical scheme, because the shape deriving units are distributed among different computing units, transferring output shapes between shape deriving units has to go through a global storage unit; that is, after the former shape deriving unit writes output shape A into the global storage unit, the latter shape deriving unit reads output shape A from the global storage unit in order to calculate the current output shape B from it. In the present application, the shape deriving unit stores its result nearby immediately after performing the scalar calculation, i.e., the shape derivation result is stored locally, which greatly reduces the latency caused by writing and reading shape derivation results and improves the shape derivation efficiency.
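For illustration only, the nearby-storage idea can be sketched as follows; the patent describes hardware-local storage, not a Python dictionary, and all names are hypothetical:

```python
# Sketch of nearby storage: the shape deriving unit keeps all derived shapes
# in its own local store instead of a global storage unit (names hypothetical).
class ShapeDerivingUnit:
    def __init__(self):
        self._local_shapes = {}                       # unit id -> derived output shape

    def derive(self, unit_id, upstream_id, op_kind):
        input_shape = self._local_shapes[upstream_id]  # local read, no global fetch
        if op_kind != "elementwise":
            raise NotImplementedError("toy model: only shape-preserving operators")
        output_shape = tuple(input_shape)              # elementwise: shape preserved
        self._local_shapes[unit_id] = output_shape     # local write, stored nearby
        return output_shape

sdu = ShapeDerivingUnit()
sdu._local_shapes[2] = (64, 64)    # output shape already derived for unit No. 2
print(sdu.derive(3, upstream_id=2, op_kind="elementwise"))   # (64, 64)
```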
Optionally, in an embodiment of the present invention, the obtaining a first output shape according to the first input shape of the first input data and the data calculation mode of the first calculation unit further includes: acquiring predicted calculation time consumption of the first calculation unit according to a first input shape of the first input data and a data calculation mode of the first calculation unit, and acquiring matched target hardware resources according to the predicted calculation time consumption; acquiring a first output shape based on a first input shape of the first input data and a data calculation mode of the first calculation unit through the target hardware resource; wherein the target hardware resource comprises a processor or processor core.
In particular, the time a computing unit spends on tensor calculation differs significantly with the shape of the input data: for example, when the shape of the input data is 128×128 pixels, the tensor calculation takes a long time, whereas when the shape of the input data is 32×32 pixels, it takes a short time. When the shape deriving unit performs scalar calculation, however, this time difference does not exist; the calculation times are very close on the premise of using the same hardware resources. Furthermore, since different computing units perform different types of data calculation, the calculation time differs even for input data of the same shape; for example, convolution calculation takes significantly longer than addition calculation.
For the shape deriving unit, the input shape must be transmitted to the next computing unit before that unit acquires its input data (the output data of the previous computing unit), ensuring that a computing unit never waits for its input shape after acquiring its input data. Therefore, the shape deriving unit may obtain the predicted calculation time of the first computing unit according to the first input shape of the first input data and the data calculation mode of the first computing unit, and then obtain matched target hardware resources according to that predicted calculation time.
When the shape deriving unit includes a plurality of processors, the target hardware resource may be a number of processors, i.e., the current derivation is completed using more or fewer processors; when the shape deriving unit includes a multi-core processor, the target hardware resource may be a number of processor cores. If the predicted calculation time of the first computing unit is long, the current shape derivation may also take a relatively long time, and only a small amount of target hardware resources needs to be used.
If the predicted calculation time of the first computing unit is short, the current shape derivation must also complete within a short time, and a larger amount of hardware resources is required to ensure a fast derivation speed. In this way, on the premise that the shape derivation result is delivered to each computing unit in time and waiting delays are avoided, hardware resources are used reasonably: excessive occupation of hardware resources is avoided and their utilization is improved.
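For illustration only, the inverse relation between the predicted tensor time and the resources granted to the derivation can be sketched as follows; the thresholds and the cost model below are invented for the example:

```python
def pick_target_cores(predicted_tensor_ms, total_cores=8):
    """More cores when the deadline is tight, fewer when it is loose."""
    if predicted_tensor_ms < 1.0:    # tensor finishes fast: derivation must too
        return total_cores
    if predicted_tensor_ms < 10.0:
        return max(2, total_cores // 2)
    return 1                         # plenty of slack: one core suffices

def predict_tensor_ms(input_shape, op_kind):
    # Toy cost model: time grows with element count, convolution costs more.
    elements = 1
    for d in input_shape:
        elements *= d
    factor = 5.0 if op_kind == "conv" else 1.0
    return factor * elements / 1e6

print(pick_target_cores(predict_tensor_ms((128, 128, 64), "conv")))  # 4 cores
print(pick_target_cores(predict_tensor_ms((32, 32), "add")))         # 8 cores
```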
According to the technical scheme, when the first computing unit acquires the first input data, the shape deriving unit sends the first output shape to the second computing unit according to the first input shape of the first input data and the data computing mode of the first computing unit, so that the second computing unit calculates the second output data according to the first output shape and the first output data. The scalar calculation is completed through independent hardware design, the design complexity of the hardware structure of each calculation unit is reduced, the pipeline design of the calculation process is optimized, meanwhile, the tensor calculation and scalar calculation which are executed in series are changed into parallel execution, the calculation time consumption of each calculation unit is reduced, the data processing speed of the neural network model is greatly improved, and the idle time between calculation tasks caused by dynamic shapes is compressed to the greatest extent.
Example two
Fig. 2 is a flowchart of a data processing method according to a second embodiment of the present invention, where the output shape can be obtained through a shape mapping table based on the first embodiment. As shown in fig. 2, the method includes:
s201, responding to the first input data acquired by the first computing unit, and acquiring a matched first output shape through a shape mapping table according to a first input shape of the first input data and a sequence identifier of the first computing unit.
The shape mapping table is a pre-constructed mapping between a computing unit's sequence identifier, an input shape, and an output shape. The sequence identifier represents the identity of the computing unit; the shape of the current computing unit's output data for the current input shape can be obtained by querying the table. When the neural network model contains few computing units, or accepts only a few initial input shapes, the number of mappings recorded in the table is small, and a table lookup is clearly faster than dynamic derivation. In such cases, querying the shape mapping table improves the efficiency of obtaining output shapes while saving the hardware resources occupied by the shape deriving unit.
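For illustration only, the table can be read as a map from a (sequence identifier, input shape) pair to an output shape; the entries below are made up:

```python
# Hypothetical shape mapping table: (unit sequence id, input shape) -> output shape.
shape_mapping_table = {
    (3, (64, 64, 64)): (64, 64, 64),   # unit No. 3, matrix sum: shape preserved
    (4, (64, 64, 64)): (64, 64),       # unit No. 4: reduces one dimension
}

def lookup_output_shape(unit_id, input_shape):
    return shape_mapping_table.get((unit_id, tuple(input_shape)))

print(lookup_output_shape(3, (64, 64, 64)))   # (64, 64, 64)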
Optionally, in an embodiment of the present invention, the obtaining, according to the first input shape of the first input data and the sequence identifier of the first computing unit, a matched first output shape through a shape mapping table specifically includes: obtaining a matched shape mapping table according to the task type of the current computing task; wherein the task type includes at least one of a video processing task, a picture processing task, a text processing task, and a voice processing task.
Specifically, different types of computing tasks involve different sets of possible data shapes. For a video processing task, for example, the data shape fed into the neural network model is usually fixed, with only a limited number of possible shapes; the video shape mapping table corresponding to video processing tasks therefore contains little data, and the output shape of each computing unit can be obtained quickly by querying it.
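For illustration only, selecting the table by task type reduces to a dispatch on the task kind; the contents below are hypothetical:

```python
# One pre-built shape mapping table per task type (entries are hypothetical).
shape_tables_by_task = {
    "video": {(1, (1920, 1080, 3)): (1920, 1080, 3)},   # few fixed video shapes
    "picture": {(1, (64, 64)): (64, 64), (1, (32, 32)): (32, 32)},
}

def table_for_task(task_type):
    return shape_tables_by_task[task_type]

print(table_for_task("video"))
```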
Optionally, in an embodiment of the present invention, the data processing method further includes: respectively constructing a matched first type shape mapping table according to each alternative input shape; wherein, the first type shape mapping table records the mapping relation between the sequence identification of the computing unit and the output shape; or respectively constructing a matched second type shape mapping table according to the sequence identification of each calculation unit; wherein the second type shape mapping table records the mapping relation between the input shape and the output shape.
Specifically, each input shape (i.e., an alternative input shape) that may be acquired may be used as a first query condition, and a shape mapping table (i.e., a first type shape mapping table) is respectively constructed for each alternative shape, where the first type shape mapping table reflects a mapping relationship between the computing unit and the output shape; for part of input shapes, each computing unit may acquire the same output shape for the input shape, so that by combining the first type shape mapping tables under a plurality of input shapes, the number of traversals of the shape mapping tables is reduced, the data query efficiency is improved, and the shape deriving efficiency of the shape deriving unit is further improved.
In addition, each computing unit may be used as a first query condition to construct a shape mapping table (a second type shape mapping table) for each input shape, where the second type shape mapping table reflects a mapping relationship between the input shape and the output shape; for part of the computing units, the same output shape can be acquired for the same input shape, so that the traversing number of the shape mapping tables is reduced by combining the second type shape mapping tables under the plurality of computing units, the data query efficiency is improved, and the shape deriving efficiency of the shape deriving unit is further improved.
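For illustration only, the two layouts differ only in which key is fixed first; in sketch form, with hypothetical entries:

```python
# First type: one table per alternative input shape; key is the unit's sequence id.
first_type_tables = {
    (64, 64): {1: (64, 64), 2: (32, 32)},      # tables for input shape 64x64
    (32, 32): {1: (32, 32), 2: (16, 16)},      # tables for input shape 32x32
}

# Second type: one table per computing unit; key is the input shape.
second_type_tables = {
    1: {(64, 64): (64, 64), (32, 32): (32, 32)},
    2: {(64, 64): (32, 32), (32, 32): (16, 16)},
}

# Both layouts answer the same query.
assert first_type_tables[(64, 64)][2] == second_type_tables[2][(64, 64)]
```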
S202, sending the first output shape to a second computing unit, so that the second computing unit calculates second output data according to the first output shape and the first output data of the first computing unit; wherein the second computing unit is a downstream computing unit adjacent to the first computing unit; the shape deriving unit is in heterogeneous relation with the first calculating unit and the second calculating unit.
According to the technical scheme of the embodiment of the invention, when the first computing unit acquires the first input data, the shape deriving unit acquires the matched first output shape through the shape mapping table according to the first input shape of the first input data and the sequence identifier of the first computing unit, so that when the number of computing units of the neural network model is small or the number of applicable initial input shapes is small, the shape mapping table is queried, the acquisition efficiency of the output shape is improved, and meanwhile, the hardware resources occupied by the shape deriving unit are saved.
Example III
Fig. 3 is a flowchart of a data processing method according to a third embodiment of the present invention, where, in the third embodiment, when a first computing unit obtains first input data, a shape deriving unit first determines whether a first output shape matching a first input shape of the first input data and a sequence identifier of the first computing unit exists in a shape mapping table. As shown in fig. 3, the method includes:
s301, in response to the first computing unit obtaining the first input data, judging whether a first output shape matched with a first input shape of the first input data and a sequence identifier of the first computing unit exists in the shape mapping table.
S302, if the first output shape exists in the shape mapping table, acquiring the first output shape through the shape mapping table.
S303, if it is determined that the first output shape does not exist in the shape mapping table, acquiring the first output shape according to the first input shape of the first input data and the data calculation mode of the first calculation unit, and updating the first output shape into the shape mapping table.
If the first output shape exists in the shape mapping table, the first output shape is directly acquired according to the shape mapping table, so that the acquisition efficiency of the output shape is improved, and the hardware resources occupied by the shape deriving unit are saved; if the first output shape does not exist in the shape mapping table, the first output shape is obtained according to the first input shape of the first input data and the data calculation mode of the first calculation unit, and the first output shape is updated to the shape mapping table, so that when the first output shape does not exist in the shape mapping table, the matched output shape is obtained through a dynamic shape derivation technology, and the integrity of a shape derivation result is ensured.
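For illustration only, this table-first, derive-on-miss behaviour amounts to memoizing the dynamic derivation; in the sketch below, derive_dynamically stands in for the dynamic shape inference step and all names are hypothetical:

```python
def get_first_output_shape(table, unit_id, input_shape, derive_dynamically):
    key = (unit_id, tuple(input_shape))
    if key in table:                       # hit: read the stored result
        return table[key]
    output_shape = derive_dynamically(unit_id, input_shape)  # miss: derive
    table[key] = output_shape              # update the table for next time
    return output_shape

table = {}
derive = lambda unit_id, shape: tuple(shape)   # toy rule: shape-preserving op
print(get_first_output_shape(table, 3, (64, 64), derive))  # derived, then cached
print(table)                                               # {(3, (64, 64)): (64, 64)}
```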
S304, the first output shape is sent to a second computing unit, so that the second computing unit calculates second output data according to the first output shape and the first output data of the first computing unit; wherein the second computing unit is a downstream computing unit adjacent to the first computing unit; the shape deriving unit is in heterogeneous relation with the first calculating unit and the second calculating unit.
According to the technical scheme, when the first computing unit acquires first input data, if the first computing unit judges that a first output shape exists in the shape mapping table, the shape deducing unit directly acquires the first output shape through the shape mapping table, and if the first output shape does not exist in the shape mapping table, the first output shape is acquired according to the first input shape and the data computing mode of the first computing unit, and the first output shape is updated into the shape mapping table. The method not only realizes the rapid acquisition of the output shape through the shape mapping table and saves the hardware resources occupied by the shape deducing unit, but also realizes the acquisition of the matched output shape through the dynamic shape deducing technology and ensures the integrity of the shape deducing result.
Specific application scenario 1
Fig. 4 is a schematic view of the processing flow of a neural network model for dynamic input shapes in the conventional technical solution, and fig. 5 is a schematic view of the processing flow of a neural network model for dynamic input shapes in an embodiment of the present invention. Both take as an example a neural network model containing five computing units, i.e., computing units No. 1 to No. 5.
As shown in fig. 4, in the conventional technical solution, a shape deriving unit needs to be inserted between every two computing units that have a data transmission relationship, so that the shape deriving unit derives the input shape of the latter operator from the output data of the former operator.
As shown in fig. 5, in the neural network model according to the embodiment of the present invention, initial input data of a current data processing task may reach a No. 1 calculation unit in the neural network model; meanwhile, when the initial input data is acquired, the shape deducing unit sends the shape of the initial input data (namely the initial input shape) to the No. 1 computing unit in a parameter transmission mode; in particular, since the calculation unit No. 1 does not have an upstream calculation unit, the shape deriving unit can obtain the initial input shape directly according to the initial input data at this time, and does not need to perform shape derivation, so that the actual time consumption of the process is extremely short; the calculation unit No. 1 may be regarded as acquiring initial input data and an initial input shape at the same time, and performing corresponding data calculation based on the acquired initial input data and initial input shape.
While the computing unit 1 obtains the initial input data and the initial input shape and performs data computation based on the initial input data and the initial input shape, the shape deriving unit derives and obtains the output shape of the computing unit 1 based on the initial input shape and the data computation mode of the computing unit 1 through a dynamic shape deriving (Dynamic Shape Inference) technology; since there are two downstream computing units, i.e., computing unit No. 2 and computing unit No. 3, the shape deriving unit needs to send the output shape of computing unit No. 1 to computing unit No. 2 and computing unit No. 3.
Computing unit No. 2 then computes its output data according to the output data and output shape of computing unit No. 1 and transmits that output data to computing unit No. 4; meanwhile, the shape deriving unit also transmits the output shape of computing unit No. 2 to computing unit No. 4.
Similarly, computing unit No. 3 computes its output data according to the output data and output shape of computing unit No. 1 and transmits that output data to computing units No. 4 and No. 5; meanwhile, the shape deriving unit also transmits the output shape of computing unit No. 3 to computing units No. 4 and No. 5.
The No. 5 computing unit obtains output data according to the output data and the output shape of the No. 4 computing unit and the output data and the output shape of the No. 3 computing unit; the output data is the final output data of the neural network model.
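For illustration only, the dataflow of fig. 5 can be summarized as a small graph; the sketch below assumes, purely for simplicity, that every operator preserves its input shape:

```python
# Downstream edges of the five-unit model described above:
# 1 -> {2, 3}, 2 -> {4}, 3 -> {4, 5}, 4 -> {5}; unit No. 5 yields the final output.
edges = {1: [2, 3], 2: [4], 3: [4, 5], 4: [5], 5: []}

def run_model(initial_shape):
    shapes = {1: tuple(initial_shape)}       # the shape deriving unit seeds unit No. 1
    for unit in sorted(edges):               # units are numbered in topological order
        for downstream in edges[unit]:
            # The derived shape is forwarded ahead of the (slower) tensor result.
            shapes.setdefault(downstream, shapes[unit])
    return shapes[5]                         # final output shape of the model

print(run_model((64, 64, 64)))
```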
According to the technical scheme, scalar calculation is completed through independent hardware design, so that design complexity of hardware structures of all calculation units is reduced, pipeline design of a calculation process is optimized, simultaneously, tensor calculation and scalar calculation which are executed in series are changed into parallel execution, calculation time consumption of all calculation units is reduced, data processing speed of a neural network model is greatly improved, and idle time among calculation tasks caused by dynamic shapes is compressed to the greatest extent.
Example IV
Fig. 6 is a block diagram of a data processing apparatus according to a fourth embodiment of the present invention, where the data processing apparatus specifically includes:
an output shape obtaining module 601, configured to obtain a first output shape according to a first input shape of first input data and a data calculation mode of the first calculating unit in response to the first calculating unit obtaining the first input data;
An output shape sending module 602, configured to send the first output shape to a second computing unit, so that the second computing unit calculates second output data according to the first output shape and first output data of the first computing unit; wherein the second computing unit is a downstream computing unit adjacent to the first computing unit; the shape deriving unit is in heterogeneous relation with the first calculating unit and the second calculating unit.
According to the technical scheme, when the first computing unit acquires the first input data, the shape deriving unit sends the first output shape to the second computing unit according to the first input shape of the first input data and the data computing mode of the first computing unit, so that the second computing unit calculates the second output data according to the first output shape and the first output data. The scalar calculation is completed through independent hardware design, the design complexity of the hardware structure of each calculation unit is reduced, the pipeline design of the calculation process is optimized, meanwhile, the tensor calculation and scalar calculation which are executed in series are changed into parallel execution, the calculation time consumption of each calculation unit is reduced, the data processing speed of the neural network model is greatly improved, and the idle time between calculation tasks caused by dynamic shapes is compressed to the greatest extent.
Optionally, the output shape obtaining module 601 is specifically configured to obtain a first output shape according to the data calculation mode of the first calculating unit and a first input shape of the first input data that is stored locally, and store the first output shape locally.
Optionally, the output shape obtaining module 601 is specifically configured to obtain, according to a first input shape of the first input data and a data calculation manner of the first calculation unit, a predicted calculation time consumption of the first calculation unit, and obtain, according to the predicted calculation time consumption, a matched target hardware resource; acquiring a first output shape based on a first input shape of the first input data and a data calculation mode of the first calculation unit through the target hardware resource; wherein the target hardware resource comprises a processor or processor core.
Optionally, the output shape obtaining module 601 is further configured to obtain, according to the first input shape of the first input data and the sequence identifier of the first computing unit, a matched first output shape through a shape mapping table.
Optionally, the output shape obtaining module 601 is further configured to obtain a matched shape mapping table according to a task type of a current computing task; wherein the task type includes at least one of a video processing task, a picture processing task, a text processing task, and a voice processing task.
Optionally, the data processing apparatus further includes:
the shape mapping table construction module is used for respectively constructing a matched first type shape mapping table according to each alternative input shape; wherein, the first type shape mapping table records the mapping relation between the sequence identification of the computing unit and the output shape; or respectively constructing a matched second type shape mapping table according to the sequence identification of each calculation unit; wherein the second type shape mapping table records the mapping relation between the input shape and the output shape.
Optionally, the output shape obtaining module 601 is further configured to determine whether a first output shape matching the first input shape of the first input data and the sequence identifier of the first computing unit exists in the shape mapping table; if the first output shape exists in the shape mapping table, acquiring the first output shape through the shape mapping table; if it is determined that the first output shape does not exist in the shape mapping table, acquiring the first output shape according to the first input shape of the first input data and the data calculation mode of the first calculation unit, and updating the first output shape into the shape mapping table.
The data processing device provided by the invention can execute the data processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method. Technical details not described in detail in this embodiment may be referred to the data processing method provided in any embodiment of the present invention.
Example five
Fig. 7 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 7, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as data processing methods.
In some embodiments, the data processing method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as a storage unit. In some embodiments, part or all of the computer program may be loaded and/or installed onto the heterogeneous hardware accelerator via the ROM and/or the communication unit. One or more of the steps of the data processing method described above may be performed when the computer program is loaded into RAM and executed by a processor. Alternatively, in other embodiments, the processor may be configured to perform the data processing method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a heterogeneous hardware accelerator having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or a trackball) through which a user can provide input to the heterogeneous hardware accelerator. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of traditional physical hosts and VPS (Virtual Private Server) services, namely high management difficulty and weak service scalability.
It should be appreciated that the various forms of flow shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It should be apparent to those skilled in the art that various modifications, combinations, sub-combinations, and substitutions are possible, depending on design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of the present invention.

Claims (10)

1. A data processing method, applied to a shape deriving unit, comprising:
in response to first input data being acquired by a first computing unit, acquiring a first output shape according to a first input shape of the first input data and a data calculation mode of the first computing unit; and
transmitting the first output shape to a second computing unit, so that the second computing unit calculates second output data according to the first output shape and first output data of the first computing unit; wherein the second computing unit is a downstream computing unit adjacent to the first computing unit, and the shape deriving unit is heterogeneous with respect to the first computing unit and the second computing unit.
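For illustration only (not part of the claim language), the following minimal Python sketch models the flow of claim 1 under invented names: every class, function, shape, and calculation mode below is a hypothetical assumption, not taken from the patent.

```python
# Illustrative sketch only; all names are hypothetical. A shape-deriving
# unit computes the first output shape from the first input shape and the
# first unit's data calculation mode, then forwards that shape to the
# adjacent downstream (second) computing unit.
from typing import Callable, Tuple

Shape = Tuple[int, ...]

class ShapeDerivingUnit:
    def derive_output_shape(self, input_shape: Shape,
                            calc_mode: Callable[[Shape], Shape]) -> Shape:
        # Shape derivation uses only the input shape and the calculation
        # mode; the input data itself is never touched.
        return calc_mode(input_shape)

def matmul_shape(in_shape: Shape, k: int = 128) -> Shape:
    # Example calculation mode: (batch, n) x (n, k) -> (batch, k).
    return (in_shape[0], k)

deriver = ShapeDerivingUnit()
first_output_shape = deriver.derive_output_shape((32, 512), matmul_shape)
# The derived shape (32, 128) would be transmitted to the second computing
# unit, which combines it with the first unit's output data.
print(first_output_shape)
```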
2. The method according to claim 1, wherein the acquiring a first output shape according to the first input shape of the first input data and the data calculation mode of the first computing unit specifically comprises:
acquiring the first output shape according to the data calculation mode of the first computing unit and a locally stored first input shape of the first input data, and storing the first output shape locally.
3. The method according to claim 1, wherein the acquiring a first output shape according to the first input shape of the first input data and the data calculation mode of the first computing unit further comprises:
acquiring a predicted computation time of the first computing unit according to the first input shape of the first input data and the data calculation mode of the first computing unit, and acquiring a matched target hardware resource according to the predicted computation time; and
acquiring, by means of the target hardware resource, the first output shape based on the first input shape of the first input data and the data calculation mode of the first computing unit; wherein the target hardware resource comprises a processor or a processor core.
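As a hedged sketch of the resource-matching step in claim 3: the cost model, threshold, and resource names below are assumptions invented for this example.

```python
# Hypothetical sketch of claim 3: predict the computation time from the
# input shape and calculation mode, then match a hardware resource
# (a processor or a processor core). The cost model and threshold are
# invented for illustration.
from math import prod

def predict_cost(input_shape, ops_per_element=2):
    # Toy cost model: predicted time grows with the input element count.
    return prod(input_shape) * ops_per_element

def select_resource(predicted_cost, threshold=1_000_000):
    # Heavier derivations get a dedicated core; light ones share a core.
    return "dedicated_core" if predicted_cost > threshold else "shared_core"

resource = select_resource(predict_cost((32, 512)))  # -> "shared_core"
```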
4. The method of claim 1, further comprising, after the first computing unit acquires the first input data:
acquiring a matched first output shape through a shape mapping table according to the first input shape of the first input data and a sequence identifier of the first computing unit.
5. The method according to claim 4, wherein the acquiring, according to the first input shape of the first input data and the sequence identifier of the first computing unit, the matched first output shape through the shape mapping table specifically comprises:
acquiring a matched shape mapping table according to a task type of the current computing task; wherein the task type comprises at least one of a video processing task, a picture processing task, a text processing task, and a voice processing task.
6. The method of claim 4, wherein the data processing method further comprises:
constructing, for each alternative input shape, a matched first-type shape mapping table, wherein the first-type shape mapping table records a mapping relationship between sequence identifiers of computing units and output shapes; or
constructing, for the sequence identifier of each computing unit, a matched second-type shape mapping table, wherein the second-type shape mapping table records a mapping relationship between input shapes and output shapes.
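Purely as an illustration of the two table layouts in claim 6: the dictionary encoding and all shapes below are assumptions; only the key/value relationships come from the claim text.

```python
# First-type table: one table per alternative input shape, mapping a
# computing unit's sequence identifier to an output shape.
first_type_table = {     # hypothetical table for input shape (32, 512)
    0: (32, 128),        # unit with sequence id 0 -> output shape
    1: (32, 64),         # unit with sequence id 1 -> output shape
}

# Second-type table: one table per computing unit (sequence identifier),
# mapping an input shape to an output shape.
second_type_table = {    # hypothetical table for the unit with id 0
    (32, 512): (32, 128),
    (64, 512): (64, 128),
}
```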
7. The method according to claim 1, wherein the acquiring a first output shape according to the first input shape of the first input data and the data calculation mode of the first computing unit specifically comprises:
determining whether a first output shape matching the first input shape of the first input data and a sequence identifier of the first computing unit exists in a shape mapping table;
if the first output shape exists in the shape mapping table, acquiring the first output shape from the shape mapping table; and
if the first output shape does not exist in the shape mapping table, acquiring the first output shape according to the first input shape of the first input data and the data calculation mode of the first computing unit, and updating the shape mapping table with the first output shape.
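A minimal sketch of the lookup-then-derive flow of claim 7, which is essentially memoization over a shape mapping table keyed by sequence identifier and input shape; all names and the dictionary representation are assumptions.

```python
# Hypothetical sketch of claim 7: look up the output shape in the shape
# mapping table; on a miss, derive it from the calculation mode and write
# it back so later queries with the same key hit the table.
from typing import Callable, Dict, Tuple

Shape = Tuple[int, ...]
Key = Tuple[int, Shape]          # (sequence identifier, input shape)

shape_table: Dict[Key, Shape] = {}

def lookup_or_derive(seq_id: int, input_shape: Shape,
                     calc_mode: Callable[[Shape], Shape]) -> Shape:
    key = (seq_id, input_shape)
    if key in shape_table:        # hit: return the stored output shape
        return shape_table[key]
    out = calc_mode(input_shape)  # miss: derive from the calculation mode
    shape_table[key] = out        # update the shape mapping table
    return out
```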
8. A data processing apparatus, applied to a shape deriving unit, comprising:
an output shape acquisition module, configured to acquire, in response to first input data being acquired by a first computing unit, a first output shape according to a first input shape of the first input data and a data calculation mode of the first computing unit; and
an output shape transmission module, configured to transmit the first output shape to a second computing unit, so that the second computing unit calculates second output data according to the first output shape and first output data of the first computing unit; wherein the second computing unit is a downstream computing unit adjacent to the first computing unit, and the shape deriving unit is heterogeneous with respect to the first computing unit and the second computing unit.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data processing method of any one of claims 1-7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions that, when executed, cause a processor to implement the data processing method of any one of claims 1-7.
CN202311475799.5A 2023-11-07 2023-11-07 Data processing method and device, electronic equipment and storage medium Pending CN117610623A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311475799.5A CN117610623A (en) 2023-11-07 2023-11-07 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117610623A 2024-02-27

Family

ID=89958738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311475799.5A Pending CN117610623A (en) 2023-11-07 2023-11-07 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117610623A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination