CN112740174B - Data processing method, device, electronic equipment and computer readable storage medium


Info

Publication number
CN112740174B
CN112740174B (application CN201880097405.8A)
Authority
CN
China
Prior art keywords
layer
task
current
data
data processing
Prior art date
Legal status
Active
Application number
CN201880097405.8A
Other languages
Chinese (zh)
Other versions
CN112740174A (en)
Inventor
蒋国跃
张力
袁航剑
Current Assignee
Bitmain Technologies Inc
Original Assignee
Bitmain Technologies Inc
Priority date
Filing date
Publication date
Application filed by Bitmain Technologies Inc filed Critical Bitmain Technologies Inc
Publication of CN112740174A
Application granted
Publication of CN112740174B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline or look ahead


Abstract

A data processing method, apparatus, electronic device and computer readable storage medium, the method comprising: acquiring cyclic processing parameters, wherein the cyclic processing parameters comprise the number of processing cycles and the number of data processing layers required in each cycle; acquiring layered data processing tasks, wherein the layered data processing tasks comprise layered computing tasks and layered data migration tasks; and cyclically processing the layered data processing tasks according to the cyclic processing parameters. The technical scheme achieves high data processing efficiency and minimizes time overhead while preserving computation quality, thereby effectively saving computing and storage resources.

Description

Data processing method, device, electronic equipment and computer readable storage medium
Technical Field
The embodiment of the invention relates to the technical field of data processing, in particular to a data processing method, a data processing device, electronic equipment and a computer readable storage medium.
Background
With the development of science and technology, the neural network processor has in recent years become one of the most active research fields. Both academia and industry have proposed neural network processors with their own architectures. These processors mainly run algorithms such as convolutional and recurrent neural networks, which have achieved great success in image, video and speech recognition. However, a trained neural network has a large number of coefficients, so when optimizing performance a neural network processor must consider not only the utilization of its arithmetic units but also the overhead of data transfer, since having data in place is a precondition for computation. To reduce data transfer overhead, the current common practice is to prefetch the data required for the next round of computation while the current computation runs. In the prior art, however, this usually requires manually written programs, and only computation within the same layer of the neural network is overlapped with data prefetching. Prior-art schemes are therefore inefficient, cannot adapt to the ever-growing variety of neural network structures, and leave performance optimization severely limited.
Disclosure of Invention
The embodiment of the invention provides a data processing method, a data processing device, electronic equipment and a computer readable storage medium.
In a first aspect, an embodiment of the present invention provides a data processing method.
Specifically, the data processing method includes:
acquiring cyclic processing parameters, wherein the cyclic processing parameters comprise the number of processing cycles and the number of data processing layers required in each cycle;
acquiring a layered data processing task, wherein the layered data processing task comprises a layered calculation task and a layered data migration task;
and carrying out cyclic processing on the layered data processing task according to the cyclic processing parameters.
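The three steps above can be sketched as a small driver loop. This is a hypothetical illustration: the function names and the callback signatures are ours, standing in for the concrete task acquisition and execution the method defines later.

```python
def run(cyclic_params, get_layer_tasks, execute):
    """Cyclically process layered tasks per the cyclic processing parameters.

    cyclic_params   : (number of cycles, data processing layers per cycle)
    get_layer_tasks : returns the layered tasks for one (cycle, layer) pair
    execute         : runs a layer's tasks (in parallel where permitted)
    """
    num_cycles, num_layers = cyclic_params
    for cycle in range(num_cycles):
        for layer in range(num_layers):
            execute(get_layer_tasks(cycle, layer))
```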
With reference to the first aspect, in a first implementation manner of the first aspect, the hierarchical data migration task includes input data to be migrated of a next layer of computing task in a current cycle; or,
the layered data migration task comprises to-be-migrated input data of a next-layer computing task in a current cycle and to-be-migrated output data of a last-layer computing task in a last cycle; or,
the layered data migration task comprises input data to be migrated of a first-layer computing task in the next cycle.
With reference to the first aspect and the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the acquiring a hierarchical data processing task includes:
determining whether the current cycle is a first cycle, an intermediate cycle or a last cycle;
when the current cycle is the first cycle, determining whether the current data processing layer number is the first layer, the middle layer or the last layer;
when the current data processing layer number is the first layer, acquiring a current layer computing task, and acquiring a current layer input data migration task and a next layer input data migration task as current layer data migration tasks; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a first layer input data migration task in the next cycle as a current layer data migration task;
when the current cycle is an intermediate cycle, determining whether the current data processing layer number is a first layer, an intermediate layer or a last layer;
when the current data processing layer number is the first layer, acquiring a current layer calculation task, and acquiring a next layer input data migration task and a last layer output data migration task in the last cycle as the current layer data migration task; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a first layer input data migration task in the next cycle as a current layer data migration task;
when the current cycle is the last cycle, determining whether the current data processing layer number is a first layer, an intermediate layer or a last layer;
when the current data processing layer number is the first layer, acquiring a current layer calculation task, and acquiring a next layer input data migration task and a last layer output data migration task in the last cycle as the current layer data migration task; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a current layer output data migration task as a current layer data migration task.
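The case analysis above can be condensed into a single selection function. This is a non-authoritative sketch: the task strings and the zero-based indexing are ours, and "middle layer" here means any layer that is neither the first nor the last.

```python
def acquire_layer_tasks(cycle, num_cycles, layer, num_layers):
    """Return (compute_task, migration_tasks) for one layer of one cycle,
    following the case analysis in the second implementation manner."""
    compute = f"compute(cycle={cycle}, layer={layer})"
    migrations = []
    first_cycle, last_cycle = cycle == 0, cycle == num_cycles - 1
    first_layer, last_layer = layer == 0, layer == num_layers - 1

    if first_layer:
        if first_cycle:
            # first cycle, first layer: this layer's own input must be fetched
            migrations.append("migrate_in(layer=0)")
        else:
            # later cycles: flush the previous cycle's last-layer output
            migrations.append(
                f"migrate_out(cycle={cycle - 1}, layer={num_layers - 1})")
        migrations.append("migrate_in(layer=1)")
    elif not last_layer:
        # any middle layer: prefetch the next layer's input
        migrations.append(f"migrate_in(layer={layer + 1})")
    else:
        if last_cycle:
            # last layer of the last cycle: only its own output remains
            migrations.append(f"migrate_out(cycle={cycle}, layer={layer})")
        else:
            # prefetch the first-layer input of the next cycle
            migrations.append(f"migrate_in(cycle={cycle + 1}, layer=0)")
    return compute, migrations
```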
With reference to the first aspect, the first implementation manner of the first aspect, and the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the performing, according to the loop processing parameter, loop processing on the hierarchical data processing task includes:
determining whether the current data processing layer number is a first data processing layer of a first cycle or a last data processing layer of a last cycle;
when the current data processing layer number is the first data processing layer of the first cycle, firstly processing a current layer input data migration task in a current layer data migration task, and then processing a next layer input data migration task and a current layer calculation task in the current layer data migration task in parallel;
when the current data processing layer number is the last data processing layer of the last cycle, sequentially processing a current layer calculation task and a current layer data migration task;
and when the current data processing layer number is neither the first data processing layer of the first cycle nor the last data processing layer of the last cycle, processing the current layer calculation task and the current layer data migration task in parallel.
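Condensed, the scheduling rule distinguishes only three positions in the whole run. The sketch below is hedged: the returned strings merely describe the ordering, with "||" marking parallel execution.

```python
def schedule(cycle, num_cycles, layer, num_layers):
    """Ordering rule from the third implementation manner: only the very
    first and very last data processing layers of the whole run need a
    sequential step; everything else is fully overlapped."""
    if cycle == 0 and layer == 0:
        # first layer of first cycle: its own input must arrive first
        return ["migrate current-layer input",
                "compute || migrate next-layer input"]
    if cycle == num_cycles - 1 and layer == num_layers - 1:
        # last layer of last cycle: output can only leave after computing
        return ["compute", "migrate current-layer output"]
    return ["compute || migrate current-layer migration tasks"]
```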
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, and the third implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the method further includes:
and optimizing the cyclic processing of the hierarchical data processing task according to the time overhead of the hierarchical data processing task.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, and the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the optimizing the cyclic processing of the layered data processing task according to the time overhead of the layered data processing task includes:
acquiring the calculation time cost of the layered calculation task and the data migration time cost of the layered data migration task;
determining an adjustable range of the layered data migration task;
and adjusting the execution time of the layered data processing task according to the calculation time cost of the layered calculation task, the data migration time cost of the layered data migration task and the adjustable range of the layered data migration task, so that the total time cost of the data processing is minimum.
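As an illustrative cost model for this optimization (the patent leaves the cost model abstract, so the function name, the pipelining assumptions and the unit costs below are ours), the total time of one pipelined flow can be written as a sequential prologue and epilogue plus, per layer, the longer of the compute cost and the migration cost overlapped with it:

```python
def total_overhead(compute_costs, migration_costs, prologue, epilogue):
    """Total time of one pipelined flow.

    compute_costs[i]   : cost of layer i's computing task
    migration_costs[i] : cost of the migration overlapped with layer i
    prologue/epilogue  : sequential migrations that cannot be overlapped
    """
    overlapped = sum(max(c, m) for c, m in zip(compute_costs, migration_costs))
    return prologue + overlapped + epilogue
```

Minimizing the total then amounts to shifting each migration's execution time within its adjustable range so that no max(compute, migration) term contains an avoidable wait.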
In a second aspect, an embodiment of the present invention provides a data processing apparatus.
Specifically, the data processing apparatus includes:
the first acquisition module is configured to acquire cyclic processing parameters, wherein the cyclic processing parameters comprise the number of processing cycles and the number of data processing layers required in each cycle;
the second acquisition module is configured to acquire a hierarchical data processing task, wherein the hierarchical data processing task comprises a hierarchical calculation task and a hierarchical data migration task;
and the processing module is configured to perform cyclic processing on the layered data processing task according to the cyclic processing parameters.
With reference to the second aspect, in a first implementation manner of the second aspect, the hierarchical data migration task includes to-be-migrated input data of a next-layer computing task in a current cycle; or,
the layered data migration task comprises to-be-migrated input data of a next-layer computing task in a current cycle and to-be-migrated output data of a last-layer computing task in a last cycle; or,
the layered data migration task comprises input data to be migrated of a first-layer computing task in the next cycle.
With reference to the second aspect and the first implementation manner of the second aspect, in a second implementation manner of the second aspect, the second obtaining module is configured to:
determining whether the current cycle is a first cycle, an intermediate cycle or a last cycle;
when the current cycle is the first cycle, determining whether the current data processing layer number is the first layer, the middle layer or the last layer;
when the current data processing layer number is the first layer, acquiring a current layer computing task, and acquiring a current layer input data migration task and a next layer input data migration task as current layer data migration tasks; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a first layer input data migration task in the next cycle as a current layer data migration task;
when the current cycle is an intermediate cycle, determining whether the current data processing layer number is a first layer, an intermediate layer or a last layer;
when the current data processing layer number is the first layer, acquiring a current layer calculation task, and acquiring a next layer input data migration task and a last layer output data migration task in the last cycle as the current layer data migration task; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a first layer input data migration task in the next cycle as a current layer data migration task;
when the current cycle is the last cycle, determining whether the current data processing layer number is a first layer, an intermediate layer or a last layer;
when the current data processing layer number is the first layer, acquiring a current layer calculation task, and acquiring a next layer input data migration task and a last layer output data migration task in the last cycle as the current layer data migration task; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a current layer output data migration task as a current layer data migration task.
With reference to the second aspect, the first implementation manner of the second aspect, and the second implementation manner of the second aspect, in a third implementation manner of the second aspect, the processing module is configured to:
determining whether the current data processing layer number is a first data processing layer of a first cycle or a last data processing layer of a last cycle;
when the current data processing layer number is the first data processing layer of the first cycle, firstly processing a current layer input data migration task in a current layer data migration task, and then processing a next layer input data migration task and a current layer calculation task in the current layer data migration task in parallel;
when the current data processing layer number is the last data processing layer of the last cycle, sequentially processing a current layer calculation task and a current layer data migration task;
and when the current data processing layer number is neither the first data processing layer of the first cycle nor the last data processing layer of the last cycle, processing the current layer calculation task and the current layer data migration task in parallel.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, and the third implementation manner of the second aspect, in a fourth implementation manner of the second aspect, the apparatus further includes:
the optimizing module, configured to optimize the cyclic processing of the layered data processing task according to the time overhead of the layered data processing task.
With reference to the second aspect, the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, and the fourth implementation manner of the second aspect, in a fifth implementation manner of the second aspect, the optimizing module includes:
the acquisition sub-module is configured to acquire the calculation time cost of the layered calculation task and the data migration time cost of the layered data migration task;
a determination submodule configured to determine a tunable range of the hierarchical data migration task;
and the adjustment sub-module is configured to adjust the execution time of the layered data processing task according to the calculation time cost of the layered calculation task, the data migration time cost of the layered data migration task and the adjustable range of the layered data migration task so as to minimize the total time cost of the data processing.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor, where the memory is used to store one or more computer instructions that support a data processing apparatus in performing the data processing method of the first aspect, and the processor is configured to execute the computer instructions stored in the memory. The electronic device may further comprise a communication interface for communicating with other devices or a communication network.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium storing the computer instructions used by a data processing apparatus, including the computer instructions involved in performing the data processing method of the first aspect.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
according to the technical scheme, the cyclic cross-layer data automatic migration mechanism is adopted, and the calculation task and the data migration task can be organically combined together for processing. According to the technical scheme, the data processing efficiency is high, and the time expenditure can be reduced to the greatest extent on the premise of ensuring the calculation quality, so that the calculation and storage resources can be effectively saved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of embodiments of the invention.
Drawings
Other features, objects and advantages of embodiments of the present invention will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 shows a flow chart of a data processing method according to an embodiment of the invention;
FIG. 2 illustrates the allocation of hierarchical computing tasks and hierarchical data migration tasks in the prior art;
FIG. 3 illustrates a schematic allocation of hierarchical computing tasks and hierarchical data migration tasks in accordance with an embodiment of the present invention;
FIG. 4 shows a flow chart of a data processing method according to another embodiment of the invention;
fig. 5 shows a flowchart of step S404 of the data processing method according to the embodiment shown in fig. 4;
FIG. 6 shows a block diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 7 shows a block diagram of a data processing apparatus according to another embodiment of the present invention;
FIG. 8 shows a flow chart of an optimization module 704 of the data processing apparatus according to the embodiment shown in FIG. 7;
FIG. 9 shows a block diagram of an electronic device according to an embodiment of the invention;
fig. 10 is a schematic diagram of a computer system suitable for use in implementing a data processing method according to an embodiment of the invention.
Detailed Description
Hereinafter, exemplary implementations of embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. In addition, for the sake of clarity, portions irrelevant to description of the exemplary embodiments are omitted in the drawings.
In embodiments of the invention, it should be understood that terms such as "comprises" or "comprising," etc., are intended to indicate the presence of features, numbers, steps, acts, components, portions, or combinations thereof disclosed in the specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, acts, components, portions, or combinations thereof are present or added.
In addition, it should be noted that, without conflict, the embodiments of the present invention and the features of the embodiments may be combined with each other. Embodiments of the present invention will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments.
The technical scheme provided by the embodiment of the invention adopts a cyclic cross-layer automatic data migration mechanism, so that computing tasks and data migration tasks can be organically combined and processed together. The scheme achieves high data processing efficiency and minimizes time overhead while preserving computation quality, thereby effectively saving computing and storage resources.
Fig. 1 shows a flow chart of a data processing method according to an embodiment of the present invention. The method can be used on a robot terminal and, as shown in fig. 1, includes the following steps S101-S103:
in step S101, cyclic processing parameters are obtained, where the cyclic processing parameters include the number of processing cycles and the number of data processing layers required in each cycle;
in step S102, a hierarchical data processing task is acquired, where the hierarchical data processing task includes a hierarchical computing task and a hierarchical data migration task;
in step S103, according to the loop processing parameter, loop processing is performed on the hierarchical data processing task.
As mentioned above, the neural network processor has in recent years become one of the most active research fields. Although neural network processors have achieved great success in fields such as image, video and speech recognition, a trained neural network has a large number of coefficients, so when optimizing performance the processor must consider not only the utilization of its arithmetic units but also the overhead of data transfer. To reduce this overhead, the current common practice is to prefetch the data required for the next round of computation while the current computation runs. In the prior art, however, this usually requires manually written programs, and only computation within the same layer of the neural network is overlapped with data prefetching. Prior-art schemes are therefore inefficient, cannot adapt to the ever-growing variety of neural network structures, and leave performance optimization severely limited.
In view of the foregoing, this embodiment provides a data processing method that adopts a cyclic cross-layer data migration mechanism and can organically combine computing tasks with data migration tasks. The scheme achieves high data processing efficiency and minimizes time overhead while preserving computation quality, thereby effectively saving computing and storage resources.
It should be noted that, although the technical solution of the present invention is proposed based on the neural network problem, it may also be applied to other technical topics related to loop computation and data migration. For ease of understanding, the invention is explained and illustrated below by taking the calculation of loops in a neural network as an example.
In an optional implementation manner of this embodiment, the cyclic processing parameters include the number of processing cycles and the number of data processing layers required in each cycle, where the latter is determined by the structure of the computation model. For example, for a neural network, the number of cycles may be set to 5 and the number of data processing layers per cycle to 4, depending on the purpose of the current computing task and the structural characteristics of the neural network.
In an optional implementation manner of this embodiment, the hierarchical data processing tasks include hierarchical computing tasks and hierarchical data migration tasks. Data to be migrated refers to data used in each layer's computation that is stored in an external memory, or data that needs to be stored to an external memory. To save computing time, such data must be migrated from the external memory in advance so it can participate in computation, or migrated out after computation; this gives rise to the hierarchical data migration tasks.
Further, to save data processing time, the hierarchical computing tasks and hierarchical data migration tasks are, for the most part, processed in parallel with one another.
In the prior art, the hierarchical computing task and the hierarchical data migration task are processed once rather than cyclically, so for each task processing flow the allocation of hierarchical computing tasks and hierarchical data migration tasks is as shown in fig. 2. Assume the neural network has 4 data processing layers, as shown in the left diagram of fig. 2. Only input data 1 is fed into computing layer 1, the first layer; input data 1 may be stored in an external memory, so it must be migrated from the external memory before participating in the computation of computing layer 1. Computing layer 2 is an intermediate layer whose input, in addition to the output data 2 of computing layer 1, includes input data 3, which is stored in the external memory and needs to be migrated. Computing layer 3 is also an intermediate layer; similar to computing layer 2, its input includes, besides the output data 4 of computing layer 2, input data 5, which is stored in the external memory and needs to be migrated. Computing layer 4 is the last layer; like the intermediate layers, its input includes, besides the output data 6 of computing layer 3, input data 7 stored in the external memory. Unlike the intermediate layers, its output data 8 must be migrated to the external memory, so the migration of output data 8 takes place after the computing task of computing layer 4 completes.
Thus, as shown in the right diagram of fig. 2, for each task processing flow the hierarchical computing tasks comprise the computing tasks of computing layers 1 to 4, and the hierarchical data migration tasks comprise the migrations of input data 1, 3, 5 and 7 and of output data 8, executed in order of time steps. Because input data 1 is the basis of computing layer 1's task, its migration must be scheduled in advance, i.e. completed before the computing task of layer 1 begins; and the migration of output data 8 can only run after the computing task of layer 4 completes and produces output data 8. Consequently, only the migrations of input data 3, 5 and 7 can be processed in parallel with the computing tasks.
Assume each task occupies 1 time unit of overhead. The hierarchical computing tasks then occupy 4 time units in total, while the timeline of the hierarchical data migration tasks spans 6 time units: the migration of input data 1; the migrations of input data 3, 5 and 7, overlapped with the computations of layers 1 to 3; the computation of layer 4, during which the migration track must wait; and the migration of output data 8. Since the time overhead of the whole task processing flow is the maximum of the overheads of the layered computing tasks and the layered data migration tasks, a single task processing flow in the prior art occupies 6 time units.
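With the 1-time-unit assumption above, the single-pass total can be checked arithmetically. The variable names below are ours, not the patent's:

```python
# Prior-art single-pass flow, 1 time unit per task, 4 layers:
# migrate input data 1 (sequential), then the 4 computing layers
# (the migrations of input data 3, 5 and 7 overlap with layers 1-3),
# then migrate output data 8 (sequential).
num_layers = 4
prior_art_flow = 1 + num_layers + 1
assert prior_art_flow == 6  # time units, as stated above
```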
In an alternative implementation of this embodiment, the hierarchical computing tasks and hierarchical data migration tasks are executed cyclically, so that time is fully utilized and data processing efficiency improves. The allocation of tasks in this implementation is shown in fig. 3, again assuming the neural network has 4 data processing layers. As shown in the left diagram of fig. 3, the input/output of each computing layer is essentially the same as in fig. 2, with two differences. First, how input data 1 of computing layer 1 (the first layer) is handled depends on the current cycle: in the first cycle, as in fig. 2, input data 1 must be migrated from the external memory before participating in the computation of layer 1; in any later cycle, input data 1 is the to-be-migrated output of the last computing layer in the previous cycle. Second, the input of the last computing layer 4 is the same as in fig. 2, comprising the output data 6 of layer 3 plus input data 7 to be migrated from the external memory, but its output data 8 is, depending on the current cycle count, either migrated to the external memory or sent to computing layer 1 to participate in the next cycle's computation.
Thus, as shown in the right diagram of fig. 3, the hierarchical computing tasks in the nth cycle again comprise the computing tasks of computing layers 1, 2, 3, and 4, as in fig. 2. Unlike fig. 2, however, the first-layer and last-layer data migration tasks may be interleaved across adjacent cycle flows. For example, the data migration task that can be processed in parallel with the computing-layer-1 computing task is determined by the current cycle number: when the current cycle is the first cycle, the migration task for input data 1 of computing layer 1 is executed first, and then the migration of input data 3 of computing layer 2 is processed in parallel with the computing-layer-1 computing task; when the current cycle is not the first cycle, the migration of input data 3 of computing layer 2 and the migration of output data 8 of the previous cycle's last computing layer are processed in parallel with the computing-layer-1 computing task. As in fig. 2, the migration of computing-layer-3 input data 5 may be processed in parallel with the computing-layer-2 computing task, and the migration of computing-layer-4 input data 7 may be processed in parallel with the computing-layer-3 computing task. Unlike fig. 2, the data migration task that can be processed in parallel with the computing task of the last layer 4 depends on the current cycle number: when the current cycle is the last cycle, no data needs to be migrated for a next layer or a next cycle, since this is the final-stage computation; the computing task of the last layer 4 is therefore executed by itself, and if output data 8 needs to be migrated, its migration is performed after the layer-4 computing task completes. When the current cycle is not the last cycle, the migration of input data 1 of the first layer of the next cycle can be processed in parallel with the layer-4 computing task.
Still assuming that each task occupies 1 time unit, the hierarchical computing tasks in the above implementation still occupy 4 time units in total, but the time occupied by the hierarchical data migration tasks is greatly reduced. Specifically, when the current cycle is the first cycle, the hierarchical data migration tasks occupy only 5 time units in total: the migration times of input data 1, input data 3, input data 5, input data 7, and input data 1 of the next cycle. When the current cycle is an intermediate cycle, the hierarchical data migration tasks occupy only 4 time units in total: the parallel migration time of input data 3 and the previous cycle's output data 8, the migration times of input data 5 and input data 7, and the migration time of input data 1 of the next cycle. When the current cycle is the last cycle, the hierarchical data migration tasks occupy only 5 time units in total: the parallel migration time of input data 3 and the previous cycle's output data 8, the migration times of input data 5 and input data 7, one time unit waiting for the computation of computing layer 4, and the migration time of output data 8. It can be seen that a single task processing flow in this embodiment occupies 1-2 time units less than in the prior art; for a complex neural network structure, many cyclic computations, and massive neural network data, the saved time overhead is very considerable.
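The per-cycle savings can be checked with a short sketch; the 1-unit costs are the same illustrative assumption as above, an entry joined by "+" shares one time unit, and "wait_layer4" marks the unit spent waiting for layer 4 to finish.

```python
# Per-cycle migration timelines under the cyclic scheme of fig. 3.
CYCLE_MIGRATION_TIMELINES = {
    "first": ["in1", "in3", "in5", "in7", "next_cycle_in1"],
    "intermediate": ["in3+prev_out8", "in5", "in7", "next_cycle_in1"],
    "last": ["in3+prev_out8", "in5", "in7", "wait_layer4", "out8"],
}
COMPUTE_UNITS = 4     # layers 1-4, one unit each
PRIOR_ART_UNITS = 6   # single-pass flow from fig. 2

savings = {}
for kind, timeline in CYCLE_MIGRATION_TIMELINES.items():
    # The two engines run in parallel, so each cycle costs the maximum.
    total = max(COMPUTE_UNITS, len(timeline))
    savings[kind] = PRIOR_ART_UNITS - total
print(savings)  # {'first': 1, 'intermediate': 2, 'last': 1}
```

This matches the text: 1 time unit saved in the first and last cycles and 2 in every intermediate cycle, which is where the savings accumulate over many cycles.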
As can be seen from the above, in this implementation the content of a hierarchical data migration task depends on both the layer of the current computing task and the cycle it belongs to, and may differ for different cycle numbers. That is, the hierarchical data migration task may include the input data to be migrated for the next-layer computing task in the current cycle; or the input data to be migrated for the next-layer computing task in the current cycle together with the output data to be migrated of the last-layer computing task in the previous cycle; or the input data to be migrated for the first-layer computing task in the next cycle.
More specifically, in an alternative implementation manner of this embodiment, the step S102, that is, the step of acquiring the hierarchical data processing task, may be implemented as the following steps:
determining whether the current cycle is a first cycle, an intermediate cycle or a last cycle;
when the current cycle is the first cycle, determining whether the current data processing layer number is the first layer, the middle layer or the last layer;
when the current data processing layer number is the first layer, acquiring a current layer computing task, and acquiring a current layer input data migration task and a next layer input data migration task as current layer data migration tasks; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a first layer input data migration task in the next cycle as a current layer data migration task;
When the current cycle is an intermediate cycle, determining whether the current data processing layer number is a first layer, an intermediate layer or a last layer;
when the current data processing layer number is the first layer, acquiring a current layer calculation task, and acquiring a next layer input data migration task and a last layer output data migration task in the last cycle as the current layer data migration task; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a first layer input data migration task in the next cycle as a current layer data migration task;
when the current cycle is the last cycle, determining whether the current data processing layer number is a first layer, an intermediate layer or a last layer;
when the current data processing layer number is the first layer, acquiring a current layer calculation task, and acquiring a next layer input data migration task and a last layer output data migration task in the last cycle as the current layer data migration task; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a current layer output data migration task as a current layer data migration task.
In an optional implementation manner of this embodiment, the step S103, that is, the step of performing the loop processing on the hierarchical data processing task according to the loop processing parameter, may be implemented as the following steps:
determining whether the current data processing layer number is a first data processing layer of a first cycle or a last data processing layer of a last cycle;
when the current data processing layer number is the first data processing layer of the first cycle, firstly processing a current layer input data migration task in a current layer data migration task, and then processing a next layer input data migration task and a current layer calculation task in the current layer data migration task in parallel;
when the current data processing layer number is the last data processing layer of the last cycle, sequentially processing a current layer calculation task and a current layer data migration task;
and when the current data processing layer is neither the first data processing layer of the first cycle nor the last data processing layer of the last cycle, processing the current layer calculation task and the current layer data migration task in parallel.
As mentioned above, since input data 1 is the basis of the computing task of computing layer 1, its migration task must be arranged in advance in the first cycle; that is, the migration of input data 1 must complete before the computing task of computing layer 1 is executed. In the last cycle, since the final-stage computation does not need data migrated for a next layer or a next cycle, the computing task of the last computing layer 4 is executed by itself first and output data 8 is migrated afterwards; that is, the migration of output data 8 must execute after the computing task of computing layer 4 completes. Thus, the execution relationship between the current layer data migration task and the current layer computing task depends on both the cycle number and the layer number.
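The ordering rule can be sketched as a dispatcher that emits an ordered plan, where each plan entry is a set of tasks issued together in parallel; the position labels are the same illustrative stand-ins as before.

```python
def schedule_layer(cycle_pos, layer_pos, compute_task, migration_tasks):
    """Sketch of the execution-ordering rule above.

    Returns a list of sets; tasks in one set are dispatched in parallel,
    and the sets execute in order.
    """
    if cycle_pos == "first" and layer_pos == "first":
        # input data 1 must arrive before layer 1 can start
        current_input, *rest = migration_tasks
        return [{current_input}, {compute_task, *rest}]
    if cycle_pos == "last" and layer_pos == "last":
        # output data 8 only exists after layer 4 finishes
        return [{compute_task}, set(migration_tasks)]
    # every other (cycle, layer) position overlaps compute and migration
    return [{compute_task, *migration_tasks}]
```

Only the two boundary positions serialize; all interior layers overlap their computing task with the corresponding migration tasks, which is where the time savings come from.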
In an optional implementation of this embodiment, the method further includes a step of optimizing the loop processing of the hierarchical data processing tasks according to the time overhead of the hierarchical data processing tasks; that is, as shown in fig. 4, the method includes the following steps S401 to S404:
in step S401, a cyclic processing parameter is acquired, where the cyclic processing parameter includes the number of cyclic processing times and the number of data processing layers required during each cyclic processing;
in step S402, a hierarchical data processing task is acquired, where the hierarchical data processing task includes a hierarchical computing task and a hierarchical data migration task;
in step S403, performing a loop process on the hierarchical data processing task according to the loop processing parameter;
in step S404, loop processing of the hierarchical data processing task is optimized according to the time overhead of the hierarchical data processing task.
In the above, for ease of explanation, it was assumed that every task occupies the same 1 time unit. In reality, since each layer of the neural network has a different structure and different input and output data, the time overhead required by each layer's computing task and each layer's data migration task may differ. Thus, in this embodiment, the loop processing of the hierarchical data processing tasks may be optimized according to the specific time overheads of those tasks.
Specifically, in an optional implementation manner of this embodiment, as shown in fig. 5, step S404, that is, a step of optimizing loop processing of the hierarchical data processing task according to time overhead of the hierarchical data processing task, includes the following steps S501-S503:
in step S501, the computation time overhead of the hierarchical computation task and the data migration time overhead of the hierarchical data migration task are obtained;
in step S502, determining an adjustable range of the hierarchical data migration task;
in step S503, according to the calculation time cost of the hierarchical calculation task, the data migration time cost of the hierarchical data migration task, and the adjustable range of the hierarchical data migration task, the execution time of the hierarchical data processing task is adjusted, so that the total time cost of the data processing is minimized.
In this embodiment, first, specific computing time overhead of each layer of computing task in the layered computing task and specific data migration time overhead of each layer of data migration task in the layered data migration task are obtained; then determining the adjustable range of the layered data migration task; and finally, optimizing the cyclic processing of the layered data processing task according to the calculation time cost of the layered calculation task, the data migration time cost of the layered data migration task and the adjustable range of the layered data migration task, for example, adjusting the execution time of the layered data processing task so as to minimize the total time cost of the data processing.
The adjustable range of a hierarchical data migration task refers to the range over which its execution time can vary without affecting the data processing flow. For example, for a data migration-in task, the adjustable range is T0 ~ Tc, where T0 represents the execution time of the first task in the data processing flow to which the migration task belongs and Tc represents the currently allocated execution time of the migration task; for a data migration-out task, the adjustable range is Tc ~ TL, where TL represents the execution time of the last migration task in the data processing flow to which the migration task belongs, or the estimated end time of that last migration task minus the estimated duration of this migration task.
Taking the intermediate-cycle flow shown in fig. 3 as an example, assume that in the current cycle input data 5 is relatively large and its migration takes 2 time units, while the computing-layer-2 computing task processed in parallel with it needs only 1 time unit; likewise, the computing-layer-3 computing task executed in the next time unit, and the migration of input data 7 originally planned to be processed in parallel with it, each need 1 time unit. In this case, the migration of input data 7 can be advanced so that it is processed in parallel with the migration of input data 5: the computing-layer-2 computing task and the migration of input data 5 are processed in parallel first, and after the computing-layer-2 computing task completes, the computing-layer-3 computing task is executed; the migration of input data 7 may then be processed in parallel with the computing-layer-2 computing task and the migration of input data 5 during the first time unit of that migration, or in parallel with the computing-layer-3 computing task and the migration of input data 5 during its second time unit. In this way, differing time overheads among the hierarchical computing tasks or data migration tasks do not increase the time overhead of the whole data processing flow, achieving the goal of minimizing the total time overhead of data processing.
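The benefit of advancing a migration inside its adjustable range can be sketched with a scenario where the adjustment actually changes the makespan. The numbers here are illustrative assumptions (unit compute layers, a 2-unit migration for input data 7, concurrent DMA transfers allowed), not the exact costs from the patent's example.

```python
def makespan(in7_start):
    """Compute-engine timeline for layers 1-4 (1 unit each), with layer 4
    gated on input data 7 arriving; in7 takes 2 units from in7_start."""
    t = 0
    for _ in range(3):          # layers 1-3 run back to back
        t += 1
    in7_done = in7_start + 2
    start4 = max(t, in7_done)   # layer 4 waits for its input data
    return start4 + 1

late = makespan(in7_start=2)    # planned slot: parallel with layer 3 only
early = makespan(in7_start=1)   # advanced: overlapping layer 2 as well
print(late, early)  # 5 4 -> advancing the large migration saves 1 unit
```

When the migration is left in its originally planned slot, the 2-unit transfer outlives the 1-unit layer-3 computation and delays layer 4; pulling it one unit earlier, within its adjustable range, hides the extra unit behind layer 2 and keeps the flow at its minimum length.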
The following are apparatus embodiments of the present invention, which may be used to perform the method embodiments of the present invention.
Fig. 6 shows a block diagram of a data processing apparatus according to an embodiment of the invention, which may be implemented as part or all of an electronic device by software, hardware or a combination of both. As shown in fig. 6, the data processing apparatus includes:
a first obtaining module 601, configured to obtain a cyclic processing parameter, where the cyclic processing parameter includes a cyclic processing number of times and a number of data processing layers required during each cyclic processing;
a second acquisition module 602 configured to acquire layered data processing tasks, wherein the layered data processing tasks include a layered computing task and a layered data migration task;
and the processing module 603 is configured to perform cyclic processing on the layered data processing task according to the cyclic processing parameter.
As mentioned above, with the development of science and technology, the neural network processor has in recent years become one of the hottest research fields. Although neural network processors have achieved great success in fields such as image, video, and speech recognition, a trained neural network has a very large number of coefficients, so when optimizing performance a neural network processor must consider not only the utilization of its arithmetic units but also the overhead generated by data transfers. To reduce data transfer overhead, the current common practice is to prefetch the data required for the next round of computation while computing. However, in the prior art this usually requires manual programming, and the parallelism achieved is only between computation and data prefetching within the same layer of the neural network. The prior-art scheme is therefore inefficient, cannot adapt to the constantly emerging variety of neural network structures, and its optimization of performance is greatly limited.
In view of the foregoing, in this embodiment, a data processing apparatus is provided that employs a cyclic cross-layer data automatic migration mechanism, and is capable of organically combining a computing task with a data migration task for processing. According to the technical scheme, the data processing efficiency is high, and the time expenditure can be reduced to the greatest extent on the premise of ensuring the calculation quality, so that the calculation and storage resources can be effectively saved.
It should be noted that, although the technical solution of the present invention is proposed for the neural network problem, it may also be applied to other technical topics involving loop computation and data migration. For ease of understanding, the invention is explained and illustrated below by taking loop computation in a neural network as an example.
In an optional implementation of this embodiment, the loop processing parameters include the number of loop processing iterations and the number of data processing layers required during each iteration, where the latter is related to the structure of the computation model. For example, for the neural network, depending on the purpose of the current computing task and the structural characteristics of the network, the number of loops may be set to 5 and the number of data processing layers required per loop to 4.
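As a minimal sketch of these parameters, using the example values from the text (the dictionary keys are assumed names, not identifiers from the patent):

```python
# Illustrative loop-processing parameters for the example above.
loop_params = {
    "num_cycles": 5,         # number of loop-processing iterations
    "layers_per_cycle": 4,   # data processing layers required per iteration
}

# Over the whole run, the processor issues one compute task per layer per cycle.
total_compute_tasks = loop_params["num_cycles"] * loop_params["layers_per_cycle"]
print(total_compute_tasks)  # 20
```

These two values are all the scheduler needs to enumerate every (cycle, layer) position and select the matching migration tasks.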
In an optional implementation of this embodiment, the hierarchical data processing tasks include hierarchical computing tasks and hierarchical data migration tasks. The data to be migrated refers to data used in each layer's computation that is stored in an external memory, or data that needs to be stored into an external memory; to save computing time, such data must be migrated in from the external memory in advance to participate in the computation, or migrated out after the computation, which gives rise to the hierarchical data migration tasks.
Further, in order to save data processing time, most of the hierarchical computing tasks and the hierarchical data migration tasks are correspondingly processed in parallel.
In the prior art, the hierarchical computing tasks and the hierarchical data migration tasks are processed once rather than cyclically. For each task processing flow, the allocation of the hierarchical computing tasks and the hierarchical data migration tasks is therefore as shown in fig. 2. Assume the number of data processing layers of the neural network is 4. As shown in the left diagram of fig. 2, computing layer 1, the first layer, has only input data 1, which may be stored in an external memory, so input data 1 needs to be migrated in from the external memory before participating in the computation of computing layer 1. Computing layer 2 is an intermediate layer; besides output data 2 of computing layer 1, its input includes input data 3, which is stored in an external memory and needs to be migrated. Computing layer 3 is also an intermediate layer; similarly to computing layer 2, besides output data 4 of computing layer 2, the input of computing layer 3 includes input data 5, stored in an external memory and needing migration. Computing layer 4 is the last layer; its input is like that of the intermediate layers, including, besides output data 6 of computing layer 3, input data 7 stored in an external memory and needing migration. Unlike the intermediate layers, its output data 8 needs to be migrated to the external memory, so output data 8 must be migrated after the computing task of computing layer 4 completes.
Thus, as shown in the right diagram of fig. 2, for each task processing flow the hierarchical computing tasks comprise the computing tasks of computing layers 1, 2, 3, and 4, and the hierarchical data migration tasks comprise the migrations of input data 1, input data 3, input data 5, input data 7, and output data 8, executed in order of time steps. Moreover, since input data 1 is the basis of the computing task of computing layer 1, its migration task must be arranged in advance; that is, the migration of input data 1 must complete before the computing task of computing layer 1 executes, and the migration of output data 8 must execute after the computing task of computing layer 4 completes and output data 8 has been generated. Consequently, only the migrations of input data 3, input data 5, and input data 7 can be processed in parallel with computing-layer computing tasks.
The allocation of the hierarchical computing tasks and the hierarchical data migration tasks in this apparatus embodiment, and the resulting time-overhead analysis for the first, intermediate, and last cycles, are the same as described above with reference to fig. 2 and fig. 3 for the method embodiment, and are not repeated here.
More specifically, in an alternative implementation of the present embodiment, the second obtaining module 602 may be configured to:
determining whether the current cycle is a first cycle, an intermediate cycle or a last cycle;
when the current cycle is the first cycle, determining whether the current data processing layer number is the first layer, the middle layer or the last layer;
when the current data processing layer number is the first layer, acquiring a current layer computing task, and acquiring a current layer input data migration task and a next layer input data migration task as current layer data migration tasks; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a first layer input data migration task in the next cycle as a current layer data migration task;
When the current cycle is an intermediate cycle, determining whether the current data processing layer number is a first layer, an intermediate layer or a last layer;
when the current data processing layer number is the first layer, acquiring a current layer calculation task, and acquiring a next layer input data migration task and a last layer output data migration task in the last cycle as the current layer data migration task; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a first layer input data migration task in the next cycle as a current layer data migration task;
when the current cycle is the last cycle, determining whether the current data processing layer number is a first layer, an intermediate layer or a last layer;
when the current data processing layer number is the first layer, acquiring a current layer calculation task, and acquiring a next layer input data migration task and a last layer output data migration task in the last cycle as the current layer data migration task; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a current layer output data migration task as a current layer data migration task.
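The case analysis above (cycle position × layer position) can be sketched as a small Python function. This is purely illustrative: the patent describes modules, not code, and `gather_layer_tasks`, the string task labels, and the zero-based indices are all hypothetical.

```python
def gather_layer_tasks(cycle, layer, num_cycles, num_layers):
    """Return (compute_task, migration_tasks) for the given cycle and layer.

    Tasks are represented as descriptive strings; in a real system they
    would be handles to compute/DMA descriptors. Indices are zero-based.
    """
    compute = f"compute[c{cycle}][l{layer}]"
    migrations = []
    if layer == 0:  # first layer of a cycle
        if cycle == 0:
            # first cycle: also migrate the current layer's own input
            migrations.append(f"in[c{cycle}][l{layer}]")
        else:
            # later cycles: migrate out the previous cycle's last-layer output
            migrations.append(f"out[c{cycle - 1}][l{num_layers - 1}]")
        migrations.append(f"in[c{cycle}][l{layer + 1}]")
    elif layer < num_layers - 1:  # intermediate layer
        migrations.append(f"in[c{cycle}][l{layer + 1}]")
    else:  # last layer of a cycle
        if cycle < num_cycles - 1:
            # prefetch the first layer's input for the next cycle
            migrations.append(f"in[c{cycle + 1}][l0]")
        else:
            # very last layer of the last cycle: migrate its own output
            migrations.append(f"out[c{cycle}][l{layer}]")
    return compute, migrations
```

For example, with 3 cycles of 4 layers, the first layer of the first cycle gathers its own input migration plus the next layer's input migration, while the last layer of the last cycle gathers only its own output migration.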
In an alternative implementation of the present embodiment, the processing module 603 may be configured to:
determining whether the current data processing layer number is a first data processing layer of a first cycle or a last data processing layer of a last cycle;
when the current data processing layer number is the first data processing layer of the first cycle, firstly processing a current layer input data migration task in a current layer data migration task, and then processing a next layer input data migration task and a current layer calculation task in the current layer data migration task in parallel;
when the current data processing layer number is the last data processing layer of the last cycle, sequentially processing a current layer calculation task and a current layer data migration task;
and when the current data processing layer number is not the first data processing layer of the first cycle and the last data processing layer of the last cycle, processing the current layer calculation task and the current layer data migration task in parallel.
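The three scheduling rules of the processing module can be sketched as follows (illustrative only; `schedule_layer` is a hypothetical name, tasks are plain strings, and each yielded list represents a group of tasks executed in parallel). The first migration in `migrations` is assumed to be the current layer's own input migration, matching the acquisition order described above.

```python
def schedule_layer(cycle, layer, compute, migrations, num_cycles, num_layers):
    """Yield groups of tasks; tasks within one group run in parallel."""
    is_first = (cycle == 0 and layer == 0)
    is_last = (cycle == num_cycles - 1 and layer == num_layers - 1)
    if is_first:
        # the layer's own input must land before its compute can start;
        # the remaining migration then overlaps the compute
        own_input, rest = migrations[0], migrations[1:]
        yield [own_input]
        yield [compute] + rest
    elif is_last:
        # no further layer depends on this output, so compute first,
        # then migrate the result out
        yield [compute]
        yield list(migrations)
    else:
        # steady state: compute and migration overlap fully
        yield [compute] + list(migrations)
```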
As noted above, because input data 1 is the prerequisite for the computing task of computing layer 1, a data migration task must be scheduled in advance to migrate input data 1 when the loop first runs; that is, the migration of input data 1 must complete before the computing task of computing layer 1 executes. In the last cycle, because the final stage of computation need not migrate data for a next layer or a next cycle, the computing task of the last computing layer 4 must first execute on its own, and output data 8 is migrated afterward; that is, the migration of output data 8 executes only after the computing task of computing layer 4 completes. The execution relationship between the current-layer data migration task and the current-layer computing task is therefore determined by the cycle index and the layer index.
In an alternative implementation of this embodiment, as shown in fig. 7, the apparatus further includes a part for optimizing the loop processing of the layered data processing tasks according to their time overhead. That is, as shown in fig. 7, the apparatus includes:
a first obtaining module 701, configured to obtain a cyclic processing parameter, where the cyclic processing parameter includes a cyclic processing number of times and a number of data processing layers required during each cyclic processing;
a second acquisition module 702 configured to acquire layered data processing tasks, wherein the layered data processing tasks include a layered computing task and a layered data migration task;
a processing module 703 configured to perform a loop process on the layered data processing task according to the loop processing parameter;
an optimization module 704 configured to optimize loop processing of the hierarchical data processing tasks according to the time overhead of the hierarchical data processing tasks.
In the description above, for convenience of explanation, each task was assumed to occupy the same time overhead of one time unit. In practice, because each layer of the neural network has a different structure and different input and output data, the time overhead required by each layer's computing task and each layer's data migration task may differ. In this embodiment, therefore, the loop processing of the layered data processing tasks may be optimized according to the actual time overhead of the layered data processing tasks.
Specifically, in an alternative implementation of the present embodiment, as shown in fig. 8, the optimization module 704 includes:
an obtaining sub-module 801 configured to obtain a computation time overhead of the hierarchical computation task and a data migration time overhead of the hierarchical data migration task;
a determination submodule 802 configured to determine a tunable range of the hierarchical data migration task;
an adjustment sub-module 803 configured to adjust the execution time of the hierarchical data processing task according to the calculation time overhead of the hierarchical calculation task, the data migration time overhead of the hierarchical data migration task, and the adjustable range of the hierarchical data migration task, so that the total time overhead of the data processing is minimum.
In this embodiment, the obtaining sub-module 801 obtains the specific computing time overhead of each layer's computing task among the layered computing tasks and the specific data migration time overhead of each layer's data migration task among the layered data migration tasks; the determination sub-module 802 determines the adjustable range of the layered data migration tasks; and the adjustment sub-module 803 optimizes the loop processing of the layered data processing tasks according to the computing time overhead of the layered computing tasks, the data migration time overhead of the layered data migration tasks, and the adjustable range of the layered data migration tasks, for example by adjusting the execution times of the layered data processing tasks so that the total time overhead of the data processing is minimized.
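A toy cost model makes the optimization target concrete. The sketch below assumes each layer's compute fully overlaps its paired migration, while the very first input migration and the very last output migration run alone; `pipeline_time` and all costs are hypothetical, not from the patent.

```python
def pipeline_time(compute_costs, migration_costs, first_in, last_out):
    """Toy estimate of the total time of one pipelined pass.

    Each overlapped step lasts as long as the slower of its compute task
    and its migration task; first_in and last_out are the serial head and
    tail migrations (all costs are hypothetical time units).
    """
    overlapped = sum(max(c, m) for c, m in zip(compute_costs, migration_costs))
    return first_in + overlapped + last_out
```

With the uniform one-unit costs assumed earlier, four overlapped steps plus the head and tail migrations give six time units; a slower migration in one step simply stretches that step to the migration's length.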
The adjustable range of a layered data migration task is the range over which its execution time can vary without affecting the data processing flow. For example, for a data migrate-in task, the adjustable range is T0 to Tc, where T0 denotes the execution time of the first task in the data processing flow to which the migrate-in task belongs, and Tc denotes the currently allocated execution time of the migrate-in task. For a data migrate-out task, the adjustable range is Tc to TL, where TL denotes the execution time of the last migrate-out task in the data processing flow to which the migrate-out task belongs, or the estimated end time of that last migrate-out task minus the estimated time the migrate-out task itself takes.
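The two adjustable ranges can be captured in a few lines. This is a minimal sketch using the T0/Tc/TL notation above; the function name and the time-stamp representation are assumptions for illustration.

```python
def adjustable_range(kind, t0, tc, tl):
    """Return (earliest, latest) allowed start for a migration task.

    kind: 'in' for a migrate-in task, 'out' for a migrate-out task.
    t0, tc, tl correspond to T0, Tc, TL (hypothetical time stamps).
    """
    if kind == 'in':
        return (t0, tc)  # a migrate-in may be advanced as early as T0
    return (tc, tl)      # a migrate-out may be delayed as late as TL
```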
Take the intermediate cycle flow shown in fig. 3 as an example. Suppose that in this cycle, input data 5 is relatively large and its migration takes 2 time units, while the computing task of computing layer 2, processed in parallel with it, needs only 1 time unit; the computing task of computing layer 3 executed in the next time unit, and the migration of input data 7 originally planned to run in parallel with that computing task, each also need 1 time unit. In this case, the migration of input data 7 can be advanced to run in parallel with the migration of input data 5. That is, the computing task of computing layer 2 is first processed in parallel with the migration of input data 5, and after the computing task of computing layer 2 completes, the computing task of computing layer 3 executes; the migration of input data 7 may run in parallel with the computing task of computing layer 2 and the migration of input data 5 during the first time unit of input data 5's migration, or in parallel with the computing task of computing layer 3 and the migration of input data 5 during the second time unit of input data 5's migration. In this way, differing time overheads among the layered computing tasks or data migration tasks do not increase the time overhead of the whole data processing flow, achieving the goal of minimizing the total time overhead of the data processing.
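The saving in this example can be checked with simple arithmetic. The sketch below uses the hypothetical per-task costs from the example and assumes, as the text does, that the advanced migration of input data 7 can overlap the migration of input data 5.

```python
# Hypothetical time-unit costs from the example above.
migrate_in5, migrate_in7 = 2, 1
compute_l2, compute_l3 = 1, 1

# Original plan: in5 overlaps layer-2 compute, then in7 overlaps layer-3
# compute; each stage ends when the slower of its two parallel tasks ends.
original = max(migrate_in5, compute_l2) + max(migrate_in7, compute_l3)

# Adjusted plan: in7's migration is advanced into in5's 2-unit window,
# so the elapsed time is bounded by the longest of the two overlapping
# migrations and the two back-to-back computes.
adjusted = max(migrate_in5, migrate_in7, compute_l2 + compute_l3)

print(original, adjusted)  # prints: 3 2
```

Advancing the migration of input data 7 thus shortens this two-layer segment from three time units to two.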
An embodiment of the invention further discloses an electronic device. Fig. 9 shows a block diagram of an electronic device according to an embodiment of the invention; as shown in fig. 9, the electronic device 900 includes a memory 901 and a processor 902; wherein,
the memory 901 is configured to store one or more computer instructions that are executed by the processor 902 to implement any of the method steps described above.
Fig. 10 is a schematic diagram of a computer system suitable for implementing a data processing method according to an embodiment of the present invention.
As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU) 1001, which can execute various processes in the above-described embodiments in accordance with a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for the operation of the system 1000 are also stored. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), etc., and a speaker, etc.; a storage portion 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed as needed in the drive 1010, so that a computer program read out therefrom is installed as needed in the storage section 1008.
In particular, the method described above may be implemented as a computer software program according to an embodiment of the invention. For example, embodiments of the present invention include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the data processing method. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1009, and/or installed from the removable medium 1011.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present invention may be implemented by software, or may be implemented by hardware. The units or modules described may also be provided in a processor, the names of which in some cases do not constitute a limitation of the unit or module itself.
As another aspect, an embodiment of the present invention further provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus described in the above embodiment, or a standalone computer-readable storage medium that is not assembled into a device. The computer-readable storage medium stores one or more programs for use by one or more processors to perform the methods described in embodiments of the present invention.
The above description is only illustrative of the preferred embodiments of the present invention and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present invention is not limited to the specific combination of the above technical features, but also encompasses other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features of similar functions disclosed in the embodiments of the present invention.

Claims (12)

1. A method of data processing, comprising:
acquiring a cyclic processing parameter, wherein the cyclic processing parameter comprises cyclic processing times and the number of data processing layers required during each cyclic processing;
acquiring a layered data processing task, wherein the layered data processing task comprises a layered calculation task and a layered data migration task;
performing cyclic processing on the layered data processing task according to the cyclic processing parameters;
and performing cyclic processing on the layered data processing task according to the cyclic processing parameter, wherein the cyclic processing comprises the following steps:
determining whether the current data processing layer number is a first data processing layer of a first cycle or a last data processing layer of a last cycle;
when the current data processing layer number is the first data processing layer of the first cycle, firstly processing a current layer input data migration task in a current layer data migration task, and then processing a next layer input data migration task and a current layer calculation task in the current layer data migration task in parallel;
when the current data processing layer number is the last data processing layer of the last cycle, sequentially processing a current layer calculation task and a current layer data migration task;
And when the current data processing layer number is not the first data processing layer of the first cycle and the last data processing layer of the last cycle, processing the current layer calculation task and the current layer data migration task in parallel.
2. The method of claim 1, wherein the hierarchical data migration task comprises input data to be migrated for a next layer of computing tasks in a current cycle; or,
the layered data migration task comprises to-be-migrated input data of a next-layer computing task in a current cycle and to-be-migrated output data of a last-layer computing task in a last cycle; or,
the layered data migration task comprises input data to be migrated of a first-layer computing task in the next cycle.
3. The method according to claim 1 or 2, wherein the acquiring hierarchical data processing tasks comprises:
determining whether the current cycle is a first cycle, an intermediate cycle or a last cycle;
when the current cycle is the first cycle, determining whether the current data processing layer number is the first layer, the middle layer or the last layer;
when the current data processing layer number is the first layer, acquiring a current layer computing task, and acquiring a current layer input data migration task and a next layer input data migration task as current layer data migration tasks; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a first layer input data migration task in the next cycle as a current layer data migration task;
When the current cycle is an intermediate cycle, determining whether the current data processing layer number is a first layer, an intermediate layer or a last layer;
when the current data processing layer number is the first layer, acquiring a current layer calculation task, and acquiring a next layer input data migration task and a last layer output data migration task in the last cycle as the current layer data migration task; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a first layer input data migration task in the next cycle as a current layer data migration task;
when the current cycle is the last cycle, determining whether the current data processing layer number is a first layer, an intermediate layer or a last layer;
when the current data processing layer number is the first layer, acquiring a current layer calculation task, and acquiring a next layer input data migration task and a last layer output data migration task in the last cycle as the current layer data migration task; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a current layer output data migration task as a current layer data migration task.
4. A method according to claim 3, further comprising:
and optimizing the cyclic processing of the hierarchical data processing task according to the time overhead of the hierarchical data processing task.
5. The method of claim 4, wherein optimizing the loop processing of the hierarchical data processing task based on the time overhead of the hierarchical data processing task comprises:
acquiring the calculation time cost of the layered calculation task and the data migration time cost of the layered data migration task;
determining an adjustable range of the layered data migration task;
and adjusting the execution time of the layered data processing task according to the calculation time cost of the layered calculation task, the data migration time cost of the layered data migration task and the adjustable range of the layered data migration task, so that the total time cost of the data processing is minimum.
6. A data processing apparatus, comprising:
the first acquisition module is configured to acquire cyclic processing parameters, wherein the cyclic processing parameters comprise cyclic processing times and the number of data processing layers required during each cyclic processing;
the second acquisition module is configured to acquire a layered data processing task, wherein the layered data processing task comprises a layered calculation task and a layered data migration task;
the processing module is configured to perform cyclic processing on the layered data processing task according to the cyclic processing parameters;
the processing module is configured to:
determining whether the current data processing layer number is a first data processing layer of a first cycle or a last data processing layer of a last cycle;
when the current data processing layer number is the first data processing layer of the first cycle, firstly processing a current layer input data migration task in a current layer data migration task, and then processing a next layer input data migration task and a current layer calculation task in the current layer data migration task in parallel;
when the current data processing layer number is the last data processing layer of the last cycle, sequentially processing a current layer calculation task and a current layer data migration task;
and when the current data processing layer number is not the first data processing layer of the first cycle and the last data processing layer of the last cycle, processing the current layer calculation task and the current layer data migration task in parallel.
7. The apparatus of claim 6, wherein the hierarchical data migration task comprises input data to be migrated for a next layer of computing tasks in a current cycle; or,
the layered data migration task comprises to-be-migrated input data of a next-layer computing task in a current cycle and to-be-migrated output data of a last-layer computing task in a last cycle; or,
the layered data migration task comprises input data to be migrated of a first-layer computing task in the next cycle.
8. The apparatus of claim 6 or 7, wherein the second acquisition module is configured to:
determining whether the current cycle is a first cycle, an intermediate cycle or a last cycle;
when the current cycle is the first cycle, determining whether the current data processing layer number is the first layer, the middle layer or the last layer;
when the current data processing layer number is the first layer, acquiring a current layer computing task, and acquiring a current layer input data migration task and a next layer input data migration task as current layer data migration tasks; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a first layer input data migration task in the next cycle as a current layer data migration task;
When the current cycle is an intermediate cycle, determining whether the current data processing layer number is a first layer, an intermediate layer or a last layer;
when the current data processing layer number is the first layer, acquiring a current layer calculation task, and acquiring a next layer input data migration task and a last layer output data migration task in the last cycle as the current layer data migration task; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a first layer input data migration task in the next cycle as a current layer data migration task;
when the current cycle is the last cycle, determining whether the current data processing layer number is a first layer, an intermediate layer or a last layer;
when the current data processing layer number is the first layer, acquiring a current layer calculation task, and acquiring a next layer input data migration task and a last layer output data migration task in the last cycle as the current layer data migration task; when the current data processing layer number is the middle layer, acquiring a current layer computing task, and acquiring a next layer input data migration task as a current layer data migration task; when the current data processing layer number is the last layer, acquiring a current layer calculation task, and acquiring a current layer output data migration task as a current layer data migration task.
9. The apparatus as recited in claim 8, further comprising:
and the optimizing module is configured to optimize the cyclic processing of the layered data processing task according to the time overhead of the layered data processing task.
10. The apparatus of claim 9, wherein the optimization module comprises:
the acquisition sub-module is configured to acquire the calculation time cost of the layered calculation task and the data migration time cost of the layered data migration task;
a determination submodule configured to determine a tunable range of the hierarchical data migration task;
and the adjustment sub-module is configured to adjust the execution time of the layered data processing task according to the calculation time cost of the layered calculation task, the data migration time cost of the layered data migration task and the adjustable range of the layered data migration task so as to minimize the total time cost of the data processing.
11. An electronic device comprising a memory and a processor; wherein,
the memory is for storing one or more computer instructions, wherein the one or more computer instructions are executable by the processor to implement the method steps of any one of claims 1-5.
12. A computer readable storage medium having stored thereon computer instructions, which when executed by a processor, implement the method steps of any of claims 1-5.
CN201880097405.8A 2018-10-17 2018-10-17 Data processing method, device, electronic equipment and computer readable storage medium Active CN112740174B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/110658 WO2020077565A1 (en) 2018-10-17 2018-10-17 Data processing method and apparatus, electronic device, and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112740174A CN112740174A (en) 2021-04-30
CN112740174B true CN112740174B (en) 2024-02-06

Family

ID=70283216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880097405.8A Active CN112740174B (en) 2018-10-17 2018-10-17 Data processing method, device, electronic equipment and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN112740174B (en)
WO (1) WO2020077565A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101493797A (en) * 2009-03-12 2009-07-29 华为技术有限公司 Data dynamic migration method and device
CN104778074A (en) * 2014-01-14 2015-07-15 腾讯科技(深圳)有限公司 Calculation task processing method and device
CN107563512A (en) * 2017-08-24 2018-01-09 腾讯科技(上海)有限公司 A kind of data processing method, device and storage medium
CN107908477A (en) * 2017-11-17 2018-04-13 郑州云海信息技术有限公司 A kind of data processing method and device for radio astronomy data

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8380958B2 (en) * 2010-11-09 2013-02-19 International Business Machines Corporation Spatial extent migration for tiered storage architecture
CN106250103A (en) * 2016-08-04 2016-12-21 东南大学 A kind of convolutional neural networks cyclic convolution calculates the system of data reusing
US9959498B1 (en) * 2016-10-27 2018-05-01 Google Llc Neural network instruction set architecture
CN108597539B (en) * 2018-02-09 2021-09-03 桂林电子科技大学 Speech emotion recognition method based on parameter migration and spectrogram

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101493797A (en) * 2009-03-12 2009-07-29 华为技术有限公司 Data dynamic migration method and device
CN104778074A (en) * 2014-01-14 2015-07-15 腾讯科技(深圳)有限公司 Calculation task processing method and device
CN107563512A (en) * 2017-08-24 2018-01-09 腾讯科技(上海)有限公司 A kind of data processing method, device and storage medium
CN107908477A (en) * 2017-11-17 2018-04-13 郑州云海信息技术有限公司 A kind of data processing method and device for radio astronomy data

Also Published As

Publication number Publication date
CN112740174A (en) 2021-04-30
WO2020077565A1 (en) 2020-04-23

Similar Documents

Publication Publication Date Title
US20200089535A1 (en) Data sharing system and data sharing method therefor
US10817260B1 (en) Reducing dynamic power consumption in arrays
CN110262901B (en) Data processing method and data processing system
US20190065938A1 (en) Apparatus and Methods for Pooling Operations
Gao et al. Deep neural network task partitioning and offloading for mobile edge computing
Lee et al. Improving scalability of parallel CNN training by adjusting mini-batch size at run-time
CN116991560B (en) Parallel scheduling method, device, equipment and storage medium for language model
CN115994567B (en) Asynchronous scheduling method for parallel computing tasks of deep neural network model
US11816061B2 (en) Dynamic allocation of arithmetic logic units for vectorized operations
JPWO2019082859A1 (en) Inference device, convolution operation execution method and program
CN116684420A (en) Cluster resource scheduling method, device, cluster system and readable storage medium
WO2017185248A1 (en) Apparatus and method for performing auto-learning operation of artificial neural network
JP2022512211A (en) Image processing methods, equipment, in-vehicle computing platforms, electronic devices and systems
CN112862083B (en) Deep neural network inference method and device in edge environment
CN112740174B (en) Data processing method, device, electronic equipment and computer readable storage medium
US20190130274A1 (en) Apparatus and methods for backward propagation in neural networks supporting discrete data
CN113741999A (en) Dependency-oriented task unloading method and device based on mobile edge calculation
US20180089555A1 (en) Neural network device and method of operating neural network device
CN112817741A (en) DNN task control method for edge calculation
Gerogiannis et al. Deep reinforcement learning acceleration for real-time edge computing mixed integer programming problems
DE112022000723T5 (en) BRANCHING PROCESS FOR A CIRCUIT OF A NEURONAL PROCESSOR
CN113747504A (en) Method and system for multi-access edge computing combined task unloading and resource allocation
CN112668639A (en) Model training method and device, server and storage medium
CN116011593B (en) Method and device for determining energy consumption of network model
US20230195531A1 (en) Energy-aware task scheduling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant