CN108984426B - Method and apparatus for processing data - Google Patents


Info

Publication number
CN108984426B
CN108984426B
Authority
CN
China
Prior art keywords
target
channel group
determining
data
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810878625.6A
Other languages
Chinese (zh)
Other versions
CN108984426A (en)
Inventor
李振鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201810878625.6A
Publication of CN108984426A
Application granted
Publication of CN108984426B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893 Caches characterised by their organisation or structure
    • G06F12/0895 Caches characterised by their organisation or structure of parts of caches, e.g. directory or tag array
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 Providing a specific technical effect
    • G06F2212/1016 Performance improvement
    • G06F2212/1024 Latency reduction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

An embodiment of the present application discloses a method and apparatus for processing data. One embodiment of the method comprises: selecting a target channel group from at least one preset channel group, where the at least one channel group is obtained by grouping, in advance, the channels included in a target layer of a preset convolutional neural network; and performing the following loading steps: loading the data included in the target channel group into a target cache; determining whether processing of the data included in the target channel group is complete; and, in response to determining that the processing is complete, reselecting a channel group as the target channel group from among the unselected channel groups of the at least one channel group and continuing the loading steps. This implementation can reduce the number of times data is loaded for the convolutional neural network, which helps improve the network's operating efficiency.

Description

Method and apparatus for processing data
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for processing data.
Background
With the development of computer technology, many fields need to process large-scale data. In fields such as image processing and speech recognition, a convolutional neural network is commonly used to perform operations on large amounts of data. The network's data is usually stored in memory; when some of that data needs to be processed, it must be fetched from memory and loaded into a cache.
Disclosure of Invention
The embodiment of the application provides a method and a device for processing data.
In a first aspect, an embodiment of the present application provides a method for processing data, where the method includes: selecting a target channel group from at least one preset channel group, wherein the at least one channel group is obtained by grouping channels included in a target layer in a preset convolutional neural network in advance; the following loading steps are performed: loading data included in the target channel group into a target cache; determining whether the processing of the data included in the target channel group is completed; and in response to determining that the processing is completed, reselecting the channel group as the target channel group from among the unselected channel groups of the at least one channel group to continue the loading step.
In some embodiments, reselecting a channel group as the target channel group from among the unselected channel groups of the at least one channel group includes: deleting the data included in the target channel group from the target cache, and reselecting a channel group as the target channel group from among the unselected channel groups of the at least one channel group.
In some embodiments, determining whether processing of data included in the target lane group is complete includes: in response to receiving a signal indicative of completion of processing of data included in the target channel group, determining that processing of data included in the target channel group is complete.
In some embodiments, the data in the channels included in the target channel group has corresponding position numbers; and loading the data included in the target channel group into the target cache includes: selecting a position number from the position numbers as a target position number, and performing the following extraction steps: for each channel included in the target channel group, extracting the data corresponding to the target position number from the channel, and loading the extracted data into the cache line corresponding to the target position number in the target cache; determining whether an unselected position number remains among the position numbers; and, in response to determining that one remains, reselecting a position number from the unselected position numbers as the target position number and continuing the extraction steps with the reselected target position number.
In some embodiments, the at least one channel group is determined in advance as follows: determining the channels included in the target layer as ungrouped channels, and performing the following grouping steps: determining whether the number of ungrouped channels is less than or equal to a preset number; in response to determining that it is less than or equal to the preset number, determining the ungrouped channels as a channel group; in response to determining that it is greater than the preset number, determining a preset number of channels as a channel group according to the arrangement order of the channels, re-determining the ungrouped channels from the channels included in the target layer, and continuing the grouping steps with the re-determined ungrouped channels.
In some embodiments, the preset number is determined in advance as follows: determining the storage-space size of a cache line of the target cache; dividing that size by the storage space occupied by a single datum in the convolutional neural network, and determining the result as the preset number.
In a second aspect, an embodiment of the present application provides an apparatus for processing data, the apparatus including: a selecting unit configured to select a target channel group from at least one preset channel group, where the at least one channel group is obtained by grouping, in advance, the channels included in a target layer of a preset convolutional neural network; a loading unit configured to perform the following loading steps: loading the data included in the target channel group into a target cache, and determining whether processing of the data included in the target channel group is complete; and a determining unit configured to, in response to determining that the processing is complete, reselect a channel group as the target channel group from among the unselected channel groups of the at least one channel group to continue the loading steps.
In some embodiments, the determining unit is further configured to: delete the data included in the target channel group from the target cache, and reselect a channel group as the target channel group from among the unselected channel groups of the at least one channel group.
In some embodiments, the load unit is further configured to: in response to receiving a signal indicative of completion of processing of data included in the target channel group, determining that processing of data included in the target channel group is complete.
In some embodiments, the data in the channels included in the target channel group has corresponding position numbers; and the loading unit includes: an extraction module configured to select a position number from the position numbers as a target position number and perform the following extraction steps: for each channel included in the target channel group, extracting the data corresponding to the target position number from the channel, loading the extracted data into the cache line corresponding to the target position number in the target cache, and determining whether an unselected position number remains among the position numbers; and a determination module configured to, in response to determining that one remains, reselect a position number from the unselected position numbers as the target position number and continue the extraction steps with the reselected target position number.
In some embodiments, the at least one channel group is determined in advance as follows: determining the channels included in the target layer as ungrouped channels, and performing the following grouping steps: determining whether the number of ungrouped channels is less than or equal to a preset number; in response to determining that it is less than or equal to the preset number, determining the ungrouped channels as a channel group; in response to determining that it is greater than the preset number, determining a preset number of channels as a channel group according to the arrangement order of the channels, re-determining the ungrouped channels from the channels included in the target layer, and continuing the grouping steps with the re-determined ungrouped channels.
In some embodiments, the preset number is determined in advance by: determining the size of a storage space of a cache line of a target cache; and dividing the size of the storage space by the size of the storage space occupied by single data in the convolutional neural network, and determining the obtained result as a preset number.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
According to the method and apparatus for processing data provided by the embodiments of the present application, a target channel group is selected from at least one channel group of a preset convolutional neural network, and the data included in the target channel group is loaded into a target cache. Once processing of the data included in the target channel group is complete, a channel group is reselected as the target channel group from among the unselected channel groups of the at least one channel group, and the reselected target channel group is in turn loaded into the target cache. Loading the data group by group in this way reduces the number of times data is loaded from memory into the cache.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for processing data according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for processing data according to an embodiment of the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method for processing data according to an embodiment of the present application;
FIG. 5 is a block diagram of one embodiment of an apparatus for processing data according to an embodiment of the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which a method for processing data or an apparatus for processing data of embodiments of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various applications, such as data processing applications, image processing applications, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices that support data processing, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like. When the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, such as a background data processing server that provides support for data uploaded by the terminal devices 101, 102, 103. The background data processing server can process the acquired data and store the processing result (such as data extracted from a target channel group included in the convolutional neural network) into a target cache.
It should be noted that the method for processing data provided in the embodiment of the present application may be executed by the server 105, or may be executed by the terminal devices 101, 102, and 103, and accordingly, the apparatus for processing data may be disposed in the server 105, or may be disposed in the terminal devices 101, 102, and 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case where the processed data does not need to be acquired from a remote place, the system architecture described above may not include a network, but only a terminal device or a server.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for processing data in accordance with the present application is shown. The method for processing data comprises the following steps:
step 201, selecting a target channel group from at least one preset channel group.
In this embodiment, the execution body of the method for processing data (e.g., the server or a terminal device shown in FIG. 1) may select a target channel group from at least one preset channel group. The at least one channel group may be obtained by grouping, in advance, the channels included in a target layer of a preset convolutional neural network. The convolutional neural network may be preset in the execution body by a technician, or may be obtained by the execution body from another electronic device communicatively connected to it. In general, a convolutional neural network may include a plurality of layers, such as convolutional layers and pooling layers, where the number of convolutional layers and of pooling layers may each be at least one. The target layer may be a layer in the convolutional neural network that includes a plurality of channels. For example, a convolutional layer may include N (N is a positive integer) channels, where each channel may be a matrix, and the matrices include equal amounts of data.
In this embodiment, the execution body may select the target channel group from the at least one preset channel group in various ways. For example, a channel group may be selected as the target channel group according to the arrangement order of the channel groups in the target layer of the convolutional neural network; alternatively, a channel group may be selected as the target channel group according to a technician's designation (e.g., according to a channel-group number specified by the technician).
In some optional implementations of this embodiment, the at least one channel group may be determined in advance by:
First, the channels included in the target layer are determined as ungrouped channels, and the following grouping steps are performed: determining whether the number of ungrouped channels is less than or equal to a preset number; and, in response to determining that it is less than or equal to the preset number, determining the ungrouped channels as a channel group.
Then, in response to determining that the number of ungrouped channels is greater than the preset number, a preset number of channels are determined as a channel group according to the arrangement order of the channels; the ungrouped channels are re-determined from the channels included in the target layer, and the grouping steps are continued with the re-determined ungrouped channels. It should be noted that the execution body may determine a preset number of channels as a channel group by following the channels' arrangement order either forward or in reverse.
As an example, assuming that the target layer includes 40 channels and the preset number is 16, the target layer may be divided into three channel groups including 16, 16, and 8 channels, respectively.
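The grouping steps above can be sketched as follows. This is an illustrative Python sketch, not code from the patent; the function and variable names are hypothetical.

```python
def group_channels(channels, preset_number):
    """Split a target layer's channels into groups of at most
    preset_number channels, preserving their arrangement order."""
    groups = []
    ungrouped = list(channels)
    while ungrouped:
        if len(ungrouped) <= preset_number:
            # the remaining ungrouped channels form the final channel group
            groups.append(ungrouped)
            ungrouped = []
        else:
            # take a preset number of channels, in order, as one group
            groups.append(ungrouped[:preset_number])
            ungrouped = ungrouped[preset_number:]
    return groups

# 40 channels with a preset number of 16 yield groups of 16, 16 and 8 channels
group_sizes = [len(g) for g in group_channels(list(range(40)), 16)]
print(group_sizes)  # [16, 16, 8]
```

As in the example above, only the last group may be smaller than the preset number.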
In some optional implementations of this embodiment, the preset number may be determined in advance by:
First, the storage-space size of a cache line of the target cache is determined. The target cache may be a cache of a CPU (Central Processing Unit) of the electronic device, or a cache of a CPU of another electronic device communicatively connected to the electronic device. A cache line is the minimum caching unit of a CPU cache; that is, the data the CPU reads from memory each time occupies at least one cache line. As an example, assuming the storage-space size of a cache line is 64 bytes and the storage-space size of the target cache is 512 bytes, the target cache has 512 / 64 = 8 cache lines, and each read from memory by the CPU occupies at least one of them.
Then, the storage-space size of the cache line is divided by the storage space occupied by a single datum in the convolutional neural network, and the result is determined as the preset number. As an example, assuming the storage-space size of a cache line is 64 bytes: if a single datum in the convolutional neural network occupies 1 byte, the preset number is 64 / 1 = 64; if a single datum occupies 4 bytes, the preset number is 64 / 4 = 16. The time spent loading data from memory into the cache is an important factor in how fast the CPU can process data. Setting the storage space occupied by each extraction of data equal to the storage space of a cache line makes full use of the cache line, reduces the number of times data is loaded into the cache, and can thus improve the CPU's data-processing efficiency.
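The two computations in this passage (the number of cache lines and the preset number of channels per group) can be expressed as the following illustrative sketch; the function names are hypothetical, not from the patent.

```python
def cache_line_count(cache_size_bytes, cache_line_bytes):
    # e.g. a 512-byte cache with 64-byte cache lines has 512 / 64 = 8 lines
    return cache_size_bytes // cache_line_bytes

def preset_number(cache_line_bytes, datum_bytes):
    # one cache line holds exactly this many data items, so loading a group
    # of this many channels' data fills the line with no wasted space
    return cache_line_bytes // datum_bytes

print(cache_line_count(512, 64))  # 8
print(preset_number(64, 1))       # 64
print(preset_number(64, 4))       # 16
```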
Step 202, the following loading steps are performed: loading the data included in the target channel group into a target cache; determining whether processing of the data included in the target channel group is complete.
In this embodiment, based on the target channel group selected in step 201, the execution body may perform the following loading steps:
at step 2021, the data included in the target channel group is loaded into the target cache. The target cache may be a cache of a CPU (Central Processing Unit) of the electronic device, or a cache of a CPU of another electronic device communicatively connected to the electronic device. Generally, data included in the target channel group can be stored in the target cache at one time, and the CPU can process the data in the target cache, so that the efficiency of loading the data into the target cache can be improved, and the data processing time can be shortened.
In some optional implementations of this embodiment, the data in the channels included in the target channel group may have corresponding position numbers. The position numbers may be preset by a technician or assigned automatically by the execution body according to the storage order of the data. The execution body may load the data included in the target channel group into the target cache as follows:
First, a position number is selected from the position numbers as the target position number. Specifically, the execution body may do this in various ways. As an example, it may select a position number at random; alternatively, it may select one based on the arrangement order of the position numbers. For example, if the position numbers are numerals, the execution body may select the smallest one as the target position number.
Then, the following extraction steps are performed:
step one, extracting data corresponding to the target position number from the channel included in the target channel group, and loading the extracted data into a cache line corresponding to the target position number in the target cache. The corresponding relationship between the target location number and the cache line may be pre-established, or may be automatically allocated by the execution main body. For example, assuming that the target lane group includes 16 lanes, each lane includes 4 data, each lane includes data with a position number of "1", "2", "3", "4", respectively, and the target cache includes 8 cache lines, respectively, "a", "B", "C" …, "H", each of which may store 16 data, and if the target position number is "1", each of the 8 cache lines is configured to store data, the data with a position number of "1" included in each lane may be stored in the cache line "a" (i.e., the cache line corresponding to the target position number of "1") in the order of arrangement of the lanes. When this step is executed again, the target position number becomes "2", and the cache line corresponding to the target position number "2" may be cache line "B".
Step two: determine whether any unselected position numbers remain. As an example, each position number may be marked as selected when it is chosen as the target position number; any position number not so marked is treated as unselected.
Finally, in response to determining that an unselected position number remains, a position number is reselected from the unselected position numbers as the target position number, and the extraction steps (i.e., steps one and two) are continued with the reselected target position number. Specifically, the execution body may reselect a position number in the same way it first selected one from the full set of position numbers.
By executing this optional implementation, the extracted data is stored into the target cache in a specific order, so that the data in the target cache is arranged in that order, further improving data-processing efficiency.
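The extraction steps above can be simulated as follows. This is a hedged sketch that models cache lines as plain lists keyed by position number; the mapping and names are illustrative, not from the patent.

```python
def load_group_by_position(channel_group, cache_lines):
    """channel_group: list of channels, each a list of data items that share
    the same position numbers (here, list indices).
    cache_lines: dict mapping each position number to a cache-line buffer."""
    positions = list(range(len(channel_group[0])))
    for pos in positions:                # select the target position number
        line = cache_lines[pos]          # cache line for this position number
        for channel in channel_group:    # channels in arrangement order
            line.append(channel[pos])    # extract the datum at this position

# two channels, four position numbers 0..3, one cache line per position
cache = {0: [], 1: [], 2: [], 3: []}
load_group_by_position([[10, 11, 12, 13], [20, 21, 22, 23]], cache)
print(cache[0])  # [10, 20]
```

In the final state, the data at the same position in every channel of the group share one cache line, which is the ordering the passage above describes.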
At step 2022, it is determined whether processing of the data included in the target channel group is complete. Specifically, the execution body may determine this in various ways. For example, when processing of all the data in the target channel group included in the target layer is finished, a processing result is generated; at this point, the execution body determines that processing of the data included in the target channel group is complete.
In some optional implementations of this embodiment, the execution body may determine that processing of the data included in the target channel group is complete in response to receiving a signal indicating so. In general, the data of each layer in a convolutional neural network often needs to be processed more than once (for example, a residual network (ResNet) typically includes a plurality of convolutional layers, a convolutional layer is processed in combination with at least one other convolutional layer, and the data included in a convolutional layer needs to be read and used more than once). The signal may be a simple flag (for example, "1" representing that processing is complete and "0" that it is not), or the processing result generated when the target channel group finishes processing may itself be determined as the signal indicating that processing of the data included in the target channel group is complete.
As an example, assume the convolutional neural network includes convolutional layers A, B, and C, where convolutional layer A is the target layer and is divided into three channel groups A1, A2, and A3, with A1 as the target channel group. Processing convolutional layers B and C requires the results of processing convolutional layer A with a first processing method and a second processing method, respectively. Target channel group A1 may therefore be loaded into the cache once: convolutional layer B uses the result of processing A1 with the first processing method, and convolutional layer C uses the result of processing A1 with the second processing method. When convolutional layers B and C have both finished using the processing results of target channel group A1, a signal indicating that processing of the data included in the target channel group is complete is generated.
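The completion signal in this example can be modeled as a simple use counter: the group's processing counts as complete only once every consumer layer has signalled. This is an illustrative sketch with hypothetical names, not code from the patent.

```python
class GroupTracker:
    """Tracks how many consumers (e.g. convolutional layers B and C) still
    need the cached results of a target channel group."""
    def __init__(self, expected_uses):
        self.remaining = expected_uses

    def signal_done(self):
        """One consumer finished; returns True when all uses are done,
        i.e. when the completion signal for the group should be emitted."""
        self.remaining -= 1
        return self.remaining == 0

tracker = GroupTracker(expected_uses=2)  # layers B and C both use group A1
print(tracker.signal_done())  # False: layer B finished, C still pending
print(tracker.signal_done())  # True: all uses done, A1 may be evicted
```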
Step 203, in response to determining that the processing is completed, reselecting the channel group as the target channel group from the unselected channel groups in the at least one channel group to continue the loading step.
In this embodiment, in response to determining that processing of the data included in the target channel group is complete, the execution body may reselect a channel group as the target channel group from among the unselected channel groups of the at least one channel group to continue the loading steps. As an example, the execution body may reselect the channel group from among the unselected channel groups according to the arrangement order of the channel groups; for instance, each channel group may have a corresponding number, and the execution body may select the unselected channel group with the smallest number as the target channel group. Alternatively, the execution body may reselect a channel group from the unselected channel groups as the target channel group according to a technician's designation (e.g., according to a channel-group number specified by the technician).
Generally, the target layer includes a large number of channels, that is, a large amount of data, so the data included in the target layer cannot be loaded into the target cache all at once. Even if it could be loaded completely, doing so would occupy a large amount of cache space and leave the cache short of space. By dividing the target layer into a plurality of channel groups and repeatedly executing step 202 and step 203 for each channel group, the number of times data is extracted from memory and loaded into the target cache can be reduced, improving the efficiency of data processing.
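The load-process-reselect cycle of steps 201 through 203 can be sketched as follows. This is an illustrative model only: the function names and the list used as a stand-in for the CPU cache are assumptions for demonstration, not the patented implementation.

```python
# Illustrative sketch of the channel-group loading loop (steps 201-203).
# The cache is modeled as a plain Python list; in the patent it is a
# CPU cache that cannot hold the whole target layer at once.

def process_target_layer(channel_groups, process_group):
    """Load one channel group at a time into a simulated cache and
    process it, instead of loading the whole target layer at once."""
    loads = 0
    for group in channel_groups:      # reselect in arrangement order
        cache = list(group)           # load the group's data (one load)
        loads += 1
        process_group(cache)          # process until completion
    return loads

# Example: a target layer with 6 channels split into 3 groups of 2.
groups = [[0, 1], [2, 3], [4, 5]]
processed = []
n_loads = process_target_layer(groups, lambda data: processed.extend(data))
print(n_loads)      # 3 group-wise loads
print(processed)    # [0, 1, 2, 3, 4, 5]
```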
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for processing data according to the present embodiment. In the application scenario of fig. 3, the terminal device 301 selects a target channel group 3021 from at least one preset channel group 302 according to the arrangement order of the channel groups. At least one channel group 302 is obtained by grouping channels included in a target layer 3031 in a preset convolutional neural network 303 in advance. Then, the terminal device 301 performs the following loading step: the data included in the target channel group 3021 is loaded into the target cache 304, and it is determined whether the processing of the data included in the target channel group is completed. Next, in response to determining that the processing is completed (for example, receiving a signal "1" indicating that the processing is completed), the terminal device 301 reselects a channel group (for example, the channel group 3022) from among the unselected channel groups of the at least one channel group as a target channel group to continue the loading step. By repeatedly executing the loading step, the channel groups included in the target layer can be loaded into the target cache 304 one by one, so that the CPU including the target cache 304 performs data processing.
The method provided by the above embodiment of the present application selects a target channel group from at least one channel group included in a preset convolutional neural network and loads the data included in the target channel group into a target cache. When the processing of that data is completed, a channel group is reselected from the unselected channel groups as the new target channel group and loaded into the target cache in turn. By performing the selection and loading steps repeatedly, the number of data-loading operations for the convolutional neural network can be reduced, which helps to improve the operational efficiency of the convolutional neural network.
With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for processing data is shown. The flow 400 of the method for processing data includes the steps of:
step 401, selecting a target channel group from at least one preset channel group.
In this embodiment, step 401 is substantially the same as step 201 in the corresponding embodiment of fig. 2, and is not described here again.
Step 402, the following loading steps are performed: loading data included in the target channel group into a target cache; it is determined whether processing of data included in the target channel group is completed.
In this embodiment, step 402 is substantially the same as step 202 in the corresponding embodiment of fig. 2, and is not described herein again.
Step 403, in response to determining that the processing is completed, deleting the data included in the target channel group from the target cache, and reselecting the channel group as the target channel group from the unselected channel groups in the at least one channel group.
In this embodiment, an execution body (e.g., the server or the terminal device shown in fig. 1) of the method for processing data may, in response to determining that the processing of the data included in the target channel group is completed, delete the data included in the target channel group from the target cache and reselect a channel group as the target channel group from among the unselected channel groups in the at least one channel group. The step of deleting the data from the target cache may be performed before or after the step of reselecting the target channel group, which is not limited herein. Once the processing of the target channel group is completed, its data no longer participates in any operation; deleting it from the target cache at this point releases storage space in the target cache and improves data processing efficiency.
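The eviction variant of step 403 can be sketched as below. The dictionary-based cache model and function names are illustrative assumptions; the patent's target cache is a CPU cache.

```python
def process_with_eviction(channel_groups, process_group):
    """Variant of the loading loop (step 403): after a group's data is
    fully processed, it is deleted from the cache before the next
    group is loaded, releasing the cache's storage space."""
    cache = {}
    for idx, group in enumerate(channel_groups):
        cache[idx] = list(group)       # load the target group
        process_group(cache[idx])      # process until completion
        del cache[idx]                 # free the cache space (step 403)
    return cache                       # empty: every group was evicted

leftover = process_with_eviction([[1, 2], [3, 4]], lambda data: None)
print(leftover)   # {} -- storage space released after each group
```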
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for processing data in this embodiment highlights the step of deleting the data included in the target channel group from the target cache in response to determining that the processing of that data is completed. This releases the storage space of the target cache, which helps to further improve the efficiency of data processing.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for processing data, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for processing data of the present embodiment includes: a selecting unit 501, configured to select a target channel group from at least one preset channel group, where the at least one channel group is obtained by grouping channels included in a target layer in a preset convolutional neural network in advance; a loading unit 502 configured to perform the following loading steps: loading data included in the target channel group into a target cache; determining whether the processing of the data included in the target channel group is completed; and a determining unit 503 configured to, in response to determining that the processing is completed, reselect a channel group as the target channel group from among the unselected channel groups of the at least one channel group to continue the loading step.
In this embodiment, the selecting unit 501 may select a target channel group from at least one preset channel group. At least one channel group may be obtained by grouping channels included in a target layer in a preset convolutional neural network in advance. The convolutional neural network may be preset in the apparatus 500 by a technician, or may be obtained by the apparatus 500 from other electronic devices communicatively connected thereto. In general, a convolutional neural network may include a plurality of layers, such as convolutional layers, pooling layers, and the like, wherein the number of convolutional layers and pooling layers may be at least one. The target layer may be a layer in a convolutional neural network that includes a plurality of channels. For example, a convolutional layer includes N (N is a positive integer) channels, where each channel may be a matrix that includes an equal amount of data.
In this embodiment, the selecting unit 501 may select the target channel group from the preset at least one channel group according to various methods. For example, the target channel group may be selected according to the arrangement order of the channel groups in the target layer of the convolutional neural network; alternatively, a channel group may be selected as the target channel group according to a technician's designation (e.g., according to the number of the channel group designated by the technician).
In this embodiment, the loading unit 502 may perform the following loading steps:
first, data included in the target channel group is loaded into the target cache. The target cache may be a cache of a CPU (Central Processing Unit) of the electronic device, or a cache of a CPU of another electronic device communicatively connected to the electronic device. Generally, data included in the target channel group can be stored in the target cache at one time, and the CPU can process the data in the target cache, so that the efficiency of loading the data into the target cache can be improved, and the data processing time can be shortened.
Then, it is determined whether the processing of the data included in the target channel group is completed. Specifically, the loading unit 502 may determine this in various ways. For example, when all the data in the channels included in the target channel group has been processed and a processing result has been generated, the loading unit 502 determines that the processing of the data included in the target channel group is completed.
In this embodiment, the determining unit 503 may, in response to determining that the processing of the data included in the target channel group is completed, reselect a channel group as the target channel group from among the unselected channel groups in the at least one channel group and continue performing the loading step. As an example, the determining unit 503 may reselect the channel group according to the arrangement order of the channel groups. For example, each channel group may have a corresponding number, and the determining unit 503 may select, from among the unselected channel groups, the channel group with the smallest number as the target channel group. Alternatively, the determining unit 503 may reselect the channel group according to a technician's designation (for example, according to the number of the channel group designated by the technician).
In some optional implementations of this embodiment, the determining unit 503 may be further configured to: delete the data included in the target channel group from the target cache, and reselect a channel group as the target channel group from the unselected channel groups in the at least one channel group.
In some optional implementations of this embodiment, the loading unit 502 may be further configured to: in response to receiving a signal indicative of completion of processing of data included in the target channel group, determining that processing of data included in the target channel group is complete.
In some optional implementations of this embodiment, the data in the channels included in the target channel group has corresponding position numbers; and the loading unit 502 may include: an extraction module (not shown in the figure) configured to select a position number from the respective position numbers as a target position number and perform the following extraction steps: for each channel included in the target channel group, extracting the data corresponding to the target position number from the channel, and loading the extracted data into the cache line corresponding to the target position number in the target cache; determining whether an unselected position number exists among the position numbers; and a determination module (not shown in the figure) configured to, in response to determining that one exists, reselect a position number from the unselected position numbers as the target position number and continue performing the extraction steps with the reselected target position number.
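The per-position extraction scheme can be sketched as follows: for each position number, the datum at that position is gathered from every channel in the group and placed in the cache line corresponding to that position number. The dictionary-of-lists cache-line model and the function name are illustrative assumptions, not the patent's implementation.

```python
def load_by_position(channel_group):
    """Sketch of the extraction steps: for each position number, gather
    that position's datum from every channel in the group into the
    cache line corresponding to that position number."""
    n_positions = len(channel_group[0])
    cache_lines = {}
    for pos in range(n_positions):           # select target position number
        cache_lines[pos] = [ch[pos] for ch in channel_group]
    return cache_lines

# Two channels of four data each -> four cache lines of two data each.
group = [[10, 11, 12, 13],
         [20, 21, 22, 23]]
lines = load_by_position(group)
print(lines[0])   # [10, 20]
print(lines[3])   # [13, 23]
```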
In some optional implementations of this embodiment, the at least one channel group may be determined in advance by: determining the channels included in the target layer as ungrouped channels, and executing the following grouping steps: determining whether the number of ungrouped channels is less than or equal to a preset number; in response to determining that it is, determining the ungrouped channels as a channel group; in response to determining that it is greater than the preset number, determining a preset number of channels as a channel group according to the arrangement order of the channels, re-determining the ungrouped channels from the channels included in the target layer, and continuing the grouping steps using the re-determined ungrouped channels.
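The grouping steps above can be sketched as a simple loop. The function name and list representation of channels are illustrative assumptions for demonstration purposes only.

```python
def group_channels(channels, preset_number):
    """Sketch of the grouping steps: repeatedly split off `preset_number`
    channels (in arrangement order) until the remaining ungrouped
    channels are few enough to form the final group."""
    ungrouped = list(channels)
    groups = []
    while True:
        if len(ungrouped) <= preset_number:
            groups.append(ungrouped)          # final (possibly short) group
            return groups
        groups.append(ungrouped[:preset_number])
        ungrouped = ungrouped[preset_number:]  # re-determine ungrouped channels

# Five channels, preset number 2 -> two full groups and one short group.
print(group_channels(["c1", "c2", "c3", "c4", "c5"], 2))
# [['c1', 'c2'], ['c3', 'c4'], ['c5']]
```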
In some optional implementations of this embodiment, the preset number may be determined in advance by: determining the size of the storage space of a cache line of the target cache; dividing that size by the size of the storage space occupied by a single datum in the convolutional neural network; and determining the result as the preset number.
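The preset-number computation is a single division. In the sketch below, the 64-byte cache line and 4-byte datum are assumed example values (typical of many CPUs and of float32 data, but not specified by the patent); integer division is also an assumption.

```python
def preset_number(cache_line_bytes, datum_bytes):
    """Preset number = storage size of one cache line of the target
    cache divided by the size of a single datum in the network."""
    return cache_line_bytes // datum_bytes

# e.g. a 64-byte cache line and 4-byte data -> 16 channels per group
print(preset_number(64, 4))   # 16
```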
The apparatus provided in the foregoing embodiment of the present application selects a target channel group from at least one channel group included in a preset convolutional neural network, and loads data included in the target channel group into a target cache, and if processing of the data included in the target channel group is completed, reselects a channel group from unselected channel groups in the at least one channel group as the target channel group, and loads the reselected target channel group into the target cache again, and by performing steps of selecting the target channel group and loading the target channel group multiple times, the number of times of loading the data into the convolutional neural network can be reduced, which is beneficial to improving the operational efficiency of the convolutional neural network.
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use in implementing an electronic device (e.g., the server or terminal device shown in FIG. 1) of an embodiment of the present application is shown. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is mounted in the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor including a selecting unit, a loading unit, and a determining unit. Here, the names of the units do not, in some cases, constitute a limitation on the units themselves; for example, the selecting unit may also be described as a "unit that selects a target channel group from at least one preset channel group".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: selecting a target channel group from at least one preset channel group, wherein the at least one channel group is obtained by grouping channels included in a target layer in a preset convolutional neural network in advance; the following loading steps are performed: loading data included in the target channel group into a target cache; determining whether the processing of the data included in the target channel group is completed; and in response to determining that the processing is completed, reselecting the channel group as the target channel group from among the unselected channel groups of the at least one channel group to continue the loading step.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (14)

1. A method for processing data, comprising:
selecting a target channel group from at least one preset channel group, wherein the at least one channel group is obtained by grouping channels included in a target layer in a preset convolutional neural network in advance according to a preset number, and the preset number is determined based on the size of a storage space of a cache line of a target cache;
the following loading steps are performed: loading data included in the target channel group into a target cache; determining whether the processing of the data included in the target channel group is completed;
in response to determining that the processing is complete, reselecting a channel group as a target channel group from among the unselected channel groups of the at least one channel group to continue performing the loading step.
2. The method of claim 1, wherein said reselecting a channel group as a target channel group from among unselected channel groups of the at least one channel group comprises:
and deleting the data included in the target channel group from the target cache, and reselecting the channel group as the target channel group from the unselected channel groups in the at least one channel group.
3. The method of claim 1, wherein the determining whether processing of the data included in the target channel group is complete comprises:
in response to receiving a signal indicative of completion of processing of data included in the target channel group, determining that processing of data included in the target channel group is complete.
4. The method of claim 1, wherein the data in the channels included in the target channel group has corresponding position numbers; and
the loading of the data included in the target channel group into the target cache includes:
selecting a position number from the position numbers as a target position number, and executing the following extraction steps: for the channels included in the target channel group, extracting data corresponding to the target position number from the channels, and loading the extracted data into a cache line corresponding to the target position number in a target cache; determining whether the unselected position number exists in each position number;
in response to determining that an unselected position number exists, reselecting a position number from the unselected position numbers as the target position number, and continuing the extracting step with the reselected target position number.
5. Method according to one of claims 1 to 4, wherein the at least one channel group is determined beforehand by:
determining the channels included in the target layer as ungrouped channels, and executing the following grouping steps: determining whether the number of ungrouped channels is less than or equal to a preset number; in response to determining that the number is less than or equal to the preset number, determining the ungrouped channels as a channel group;
in response to determining that the number is greater than the preset number, determining a preset number of channels as a channel group according to the arrangement order of the channels; re-determining ungrouped channels from the channels included in the target layer, and continuing the grouping steps using the re-determined ungrouped channels.
6. The method of claim 5, wherein the predetermined number is determined based on a size of a storage space of a cache line of the target cache, comprising:
determining the size of a storage space of a cache line of a target cache;
and dividing the size of the storage space by the size of the storage space occupied by the single data in the convolutional neural network, and determining the obtained result as a preset number.
7. An apparatus for processing data, comprising:
the selecting unit is configured to select a target channel group from at least one preset channel group, wherein the at least one channel group is obtained by grouping channels included in a target layer in a preset convolutional neural network in advance according to a preset number, and the preset number is determined based on the size of a storage space of a cache line of a target cache;
a loading unit configured to perform the following loading steps: loading data included in the target channel group into a target cache; determining whether the processing of the data included in the target channel group is completed;
a determining unit configured to, in response to determining that the processing is completed, reselect a channel group as the target channel group from the unselected channel groups in the at least one channel group to continue the loading step.
8. The apparatus of claim 7, wherein the determination unit is further configured to:
and deleting the data included in the target channel group from the target cache, and reselecting the channel group as the target channel group from the unselected channel groups in the at least one channel group.
9. The apparatus of claim 7, wherein the loading unit is further configured to:
in response to receiving a signal indicative of completion of processing of data included in the target channel group, determining that processing of data included in the target channel group is complete.
10. The apparatus of claim 7, wherein the data in the channels included in the target channel group has corresponding position numbers; and
the load unit includes:
an extraction module configured to select a position number as a target position number from the respective position numbers, performing the following extraction steps: for the channels included in the target channel group, extracting data corresponding to the target position number from the channels, and loading the extracted data into a cache line corresponding to the target position number in a target cache; determining whether the unselected position number exists in each position number;
a determination module configured to, in response to determining that an unselected position number exists, reselect a position number from the unselected position numbers as the target position number, and continue performing the extracting step with the reselected target position number.
11. The apparatus according to one of claims 7 to 10, wherein the at least one channel group is determined beforehand by:
determining the channels included in the target layer as ungrouped channels, and executing the following grouping steps: determining whether the number of ungrouped channels is less than or equal to a preset number; in response to determining that the number is less than or equal to the preset number, determining the ungrouped channels as a channel group;
in response to determining that the number is greater than the preset number, determining a preset number of channels as a channel group according to the arrangement order of the channels; re-determining ungrouped channels from the channels included in the target layer, and continuing the grouping steps using the re-determined ungrouped channels.
12. The apparatus of claim 11, wherein the preset number is determined based on a size of a storage space of a cache line of a target cache, comprising:
determining the size of a storage space of a cache line of a target cache;
and dividing the size of the storage space by the size of the storage space occupied by the single data in the convolutional neural network, and determining the obtained result as a preset number.
13. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
14. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-6.
CN201810878625.6A 2018-08-03 2018-08-03 Method and apparatus for processing data Active CN108984426B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810878625.6A CN108984426B (en) 2018-08-03 2018-08-03 Method and apparatus for processing data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810878625.6A CN108984426B (en) 2018-08-03 2018-08-03 Method and apparatus for processing data

Publications (2)

Publication Number Publication Date
CN108984426A CN108984426A (en) 2018-12-11
CN108984426B true CN108984426B (en) 2021-01-26

Family

ID=64554854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810878625.6A Active CN108984426B (en) 2018-08-03 2018-08-03 Method and apparatus for processing data

Country Status (1)

Country Link
CN (1) CN108984426B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112751772B (en) * 2019-10-31 2023-01-24 上海哔哩哔哩科技有限公司 Data transmission method and system

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1076536A (en) * 1992-03-13 1993-09-22 皮尔金顿电子有限公司 Improved analog digital neuron, neural network and network debugging algorithm
CN1180864A (en) * 1996-08-19 1998-05-06 三星电子株式会社 Single-instruction-multiple-data processing in multimedia signal processor and device thereof
CN104809426A (en) * 2014-01-27 2015-07-29 日本电气株式会社 Convolutional neural network training method and target identification method and device
CN106599992A (en) * 2015-10-08 2017-04-26 上海兆芯集成电路有限公司 Neural network unit using processing unit group as recursive neural network for short and long term memory cells for operation
CN106875011A (en) * 2017-01-12 2017-06-20 南京大学 The hardware structure and its calculation process of two-value weight convolutional neural networks accelerator
CN107480782A (en) * 2017-08-14 2017-12-15 电子科技大学 Learn neural network processor on a kind of piece
CN107679620A (en) * 2017-04-19 2018-02-09 北京深鉴科技有限公司 Artificial neural network processing unit
CN107832839A (en) * 2017-10-31 2018-03-23 北京地平线信息技术有限公司 The method and apparatus for performing the computing in convolutional neural networks
JP2018092563A (en) * 2016-12-01 2018-06-14 ヴィア アライアンス セミコンダクター カンパニー リミテッド Processor with memory array operable as either victim cache or neural network unit memory
CN108182471A (en) * 2018-01-24 2018-06-19 上海岳芯电子科技有限公司 A kind of convolutional neural networks reasoning accelerator and method
CN108229645A (en) * 2017-04-28 2018-06-29 北京市商汤科技开发有限公司 Convolution accelerates and computation processing method, device, electronic equipment and storage medium
CN108268945A (en) * 2016-12-31 2018-07-10 上海兆芯集成电路有限公司 The neural network unit of circulator with array-width sectional
EP3489862A1 (en) * 2017-11-28 2019-05-29 Nanjing Horizon Robotics Technology Co., Ltd. Method and apparatus for performing operation of convolutional layers in convolutional neural network

Also Published As

Publication number Publication date
CN108984426A (en) 2018-12-11

Similar Documents

Publication Publication Date Title
CN108629029B (en) Data processing method and device applied to data warehouse
US10572285B2 (en) Method and apparatus for elastically scaling virtual machine cluster
US11314451B2 (en) Method and apparatus for storing data
CN109829164B (en) Method and device for generating text
CN110391938B (en) Method and apparatus for deploying services
CN109508326B (en) Method, device and system for processing data
CN109165723B (en) Method and apparatus for processing data
CN108933695B (en) Method and apparatus for processing information
CN107402878B (en) Test method and device
US10572463B2 (en) Efficient handling of sort payload in a column organized relational database
CN113010405A (en) Application program testing method and device
CN108268936B (en) Method and apparatus for storing convolutional neural networks
CN113722055A (en) Data processing method and device, electronic equipment and computer readable medium
CN113986402A (en) Function calling method and device, electronic equipment and storage medium
CN108984426B (en) Method and apparatus for processing data
CN110119418B (en) Data export method and device
CN108696554B (en) Load balancing method and device
CN110688295A (en) Data testing method and device
CN116204428A (en) Test case generation method and device
CN109213815B (en) Method, device, server terminal and readable medium for controlling execution times
CN113220237B (en) Distributed storage method, device, equipment and storage medium
US11366613B2 (en) Method and apparatus for writing data
CN109299223B (en) Method and device for inquiring instruction
CN110825920B (en) Data processing method and device
CN114064403A (en) Task delay analysis processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Douyin Vision Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee after: Tiktok vision (Beijing) Co.,Ltd.

Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.

Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.