CN116957027A - Data processing method, electronic equipment and storage medium

Data processing method, electronic equipment and storage medium

Info

Publication number: CN116957027A
Authority: CN (China)
Prior art keywords: layer, processing, data, processing layer, processed
Legal status: Pending
Application number: CN202310945981.6A
Other languages: Chinese (zh)
Inventors: 高峰, 陈柏韬, 刘超
Current Assignee: ARM Technology China Co Ltd
Original Assignee: ARM Technology China Co Ltd
Application filed by ARM Technology China Co Ltd
Priority to CN202310945981.6A
Publication of CN116957027A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science • Physics & Mathematics • Theoretical Computer Science • Health & Medical Sciences • Life Sciences & Earth Sciences • Biomedical Technology • Biophysics • Evolutionary Computation • Computational Linguistics • Data Mining & Analysis • Artificial Intelligence • General Health & Medical Sciences • Molecular Biology • Computing Systems • General Engineering & Computer Science • General Physics & Mathematics • Mathematical Physics • Software Systems • Neurology • Complex Calculations

Abstract

The application relates to the technical field of artificial intelligence and discloses a data processing method, electronic equipment and a storage medium. When a multi-core NPU processes data, it can read segmentation information from memory and, based on that information, use multiple cores of the multi-core NPU to simultaneously load the different parameter sets corresponding to different processing layer sets and to process the input data of those processing layer sets separately. By using multiple cores of the multi-core NPU to process data, the waste of the multi-core NPU's computing resources can be reduced while the input and output data before and after model segmentation remain the same.

Description

Data processing method, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a data processing method, an electronic device, and a storage medium.
Background
In general, by deploying a model on a multi-core embedded neural-network processor (NPU), the cores of the multi-core NPU can load the parameters corresponding to each layer of the model, process the input data, and obtain output data. Fig. 1 shows a partial schematic structure of a recurrent neural network model; as shown in fig. 1, the model may include an input layer 110, a hidden layer 120, an output layer 130, and the like.
However, in the existing data processing scheme, because the coupling between the layers of the model is strong and the parallelism is low, a single core of the multi-core NPU loads the parameters corresponding to all layers of the model and processes the input data by itself, which wastes the computing resources of the multi-core NPU.
Disclosure of Invention
To solve the problem of wasted computing resources on a multi-core embedded neural-network processor, embodiments of the present application provide a data processing method, an electronic device, and a storage medium.
A first aspect of the embodiments of the present application provides a data processing method. The electronic device includes a first processor, and the first processor includes a plurality of cores. The method includes: acquiring first input data of a first model; based on segmentation information of the first model, dividing a plurality of processing layers in the first model into a plurality of processing layer sets matching the segmentation information, where the processing layers in the same processing layer set satisfy an effective segmentation condition, and the effective segmentation condition is related to the running performance of the first model; and processing, with the core corresponding to each processing layer set, the portion of the first input data that the processing layer set needs to process.
With this scheme, multiple cores of the multi-core NPU are used for data processing, so the waste of the multi-core NPU's computing resources can be reduced while the input and output data before and after model segmentation remain the same.
It is understood that the first processor may be a multi-core embedded neural-network processor and that the first model is a neural network model. The first input data may include image data, text data, and audio data; processing each piece of sub-data may include performing image feature extraction on the image data, text feature extraction on the text data, and audio feature extraction on the audio data to obtain, for example, pixels, characters, audio signals, and the like.
It can be understood that the segmentation information may divide processing layers that are partially connected in the model into a processing layer set, and divide the parameters corresponding to all processing layers in that set into one parameter set, so as to obtain a plurality of parameter sets. These parameter sets can process data in parallel.
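As an illustration, the segmentation information can be pictured with the following minimal sketch; the class and field names (SegmentationInfo, ProcessingLayerSet, and so on) are assumptions for illustration, not identifiers from the application.

    from dataclasses import dataclass, field

    @dataclass
    class ProcessingLayerSet:
        layer_ids: list            # partially connected layers grouped into one set
        parameter_set: dict = field(default_factory=dict)  # parameters of all layers in the set

    @dataclass
    class SegmentationInfo:
        # each entry groups connected layers whose parameters form one parameter set;
        # different parameter sets can be loaded and run in parallel on different cores
        layer_sets: list = field(default_factory=list)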
In one possible implementation of the first aspect, the segmentation information is obtained as follows: determining a first processing layer and a second processing layer from the plurality of processing layers in the first model, the first processing layer and the second processing layer being connected; judging whether the first processing layer and the second processing layer satisfy the effective segmentation condition; dividing the first processing layer and the second processing layer into a first processing layer set when they satisfy the effective segmentation condition; and dividing the first processing layer into a first processing layer set and the second processing layer into a second processing layer set when they do not satisfy the effective segmentation condition.
For example, for the processing layer with layer identifier 9 in the model, the layer with identifier 9 and the layer with identifier 8 may be put into a temporary processing layer set, i.e., the temporary subgraph described below. A processor is used to load the parameters corresponding to all processing layers in the temporary set and to process the input data of the set (i.e., the output data of the layer with identifier 7), obtaining a performance index for processing the input data and a redundancy rate for repeatedly processed data. When the performance index is higher than the performance index threshold and the redundancy rate is lower than the redundancy rate threshold (for example, a redundancy rate of 0.2, lower than the threshold of 0.6), the layer with identifier 9 and the layer with identifier 8 are divided into the same processing layer set.
In one possible implementation of the first aspect, the method further includes: determining a third processing layer from the plurality of processing layers in the first model, the third processing layer being connected to the first processing layer; judging whether the first, second, and third processing layers satisfy the effective segmentation condition; and dividing the first, second, and third processing layers into the first processing layer set when they satisfy the effective segmentation condition.
In one possible implementation of the first aspect, the effective segmentation condition includes at least one of the following: when the first processor loads the parameters corresponding to all processing layers in the same processing layer set and processes the portion of the first input data that the set needs to process, the performance index is higher than a performance index threshold; and, under the same loading and processing, the redundancy rate of repeatedly processed data is lower than a redundancy rate threshold.
For example, the redundancy rate threshold, i.e., the maximum allowed redundancy rate, may be set to 0.6. When the redundancy rate of repeatedly processed data is lower than 0.6, it may be determined that the first processing layer and the second processing layer satisfy the effective segmentation condition; when it is higher than 0.6, it may be determined that they do not.
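A minimal sketch of this check is given below; the redundancy threshold 0.6 comes from the example above, while the performance threshold value and all names are assumptions for illustration.

    PERF_INDEX_THRESHOLD = 100.0   # assumed value; the application does not fix one
    OVERLAP_THR = 0.6              # maximum allowed redundancy rate from the example

    def satisfies_effective_segmentation(perf_index: float, redundancy_rate: float) -> bool:
        # a grouping is kept only if performance stays above the threshold and the
        # redundancy of repeatedly processed data stays below the threshold
        return perf_index > PERF_INDEX_THRESHOLD and redundancy_rate < OVERLAP_THR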
In one possible implementation of the first aspect, the cores corresponding to the processing layer sets include a first core and a second core, and processing, with the core corresponding to each processing layer set, each portion of the first input data includes: using the first core to process the first sub-data of the first input data that the first processing layer set needs to process, and using the second core to process the second sub-data of the first input data that the second processing layer set needs to process.
For example, the first processor includes a first core and a second core. The model may include 10 processing layers (e.g., the layers with identifiers 0-9 in fig. 3); in memory, the layers with identifiers 0, 1, 2, 5, 6, and 7 may be divided into a first processing layer set, the layers with identifiers 3 and 4 into a second processing layer set, and the layers with identifiers 8 and 9 into a third processing layer set.
When processing data, the first core of the multi-core NPU may load the first parameter set corresponding to the first processing layer set and process the input data of the first processing layer set (i.e., all or part of the model's input data), while at the same time the second core loads the second parameter set corresponding to the second processing layer set and processes the input data of the second processing layer set (likewise all or part of the model's input data). After the first core has loaded the first parameter set and processed the input data of the first processing layer set, it may load the third parameter set corresponding to the third processing layer set and process the input data of the third processing layer set (i.e., the output data of the first processing layer set and the output data of the second processing layer set). Alternatively, the second core, or another core other than the first and second cores, may load the third parameter set and process the input data of the third processing layer set.
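The schedule just described can be pictured with the following sketch; the thread pool stands in for cores running concurrently, and all names and data values are assumptions for illustration.

    from concurrent.futures import ThreadPoolExecutor

    def run_layer_set(core_id, parameter_set, input_data):
        # stand-in for: load the parameter set onto the core, then process the data
        return f"output of {parameter_set} on core {core_id}"

    with ThreadPoolExecutor(max_workers=2) as pool:
        # first and second layer sets run at the same time on different cores
        f1 = pool.submit(run_layer_set, 0, "param_set_1", "model input (all or part)")
        f2 = pool.submit(run_layer_set, 1, "param_set_2", "model input (all or part)")
        out1, out2 = f1.result(), f2.result()
        # the third set consumes both outputs, so it runs after they finish;
        # the first core (or any free core) may load the third parameter set
        out3 = run_layer_set(0, "param_set_3", (out1, out2))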
In one possible implementation of the first aspect, using the first core to process the first sub-data of the first input data that the first processing layer set needs to process includes: loading, with the first core, the first parameter set corresponding to all layers in the first processing layer set and processing the first sub-data of the first input data. Using the second core to process the second sub-data of the first input data that the second processing layer set needs to process includes: loading, with the second core, the second parameter set corresponding to all layers in the second processing layer set and processing the second sub-data of the first input data.
In one possible implementation of the first aspect, the method further includes: storing the first sub-data in a first internal storage space of the first core, and storing the second sub-data in a second internal storage space of the second core.
It will be appreciated that, while the first sub-data is stored in the first internal storage space of the first core, the first output sub-data obtained by the first core processing the first sub-data may be stored in the first internal storage space as well.
In the embodiments of the present application, the data corresponding to the first core is stored in the internal storage space of the first core. Compared with a single core processing the whole of the first input data and storing it in that single core's internal storage space, this relieves the storage pressure on the single core. Compared with storing the data in an external memory, the data can be read directly from the core's internal memory instead of being fetched from the external memory over a bus, which reduces the number of accesses to the external memory; thus, while relieving the storage pressure on a single core, the data processing efficiency is improved and the data-read latency is reduced.
In one possible implementation of the first aspect, the first processor is a multi-core embedded neural-network processor, the first model is a neural network model, and the first input data includes image data, text data, and audio data.
In a second aspect, the present application provides an electronic device, including: a memory for storing instructions to be executed by one or more processors of the electronic device, and a processor, being one of the one or more processors of the electronic device, for executing the data processing method described above.
In a third aspect, the application provides a readable storage medium having stored thereon instructions that, when executed on an electronic device, cause the electronic device to perform the data processing method of the application.
Drawings
FIG. 1 illustrates a partial schematic of a recurrent neural network model, according to some examples of the application;
FIG. 2 illustrates a schematic diagram of an application scenario, according to some examples of the application;
FIG. 3 illustrates a schematic diagram of segmentation information for segmenting a process layer in a model, according to some examples of the application;
FIG. 4 illustrates a flow diagram of a method of data processing, according to some examples of the application;
FIG. 5 illustrates a flow diagram of a model segmentation method, according to some examples of the application;
FIG. 6 illustrates a flow diagram of a method of determining a set of target layers, according to some examples of this application;
FIG. 7 illustrates a schematic structural diagram of a model, according to some examples of the application;
FIG. 8 illustrates a schematic structural diagram of an electronic device, according to some examples of the application.
Detailed Description
Illustrative embodiments of the application include, but are not limited to, a data processing method, an electronic device, and a storage medium.
It will be appreciated that the data processing method according to the embodiments of the present application may be used in a neural network model, which may be applied in various fields, for example, in the fields of text processing, image processing, audio processing, automatic driving, etc. As shown in fig. 2, in some specific implementations, the server 210 may perform a slicing process on the neural network model 211, and send the sliced neural network model 211 to the terminal 220, so that the terminal 220 can execute the data processing method mentioned in the embodiments of the present application based on the sliced neural network model 211.
To solve the above problems, an embodiment of the present application discloses a data processing method. In this method, segmentation information for splitting the parameters of a model is set in memory: for example, processing layers that are partially connected in the model are divided into a processing layer set, and the parameters corresponding to all processing layers in that set are divided into one parameter set, so that a plurality of parameter sets are obtained, and these parameter sets can process data in parallel. When the multi-core NPU processes data, it can read the segmentation information from memory and, based on it, use multiple cores of the multi-core NPU to simultaneously load the different parameter sets corresponding to different processing layer sets and process the input data of each set separately. By using multiple cores of the multi-core NPU to process data in this way, the waste of the multi-core NPU's computing resources can be reduced while the input and output data before and after model segmentation remain the same.
Fig. 3 shows a schematic diagram of segmentation information for dividing the processing layers in a model. As shown in fig. 3, the model may include 10 processing layers (e.g., the layers with identifiers 0-9 in fig. 3); in memory, the layers with identifiers 0, 1, 2, 5, 6, and 7 may be divided into a first processing layer set, the layers with identifiers 3 and 4 into a second processing layer set, and the layers with identifiers 8 and 9 into a third processing layer set.
Based on the segmentation information shown in fig. 3, when data is processed, the first core of the multi-core NPU may load the parameters corresponding to all processing layers in the first processing layer set while, at the same time, the second core loads the parameters corresponding to all processing layers in the second processing layer set; after the first core has loaded the parameters corresponding to all processing layers in the first processing layer set, it may go on to load the parameters corresponding to all processing layers in the third processing layer set.
In some alternative examples, the specific way of splitting the parameters in the model may be as follows (see the sketch after this paragraph):
Based on a first processing layer (any processing layer) in the model, a first processing layer set is created, and a second processing layer connected with the first processing layer is put into the first processing layer set. If the first processing layer set satisfies the effective segmentation condition, the first processing layer and the second processing layer are divided into one processing layer set, namely the first processing layer set. If it does not, a second processing layer set is created based on the second processing layer, the first processing layer is divided into the first processing layer set, and the second processing layer is divided into the second processing layer set. The remaining processing layers in the model can then be put into the first processing layer set, the second processing layer set, or a newly created processing layer set by the same method, until every processing layer in the model belongs to some processing layer set, and the parameters corresponding to all processing layers in each processing layer set are divided into one parameter set.
In some alternative examples, the effective segmentation condition is related to the running performance of the model. For example, the effective segmentation condition may require that, when the processor loads the parameters corresponding to all processing layers in the set and processes data, the performance index is higher than the performance index threshold and the redundancy rate of repeatedly processed data is lower than the redundancy rate threshold.
For convenience of explanation, the processing layer will be simply referred to as a layer hereinafter.
The data processing method of the embodiments of the present application is described below. Fig. 4 shows a flow chart of a data processing method; as shown in fig. 4, the data processing method may include:
401: the segmentation information is stored in memory in the form of a configuration file.
It may be understood that the segmentation information is used to split the parameters in the model: it divides layers that are partially connected in the model into a layer set and divides the parameters corresponding to all layers in that set into one parameter set, and is also referred to as sub-graph topology information. The resulting parameter sets can process data in parallel. When the multi-core NPU processes data, it can read the segmentation information from memory and, based on it, use multiple cores to simultaneously load the different parameter sets corresponding to different layer sets and process the input data of each layer set separately. By using multiple cores of the multi-core NPU to process data, the waste of the multi-core NPU's computing resources can be reduced while the input and output data before and after model segmentation remain the same.
For example, as shown in fig. 3, the model may include 10 layers (e.g., the layers with identifiers 0-9 in fig. 3); in memory, the layers with identifiers 0, 1, 2, 5, 6, and 7 may be divided into a first layer set, the layers with identifiers 3 and 4 into a second layer set, and the layers with identifiers 8 and 9 into a third layer set. The sub-graph topology information may then be stored in memory in the form of a configuration file.
In some alternative examples, the segmentation information may be stored as a configuration file in the internal memory space of the multi-core NPU, or in an external memory such as a Double Data Rate (DDR) memory.
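For illustration, the sub-graph topology information from fig. 3 might be serialized as follows; the JSON schema and file name are assumptions, not a format specified by the application.

    import json

    subgraph_topology = {
        "layer_sets": [
            {"id": 0, "layers": [0, 1, 2, 5, 6, 7]},   # first layer set
            {"id": 1, "layers": [3, 4]},               # second layer set
            {"id": 2, "layers": [8, 9]},               # third layer set
        ]
    }

    # store the configuration file to memory (internal NPU memory or external DDR)
    with open("segmentation_config.json", "w") as f:
        json.dump(subgraph_topology, f)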
402: the configuration file is read from memory when data is processed.
It will be appreciated that, when processing data, the multi-core NPU may read the configuration file from its internal memory space or from the external memory.
403: based on the segmentation information in the configuration file, multiple cores of the multi-core processor are used to process the input data and obtain output data.
It can be understood that, after the configuration file is read, based on the segmentation information in the configuration file, i.e., the sub-graph topology information, multiple cores of the multi-core NPU can simultaneously load the parameters corresponding to all layers in different layer sets and process the input data of those layer sets separately.
As mentioned above for fig. 3, in some optional examples, after the configuration file is read, the first core of the multi-core NPU may load the first parameter set corresponding to the first layer set and process the input data of the first layer set (i.e., all or part of the model's input data), while at the same time the second core loads the second parameter set corresponding to the second layer set and processes the input data of the second layer set (likewise all or part of the model's input data). After the first core has loaded the first parameter set and processed the input data of the first layer set, it may load the third parameter set corresponding to the third layer set and process the input data of the third layer set (i.e., the output data of the first layer set and the output data of the second layer set). Alternatively, the second core, or another core other than the first and second cores, may load the third parameter set and process the input data of the third layer set.
The model segmentation method of the embodiments of the present application is described below. Fig. 5 shows a flow chart of a model segmentation method; as shown in fig. 5, the model segmentation method may include:
501: a set of layers to be processed is determined from the model.
It is understood that the set of layers to be processed may include at least one layer in the model, such as an output layer or a convolution layer in a neural network model. In some alternative examples, all layers of the model may be traversed; if the traversed current layer is the output layer of the model, the current layer may be used as a layer to be processed, and if the judgment module determines that the current layer is unsuitable for segmentation, the parent layer of the current layer may be used as the layer to be processed instead. The output data of the parent layer is the input data of the current layer.
In some alternative implementations, a supported-split list may be stored in the judgment module, and the list may include multiple layer types. If the layer type of the current layer is in the supported-split list, the current layer is judged to be suitable for segmentation; if not, the current layer is judged to be unsuitable for segmentation.
In some optional examples, all layers of the model may be traversed, and if the traversed current layer is the output layer of the model, or the judgment module determines that the current layer is suitable for segmentation, the current layer may be taken as a root layer. By traversing all layers of the model in this way, a root-layer set, denoted root nodes, is obtained. The root nodes set can then be traversed, judging whether each traversed root layer is suitable for segmentation.
If the judgment result is yes, the current root layer is taken as a layer to be processed, i.e., a collection layer, denoted base_merge_node. If the judgment result is no, it is further judged whether the parent layer of the current root layer is suitable for segmentation; if so, the parent layer of the current root layer is taken as a layer to be processed, i.e., a collection layer, denoted base_merge_node. In this way, all base_merge_node layers, denoted base_merge_nodes, form the set of layers to be processed, i.e., the collection layer set.
In the embodiment of the present application, by first acquiring the root-layer set and then judging whether each root layer in it is suitable for segmentation, the acquired layers to be processed can be ensured to be accurate root layers, and the accuracy of the subsequent division of layers in the model can be improved.
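The collection step just described might look like the following sketch; the layer attributes (type, parents) and the contents of the supported-split list are assumptions for illustration.

    SUPPORTED_SPLIT_TYPES = {"output", "convolution"}   # the supported-split list

    def is_splittable(layer) -> bool:
        # a layer is suitable for segmentation if its type is in the supported-split list
        return layer.type in SUPPORTED_SPLIT_TYPES

    def collect_base_merge_nodes(model_layers, output_layer):
        # root layers: the output layer, plus any layer judged suitable for segmentation
        root_nodes = [l for l in model_layers if l is output_layer or is_splittable(l)]
        base_merge_nodes = []
        for root in root_nodes:
            if is_splittable(root):
                base_merge_nodes.append(root)      # the root layer itself
            else:
                # otherwise fall back to its parent layers that are suitable
                base_merge_nodes.extend(p for p in root.parents if is_splittable(p))
        return base_merge_nodes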
502: an index layer is determined from the set of layers to be processed.
It will be appreciated that the index layer may be any layer in the set of layers to be processed, denoted mp.
In some alternative examples, the set of layers to be processed may be copied into a replication layer set, i.e., a new set denoted rest_merge_modes, and the index layer mp may be determined from the new set rest_merge_modes.
In some alternative examples, a matching layer set may be defined to record which layers have completed splitting (i.e., which layers have been divided into some layer set), denoted match nodes, and the matching layer set may be initialized to be empty. Likewise, a valid-subgraph set may be defined to record which layer sets have completed layer splitting (i.e., which groups of layers have been divided into the same layer set), denoted valid subgraphs, and it may also be initialized to be empty.
503: whether the index layer belongs to the matching layer set is judged, if not, the step is switched to step 504, otherwise, the step is switched to step 502.
In some alternative examples, the layer identifier of the index layer may be matched against the layer identifiers in the matching layer set. When the layer identifier of the index layer is not among the layer identifiers in the matching layer set, it may be determined that the index layer does not belong to the matching layer set, and step 504 is performed, i.e., the target layer set is determined based on the index layer and all parent layers of the index layer. When the layer identifier of the index layer is among the layer identifiers in the matching layer set, it may be determined that the index layer belongs to the matching layer set, and the process goes to step 502, i.e., an index layer is re-determined from the set of layers to be processed and the judgment is repeated until the re-determined index layer does not belong to the matching layer set.
504: the target layer set is determined based on the index layer and all parent layers of the index layer.
It will be appreciated that, for a segmented model, the input data may be sliced in the H direction or the W direction, where H represents the vertical direction of the input data and W represents the horizontal direction. Such slicing may require overlap, which causes the model to process part of the input data repeatedly; for example, for a convolution layer whose convolution kernel is larger than its stride, the sliced input data must overlap, and the output data will then overlap as well. The ratio of the computation spent repeatedly processing the overlapped data to the total computation is defined here as the redundancy rate, i.e., the overlap, and the maximum allowed redundancy rate, i.e., the redundancy rate threshold, is denoted overlap_thr; optionally, the redundancy rate threshold may be set to 0.6 in advance.
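The definition can be written out as a small sketch; the halo estimate for a convolution layer is an illustrative assumption, not a formula given by the application.

    def redundancy_rate(repeated_compute: float, total_compute: float) -> float:
        # overlap = computation spent reprocessing overlapped data / total computation
        return repeated_compute / total_compute

    def conv_halo(kernel: int, stride: int) -> int:
        # rows (H direction) or columns (W direction) that adjacent slices both need
        # when the convolution kernel is larger than the stride
        return max(kernel - stride, 0)

    overlap_thr = 0.6                                # maximum allowed redundancy rate
    assert redundancy_rate(0.2, 1.0) < overlap_thr   # e.g. 0.2 < 0.6: within the limit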
It is understood that the target layer set refers to a plurality of connected layers in the model divided into one layer set, and that layer set may be put into the valid subgraphs set. When the processor loads the parameters corresponding to all layers in the set, the performance of processing the input data of the target layer set and the redundancy rate of repeatedly processed data both meet the requirements: for example, the performance index when processing the input data of the target layer set is higher than the performance index threshold, and the redundancy rate of repeatedly processed data is lower than the redundancy rate threshold.
In some alternative examples, a subgraph may be defined to record which layers may be divided into the same layer set, denoted subgraph, and the index layer mp determined from the set of layers to be processed is used as the first element of the subgraph, i.e., its first layer. A candidate layer set may also be defined to record which layers are candidate layers of the index layer, denoted candidate nodes; the candidate layers of the index layer may be all parent layers of the index layer, where the output data of those parent layers is the input data of the index layer.
It will be appreciated that all candidate layers in the candidate layer set candidate nodes may be traversed in a loop until candidate nodes is empty, yielding the target layer set.
A specific way of looping over all layers in the candidate layer set to obtain the target layer set is described in detail below. Fig. 6 shows a flow chart of a method for determining the target layer set; as shown in fig. 6, the method includes:
601: and determining a candidate layer to be processed from the candidate layer set of the index layer.
It will be appreciated that the candidate layer to be processed may be any layer in the candidate layer set, denoted candi node.
602: judge whether the candidate layer to be processed satisfies the division condition.
If the candidate layer to be processed satisfies the division condition, go to step 603; otherwise, go to step 601.
It is understood that the division condition may be that the candidate layer candi node is not in the matching layer set match nodes, not in the set of layers to be processed base_merge_nodes, and not in the subgraph.
In some alternative examples, the layer identifier of the candidate layer to be processed may be matched against the layer identifiers in the matching layer set, the layer identifiers in the set of layers to be processed, and the layer identifiers in the subgraph.
In some optional examples, when the layer identifier of the candidate layer to be processed is not among the layer identifiers in the matching layer set, not among the layer identifiers in the set of layers to be processed, and not among the layer identifiers in the subgraph, it may be determined that the candidate layer to be processed satisfies the division condition, and step 603 is performed, i.e., the index layer and the candidate layer to be processed are placed into the temporary subgraph.
In other optional examples, when the layer identifier of the candidate layer to be processed is among the layer identifiers in the matching layer set, or among the layer identifiers in the set of layers to be processed, or among the layer identifiers in the subgraph, it may be determined that the candidate layer to be processed does not satisfy the division condition, and step 601 is performed, i.e., a candidate layer to be processed is re-determined from the candidate layer set and the judgment is repeated until the re-determined candidate layer to be processed satisfies the division condition.
603: place the index layer and the candidate layer to be processed into a temporary subgraph, and judge whether the temporary subgraph satisfies the effective segmentation condition.
If the temporary subgraph satisfies the effective segmentation condition, go to step 604; otherwise, go to step 605.
It can be understood that the effective segmentation condition may require that, when the processor loads the parameters corresponding to all layers in the temporary subgraph, the performance of processing the input data of the temporary subgraph and the redundancy rate of repeatedly processed data both meet the requirements: for example, the performance index when processing the input data of the temporary subgraph is higher than the performance index threshold, and the redundancy rate of repeatedly processed data is lower than the redundancy rate threshold.
In some alternative examples, a temporary subgraph may be defined to record which layers are provisionally divided into one layer set, denoted subgraph_tmp.
In some optional examples, the processor may be used to load the parameters corresponding to all layers in the temporary subgraph, obtaining the performance index when processing the input data of the temporary subgraph and the redundancy rate of repeatedly processed data. When the performance index is higher than the performance index threshold and the redundancy rate is lower than the redundancy rate threshold, it may be determined that the temporary subgraph satisfies the effective segmentation condition, and the process goes to step 604: the candidate layer candi node is put into the subgraph, the parent layers of candi node are put into the candidate layer set candidate nodes, candi node is deleted from the candidate layer set, and step 601 is repeated until the candidate layer set is empty.
In other optional examples, when the performance index for the input data of the temporary subgraph is lower than the performance index threshold, or the redundancy rate of repeatedly processed data is higher than the redundancy rate threshold, it may be determined that the temporary subgraph does not satisfy the effective segmentation condition, and the process goes to step 605: the candidate layer to be processed is put into the replication layer set rest_merge_modes and deleted from the candidate layer set candidate nodes, and step 601 is repeated until the candidate layer set is empty.
604: put the candidate layer to be processed into the subgraph, put the parent layers of the candidate layer to be processed into the candidate layer set, delete the candidate layer to be processed from the candidate layer set, and go to step 601 until the candidate layer set is empty.
In some alternative examples, when it is determined that the temporary subgraph satisfies the effective segmentation condition, the candidate layer candi node may be formally placed into the subgraph and into the matching layer set, its parent layers may be placed into the candidate layer set candidate nodes, and candi node is deleted from the candidate layer set candidate nodes.
605: put the candidate layer to be processed into the replication layer set, and delete the candidate layer to be processed from the candidate layer set.
Put the candidate layer to be processed into the replication layer set, delete it from the candidate layer set, and repeat step 601 until the candidate layer set is empty.
In some alternative examples, when it is determined that the temporary subgraph does not satisfy the effective segmentation condition, the candidate layer candi node may be put into the replication layer set rest_merge_modes and deleted from the candidate layer set candidate nodes.
606: the subgraph is determined as a target layer set.
It is understood that the subgraph may be determined as a valid subgraph and put into the valid subgraphs set.
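Steps 601-606 can be consolidated into the following sketch; the identifiers follow the text (subgraph, candidate nodes, match nodes, rest_merge_modes), while parents and meets_valid_split are assumed helpers standing in for the model connectivity and for loading the temporary subgraph's parameters and checking the two thresholds.

    def determine_target_layer_set(mp, parents, match_nodes, base_merge_nodes,
                                   rest_merge_modes, meets_valid_split):
        subgraph = [mp]                              # the index layer is the first element
        candidate_nodes = list(parents(mp))          # all parent layers of the index layer
        while candidate_nodes:
            candi_node = candidate_nodes.pop(0)      # step 601
            if (candi_node in match_nodes or         # step 602: the division condition
                    candi_node in base_merge_nodes or
                    candi_node in subgraph):
                continue
            subgraph_tmp = subgraph + [candi_node]   # step 603: temporary subgraph
            if meets_valid_split(subgraph_tmp):      # performance and redundancy checks
                subgraph.append(candi_node)          # step 604
                match_nodes.add(candi_node)
                candidate_nodes.extend(parents(candi_node))
            else:
                rest_merge_modes.append(candi_node)  # step 605
        return subgraph                              # step 606: the target layer set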
The above model segmentation method is described in detail below with reference to a specific example; a schematic structure of a model is shown in fig. 7.
In some specific implementations, all layers of the model may be traversed, and the layers suitable for segmentation (e.g., output layers, convolution layers, etc.) are determined to form the collection layer set base_merge_nodes. The collection layer set base_merge_nodes may include 10 collection layers with layer identifiers 0-9.
If the traversed current layer is the layer with identifier 9, and the layer with identifier 9 is not in the matching layer set match nodes, the layer with identifier 9 in the collection layer set base_merge_nodes is taken as the index layer mp, and the index layer with identifier 9 is put into a temporary subgraph, as shown by the dashed box in fig. 7a. The processor is used to load the parameters corresponding to all layers in the temporary subgraph and to process its input data (i.e., the output data of the layer with identifier 8), obtaining the performance index when processing the input data and the redundancy rate of repeatedly processed data. When the performance index is higher than the performance index threshold and the redundancy rate is lower than the redundancy rate threshold (e.g., a redundancy rate of 0, lower than the threshold of 0.6), the index layer with identifier 9 may be formally put into the subgraph, and the parent layer of the layer with identifier 9, i.e., the layer with identifier 8, may be used as a candidate layer in the candidate layer set candidate nodes of the index layer with identifier 9.
Then, the candidate layer with identifier 8 is taken as the candidate layer to be processed candi node. When the candidate layer with identifier 8 is not in the matching layer set match nodes, not in the set of layers to be processed base_merge_nodes, and not in the subgraph, the index layer mp with identifier 9 and the candidate layer with identifier 8 are put into the temporary subgraph subgraph_tmp, as shown by the dashed box in fig. 7b. The processor is used to load the parameters corresponding to all layers in the temporary subgraph and to process its input data (i.e., the output data of the layer with identifier 7), obtaining the performance index when processing the input data and the redundancy rate of repeatedly processed data. When the performance index is higher than the performance index threshold and the redundancy rate is lower than the redundancy rate threshold (e.g., a redundancy rate of 0.2, lower than the threshold of 0.6), the candidate layer with identifier 8 may be formally put into the subgraph, the parent layer of the layer with identifier 8 (i.e., the layer with identifier 7) is added to the candidate layer set candidate nodes of the index layer with identifier 9, the candidate layer with identifier 8 is deleted from the candidate layer set candidate nodes, and the candidate layer with identifier 8 is put into the matching layer set match nodes.
Next, the candidate layer with identifier 7 may be taken as the candidate layer to be processed candi node. When the candidate layer with identifier 7 is not in the matching layer set match nodes, not in the set of layers to be processed base_merge_nodes, and not in the subgraph, the index layer mp with identifier 9, the candidate layer with identifier 8, and the candidate layer with identifier 7 are placed into the temporary subgraph subgraph_tmp, as shown by the dashed box in fig. 7c. The processor is used to load the parameters corresponding to all layers in the temporary subgraph and to process its input data (i.e., the output data of the layer with identifier 6), obtaining the performance index when processing the input data and the redundancy rate of repeatedly processed data. When the performance index is lower than the performance index threshold, or the redundancy rate is higher than the redundancy rate threshold (e.g., a redundancy rate of 0.65, higher than the threshold of 0.6), the candidate layer with identifier 7 is put into the replication layer set rest_merge_modes and deleted from the candidate layer set candidate nodes. At this point the candidate layer set candidate nodes is empty, and a target layer set is obtained, namely the target layer set including the layers with identifiers 9 and 8, and this target layer set is put into the valid subgraphs set.
Meanwhile, the layer with identifier 7 can be re-used as the index layer mp, and the index layer with identifier 7 is put into a temporary subgraph, as shown by the dashed box in fig. 7d. The processor is used to load the parameters corresponding to all layers in the temporary subgraph and to process its input data (i.e., the output data of the layer with identifier 6), obtaining the performance index when processing the input data and the redundancy rate of repeatedly processed data. When the performance index is higher than the performance index threshold and the redundancy rate is lower than the redundancy rate threshold (e.g., a redundancy rate of 0, lower than the threshold of 0.6), the index layer mp with identifier 7 may be formally put into the subgraph, and the parent layer of the layer with identifier 7, i.e., the layer with identifier 6, may be used as a candidate layer in the candidate layer set candidate nodes of the index layer mp.
Then, the candidate layer with identifier 6 can be taken as the candidate layer to be processed candi node. When the candidate layer with identifier 6 is not in the matching layer set match nodes, not in the set of layers to be processed base_merge_nodes, and not in the subgraph, the index layer mp with identifier 7 and the candidate layer with identifier 6 are put into the temporary subgraph subgraph_tmp, as shown by the dashed box in fig. 7e. The processor is used to load the parameters corresponding to all layers in the temporary subgraph and to process its input data (i.e., the output data of the layer with identifier 5), obtaining the performance index when processing the input data and the redundancy rate of repeatedly processed data. When the performance index is higher than the performance index threshold and the redundancy rate is lower than the redundancy rate threshold (e.g., a redundancy rate of 0.2, lower than the threshold of 0.6), the candidate layer with identifier 6 may be formally put into the subgraph, the parent layer with identifier 5 is added to the candidate layer set candidate nodes, and the candidate layer with identifier 6 is deleted from the candidate layer set and put into the matching layer set match nodes.
Then, the candidate layer with identifier 5 can be taken as the candidate layer to be processed candi node. When the candidate layer with identifier 5 is not in the matching layer set match nodes, not in the set of layers to be processed base_merge_nodes, and not in the subgraph, the index layer mp with identifier 7, the candidate layer with identifier 6, and the candidate layer with identifier 5 are placed into the temporary subgraph subgraph_tmp, as shown by the dashed box in fig. 7f. The processor is used to load the parameters corresponding to all layers in the temporary subgraph and to process its input data (i.e., the output data of the layer with identifier 4), obtaining the performance index when processing the input data and the redundancy rate of repeatedly processed data. When the performance index is higher than the performance index threshold and the redundancy rate is lower than the redundancy rate threshold (e.g., a redundancy rate of 0.5, lower than the threshold of 0.6), the candidate layer with identifier 5 may be formally put into the subgraph, the parent layer with identifier 4 is added to the candidate layer set candidate nodes, and the candidate layer with identifier 5 is deleted from the candidate layer set and put into the matching layer set match nodes.
Then, the candidate layer with identifier 4 may be taken as the candidate layer to be processed candi node. When the candidate layer with identifier 4 is not in the matching layer set match nodes, not in the set of layers to be processed base_merge_nodes, and not in the subgraph, the index layer mp with identifier 7, the candidate layers with identifiers 6 and 5, and the candidate layer with identifier 4 are placed into the temporary subgraph subgraph_tmp, as shown by the dashed box in fig. 7g. The processor is used to load the parameters corresponding to all layers in the temporary subgraph and to process its input data (i.e., the output data of the layer with identifier 3), obtaining the performance index when processing the input data and the redundancy rate of repeatedly processed data. When the performance index is lower than the performance index threshold, or the redundancy rate is higher than the redundancy rate threshold (e.g., a redundancy rate of 0.7, higher than the threshold of 0.6), the candidate layer with identifier 4 is put into the replication layer set rest_merge_modes and deleted from the candidate layer set candidate nodes.
Meanwhile, the layer with identifier 4 can be re-used as the index layer mp, and the index layer with identifier 4 is put into a temporary subgraph, as shown by the dashed box in fig. 7h. The processor is used to load the parameters corresponding to all layers in the temporary subgraph and to process its input data (i.e., the output data of the layer with identifier 3), obtaining the performance index when processing the input data and the redundancy rate of repeatedly processed data. When the performance index is higher than the performance index threshold and the redundancy rate is lower than the redundancy rate threshold (e.g., a redundancy rate of 0.3, lower than the threshold of 0.6), the index layer with identifier 4 is formally put into the subgraph, and the parent layer of the layer with identifier 4, i.e., the layer with identifier 3, is used as a candidate layer in the candidate layer set candidate nodes of the index layer with identifier 4.
In addition, when the candidate layer with identifier 6 was formally placed into the subgraph, another parent layer of the layer with identifier 6 (i.e., the layer with identifier 2) may be added to the candidate layer set candidate nodes of the index layer with identifier 7.
Next, the candidate layer with identifier 2 may be taken as the candidate layer to be processed candi node. When the candidate layer with identifier 2 is not in the matching layer set match nodes, not in the set of layers to be processed base_merge_nodes, and not in the subgraph, the index layer mp with identifier 7, the candidate layers with identifiers 6 and 5, and the candidate layer with identifier 2 are put into the temporary subgraph subgraph_tmp, as shown by the dashed box in fig. 7i. The processor is used to load the parameters corresponding to all layers in the temporary subgraph and to process its input data (i.e., the output data of the layer with identifier 1 and the output data of the layer with identifier 4), obtaining the performance index when processing the input data and the redundancy rate of repeatedly processed data. When the performance index is higher than the performance index threshold and the redundancy rate is lower than the redundancy rate threshold (e.g., a redundancy rate of 0.53, lower than the threshold of 0.6), the candidate layer with identifier 2 is formally placed into the subgraph, the parent layer of the layer with identifier 2 (i.e., the layer with identifier 1) is added to the candidate layer set candidate nodes of the index layer with identifier 7, the candidate layer with identifier 2 is deleted from the candidate layer set candidate nodes, and the candidate layer with identifier 2 is put into the matching layer set match nodes.
Then, the candidate layer with the layer identifier of 1 may be used as the candidate layer candi_node to be processed. When the candidate layer candi_node with the layer identifier of 1 is not in the matching layer set match_nodes, not in the to-be-processed layer set base_merge_nodes, and not in the subgraph, the index layer mp with the layer identifier of 7, the candidate layers with the layer identifiers of 6, 5 and 2, and the candidate layer to be processed with the layer identifier of 1 are placed in the temporary subgraph subgraph_tmp, as shown by the dotted-line frame in fig. 7j, and the processor is used to load the parameters corresponding to all layers in the temporary subgraph and to process the input data of the temporary subgraph, so as to obtain the performance index when the input data is processed and the redundancy rate when part of the data is repeatedly processed. When the performance index when the input data is processed is higher than the performance index threshold and the redundancy rate when the data is repeatedly processed is lower than the redundancy rate threshold (for example, a redundancy rate of 0.53, lower than the redundancy rate threshold of 0.6), the candidate layer to be processed with the layer identifier of 1 is formally placed into the subgraph, the parent layer of the layer with the layer identifier of 1 (i.e., the layer with the layer identifier of 0) is added to the candidate layer set candidate_nodes of the index layer with the layer identifier of 7, the candidate layer with the layer identifier of 1 is deleted from the candidate layer set candidate_nodes, and the candidate layer with the layer identifier of 1 is placed into the matching layer set match_nodes.
Then, the candidate layer with the layer identifier of 0 may be used as the candidate layer candi_node to be processed. When the candidate layer candi_node with the layer identifier of 0 is not in the matching layer set match_nodes, not in the to-be-processed layer set base_merge_nodes, and not in the subgraph, the index layer mp with the layer identifier of 7, the candidate layers with the layer identifiers of 6, 5, 2 and 1, and the candidate layer to be processed with the layer identifier of 0 are placed in the temporary subgraph subgraph_tmp, as shown by the dotted-line frame in fig. 7k, and the processor is used to load the parameters corresponding to all layers in the temporary subgraph and to process the input data of the temporary subgraph (i.e., part or all of the input data of the model), so as to obtain the performance index when the input data is processed and the redundancy rate when part of the data is repeatedly processed. When the performance index when the input data is processed is higher than the performance index threshold and the redundancy rate when the data is repeatedly processed is lower than the redundancy rate threshold (for example, a redundancy rate of 0.53, lower than the redundancy rate threshold of 0.6), the candidate layer to be processed with the layer identifier of 0 is formally placed into the subgraph, the candidate layer with the layer identifier of 0 is deleted from the candidate layer set candidate_nodes, and the candidate layer with the layer identifier of 0 is placed into the matching layer set match_nodes. At this time, the candidate layer set candidate_nodes is empty and a target layer set is obtained, namely the target layer set including the layer identifiers of 7, 6, 5, 2, 1 and 0, and the target layer set is recorded as a valid subgraph.
Then, the candidate layer with the layer identifier of 3 can be used as the candidate layer candi_node to be processed. When the candidate layer candi_node with the layer identifier of 3 is not in the matching layer set match_nodes, not in the to-be-processed layer set base_merge_nodes, and not in the subgraph, the index layer mp with the layer identifier of 4 and the candidate layer to be processed with the layer identifier of 3 are placed in the temporary subgraph subgraph_tmp, as shown by the dotted-line frame in fig. 7l, and the processor is used to load the parameters corresponding to all layers in the temporary subgraph and to process the input data of the temporary subgraph (i.e., part or all of the input data of the model), so as to obtain the performance index when the input data is processed and the redundancy rate when part of the data is repeatedly processed. When the performance index when the input data is processed is higher than the performance index threshold and the redundancy rate when the data is repeatedly processed is lower than the redundancy rate threshold (for example, a redundancy rate of 0.4, lower than the redundancy rate threshold of 0.6), the candidate layer to be processed with the layer identifier of 3 is formally placed into the subgraph, the candidate layer with the layer identifier of 3 is deleted from the candidate layer set candidate_nodes, and the candidate layer with the layer identifier of 3 is placed into the matching layer set match_nodes. At this time, the candidate layer set candidate_nodes is empty and a target layer set is obtained, namely the target layer set including the layer identifiers of 4 and 3, and the target layer set is recorded as a valid subgraph.
In this way, a plurality of valid subgraphs can be obtained, namely the target layer set including the layer identifiers of 0, 1, 2, 5, 6 and 7, the target layer set including the layer identifiers of 3 and 4, and the target layer set including the layer identifiers of 8 and 9. The parameters corresponding to all layers in each target layer set can then be divided into one parameter set, thereby obtaining a plurality of parameter sets.
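Taken together, the steps illustrated in figs. 7g to 7l can be read as one greedy loop over candidate layers. The Python sketch below is a reconstruction under assumptions: the measure callable, the parents mapping and the seeding of base_merge_nodes with the model's output layer are hypothetical stand-ins for the embodiment's internal structures, and, unlike fig. 7h, the sketch does not separately measure a new index layer on its own before growing it.

from collections import deque

def build_valid_subgraphs(output_layers, parents, measure,
                          perf_threshold, redundancy_threshold=0.6):
    # output_layers: layer identifiers seeding the search (e.g. [7] above)
    # parents:       mapping from a layer identifier to its parent layers
    # measure:       hypothetical callable that loads the parameters of all
    #                layers in a temporary subgraph, processes its input data,
    #                and returns (performance_index, redundancy_rate)
    match_nodes = set()                      # layers already merged somewhere
    base_merge_nodes = deque(output_layers)  # pending index layers mp
    valid_subgraphs = []
    while base_merge_nodes:
        mp = base_merge_nodes.popleft()
        if mp in match_nodes:
            continue
        subgraph = [mp]                      # the subgraph grown around mp
        candidate_nodes = deque(parents.get(mp, []))
        while candidate_nodes:
            candi_node = candidate_nodes.popleft()
            if (candi_node in match_nodes or candi_node in base_merge_nodes
                    or candi_node in subgraph):
                continue
            perf, redundancy = measure(tuple(subgraph + [candi_node]))
            if perf > perf_threshold and redundancy < redundancy_threshold:
                subgraph.append(candi_node)          # formally merge the layer
                match_nodes.add(candi_node)
                candidate_nodes.extend(parents.get(candi_node, []))
            else:
                base_merge_nodes.append(candi_node)  # retry later as a new mp
        match_nodes.add(mp)
        valid_subgraphs.append(sorted(subgraph))     # one target layer set
    return valid_subgraphs

With a parents mapping describing the example network of this walkthrough, the loop reproduces the target layer sets {0, 1, 2, 5, 6, 7} and {3, 4} derived above.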
In the data processing method according to the embodiment of the present application, the segmentation information for splitting the parameters in the model is set in advance; for example, connected processing layers in the model are divided into one processing layer set, and the parameters corresponding to all processing layers in that set are divided into one parameter set, so as to obtain a plurality of parameter sets. The plurality of parameter sets are parameter sets capable of processing data in parallel. When the multi-core NPU processes data, the cores of the multi-core NPU simultaneously load the different parameter sets corresponding to different processing layer sets, respectively process the input data of those processing layer sets, and respectively store the parameters, input data and output data of each processing layer set in the internal storage space of its core. Therefore, during processing, data is read directly from the internal memory of the core rather than from the external memory over a bus. This reduces the number of accesses to the external memory while relieving the storage pressure on a single core, improving data processing efficiency, reducing data read latency, and lowering power consumption.
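As a rough illustration of the runtime behaviour described in the preceding paragraph, the following Python sketch dispatches one parameter set per core. The core objects and their load and process methods are hypothetical placeholders for an NPU driver interface that the embodiment does not specify; the sketch only shows the shape of the parallelism, with each core holding the parameters, input data and output data of its own processing layer set.

from concurrent.futures import ThreadPoolExecutor

def run_on_multicore_npu(cores, parameter_sets, inputs):
    # One (core, parameter set, input slice) triple per processing layer set;
    # after the initial load, parameters and data stay in the core's internal
    # storage, which is what limits accesses to the external memory.
    def run_one(core, params, data):
        core.load(params)          # hypothetical: parameter set -> core memory
        return core.process(data)  # hypothetical: run the layer set's layers

    with ThreadPoolExecutor(max_workers=len(cores)) as pool:
        futures = [pool.submit(run_one, c, p, d)
                   for c, p, d in zip(cores, parameter_sets, inputs)]
        return [f.result() for f in futures]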
The hardware structure of the electronic device is described below. As shown in fig. 8, fig. 8 shows a schematic diagram of the hardware structure of an electronic device. It is understood that the electronic device of the present application may be a server, a desktop computer, a handheld computer, a laptop computer, or the like; the structure of the electronic device will be described below taking a server as an example.
In one embodiment, a server may include one or more processors 801, system control logic 802 coupled to at least one of the processors 801, system memory 803 coupled to the system control logic 802, non-volatile memory (NVM) 804 coupled to the system control logic 802, and input/output (I/O) devices 805 and a network interface 806 coupled to the system control logic 802.
In some embodiments, the processor 801 may include one or more single-core or multi-core processors. In some embodiments, the processor 801 may include any combination of general-purpose and special-purpose processors (e.g., a graphics processor, an application processor, a baseband processor, etc.). In embodiments where the server employs an eNB (evolved Node B) or RAN (Radio Access Network) controller, the processor 801 may be configured to perform the methods of the corresponding embodiments.
In some embodiments, system control logic 802 may include any suitable interface controller to provide any suitable interface to at least one of processors 801 and/or any suitable device or component in communication with system control logic 802.
In some embodiments, system control logic 802 may include one or more memory controllers to provide an interface to system memory 803. The system memory 803 may be used to load and store data and/or instructions 8031. In some embodiments, the system memory 803 may include any suitable volatile memory, such as a suitable dynamic random access memory (DRAM).
The non-volatile memory (NVM) 804 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, the non-volatile memory (NVM) 804 may include any suitable non-volatile memory, such as flash memory, and/or any suitable non-volatile storage device, such as at least one of an HDD (Hard Disk Drive), a CD (Compact Disc) drive, and a DVD (Digital Versatile Disc) drive.
The non-volatile memory (NVM) 804 may include part of the storage resources of the apparatus on which the server is installed, or it may be accessible by, without necessarily being part of, the apparatus. For example, the non-volatile memory (NVM) 804 may be accessed over a network via the network interface 806.
In particular, the system memory 803 and the non-volatile memory (NVM) 804 may each include a temporary copy and a permanent copy of instructions. The instructions may include instructions that, when executed by at least one of the processors 801, cause the server to carry out the data processing methods mentioned in the embodiments of the present application. In some embodiments, the instructions, hardware, firmware, and/or software components thereof may additionally or alternatively be disposed in the system control logic 802, the network interface 806, and/or the processor 801.
The network interface 806 may include a transceiver to provide a radio interface for the server to communicate with any other suitable device (e.g., a front-end module, an antenna, etc.) over one or more networks. In some embodiments, the network interface 806 may be integrated with other components of the server. For example, the network interface 806 may be integrated with at least one of the processor 801, the system memory 803, the non-volatile memory (NVM) 804, and a firmware device (not shown) having instructions which, when executed by at least one of the processors 801, implement the data processing methods mentioned in the embodiments of the application.
The network interface 806 may further include any suitable hardware and/or firmware to provide a multiple-input multiple-output radio interface. For example, network interface 806 may be a network adapter, a wireless network adapter, a telephone modem, and/or a wireless modem.
In one embodiment, at least one of the processors 801 may be packaged together with logic for one or more controllers of the system control logic 802 to form a System In Package (SiP). In one embodiment, at least one of the processors 801 may be integrated on the same die with logic for one or more controllers of the system control logic 802 to form a system on a chip (SoC).
The server may further include input/output (I/O) devices 805. The I/O devices 805 may include a user interface to enable a user to interact with the server, and a peripheral component interface designed so that peripheral components can also interact with the server. In some embodiments, the server further comprises sensors for determining at least one of environmental conditions and location information associated with the server.
In some embodiments, the user interface may include, but is not limited to, a display (e.g., a liquid crystal display, a touch screen display, etc.), a speaker, a microphone, one or more cameras (e.g., still image cameras and/or video cameras), a flash (e.g., a light emitting diode flash), and a keyboard.
In some embodiments, the peripheral component interface may include, but is not limited to, a non-volatile memory port, an audio jack, and a power interface.
In some embodiments, the sensors may include, but are not limited to, gyroscopic sensors, accelerometers, proximity sensors, ambient light sensors, and positioning units. The positioning unit may also be part of the network interface 806 or interact with the network interface 806 to communicate with components of a positioning network, such as Global Positioning System (GPS) satellites.
While the foregoing describes a possible hardware configuration of an electronic device, it should be understood that the configuration illustrated in this embodiment of the present application does not constitute a specific limitation on the electronic device. In other embodiments of the application, the electronic device may include more or fewer components than illustrated, certain components may be combined, certain components may be split, or the components may be arranged differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
In the drawings, some structural or methodological features may be shown in a particular arrangement and/or order. However, it should be understood that such a particular arrangement and/or order may not be required; rather, in some embodiments, these features may be arranged in a manner and/or order different from that shown in the illustrative figures. Additionally, the inclusion of structural or methodological features in a particular figure does not imply that such features are required in all embodiments; in some embodiments these features may not be included, or may be combined with other features.
Embodiments of the disclosed mechanisms may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as a computer program or program code that is executed on a programmable system comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For the purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), microcontroller, application specific integrated circuit, or microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. Program code may also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in the present application are not limited in scope by any particular programming language. In either case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed over a network or through other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or tangible machine-readable storage used in transmitting information over the Internet in an electrical, optical, acoustical or other form of propagated signal (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
It should be noted that, in the embodiments of the present application, each unit/module mentioned in each device is a logical unit/module. Physically, one logical unit/module may be one physical unit/module, may be part of one physical unit/module, or may be implemented by a combination of multiple physical units/modules; the physical implementation of the logical unit/module itself is not what matters most, and the combination of functions implemented by these logical units/modules is the key to solving the technical problem posed by the present application. Furthermore, in order to highlight the innovative part of the present application, the above device embodiments do not introduce units/modules that are less closely related to solving the technical problem posed by the present application; this does not mean that the above device embodiments contain no other units/modules.
It should be noted that, in the examples and descriptions of this patent, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
While the application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the application.

Claims (10)

1. A data processing method for an electronic device, wherein the electronic device comprises a first processor, the first processor comprising a plurality of cores;
and the method comprises:
acquiring first input data of a first model;
based on segmentation information of the first model, segmenting a plurality of processing layers in the first model into a plurality of processing layer sets matching the segmentation information, wherein the processing layers in a same processing layer set meet an effective segmentation condition, the effective segmentation condition being related to the running performance of the first model;
and processing, with the core corresponding to each processing layer set, the part of the first input data that is required to be processed by that processing layer set.
2. The method according to claim 1, characterized in that the segmentation information is obtained by:
determining a first processing layer and a second processing layer from a plurality of processing layers in the first model, the first processing layer and the second processing layer being connected;
judging whether the first processing layer and the second processing layer meet the effective segmentation condition or not;
dividing the first processing layer and the second processing layer into a first processing layer set in response to the first processing layer and the second processing layer meeting the effective segmentation condition;
and dividing the first processing layer into a first processing layer set and the second processing layer into a second processing layer set in response to the first processing layer and the second processing layer not meeting the effective segmentation condition.
3. The method according to claim 2, wherein the method further comprises:
determining a third processing layer from a plurality of processing layers in the first model, the third processing layer being connected to the first processing layer;
judging whether the first processing layer, the second processing layer and the third processing layer meet the effective segmentation condition or not;
and dividing the first processing layer, the second processing layer and the third processing layer into the first processing layer set in response to the first processing layer, the second processing layer and the third processing layer meeting the effective segmentation condition.
4. The method according to claim 1, characterized in that the effective segmentation condition comprises at least one of:
loading, by the first processor, parameters corresponding to all processing layers in a same processing layer set, wherein, when the part of the first input data that needs to be processed by the same processing layer set is processed, a performance index is higher than a performance index threshold,
and loading, by the first processor, parameters corresponding to all processing layers in a same processing layer set, wherein, when the part of the first input data that needs to be processed by the same processing layer set is processed, a redundancy rate of repeatedly processing part of the data is lower than a redundancy rate threshold.
5. The method of claim 2, wherein the cores corresponding to the processing layer sets include a first core and a second core, and
the processing, with the core corresponding to each processing layer set, of the part of the first input data that is required to be processed by each processing layer set includes:
processing, with the first core, first sub-data in the first input data that is to be processed by the first processing layer set, and
processing, with the second core, second sub-data in the first input data that is to be processed by the first processing layer set.
6. The method of claim 5, wherein the processing, with the first core, of the first sub-data in the first input data that is to be processed by the first processing layer set includes:
loading, with the first core, a first parameter set corresponding to all layers in the first processing layer set and processing the first sub-data in the first input data; and
the processing, with the second core, of the second sub-data in the first input data that is to be processed by the first processing layer set includes:
loading, with the second core, a second parameter set corresponding to all layers in the second processing layer set and processing the second sub-data in the first input data.
7. The method of claim 5, wherein the method further comprises:
storing the first sub-data in a first internal storage space of the first core, and
storing the second sub-data in a second internal storage space of the second core.
8. The method according to any one of claims 1 to 7, wherein,
the first processor is a multi-core embedded neural network processor (NPU),
the first model is a neural network model,
and the first input data includes image data, text data, or audio data.
9. An electronic device, comprising: a memory for storing instructions to be executed by one or more processors of the electronic device; and a processor, which is one of the one or more processors of the electronic device, configured to perform the data processing method of any one of claims 1-8.
10. A readable storage medium having instructions stored thereon which, when executed on an electronic device, cause the electronic device to perform the data processing method of any one of claims 1-8.
CN202310945981.6A 2023-07-28 2023-07-28 Data processing method, electronic equipment and storage medium Pending CN116957027A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310945981.6A CN116957027A (en) 2023-07-28 2023-07-28 Data processing method, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310945981.6A CN116957027A (en) 2023-07-28 2023-07-28 Data processing method, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116957027A true CN116957027A (en) 2023-10-27

Family

ID=88446080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310945981.6A Pending CN116957027A (en) 2023-07-28 2023-07-28 Data processing method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116957027A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination