WO2022171027A1 - Model training method and device - Google Patents

Model training method and device

Info

Publication number
WO2022171027A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub
linear
linear operation
neural network
network model
Application number
PCT/CN2022/074940
Other languages
English (en)
French (fr)
Inventor
周彧聪 (Zhou Yucong)
钟钊 (Zhong Zhao)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022171027A1
Priority to US18/446,294 (US20230385642A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Definitions

  • The present application relates to the field of artificial intelligence, and in particular to a model training method and device.
  • Artificial intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that responds in a way similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning and decision-making.
  • To improve model accuracy, an over-parameterized training method can be used. Specifically, additional parameters and computations are introduced on top of the original model during training, which affects the training process of the model and improves its accuracy.
  • ACNet (Asymmetric Convolutional Network)
  • ACNet is an over-parameterized training method: during training, each original 3x3 convolution is replaced by the sum of three convolutions of sizes 3x3, 1x3 and 3x1.
  • However, ACNet has only one fixed over-parameterization form, so the improvement in model performance is limited.
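Because convolution is linear in its kernel, the three ACNet branches can be collapsed after training into one 3x3 convolution. To do this, center the 1x3 and 3x1 kernels inside a 3x3 grid and add them to the 3x3 kernel. The following is a minimal single-channel sketch in pure Python. It assumes stride 1 and "same" zero padding, and it ignores any batch normalization attached to the branches. The helper names `conv2d_same`, `pad_to` and `fuse_acnet` are illustrative and are not taken from ACNet or from this application.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_same(x: Matrix, k: Matrix) -> Matrix:
    """Single-channel cross-correlation, stride 1, zero padding so output shape == input shape."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2  # assumes odd kernel sizes
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    if 0 <= r < h and 0 <= c < w:  # out-of-range pixels count as zero
                        acc += k[a][b] * x[r][c]
            out[i][j] = acc
    return out


def pad_to(k: Matrix, th: int, tw: int) -> Matrix:
    """Zero-pad an odd-sized kernel so that it sits centered in a th x tw kernel."""
    kh, kw = len(k), len(k[0])
    top, left = (th - kh) // 2, (tw - kw) // 2
    out = [[0.0] * tw for _ in range(th)]
    for a in range(kh):
        for b in range(kw):
            out[top + a][left + b] = k[a][b]
    return out


def fuse_acnet(k33: Matrix, k13: Matrix, k31: Matrix) -> Matrix:
    """Collapse the 3x3, 1x3 and 3x1 branches into a single 3x3 kernel."""
    p13, p31 = pad_to(k13, 3, 3), pad_to(k31, 3, 3)
    return [[k33[a][b] + p13[a][b] + p31[a][b] for b in range(3)] for a in range(3)]


if __name__ == "__main__":
    x = [[float(3 * i + j) for j in range(4)] for i in range(4)]
    k33 = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0], [0.5, 0.0, 1.0]]
    k13 = [[1.0, 2.0, 3.0]]
    k31 = [[-1.0], [0.5], [2.0]]
    outs = [conv2d_same(x, k) for k in (k33, k13, k31)]
    branch_sum = [[sum(o[i][j] for o in outs) for j in range(4)] for i in range(4)]
    fused = conv2d_same(x, fuse_acnet(k33, k13, k31))
    assert all(abs(branch_sum[i][j] - fused[i][j]) < 1e-9 for i in range(4) for j in range(4))
```

The closing assertion checks that the summed branch outputs match a single convolution with the fused kernel, including at the borders. Both paths treat out-of-range pixels as zero, so the centered zero padding of the small kernels does not change any output.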
  • The present application provides a model training method, the method comprising:
  • The training device may replace some or all of the convolutional layers in the first neural network model with linear operations.
  • The replaced convolutional layer may be the first convolutional layer included in the first neural network model.
  • The first neural network model may include multiple convolutional layers, and the first convolutional layer is one of them.
  • The replaced convolutional layers may be multiple convolutional layers included in the first neural network model, and the first convolutional layer is one of these convolutional layers.
  • Each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a linear operation, and the linear operation is equivalent to a convolutional layer;
  • The term "equivalent" in the embodiments of the present application describes a relationship between two operation units: two operation units that differ in form obtain the same processing result when processing any identical data.
  • Moreover, one of the two operation units can be transformed into the form of the other through mathematical derivation.
  • Here, the sub-linear operations included in the linear operation can be transformed into the form of a convolutional layer through mathematical derivation, and the transformed convolutional layer and the linear operation obtain the same processing result when processing the same data;
  • A linear operation is composed of multiple sub-linear operations.
  • A sub-linear operation here refers to a basic linear operation, rather than an operation composed of multiple basic linear operations.
  • A linear operation here refers to the operation obtained by combining multiple basic linear operations.
  • The operation type of a sub-linear operation may be, but is not limited to, an addition operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation or a pooling operation.
  • That is, the linear operation may be a composite of at least one of the following sub-linear operations: addition, null, identity, convolution, batch normalization (BN) and pooling operations.
  • A connection relationship means that the output of one sub-linear operation is used as the input of another sub-linear operation (except for the sub-linear operation on the output side of the linear operation, whose output is used as the output of the linear operation);
  • The target neural network model is the neural network model with the highest model accuracy among the plurality of trained second neural network models.
  • Specifically, the model accuracy (also referred to as the validation accuracy) of each trained second neural network model can be obtained, and the second neural network model with the highest model accuracy is selected from the plurality of second neural network models based on these accuracies;
  • In this way, the convolutional layer in the neural network to be trained is replaced with a linear operation that is equivalent to a convolutional layer, and the replacement with the highest accuracy is selected from multiple replacement options, thereby improving the accuracy of the trained model.
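Selecting the target neural network model reduces to an argmax over validation accuracy. The following is a minimal sketch. It assumes every candidate second neural network model has already been trained. `candidates` maps a name for each replacement option to its trained model, and `validate` is a hypothetical caller-supplied function that returns validation accuracy. Neither is an interface defined by the application.

```python
from typing import Callable, Mapping, Tuple, TypeVar

M = TypeVar("M")


def select_target_model(candidates: Mapping[str, M],
                        validate: Callable[[M], float]) -> Tuple[str, float]:
    """Return (name, accuracy) of the trained candidate with the highest validation accuracy."""
    if not candidates:
        raise ValueError("need at least one trained second neural network model")
    scores = {name: validate(model) for name, model in candidates.items()}
    best = max(scores, key=scores.__getitem__)
    return best, scores[best]
```

Python's `max` returns the first maximal element it encounters, so ties resolve to the earliest candidate in the mapping's iteration order.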
  • The receptive field of the convolutional layer equivalent to the linear operation is less than or equal to the receptive field of the first convolutional layer.
  • For the linear operation to be equivalent to a convolutional layer, at least one of the multiple sub-linear operations included in the linear operation must be a convolution operation.
  • In the subsequent model inference process, so as not to reduce the inference speed or increase the resource consumption of the inference stage, the linear operation itself is not used for model inference. Instead, the convolutional layer equivalent to the linear operation (referred to as the second convolutional layer in subsequent embodiments) is used. It is therefore necessary to ensure that the receptive field of the convolutional layer equivalent to the linear operation is less than or equal to the receptive field of the first convolutional layer.
  • The linear operation includes multiple operation branches, and the input of each operation branch is the input of the linear operation; that is, each operation branch processes the input data of the linear operation.
  • Each operation branch includes at least one sub-linear operation in series, and the equivalent receptive field of the serial sub-linear operations is less than or equal to the receptive field of the first convolutional layer; or,
  • The linear operation includes one operation branch for processing the input data of the linear operation, the operation branch includes at least one sub-linear operation in series, and the equivalent receptive field of the serial sub-linear operations is less than or equal to the receptive field of the first convolutional layer.
  • A data path between the two endpoints of the linear operation can be an operation branch: its starting point is the input of the linear operation, and its end point is the output of the linear operation.
  • The linear operation may include multiple operation branches, each of which processes the input data of the linear operation. That is, the starting point of each branch is the input of the linear operation, and the input of the sub-linear operation closest to the input side in each branch is the input data of the linear operation.
  • Each operation branch includes at least one sub-linear operation in series.
  • The linear operation can be represented as a computational graph that defines the input source and output data flow of each sub-linear operation; any path from the input to the output in the computational graph can be defined as an operation branch of the linear operation;
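Representing the linear operation as a directed acyclic graph makes the notion of an operation branch concrete: every path from the input node to the output node is one branch. The sketch below enumerates those paths by depth-first search. The adjacency-list format, the pseudo-nodes `input` and `output`, and the node names (`conv3x3`, `bn`, `add` and so on) are illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical linear operation: a 3x3 conv -> BN branch, a 1x1 conv branch and an
# identity branch, all summed by one addition node.
EXAMPLE_GRAPH: Dict[str, List[str]] = {
    "input": ["conv3x3", "conv1x1", "identity"],
    "conv3x3": ["bn"],
    "bn": ["add"],
    "conv1x1": ["add"],
    "identity": ["add"],
    "add": ["output"],
}


def operation_branches(graph: Dict[str, List[str]],
                       src: str = "input", dst: str = "output") -> List[List[str]]:
    """Return every src-to-dst path of a DAG given as node -> list of successors."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == dst:
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            walk(nxt, path + [nxt])

    walk(src, [src])
    return paths


if __name__ == "__main__":
    for branch in operation_branches(EXAMPLE_GRAPH):
        print(" -> ".join(branch))
```

Paths are returned in the order in which successors are listed, so the example yields the conv3x3/BN branch first, then the 1x1 branch, then the identity branch.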
  • For example, if an operation branch contains a convolution operation whose receptive field is k, together with a sum operation and a BN operation whose receptive field is 1, the equivalent receptive field of the operation branch is k.
  • Here, an equivalent receptive field of k means that each output of the operation branch is affected by kxk inputs;
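For stride-1 operations without dilation (an assumption), receptive fields along a serial branch add up: each kxk operation widens the field by k-1. A kxk convolution followed by BN and a sum therefore gives k, as in the example above. Below is a minimal sketch of that rule and of the check against the first convolutional layer. The function names are hypothetical, and each sub-linear operation is described only by its kernel size, with BN, sum and identity counted as 1.

```python
from typing import List


def branch_receptive_field(kernel_sizes: List[int]) -> int:
    """Equivalent receptive field of serial stride-1 ops: each k x k op adds k - 1."""
    return 1 + sum(k - 1 for k in kernel_sizes)


def fits_first_conv(branches: List[List[int]], first_conv_k: int) -> bool:
    """Check that every branch's equivalent receptive field is <= the first conv layer's."""
    return all(branch_receptive_field(b) <= first_conv_k for b in branches)


if __name__ == "__main__":
    conv_bn_sum = [3, 1, 1]  # 3x3 conv, then BN, then sum
    print(branch_receptive_field(conv_bn_sum))  # 3
    print(fits_first_conv([[3, 1, 1], [1, 3], [1]], first_conv_k=3))  # True
```

Under this rule, two serial 3x3 convolutions give an equivalent receptive field of 5. A branch like that could not replace a 3x3 first convolutional layer without enlarging the inference-time kernel.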
  • The linear operation may include only one operation branch, which processes the input data of the linear operation and includes at least one sub-linear operation in series. In this case, the equivalent receptive field of that operation branch is less than or equal to the receptive field of the first convolutional layer.
  • The equivalent receptive field of at least one of the multiple parallel operation branches is equal to the receptive field of the first convolutional layer; or,
  • The equivalent receptive field of the only operation branch included in the linear operation is equal to the receptive field of the first convolutional layer.
  • When the equivalent receptive field of at least one of the multiple parallel operation branches is equal to the receptive field of the first convolutional layer, the receptive field of the linear operation is also equal to it. Consequently, the receptive field of the equivalent convolutional layer (described later as the second convolutional layer) equals that of the first convolutional layer, and the second convolutional layer can be used in the subsequent model inference process.
  • Because the receptive field of the second convolutional layer is the same as that of the first convolutional layer, the size specification of the model is consistent with that of the neural network model before replacement; that is, the speed and resource consumption of the inference stage remain unchanged.
  • When the receptive field of the second convolutional layer is smaller than that of the first convolutional layer, the replacement still increases the number of parameters trained, which improves the accuracy of the model.
  • The linear operation in each second neural network model is different from the first convolutional layer, and different second neural network models include different linear operations.
  • The target neural network model includes a trained target linear operation;
  • The method further includes:
  • The target linear operation includes multiple sub-linear operations, so using the target neural network model directly for model inference would reduce the inference speed and increase the resources required for model inference. Therefore, in this embodiment, a second convolutional layer equivalent to the trained target linear operation can be obtained, and the trained target linear operation in the target neural network model can be replaced with the second convolutional layer to obtain a third neural network model, which can be used for model inference;
  • Model inference refers to the actual data processing performed with the model when the model is applied.
  • The training device can complete both steps: obtaining the second convolutional layer equivalent to the trained target linear operation, and replacing the trained target linear operation in the target neural network model with that layer to obtain the third neural network model.
  • The training device can then directly feed back the third neural network model.
  • Specifically, the training device can send the third neural network model to a terminal device or a server, so that the terminal device or the server performs model inference based on the third neural network model.
  • Alternatively, the terminal device or the server may itself obtain the second convolutional layer equivalent to the trained target linear operation and replace the trained target linear operation in the target neural network model with the second convolutional layer to obtain the third neural network model;
  • The size of the second convolutional layer is the same as the size of the first convolutional layer.
  • In some cases, the calculated equivalent convolutional layer is smaller than the first convolutional layer.
  • In that case, a zero-padding operation is applied to the calculated equivalent convolutional layer to obtain a second convolutional layer of the same size as the first convolutional layer.
  • The method further includes:
  • Fusing each sub-linear operation into the adjacent subsequent sub-linear operation in the sequence, until the fusion of the last sub-linear operation in the sequence is completed, to obtain the second convolutional layer equivalent to the target linear operation.
  • If a sub-linear operation is directly connected to the input side of the linear operation, its fusion parameter is its own operation parameter;
  • If a sub-linear operation is not directly connected to the input side of the linear operation, its fusion parameter is obtained from one of two sources: the fusion parameters of the adjacent preceding sub-linear operations, or those fusion parameters together with its own operation parameter;
  • The multiple sub-linear operations can be fused, in the order in which they process data, into the adjacent subsequent sub-linear operations in the sequence, until the fusion of the last sub-linear operation (the sub-linear operation closest to the output) is completed.
  • The input of a sub-linear operation may only be determined after other sub-linear operations have completed their data processing and produced the corresponding outputs.
  • For example, if the output of operation A is the input of operation B, and the output of operation B is the input of operation C, then operation C can process data only after operations A and B have completed their data processing and produced their outputs. Accordingly, a sub-linear operation can perform its own parameter fusion only after the parameter fusion of its preceding sub-linear operations has been completed.
  • By contrast, the input of some sub-linear operations does not depend on certain other sub-linear operations completing their data processing.
  • For example, suppose the input of operation A1 is the input of the overall linear operation, the output of A1 is the input of operation A2, and the output of A2 is an input of operation B. Likewise, the input of operation C1 is the input of the overall linear operation, the output of C1 is the input of operation C2, and the output of C2 is also an input of operation B.
  • Because there is no strict timing constraint between A1 processing data and C1 processing data, fusing A1 into A2 can take place at the same time as, before, or after fusing C1 into C2.
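The ordering constraint described above is a topological order over the computational graph. The following is a minimal sketch using Python's standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) on the A1/A2/B/C1/C2 example. Grouping the ready nodes into stages shows which fusions have no ordering constraint between them. The function names are illustrative.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each key is a sub-linear operation; its value is the set of operations feeding it.
EXAMPLE_PREDS: Dict[str, Set[str]] = {
    "A2": {"A1"},
    "C2": {"C1"},
    "B": {"A2", "C2"},
}


def fusion_order(preds: Dict[str, Set[str]]) -> List[str]:
    """One valid fusion order: every operation appears after all operations feeding it."""
    return list(TopologicalSorter(preds).static_order())


def fusion_stages(preds: Dict[str, Set[str]]) -> List[List[str]]:
    """Group operations into stages; fusions within one stage are mutually unconstrained."""
    sorter = TopologicalSorter(preds)
    sorter.prepare()
    stages: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    print(fusion_order(EXAMPLE_PREDS))
    print(fusion_stages(EXAMPLE_PREDS))
```

For this graph, `fusion_stages` puts A1 and C1 in the first stage and A2 and C2 in the second. This matches the statement that the two chains may be fused in either order or concurrently, while B is always fused last.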
  • The trained target linear operation includes a first sub-linear operation and a second sub-linear operation that are adjacent, with the second sub-linear operation located after the first sub-linear operation in the sequence. The first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
  • Fusing each sub-linear operation into the adjacent subsequent sub-linear operation in the sequence includes:
  • Obtaining the fusion parameter of the first sub-linear operation. If the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter. If instead the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the sequence, the fusion parameter of the first sub-linear operation is obtained from the fusion parameter of the third sub-linear operation and the first operation parameter;
  • Obtaining the fusion parameter of the second sub-linear operation, where, if the second sub-linear operation is the last sub-linear operation in the sequence, the fusion parameter of the second sub-linear operation is used as the operation parameter of the second convolutional layer.
  • The first sub-linear operation and the second sub-linear operation may be any adjacent sub-linear operations in the trained target linear operation, with the second sub-linear operation located after the first sub-linear operation in the sequence. The first sub-linear operation includes a first operation parameter and processes its input data according to the first operation parameter in the manner corresponding to its operation type. The second sub-linear operation includes a second operation parameter and processes its input data according to the second operation parameter in the manner corresponding to its operation type. Fusing each sub-linear operation into the adjacent subsequent sub-linear operation in the sequence includes:
  • Obtaining the fusion parameter of the first sub-linear operation, where, if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter;
  • Obtaining the fusion parameter of the second sub-linear operation, where, if the second sub-linear operation is the last sub-linear operation in the sequence, the fusion parameter of the second sub-linear operation is used as the operation parameter of the second convolutional layer.
  • Fusion parameters are propagated and fused in this way up to the output node.
  • The fusion process is performed for each linear operation in the model. The result is a fully fused model whose structure is consistent with the original model, so the speed and resource consumption of the inference stage remain unchanged.
  • The models before and after fusion are mathematically equivalent, so the accuracy of the fused model is the same as that before fusion.
  • The linear operation includes multiple sub-linear operations, and the operation types of the multiple sub-linear operations include at least one of the following: sum operation, null operation, identity operation, convolution operation, batch normalization (BN) operation or pooling operation.
  • If the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by an inner product calculation on the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation. If the operation type of the second sub-linear operation is an addition operation, a pooling operation, an identity operation or a null operation, the fusion parameter of the second sub-linear operation is obtained by applying to the fusion parameter of the first sub-linear operation the calculation that corresponds to the second sub-linear operation's type.
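To make the fusion rules concrete, the sketch below restricts every convolution to 1x1. A fusion parameter then reduces to an affine map (W, b), meaning y = W x + b on a pixel's channel vector. Under that assumption:

- A convolution fuses by a matrix product, which plays the role of the "inner product" above.
- BN fuses by a per-channel scale and shift.
- An identity operation contributes an identity map.
- An addition sums the fusion parameters of its incoming branches.

Pooling and null operations are omitted for brevity. This is an illustrative reduction, not the application's implementation; for kxk kernels, the convolution step would compose spatial kernels rather than plain matrices. All helper names and example values are hypothetical.

```python
import math
from typing import List, Tuple

Mat = List[List[float]]
Vec = List[float]
Affine = Tuple[Mat, Vec]  # fusion parameter (W, b), meaning y = W @ x + b


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a: Mat, v: Vec) -> Vec:
    return [sum(row[k] * v[k] for k in range(len(v))) for row in a]


def run_fused(p: Affine, x: Vec) -> Vec:
    """Apply the single fused (second) convolutional layer to one pixel's channel vector."""
    w, b = p
    return [y + c for y, c in zip(matvec(w, x), b)]


def identity(n: int) -> Affine:
    """Identity operation on n channels: W = I, b = 0."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)], [0.0] * n


def fuse_conv1x1(prev: Affine, w: Mat, bias: Vec) -> Affine:
    """Convolution: matrix product of its weights with the preceding fusion parameter."""
    pw, pb = prev
    return matmul(w, pw), [y + c for y, c in zip(matvec(w, pb), bias)]


def fuse_bn(prev: Affine, gamma: Vec, beta: Vec, mean: Vec, var: Vec,
            eps: float = 1e-5) -> Affine:
    """BN: scale each output channel by gamma / sqrt(var + eps), then shift."""
    pw, pb = prev
    s = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    w = [[s[i] * x for x in row] for i, row in enumerate(pw)]
    b = [s[i] * (pb[i] - mean[i]) + beta[i] for i in range(len(pb))]
    return w, b


def fuse_add(*branches: Affine) -> Affine:
    """Addition: element-wise sum of the incoming branches' fusion parameters."""
    ws, bs = zip(*branches)
    rows, cols = len(ws[0]), len(ws[0][0])
    w = [[sum(m[i][j] for m in ws) for j in range(cols)] for i in range(rows)]
    return w, [sum(vals) for vals in zip(*bs)]


if __name__ == "__main__":
    # Hypothetical target linear operation on 2 channels:
    #   branch 1: conv1 -> BN; branch 2: identity; then add; then conv2.
    w1, b1 = [[1.0, 2.0], [0.0, 1.0]], [0.5, -1.0]
    w2, b2 = [[1.0, -1.0], [0.5, 2.0]], [0.0, 0.25]
    f_conv1: Affine = (w1, b1)  # input-side op: fusion parameter = own parameter
    f_bn = fuse_bn(f_conv1, [2.0, 1.0], [0.0, 1.0], [1.0, 0.0], [3.0, 0.0])
    f_add = fuse_add(f_bn, identity(2))
    second_conv = fuse_conv1x1(f_add, w2, b2)  # last op: parameters of the second conv layer
    print(second_conv)
```

Each `fuse_*` call consumes only the fusion parameter of the operation(s) feeding it. The calls can therefore be made in any topological order of the graph, and the parameter returned for the last operation is the second convolutional layer.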
  • The present application provides a model training method, the method comprising:
  • The first neural network model includes a first convolutional layer, and the first neural network model is used to achieve a target task;
  • A target linear operation for replacing the first convolutional layer is determined based on at least one of the following: the network structure of the first neural network model, the target task, and the position of the first convolutional layer in the first neural network model. The target linear operation is equivalent to a convolutional layer;
  • In this way, different linear operations can be selected for neural network models with different network structures, for neural network models that achieve different target tasks, and for convolutional layers at different positions in a neural network model, so that the model obtained by training the replaced neural network model has higher accuracy.
  • The target linear operation may be determined based on the network structure of the first neural network model and/or the position of the first convolutional layer in the first neural network model. Specifically, the structure of the target linear operation may be determined according to the network structure of the first neural network model and the position of the first convolutional layer in it. The network structure of the first neural network model may refer to the number of sub-network layers it includes, the types of those layers and the connection relationships among them. The structure of the target linear operation may likewise refer to the number of sub-linear operations it includes, the types of those operations and the connection relationships among them.
  • For example, a model search method can be used to replace the convolutional layers of neural network models with different network structures with linear operations and to train the replaced neural network models. This determines, for the network structure of each neural network model, the optimal or better linear operation corresponding to each convolutional layer, where an optimal or better linear operation is one for which training the replaced neural network model yields higher model accuracy.
  • After the first neural network model is obtained, a neural network model whose network structure is identical or similar to that of the first neural network model can be selected from the pre-searched network structures. The linear operation corresponding to a convolutional layer in that identical or similar neural network model is then determined as the target linear operation, where that convolutional layer's relative position in the identical or similar model is the same as or similar to the first convolutional layer's relative position in the first neural network model;
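The application does not specify how "identical or similar" structures are matched. The sketch below shows one possible approach. It assumes each pre-searched result is stored as a record holding four fields: a coarse layer-type signature, a task, the convolutional layer's relative depth, and the best linear operation found. The record format, the position-wise similarity score, the preference for same-task records and the tie-break on relative position are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class SearchRecord:
    """Hypothetical record produced by an earlier model search."""
    layer_types: Tuple[str, ...]  # coarse structure signature of the searched model
    task: str                     # target task the searched model achieved
    rel_position: float           # depth of the searched conv layer, scaled to [0, 1]
    linear_op: str                # identifier of the best linear operation found


def structure_similarity(a: Tuple[str, ...], b: Tuple[str, ...]) -> float:
    """Fraction of aligned positions with the same layer type (1.0 = identical)."""
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def pick_linear_op(records: Sequence[SearchRecord], layer_types: Tuple[str, ...],
                   task: str, rel_position: float) -> Optional[str]:
    """Prefer same-task records, then the most similar structure, then the closest position."""
    pool = [r for r in records if r.task == task] or list(records)
    if not pool:
        return None
    best = max(pool, key=lambda r: (structure_similarity(r.layer_types, layer_types),
                                    -abs(r.rel_position - rel_position)))
    return best.linear_op
```

If no record shares the target task, the sketch falls back to all records and matches on structure alone. This mirrors the structure-only variant described above.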
  • The target linear operation can also be determined based on the network structure of the first neural network model together with the target task it achieves, similarly to the determination based on the network structure alone. A model search method can be used to replace with linear operations the convolutional layers of neural network models that have different network structures and achieve different target tasks, and then to train the replaced neural network models. This determines the optimal or better linear operation corresponding to each convolutional layer in the network structure of each neural network model, where an optimal or better linear operation is one for which training the replaced neural network model yields higher model accuracy;
  • The target linear operation can also be determined based on the target task achieved by the first neural network model, again similarly to the determination based on the network structure. A model search method can be used to replace with linear operations the convolutional layers of neural network models that achieve different target tasks, and then to train the replaced neural network models. This determines the optimal or better linear operation corresponding to each convolutional layer in the network structure of each neural network model, where an optimal or better linear operation is one for which training the replaced neural network model yields higher model accuracy;
  • The above ways of determining the target linear operation from the network structure of the first neural network model and/or the target task are merely illustrative, and other ways can be used as long as the replaced first neural network model (that is, the second neural network model) has high model accuracy. The specific structure of the target linear operation and the way it is determined are not limited.
  • In this way, the convolutional layer in the neural network to be trained is replaced with a target linear operation whose structure is determined according to the structure of the first neural network model and/or the target task.
  • As a result, the linear operation used to replace the convolutional layer is better suited to the first neural network model and more flexible: different linear operations can be designed for different model structures and task types, which improves the accuracy of the trained model.
  • The target linear operation includes multiple sub-linear operations and M operation branches, and the input of each operation branch is the input of the target linear operation. The M operation branches satisfy at least one of the following conditions:
  • The input of at least one of the multiple sub-linear operations included in the M operation branches is the output of several of the multiple sub-linear operations;
  • At least two of the M operation branches include different numbers of sub-linear operations; or,
  • At least two of the M operation branches include sub-linear operations of different operation types.
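The three conditions can be checked mechanically on a graph description of the target linear operation. The following is a minimal sketch. It assumes the operation is given as two things: an adjacency list running between the pseudo-nodes `input` and `output`, and a map from each sub-linear operation to its type. Condition 3 is read here as the branches' multisets of operation types differing, which is one interpretation of the text. All names are illustrative.

```python
from typing import Dict, List, Tuple


def all_branches(succ: Dict[str, List[str]], node: str = "input") -> List[List[str]]:
    """Every path from `node` to the pseudo-node "output" (one operation branch each)."""
    if node == "output":
        return [[node]]
    return [[node] + rest for nxt in succ.get(node, []) for rest in all_branches(succ, nxt)]


def m_branch_conditions(succ: Dict[str, List[str]],
                        op_type: Dict[str, str]) -> Tuple[bool, bool, bool]:
    """Return (merge, different_lengths, different_types) for the three listed conditions."""
    # Condition 1: some sub-linear op takes the outputs of several sub-linear ops.
    fan_in: Dict[str, int] = {}
    for src, dsts in succ.items():
        if src not in op_type:  # skip the "input" pseudo-node
            continue
        for dst in dsts:
            if dst in op_type:
                fan_in[dst] = fan_in.get(dst, 0) + 1
    merge = any(n >= 2 for n in fan_in.values())

    # Conditions 2 and 3: compare branches using only their real sub-linear ops.
    branches = [[n for n in path if n in op_type] for path in all_branches(succ)]
    different_lengths = len({len(b) for b in branches}) > 1
    different_types = len({tuple(sorted(op_type[n] for n in b)) for b in branches}) > 1
    return merge, different_lengths, different_types
```

For an ACNet-like operation (three parallel convolutions summed), only the first condition holds. For a convolution, BN and identity shortcut summed together, the second and third conditions hold but the first does not, because the addition receives only one sub-linear output besides the raw input.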
  • The structure of the target linear operation provided in this embodiment is more complex, which can improve the accuracy of the trained model.
  • The receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
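  • The target linear operation is different from the first convolutional layer.
  • When the target linear operation and the convolutional layer equivalent to it process the same data, the processing results obtained are the same.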
  • The target neural network model includes a trained target linear operation;
  • The method further includes:
  • The size of the second convolutional layer is the same as the size of the first convolutional layer.
  • The method further includes:
  • Fusing each sub-linear operation into the adjacent subsequent sub-linear operation in the sequence, until the fusion of the last sub-linear operation in the sequence is completed, to obtain the second convolutional layer equivalent to the target linear operation.
  • The trained target linear operation includes a first sub-linear operation and a second sub-linear operation that are adjacent, with the second sub-linear operation located after the first sub-linear operation in the sequence. The first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
  • Fusing each sub-linear operation into the adjacent subsequent sub-linear operation in the sequence includes:
  • Obtaining the fusion parameter of the first sub-linear operation. If the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter. If instead the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the sequence, the fusion parameter of the first sub-linear operation is obtained from the fusion parameter of the third sub-linear operation and the first operation parameter;
  • Obtaining the fusion parameter of the second sub-linear operation, where, if the second sub-linear operation is the last sub-linear operation in the sequence, the fusion parameter of the second sub-linear operation is used as the operation parameter of the second convolutional layer.
  • The linear operation includes multiple sub-linear operations, and the operation types of the multiple sub-linear operations include at least one of the following: sum operation, null operation, identity operation, convolution operation, batch normalization (BN) operation or pooling operation.
  • If the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by an inner product calculation on the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation. If the operation type of the second sub-linear operation is an addition operation, a pooling operation, an identity operation or a null operation, the fusion parameter of the second sub-linear operation is obtained by applying to the fusion parameter of the first sub-linear operation the calculation that corresponds to the second sub-linear operation's type.
  • the present application provides a model training method, characterized in that the method includes:
  • the first neural network model includes a first convolutional layer
  • each second neural network model is an operation of replacing the first convolutional layer in the first neural network model with a target linear operation Obtained, the target linear operation is equivalent to a convolution layer, the target linear operation includes multiple sub-linear operations, the target linear operation includes M operation branches, and the input of each operation branch is the target linear operation , the M operation branches satisfy at least one of the following conditions:
  • the input of at least one of the sub-linear operations included in the M operation branches is the output of multiple sub-linear operations among those sub-linear operations;
  • at least two of the M operation branches include different numbers of sub-linear operations; or
  • at least two of the M operation branches include sub-linear operations of different operation types;
  • the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
  • the target linear operation is different from the first convolutional layer.
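  • the target linear operation being equivalent to a convolutional layer means that, when the target linear operation and that convolutional layer process the same input data, the processing results obtained are the same.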
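Equivalence in this sense can be spot-checked numerically. The sketch below, assuming single-channel 1-D signals with zero "same" padding, compares a hypothetical two-branch operation (a 3-tap convolution plus an identity branch) against one convolution whose kernel adds 1 at the center tap. `branch_op` and `is_equivalent` are illustrative names, and a random test is evidence of equivalence, not a proof.

```python
import random
from typing import Callable, List


def correlate_same(x: List[float], k: List[float]) -> List[float]:
    """Stride-1 cross-correlation with zero padding that keeps the input length (odd kernel)."""
    r, n = len(k) // 2, len(x)
    return [sum(k[j] * x[i + j - r] for j in range(len(k)) if 0 <= i + j - r < n)
            for i in range(n)]


def branch_op(x: List[float]) -> List[float]:
    """Hypothetical target linear operation: a 3-tap conv branch summed with an identity branch."""
    conv = correlate_same(x, [0.2, -0.5, 0.3])
    return [c + xi for c, xi in zip(conv, x)]


def is_equivalent(op: Callable[[List[float]], List[float]], kernel: List[float],
                  trials: int = 20, length: int = 16, tol: float = 1e-9) -> bool:
    """Return True if `op` and a single conv with `kernel` agree on random inputs."""
    rng = random.Random(0)
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(length)]
        if any(abs(a - b) > tol for a, b in zip(op(x), correlate_same(x, kernel))):
            return False
    return True


# The identity branch contributes +1 at the center tap of the equivalent kernel.
print(is_equivalent(branch_op, [0.2, 0.5, 0.3]))   # True
print(is_equivalent(branch_op, [0.2, -0.5, 0.3]))  # False
```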
  • the target neural network model includes a trained target linear operation
  • the method further includes:
  • the size of the second convolutional layer is the same as the size of the first convolutional layer.
  • the method further includes:
  • according to the order in which the multiple sub-linear operations included in the trained target linear operation process data, each sub-linear operation is fused into the adjacent sub-linear operation that follows it in the sequence, until fusion into the last sub-linear operation in the sequence is completed, to obtain the second convolutional layer equivalent to the target linear operation.
  • the trained target linear operation includes a first sub-linear operation and a second sub-linear operation that are adjacent, with the second sub-linear operation following the first sub-linear operation in the sequence; the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
  • fusing each sub-linear operation into the adjacent subsequent sub-linear operation in the sequence includes:
  • obtaining the fusion parameter of the first sub-linear operation, wherein if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter, and if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the sequence, the fusion parameter of the first sub-linear operation is obtained from the fusion parameter of the third sub-linear operation and the first operation parameter;
  • obtaining the fusion parameter of the second sub-linear operation from the fusion parameter of the first sub-linear operation and the second operation parameter; if the second sub-linear operation is the last sub-linear operation in the sequence, its fusion parameter is used as the operation parameter of the second convolutional layer.
  • the linear operation includes multiple sub-linear operations
  • the operation types of the multiple sub-linear operations include at least one of the following: addition (sum) operation, null operation, identity operation, convolution operation, batch normalization (BN) operation, or pooling operation.
  • if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by an inner-product calculation between the fusion parameter of the first sub-linear operation and the second operation parameter; if the operation type of the second sub-linear operation is an addition, pooling, identity, or null operation, the fusion parameter of the second sub-linear operation is obtained by applying the calculation corresponding to that operation type to the fusion parameter of the first sub-linear operation.
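For a BN operation that follows a convolution, the calculation can be sketched with the standard batch-norm folding identity: the kernel is scaled and the bias is rewritten. This assumes inference-mode BN with frozen running mean and variance, one channel, and 1-D kernels; the application does not spell out the formula, and all function names here are hypothetical.

```python
import math
from typing import List, Tuple


def conv_valid(x: List[float], w: List[float], b: float) -> List[float]:
    """Single-channel 1-D convolution (cross-correlation), stride 1, no padding, with bias."""
    return [sum(w[j] * x[i + j] for j in range(len(w))) + b
            for i in range(len(x) - len(w) + 1)]


def batch_norm(y: List[float], gamma: float, beta: float,
               mean: float, var: float, eps: float = 1e-5) -> List[float]:
    """Inference-mode BN for one channel: an affine map with frozen statistics."""
    scale = gamma / math.sqrt(var + eps)
    return [(v - mean) * scale + beta for v in y]


def fold_bn(w: List[float], b: float, gamma: float, beta: float,
            mean: float, var: float, eps: float = 1e-5) -> Tuple[List[float], float]:
    """Fusion parameters of conv followed by BN: scale the kernel, rewrite the bias."""
    scale = gamma / math.sqrt(var + eps)
    return [wi * scale for wi in w], (b - mean) * scale + beta


x = [0.3, -1.2, 2.0, 0.7, -0.4, 1.5]
w, b = [0.5, -1.0, 0.25], 0.1
bn = dict(gamma=1.5, beta=-0.2, mean=0.05, var=0.8)
reference = batch_norm(conv_valid(x, w, b), **bn)  # conv, then BN
fw, fb = fold_bn(w, b, **bn)
fused = conv_valid(x, fw, fb)                     # one folded conv
```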
  • the present application provides a model training method.
  • the method includes: obtaining a first neural network model, where the first neural network model includes a first convolution layer; and obtaining a plurality of second neural network models according to the first neural network model.
  • wherein each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation; the target linear operation is equivalent to a convolutional layer and includes multiple sub-linear operations and M operation branches, the input of each operation branch is the input of the target linear operation, and the M operation branches satisfy at least one of the following conditions:
  • the input of at least one of the sub-linear operations included in the M operation branches is the output of multiple sub-linear operations among them; at least two of the M operation branches include different numbers of sub-linear operations; or at least two of the M operation branches include sub-linear operations of different operation types.
  • the present application provides a model training device, the device comprising:
  • an acquisition module for acquiring a first neural network model, where the first neural network model includes a first convolution layer;
  • each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a linear operation, and the linear operation is equivalent to a convolutional layer;
  • a model training module configured to perform model training on the plurality of second neural network models to obtain a target neural network model, where the target neural network model is the neural network model with the highest model accuracy among the trained second neural network models.
  • the convolutional layer in the neural network to be trained is replaced with a linear operation that is equivalent to a convolutional layer, and the replacement yielding the highest accuracy is selected from multiple candidate replacements, thereby improving the accuracy of the trained model.
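The selection step can be sketched as a loop over candidate replacements that trains one second model per candidate and keeps the most accurate one. This is an illustration only: models are represented as hypothetical lists of layer names, and `train_and_evaluate` stands in for whatever training and validation procedure is actually used.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Model = List[str]  # hypothetical: a model as an ordered list of layer descriptors


def replace_layer(model: Model, index: int, replacement: str) -> Model:
    """Build a second model by swapping the first convolutional layer for a linear operation."""
    candidate = list(model)
    candidate[index] = replacement
    return candidate


def search_best_replacement(
    first_model: Model,
    conv_index: int,
    candidate_ops: Sequence[str],
    train_and_evaluate: Callable[[Model], Tuple[Model, float]],
) -> Tuple[Optional[Model], float]:
    """Train one second model per candidate and keep the most accurate trained model."""
    best_model: Optional[Model] = None
    best_acc = float("-inf")
    for op in candidate_ops:
        trained, acc = train_and_evaluate(replace_layer(first_model, conv_index, op))
        if acc > best_acc:
            best_model, best_acc = trained, acc
    return best_model, best_acc


# Toy usage: a lookup table stands in for real training and validation.
fake_accuracy = {"conv3x3+conv1x1": 0.71, "conv3x3+identity": 0.74, "conv1x1->conv3x3": 0.69}


def fake_train(model: Model) -> Tuple[Model, float]:
    return model, fake_accuracy[model[1]]


base = ["stem", "conv3x3", "head"]
best, acc = search_best_replacement(base, 1, list(fake_accuracy), fake_train)
print(best, acc)  # ['stem', 'conv3x3+identity', 'head'] 0.74
```

The accuracies in the lookup table are made-up placeholders for illustration, not results from the application.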
  • the receptive field of the convolutional layer equivalent to the linear operation is less than or equal to the receptive field of the first convolutional layer.
  • for the linear operation to be equivalent to a convolutional layer, at least one of the multiple sub-linear operations included in the linear operation must be a convolution operation.
  • so as not to reduce inference speed or increase resource consumption during subsequent model inference, the linear operation itself is not used for model inference; instead, the convolutional layer equivalent to the linear operation (referred to as the second convolutional layer in subsequent embodiments) is used, and the receptive field of this equivalent convolutional layer must be less than or equal to the receptive field of the first convolutional layer.
  • the linear operation includes a plurality of operation branches, the input of each operation branch is the input of the linear operation, each operation branch includes at least one sub-linear operation in series, and the equivalent receptive field of the serial sub-linear operations is less than or equal to the receptive field of the first convolutional layer; or,
  • the linear operation includes one operation branch that processes the input data of the linear operation, the operation branch includes at least one sub-linear operation in series, and the equivalent receptive field of the serial sub-linear operations is less than or equal to the receptive field of the first convolutional layer.
  • the equivalent receptive field of at least one operation branch in the multiple parallel operation branches is equal to the receptive field of the first convolutional layer
  • the receptive field of the linear operation is equal to the receptive field of the first convolutional layer
  • the receptive field of the equivalent convolutional layer (described later as the second convolutional layer) is equal to the receptive field of the first convolutional layer, and the second convolutional layer can be used for the subsequent model inference process.
  • when the receptive field of the second convolutional layer is the same as that of the first convolutional layer, the size specification of the model stays consistent with that of the unreplaced neural network model, that is, the speed and resource consumption of the inference stage remain unchanged.
  • the receptive field of the second convolutional layer is smaller than that of the first convolutional layer, which increases the amount of training parameters and improves the accuracy of the model.
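The receptive-field conditions above can be checked with simple arithmetic, assuming stride-1, undilated operations and square kernels, so one spatial axis suffices: a serial chain of kernel sizes k_1, ..., k_n sees 1 + sum(k_i - 1) positions, and parallel branches that share the same input take the maximum over branches. The function names are hypothetical.

```python
from typing import Sequence


def serial_receptive_field(kernel_sizes: Sequence[int]) -> int:
    """Receptive field (per spatial axis) of stride-1, undilated operations applied in series."""
    return 1 + sum(k - 1 for k in kernel_sizes)


def linear_op_receptive_field(branches: Sequence[Sequence[int]]) -> int:
    """Parallel branches share the same input, so the widest branch sets the receptive field."""
    return max(serial_receptive_field(branch) for branch in branches)


def fits_first_conv(branches: Sequence[Sequence[int]], first_conv_kernel: int) -> bool:
    """Check the condition that the equivalent layer sees no more than the first conv layer."""
    return linear_op_receptive_field(branches) <= first_conv_kernel


# Kernel sizes per branch; an identity counts as 1 and a stride-1 3x3 pooling as 3.
ok_op = [[3], [1, 3], [1]]  # 3x3 conv | 1x1 conv -> 3x3 conv | identity
too_wide = [[3, 3], [1]]    # two stacked 3x3 convs see a 5x5 window
print(linear_op_receptive_field(ok_op), fits_first_conv(ok_op, 3))        # 3 True
print(linear_op_receptive_field(too_wide), fits_first_conv(too_wide, 3))  # 5 False
```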
  • the linear operations in each second neural network model are different from the first convolutional layer, and the linear operations included in different second neural network models are different.
  • the target neural network model includes a trained target linear operation
  • the acquisition module is used for:
  • the target linear operation includes multiple sub-linear operations, so using the target neural network model directly for model inference would reduce inference speed and increase the resources required for inference. Therefore, in this embodiment, a second convolutional layer equivalent to the trained target linear operation can be obtained, and the trained target linear operation in the target neural network model can be replaced with the second convolutional layer to obtain a third neural network model, which can be used for model inference;
  • model inference refers to the actual data processing process using the model in the application process of the model.
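A rough multiply-accumulate (MAC) count illustrates why inference uses the equivalent convolutional layer rather than the multi-branch operation. The sketch assumes stride-1 "same" convolutions with equal input and output channels and ignores additions, BN, and memory traffic; the branch layout and tensor sizes are hypothetical.

```python
from typing import Sequence


def conv_macs(kernel: int, c_in: int, c_out: int, height: int, width: int) -> int:
    """Multiply-accumulates of a stride-1 'same' k x k convolution on a height x width map."""
    return kernel * kernel * c_in * c_out * height * width


def multi_branch_macs(conv_kernels: Sequence[int], channels: int, height: int, width: int) -> int:
    """Total conv MACs when every convolution in the branches runs separately."""
    return sum(conv_macs(k, channels, channels, height, width) for k in conv_kernels)


channels, height, width = 64, 56, 56
# Hypothetical target linear operation: 3x3 conv | 1x1 conv | 1x1 conv -> 3x3 conv.
training_form = multi_branch_macs([3, 1, 1, 3], channels, height, width)
inference_form = conv_macs(3, channels, channels, height, width)  # the fused second conv layer
print(round(training_form / inference_form, 2))  # 2.22
```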
  • obtaining the second convolutional layer equivalent to the trained target linear operation, and replacing the trained target linear operation in the target neural network model with the second convolutional layer to obtain the third neural network model, can be completed by the training device. After training is completed, the training device can directly feed back the third neural network model.
  • specifically, the training device can send the third neural network model to a terminal device or a server, so that the terminal device or the server performs model inference based on the third neural network model.
  • alternatively, the terminal device or server obtains the second convolutional layer equivalent to the trained target linear operation and replaces the trained target linear operation in the target neural network model with the second convolutional layer to obtain the third neural network model.
  • the size of the second convolutional layer is the same as the size of the first convolutional layer.
  • the size of the calculated equivalent convolutional layer may be smaller than the size of the first convolutional layer.
  • the calculated equivalent convolutional layer is subjected to a zero-padding operation to obtain a second convolutional layer of the same size as the first convolutional layer.
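The zero-padding step can be sketched as centering the smaller kernel inside a kernel of the first layer's size. This assumes odd, square kernels with stride 1 and "same" padding, under which the added zero taps leave the output unchanged; `zero_pad_kernel` is a hypothetical name.

```python
from typing import List

Kernel = List[List[float]]


def zero_pad_kernel(kernel: Kernel, target: int) -> Kernel:
    """Center a k x k kernel inside a target x target kernel filled with zeros.

    Assumes odd sizes, stride 1 and 'same' padding, so the extra zero taps
    only multiply positions the original kernel ignored.
    """
    k = len(kernel)
    if k > target or (target - k) % 2:
        raise ValueError("target must be >= kernel size and differ from it by an even number")
    p = (target - k) // 2
    padded = [[0.0] * target for _ in range(target)]
    for i in range(k):
        for j in range(k):
            padded[i + p][j + p] = kernel[i][j]
    return padded


print(zero_pad_kernel([[2.0]], 3))
# [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
```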
  • the apparatus further includes:
  • a fusion module configured to fuse, according to the order in which the multiple sub-linear operations included in the trained target linear operation process data, each sub-linear operation into the adjacent sub-linear operation that follows it in the sequence, until fusion into the last sub-linear operation in the sequence is completed, to obtain a second convolutional layer equivalent to the target linear operation.
  • the trained target linear operation includes a first sub-linear operation and a second sub-linear operation that are adjacent, with the second sub-linear operation following the first sub-linear operation in the sequence; the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
  • the fusion module is used for:
  • obtain the fusion parameter of the first sub-linear operation, wherein if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter, and if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the sequence, the fusion parameter of the first sub-linear operation is obtained from the fusion parameter of the third sub-linear operation and the first operation parameter;
  • obtain the fusion parameter of the second sub-linear operation from the fusion parameter of the first sub-linear operation and the second operation parameter; if the second sub-linear operation is the last sub-linear operation in the sequence, its fusion parameter is used as the operation parameter of the second convolutional layer.
  • the linear operation includes multiple sub-linear operations
  • the operation types of the multiple sub-linear operations include at least one of the following: addition (sum) operation, null operation, identity operation, convolution operation, batch normalization (BN) operation, or pooling operation.
  • if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by an inner-product calculation between the fusion parameter of the first sub-linear operation and the second operation parameter; if the operation type of the second sub-linear operation is an addition, pooling, identity, or null operation, the fusion parameter of the second sub-linear operation is obtained by applying the calculation corresponding to that operation type to the fusion parameter of the first sub-linear operation.
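For the non-convolution operation types, one way to picture the corresponding calculation is to express each parallel branch as a centered kernel and let the sum operation add the kernels tap by tap. This sketch assumes 1-D single-channel signals, zero "same" padding, stride-1 average pooling that counts padded zeros (which keeps it linear), and a null operation that outputs zeros, as is common in architecture-search work; these are assumptions rather than details taken from the application, and the function names are hypothetical.

```python
from typing import List, Sequence


def correlate_same(x: List[float], k: List[float]) -> List[float]:
    """Stride-1 cross-correlation with zero padding that keeps the input length (odd kernel)."""
    r, n = len(k) // 2, len(x)
    return [sum(k[j] * x[i + j - r] for j in range(len(k)) if 0 <= i + j - r < n)
            for i in range(n)]


def avg_pool_same(x: List[float], window: int) -> List[float]:
    """Stride-1 average pooling; padded zeros are counted, so the operation stays linear."""
    r, n = window // 2, len(x)
    return [sum(x[i + d] for d in range(-r, r + 1) if 0 <= i + d < n) / window
            for i in range(n)]


def center_pad(k: List[float], size: int) -> List[float]:
    p = (size - len(k)) // 2
    return [0.0] * p + list(k) + [0.0] * p


def op_kernel(op_type: str, size: int, weights: Sequence[float] = (), window: int = 3) -> List[float]:
    """Express one parallel branch as a centered kernel of odd length `size`."""
    if op_type == "conv":
        return center_pad(list(weights), size)
    if op_type == "identity":
        return center_pad([1.0], size)
    if op_type == "avg_pool":
        return center_pad([1.0 / window] * window, size)
    if op_type == "null":
        return [0.0] * size
    raise ValueError(f"unsupported operation type: {op_type}")


def fuse_sum(kernels: List[List[float]]) -> List[float]:
    """The sum operation that merges branches fuses to an element-wise sum of kernels."""
    return [sum(taps) for taps in zip(*kernels)]


x = [0.4, -1.0, 2.5, 0.0, 1.2, -0.7, 3.1]
fused = fuse_sum([
    op_kernel("conv", 3, [0.2, -0.5, 0.3]),
    op_kernel("conv", 3, [1.5]),
    op_kernel("identity", 3),
    op_kernel("avg_pool", 3, window=3),
    op_kernel("null", 3),
])
# Direct evaluation of the branches; the null branch contributes zeros and is omitted.
direct = [a + b + c + d for a, b, c, d in zip(
    correlate_same(x, [0.2, -0.5, 0.3]), correlate_same(x, [1.5]), x, avg_pool_same(x, 3))]
fused_out = correlate_same(x, fused)
```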
  • the present application provides a model training device, the device comprising:
  • an acquisition module for acquiring a first neural network model, where the first neural network model includes a first convolution layer;
  • a target linear operation for replacing the first convolutional layer is determined based on at least one of the following: the network structure of the first neural network model, the target task, and the position of the first convolutional layer in the first neural network model; the target linear operation is equivalent to a convolutional layer;
  • a model training module configured to perform model training on the second neural network model to obtain a target neural network model.
  • the convolutional layer in the neural network to be trained is replaced with a target linear operation, and the structure of the target linear operation is determined according to the structure of the first neural network model and/or the target task.
  • the linear operation used to replace the convolutional layer can therefore be better suited to the first neural network model and is more flexible: different linear operations can be designed for different model structures and task types, which improves the accuracy of the trained model.
  • the target linear operation includes multiple sub-linear operations
  • the target linear operation includes M operation branches
  • the input of each operation branch is the input of the target linear operation
  • the M operation branches satisfy at least one of the following conditions:
  • the input of at least one of the sub-linear operations included in the M operation branches is the output of multiple sub-linear operations among those sub-linear operations;
  • at least two of the M operation branches include different numbers of sub-linear operations; or
  • at least two of the M operation branches include sub-linear operations of different operation types.
  • the structure of the target linear operation provided in this embodiment is more complex, which can improve the accuracy of the trained model.
  • the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
  • the target linear operation is different from the first convolutional layer.
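  • the target linear operation being equivalent to a convolutional layer means that, when the target linear operation and that convolutional layer process the same input data, the processing results obtained are the same.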
  • the acquisition module is configured to replace the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain the third neural network model.
  • the size of the second convolutional layer is the same as the size of the first convolutional layer.
  • the apparatus further includes:
  • a fusion module configured to fuse, according to the order in which the multiple sub-linear operations included in the trained target linear operation process data, each sub-linear operation into the adjacent sub-linear operation that follows it in the sequence, until fusion into the last sub-linear operation in the sequence is completed, to obtain a second convolutional layer equivalent to the target linear operation.
  • the trained target linear operation includes a first sub-linear operation and a second sub-linear operation that are adjacent, with the second sub-linear operation following the first sub-linear operation in the sequence; the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
  • the fusion module is configured to obtain the fusion parameter of the first sub-linear operation, wherein if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter, and if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the sequence, the fusion parameter of the first sub-linear operation is obtained from the fusion parameter of the third sub-linear operation and the first operation parameter;
  • the fusion parameter of the second sub-linear operation is obtained from the fusion parameter of the first sub-linear operation and the second operation parameter; if the second sub-linear operation is the last sub-linear operation in the sequence, its fusion parameter is used as the operation parameter of the second convolutional layer.
  • the linear operation includes multiple sub-linear operations
  • the operation types of the multiple sub-linear operations include at least one of the following: addition (sum) operation, null operation, identity operation, convolution operation, batch normalization (BN) operation, or pooling operation.
  • if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by an inner-product calculation between the fusion parameter of the first sub-linear operation and the second operation parameter; if the operation type of the second sub-linear operation is an addition, pooling, identity, or null operation, the fusion parameter of the second sub-linear operation is obtained by applying the calculation corresponding to that operation type to the fusion parameter of the first sub-linear operation.
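Putting these rules together, the sequential fusion can be sketched as one pass over the sub-linear operations in execution order: each node's fusion parameter is derived from its inputs' fusion parameters and its own parameters, and the last node's result is taken as the kernel of the second convolutional layer. The sketch assumes 1-D single-channel kernels, zero "same" padding, and a hypothetical node format; seeding the input with an identity kernel makes a first operation's fusion parameter equal to its own parameter. Because zero padding is applied at every intermediate stage, stacked convolutions can differ from the fused kernel at the borders, so the check compares interior positions only.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical node format: (name, op_type, input names, params); "x" is the op's input.
Node = Tuple[str, str, Sequence[str], Sequence[float]]


def correlate_same(x: List[float], k: List[float]) -> List[float]:
    r, n = len(k) // 2, len(x)
    return [sum(k[j] * x[i + j - r] for j in range(len(k)) if 0 <= i + j - r < n)
            for i in range(n)]


def compose(k1: List[float], k2: List[float]) -> List[float]:
    """Centered kernel equivalent to applying k1 and then k2 (both odd-length)."""
    out = [0.0] * (len(k1) + len(k2) - 1)
    for a, w1 in enumerate(k1):
        for b, w2 in enumerate(k2):
            out[a + b] += w1 * w2
    return out


def add_centered(kernels: List[List[float]]) -> List[float]:
    """Sum operation: align kernels on their centers and add them tap by tap."""
    size = max(len(k) for k in kernels)
    out = [0.0] * size
    for k in kernels:
        p = (size - len(k)) // 2
        for i, w in enumerate(k):
            out[i + p] += w
    return out


def op_params(op: str, params: Sequence[float]) -> List[float]:
    """Kernel form of a single-input sub-linear operation."""
    if op == "conv":
        return list(params)
    if op == "avg_pool":
        window = int(params[0])
        return [1.0 / window] * window
    if op == "identity":
        return [1.0]
    if op == "null":
        return [0.0]
    raise ValueError(f"unsupported operation type: {op}")


def fuse(nodes: Sequence[Node]) -> List[float]:
    fused: Dict[str, List[float]] = {"x": [1.0]}  # the raw input acts as an identity kernel
    for name, op, inputs, params in nodes:
        if op == "add":
            fused[name] = add_centered([fused[i] for i in inputs])
        else:
            fused[name] = compose(fused[inputs[0]], op_params(op, params))
    return fused[nodes[-1][0]]


def run(nodes: Sequence[Node], x: List[float]) -> List[float]:
    """Execute the multi-branch operation directly, for comparison."""
    values: Dict[str, List[float]] = {"x": x}
    for name, op, inputs, params in nodes:
        if op == "add":
            values[name] = [sum(t) for t in zip(*(values[i] for i in inputs))]
        else:
            values[name] = correlate_same(values[inputs[0]], op_params(op, params))
    return values[nodes[-1][0]]


nodes: List[Node] = [
    ("a", "conv", ["x"], [0.5, 1.0, -0.5]),
    ("b", "conv", ["a"], [1.0, -1.0, 2.0]),  # 3-tap then 3-tap: a 5-tap path
    ("c", "conv", ["x"], [2.0]),             # 1-tap branch ...
    ("d", "avg_pool", ["c"], [3]),           # ... followed by 3-wide pooling
    ("e", "identity", ["x"], []),
    ("out", "add", ["b", "d", "e"], []),
]
kernel = fuse(nodes)
x = [0.3, -1.1, 2.0, 0.6, -0.4, 1.7, -2.2, 0.9, 1.3, -0.5, 0.8, 0.1]
r = len(kernel) // 2  # border positions see different zero padding, so compare the interior
direct, fused_out = run(nodes, x), correlate_same(x, kernel)
print(len(kernel))  # 5
```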
  • the embodiment of the present application also provides a model training device, and the device includes:
  • an acquisition module for acquiring a first neural network model, where the first neural network model includes a first convolution layer;
  • each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation, where the target linear operation is equivalent to a convolutional layer and includes multiple sub-linear operations and M operation branches, the input of each operation branch is the input of the target linear operation, and the M operation branches satisfy at least one of the following conditions:
  • the input of at least one of the sub-linear operations included in the M operation branches is the output of multiple sub-linear operations among those sub-linear operations;
  • at least two of the M operation branches include different numbers of sub-linear operations; or
  • at least two of the M operation branches include sub-linear operations of different operation types;
  • a model training module configured to perform model training on the second neural network model to obtain a target neural network model.
  • the structure of the target linear operation provided in this embodiment is more complex, which can improve the accuracy of the trained model.
  • the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
  • the target linear operation is different from the first convolutional layer.
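  • the target linear operation being equivalent to a convolutional layer means that, when the target linear operation and that convolutional layer process the same input data, the processing results obtained are the same.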
  • the target neural network model includes a trained target linear operation
  • the acquisition module is used for:
  • the size of the second convolutional layer is the same as the size of the first convolutional layer.
  • the apparatus further includes:
  • a fusion module configured to fuse, according to the order in which the multiple sub-linear operations included in the trained target linear operation process data, each sub-linear operation into the adjacent sub-linear operation that follows it in the sequence, until fusion into the last sub-linear operation in the sequence is completed, to obtain a second convolutional layer equivalent to the target linear operation.
  • the trained target linear operation includes a first sub-linear operation and a second sub-linear operation that are adjacent, with the second sub-linear operation following the first sub-linear operation in the sequence; the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
  • fusing each sub-linear operation into the adjacent subsequent sub-linear operation in the sequence includes:
  • obtaining the fusion parameter of the first sub-linear operation, wherein if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter, and if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the sequence, the fusion parameter of the first sub-linear operation is obtained from the fusion parameter of the third sub-linear operation and the first operation parameter;
  • obtaining the fusion parameter of the second sub-linear operation from the fusion parameter of the first sub-linear operation and the second operation parameter; if the second sub-linear operation is the last sub-linear operation in the sequence, its fusion parameter is used as the operation parameter of the second convolutional layer.
  • the linear operation includes multiple sub-linear operations
  • the operation types of the multiple sub-linear operations include at least one of the following: addition (sum) operation, null operation, identity operation, convolution operation, batch normalization (BN) operation, or pooling operation.
  • if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by an inner-product calculation between the fusion parameter of the first sub-linear operation and the second operation parameter; if the operation type of the second sub-linear operation is an addition, pooling, identity, or null operation, the fusion parameter of the second sub-linear operation is obtained by applying the calculation corresponding to that operation type to the fusion parameter of the first sub-linear operation.
  • an embodiment of the present application provides a model training apparatus, which may include a memory, a processor, and a bus system, wherein the memory is configured to store a program, and the processor is configured to execute the program in the memory to perform the method of the first aspect, the third aspect, or any optional implementation thereof.
  • embodiments of the present application provide a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to perform the method of the first aspect, the third aspect, or any optional implementation thereof.
  • an embodiment of the present application provides a computer program, including code, for implementing the first aspect, the third aspect, and any optional method thereof when the code is executed.
  • the present application provides a system-on-chip
  • the system-on-a-chip includes a processor configured to support an execution device or a training device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • the chip system further includes a memory for storing the program instructions and data necessary for the execution device or the training device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • An embodiment of the present application provides a model training method.
  • the method includes: acquiring a first neural network model, where the first neural network model includes a first convolutional layer; acquiring a plurality of second neural network models according to the first neural network model, wherein each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a linear operation, and the linear operation is equivalent to a convolutional layer; and performing model training on the plurality of second neural network models to obtain a target neural network model, where the target neural network model is the neural network model with the highest model accuracy among the trained second neural network models.
  • the convolutional layer in the neural network to be trained is replaced with a linear operation that is equivalent to a convolutional layer, and the replacement yielding the highest accuracy is selected from multiple candidate replacements, thereby improving the accuracy of the trained model.
  • FIG. 1 is a schematic structural diagram of an artificial intelligence main framework.
  • FIG. 2 is a schematic diagram of a convolutional neural network provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a convolutional neural network provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an embodiment of a model training method provided by an embodiment of the present application.
  • FIG. 6a is a schematic diagram of a linear operation provided by an embodiment of the present application.
  • FIG. 6b is a schematic diagram of a linear operation provided by an embodiment of the present application.
  • FIG. 6c is a schematic diagram of a linear operation provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a receptive field of a convolutional layer provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a convolutional layer receptive field provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a convolution layer provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a convolution kernel provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a linear operation fusion provided by an embodiment of the application.
  • FIG. 12 is a schematic diagram of a linear operation replacement provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a linear operation provided by an embodiment of the present application.
  • FIG. 14 is a schematic diagram of a zero-filling operation provided by an embodiment of the present application.
  • FIG. 15a is a schematic diagram of an application scenario of a model training method provided by an embodiment of the application.
  • FIG. 15b is a schematic diagram of an application scenario of a model training method provided by an embodiment of the application.
  • FIG. 16a is a schematic diagram of an application scenario of a model training method provided by an embodiment of the present application.
  • FIG. 16b is a schematic diagram of an embodiment of a model training method provided by an embodiment of the application.
  • FIG. 17 is a schematic diagram of a model training apparatus provided by an embodiment of the application.
  • FIG. 18 is a schematic structural diagram of an execution device provided by an embodiment of the present application.
  • FIG. 19 is a schematic structural diagram of a training device provided by an embodiment of the present application.
  • FIG. 20 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • Figure 1 shows a schematic structural diagram of the main frame of artificial intelligence.
  • the above-mentioned artificial intelligence main framework is explained below in two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • the "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data is refined through the stages "data-information-knowledge-wisdom".
  • the "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of human intelligence and information (providing and processing technology implementations) to the industrial ecosystem of the system.
  • the infrastructure provides computing-power support for artificial intelligence systems, enables communication with the outside world, and provides support through the basic platform. It communicates with the outside world through sensors; computing power is provided by smart chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); the basic platform includes a distributed computing framework and network-related platform guarantees and support, which may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with external parties to obtain data, and the data is provided to the smart chips in the distributed computing system provided by the basic platform for computation.
  • The data layer above the infrastructure represents the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data from existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
  • machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, in which the machine uses formalized information to think and solve problems according to a reasoning control strategy; typical functions are search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
  • After the data processing described above, some general capabilities can be formed from the results, for example, an algorithm or a general system such as translation, text analysis, computer vision processing, speech recognition, or image recognition.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution and productize intelligent information decision-making to put it into practical use. Application areas mainly include intelligent terminals, intelligent transportation, smart healthcare, autonomous driving, safe cities, and so on.
  • The model training method provided in the embodiments of this application can be applied to data processing such as data training, machine learning, and deep learning, performing symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and so on, on the training data to finally obtain a trained neural network model (the target neural network model in the embodiments of this application). The target neural network model can then be used for model inference: input data is fed into the target neural network model to obtain output data.
  • A neural network can be composed of neural units. A neural unit can be an operation unit that takes $x_s$ (that is, the input data) and an intercept of 1 as inputs, and the output of the operation unit can be:
  • $h_{W,b}(x) = f\left(W^{T}x\right) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$
  • where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
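As a minimal sketch of the neural-unit formula (assuming NumPy, and a sigmoid as one possible choice of the activation function f), the unit computes the weighted sum of its inputs, adds the bias, and applies the activation. The function name `neural_unit` is illustrative only.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid activation: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def neural_unit(x, w, b, f=sigmoid):
    """Output of one neural unit: f(sum_s W_s * x_s + b)."""
    return f(np.dot(w, x) + b)


x = np.array([1.0, 2.0, 3.0])   # inputs x_s
w = np.array([0.5, -0.25, 0.0])  # weights W_s
print(neural_unit(x, w, b=0.0))  # weighted sum is 0, so sigmoid gives 0.5
```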
  • a neural network is a network formed by connecting a plurality of the above single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be an area composed of several neural units.
  • A convolutional neural network (CNN) is a deep neural network with a convolutional structure.
  • A convolutional neural network contains a feature extractor composed of convolutional layers and subsampling layers.
  • the feature extractor can be viewed as a filter, and the convolution process can be viewed as convolution with an input image or a convolutional feature map using a trainable filter.
  • the convolutional layer refers to a neuron layer (eg, the first convolutional layer and the second convolutional layer in this embodiment) that performs convolution processing on the input signal in the convolutional neural network.
  • In a convolutional layer of a convolutional neural network, a neuron may be connected to only some of the neurons in the adjacent layer.
  • A convolutional layer usually contains several feature planes, and each feature plane may be composed of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing can be understood to mean that the way image information is extracted is independent of location. The underlying principle is that the statistics of one part of an image are the same as those of other parts, so image information learned in one part can also be used in another part, and the same learned image information can be used at all positions of the image. In the same convolutional layer, multiple convolution kernels can be used to extract different image information; generally, the more convolution kernels there are, the richer the image information reflected by the convolution operation.
  • A convolution kernel can be initialized as a matrix of random values, and it learns reasonable weights during the training of the convolutional neural network.
  • the immediate benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
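To make the saving from weight sharing concrete, the sketch below counts trainable parameters for a convolutional layer versus a fully connected layer mapping between the same tensor sizes. The sizes (a 32×32×3 input, a 32×32×16 output, 3×3 kernels) are hypothetical, chosen only for illustration.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Conv layer: one shared k x k x c_in kernel (plus bias) per output channel."""
    return c_out * (c_in * k * k + (1 if bias else 0))


def fc_params(n_in, n_out, bias=True):
    """Fully connected layer: a separate weight for every input-output pair."""
    return n_out * (n_in + (1 if bias else 0))


h, w = 32, 32
print(conv_params(c_in=3, c_out=16, k=3))           # 448
print(fc_params(n_in=h * w * 3, n_out=h * w * 16))  # 50348032
```

Because the same kernel is reused at every position, the convolutional layer's parameter count does not depend on the image size at all.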
  • A convolutional neural network (CNN) 100 may include an input layer 110, a convolutional layer/pooling layer 120 (where the pooling layer is optional), and a neural network layer 130.
  • The structure composed of the convolutional layer/pooling layer 120 and the neural network layer 130 may be the first convolutional layer and the second convolutional layer described in this application; the input layer 110 is connected to the convolutional layer/pooling layer 120.
  • the convolutional layer/pooling layer 120 is connected to the neural network layer 130, the output of the neural network layer 130 can be input to the activation layer, and the activation layer can perform nonlinear processing on the output of the neural network layer 130.
  • The convolutional layer/pooling layer 120 may include, for example, layers 121 to 126. In one implementation, layer 121 is a convolutional layer, layer 122 is a pooling layer, layer 123 is a convolutional layer, layer 124 is a pooling layer, layer 125 is a convolutional layer, and layer 126 is a pooling layer; in another implementation, layers 121 and 122 are convolutional layers, layer 123 is a pooling layer, layers 124 and 125 are convolutional layers, and layer 126 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 121 may include many convolution operators, which are also called kernels, and their role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • A convolution operator can essentially be a weight matrix, which is usually predefined. When convolving an image, the weight matrix usually slides over the input image in the horizontal direction one pixel at a time (or two pixels at a time, and so on, depending on the stride value), thereby extracting specific features from the image.
  • the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image.
  • During convolution, the weight matrix extends through the entire depth of the input image. Therefore, convolving with a single weight matrix produces a convolutional output with a single depth dimension; in most cases, however, a single weight matrix is not used, and multiple weight matrices of the same dimensions are applied instead.
  • The outputs of the weight matrices are stacked to form the depth dimension of the convolved image.
  • Different weight matrices can be used to extract different features from the image: for example, one weight matrix extracts image edge information, another extracts specific colors of the image, and yet another blurs unwanted noise in the image.
  • The multiple weight matrices have the same dimensions, so the feature maps they extract also have the same dimensions; these feature maps are then combined to form the output of the convolution operation.
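The sliding, stride, and depth-stacking behaviour described above can be sketched in NumPy as an unpadded ("valid") convolution. The channels-last layout (H, W, C) and the name `conv2d_valid` are assumptions for illustration; the sketch shows that each kernel spans the full input depth and that N kernels yield an output of depth N.

```python
import numpy as np


def conv2d_valid(image, kernels, stride=1):
    """Slide each kernel over the image (no padding) and stack the results.

    image:   (H, W, C)     input with depth C
    kernels: (N, k, k, C)  N weight matrices; each spans the full depth C
    returns: (H_out, W_out, N), one feature map per kernel along the depth
    """
    h, w, c = image.shape
    n, k, _, kc = kernels.shape
    assert kc == c, "kernel depth must equal input depth"
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    out = np.zeros((h_out, w_out, n))
    for i in range(h_out):
        for j in range(w_out):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k, :]
            # each kernel yields one scalar per position; N kernels give depth N
            out[i, j, :] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


rng = np.random.default_rng(0)
img = rng.standard_normal((7, 7, 3))
ks = rng.standard_normal((4, 3, 3, 3))
print(conv2d_valid(img, ks, stride=1).shape)  # (5, 5, 4)
print(conv2d_valid(img, ks, stride=2).shape)  # (3, 3, 4)
```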
  • weight values in these weight matrices need to be obtained through a lot of training in practical applications, and each weight matrix formed by the weight values obtained by training can extract information from the input image, thereby helping the convolutional neural network 100 to make correct predictions.
  • The initial convolutional layers (for example, 121) extract relatively simple, general features, while the features extracted by later convolutional layers become more and more complex, such as high-level semantic features.
  • For the layers 121-126 exemplified by 120 in FIG. 2, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers.
  • After processing by the convolutional layer/pooling layer 120, the convolutional neural network 100 is still not able to output the required output information, because, as mentioned before, the convolutional layer/pooling layer 120 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other related information), the convolutional neural network 100 needs to use the neural network layer 130 to generate one output or a set of outputs matching the required number of classes. Therefore, the neural network layer 130 may include multiple hidden layers (131, 132 to 13n as shown in FIG. 2) and the output layer 140, and the parameters contained in the multiple hidden layers may be pre-trained on training data related to a specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and so on.
  • After the hidden layers in the neural network layer 130, the last layer of the entire convolutional neural network 100 is the output layer 140. The output layer 140 has a loss function similar to categorical cross-entropy, which is specifically used to calculate the prediction error.
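As an illustration of a loss "similar to categorical cross-entropy", the sketch below computes softmax cross-entropy for a single sample in NumPy. Applying softmax to raw output scores (logits) is an assumption here; the text does not specify the exact loss.

```python
import numpy as np


def softmax_cross_entropy(logits, label):
    """Prediction error for one sample: -log softmax(logits)[label]."""
    z = logits - np.max(logits)                # shift for numerical stability
    log_probs = z - np.log(np.sum(np.exp(z)))  # log-softmax
    return -log_probs[label]


# Uniform logits over 4 classes: the loss is log(4), about 1.386.
print(softmax_cross_entropy(np.zeros(4), label=2))
# A confident, correct prediction gives a loss close to 0.
print(softmax_cross_entropy(np.array([10.0, 0.0, 0.0]), label=0))
```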
  • Once the forward propagation of the entire convolutional neural network 100 is completed (as shown in FIG. 2, propagation from 110 to 140 is forward propagation), back propagation (as shown in FIG. 2, propagation from 140 to 110 is back propagation) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 100, that is, the error between the result output by the convolutional neural network 100 through the output layer and the ideal result.
  • the convolutional neural network 100 shown in FIG. 2 is only used as an example of a convolutional neural network.
  • In specific applications, the convolutional neural network may also take the form of other network models, for example, one in which multiple convolutional layers/pooling layers are arranged in parallel as shown in FIG. 3, and the extracted features are all input to the neural network layer 130 for processing.
  • A deep neural network (DNN) is also known as a multi-layer neural network.
  • The layers inside a DNN can be divided into three categories: the input layer, hidden layers, and the output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the middle layers are all hidden layers.
  • Adjacent layers are fully connected; that is, every neuron in layer i is connected to every neuron in layer i+1.
  • The coefficient from the k-th neuron in layer L-1 to the j-th neuron in layer L is defined as $W_{jk}^{L}$. Note that the input layer has no $W$ parameter.
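The $W_{jk}^{L}$ indexing convention can be sketched as a fully connected forward pass in NumPy, where `weights[L][j, k]` holds the coefficient from neuron k in the previous layer to neuron j in the current one. Using the same activation at every layer (including the output), the layer sizes, and the name `dnn_forward` are simplifying assumptions.

```python
import numpy as np


def dnn_forward(x, weights, biases, act=np.tanh):
    """Fully connected forward pass.

    weights[L] has shape (n_L, n_{L-1}); weights[L][j, k] is the coefficient
    from neuron k of the previous layer to neuron j of this layer. The input
    layer has no weight matrix of its own.
    """
    a = x
    for W, b in zip(weights, biases):
        a = act(W @ a + b)  # each neuron j sums over every neuron k below it
    return a


rng = np.random.default_rng(0)
sizes = [3, 4, 2]  # input layer, one hidden layer, output layer
weights = [rng.standard_normal((n_out, n_in)) for n_in, n_out in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
x = rng.standard_normal(3)
print(dnn_forward(x, weights, biases).shape)  # (2,)
```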
  • more hidden layers allow the network to better capture the complexities of the real world.
  • a model with more parameters is more complex and has a larger "capacity", which means that it can complete more complex learning tasks.
  • Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vectors W of many layers).
  • During training, the convolutional neural network can use the error back propagation (BP) algorithm to correct the values of the parameters in the initial super-resolution model, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, forward-propagating the input signal to the output produces an error loss, and the parameters in the initial super-resolution model are updated by back-propagating the error loss information, so that the error loss converges.
  • The back-propagation algorithm is a backward pass driven by the error loss, aiming to obtain the optimal parameters of the super-resolution model, such as the weight matrices.
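The forward-then-backward loop can be sketched on the smallest possible model: a single sigmoid neural unit fitted with mean-squared error by gradient descent, with gradients obtained through the chain rule. The data (logical OR), learning rate, step count, and the name `train_unit` are assumptions for illustration; a super-resolution model would apply the same principle to far more parameters.

```python
import numpy as np


def train_unit(X, y, lr=0.5, steps=500, seed=0):
    """Fit one sigmoid unit: forward pass, error loss, back-propagated update."""
    rng = np.random.default_rng(seed)
    w = 0.1 * rng.standard_normal(X.shape[1])
    b = 0.0
    losses = []
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # forward propagation
        losses.append(np.mean((p - y) ** 2))      # error loss at the output
        # Back propagation: d(loss)/dz via the chain rule (loss -> sigmoid -> z).
        grad_z = 2.0 * (p - y) * p * (1.0 - p) / len(y)
        w -= lr * (X.T @ grad_z)                  # update weights
        b -= lr * np.sum(grad_z)                  # update bias
    return w, b, losses


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])
w, b, losses = train_unit(X, y)
print(losses[0], losses[-1])  # the loss shrinks as training proceeds
```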
  • Linearity refers to a proportional, straight-line relationship between quantities; mathematically, it can be understood as a function whose first derivative is a constant. Linear operations may be, but are not limited to, addition operations, null operations, identity operations, convolution operations, batch normalization (BN) operations, and pooling operations. A linear operation may also be called a linear mapping, which must satisfy two conditions: homogeneity, $f(ax) = af(x)$, and additivity, $f(x + y) = f(x) + f(y)$. If either condition is not satisfied, the mapping is nonlinear.
  • Here, x, a, and f(x) are not necessarily scalars; they can be vectors or matrices, forming a linear space of any dimension. If x and f(x) are n-dimensional vectors, then when a is a constant the condition is equivalent to homogeneity, and when a is a matrix it is equivalent to additivity.
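The two conditions can be checked numerically. The sketch below, assuming NumPy, tests additivity and homogeneity on sample vectors for a 1-D convolution-style operation (linear), average pooling (linear), and ReLU (nonlinear). The helper `is_linear_on` is hypothetical, and a check on a few samples only illustrates the definition; it does not prove linearity.

```python
import numpy as np


def is_linear_on(f, x, y, a):
    """Check additivity and homogeneity of f on the given samples."""
    additive = np.allclose(f(x + y), f(x) + f(y))  # f(x + y) = f(x) + f(y)
    homogeneous = np.allclose(f(a * x), a * f(x))  # f(ax) = a f(x)
    return additive and homogeneous


kernel = np.array([0.5, 1.0, -0.25])
conv = lambda v: np.correlate(v, kernel, mode="valid")  # convolution-style op
avg_pool = lambda v: v.reshape(2, 2).mean(axis=1)       # average pooling, size 2
relu = lambda v: np.maximum(v, 0.0)                     # nonlinear activation

x = np.array([1.0, -2.0, 3.0, 0.5])
y = np.array([-1.0, 1.0, 2.0, -3.0])
print(is_linear_on(conv, x, y, a=-2.0))      # True
print(is_linear_on(avg_pool, x, y, a=3.0))   # True
print(is_linear_on(relu, x, y, a=-2.0))      # False
```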
  • each linear operation included in the linear operation may also be referred to as a sub-linear operation.
  • FIG. 4 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • The execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices; data may be input to the I/O interface 112 through the client device 140.
  • When processing input data, the execution device 110 may call data, code, and the like in the data storage system 150 for the corresponding processing, and may also store the data, code, and the like produced by that processing in the data storage system 150.
  • the I/O interface 112 returns the processing results to the client device 140 for provision to the user.
  • the client device 140 can be, for example, a control unit in an automatic driving system or a functional algorithm module in a mobile phone terminal, for example, the functional algorithm module can be used to implement related tasks.
  • the training device 120 can generate corresponding target models/rules (for example, the target neural network model in this embodiment) based on different training data for different targets or different tasks.
  • The corresponding target models/rules can then be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired results.
  • In one case, the user can manually specify the input data through an interface provided by the I/O interface 112.
  • In another case, the client device 140 can automatically send input data to the I/O interface 112. If automatic sending of input data by the client device 140 requires the user's authorization, the user can set the corresponding permission in the client device 140.
  • The user can view the result output by the execution device 110 on the client device 140, and the specific presentation form may be display, sound, action, or the like.
  • The client device 140 can also serve as a data collection terminal, collecting the input data fed into the I/O interface 112 and the output results produced by the I/O interface 112, as shown in the figure, as new sample data and storing them in the database 130.
  • Alternatively, the collection may bypass the client device 140: the I/O interface 112 directly stores the input data fed into it and the output results it produces, as shown in the figure, in the database 130 as new sample data.
  • FIG. 4 is only a schematic diagram of a system architecture provided by an embodiment of this application, and the positional relationships among the devices, components, modules, and so on shown in the figure do not constitute any limitation.
  • For example, in FIG. 4 the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may instead be placed in the execution device 110.
  • The model training method provided by the embodiments of this application is described below by taking the model training stage as an example.
  • FIG. 5 is a schematic diagram of an embodiment of a model training method provided by an embodiment of the present application.
  • a model training method provided by an embodiment of the present application includes:
  • 501 Obtain a first neural network model, where the first neural network model includes a first convolution layer.
  • the training device may acquire the first neural network model to be trained, and the first neural network model may be the model to be trained given by the user.
  • the training device may replace some or all of the convolutional layers in the first neural network model with linear operations.
  • In one implementation, the replaced convolutional layer may be the first convolutional layer included in the first neural network model; the first neural network model may include multiple convolutional layers, and the first convolutional layer is one of them.
  • In another implementation, the replaced convolutional layers may be multiple convolutional layers included in the first neural network model, and the first convolutional layer is one of these multiple convolutional layers.
  • the training device may select a convolutional layer (including the first convolutional layer) that needs to be replaced from the first neural network model.
  • The convolutional layer to be replaced in the first neural network model may be specified by an administrator, or may be determined by the training device through model structure search; how the training device determines it through model structure search is described in subsequent embodiments and is not repeated here.
  • Obtain a plurality of second neural network models, where each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a linear operation, and the linear operation is equivalent to a convolutional layer.
  • In this embodiment, the training device may replace the first convolutional layer in the first neural network model with a linear operation to obtain one second neural network model, and in this way obtain a plurality of second neural network models.
  • "Equivalent" in the embodiments of this application describes a relationship between two operation units: two operation units that differ in form produce the same processing result when processing any identical data, and one of them can be transformed into the form of the other through mathematical derivation.
  • Specifically, the sub-linear operations included in the linear operation can be transformed, through mathematical derivation, into the form of a convolutional layer, and the transformed convolutional layer and the linear operation produce the same processing result when processing the same data.
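As a minimal sketch of this equivalence (not necessarily the exact transformation used in this application), two parallel branches whose outputs are summed, a k×k convolution and a 1×1 convolution, can be folded into a single k×k convolution by zero-padding the 1×1 kernel to k×k and adding the kernels. The helpers `conv2d_same` and `pad_to` are hypothetical and assume stride 1, zero "same" padding, no bias, and a (C_out, C_in, k, k) weight layout.

```python
import numpy as np


def conv2d_same(x, w):
    """Stride-1 cross-correlation with zero 'same' padding and no bias.
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, h, wd = x.shape
    out = np.zeros((w.shape[0], h, wd))
    for i in range(h):
        for j in range(wd):
            # contract over (C_in, k, k) to get one value per output channel
            out[:, i, j] = np.tensordot(w, xp[:, i:i + k, j:j + k], axes=3)
    return out


def pad_to(w, k):
    """Zero-pad a (C_out, C_in, s, s) kernel to k x k, keeping it centred."""
    p = (k - w.shape[-1]) // 2
    return np.pad(w, ((0, 0), (0, 0), (p, p), (p, p)))


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 6, 6))
w_kxk = rng.standard_normal((3, 2, 3, 3))
w_1x1 = rng.standard_normal((3, 2, 1, 1))

two_branches = conv2d_same(x, w_kxk) + conv2d_same(x, w_1x1)  # linear operation
merged = conv2d_same(x, w_kxk + pad_to(w_1x1, 3))             # one equivalent conv
print(np.allclose(two_branches, merged))  # True
```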
  • a linear operation is composed of multiple sub-linear operations.
  • the so-called sub-linear operations can refer to basic linear operations, rather than operations composed of multiple basic linear operations.
  • The linear operation here refers to a combination of multiple basic linear operations.
  • The operation type of a sub-linear operation can be, but is not limited to, an addition operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
  • The linear operation may be a composite of sub-linear operations of at least one of these types: addition, null, identity, convolution, BN, and pooling operations.
  • Compounding here means that there are two or more sub-linear operations, the sub-linear operations are connected to one another, and no sub-linear operation is isolated.
  • A connection relationship means that the output of one sub-linear operation is used as the input of another sub-linear operation (except for the sub-linear operation on the output side of the linear operation, whose output serves as the output of the linear operation).
  • FIG. 6a, FIG. 6b, and FIG. 6c are schematic diagrams of several structures of the linear operation in embodiments of this application. The linear operation shown in FIG. 6a includes 4 sub-linear operations: convolution operation 1 (kernel size k×k), convolution operation 2 (kernel size 1×1), convolution operation 3 (kernel size k×k), and a sum operation. Convolution operation 1 processes the input data of the linear operation to obtain output 1; convolution operation 2 processes the input data of the linear operation to obtain output 2; convolution operation 3 processes output 2 to obtain output 3; and the sum operation adds output 1 and output 3 to obtain the output of the linear operation.
  • The linear operation shown in FIG. 6b includes 7 sub-linear operations: convolution operation 1 (kernel size k×k), convolution operation 2 (kernel size 1×1), convolution operation 3 (kernel size k×k), convolution operation 4 (kernel size 1×1), convolution operation 5 (kernel size k×k), convolution operation 6 (kernel size 1×1), and a sum operation. Convolution operation 1 processes the input data of the linear operation to obtain output 1; convolution operation 2 processes the input data of the linear operation to obtain output 2; convolution operation 3 processes output 2 to obtain output 3; convolution operation 4 processes the input data of the linear operation to obtain output 4; convolution operation 5 processes output 4 to obtain output 5; convolution operation 6 processes output 5 to obtain output 6; and the sum operation adds output 1, output 3, and output 6 to obtain the output of the linear operation.
  • The linear operation shown in FIG. 6c includes 8 sub-linear operations: convolution operation 1 (kernel size k×k), convolution operation 2 (kernel size 1×1), convolution operation 3 (kernel size k×k), convolution operation 4 (kernel size 1×1), convolution operation 5 (kernel size 1×1), convolution operation 6 (kernel size k×k), sum operation 1, and sum operation 2. Convolution operation 1 processes the input data of the linear operation to obtain output 1; convolution operation 2 processes the input data of the linear operation to obtain output 2; convolution operation 3 processes output 2 to obtain output 3; convolution operation 4 processes output 2 to obtain output 4; convolution operation 5 processes the input data of the linear operation to obtain output 5; sum operation 1 adds output 4 and output 5 to obtain output 6; convolution operation 6 processes output 6 to obtain output 7; and sum operation 2 adds output 1, output 3, and output 7 to obtain the output of the linear operation.
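Under the same assumptions (stride 1, zero "same" padding, no bias), the FIG. 6a structure can be folded into one k×k convolution in two steps: the serial pair convolution operation 2 (1×1) followed by convolution operation 3 (k×k) collapses into a single k×k kernel by contracting over the intermediate channels, and the sum operation then adds that kernel to the kernel of convolution operation 1. The channel counts, k = 3, and the names `conv2d_same` and `fold_fig_6a` are illustrative assumptions, not the application's own procedure.

```python
import numpy as np


def conv2d_same(x, w):
    """Stride-1 cross-correlation with zero 'same' padding and no bias.
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    _, h, wd = x.shape
    out = np.zeros((w.shape[0], h, wd))
    for i in range(h):
        for j in range(wd):
            out[:, i, j] = np.tensordot(w, xp[:, i:i + k, j:j + k], axes=3)
    return out


def fold_fig_6a(w1, w2, w3):
    """Return one k x k kernel equivalent to conv1(x) + conv3(conv2(x)).

    w1: (C_out, C_in, k, k)  convolution operation 1
    w2: (M, C_in, 1, 1)      convolution operation 2
    w3: (C_out, M, k, k)     convolution operation 3
    """
    w23 = np.einsum("omhw,mi->oihw", w3, w2[:, :, 0, 0])  # serial 1x1 then kxk
    return w1 + w23                                        # sum operation


rng = np.random.default_rng(1)
c_in, m, c_out, k = 2, 4, 3, 3
x = rng.standard_normal((c_in, 6, 6))
w1 = rng.standard_normal((c_out, c_in, k, k))
w2 = rng.standard_normal((m, c_in, 1, 1))
w3 = rng.standard_normal((c_out, m, k, k))

linear_op = conv2d_same(x, w1) + conv2d_same(conv2d_same(x, w2), w3)
equivalent = conv2d_same(x, fold_fig_6a(w1, w2, w3))
print(np.allclose(linear_op, equivalent))  # True
```

The folded kernel has the same k×k size as convolution operation 1, which is why the equivalent convolutional layer's receptive field does not grow.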
  • For the linear operation to be equivalent to a convolutional layer, at least one of the multiple sub-linear operations included in the linear operation must be a convolution operation.
  • For model inference, the linear operation itself is not used; instead, the convolutional layer equivalent to the linear operation (which may be referred to as the second convolutional layer in subsequent embodiments) is used. It is therefore necessary to ensure that the receptive field of the convolutional layer equivalent to the linear operation is less than or equal to the receptive field of the first convolutional layer.
  • In the linear operation, a data path between two endpoints can be an operation branch, where the starting point of the operation branch is the input of the linear operation and the end point is the output of the linear operation.
  • The linear operation may include multiple parallel operation branches, each used to process the input data of the linear operation. That is, the starting point of each operation branch is the input of the linear operation, and in each operation branch the sub-linear operation closest to the input of the linear operation takes the input data of the linear operation as its input.
  • Each operation branch includes at least one sub-linear operation connected in series.
  • The linear operation can be represented as a computational graph that defines the input source and output data flow of each sub-linear operation; any path from the input to the output in the computational graph can be defined as an operation branch of the linear operation.
  • The linear operation shown in FIG. 6a may include two operation branches (denoted operation branch 1 and operation branch 2 in this embodiment). Operation branch 1 includes convolution operation 1 and the addition operation; operation branch 2 includes convolution operation 2, convolution operation 3, and the addition operation. Both operation branches process the input data of the linear operation.
  • The data flow of operation branch 1 runs from convolution operation 1 to the addition operation; that is, the input data of the linear operation is processed by convolution operation 1 and the addition operation in sequence.
  • The data flow of operation branch 2 runs from convolution operation 2 through convolution operation 3 to the addition operation; that is, the input data of the linear operation is processed by convolution operation 2, convolution operation 3, and the addition operation in sequence.
  • The linear operation shown in FIG. 6b may include three operation branches (denoted operation branch 1, operation branch 2, and operation branch 3 in this embodiment). Operation branch 1 includes convolution operation 1 and the addition operation; operation branch 2 includes convolution operation 2, convolution operation 3, and the addition operation; operation branch 3 includes convolution operation 4, convolution operation 5, convolution operation 6, and the addition operation. All three operation branches process the input data of the linear operation.
  • The data flow of operation branch 1 runs from convolution operation 1 to the addition operation; that is, the input data of the linear operation is processed by convolution operation 1 and the addition operation in sequence.
  • The data flow of operation branch 2 runs from convolution operation 2 through convolution operation 3 to the addition operation; that is, the input data of the linear operation is processed by convolution operation 2, convolution operation 3, and the addition operation in sequence.
  • The data flow of operation branch 3 runs from convolution operation 4 through convolution operation 5 and convolution operation 6 to the addition operation; that is, the input data of the linear operation is processed by convolution operation 4, convolution operation 5, convolution operation 6, and the addition operation in sequence.
  • The linear operation shown in FIG. 6c may include four operation branches (denoted operation branch 1 to operation branch 4 in this embodiment). Operation branch 1 includes convolution operation 1 and addition operation 2; operation branch 2 includes convolution operation 2, convolution operation 3, and addition operation 2; operation branch 3 includes convolution operation 2, convolution operation 4, addition operation 1, convolution operation 6, and addition operation 2; operation branch 4 includes convolution operation 5, addition operation 1, convolution operation 6, and addition operation 2. All four operation branches process the input data of the linear operation.
  • The data flow of operation branch 1 runs from convolution operation 1 to addition operation 2; that is, the input data of the linear operation is processed by convolution operation 1 and addition operation 2 in sequence.
  • The data flow of operation branch 2 runs from convolution operation 2 through convolution operation 3 to addition operation 2; that is, the input data of the linear operation is processed by convolution operation 2, convolution operation 3, and addition operation 2 in sequence.
  • The data flow of operation branch 3 runs from convolution operation 2 through convolution operation 4, addition operation 1, and convolution operation 6 to addition operation 2; that is, the input data of the linear operation is processed by convolution operation 2, convolution operation 4, addition operation 1, convolution operation 6, and addition operation 2 in sequence.
  • The data flow of operation branch 4 runs from convolution operation 5 through addition operation 1 and convolution operation 6 to addition operation 2; that is, the input data of the linear operation is processed by convolution operation 5, addition operation 1, convolution operation 6, and addition operation 2 in sequence.
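Since an operation branch is any input-to-output path in the computational graph, the branches can be enumerated with a depth-first search. The sketch below encodes the FIG. 6c structure as an adjacency dictionary; node labels such as `conv1` and `add2` and the function name `operation_branches` are hypothetical shorthand for the sub-linear operations named above.

```python
def operation_branches(graph, source="input", sink="output"):
    """Enumerate every input-to-output path (operation branch) of a DAG.

    graph maps each node to the list of nodes that consume its output.
    """
    branches = []

    def dfs(node, path):
        if node == sink:
            branches.append(path)
            return
        for nxt in graph.get(node, []):
            # record sub-linear operations only, not the output endpoint
            dfs(nxt, path + ([nxt] if nxt != sink else []))

    dfs(source, [])
    return branches


fig_6c = {
    "input": ["conv1", "conv2", "conv5"],
    "conv1": ["add2"],
    "conv2": ["conv3", "conv4"],
    "conv3": ["add2"],
    "conv4": ["add1"],
    "conv5": ["add1"],
    "add1": ["conv6"],
    "conv6": ["add2"],
    "add2": ["output"],
}
for branch in operation_branches(fig_6c):
    print(branch)
```

The four printed paths correspond to operation branches 1 to 4 as listed above.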
  • For example, a convolution operation with a k×k kernel has a receptive field of k, while the sum operation and the BN operation have a receptive field of 1. An operation branch is defined as having an equivalent receptive field of k when each output of the branch is affected by k×k inputs.
  • The receptive field of the convolutional layer equivalent to the linear operation is the same as the receptive field of the linear operation, which equals the largest receptive field among its operation branches. For example, if the receptive fields of the operation branches included in the linear operation are 3, 5, 5, 5, and 7, the receptive field of the linear operation is 7.
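The receptive-field rules above can be sketched as follows, assuming every sub-linear operation has stride 1 so that each serial k×k operation enlarges the field by k - 1. Under that assumption a branch's equivalent receptive field is 1 + Σ(k_i - 1), and the linear operation's receptive field is the maximum over its branches. The function names are illustrative.

```python
def branch_receptive_field(kernel_sizes):
    """Equivalent receptive field of a branch of serial stride-1 operations.

    kernel_sizes lists each sub-linear operation in order: k for a k x k
    convolution, 1 for a sum or BN operation.
    """
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


def linear_op_receptive_field(branches):
    """Receptive field of the linear operation: the largest branch value."""
    return max(branch_receptive_field(b) for b in branches)


k = 3
fig_6a = [[k, 1], [1, k, 1]]  # conv1 + sum; conv2 (1x1), conv3, sum
print(linear_op_receptive_field(fig_6a))                     # 3
print(linear_op_receptive_field([[3], [5], [5], [5], [7]]))  # 7
```

For FIG. 6a the result equals k, so the equivalent convolutional layer does not exceed a k×k first convolutional layer.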
  • the equivalent receptive field of each operation branch in the linear operation is less than or equal to the receptive field of the first convolutional layer.
  • The linear operation may also include only one operation branch, which processes the input data of the linear operation and includes at least one sub-linear operation in series; in that case, the equivalent receptive field of this single operation branch is less than or equal to the receptive field of the first convolutional layer.
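The receptive-field rules above can be checked with a short calculation. The Python sketch below assumes every sub-linear operation has stride 1 and no dilation (assumptions not stated in the text), in which case a serial branch has an equivalent receptive field of 1 + Σ(k_i − 1); the function names are hypothetical.

```python
def branch_receptive_field(op_fields):
    """Equivalent receptive field of one operation branch (stride-1 ops in series).

    op_fields lists the receptive field of each sub-linear operation in data-flow
    order: k for a k x k convolution (or k x k pooling), 1 for addition, BN or
    identity. Each stride-1 operation grows the field by (k - 1).
    """
    field = 1
    for k in op_fields:
        field += k - 1
    return field


def linear_op_receptive_field(branches):
    """Receptive field of a linear operation: the largest branch receptive field."""
    return max(branch_receptive_field(b) for b in branches)


def fits_first_conv(branches, first_conv_field):
    """True if every branch's equivalent receptive field is <= the first conv layer's."""
    return all(branch_receptive_field(b) <= first_conv_field for b in branches)


# Branches whose equivalent receptive fields are 3, 5, 5, 5 and 7, as in the example.
branches = [[3, 1], [5], [3, 3], [5, 1, 1], [7]]
print(linear_op_receptive_field(branches))  # 7
print(fits_first_conv(branches, 7))         # True
print(fits_first_conv(branches, 5))         # False
```

Note that two stacked 3x3 convolutions already reach a receptive field of 5, which is why a branch with several serial convolutions can exceed the first convolutional layer's field if it is not constrained.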
  • The receptive field refers to the perceptual range, on the input image, of a feature in a convolutional layer: if any pixel within this range changes, the value of the feature changes accordingly.
  • the convolution kernel slides on the input image, and the extracted features constitute the convolution layer 101.
  • The convolution kernel is then slid over the convolutional layer 101, and the extracted features constitute the convolutional layer 102. Each feature in the convolutional layer 101 is extracted from the input-image pixels covered by the convolution slice as the kernel slides over the input image; this region is the receptive field of the convolutional layer 101, shown in FIG. 7.
  • The range on the input image to which each feature in the convolutional layer 102 maps (that is, the range of input-image pixels it depends on) is the receptive field of the convolutional layer 102.
  • Each feature in the convolutional layer 102 is extracted from the features of the convolutional layer 101 covered by the convolution slice as the kernel slides over the convolutional layer 101.
  • Each of those features in the convolutional layer 101 is in turn extracted from the input-image pixels covered by a convolution slice on the input image. Therefore, the receptive field of the convolutional layer 102 is larger than that of the convolutional layer 101.
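To make the comparison between layers 101 and 102 concrete, the Python sketch below measures a receptive field empirically: it perturbs one input pixel at a time and records which pixels change a single central output feature. It assumes single-channel, stride-1, unpadded convolutions with all-ones 3x3 kernels (chosen so that no contribution cancels out); the function names are hypothetical.

```python
import numpy as np


def conv2d_valid(x, k):
    """Single-channel stride-1 cross-correlation without padding."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for a in range(kh):
        for b in range(kw):
            out += k[a, b] * x[a:a + oh, b:b + ow]
    return out


def measured_receptive_field(kernels, input_size=11):
    """Perturb each input pixel and record which ones change the central output feature."""
    def forward(x):
        for k in kernels:  # stacked convolutional layers
            x = conv2d_valid(x, k)
        return x

    base = forward(np.zeros((input_size, input_size)))
    center = (base.shape[0] // 2, base.shape[1] // 2)
    rows, cols = set(), set()
    for i in range(input_size):
        for j in range(input_size):
            x = np.zeros((input_size, input_size))
            x[i, j] = 1.0  # perturb a single pixel
            if forward(x)[center] != base[center]:
                rows.add(i)
                cols.add(j)
    return len(rows), len(cols)


k3 = np.ones((3, 3))  # all-positive weights, so no contribution cancels out
print(measured_receptive_field([k3]))      # (3, 3): like convolutional layer 101
print(measured_receptive_field([k3, k3]))  # (5, 5): like convolutional layer 102
```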
  • If the equivalent receptive field of at least one of the multiple parallel operation branches is equal to the receptive field of the first convolutional layer, the receptive field of the linear operation is equal to the receptive field of the first convolutional layer.
  • In that case, the receptive field of the equivalent convolutional layer (referred to below as the second convolutional layer) is also equal to the receptive field of the first convolutional layer, and the second convolutional layer can be used in the subsequent model inference process.
  • When the receptive field of the second convolutional layer is the same as that of the first convolutional layer, the neural network model keeps the same size specification as the model before replacement, so the speed and resource consumption of the inference stage remain unchanged.
  • When the receptive field of the second convolutional layer is smaller than that of the first convolutional layer, the replacement still increases the number of parameters trained and improves the accuracy of the model.
  • The training device may acquire multiple linear operations and replace the first convolutional layer in the first neural network model with one of them (or replace multiple convolutional layers in the first neural network model, including the first convolutional layer, each with one of the multiple linear operations), and so on, to obtain multiple second neural network models, where each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a linear operation.
  • A sampling-based search algorithm, such as reinforcement learning or a genetic algorithm, may be used to search for these linear operations.
  • For this purpose, a search space of linear operations may be encoded.
  • One feasible encoding method is to first number the optional sub-linear operations in order, for example encoding the empty operation, identity operation, 1x1 convolution, 3x3 convolution, BN and 3x3 pooling as 0, 1, 2, 3, 4 and 5, respectively, and then use an adjacency matrix M to represent the computational graph of a linear operation.
  • the adjacency matrix M is an N*(N+1) matrix with row numbers 1-N and column numbers 0-N.
  • The entry M[i, j] in row i and column j indicates that the output of node j passes through the operation encoded by M[i, j], and the result is added to node i.
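A minimal Python sketch of how such an encoding could be decoded and evaluated, under several assumptions not fixed by the text: single-channel feature maps, node 0 is the input and node N is the output, edges only run from lower- to higher-numbered nodes, 3x3 operations use zero padding of 1, BN is in its inference form (a scale and a shift), and per-edge parameters are stored in a dictionary keyed by (i, j). All names are hypothetical.

```python
import numpy as np

# Operation codes in the order listed in the text.
EMPTY, IDENTITY, CONV1X1, CONV3X3, BN, POOL3X3 = range(6)


def conv3x3_same(x, k):
    """Single-channel 3x3 cross-correlation, stride 1, zero padding of 1."""
    xp = np.pad(x, 1)
    h, w = x.shape
    out = np.zeros((h, w))
    for a in range(3):
        for b in range(3):
            out += k[a, b] * xp[a:a + h, b:b + w]
    return out


def apply_op(code, x, p):
    if code == IDENTITY:
        return x
    if code == CONV1X1:
        return p["w"] * x  # single channel: a 1x1 convolution is a scalar weight
    if code == CONV3X3:
        return conv3x3_same(x, p["k"])
    if code == BN:
        return p["gamma"] * x + p["beta"]  # inference-form BN (assumption)
    if code == POOL3X3:
        return conv3x3_same(x, np.full((3, 3), 1.0 / 9.0))  # 3x3 average pooling
    raise ValueError(f"unsupported op code {code}")


def run_linear_op(M, x, params):
    """Evaluate the graph encoded by the N x (N+1) matrix M.

    Array row i-1 holds row i of M (rows are numbered 1..N); column j is node j,
    with node 0 the input. Node i = sum over j < i of op_{M[i, j]}(node j).
    Node N is taken as the output (assumption).
    """
    n = M.shape[0]
    nodes = [x]
    for i in range(1, n + 1):
        acc = np.zeros_like(x, dtype=float)
        for j in range(i):
            code = M[i - 1, j]
            if code != EMPTY:  # code 0: no connection from node j to node i
                acc += apply_op(code, nodes[j], params.get((i, j), {}))
        nodes.append(acc)
    return nodes[n]


# node 1 = 3x3 conv(x); node 2 = 1x1 conv(x); node 3 = node 1 + node 2 + x
M = np.array([[3, 0, 0, 0],
              [2, 0, 0, 0],
              [1, 1, 1, 0]])
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))
k = rng.standard_normal((3, 3))
y = run_linear_op(M, x, {(1, 0): {"k": k}, (2, 0): {"w": 0.5}})
print(np.allclose(y, conv3x3_same(x, k) + 0.5 * x + x))  # True
```

Because each (i, j) entry holds a single code, parallel operations between the same pair of nodes need an intermediate node, as nodes 1 and 2 do in the example.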
  • Linear operation codes can then be sampled by the search algorithm, and for each sampled code, the first convolutional layer in the first neural network model is replaced with the linear operation corresponding to that code.
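As a rough illustration of the sample-replace-evaluate loop, the Python sketch below uses plain random search as a stand-in for the reinforcement-learning or genetic search named in the text, and keeps the candidate with the highest returned accuracy. The `evaluate` callback is hypothetical: in practice it would build the second neural network model from the code, train it and return its validation accuracy; here a toy score is used only to exercise the loop.

```python
import numpy as np

NUM_CODES = 6  # 0 empty, 1 identity, 2 1x1 conv, 3 3x3 conv, 4 BN, 5 3x3 pooling


def sample_encoding(n_nodes, rng):
    """Sample an N x (N+1) adjacency matrix; only entries with column j < row i are used."""
    m = np.zeros((n_nodes, n_nodes + 1), dtype=int)
    for i in range(1, n_nodes + 1):
        m[i - 1, :i] = rng.integers(0, NUM_CODES, size=i)
        if not m[i - 1, :i].any():
            # Give every node at least one input (an assumption, not from the text).
            m[i - 1, rng.integers(0, i)] = rng.integers(1, NUM_CODES)
    return m


def random_search(evaluate, n_nodes, n_samples, seed=0):
    """Sample encodings, score each with `evaluate`, and keep the best one."""
    rng = np.random.default_rng(seed)
    best_code, best_acc = None, -np.inf
    for _ in range(n_samples):
        code = sample_encoding(n_nodes, rng)
        acc = evaluate(code)
        if acc > best_acc:
            best_code, best_acc = code, acc
    return best_code, best_acc


# Toy placeholder score (number of non-empty edges), standing in for training.
code, score = random_search(lambda m: float(np.count_nonzero(m)), n_nodes=4, n_samples=20)
print(code)
```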
  • Alternatively, only one second neural network model may be obtained: one target linear operation is determined, and the first convolutional layer in the first neural network model is replaced with it. Specifically, the training device can obtain a second neural network model from the first neural network model, where the second neural network model is obtained by replacing the first convolutional layer in the first neural network model with the target linear operation. The target linear operation includes multiple sub-linear operations and is equivalent to a convolutional layer; it includes M operation branches, the input of each operation branch is the input of the target linear operation, and the multiple sub-linear operations satisfy at least one of the following conditions:
  • the multiple sub-linear operations include at least three operation types; M is not 3; at least one of the M operation branches includes a number of sub-linear operations other than 2, where M is a positive integer; or at least one of the M operation branches includes a number of convolution-type sub-linear operations other than 1.
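The four alternative conditions can be read as a simple predicate. The Python sketch below assumes a hypothetical representation in which each of the M operation branches is a list of operation-type names in data-flow order (a sub-linear operation shared by several branches appears in each of them); it returns the conditions that hold, so an empty list means the candidate satisfies none of them.

```python
def satisfied_conditions(branches):
    """Return which of the listed conditions a candidate target linear operation meets."""
    op_types = {op for branch in branches for op in branch}
    met = []
    if len(op_types) >= 3:
        met.append("at least three operation types")
    if len(branches) != 3:
        met.append("M is not 3")
    if any(len(b) != 2 for b in branches):
        met.append("a branch does not have exactly 2 sub-linear operations")
    if any(b.count("conv") != 1 for b in branches):
        met.append("a branch does not have exactly 1 convolution")
    return met


# Three branches of [convolution, addition]: none of the conditions holds.
print(satisfied_conditions([["conv", "add"]] * 3))  # []

# The four branches of FIG. 6c, assuming branch 3 ends at addition operation 2.
fig_6c = [["conv", "add"], ["conv", "conv", "add"],
          ["conv", "conv", "add", "conv", "add"], ["conv", "add", "conv", "add"]]
print(len(satisfied_conditions(fig_6c)))  # 3
```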
  • The training device may perform model training on the obtained second neural network models to obtain multiple trained second neural network models, and determine a target neural network model among them, where the target neural network model is the neural network model with the highest model accuracy among the multiple second neural network models.
  • The acquisition of the multiple second neural network models in step 502 need not be completed before the model training of the multiple second neural network models in step 503 begins.
  • For example, the training device can obtain one second neural network model, train it, obtain the next second neural network model after that training is complete, and so on. Alternatively, the training device can first obtain multiple second neural network models and then train them.
  • The number of second neural network models may be specified in advance by an administrator, or may be the number of second neural network models whose training the training device completes, during training, within the limit of search resources.
  • The model accuracy (also called the validation accuracy) of each trained second neural network model can be obtained, and based on these accuracies, the second neural network model with the highest model accuracy can be selected from the multiple second neural network models.
  • The second neural network model corresponding to the target neural network model is obtained by replacing the first convolutional layer in the first neural network model with the target linear operation, so the neural network model with the highest accuracy includes the trained target linear operation.
  • The target linear operation includes multiple sub-linear operations, so using the target neural network model directly for model inference would reduce inference speed and increase the resources required for inference. Therefore, in this embodiment, a second convolutional layer equivalent to the trained target linear operation can be obtained, and the trained target linear operation in the target neural network model can be replaced with the second convolutional layer to obtain a third neural network model, which can be used for model inference.
  • Obtaining the second convolutional layer equivalent to the trained target linear operation, and replacing the trained target linear operation in the target neural network model with the second convolutional layer to obtain the third neural network model, can be completed by the training device. After training is complete, the training device can directly return the third neural network model.
  • Specifically, the training device can send the third neural network model to the terminal device or the server, so that the terminal device or the server performs model inference based on the third neural network model.
  • Alternatively, the terminal device or server may itself obtain the second convolutional layer equivalent to the trained target linear operation and replace the trained target linear operation in the target neural network model with the second convolutional layer to obtain the third neural network model.
  • Following the order in which the multiple sub-linear operations included in the trained target linear operation process data, each sub-linear operation may be fused into the adjacent sub-linear operation that follows it, until fusion into the last sub-linear operation in the order is complete, so as to obtain a second convolutional layer equivalent to the target linear operation.
  • In other words, each sub-linear operation is fused into the adjacent subsequent sub-linear operation in the sequence until fusion into the last sub-linear operation (the one closest to the output) is complete.
  • Suppose the trained target linear operation includes an adjacent first sub-linear operation and second sub-linear operation, where the second sub-linear operation follows the first in the sequence; the two may be any adjacent pair of sub-linear operations in the trained target linear operation. The first sub-linear operation includes a first operation parameter and processes its input data according to the first operation parameter, in the manner corresponding to its operation type; likewise, the second sub-linear operation includes a second operation parameter and processes its input data according to the second operation parameter, in the manner corresponding to its operation type.
  • The fusion parameter of the first sub-linear operation is obtained first: if the input data of the first sub-linear operation is the input data of the trained target linear operation, its fusion parameter is the first operation parameter. The fusion parameter of the second sub-linear operation is then obtained from the fusion parameter of the first sub-linear operation, the second operation parameter and the operation type of the second sub-linear operation. If the second sub-linear operation is the last sub-linear operation in the sequence, its fusion parameter is used as the operation parameter of the second convolutional layer.
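A minimal Python sketch of this sequential fusion for a chain of convolutions, assuming stride 1, no dilation, no bias, and unpadded ("valid") convolutions, under which the fusion is exact. Composing two kernels sums the products of their taps over every pair of offsets; for 1x1 kernels this reduces to a matrix product over channels, which is one way to read the text's "inner product". The function names are hypothetical, and this part of the text does not specify how padding is handled.

```python
import numpy as np


def compose(w_first, w_second):
    """Kernel equivalent to conv(w_first) followed by conv(w_second), stride 1.

    Shapes are (C_out, C_in, kh, kw). For 1x1 kernels this reduces to the matrix
    product w_second @ w_first over the channel dimensions.
    """
    c2, _, kh2, kw2 = w_second.shape
    _, c0, kh1, kw1 = w_first.shape
    out = np.zeros((c2, c0, kh1 + kh2 - 1, kw1 + kw2 - 1))
    for p in range(kh2):
        for q in range(kw2):
            out[:, :, p:p + kh1, q:q + kw1] += np.einsum(
                "om,mihw->oihw", w_second[:, :, p, q], w_first)
    return out


def fuse_chain(kernels):
    """Fuse a serial chain of convolutions, given in data-flow order."""
    fused = kernels[0]  # first op reads the linear op's input: fusion param = own param
    for w in kernels[1:]:
        fused = compose(fused, w)  # fusion param of the next op in the sequence
    return fused  # fusion param of the last op = parameters of the equivalent conv layer


def conv_valid(x, w):
    """Multi-channel stride-1 cross-correlation without padding; x is (C_in, H, W)."""
    _, _, kh, kw = w.shape
    oh, ow = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((w.shape[0], oh, ow))
    for a in range(kh):
        for b in range(kw):
            out += np.einsum("oi,ihw->ohw", w[:, :, a, b], x[:, a:a + oh, b:b + ow])
    return out


rng = np.random.default_rng(0)
ws = [rng.standard_normal(s) for s in [(3, 2, 1, 1), (4, 3, 3, 3), (2, 4, 3, 3)]]
x = rng.standard_normal((2, 9, 9))
y_chain = conv_valid(conv_valid(conv_valid(x, ws[0]), ws[1]), ws[2])
fused = fuse_chain(ws)
print(fused.shape)                                 # (2, 2, 5, 5)
print(np.allclose(conv_valid(x, fused), y_chain))  # True
```

If the original layers zero-pad their inputs, the fused kernel can differ from the chain near the borders, so a real implementation has to treat padding consistently.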
  • The operation type of each sub-linear operation in the linear operation includes at least one of the following: addition operation, null operation, identity operation, convolution operation, batch normalization (BN) operation, or pooling operation. The convolution operation and the BN operation both include trainable operation parameters.
  • An empty operation (code 0) is required in the encoding; it is equivalent to having no operation between node i and node j.
  • If the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by an inner product calculation on the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation. If the operation type of the second sub-linear operation is an addition operation, a pooling operation, an identity operation or a null operation, the fusion parameter of the second sub-linear operation is obtained by applying the calculation corresponding to that operation type to the fusion parameter of the first sub-linear operation.
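One way to realize these per-type rules is sketched below in Python, under stated assumptions rather than as the embodiment's exact procedure: the identity operation and average pooling (with zero padding counted in the average) are written as fixed convolution kernels so that they can be added or composed like convolutions; addition sums the incoming fusion parameters after centered zero-padding to a common size; a null operation contributes nothing and is simply left out of the sum; and an inference-mode BN is folded into the preceding kernel and bias with the standard formula. Kernels are assumed to be square with odd size; all names are hypothetical.

```python
import numpy as np


def pad_to(w, size):
    """Zero-pad a (C_out, C_in, k, k) kernel to size x size, keeping it centered."""
    p = (size - w.shape[-1]) // 2
    return np.pad(w, ((0, 0), (0, 0), (p, p), (p, p)))


def fuse_add(kernels):
    """Addition: sum the incoming fusion parameters after padding to a common size."""
    size = max(w.shape[-1] for w in kernels)
    return sum(pad_to(w, size) for w in kernels)


def identity_kernel(channels, size=1):
    """Identity operation written as a convolution kernel."""
    w = np.zeros((channels, channels, size, size))
    for c in range(channels):
        w[c, c, size // 2, size // 2] = 1.0
    return w


def avgpool_kernel(channels, size=3):
    """size x size average pooling (zero padding counted) as a per-channel kernel."""
    w = np.zeros((channels, channels, size, size))
    for c in range(channels):
        w[c, c] = 1.0 / size**2
    return w


def fuse_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-mode BN that follows a convolution (w, b) into a new (w, b)."""
    s = gamma / np.sqrt(var + eps)
    return w * s[:, None, None, None], (b - mean) * s + beta


def conv_same(x, w, b=None):
    """Reference stride-1 'same' cross-correlation; x is (C_in, H, W)."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((w.shape[0], h, wd))
    for a in range(k):
        for c in range(k):
            out += np.einsum("oi,ihw->ohw", w[:, :, a, c], xp[:, a:a + h, c:c + wd])
    return out if b is None else out + b[:, None, None]


rng = np.random.default_rng(1)
x = rng.standard_normal((2, 6, 6))
w3 = rng.standard_normal((2, 2, 3, 3))
# 3x3 conv + identity + 3x3 average pooling, all reading the same input
fused = fuse_add([w3, identity_kernel(2), avgpool_kernel(2)])
ref = conv_same(x, w3) + x + conv_same(x, avgpool_kernel(2))
print(np.allclose(conv_same(x, fused), ref))  # True
```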
  • FIG. 11 gives an exemplary illustration in which the operation type of the second sub-linear operation is, respectively, an addition operation, a convolution operation, a pooling operation and a BN operation.
  • The fusion parameter of the linear operation is computed as fusion(output node), that is, by applying the fusion function to the output node of the linear operation.
  • the fusion process is performed for each linear operation in the model, and finally a fully fused model is obtained, which is consistent with the original model structure, so the speed and resource consumption of the inference stage remain unchanged.
  • the models before and after fusion are mathematically equivalent, so the accuracy of the model after fusion is consistent with that before fusion.
  • Taking ResNet-18 as the first neural network model, the following specific example describes the model training method in the embodiments of the present application:
  • the convolutional layers in the first neural network model are replaced with linear operations.
  • a part of the convolutional layers can be selected for replacement, or all of them can be replaced.
  • the forms of linear operations replaced by different convolutional layers can be different.
  • Here, only the case in which the linear operation is the over-parameterized form C shown in FIG. 12 is used as an example.
  • The sub-linear operations are denoted as nodes 1 to 8.
  • the specific fusion process can be as follows:
  • The fusion parameter of node 1 is the operation parameter of node 1, the fusion parameter of node 2 is the operation parameter of node 2, and the fusion parameter of node 4 is the operation parameter of node 4;
  • node 5 performs the processing corresponding to its operation type (convolution operation) on the output of node 2 according to the operation parameter of node 5; therefore, the fusion parameter of node 5 is the inner product of the fusion parameter of node 2 and the operation parameter of node 5;
  • node 6 performs the processing corresponding to its operation type (addition operation) on the output of node 5 and the output of node 4; therefore, the fusion parameter of node 6 is the sum of the fusion parameter of node 5 and the fusion parameter of node 4 (which equals the operation parameter of node 4);
  • node 3 performs the processing corresponding to its operation type (convolution operation) on the output of node 2 according to the operation parameter of node 3; therefore, the fusion parameter of node 3 is the inner product of the fusion parameter of node 2 and the operation parameter of node 3;
  • node 7 performs the processing corresponding to its operation type (convolution operation) on the output of node 6 according to the operation parameter of node 7; therefore, the fusion parameter of node 7 is the inner product of the fusion parameter of node 6 and the operation parameter of node 7;
  • node 8 performs the processing corresponding to its operation type (addition operation) on the outputs of node 1, node 3 and node 7; therefore, the fusion parameter of node 8 is the sum of the fusion parameters of node 1, node 3 and node 7;
  • the fusion parameter of the node 8 can be used as the operation parameter of the second convolution layer, and the second convolution layer can perform a convolution operation on the input data based on the operation parameter of the second convolution layer.
  • fusion(node 8): addition; pre-nodes are 1, 3 and 7
  • fusion(node 1): convolution, directly connected to the input; return its parameters
  • fusion(node 3): convolution; pre-node is 2
  • fusion(node 2): convolution, directly connected to the input; return its parameters
  • fusion(node 7): convolution; pre-node is 6
  • fusion(node 6): addition; pre-nodes are 5 and 4
  • fusion(node 5): convolution; pre-node is 2
  • fusion(node 2): convolution, directly connected to the input; return its parameters
  • fusion(node 4): convolution, directly connected to the input; return its parameters
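The recursive trace for form C can be sketched in Python as follows. FIG. 12 is not reproduced here, so the channel count and kernel sizes in the example graph are invented for illustration; only convolution and addition nodes are handled; and the check uses circular padding, under which both serial composition and parallel addition of kernels are exact. All names are hypothetical.

```python
import numpy as np

INPUT = 0  # id used for the input of the linear operation


def compose(w_first, w_second):
    """Kernel equivalent to conv(w_first) followed by conv(w_second)."""
    c2, _, kh2, kw2 = w_second.shape
    _, c0, kh1, kw1 = w_first.shape
    out = np.zeros((c2, c0, kh1 + kh2 - 1, kw1 + kw2 - 1))
    for p in range(kh2):
        for q in range(kw2):
            out[:, :, p:p + kh1, q:q + kw1] += np.einsum(
                "om,mihw->oihw", w_second[:, :, p, q], w_first)
    return out


def add_kernels(kernels):
    """Sum kernels after zero-padding each (centered) to the largest odd size."""
    size = max(k.shape[-1] for k in kernels)
    total = 0
    for k in kernels:
        p = (size - k.shape[-1]) // 2
        total = total + np.pad(k, ((0, 0), (0, 0), (p, p), (p, p)))
    return total


def fusion(node, graph):
    """fusion(node), where graph[node] = (op_type, pre_nodes, params)."""
    op, pre, params = graph[node]
    if op == "conv":
        (src,) = pre
        if src == INPUT:
            return params  # directly connected to the input: return its parameters
        return compose(fusion(src, graph), params)
    if op == "add":
        return add_kernels([fusion(p, graph) for p in pre])
    raise ValueError(f"unsupported op {op}")


def conv_circular(x, w):
    """Reference centered cross-correlation with circular padding; x is (C, H, W)."""
    _, _, kh, kw = w.shape
    out = np.zeros((w.shape[0],) + x.shape[1:])
    for a in range(kh):
        for b in range(kw):
            shifted = np.roll(x, (kh // 2 - a, kw // 2 - b), axis=(1, 2))
            out += np.einsum("oi,ihw->ohw", w[:, :, a, b], shifted)
    return out


def evaluate(node, graph, x):
    """Run the unfused graph on x, for comparison with the fused kernel."""
    if node == INPUT:
        return x
    op, pre, params = graph[node]
    if op == "conv":
        return conv_circular(evaluate(pre[0], graph, x), params)
    return sum(evaluate(p, graph, x) for p in pre)


rng = np.random.default_rng(0)
C = 2


def rand_kernel(size):  # hypothetical kernel sizes for illustration
    return rng.standard_normal((C, C, size, size))


graph = {1: ("conv", [INPUT], rand_kernel(3)), 2: ("conv", [INPUT], rand_kernel(1)),
         3: ("conv", [2], rand_kernel(3)), 4: ("conv", [INPUT], rand_kernel(3)),
         5: ("conv", [2], rand_kernel(3)), 6: ("add", [5, 4], None),
         7: ("conv", [6], rand_kernel(1)), 8: ("add", [1, 3, 7], None)}
x = rng.standard_normal((C, 7, 7))
w8 = fusion(8, graph)
print(w8.shape)                                                  # (2, 2, 3, 3)
print(np.allclose(conv_circular(x, w8), evaluate(8, graph, x)))  # True
```

With these sizes the fused kernel stays 3x3, matching the requirement that the equivalent receptive field not exceed that of the replaced layer.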
  • the fused model has the same structure as the original ResNet-18 model.
  • the size of the convolutional layer may represent the number of features included in the convolutional layer. Exemplarily, the size of the convolutional layer will be described below with reference to the convolutional layer and the convolutional kernel.
  • the size of the convolutional layer 101 is X*Y*N1, that is, the convolutional layer 101 includes X*Y*N1 features.
  • N1 is the number of channels
  • one channel is one feature dimension
  • X*Y is the number of features included in each channel.
  • X, Y, and N1 are all positive integers greater than 0.
  • the convolution kernel 1011 is one of the convolution kernels used on the convolution layer 101 .
  • the convolution layer 101 uses a total of N2 convolution kernels, and the size and model parameters of the N2 convolution kernels may be the same or different.
  • the size of the convolution kernel 1011 is X1*X1*N1. That is, the convolution kernel 1011 includes X1*X1*N1 model parameters.
  • Convolving the convolution kernel 1011 with the features of the convolution layer 101 yields a feature on one channel of the convolution layer 102.
  • the product of the features of the convolution layer 101 and the convolution kernel 1011 can be directly used as the features of the convolution layer 102 .
  • Alternatively, the convolution kernel 1011 can be slid over the convolution layer 101, multiplying it with the features of the convolution layer 101 at each position; after all product results are output, they are normalized, and the normalized results are used as the features of the convolution layer 102.
  • the convolution kernel 1011 slides on the convolution layer 101 for convolution, and the result of the convolution forms a channel of the convolution layer 102 .
  • Each convolution kernel used in the convolutional layer 101 corresponds to a channel of the convolutional layer 102 . Therefore, the number of channels of the convolutional layer 102 is equal to the number of convolutional kernels acting on the convolutional layer 101 .
  • the design of the model parameters in each convolution kernel reflects the characteristics of the features that the convolution kernel expects to extract from the convolutional layers.
  • Applying the N2 convolution kernels to the convolutional layer 101 therefore extracts features of N2 channels.
  • The convolution kernel 1011 can be split as follows.
  • the convolution kernel 1011 includes N1 convolution slices, and each convolution slice includes X1*X1 model parameters (P11 to Px1x1).
  • Each model parameter corresponds to a convolution point.
  • the model parameters corresponding to a convolution point are multiplied by the features in the convolution layer in the corresponding position of the convolution point to obtain the convolution result of the convolution point.
  • The sum of the convolution results of all convolution points of a convolution kernel is the convolution result of that convolution kernel.
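The computation described above can be sketched in Python, using the text's symbols X, Y, N1, N2 and X1 with small example values. The sketch assumes stride 1 and no padding (the text does not specify either), lays features out as X × Y × N1, and omits the optional normalization step; the function names are hypothetical.

```python
import numpy as np


def conv_kernel_at(layer, kernel, r, c):
    """Convolution result of one kernel at position (r, c).

    layer: X x Y x N1 features; kernel: X1 x X1 x N1 parameters, i.e. N1 slices
    of X1 x X1 convolution points. Each point multiplies the feature beneath it,
    and the point results are summed.
    """
    x1 = kernel.shape[0]
    window = layer[r:r + x1, c:c + x1, :]
    return float(np.sum(window * kernel))


def next_layer(layer, kernels):
    """Slide each of the N2 kernels over the layer (stride 1, no padding)."""
    X, Y, _ = layer.shape
    x1 = kernels[0].shape[0]
    out = np.zeros((X - x1 + 1, Y - x1 + 1, len(kernels)))
    for ch, kernel in enumerate(kernels):  # one output channel per kernel
        for r in range(X - x1 + 1):
            for c in range(Y - x1 + 1):
                out[r, c, ch] = conv_kernel_at(layer, kernel, r, c)
    return out


rng = np.random.default_rng(0)
X, Y, N1, N2, X1 = 6, 5, 3, 4, 3
layer_101 = rng.standard_normal((X, Y, N1))
kernels = [rng.standard_normal((X1, X1, N1)) for _ in range(N2)]
layer_102 = next_layer(layer_101, kernels)
print(layer_102.shape)  # (4, 3, 4): N2 = 4 channels
print(kernels[0].size)  # 27 = X1 * X1 * N1 model parameters
```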
  • the size of the second convolutional layer is the same as the size of the first convolutional layer.
  • If the receptive field of the target linear operation is smaller than that of the first convolutional layer, the calculated equivalent convolutional layer will be smaller than the first convolutional layer.
  • In this case, a zero-padding operation is applied to the calculated equivalent convolutional layer to obtain a second convolutional layer of the same size as the first convolutional layer.
  • FIG. 14 is a schematic diagram of a zero-filling operation in an embodiment of the present application.
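The zero-padding step can be sketched in Python as centered padding of the equivalent kernel. The sketch assumes odd, square kernels and "same" zero padding in the convolution, so that a 3x3 kernel padded to 5x5 produces exactly the same output; the 5x5 size of the first convolutional layer and all names are assumptions for illustration.

```python
import numpy as np


def zero_pad_kernel(w, target_size):
    """Zero-pad a (C_out, C_in, k, k) kernel to target_size, keeping it centered.

    Assumes odd k and target_size, so the same number of zeros goes on each side.
    """
    p = (target_size - w.shape[-1]) // 2
    return np.pad(w, ((0, 0), (0, 0), (p, p), (p, p)))


def conv_same(x, w):
    """Stride-1 'same' cross-correlation with zero padding; x is (C_in, H, W)."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((w.shape[0], h, wd))
    for a in range(k):
        for b in range(k):
            out += np.einsum("oi,ihw->ohw", w[:, :, a, b], xp[:, a:a + h, b:b + wd])
    return out


rng = np.random.default_rng(0)
equivalent = rng.standard_normal((4, 3, 3, 3))  # computed 3x3 equivalent kernel
second = zero_pad_kernel(equivalent, 5)         # first conv layer assumed to be 5x5
x = rng.standard_normal((3, 8, 8))
print(second.shape)                                                 # (4, 3, 5, 5)
print(np.allclose(conv_same(x, second), conv_same(x, equivalent)))  # True
```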
  • The accuracy of the trained model is improved.
  • Table 2 shows the accuracy of networks obtained with different replacement schemes (represented in Table 2 as over-parameterized forms). In this task, a lower loss indicates stronger model fitting ability and higher model accuracy. As shown in Table 2, for both model structures, the loss after over-parameterized training is lower than the baseline of the original model structure; at the same time, the optimal over-parameterized form differs between model structures.
  • An embodiment of the present application provides a model training method.
  • The method includes: acquiring a first neural network model, where the first neural network model includes a first convolutional layer; acquiring multiple second neural network models, where each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a linear operation, and the linear operation is equivalent to a convolutional layer; and performing model training on the multiple second neural network models to obtain a target neural network model, where the target neural network model is the neural network model with the highest model accuracy among the multiple trained second neural network models.
  • The convolutional layer in the neural network to be trained is replaced with a linear operation that is equivalent to a convolutional layer, and the replacement with the highest accuracy is selected from multiple replacement schemes, thereby improving the accuracy of the trained model.
  • a typical application scenario of the embodiments of the present application may include a neural network model on a terminal device.
  • The model obtained with the training method provided in the embodiments of the present application may be deployed on a terminal device (such as a smartphone) or a cloud server to provide inference capabilities.
  • The first neural network model (represented as a DNN model in FIG. 15a) is trained with the training method provided in the embodiments of the present application, and the fused over-parameterized model is deployed on a terminal device or cloud server to perform inference on user data.
  • the training methods provided in the embodiments of the present application can also be applied to AutoML services on the cloud, and combined with other AutoML technologies such as data enhancement strategy search, model structure search, activation function search, hyperparameter search, etc., to further improve the model effect.
  • As shown in FIG. 15b and FIG. 16a, the user provides training data and a model structure and specifies the target task; the AutoML service on the cloud automatically performs the over-parameterization search and finally outputs the searched model and its corresponding parameters; or
  • Overparameterized training can be combined with other AutoML technologies, such as data augmentation strategy search, model structure search, activation function search, hyperparameter search, etc., to further improve the model effect.
  • FIG. 16b is a schematic flowchart of a model training method provided by an embodiment of the present application.
  • a model training method provided by an embodiment of the present application includes:
  • Step 1601: for the specific description of step 1601, refer to the description of step 501; details are not repeated here.
  • Different linear operations can be selected for neural network models with different network structures, for neural network models that perform different target tasks, and for convolutional layers at different positions in a neural network model, so that the model obtained by training the replaced neural network model has higher accuracy.
  • The target linear operation may be determined based on the network structure of the first neural network model and/or the position of the first convolutional layer in the first neural network model. Specifically, the structure of the target linear operation may be determined according to the network structure of the first neural network model. The network structure of the first neural network model may refer to the number of sub-network layers it includes, the types of those sub-network layers, the connection relationships between them, and the position of the first convolutional layer in the first neural network model; the structure of the target linear operation may refer to the number of sub-linear operations it includes, their types, and the connection relationships between them.
  • For example, a model search method may be used in advance to replace the convolutional layers of neural network models with different network structures with linear operations and to train the replaced neural network models, so as to determine the optimal or better linear operation corresponding to each convolutional layer in the network structure of each neural network model, where the optimal or better linear operation is one for which the model obtained by training the replaced neural network model has higher accuracy.
  • After the first neural network model is obtained, a neural network model whose structure is consistent with or similar to that of the first neural network model can be selected from the network structures obtained by the earlier search, and the linear operation corresponding to a convolutional layer in that consistent or similar neural network model is determined as the target linear operation, where the relative position of that convolutional layer in the consistent or similar neural network model is consistent with or similar to the relative position of the first convolutional layer in the first neural network model;
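The lookup described above can be sketched in Python. The table format is hypothetical and stands in for the results of an earlier model search; as a simplification, a "consistent or similar" structure is approximated by an exact name match, and a "consistent or similar" position by the nearest searched relative position (0.0 for the first layer, 1.0 for the last). The entries used below are placeholders, not search results from the text.

```python
def select_target_linear_op(table, structure, rel_position, task=None):
    """Look up a pre-searched linear operation for the first convolutional layer.

    table maps (structure, task, relative_position) -> linear operation form.
    Structures (and tasks, if given) must match exactly; the relative position
    is matched to the nearest searched position.
    """
    candidates = [
        (abs(pos - rel_position), form)
        for (s, t, pos), form in table.items()
        if s == structure and (task is None or t == task)
    ]
    if not candidates:
        return None  # no similar structure was searched in advance
    return min(candidates)[1]


# Placeholder entries only.
table = {
    ("net_a", "classification", 0.0): "form_1",
    ("net_a", "classification", 1.0): "form_2",
    ("net_b", "detection", 0.5): "form_3",
}
print(select_target_linear_op(table, "net_a", 0.2))                         # form_1
print(select_target_linear_op(table, "net_a", 0.9, task="classification"))  # form_2
print(select_target_linear_op(table, "net_c", 0.5))                         # None
```

Passing `task` covers the variant that also conditions on the target task; a real system would need a proper structural similarity measure instead of name matching.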
  • Alternatively, the target linear operation can be determined based on both the network structure of the first neural network model and the target task it performs, in a manner similar to the determination based on the network structure described above. For example, a model search method may be used to replace the convolutional layers of neural network models with different network structures and different target tasks with linear operations and to train the replaced neural network models, so as to determine the optimal or better linear operation corresponding to each convolutional layer in the network structure of each neural network model, where the optimal or better linear operation is one for which the model obtained by training the replaced neural network model has higher accuracy;
  • Alternatively, the target linear operation can be determined based on the target task performed by the first neural network model, in a manner similar to the determination based on the network structure described above. For example, a model search method may be used to replace the convolutional layers of neural network models that perform different target tasks with linear operations and to train the replaced neural network models, so as to determine the optimal or better linear operation corresponding to each convolutional layer in the network structure of each neural network model, where the optimal or better linear operation is one for which the model obtained by training the replaced neural network model has higher accuracy;
  • The above methods of determining the target linear operation based on the network structure of the first neural network model and/or the target task are only illustrations. Other methods may also be used, as long as the first neural network model after replacement (that is, the second neural network model) has high model accuracy; the specific structure of the target linear operation and the way it is determined are not limited.
  • Step 1603: for the specific description of step 1603, refer to the description of step 502; details are not repeated here.
  • Step 1604: for the specific description of step 1604, refer to the description of the model training performed on the second neural network model in step 503; details are not repeated here.
  • the convolutional layer in the neural network to be trained is replaced with a target linear operation, and the structure of the target linear operation is determined according to the structure of the first neural network model and/or the target task.
  • In this embodiment, the structure of the linear operation used to replace the convolutional layer is adaptive and therefore more flexible: different linear operations can be designed for different model structures and task types, which improves the accuracy of the trained model.
  • the target linear operation includes multiple sub-linear operations
  • the target linear operation includes M operation branches
  • the input of each operation branch is the input of the target linear operation
  • the M operation branches satisfy at least one of the following conditions:
  • the input of at least one of the sub-linear operations included in the M operation branches is the output of multiple sub-linear operations among the plurality of sub-linear operations;
  • at least two of the M operation branches include different numbers of sub-linear operations; or,
  • at least two of the M operation branches include sub-linear operations of different operation types.
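These structural conditions can be checked mechanically once the branches are enumerated as data-flow paths from the input to the output. The Python sketch below does this for the linear operation of FIG. 6c, assuming, as for branches 1, 2 and 4, that operation branch 3 ends at addition operation 2. The graph format is hypothetical, and "different operation types" is interpreted here as different multisets of operation types along the branches.

```python
def enumerate_branches(graph, node, source="input"):
    """All data-flow paths from the linear operation's input to `node`.

    graph maps each sub-linear operation to (op_type, predecessor names).
    """
    if node == source:
        return [[]]
    return [path + [node]
            for pre in graph[node][1]
            for path in enumerate_branches(graph, pre, source)]


def branch_conditions(graph, output):
    branches = enumerate_branches(graph, output)
    types = [tuple(sorted(graph[n][0] for n in b)) for b in branches]
    return {
        # some sub-linear operation takes the outputs of several sub-linear operations
        "multi_input": any(sum(p != "input" for p in preds) > 1
                           for _, preds in graph.values()),
        "different_lengths": len({len(b) for b in branches}) > 1,
        "different_types": len(set(types)) > 1,
    }


fig_6c = {
    "conv1": ("conv", ["input"]), "conv2": ("conv", ["input"]),
    "conv3": ("conv", ["conv2"]), "conv4": ("conv", ["conv2"]),
    "conv5": ("conv", ["input"]), "add1": ("add", ["conv4", "conv5"]),
    "conv6": ("conv", ["add1"]), "add2": ("add", ["conv1", "conv3", "conv6"]),
}
for branch in enumerate_branches(fig_6c, "add2"):
    print(branch)
print(branch_conditions(fig_6c, "add2"))
# {'multi_input': True, 'different_lengths': True, 'different_types': True}
```

The four printed paths correspond to operation branches 1 to 4; note that convolution operation 2 and addition operation 1 are shared between branches.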
  • the structure of the target linear operation provided in this embodiment is more complex, which can improve the accuracy of the trained model.
  • the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
  • the target linear operation is different from the first convolutional layer.
  • the processing results obtained are the same.
  • the target neural network model includes a trained target linear operation
  • the method further includes:
  • the size of the second convolutional layer is the same as the size of the first convolutional layer.
  • the method further includes:
  • each sub-linear operation is fused into the adjacent subsequent sub-linear operation in the sequence, until fusion into the last sub-linear operation in the sequence is completed, to obtain the second convolutional layer equivalent to the target linear operation.
  • the trained target linear operation includes a first sub-linear operation and a second sub-linear operation that are adjacent, the second sub-linear operation is located after the first sub-linear operation in the sequence, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
  • fusing each sub-linear operation into the adjacent subsequent sub-linear operation in the sequence includes:
  • obtaining the fusion parameter of the first sub-linear operation, where if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter, and if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the sequence, the fusion parameter of the first sub-linear operation is obtained according to the fusion parameter of the third sub-linear operation and the first operation parameter;
  • obtaining the fusion parameter of the second sub-linear operation according to the fusion parameter of the first sub-linear operation and the second operation parameter, where if the second sub-linear operation is the last sub-linear operation in the sequence, the fusion parameter of the second sub-linear operation is used as the operation parameter of the second convolutional layer.
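The front-to-back fusion recursion above can be sketched for its simplest case, assuming a chain of single-channel 1-D convolutions without padding or bias (an assumption made only for this sketch). `numpy.convolve` computes a true "full" convolution, which is associative, so each operation's fusion parameter is the previous fusion parameter convolved with that operation's own kernel, and the fusion parameter of the last operation serves as the kernel of the equivalent convolution. `fuse_sequence` is a hypothetical helper name.

```python
import numpy as np


def fuse_sequence(kernels):
    """Fold a chain of single-channel 1-D convolution kernels into one, front to back."""
    fused = np.asarray(kernels[0], dtype=float)  # first operation: its own parameter
    for k in kernels[1:]:
        # Next operation: previous fusion parameter combined with its own kernel.
        fused = np.convolve(fused, k)
    return fused  # fusion parameter of the last operation = equivalent kernel


rng = np.random.default_rng(1)
x = rng.standard_normal(20)
chain = [rng.standard_normal(1), rng.standard_normal(3), rng.standard_normal(3)]

y_seq = x
for k in chain:  # apply the sub-linear operations one after another
    y_seq = np.convolve(y_seq, k)

k_eq = fuse_sequence(chain)  # 1 + (3 - 1) + (3 - 1) = 5 taps
print(len(k_eq), np.allclose(y_seq, np.convolve(x, k_eq)))  # 5 True
```

The equivalent kernel grows by k - 1 taps per serial operation, which is why the size of the fused layer tracks the receptive field of the chain.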
  • the linear operation includes multiple sub-linear operations
  • the operation types of the multiple sub-linear operations include at least one of the following: an addition operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
  • if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by performing an inner-product calculation on the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is an addition operation, a pooling operation, an identity operation, or a null operation, the fusion parameter of the second sub-linear operation is obtained by performing, on the fusion parameter of the first sub-linear operation, the calculation corresponding to the operation type of the second sub-linear operation.
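When the second sub-linear operation is a BN operation, a common way to fold it into the preceding convolution is the standard convolution and batch-normalization folding, sketched below under the assumptions of inference-mode BN statistics and a valid (unpadded) multi-channel 1-D convolution; whether this matches the exact inner-product formulation of this application is an assumption. Each output channel of the kernel is multiplied by gamma / sqrt(var + eps), and the bias absorbs the mean shift and beta. The helpers `conv1d` and `fuse_conv_bn` are hypothetical.

```python
import numpy as np


def conv1d(x, w, b):
    """Valid 1-D cross-correlation: x (in_ch, length), w (out_ch, in_ch, k) -> (out_ch, length - k + 1)."""
    out_ch, _, k = w.shape
    length = x.shape[1] - k + 1
    y = np.empty((out_ch, length))
    for i in range(length):
        # Contract over input channels and kernel taps for output position i.
        y[:, i] = np.tensordot(w, x[:, i:i + k], axes=([1, 2], [0, 1])) + b
    return y


def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-mode BN, gamma * (z - mean) / sqrt(var + eps) + beta, into the preceding convolution."""
    scale = gamma / np.sqrt(var + eps)   # one multiplier per output channel
    w_fused = w * scale[:, None, None]   # rescale each output channel's kernel
    b_fused = (b - mean) * scale + beta  # absorb the mean shift and beta into the bias
    return w_fused, b_fused


rng = np.random.default_rng(2)
x = rng.standard_normal((2, 10))
w = rng.standard_normal((3, 2, 3))
b = rng.standard_normal(3)
gamma, beta = rng.standard_normal(3), rng.standard_normal(3)
mean, var = rng.standard_normal(3), rng.random(3) + 0.5

z = conv1d(x, w, b)  # convolution followed by BN, computed separately
y_ref = gamma[:, None] * (z - mean[:, None]) / np.sqrt(var[:, None] + 1e-5) + beta[:, None]
w_f, b_f = fuse_conv_bn(w, b, gamma, beta, mean, var)
print(np.allclose(y_ref, conv1d(x, w_f, b_f)))  # True
```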
  • An embodiment of the present application provides a model training method, including: acquiring a first neural network model, where the first neural network model includes a first convolutional layer and is used to implement a target task; determining, according to at least one of the following pieces of information, a target linear operation for replacing the first convolutional layer, where the information includes the network structure of the first neural network model, the target task, and the position of the first convolutional layer in the first neural network model, and the target linear operation is equivalent to a convolutional layer;
  • obtaining a second neural network model, where the second neural network model is obtained by replacing the first convolutional layer in the first neural network model with the target linear operation; and performing model training on the second neural network model to obtain a target neural network model.
  • in this embodiment, the convolutional layer in the neural network to be trained is replaced with the target linear operation, whose structure is determined according to the structure of the first neural network model, the target task, and/or the position of the first convolutional layer in the first neural network model.
  • the structure of the linear operation in this embodiment can therefore be better suited to the first neural network model and is more flexible: different linear operations can be designed for different model structures and task types, which improves the accuracy of the trained model.
  • the present application provides a model training method, the method includes:
  • the first neural network model includes a first convolutional layer
  • obtaining a plurality of second neural network models according to the first neural network model, where each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation; the target linear operation is equivalent to a convolutional layer and includes multiple sub-linear operations and M operation branches, the input of each operation branch is the input of the target linear operation, and the M operation branches satisfy at least one of the following conditions:
  • the input of at least one of the sub-linear operations included in the M operation branches is the output of multiple sub-linear operations among the plurality of sub-linear operations;
  • at least two of the M operation branches include different numbers of sub-linear operations; or
  • at least two of the M operation branches include sub-linear operations of different operation types;
  • the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
  • the target linear operation is different from the first convolutional layer.
  • when processing the same data, the target linear operation and the convolutional layer equivalent to it obtain the same processing result.
  • the target neural network model includes a trained target linear operation
  • the method further includes:
  • the size of the second convolutional layer is the same as the size of the first convolutional layer.
  • the method further includes:
  • each sub-linear operation is fused into the adjacent subsequent sub-linear operation in the sequence, until fusion into the last sub-linear operation in the sequence is completed, to obtain the second convolutional layer equivalent to the target linear operation.
  • the trained target linear operation includes a first sub-linear operation and a second sub-linear operation that are adjacent, the second sub-linear operation is located after the first sub-linear operation in the sequence, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
  • fusing each sub-linear operation into the adjacent subsequent sub-linear operation in the sequence includes:
  • obtaining the fusion parameter of the first sub-linear operation, where if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter, and if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the sequence, the fusion parameter of the first sub-linear operation is obtained according to the fusion parameter of the third sub-linear operation and the first operation parameter;
  • obtaining the fusion parameter of the second sub-linear operation according to the fusion parameter of the first sub-linear operation and the second operation parameter, where if the second sub-linear operation is the last sub-linear operation in the sequence, the fusion parameter of the second sub-linear operation is used as the operation parameter of the second convolutional layer.
  • the linear operation includes multiple sub-linear operations
  • the operation types of the multiple sub-linear operations include at least one of the following: an addition operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
  • if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by performing an inner-product calculation on the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is an addition operation, a pooling operation, an identity operation, or a null operation, the fusion parameter of the second sub-linear operation is obtained by performing, on the fusion parameter of the first sub-linear operation, the calculation corresponding to the operation type of the second sub-linear operation.
  • the present application provides a model training method.
  • the method includes: obtaining a first neural network model, where the first neural network model includes a first convolution layer; and obtaining a plurality of second neural network models according to the first neural network model.
  • each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation; the target linear operation is equivalent to a convolutional layer and includes multiple sub-linear operations and M operation branches, the input of each operation branch is the input of the target linear operation, and the M operation branches satisfy at least one of the following conditions:
  • the input of at least one of the sub-linear operations included in the M operation branches is the output of multiple sub-linear operations among the plurality of sub-linear operations; at least two of the M operation branches include different numbers of sub-linear operations; or at least two of the M operation branches include sub-linear operations of different operation types.
  • FIG. 17 is a schematic diagram of a model training apparatus 1700 provided by an embodiment of the present application.
  • the model training apparatus 1700 provided by the present application includes:
  • an obtaining module 1701 configured to obtain a first neural network model, where the first neural network model includes a first convolutional layer;
  • each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a linear operation, and the linear operation is equivalent to a convolutional layer;
  • the model training module 1702 is configured to perform model training on the multiple second neural network models to obtain a target neural network model, where the target neural network model is the neural network model with the highest model accuracy among the multiple trained second neural network models.
  • for a detailed description of the model training module 1702, refer to the description of step 503 in the foregoing embodiment; details are not repeated here.
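The selection performed by the model training module 1702 amounts to training every candidate and keeping the most accurate one. A minimal, framework-agnostic sketch follows; `select_target_model`, the `train` and `evaluate` callables, and the candidate names and scores in the usage lines are hypothetical placeholders, not part of this application.

```python
from typing import Callable, Sequence, TypeVar

Model = TypeVar("Model")


def select_target_model(
    candidates: Sequence[Model],
    train: Callable[[Model], Model],
    evaluate: Callable[[Model], float],
) -> Model:
    """Train every second neural network model and return the trained one with the highest accuracy."""
    if not candidates:
        raise ValueError("at least one candidate model is required")
    trained = [train(model) for model in candidates]
    return max(trained, key=evaluate)  # on ties, the earliest candidate wins


# Stand-in callables: "training" tags the name, "evaluation" looks up a placeholder score.
placeholder_scores = {"cand_a/trained": 0.2, "cand_b/trained": 0.9, "cand_c/trained": 0.5}
best = select_target_model(
    ["cand_a", "cand_b", "cand_c"],
    train=lambda name: name + "/trained",
    evaluate=lambda name: placeholder_scores[name],
)
print(best)  # cand_b/trained
```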
  • the receptive field of the convolutional layer equivalent to the linear operation is less than or equal to the receptive field of the first convolutional layer.
  • the linear operation includes a plurality of operation branches, the input of each operation branch is the input of the linear operation, each operation branch includes at least one sub-linear operation connected in series, and the equivalent receptive field of the at least one serial sub-linear operation is less than or equal to the receptive field of the first convolutional layer; or,
  • the linear operation includes one operation branch that processes the input data of the linear operation, the operation branch includes at least one sub-linear operation connected in series, and the equivalent receptive field of the at least one serial sub-linear operation is less than or equal to the receptive field of the first convolutional layer.
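The receptive-field constraint can be checked with the usual formula for stride-1, undilated operations applied in series: the equivalent receptive field is 1 + Σ(k_i - 1). The sketch below assumes that each sub-linear operation is described only by its kernel size along one spatial dimension (identity and BN count as size 1, a 3×3 pooling as size 3); the function names are hypothetical.

```python
def serial_receptive_field(kernel_sizes):
    """Receptive field along one dimension of stride-1, undilated operations applied in series."""
    return 1 + sum(k - 1 for k in kernel_sizes)


def branches_fit(branches, first_conv_kernel):
    """True if every branch's serial receptive field is <= that of the first convolutional layer."""
    return all(serial_receptive_field(branch) <= first_conv_kernel for branch in branches)


# First convolutional layer: 3x3. Branches: [1x1 conv -> 3x3 conv], [3x3 conv -> BN], [identity].
print(branches_fit([[1, 3], [3, 1], [1]], 3))  # True
# Two 3x3 convolutions in series see a 5x5 window, which exceeds 3x3.
print(branches_fit([[3, 3]], 3))  # False
```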
  • the linear operation in each second neural network model is different from the first convolutional layer, and different second neural network models include different linear operations.
  • the second neural network model corresponding to the target neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation, the target neural network model includes a trained target linear operation, and the acquisition module is configured to:
  • the size of the second convolutional layer is the same as the size of the first convolutional layer.
  • the apparatus further includes:
  • a fusion module, configured to fuse, according to the order in which the multiple sub-linear operations included in the trained target linear operation process data, each sub-linear operation into the adjacent subsequent sub-linear operation in the sequence, until fusion into the last sub-linear operation in the sequence is completed, to obtain a second convolutional layer equivalent to the target linear operation.
  • the trained target linear operation includes a first sub-linear operation and a second sub-linear operation that are adjacent, the second sub-linear operation is located after the first sub-linear operation in the sequence, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
  • the fusion module is configured to perform the following:
  • obtaining the fusion parameter of the first sub-linear operation, where if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter, and if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the sequence, the fusion parameter of the first sub-linear operation is obtained according to the fusion parameter of the third sub-linear operation and the first operation parameter;
  • obtaining the fusion parameter of the second sub-linear operation according to the fusion parameter of the first sub-linear operation and the second operation parameter, where if the second sub-linear operation is the last sub-linear operation in the sequence, the fusion parameter of the second sub-linear operation is used as the operation parameter of the second convolutional layer.
  • the linear operation includes multiple sub-linear operations
  • the operation types of the multiple sub-linear operations include at least one of the following: an addition operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
  • if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by performing an inner-product calculation on the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is an addition operation, a pooling operation, an identity operation, or a null operation, the fusion parameter of the second sub-linear operation is obtained by performing, on the fusion parameter of the first sub-linear operation, the calculation corresponding to the operation type of the second sub-linear operation.
  • the acquisition module 1701 in the model training device can be used to acquire a first neural network model, where the first neural network model includes a first convolution layer;
  • the second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation; the target linear operation includes multiple sub-linear operations, is equivalent to a convolutional layer, and includes M operation branches; the input of each operation branch is the input of the target linear operation; and the multiple sub-linear operations satisfy at least one of the following conditions:
  • the plurality of sub-linear operations include at least three types of operations
  • the M is not 3;
  • the number of sub-linear operations included in at least one operation branch of the M operation branches is not equal to 2, and the M is a positive integer; or,
  • the number of sub-linear operations whose operation type is a convolution operation in at least one of the M operation branches is not 1;
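Because the four conditions above are alternatives, a candidate structure qualifies as soon as any one of them holds. The sketch below encodes them for a structure described, purely as an assumption of this example, as a list of M branches, each a list of operation-type names; the function name and the type strings are hypothetical.

```python
def satisfies_any_condition(branches):
    """Return True if the target linear operation meets at least one of the listed conditions.

    `branches` holds the M operation branches; each branch is a list of
    sub-linear operation type names such as "conv", "bn" or "identity".
    """
    op_types = {op for branch in branches for op in branch}
    return (
        len(op_types) >= 3                                        # at least three operation types
        or len(branches) != 3                                     # M is not 3
        or any(len(branch) != 2 for branch in branches)           # a branch without exactly 2 ops
        or any(branch.count("conv") != 1 for branch in branches)  # a branch without exactly 1 convolution
    )


# Three identical conv -> BN branches meet none of the conditions.
print(satisfies_any_condition([["conv", "bn"]] * 3))  # False
# Adding an identity -> BN branch changes M and introduces a third operation type.
print(satisfies_any_condition([["conv", "bn"]] * 3 + [["identity", "bn"]]))  # True
```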
  • the model training module 1702 may be configured to perform model training on the second neural network model to obtain a target neural network model.
  • the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
  • the target linear operation is different from the first convolutional layer.
  • when processing the same data, the target linear operation and the convolutional layer equivalent to it obtain the same processing result.
  • the acquisition module is configured to replace the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.
  • the size of the second convolutional layer is the same as the size of the first convolutional layer.
  • the apparatus further includes:
  • a fusion module, configured to fuse, according to the order in which the multiple sub-linear operations included in the trained target linear operation process data, each sub-linear operation into the adjacent subsequent sub-linear operation in the sequence, until fusion into the last sub-linear operation in the sequence is completed, to obtain a second convolutional layer equivalent to the target linear operation.
  • the trained target linear operation includes a first sub-linear operation and a second sub-linear operation that are adjacent, the second sub-linear operation is located after the first sub-linear operation in the sequence, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
  • the fusion module is configured to obtain the fusion parameter of the first sub-linear operation, where if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter, and if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the sequence, the fusion parameter of the first sub-linear operation is obtained according to the fusion parameter of the third sub-linear operation and the first operation parameter;
  • obtaining the fusion parameter of the second sub-linear operation according to the fusion parameter of the first sub-linear operation and the second operation parameter, where if the second sub-linear operation is the last sub-linear operation in the sequence, the fusion parameter of the second sub-linear operation is used as the operation parameter of the second convolutional layer.
  • the linear operation includes multiple sub-linear operations
  • the operation types of the multiple sub-linear operations include at least one of the following: an addition operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
  • if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by performing an inner-product calculation on the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is an addition operation, a pooling operation, an identity operation, or a null operation, the fusion parameter of the second sub-linear operation is obtained by performing, on the fusion parameter of the first sub-linear operation, the calculation corresponding to the operation type of the second sub-linear operation.
  • an embodiment of the present application further provides a model training apparatus, including:
  • an acquisition module for acquiring a first neural network model, where the first neural network model includes a first convolution layer;
  • a target linear operation for replacing the first convolutional layer is determined based on at least one of the following pieces of information: the network structure of the first neural network model, the target task, and the position of the first convolutional layer in the first neural network model; the target linear operation is equivalent to a convolutional layer;
  • a model training module configured to perform model training on the second neural network model to obtain a target neural network model.
  • the target linear operation includes multiple sub-linear operations
  • the target linear operation includes M operation branches
  • the input of each operation branch is the input of the target linear operation
  • the M operation branches satisfy at least one of the following conditions:
  • the input of at least one of the sub-linear operations included in the M operation branches is the output of multiple sub-linear operations among the plurality of sub-linear operations;
  • at least two of the M operation branches include different numbers of sub-linear operations; or
  • at least two of the M operation branches include sub-linear operations of different operation types.
  • the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
  • the target linear operation is different from the first convolutional layer.
  • when processing the same data, the target linear operation and the convolutional layer equivalent to it obtain the same processing result.
  • the acquisition module is configured to replace the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.
  • the size of the second convolutional layer is the same as the size of the first convolutional layer.
  • the apparatus further includes:
  • a fusion module, configured to fuse, according to the order in which the multiple sub-linear operations included in the trained target linear operation process data, each sub-linear operation into the adjacent subsequent sub-linear operation in the sequence, until fusion into the last sub-linear operation in the sequence is completed, to obtain a second convolutional layer equivalent to the target linear operation.
  • the trained target linear operation includes a first sub-linear operation and a second sub-linear operation that are adjacent, the second sub-linear operation is located after the first sub-linear operation in the sequence, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
  • the fusion module is configured to obtain the fusion parameter of the first sub-linear operation, where if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter, and if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the sequence, the fusion parameter of the first sub-linear operation is obtained according to the fusion parameter of the third sub-linear operation and the first operation parameter;
  • obtaining the fusion parameter of the second sub-linear operation according to the fusion parameter of the first sub-linear operation and the second operation parameter, where if the second sub-linear operation is the last sub-linear operation in the sequence, the fusion parameter of the second sub-linear operation is used as the operation parameter of the second convolutional layer.
  • the linear operation includes multiple sub-linear operations
  • the operation types of the multiple sub-linear operations include at least one of the following: an addition operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
  • if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by performing an inner-product calculation on the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is an addition operation, a pooling operation, an identity operation, or a null operation, the fusion parameter of the second sub-linear operation is obtained by performing, on the fusion parameter of the first sub-linear operation, the calculation corresponding to the operation type of the second sub-linear operation.
  • an embodiment of the present application further provides a model training apparatus, including:
  • an acquisition module for acquiring a first neural network model, where the first neural network model includes a first convolution layer;
  • each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation; the target linear operation is equivalent to a convolutional layer and includes multiple sub-linear operations and M operation branches, the input of each operation branch is the input of the target linear operation, and the M operation branches satisfy at least one of the following conditions:
  • the input of at least one of the sub-linear operations included in the M operation branches is the output of multiple sub-linear operations among the plurality of sub-linear operations;
  • at least two of the M operation branches include different numbers of sub-linear operations; or
  • at least two of the M operation branches include sub-linear operations of different operation types;
  • a model training module configured to perform model training on the second neural network model to obtain a target neural network model.
  • the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
  • the target linear operation is different from the first convolutional layer.
  • when processing the same data, the target linear operation and the convolutional layer equivalent to it obtain the same processing result.
  • the target neural network model includes a trained target linear operation
  • the acquisition module is used for:
  • the size of the second convolutional layer is the same as the size of the first convolutional layer.
  • the apparatus further includes:
  • a fusion module, configured to fuse, according to the order in which the multiple sub-linear operations included in the trained target linear operation process data, each sub-linear operation into the adjacent subsequent sub-linear operation in the sequence, until fusion into the last sub-linear operation in the sequence is completed, to obtain a second convolutional layer equivalent to the target linear operation.
  • the trained target linear operation includes a first sub-linear operation and a second sub-linear operation that are adjacent, and the second sub-linear operation is located after the first sub-linear operation in the sequence;
  • the first sub-linear operation includes a first operation parameter;
  • the second sub-linear operation includes a second operation parameter;
  • fusing each sub-linear operation into the adjacent subsequent sub-linear operation in the sequence includes:
  • obtaining the fusion parameter of the first sub-linear operation, where if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter, and if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the sequence, the fusion parameter of the first sub-linear operation is obtained according to the fusion parameter of the third sub-linear operation and the first operation parameter;
  • obtaining the fusion parameter of the second sub-linear operation according to the fusion parameter of the first sub-linear operation and the second operation parameter, where if the second sub-linear operation is the last sub-linear operation in the sequence, the fusion parameter of the second sub-linear operation is used as the operation parameter of the second convolutional layer.
  • the linear operation includes multiple sub-linear operations
  • the operation types of the multiple sub-linear operations include at least one of the following: an addition operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
  • if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by performing an inner-product calculation on the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is an addition operation, a pooling operation, an identity operation, or a null operation, the fusion parameter of the second sub-linear operation is obtained by performing, on the fusion parameter of the first sub-linear operation, the calculation corresponding to the operation type of the second sub-linear operation.
  • FIG. 18 is a schematic structural diagram of the execution device provided by the embodiment of the present application.
  • the execution device 1800 may be, for example, a smart wearable device or a server; this is not limited here.
  • the data processing apparatus described in the embodiment corresponding to FIG. 10 may be deployed on the execution device 1800 to implement the function of data processing in the embodiment corresponding to FIG. 10 .
  • the execution device 1800 includes a receiver 1801, a transmitter 1802, a processor 1803, and a memory 1804 (there may be one or more processors 1803 in the execution device 1800; one processor is used as an example in FIG. 18), where the processor 1803 may include an application processor 18031 and a communication processor 18032.
  • the receiver 1801, the transmitter 1802, the processor 1803, and the memory 1804 may be connected by a bus or otherwise.
  • Memory 1804 may include read-only memory and random access memory, and provides instructions and data to processor 1803 .
  • a portion of memory 1804 may also include non-volatile random access memory (NVRAM).
  • the memory 1804 stores operating instructions executable by the processor, executable modules or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
  • the processor 1803 controls the operation of the execution device.
  • various components of the execution device are coupled together through a bus system, where the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus.
  • the various buses are referred to as bus systems in the figures.
  • the methods disclosed in the above embodiments of the present application may be applied to the processor 1803 or implemented by the processor 1803 .
  • the processor 1803 may be an integrated circuit chip, which has signal processing capability.
  • each step of the above-mentioned method can be completed by an integrated logic circuit of hardware in the processor 1803 or an instruction in the form of software.
  • the above-mentioned processor 1803 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor or microcontroller, a vision processing unit (VPU), a tensor processing unit (TPU), or another processor suitable for AI operations, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processor 1803 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the methods disclosed with reference to the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
  • the storage medium is located in the memory 1804, and the processor 1803 reads the information in the memory 1804, and completes the steps of the above method in combination with its hardware.
  • the receiver 1801 can be used to receive input numerical or character information, and generate signal input related to the relevant settings and function control of the execution device.
  • the transmitter 1802 can be used to output numeric or character information through a first interface; the transmitter 1802 can also be used to send instructions to a disk group through the first interface to modify data in the disk group; the transmitter 1802 can also include a display device such as a display screen.
  • the execution device may acquire the model trained by the model training method in the embodiment corresponding to FIG. 5 or FIG. 16b, and perform model inference.
  • FIG. 19 is a schematic structural diagram of the training device provided by the embodiment of the present application.
  • the training device 1900 is implemented by one or more servers.
  • the training device 1900 may vary widely in configuration or performance, and may include one or more central processing units (CPUs) 1919 (e.g., one or more processors), memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) that store applications 1942 or data 1944.
  • the memory 1932 and the storage medium 1930 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1930 may include one or more modules (not shown in the figure), and each module may include a series of instructions to operate on the training device.
  • the central processing unit 1919 may be configured to communicate with the storage medium 1930 and execute, on the training device 1900, a series of instruction operations stored in the storage medium 1930.
  • Training device 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, and one or more input and output interfaces 1958; or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
  • the training device may execute the model training method in the embodiment corresponding to FIG. 5 or FIG. 16b.
  • the model training apparatus 1700 described in FIG. 17 may be a module in the training device, and the processor in the training device may execute the model training method performed by the model training apparatus 1700.
  • Embodiments of the present application also provide a computer program product that, when running on a computer, causes the computer to perform the steps performed by the aforementioned execution device, or causes the computer to perform the steps performed by the aforementioned training device.
  • Embodiments of the present application further provide a computer-readable storage medium storing a program for signal processing; when the program runs on a computer, the computer performs the steps performed by the aforementioned execution device, or the steps performed by the aforementioned training device.
  • the execution device, training device, or terminal device provided in this embodiment of the present application may specifically be a chip, and the chip includes: a processing unit and a communication unit, the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, pins or circuits, etc.
  • the processing unit can execute the computer execution instructions stored in the storage unit, so that the chip in the execution device executes the data processing method described in the above embodiments, or the chip in the training device executes the data processing method described in the above embodiment.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • FIG. 20 is a schematic structural diagram of a chip provided by an embodiment of the present application.
  • the chip may be implemented as a neural network processing unit (NPU) 2000. The NPU 2000 is mounted on the host CPU as a coprocessor, and the host CPU allocates tasks to it.
  • the core part of the NPU is the arithmetic circuit 2003, which is controlled by the controller 2004 to extract the matrix data in the memory and perform multiplication operations.
  • the NPU 2000 can implement the model training method provided in the embodiment described in FIG. 5 through the cooperation between various internal devices, or perform inference on the model obtained by training.
  • the arithmetic circuit 2003 in the NPU 2000 can perform the steps of obtaining the first neural network model and performing model training on it.
  • the arithmetic circuit 2003 in the NPU 2000 includes a plurality of processing units (process engines, PEs).
  • in some implementations, the arithmetic circuit 2003 is a two-dimensional systolic array.
  • the arithmetic circuit 2003 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition.
  • in some implementations, the arithmetic circuit 2003 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the data corresponding to the weight matrix B from the weight memory 2002 and buffers it on each PE in the arithmetic circuit.
  • the arithmetic circuit then fetches the data of the input matrix A from the input memory 2001, performs a matrix operation with matrix B, and stores the partial or final results of the matrix operation in an accumulator 2008.
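The flow described above (data of B held on the PEs, data of A streamed through, partial results collected in the accumulator) can be mimicked in software with a tiled matrix multiply. The NumPy sketch below is only an analogy under that assumption; the function name and tile size are hypothetical and do not model the actual dataflow or timing of the NPU 2000.

```python
import numpy as np


def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 2) -> np.ndarray:
    """Compute a @ b by accumulating partial products one K-slice at a time.

    Each K-slice of b plays the role of weights buffered on the PEs; the
    matching K-slice of a is streamed through, and the partial result is
    added into an accumulator (loosely analogous to accumulator 2008).
    """
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    acc = np.zeros((m, n), dtype=np.result_type(a, b))
    for start in range(0, k, tile):
        stop = min(start + tile, k)
        weights = b[start:stop, :]   # "buffered" weight tile
        inputs = a[:, start:stop]    # streamed input tile
        acc += inputs @ weights      # partial result accumulated
    return acc


a = np.arange(12, dtype=np.float64).reshape(3, 4)
b = np.arange(20, dtype=np.float64).reshape(4, 5)
print(np.allclose(tiled_matmul(a, b, tile=3), a @ b))  # True
```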
  • Unified memory 2006 is used to store input data and output data.
  • weight data is transferred to the weight memory 2002 through the direct memory access controller (DMAC) 2005.
  • Input data is also transferred to unified memory 2006 via the DMAC.
  • the bus interface unit (BIU) 2010 is used for interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 2009.
  • the bus interface unit 2010 is used by the instruction fetch buffer 2009 to obtain instructions from the external memory, and by the DMAC 2005 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 2006 , the weight data to the weight memory 2002 , or the input data to the input memory 2001 .
  • the vector calculation unit 2007 includes a plurality of operation processing units and, if necessary, further processes the output of the arithmetic circuit 2003, for example by vector multiplication, vector addition, exponential operations, logarithmic operations, or magnitude comparison. It is mainly used for computing layers other than convolutional and fully connected layers in a neural network, such as batch normalization, pixel-level summation, and upsampling of feature planes.
  • the vector computation unit 2007 can store the processed output vectors to the unified memory 2006 .
  • the vector calculation unit 2007 can apply a linear or nonlinear function to the output of the arithmetic circuit 2003, for example to a vector of accumulated values, or perform linear interpolation on a feature plane extracted by a convolutional layer, to generate activation values.
  • the vector computation unit 2007 generates normalized values, pixel-level summed values, or both.
  • the vector of processed outputs can be used as activation input to the arithmetic circuit 2003, e.g., for use in subsequent layers of the neural network.
  • the instruction fetch buffer 2009 connected to the controller 2004 is used to store the instructions used by the controller 2004.
  • Unified memory 2006, input memory 2001, weight memory 2002 and instruction fetch memory 2009 are all On-Chip memories. External memory is private to the NPU hardware architecture.
  • the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above program.
  • the device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from a website, computer, training device, or data center to another website, computer, training device, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave).
  • the computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a training device or data center, that integrates one or more usable media.
  • the usable media may be magnetic media (e.g., floppy disks, hard disks, or magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid state disks (SSDs)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Filters That Use Time-Delay Elements (AREA)

Abstract

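This application discloses a model training method that can be applied to the field of artificial intelligence. The method includes: obtaining a first neural network model; replacing a first convolutional layer in the first neural network model with a linear operation to obtain a plurality of second neural network models; and performing model training on the plurality of second neural network models to obtain the neural network model with the highest model accuracy among the trained second neural network models. In this application, a convolutional layer in the neural network to be trained is replaced with a linear operation that is equivalent to a convolutional layer, and the replacement with the highest accuracy is selected from multiple candidates, thereby improving the accuracy of the trained model.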

Description

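Model training method and apparatus
This application claims priority to Chinese Patent Application No. 202110183936.2, filed with the China National Intellectual Property Administration on February 10, 2021 and entitled "Model training method and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of artificial intelligence, and in particular, to a model training method and apparatus.
Background
Artificial intelligence (AI) is a theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, AI is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can respond in a manner similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning, and decision-making.
To improve model accuracy during training, an over-parameterized training method can be used: during training, additional parameters and computation are introduced on top of the original model, which changes the training process and improves model accuracy. ACNet (Asymmetric Convolutional Network) is one such over-parameterized training method: during training, each original 3x3 convolution is replaced with the sum of three convolutions, 3x3, 1x3, and 3x1. However, ACNet has only one fixed over-parameterized form, so its improvement to model performance is limited.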
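ACNet's three-branch sum can be collapsed back into one 3x3 kernel because convolution is linear in its kernel: centre the 1x3 and 3x1 kernels inside a 3x3 grid of zeros and add them to the 3x3 kernel. The NumPy sketch below assumes one input and one output channel, stride 1, and zero "same" padding, and ignores batch normalization; `conv2d_same` and `pad_to` are helper names introduced here for illustration.

```python
import numpy as np


def conv2d_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Single-channel cross-correlation (as in DL frameworks), stride 1, zero 'same' padding.

    Assumes odd kernel height and width so the output has the same shape as x.
    """
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero padding
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out


def pad_to(k: np.ndarray, size: int) -> np.ndarray:
    """Zero-pad an odd-sized kernel symmetrically to size x size, keeping it centred."""
    kh, kw = k.shape
    dh, dw = (size - kh) // 2, (size - kw) // 2
    return np.pad(k, ((dh, dh), (dw, dw)))


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))
k33 = rng.standard_normal((3, 3))
k13 = rng.standard_normal((1, 3))
k31 = rng.standard_normal((3, 1))

# Training-time form: three parallel convolutions whose outputs are summed.
three_branch_sum = conv2d_same(x, k33) + conv2d_same(x, k13) + conv2d_same(x, k31)

# Inference-time form: a single 3x3 kernel.
fused = k33 + pad_to(k13, 3) + pad_to(k31, 3)
print(np.allclose(three_branch_sum, conv2d_same(x, fused)))  # True
```

The same zero-padding step is what lets a smaller equivalent kernel be enlarged to the size of the original convolutional layer.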
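Summary
According to a first aspect, this application provides a model training method. The method includes:
obtaining a first neural network model, where the first neural network model includes a first convolutional layer. The training device may replace some or all of the convolutional layers in the first neural network model with linear operations. The replaced object may be the first convolutional layer included in the first neural network model; specifically, the first neural network model may include a plurality of convolutional layers, of which the first convolutional layer is one. Alternatively, the replaced objects may be a plurality of convolutional layers included in the first neural network model, of which the first convolutional layer is one;
obtaining a plurality of second neural network models based on the first neural network model, where each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a linear operation, and the linear operation is equivalent to one convolutional layer;
Here, "equivalent" in the embodiments of this application describes a relationship between two operation units: two operation units that differ in form produce the same result when processing any identical data, and one of them can be transformed into the form of the other through mathematical derivation. In the embodiments of this application, the sub-linear operations included in the linear operation can be transformed into the form of a convolutional layer through mathematical derivation, and the resulting convolutional layer and the linear operation produce the same result when processing the same data;
The linear operation consists of a plurality of sub-linear operations. A sub-linear operation here refers to a basic linear operation rather than one composed of multiple basic linear operations, whereas the linear operation refers to an operation composed of multiple basic linear operations. For example, the operation type of a sub-linear operation may be, but is not limited to, an addition operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation; correspondingly, the linear operation may be a composition of at least one of these sub-linear operations. It should be understood that "composition" here means that there are at least two sub-linear operations and that they are connected, with no isolated sub-linear operation. Being connected means that the output of one sub-linear operation is used as the input of another sub-linear operation (except for the sub-linear operation at the output side of the linear operation, whose output is used as the output of the linear operation);
It should be understood that the linear operation in each second neural network model differs from the first convolutional layer, and different second neural network models include different linear operations;
performing model training on the plurality of second neural network models to obtain a target neural network model, where the target neural network model is the neural network model with the highest model accuracy among the plurality of trained second neural network models.
When the second neural network models are trained, the model accuracy (also called validation accuracy) of each trained second neural network model can be obtained, and the second neural network model with the highest model accuracy can be selected from the plurality of second neural network models based on these accuracies;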
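The selection step amounts to an argmax over validation accuracy. The sketch below is hypothetical: `train_and_validate` stands in for whatever training and evaluation routine is actually used, the candidate names are made up, and the accuracy numbers are placeholders, not results from this application.

```python
from typing import Callable, Dict, Tuple


def pick_most_accurate(candidates: Dict[str, object],
                       train_and_validate: Callable[[object], float]) -> Tuple[str, float]:
    """Train every candidate second model and return the name and accuracy of the best one."""
    scores = {name: train_and_validate(model) for name, model in candidates.items()}
    best = max(scores, key=scores.__getitem__)
    return best, scores[best]


# Placeholder stand-ins: real code would build and train actual networks.
placeholder_accuracy = {"linear_op_A": 0.71, "linear_op_B": 0.74, "linear_op_C": 0.69}
best, acc = pick_most_accurate({name: name for name in placeholder_accuracy},
                               placeholder_accuracy.__getitem__)
print(best, acc)  # linear_op_B 0.74
```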
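In this way, the convolutional layer in the neural network to be trained is replaced with a linear operation that is equivalent to a convolutional layer, and the replacement with the highest accuracy is selected from multiple candidates, thereby improving the accuracy of the trained model.
In a possible implementation, the receptive field of the convolutional layer equivalent to the linear operation is less than or equal to the receptive field of the first convolutional layer.
For the linear operation to be equivalent to one convolutional layer, at least one of its sub-linear operations must be a convolution operation. During subsequent model inference, to avoid slowing down inference or increasing inference resource consumption, the linear operation itself is not used; instead, the convolutional layer equivalent to the linear operation (referred to as the second convolutional layer in later embodiments) is used for inference, and the receptive field of this equivalent convolutional layer must be less than or equal to that of the first convolutional layer.
In a possible implementation, the linear operation includes a plurality of operation branches, and the input of each operation branch is the input of the linear operation; that is, each operation branch processes the input data of the linear operation. Each operation branch includes at least one serial sub-linear operation, and the equivalent receptive field of the at least one serial sub-linear operation is less than or equal to the receptive field of the first convolutional layer; or
the linear operation includes one operation branch, which processes the input data of the linear operation. The operation branch includes at least one serial sub-linear operation, and the equivalent receptive field of the at least one serial sub-linear operation is less than or equal to the receptive field of the first convolutional layer.
Taking the input and output of the linear operation as two endpoints, a data path between them may be one operation branch: its start point is the input of the linear operation, and its end point is the output of the linear operation. In one implementation, the linear operation may include a plurality of operation branches, each of which processes the input data of the linear operation; that is, each branch starts at the input of the linear operation, so the input of the sub-linear operation closest to that input in each branch is the input data of the linear operation. Each operation branch includes at least one serial sub-linear operation. Put another way, the linear operation can be represented as a computation graph that defines where each sub-linear operation takes its input from and where its output goes; any path in this graph from input to output can be defined as an operation branch of the linear operation;
For a single sub-linear operation, a k*k convolution or pooling has a receptive field of k, and an addition operation or a BN operation has a receptive field of 1. An operation branch has an equivalent receptive field of k if each output of the branch is affected by k x k inputs;
To ensure that the equivalent receptive field of the linear operation is less than or equal to that of the first convolutional layer, the equivalent receptive field of every operation branch in the linear operation must be less than or equal to the receptive field of the first convolutional layer. In one implementation, the linear operation may include only one operation branch, which processes the input data of the linear operation and includes at least one serial sub-linear operation; in that case, the equivalent receptive field of this single branch is less than or equal to that of the first convolutional layer.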
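The receptive-field constraint can be checked mechanically by enumerating every input-to-output path of the computation graph. This sketch assumes stride-1, non-dilated sub-linear operations, for which a serial chain with kernel sizes k1, k2, ... has an equivalent receptive field of 1 + sum(ki - 1); addition and BN count as k = 1. The graph encoding and the function names are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of a linear operation's computation graph:
# op name -> (kernel size k of that sub-linear op, names of its inputs).
# "x" stands for the input of the whole linear operation.
Graph = Dict[str, Tuple[int, List[str]]]


def enumerate_branches(graph: Graph, node: str) -> List[List[str]]:
    """Every path (operation branch) from the input 'x' to `node`."""
    if node == "x":
        return [[]]
    _, inputs = graph[node]
    return [path + [node] for src in inputs for path in enumerate_branches(graph, src)]


def branch_rf(graph: Graph, path: List[str]) -> int:
    """Equivalent receptive field of serial stride-1 ops: 1 + sum(k - 1)."""
    return 1 + sum(graph[name][0] - 1 for name in path)


def fits_first_conv(graph: Graph, output: str, first_conv_k: int) -> bool:
    """True if every branch's equivalent receptive field is <= that of the first conv layer."""
    return all(branch_rf(graph, p) <= first_conv_k for p in enumerate_branches(graph, output))


# Example: (1x1 conv -> 3x3 conv), BN, and a 3x3 conv, summed by an add op.
g: Graph = {
    "conv1x1": (1, ["x"]),
    "conv3x3_a": (3, ["conv1x1"]),
    "bn": (1, ["x"]),
    "conv3x3_b": (3, ["x"]),
    "add": (1, ["conv3x3_a", "bn", "conv3x3_b"]),
}
print([branch_rf(g, p) for p in enumerate_branches(g, "add")])  # [3, 1, 3]
print(fits_first_conv(g, "add", 3))                              # True
```

Stacking a second 3x3 convolution in series on one branch would raise that branch to 5 and violate the constraint for a 3x3 first convolutional layer.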
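In a possible implementation, the equivalent receptive field of at least one of the plurality of parallel operation branches is equal to the receptive field of the first convolutional layer; or
the equivalent receptive field of the only operation branch of the linear operation is equal to the receptive field of the first convolutional layer.
In one implementation, if the equivalent receptive field of at least one of the parallel operation branches equals that of the first convolutional layer, the receptive field of the linear operation equals that of the first convolutional layer, and so does the receptive field of the convolutional layer equivalent to the linear operation (described later as the second convolutional layer). The second convolutional layer can be used in subsequent model inference. Because it has the same receptive field as the first convolutional layer, the model keeps the same size and specification as the neural network model without replacement, so inference speed and resource consumption are unchanged; compared with a second convolutional layer whose receptive field is smaller than that of the first convolutional layer, this increases the number of trainable parameters and improves model accuracy.
In a possible implementation, the linear operation in each second neural network model differs from the first convolutional layer, and different second neural network models include different linear operations.
In a possible implementation, the convolutional layer equivalent to the linear operation and the linear operation produce the same result when processing the same data.
In a possible implementation, the target neural network model includes a trained target linear operation, and the method further includes:
replacing the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.
Because the target linear operation includes a plurality of sub-linear operations rather than a single first convolutional layer, using the target neural network model directly for inference would slow down inference and increase the resources it requires. Therefore, in this embodiment, the second convolutional layer equivalent to the trained target linear operation can be obtained, and the trained target linear operation in the target neural network model is replaced with the second convolutional layer to obtain a third neural network model, which can be used for model inference;
Here, model inference refers to using the model to process actual data while the model is being applied.
It should be understood that, in the embodiments of this application, the step of obtaining the second convolutional layer equivalent to the trained target linear operation and replacing the trained target linear operation in the target neural network model with it to obtain the third neural network model may be performed by the training device. After training, the training device may directly return the third neural network model; specifically, it may send the third neural network model to a terminal device or server so that the terminal device or server performs model inference based on it. Alternatively, the terminal device or server performs this step itself before model inference;
In a possible implementation, the size of the second convolutional layer is the same as that of the first convolutional layer.
So that the model used for inference has the same specification as the first neural network model before training, the size of the second convolutional layer needs to be the same as that of the first convolutional layer;
In one implementation, if the receptive field of the target linear operation equals that of the first convolutional layer, the size of the second convolutional layer is the same as that of the first convolutional layer.
In one implementation, if the receptive field of the target linear operation is smaller than that of the first convolutional layer, the computed equivalent convolutional layer is smaller than the first convolutional layer; in that case, the computed equivalent convolutional layer can be zero-padded to obtain a second convolutional layer of the same size as the first convolutional layer.
In a possible implementation, the method further includes:
according to the order in which the plurality of sub-linear operations included in the trained target linear operation process data, fusing each sub-linear operation into the adjacent sub-linear operation that follows it in that order, until fusion into the last sub-linear operation in the order is complete, to obtain the second convolutional layer equivalent to the target linear operation.
If a sub-linear operation is directly connected to the input side of the linear operation, its fusion parameter is its own operation parameter;
If a sub-linear operation is not directly connected to the input side of the linear operation, its fusion parameter is obtained from the fusion parameter of the adjacent preceding sub-linear operation, or from the fusion parameter of the adjacent preceding operation together with its own operation parameter;
Each sub-linear operation is fused into the adjacent following sub-linear operation in the order in which the sub-linear operations process data, until fusion into the last sub-linear operation (the one closest to the output) is complete.
It should be understood that the input of a sub-linear operation can be determined only after the sub-linear operations before it have processed data and produced their outputs. For example, if the output of operation A is the input of operation B and the output of operation B is the input of operation C, operation C can process data only after operations A and B have produced their outputs. Accordingly, a sub-linear operation fuses its own parameters only after the parameter fusion of its preceding sub-linear operations is complete.
It should be understood that the inputs of some sub-linear operations do not depend on certain other sub-linear operations. For example, suppose the input of operation A1 is the input of the overall linear operation, the output of A1 is the input of A2, and the output of A2 is an input of B; the input of C1 is also the input of the overall linear operation, the output of C1 is the input of C2, and the output of C2 is also an input of B. Then there is no strict ordering between A1 and C1 processing data, so fusing A1 into A2 may happen at the same time as, before, or after fusing C1 into C2.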
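Both ordering rules describe a topological order of the computation graph: an operation is fused only after all of its predecessors, while independent chains such as A1→A2 and C1→C2 may be processed in either order. A minimal sketch using Kahn's algorithm, a standard topological sort chosen here for illustration (the application does not name an algorithm); the adjacency-list encoding is an assumption.

```python
from collections import deque
from typing import Dict, List


def fusion_order(edges: Dict[str, List[str]]) -> List[str]:
    """Order sub-linear ops so each comes after all of its predecessors (Kahn's algorithm).

    `edges` maps an op name to the ops that consume its output.
    """
    nodes = set(edges)
    for succs in edges.values():
        nodes.update(succs)
    indegree = {n: 0 for n in nodes}
    for succs in edges.values():
        for s in succs:
            indegree[s] += 1
    # Ops fed only by the linear operation's input can be fused first, in any order.
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order: List[str] = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for s in edges.get(n, []):
            indegree[s] -= 1
            if indegree[s] == 0:  # all predecessors of s are now fused
                queue.append(s)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle; not a valid linear operation")
    return order


# The example from the text: A1 -> A2 -> B and C1 -> C2 -> B.
graph = {"A1": ["A2"], "A2": ["B"], "C1": ["C2"], "C2": ["B"]}
print(fusion_order(graph))  # ['A1', 'C1', 'A2', 'C2', 'B']
```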
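In a possible implementation, the trained target linear operation includes an adjacent first sub-linear operation and second sub-linear operation, the second sub-linear operation follows the first sub-linear operation in the order, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
the fusing each sub-linear operation into the adjacent following sub-linear operation in the order includes:
obtaining a fusion parameter of the first sub-linear operation, where if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter; or if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the order, the fusion parameter of the first sub-linear operation is obtained from the fusion parameter of the third sub-linear operation and the first operation parameter;
obtaining a fusion parameter of the second sub-linear operation based on the fusion parameter of the first sub-linear operation, the second operation parameter, and the operation type of the second sub-linear operation, where if the second sub-linear operation is the last sub-linear operation in the order, its fusion parameter is used as the operation parameter of the second convolutional layer.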
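As a concrete instance of deriving the second operation's fusion parameter from the first one's, consider a bias-free 1x1 convolution followed by a k x k convolution. Summing over the intermediate channels gives the fused kernel, and because the 1x1 convolution maps zero padding to zero, the serial and fused forms agree everywhere, borders included. The NumPy sketch below assumes stride 1, no dilation, and no biases; it does not cover a k x k convolution followed by another k x k convolution, where border handling needs extra care. Function names are illustrative.

```python
import numpy as np


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Multi-channel cross-correlation without bias, stride 1, zero 'same' padding.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k.
    """
    _, _, kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    _, h, wd = x.shape
    out = np.empty((w.shape[0], h, wd))
    for i in range(h):
        for j in range(wd):
            # Contract over (C_in, kh, kw) for every output channel at once.
            out[:, i, j] = np.tensordot(w, xp[:, i:i + kh, j:j + kw], axes=3)
    return out


def fuse_1x1_then_kxk(w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Fused kernel for conv(w1, 1x1) followed by conv(w2, k x k), both bias-free.

    w1: (M, C_in, 1, 1), w2: (C_out, M, k, k) -> (C_out, C_in, k, k).
    """
    return np.einsum("omhw,mi->oihw", w2, w1[:, :, 0, 0])


rng = np.random.default_rng(1)
x = rng.standard_normal((2, 5, 5))
w1 = rng.standard_normal((4, 2, 1, 1))  # first sub-linear op: 1x1 conv, 2 -> 4 channels
w2 = rng.standard_normal((3, 4, 3, 3))  # second sub-linear op: 3x3 conv, 4 -> 3 channels

serial = conv2d_same(conv2d_same(x, w1), w2)
fused = conv2d_same(x, fuse_1x1_then_kxk(w1, w2))
print(fused.shape, np.allclose(serial, fused))  # (3, 5, 5) True
```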
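In this embodiment, the first and second sub-linear operations may be any adjacent sub-linear operations in the trained target linear operation, with the second following the first in the order. The first sub-linear operation includes a first operation parameter and processes its input data according to the first operation parameter in the manner corresponding to its operation type; the second sub-linear operation includes a second operation parameter and processes its input data according to the second operation parameter in the manner corresponding to its operation type. The fusing each sub-linear operation into the adjacent following sub-linear operation in the order includes:
obtaining a fusion parameter of the first sub-linear operation, where if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter;
obtaining a fusion parameter of the second sub-linear operation based on the fusion parameter of the first sub-linear operation, the second operation parameter, and the operation type of the second sub-linear operation, where if the second sub-linear operation is the last sub-linear operation in the order, its fusion parameter is used as the operation parameter of the second convolutional layer.
For a linear operation of the trained target neural network, fusion parameter = fuse(output node). The fusion process is performed for every linear operation in the model to obtain a fully fused model with the same structure as the original model, so inference speed and resource consumption are unchanged. Moreover, the models before and after fusion are mathematically equivalent, so the fused model has the same accuracy as before fusion.
In a possible implementation, the linear operation includes a plurality of sub-linear operations, and their operation types include at least one of the following: an addition operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
In a possible implementation, if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by an inner product of the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is an addition, pooling, identity, or null operation, the fusion parameter of the second sub-linear operation is obtained by applying the computation corresponding to that operation type to the fusion parameter of the first sub-linear operation.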
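The per-type rules can be sketched with standard folding identities, assuming multi-channel convolutions with stride 1 and zero "same" padding, BN in inference mode (fixed mean and variance), and average pooling that counts padded zeros. BN is folded by scaling the preceding kernel per output channel; identity and pooling are rewritten as fixed convolution kernels; an addition sums the kernels and biases of its inputs; a null operation would contribute an all-zero kernel. All function names are illustrative, and this is a sketch rather than the application's own implementation.

```python
import numpy as np


def conv2d_same(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """x: (C, H, W); w: (C_out, C, k, k); b: (C_out,). Stride 1, zero 'same' padding."""
    _, _, kh, kw = w.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    _, h, wd = x.shape
    out = np.empty((w.shape[0], h, wd))
    for i in range(h):
        for j in range(wd):
            out[:, i, j] = np.tensordot(w, xp[:, i:i + kh, j:j + kw], axes=3) + b
    return out


def fuse_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-mode BN that follows a conv (w, b) into new conv parameters."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None, None, None], beta + (b - mean) * scale


def identity_as_conv(c, k):
    """Identity op as a k x k conv: 1 at the kernel centre on the channel diagonal."""
    w = np.zeros((c, c, k, k))
    w[np.arange(c), np.arange(c), k // 2, k // 2] = 1.0
    return w, np.zeros(c)


def avgpool_as_conv(c, k):
    """Stride-1 k x k average pooling (padded zeros counted) as a per-channel conv."""
    w = np.zeros((c, c, k, k))
    w[np.arange(c), np.arange(c)] = 1.0 / (k * k)
    return w, np.zeros(c)


def fuse_add(*branches):
    """Addition of parallel branches: sum their equal-sized kernels and their biases."""
    return sum(w for w, _ in branches), sum(b for _, b in branches)


rng = np.random.default_rng(2)
c, k = 2, 3
x = rng.standard_normal((c, 5, 5))
w, b = rng.standard_normal((c, c, k, k)), rng.standard_normal(c)
gamma, beta = rng.standard_normal(c), rng.standard_normal(c)
mean, var = rng.standard_normal(c), rng.random(c) + 0.5

# Reference: (conv -> BN) + identity + 3x3 average pooling, summed by an add op.
conv_bn = (gamma[:, None, None] * (conv2d_same(x, w, b) - mean[:, None, None])
           / np.sqrt(var[:, None, None] + 1e-5) + beta[:, None, None])
xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
pool = np.array([[[xp[ch, i:i + 3, j:j + 3].mean() for j in range(5)]
                  for i in range(5)] for ch in range(c)])
reference = conv_bn + x + pool

fused_w, fused_b = fuse_add(fuse_bn(w, b, gamma, beta, mean, var),
                            identity_as_conv(c, k), avgpool_as_conv(c, k))
print(np.allclose(reference, conv2d_same(x, fused_w, fused_b)))  # True
```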
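According to a second aspect, this application provides a model training method. The method includes:
obtaining a first neural network model, where the first neural network model includes a first convolutional layer and is used to implement a target task;
determining, based on at least one of the following information, a target linear operation for replacing the first convolutional layer, where the information includes the network structure of the first neural network model, the target task, and the position of the first convolutional layer in the first neural network model, and the target linear operation is equivalent to one convolutional layer;
Different linear operations may be selected for neural network models with different network structures, for neural network models implementing different target tasks, and for convolutional layers at different positions in a model, so that the replaced neural network model achieves higher accuracy after training;
The target linear operation may be determined based on the network structure of the first neural network model and/or the position of the first convolutional layer in the first neural network model. Specifically, the structure of the target linear operation may be determined according to the network structure of the first neural network model. The network structure of the first neural network model may be the number of sub-network layers it includes, the types of those layers, the connections between them, and the position of the first convolutional layer in the model. The structure of the target linear operation may refer to the number of sub-linear operations it includes, their types, and the connections between them. For example, through model search, linear-operation replacement can be performed on the convolutional layers of neural network models with different network structures, and the replaced models are trained to determine the optimal or a better linear operation for each convolutional layer in each network structure, where an optimal or better linear operation is one for which the trained replaced model has higher accuracy. After the first neural network model is obtained, a neural network model with the same or a similar structure can be selected, based on the network structure of the first neural network model, from the previously searched network structures, and the linear operation corresponding to one convolutional layer in that model is used as the target linear operation, where the relative position of that convolutional layer in that model is the same as or similar to the relative position of the first convolutional layer in the first neural network model;
The target linear operation may also be determined based on both the network structure of the first neural network model and the target task it implements. As with determining it from the network structure alone, linear-operation replacement can be performed through model search on the convolutional layers of neural network models with different network structures and different target tasks, and the replaced models are trained to determine the optimal or a better linear operation for each convolutional layer, that is, the one for which the trained replaced model has higher accuracy;
The target linear operation may also be determined based only on the target task implemented by the first neural network model. Similarly, linear-operation replacement can be performed through model search on the convolutional layers of neural network models implementing different target tasks, and the replaced models are trained to determine the optimal or a better linear operation for each convolutional layer, that is, the one for which the trained replaced model has higher accuracy;
It should be understood that determining the target linear operation based on the network structure of the first neural network model and/or the target task as described above is merely an example and may also be implemented in other ways. As long as the replaced first neural network model (that is, the second neural network model) achieves higher model accuracy, neither the specific structure of the target linear operation nor the way it is determined is limited.
obtaining a second neural network model based on the first neural network model, where the second neural network model is obtained by replacing the first convolutional layer in the first neural network model with the target linear operation;
performing model training on the second neural network model to obtain a target neural network model.
In this embodiment, the convolutional layer in the neural network to be trained is replaced with a target linear operation whose structure is determined according to the structure of the first neural network model and/or the target task. Compared with the linear operations used for convolutional-layer replacement in the prior art, the linear operation in this embodiment can be better adapted to the first neural network model and is more flexible: different linear operations can be designed for different model structures and task types, which improves the accuracy of the trained model.
In a possible implementation, the target linear operation includes a plurality of sub-linear operations and M operation branches, the input of each operation branch is the input of the target linear operation, and the M operation branches satisfy at least one of the following conditions:
the input of at least one of the sub-linear operations included in the M operation branches is the output of multiple of those sub-linear operations;
at least two of the M operation branches include different numbers of sub-linear operations; or
at least two of the M operation branches include sub-linear operations of different operation types.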
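The three conditions can be tested on a computation graph. The sketch below is one possible literal reading: condition 1 holds if some sub-linear operation takes the outputs of two or more sub-linear operations as input, and condition 3 compares the sets of operation types along each branch. The graph encoding and function names are assumptions introduced for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: op name -> (op type, input names); "x" is the linear operation's input.
Graph = Dict[str, Tuple[str, List[str]]]


def paths(graph: Graph, node: str) -> List[List[str]]:
    """All operation branches (input-to-node paths) ending at `node`."""
    if node == "x":
        return [[]]
    return [p + [node] for src in graph[node][1] for p in paths(graph, src)]


def check_conditions(graph: Graph, output: str) -> Dict[str, bool]:
    branch_list = paths(graph, output)
    # Condition 1: some sub-linear op takes the outputs of several sub-linear ops.
    merges = any(sum(src != "x" for src in inputs) >= 2 for _, inputs in graph.values())
    # Condition 2: at least two branches contain different numbers of sub-linear ops.
    counts = {len(p) for p in branch_list}
    # Condition 3: at least two branches contain different operation types
    # (interpreted here as different sets of types).
    type_sets = {frozenset(graph[n][0] for n in p) for p in branch_list}
    return {
        "multi_input_op": merges,
        "different_counts": len(counts) > 1,
        "different_types": len(type_sets) > 1,
    }


g: Graph = {
    "conv_a": ("conv", ["x"]),
    "conv_b": ("conv", ["conv_a"]),
    "bn": ("bn", ["x"]),
    "add": ("add", ["conv_b", "bn"]),
}
print(check_conditions(g, "add"))
# {'multi_input_op': True, 'different_counts': True, 'different_types': True}
```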
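Compared with the structure of the linear operations used to replace convolutional layers in the prior art, the target linear operation provided in this embodiment has a more complex structure, which can improve the accuracy of the trained model.
In a possible implementation, the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
In a possible implementation, the target linear operation is different from the first convolutional layer.
In a possible implementation, the convolutional layer equivalent to the target linear operation and the target linear operation produce the same result when processing the same data.
In a possible implementation, the target neural network model includes a trained target linear operation, and the method further includes:
replacing the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.
In a possible implementation, the size of the second convolutional layer is the same as that of the first convolutional layer.
In a possible implementation, the method further includes:
according to the order in which the plurality of sub-linear operations included in the trained target linear operation process data, fusing each sub-linear operation into the adjacent sub-linear operation that follows it in that order, until fusion into the last sub-linear operation in the order is complete, to obtain the second convolutional layer equivalent to the target linear operation.
In a possible implementation, the trained target linear operation includes an adjacent first sub-linear operation and second sub-linear operation, the second sub-linear operation follows the first sub-linear operation in the order, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
the fusing each sub-linear operation into the adjacent following sub-linear operation in the order includes:
obtaining a fusion parameter of the first sub-linear operation, where if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter; or if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the order, the fusion parameter of the first sub-linear operation is obtained from the fusion parameter of the third sub-linear operation and the first operation parameter;
obtaining a fusion parameter of the second sub-linear operation based on the fusion parameter of the first sub-linear operation, the second operation parameter, and the operation type of the second sub-linear operation, where if the second sub-linear operation is the last sub-linear operation in the order, its fusion parameter is used as the operation parameter of the second convolutional layer.
In a possible implementation, the linear operation includes a plurality of sub-linear operations, and their operation types include at least one of the following: an addition operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
In a possible implementation, if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by an inner product of the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is an addition, pooling, identity, or null operation, the fusion parameter of the second sub-linear operation is obtained by applying the computation corresponding to that operation type to the fusion parameter of the first sub-linear operation.
In addition, this application provides a model training method. The method includes:
obtaining a first neural network model, where the first neural network model includes a first convolutional layer;
obtaining a plurality of second neural network models based on the first neural network model, where each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation, the target linear operation is equivalent to one convolutional layer and includes a plurality of sub-linear operations and M operation branches, the input of each operation branch is the input of the target linear operation, and the M operation branches satisfy at least one of the following conditions:
the input of at least one of the sub-linear operations included in the M operation branches is the output of multiple of those sub-linear operations;
at least two of the M operation branches include different numbers of sub-linear operations; or
at least two of the M operation branches include sub-linear operations of different operation types;
performing model training on the second neural network models to obtain a target neural network model.
In a possible implementation, the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
In a possible implementation, the target linear operation is different from the first convolutional layer.
In a possible implementation, the convolutional layer equivalent to the target linear operation and the target linear operation produce the same result when processing the same data.
In a possible implementation, the target neural network model includes a trained target linear operation, and the method further includes:
replacing the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.
In a possible implementation, the size of the second convolutional layer is the same as that of the first convolutional layer.
In a possible implementation, the method further includes:
according to the order in which the plurality of sub-linear operations included in the trained target linear operation process data, fusing each sub-linear operation into the adjacent sub-linear operation that follows it in that order, until fusion into the last sub-linear operation in the order is complete, to obtain the second convolutional layer equivalent to the target linear operation.
In a possible implementation, the trained target linear operation includes an adjacent first sub-linear operation and second sub-linear operation, the second sub-linear operation follows the first sub-linear operation in the order, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
the fusing each sub-linear operation into the adjacent following sub-linear operation in the order includes:
obtaining a fusion parameter of the first sub-linear operation, where if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter; or if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the order, the fusion parameter of the first sub-linear operation is obtained from the fusion parameter of the third sub-linear operation and the first operation parameter;
obtaining a fusion parameter of the second sub-linear operation based on the fusion parameter of the first sub-linear operation, the second operation parameter, and the operation type of the second sub-linear operation, where if the second sub-linear operation is the last sub-linear operation in the order, its fusion parameter is used as the operation parameter of the second convolutional layer.
In a possible implementation, the linear operation includes a plurality of sub-linear operations, and their operation types include at least one of the following: an addition operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
In a possible implementation, if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by an inner product of the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is an addition, pooling, identity, or null operation, the fusion parameter of the second sub-linear operation is obtained by applying the computation corresponding to that operation type to the fusion parameter of the first sub-linear operation.
This application provides a model training method. The method includes: obtaining a first neural network model, where the first neural network model includes a first convolutional layer; obtaining a plurality of second neural network models based on the first neural network model, where each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation, the target linear operation is equivalent to one convolutional layer and includes a plurality of sub-linear operations and M operation branches, the input of each operation branch is the input of the target linear operation, and the M operation branches satisfy at least one of the following conditions: the input of at least one of the sub-linear operations included in the M operation branches is the output of multiple of those sub-linear operations; at least two of the M operation branches include different numbers of sub-linear operations; or at least two of the M operation branches include sub-linear operations of different operation types; and performing model training on the second neural network models to obtain a target neural network model. Compared with the structure of the linear operations used to replace convolutional layers in the prior art, the target linear operation provided in this embodiment has a more complex structure, which can improve the accuracy of the trained model.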
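According to a third aspect, this application provides a model training apparatus. The apparatus includes:
an obtaining module, configured to obtain a first neural network model, where the first neural network model includes a first convolutional layer;
and obtain a plurality of second neural network models based on the first neural network model, where each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a linear operation, and the linear operation is equivalent to one convolutional layer;
a model training module, configured to perform model training on the plurality of second neural network models to obtain a target neural network model, where the target neural network model is the neural network model with the highest model accuracy among the plurality of trained second neural network models.
In this way, the convolutional layer in the neural network to be trained is replaced with a linear operation that is equivalent to a convolutional layer, and the replacement with the highest accuracy is selected from multiple candidates, thereby improving the accuracy of the trained model.
In a possible implementation, the receptive field of the convolutional layer equivalent to the linear operation is less than or equal to the receptive field of the first convolutional layer.
For the linear operation to be equivalent to one convolutional layer, at least one of its sub-linear operations must be a convolution operation. During subsequent model inference, to avoid slowing down inference or increasing inference resource consumption, the linear operation itself is not used; instead, the convolutional layer equivalent to the linear operation (referred to as the second convolutional layer in later embodiments) is used for inference, and the receptive field of this equivalent convolutional layer must be less than or equal to that of the first convolutional layer.
In a possible implementation, the linear operation includes a plurality of operation branches, the input of each operation branch is the input of the linear operation, each operation branch includes at least one serial sub-linear operation, and the equivalent receptive field of the at least one serial sub-linear operation is less than or equal to the receptive field of the first convolutional layer; or
the linear operation includes one operation branch, which processes the input data of the linear operation. The operation branch includes at least one serial sub-linear operation, and the equivalent receptive field of the at least one serial sub-linear operation is less than or equal to the receptive field of the first convolutional layer.
In one implementation, if the equivalent receptive field of at least one of the parallel operation branches equals that of the first convolutional layer, the receptive field of the linear operation equals that of the first convolutional layer, and so does the receptive field of the convolutional layer equivalent to the linear operation (described later as the second convolutional layer). The second convolutional layer can be used in subsequent model inference. Because it has the same receptive field as the first convolutional layer, the model keeps the same size and specification as the neural network model without replacement, so inference speed and resource consumption are unchanged; compared with a second convolutional layer whose receptive field is smaller than that of the first convolutional layer, this increases the number of trainable parameters and improves model accuracy.
In a possible implementation, the linear operation in each second neural network model differs from the first convolutional layer, and different second neural network models include different linear operations.
In a possible implementation, the convolutional layer equivalent to the linear operation and the linear operation produce the same result when processing the same data.
In a possible implementation, the target neural network model includes a trained target linear operation, and the obtaining module is configured to:
replace the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.
Because the target linear operation includes a plurality of sub-linear operations rather than a single first convolutional layer, using the target neural network model directly for inference would slow down inference and increase the resources it requires. Therefore, in this embodiment, the second convolutional layer equivalent to the trained target linear operation can be obtained, and the trained target linear operation in the target neural network model is replaced with the second convolutional layer to obtain a third neural network model, which can be used for model inference;
Here, model inference refers to using the model to process actual data while the model is being applied.
It should be understood that, in the embodiments of this application, the step of obtaining the second convolutional layer equivalent to the trained target linear operation and replacing the trained target linear operation in the target neural network model with it to obtain the third neural network model may be performed by the training device. After training, the training device may directly return the third neural network model; specifically, it may send the third neural network model to a terminal device or server so that the terminal device or server performs model inference based on it. Alternatively, the terminal device or server performs this step itself before model inference.
In a possible implementation, the size of the second convolutional layer is the same as that of the first convolutional layer.
So that the model used for inference has the same specification as the first neural network model before training, the size of the second convolutional layer needs to be the same as that of the first convolutional layer;
In one implementation, if the receptive field of the target linear operation equals that of the first convolutional layer, the size of the second convolutional layer is the same as that of the first convolutional layer.
In one implementation, if the receptive field of the target linear operation is smaller than that of the first convolutional layer, the computed equivalent convolutional layer is smaller than the first convolutional layer; in that case, the computed equivalent convolutional layer can be zero-padded to obtain a second convolutional layer of the same size as the first convolutional layer.
In a possible implementation, the apparatus further includes:
a fusion module, configured to: according to the order in which the plurality of sub-linear operations included in the trained target linear operation process data, fuse each sub-linear operation into the adjacent sub-linear operation that follows it in that order, until fusion into the last sub-linear operation in the order is complete, to obtain the second convolutional layer equivalent to the target linear operation.
In a possible implementation, the trained target linear operation includes an adjacent first sub-linear operation and second sub-linear operation, the second sub-linear operation follows the first sub-linear operation in the order, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
the fusion module is configured to:
obtain a fusion parameter of the first sub-linear operation, where if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter; or if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the order, the fusion parameter of the first sub-linear operation is obtained from the fusion parameter of the third sub-linear operation and the first operation parameter;
obtain a fusion parameter of the second sub-linear operation based on the fusion parameter of the first sub-linear operation, the second operation parameter, and the operation type of the second sub-linear operation, where if the second sub-linear operation is the last sub-linear operation in the order, its fusion parameter is used as the operation parameter of the second convolutional layer.
In a possible implementation, the linear operation includes a plurality of sub-linear operations, and their operation types include at least one of the following: an addition operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
In a possible implementation, if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by an inner product of the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is an addition, pooling, identity, or null operation, the fusion parameter of the second sub-linear operation is obtained by applying the computation corresponding to that operation type to the fusion parameter of the first sub-linear operation.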
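According to a fourth aspect, this application provides a model training apparatus. The apparatus includes:
an obtaining module, configured to obtain a first neural network model, where the first neural network model includes a first convolutional layer;
and determine, based on at least one of the following information, a target linear operation for replacing the first convolutional layer, where the information includes the network structure of the first neural network model, the target task, and the position of the first convolutional layer in the first neural network model, and the target linear operation is equivalent to one convolutional layer;
and obtain a second neural network model based on the first neural network model, where the second neural network model is obtained by replacing the first convolutional layer in the first neural network model with the target linear operation;
a model training module, configured to perform model training on the second neural network model to obtain a target neural network model.
In this embodiment, the convolutional layer in the neural network to be trained is replaced with a target linear operation whose structure is determined according to the structure of the first neural network model and/or the target task. Compared with the linear operations used for convolutional-layer replacement in the prior art, the linear operation in this embodiment can be better adapted to the first neural network model and is more flexible: different linear operations can be designed for different model structures and task types, which improves the accuracy of the trained model.
In a possible implementation, the target linear operation includes a plurality of sub-linear operations and M operation branches, the input of each operation branch is the input of the target linear operation, and the M operation branches satisfy at least one of the following conditions:
the input of at least one of the sub-linear operations included in the M operation branches is the output of multiple of those sub-linear operations;
at least two of the M operation branches include different numbers of sub-linear operations; or
at least two of the M operation branches include sub-linear operations of different operation types.
Compared with the structure of the linear operations used to replace convolutional layers in the prior art, the target linear operation provided in this embodiment has a more complex structure, which can improve the accuracy of the trained model.
In a possible implementation, the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
In a possible implementation, the target linear operation is different from the first convolutional layer.
In a possible implementation, the convolutional layer equivalent to the target linear operation and the target linear operation produce the same result when processing the same data.
In a possible implementation, the obtaining module is configured to replace the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.
In a possible implementation, the size of the second convolutional layer is the same as that of the first convolutional layer.
In a possible implementation, the apparatus further includes:
a fusion module, configured to: according to the order in which the plurality of sub-linear operations included in the trained target linear operation process data, fuse each sub-linear operation into the adjacent sub-linear operation that follows it in that order, until fusion into the last sub-linear operation in the order is complete, to obtain the second convolutional layer equivalent to the target linear operation.
In a possible implementation, the trained target linear operation includes an adjacent first sub-linear operation and second sub-linear operation, the second sub-linear operation follows the first sub-linear operation in the order, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
the fusion module is configured to obtain a fusion parameter of the first sub-linear operation, where if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter; or if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the order, the fusion parameter of the first sub-linear operation is obtained from the fusion parameter of the third sub-linear operation and the first operation parameter;
and obtain a fusion parameter of the second sub-linear operation based on the fusion parameter of the first sub-linear operation, the second operation parameter, and the operation type of the second sub-linear operation, where if the second sub-linear operation is the last sub-linear operation in the order, its fusion parameter is used as the operation parameter of the second convolutional layer.
In a possible implementation, the linear operation includes a plurality of sub-linear operations, and their operation types include at least one of the following: an addition operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
In a possible implementation, if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by an inner product of the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is an addition, pooling, identity, or null operation, the fusion parameter of the second sub-linear operation is obtained by applying the computation corresponding to that operation type to the fusion parameter of the first sub-linear operation.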
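An embodiment of this application further provides a model training apparatus. The apparatus includes:
an obtaining module, configured to obtain a first neural network model, where the first neural network model includes a first convolutional layer;
and obtain a plurality of second neural network models based on the first neural network model, where each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation, the target linear operation is equivalent to one convolutional layer and includes a plurality of sub-linear operations and M operation branches, the input of each operation branch is the input of the target linear operation, and the M operation branches satisfy at least one of the following conditions:
the input of at least one of the sub-linear operations included in the M operation branches is the output of multiple of those sub-linear operations;
at least two of the M operation branches include different numbers of sub-linear operations; or
at least two of the M operation branches include sub-linear operations of different operation types;
a model training module, configured to perform model training on the second neural network models to obtain a target neural network model.
Compared with the structure of the linear operations used to replace convolutional layers in the prior art, the target linear operation provided in this embodiment has a more complex structure, which can improve the accuracy of the trained model.
In a possible implementation, the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
In a possible implementation, the target linear operation is different from the first convolutional layer.
In a possible implementation, the convolutional layer equivalent to the target linear operation and the target linear operation produce the same result when processing the same data.
In a possible implementation, the target neural network model includes a trained target linear operation, and the obtaining module is configured to:
replace the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.
In a possible implementation, the size of the second convolutional layer is the same as that of the first convolutional layer.
In a possible implementation, the apparatus further includes:
a fusion module, configured to: according to the order in which the plurality of sub-linear operations included in the trained target linear operation process data, fuse each sub-linear operation into the adjacent sub-linear operation that follows it in that order, until fusion into the last sub-linear operation in the order is complete, to obtain the second convolutional layer equivalent to the target linear operation.
In a possible implementation, the trained target linear operation includes an adjacent first sub-linear operation and second sub-linear operation, the second sub-linear operation follows the first sub-linear operation in the order, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
the fusing each sub-linear operation into the adjacent following sub-linear operation in the order includes:
obtaining a fusion parameter of the first sub-linear operation, where if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter; or if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the order, the fusion parameter of the first sub-linear operation is obtained from the fusion parameter of the third sub-linear operation and the first operation parameter;
obtaining a fusion parameter of the second sub-linear operation based on the fusion parameter of the first sub-linear operation, the second operation parameter, and the operation type of the second sub-linear operation, where if the second sub-linear operation is the last sub-linear operation in the order, its fusion parameter is used as the operation parameter of the second convolutional layer.
In a possible implementation, the linear operation includes a plurality of sub-linear operations, and their operation types include at least one of the following: an addition operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
In a possible implementation, if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by an inner product of the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is an addition, pooling, identity, or null operation, the fusion parameter of the second sub-linear operation is obtained by applying the computation corresponding to that operation type to the fusion parameter of the first sub-linear operation.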
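According to a fifth aspect, an embodiment of this application provides a model training apparatus, which may include a memory, a processor, and a bus system. The memory is configured to store a program, and the processor is configured to execute the program in the memory to perform the method according to the first aspect, the third aspect, or any optional implementation thereof.
According to a sixth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program. When the computer program runs on a computer, the computer is enabled to perform the method according to the first aspect, the third aspect, or any optional implementation thereof.
According to a seventh aspect, an embodiment of this application provides a computer program including code. When the code is executed, it implements the method according to the first aspect, the third aspect, or any optional implementation thereof.
According to an eighth aspect, this application provides a chip system. The chip system includes a processor configured to support an execution device or a training device in implementing the functions in the foregoing aspects, for example, sending or processing the data or information involved in the foregoing methods. In a possible design, the chip system further includes a memory configured to store program instructions and data necessary for the execution device or training device. The chip system may consist of a chip, or may include a chip and other discrete components.
An embodiment of this application provides a model training method. The method includes: obtaining a first neural network model, where the first neural network model includes a first convolutional layer; obtaining a plurality of second neural network models based on the first neural network model, where each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a linear operation, and the linear operation is equivalent to one convolutional layer; and performing model training on the plurality of second neural network models to obtain a target neural network model, where the target neural network model is the neural network model with the highest model accuracy among the plurality of trained second neural network models. In this way, the convolutional layer in the neural network to be trained is replaced with a linear operation that is equivalent to a convolutional layer, and the replacement with the highest accuracy is selected from multiple candidates, thereby improving the accuracy of the trained model.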
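Brief Description of Drawings
FIG. 1 is a schematic structural diagram of an artificial intelligence main framework;
FIG. 2 is a schematic diagram of a convolutional neural network according to an embodiment of this application;
FIG. 3 is a schematic diagram of a convolutional neural network according to an embodiment of this application;
FIG. 4 is a schematic diagram of a system architecture according to an embodiment of this application;
FIG. 5 is a schematic diagram of an embodiment of a model training method according to an embodiment of this application;
FIG. 6a is a schematic diagram of a linear operation according to an embodiment of this application;
FIG. 6b is a schematic diagram of a linear operation according to an embodiment of this application;
FIG. 6c is a schematic diagram of a linear operation according to an embodiment of this application;
FIG. 7 is a schematic diagram of the receptive field of a convolutional layer according to an embodiment of this application;
FIG. 8 is a schematic diagram of the receptive field of a convolutional layer according to an embodiment of this application;
FIG. 9 is a schematic diagram of a convolutional layer according to an embodiment of this application;
FIG. 10 is a schematic diagram of a convolution kernel according to an embodiment of this application;
FIG. 11 is a schematic diagram of linear operation fusion according to an embodiment of this application;
FIG. 12 is a schematic diagram of linear operation replacement according to an embodiment of this application;
FIG. 13 is a schematic diagram of a linear operation according to an embodiment of this application;
FIG. 14 is a schematic diagram of a zero-padding operation according to an embodiment of this application;
FIG. 15a is a schematic diagram of an application scenario of a model training method according to an embodiment of this application;
FIG. 15b is a schematic diagram of an application scenario of a model training method according to an embodiment of this application;
FIG. 16a is a schematic diagram of an application scenario of a model training method according to an embodiment of this application;
FIG. 16b is a schematic diagram of an embodiment of a model training method according to an embodiment of this application;
FIG. 17 is a schematic diagram of a model training apparatus according to an embodiment of this application;
FIG. 18 is a schematic structural diagram of an execution device according to an embodiment of this application;
FIG. 19 is a schematic structural diagram of a training device according to an embodiment of this application;
FIG. 20 is a schematic structural diagram of a chip according to an embodiment of this application.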
具体实施方式
下面结合本发明实施例中的附图对本发明实施例进行描述。本发明的实施方式部分使用的术语仅用于对本发明的具体实施例进行解释,而非旨在限定本发明。
下面结合附图,对本申请的实施例进行描述。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换,这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,以便包含一系列单元的过程、方法、系统、产品或设备不必限于那些单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它单元。
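First, the overall workflow of an artificial intelligence system is described. Refer to FIG. 1, which is a schematic structural diagram of an artificial intelligence main framework. The framework is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to data processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data is refined along "data → information → knowledge → wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (technologies for providing and processing information) of artificial intelligence to the industrial ecosystem of the system.
(1) Infrastructure
The infrastructure provides computing capability support for the artificial intelligence system, communicates with the external world, and provides support through a basic platform. It communicates with the outside through sensors; computing capability is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); and the basic platform includes platform assurance and support such as a distributed computing framework and networks, which may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to obtain data, and the data is provided to intelligent chips in a distributed computing system provided by the basic platform for computation.
(2) Data
Data at the layer above the infrastructure indicates the data sources in the artificial intelligence field. The data involves graphics, images, speech, and text, as well as Internet of Things data from conventional devices, including service data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and the like.
Machine learning and deep learning may perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, and the like on data.
Reasoning is the process in which a computer or an intelligent system simulates the human way of intelligent reasoning and, following a reasoning control strategy, uses formalized information to perform machine thinking and solve problems. Typical functions are searching and matching.
Decision-making is the process of making decisions after intelligent information has been reasoned about, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data processing described above, general capabilities may be formed based on the results of the data processing, for example, an algorithm or a general-purpose system for translation, text analysis, computer vision processing, speech recognition, image recognition, and the like.
(5) Intelligent products and industry applications
Intelligent products and industry applications are products and applications of artificial intelligence systems in various fields. They package the overall artificial intelligence solution, productize intelligent information decision-making, and put it into practical use. Application fields mainly include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, safe cities, and the like.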
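The method provided in this application is described below from the model training side and the model application side.
The model training method provided in embodiments of this application may be applied to data processing methods such as data training, machine learning, and deep learning, to perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, and the like on training data, finally obtaining a trained neural network model (for example, the target neural network model in embodiments of this application). The target neural network model may be used for model inference: input data is fed into the target neural network model to obtain output data.
Because embodiments of this application involve extensive use of neural networks, for ease of understanding, related terms and concepts such as neural networks involved in embodiments of this application are first described below.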
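(1) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes x_s (that is, input data) and an intercept of 1 as inputs, and the output of the operation unit may be:
$h_{W,b}(x)=f(W^{T}x)=f\left(\sum_{s=1}^{n}W_{s}x_{s}+b\right)$
where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces nonlinearity into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is formed by connecting many such single neural units, that is, the output of one neural unit may be the input of another. The input of each neural unit may be connected to a local receptive field of the previous layer to extract features of the local receptive field, where the local receptive field may be a region composed of several neural units.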
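(2) A convolutional neural network (CNN) is a deep neural network with a convolutional structure. It contains a feature extractor composed of convolutional layers and subsampling layers. The feature extractor may be regarded as a filter, and the convolution process may be regarded as convolving a trainable filter with an input image or a convolutional feature plane (feature map). A convolutional layer is a layer of neurons in the CNN that performs convolution on the input signal (for example, the first convolutional layer and the second convolutional layer in this embodiment). In a convolutional layer, a neuron may be connected to only some of the neurons in the adjacent layer. A convolutional layer usually contains several feature planes, and each feature plane may consist of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing may be understood as meaning that the way image information is extracted is independent of position. The underlying principle is that the statistics of one part of an image are the same as those of other parts, so image information learned in one part can also be used in another part; therefore, the same learned image information can be used at all positions in the image. In the same convolutional layer, multiple convolution kernels may be used to extract different image information. Generally, the more convolution kernels there are, the richer the image information reflected by the convolution operation.
A convolution kernel may be initialized as a matrix of random values, and during training of the CNN the kernel learns reasonable weights. In addition, a direct benefit of weight sharing is fewer connections between layers of the CNN, which also reduces the risk of overfitting.
Specifically, as shown in FIG. 2, a convolutional neural network (CNN) 100 may include an input layer 110, convolutional/pooling layers 120 (the pooling layers are optional), and a neural network layer 130.
The structure formed by the convolutional/pooling layers 120 and the neural network layer 130 may be the first convolutional layer and the second convolutional layer described in this application. The input layer 110 is connected to the convolutional/pooling layers 120, the convolutional/pooling layers 120 are connected to the neural network layer 130, and the output of the neural network layer 130 may be fed into an activation layer, which applies nonlinear processing to the output of the neural network layer 130.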
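Convolutional/pooling layers 120:
Convolutional layer:
As shown in FIG. 2, the convolutional/pooling layers 120 may include, for example, layers 121 to 126. In one implementation, layer 121 is a convolutional layer, layer 122 is a pooling layer, layer 123 is a convolutional layer, layer 124 is a pooling layer, layer 125 is a convolutional layer, and layer 126 is a pooling layer. In another implementation, layers 121 and 122 are convolutional layers, layer 123 is a pooling layer, layers 124 and 125 are convolutional layers, and layer 126 is a pooling layer. That is, the output of a convolutional layer may be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution.
Taking convolutional layer 121 as an example, it may include many convolution operators. A convolution operator, also called a kernel, acts in image processing as a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During convolution on an image, the weight matrix usually processes the input image pixel by pixel (or two pixels by two pixels, and so on, depending on the stride) in the horizontal direction, to extract specific features from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension of the weight matrix is the same as that of the input image; during convolution, the weight matrix extends through the entire depth of the input image. Therefore, convolving with a single weight matrix produces a convolved output with a single depth dimension. In most cases, however, instead of a single weight matrix, multiple weight matrices of the same dimensions are applied, and the outputs of the weight matrices are stacked to form the depth dimension of the convolved image. Different weight matrices may be used to extract different features from the image: for example, one weight matrix extracts edge information, another extracts a specific color, and yet another blurs unwanted noise in the image. Because the multiple weight matrices have the same dimensions, the feature maps they extract also have the same dimensions, and these feature maps are then combined to form the output of the convolution.
In practice, the weight values in these weight matrices need to be obtained through extensive training. The weight matrices formed by the trained weight values can extract information from the input image, helping the convolutional neural network 100 make correct predictions.
When the convolutional neural network 100 has multiple convolutional layers, the initial convolutional layers (for example, 121) tend to extract more general features, which may also be called low-level features. As the depth of the convolutional neural network 100 increases, the features extracted by later convolutional layers (for example, 126) become increasingly complex, such as high-level semantic features; features with higher-level semantics are more suitable for the problem to be solved.
Pooling layer:
Because the number of training parameters often needs to be reduced, pooling layers often need to be introduced periodically after convolutional layers. Among layers 121 to 126 in 120 of FIG. 2, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers.
Neural network layer 130:
After processing by the convolutional/pooling layers 120, the convolutional neural network 100 still cannot output the required output information, because, as described above, the convolutional/pooling layers 120 only extract features and reduce the parameters brought by the input image. To generate the final output information (the required class information or other related information), the convolutional neural network 100 needs to use the neural network layer 130 to generate one output or a group of outputs whose number equals the number of required classes. Therefore, the neural network layer 130 may include multiple hidden layers (131, 132, ..., 13n shown in FIG. 2) and an output layer 140. The parameters in the hidden layers may be pre-trained on training data related to a specific task type, for example, image recognition, image classification, or image super-resolution reconstruction.
After the hidden layers in the neural network layer 130, the last layer of the whole convolutional neural network 100 is the output layer 140. The output layer 140 has a loss function similar to categorical cross-entropy, which is used to compute the prediction error. Once forward propagation of the whole convolutional neural network 100 is complete (propagation from 110 to 140 in FIG. 2), back propagation (propagation from 140 to 110 in FIG. 2) starts to update the weights and biases of the layers mentioned above, to reduce the loss of the convolutional neural network 100, that is, the error between the result output through the output layer and the ideal result.
It should be noted that the convolutional neural network 100 shown in FIG. 2 is merely an example. In specific applications, the convolutional neural network may take the form of other network models, for example, the one shown in FIG. 3, in which multiple convolutional/pooling layers run in parallel and the separately extracted features are all fed to the neural network layer 130 for processing.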
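(3) Deep neural network
A deep neural network (DNN), also called a multi-layer neural network, may be understood as a neural network with many hidden layers; there is no specific threshold for "many". Divided by layer position, the layers of a DNN fall into three types: the input layer, hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. Adjacent layers are fully connected, that is, any neuron at layer i is connected to every neuron at layer i+1. Although a DNN looks complex, the work of each layer is not: it is simply the following expression:
$\vec{y}=\alpha(W\vec{x}+\vec{b})$
where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, W is the weight matrix (also called coefficients), and $\alpha()$ is the activation function. Each layer merely applies this simple operation to the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, it also has many coefficients W and offset vectors $\vec{b}$. These parameters are defined in a DNN as follows, taking the coefficient W as an example. Assume that in a three-layer DNN, the linear coefficient from the 4th neuron at the second layer to the 2nd neuron at the third layer is defined as $W^{3}_{24}$. The superscript 3 is the layer at which the coefficient W is located, and the subscripts correspond to the output index 2 at the third layer and the input index 4 at the second layer. In summary, the coefficient from the k-th neuron at layer L-1 to the j-th neuron at layer L is defined as $W^{L}_{jk}$.
Note that the input layer has no W parameters. In a deep neural network, more hidden layers allow the network to better describe complex real-world situations. Theoretically, a model with more parameters has higher complexity and larger "capacity", which means it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (weight matrices formed by the vectors W of many layers).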
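(4) Loss function
When training a deep neural network, the output of the network is expected to be as close as possible to the value that is actually to be predicted. Therefore, the predicted value of the current network can be compared with the desired target value, and the weight vectors of each layer are updated based on the difference between the two (of course, there is usually an initialization process before the first update, in which parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to lower the prediction, and the adjustment continues until the deep neural network can predict the desired target value or a value very close to it. Therefore, "how to compare the difference between the predicted value and the target value" needs to be predefined. This is the loss function or objective function, an important equation that measures the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
(5) Back propagation algorithm
A convolutional neural network may use the error back propagation (BP) algorithm during training to correct the values of the parameters in the initial super-resolution model, so that the reconstruction error loss of the super-resolution model becomes increasingly small. Specifically, forwarding the input signal to the output produces an error loss, and the parameters in the initial super-resolution model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a backward pass driven by the error loss, aiming to obtain the optimal parameters of the super-resolution model, such as the weight matrices.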
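(6) Linear operation
Linearity describes a proportional, straight-line relationship between quantities; mathematically, it may be understood as a function whose first derivative is constant. A linear operation may be, but is not limited to, a summation operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation. A linear operation may also be called a linear mapping, which must satisfy two conditions, homogeneity and additivity; if either condition is not satisfied, the mapping is nonlinear.
Homogeneity means f(ax) = af(x), and additivity means f(x+y) = f(x) + f(y). For example, f(x) = ax is linear. Note that x, a, and f(x) here are not necessarily scalars; they may be vectors or matrices, forming a linear space of any dimension. If x and f(x) are n-dimensional vectors, f(x) = ax satisfies both homogeneity and additivity whether a is a constant or a matrix. Conversely, a function whose graph is a straight line is not necessarily a linear mapping: for example, f(x) = ax + b with b ≠ 0 satisfies neither homogeneity nor additivity and is therefore a nonlinear mapping.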
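The two conditions can be spot-checked numerically. The following minimal NumPy sketch is an illustration only, not part of the described method; `is_linear` is a hypothetical helper, and a random test like this can only refute linearity, never prove it.

```python
import numpy as np


def is_linear(f, dim, trials=100, seed=0):
    """Spot-check homogeneity f(a*x) == a*f(x) and additivity f(x+y) == f(x)+f(y)."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.standard_normal(dim), rng.standard_normal(dim)
        a = rng.standard_normal()
        if not np.allclose(f(a * x), a * f(x)):      # homogeneity
            return False
        if not np.allclose(f(x + y), f(x) + f(y)):   # additivity
            return False
    return True


A = np.array([[2.0, -1.0], [0.5, 3.0]])
b = np.array([1.0, 0.0])

print(is_linear(lambda x: A @ x, 2))      # True:  f(x) = Ax
print(is_linear(lambda x: A @ x + b, 2))  # False: f(x) = Ax + b with b != 0
```

The affine case fails on the first homogeneity check because f(ax) = aAx + b while af(x) = aAx + ab.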
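In embodiments of this application, a composition of multiple linear operations may also be called a linear operation, and each linear operation included in it may be called a sub-linear operation.
(7) BN: by normalizing over mini-batches, BN removes the differences that inputs at different levels cause in parameter optimization, reduces the likelihood that a layer of the model overfits, and makes training more stable.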
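FIG. 4 is a schematic diagram of a system architecture according to an embodiment of this application. In FIG. 4, an execution device 110 is provided with an input/output (I/O) interface 112 for data interaction with external devices, and a user may input data to the I/O interface 112 through a client device 140.
When the execution device 110 preprocesses the input data, or when the computation module 111 of the execution device 110 performs computation or other related processing (for example, implementing the functions of the neural network in this application), the execution device 110 may invoke data, code, and the like in a data storage system 150 for the corresponding processing, and may also store data, instructions, and the like obtained through the processing in the data storage system 150.
Finally, the I/O interface 112 returns the processing result to the client device 140, which provides it to the user.
Optionally, the client device 140 may be, for example, a control unit in an autonomous driving system or a functional algorithm module in a mobile phone terminal; the functional algorithm module may be used to implement a related task.
It should be noted that the training device 120 may generate, for different objectives or tasks, corresponding target models/rules (for example, the target neural network model in this embodiment) based on different training data. The corresponding target models/rules may be used to achieve the objectives or complete the tasks, thereby providing the user with the desired results.
In the case shown in FIG. 4, the user may specify the input data manually, through an interface provided by the I/O interface 112. In another case, the client device 140 may automatically send input data to the I/O interface 112; if automatic sending requires the user's authorization, the user may set the corresponding permission on the client device 140. The user may view the result output by the execution device 110 on the client device 140, presented as a display, sound, action, or the like. The client device 140 may also act as a data collection end, collecting the input data fed into the I/O interface 112 and the output results of the I/O interface 112, as shown in the figure, as new sample data and storing them in a database 130. Alternatively, the collection may bypass the client device 140: the I/O interface 112 directly stores the input data fed into it and its output results, as shown in the figure, in the database 130 as new sample data.
It should be noted that FIG. 4 is merely a schematic diagram of a system architecture according to an embodiment of this application, and the positional relationships between the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in FIG. 4, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may be placed in the execution device 110.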
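The model training method provided in embodiments of this application is first described using the model training phase as an example.
Refer to FIG. 5, which is a schematic diagram of an embodiment of a model training method according to an embodiment of this application. As shown in FIG. 5, the model training method provided in this embodiment includes the following steps.
501: Obtain a first neural network model, where the first neural network model includes a first convolutional layer.
In this embodiment, the training device may obtain a first neural network model to be trained, which may be a model to be trained specified by a user.
In this embodiment, the training device may replace some or all of the convolutional layers in the first neural network model with linear operations. The replaced convolutional layer may be the first convolutional layer included in the first neural network model; specifically, the first neural network model may include multiple convolutional layers, of which the first convolutional layer is one. Alternatively, the replaced objects may be multiple convolutional layers included in the first neural network model, of which the first convolutional layer is one.
In this embodiment, the training device may select, from the first neural network model, the convolutional layers to be replaced (including the first convolutional layer).
In one implementation, the convolutional layers to be replaced may be specified by an administrator, or determined by the training device through model structure search. How the training device determines the layers to be replaced through model structure search is described in subsequent embodiments and is not detailed here.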
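502: Obtain a plurality of second neural network models based on the first neural network model, where each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a linear operation, and the linear operation is equivalent to a convolutional layer.
In this embodiment, the training device may replace the first convolutional layer in the first neural network model with one linear operation to obtain one second neural network model, and repeat this to obtain a plurality of second neural network models, each obtained by replacing the first convolutional layer in the first neural network model with a linear operation.
In this embodiment, the linear operation is equivalent to a convolutional layer.
"Equivalent" in embodiments of this application describes a relationship between two operation units: two operation units that differ in form produce the same result when processing any identical data, and one of them can be transformed into the form of the other through mathematical derivation. In this embodiment, the sub-linear operations included in the linear operation can be transformed into the form of a convolutional layer through mathematical derivation, and the resulting convolutional layer and the linear operation produce the same result when processing the same data.
In this embodiment, for the linear operation to be equivalent to a convolutional layer, its sub-linear operations must include at least one convolution operation. Specifically, the linear operation consists of multiple sub-linear operations. A sub-linear operation here is a basic linear operation rather than a composition of multiple basic linear operations, whereas the linear operation is a composition of multiple basic linear operations. For example, the operation type of a sub-linear operation may be, but is not limited to, a summation, null, identity, convolution, BN, or pooling operation; correspondingly, the linear operation may be a composition of sub-linear operations of at least one of these types. It should be understood that "composition" here means that there are at least two sub-linear operations and that they are connected, with no isolated sub-linear operation. "Connected" means that the output of one sub-linear operation is used as the input of another sub-linear operation (except for the sub-linear operation at the output side of the linear operation, whose output is used as the output of the linear operation).
For example, refer to FIG. 6a, FIG. 6b, and FIG. 6c, which show several structures of linear operations in embodiments of this application. The linear operation shown in FIG. 6a includes four sub-linear operations: convolution operation 1 (kernel size k*k), convolution operation 2 (kernel size 1*1), convolution operation 3 (kernel size k*k), and a summation operation. Convolution operation 1 processes the input data of the linear operation to obtain output 1; convolution operation 2 processes the input data of the linear operation to obtain output 2; convolution operation 3 processes output 2 to obtain output 3; and the summation operation adds output 1 and output 3 to obtain the output of the linear operation.
The linear operation shown in FIG. 6b includes seven sub-linear operations: convolution operation 1 (kernel size k*k), convolution operation 2 (kernel size 1*1), convolution operation 3 (kernel size k*k), convolution operation 4 (kernel size 1*1), convolution operation 5 (kernel size k*k), convolution operation 6 (kernel size 1*1), and a summation operation. Convolution operation 1 processes the input data of the linear operation to obtain output 1; convolution operation 2 processes the input data to obtain output 2; convolution operation 3 processes output 2 to obtain output 3; convolution operation 4 processes the input data to obtain output 4; convolution operation 5 processes output 4 to obtain output 5; convolution operation 6 processes output 5 to obtain output 6; and the summation operation adds outputs 1, 3, and 6 to obtain the output of the linear operation.
The linear operation shown in FIG. 6c includes eight sub-linear operations: convolution operation 1 (kernel size k*k), convolution operation 2 (kernel size 1*1), convolution operation 3 (kernel size k*k), convolution operation 4 (kernel size 1*1), convolution operation 5 (kernel size 1*1), convolution operation 6 (kernel size k*k), summation operation 1, and summation operation 2. Convolution operation 1 processes the input data of the linear operation to obtain output 1; convolution operation 2 processes the input data to obtain output 2; convolution operation 3 processes output 2 to obtain output 3; convolution operation 4 processes output 2 to obtain output 4; convolution operation 5 processes the input data to obtain output 5; summation operation 1 adds outputs 4 and 5 to obtain output 6; convolution operation 6 processes output 6 to obtain output 7; and summation operation 2 adds outputs 1, 3, and 7 to obtain the output of the linear operation.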
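The following describes the linear operation used to replace the first convolutional layer.
In this embodiment, for the linear operation to be equivalent to a convolutional layer, its sub-linear operations must include at least one convolution operation. To avoid slowing down inference or increasing resource consumption in the subsequent inference phase, the linear operation itself is not used for inference; instead, the convolutional layer equivalent to the linear operation (referred to as the second convolutional layer in subsequent embodiments) is used, and the receptive field of this equivalent convolutional layer must be less than or equal to the receptive field of the first convolutional layer.
The following describes how to ensure that the receptive field equivalent to the linear operation is less than or equal to that of the first convolutional layer.
In this embodiment, to ensure this, the equivalent receptive field of each operation branch in the linear operation must be less than or equal to the receptive field of the first convolutional layer. The receptive fields of the operation branches are described in detail below.
First, the concept of an operation branch is described.
Taking the input and output of the linear operation as two endpoints, a data path between them may be an operation branch: its start point is the input of the linear operation and its end point is the output of the linear operation. In one implementation, the linear operation may include multiple parallel operation branches, each of which processes the input data of the linear operation; that is, each branch starts at the input of the linear operation, so the input of the sub-linear operation closest to the input of the linear operation in each branch is the input data of the linear operation. Each operation branch includes at least one sub-linear operation in series. Put differently, the linear operation can be represented as a computation graph that defines the input source of each sub-linear operation and the flow of its output data; any path from input to output in this graph may be defined as an operation branch of the linear operation.
For example, refer to FIG. 6a. The linear operation shown there may include two operation branches (denoted operation branch 1 and operation branch 2 in this embodiment). Operation branch 1 includes convolution operation 1 and the summation operation; operation branch 2 includes convolution operation 2, convolution operation 3, and the summation operation. Both branches process the input data of the linear operation: in operation branch 1, the input data passes through convolution operation 1 and then the summation operation; in operation branch 2, it passes through convolution operation 2, convolution operation 3, and then the summation operation.
For example, refer to FIG. 6b. The linear operation shown there may include three operation branches (operation branches 1, 2, and 3). Operation branch 1 includes convolution operation 1 and the summation operation; operation branch 2 includes convolution operations 2 and 3 and the summation operation; operation branch 3 includes convolution operations 4, 5, and 6 and the summation operation. All three branches process the input data of the linear operation: in branch 1 the input data passes through convolution operation 1 and then the summation operation; in branch 2 through convolution operations 2 and 3 and then the summation operation; and in branch 3 through convolution operations 4, 5, and 6 and then the summation operation.
For example, refer to FIG. 6c. The linear operation shown there may include four operation branches (operation branches 1, 2, 3, and 4). Operation branch 1 includes convolution operation 1 and summation operation 2; operation branch 2 includes convolution operations 2 and 3 and summation operation 2; operation branch 3 includes convolution operation 2, convolution operation 4, summation operation 1, convolution operation 6, and summation operation 2; operation branch 4 includes convolution operation 5, summation operation 1, convolution operation 6, and summation operation 2. All four branches process the input data of the linear operation: in branch 1 the input data passes through convolution operation 1 and then summation operation 2; in branch 2 through convolution operations 2 and 3 and then summation operation 2; in branch 3 through convolution operation 2, convolution operation 4, summation operation 1, convolution operation 6, and then summation operation 2; and in branch 4 through convolution operation 5, summation operation 1, convolution operation 6, and then summation operation 2.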
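The equivalent receptive field of each operation branch in the linear operation is described next.
For a single sub-linear operation, the receptive field of a k*k convolution or pooling is k, and the receptive field of a summation or BN operation is 1. An operation branch has an equivalent receptive field of k if each of its outputs is affected by k×k inputs. The receptive field of a branch is computed as follows: if the branch includes N sub-linear operations whose receptive fields are k_i (i = 1, ..., N), the equivalent receptive field of the N sub-linear operations is k_1 + k_2 + ... + k_N - (N - 1). For example, two 3x3 convolution operations have an equivalent receptive field of 3 + 3 - 1 = 5.
For example, operation branch 1 of the linear operation in FIG. 6a has an equivalent receptive field of k (computed as k + 1 - 1 = k).
For example, operation branch 2 of the linear operation in FIG. 6a has an equivalent receptive field of k (computed as 1 + k + 1 - 2 = k).
For example, operation branch 1 of the linear operation in FIG. 6b has an equivalent receptive field of k (computed as k + 1 - 1 = k).
For example, operation branch 2 of the linear operation in FIG. 6b has an equivalent receptive field of k (computed as 1 + k + 1 - 2 = k).
For example, operation branch 3 of the linear operation in FIG. 6b has an equivalent receptive field of k (computed as 1 + k + 1 + 1 - 3 = k).
For example, operation branch 1 of the linear operation in FIG. 6c has an equivalent receptive field of k (computed as k + 1 - 1 = k).
For example, operation branch 2 of the linear operation in FIG. 6c has an equivalent receptive field of k (computed as 1 + k + 1 - 2 = k).
For example, operation branch 3 of the linear operation in FIG. 6c has an equivalent receptive field of k (computed as 1 + 1 + 1 + k + 1 - 4 = k).
For example, operation branch 4 of the linear operation in FIG. 6c has an equivalent receptive field of k (computed as 1 + 1 + k + 1 - 3 = k).
In this embodiment, the receptive field of the convolutional layer equivalent to the linear operation equals that of the linear operation, which in turn equals the largest receptive field among its operation branches. For example, if the operation branches of a linear operation have receptive fields 3, 5, 5, 5, and 7, the receptive field of the linear operation is 7.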
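The branch enumeration and the receptive-field rule can be sketched as follows. This is a minimal illustration assuming stride 1 and no dilation (the k_1 + ... + k_N - (N - 1) rule holds only then); the dictionary `FIG_6C` is a hypothetical encoding of FIG. 6c with k = 3, and every path from the input to the output is treated as one operation branch.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of FIG. 6c with k = 3:
# name -> (receptive field of that single operation, successor names); [] marks the output.
FIG_6C: Dict[str, Tuple[int, List[str]]] = {
    "input": (1, ["conv1", "conv2", "conv5"]),
    "conv1": (3, ["add2"]),
    "conv2": (1, ["conv3", "conv4"]),
    "conv3": (3, ["add2"]),
    "conv4": (1, ["add1"]),
    "conv5": (1, ["add1"]),
    "add1": (1, ["conv6"]),
    "conv6": (3, ["add2"]),
    "add2": (1, []),
}


def branches(graph: Dict[str, Tuple[int, List[str]]], node: str = "input") -> List[List[str]]:
    """Enumerate every path from `node` to the output; each path is one operation branch."""
    successors = graph[node][1]
    if not successors:
        return [[node]]
    return [[node] + rest for s in successors for rest in branches(graph, s)]


def branch_rf(graph, path):
    """Equivalent receptive field k1 + ... + kN - (N - 1) of the N operations on a path."""
    ks = [graph[name][0] for name in path if name != "input"]
    return sum(ks) - (len(ks) - 1)


def linear_op_rf(graph):
    """Receptive field of the whole linear operation: the largest branch receptive field."""
    return max(branch_rf(graph, p) for p in branches(graph))


for path in branches(FIG_6C):
    print(" -> ".join(path[1:]), branch_rf(FIG_6C, path))
print("linear operation:", linear_op_rf(FIG_6C))
```

All four branches evaluate to 3 (= k), matching the worked examples above; a branch of two stacked 3x3 convolutions would evaluate to 5.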
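For the receptive field of the convolutional layer equivalent to the linear operation to be less than or equal to that of the first convolutional layer, the receptive field of the linear operation must be less than or equal to that of the first convolutional layer, that is, the equivalent receptive field of every operation branch in the linear operation must be less than or equal to the receptive field of the first convolutional layer.
In one implementation, the linear operation may include only one operation branch, which processes the input data of the linear operation and includes at least one sub-linear operation in series; in that case, the equivalent receptive field of this single branch is less than or equal to the receptive field of the first convolutional layer.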
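The concept of the receptive field of a convolutional layer is described next.
Taking an image as the processing object, the receptive field is the perception domain (perception range) on the input image of a feature in a convolutional layer: if pixels within this range change, the value of the feature changes accordingly. As shown in FIG. 7, a convolution kernel slides over the input image, and the extracted features form convolutional layer 101. Similarly, a convolution kernel slides over convolutional layer 101, and the extracted features form convolutional layer 102. Each feature in convolutional layer 101 is extracted from the pixels of the input image within the size of the convolution slice of the kernel sliding over the input image; this size is the receptive field of convolutional layer 101, as shown in FIG. 7.
Correspondingly, the range on the input image to which each feature in convolutional layer 102 maps (that is, how large a range of input pixels it uses) is the receptive field of convolutional layer 102. As shown in FIG. 8, each feature in convolutional layer 102 is extracted from the features of convolutional layer 101 within the size of the convolution slice of the kernel sliding over convolutional layer 101, and each feature of convolutional layer 101 is in turn extracted from the input-image pixels within the range of the convolution slice of the kernel sliding over the input image. Therefore, the receptive field of convolutional layer 102 is larger than that of convolutional layer 101.
In one implementation, the equivalent receptive field of at least one of the parallel operation branches equals the receptive field of the first convolutional layer. In that case, the receptive field of the linear operation equals that of the first convolutional layer, and so does the receptive field of the convolutional layer equivalent to the linear operation (described later as the second convolutional layer). The second convolutional layer may be used in the subsequent inference process. Because its receptive field matches that of the first convolutional layer, the model keeps the same size specifications as the neural network model without replacement, so inference speed and resource consumption stay unchanged; and compared with a second convolutional layer whose receptive field is smaller than that of the first convolutional layer, the number of trainable parameters increases, which improves model accuracy.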
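The foregoing describes the linear operation used to replace a convolutional layer. In this embodiment, the training device may obtain a plurality of linear operations and replace the first convolutional layer in the first neural network model with one of them (or replace multiple convolutional layers, including the first convolutional layer, in the first neural network model with one of the linear operations), and so on, to obtain a plurality of second neural network models, each obtained by replacing the first convolutional layer in the first neural network model with one linear operation.
The following describes how to obtain the plurality of linear operations.
In this embodiment, a sampling-based search algorithm such as reinforcement learning or a genetic algorithm may be selected, and the search space containing linear operations is encoded. For example, one feasible encoding first assigns sequential codes to the candidate sub-linear operations, for example, encoding the null operation, identity operation, 1x1 convolution, 3x3 convolution, BN, and 3x3 pooling as 0, 1, 2, 3, 4, and 5 respectively, and then uses an adjacency matrix M to represent the computation graph of a linear operation. For a computation graph with N nodes (excluding the input node), M is an N*(N+1) matrix with row indices 1 to N and column indices 0 to N. The value M[i,j] in row i and column j means that the output of node j passes through the operation corresponding to M[i,j] and the result is added to node i; M[i,j] = 0 means that there is no direct operation between node j and node i. Based on this encoding scheme, the encoding of the linear operation shown in FIG. 11 may be as shown in Table 1 (assuming k = 3):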
Table 1
i \ j  0  1  2  3  4  5  6  7  8
1      3  0  0  0  0  0  0  0  0
2      2  0  0  0  0  0  0  0  0
3      0  0  3  0  0  0  0  0  0
4      2  0  0  0  0  0  0  0  0
5      0  0  2  0  0  0  0  0  0
6      0  0  0  0  1  1  0  0  0
7      0  0  0  0  0  0  3  0  0
8      0  1  0  1  0  0  0  1  0
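The encoding can be decoded mechanically. The following minimal sketch (hypothetical names `OPS`, `TABLE_1`, `decode`) reads Table 1 back into, for each node, the list of (source node, operation) pairs whose results are summed into that node, using the operation codes 0 to 5 given above.

```python
from typing import Dict, List, Tuple

# Operation codes of the encoding described above (k = 3).
OPS = {0: "null", 1: "identity", 2: "conv1x1", 3: "conv3x3", 4: "bn", 5: "pool3x3"}

# Table 1: row i (nodes 1..8), column j (0 = input node, 1..8 = other nodes).
TABLE_1 = [
    [3, 0, 0, 0, 0, 0, 0, 0, 0],
    [2, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 3, 0, 0, 0, 0, 0, 0],
    [2, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 3, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 1, 0],
]


def decode(m: List[List[int]]) -> Dict[int, List[Tuple[int, str]]]:
    """Return, for each node i, the (source node j, operation) pairs summed into it."""
    n = len(m)
    if any(len(row) != n + 1 for row in m):
        raise ValueError("M must be an N x (N + 1) matrix")
    return {
        i: [(j, OPS[code]) for j, code in enumerate(row) if code != 0]  # 0 = no edge
        for i, row in enumerate(m, start=1)
    }


for node, inputs in decode(TABLE_1).items():
    print(node, inputs)
```

For example, node 6 decodes to identity edges from nodes 4 and 5 (a summation), and node 8 to identity edges from nodes 1, 3, and 7, the final summation that produces the output.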
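Then, encodings of linear operations may be sampled by the search algorithm, and for each sampled encoding, the first convolutional layer in the first neural network model is replaced with the linear operation corresponding to that encoding.
In one implementation, only one second neural network model may be obtained: one target linear operation is determined, and the first convolutional layer in the first neural network model is replaced with it to obtain the second neural network model. Specifically, the training device may obtain the second neural network model based on the first neural network model, where the second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation, the target linear operation includes multiple sub-linear operations and is equivalent to a convolutional layer, the target linear operation includes M operation branches, the input of each operation branch is the input of the target linear operation, and the multiple sub-linear operations satisfy at least one of the following conditions:
the multiple sub-linear operations include at least three operation types; M is not 3; the number of sub-linear operations in at least one of the M operation branches is not 2, where M is a positive integer; or the number of convolution-type sub-linear operations in at least one of the M operation branches is not 1.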
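503: Perform model training on the plurality of second neural network models to obtain a target neural network model, where the target neural network model is the model with the highest accuracy among the plurality of trained second neural network models.
In this embodiment, the training device may train the plurality of obtained second neural network models to obtain a plurality of trained second neural network models, and determine the target neural network model from them, where the target neural network model is the model with the highest accuracy among the plurality of second neural network models.
It should be understood that obtaining the plurality of second neural network models in step 502 is not necessarily completed before the training in step 503 starts. For example, the training device may train a second neural network model as soon as it is obtained, obtain the next second neural network model after that training finishes, and so on, until a plurality of second neural network models have been obtained and trained.
The number of second neural network models may be specified in advance by an administrator, or may be the number of second neural network models whose training has been completed when the search resource limit is reached during training.
In this embodiment, the model accuracy (also called validation accuracy) of each trained second neural network model is obtained during training, and based on these accuracies, the second neural network model with the highest accuracy may be selected from the plurality of second neural network models.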
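Steps 502 and 503, interleaved as described, can be sketched as a simple loop. This is a hypothetical outline, not the applicant's implementation: `sample` stands in for the sampling search algorithm, `train_and_evaluate` stands in for building a second model from an encoding, training it with the original recipe, and returning its validation accuracy, and `budget` stands in for the search resource limit. The accuracies in the usage lines are made-up placeholders, not results.

```python
import itertools
from typing import Callable, Hashable, Optional, Tuple


def search_best_replacement(
    sample: Callable[[], Hashable],
    train_and_evaluate: Callable[[Hashable], float],
    budget: int,
) -> Tuple[Optional[Hashable], float]:
    """Sample linear-operation encodings, train one second model per encoding,
    and keep the encoding whose trained model has the highest validation accuracy."""
    best, best_acc = None, float("-inf")
    seen = set()
    for _ in range(budget):  # budget stands in for the search resource limit
        cand = sample()
        if cand in seen:  # do not retrain an encoding that was already evaluated
            continue
        seen.add(cand)
        acc = train_and_evaluate(cand)  # obtain one second model, then train it
        if acc > best_acc:
            best, best_acc = cand, acc
    return best, best_acc


# Toy usage: 1-node encodings (1 x 2 adjacency matrices) and placeholder accuracies.
pool = [((3, 0),), ((2, 0),), ((5, 0),)]
fake_accuracy = {((3, 0),): 0.71, ((2, 0),): 0.68, ((5, 0),): 0.66}
it = itertools.cycle(pool)
best, acc = search_best_replacement(lambda: next(it), fake_accuracy.__getitem__, budget=3)
print(best, acc)
```

Encodings are kept hashable (tuples of tuples) so duplicates drawn by the sampler can be skipped cheaply.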
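Take the case in which the second neural network model with the highest accuracy is the target neural network model: the second neural network model corresponding to the target neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation, and the highest-accuracy model includes the trained target linear operation.
Because the target linear operation includes multiple sub-linear operations, using the target neural network model directly for inference would, compared with the first convolutional layer, slow down inference and increase the resources it consumes. Therefore, in this embodiment, the second convolutional layer equivalent to the trained target linear operation may be obtained, and the trained target linear operation in the target neural network model is replaced with the second convolutional layer to obtain a third neural network model, which may be used for model inference.
It should be understood that, in this embodiment, the step of obtaining the second convolutional layer equivalent to the trained target linear operation and replacing the trained target linear operation in the target neural network model with it to obtain the third neural network model may be performed by the training device. After training is complete, the training device may directly return the third neural network model; specifically, it may send the third neural network model to a terminal device or a server so that the terminal device or server performs inference based on it. Alternatively, the terminal device or server may itself perform this step before model inference.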
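The following describes how to obtain the second convolutional layer equivalent to the trained target linear operation.
In this embodiment, according to the order in which the multiple sub-linear operations included in the trained target linear operation process data, each sub-linear operation is fused into the adjacent sub-linear operation that follows it in that order, until the last sub-linear operation in the order has been fused, yielding the second convolutional layer equivalent to the target linear operation.
Each sub-linear operation is fused into the adjacent sub-linear operation that follows it in the order, until fusion reaches the last sub-linear operation (the one closest to the output).
In this embodiment, the trained target linear operation includes an adjacent first sub-linear operation and second sub-linear operation, where the second sub-linear operation follows the first sub-linear operation in the order, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter.
Specifically, in this embodiment, the first and second sub-linear operations are any two adjacent sub-linear operations in the trained target linear operation, with the second following the first in the order. The first sub-linear operation includes the first operation parameter and processes its input data according to the first operation parameter in the manner corresponding to its operation type; the second sub-linear operation includes the second operation parameter and processes its input data according to the second operation parameter in the manner corresponding to its operation type. The fusion parameter of the first sub-linear operation may then be obtained, where, if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter. The fusion parameter of the second sub-linear operation is obtained based on the fusion parameter of the first sub-linear operation, the second operation parameter, and the operation type of the second sub-linear operation. If the second sub-linear operation is the last sub-linear operation in the order, its fusion parameter is used as the operation parameter of the second convolutional layer.
In a possible implementation, the operation types of the sub-linear operations in the linear operation include at least one of the following: a summation, null, identity, convolution, BN, or pooling operation. Convolution and BN operations have trainable operation parameters. For the adjacency-matrix representation, a null operation (0) is needed, meaning that there is no operation from node i to node j.
In this embodiment, if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by computing an inner product of the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is a summation, pooling, identity, or null operation, its fusion parameter is obtained by applying, to the fusion parameter of the first sub-linear operation, the computation corresponding to its operation type.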
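The text describes BN fusion only as an inner product of parameters. One common way to realize it, shown here as an assumption rather than the applicant's exact procedure, is to fold an inference-mode BN (fixed running mean and variance) into the preceding convolution. Because BN adds a per-channel shift, the folded layer carries a bias even if the convolution had none; the source does not discuss biases, so that handling is also an assumption. `fold_bn_into_conv` is a hypothetical helper.

```python
import numpy as np


def fold_bn_into_conv(w, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-mode BN that follows a convolution into that convolution.

    w: (C_out, C_in, k, k) kernel; bias and the BN statistics are (C_out,) vectors.
    """
    scale = gamma / np.sqrt(var + eps)          # per-output-channel multiplier
    w_fused = w * scale[:, None, None, None]    # rescale each output channel's kernel
    b_fused = (bias - mean) * scale + beta      # BN's shift becomes a bias term
    return w_fused, b_fused


# Check on a 1x1 convolution, which acts as a matrix over channels.
rng = np.random.default_rng(0)
c_in, c_out = 3, 4
w = rng.standard_normal((c_out, c_in, 1, 1))
bias = rng.standard_normal(c_out)
gamma, beta = rng.standard_normal(c_out), rng.standard_normal(c_out)
mean, var = rng.standard_normal(c_out), rng.random(c_out) + 0.5
x = rng.standard_normal((c_in, 10))  # 10 spatial positions

z = w[:, :, 0, 0] @ x + bias[:, None]  # convolution output
bn = gamma[:, None] * (z - mean[:, None]) / np.sqrt(var[:, None] + 1e-5) + beta[:, None]
w_f, b_f = fold_bn_into_conv(w, bias, gamma, beta, mean, var)
print(np.allclose(w_f[:, :, 0, 0] @ x + b_f[:, None], bn))  # True
```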
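For an illustration of the specific fusion strategies, refer to FIG. 11, which takes a summation operation (described as an addition operation in FIG. 11), a convolution operation, a pooling operation, and a BN operation as examples of the operation type of the second sub-linear operation.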
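Pooling is linear only in its average form; max pooling is not linear and cannot be fused this way. Assuming stride-1 k x k average pooling applied per channel, with zero padding counted in the divisor, pooling equals a convolution whose kernel is diagonal over channels with every tap equal to 1/(k*k). Fusing it after a 1x1 convolution then reduces to a contraction over the intermediate channels. `avg_pool_kernel` and `fuse_after_1x1` are hypothetical names.

```python
import numpy as np


def avg_pool_kernel(channels, k=3):
    """(C, C, k, k) kernel equal to per-channel k x k average pooling (stride 1)."""
    w = np.zeros((channels, channels, k, k))
    for c in range(channels):
        w[c, c] = 1.0 / (k * k)  # each output channel averages only its own input channel
    return w


def fuse_after_1x1(w_next, a):
    """Fused kernel of `w_next` applied after a 1x1 convolution with matrix a (C_mid, C_in)."""
    return np.einsum("omhw,mi->oihw", w_next, a)


rng = np.random.default_rng(0)
a = rng.standard_normal((4, 2))  # 1x1 convolution: 2 -> 4 channels
fused = fuse_after_1x1(avg_pool_kernel(4), a)
print(fused.shape)                                     # (4, 2, 3, 3)
print(np.allclose(fused, a[:, :, None, None] / 9.0))  # True: every tap is a / 9
```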
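For a linear operation in the trained target neural network, fusion parameter = fuse(output node). The fusion process is performed on every linear operation in the model, finally yielding a fully fused model with the same structure as the original model, so inference speed and resource consumption remain unchanged. Moreover, the models before and after fusion are mathematically equivalent, so the accuracy of the fused model is the same as before fusion.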
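The model training method in embodiments of this application is described below using a specific example in which the first neural network model is ResNet18.
As shown in FIG. 12, convolutional layers in the first neural network model are replaced with linear operations. Some or all of the convolutional layers may be replaced, and different convolutional layers may be replaced with linear operations of different forms; here the linear operation is taken to be over-parameterization form C shown in FIG. 12 only as an example. After the replacement, the resulting second neural network model is trained using the training procedure of the original model to obtain a trained model.
After the trained second neural network model is obtained, parameter fusion needs to be performed for each linear operation. As shown in FIG. 13 (where the sub-linear operations are denoted as nodes 1 to 8), taking over-parameterization form C as an example, the fusion process may be as follows:
Nodes 1, 2, and 4 all process the input of the linear operation (that is, they are directly connected to input node 0). Therefore, the fusion parameter of node 1 is the operation parameter of node 1, the fusion parameter of node 2 is the operation parameter of node 2, and the fusion parameter of node 4 is the operation parameter of node 4.
Node 5 processes the output of node 2 according to its operation parameter in the manner corresponding to its operation type (convolution). Therefore, the fusion parameter of node 5 is the inner product of the fusion parameter of node 2 and the operation parameter of node 5.
Node 6 processes the outputs of node 5 and node 4 in the manner corresponding to its operation type (summation). Therefore, the fusion parameter of node 6 is the sum of the fusion parameter of node 5 and the operation parameter of node 4.
Node 3 processes the output of node 2 according to its operation parameter in the manner corresponding to its operation type (convolution). Therefore, the fusion parameter of node 3 is the inner product of the fusion parameter of node 2 and the operation parameter of node 3.
Node 7 processes the output of node 6 according to its operation parameter in the manner corresponding to its operation type (convolution). Therefore, the fusion parameter of node 7 is the inner product of the fusion parameter of node 6 and the operation parameter of node 7.
Node 8 processes the outputs of nodes 1, 3, and 7 in the manner corresponding to its operation type (summation). Therefore, the fusion parameter of node 8 is the sum of the fusion parameters of nodes 1, 3, and 7.
The fusion parameter of node 8 may then be used as the operation parameter of the second convolutional layer, and the second convolutional layer may perform convolution on input data based on this operation parameter.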
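The fusion process of the linear operation in FIG. 13 is described below in pseudocode:
fusion parameter = fuse(node 8): summation, predecessor nodes 1, 3, 7
  node 1 fusion parameter = fuse(node 1): convolution, directly connected to the input, return its parameters
  node 3 fusion parameter = fuse(node 3): convolution, predecessor node 2
    node 2 fusion parameter = fuse(node 2): convolution, directly connected to the input, return its parameters
    return the inner product of the node 3 parameters and the node 2 fusion parameter
  node 7 fusion parameter = fuse(node 7): convolution, predecessor node 6
    node 6 fusion parameter = fuse(node 6): summation, predecessor nodes 5, 4
      node 5 fusion parameter = fuse(node 5): convolution, predecessor node 2
        node 2 fusion parameter = fuse(node 2): convolution, directly connected to the input, return its parameters
        return the inner product of the node 5 parameters and the node 2 fusion parameter
      node 4 fusion parameter = fuse(node 4): convolution, directly connected to the input, return its parameters
      return sum({node 5 fusion parameter, node 4 fusion parameter})
    return the inner product of the node 7 parameters and the node 6 fusion parameter
  return sum({node 1 fusion parameter, node 3 fusion parameter, node 7 fusion parameter})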
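The recursion above can be turned into a small runnable NumPy sketch for over-parameterization form C (node layout of Table 1 and FIG. 13, k = 3). Assumptions not stated in the text: stride 1, zero "same" padding, no biases, and every convolution-after-convolution step involves a 1x1 kernel (true for each branch of this form), so the "inner product" becomes a contraction over the intermediate channel dimension. All names are hypothetical. The final line checks that the fused 3x3 kernel reproduces the multi-branch output.

```python
import numpy as np


def conv2d_same(x, w):
    """Stride-1 cross-correlation with zero 'same' padding. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            out += np.einsum("oi,ihw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out


def compose(w_next, f_prev):
    """Fuse conv `w_next` applied after an op with fused kernel `f_prev` (one must be 1x1)."""
    if w_next.shape[-1] == 1:
        return np.einsum("om,mihw->oihw", w_next[:, :, 0, 0], f_prev)
    if f_prev.shape[-1] == 1:
        return np.einsum("omhw,mi->oihw", w_next, f_prev[:, :, 0, 0])
    raise NotImplementedError("k x k after k x k would grow the kernel")


def pad_to(w, k):
    """Zero-pad a kernel to k x k around its centre."""
    p = (k - w.shape[-1]) // 2
    return np.pad(w, ((0, 0), (0, 0), (p, p), (p, p)))


def fuse(graph, node, cache):
    """Recursive fusion mirroring the pseudocode: returns the fused kernel of `node`."""
    if node not in cache:
        op, param, preds = graph[node]
        if op == "conv":
            (pred,) = preds
            # Directly connected to the input (node 0): return the parameters as they are.
            cache[node] = param if pred == 0 else compose(param, fuse(graph, pred, cache))
        else:  # "add": sum the predecessors' fused kernels after padding to a common size
            parts = [fuse(graph, p, cache) for p in preds]
            k = max(part.shape[-1] for part in parts)
            cache[node] = sum(pad_to(part, k) for part in parts)
    return cache[node]


def run(graph, node, x, cache):
    """Reference forward pass of the unfused multi-branch linear operation."""
    if node == 0:
        return x
    if node not in cache:
        op, param, preds = graph[node]
        if op == "conv":
            cache[node] = conv2d_same(run(graph, preds[0], x, cache), param)
        else:
            cache[node] = sum(run(graph, p, x, cache) for p in preds)
    return cache[node]


rng = np.random.default_rng(0)
C = 4


def kernel(k):
    return rng.standard_normal((C, C, k, k))


# node -> (operation, parameters, predecessor nodes); layout of Table 1 / FIG. 13
graph = {
    1: ("conv", kernel(3), (0,)),
    2: ("conv", kernel(1), (0,)),
    3: ("conv", kernel(3), (2,)),
    4: ("conv", kernel(1), (0,)),
    5: ("conv", kernel(1), (2,)),
    6: ("add", None, (4, 5)),
    7: ("conv", kernel(3), (6,)),
    8: ("add", None, (1, 3, 7)),
}
x = rng.standard_normal((C, 8, 8))
fused = fuse(graph, 8, {})  # fusion parameter = fuse(output node)
print(fused.shape)  # (4, 4, 3, 3)
print(np.allclose(run(graph, 8, x, {}), conv2d_same(x, fused)))  # True
```

The cache plays the role of memoization: node 2 is reached from both node 3 and node 5 but fused only once.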
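A similar fusion of sub-linear operations is performed for every linear operation, finally yielding the fully fused model, which has the same structure as the original ResNet-18 model.
In this embodiment, for the model used for inference to have the same specifications as the first neural network model before training, the size of the second convolutional layer must match the size of the first convolutional layer.
The concept of the size of a convolutional layer is described first.
The size of a convolutional layer may indicate the number of features it includes; it is described below with reference to convolutional layers and convolution kernels. As shown in FIG. 9, the size of convolutional layer 101 is X*Y*N1, that is, convolutional layer 101 includes X*Y*N1 features, where N1 is the number of channels (one channel is one feature dimension) and X*Y is the number of features in each channel. X, Y, and N1 are all positive integers. Convolution kernel 1011 is one of the kernels applied to convolutional layer 101. Because convolutional layer 102 includes N2 channels, convolutional layer 101 uses N2 convolution kernels in total; these N2 kernels may have the same or different sizes and model parameters. Taking convolution kernel 1011 as an example, its size is X1*X1*N1, that is, it contains X1*X1*N1 model parameters. Kernel 1011 slides within convolutional layer 101; at each position, its model parameters are multiplied by the features of convolutional layer 101 at the corresponding position. Combining the products of all model parameters of kernel 1011 with the corresponding features yields one feature on one channel of convolutional layer 102. The products of the features of convolutional layer 101 and kernel 1011 may be used directly as features of convolutional layer 102; alternatively, after kernel 1011 has finished sliding over convolutional layer 101 and all products have been output, the products may be normalized and the normalized results used as features of convolutional layer 102. Put visually, kernel 1011 slides over convolutional layer 101 to perform convolution, and the results form one channel of convolutional layer 102. Each kernel used by convolutional layer 101 corresponds to one channel of convolutional layer 102, so the number of channels of convolutional layer 102 equals the number of kernels applied to convolutional layer 101. The design of the model parameters in each kernel reflects the characteristics of the features that the kernel is meant to extract from the convolutional layer. With N2 kernels, N2 channels of features are extracted from convolutional layer 101.
As shown in FIG. 10, convolution kernel 1011 is split apart. Kernel 1011 includes N1 convolution slices, each of which includes X1*X1 model parameters (P11 to Px1x1). Each model parameter corresponds to one convolution point. The model parameter of a convolution point is multiplied by the feature of the convolutional layer at the corresponding position to obtain the convolution result of that point, and the sum of the convolution results of all points of a kernel is the convolution result of the kernel.
In one implementation, if the receptive field of the target linear operation equals that of the first convolutional layer, the size of the second convolutional layer matches that of the first convolutional layer.
In one implementation, if the receptive field of the target linear operation is smaller than that of the first convolutional layer, the computed equivalent convolutional layer is smaller than the first convolutional layer. In that case, zero padding may be applied to the computed equivalent convolutional layer to obtain a second convolutional layer whose size matches the first convolutional layer. For details, refer to FIG. 14, which is a schematic diagram of a zero-padding operation according to an embodiment of this application.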
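A centred zero-padded kernel produces exactly the same output as the smaller kernel, assuming stride 1 and that the convolution's spatial padding grows with the kernel (k // 2 on each side). The following minimal sketch pads a fused 3x3 kernel to 5x5; `pad_kernel` is a hypothetical helper and does not reproduce FIG. 14 itself.

```python
import numpy as np


def pad_kernel(w, k_target):
    """Zero-pad a (C_out, C_in, k, k) kernel symmetrically to k_target x k_target."""
    k = w.shape[-1]
    if k > k_target or (k_target - k) % 2:
        raise ValueError("k_target must be >= k and differ from k by an even number")
    p = (k_target - k) // 2
    return np.pad(w, ((0, 0), (0, 0), (p, p), (p, p)))


w3 = np.arange(2 * 2 * 3 * 3, dtype=float).reshape(2, 2, 3, 3)
w5 = pad_kernel(w3, 5)
print(w5.shape)                                # (2, 2, 5, 5)
print(np.array_equal(w5[:, :, 1:4, 1:4], w3))  # True: original taps sit in the centre
print(w5.sum() == w3.sum())                    # True: only zeros were added
```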
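In embodiments of this application, a convolutional layer in the neural network to be trained is replaced with a linear operation equivalent to a convolutional layer, and the replacement with the highest accuracy is selected from multiple candidates, which improves the accuracy of the trained model. Table 2 shows the results of networks using different replacements (called over-parameterization forms in Table 2). In this task, a lower loss indicates a stronger fitting capability and higher model accuracy. As Table 2 shows, for both model structures, the loss after over-parameterized training is lower than the baseline of the original model structure, and the best over-parameterization form differs between model structures (form A for model structure 1, form C for model structure 2).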
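Table 2
Loss | Baseline | Over-parameterization form A | Over-parameterization form B | Over-parameterization form C
Model structure 1 | 1.625 | 1.581 | 1.582 | 1.598
Model structure 2 | 1.589 | 1.574 | 1.564 | 1.563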
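Embodiments of this application provide a model training method. The method includes: obtaining a first neural network model, where the first neural network model includes a first convolutional layer; obtaining a plurality of second neural network models based on the first neural network model, where each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a linear operation, and the linear operation is equivalent to a convolutional layer; and performing model training on the plurality of second neural network models to obtain a target neural network model, where the target neural network model is the model with the highest accuracy among the plurality of trained second neural network models. In this way, a convolutional layer in the neural network to be trained is replaced with a linear operation equivalent to a convolutional layer, and the replacement with the highest accuracy is selected from multiple candidates, which improves the accuracy of the trained model.
Several application scenarios of embodiments of this application are introduced below from a product perspective.
A typical application scenario involves neural network models on terminal devices: a model trained with the training method provided in embodiments of this application may be deployed on a terminal device (for example, a smartphone) or a cloud server to provide inference capability. As shown in FIG. 15a, the first neural network model (expressed as a DNN model in FIG. 15a) is trained using the training method provided in embodiments of this application, and the fused over-parameterized model is deployed on a terminal device or cloud server to perform inference on user data.
The training method provided in embodiments of this application may also be applied to an AutoML service in the cloud, combined with other AutoML techniques such as data augmentation policy search, model structure search, activation function search, and hyperparameter search, to further improve model performance. As shown in FIG. 15b and FIG. 16a, a user provides training data and a model structure and specifies a target task; the cloud AutoML service automatically searches over-parameterization forms and finally outputs the searched model and its parameters. The over-parameterization search may also be combined with the other AutoML techniques listed above to further improve model performance.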
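Refer to FIG. 16b, which is a schematic flowchart of a model training method according to an embodiment of this application. As shown in FIG. 16b, the method includes the following steps.
1601: Obtain a first neural network model, where the first neural network model includes a first convolutional layer and is used to implement a target task.
For details of step 1601, refer to the description of step 501. Details are not described herein again.
1602: Determine, based on at least one of the following pieces of information, a target linear operation used to replace the first convolutional layer, where the information includes the network structure of the first neural network model, the target task, and the position of the first convolutional layer in the first neural network model, and the target linear operation is equivalent to a convolutional layer.
Different linear operations may be selected for neural network models with different network structures, for models implementing different target tasks, and for convolutional layers at different positions in a model, so that the model obtained after replacement and training has high accuracy.
The target linear operation may be determined based on the network structure of the first neural network model and/or the position of the first convolutional layer in the first neural network model. Specifically, the structure of the target linear operation may be determined based on the network structure of the first neural network model. The network structure of the first neural network model may be the number of sub-network layers it includes, the types of the sub-network layers, the connection relationships between them, and the position of the first convolutional layer in the model. The structure of the target linear operation may be the number of sub-linear operations it includes, their types, and the connection relationships between them. For example, through model search, linear-operation replacement may be performed on convolutional layers of neural network models with different network structures, and the replaced models are trained to determine the optimal or near-optimal linear operation for each convolutional layer in each network structure, where optimal or near-optimal means that the model obtained by training the replaced model has high accuracy. After the first neural network model is obtained, a neural network model with an identical or similar structure may be selected, based on the network structure of the first neural network model, from the network structures found in advance, and the linear operation corresponding to one convolutional layer of that identical or similar model is used as the target linear operation, where the relative position of that convolutional layer in the identical or similar model is the same as or similar to the relative position of the first convolutional layer in the first neural network model.
The target linear operation may also be determined based on the network structure of the first neural network model and the target task it implements. Similar to the determination based on network structure above, through model search, linear-operation replacement may be performed on convolutional layers of neural network models that have different network structures and implement different target tasks, and the replaced models are trained to determine the optimal or near-optimal linear operation for each convolutional layer in each network structure, where optimal or near-optimal means that the trained replaced model has high accuracy.
The target linear operation may also be determined based on the target task implemented by the first neural network model. Similarly, through model search, linear-operation replacement may be performed on convolutional layers of neural network models that implement different target tasks, and the replaced models are trained to determine the optimal or near-optimal linear operation for each convolutional layer in each network structure, where optimal or near-optimal means that the trained replaced model has high accuracy.
It should be understood that the above ways of determining the target linear operation based on the network structure of the first neural network model and/or the target task are merely examples; other ways may also be used, as long as the first neural network model after replacement (that is, the second neural network model) has high accuracy. The specific structure of the target linear operation and the way it is determined are not limited.
1603: Obtain a second neural network model based on the first neural network model, where the second neural network model is obtained by replacing the first convolutional layer in the first neural network model with the target linear operation.
For details of step 1603, refer to the description of step 502. Details are not described herein again.
1604: Perform model training on the second neural network model to obtain a target neural network model.
For details of step 1604, refer to the description of training the second neural network models in step 503. Details are not described herein again.
In this embodiment, a convolutional layer in the neural network to be trained is replaced with a target linear operation whose structure is determined based on the structure of the first neural network model and/or the target task. Compared with the linear operations used for convolutional-layer replacement in the prior art, the structure of the linear operation in this embodiment is better adapted to the first neural network model and more flexible: different linear operations can be designed for different model structures and task types, which improves the accuracy of the trained model.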
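In a possible implementation, the target linear operation includes multiple sub-linear operations and M operation branches, the input of each operation branch is the input of the target linear operation, and the M operation branches satisfy at least one of the following conditions:
the input of at least one of the sub-linear operations included in the M operation branches is the outputs of multiple of the sub-linear operations;
at least two of the M operation branches include different numbers of sub-linear operations; or
at least two of the M operation branches include sub-linear operations of different operation types.
Compared with the structure of linear operations used to replace convolutional layers in the prior art, the structure of the target linear operation provided in this embodiment is more complex, which can improve the accuracy of the trained model.
In a possible implementation, the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
In a possible implementation, the target linear operation is different from the first convolutional layer.
In a possible implementation, the convolutional layer equivalent to the target linear operation and the target linear operation produce the same result when processing the same data.
In a possible implementation, the target neural network model includes the trained target linear operation, and the method further includes:
replacing the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.
In a possible implementation, the size of the second convolutional layer is the same as the size of the first convolutional layer.
In a possible implementation, the method further includes:
according to the order in which the multiple sub-linear operations included in the trained target linear operation process data, fusing each sub-linear operation into the adjacent sub-linear operation that follows it in the order, until the last sub-linear operation in the order has been fused, to obtain the second convolutional layer equivalent to the target linear operation.
In a possible implementation, the trained target linear operation includes an adjacent first sub-linear operation and second sub-linear operation, the second sub-linear operation follows the first sub-linear operation in the order, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
the fusing each sub-linear operation into the adjacent sub-linear operation that follows it in the order includes:
obtaining a fusion parameter of the first sub-linear operation, where, if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter; or, if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the order, the fusion parameter of the first sub-linear operation is obtained based on the fusion parameter of the third sub-linear operation and the first operation parameter; and
obtaining a fusion parameter of the second sub-linear operation based on the fusion parameter of the first sub-linear operation, the second operation parameter, and the operation type of the second sub-linear operation, where, if the second sub-linear operation is the last sub-linear operation in the order, the fusion parameter of the second sub-linear operation is used as the operation parameter of the second convolutional layer.
In a possible implementation, the linear operation includes multiple sub-linear operations, and the operation types of the multiple sub-linear operations include at least one of the following: a summation operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
In a possible implementation, if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by computing an inner product of the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is a summation operation, a pooling operation, an identity operation, or a null operation, the fusion parameter of the second sub-linear operation is obtained by applying, to the fusion parameter of the first sub-linear operation, the computation corresponding to the operation type of the second sub-linear operation.
Embodiments of this application provide a model training method, including: obtaining a first neural network model, where the first neural network model includes a first convolutional layer and is used to implement a target task; determining, based on at least one of the following pieces of information, a target linear operation used to replace the first convolutional layer, where the information includes the network structure of the first neural network model, the target task, and the position of the first convolutional layer in the first neural network model, and the target linear operation is equivalent to a convolutional layer; obtaining a second neural network model based on the first neural network model, where the second neural network model is obtained by replacing the first convolutional layer in the first neural network model with the target linear operation; and performing model training on the second neural network model to obtain a target neural network model. In this way, a convolutional layer in the neural network to be trained is replaced with a target linear operation whose structure is determined based on the structure of the first neural network model, the target task, and/or the position of the first convolutional layer in the first neural network model. Compared with the linear operations used for convolutional-layer replacement in the prior art, the structure of the linear operation in this embodiment can be better adapted to the first neural network model and is more flexible: different linear operations can be designed for different model structures and task types, which improves the accuracy of the trained model.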
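In addition, this application provides a model training method, including:
obtaining a first neural network model, where the first neural network model includes a first convolutional layer;
obtaining a plurality of second neural network models based on the first neural network model, where each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation, the target linear operation is equivalent to a convolutional layer, the target linear operation includes multiple sub-linear operations and M operation branches, the input of each operation branch is the input of the target linear operation, and the M operation branches satisfy at least one of the following conditions:
the input of at least one of the sub-linear operations included in the M operation branches is the outputs of multiple of the sub-linear operations;
at least two of the M operation branches include different numbers of sub-linear operations; or
at least two of the M operation branches include sub-linear operations of different operation types; and
performing model training on the second neural network model to obtain a target neural network model.
In a possible implementation, the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
In a possible implementation, the target linear operation is different from the first convolutional layer.
In a possible implementation, the convolutional layer equivalent to the target linear operation and the target linear operation produce the same result when processing the same data.
In a possible implementation, the target neural network model includes the trained target linear operation, and the method further includes:
replacing the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.
In a possible implementation, the size of the second convolutional layer is the same as the size of the first convolutional layer.
In a possible implementation, the method further includes:
according to the order in which the multiple sub-linear operations included in the trained target linear operation process data, fusing each sub-linear operation into the adjacent sub-linear operation that follows it in the order, until the last sub-linear operation in the order has been fused, to obtain the second convolutional layer equivalent to the target linear operation.
In a possible implementation, the trained target linear operation includes an adjacent first sub-linear operation and second sub-linear operation, the second sub-linear operation follows the first sub-linear operation in the order, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
the fusing each sub-linear operation into the adjacent sub-linear operation that follows it in the order includes:
obtaining a fusion parameter of the first sub-linear operation, where, if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter; or, if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the order, the fusion parameter of the first sub-linear operation is obtained based on the fusion parameter of the third sub-linear operation and the first operation parameter; and
obtaining a fusion parameter of the second sub-linear operation based on the fusion parameter of the first sub-linear operation, the second operation parameter, and the operation type of the second sub-linear operation, where, if the second sub-linear operation is the last sub-linear operation in the order, the fusion parameter of the second sub-linear operation is used as the operation parameter of the second convolutional layer.
In a possible implementation, the linear operation includes multiple sub-linear operations, and the operation types of the multiple sub-linear operations include at least one of the following: a summation operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
In a possible implementation, if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by computing an inner product of the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is a summation operation, a pooling operation, an identity operation, or a null operation, the fusion parameter of the second sub-linear operation is obtained by applying, to the fusion parameter of the first sub-linear operation, the computation corresponding to the operation type of the second sub-linear operation.
This application provides a model training method, including: obtaining a first neural network model, where the first neural network model includes a first convolutional layer; obtaining a plurality of second neural network models based on the first neural network model, where each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation, the target linear operation is equivalent to a convolutional layer, the target linear operation includes multiple sub-linear operations and M operation branches, the input of each operation branch is the input of the target linear operation, and the M operation branches satisfy at least one of the following conditions: the input of at least one of the sub-linear operations included in the M operation branches is the outputs of multiple of the sub-linear operations; at least two of the M operation branches include different numbers of sub-linear operations; or at least two of the M operation branches include sub-linear operations of different operation types; and performing model training on the second neural network model to obtain a target neural network model. Compared with the structure of linear operations used to replace convolutional layers in the prior art, the structure of the target linear operation provided in this embodiment is more complex, which can improve the accuracy of the trained model.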
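Refer to FIG. 17, which is a schematic diagram of a model training apparatus 1700 according to an embodiment of this application. As shown in FIG. 17, the model training apparatus 1700 provided in this application includes:
an obtaining module 1701, configured to obtain a first neural network model, where the first neural network model includes a first convolutional layer; and
obtain a plurality of second neural network models based on the first neural network model, where each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a linear operation, and the linear operation is equivalent to a convolutional layer.
For a description of the obtaining module 1701, refer to the descriptions of steps 501 and 502 in the foregoing embodiment. Details are not described herein again.
A model training module 1702, configured to perform model training on the plurality of second neural network models to obtain a target neural network model, where the target neural network model is the model with the highest accuracy among the plurality of trained second neural network models.
For a description of the model training module 1702, refer to the description of step 503 in the foregoing embodiment. Details are not described herein again.
In a possible implementation, the receptive field of the convolutional layer equivalent to the linear operation is less than or equal to the receptive field of the first convolutional layer.
In a possible implementation, the linear operation includes multiple operation branches, the input of each operation branch is the input of the linear operation, each operation branch includes at least one sub-linear operation in series, and the equivalent receptive field of the at least one sub-linear operation in series is less than or equal to the receptive field of the first convolutional layer; or
the linear operation includes one operation branch, the operation branch processes the input data of the linear operation and includes at least one sub-linear operation in series, and the equivalent receptive field of the at least one sub-linear operation in series is less than or equal to the receptive field of the first convolutional layer.
In a possible implementation, the linear operation in each second neural network model is different from the first convolutional layer, and different second neural network models include different linear operations.
In a possible implementation, the convolutional layer equivalent to the linear operation and the linear operation produce the same result when processing the same data.
In a possible implementation, the second neural network model corresponding to the target neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation, the target neural network model includes the trained target linear operation, and the obtaining module is configured to:
replace the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.
In a possible implementation, the size of the second convolutional layer is the same as the size of the first convolutional layer.
In a possible implementation, the apparatus further includes:
a fusion module, configured to: according to the order in which the multiple sub-linear operations included in the trained target linear operation process data, fuse each sub-linear operation into the adjacent sub-linear operation that follows it in the order, until the last sub-linear operation in the order has been fused, to obtain the second convolutional layer equivalent to the target linear operation.
In a possible implementation, the trained target linear operation includes an adjacent first sub-linear operation and second sub-linear operation, the second sub-linear operation follows the first sub-linear operation in the order, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
the fusion module is configured to:
obtain a fusion parameter of the first sub-linear operation, where, if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter; or, if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the order, the fusion parameter of the first sub-linear operation is obtained based on the fusion parameter of the third sub-linear operation and the first operation parameter; and
obtain a fusion parameter of the second sub-linear operation based on the fusion parameter of the first sub-linear operation, the second operation parameter, and the operation type of the second sub-linear operation, where, if the second sub-linear operation is the last sub-linear operation in the order, the fusion parameter of the second sub-linear operation is used as the operation parameter of the second convolutional layer.
In a possible implementation, the linear operation includes multiple sub-linear operations, and the operation types of the multiple sub-linear operations include at least one of the following: a summation operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
In a possible implementation, if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by computing an inner product of the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is a summation operation, a pooling operation, an identity operation, or a null operation, the fusion parameter of the second sub-linear operation is obtained by applying, to the fusion parameter of the first sub-linear operation, the computation corresponding to the operation type of the second sub-linear operation.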
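In one implementation, the obtaining module 1701 in the model training apparatus may be configured to obtain a first neural network model, where the first neural network model includes a first convolutional layer; and
obtain a second neural network model based on the first neural network model, where the second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation, the target linear operation includes multiple sub-linear operations and is equivalent to a convolutional layer, the target linear operation includes M operation branches, the input of each operation branch is the input of the target linear operation, and the multiple sub-linear operations satisfy at least one of the following conditions:
the multiple sub-linear operations include at least three operation types;
M is not 3;
the number of sub-linear operations in at least one of the M operation branches is not 2, where M is a positive integer; or
the number of convolution-type sub-linear operations in at least one of the M operation branches is not 1.
The model training module 1702 may be configured to perform model training on the second neural network model to obtain a target neural network model.
In a possible implementation, the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
In a possible implementation, the target linear operation is different from the first convolutional layer.
In a possible implementation, the convolutional layer equivalent to the target linear operation and the target linear operation produce the same result when processing the same data.
In a possible implementation, the obtaining module is configured to replace the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.
In a possible implementation, the size of the second convolutional layer is the same as the size of the first convolutional layer.
In a possible implementation, the apparatus further includes:
a fusion module, configured to: according to the order in which the multiple sub-linear operations included in the trained target linear operation process data, fuse each sub-linear operation into the adjacent sub-linear operation that follows it in the order, until the last sub-linear operation in the order has been fused, to obtain the second convolutional layer equivalent to the target linear operation.
In a possible implementation, the trained target linear operation includes an adjacent first sub-linear operation and second sub-linear operation, the second sub-linear operation follows the first sub-linear operation in the order, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
the fusion module is configured to obtain a fusion parameter of the first sub-linear operation, where, if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter; or, if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the order, the fusion parameter of the first sub-linear operation is obtained based on the fusion parameter of the third sub-linear operation and the first operation parameter; and
obtain a fusion parameter of the second sub-linear operation based on the fusion parameter of the first sub-linear operation, the second operation parameter, and the operation type of the second sub-linear operation, where, if the second sub-linear operation is the last sub-linear operation in the order, the fusion parameter of the second sub-linear operation is used as the operation parameter of the second convolutional layer.
In a possible implementation, the linear operation includes multiple sub-linear operations, and the operation types of the multiple sub-linear operations include at least one of the following: a summation operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
In a possible implementation, if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by computing an inner product of the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is a summation operation, a pooling operation, an identity operation, or a null operation, the fusion parameter of the second sub-linear operation is obtained by applying, to the fusion parameter of the first sub-linear operation, the computation corresponding to the operation type of the second sub-linear operation.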
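Embodiments of this application further provide a model training apparatus, including:
an obtaining module, configured to obtain a first neural network model, where the first neural network model includes a first convolutional layer;
determine, based on at least one of the following pieces of information, a target linear operation used to replace the first convolutional layer, where the information includes the network structure of the first neural network model, the target task, and the position of the first convolutional layer in the first neural network model, and the target linear operation is equivalent to a convolutional layer; and
obtain a second neural network model based on the first neural network model, where the second neural network model is obtained by replacing the first convolutional layer in the first neural network model with the target linear operation; and
a model training module, configured to perform model training on the second neural network model to obtain a target neural network model.
In a possible implementation, the target linear operation includes multiple sub-linear operations and M operation branches, the input of each operation branch is the input of the target linear operation, and the M operation branches satisfy at least one of the following conditions:
the input of at least one of the sub-linear operations included in the M operation branches is the outputs of multiple of the sub-linear operations;
at least two of the M operation branches include different numbers of sub-linear operations; or
at least two of the M operation branches include sub-linear operations of different operation types.
In a possible implementation, the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
In a possible implementation, the target linear operation is different from the first convolutional layer.
In a possible implementation, the convolutional layer equivalent to the target linear operation and the target linear operation produce the same result when processing the same data.
In a possible implementation, the obtaining module is configured to replace the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.
In a possible implementation, the size of the second convolutional layer is the same as the size of the first convolutional layer.
In a possible implementation, the apparatus further includes:
a fusion module, configured to: according to the order in which the multiple sub-linear operations included in the trained target linear operation process data, fuse each sub-linear operation into the adjacent sub-linear operation that follows it in the order, until the last sub-linear operation in the order has been fused, to obtain the second convolutional layer equivalent to the target linear operation.
In a possible implementation, the trained target linear operation includes an adjacent first sub-linear operation and second sub-linear operation, the second sub-linear operation follows the first sub-linear operation in the order, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
the fusion module is configured to obtain a fusion parameter of the first sub-linear operation, where, if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter; or, if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the order, the fusion parameter of the first sub-linear operation is obtained based on the fusion parameter of the third sub-linear operation and the first operation parameter; and
obtain a fusion parameter of the second sub-linear operation based on the fusion parameter of the first sub-linear operation, the second operation parameter, and the operation type of the second sub-linear operation, where, if the second sub-linear operation is the last sub-linear operation in the order, the fusion parameter of the second sub-linear operation is used as the operation parameter of the second convolutional layer.
In a possible implementation, the linear operation includes multiple sub-linear operations, and the operation types of the multiple sub-linear operations include at least one of the following: a summation operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
In a possible implementation, if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by computing an inner product of the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is a summation operation, a pooling operation, an identity operation, or a null operation, the fusion parameter of the second sub-linear operation is obtained by applying, to the fusion parameter of the first sub-linear operation, the computation corresponding to the operation type of the second sub-linear operation.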
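Embodiments of this application further provide a model training apparatus, including:
an obtaining module, configured to obtain a first neural network model, where the first neural network model includes a first convolutional layer; and
obtain a plurality of second neural network models based on the first neural network model, where each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation, the target linear operation is equivalent to a convolutional layer, the target linear operation includes multiple sub-linear operations and M operation branches, the input of each operation branch is the input of the target linear operation, and the M operation branches satisfy at least one of the following conditions:
the input of at least one of the sub-linear operations included in the M operation branches is the outputs of multiple of the sub-linear operations;
at least two of the M operation branches include different numbers of sub-linear operations; or
at least two of the M operation branches include sub-linear operations of different operation types; and
a model training module, configured to perform model training on the second neural network model to obtain a target neural network model.
In a possible implementation, the receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to the receptive field of the first convolutional layer.
In a possible implementation, the target linear operation is different from the first convolutional layer.
In a possible implementation, the convolutional layer equivalent to the target linear operation and the target linear operation produce the same result when processing the same data.
In a possible implementation, the target neural network model includes the trained target linear operation, and the obtaining module is configured to:
replace the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.
In a possible implementation, the size of the second convolutional layer is the same as the size of the first convolutional layer.
In a possible implementation, the apparatus further includes:
a fusion module, configured to: according to the order in which the multiple sub-linear operations included in the trained target linear operation process data, fuse each sub-linear operation into the adjacent sub-linear operation that follows it in the order, until the last sub-linear operation in the order has been fused, to obtain the second convolutional layer equivalent to the target linear operation.
In a possible implementation, the trained target linear operation includes an adjacent first sub-linear operation and second sub-linear operation, the second sub-linear operation follows the first sub-linear operation in the order, the first sub-linear operation includes a first operation parameter, and the second sub-linear operation includes a second operation parameter;
the fusing each sub-linear operation into the adjacent sub-linear operation that follows it in the order includes:
obtaining a fusion parameter of the first sub-linear operation, where, if the input data of the first sub-linear operation is the input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter; or, if the input data of the first sub-linear operation is the output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the order, the fusion parameter of the first sub-linear operation is obtained based on the fusion parameter of the third sub-linear operation and the first operation parameter; and
obtaining a fusion parameter of the second sub-linear operation based on the fusion parameter of the first sub-linear operation, the second operation parameter, and the operation type of the second sub-linear operation, where, if the second sub-linear operation is the last sub-linear operation in the order, the fusion parameter of the second sub-linear operation is used as the operation parameter of the second convolutional layer.
In a possible implementation, the linear operation includes multiple sub-linear operations, and the operation types of the multiple sub-linear operations include at least one of the following: a summation operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
In a possible implementation, if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by computing an inner product of the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; if the operation type of the second sub-linear operation is a summation operation, a pooling operation, an identity operation, or a null operation, the fusion parameter of the second sub-linear operation is obtained by applying, to the fusion parameter of the first sub-linear operation, the computation corresponding to the operation type of the second sub-linear operation.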
接下来介绍本申请实施例提供的一种执行设备,请参阅图18,图18为本申请实施例提供的执行设备的一种结构示意图,执行设备1800具体可以表现为手机、平板、笔记本电脑、智能穿戴设备、服务器等,此处不做限定。其中,执行设备1800上可以部署有图10对应实施例中所描述的数据处理装置,用于实现图10对应实施例中数据处理的功能。具体的,执行设备1800包括:接收器1801、发射器1802、处理器1803和存储器1804(其中执行设备1800中的处理器1803的数量可以一个或多个,图11中以一个处理器为例),其中,处理器1803可以包括应用处理器18031和通信处理器18032。在本申请的一些实施例中,接收器1801、发射器1802、处理器1803和存储器1804可通过总线或其它方式连接。
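The memory 1804 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1803. Part of the memory 1804 may further include a non-volatile random access memory (NVRAM). The memory 1804 stores processor-executable operation instructions, executable modules, or data structures, or a subset or an extended set thereof. The operation instructions may include various instructions for implementing various operations.
The processor 1803 controls the operation of the execution device. In a specific application, the components of the execution device are coupled together through a bus system. In addition to a data bus, the bus system may include a power bus, a control bus, a status signal bus, and the like. For clarity, however, all of these buses are referred to as the bus system in the figure.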
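The methods disclosed in the foregoing embodiments of this application may be applied to, or implemented by, the processor 1803. The processor 1803 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by hardware integrated logic circuits in the processor 1803 or by instructions in the form of software. The processor 1803 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor or microcontroller, or a processor suited to AI computation such as a vision processing unit (VPU) or a tensor processing unit (TPU). It may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1803 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a mature storage medium in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1804, and the processor 1803 reads information from the memory 1804 and completes the steps of the foregoing methods in combination with its hardware.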
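The receiver 1801 may be configured to receive input digital or character information and to generate signal inputs related to settings and function control of the execution device. The transmitter 1802 may be configured to output digital or character information through a first interface. The transmitter 1802 may also send instructions to a disk group through the first interface to modify data in the disk group. The transmitter 1802 may further include a display device such as a display screen.
The execution device may obtain a model trained with the model training method in the embodiment corresponding to FIG. 5 or FIG. 16b, and perform model inference with it.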
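An embodiment of this application further provides a training device. Refer to FIG. 19, which is a schematic structural diagram of the training device. Specifically, the training device 1900 is implemented by one or more servers and may vary considerably with configuration or performance. It may include one or more central processing units (CPU) 1919 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) that store an application program 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. A program stored in the storage medium 1930 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the training device. Further, the central processing unit 1919 may be configured to communicate with the storage medium 1930 and execute, on the training device 1900, the series of instruction operations in the storage medium 1930.
The training device 1900 may further include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, and one or more input/output interfaces 1958, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™.
Specifically, the training device may perform the model training method in the embodiment corresponding to FIG. 5 or FIG. 16b.
The model training apparatus 1700 described in FIG. 17 may be a module in the training device, and the processor in the training device may perform the model training method performed by the model training apparatus 1700.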
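An embodiment of this application further provides a computer program product. When the computer program product runs on a computer, the computer is caused to perform the steps performed by the foregoing execution device, or the steps performed by the foregoing training device.
An embodiment of this application further provides a computer-readable storage medium that stores a program for signal processing. When the program runs on a computer, the computer is caused to perform the steps performed by the foregoing execution device, or the steps performed by the foregoing training device.
The execution device, training device, or terminal device provided in the embodiments of this application may specifically be a chip. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in a storage unit, so that a chip in the execution device, or a chip in the training device, performs the data processing method described in the foregoing embodiments. Optionally, the storage unit is a storage unit inside the chip, such as a register or a cache. Alternatively, it may be a storage unit in the wireless access device that is located outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).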
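Specifically, refer to FIG. 20, which is a schematic structural diagram of a chip according to an embodiment of this application. The chip may take the form of a neural network processing unit, NPU 2000. The NPU 2000 is mounted on a host CPU as a coprocessor, and the host CPU assigns tasks to it. The core of the NPU is an operation circuit 2003. A controller 2004 controls the operation circuit 2003 to fetch matrix data from memory and perform multiplication.
Through cooperation among its internal components, the NPU 2000 may implement the model training method provided in the embodiment described in FIG. 5, or perform inference with the trained model.
The operation circuit 2003 in the NPU 2000 may perform the steps of obtaining the first neural network model and performing model training on it.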
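More specifically, in some implementations, the operation circuit 2003 in the NPU 2000 contains a plurality of processing engines (PE). In some implementations, the operation circuit 2003 is a two-dimensional systolic array. It may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 2003 is a general-purpose matrix processor.
For example, assume an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data of matrix B from the weight memory 2002 and buffers it on each PE in the operation circuit. It then fetches the data of matrix A from the input memory 2001 and performs a matrix operation with matrix B. Partial or final results of the resulting matrix are stored in an accumulator 2008.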
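The unified memory 2006 stores input data and output data. Weight data is transferred directly to the weight memory 2002 through a direct memory access controller (DMAC) 2005. Input data is likewise transferred to the unified memory 2006 through the DMAC.
The bus interface unit (BIU) 2010 handles interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 2009.
The bus interface unit 2010 is used by the instruction fetch buffer 2009 to fetch instructions from an external memory. It is also used by the direct memory access controller 2005 to fetch the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC mainly transfers input data from the external DDR memory to the unified memory 2006, transfers weight data to the weight memory 2002, or transfers input data to the input memory 2001.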
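The vector calculation unit 2007 includes a plurality of operation processing units. When needed, it further processes the output of the operation circuit 2003, for example by vector multiplication, vector addition, exponentiation, logarithms, or magnitude comparison. It is mainly used for the non-convolutional, non-fully-connected layer computations in a neural network, such as batch normalization, pixel-level summation, and upsampling of feature planes.
In some implementations, the vector calculation unit 2007 can store processed output vectors in the unified memory 2006. For example, the vector calculation unit 2007 may apply a linear or nonlinear function to the output of the operation circuit 2003, such as linear interpolation on feature planes extracted by a convolutional layer, or a function applied to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 2007 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vectors can be used as activation inputs to the operation circuit 2003, for example for use in subsequent layers of the neural network.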
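The instruction fetch buffer 2009 connected to the controller 2004 stores the instructions used by the controller 2004.
The unified memory 2006, the input memory 2001, the weight memory 2002, and the instruction fetch buffer 2009 are all on-chip memories. The external memory is private to the NPU hardware architecture.
Any processor mentioned above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control execution of the foregoing programs.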
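It should also be noted that the apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected as needed to achieve the objectives of the solutions of the embodiments. In addition, in the accompanying drawings of the apparatus embodiments provided in this application, a connection between modules indicates a communication connection between them, which may be implemented as one or more communication buses or signal lines.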
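From the description of the foregoing implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware, or by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented by corresponding hardware, and the hardware structures that implement the same function may vary, for example analog circuits, digital circuits, or dedicated circuits. For this application, however, a software implementation is the better choice in most cases. On this understanding, the technical solutions of this application, in essence or in the part that contributes to the prior art, may be embodied as a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc. It includes several instructions that cause a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of this application.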
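All or part of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented wholly or partly in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, training device, or data center to another by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a training device or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.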

Claims (31)

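  1. A model training method, wherein the method comprises:
    obtaining a first neural network model, wherein the first neural network model comprises a first convolutional layer;
    obtaining a plurality of second neural network models based on the first neural network model, wherein each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a linear operation, and the linear operation is equivalent to one convolutional layer; and
    performing model training on the plurality of second neural network models to obtain a target neural network model, wherein the target neural network model is the neural network model with the highest model accuracy among the plurality of trained second neural network models.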
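  2. The method according to claim 1, wherein a receptive field of the convolutional layer equivalent to the linear operation is less than or equal to a receptive field of the first convolutional layer.
  3. The method according to claim 1 or 2, wherein the linear operation comprises a plurality of operation branches, an input of each operation branch is an input of the linear operation, each operation branch comprises at least one serial sub-linear operation, and an equivalent receptive field of the at least one serial sub-linear operation is less than or equal to the receptive field of the first convolutional layer; or
    the linear operation comprises one operation branch, the operation branch is used to process input data of the linear operation, the operation branch comprises at least one serial sub-linear operation, and an equivalent receptive field of the at least one serial sub-linear operation is less than or equal to the receptive field of the first convolutional layer.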
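  4. The method according to any one of claims 1 to 3, wherein the linear operation in each second neural network model is different from the first convolutional layer, and different second neural network models comprise different linear operations.
  5. The method according to any one of claims 1 to 4, wherein the convolutional layer equivalent to the linear operation and the linear operation produce the same processing result when processing the same data.
  6. The method according to any one of claims 1 to 5, wherein the second neural network model corresponding to the target neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation, the target neural network model comprises a trained target linear operation, and the method further comprises:
    replacing the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.
  7. The method according to claim 6, wherein a size of the second convolutional layer is the same as a size of the first convolutional layer.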
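  8. The method according to claim 6 or 7, wherein the method further comprises:
    according to an order in which a plurality of sub-linear operations comprised in the trained target linear operation process data, fusing each sub-linear operation into the adjacent sub-linear operation that follows it in the order, until fusion into the last sub-linear operation in the order is completed, to obtain the second convolutional layer equivalent to the target linear operation.
  9. The method according to claim 8, wherein the trained target linear operation comprises a first sub-linear operation and a second sub-linear operation that are adjacent, the second sub-linear operation follows the first sub-linear operation in the order, the first sub-linear operation comprises a first operation parameter, and the second sub-linear operation comprises a second operation parameter;
    the fusing each sub-linear operation into the adjacent sub-linear operation that follows it in the order comprises:
    obtaining a fusion parameter of the first sub-linear operation, wherein if input data of the first sub-linear operation is input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter; or if the input data of the first sub-linear operation is output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the order, the fusion parameter of the first sub-linear operation is obtained based on a fusion parameter of the third sub-linear operation and the first operation parameter; and
    obtaining a fusion parameter of the second sub-linear operation based on the fusion parameter of the first sub-linear operation, the second operation parameter, and an operation type of the second sub-linear operation, wherein if the second sub-linear operation is the last sub-linear operation in the order, the fusion parameter of the second sub-linear operation is used as an operation parameter of the second convolutional layer.
  10. The method according to any one of claims 1 to 9, wherein the linear operation comprises a plurality of sub-linear operations, and operation types of the plurality of sub-linear operations comprise at least one of the following: a summation operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
  11. The method according to claim 9 or 10, wherein if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by performing an inner product calculation on the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; or if the operation type of the second sub-linear operation is a summation operation, a pooling operation, an identity operation, or a null operation, the fusion parameter of the second sub-linear operation is obtained by applying, to the fusion parameter of the first sub-linear operation, a calculation corresponding to the operation type of the second sub-linear operation.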
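  12. A model training method, wherein the method comprises:
    obtaining a first neural network model, wherein the first neural network model comprises a first convolutional layer, and the first neural network model is used to implement a target task;
    determining, based on at least one of the following information, a target linear operation for replacing the first convolutional layer, wherein the information comprises a network structure of the first neural network model, the target task, and a position of the first convolutional layer in the first neural network model, and the target linear operation is equivalent to one convolutional layer;
    obtaining a second neural network model based on the first neural network model, wherein the second neural network model is obtained by replacing the first convolutional layer in the first neural network model with the target linear operation; and
    performing model training on the second neural network model to obtain a target neural network model.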
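  13. The method according to claim 12, wherein the target linear operation comprises a plurality of sub-linear operations, the target linear operation comprises M operation branches, an input of each operation branch is an input of the target linear operation, and the M operation branches satisfy at least one of the following conditions:
    an input of at least one of the plurality of sub-linear operations comprised in the M operation branches is the outputs of multiple sub-linear operations among the plurality of sub-linear operations;
    at least two of the M operation branches comprise different numbers of sub-linear operations; or
    at least two of the M operation branches comprise sub-linear operations of different operation types.
  14. The method according to claim 12 or 13, wherein a receptive field of the convolutional layer equivalent to the target linear operation is less than or equal to a receptive field of the first convolutional layer.
  15. The method according to any one of claims 12 to 14, wherein the target linear operation is different from the first convolutional layer.
  16. The method according to any one of claims 12 to 15, wherein the convolutional layer equivalent to the target linear operation and the target linear operation produce the same processing result when processing the same data.
  17. The method according to any one of claims 12 to 16, wherein the target neural network model comprises a trained target linear operation, and the method further comprises:
    replacing the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.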
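  18. A model training apparatus, wherein the apparatus comprises:
    an obtaining module, configured to obtain a first neural network model, wherein the first neural network model comprises a first convolutional layer; and
    to obtain a plurality of second neural network models based on the first neural network model, wherein each second neural network model is obtained by replacing the first convolutional layer in the first neural network model with a linear operation, and the linear operation is equivalent to one convolutional layer; and
    a model training module, configured to perform model training on the plurality of second neural network models to obtain a target neural network model, wherein the target neural network model is the neural network model with the highest model accuracy among the plurality of trained second neural network models.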
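  19. The apparatus according to claim 18, wherein a receptive field of the convolutional layer equivalent to the linear operation is less than or equal to a receptive field of the first convolutional layer.
  20. The apparatus according to claim 18 or 19, wherein the linear operation comprises a plurality of operation branches, an input of each operation branch is an input of the linear operation, each operation branch comprises at least one serial sub-linear operation, and an equivalent receptive field of the at least one serial sub-linear operation is less than or equal to the receptive field of the first convolutional layer; or
    the linear operation comprises one operation branch, the operation branch is used to process input data of the linear operation, the operation branch comprises at least one serial sub-linear operation, and an equivalent receptive field of the at least one serial sub-linear operation is less than or equal to the receptive field of the first convolutional layer.
  21. The apparatus according to any one of claims 18 to 20, wherein the linear operation in each second neural network model is different from the first convolutional layer, and different second neural network models comprise different linear operations.
  22. The apparatus according to any one of claims 18 to 21, wherein the convolutional layer equivalent to the linear operation and the linear operation produce the same processing result when processing the same data.
  23. The apparatus according to any one of claims 18 to 22, wherein the second neural network model corresponding to the target neural network model is obtained by replacing the first convolutional layer in the first neural network model with a target linear operation, the target neural network model comprises a trained target linear operation, and the obtaining module is configured to:
    replace the trained target linear operation in the target neural network model with a second convolutional layer equivalent to the trained target linear operation, to obtain a third neural network model.
  24. The apparatus according to claim 23, wherein a size of the second convolutional layer is the same as a size of the first convolutional layer.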
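  25. The apparatus according to claim 24, wherein the apparatus further comprises:
    a fusion module, configured to: according to an order in which a plurality of sub-linear operations comprised in the trained target linear operation process data, fuse each sub-linear operation into the adjacent sub-linear operation that follows it in the order, until fusion into the last sub-linear operation in the order is completed, to obtain the second convolutional layer equivalent to the target linear operation.
  26. The apparatus according to claim 25, wherein the trained target linear operation comprises a first sub-linear operation and a second sub-linear operation that are adjacent, the second sub-linear operation follows the first sub-linear operation in the order, the first sub-linear operation comprises a first operation parameter, and the second sub-linear operation comprises a second operation parameter;
    the fusion module is configured to:
    obtain a fusion parameter of the first sub-linear operation, wherein if input data of the first sub-linear operation is input data of the trained target linear operation, the fusion parameter of the first sub-linear operation is the first operation parameter; or if the input data of the first sub-linear operation is output data of a third sub-linear operation that is adjacent to and precedes the first sub-linear operation in the order, the fusion parameter of the first sub-linear operation is obtained based on a fusion parameter of the third sub-linear operation and the first operation parameter; and
    obtain a fusion parameter of the second sub-linear operation based on the fusion parameter of the first sub-linear operation, the second operation parameter, and an operation type of the second sub-linear operation, wherein if the second sub-linear operation is the last sub-linear operation in the order, the fusion parameter of the second sub-linear operation is used as an operation parameter of the second convolutional layer.
  27. The apparatus according to any one of claims 18 to 26, wherein the linear operation comprises a plurality of sub-linear operations, and operation types of the plurality of sub-linear operations comprise at least one of the following: a summation operation, a null operation, an identity operation, a convolution operation, a batch normalization (BN) operation, or a pooling operation.
  28. The apparatus according to claim 26 or 27, wherein if the operation type of the second sub-linear operation is a convolution operation or a BN operation, the fusion parameter of the second sub-linear operation is obtained by performing an inner product calculation on the fusion parameter of the first sub-linear operation and the operation parameter of the second sub-linear operation; or if the operation type of the second sub-linear operation is a summation operation, a pooling operation, an identity operation, or a null operation, the fusion parameter of the second sub-linear operation is obtained by applying, to the fusion parameter of the first sub-linear operation, a calculation corresponding to the operation type of the second sub-linear operation.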
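  29. A model training apparatus, wherein the apparatus comprises a memory and a processor, the memory stores code, and the processor is configured to obtain the code and perform the method according to any one of claims 1 to 11 or 12 to 17.
  30. A computer storage medium, wherein the computer storage medium stores one or more instructions, and when the instructions are executed by one or more computers, the one or more computers are caused to implement the method according to any one of claims 1 to 11 or 12 to 17.
  31. A computer product, comprising code, wherein when executed, the code implements the method according to any one of claims 1 to 11 or 12 to 17.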
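Claim 1 selects, among the trained second neural network models, the one with the highest model accuracy. Stripped of the training details, that selection step is a simple argmax loop. In the sketch below, `train_fn` and `eval_fn` are hypothetical placeholders for whatever training and accuracy evaluation are actually used, and ties keep the earliest candidate.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

C = TypeVar("C")  # a candidate, e.g. an untrained second neural network model
T = TypeVar("T")  # a trained model


def select_best_candidate(
    candidates: Iterable[C],
    train_fn: Callable[[C], T],
    eval_fn: Callable[[T], float],
) -> Tuple[Optional[T], float]:
    """Train every candidate and keep the one with the highest accuracy."""
    best_model: Optional[T] = None
    best_acc = float("-inf")
    for cand in candidates:
        trained = train_fn(cand)
        acc = eval_fn(trained)
        if acc > best_acc:  # strict '>' keeps the earliest candidate on ties
            best_model, best_acc = trained, acc
    return best_model, best_acc
```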
PCT/CN2022/074940 2021-02-10 2022-01-29 Model training method and apparatus WO2022171027A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/446,294 US20230385642A1 (en) 2021-02-10 2023-08-08 Model training method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110183936.2A CN114912569A (zh) 2021-02-10 2021-02-10 Model training method and apparatus
CN202110183936.2 2021-02-10

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/446,294 Continuation US20230385642A1 (en) 2021-02-10 2023-08-08 Model training method and apparatus

Publications (1)

Publication Number Publication Date
WO2022171027A1 (zh) 2022-08-18

Family

ID=82761622

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/074940 WO2022171027A1 (zh) 2021-02-10 2022-01-29 Model training method and apparatus

Country Status (3)

Country Link
US (1) US20230385642A1 (zh)
CN (1) CN114912569A (zh)
WO (1) WO2022171027A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
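CN109360206A (zh) * 2018-09-08 2019-02-19 Huazhong Agricultural University Deep-learning-based method for segmenting rice panicles in the field
US20200160065A1 (en) * 2018-08-10 2020-05-21 Naver Corporation Method for training a convolutional recurrent neural network and for semantic segmentation of inputted video using the trained convolutional recurrent neural network
JP2020107042A (ja) * 2018-12-27 2020-07-09 KDDI Corporation Learning model generation device, learning model generation method, and program
CN111882040A (zh) * 2020-07-30 2020-11-03 Zhongyuan University of Technology Convolutional neural network compression method based on channel-number search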


Also Published As

Publication number Publication date
US20230385642A1 (en) 2023-11-30
CN114912569A (zh) 2022-08-16

Similar Documents

Publication Publication Date Title
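WO2022083536A1 (zh) Neural network construction method and apparatus
CN110175671B (zh) Neural network construction method, image processing method, and apparatus
WO2021120719A1 (zh) Neural network model update method, image processing method, and apparatus
WO2022042713A1 (zh) Deep learning training method and apparatus for a computing device
WO2021244249A1 (zh) Classifier training method, data processing method, system, and device
WO2022052601A1 (zh) Neural network model training method, image processing method, and apparatus
WO2022001805A1 (zh) Neural network distillation method and apparatus
WO2021233342A1 (zh) Neural network construction method and system
WO2022111617A1 (zh) Model training method and apparatus
WO2022068623A1 (zh) Model training method and related device
WO2022179492A1 (zh) Pruning method for a convolutional neural network, data processing method, and device
CN110222718B (zh) Image processing method and apparatus
CN112215332B (zh) Neural network structure search method, image processing method, and apparatus
CN113570029A (zh) Method for obtaining a neural network model, image processing method, and apparatus
WO2022228425A1 (zh) Model training method and apparatus
WO2022012668A1 (zh) Training set processing method and apparatus
WO2022088063A1 (zh) Neural network model quantization method and apparatus, and data processing method and apparatus
CN111931901A (zh) Neural network construction method and apparatus
CN113592060A (zh) Neural network optimization method and apparatus
CN115081588A (zh) Neural network parameter quantization method and apparatus
WO2022156475A1 (zh) Neural network model training method, data processing method, and apparatus
CN117501245A (zh) Neural network model training method and apparatus, and data processing method and apparatus
CN113536970A (zh) Training method for a video classification model and related apparatus
CN113128285A (zh) Video processing method and apparatus
CN112446462A (zh) Method and apparatus for generating a target neural network model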

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22752188

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22752188

Country of ref document: EP

Kind code of ref document: A1