CN116562338A - Multi-branch convolution structure, neural network model, and determination method and determination device thereof - Google Patents

Multi-branch convolution structure, neural network model, and determination method and determination device thereof

Info

Publication number
CN116562338A
Authority
CN
China
Prior art keywords
convolution
operator
neural network
unit
branches
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210100259.8A
Other languages
Chinese (zh)
Inventor
祝毅晨
区志财
唐剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Midea Group Co Ltd
Midea Group Shanghai Co Ltd
Original Assignee
Midea Group Co Ltd
Midea Group Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Midea Group Co Ltd, Midea Group Shanghai Co Ltd filed Critical Midea Group Co Ltd
Priority to CN202210100259.8A priority Critical patent/CN116562338A/en
Publication of CN116562338A publication Critical patent/CN116562338A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A multi-branch convolution structure, a neural network model, and a determination method and a determination device thereof are provided. Different second convolution units are added in front of the first convolution units in the plurality of branches of the convolution structure; only the first convolution units are trained during training; after training, the second convolution units are removed and the convolution structure is merged into a single convolution unit based on the first convolution units. A plurality of such convolution structures can be arranged in the neural network model. The embodiments of the disclosure can improve the diversity of the features learned among the branches while maintaining the expression capability of the branches.

Description

Multi-branch convolution structure, neural network model, and determination method and determination device thereof
Technical Field
The present disclosure relates to, but is not limited to, the field of machine learning, and more particularly, to a multi-branch convolution structure, a neural network model, and a determination method and a determination device thereof.
Background
Deep learning can automatically learn useful features, freeing models from the dependence on feature engineering, and has achieved results surpassing other algorithms on tasks such as image and speech processing. This success has greatly benefited from the advent of new neural network structures, such as convolutional neural network (Convolutional Neural Network, CNN) models, for example the residual network (ResNet) model, the Inception network model, the DenseNet network model, and the like. The convolutional layer is the most commonly used operator in convolutional neural network models. However, there is still room for improvement in the performance of current models.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
An embodiment of the present disclosure provides a neural network model determining method, including:
acquiring a first neural network model used in a training stage, wherein the first neural network model comprises a multi-branch convolution structure, the multiple branches of the convolution structure comprise a first convolution unit and a second convolution unit which are connected with each other, and at least part of parameters of the second convolution units in different branches are different;
in the process of training the first neural network model, updating parameters of the first convolution unit, wherein parameters of the second convolution unit are kept unchanged;
after training is completed, merging the convolution structures into a single convolution unit based on a first convolution unit in the convolution structures, and obtaining a second neural network model used in an inference stage.
An embodiment of the present disclosure further provides a neural network model determining device, including a processor and a memory storing a computer program, where the processor can implement the neural network model determining method according to any embodiment of the present disclosure when executing the computer program.
The neural network model determining method and the neural network model determining device can keep the expression capability of branches, and further improve the performance of the neural network model.
An embodiment of the present disclosure further provides a multi-branch convolution structure including a plurality of branches, each of the plurality of branches of the convolution structure including a first convolution unit and a second convolution unit connected to each other, at least some of the parameters of the second convolution units in different branches being different.
An embodiment of the present disclosure also provides a neural network model for training, including a plurality of convolution structures, some or all of which employ a multi-branched convolution structure as described in any embodiment of the present disclosure.
The neural network model of the above embodiment of the present disclosure can be used for training, improving the feature diversity learned between branches, while maintaining the branch expression capability.
An embodiment of the present disclosure also provides a computer program product, including a computer program, where the computer program may be stored on a non-transitory computer readable storage medium, and when the computer program is executed by a processor, the computer program is capable of implementing a neural network model determining method according to any embodiment of the present disclosure.
An embodiment of the present disclosure also provides a non-transitory computer readable storage medium storing a computer program, which when executed by a processor, is capable of implementing the neural network model determination method according to any embodiment of the present disclosure.
Other aspects will become apparent upon reading and understanding the accompanying drawings and detailed description.
Drawings
The accompanying drawings are included to provide an understanding of embodiments of the disclosure, and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain, without limitation, the embodiments.
FIG. 1 is a schematic illustration of a multi-branch convolution structure;
FIG. 2 is a flow chart of a neural network model determination method of an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a multi-branch convolution structure in accordance with an embodiment of the present disclosure;
FIG. 4A is a schematic diagram of a network structure including a multi-branched convolution structure and a ReLU operator according to one embodiment of the disclosure;
FIG. 4B is a schematic diagram of the network architecture after merging the multi-branched convolution structures of FIG. 4A into a single convolution unit;
FIG. 5A is a schematic diagram of a local network architecture of a ResNet network model of an embodiment of the present disclosure, including a multi-branched convolution architecture of an embodiment of the present disclosure;
FIG. 5B is a schematic diagram of the partial network architecture resulting from combining the multi-branch convolution of FIG. 5A into a single convolution unit;
fig. 6 is a schematic diagram of a neural network structure determining apparatus according to an embodiment of the present disclosure.
Detailed Description
The present disclosure describes several embodiments, but the description is illustrative and not limiting, and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described in the present disclosure.
In the description of the present disclosure, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment described as "exemplary" or "e.g." in this disclosure should not be taken as preferred or advantageous over other embodiments. "And/or" herein describes an association relationship between associated objects, meaning that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. "Plurality" means two or more than two. In addition, for the purpose of clearly describing the technical solutions of the embodiments of the present disclosure, words such as "first" and "second" are used to distinguish identical or similar items having substantially the same function and effect. Those skilled in the art will appreciate that words such as "first" and "second" do not limit the quantity or the order of execution, and do not necessarily indicate that the items so modified are different.
In describing representative exemplary embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other sequences of steps are possible as will be appreciated by those of ordinary skill in the art. Accordingly, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Furthermore, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present disclosure.
In order not to lose the diversity of the branches in a multi-branch convolution while maintaining the expression capability of the branches, an embodiment of the present disclosure provides a neural network model determining method, as shown in fig. 2, including the steps of:
step 110, acquiring a first neural network model used in a training stage;
the first neural network model comprises a multi-branch convolution structure, wherein a plurality of branches of the convolution structure comprise a first convolution unit and a second convolution unit which are connected with each other, and at least part of parameters of the second convolution units in different branches are different.
The first neural network model used in the training stage can be created manually, or obtained by searching with a neural network architecture search method, or partly created manually while the remaining part of the structure is obtained by searching with a neural network architecture search method.
Step 120, in the process of training the first neural network model, updating parameters of the first convolution unit, wherein parameters of the second convolution unit remain unchanged;
and step 130, after training, merging the convolution structures into a single convolution unit based on the first convolution unit in the convolution structures to obtain a second neural network model used in the test stage.
The multi-branch convolution of this embodiment expands a single convolution into a multi-branch convolution in the training stage and merges the multi-branch convolution in the testing stage. By merging the large number of micro-structures introduced into the different convolution structures, the original macroscopic structure is maintained, and the performance of the model is improved without changing the original model structure.
FIG. 1 shows a multi-branch convolution structure comprising four branches, wherein the first branch comprises a 1×1 Conv operator and a BN operator connected in sequence; the second branch comprises a 1×1 Conv operator, a BN operator, a K×K Conv operator and a BN operator connected in sequence; the third branch comprises a 1×1 Conv operator, a BN operator, an AVG operator and a BN operator connected in sequence; and the fourth branch comprises a K×K Conv operator and a BN operator. Here K ≥ 2, "Conv" represents convolution, "BN" represents batch normalization (Batch Normalization), and "AVG" represents average pooling. Other operators may also be used in the multi-branch convolution, such as a 1×K Conv operator, a K×1 Conv operator, and so on. The input of each branch is the same, namely the input (Input) of the convolution structure, and the outputs of all branches are added to obtain the output (Output) of the convolution structure. In order to increase the diversity of the features learned among the branches, different types of operators, such as the AVG operator, are used in different branches; however, it has been found that the use of different types of operators reduces the expression capability of the neural network model and, in some cases, causes the performance of the neural network model to degrade.
In the neural network model determining method of the above embodiment of the present disclosure, the convolution structure in the training model is modified: second convolution units for increasing the diversity of the branches are added in front of the first convolution units in the branches, and the parameters of these second convolution units are at least partially different between branches. During training, only the parameters of the first convolution unit, such as the weight parameters and/or bias parameters, are updated; after training, the multiple branches are merged based on the first convolution units to obtain the model used in the subsequent stages (such as the testing stage and the deployment stage). Adding second convolution units with different parameters improves the diversity of the features learned among the branches; meanwhile, since no other types of operators are added to the multi-branch convolution structure, the expression capability of the branches is maintained and the performance of the neural network model is improved.
In an exemplary embodiment of the present disclosure, the second convolution unit in a branch is disposed upstream, with its output serving as the input of the first convolution unit. Referring to the multi-branch convolution structure shown in fig. 3, the first set of K×K Conv operator and BN operator in each branch forms the second convolution unit (denoted as K×K Conv-BN in the figure), and the second set of K×K Conv operator and BN operator in each branch forms the first convolution unit. That is, in this embodiment, the input of the second convolution units in the plurality of branches is the input of the convolution structure, and the outputs of the first convolution units in the plurality of branches are added to obtain the output of the convolution structure.
In an exemplary embodiment of the present disclosure, referring to the multi-branch convolution structure shown in fig. 3, the first convolution unit and the second convolution unit each include a K×K Conv operator and a BN operator. However, in other embodiments, it is also possible that one of the convolution units comprises a K×K Conv operator and a BN operator while the other comprises only a K×K convolution operator, and so on; where K ≥ 2, e.g., K = 3, 4 or 5, and K×K represents the convolution kernel size of the convolution operator.
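As a concrete illustration, the following PyTorch-style sketch builds such a structure; the module name, argument names, and defaults are chosen for illustration only and are assumptions rather than part of the disclosure.

```python
import torch
import torch.nn as nn

class MultiBranchConv(nn.Module):
    """Minimal sketch of the multi-branch convolution structure of FIG. 3.

    Each branch is a frozen K x K Conv-BN (second convolution unit) followed by
    a trainable K x K Conv-BN (first convolution unit); branch outputs are added.
    """

    def __init__(self, channels: int, k: int = 3, num_branches: int = 4):
        super().__init__()
        pad = k // 2  # assumes stride 1 and odd K so the spatial size is preserved
        self.branches = nn.ModuleList()
        for _ in range(num_branches):
            second = nn.Sequential(nn.Conv2d(channels, channels, k, padding=pad),
                                   nn.BatchNorm2d(channels))
            first = nn.Sequential(nn.Conv2d(channels, channels, k, padding=pad),
                                  nn.BatchNorm2d(channels))
            # Second convolution unit: randomly initialized and frozen.  To keep
            # its BN running statistics fixed as well, it can additionally be put
            # in eval() mode during training.
            for p in second.parameters():
                p.requires_grad = False
            self.branches.append(nn.Sequential(second, first))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The same input goes to every branch; the branch outputs are added.
        return sum(branch(x) for branch in self.branches)
```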
In one example of this embodiment, the weight parameters and/or bias parameters of the K×K convolution operator in the second convolution unit are randomly generated. In another example of this embodiment, the weight parameters and/or bias parameters of the K×K convolution operator in the second convolution unit may also be generated using different parameter initialization methods.
Parameters in a neural network are optimized based on a gradient descent method, and the minimum loss and the optimal model weights are obtained through step-by-step iteration. The weight parameters and/or bias parameters are given initial values before network training. For deep learning, appropriate initial parameter values affect whether the model converges well. The different parameter initialization methods of the present embodiment may be arbitrarily selected from the following initialization methods (a small sketch applying several of them is given after the list):
pre-training (pre-training) initialization, namely pre-training on a large-scale data set to obtain a better parameter; the parameters are then used as initialization parameters (fine-tuning) for the model on the new task. Pre-training initialization can improve the generalization capability of the model and accelerate training.
All-zero initialization, i.e., the weights are initialized to 0.
Random initialization. If the initial values are too large, saturation occurs on some activation functions, causing vanishing gradients; if the initial values are too small, vanishing gradients also occur. To obtain appropriate initial values, an appropriate activation function may be selected and training may be performed with BN.
Xavier initialization, which is further divided into Xavier uniform-distribution initialization and Xavier Gaussian-distribution initialization.
He initialization, which builds on Xavier initialization by assuming that half of the neurons in each layer are turned off, so that the variance of the output distribution also becomes smaller. It has been verified that the effect is best when the initialization value is halved, so He initialization can be regarded as Xavier initialization divided by 2. In this way, the effect of the ReLU on the distribution of the output data is taken into account so that the input and output variances remain consistent.
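As a small, purely illustrative sketch (the per-branch assignment of methods below is an assumption, not prescribed by the disclosure), the frozen second convolution unit of each branch can be given a different initialization so that the branches differ:

```python
import torch.nn as nn

def init_second_units(branches: nn.ModuleList) -> None:
    """Give each branch's frozen second convolution unit a different
    weight initialization (illustrative choice of methods)."""
    inits = [
        lambda w: nn.init.normal_(w, std=0.02),                      # random (Gaussian)
        lambda w: nn.init.xavier_uniform_(w),                        # Xavier uniform
        lambda w: nn.init.xavier_normal_(w),                         # Xavier Gaussian
        lambda w: nn.init.kaiming_normal_(w, nonlinearity='relu'),   # He
    ]
    for branch, init in zip(branches, inits):
        second_unit = branch[0]      # branch layout as in the sketch above
        conv = second_unit[0]        # the K x K Conv operator of the second unit
        init(conv.weight)
        if conv.bias is not None:
            nn.init.zeros_(conv.bias)
```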
Because the convolution operation has homogeneity and additivity, the convolution structures of multiple branches can be merged. Taking the multi-branch convolution structure shown in fig. 1 as an example, the 1×1 Conv operator and the BN operator can be losslessly converted into one Conv operator, the K×K Conv operator and the BN operator can be losslessly converted into one Conv operator, and the AVG operator can be losslessly converted into one Conv operator. Furthermore, two parallel K×K Conv operators can also be converted into one K×K Conv operator. Through these transformations, the multi-branch convolution structure shown in fig. 1 can be merged into a single Conv operator, and this merging is lossless.
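The Conv-BN conversion mentioned above follows directly from the BN definition. The sketch below shows one way it could be implemented, assuming a Conv2d followed by a BatchNorm2d with affine parameters and tracked statistics; it is an illustration, not the implementation of the disclosure.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BN operator into the preceding Conv operator.

    BN(Wx + b) = gamma * (Wx + b - mean) / sqrt(var + eps) + beta,
    which is itself a convolution with rescaled weights and a new bias."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)   # per output channel
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = conv.bias if conv.bias is not None else torch.zeros(
        conv.out_channels, device=conv.weight.device)
    fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused
```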
In an exemplary embodiment of the present disclosure, the first convolution unit includes one K×K convolution operator and one BN operator, and the plurality of branches of one convolution structure may be merged in the following manner: removing the second convolution units in the plurality of branches; merging the K×K Conv operator and the BN operator of the first convolution unit in each branch to obtain a plurality of intermediate convolution operators connected in parallel; and merging the plurality of parallel intermediate convolution operators into a single convolution operator to obtain the single convolution unit of this embodiment. In this embodiment, each intermediate convolution operator is still a K×K Conv operator, and the method of converting two parallel K×K Conv operators into one K×K Conv operator may be applied multiple times, so that more than two intermediate convolution operators are merged into one K×K Conv operator. These merges are lossless. The single convolution unit obtained by the merging of this embodiment includes one K×K Conv operator, see fig. 4B.
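Merging parallel K×K Conv operators whose outputs are added amounts to summing their weights and biases. A small sketch building on fuse_conv_bn above (again an illustration under assumptions: all convolutions share the same configuration and, as produced by fuse_conv_bn, carry a bias term):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def merge_parallel_convs(convs: list) -> nn.Conv2d:
    """Merge parallel Conv operators of identical configuration whose
    outputs are added, by summing their weights and biases."""
    ref = convs[0]
    merged = nn.Conv2d(ref.in_channels, ref.out_channels, ref.kernel_size,
                       stride=ref.stride, padding=ref.padding,
                       dilation=ref.dilation, groups=ref.groups, bias=True)
    merged.weight.copy_(torch.stack([c.weight for c in convs]).sum(dim=0))
    merged.bias.copy_(torch.stack([c.bias for c in convs]).sum(dim=0))
    return merged
```

Applied to the structure of this embodiment, the frozen second convolution units are removed, each branch's first K×K Conv and BN are folded with fuse_conv_bn, and the resulting intermediate convolutions are passed to merge_parallel_convs; a check such as torch.allclose on the outputs of the parallel intermediate convolutions and the merged convolution (in eval mode) confirms that this merging step is lossless.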
In another exemplary embodiment of the present disclosure, the first convolution unit includes one K×K convolution operator and one BN operator, and the plurality of branches of one convolution structure may be merged in another manner: removing the second convolution units from the plurality of branches of the convolution structure; merging the K×K Conv operator and the BN operator of the first convolution unit in each of M-1 branches of the convolution structure to obtain M-1 intermediate convolution operators connected in parallel; and merging the M-1 parallel intermediate convolution operators and the K×K Conv operator in the one branch that is not merged into a single convolution operator, which together with the BN operator in the branch that is not merged constitutes the single convolution unit of this embodiment. The single convolution unit into which the convolution structure of this embodiment is merged includes one K×K Conv operator and one BN operator, see fig. 5B. Here M is the number of branches of the convolution structure, e.g. M = 2, 3 or 4.
In this embodiment, a main branch is designated first when merging: the K×K Conv operator and the BN operator of the main branch are not merged with each other, the K×K Conv operators and BN operators of the other branches are merged, and the plurality of intermediate Conv operators obtained from the other branches are then merged with the K×K Conv operator of the main branch while the BN operator of the main branch is retained, so as to obtain a merged structure including one K×K Conv operator and one BN operator. Of course, the K×K Conv operator and BN operator obtained in this way may be further merged into one K×K Conv operator.
In an exemplary embodiment of the present disclosure, the first neural network model and the second neural network model are convolutional neural network models, such as a ResNet network model, an Inception network model, or a DenseNet network model. However, the disclosure is not limited thereto: as long as another type of neural network model contains convolutional layers, the multi-branch network structure can be constructed, trained and merged in the manner of the embodiments of the present disclosure. It should be noted that when a neural network model includes multiple convolution structures, it is not required that all of them adopt the multi-branch convolution structure of the embodiments of the present disclosure (see fig. 3); some of them may adopt the multi-branch convolution structure of the embodiments of the present disclosure, while the other convolution structures may be the multi-branch convolution structure shown in fig. 1, other types of multi-branch convolution structures, or single-branch convolution structures, such as a K×K Conv operator and a BN operator connected in sequence.
In an exemplary embodiment of the present disclosure, the number N of branches of the plurality of convolution structures included in each network model may be the same or different. In general, 2 ≤ N ≤ 4, to balance the complexity of the model, the training time, and the performance improvement of the model. But the present disclosure is not limited thereto, and N may, for example, be equal to 5, 6, or other larger values.
As to the training of the neural network model, the training mode differs depending on the task to be performed by the model. For example, in supervised machine learning, a training sample set may be input into the neural network model, the output of the neural network model is compared with the target data, the training loss is calculated by a loss function, and the parameters in the neural network model, such as the weight parameters and bias parameters of the Conv operators, are optimized by an algorithm such as gradient descent according to the training loss; the objective of the optimization is to reduce the loss so that the output of the neural network model converges toward the target data. The parameters of some operators can also be frozen during training; for example, in this embodiment, the parameters of the added second convolution units are frozen and not updated during training, so as to reduce the training time. Moreover, the second convolution units do not participate in the merging, so updated parameters of the second convolution units could not be used in the models of the subsequent test and deployment stages anyway. However, because the second convolution units exist, and their weight parameters and bias parameters can be randomly generated or generated in different ways so that different branches differ, the diversity of the branches of the convolution structure is increased and a better model can be obtained through training.
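A minimal sketch of this training setup, reusing the MultiBranchConv module sketched earlier (the optimizer, learning rate, and loss function are illustrative assumptions): only parameters with requires_grad=True, i.e. those of the first convolution units, are handed to the optimizer, so the frozen second convolution units are never updated.

```python
import torch

def train_step(model, loss_fn, optimizer, x, target):
    """One supervised training step; only the first convolution units are updated."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), target)
    loss.backward()
    optimizer.step()   # frozen second-unit parameters receive no update
    return loss.item()

# Illustrative wiring:
model = MultiBranchConv(channels=64)
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad],
                            lr=0.01, momentum=0.9)
```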
In an exemplary embodiment of the present disclosure, the second neural network model may be tested, or in other words validated, and if the test is passed, the second neural network model may be determined as the neural network model used in the deployment stage. The testing or validation here may be performed using a test sample set. Taking a second neural network model used for target recognition as an example, whether the test is passed may be determined according to the recognition accuracy of the second neural network model: if the recognition accuracy is greater than a preset threshold, the accuracy requirement is considered to be met and the test is passed; otherwise, the test is not passed. When the test is not passed, the first neural network model may continue to be trained with the training sample set, or different initialization parameters may be adopted, or the local network structure may be modified, and training and testing are performed again. Deploying the second neural network model here means that the second neural network model has been trained and can be used to perform actual tasks, such as image classification, target recognition, and the like.
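A small sketch of such an accuracy-threshold check (the threshold value, the classification-style accuracy metric, and the helper name are illustrative assumptions):

```python
import torch

def passes_test(model, test_loader, threshold: float = 0.75) -> bool:
    """Return True if recognition accuracy on the test set exceeds the
    preset threshold, in which case the merged model can be deployed."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in test_loader:
            pred = model(x).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
    return (correct / max(total, 1)) > threshold
```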
An embodiment of the present disclosure further provides a neural network model determining apparatus. As shown in fig. 6, the apparatus includes a processor 60 and a memory 50, where the memory 50 stores a computer program, and the processor 60 can implement the neural network model determining method according to any embodiment of the present disclosure when executing the computer program.
The processor 60 in this embodiment may be a general-purpose processor, for example a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), or the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps, and logical blocks disclosed in the embodiments of the present disclosure may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The neural network model determining device of the embodiments of the present disclosure uses a first convolution unit and a second convolution unit connected to each other as the branch structure, with the parameters of the second convolution units at least partially different, and only the parameters of the first convolution unit are updated during training; this improves the diversity of the features learned among the branches while maintaining the expression capability of the branches, thereby improving the performance of the neural network model.
The memory 50 in this embodiment may comprise RAM, ROM, EEPROM, CD-ROM, or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can store the desired program code in the form of instructions or data structures and that can be accessed by a computer.
An embodiment of the present disclosure further provides a multi-branch convolution structure, where the convolution structure includes a plurality of branches, and the plurality of branches of the convolution structure each include a first convolution unit and a second convolution unit that are connected to each other, where at least some parameters of the second convolution units in different branches are different.
In an exemplary embodiment of the present disclosure, the input of the second convolution unit in the plurality of branches is an input of a convolution structure, and the output of the first convolution unit in the plurality of branches is added to obtain an output of the convolution structure.
Fig. 3 shows an example of the above convolution structure, which as shown includes 4 branches, but there may also be 3, 2, 5, or more branches. The first set of K×K Conv operator and BN operator in each branch constitutes the second convolution unit; the K×K Conv operator and BN operator are drawn in a single box in the figure only for ease of presentation, but still represent a K×K Conv operator and a BN operator connected in sequence. The weight parameters and/or bias parameters of the K×K Conv operator in the second convolution unit are randomly generated or generated by adopting different parameter initialization methods, and K ≥ 2. The second set of K×K Conv operator and BN operator in each branch constitutes the first convolution unit. The "Input" in the figure is represented by a box, but this does not mean that an operator is to be placed here; the input signal to the second convolution units may be represented in the same manner as in fig. 1. As shown, the outputs of the first convolution units of these 4 branches are added to obtain the output (Output) of the convolution structure.
The present embodiment also provides a neural network model (corresponding to the first neural network model in the neural network model determining method) for training, including a plurality of convolution structures, where part or all of the convolution structures adopt the multi-branch convolution structure according to any one embodiment of the present disclosure.
When actually constructing the neural network model for training, the first convolution unit in each convolution structure can be constructed first, and then a second convolution unit is added between the input of each branch of the convolution structure and the first convolution unit. The number of channels, convolution kernel size, stride, padding, etc. of the K×K Conv operator in the second convolution unit may be the same, or partially the same, as those of the K×K Conv operator in the first convolution unit, but the disclosure is not limited thereto. The weight parameters and/or bias parameters of the K×K Conv operator in the second convolution unit may be randomly generated or generated using different initialization parameters, so adding the K×K Conv operator and the BN operator may also be referred to as adding a random convolution layer and a BN layer. By arranging such random convolution layers, the diversity of the branches in the multi-branch convolution can be increased. Meanwhile, no other types of operators are introduced, so the expression capability of the model is not reduced and the performance of the model does not degrade. Since the parameters of these two added layers are not updated during training while the parameters of the other layers are updated normally, the complexity and time of training are not unduly increased.
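The construction described above could be sketched as follows; it is illustrative only, and in particular the choice to start each branch from an independent copy of a given first Conv-BN (rather than from independently initialized first units) is an assumption.

```python
import copy
import torch.nn as nn

def expand_to_training_branches(first_conv: nn.Conv2d, first_bn: nn.BatchNorm2d,
                                num_branches: int = 4) -> nn.ModuleList:
    """Build the training-time branches from an existing first convolution unit:
    each branch gets a freshly (randomly) initialized, frozen second Conv-BN
    placed in front of a copy of the first Conv-BN."""
    branches = nn.ModuleList()
    k = first_conv.kernel_size
    for _ in range(num_branches):
        second_conv = nn.Conv2d(first_conv.in_channels, first_conv.in_channels,
                                k, padding=k[0] // 2)   # assumes stride 1, odd K
        second = nn.Sequential(second_conv, nn.BatchNorm2d(first_conv.in_channels))
        for p in second.parameters():
            p.requires_grad = False          # kept fixed during training
        first = nn.Sequential(copy.deepcopy(first_conv), copy.deepcopy(first_bn))
        branches.append(nn.Sequential(second, first))
    return branches
```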
In an exemplary embodiment of the present disclosure, the output of the multi-branch convolution structure may be connected to a nonlinear operator, as shown in fig. 4A, where the ReLU operator is taken as an example. The ReLU operator, i.e. the rectified linear unit (Rectified Linear Unit, ReLU) operator, is a commonly used activation function in neural networks. The ReLU operator operates on the input element by element: for each element x, it outputs x if x is greater than 0, and outputs 0 if x is less than or equal to 0. The present disclosure is not limited in this regard, and the nonlinear operator here may be another activation function, such as the Sigmoid function, or another type of nonlinear operator. The network structure obtained by merging the convolution structure shown in fig. 4A is shown in fig. 4B: the multi-branch convolution structure is merged into one K×K Conv operator, and the output of the K×K Conv operator is connected to the ReLU operator. That is, in the example shown in fig. 4B, the convolution structure with multiple branches is merged into one K×K Conv operator.
Fig. 5A shows a partial network structure of a ResNet network model for training, which includes 4 multi-branch convolution structures, each convolution structure including 4 branches and corresponding to the specific structure in fig. 3. Each convolution structure is connected to a nonlinear operator (the ReLU function is taken as an example). The input of the first convolution structure 10 is added to the output of the second convolution structure 20, and the sum is passed through the ReLU function to serve as the input of the third convolution structure 30; likewise, the input of the third convolution structure 30 is added to the output of the fourth convolution structure 40, which is a structural feature of the ResNet network. After the ResNet network model for training has been trained, each multi-branch convolution structure is merged into one convolution unit, where each merged convolution unit includes a K×K Conv operator and a BN operator, as shown in fig. 5B; the ResNet network model for testing is obtained after the merging. In another embodiment, a BN operator may also be disposed between the multi-branch convolution structure and the nonlinear operator in the ResNet network model for training, so that when merging, each multi-branch convolution structure can be merged into a K×K Conv operator while the BN operator disposed after the convolution structure is retained; the network model of fig. 5B can also be obtained in this way. The multi-branch convolution structure and the neural network model determining method of the embodiments of the present disclosure may also be used for other forms of neural network models, such as an Inception network model, a DenseNet network model, and so on, which will not be described in detail here.
The neural network model of the embodiment of the disclosure has the multi-branch convolution structure provided by the embodiment of the disclosure, so that the neural network model of the embodiment of the disclosure can be used for training, and the expression capability of branches of the multi-branch convolution can be maintained while the diversity of the branches is not lost.
In the ImageNet image classification task, with the ResNet50 model as the reference (Baseline), the Top-1 accuracy of the ResNet50 model is 76.40%, while the Top-1 accuracy of the ResNet model trained using the neural network model determination method of the present disclosure can reach 77.73%.
In an object detection (Object Detection) task based on the COCO dataset, with the CenterNet model using a ResNet-18 backbone as the reference (Baseline), the average precision (AP) of the CenterNet model is 29.83, while the AP of the CenterNet model trained using the neural network model determination method of the present disclosure is 31.64.
In a semantic segmentation task based on the Cityscapes dataset, with the PSPNet model using a ResNet-18 backbone as the reference, the accuracy of the PSPNet model is 70.18%, while the accuracy of the PSPNet model trained using the neural network model determination method of the present disclosure can reach 72.24%.
The above embodiments of the present disclosure improve the diversity of the branches in a multi-branch convolution structure through random convolution layers, and this approach does not require changing the operators (K×K Conv) that perform feature learning within the branches, thereby improving model performance.
An embodiment of the present disclosure further provides a computer program product, where the computer program product includes a computer program, where the computer program may be stored on a non-transitory computer readable storage medium, where the computer program when executed by a processor can perform the neural network model determining method provided by the foregoing method embodiments, and the method includes: acquiring a first neural network model used in a training stage, wherein the first neural network model comprises a multi-branch convolution structure, the multiple branches of the convolution structure comprise a first convolution unit and a second convolution unit which are connected with each other, and at least part of parameters of the second convolution units in different branches are different; in the process of training the first neural network model, updating parameters of the first convolution unit, wherein parameters of the second convolution unit are kept unchanged; after training, merging the convolution structures into a single convolution unit based on a first convolution unit in the convolution structures to obtain a second neural network model used in the test stage.
In any one or more of the exemplary embodiments described above, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium, and executed by a hardware-based processing unit. The computer-readable medium may comprise a computer-readable storage medium corresponding to a tangible medium, such as a data storage medium, or a communication medium that facilitates transfer of a computer program from one place to another, such as according to a communication protocol. In this manner, a computer-readable medium may generally correspond to a non-transitory tangible computer-readable storage medium or a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Moreover, any connection may also be termed a computer-readable medium, for example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be appreciated, however, that computer-readable storage media and data storage media do not include connection, carrier wave, signal, or other transitory (transient) media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, digital Versatile Disc (DVD), floppy disk or blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, application Specific Integrated Circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor" as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Additionally, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The technical solutions of the embodiments of the present disclosure may be implemented in a wide variety of devices or apparatuses, including wireless handsets, integrated Circuits (ICs), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the described techniques, but do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in combination with suitable software and/or firmware.

Claims (15)

1. A neural network model determination method, comprising:
acquiring a first neural network model used in a training stage, wherein the first neural network model comprises a multi-branch convolution structure, the multiple branches of the convolution structure comprise a first convolution unit and a second convolution unit which are connected with each other, and at least part of parameters of the second convolution units in different branches are different;
in the process of training the first neural network model, updating parameters of the first convolution unit, wherein parameters of the second convolution unit are kept unchanged;
after training, merging the convolution structures into a single convolution unit based on a first convolution unit in the convolution structures to obtain a second neural network model used in the test stage.
2. The method of claim 1, wherein:
the input of the second convolution unit in the plurality of branches is the input of the convolution structure, and the output of the convolution structure is obtained after the outputs of the first convolution units in the plurality of branches are added.
3. The method of claim 1, wherein:
the first convolution unit comprises a K×K convolution operator, or comprises a K×K convolution operator and a batch normalization BN operator;
the second convolution unit comprises a K×K convolution operator, or comprises a K×K convolution operator and a BN operator, wherein K is greater than or equal to 2.
4. A method as claimed in claim 3, wherein:
the weight parameters and/or bias parameters of the KxK convolution operator in the second convolution unit are randomly generated; alternatively, the weight parameters and/or bias parameters of the kxk convolution operator in the second convolution unit are generated by using different parameter initialization methods.
5. The method of claim 1, wherein:
the single convolution unit formed by combining the convolution structures comprises a K×K convolution operator or comprises a K×K convolution operator and a BN operator.
6. The method of claim 1, wherein:
the first convolution unit includes a K×K convolution operator and a BN operator, and the merging the convolution structures into a single convolution unit based on the first convolution unit in the convolution structure includes:
removing a second convolution unit in a plurality of branches of the convolution structure;
combining a K×K convolution operator and a BN operator of a first convolution unit in a plurality of branches of the convolution structure respectively to obtain a plurality of intermediate convolution operators connected in parallel;
combining the plurality of parallel intermediate convolution operators into a single convolution operator, the single convolution operator constituting the single convolution unit.
7. The method of claim 1, wherein:
the first convolution unit includes a K×K convolution operator and a BN operator, and the merging the convolution structures into a single convolution unit based on the first convolution unit in the convolution structure includes:
removing a second convolution unit in a plurality of branches of the convolution structure;
combining K×K convolution operators and BN operators of a first convolution unit in M-1 branches of the convolution structure respectively to obtain M-1 intermediate convolution operators connected in parallel, wherein M is the number of branches in the convolution structure;
combining the M-1 parallel intermediate convolution operators with the K×K convolution operator in the one branch of the convolution structure that is not combined into a single convolution operator, wherein the single convolution operator and the BN operator in the one branch of the convolution structure that is not combined form the single convolution unit.
8. The method of claim 1, wherein:
the first neural network model comprises a plurality of convolution structures, the numbers N of branches of the convolution structures are the same or different, and at least part of the channel numbers, convolution kernel sizes, step sizes and padding of the first convolution unit and the second convolution unit are the same, wherein N is greater than or equal to 2 and less than or equal to 4.
9. The method of any one of claims 1 to 8, wherein:
the method further comprises: testing the second neural network model, and determining the second neural network model as the neural network model used in the deployment stage after the test is passed.
10. A multi-branched convolution structure comprising a plurality of branches, each of the plurality of branches of the convolution structure comprising a first convolution element and a second convolution element connected to each other, at least some of the parameters of the second convolution elements in different branches being different.
11. The convolution structure of claim 10, wherein:
the input of the second convolution unit in the plurality of branches is the input of the convolution structure, and the outputs of the first convolution units in the plurality of branches are added to obtain the output of the convolution structure; the first convolution unit and the second convolution unit each comprise a K×K convolution operator and a batch normalization BN operator, the weight parameters and/or bias parameters of the K×K convolution operator in the second convolution unit are randomly generated or generated by adopting different parameter initialization methods, and K is greater than or equal to 2.
12. A neural network model for training comprising a plurality of convolution structures, wherein some or all of the plurality of convolution structures employ the multi-branched convolution structure of claim 10 or 11.
13. A neural network model determination device comprising a processor and a memory storing a computer program, wherein the processor is capable of implementing the neural network model determination method of any one of claims 1 to 9 when executing the computer program.
14. A computer program product comprising a computer program storable on a non-transitory computer readable storage medium, characterized in that the computer program, when executed by a processor, is capable of implementing the neural network model determination method according to any one of claims 1 to 9.
15. A non-transitory computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, is capable of implementing the neural network model determination method according to any one of claims 1 to 9.
CN202210100259.8A 2022-01-27 2022-01-27 Multi-branch convolution structure, neural network model, and determination method and determination device thereof Pending CN116562338A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210100259.8A CN116562338A (en) 2022-01-27 2022-01-27 Multi-branch convolution structure, neural network model, and determination method and determination device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210100259.8A CN116562338A (en) 2022-01-27 2022-01-27 Multi-branch convolution structure, neural network model, and determination method and determination device thereof

Publications (1)

Publication Number Publication Date
CN116562338A true CN116562338A (en) 2023-08-08

Family

ID=87496984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210100259.8A Pending CN116562338A (en) 2022-01-27 2022-01-27 Multi-branch convolution structure, neural network model, and determination method and determination device thereof

Country Status (1)

Country Link
CN (1) CN116562338A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980854A (en) * 2017-03-29 2017-07-25 珠海习悦信息技术有限公司 Number-plate number recognition methods, device, storage medium and processor
US20170344881A1 (en) * 2016-05-25 2017-11-30 Canon Kabushiki Kaisha Information processing apparatus using multi-layer neural network and method therefor
CN108009634A (en) * 2017-12-21 2018-05-08 美的集团股份有限公司 A kind of optimization method of convolutional neural networks, device and computer-readable storage medium
CN108304921A (en) * 2018-02-09 2018-07-20 北京市商汤科技开发有限公司 The training method and image processing method of convolutional neural networks, device
CN110751091A (en) * 2019-10-18 2020-02-04 江西理工大学 Convolutional neural network model for static image behavior recognition
CN111461169A (en) * 2020-03-04 2020-07-28 浙江工商大学 Pedestrian attribute identification method based on forward and reverse convolution and multilayer branch depth network
CN112130216A (en) * 2020-08-19 2020-12-25 中国地质大学(武汉) Geological advanced fine forecasting method based on convolutional neural network multi-geophysical prospecting method coupling
CN113139432A (en) * 2021-03-25 2021-07-20 杭州电子科技大学 Industrial packaging behavior identification method based on human body skeleton and local image
US20210370993A1 (en) * 2020-05-27 2021-12-02 University Of South Carolina Computer vision based real-time pixel-level railroad track components detection system
WO2022246612A1 (en) * 2021-05-24 2022-12-01 华为技术有限公司 Liveness detection method, training method for liveness detection model, apparatus thereof, and system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination