CN112069370A - Neural network structure search method, apparatus, medium, and device - Google Patents


Info

Publication number
CN112069370A
Authority
CN
China
Prior art keywords
neural network
block
layer
blocks
sample data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910503236.XA
Other languages
Chinese (zh)
Inventor
方杰民 (Jiemin Fang)
张骞 (Qian Zhang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority to CN201910503236.XA
Publication of CN112069370A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 — Details of database functions independent of the retrieved data types
    • G06F16/903 — Querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

A neural network structure search method, apparatus, medium, and device are disclosed. The neural network structure search method comprises the following steps: acquiring a first neural network comprising a plurality of blocks with different channel numbers, wherein at least one block in the first neural network is connected with at least three blocks, and at least one block comprises a head layer for performing conversion of block channel number and spatial resolution; performing neural network structure search processing on the first neural network by using a gradient-based search strategy according to sample data in a first data set, so as to obtain structure parameters of the first neural network; and determining the structure of a second neural network obtained by the search according to the structure parameters of the first neural network. The technical solution provided by the present disclosure helps improve the flexibility of neural network structure search, and thus helps improve the performance and diversity of the neural network obtained by the search.

Description

Neural network structure search method, apparatus, medium, and device
Technical Field
The present disclosure relates to computer vision technologies, and in particular, to a neural network structure search method, a neural network structure search device, a storage medium, and an electronic device.
Background
In the field of computer vision technology, designing a neural network structure and adjusting each parameter in the neural network with the designed structure often requires a large amount of labor cost, computational cost and time cost.
How to design the structure of the neural network conveniently is a technical problem of great concern.
Disclosure of Invention
The present disclosure is proposed to solve the above technical problems. Embodiments of the present disclosure provide a neural network structure search method, a neural network structure search apparatus, a storage medium, and an electronic device.
According to an aspect of the embodiments of the present disclosure, there is provided a neural network structure search method, including: acquiring a first neural network comprising a plurality of blocks with different channel numbers, wherein at least one block in the first neural network is connected with at least three blocks, and at least one block comprises a head layer for performing conversion of block channel number and spatial resolution; performing neural network structure search processing on the first neural network by using a gradient-based search strategy according to sample data in a first data set, so as to obtain structure parameters of the first neural network; and determining the structure of a second neural network obtained by the search according to the structure parameters of the first neural network.
According to another aspect of the embodiments of the present disclosure, there is provided a neural network structure search apparatus, including: an obtaining module, configured to obtain a first neural network comprising a plurality of blocks with different channel numbers, where at least one block in the first neural network is connected with at least three blocks, and at least one block comprises a head layer for performing conversion of block channel number and spatial resolution; a searching module, configured to perform, according to sample data in a first data set, neural network structure search processing on the first neural network obtained by the obtaining module by using a gradient-based search strategy, so as to obtain the structure parameters of the first neural network; and a determining module, configured to determine the structure of the second neural network obtained by the search according to the structure parameters of the first neural network obtained by the searching module.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the above neural network structure searching method.
According to still another aspect of an embodiment of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; the processor is used for reading the executable instruction from the memory and executing the instruction to realize the neural network structure searching method.
According to the neural network structure search method and apparatus provided by the above embodiments of the present disclosure, since a block can be connected with at least three blocks, the first neural network of the present disclosure can be considered a search space based on dense connection. Since a block includes a head layer for performing block channel number conversion, the blocks connected to each other may have different channel numbers; the present disclosure therefore allows the first neural network to include a plurality of blocks with different channel numbers. By performing the neural network structure search processing in a first neural network that includes blocks with different channel numbers, not only the depth search of the neural network structure but also the channel number search of the neural network structure can be realized. Therefore, the technical solution provided by the present disclosure helps improve the flexibility of neural network structure search, which in turn helps improve the performance and diversity of the neural network obtained by the search.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of one example of a neural network structure search method of the present disclosure;
FIG. 2 is a schematic structural diagram of one example of a first neural network of the present disclosure;
fig. 3 is a schematic structural diagram of an example of an MBConv structure included in a parallel layer according to the present disclosure;
FIG. 4 is a schematic structural diagram of an example of MBConv structure contained in stacked layers of the present disclosure;
FIG. 5 is a schematic block diagram illustrating an embodiment of a variable packet based convolution module according to the present disclosure;
fig. 6 is a schematic structural diagram of an example of the neural network structure search apparatus of the present disclosure;
fig. 7 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning, nor to imply any necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more than two and "at least one" may refer to one, two or more than two.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Embodiments of the present disclosure may be implemented in electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with an electronic device, such as a terminal device, computer system, or server, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment. In a distributed cloud computing environment, tasks may be performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the disclosure
In implementing the present disclosure, the inventors found that performing a neural network structure search (Neural Architecture Search, NAS) process in a search space (Search Space) to obtain the structure of a neural network has become an important issue in computer vision technology.
In the existing neural network structure search technology, a search space is generally represented by a super network (Super Network). The super network includes a plurality of blocks (Blocks), and the blocks are usually connected in a sequential, single-connection manner, that is, one block is usually connected to one upstream block and one downstream block. A block typically includes multiple layers. The layers that the target neural network may include are obtained by searching in the super network. However, this approach can only realize a search over the depth of the neural network, not over its width, which may limit the structure of the target neural network obtained by searching in the super network.
Exemplary overview
It is assumed that the target neural network to be obtained includes a plurality of blocks connected in sequence, and the number of channels and the spatial resolution of all the blocks included in the target neural network need to be obtained by searching. In an environment with low cost, the neural network structure searching technology provided by the disclosure can be used for quickly obtaining the target neural network from the super network. For example, in a computing environment including four GPUs (Graphics Processing units), it is possible to realize a wide search and a deep search of a target neural network at a time cost of only several tens of hours, and finally obtain the target neural network.
Exemplary method
Fig. 1 is a flowchart of a neural network structure search method of the present disclosure. As shown in fig. 1, the method of this embodiment includes the steps of: s100, S101 and S102. The following describes each step.
S100, a first neural network comprising a plurality of blocks with different channel numbers is obtained.
The first neural network in this disclosure may be referred to as a search space or a super network. The first neural network includes: a plurality of blocks. At least one block in the first neural network is connected to a plurality of blocks, for example, one block in the first neural network is connected to at least three blocks. For another example, all blocks except the predetermined block in the first neural network are connected with at least three blocks. The predetermined block may include: two blocks at the head of the first neural network and two blocks at the tail of the first neural network, etc. The first neural network in the present disclosure may be referred to as a dense connection-based super network. The number of channels of all blocks comprised by the first neural network may be different.
At least one block (e.g., every block) in the first neural network of the present disclosure includes: a head layer and at least one stacked layer. The head layer may be considered the first layer in the block. The head layer is used to perform conversion of block channel number and spatial resolution, so that the layers located after the head layer in the block can process the outputs of all upstream blocks connected to the block.
The channel number of a block in this disclosure may be referred to as the width of the block; it may be considered the number of input channels and the number of output channels corresponding to each layer located after the head layer in the block. The spatial resolution of a block may be referred to as the length and height of the block; it may be considered the spatial resolution corresponding to the layers located after the head layer in the block. For example, the channel number and the length and height of the input feature map of each layer located after the head layer may be regarded as the channel number and the length and height of the block. It follows that the layers located after the head layer in a block of the present disclosure correspond to the same channel number and spatial resolution.
S101, according to sample data in the first data set, performing neural network structure search processing on the first neural network by using a gradient-based search strategy, so as to obtain the structure parameters of the first neural network.
The gradient-based search strategy in this disclosure can be thought of as: taking the derivative of the loss function and making the loss gradually approach its minimum value. The present disclosure may provide the sample data in the first data set to the first neural network, obtain the processing result output by the first neural network, and perform a back propagation operation, according to the difference between the processing result and the sample data and the derivative of the loss function, so as to adjust the structure parameters of the first neural network. The structure parameters of the first neural network in the present disclosure may refer to the parameters set for the structure of the first neural network; they generally embody the likelihood that the corresponding structure in the first neural network becomes a structure in the second neural network.
S102, determining a second neural network structure obtained through searching according to the structural parameters of the first neural network.
The second neural network in this disclosure may be referred to as the target neural network, i.e., the neural network obtained by performing the neural network structure search in the search space. The second neural network may be considered a substructure of the first neural network.
Since a block in the present disclosure may be connected with at least three blocks (for example, one block may be connected with at least three upstream blocks), the first neural network of the present disclosure may be considered a densely-connected search space, or a densely-connected super network. By providing within each block a head layer for performing block channel number conversion, the present disclosure allows the upstream blocks connected to one block to have different channel numbers; therefore, all upstream blocks of one block in the first neural network may include a plurality of blocks with different channel numbers. By performing the neural network structure search processing in this densely-connected first neural network, not only the depth search of the neural network but also the channel number search of the neural network can be realized. Therefore, the technical solution provided by the present disclosure helps improve the flexibility of neural network structure search, which in turn helps improve the performance and diversity of the neural network obtained by the search.
In one optional example, the manner in which the present disclosure obtains the first neural network includes, but is not limited to: and obtaining the first neural network in a data import mode or generating the first neural network according to preset information. The preset information may be considered as: input parameters of a software program for generating the first neural network. The preset information may include, but is not limited to: the number of channels in each block, and the number of upstream blocks connected to the block. In addition, the preset information may further include: the spatial resolution of each block, the number of stacked layers included in the block, and the like.
In an alternative example, the process of the present disclosure to obtain a first neural network comprising a plurality of blocks with different channel numbers may comprise the following two steps:
step 1, acquiring the number of channels of all blocks required by the first neural network and the number of upstream/downstream blocks connected with the blocks.
Optionally, the number of channels of the block of the present disclosure may be considered as: the number of input channels and the number of output channels corresponding to each layer in the block located behind the head layer. The spatial resolution of a block may be referred to as the length and height of the block, and the length and height of the block may be equal, although this disclosure does not exclude the case where the length and height are unequal. The spatial resolution of a block may be considered to be the spatial resolution corresponding to the layers in the block that are located after the head layer. For example, the number of channels and the length and height of the input feature map of each layer in the block located after the head layer may be considered as the number of channels and the length and height of the block. Each layer in the block of the present disclosure that is located after the head layer corresponds to the same number of channels and spatial resolution.
Optionally, the channel numbers of all blocks in this disclosure are different, i.e., not all blocks in the present disclosure have the same channel number. For example, one part of the blocks has a first channel number, another part has a second channel number, and the first channel number differs from the second channel number. For another example, no two blocks have the same channel number. That is, assuming that the number of all blocks required for the first neural network is N, among the channel numbers of the N blocks there exist M different channel numbers, where N and M are positive integers greater than 2, and M is less than or equal to N. The block set Arch formed by the N blocks in the present disclosure can be expressed as: Arch = {B_1, B_2, B_3, ..., B_N}.
Optionally, for the ith block B_i among all blocks, the number of upstream blocks connected to a block in the present disclosure refers to: the number of blocks located before, adjacent to, and connected with the ith block. This applies to all blocks other than the predetermined blocks located at the head of the first neural network (for example, the blocks other than the first two blocks), where the number of predetermined blocks depends on the number of upstream blocks (e.g., it is the number of upstream blocks minus one). Accordingly, the number of downstream blocks connected to a block refers to: the number of blocks located after, adjacent to, and connected with the ith block, which applies to all blocks other than the predetermined blocks located at the tail of the first neural network (for example, the blocks other than the last two blocks), the number of predetermined blocks depending on the number of downstream blocks (e.g., it is the number of downstream blocks minus one). When the number of upstream blocks connected to a block is given, the number of upstream blocks of each block other than the first x blocks in the first neural network is the same, where x is the number of upstream blocks minus one. Accordingly, when the number of downstream blocks connected to a block is given, the number of downstream blocks of each block other than the last x blocks in the first neural network is the same, where x is the number of downstream blocks minus one. The present disclosure can set the channel numbers of all blocks required for the first neural network and the number of upstream/downstream blocks connected to the blocks according to actual requirements; the present disclosure is not limited in this respect.
It should be particularly noted that, when the blocks are sorted by channel number, a plurality of adjacent blocks may share the same channel number increment. For example, multiple channel number increments may be preset; after all blocks are sorted by channel number, they may be divided into multiple groups according to the channel number increments, where all blocks in one group correspond to the same channel number increment and different groups correspond to different channel number increments. Optionally, all blocks in the same group generally have the same spatial resolution, and any two blocks in the same group generally have different channel numbers.
Step 2, determining the connection relation of all blocks required by the first neural network according to the number of upstream blocks and the ascending order of the block channel numbers.
Optionally, the present disclosure may sort all the blocks in ascending order of their channel numbers, determine the upstream/downstream blocks of each block using the number of upstream/downstream blocks, and form the block connection architecture of the first neural network by determining the connection relationship between each block and its corresponding upstream/downstream blocks.
Optionally, it is assumed that the number of all blocks required by the first neural network in the present disclosure is n+1, where n is a positive integer greater than 5, and the channel numbers of the n+1 blocks are: C_0, C_0+c, C_0+2c, C_0+3c, C_0+4c, C_0+5c, ..., C_0+nc, where C_0 is a positive integer greater than 1 (e.g., C_0 = 3). Assuming that the number of upstream/downstream blocks connected to a block is 4, one example of the block connection architecture of the first neural network formed by the present disclosure is shown in fig. 2.
The present disclosure obtains the channel numbers of all blocks and the number of upstream/downstream blocks connected to each block, and determines the connection relation of all blocks according to the number of upstream/downstream blocks and the ascending order of block channel numbers. This provides a feasible implementation for forming the first neural network and helps reduce the computation consumed by the neural network structure search processing; a toy sketch of the connection rule follows.
In an alternative example, the present disclosure may divide the formed first neural network into a plurality of stages according to the spatial resolutions and the channel number increments of the blocks, where one stage may include one block or a plurality of blocks. The spatial resolutions of all blocks in a stage are typically the same, while their channel numbers are typically different. The channel number increment is usually the same for all blocks in a stage, i.e., each stage has one channel number increment. A stage in the first neural network may be considered one of the aforementioned groups. In the case where each stage of the first neural network has a channel number increment, for two adjacent stages in the first neural network, the channel number increment of the preceding stage is generally smaller than that of the following stage. Optionally, for the ith block, if the (i-1)th block and the ith block are adjacent and connected, with the (i-1)th block being an upstream block of the ith block, the ratio of the length of the ith block to the length of the (i-1)th block is usually not greater than 2, and the ratio of their heights is likewise usually not greater than 2.
In an alternative example, among three sequentially connected blocks in the first neural network, the channel number increment of the first block relative to the middle block is typically no greater than that of the middle block relative to the last block. For example, if the three sequentially connected blocks belong to the same stage, the channel number increment of the first block relative to the middle block is generally equal to that of the middle block relative to the last block. For another example, if the three sequentially connected blocks span two stages, the channel number increment of the first block relative to the middle block is smaller than that of the middle block relative to the last block.
By making the channel number increment between two blocks arranged earlier in the first neural network smaller than that between two blocks arranged later, the computation consumed by the neural network structure search in the first neural network can be reduced.
In an alternative example, when forming the first neural network, the present disclosure also needs to set a head layer in each block and the layers after the head layer. The head layer in the present disclosure may be composed of a plurality of parallel layers, and the number of parallel layers included in the head layer is generally the same as the preset number of upstream/downstream blocks. Parallel layers in this disclosure are layers arranged side by side, i.e., there is no upstream-downstream relationship between the parallel layers in a head layer. Each parallel layer corresponds to one upstream block that is adjacent to and connected with the block in which the parallel layer is located; different parallel layers in a block correspond to different such upstream blocks, and all parallel layers in a block together correspond to all upstream blocks adjacent to and connected with the block. That is, a block processes the output information (e.g., feature maps) of each upstream block through its head layer, so that each layer located after the head layer in the block can process the output information of each upstream block.
Optionally, for the ith block in the first neural network, the present disclosure may determine the number of parallel layers included in the head layer of the ith block according to the preset number of upstream/downstream blocks, and determine the number of input channels and input spatial resolution, as well as the number of output channels and output spatial resolution, corresponding to each parallel layer in the ith block according to the channel number and spatial resolution of the ith block and of each upstream block adjacent to and connected with the ith block. That is, the number of input channels and input spatial resolution of any parallel layer in the ith block typically depend on the channel number and spatial resolution of the upstream block corresponding to that parallel layer, while its number of output channels and output spatial resolution typically depend on the channel number and spatial resolution of the ith block. The number of input channels corresponding to a parallel layer may be considered the channel number of the input information (e.g., a feature map) received by the parallel layer, and the spatial resolution corresponding to the parallel layer may be considered the spatial resolution of that input information.
As an example, assume that the number of upstream blocks of the ith block B_i is a, and the a upstream blocks are denoted B_{i-1}, B_{i-2}, ..., B_{i-a}. The channel number and spatial resolution of B_i are denoted C_i and H_i × W_i; the channel number and spatial resolution of B_{i-1} are denoted C_{i-1} and H_{i-1} × W_{i-1}; the channel number and spatial resolution of B_{i-2} are denoted C_{i-2} and H_{i-2} × W_{i-2}; and by analogy, the channel number and spatial resolution of B_{i-a} are denoted C_{i-a} and H_{i-a} × W_{i-a}. Under these assumed conditions, the head layer of B_i in the present disclosure may comprise a parallel layers. The first parallel layer corresponds to B_{i-1} (e.g., is connected with B_{i-1}) and converts the C_{i-1} × H_{i-1} × W_{i-1}-based feature map output by B_{i-1} into a C_i × H_i × W_i-based feature map; the second parallel layer corresponds to B_{i-2} and converts the C_{i-2} × H_{i-2} × W_{i-2}-based feature map output by B_{i-2} into a C_i × H_i × W_i-based feature map; and so on, the a-th parallel layer corresponds to B_{i-a} and converts the C_{i-a} × H_{i-a} × W_{i-a}-based feature map output by B_{i-a} into a C_i × H_i × W_i-based feature map.
By arranging a plurality of parallel layers in the head layer of a block, making each parallel layer correspond to one upstream block adjacent to and connected with the block, and converting the information output by those upstream blocks with different channel numbers and spatial resolutions into the channel number and spatial resolution of the block, the first neural network can be formed from a plurality of blocks with different channel numbers and spatial resolutions.
In one optional example, the operations performed by the parallel layers in the head layer of a block in the present disclosure may be set according to a preset candidate operation set. For example, the present disclosure may set the operations included in each parallel layer in the head layer according to all candidate operations in the preset candidate operation set. The present disclosure may set the output of any parallel layer in the ith block in the first neural network as follows: first, a calculation is performed according to the operation weight of each candidate operation in the parallel layer and the output of each candidate operation, obtaining a calculation result; then, the calculation result is processed (e.g., convolution processing), and the processed result is used as the output of the parallel layer. The channel number of the processing result is the same as that of the block in which the parallel layer is located. That is, in processing the calculation result, the channel number and spatial resolution of the calculation result are converted into those of the block in which the parallel layer is located.
In one optional example, the present disclosure is pre-provisioned with a candidate operation set, which may refer to: all candidate operations that may be involved in the parallel layers and the layers following the head layer. The parallel layers and the number of stacked layers specifically included in each block of the second neural network obtained by the neural network structure search, as well as the operations performed by those parallel layers and stacked layers, are generally determined by the candidate operations finally selected from the candidate operation set.
Optionally, the parallel layers in the present disclosure may employ, but are not limited to: MBConv structure. In the case where the MBConv structure is adopted for the parallel layer, an example of the MBConv structure included in the parallel layer in the present disclosure is shown in fig. 3.
The structure in the parallel layer shown in fig. 3 has C input channels and C′ output channels, where C and C′ may be different. The structure comprises 3 parts. The leftmost trapezoid box represents a convolution operation with a convolution kernel size of 1 × 1 followed by a ReLU6 activation function, with C input channels and tC output channels, where t is the expansion coefficient. The middle rectangular box represents a depthwise separable convolution with stride 2 and convolution kernel size k × k followed by a ReLU6 activation function, for which both the number of input channels and the number of output channels are tC. The rightmost trapezoid box represents a convolution operation with a convolution kernel size of 1 × 1, with tC input channels and C′ output channels. Here, t and k × k depend on the candidate operation.
Alternatively, an example of all candidate operations included in the candidate operation set in the present disclosure may be shown in the following table:

TABLE 1

MBConv_k3e3   MBConv_k3e6
MBConv_k5e3   MBConv_k5e6
MBConv_k7e3   MBConv_k7e6
skip
MBConv_k3e3 in Table 1 represents an MBConv convolution operation with a convolution kernel size of 3 × 3 and an expansion coefficient of 3; MBConv_k3e6 represents an MBConv convolution operation with a convolution kernel size of 3 × 3 and an expansion coefficient of 6; and so on for the remaining entries. skip indicates a skip, that is, the candidate operation does not process its input, and the output of the candidate operation can be considered to be its input.
For any parallel layer in the head layer of any block, the output of that parallel layer can be expressed as:

x_{out}^{l} = \sum_{o \in O} \bar{\alpha}_{o}^{l} \cdot o\big(x_{in}^{l}\big) \qquad (1)

In the above formula (1), x_{out}^{l} represents the output of the l-th parallel layer in the head layer of the block; o\big(x_{in}^{l}\big) represents the result of the o-th candidate operation in the l-th parallel layer processing the input x_{in}^{l} of the l-th parallel layer; and \bar{\alpha}_{o}^{l} represents the operation weight of the o-th candidate operation in the l-th parallel layer in the head layer of the block.
Alternatively, \bar{\alpha}_{o}^{l} in this disclosure is usually the normalized operation weight. The present disclosure may obtain the normalized operation weight of the o-th candidate operation in the l-th parallel layer using the following formula:

\bar{\alpha}_{o}^{l} = \frac{\exp(\alpha_{o}^{l})}{\sum_{o' \in O} \exp(\alpha_{o'}^{l})} \qquad (2)

In the above formula (2), \bar{\alpha}_{o}^{l} represents the operation weight (i.e., the operation weight after normalization processing) of the o-th candidate operation in the l-th parallel layer in the head layer; \alpha_{o}^{l} represents the operation weight before normalization processing of the o-th candidate operation in the l-th parallel layer in the head layer; O represents the candidate operation set; and \alpha_{o'}^{l} represents the operation weight before normalization processing of the o'-th candidate operation in the l-th parallel layer in the head layer.
The present disclosure fuses the outputs of the candidate operations in a parallel layer using their operation weights, so that the processing result of the parallel layer can fully reflect all the candidate operations in the parallel layer and their influence on that result. This makes it possible to adjust the operation weights of all candidate operations in the parallel layer using the loss function, which in turn helps search the structure of the second neural network from the first neural network.
In an alternative example, for any block (e.g., the ith block) in the first neural network, since the head layer includes a plurality of parallel layers, the head layer needs to fuse the outputs of the plurality of parallel layers and provide the fused result, as the output of the head layer, to the first stacked layer located after the head layer. In fusing the outputs of all its parallel layers, the head layer may take into account the weights of the blocks connected to the parallel layers; the weights of the blocks in this disclosure may be referred to as block connection weights. That is, the present disclosure may perform a calculation (for example, a weighted average) based on the block connection weight corresponding to each parallel layer in the head layer of the ith block and the output of each parallel layer. The obtained result is used as the output of the head layer of the ith block, which in turn may be used as the input of the first stacked layer located after the head layer in the ith block.
Alternatively, the operation performed by the head layer of the ith block in the present disclosure may be represented by the following formula:

x_{i} = \sum_{k=1}^{m'} \bar{p}_{i-k,k} \cdot H_{k}^{i}(x_{i-k}) \qquad (3)

In the above formula (3), x_{i} represents the output information of the head layer of the ith block; m' represents the number of upstream blocks connected to the ith block; k = 1, ..., m'; \bar{p}_{i-k,k} represents the block connection weight of the (i-k)-th upstream block connected to the ith block; and H_{k}^{i}(x_{i-k}) represents the processing result, by the corresponding parallel layer in the head layer of the ith block, of the output information of the (i-k)-th upstream block, i.e., the information output by that parallel layer.
Alternatively, \bar{p}_{i-k,k} in this disclosure is typically a normalized block connection weight. Assuming that any upstream block connected to the ith block can be denoted as the jth block, the present disclosure may obtain the normalized block connection weight of the jth block using the following formula:

\bar{p}_{ij} = \frac{\exp(\beta_{ij})}{\sum_{k=1}^{m} \exp(\beta_{ik})} \qquad (4)

In the above formula (4), \bar{p}_{ij} represents the block connection weight, after normalization processing, of the jth block for the connection between the jth block and the ith block when the jth block is an upstream block of the ith block; m represents the number of upstream blocks connected to the ith block; k = 1, ..., m; \beta_{ij} represents the block connection weight, before normalization processing, of the jth block for the connection between the jth block and the ith block when the jth block is an upstream block of the ith block; and \beta_{ik} represents the block connection weight, before normalization processing, of the kth block for the connection between the kth block and the ith block when the kth block is an upstream block of the ith block.
According to the present disclosure, the outputs of the parallel layers in the head layer of a block are fused using the block connection weights. On the one hand, each stacked layer in the block can thus process the output information of upstream blocks with different channel numbers and spatial resolutions; on the other hand, the processing result of the stacked layers can fully reflect the output information of a plurality of upstream blocks with different block connection weights and its influence on that result. This makes it possible to adjust the block connection weights using the loss function, which in turn helps search the structure of the second neural network from the first neural network.
In one optional example, any block in the present disclosure may include a plurality of stacked layers, i.e., the layers in the block located after the head layer. The stacked layers may be used to process the information output by all upstream blocks that are adjacent to and connected with the block in which they are located. All the stacked layers included in one block are sequentially connected; the input of the first stacked layer is the output of the head layer of the block, and the output of the last stacked layer may be regarded as the output of the block.
In an alternative example, for the ith block in the first neural network, the present disclosure may determine, according to the number of channels and the spatial resolution of the ith block, the number of input channels and the spatial resolution and the number of output channels and the spatial resolution that each correspond to each of stacked layers sequentially located after the head layer in the ith block. For example, the number of input channels and the number of output channels corresponding to each stacked layer in the ith block are both the number of channels of the ith block, and the input spatial resolution and the output spatial resolution corresponding to each stacked layer in the ith block are both the spatial resolution of the ith block. The number of stacked layers included in the ith block may be set according to actual requirements, for example, the number of stacked layers included in the ith block is usually not less than 3. In addition, the number of stacked layers contained by all blocks in the first neural network is generally the same. Of course, this disclosure does not exclude the case where all blocks in the first neural network contain different numbers of stacked layers.
According to the present disclosure, the formats of the input and output information of the stacked layers in the ith block are set according to the channel number and spatial resolution of the ith block. This normalizes the operations performed by each block in the first neural network as well as the structure of each block, which facilitates quickly forming the first neural network and improves its maintainability.
Alternatively, the present disclosure may set operations included in each stack layer sequentially located after the head layer in each block according to all candidate operations in the preset candidate operation set. An example of all the candidate operations is shown in table 1 above. For any stacked layer in a block, the manner in which the present disclosure sets the output of that stacked layer may include: performing calculation (for example, weighted average calculation) according to the operation weight for each candidate operation in the stack layer and the output of each candidate operation to obtain a calculation result; thereafter, the calculation result may be processed (e.g., convolution processing, etc.), and the processed result may be output as the stack layer.
Alternatively, the operation weight of any candidate operation in the stack layer may represent the likelihood that the candidate operation in the stack layer is selected to be a structure in the second neural network.
For any stacked layer in any block, the output of the stacked layer can be expressed as:

x_{l+1} = \sum_{o \in O} \bar{\alpha}_{o}^{l} \cdot o(x_{l}) \qquad (5)

In the above formula (5), x_{l+1} represents the output of the l-th stacked layer in the block, which can also be considered the input of the (l+1)-th layer in the block; o(x_{l}) represents the result of the o-th candidate operation in the l-th stacked layer processing the output x_{l} of the (l-1)-th layer in the block (which may be a stacked layer or the head layer); and \bar{\alpha}_{o}^{l} represents the operation weight of the o-th candidate operation in the l-th stacked layer in the block.
Alternatively, \bar{\alpha}_{o}^{l} in this disclosure is usually the normalized operation weight. The present disclosure may obtain the normalized operation weight of the o-th candidate operation in the l-th stacked layer using the following formula:

\bar{\alpha}_{o}^{l} = \frac{\exp(\alpha_{o}^{l})}{\sum_{o' \in O} \exp(\alpha_{o'}^{l})} \qquad (6)

In the above formula (6), \bar{\alpha}_{o}^{l} represents the operation weight (after normalization processing) of the o-th candidate operation in the l-th stacked layer in the block; \alpha_{o}^{l} represents the operation weight before normalization processing of the o-th candidate operation in the l-th stacked layer in the block; O represents the candidate operation set; and \alpha_{o'}^{l} represents the operation weight before normalization processing of the o'-th candidate operation in the l-th stacked layer in the block.
Alternatively, the stacked layers of the present disclosure may employ, but are not limited to, an MBConv structure. In the case where the stacked layer employs the MBConv structure, an example of the MBConv structure included in the stacked layer is shown in fig. 4, which includes three parts. The leftmost trapezoid box represents a convolution operation with a convolution kernel size of 1 × 1 followed by a ReLU6 activation function. The middle rectangular box represents a depthwise separable convolution with a convolution kernel size of k × k followed by a ReLU6 activation function. The rightmost trapezoid box represents a convolution operation with a convolution kernel size of 1 × 1. The number of input channels and the number of output channels in fig. 4 are both C.
According to the present disclosure, the outputs of the candidate operations in a stacked layer are fused using the operation weights of the candidate operations, so that the processing result of the stacked layer can fully reflect all the candidate operations in the stacked layer and their influence on that result. This makes it possible to adjust the operation weights of all candidate operations using the loss function, which in turn helps search the structure of the second neural network from the first neural network.
In an alternative example, an example of a first neural network formed by the present disclosure is shown in fig. 5. The lower diagram in fig. 5 is a first neural network comprising 17 blocks. Each block in the first neural network is connected to 3 downstream blocks, i.e., the number of upstream/downstream blocks connected to a block is 3. The first neural network in fig. 5 can be divided into 5 stages, respectively:
1. the first stage located at the far left of fig. 5. The first stage comprises a block, the box labelled 16 in figure 5, with a channel number of 16, and a spatial resolution of 112 x 112.
2. The second stage, located second from the left in fig. 5, comprises one block, namely the box labelled 24 in fig. 5; its channel number is 24 and its spatial resolution is 56 × 56. The channel number increase between the first-stage block and the second-stage block is 8.
3. A third stage, located at the middle left position of fig. 5, comprises three blocks, namely the three blocks labelled 32, 40 and 48 in fig. 5. The number of channels of the first block, the second block and the third block in the third stage is 32, 40 and 48 respectively, and the spatial resolution of the first block, the second block and the third block in the third stage is 28 × 28. The channel number increase between adjacent blocks in the third stage is 8.
4. The fourth stage, located at the middle position of fig. 5, includes six blocks, namely the six blocks labeled 64, 80, 96, 112, 128, and 144 in fig. 5. The number of channels of the first to sixth blocks in the fourth stage is 64, 80, 96, 112, 128, and 144, respectively, and the spatial resolutions of the first to sixth blocks in the fourth stage are all 14 × 14. The number of channels between adjacent blocks in the fourth stage increases by 16.
5. The fifth stage, located at the rightmost position in fig. 5, includes six blocks, namely the six blocks labeled 160, 224, 288, 352, 416, and 480 in fig. 5. The number of channels of the first to sixth blocks in the fifth stage is 160, 224, 288, 352, 416, and 480, respectively, and the spatial resolutions of the first to sixth blocks in the fifth stage are all 7 × 7. The increase in the number of channels between adjacent blocks in the fifth stage is 64.
The structure of any one of the blocks in the first neural network in fig. 5 is shown in the upper left diagram of fig. 5. That is, each block includes a head layer comprising three parallel layers, followed by three sequentially connected stacked layers. The inputs of the three parallel layers are the outputs of the 3 upstream blocks connected to the block; after the outputs of the three parallel layers are weighted-averaged according to the block connection weights of the 3 upstream blocks, the result of the weighted average may be used as the input of the first stacked layer of the block.
The structure contained in any stacked layer within any block in the first neural network in fig. 5, and in any parallel layer of any head layer (as in the middle block of fig. 3 and the middle block of fig. 2), is shown in the upper right diagram of fig. 5. That is, each parallel layer and each stacked layer include all candidate operations, and each candidate operation processes its input to form its output; when a candidate operation is a skip, the output of the candidate operation is its input. After the outputs of the candidate operations are weighted-averaged according to their operation weights, the result of the weighted average may be used to form the output of the layer; for example, convolution processing is performed on the result of the weighted average, and the result of the convolution processing is used as the output of the layer. The output of the last stacked layer may be used as the output of the block.
In an optional example, before performing the neural network structure search processing on the first neural network by using the sample data in the first data set, the present disclosure should ensure that the processing of the input information by the first neural network has a certain accuracy. Therefore, the present disclosure may train the first neural network with the sample data in the second data set before performing the neural network structure search processing on the first neural network, so as to adjust the operation parameters of each candidate operation in each layer (including each parallel layer and each stacked layer) in each block in the first neural network, thereby enabling the first neural network to have a certain accuracy in processing the input information. Since the process of adjusting the structural parameters in the first neural network using the sample data may also be referred to as a training process for the first neural network, the training process for the first neural network of the present disclosure may include two stages, the first stage is a training process for the operational parameters, and the second stage is a training process for the structural parameters.
It should be noted that the second stage may train the structural parameters only, or it may train both the operation parameters and the structural parameters (e.g., using a multi-objective optimization method). In addition, the first and second stages may be performed iteratively, i.e., alternately. For example, the first-stage training runs until a first predetermined iteration condition is reached (e.g., the number of used sample data reaches a predetermined number, or the accuracy of the first neural network's processing of input information meets a certain requirement), then stops; the second-stage training then runs until a second predetermined iteration condition is reached (e.g., the number of used sample data reaches a predetermined number, or the convergence of the structural parameters of the first neural network meets a certain requirement), then stops; the first stage is performed again, and so on, thereby realizing the iteration of the two stages.
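A minimal sketch of this alternation follows, assuming two optimizers built over disjoint parameter groups (one over the operation parameters w, one over the structure parameters alpha/beta); the function and variable names are hypothetical, and a full pass over each loader stands in for the predetermined iteration conditions:

```python
def alternate_search(model, train_loader, val_loader,
                     w_optimizer, arch_optimizer, loss_fn, num_rounds):
    for _ in range(num_rounds):
        # stage one: only the operation parameters w are stepped,
        # even though the loss is computed with alpha and beta in place
        for x, y in train_loader:
            w_optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            w_optimizer.step()
        # stage two: only the structure parameters alpha and beta are stepped
        for x, y in val_loader:
            arch_optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            arch_optimizer.step()
```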
Optionally, the operating parameters of the candidate operations in the present disclosure include, but are not limited to, the convolution kernel weights in the candidate operations. The present disclosure may provide sample data in the second data set to the first neural network and process the sample data via the first neural network; thereafter, according to the processing result of the sample data and the sample data (e.g., the difference between the two), the present disclosure may adjust, by using a second loss function (e.g., a cross-entropy loss function, hereinafter referred to as the second cross-entropy loss function), the operating parameters of each candidate operation in each layer (including each parallel layer and each stacked layer) in each block in the first neural network. Before the first stage of training is performed for the first time, the present disclosure may assign values to the structural parameters of the first neural network in a random initialization manner. The gradient used in the first stage of training can be expressed as

∇_w L_train(w, α, β)

where ∇_w indicates the derivative with respect to the operation parameters w, and L_train(w, α, β) represents the second cross-entropy loss function of the operation parameters w, the operation weights α of the candidate operations, and the block connection weights β. The second cross-entropy loss function uses the operation weight α and the block connection weight β in its calculation, but during back propagation in the first stage usually only the operation parameter w is updated; α and β are not.
Alternatively, before the first stage of training is performed for the first time, initial values (for example, initial values given by a random initialization method) may be given to the operation parameter w, the operation weight α of the candidate operation, and the block connection weight β, respectively. In the first training process, the values of the operation weight α and the block connection weight β in the second cross entropy loss function are still the initial values given above. In the subsequent process of performing the training of the first stage again, the values of the operation weight α and the block connection weight β in the second cross entropy loss function are usually the values obtained in the last training of the second stage.
Optionally, in the present disclosure, the first data set and the second data set generally contain different sample data. For example, the present disclosure may divide a complete sample data set into two parts, one part being a training set (i.e., the second data set) and the other part being a validation set (i.e., the first data set), where the sample data in the training set is used to adjust the operation parameters of each candidate operation in each layer in each block in the first neural network. The sample data in the validation set is used to adjust structural parameters in the first neural network.
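For instance, such a split might be set up as in the following sketch; the toy tensors, dataset size, and batch size are placeholders, not values from the disclosure:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# toy stand-in for a complete sample data set (random inputs and labels)
full_set = TensorDataset(torch.randn(1000, 3, 224, 224),
                         torch.randint(0, 10, (1000,)))
train_set, val_set = random_split(full_set, [500, 500])  # two disjoint parts
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)  # "second data set": tunes w
val_loader = DataLoader(val_set, batch_size=64, shuffle=True)      # "first data set": tunes alpha/beta
```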
By training the first neural network with the sample data in the second data set to adjust the operation parameters of each candidate operation in each layer (including each parallel layer and each stacked layer) in each block, the present disclosure gives the first neural network a certain accuracy in processing input information; as a result, the structural parameters of the first neural network can converge quickly during the second-stage training, which improves the efficiency of the neural network structure search.
In an alternative example, the training of the first stage of the present disclosure may be implemented in two ways:
in a first mode, sample data in a second data set (for example, a training set) is provided to a first neural network, and the sample data is processed through all candidate operations in all layers (including parallel layers and stacked layers) in all paths formed by all blocks in the first neural network, so as to obtain a processing result of the first neural network on the sample data. That is, the input sample data stream has passed through all the block connections in the first neural network, has passed through all the layers in each block, and has been processed by all the candidate operations in each parallel layer and each stacked layer in all the blocks.
In the case of the first stage adopting the first mode, the present disclosure may adjust the operation parameters of all candidate operations in each layer (including each parallel layer and each stacked layer) in each block in the first neural network by using the second loss function and a gradient descent mode, according to the processing result of the sample data and the sample data (such as the difference between the two). That is, during back propagation in training, the operating parameters of all candidate operations in all layers in all blocks are updated.
In the second mode, the sample data in the second data set is provided to the first neural network, and the sample data is processed through one selected candidate operation in each layer (including each parallel layer and each stacked layer) in all paths formed by all blocks in the first neural network, so as to obtain the processing result of the first neural network on the sample data. That is, the input sample data stream passes through all the block connections in the first neural network and through all the layers in each block, but in each parallel layer and each stacked layer it is processed by only the one candidate operation selected for that layer. The candidate operation may be selected at random, or it may be selected according to the operation weights of the candidate operations; the present disclosure is not limited in this respect.
In the case that the second mode is adopted in the first stage, the present disclosure may adjust the operation parameters of the selected candidate operation in each layer in each block in the first neural network by using the second loss function and a gradient descent method, according to the processing result of the first neural network on the sample data and the sample data (e.g., the difference between the two). That is, in a back propagation process of the first-stage training, the operation parameters of the currently selected candidate operation in all layers (including parallel layers and stacked layers) in all blocks are updated, and the operation parameters of the currently unselected candidate operations are not updated. Selecting a single candidate operation to process the sample data during the first-stage training improves the training efficiency of the first stage.
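A sketch of this second mode follows, reusing the hypothetical MixedLayer attributes from the earlier sketch; sampling according to the operation weights is shown, and a purely random selection would use uniform probabilities instead:

```python
import torch
import torch.nn.functional as F

def forward_one_sampled(layer, x):
    """Route x through a single sampled candidate operation, so that only that
    operation's parameters receive gradients in the backward pass."""
    probs = F.softmax(layer.alpha, dim=0).detach()  # used for selection only
    idx = int(torch.multinomial(probs, num_samples=1))
    return layer.ops[idx](x)
```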
In one alternative example, the present disclosure may begin executing the second stage of the training process after completing the first stage of training once. In particular, the present disclosure may provide sample data in a first set of data (e.g., a validation set) to a first neural network, the sample data processed via the first neural network; then, according to the processing result of the first neural network on the sample data and the sample data (such as the difference between the two), the block connection weight in the first neural network and the operation weight of each candidate operation in each parallel layer and each stacked layer in each block are adjusted by a gradient descent method by using a first loss function (for example, a cross entropy loss function, which is hereinafter referred to as a first cross entropy loss function).
Alternatively, the gradient used in the second stage of training may be expressed as

∇_{α,β} L_val(w, α, β)

where ∇_{α,β} indicates the derivative with respect to the operation weights α of the candidate operations (the operation weights before normalization processing) and the block connection weights β (the block connection weights before normalization processing), and L_val(w, α, β) represents the first cross-entropy loss function of the operation parameters w, the operation weights α, and the block connection weights β. The first cross-entropy loss function uses the operation parameter w in its calculation, but during back propagation in the second stage the present disclosure usually updates only the operation weight α and the block connection weight β, and does not update w.
Optionally, when the second stage of training is started, the value of the operating parameter w in the first cross-entropy loss function is usually the value obtained by the last training in the first stage.
By updating the block connection weights in the first neural network and the operation weights of each candidate operation in each parallel layer and each stacked layer in each block with the first loss function, the present disclosure allows these weights to converge along the direction of gradient descent, which facilitates accurately deriving the structure of the second neural network from the first neural network.
In an alternative example, the training of the second stage of the present disclosure may be implemented in two ways:
in the first mode, sample data in the first data set is provided to the first neural network, and the sample data is processed through all candidate operations in all layers (including all parallel layers and all stacked layers) in all paths formed by all blocks in the first neural network, so that a processing result of the first neural network on the sample data is obtained.
In the case of the first mode in the second stage, the present disclosure may adjust the connection weight of each block in the first neural network and the operation weight of each candidate operation in each parallel layer and each stacked layer in each block in a gradient descent mode according to the processing result of the sample data and the sample data (e.g., a difference between the two) by using a first loss function (e.g., a first cross entropy loss function). That is, in the back propagation process of the second stage training, all block connection weights and operation weights of all candidate operations in all parallel layers and all stacked layers in all blocks are updated.
Optionally, the present disclosure may set the delay parameters of each layer (including the parallel layers and the stacked layers) in the first loss function; according to the processing result of the sample data and the sample data (e.g., the difference between the two), the present disclosure may then perform multi-objective optimization processing by gradient descent using the first loss function with the per-layer delay parameters, so as to update the block connection weights in the first neural network and the operation weights of each candidate operation in each layer in each block.
By setting the delay parameters of each layer in the first loss function and updating the block connection weights and the operation weights through a multi-objective optimization method, the processing speed of the finally obtained second neural network can be controlled, thereby improving the applicability of the second neural network to the actual application environment.
Optionally, the first loss function L (w, α, β) provided with the delay parameters of each layer may be expressed as:
L(w,α,β)=LCE+λlogτlatency formula (7)
In the above formula (7), LCELoss functions representing operating parameters, e.g. LCEMay be the second cross entropy loss function described in the above embodiments; λ and τ represent hyper-parameters (hyper-parameters) for controlling the magnitude of the stack-layer delay optimization term (i.e., latency); λ and τ may be known values; the value range of λ may be: 0.05-0.7; for example λ can be 0.2; tau can be in the range of 10-20, for example tau can be 15; λ and τ can be positively correlated with latency; latency represents the delay time of all blocks in the first neural network.
Alternatively, the delay time latency of all blocks in the first neural network may be calculated using the following formula (8):

latency = Σ_l latency_l    formula (8)

In the above formula (8), latency_l represents the delay time of the l-th layer in the block; the minimum value of l may be 1, and the maximum value of l may be the number of layers formed by all the head layers and all the stacked layers in all the blocks in the first neural network.
Optionally, when the l-th layer is a stacked layer, latency_l in the above formula (8) may be calculated by the following formula (9):

latency_l = Σ_{o∈O} α_o^l · latency_o^l    formula (9)

In the above formula (9), α_o^l represents the operation weight (the operation weight after normalization processing) of the o-th candidate operation in the l-th stacked layer in the block; latency_o^l represents the delay time of the o-th candidate operation in the l-th stacked layer in the block; O denotes the set of candidate operations.
Optionally, when the l-th layer is a head layer, latency_l in the above formula (8) may be calculated by the following formula (10):

latency_l = Σ_j p_{j,i} · Σ_{o∈O} α_o^{l,i} · latency_o^{l,i}    formula (10)

In the above formula (10), p_{j,i} represents the block connection weight (the block connection weight after normalization processing) between the i-th block and its j-th adjacent, connected upstream block; α_o^{l,i} represents the operation weight of the o-th candidate operation in the l-th parallel layer in the head layer of the i-th block; latency_o^{l,i} represents the delay time of the o-th candidate operation in the l-th parallel layer in the head layer of the i-th block; O denotes the set of candidate operations.
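The latency terms above might be computed as in the following sketch, which reuses the hypothetical MixedLayer/HeadLayer attributes from the earlier sketches and assumes each layer stores a tensor op_latency of measured per-operation delay times (an assumption; the disclosure does not name such an attribute). The λ and τ defaults are the example values given above:

```python
import math
import torch
import torch.nn.functional as F

def expected_latency(blocks):
    """Formula (8): sum the expected per-layer delays over all head and stacked layers."""
    total = 0.0
    for block in blocks:
        p = F.softmax(block.head.beta, dim=0)           # normalized block connection weights
        for p_j, layer in zip(p, block.head.parallel):  # formula (10): head layer
            a = F.softmax(layer.alpha, dim=0)
            total = total + p_j * (a * layer.op_latency).sum()
        for layer in block.stacked:                     # formula (9): stacked layers
            a = F.softmax(layer.alpha, dim=0)
            total = total + (a * layer.op_latency).sum()
    return total  # a tensor, so gradients flow back to alpha and beta

def multi_objective_loss(ce_loss, latency, lam=0.2, tau=15.0):
    """Formula (7): L = L_CE + lam * log_tau(latency); log base tau via ln(x)/ln(tau)."""
    return ce_loss + lam * torch.log(latency) / math.log(tau)
```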
In the second mode, the sample data in the first data set is provided to the first neural network, and the sample data is processed through two selected candidate operations in each layer in all paths formed by all blocks in the first neural network, so as to obtain a processing result of the sample data. That is, the input sample data stream passes through all the block connections in the first neural network and through all the layers in each block, but in each parallel layer and each stacked layer it is processed by only the two candidate operations selected for that layer. The two candidate operations may be selected at random, or they may be selected according to the operation weights of the candidate operations; the present disclosure is not limited in this respect.
In the second stage, in the case of adopting the second mode, the present disclosure may adjust the connection weight of each block in the first neural network and the operation weight of the two candidate operations selected in each layer in each block by using a gradient descent mode according to the processing result of the first neural network on the sample data and the sample data (e.g., a difference between the two samples). That is, in a back propagation process of the second stage training, the connection weights of all blocks and the operation weights of the two candidate operations selected this time in all layers (including parallel layers and stacked layers) in all blocks are updated, and the operation weights of the candidate operations not selected this time are not updated. According to the method and the device, two candidate operations are selected to process the sample data in the training process of the second stage, so that the training efficiency of the second stage is improved.
In an optional example, in a case that the second mode is adopted in the second stage, after the block connection weights and the operation weights are updated, the present disclosure may further adjust the operation weights of the two selected candidate operations, for example, by using an offset.
Optionally, the present disclosure may calculate the offset of the selected two candidate operations according to the updated operation weight; then, the operation weights of the two selected candidate operations in each layer in each block in the first neural network are adjusted according to the calculated offset, for example, the offset is respectively added to the operation weights of the two selected candidate operations.
Alternatively, the present disclosure may use the following formula (11) to calculate the offset for the two selected candidate operations:

offset = log Σ_{o∈O_s} exp(α_o^l) − log Σ_{o∈O_s} exp(α̃_o^l)    formula (11)

In the above formula (11), O_s represents the set of the two selected candidate operations; o represents the selected o-th candidate operation; α_o^l represents the operation weight of the o-th candidate operation in the l-th layer before the update; and α̃_o^l represents the operation weight of the o-th candidate operation in the l-th layer after the update. Adding this offset to the updated weights keeps the total normalized weight of the two selected candidate operations unchanged by the update.
Adjusting the operation weights of the two selected candidate operations in each layer in each block of the first neural network by this offset makes the operation weights more reasonable and improves the training efficiency of the second stage.
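A sketch of this offset adjustment, following the reconstruction of formula (11) above; the helper name, its arguments, and the in-place update are assumptions:

```python
import torch

def apply_offset(layer, sampled_idx, alpha_before):
    """Add the formula (11) offset to the two sampled operation weights so that
    their combined softmax mass is the same before and after the gradient step.

    alpha_before: a copy of layer.alpha taken before the gradient step.
    sampled_idx:  indices of the two selected candidate operations.
    """
    idx = torch.as_tensor(sampled_idx)
    before = torch.logsumexp(alpha_before[idx], dim=0)
    after = torch.logsumexp(layer.alpha.data[idx], dim=0)
    layer.alpha.data[idx] += before - after  # the offset of formula (11)
```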
Alternatively, the present disclosure may determine the blocks belonging to the second neural network and the parallel layers in the blocks by comparing the block connection weights in the first neural network. The present disclosure may determine the layers in the block belonging to the second neural network and the operations contained in the layers by comparing the operation weights of all the candidate operations in the layers (including the parallel layers and the stacked layers) in the block belonging to the second neural network. Additionally, the present disclosure may determine the location of the upsampling in the second neural network from the parallel layers in the blocks belonging to the second neural network.
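The comparisons described above might be implemented as in the following sketch; the function name, the returned structure, and keeping exactly one strongest upstream connection and one strongest operation per layer are assumptions, since the disclosure does not fix those counts:

```python
import torch
import torch.nn.functional as F

def derive_second_network(blocks, keep_connections=1):
    """Read off the searched structure: the strongest upstream connection(s)
    per block, and the strongest candidate operation in every layer."""
    arch = []
    for block in blocks:
        p = F.softmax(block.head.beta, dim=0)
        upstream = torch.topk(p, keep_connections).indices.tolist()
        layers = list(block.head.parallel) + list(block.stacked)
        ops = [int(F.softmax(layer.alpha, dim=0).argmax()) for layer in layers]
        arch.append({"upstream": upstream, "ops": ops})
    return arch
```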
It should be particularly noted that the present disclosure may preset some layers in the first neural network according to actual requirements, and the preset layers may be carried over directly to become layers of the second neural network. For example, at least one leading layer (e.g., the first two layers) of the first neural network may be preset; as another example, at least one final layer of the first neural network may be preset; and so on. That is, some layers of the second neural network may be obtained without the neural network structure search. The operations performed by the preset layers can be set according to actual requirements. For example, the first two layers in the first/second neural network may be preset, where the first layer may be an ordinary convolutional layer whose output is 16 × 112 × 112 information (such as a 16 × 112 × 112 feature map), and the second layer may adopt an MBConv structure with a 3 × 3 convolution kernel and an expansion coefficient of 1, whose output may be 24 × 56 × 56 information (such as a 24 × 56 × 56 feature map). A preset layer in the first/second neural network may be regarded as a preset block; for example, the first layer may be regarded as the first block of the first/second neural network, and the second layer as the second block. During the first-stage training of the first neural network, the operation parameters in the preset layers may be adjusted.
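As an illustration of the two preset layers in this example, a sketch follows; the strides, padding, normalization, and activations beyond the stated kernel size, expansion coefficient, and output shapes are assumptions:

```python
import torch.nn as nn

stem = nn.Sequential(
    # preset layer 1: ordinary convolution, output 16 x 112 x 112 on a 224 x 224 input
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=True),
    # preset layer 2: MBConv with 3 x 3 kernel and expansion coefficient 1
    # (depthwise 3 x 3 followed by pointwise 1 x 1), output 24 x 56 x 56
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1, groups=16),
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=True),
    nn.Conv2d(16, 24, kernel_size=1, bias=False),
    nn.BatchNorm2d(24),
)
```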
Exemplary devices
Fig. 6 is a schematic structural diagram of an embodiment of a neural network structure search apparatus according to the present disclosure. The apparatus of this embodiment may be used to implement the method embodiments of the present disclosure described above. The apparatus shown in fig. 6 mainly comprises: an acquisition module 600, a search module 601, and a determination module 602.
The obtaining module 600 is configured to obtain a first neural network including a plurality of blocks with different channel numbers. At least one block in the first neural network is connected with at least three blocks, and at least one block comprises a header layer for performing block channel number and spatial resolution conversion.
Optionally, the obtaining module 600 may obtain the number of channels of all blocks required by the first neural network and the number of upstream/downstream blocks connected to the blocks, and determine the connection relationship of all blocks according to the number of channels of the blocks and the number of upstream/downstream blocks, so as to form the first neural network.
Optionally, in the first neural network formed by the obtaining module 600, if there are three blocks connected in sequence, in the three blocks, the channel number increase of the first block relative to the middle block is not greater than the channel number increase of the middle block relative to the last block.
Optionally, for the ith block in the first neural network, the obtaining module 600 may determine, according to the number of upstream blocks of the ith block, the number of parallel layers included in the header layer of the ith block, and determine, according to the number of channels and the spatial resolution of the ith block, and the number of channels and the spatial resolution of each upstream block connected to the ith block, the number of input channels and the spatial resolution, and the number of output channels and the spatial resolution, which correspond to each parallel layer, respectively. Wherein different parallel layers correspond to different upstream blocks.
Optionally, the obtaining module 600 may set operations included in each parallel layer in the header layer according to all candidate operations in the preset candidate operation set; for any parallel layer in the ith block in the first neural network, the process of setting the output of the parallel layer by the obtaining module 600 may include: and calculating according to the operation weight of each candidate operation in the parallel layer and the output of each candidate operation to obtain a calculation result.
Optionally, for the ith block in the first neural network, the process of setting the output of the header layer in the ith block by the obtaining module 600 may include: the obtaining module 600 performs calculation according to the block connection weight corresponding to each parallel layer in the i-th block header layer and the output of each parallel layer, and obtains a calculation result.
Optionally, for the ith block in the first neural network, the obtaining module 600 may determine, according to the number of channels and the spatial resolution of the ith block, the number of input channels and the spatial resolution, and the number of output channels and the spatial resolution, which correspond to each of the stack layers sequentially located after the head layer in the ith block, respectively.
Alternatively, the obtaining module 600 may set, according to all candidate operations in the preset candidate operation set, operations included in each stack layer sequentially located after the head layer in each block. For the jth stack layer in the ith block in the first neural network, the process of setting the output of the jth stack layer by the obtaining module 600 may include: and calculating according to the operation weight of each candidate operation in the jth stack layer and the output of each candidate operation to obtain a calculation result.
Optionally, the searching module 601 in the present disclosure may further perform the following operations before performing the neural network structure search processing on the first neural network by using a gradient-based search strategy according to sample data in the first data set: the searching module 601 provides the sample data in the second data set to the first neural network, and processes the sample data through the first neural network; the search module 601 adjusts the operation parameters of each candidate operation in each layer in each block in the first neural network by using the first loss function according to the processing result of the sample data and the sample data.
Optionally, the search module 601 may adjust the operation parameters of each candidate operation in each layer in each block in the first neural network in two ways to implement the first stage of training.
In the first mode, the search module 601 provides the sample data in the second data set to the first neural network, and processes the sample data through all candidate operations in all layers in all paths formed by all blocks in the first neural network, so as to obtain a processing result of the sample data.
In the second mode, the search module 601 provides the sample data in the second data set to the first neural network, and processes the sample data through the selected one candidate operation in all layers in all paths formed by all blocks in the first neural network, so as to obtain a processing result of the sample data. In the case of the second method, the searching module 601 may adjust the operation parameters of the selected candidate operation in each layer in each block in the first neural network by using a second loss function and a gradient descent method according to the processing result of the sample data and the sample data.
Optionally, the search module 601 may provide sample data in the first data set to the first neural network, process the sample data through the first neural network, and then, according to the processing result of the sample data and the sample data, the search module 601 updates the block connection weight in the first neural network and the operation weight of each candidate operation in each layer in each block by using a gradient descent manner through the first loss function.
Optionally, the search module 601 may implement the second stage of training in the following two ways.
In the first mode, the search module 601 provides the sample data in the first data set to the first neural network, and processes the sample data through all candidate operations in all layers in all paths formed by all blocks in the first neural network, so as to obtain a processing result of the sample data.
In the second mode, the search module 601 provides the sample data in the first data set to the first neural network, and respectively processes the sample data through the selected two candidate operations in all layers in all paths formed by all blocks in the first neural network, so as to obtain a processing result of the sample data.
Optionally, the search module may perform multi-objective optimization processing by using a first loss function including delay parameters of each layer according to a processing result of sample data and the sample data and using a gradient descent method, so as to update the block connection weight in the first neural network and the operation weight of each candidate operation in each layer in each block.
Optionally, when the search module 601 adopts the second method, the search module 601 may further calculate offset amounts of the two selected candidate operations according to the updated operation weights; and adjusting the operation weights of the two selected candidate operations in each layer in each block in the first neural network according to the offset.
The determining module 602 is configured to determine a second neural network structure obtained by the search according to the structure parameter of the first neural network obtained by the searching module 601.
Exemplary electronic device
An electronic device according to an embodiment of the present disclosure is described below with reference to fig. 7. FIG. 7 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure. As shown in fig. 7, the electronic device 71 includes one or more processors 711 and memory 77.
The processor 711 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 71 to perform desired functions.
Memory 77 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory, for example, may include: random Access Memory (RAM) and/or cache memory (cache), etc. The nonvolatile memory, for example, may include: read Only Memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 711 to implement the neural network structure searching methods of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 71 may further include an input device 713 and an output device 714, among other components, interconnected by a bus system and/or another form of connection mechanism (not shown). The input device 713 may include, for example, a keyboard, a mouse, and the like. The output device 714 may output various information to the outside and may include, for example, a display, speakers, a printer, and a communication network and the remote output devices connected thereto.
Of course, for simplicity, only some of the components of the electronic device 71 relevant to the present disclosure are shown in fig. 7, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device 71 may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage medium
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the neural network structure searching method according to various embodiments of the present disclosure described in the "exemplary methods" section above of this specification.
The computer program product may write program code for carrying out operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the neural network structure searching method according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium may include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments; however, it is noted that the advantages, effects, and the like mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not limited to the specific details so described.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to."
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects, and the like, will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (17)

1. A neural network structure searching method includes:
acquiring a first neural network comprising a plurality of blocks with different channel numbers, wherein at least one block in the first neural network is connected with at least three blocks, and at least one block comprises a header layer for performing block channel number and spatial resolution conversion;
according to sample data in a first data set, carrying out neural network structure search processing on the first neural network by utilizing a search strategy based on gradient to obtain structure parameters of the first neural network;
and determining a second neural network structure obtained by searching according to the structure parameters of the first neural network.
2. The method of claim 1, wherein the obtaining a first neural network comprising a plurality of blocks with different channel numbers comprises:
acquiring the number of channels of all blocks required by a first neural network and the number of upstream/downstream blocks connected with the blocks;
and determining the connection relation of all the blocks according to the number of the upstream/downstream blocks and the increment of the number of the channels of the blocks.
3. The method of claim 2, wherein, of the sequentially connected three blocks in the first neural network, the channel number increase of the first block relative to the middle block is not greater than the channel number increase of the middle block relative to the last block.
4. The method of claim 2 or 3, wherein the obtaining a first neural network comprising a plurality of blocks with different numbers of channels comprises:
for an ith block in the first neural network, determining the number of parallel layers included in a header layer of the ith block according to the number of upstream blocks of the ith block, and determining the number of input channels and the spatial resolution, and the number of output channels and the spatial resolution which respectively correspond to each parallel layer according to the number of channels and the spatial resolution of the ith block and the number of channels and the spatial resolution of each upstream block connected with the ith block;
wherein different parallel layers correspond to different upstream blocks.
5. The method of claim 4, wherein the obtaining a first neural network comprising a plurality of blocks with different channel numbers comprises:
setting operations contained in each parallel layer in the head layer according to all candidate operations in a preset candidate operation set;
for any parallel layer in the ith block in the first neural network, setting an output of the parallel layer comprises:
and calculating according to the operation weight of each candidate operation in the parallel layer and the output of each candidate operation to obtain a calculation result.
6. The method of any of claims 1 to 5, wherein the obtaining a first neural network comprising a plurality of blocks with different numbers of channels comprises:
for an ith block in the first neural network, setting an output of a header layer in the ith block comprises:
and calculating according to the block connection weight corresponding to each parallel layer in the head layer of the ith block and the output of each parallel layer to obtain a calculation result.
7. The method of any of claims 1 to 6, wherein the obtaining a first neural network comprising a plurality of blocks with different numbers of channels comprises:
and for the ith block in the first neural network, determining the number of input channels and the spatial resolution, and the number of output channels and the spatial resolution which correspond to each stacking layer sequentially positioned behind the head layer in the ith block according to the number of channels and the spatial resolution of the ith block.
8. The method of any of claims 1 to 7, wherein the obtaining a first neural network comprising a plurality of blocks with different numbers of channels comprises:
setting operations contained in each stacking layer sequentially positioned behind the head layer in each block according to all candidate operations in a preset candidate operation set;
for a jth stack layer in an ith block in the first neural network, setting an output of the jth stack layer comprises: and calculating according to the operation weight of each candidate operation in the jth stack layer and the output of each candidate operation to obtain a calculation result.
9. The method of any of claims 1 to 8, wherein the method further comprises, prior to performing a neural network structure search process on the first neural network using a gradient-based search strategy based on sample data in a first set of data:
providing the sample data in the second data set to the first neural network, and processing the sample data through the first neural network;
and adjusting the operating parameters of each candidate operation in each layer in each block in the first neural network by using a first loss function according to the processing result of the sample data and the sample data.
10. The method of claim 9, wherein said providing sample data in the second set of data to the first neural network, the processing of the sample data via the first neural network comprising:
providing the sample data in the second data set to the first neural network, and respectively processing the sample data through all candidate operations in all layers in all paths formed by all blocks in the first neural network to obtain a sample data processing result; or
Providing the sample data in the second data set to the first neural network, and respectively processing the sample data through the selected candidate operation in all layers in all paths formed by all blocks in the first neural network to obtain a sample data processing result;
the adjusting, according to the processing result of the sample data and the sample data, the operation parameters of each candidate operation in each layer in each block in the first neural network by using a first loss function includes:
and adjusting the operation parameters of the selected candidate operation in each layer in each block in the first neural network by using a second loss function and a gradient descent mode according to the processing result of the sample data and the sample data.
11. The method according to any one of claims 1 to 10, wherein said performing a neural network structure search process on said first neural network using a gradient-based search strategy according to sample data in a first data set comprises:
providing sample data in a first data set to a first neural network, and processing the sample data through the first neural network;
and updating the block connection weight in the first neural network and the operation weight of each candidate operation in each layer in each block by using a gradient descent mode according to the processing result of the sample data and by using a first loss function.
12. The method of claim 11, wherein said providing sample data in the first set of data to a first neural network, said sample data being processed via said first neural network, comprises:
providing sample data in the first data set to a first neural network, and respectively processing the sample data through all candidate operations in all layers in all paths formed by all blocks in the first neural network to obtain a sample data processing result; or
And providing the sample data in the first data set to a first neural network, and respectively processing the sample data through two selected candidate operations in all layers in all paths formed by all blocks in the first neural network to obtain a sample data processing result.
13. The method according to claim 12, wherein the updating the block connection weight in the first neural network and the operation weight of each candidate operation in each layer in each block in a gradient descent manner according to the processing result of the sample data and the sample data by using a first loss function comprises:
and performing multi-objective optimization processing by using a first loss function containing delay parameters of each layer and adopting a gradient descent mode according to the processing result of the sample data and the sample data so as to update the block connection weight in the first neural network and the operation weight of each candidate operation in each layer in each block.
14. The method according to claim 13, wherein the updating the block connection weight in the first neural network and the operation weight of each candidate operation in each layer in each block in a gradient descent manner according to the processing result of the sample data and the sample data by using a second loss function further comprises:
calculating the offset of the two selected candidate operations according to the updated operation weight;
adjusting operation weights of the two selected candidate operations in each layer in each block in the first neural network according to the offset.
15. A neural network structure search apparatus, comprising:
an obtaining module, configured to obtain a first neural network including a plurality of blocks with different channel numbers, where at least one block in the first neural network is connected to at least three blocks, and at least one block includes a header layer for performing block channel number and spatial resolution conversion;
the searching module is used for searching the neural network structure of the first neural network acquired by the acquiring module by utilizing a gradient-based searching strategy according to sample data in the first data set to acquire the structural parameters of the first neural network;
and the determining module is used for determining the second neural network structure obtained by searching according to the structure parameters of the first neural network obtained by the searching module.
16. A computer-readable storage medium, the storage medium storing a computer program for performing the method of any of the preceding claims 1-14.
17. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any one of claims 1-14.
CN201910503236.XA 2019-06-11 2019-06-11 Neural network structure search method, apparatus, medium, and device Pending CN112069370A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910503236.XA CN112069370A (en) 2019-06-11 2019-06-11 Neural network structure search method, apparatus, medium, and device


Publications (1)

Publication Number Publication Date
CN112069370A true CN112069370A (en) 2020-12-11

Family

ID=73658516



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596330A (en) * 2018-05-16 2018-09-28 中国人民解放军陆军工程大学 Parallel characteristic full-convolution neural network and construction method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEN HUANGLU: "Research on Image Classification Algorithm Based on Convolutional Neural Network", China Master's Theses Full-text Database, Information Science and Technology, No. 12, pages 138-1368 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609310A (en) * 2021-08-25 2021-11-05 上海交通大学 Single-machine large-scale knowledge graph embedding system and method
CN113609310B (en) * 2021-08-25 2023-08-08 上海交通大学 Single-machine large-scale knowledge graph embedding system and method
CN115760777A (en) * 2022-11-21 2023-03-07 脉得智能科技(无锡)有限公司 Hashimoto's thyroiditis diagnostic system based on neural network structure search
CN115760777B (en) * 2022-11-21 2024-04-30 脉得智能科技(无锡)有限公司 Hashimoto thyroiditis diagnosis system based on neural network structure search


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination