CN112106077A - Method, apparatus, storage medium, and computer program product for network structure search - Google Patents

Method, apparatus, storage medium, and computer program product for network structure search

Info

Publication number
CN112106077A
CN112106077A (Application number CN201980031708.4A)
Authority
CN
China
Prior art keywords
training
network structure
jumper
graph
density
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980031708.4A
Other languages
Chinese (zh)
Inventor
蒋阳
庞磊
胡湛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd filed Critical SZ DJI Technology Co Ltd
Publication of CN112106077A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a network structure searching method comprising the following steps. A general graph training step: training a first general graph according to a first network structure and training data to generate a second general graph. A network structure training step: determining a plurality of test subgraphs from the second general graph according to the first network structure; testing the plurality of test subgraphs with test data to generate a feedback result; determining a jumper constraint term according to the feedback result; and updating the first network structure according to the feedback result and the jumper constraint term. In the training of a neural network, different training stages have different requirements on jumper density. Because the jumper constraint term in this method is determined based on the feedback result of the current training stage, the jumper density is tied to the current training stage, so the controller's current jumper density better matches that stage, and training efficiency can be improved while a good training result is still obtained.

Description

Method, apparatus, storage medium, and computer program product for network structure search
Copyright declaration
The disclosure of this patent document contains material which is subject to copyright protection. The copyright is owned by the copyright owner. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office official file or records.
Technical Field
The present application relates to the field of Artificial Intelligence (AI), and more particularly, to a method, apparatus, storage medium, and computer program product for network structure search.
Background
The neural network is the basis of AI, and as the performance of neural networks continues to improve, their network structures become more and more complex. A neural network can only be used after it has been trained; training mainly adjusts the operations in each layer of the neural network and the connection relations among the layers so that the neural network outputs correct results. The connection relations may also be referred to as skips or shortcuts (i.e., jumpers).
One way to train a neural network is Efficient Neural Architecture Search (ENAS). When training with ENAS, a controller repeatedly samples the network structure of the neural network, tries the influence of different network structures on the output result, and uses the output result of the previously sampled structure to determine the next structure to sample, until the neural network converges.
Compared with manually debugging a neural network, training with ENAS improves training efficiency; nevertheless, the efficiency of training a neural network with ENAS still needs to be improved.
Disclosure of Invention
Embodiments of the present application provide a method and apparatus for network structure search, a computer storage medium, and a computer program product.
In a first aspect, a network structure searching method is provided, including a general graph training step: training a first general graph according to a first network structure and training data to generate a second general graph; and a network structure training step: determining a plurality of test subgraphs from the second general graph according to the first network structure; testing the plurality of test subgraphs with test data to generate a feedback result; determining a jumper constraint term according to the feedback result; and updating the first network structure according to the feedback result and the jumper constraint term.
The method can be applied to a chip, a mobile terminal, or a server. During the training of a neural network, different training stages have different requirements on jumper density. For example, in the initial training stage of some neural networks, a controller with a high jumper density is required so that the search space is explored as fully as possible and a network structure deviation (bias) is avoided; in the later training stage, the randomness of the neural network is greatly reduced, so a controller with a low jumper density can be used, without exploring the entire search space, in order to reduce the consumption of resources (including computing resources and time). Because the jumper constraint term in this method is determined based on the feedback result of the current training stage, it is tied to that stage, so the controller's current jumper density better matches the current training stage. A good training result can therefore be obtained while resource consumption is reduced and training efficiency is improved, which makes the method particularly suitable for mobile devices.
In a second aspect, a network structure searching apparatus is provided, which is configured to perform the method in the first aspect.
In a third aspect, there is provided a network structure search apparatus, the apparatus comprising a memory for storing instructions and a processor for executing the instructions stored in the memory, and execution of the instructions stored in the memory causes the processor to perform the method of the first aspect.
In a fourth aspect, a chip is provided, where the chip includes a processing module and a communication interface, the processing module is configured to control the communication interface to communicate with the outside, and the processing module is further configured to implement the method of the first aspect.
In a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a computer, causes the computer to carry out the method of the first aspect.
In a sixth aspect, there is provided a computer program product containing instructions which, when executed by a computer, cause the computer to carry out the method of the first aspect.
Drawings
Fig. 1 is a schematic flow chart of a network structure searching method provided in the present application;
FIG. 2 is a schematic diagram of a general graph and a subgraph provided in the present application;
FIG. 3 is a schematic flow chart diagram of another network structure searching method provided by the present application;
FIG. 4 is a schematic flow chart illustrating a further method for searching a network structure provided by the present application;
fig. 5 is a schematic diagram of a network structure searching apparatus provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
In the description of the present application, it is to be understood that the terms "first" and "second" are used only to distinguish different objects and are not to be construed as implying any other limitation. For example, "first general graph" and "second general graph" merely denote two different general graphs; no other limitation is implied. Further, in the description of the present application, "a plurality" means two or more unless specifically defined otherwise.
In the description of the present application, it is to be noted that, unless otherwise explicitly specified or limited, the term "connected" is to be interpreted broadly: it may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection, an electrical connection, or a communication connection; and it may be a direct connection or an indirect connection through an intermediate medium, or an internal connection between two elements. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
The following disclosure provides many different embodiments or examples for implementing the solution of the present application. To simplify the disclosure of the present application, specific components or steps are described in the examples below. Of course, the following examples are not intended to limit the present application. Moreover, the present application may repeat reference numerals and/or letters in the various examples, which repeat reference numerals and/or letters in the various examples for purposes of brevity and clarity and do not represent a relationship between the various embodiments and/or arrangements discussed.
Embodiments of the present application will be described in detail below, and the embodiments described below are exemplary and are only intended to explain the present application, and should not be construed as limiting the present application.
In recent years, machine learning algorithms, especially deep learning algorithms, have developed rapidly and are widely used. As model performance continues to improve, model structures are also becoming more complex. In non-automated machine learning, these structures must be manually designed and debugged by machine learning experts, which is a very cumbersome process. Moreover, as application scenarios and model structures become more complex, obtaining an optimal model for a given application scenario becomes more difficult. Against this background, automated machine learning (AutoML) algorithms, and Neural Architecture Search (NAS) in particular, have attracted great interest in both academia and industry.
In particular, network structure search is a technique for automatically designing neural network models using algorithms. The network structure search is to search the structure of the neural network model. In the embodiment of the present application, the Neural network model to be subjected to the network structure search is a Convolutional Neural Network (CNN).
The problem to be solved by network structure search is to determine the operations between nodes in the neural network model. Different combinations of operations between nodes correspond to different network architectures. Further, the nodes in the neural network model may be understood as feature layers in the neural network model. An operation between two nodes refers to an operation required for the transformation of the characteristic data on one of the nodes into the characteristic data on the other node. The operations referred to herein may be convolution operations, pooling operations, or other neural network operations such as fully-connected operations. Operations between two nodes can be considered to constitute an operational layer between the two nodes. Typically, there are multiple operations available for searching, i.e., there are multiple candidate operations, at the operational level between two nodes. The purpose of the network structure search is to determine an operation at each operational level.
For example, conv3 × 3, conv5 × 5, depthwise3 × 3, depthwise5 × 5, maxpool3 × 3, and averagepool3 × 3 are defined as the search space. That is, the operation of each layer of the network structure is sampled from these six choices.
As shown in fig. 1, after establishing a search space, NAS typically utilizes a controller (itself a neural network) to sample a network structure a from the search space, and then trains a child network with architecture a to determine a feedback quantity such as its accuracy R, where the accuracy may also be referred to as a predicted value; subsequently, the gradient of the sampling probability p is calculated and scaled by R to update the controller, i.e., the controller is updated with R as the feedback (reward).
Then the updated controller samples a new network structure from the search space, and the above steps are repeated until a converged sub-network is obtained.
In the example of FIG. 1, the controller may be a recurrent neural network (RNN), a CNN, or a long short-term memory (LSTM) neural network. The present application does not limit the specific form of the controller.
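By way of illustration only, the following Python sketch mimics the sampling-and-feedback loop described above with a toy controller that keeps one categorical distribution per layer rather than an RNN/LSTM; the six-operation search space and the train_and_evaluate placeholder are assumptions made for this sketch, not part of the described scheme.

```python
import numpy as np

# Hypothetical six-operation search space (mirrors the example above).
SEARCH_SPACE = ["conv3x3", "conv5x5", "depthwise3x3",
                "depthwise5x5", "maxpool3x3", "averagepool3x3"]
NUM_LAYERS = 4
rng = np.random.default_rng(0)

# Toy "controller": one softmax distribution per layer (a real controller is an RNN/LSTM).
logits = np.zeros((NUM_LAYERS, len(SEARCH_SPACE)))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def train_and_evaluate(architecture):
    # Placeholder for "train the sub-network with architecture a and measure accuracy R".
    return rng.uniform(0.5, 0.9)

learning_rate = 0.1
for step in range(100):
    # 1) Sample a network structure a from the search space.
    probs = [softmax(row) for row in logits]
    arch = [rng.choice(len(SEARCH_SPACE), p=p) for p in probs]
    # 2) Obtain the feedback (accuracy) R for the sampled structure.
    R = train_and_evaluate([SEARCH_SPACE[i] for i in arch])
    # 3) REINFORCE-style update: scale the log-probability gradient by R.
    for layer, op in enumerate(arch):
        grad = -probs[layer]
        grad[op] += 1.0                      # d log P(op) / d logits
        logits[layer] += learning_rate * R * grad
```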
However, training every sampled subnetwork structure to convergence is time consuming. Therefore, various methods for improving NAS efficiency have appeared in the related art, such as efficient architecture search by network transformation, and weight-sharing-based ENAS (efficient neural architecture search via parameter sharing). Among these, weight-sharing-based ENAS is widely used.
As shown in fig. 2, the general graph is composed of the operations represented by the nodes, which may be all operations of the search space, and the jumpers between those operations. When using weight-sharing-based ENAS, the controller may search for a network structure in the search space, for example by determining the operation of each node and the connection relationships between nodes, and then determine a sub-graph from the general graph based on the searched network structure. The operations connected by the bold arrows in fig. 2 are an example of a final subgraph, where node 1 is the input node of the final subgraph and nodes 3 and 6 are its output nodes.
When using weight-sharing-based ENAS, after a network structure is sampled, a sub-graph is determined from the general graph based on that structure; instead of training the sub-graph to convergence, a small batch of data is used to train it once, for example by updating the parameters of the sub-graph with the back propagation (BP) algorithm, which completes one iteration. Since the sub-graph belongs to the general graph, updating the parameters of the sub-graph is equivalent to updating the parameters of the general graph. After multiple iterations, the general graph may eventually converge. Note that convergence of the general graph is not equivalent to convergence of a sub-graph.
After the general graph is trained, its parameters may be fixed and the controller then trained. For example, the controller may search the search space for a network structure, obtain a sub-graph from the general graph based on that structure, input test data (validation data) into the sub-graph to obtain a predicted value, and update the controller with the predicted value.
Weight-sharing-based ENAS shares the parameters that can be shared each time a network structure is searched, thereby improving search efficiency. For example, in the example shown in fig. 2, if the previous search selected node 1, node 3, and node 6 and the resulting network structure was trained, and the current search selects node 1, node 2, node 3, and node 6, then the parameters related to node 1, node 3, and node 6 obtained last time can be reused when training the currently searched network structure. Efficiency is thus improved through weight sharing.
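The parameter-sharing idea can be illustrated with the following sketch, which assumes the general graph stores one weight tensor per (layer, operation) pair; the tensor shapes, learning rate, and the placeholder gradient are illustrative assumptions.

```python
import numpy as np

NUM_LAYERS, NUM_OPS = 4, 6
rng = np.random.default_rng(0)

# Shared parameter store of the general graph: one weight tensor per (layer, operation).
shared_weights = {(l, o): rng.normal(size=(8, 8))
                  for l in range(NUM_LAYERS) for o in range(NUM_OPS)}

def train_subgraph(architecture, batch):
    """Update only the parameters touched by the sampled subgraph.

    Because these tensors belong to the general graph, updating the
    subgraph is equivalent to updating the general graph."""
    for layer, op in enumerate(architecture):
        w = shared_weights[(layer, op)]
        grad = rng.normal(size=w.shape)   # placeholder for the BP gradient on `batch`
        w -= 0.01 * grad                  # in-place update of the shared tensor

arch_a = [0, 2, 3, 5]   # operations selected by the previous search
arch_b = [0, 1, 3, 5]   # this search: ops 0, 3, 5 reuse the weights trained above
train_subgraph(arch_a, batch=None)
train_subgraph(arch_b, batch=None)
```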
ENAS can improve the efficiency of NAS by a factor of more than 1000, but the following problems arise in actual use. The predicted value of the sub-graph keeps changing: as training progresses, the sub-graphs determined by the controller are predicted more and more accurately, i.e., the predicted value of the sub-graph gradually increases. Meanwhile, the coefficient of the jumper constraint term in the controller's parameter update formula is fixed, so the constraint strength produced by the jumper constraint term keeps decreasing as sub-graph training progresses. The jumper constraint term reflects the jumper density of the controller; a constraint strength that keeps decreasing means that the jumper density of the controller keeps increasing, and excessive jumpers increase the number of floating-point operations (FLOPs) of the controller, which reduces the update efficiency of the controller and affects the efficiency of determining the final sub-graph. In addition, if the initial value of the jumper constraint term is set to a large value, the predicted value of the sub-graph is small in the initial stage of general graph training, so the constraint strength of the jumper constraint term is too large, the search space cannot be fully explored when the controller is updated, and a large deviation (bias) appears in the network structure produced by the controller.
Based on this, the present application provides a network structure searching method, as shown in fig. 3, the method including:
a general graph (whole graph) training step S12: training a first general graph according to a first network structure and training data to generate a second general graph;
a network structure training step S14: determining a plurality of test subgraphs from the second general graph according to the first network structure; testing the plurality of test subgraphs with test data to generate a feedback result; determining a jumper constraint term according to the feedback result; and updating the first network structure according to the feedback result and the jumper constraint term.
The first network structure may be the network structure of the controller in any one training phase, for example, the first network structure may be the network structure of the controller that has never been updated, or the first network structure may be the network structure of the controller that has been updated several times.
In this application, "number" refers to at least one, e.g., number of times refers to at least one, and number of test subgraphs refers to at least one test subgraph.
The first general graph may be a neural network with a preset number of layers. For example, if the preset number of layers is 4 and each layer corresponds to a search space containing 6 operations, the first general graph is a network structure formed by 24 operations and the connection relations between them, with 6 operations in each layer.
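For illustration, the first general graph of this example (4 layers, 6 candidate operations per layer, with a jumper possible between any earlier layer and any later layer) could be represented as follows; the data structure is an assumption made only for this sketch.

```python
from itertools import combinations

OPS = ["conv3x3", "conv5x5", "depthwise3x3",
       "depthwise5x5", "maxpool3x3", "averagepool3x3"]
NUM_LAYERS = 4

# The first general graph: every layer contains all candidate operations,
# and every earlier layer may be connected to every later layer by a jumper.
general_graph = {
    "operations": {layer: list(OPS) for layer in range(NUM_LAYERS)},
    "possible_jumpers": [(i, j) for i, j in combinations(range(NUM_LAYERS), 2)],
}
print(len(general_graph["operations"]) * len(OPS))    # 24 operations in total
print(len(general_graph["possible_jumpers"]))         # number of connectable jumpers
```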
The second general graph need not be a converged graph; however, after training on the training data, its randomness is typically lower than that of the first general graph.
When training the first general graph, a first training subgraph may be determined within the first general graph through the first network structure; a batch of the training data is input into the first training subgraph to generate a first training result; and the first general graph is trained according to the first training result to generate the second general graph. For example, the first general graph may be trained using the first training result and the back propagation (BP) algorithm.
The first training subgraph may be determined within the first general graph, and the first network structure updated, using the method shown in fig. 2. After the general graph training step and the network structure training step are executed cyclically a number of times, a final general graph and a final network structure (i.e., a final controller) are generated; a final sub-graph is then determined in the final general graph through the final network structure, and this final sub-graph is the network structure that fits the preset scenario.
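The alternation between the two steps can be sketched as below; the Controller and GeneralGraph classes, the batch objects, and the stand-in constraint value are illustrative placeholders rather than the actual implementation.

```python
import random

class Controller:
    """Toy stand-in for the first network structure (the controller)."""
    def sample(self):
        return [random.randrange(6) for _ in range(4)]   # one op index per layer
    def update(self, feedbacks, constraint):
        pass                                             # placeholder for formula (1)

class GeneralGraph:
    """Toy stand-in for the general graph with shared parameters."""
    def train_on(self, subgraph, batch):
        pass                                             # placeholder for the BP update
    def evaluate(self, subgraph, batch):
        return random.random()                           # placeholder feedback result

controller, graph = Controller(), GeneralGraph()
train_batches, test_batches = [object()] * 8, [object()] * 2

for epoch in range(3):
    # General graph training step: produces the "second general graph".
    for batch in train_batches:
        graph.train_on(controller.sample(), batch)
    # Network structure training step: feedback results -> jumper constraint -> update.
    feedbacks = [graph.evaluate(controller.sample(), batch) for batch in test_batches]
    constraint = sum(feedbacks) / len(feedbacks)         # stands in for the constraint term
    controller.update(feedbacks, constraint)

final_subgraph = controller.sample()                     # final network structure
```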
Next, a process of updating the first network configuration will be described in detail.
The first network structure may be an LSTM neural network, the search space including, for example, conv3 x 3, conv5 x 5, depthwise3 x 3, depthwise5 x 5, maxpool3 x 3, and averagepool3 x 3.
If the preset number of layers of the network structure to be searched is 20, a 20-layer first general graph needs to be constructed, each layer of which contains all operations of the search space. Each layer of the network structure to be searched corresponds to one time step of the LSTM neural network; without considering jumpers, the LSTM neural network needs to execute 20 time steps. Each time a time step is executed, a cell of the LSTM neural network outputs a hidden state, which can be encoded and mapped into a vector of dimension 6, the vector corresponding to the search space and its 6 dimensions corresponding to the 6 operations of the search space. Subsequently, the vector is processed by the softmax function into a probability distribution, and the LSTM neural network samples according to this probability distribution to obtain the operation of the current layer of the network structure to be searched. The above process is repeated to obtain a network structure (i.e., a subgraph).
Fig. 4 shows an example of determining a network structure.
The blank rectangle represents a cell (cell) of the LSTM neural network, the square containing "conv 3 × 3" and the like represents the operation of the layer in the network structure to be searched, and the circle represents the connection relationship between layers.
The LSTM neural network may perform encoding (encoding) operation on the hidden state, map the hidden state to a vector (vector) with a dimension of 6, convert the vector into probability distribution through a normalized exponential function (softmax), and perform sampling according to the probability distribution to obtain the operation of the current layer.
For example, when performing the first time step, the input quantity (e.g., a random value) to the cell of the LSTM neural network is processed, and the resulting vector is normalized by the softmax function and sampled into an operation (conv3 × 3). When executing the second time step, conv3 × 3 is used as the input quantity of the cell, and the hidden state generated in the first time step is also used as an input; the two inputs are processed to obtain circle 1, which indicates that the output of the current operation layer (the layer corresponding to node 2) and the output of the first operation layer (the layer corresponding to node 1) are concatenated.
Similarly, when executing the third time step, circle 1 is used as the input quantity of the cell, and the hidden state generated in the second time step is also used as an input; the two inputs are processed to obtain sep5 × 5. And so on, until a complete network structure is obtained.
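A minimal PyTorch sketch of this sampling procedure is shown below; it assumes an nn.LSTMCell controller with a 6-way softmax head per time step and simplifies the connection decision to a single Bernoulli jumper choice per layer, which differs from the mechanism a full ENAS controller uses for connection relations.

```python
import torch
import torch.nn as nn

NUM_OPS, HIDDEN, NUM_LAYERS = 6, 32, 4
torch.manual_seed(0)

cell = nn.LSTMCell(HIDDEN, HIDDEN)            # one cell, reused at every time step
op_embedding = nn.Embedding(NUM_OPS, HIDDEN)  # feeds the previous decision back in
op_head = nn.Linear(HIDDEN, NUM_OPS)          # hidden state -> 6-dimensional vector
skip_head = nn.Linear(HIDDEN, 1)              # hidden state -> jumper decision (simplified)

h = torch.zeros(1, HIDDEN)
c = torch.zeros(1, HIDDEN)
inp = torch.zeros(1, HIDDEN)                  # input quantity for the first time step

operations, jumpers, log_probs = [], [], []
for t in range(NUM_LAYERS):
    h, c = cell(inp, (h, c))                  # one time step per layer to be searched
    probs = torch.softmax(op_head(h), dim=-1) # 6-dim vector -> probability distribution
    dist = torch.distributions.Categorical(probs)
    op = dist.sample()                        # sample the operation of the current layer
    operations.append(op.item())
    log_probs.append(dist.log_prob(op))
    if t > 0:                                 # simplified single jumper decision per layer
        p_skip = torch.sigmoid(skip_head(h)).squeeze()
        jumpers.append(bool(torch.bernoulli(p_skip).item()))
    inp = op_embedding(op)                    # sampled operation feeds the next time step

print(operations, jumpers)
```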
Then, a sub-graph is determined from the first general graph based on this network structure, and a batch of the training data is input into the sub-graph to generate a training result, so that the first general graph can be trained based on that training result. Accordingly, a subgraph determined by the first network structure from the first general graph may be referred to as a training subgraph.
For example, the controller updates the parameters of the training subgraph according to the training result and the BP algorithm to complete one iteration; because the training subgraph belongs to the first general graph, updating the parameters of the training subgraph is equivalent to updating the parameters of the first general graph. Subsequently, the controller may determine a training subgraph again from the first general graph after this iteration, input another batch of training data into it, generate another training result, and update the training subgraph again using that training result and the BP algorithm, completing another iteration. The second general graph is obtained after all the training data have been used.
After the second general graph is obtained, its parameters may be fixed and the controller then trained. The first network structure may determine a subgraph, which may be referred to as a test subgraph, from the second general graph according to the method shown in fig. 4; a batch of the test data is input into the test subgraph to obtain a feedback result (e.g., a predicted value). The first network structure can be updated directly using this feedback result, or using the average of a plurality of feedback results, where the plurality of feedback results are obtained by inputting a plurality of batches of the test data into test subgraphs.
During the update of the first network structure, a jumper constraint term can be determined according to the feedback result, and the first network structure is then updated according to the jumper constraint term and the feedback result.
Because the jumper constraint term is related to the feedback result of the current training stage, the current jumper density determined based on the jumper constraint term is better suited to that stage, so training efficiency can be improved while a more reliable network structure is searched.
Optionally, the size of the jumper constraint term is positively correlated with the size of the feedback result.
In the present application, "positively correlated" means: the value of one parameter increases with increasing value of the other parameter or the value of one parameter decreases with decreasing value of the other parameter.
In the initial training phase, the controller needs to explore the search space fully, to avoid a large deviation that would prevent the general graph from converging in subsequent iterations; therefore the jumper density of the controller should not be too small, that is, the value of the jumper constraint term should not be too large. After several iterations, the randomness of the general graph is reduced, i.e., the probability of some operations being sampled decreases; in this case, continuing to sample with a high-jumper-density controller reduces training efficiency and wastes computing power, so a larger jumper constraint term is needed to update the controller.
Because the feedback result (the prediction accuracy of the test subgraph) generally keeps increasing as training progresses, making the jumper constraint term positively correlated with the feedback result lets the value of the constraint term keep increasing as training progresses, thereby balancing performance (i.e., the performance of the subgraph) against training efficiency and computing power during the network structure search.
Optionally, the jumper constraint term comprises cos(1 − R_k)^n, where R_k is the feedback result and n is a hyper-parameter related to the application scenario. The value of n may be any real number greater than 0; for example, n may be 10, 20, 30, 40, 50, 60, 70, or 100, and optionally n may also be greater than or equal to 100.
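Assuming the reconstruction cos(1 − R_k)^n means (cos(1 − R_k)) raised to the power n, the following sketch shows how such a coefficient grows with the feedback result and how n controls how sharply it turns on; the sample values are arbitrary.

```python
import numpy as np

def alpha(R_k, n):
    """Adaptive coefficient of the jumper constraint term: cos(1 - R_k)^n."""
    return np.cos(1.0 - R_k) ** n

for R_k in (0.1, 0.3, 0.5, 0.7, 0.9):   # the feedback result grows as training progresses
    print(f"R_k={R_k:.1f}  alpha(n=10)={alpha(R_k, 10):.4f}  alpha(n=50)={alpha(R_k, 50):.6f}")
```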
Optionally, the jumper constraint term includes a KL divergence (Kullback-Leibler divergence) between the current jumper density and a preset desired jumper density; for example, the jumper constraint term is α · λ · D_KL(p ‖ q), where α = cos(1 − R_k)^n, λ is a hyperparameter, θ_c denotes the parameters of the first network structure (on which the current jumper density depends), q is the preset desired jumper density, and p is the current jumper density. The current jumper density may be obtained based on the feedback results of the test subgraphs. The feedback result includes the prediction accuracy of the test subgraphs on the test data.
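Treating the current and desired jumper densities as Bernoulli parameters, the constraint term could be computed as in the following sketch; the sampled jumper decisions, q = 0.4, and λ = 0.0001 are illustrative values, and the product form α · λ · KL(p ‖ q) follows the reconstruction given above.

```python
import numpy as np

def jumper_density(jumper_decisions):
    """Current jumper density p = (number of current jumpers) / (number of connectable jumpers)."""
    return sum(jumper_decisions) / len(jumper_decisions)

def kl_bernoulli(p, q, eps=1e-8):
    """KL divergence between two Bernoulli densities (current p vs. desired q)."""
    p = min(max(p, eps), 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

sampled_jumpers = [1, 0, 1, 1, 0, 1]   # jumper decisions of the sampled test subgraphs
p = jumper_density(sampled_jumpers)    # current jumper density
q = 0.4                                # preset desired jumper density
R_k, n, lam = 0.8, 10, 1e-4            # feedback result, hyper-parameter n, hyper-parameter lambda
alpha = np.cos(1.0 - R_k) ** n
constraint = alpha * lam * kl_bernoulli(p, q)
print(p, constraint)
```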
Alternatively, the processor may update the first network structure according to equation (1).
J(θ_c) = (1/m) · Σ_{k=1..m} Σ_{t=1..T} log P(a_t | a_(t-1):1; θ_c) · R_k − α · λ · D_KL(p ‖ q)    (1)
where a_t is the operation sampled at the t-th time step, P(a_t | a_(t-1):1; θ_c) is the probability of sampling this operation, m is the number of feedback results used in one update of the first network structure, T is the number of layers of the second general graph, and λ is a hyperparameter, generally set to 0.0001 in classification tasks; different values may be set for specific tasks.
The meaning of formula (1) is: maximize R_k while minimizing the KL divergence; that is, maximize R_k while keeping the current jumper density consistent with the desired jumper density.
In the prior art, R_k gradually increases as the general graph converges, and because the penalty strength generated by λ is constant all the time, q can only be set between 0.4 and 0.6 when constraining the jumpers during actual iteration. Here, q is calculated as (the number of all current jumpers)/(the number of all connectable jumpers) and takes a value between 0 and 1; in the initial state, jumpers are connected randomly, and the jumper density is 0.5.
Compared with the prior art, formula (1) has an additional coefficient α, and α is positively correlated with R_k. During the initial training phase, the predicted values generated by the controller from the test subgraphs determined in the general graph are not accurate enough; therefore R_k is small, α is small, and the penalty of the jumper constraint term is small, so the updated controller has a large jumper density, can fully explore the search space, and avoids a large deviation in the initial training stage. As training progresses, the accuracy of the predicted values generated by the controller from the test subgraphs determined in the general graph increases, R_k gradually increases, α also gradually increases, and the penalty of the jumper constraint term gradually increases; the jumper density of the updated controller becomes small, and the controller no longer fully explores the search space (the randomness of the general graph is reduced, so the controller does not need to explore it fully), thereby improving training efficiency. Furthermore, since the updated controller no longer fully explores the search space, excessive FLOPs are not required, which reduces computing power consumption. As can be seen from the above, α is an adaptive coefficient that can balance the performance of the network structure (i.e., the performance of the subgraph) against computing power consumption during the search, and it is particularly suitable for mobile devices with weak processing capability.
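The following PyTorch sketch shows a controller update consistent with formula (1) as reconstructed above; the use of the mean feedback inside α, the dummy log-probabilities, and the fixed current density are simplifications made for this sketch, not the patented formulation.

```python
import torch

def controller_loss(log_probs, rewards, p_current, q_desired, lam=1e-4, n=10):
    """Surrogate loss for one controller update (a sketch of formula (1)).

    log_probs: list of m tensors, each the sum over T time steps of
               log P(a_t | a_(t-1):1; theta_c) for one sampled test subgraph.
    rewards:   list of m feedback results R_k.
    """
    R = torch.tensor(rewards)
    alpha = torch.cos(1.0 - R.mean()) ** n                 # adaptive coefficient
    kl = (p_current * torch.log(p_current / q_desired)
          + (1 - p_current) * torch.log((1 - p_current) / (1 - q_desired)))
    # Maximize R_k-weighted log-probabilities while minimizing the jumper
    # constraint term, so the loss to *minimize* is the negative objective.
    policy_term = torch.stack([lp * r for lp, r in zip(log_probs, R)]).mean()
    return -policy_term + alpha * lam * kl

# Illustrative call with dummy values (p_current should stay differentiable if it
# depends on the controller's jumper probabilities).
log_probs = [torch.tensor(-3.2, requires_grad=True), torch.tensor(-2.7, requires_grad=True)]
loss = controller_loss(log_probs, rewards=[0.62, 0.70],
                       p_current=torch.tensor(0.55), q_desired=0.4)
loss.backward()
```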
It should be noted that the jumper constraint term containing α is merely an example, and any jumper constraint term that can be adaptively adjusted according to the training phase falls within the scope of the present application.
Examples of the network structure searching method provided by the present application are described above in detail. It is understood that, in order to realize the above functions, the network structure searching apparatus includes corresponding hardware structures and/or software modules for performing the respective functions. Those skilled in the art will readily appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints of the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present application.
The present application may divide the network structure searching apparatus into functional units according to the above method examples; for example, each function may be assigned to a separate functional unit, or two or more functions may be integrated into one functional unit. The functional units may be implemented in hardware or in software. It should be noted that the division of units in the present application is schematic and is only a division of logical functions; other division manners are possible in actual implementation.
Fig. 5 shows a schematic structural diagram of an apparatus for network structure search provided in the present application. The dashed lines in fig. 5 indicate that the unit is an optional unit. The apparatus 500 may be used to implement the methods described in the method embodiments above. The apparatus 500 may be a software module, a chip, a terminal device, or other electronic device.
The apparatus 500 comprises one or more processing units 501, and the one or more processing units 501 may support the apparatus 500 to implement the method in the method embodiment corresponding to fig. 3. The processing unit 501 may be a software processing unit, a general purpose processor, or a special purpose processor. The processing unit 501 may be configured to control the apparatus 500, execute a software program (e.g. for implementing the method according to the first aspect), and process data (e.g. predicted values). The apparatus 500 may further include a communication unit 505 to enable input (reception) and output (transmission) of signals.
For example, the apparatus 500 may be a software module, and the communication unit 505 may be an interface function of the software module. The software modules may run on a processor or control circuitry.
Also for example, the apparatus 500 may be a chip, and the communication unit 505 may be an input and/or output circuit of the chip, or the communication unit 505 may be a communication interface of the chip, and the chip may be a component of a terminal device or other electronic devices.
In the apparatus 500, the processing unit 501 may perform:
a general graph training step: training a first general graph according to the first network structure and training data to generate a second general graph; and
a network structure training step: determining a plurality of test subgraphs from the second general graph according to the first network structure; testing the plurality of test subgraphs with test data to generate a feedback result; determining a jumper constraint term according to the feedback result; and updating the first network structure according to the feedback result and the jumper constraint term.
Optionally, the size of the jumper constraint term is positively correlated with the size of the feedback result.
Optionally, the jumper constraint term comprises cos(1 − R_k)^n, where R_k is the feedback result and n is a hyper-parameter related to the application scenario.
Alternatively, 0< n ≦ 100.
Optionally, the jumper constraint term includes a KL divergence between the current jumper density and a preset desired jumper density.
Optionally, the jumper constraint term is α · λ · D_KL(p ‖ q), where α = cos(1 − R_k)^n, λ is a hyperparameter, θ_c denotes the parameters of the first network structure (on which the current jumper density depends), q is the preset desired jumper density, and p is the current jumper density.
Optionally, the current jumper density is derived based on the number of test subgraphs.
Optionally, the feedback result includes the prediction accuracy of the plurality of test subgraphs on the test data.
Optionally, the processing unit 501 is specifically configured to: determine a first training subgraph within the first general graph through the first network structure; input a batch of the training data into the first training subgraph to generate a first training result; and train the first general graph according to the first training result to generate the second general graph.
Optionally, the processing unit 501 is further configured to: after the general graph training step and the network structure training step are executed cyclically a plurality of times, generate a final general graph and a final network structure; and determine, through the final network structure, a final sub-graph in the final general graph, wherein the final sub-graph is a network structure conforming to a preset scenario.
It will be clear to those skilled in the art that for the convenience and simplicity of description, the specific operation and effects of the above-mentioned devices and units can be referred to the relevant description of the corresponding method embodiments in fig. 1 to 4. For brevity, no further description is provided herein.
As an alternative, the above steps may be performed by logic circuits in the form of hardware or by instructions in the form of software. For example, the processing unit 501 may be a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The apparatus 500 may comprise one or more storage units 502, in which a program 504 (e.g. a software program including the method according to the second aspect) is stored, the program 504 may be executed by the processing unit 501, and generates an instruction 503, so that the processing unit 501 executes the method described in the above method embodiments according to the instruction 503. Optionally, the storage unit 502 may also have data (e.g., prediction value and jumper density) stored therein. Alternatively, the processing unit 501 may also read data stored in the storage unit 502, the data may be stored at the same storage address as the program 504, and the data may be stored at a different storage address from the program 504.
The processing unit 501 and the storage unit 502 may be separately disposed, or may be integrated together, for example, on a single board or a System On Chip (SOC).
The present application also provides a computer program product which, when executed by the processing unit 501, implements the method according to any of the embodiments of the present application.
The computer program product may be stored in the storage unit 502, for example, the program 504, and the program 504 is finally converted into an executable object file capable of being executed by the processing unit 501 through preprocessing, compiling, assembling, linking, and the like.
The computer program product may be transmitted from one computer readable storage medium to another computer readable storage medium, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.).
The present application also provides a computer-readable storage medium (e.g., storage unit 502) having stored thereon a computer program which, when executed by a computer, implements the method of any of the embodiments of the present application. The computer program may be a high-level language program or an executable object program.
The computer readable storage medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., Digital Video Disk (DVD)), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others. For example, the computer-readable storage medium may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The non-volatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
It should be understood that, in the embodiments of the present application, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the inherent logic of the processes, and should not constitute any limitation to the implementation process of the embodiments of the present application.
The term "and/or" herein is merely an association relationship describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
The system, apparatus and method disclosed in the embodiments provided in the present application can be implemented in other ways. For example, some features of the method embodiments described above may be omitted, or not performed. The above-described embodiments of the apparatus are merely exemplary, the division of the unit is only one logical function division, and there may be other division ways in actual implementation, and a plurality of units or components may be combined or integrated into another system. In addition, the coupling between the units or the coupling between the components may be direct coupling or indirect coupling, and the coupling includes electrical, mechanical or other connections.
In short, the above description is only a part of the embodiments of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (23)

1. A network structure search method, comprising:
a general graph training step: training a first general graph according to a first network structure and training data to generate a second general graph; and
a network structure training step: determining a plurality of test subgraphs from the second general graph according to the first network structure; testing the plurality of test subgraphs with test data to generate a feedback result; determining a jumper constraint term according to the feedback result; and updating the first network structure according to the feedback result and the jumper constraint term.
2. The method of claim 1, wherein the size of the jumper constraint term is positively correlated to the size of the feedback result.
3. The method of claim 2, wherein the jumper constraint term comprises cos(1 − R_k)^n, where R_k is the feedback result and n is a hyper-parameter related to the application scenario.
4. The method of claim 3, wherein 0< n ≦ 100.
5. The method according to any one of claims 1 to 4, wherein the jumper constraint term includes a KL divergence between a current jumper density and a preset desired jumper density.
6. The method of claim 5, wherein the jumper constraint term is α · λ · D_KL(p ‖ q), where α = cos(1 − R_k)^n, λ is a hyperparameter, θ_c denotes parameters of the first network structure, q is the preset desired jumper density, and p is the current jumper density.
7. The method of claim 5 or 6, wherein the current jumper density is derived based on the number of test subgraphs.
8. The method of any one of claims 1 to 7, wherein the feedback result comprises the prediction accuracy of the plurality of test subgraphs on the test data.
9. The method of any one of claims 1 to 8, wherein training the first general graph according to the first network structure and the training data to generate the second general graph comprises:
determining a first training subgraph within the first general graph through the first network structure;
inputting a batch of the training data into the first training subgraph to generate a first training result;
and training the first general graph according to the first training result to generate the second general graph.
10. The method of any one of claims 1 to 9, further comprising:
after the general graph training step and the network structure training step are executed cyclically a plurality of times, generating a final general graph and a final network structure;
and determining, through the final network structure, a final sub-graph in the final general graph, wherein the final sub-graph is a network structure conforming to a preset scenario.
11. A network structure search apparatus, comprising a processing unit configured to perform:
a general graph training step: training a first general graph according to a first network structure and training data to generate a second general graph; and
a network structure training step: determining a plurality of test subgraphs from the second general graph according to the first network structure; testing the plurality of test subgraphs with test data to generate a feedback result; determining a jumper constraint term according to the feedback result; and updating the first network structure according to the feedback result and the jumper constraint term.
12. The apparatus of claim 11, wherein the size of the jumper constraint term is positively correlated to the size of the feedback result.
13. The apparatus of claim 12, wherein the jumper constraint term comprises cos(1 − R_k)^n, where R_k is the feedback result and n is a hyper-parameter related to the application scenario.
14. The apparatus of claim 13, wherein 0< n ≦ 100.
15. The apparatus of any of claims 11-14, wherein the jumper constraint term comprises a KL divergence between a current jumper density and a preset desired jumper density.
16. The apparatus of claim 15, wherein the jumper constraint term is α · λ · D_KL(p ‖ q), where α = cos(1 − R_k)^n, λ is a hyperparameter, θ_c denotes parameters of the first network structure, q is the preset desired jumper density, and p is the current jumper density.
17. The apparatus of claim 15 or 16, wherein the current jumper density is derived based on the number of test subgraphs.
18. The apparatus of any one of claims 11 to 17, wherein the feedback result comprises the prediction accuracy of the plurality of test subgraphs on the test data.
19. The apparatus according to any one of claims 11 to 18, wherein the processing unit is specifically configured to:
determining a first training subgraph within the first general graph through the first network structure;
inputting a batch of data in the training data into the first training subgraph to generate a first training result;
and training the first general graph according to the first training result to generate the second general graph.
20. The apparatus according to any one of claims 11 to 19, wherein the processing unit is further configured to:
after the general graph training step and the network structure training step are executed cyclically a plurality of times, generating a final general graph and a final network structure;
and determining, through the final network structure, a final sub-graph in the final general graph, wherein the final sub-graph is a network structure conforming to a preset scenario.
21. A network structure search device characterized by comprising: a memory for storing instructions and a processor for executing the instructions stored by the memory, and execution of the instructions stored in the memory causes the processor to perform the method of any of claims 1 to 10.
22. A computer storage medium, having stored thereon a computer program which, when executed by a computer, causes the computer to perform the method of any one of claims 1 to 10.
23. A computer program product comprising instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1 to 10.
CN201980031708.4A 2019-10-30 2019-10-30 Method, apparatus, storage medium, and computer program product for network structure search Pending CN112106077A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/114361 WO2021081809A1 (en) 2019-10-30 2019-10-30 Network architecture search method and apparatus, and storage medium and computer program product

Publications (1)

Publication Number Publication Date
CN112106077A (en) 2020-12-18

Family

ID=73750057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980031708.4A Pending CN112106077A (en) 2019-10-30 2019-10-30 Method, apparatus, storage medium, and computer program product for network structure search

Country Status (2)

Country Link
CN (1) CN112106077A (en)
WO (1) WO2021081809A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6963627B2 (en) * 2017-07-21 2021-11-10 グーグル エルエルシーGoogle LLC Neural architecture search for convolutional neural networks
EP3688673A1 (en) * 2017-10-27 2020-08-05 Google LLC Neural architecture search
CN109934336B (en) * 2019-03-08 2023-05-16 江南大学 Neural network dynamic acceleration platform design method based on optimal structure search and neural network dynamic acceleration platform
CN110009048B (en) * 2019-04-10 2021-08-24 苏州浪潮智能科技有限公司 Method and equipment for constructing neural network model

Also Published As

Publication number Publication date
WO2021081809A1 (en) 2021-05-06

Similar Documents

Publication Publication Date Title
US11610131B2 (en) Ensembling of neural network models
US11106978B2 (en) Execution of a genetic algorithm with variable evolutionary weights of topological parameters for neural network generation and training
US11853893B2 (en) Execution of a genetic algorithm having variable epoch size with selective execution of a training algorithm
CN109376867B (en) Processing method and device of two-quantum-bit logic gate
US20220147877A1 (en) System and method for automatic building of learning machines using learning machines
CN111105029B (en) Neural network generation method, generation device and electronic equipment
CN111353601A (en) Method and apparatus for predicting delay of model structure
JP6325762B1 (en) Information processing apparatus, information processing method, and information processing program
CN111656365A (en) Method and apparatus for network structure search, computer storage medium, and computer program product
Zhao et al. Dynamic regret of online markov decision processes
CN110009048B (en) Method and equipment for constructing neural network model
CN116938323B (en) Satellite transponder resource allocation method based on reinforcement learning
WO2024051655A1 (en) Method and apparatus for processing histopathological whole-slide image, and medium and electronic device
CN115718869A (en) Model training method, system, cluster and medium
CN111063000B (en) Magnetic resonance rapid imaging method and device based on neural network structure search
CN116151384B (en) Quantum circuit processing method and device and electronic equipment
CN114372539B (en) Machine learning framework-based classification method and related equipment
CN112106077A (en) Method, apparatus, storage medium, and computer program product for network structure search
CN114595641A (en) Method and system for solving combined optimization problem
WO2021146977A1 (en) Neural architecture search method and apparatus
CN115688042A (en) Model fusion method, device, equipment and storage medium
CN116579435B (en) Quantum circuit classification method, quantum circuit classification device, electronic equipment, medium and product
WO2020237688A1 (en) Method and device for searching network structure, computer storage medium and computer program product
CN111684471A (en) Method and apparatus for network structure search, computer storage medium, and computer program product
US20240104415A1 (en) System and method for improving the efficiency of inputs to quantum computational devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination