WO2020082663A1 - Structural search method and apparatus for deep neural network - Google Patents

Structural search method and apparatus for deep neural network

Info

Publication number
WO2020082663A1
WO2020082663A1 (PCT/CN2019/077049)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
module
information flow
computing unit
scaling operator
Prior art date
Application number
PCT/CN2019/077049
Other languages
French (fr)
Chinese (zh)
Inventor
黄泽昊
张新邦
王乃岩
Original Assignee
北京图森未来科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京图森未来科技有限公司
Publication of WO2020082663A1 publication Critical patent/WO2020082663A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • The present application relates to the field of artificial intelligence technology, and in particular to a deep neural network structure search method and device.
  • Deep neural networks have achieved great success in many fields, such as computer vision and natural language processing. Through their powerful representation capabilities, deep neural networks replace traditional hand-designed features with end-to-end learning.
  • However, current deep neural networks have complex structures, with many computing unit nodes such as convolution and pooling, so searching among these many computing unit nodes for a compact, fast, and effective model structure has become difficult.
  • In the prior art, a search space is generally defined first, and then an optimal network structure is searched for in the search space.
  • Typically, heuristic controller-based network structure search methods or evolutionary algorithms are used for the network structure search.
  • However, a controller then needs to be trained, or an evolutionary algorithm needs to be used, to search the network structure.
  • During the search, the sub-networks in the full set need to be trained to convergence in order to evaluate them, so the time and computation required for the network structure search are enormous; for larger data sets, the process of finding the optimal network structure with such methods is cumbersome and slow.
  • The embodiments of the present application provide a deep neural network structure search method and device to solve the prior art problems that the time and computation required for network structure search are enormous and that, for larger data sets, the process of finding the optimal network structure is cumbersome and slow.
  • This application provides a deep neural network structure search method, including:
  • obtaining, in a preset search space, each layer of computing unit structure in each module connected in series in the deep neural network, where each layer of computing unit structure includes at least one computing unit;
  • connecting the computing units in each module in a preset connection manner to obtain the information flows in each module, where computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located and to the input and output of that module;
  • obtaining an initial neural network according to the modules and the connections of the computing units in each module;
  • setting a sparse scaling operator on the information flows in the initial neural network, where the sparse scaling operator is used to scale the information flows;
  • training the weights of the initial neural network and the sparse scaling operators of the information flows using preset training sample data to obtain an intermediate neural network; and
  • deleting the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain a search result neural network in the search space.
  • this application provides a target detection method, including:
  • the sample data to be subjected to target detection is obtained and input into the search result neural network obtained by using the above-mentioned deep neural network structure search method, and the output of the search result neural network is used as the target detection result.
  • this application provides a semantic segmentation method, including:
  • the sample data to be semantically segmented is obtained and input into the search result neural network obtained by using the structure search method of the deep neural network described above, and the output of the search result neural network is used as the semantic segmentation result.
  • the present application provides a deep neural network structure search device, including:
  • a computing unit structure obtaining unit configured to obtain each layer of computing unit structures in each module connected in series in the deep neural network in a preset search space; each layer of computing unit structure includes at least one computing unit;
  • the information flow obtaining unit is used to connect the computing units in each module in a preset connection manner to obtain the information flows in each module; wherein computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located, and to the input and output of the module where it is located;
  • the initial neural network obtaining unit is used to obtain the initial neural network according to the connection of the modules and the computing units in each module;
  • a sparse scaling operator setting unit configured to set a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow;
  • a weight and operator training unit used to train the weight of the initial neural network and the sparse scaling operator of the information flow using preset training sample data to obtain an intermediate neural network
  • a search result obtaining unit is used to delete the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain a search result neural network in the search space.
  • the present application provides a computer-readable storage medium on which a computer program is stored, which is characterized in that, when the program is executed by a processor, the above deep neural network structure search method is implemented.
  • The present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the program, the deep neural network structure search method described above is implemented.
  • An embodiment of the present application provides a method and device for searching the structure of a deep neural network.
  • First, each layer of computing unit structure in each module connected in series in the deep neural network is obtained in a preset search space, where each layer of computing unit structure includes at least one computing unit. Then, the computing units in each module are connected in a preset connection manner to obtain the information flows in each module, where computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located and to the input and output of that module. Next, an initial neural network is obtained according to the modules and the connections of the computing units in each module; a sparse scaling operator is set on each information flow in the initial neural network, where the sparse scaling operator is used to scale the information flow; the weights of the initial neural network and the sparse scaling operators of the information flows are trained using preset training sample data to obtain an intermediate neural network; and, further, the information flows whose sparse scaling operators are zero are deleted from the intermediate neural network to obtain the search result neural network in the search space.
  • Unlike the prior art, which searches for important network structures directly in the search space, this application uses sparse scaling operators to delete unimportant information flows and thereby implements the network structure search.
  • During the network structure search, this application does not need to train a controller, does not need to use complex evolutionary algorithms, and does not need to train sub-networks for a long time; the search result can be obtained merely by training the weights and the sparse scaling operators.
  • This greatly reduces the network structure search time and, especially for network structure searches on large-scale data sets, saves substantial search time.
  • FIG. 1 is a first flowchart of a deep neural network structure search method provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of the network structure in the search space of the deep neural network involved in an embodiment of this application;
  • FIG. 3 is a schematic diagram of an example in which an embodiment of the present application is applied to the search of a two-layer network;
  • FIG. 4 is a schematic structural diagram of a deep neural network structure search device provided by an embodiment of the present application.
  • DNN: Deep Neural Network.
  • Computing unit: a unit node in a neural network that performs calculations such as convolution and pooling.
  • Network structure search: the process of searching for the optimal network structure in a neural network.
  • an embodiment of the present application provides a structure search method for a deep neural network, including:
  • Step 101: Obtain, in a preset search space, each layer of computing unit structure in each module connected in series in the deep neural network.
  • Each layer of computing unit structure includes at least one computing unit.
  • Step 102: Connect the computing units in each module in a preset connection manner to obtain the information flows in each module.
  • Here, computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located and to the input and output of that module.
  • Step 103: Obtain an initial neural network according to the modules and the connections of the computing units in each module.
  • Step 104: Set a sparse scaling operator on the information flows in the initial neural network, where the sparse scaling operator is used to scale the information flows.
  • Step 105: Use the preset training sample data to train the weights of the initial neural network and the sparse scaling operators of the information flows to obtain an intermediate neural network.
  • Step 106: Delete the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain a search result neural network in the search space.
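  • As a concrete illustration of steps 101 and 102, the following is a minimal, self-contained Python sketch of how the fully connected candidate structure inside each module could be enumerated. The module, layer, and unit counts and the tuple-based representation of nodes and flows are illustrative assumptions, not part of the embodiment.

```python
from itertools import product

# Illustrative sizes (assumptions, not from the embodiment): 2 modules, M=2 layers, N=2 units per layer.
NUM_MODULES, M, N = 2, 2, 2

def build_module_flows(num_layers, units_per_layer):
    """Enumerate the candidate information flows inside one module.

    Nodes: ("in",) is the module input, ("out",) the module output, and
    (layer, unit) a computing unit.  Units in the same layer are not
    connected; every unit may receive the module input and the outputs of
    units in earlier layers, and may feed the module output.
    """
    flows = []
    for layer, unit in product(range(1, num_layers + 1), range(1, units_per_layer + 1)):
        flows.append((("in",), (layer, unit)))                    # module input -> unit
        for prev in product(range(1, layer), range(1, units_per_layer + 1)):
            flows.append((prev, (layer, unit)))                   # earlier-layer unit -> unit
        flows.append(((layer, unit), ("out",)))                   # unit -> module output
    return flows

if __name__ == "__main__":
    for b in range(NUM_MODULES):                                  # modules are chained in series
        print(f"module {b}: {len(build_module_flows(M, N))} candidate information flows")
```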
  • In the deep neural network, the preset search space can be as shown in Figure 2 and can include multiple modules 21 connected in series, that is, the output of the previous module is the input of the next module. Each module 21 (which can be regarded as a directed acyclic graph) can include a multi-layer computing unit structure 22, and each layer of computing unit structure 22 includes at least one computing unit 23 (each computing unit can be regarded as a node in the directed acyclic graph). The computing units 23 in each layer of computing unit structure 22 can generally include at least one of a convolution calculation unit and a pooling calculation unit.
  • The convolution calculation unit can also be a dilated convolution calculation unit, a group convolution calculation unit, or the like.
  • Preferably, the above step 102 can be implemented as follows:
  • In each module 21, the computing units 23 are connected in a fully connected manner; that is, as shown in FIG. 2, each computing unit 23 is connected to the computing units 23 at different layers of the module 21 where it is located, and to the input and output of that module 21. In this way, the information flows from the input of the module 21 to each layer of computing unit structure 22, from each layer of computing unit structure 22 to the output of the module 21, and between the computing units 23 are obtained (each flow can be regarded as an edge between nodes in a directed acyclic graph), and the full set of network structures in the search space is obtained (any network structure in the search space can be regarded as a subgraph of the directed acyclic graph).
  • For example, in a module 21, the output h(i) of the i-th computing unit F(i)(x) equals the result of applying F(i)(x) to the sum of the outputs h(j) of all preceding computing units, which can be expressed as:
    h(i) = F(i)( Σ_{j=1}^{i-1} h(j) )        (1)
  • Further, after step 103, the weights of the initial neural network may be configured so as to initialize them.
  • Alternatively, preset pre-training sample data may be used to pre-train the weights of the initial neural network to obtain a pre-trained initial neural network, so that better initial weights are obtained after pre-training.
  • In either case, the weights are configured or pre-trained in order to obtain initial weight values for the initial neural network, which facilitates the subsequent setting and training of the sparse scaling operators.
  • In step 104, a sparse scaling operator needs to be set for each information flow in the initial neural network; for example, a sparse scaling operator λ^(i,j), representing the information flow from the j-th computing unit to the i-th computing unit, is added at the output h(j) of each preceding computing unit.
  • After the sparse scaling operators are added, the above formula (1) is expressed as:
    h(i) = F(i)( Σ_{j=1}^{i-1} λ^(i,j) h(j) )        (2)
  • The value of each sparse scaling operator is greater than or equal to 0.
  • For example, after the weights of the initial neural network are simply configured (initialized), the value interval of the sparse scaling operator may be [0, 1], and the sparse scaling operator is not necessarily equal to 1.
  • After the weights of the initial neural network are pre-trained with preset pre-training sample data, the value of the sparse scaling operator is generally taken to be 1.
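  • As an illustration of formula (2), the following minimal numpy sketch computes one computing unit's output from the λ-scaled sum of the preceding outputs. The concrete operation F (a fixed linear map followed by a ReLU) and the vector sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit_output(F, prev_outputs, lambdas):
    """Formula (2): h(i) = F(i)( sum_j lambda^(i,j) * h(j) ), with every lambda >= 0."""
    assert all(l >= 0 for l in lambdas)
    aggregated = sum(l * h for l, h in zip(lambdas, prev_outputs))
    return F(aggregated)

# Illustrative computing unit F: a fixed linear map followed by a ReLU (assumption).
W = rng.normal(size=(8, 8))
F = lambda x: np.maximum(W @ x, 0.0)

prev = [rng.normal(size=8) for _ in range(3)]   # outputs h(j) of three preceding units
lams = np.ones(3)                               # operators initialised to 1 after pre-training
print(unit_output(F, prev, lams).shape)         # (8,)
```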
  • The following describes the search of a convolutional neural network structure as an example. In a convolutional neural network structure, the computing units are the convolution calculation units and pooling calculation units, and the information flows are the feature maps in the network.
  • The convolutional neural network structure contains several modules; each module contains several layers of computing unit structure, and each layer of computing unit structure includes several different computing units (for example, 1×1 convolution, 3×3 convolution, 5×5 convolution, pooling, etc., and is not limited to these).
  • The modules are connected in series, that is, the output of the previous module is the input of the next module, and each computing unit is connected to the computing units at different layers of the module where it is located, and to the input and output of that module.
  • In this way, the output of each computing unit can be expressed.
  • For example, the output of the j-th computing unit in the i-th layer of the b-th module can be expressed as:
    h(b,i,j) = F(b,i,j)( Σ_{m=1}^{i-1} Σ_{n=1}^{N} λ^(b,i,j)_(m,n) h(b,m,n) + λ^(b,i,j)_(0,0) O(b-1) )        (3)
  • where F(b,i,j)(x) denotes the computation of the j-th computing unit in the i-th layer of the b-th module; N denotes the total number of computing units contained in one layer of computing unit structure; λ^(b,i,j)_(m,n) denotes the sparse scaling operator of the information flow from the n-th computing unit in the m-th layer of the b-th module to the j-th computing unit in the i-th layer of the b-th module; h(b,m,n) denotes the output of the n-th computing unit in the m-th layer of the b-th module; O(b-1) denotes the output of the (b-1)-th module, that is, the input of the b-th module; and λ^(b,i,j)_(0,0) denotes the sparse scaling operator of the information flow from the module input O(b-1) to the j-th computing unit in the i-th layer of the b-th module. Here, h(b,0,0) = O(b-1) is taken as the input of the b-th module and h(b,M+1,0) = O(b) as its output, where M denotes the total number of layers contained in the b-th module; a computing unit located in the m-th layer therefore has (m-1)N+1 inputs.
  • It should be noted that, in the embodiments of the present application, the connection between each computing unit and the output of the module where it is located can also be trained and learned.
  • For example, in the above convolutional neural network, the output O(b) of the b-th module can be obtained by concatenating the outputs of all computing units in the module and then using a convolution with kernel size 1 to reduce the number of channels of the feature map so as to keep the number of channels unchanged, as shown in the following formula:
    O(b) = R( λ_out^(b,1,1) h(b,1,1), ..., λ_out^(b,M,N) h(b,M,N), O(b-1) )        (4)
  • where h(b,m,n) denotes the output of the n-th computing unit in the m-th layer of the b-th module; λ_out^(b,m,n) denotes the scaling operator of the information flow connecting the n-th computing unit in the m-th layer of the b-th module to the output of that module; O(b-1) denotes the output of the (b-1)-th module, that is, the input of the b-th module; and R(x) denotes the concatenation of the feature maps followed by a convolution with kernel size 1, which is used to fuse the feature maps and keep the number of channels output by the module unchanged.
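  • The module output described above can be illustrated with the following minimal numpy sketch: the λ-scaled outputs of the computing units are concatenated along the channel axis and fused by a convolution with kernel size 1 that restores the channel count. The random 1×1 kernel standing in for the learned fusion R(x), and the tensor sizes, are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def module_output(unit_outputs, out_lambdas, kernel_1x1):
    """Concatenate lambda-scaled unit outputs (N,C,H,W each) and apply a 1x1 convolution."""
    scaled = [l * h for l, h in zip(out_lambdas, unit_outputs)]
    stacked = np.concatenate(scaled, axis=1)              # concatenate along the channel axis
    n, c_in, hgt, wid = stacked.shape
    flat = stacked.transpose(0, 2, 3, 1).reshape(-1, c_in)
    fused = flat @ kernel_1x1                              # 1x1 conv == per-pixel linear map over channels
    return fused.reshape(n, hgt, wid, -1).transpose(0, 3, 1, 2)

units = [rng.normal(size=(1, 16, 8, 8)) for _ in range(4)]   # 4 unit outputs, 16 channels each
lams = np.array([1.0, 0.0, 0.7, 1.0])                         # a zero operator suppresses that unit's flow
kernel = rng.normal(size=(16 * 4, 16))                        # 1x1 conv kernel: 64 -> 16 channels
print(module_output(units, lams, kernel).shape)               # (1, 16, 8, 8)
```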
  • For the above step 105, it can be implemented as follows:
  • Step S1. Construct an objective function corresponding to the initial neural network.
  • The objective function includes a loss function, a weight regularization function, and a sparse regularization function, and can be written as:
    min_{W,λ} (1/K) Σ_{i=1}^{K} L(y_i, Net(x_i, W, λ)) + R(W) + γ||λ||_1        (5)
  • where W is the weight; λ is the sparse scaling operator vector; K is the number of sample data; L(y_i, Net(x_i, W, λ)) is the loss of the neural network on the sample data x_i, and y_i is the sample label; Net(x_i, W, λ) is the output of the neural network; R(W) is the weight regularization function, and δ is the parameter decay weight of the weights W; γ||λ||_1 is the sparse regularization function, denoted Rs(λ).
  • The sparse regularization function γ||λ||_1 can also be replaced by more complex sparse constraints, such as non-convex sparse constraints.
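  • A minimal numpy sketch of evaluating an objective of this form on a toy network is given below: the average loss over K samples plus a weight regularization term and the L1 sparse regularization γ||λ||_1. The tiny two-branch network Net and the squared-error loss L are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(W, lam, xs, ys, delta=1e-4, gamma=1e-2):
    """(1/K) * sum_i L(y_i, Net(x_i, W, lam)) + delta*||W||^2 + gamma*||lam||_1."""
    def net(x):                        # toy Net: two lambda-scaled branches, then a linear map
        branches = lam[0] * np.tanh(x) + lam[1] * np.maximum(x, 0.0)
        return W @ branches
    losses = [np.mean((net(x) - y) ** 2) for x, y in zip(xs, ys)]   # squared-error loss L
    return np.mean(losses) + delta * np.sum(W ** 2) + gamma * np.sum(np.abs(lam))

W = rng.normal(size=(4, 4))
lam = np.array([1.0, 1.0])             # one sparse scaling operator per information flow
xs = [rng.normal(size=4) for _ in range(8)]
ys = [rng.normal(size=4) for _ in range(8)]
print(objective(W, lam, xs, ys))
```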
  • Step S2. Perform iterative training on the initial neural network using the training sample data.
  • Step S3. When the number of training iterations reaches a threshold, or the objective function meets a preset convergence condition, the intermediate neural network is obtained.
  • The foregoing step S2 may be implemented by performing the following iterative training on the initial neural network multiple times. Taking one iteration that is neither the first nor the last (hereinafter referred to as the current iteration) as an example, one iteration of training includes the following steps C1 to C3:
  • Step C1. Use the sparse scaling operators obtained in the previous iteration as constants of the objective function and the weights as variables of the objective function, and optimize the objective function with a first optimization algorithm to obtain the weights of the current iteration;
  • Step C2. Use the weights of the current iteration as constants of the objective function and the sparse scaling operators as variables of the objective function, and optimize the objective function with a second optimization algorithm to obtain the sparse scaling operators of the current iteration;
  • Step C3. Perform the next iteration of training based on the weights and sparse scaling operators of the current iteration.
  • The first iteration proceeds as follows: the initial sparse scaling operators are used as constants of the objective function and the weights as variables, and the objective function is optimized with the first optimization algorithm to obtain the weights of this iteration; then the weights of this iteration are used as constants and the sparse scaling operators as variables, and the objective function is optimized with the second optimization algorithm to obtain the sparse scaling operators of this iteration; the second iteration is then performed based on these weights and sparse scaling operators.
  • The last iteration proceeds as follows: the sparse scaling operators obtained in the previous iteration are used as constants of the objective function and the weights as variables, and the objective function is optimized with the first optimization algorithm to obtain the weights of this iteration; then the weights of this iteration are used as constants and the sparse scaling operators as variables, and the objective function is optimized with the second optimization algorithm to obtain the sparse scaling operators of this iteration; the neural network containing the sparse scaling operators and weights obtained in this last iteration is taken as the intermediate neural network.
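  • The following minimal numpy sketch illustrates the alternating scheme of steps C1 to C3 on a toy objective of the same form: the weight step uses plain gradient descent with numerically estimated gradients, and the sparse scaling operator step uses one proximal gradient update with soft-thresholding. The step sizes, the finite-difference gradients, and the tiny network are illustrative assumptions, not the embodiment's exact algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

def smooth_loss(W, lam, xs, ys, delta=1e-4):
    net = lambda x: W @ (lam[0] * np.tanh(x) + lam[1] * np.maximum(x, 0.0))
    data = np.mean([np.mean((net(x) - y) ** 2) for x, y in zip(xs, ys)])
    return data + delta * np.sum(W ** 2)            # loss plus weight regularizer (no L1 term)

def num_grad(f, v, eps=1e-5):                       # finite-difference gradient (illustrative only)
    g = np.zeros_like(v)
    for idx in np.ndindex(v.shape):
        d = np.zeros_like(v); d[idx] = eps
        g[idx] = (f(v + d) - f(v - d)) / (2 * eps)
    return g

def soft_threshold(z, thr):                         # S_eta(z)_i = sign(z_i) * max(|z_i| - eta, 0)
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)

W = rng.normal(size=(4, 4))
lam = np.ones(2)
xs = [rng.normal(size=4) for _ in range(8)]
ys = [rng.normal(size=4) for _ in range(8)]
gamma, lr_w, lr_lam = 1e-2, 1e-2, 1e-1

for it in range(50):
    # C1: fix lambda, update the weights W on the smooth part of the objective
    W -= lr_w * num_grad(lambda w: smooth_loss(w, lam, xs, ys), W)
    # C2: fix W, take one proximal gradient step on lambda for the L1-regularised part
    g = num_grad(lambda l: smooth_loss(W, l, xs, ys), lam)
    lam = np.maximum(soft_threshold(lam - lr_lam * g, lr_lam * gamma), 0.0)  # keep lambda >= 0
    # C3: the next iteration reuses the updated W and lambda

print("lambda after training:", lam)   # entries at exactly 0 would mark prunable flows
```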
  • The first optimization algorithm may be, but is not limited to, any one of the following: a stochastic gradient descent algorithm, or a variant of stochastic gradient descent that introduces momentum.
  • The second optimization algorithm may be, but is not limited to, any one of the following: an accelerated proximal gradient descent algorithm, a proximal gradient descent algorithm, or an alternating direction method of multipliers algorithm.
  • When the sparse scaling operators λ are fixed, the objective function reduces to min_W (1/K) Σ_{i=1}^{K} L(y_i, Net(x_i, W, λ)) + R(W), and the value of W can be obtained using the stochastic gradient descent algorithm; the specific process is not described in detail here.
  • When the weights W are fixed, the objective function reduces to min_λ G(λ) + γ||λ||_1, where G(λ) = (1/K) Σ_{i=1}^{K} L(y_i, Net(x_i, W, λ)), and the accelerated proximal gradient descent algorithm is used to solve for λ, which can be obtained by, but is not limited to, the following modes.
  • Mode 1 calculates λ according to formulas (6) to (8), where η_t denotes the gradient descent step size in the t-th training iteration and S_η(·) is the soft-thresholding operator defined componentwise as S_η(z)_i = sign(z_i)(|z_i| - η)_+, with (·)_+ denoting the positive part.
  • Mode 2 updates the formulas of the foregoing Mode 1 to obtain formulas (9) to (11), and calculates λ according to formulas (9) to (11).
  • Mode 3 uses a variable substitution method, that is, formulas (12) to (14) are used to calculate λ, based on the substitution λ'_{t-1} = λ_{t-1} + μ_{t-1} v_{t-1}.
  • In this way, W and λ are updated in the form of batch stochastic gradient descent.
  • In the above step 106, the information flows whose sparse scaling operators are zero can be deleted from the intermediate neural network to obtain the search result neural network in the search space.
  • If all information flows connected to a computing unit are deleted, that computing unit has no effect on subsequent calculations, and the computing unit may also be deleted.
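  • A minimal Python sketch of this pruning step is given below: information flows whose trained sparse scaling operator is zero are removed first, and any computing unit left with no connected flow is removed as well. The graph representation is an illustrative assumption.

```python
def prune(flows, lambdas, units):
    """Drop flows with a zero sparse scaling operator, then drop dangling computing units.

    flows   : list of (src, dst) pairs; computing units appear as (layer, unit) tuples
    lambdas : dict mapping each flow to its trained sparse scaling operator
    units   : set of (layer, unit) computing units
    """
    kept_flows = [f for f in flows if lambdas[f] > 0.0]
    while True:
        connected = {node for f in kept_flows for node in f}
        dangling = {u for u in units if u not in connected}
        if not dangling:
            break
        units = units - dangling
        kept_flows = [f for f in kept_flows if all(node not in dangling for node in f)]
    return kept_flows, units

# Two levels with one unit each (cf. the FIG. 3 style of example): both flows touching unit (1, 1)
# have lambda 0, so those flows and then unit (1, 1) itself disappear from the result.
units = {(1, 1), (2, 1)}
flows = [(("in",), (1, 1)), (("in",), (2, 1)), ((1, 1), (2, 1)), ((2, 1), ("out",))]
lams = {flows[0]: 0.0, flows[1]: 0.8, flows[2]: 0.0, flows[3]: 1.0}
print(prune(flows, lams, units))
```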
  • For example, the embodiment of the present application is applied to an image classification task. It is assumed that the basic network contains two layers, Level 1 and Level 2, and that each layer contains two different computing units, OP 1 and OP 2.
  • The connections between the computing units are shown on the leftmost side of FIG. 3. After the above steps 101 to 105, training yields sparse scaling operators equal to 0 for the information flows shown as dashed lines in the middle of FIG. 3. Then, as shown on the far right side of FIG. 3, after these dashed-line flows are deleted, the corresponding computing unit OP at Level 1 is found to have no remaining connected information flow and is also deleted, finally yielding the search result neural network.
  • the example listed in FIG. 3 is only a specific application of the embodiments of the present application, and not all applications.
  • the sparse scaling operators located in different modules of the network in this application can also be independently updated, so that different modules can search and train to obtain a more flexible network structure.
  • embodiments of the present application also provide a target detection method, including:
  • the sample data to be subjected to target detection is obtained and input into the search result neural network obtained by the structure search method of the deep neural network corresponding to FIG. 1 above, and the output of the search result neural network is used as the target detection result.
  • embodiments of the present application also provide a semantic segmentation method, including:
  • the sample data to be semantically segmented is obtained and input into the search result neural network obtained by the structure search method of the deep neural network corresponding to FIG. 1 above, and the output of the search result neural network is used as the semantic segmentation result.
  • the structure search method of the deep neural network corresponding to FIG. 1 is not limited to the application in target detection and semantic segmentation tasks, but can also be used in other different tasks, which will not be listed here one by one.
  • an embodiment of the present application also provides a deep neural network structure search device, which is characterized by including:
  • the calculation unit structure obtaining unit 31 is configured to obtain each layer of the calculation unit structure in each module connected in series in the deep neural network in a preset search space; each layer of the calculation unit structure includes at least one calculation unit.
  • the information flow obtaining unit 32 is used to connect the computing units in each module in a preset connection manner to obtain the information flows in each module; wherein computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module in which it resides, and to the input and output of the module in which it resides.
  • the initial neural network obtaining unit 33 is used to obtain the initial neural network according to the connection of the modules and the calculation units in each module.
  • the sparse scaling operator setting unit 34 is configured to set a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow.
  • the weight and operator training unit 35 is configured to train the weights of the initial neural network and the sparse scaling operator of the information flow using preset training sample data to obtain an intermediate neural network.
  • the search result obtaining unit 36 is configured to delete the information stream whose sparse scaling operator is zero in the intermediate neural network to obtain a search result neural network in the search space.
  • embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored, which is characterized in that, when the program is executed by a processor, the structure search method of the deep neural network corresponding to FIG. 1 described above is implemented.
  • In addition, the embodiments of the present application also provide a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the deep neural network structure search method corresponding to FIG. 1 described above is implemented.
  • In the deep neural network structure search method and device provided in the embodiments of the present application, first, each layer of computing unit structure in each module connected in series in the deep neural network is obtained in a preset search space, where each layer of computing unit structure includes at least one computing unit; then, the computing units in each module are connected in a preset connection manner to obtain the information flows in each module, where computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located and to the input and output of that module; next, an initial neural network is obtained according to the modules and the connections of the computing units in each module; a sparse scaling operator is set on the information flows in the initial neural network, where the sparse scaling operator is used to scale the information flows; the weights of the initial neural network and the sparse scaling operators of the information flows are trained using the preset training sample data to obtain an intermediate neural network; and, finally, the information flows whose sparse scaling operators are zero are deleted from the intermediate neural network to obtain the search result neural network in the search space.
  • Unlike the prior art, which searches for important network structures directly in the search space, this application uses sparse scaling operators to delete unimportant information flows and thereby implements the network structure search.
  • During the network structure search, this application does not need to train a controller, does not need to use complex evolutionary algorithms, and does not need to train sub-networks for a long time; the search result can be obtained merely by training the weights and the sparse scaling operators.
  • This greatly reduces the network structure search time and, especially for network structure searches on large-scale data sets, saves substantial search time.
  • the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or software function modules. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer usable storage media (including but not limited to disk storage and optical storage, etc.) containing computer usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operating steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A structural search method and apparatus for a deep neural network, relating to the field of artificial intelligence technology. The method comprises: obtaining, in a preset search space, the structure of each layer of computing units in each module connected in series in a deep neural network (101); connecting the computing units in each module in a preset connection manner to obtain information flows in each module (102); obtaining an initial neural network according to the modules and a connection situation of the computing units in each module (103); configuring a sparse scaling operator for the information flows in the initial neural network, the sparse scaling operator being used for scaling the information flows (104); using preset training sample data to train the weight of the initial neural network and the sparse scaling operator for the information flows, to obtain an intermediate neural network (105); and deleting, from the intermediate neural network, the information flows with the sparse scaling operator being zero, to obtain a search result neural network within the search space (106). The solution saves the time of a network structure search.

Description

Structural search method and apparatus for deep neural network
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on October 26, 2018, with application number 201811259033.2 and entitled "Structural search method and apparatus for deep neural network", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of artificial intelligence technology, and in particular to a deep neural network structure search method and apparatus.
Background
In recent years, deep neural networks have achieved great success in many fields, such as computer vision and natural language processing. Through their powerful representation capabilities, deep neural networks replace traditional hand-designed features with end-to-end learning. However, current deep neural networks have complex structures, with many computing unit nodes such as convolution and pooling, so searching among these many computing unit nodes for a compact, fast, and effective model structure has become difficult.
In the prior art, a search space is generally defined first, and then an optimal network structure is searched for in the search space. Typically, heuristic controller-based network structure search methods or evolutionary algorithms are used for the search. However, the prior art requires a controller to be trained, or an evolutionary algorithm to be used, to search the network structure; during the search, the sub-networks in the full set need to be trained to convergence to evaluate them, so the time and computation required for the network structure search are enormous. For larger data sets, the process of finding the optimal network structure with such methods is cumbersome and slow.
Summary of the invention
The embodiments of the present application provide a deep neural network structure search method and apparatus, to solve the prior art problems that the time and computation required for network structure search are enormous and that, for larger data sets, the process of finding the optimal network structure is cumbersome and slow.
To achieve the above purpose, this application adopts the following technical solutions:
In one aspect, this application provides a deep neural network structure search method, including:
obtaining, in a preset search space, each layer of computing unit structure in each module connected in series in the deep neural network, where each layer of computing unit structure includes at least one computing unit;
connecting the computing units in each module in a preset connection manner to obtain the information flows in each module, where computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located and to the input and output of that module;
obtaining an initial neural network according to the modules and the connections of the computing units in each module;
setting a sparse scaling operator on the information flows in the initial neural network, where the sparse scaling operator is used to scale the information flows;
training the weights of the initial neural network and the sparse scaling operators of the information flows using preset training sample data to obtain an intermediate neural network; and
deleting the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain a search result neural network in the search space.
In another aspect, this application provides a target detection method, including:
obtaining sample data on which target detection is to be performed, inputting it into the search result neural network obtained by the above deep neural network structure search method, and taking the output of the search result neural network as the target detection result.
In another aspect, this application provides a semantic segmentation method, including:
obtaining sample data to be semantically segmented, inputting it into the search result neural network obtained by the above deep neural network structure search method, and taking the output of the search result neural network as the semantic segmentation result.
In yet another aspect, this application provides a deep neural network structure search apparatus, including:
a computing unit structure obtaining unit, configured to obtain, in a preset search space, each layer of computing unit structure in each module connected in series in the deep neural network, where each layer of computing unit structure includes at least one computing unit;
an information flow obtaining unit, configured to connect the computing units in each module in a preset connection manner to obtain the information flows in each module, where computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located and to the input and output of that module;
an initial neural network obtaining unit, configured to obtain an initial neural network according to the modules and the connections of the computing units in each module;
a sparse scaling operator setting unit, configured to set a sparse scaling operator on the information flows in the initial neural network, where the sparse scaling operator is used to scale the information flows;
a weight and operator training unit, configured to train the weights of the initial neural network and the sparse scaling operators of the information flows using preset training sample data to obtain an intermediate neural network; and
a search result obtaining unit, configured to delete the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain a search result neural network in the search space.
In still another aspect, this application provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the above deep neural network structure search method.
In still another aspect, this application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the above deep neural network structure search method.
In the deep neural network structure search method and apparatus provided by the embodiments of the present application, first, each layer of computing unit structure in each module connected in series in the deep neural network is obtained in a preset search space, where each layer of computing unit structure includes at least one computing unit; then the computing units in each module are connected in a preset connection manner to obtain the information flows in each module, where computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located and to the input and output of that module; an initial neural network is then obtained according to the modules and the connections of the computing units in each module; a sparse scaling operator is set on the information flows in the initial neural network, where the sparse scaling operator is used to scale the information flows; the weights of the initial neural network and the sparse scaling operators of the information flows are trained using preset training sample data to obtain an intermediate neural network; and, further, the information flows whose sparse scaling operators are zero are deleted from the intermediate neural network to obtain the search result neural network in the search space. Unlike the prior art, which searches for important network structures directly in the search space, this application uses the sparse scaling operators to delete unimportant information flows and thereby implements the network structure search. During the network structure search, this application does not need to train a controller, does not need to use complex evolutionary algorithms, and does not need to train sub-networks for a long time; the search result can be obtained merely by training the weights and the sparse scaling operators, which greatly reduces the network structure search time, especially for network structure searches on large-scale data sets.
Other features and advantages of the present application will be set forth in the following description, and will in part become apparent from the description or be understood by implementing the present application. The purpose and other advantages of the present application can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the drawings.
The technical solutions of the present application are described in further detail below through the accompanying drawings and embodiments.
Brief description of the drawings
The accompanying drawings are used to provide a further understanding of the present application and constitute a part of the specification; together with the embodiments of the present application, they serve to explain the present application and do not constitute a limitation of the present application. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 is a first flowchart of a deep neural network structure search method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of the network structure in the search space of the deep neural network involved in an embodiment of the present application;
FIG. 3 is a schematic diagram of an example in which an embodiment of the present application is applied to the search of a two-layer network;
FIG. 4 is a schematic structural diagram of a deep neural network structure search apparatus provided by an embodiment of the present application.
Detailed description
In order to enable those skilled in the art to better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the scope of protection of this application.
To facilitate understanding of this application, the technical terms involved in this application are explained below:
DNN: Deep Neural Network.
Computing unit: a unit node in a neural network that performs calculations such as convolution and pooling.
Network structure search: the process of searching for the optimal network structure in a neural network.
In the process of implementing the embodiments of the present application, the applicant found that the prior art generally uses a heuristic, controller-based network structure search method, namely:
some network structures to be searched are constructed according to prior knowledge and the deep neural network structure (specific structures such as neurons, network layers, groups, and modules); controllers are then set for the network structures to be searched, and a distributed solving approach is adopted, that is, for each controller, multiple network structures to be searched are computed in parallel, and the accuracy of each network structure is used to perform gradient descent on the controller, thereby obtaining the optimal network structure. It can be seen that the heuristic, controller-based network structure search method requires training a large number of controllers and distributed solving, and the process is cumbersome and slow.
In order to solve the above problems in the prior art, as shown in FIG. 1, an embodiment of the present application provides a deep neural network structure search method, including:
Step 101: Obtain, in a preset search space, each layer of computing unit structure in each module connected in series in the deep neural network.
Here, each layer of computing unit structure includes at least one computing unit.
Step 102: Connect the computing units in each module in a preset connection manner to obtain the information flows in each module.
Here, computing units in the same layer of computing unit structure are not connected to each other, and each computing unit can be connected to computing units at different layers of the module where it is located and to the input and output of that module.
Step 103: Obtain an initial neural network according to the modules and the connections of the computing units in each module.
Step 104: Set a sparse scaling operator on the information flows in the initial neural network, where the sparse scaling operator is used to scale the information flows.
Step 105: Use the preset training sample data to train the weights of the initial neural network and the sparse scaling operators of the information flows to obtain an intermediate neural network.
Step 106: Delete the information flows whose sparse scaling operators are zero from the intermediate neural network to obtain a search result neural network in the search space.
It is worth noting that, in the deep neural network, the preset search space can be as shown in FIG. 2 and can include multiple modules 21 connected in series, that is, the output of the previous module is the input of the next module. Each module 21 (which can be regarded as a directed acyclic graph) can include a multi-layer computing unit structure 22, and each layer of computing unit structure 22 includes at least one computing unit 23 (each computing unit can be regarded as a node in the directed acyclic graph). The computing units 23 in each layer of computing unit structure 22 can generally include at least one of a convolution computing unit and a pooling computing unit, and the convolution computing unit can also be a dilated convolution computing unit, a group convolution computing unit, or the like.
Preferably, the above step 102 can be implemented as follows:
In each module 21, the computing units 23 are connected in a fully connected manner; that is, as shown in FIG. 2, each computing unit 23 is connected to the computing units 23 at different layers of the module 21 where it is located and to the input and output of that module 21. In this way, the information flows from the input of the module 21 to each layer of computing unit structure 22, from each layer of computing unit structure 22 to the output of the module 21, and between the computing units 23 are obtained (each flow can be regarded as an edge between nodes in the directed acyclic graph), and the full set of network structures in the search space is obtained (any network structure in the search space can be regarded as a subgraph of the directed acyclic graph). For example, in a module 21, the output h(i) of the i-th computing unit F(i)(x) equals the result of applying F(i)(x) to the sum of the outputs h(j) of all preceding computing units, which can be expressed as:

h(i) = F(i)( Σ_{j=1}^{i-1} h(j) )        (1)

In this way, in the above step 103, the initial neural network can be obtained according to the structure shown in FIG. 2.
Further, after the above step 103, the weights of the initial neural network may be configured so as to initialize them; alternatively, preset pre-training sample data may be used to pre-train the weights of the initial neural network to obtain a pre-trained initial neural network with better weights. The weights are configured or pre-trained here in order to obtain initial weight values of the initial neural network, which facilitates the subsequent setting and training of the sparse scaling operators.
Then, in the above step 104, a sparse scaling operator needs to be set for the information flows in the initial neural network; for example, a sparse scaling operator λ^(i,j), representing the information flow from the j-th computing unit to the i-th computing unit, is added at the output h(j) of each preceding computing unit. After the sparse scaling operators are added, the above formula (1) is expressed as:

h(i) = F(i)( Σ_{j=1}^{i-1} λ^(i,j) h(j) )        (2)

Here, the value of each sparse scaling operator is greater than or equal to 0. For example, after the weights of the initial neural network are configured so as to initialize them, the value interval of the sparse scaling operator may be [0, 1], and the sparse scaling operator is not necessarily equal to 1; after the weights of the initial neural network are pre-trained with the preset pre-training sample data, the value of the sparse scaling operator is generally taken to be 1.
以下以一卷积神经网络结构的搜索进行说明，在卷积神经网络结构中，计算单元即为卷积计算单元和池化计算单元，信息流即为网络中的特征图。在该卷积神经网络结构中，包含了若干模块，每个模块包含若干层计算单元结构，每一层的计算单元结构又包括若干不同的计算单元（例如，1×1的卷积计算、3×3的卷积计算、5×5的卷积计算、池化计算等，不仅局限于上述这几种）。各个模块依次串接，即上一个模块的输出是下一个模块的输入，每个计算单元与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接。这样，每个计算单元的输出可以被表示出来，例如，在卷积神经网络结构中，第b个模块的第i层的第j个计算单元的输出可以表示为：The following describes the search of a convolutional neural network structure. In a convolutional neural network structure, the computing units are convolution computing units and pooling computing units, and the information flows are the feature maps in the network. The convolutional neural network structure contains several modules, each module contains several layers of computing unit structures, and each layer of computing unit structure in turn includes several different computing units (for example, 1×1 convolution, 3×3 convolution, 5×5 convolution, pooling, etc., and is not limited to these). The modules are connected in series, that is, the output of the previous module is the input of the next module, and each computing unit is connected to the computing units of the other layers in its module as well as to the input and output of its module. In this way, the output of each computing unit can be expressed; for example, in the convolutional neural network structure, the output of the j-th computing unit of the i-th layer of the b-th module can be expressed as:
h^{(b,i,j)} = F^{(b,i,j)}( Σ_{m=1}^{i-1} Σ_{n=1}^{N} λ^{(b,i,j)}_{(m,n)} · h^{(b,m,n)} + λ^{(b,i,j)}_{(0,0)} · O^{(b-1)} )　　　　公式(3) Formula (3)
其中，F^{(b,i,j)}(x)表示第b个模块的第i层的第j个计算单元的计算；N表示一层计算单元结构所包含的计算单元总数；λ^{(b,i,j)}_{(m,n)}表示第b个模块的第m层的第n个计算单元到第b个模块的第i层的第j个计算单元之间信息流的稀疏缩放算子；h^{(b,m,n)}表示第b个模块的第m层的第n个计算单元的输出；O^{(b-1)}表示第b-1个模块的输出，即第b个模块的输入；λ^{(b,i,j)}_{(0,0)}表示第b个模块的输入O^{(b-1)}到第b个模块的第i层的第j个计算单元之间信息流的稀疏缩放算子。此处，设h^{(b,0,0)}=O^{(b-1)}作为第b个模块的输入，设h^{(b,M+1,0)}=O^{(b)}作为第b个模块的输出，其中M表示第b个模块所包含的层总数。这样可以确定位于第m层的计算单元共有(m-1)N+1个输入。Here, F^{(b,i,j)}(x) denotes the computation of the j-th computing unit of the i-th layer of the b-th module; N denotes the total number of computing units contained in one layer of the computing unit structure; λ^{(b,i,j)}_{(m,n)} denotes the sparse scaling operator of the information flow from the n-th computing unit of the m-th layer of the b-th module to the j-th computing unit of the i-th layer of the b-th module; h^{(b,m,n)} denotes the output of the n-th computing unit of the m-th layer of the b-th module; O^{(b-1)} denotes the output of the (b-1)-th module, i.e., the input of the b-th module; and λ^{(b,i,j)}_{(0,0)} denotes the sparse scaling operator of the information flow from the module input O^{(b-1)} to the j-th computing unit of the i-th layer of the b-th module. Here, h^{(b,0,0)}=O^{(b-1)} is taken as the input of the b-th module, and h^{(b,M+1,0)}=O^{(b)} as its output, where M denotes the total number of layers contained in the b-th module. It follows that a computing unit located at the m-th layer has (m-1)N+1 inputs.
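作为示意（非限定实现），下面给出公式(3)的一个简化代码草图。A minimal sketch of formula (3), provided for illustration only: the flattened feature vectors, the `ops` mapping from unit indices to operations, and all helper names are assumptions introduced here and are not part of the original application.

```python
import numpy as np

# Illustrative sketch of formula (3): output of computing unit (b, i, j).
# Assumption (not from the application): feature maps are flattened to 1-D
# vectors of length C, and each unit's computation F is a simple callable.
def unit_output(i, j, ops, lam, h_prev, module_input):
    """h_prev[(m, n)]: outputs of units in layers 1..i-1 of this module.
    lam[((m, n), (i, j))]: sparse scaling operator on the flow (m, n) -> (i, j).
    module_input: O^(b-1), treated as h^(b,0,0)."""
    total = lam[((0, 0), (i, j))] * module_input        # scaled module input
    for (m, n), h in h_prev.items():                    # scaled previous unit outputs
        total = total + lam[((m, n), (i, j))] * h
    return ops[(i, j)](total)                           # F^(b,i,j)( . )

# Toy usage: a unit in layer 3 with N = 1 unit per layer, so it has
# (3-1)*1 + 1 = 3 inputs, matching the (m-1)N+1 count above.
C = 4
ops = {(3, 1): np.tanh}
h_prev = {(1, 1): np.ones(C), (2, 1): 2 * np.ones(C)}
lam = {((0, 0), (3, 1)): 1.0, ((1, 1), (3, 1)): 0.5, ((2, 1), (3, 1)): 0.0}
print(unit_output(3, 1, ops, lam, h_prev, np.zeros(C)))
```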
此处，需要说明的是，在本申请实施例中，各个计算单元到其所在模块输出间的连接也是可以训练学习的。例如，上述卷积神经网络中，第b个模块的输出O^{(b)}可以通过对该模块中所有计算单元的输出进行拼接，再使用卷积核大小为1的卷积降低特征图的通道数来保持通道数不变，如下公式所示：Here, it should be noted that, in the embodiments of the present application, the connection between each computing unit and the output of its module can also be learned through training. For example, in the above convolutional neural network, the output O^{(b)} of the b-th module can be obtained by concatenating the outputs of all computing units in the module and then applying a convolution with kernel size 1 to reduce the number of channels of the feature map, so that the number of channels remains unchanged, as shown in the following formula:
O^{(b)} = R( [ λ^{(b,out)}_{(1,1)} · h^{(b,1,1)}, …, λ^{(b,out)}_{(M,N)} · h^{(b,M,N)}, O^{(b-1)} ] )　　　　公式(4) Formula (4)
其中，h^{(b,m,n)}表示在第b个模块中，位于第m层中第n个计算单元的输出，λ^{(b,out)}_{(m,n)}表示第b个模块中，位于第m层中第n个计算单元与该第b个模块输出连接的信息流的缩放算子，O^{(b-1)}表示第b-1个模块的输出，即第b个模块的输入。R(x)表示特征图的拼接与卷积核大小为1的卷积计算，用来融合特征图并保证模块输出的通道数不变。Here, h^{(b,m,n)} denotes the output of the n-th computing unit of the m-th layer in the b-th module; λ^{(b,out)}_{(m,n)} denotes the scaling operator of the information flow connecting the n-th computing unit of the m-th layer of the b-th module to the output of that module; and O^{(b-1)} denotes the output of the (b-1)-th module, i.e., the input of the b-th module. R(x) denotes the concatenation of the feature maps followed by a convolution with kernel size 1, which fuses the feature maps and keeps the number of output channels of the module unchanged.
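作为示意（非限定实现），下面给出公式(4)的一个简化代码草图。An illustrative sketch of formula (4): feature maps are represented as NumPy arrays of shape (C, H, W), and the 1×1 convolution is realized as a channel-mixing matrix; the weight matrix `w_1x1` and all shapes are assumptions introduced for this example only.

```python
import numpy as np

def module_output(unit_outputs, lam_out, module_input, w_1x1):
    """Sketch of formula (4): concatenate the scaled unit outputs and the
    module input along the channel axis, then fuse them with a 1x1 convolution
    (a per-pixel linear map over channels) so the channel count is preserved."""
    scaled = [lam_out[key] * h for key, h in unit_outputs.items()]
    concat = np.concatenate(scaled + [module_input], axis=0)     # (C_total, H, W)
    # A 1x1 convolution is a matrix multiplication over the channel dimension.
    return np.einsum('oc,chw->ohw', w_1x1, concat)

# Toy usage: 2 units with 4 channels each plus a 4-channel module input.
C, H, W = 4, 8, 8
unit_outputs = {(1, 1): np.random.rand(C, H, W), (2, 1): np.random.rand(C, H, W)}
lam_out = {(1, 1): 1.0, (2, 1): 0.0}                  # second output flow pruned
module_input = np.random.rand(C, H, W)
w_1x1 = np.random.rand(C, 3 * C)                      # fuses 3C channels back to C
print(module_output(unit_outputs, lam_out, module_input, w_1x1).shape)  # (4, 8, 8)
```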
对于上述步骤105,可以采用如下方式实现:For the above step 105, it can be implemented as follows:
步骤S1、构建初始神经网络对应的目标函数，所述目标函数包含损失函数、权重正则函数和稀疏正则函数。该目标函数可以如下式所示：Step S1: Construct an objective function corresponding to the initial neural network, the objective function including a loss function, a weight regularization term, and a sparsity regularization term. The objective function may take the following form:
min_{W,λ} (1/K) Σ_{i=1}^{K} L( y_i, Net(x_i, W, λ) ) + δ·||W||_2^2 + γ·||λ||_1　　　　公式(5) Formula (5)
其中，W为权重，λ为稀疏缩放算子向量，K为样本数据的数量，L(y_i, Net(x_i, W, λ))为神经网络在样本数据x_i上的损失，y_i为样本标签，Net(x_i, W, λ)为神经网络的输出，δ·||W||_2^2为权重正则函数，记为R(W)，δ为权重W的参数衰减权重，γ·||λ||_1为稀疏正则函数，记为Rs(λ)。另外，此处的稀疏正则函数γ·||λ||_1还可以由更复杂的稀疏约束替代，例如非凸的稀疏约束。Here, W denotes the weights, λ the vector of sparse scaling operators, and K the number of sample data; L(y_i, Net(x_i, W, λ)) is the loss of the neural network on sample data x_i, y_i is the sample label, and Net(x_i, W, λ) is the output of the neural network. δ·||W||_2^2 is the weight regularization term, denoted R(W), where δ is the weight-decay coefficient of the weights W, and γ·||λ||_1 is the sparsity regularization term, denoted Rs(λ). In addition, the sparsity regularization term γ·||λ||_1 may also be replaced by a more complex sparsity constraint, for example a non-convex sparsity constraint.
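作为示意（非限定实现），下面给出公式(5)目标函数的一个简化代码草图。A sketch of the objective of formula (5) in a toy setting: the linear "network", the squared loss, and all variable names are assumptions introduced for illustration only and are not part of the original application.

```python
import numpy as np

def objective(W, lam, X, Y, delta, gamma):
    """Formula (5) on a toy linear 'network': data loss + weight decay
    + L1 sparsity penalty on the scaling operators lam."""
    preds = (X * lam) @ W                         # Net(x_i, W, lam): features scaled by lam
    data_loss = np.mean((preds - Y) ** 2)         # (1/K) * sum of per-sample losses
    weight_reg = delta * np.sum(W ** 2)           # delta * ||W||_2^2
    sparse_reg = gamma * np.sum(np.abs(lam))      # gamma * ||lam||_1
    return data_loss + weight_reg + sparse_reg

K, D = 32, 6
X, Y = np.random.rand(K, D), np.random.rand(K)
W, lam = np.random.rand(D), np.ones(D)
print(objective(W, lam, X, Y, delta=1e-4, gamma=1e-2))
```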
步骤S2、采用所述训练样本数据对所述初始神经网络进行迭代训练。Step S2: Perform iterative training on the initial neural network using the training sample data.
步骤S3、当迭代训练次数达到阈值或者所述目标函数满足预置的收敛条件时，得到所述中间神经网络。Step S3: When the number of training iterations reaches a threshold or the objective function meets a preset convergence condition, the intermediate neural network is obtained.
在一些实施例中，前述步骤S2的实现可通过对初始神经网络进行多次以下的迭代训练，以一次非首次迭代和非尾次迭代的迭代过程（以下称为本次迭代训练）为例进行描述，一次迭代训练包括以下步骤C1~步骤C3：In some embodiments, the foregoing step S2 may be implemented by performing the following iterative training on the initial neural network multiple times. Taking an iteration that is neither the first nor the last iteration (hereinafter referred to as the current iteration) as an example, one iteration of training includes the following steps C1 to C3:
步骤C1、将前一次迭代训练得到的稀疏缩放算子作为所述目标函数的常量，将所述权重作为所述目标函数的变量，采用第一优化算法对所述目标函数进行优化，得到本次迭代训练的权重；Step C1: Take the sparse scaling operators obtained in the previous iteration of training as constants of the objective function and the weights as its variables, and optimize the objective function with the first optimization algorithm to obtain the weights of the current iteration of training.
步骤C2、将本次迭代训练的权重作为所述目标函数的常量，将稀疏缩放算子作为所述目标函数的变量，采用第二优化算法对所述目标函数进行优化，得到本次迭代训练的稀疏缩放算子；Step C2: Take the weights of the current iteration of training as constants of the objective function and the sparse scaling operators as its variables, and optimize the objective function with the second optimization algorithm to obtain the sparse scaling operators of the current iteration of training.
步骤C3、基于本次迭代训练的权重和稀疏缩放算子进行下一次迭代训练。Step C3. Perform the next iterative training based on the weights and sparse scaling operator of this iterative training.
另外，首次迭代训练过程如下：将初始稀疏缩放算子作为所述目标函数的常量，将所述权重作为所述目标函数的变量，采用第一优化算法对所述目标函数进行优化，得到本次迭代训练的权重；将本次迭代训练的权重作为所述目标函数的常量，将稀疏缩放算子作为所述目标函数的变量，采用第二优化算法对所述目标函数进行优化，得到本次迭代训练的稀疏缩放算子；基于本次迭代训练的权重和稀疏缩放算子进行第二次迭代训练。In addition, the first iteration of training proceeds as follows: take the initial sparse scaling operators as constants of the objective function and the weights as its variables, and optimize the objective function with the first optimization algorithm to obtain the weights of this iteration; then take these weights as constants and the sparse scaling operators as variables, and optimize the objective function with the second optimization algorithm to obtain the sparse scaling operators of this iteration; the second iteration of training is then performed based on the weights and sparse scaling operators of this iteration.
另外，尾次迭代训练过程如下：将前一次迭代训练得到的稀疏缩放算子作为所述目标函数的常量，将所述权重作为所述目标函数的变量，采用第一优化算法对所述目标函数进行优化，得到本次迭代训练的权重；将本次迭代训练的权重作为所述目标函数的常量，将稀疏缩放算子作为所述目标函数的变量，采用第二优化算法对所述目标函数进行优化，得到本次迭代训练的稀疏缩放算子；将包含本次迭代训练得到的稀疏缩放算子和权重的神经网络作为中间神经网络。In addition, the last iteration of training proceeds as follows: take the sparse scaling operators obtained in the previous iteration as constants of the objective function and the weights as its variables, and optimize the objective function with the first optimization algorithm to obtain the weights of this iteration; then take these weights as constants and the sparse scaling operators as variables, and optimize the objective function with the second optimization algorithm to obtain the sparse scaling operators of this iteration; the neural network containing the sparse scaling operators and weights obtained in this iteration is taken as the intermediate neural network.
此处,在本申请实施例中,该第一优化算法可以但不限于为以下任意一种算法:随机梯度下降算法、引入动量的变种算法。Here, in the embodiment of the present application, the first optimization algorithm may be, but not limited to, any one of the following algorithms: a stochastic gradient descent algorithm, a variant algorithm that introduces momentum.
此处，在本申请实施例中，该第二优化算法可以但不限于为以下任意一种算法：加速近端梯度下降算法、近端梯度下降算法或者交替方向乘子算法。Here, in the embodiments of the present application, the second optimization algorithm may be, but is not limited to, any one of the following: the accelerated proximal gradient descent algorithm, the proximal gradient descent algorithm, or the alternating direction method of multipliers.
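作为示意（非限定实现），下面给出步骤C1~C3交替优化的一个简化代码草图。A sketch of the alternating scheme of steps C1–C3 on a toy linear model: W is updated by a plain gradient step (standing in for the first optimization algorithm) and λ by a proximal/soft-threshold step (standing in for the second); the analytic gradients, hyper-parameters, and model are assumptions made only for this example.

```python
import numpy as np

def soft_threshold(z, alpha):
    """Soft-threshold operator: sign(z_i) * max(|z_i| - alpha, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)

# Toy model: prediction = (X * lam) @ W with squared loss.
K, D = 64, 8
X, Y = np.random.rand(K, D), np.random.rand(K)
W = np.random.rand(D)
lam = np.ones(D)                      # scaling operators initialized to 1 (cf. above)
delta, gamma, lr_w, lr_lam = 1e-4, 5e-2, 1e-1, 1e-1

for _ in range(200):
    # Step C1: lam fixed, update W with a gradient step (first optimization algorithm).
    residual = (X * lam) @ W - Y
    grad_w = 2.0 * (X * lam).T @ residual / K + 2.0 * delta * W
    W = W - lr_w * grad_w
    # Step C2: W fixed, update lam with a proximal gradient step (second algorithm).
    residual = (X * lam) @ W - Y
    grad_lam = 2.0 * (X * W).T @ residual / K
    lam = soft_threshold(lam - lr_lam * grad_lam, lr_lam * gamma)
    lam = np.maximum(lam, 0.0)        # keep the scaling operators non-negative
    # Step C3: continue to the next iteration with the updated W and lam.

print("nonzero scaling operators:", np.count_nonzero(lam), "of", D)
```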
为进一步对本申请实施例中如何求解出目标函数中的W和λ进行详细的描述，下面以目标函数为上述公式(5)为例，对一次迭代训练优化目标函数求解得到W和λ进行描述。将(1/K) Σ_{i=1}^{K} L( y_i, Net(x_i, W, λ) )记为g(λ)，Rs(λ)记为H(λ)。To further describe in detail how W and λ are solved for in the objective function in the embodiments of the present application, the following takes the objective function of formula (5) as an example and describes how W and λ are obtained by optimizing the objective function in one iteration of training. Denote (1/K) Σ_{i=1}^{K} L( y_i, Net(x_i, W, λ) ) as g(λ), and Rs(λ) as H(λ).
将λ作为常量，将W作为变量，则目标函数转换为
min_W (1/K) Σ_{i=1}^{K} L( y_i, Net(x_i, W, λ) ) + δ·||W||_2^2
采用随机梯度下降算法即可求解得到W的取值，具体过程不再详细描述。With λ as a constant and W as the variable, the objective function becomes the above minimization over W, and the value of W can be solved by the stochastic gradient descent algorithm; the specific process is not described in detail here.
将W作为常量，将λ作为变量，则目标函数转换为
min_λ g(λ) + H(λ) = min_λ (1/K) Σ_{i=1}^{K} L( y_i, Net(x_i, W, λ) ) + γ·||λ||_1
采用加速近端梯度下降算法求解λ的取值，具体可通过但不仅限于以下几种方式得到：With W as a constant and λ as the variable, the objective function becomes the above minimization over λ, and the value of λ is solved by the accelerated proximal gradient descent algorithm, which can be carried out in, but is not limited to, the following ways:
方式1,采用公式(6)~公式(8)得到λ:Method 1: Use formula (6) to formula (8) to obtain λ:
d_t = λ_{t-1} + ((t-2)/(t+1)) · (λ_{t-1} - λ_{t-2})　　　　公式(6) Formula (6)
z_t = d_t - η_t · ∇g(d_t)　　　　公式(7) Formula (7)
λ_t = Prox_{η_t·H}(z_t)　　　　公式(8) Formula (8)
其中η_t表示在第t次迭代训练时梯度下降的步长，Prox_{η_t·H}(z_t) = S_{η_t·γ}(z_t)，S_α(·)为软阈值算子，定义如下：S_α(z)_i = sign(z_i)·(|z_i| - α)_+。Here, η_t denotes the step size of gradient descent at the t-th training iteration, Prox_{η_t·H}(z_t) = S_{η_t·γ}(z_t), and S_α(·) is the soft-threshold operator, defined as S_α(z)_i = sign(z_i)·(|z_i| - α)_+.
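作为示意（非限定实现），下面给出方式1一次加速近端梯度更新的简化代码草图。A sketch of one accelerated proximal gradient step of the kind described in Method 1: the gradient oracle `grad_g`, the quadratic toy function, the step size, and the extrapolation coefficient are assumptions introduced for illustration only.

```python
import numpy as np

def soft_threshold(z, alpha):
    """S_alpha(z)_i = sign(z_i) * max(|z_i| - alpha, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)

def apg_update(lam_prev, lam_prev2, t, eta, gamma, grad_g):
    """One accelerated proximal gradient step: extrapolate, take a gradient
    step on g, then apply Prox_{eta*H} = soft-thresholding for H(lam) = gamma*||lam||_1."""
    d = lam_prev + (t - 2.0) / (t + 1.0) * (lam_prev - lam_prev2)   # extrapolation
    z = d - eta * grad_g(d)                                         # gradient step
    return soft_threshold(z, eta * gamma)                           # proximal step

# Toy usage: g(lam) = 0.5 * ||lam - target||^2, so grad_g(lam) = lam - target.
target = np.array([0.8, 0.05, 0.0, 1.2])
grad_g = lambda lam: lam - target
lam_prev2 = lam_prev = np.ones_like(target)
for t in range(2, 100):
    lam_new = apg_update(lam_prev, lam_prev2, t, eta=0.5, gamma=0.2, grad_g=grad_g)
    lam_prev2, lam_prev = lam_prev, lam_new
print(lam_prev)   # entries below the threshold are driven exactly to zero
```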
方式2、由于前述方式1求解λ需要额外的前向后向计算来得到∇g(d_t)，将该算法直接应用到现有深度学习框架有一定难度。因此，方式2对前述方式1的公式进行更新，得到公式(9)~公式(11)，根据公式(9)~公式(11)计算得到λ：Method 2: Since Method 1 requires an additional forward-backward computation to obtain ∇g(d_t) when solving for λ, it is somewhat difficult to apply that algorithm directly within existing deep learning frameworks. Therefore, Method 2 updates the formulas of Method 1 to obtain formulas (9) to (11), and λ is computed according to formulas (9) to (11):
Figure PCTCN2019077049-appb-000019（公式(9) / Formula (9)）
Figure PCTCN2019077049-appb-000020（公式(10) / Formula (10)）
λ_t = λ_{t-1} + v_t　　　　公式(11) Formula (11)
方式3、本申请还可以采用变量替代的方法,即采用下式(12)~(14)计算得到λ:Method 3: In this application, a variable substitution method can also be used, that is, the following formulas (12) to (14) are used to calculate λ:
Figure PCTCN2019077049-appb-000021（公式(12) / Formula (12)）
Figure PCTCN2019077049-appb-000022（公式(13) / Formula (13)）
Figure PCTCN2019077049-appb-000023（公式(14) / Formula (14)）
其中λ′_{t-1} = λ_{t-1} + μ_{t-1}·v_{t-1}，μ为预设的固定值，并采用批量随机梯度下降的形式来更新W和λ。Here, λ′_{t-1} = λ_{t-1} + μ_{t-1}·v_{t-1}, μ is a preset fixed value, and W and λ are updated in the form of batch stochastic gradient descent.
之后,在上述步骤106中,即可将中间神经网络中稀疏缩放算子为零的信息流删除,得到搜索空间内的搜索结果神经网络。并且,在与一个计算单元的连接对应的信息流均被删除后,则该计算单元对后续的计算已经无作用,则可将该计算单元删除。Then, in the above step 106, the information stream with the sparse scaling operator being zero in the intermediate neural network can be deleted to obtain the search result neural network in the search space. In addition, after the information streams corresponding to the connection of one computing unit are all deleted, the computing unit has no effect on subsequent calculations, and the computing unit may be deleted.
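作为示意（非限定实现），下面给出步骤106剪枝过程的一个简化代码草图。A sketch of the pruning of step 106: information flows whose sparse scaling operator is zero are removed, and a computing unit is removed once all of its connected information flows have been removed; the graph representation and the flow values used here are assumptions introduced for illustration only.

```python
# Sketch of step 106. Each information flow is an edge (src, dst) carrying a
# trained sparse scaling operator; units are node identifiers.
def prune(units, flows):
    """flows: dict mapping (src_unit, dst_unit) -> trained scaling operator."""
    kept_flows = {edge: s for edge, s in flows.items() if s != 0.0}
    # A computing unit whose connected flows have all been removed no longer
    # contributes to later computation and can itself be removed.
    connected = {u for edge in kept_flows for u in edge}
    kept_units = [u for u in units if u in connected]
    return kept_units, kept_flows

# Toy usage loosely mirroring the two-level example of FIG. 3.
units = ["input", "L1_OP1", "L1_OP2", "L2_OP1", "L2_OP2", "output"]
flows = {
    ("input", "L1_OP1"): 0.0, ("input", "L1_OP2"): 0.7,
    ("input", "L2_OP1"): 0.4, ("L1_OP1", "L2_OP2"): 0.0,
    ("L1_OP2", "L2_OP2"): 0.9, ("L2_OP1", "output"): 0.3,
    ("L2_OP2", "output"): 0.6, ("L1_OP1", "output"): 0.0,
}
print(prune(units, flows))   # L1_OP1 disappears together with its zero-valued flows
```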
例如，如图3所示，将本申请实施例应用于图片分类任务中。设定基础网络中含有两层结构Level 1和Level 2，每层中还有两个不同的计算单元OP 1和OP 2，计算单元间的连接如图3的最左侧所示。经过上述步骤101至步骤105之后，可以训练得到图3中的中间所示的虚线的稀疏缩放算子为0。进而如图3的最右侧所示，将这些虚线删除后，确认Level 1层的计算单元OP 1已经无连接对应的信息流，则也被删除，最终得到搜索结果神经网络。For example, as shown in FIG. 3, the embodiment of the present application is applied to an image classification task. Suppose the base network contains two layers, Level 1 and Level 2, each of which contains two different computing units OP 1 and OP 2; the connections between the computing units are shown on the far left of FIG. 3. After the above steps 101 to 105, the sparse scaling operators of the connections shown as dashed lines in the middle of FIG. 3 are trained to 0. Then, as shown on the far right of FIG. 3, after these dashed connections are deleted, the computing unit OP 1 of Level 1 is found to have no remaining information flow corresponding to its connections and is therefore also deleted, and the search result neural network is finally obtained.
值得说明的是,图3所列举的例子仅为本申请实施例的一个具体应用,并非全部的应用。本申请实施例除了应用在单个模块结构搜索外,本申请中位于网络不同模块的稀疏缩放算子还可以独立更新,使得不同模块能够搜索训练得到更加灵活的网络结构。It is worth noting that the example listed in FIG. 3 is only a specific application of the embodiments of the present application, and not all applications. In addition to being applied to the search of a single module structure in this embodiment of the present application, the sparse scaling operators located in different modules of the network in this application can also be independently updated, so that different modules can search and train to obtain a more flexible network structure.
另外,本申请实施例还提供一种目标检测方法,包括:In addition, embodiments of the present application also provide a target detection method, including:
获得待进行目标检测的样本数据,输入到上述图1对应的深度神经网络的结构搜索方法得到的搜索结果神经网络中,以所述搜索结果神经网络的输出作为目标检测结果。The sample data to be subjected to target detection is obtained and input into the search result neural network obtained by the structure search method of the deep neural network corresponding to FIG. 1 above, and the output of the search result neural network is used as the target detection result.
另外,本申请实施例还提供一种语义分割方法,包括:In addition, embodiments of the present application also provide a semantic segmentation method, including:
获得待进行语义分割的样本数据,输入到上述图1对应的深度神经网络的结构搜索方法得到的搜索结果神经网络中,以所述搜索结果神经网络的输出作为语义分割结果。The sample data to be semantically segmented is obtained and input into the search result neural network obtained by the structure search method of the deep neural network corresponding to FIG. 1 above, and the output of the search result neural network is used as the semantic segmentation result.
图1对应的深度神经网络的结构搜索方法不仅仅局限于应用在目标检测和语义分割任务中,还可以用于到其他不同的任务中,此处不再一一列举。The structure search method of the deep neural network corresponding to FIG. 1 is not limited to the application in target detection and semantic segmentation tasks, but can also be used in other different tasks, which will not be listed here one by one.
另外,如图4所示,本申请实施例还提供一种深度神经网络的结构搜索装置,其特征在于,包括:In addition, as shown in FIG. 4, an embodiment of the present application also provides a deep neural network structure search device, which is characterized by including:
计算单元结构获得单元31,用于在预先设置的搜索空间中获得深度神经网络中依次串接的每个模块中的每层计算单元结构;所述每层计算单元结构包括至少一个计算单元。The calculation unit structure obtaining unit 31 is configured to obtain each layer of the calculation unit structure in each module connected in series in the deep neural network in a preset search space; each layer of the calculation unit structure includes at least one calculation unit.
信息流获得单元32，用于在每个模块中采用预设连接方式将各计算单元进行连接，得到每个模块中的信息流；其中，处于同一层计算单元结构的计算单元之间不进行连接，每个计算单元能够与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接。The information flow obtaining unit 32 is configured to connect the computing units in each module in a preset connection manner to obtain the information flows in each module; computing units within the same layer of the computing unit structure are not connected to each other, and each computing unit can be connected to the computing units of the other layers in its module as well as to the input and output of its module.
初始神经网络获得单元33，用于根据模块及每个模块中的计算单元的连接情况，得到初始神经网络。The initial neural network obtaining unit 33 is configured to obtain the initial neural network according to the connections of the modules and of the computing units in each module.
稀疏缩放算子设置单元34,用于对所述初始神经网络中的信息流设置稀疏缩放算子,其中所述稀疏缩放算子用于对所述信息流进行缩放。The sparse scaling operator setting unit 34 is configured to set a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow.
权重和算子训练单元35,用于采用预置的训练样本数据对所述初始神经网络的权重和信息流的稀疏缩放算子进行训练,得到中间神经网络。The weight and operator training unit 35 is configured to train the weights of the initial neural network and the sparse scaling operator of the information flow using preset training sample data to obtain an intermediate neural network.
搜索结果获得单元36,用于将所述中间神经网络中稀疏缩放算子为零的信息流删除,得到搜索空间内的搜索结果神经网络。The search result obtaining unit 36 is configured to delete the information stream whose sparse scaling operator is zero in the intermediate neural network to obtain a search result neural network in the search space.
此外,本申请实施例还提供一种计算机可读存储介质,其上存储有计算机程序,其特征在于,该程序被处理器执行时实现上述图1对应的深度神经网络的结构搜索方法。In addition, embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored, which is characterized in that, when the program is executed by a processor, the structure search method of the deep neural network corresponding to FIG. 1 described above is implemented.
此外，本申请实施例还提供一种计算机设备，包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，所述处理器执行所述程序时实现上述图1对应的深度神经网络的结构搜索方法。In addition, the embodiments of the present application further provide a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the structure search method of the deep neural network corresponding to FIG. 1 described above.
综上所述，本申请实施例提供的一种深度神经网络的结构搜索方法及装置，首先，在预先设置的搜索空间中获得深度神经网络中依次串接的每个模块中的每层计算单元结构；每层计算单元结构包括至少一个计算单元；之后，在每个模块中采用预设连接方式将各计算单元进行连接，得到每个模块中的信息流；其中，处于同一层计算单元结构的计算单元之间不进行连接，每个计算单元能够与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接；然后，根据模块及每个模块中的计算单元的连接情况，得到初始神经网络；对初始神经网络中的信息流设置稀疏缩放算子，其中稀疏缩放算子用于对信息流进行缩放；采用预置的训练样本数据对初始神经网络的权重和信息流的稀疏缩放算子进行训练，得到中间神经网络；进而，将中间神经网络中稀疏缩放算子为零的信息流删除，得到搜索空间内的搜索结果神经网络。本申请与现有技术中直接从搜索空间搜索重要的网络结构不同，本申请通过稀疏缩放算子，可删除不重要的信息流来实现网络结构的搜索。本申请在网络结构的搜索过程中，无需对控制器进行训练，也无需使用复杂的进化算法，不需要对子网络进行长时间的训练，仅通过对权重和稀疏缩放算子的训练即可得到搜索结果，使得网络结构搜索的时间大大减小，特别是对于大规模数据集上的网络结构搜索，更为节省网络结构搜索的时间。In summary, the embodiments of the present application provide a structure search method and apparatus for a deep neural network. First, the computing unit structure of each layer in each of the serially connected modules of the deep neural network is obtained in a preset search space, each layer of the computing unit structure including at least one computing unit. Then, within each module, the computing units are connected in a preset connection manner to obtain the information flows in each module, where computing units within the same layer of the computing unit structure are not connected to each other, and each computing unit can be connected to the computing units of the other layers in its module as well as to the input and output of its module. Next, an initial neural network is obtained according to the connections of the modules and of the computing units in each module; sparse scaling operators are set on the information flows in the initial neural network, where the sparse scaling operators are used to scale the information flows; and the weights of the initial neural network and the sparse scaling operators of the information flows are trained with preset training sample data to obtain an intermediate neural network. Finally, the information flows whose sparse scaling operators are zero are deleted from the intermediate neural network to obtain the search result neural network within the search space. Unlike the prior art, which searches for important network structures directly in the search space, the present application realizes the network structure search by deleting unimportant information flows through the sparse scaling operators. During the network structure search, there is no need to train a controller, to use complex evolutionary algorithms, or to train sub-networks for a long time; the search result is obtained merely by training the weights and the sparse scaling operators, which greatly reduces the time of the network structure search, especially for network structure search on large-scale data sets.
以上结合具体实施例描述了本申请的基本原理，但是，需要指出的是，对本领域普通技术人员而言，能够理解本申请的方法和装置的全部或者任何步骤或者部件可以在任何计算装置（包括处理器、存储介质等）或者计算装置的网络中，以硬件、固件、软件或者它们的组合加以实现，这是本领域普通技术人员在阅读了本申请的说明的情况下运用它们的基本编程技能就能实现的。The basic principles of the present application have been described above in conjunction with specific embodiments. It should be pointed out, however, that a person of ordinary skill in the art can understand that all or any of the steps or components of the methods and apparatuses of the present application can be implemented in hardware, firmware, software, or a combination thereof, in any computing device (including processors, storage media, etc.) or in a network of computing devices, and that this can be achieved by a person of ordinary skill in the art using basic programming skills after reading the description of the present application.
本领域普通技术人员可以理解实现上述实施例方法携带的全部或部分步骤是可以通过程序来指令相关的硬件完成，所述的程序可以存储于一种计算机可读存储介质中，该程序在执行时，包括方法实施例的步骤之一或其组合。A person of ordinary skill in the art can understand that all or part of the steps of the methods of the above embodiments can be completed by instructing relevant hardware through a program, and the program may be stored in a computer-readable storage medium; when executed, the program performs one of the steps of the method embodiments or a combination thereof.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理模块中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。所述集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The above integrated modules can be implemented in the form of hardware or software function modules. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器和光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art should understand that the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer usable storage media (including but not limited to disk storage and optical storage, etc.) containing computer usable program code.
本申请是参照根据本申请实施例的方法、设备（系统）、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器，使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中，使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品，该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上，使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理，从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
尽管已描述了本申请的上述实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例做出另外的变更和修改。所以,所附权利要求意欲解释为包括上述实施例以及落入本申请范围的所有变更和修改。Although the above embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be construed as including the above embodiments and all changes and modifications falling within the scope of the present application.
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的精神和范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。Obviously, those skilled in the art can make various modifications and variations to this application without departing from the spirit and scope of this application. In this way, if these modifications and variations of the present application fall within the scope of the claims of the present application and equivalent technologies thereof, the present application is also intended to include these modifications and variations.

Claims (15)

  1. 一种深度神经网络的结构搜索方法,其特征在于,包括:A deep neural network structure search method, which is characterized by:
    在预先设置的搜索空间中获得深度神经网络中依次串接的每个模块中的每层计算单元结构;所述每层计算单元结构包括至少一个计算单元;Obtaining a structure of each layer of computing units in each module connected in series in the deep neural network in a preset search space; the structure of each layer of computing units includes at least one computing unit;
    在每个模块中采用预设连接方式将各计算单元进行连接，得到每个模块中的信息流；其中，处于同一层计算单元结构的计算单元之间不进行连接，每个计算单元能够与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接；connecting the computing units in each module in a preset connection manner to obtain information flows in each module, wherein computing units within the same layer of the computing unit structure are not connected to each other, and each computing unit can be connected to the computing units of the other layers in its module as well as to the input and output of its module;
    根据模块及每个模块中的计算单元的连接情况,得到初始神经网络;According to the module and the connection of the computing unit in each module, the initial neural network is obtained;
    对所述初始神经网络中的信息流设置稀疏缩放算子,其中所述稀疏缩放算子用于对所述信息流进行缩放;Setting a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow;
    采用预置的训练样本数据对所述初始神经网络的权重和信息流的稀疏缩放算子进行训练,得到中间神经网络;Using preset training sample data to train the weight of the initial neural network and the sparse scaling operator of the information flow to obtain an intermediate neural network;
    将所述中间神经网络中稀疏缩放算子为零的信息流删除,得到搜索空间内的搜索结果神经网络。The information flow in which the sparse scaling operator is zero in the intermediate neural network is deleted to obtain a search result neural network in the search space.
  2. 根据权利要求1所述的方法,其特征在于,所述每层计算单元结构的计算单元包括卷积计算单元和池化计算单元中的至少一种。The method according to claim 1, wherein the calculation unit of the calculation unit structure of each layer comprises at least one of a convolution calculation unit and a pooling calculation unit.
  3. 根据权利要求1所述的方法,其特征在于,在每个模块中采用预设连接方式将各计算单元进行连接,得到每个模块中的信息流,包括:The method according to claim 1, wherein each computing unit is connected in a preset connection manner in each module to obtain the information flow in each module, including:
    在每个模块中，将每个计算单元与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接；得到从模块的输入到每层计算单元结构、从每层计算单元结构到模块的输出，以及各计算单元之间的信息流。in each module, connecting each computing unit to the computing units of the other layers in its module as well as to the input and output of its module, so as to obtain the information flows from the module input to each layer of the computing unit structure, from each layer of the computing unit structure to the module output, and between the computing units.
  4. 根据权利要求1所述的方法,其特征在于,在根据模块及每个模块中的计算单元的连接情况,得到初始神经网络之后,还包括:The method according to claim 1, characterized in that after obtaining the initial neural network according to the connection status of the modules and the calculation units in each module, the method further comprises:
    对初始神经网络的权重进行配置,以初始化初始神经网络的权重。Configure the initial neural network weights to initialize the initial neural network weights.
  5. 根据权利要求1所述的方法,其特征在于,在根据模块及每个模块中的计算单元的连接情况,得到初始神经网络之后,还包括:The method according to claim 1, characterized in that after obtaining the initial neural network according to the connection status of the modules and the calculation units in each module, the method further comprises:
    采用预置的预训练样本数据对所述初始神经网络的权重进行预训练,得到预训练后的初始神经网络。The pre-trained sample data is used to pre-train the weights of the initial neural network to obtain the pre-trained initial neural network.
  6. 根据权利要求1所述的方法,其特征在于,在将所述中间神经网络中稀疏缩放算子为零的信息流删除之后,还包括:The method according to claim 1, characterized in that, after deleting the information stream whose sparse scaling operator is zero in the intermediate neural network, further comprising:
    在与一个计算单元的连接对应的信息流均被删除后,将该计算单元删除。After all the information streams corresponding to the connection of a computing unit are deleted, the computing unit is deleted.
  7. 根据权利要求1所述的方法,其特征在于,所述采用预置的训练样本数据对所述初始 神经网络的权重和信息流的稀疏缩放算子进行训练,得到中间神经网络,包括:The method according to claim 1, characterized in that, the preset training sample data is used to train the weights of the initial neural network and the sparse scaling operator of the information flow to obtain an intermediate neural network, including:
    构建初始神经网络对应的目标函数,所述目标函数包含损失函数、权重正则函数和稀疏正则函数;Construct an objective function corresponding to the initial neural network, the objective function including a loss function, a weight regular function and a sparse regular function;
    采用所述训练样本数据对所述初始神经网络进行迭代训练;Performing iterative training on the initial neural network using the training sample data;
    当迭代训练次数达到阈值或者所述目标函数满足预置的收敛条件时,得到所述中间神经网络。When the number of iteration training times reaches a threshold or the target function meets a preset convergence condition, the intermediate neural network is obtained.
  8. 根据权利要求7所述的方法,其特征在于,所述采用所述训练样本数据对所述初始神经网络进行迭代训练,具体包括:The method according to claim 7, wherein the iterative training of the initial neural network using the training sample data specifically includes:
    对所述初始神经网络进行多次以下的迭代训练:Perform the following iterative training for the initial neural network multiple times:
    将前一次迭代训练得到的稀疏缩放算子作为所述目标函数的常量，将所述权重作为所述目标函数的变量，采用第一优化算法对所述目标函数进行优化，得到本次迭代训练的权重；taking the sparse scaling operators obtained in the previous iteration of training as constants of the objective function and the weights as variables of the objective function, and optimizing the objective function with a first optimization algorithm to obtain the weights of the current iteration of training;
    将本次迭代训练的权重作为所述目标函数的常量，将稀疏缩放算子作为所述目标函数的变量，采用第二优化算法对所述目标函数进行优化，得到本次迭代训练的稀疏缩放算子；taking the weights of the current iteration of training as constants of the objective function and the sparse scaling operators as variables of the objective function, and optimizing the objective function with a second optimization algorithm to obtain the sparse scaling operators of the current iteration of training;
    基于本次迭代训练的权重和稀疏缩放算子进行下一次迭代训练。Based on the weights of this iterative training and the sparse scaling operator for the next iterative training.
  9. 根据权利要求8所述的方法，其特征在于，所述第二优化算法为加速近端梯度下降算法、近端梯度下降算法或者交替方向乘子算法。The method according to claim 8, wherein the second optimization algorithm is an accelerated proximal gradient descent algorithm, a proximal gradient descent algorithm, or an alternating direction method of multipliers.
  10. 根据权利要求7所述的方法，其特征在于，所述目标函数为：The method according to claim 7, wherein the objective function is:
    min_{W,λ} (1/K) Σ_{i=1}^{K} L( y_i, Net(x_i, W, λ) ) + δ·||W||_2^2 + γ·||λ||_1
    其中，W为权重，λ为稀疏缩放算子向量，K为样本数据的数量，L(y_i, Net(x_i, W, λ))为神经网络在样本数据x_i上的损失，y_i为样本标签，Net(x_i, W, λ)为神经网络的输出，δ·||W||_2^2为权重正则函数，δ为权重W的参数衰减权重，γ·||λ||_1为稀疏正则函数。wherein W denotes the weights, λ denotes the vector of sparse scaling operators, K denotes the number of sample data, L(y_i, Net(x_i, W, λ)) denotes the loss of the neural network on sample data x_i, y_i denotes the sample label, Net(x_i, W, λ) denotes the output of the neural network, δ·||W||_2^2 is the weight regularization term, δ is the weight-decay coefficient of the weights W, and γ·||λ||_1 is the sparsity regularization term.
  11. 一种目标检测方法,其特征在于,包括:A target detection method, characterized in that it includes:
    获得待进行目标检测的样本数据，输入到采用深度神经网络的结构搜索方法得到的搜索结果神经网络中，以所述搜索结果神经网络的输出作为目标检测结果；其中，所述深度神经网络的结构搜索方法包括：obtaining sample data on which target detection is to be performed, inputting the sample data into a search result neural network obtained by a structure search method for a deep neural network, and taking the output of the search result neural network as the target detection result, wherein the structure search method for the deep neural network comprises:
    在预先设置的搜索空间中获得深度神经网络中依次串接的每个模块中的每层计算单元结构;所述每层计算单元结构包括至少一个计算单元;Obtaining a structure of each layer of computing units in each module connected in series in the deep neural network in a preset search space; the structure of each layer of computing units includes at least one computing unit;
    在每个模块中采用预设连接方式将各计算单元进行连接，得到每个模块中的信息流；其中，处于同一层计算单元结构的计算单元之间不进行连接，每个计算单元能够与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接；connecting the computing units in each module in a preset connection manner to obtain information flows in each module, wherein computing units within the same layer of the computing unit structure are not connected to each other, and each computing unit can be connected to the computing units of the other layers in its module as well as to the input and output of its module;
    根据模块及每个模块中的计算单元的连接情况,得到初始神经网络;According to the module and the connection of the computing unit in each module, the initial neural network is obtained;
    对所述初始神经网络中的信息流设置稀疏缩放算子,其中所述稀疏缩放算子用于对所述信息流进行缩放;Setting a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow;
    采用预置的训练样本数据对所述初始神经网络的权重和信息流的稀疏缩放算子进行训练,得到中间神经网络;Using preset training sample data to train the weight of the initial neural network and the sparse scaling operator of the information flow to obtain an intermediate neural network;
    将所述中间神经网络中稀疏缩放算子为零的信息流删除,得到搜索空间内的搜索结果神经网络。The information flow in which the sparse scaling operator is zero in the intermediate neural network is deleted to obtain a search result neural network in the search space.
  12. 一种语义分割方法,其特征在于,包括:A semantic segmentation method, which includes:
    获得待进行语义分割的样本数据,输入到采用深度神经网络的结构搜索方法得到的搜索结果神经网络中,以所述搜索结果神经网络的输出作为语义分割结果;其中,所述深度神经网络的结构搜索方法包括:Obtain the sample data to be semantically segmented, and input it into the search result neural network obtained by the structural search method of the deep neural network, and use the output of the search result neural network as the semantic segmentation result; wherein, the structure of the deep neural network Search methods include:
    在预先设置的搜索空间中获得深度神经网络中依次串接的每个模块中的每层计算单元结构;所述每层计算单元结构包括至少一个计算单元;Obtaining a structure of each layer of computing units in each module connected in series in the deep neural network in a preset search space; the structure of each layer of computing units includes at least one computing unit;
    在每个模块中采用预设连接方式将各计算单元进行连接，得到每个模块中的信息流；其中，处于同一层计算单元结构的计算单元之间不进行连接，每个计算单元能够与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接；connecting the computing units in each module in a preset connection manner to obtain information flows in each module, wherein computing units within the same layer of the computing unit structure are not connected to each other, and each computing unit can be connected to the computing units of the other layers in its module as well as to the input and output of its module;
    根据模块及每个模块中的计算单元的连接情况,得到初始神经网络;According to the module and the connection of the computing unit in each module, the initial neural network is obtained;
    对所述初始神经网络中的信息流设置稀疏缩放算子,其中所述稀疏缩放算子用于对所述信息流进行缩放;Setting a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow;
    采用预置的训练样本数据对所述初始神经网络的权重和信息流的稀疏缩放算子进行训练,得到中间神经网络;Using preset training sample data to train the weight of the initial neural network and the sparse scaling operator of the information flow to obtain an intermediate neural network;
    将所述中间神经网络中稀疏缩放算子为零的信息流删除,得到搜索空间内的搜索结果神经网络。The information flow in which the sparse scaling operator is zero in the intermediate neural network is deleted to obtain a search result neural network in the search space.
  13. 一种深度神经网络的结构搜索装置,其特征在于,包括:A structure search device for deep neural network, which is characterized by comprising:
    计算单元结构获得单元,用于在预先设置的搜索空间中获得深度神经网络中依次串接的每个模块中的每层计算单元结构;所述每层计算单元结构包括至少一个计算单元;A computing unit structure obtaining unit, configured to obtain each layer of computing unit structures in each module connected in series in the deep neural network in a preset search space; each layer of computing unit structure includes at least one computing unit;
    信息流获得单元，用于在每个模块中采用预设连接方式将各计算单元进行连接，得到每个模块中的信息流；其中，处于同一层计算单元结构的计算单元之间不进行连接，每个计算单元能够与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接；an information flow obtaining unit, configured to connect the computing units in each module in a preset connection manner to obtain information flows in each module, wherein computing units within the same layer of the computing unit structure are not connected to each other, and each computing unit can be connected to the computing units of the other layers in its module as well as to the input and output of its module;
    初始神经网络获得单元,用于根据模块及每个模块中的计算单元的连接情况,得到初始神经网络;The initial neural network obtaining unit is used to obtain the initial neural network according to the connection of the modules and the computing units in each module;
    稀疏缩放算子设置单元,用于对所述初始神经网络中的信息流设置稀疏缩放算子,其中 所述稀疏缩放算子用于对所述信息流进行缩放;A sparse scaling operator setting unit, configured to set a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow;
    权重和算子训练单元,用于采用预置的训练样本数据对所述初始神经网络的权重和信息流的稀疏缩放算子进行训练,得到中间神经网络;A weight and operator training unit, used to train the weight of the initial neural network and the sparse scaling operator of the information flow using preset training sample data to obtain an intermediate neural network;
    搜索结果获得单元,用于将所述中间神经网络中稀疏缩放算子为零的信息流删除,得到搜索空间内的搜索结果神经网络。A search result obtaining unit is used to delete the information stream whose sparse scaling operator is zero in the intermediate neural network to obtain a search result neural network in the search space.
  14. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,该程序被处理器执行时实现深度神经网络的结构搜索方法,所述方法包括:A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, a structure search method for a deep neural network is implemented, the method includes:
    在预先设置的搜索空间中获得深度神经网络中依次串接的每个模块中的每层计算单元结构;所述每层计算单元结构包括至少一个计算单元;Obtaining a structure of each layer of computing units in each module connected in series in the deep neural network in a preset search space; the structure of each layer of computing units includes at least one computing unit;
    在每个模块中采用预设连接方式将各计算单元进行连接，得到每个模块中的信息流；其中，处于同一层计算单元结构的计算单元之间不进行连接，每个计算单元能够与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接；connecting the computing units in each module in a preset connection manner to obtain information flows in each module, wherein computing units within the same layer of the computing unit structure are not connected to each other, and each computing unit can be connected to the computing units of the other layers in its module as well as to the input and output of its module;
    根据模块及每个模块中的计算单元的连接情况,得到初始神经网络;According to the module and the connection of the computing unit in each module, the initial neural network is obtained;
    对所述初始神经网络中的信息流设置稀疏缩放算子,其中所述稀疏缩放算子用于对所述信息流进行缩放;Setting a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow;
    采用预置的训练样本数据对所述初始神经网络的权重和信息流的稀疏缩放算子进行训练,得到中间神经网络;Using preset training sample data to train the weight of the initial neural network and the sparse scaling operator of the information flow to obtain an intermediate neural network;
    将所述中间神经网络中稀疏缩放算子为零的信息流删除,得到搜索空间内的搜索结果神经网络。The information flow in which the sparse scaling operator is zero in the intermediate neural network is deleted to obtain a search result neural network in the search space.
  15. 一种计算机设备，包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，其特征在于，所述处理器执行所述程序时实现深度神经网络的结构搜索方法，所述方法包括：A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements a structure search method for a deep neural network, the method comprising:
    在预先设置的搜索空间中获得深度神经网络中依次串接的每个模块中的每层计算单元结构;所述每层计算单元结构包括至少一个计算单元;Obtaining a structure of each layer of computing units in each module connected in series in the deep neural network in a preset search space; the structure of each layer of computing units includes at least one computing unit;
    在每个模块中采用预设连接方式将各计算单元进行连接，得到每个模块中的信息流；其中，处于同一层计算单元结构的计算单元之间不进行连接，每个计算单元能够与和其所在模块中的不同层的计算单元，以及其所在模块的输入和输出进行连接；connecting the computing units in each module in a preset connection manner to obtain information flows in each module, wherein computing units within the same layer of the computing unit structure are not connected to each other, and each computing unit can be connected to the computing units of the other layers in its module as well as to the input and output of its module;
    根据模块及每个模块中的计算单元的连接情况,得到初始神经网络;According to the module and the connection of the computing unit in each module, the initial neural network is obtained;
    对所述初始神经网络中的信息流设置稀疏缩放算子,其中所述稀疏缩放算子用于对所述信息流进行缩放;Setting a sparse scaling operator on the information flow in the initial neural network, wherein the sparse scaling operator is used to scale the information flow;
    采用预置的训练样本数据对所述初始神经网络的权重和信息流的稀疏缩放算子进行训练,得到中间神经网络;Using preset training sample data to train the weight of the initial neural network and the sparse scaling operator of the information flow to obtain an intermediate neural network;
    将所述中间神经网络中稀疏缩放算子为零的信息流删除,得到搜索空间内的搜索结果神经网络。The information flow in which the sparse scaling operator is zero in the intermediate neural network is deleted to obtain a search result neural network in the search space.
PCT/CN2019/077049 2018-10-26 2019-03-05 Structural search method and apparatus for deep neural network WO2020082663A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811259033.2A CN109284820A (en) 2018-10-26 2018-10-26 A kind of search structure method and device of deep neural network
CN201811259033.2 2018-10-26

Publications (1)

Publication Number Publication Date
WO2020082663A1 true WO2020082663A1 (en) 2020-04-30

Family

ID=65177420

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/077049 WO2020082663A1 (en) 2018-10-26 2019-03-05 Structural search method and apparatus for deep neural network

Country Status (2)

Country Link
CN (2) CN109284820A (en)
WO (1) WO2020082663A1 (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109284820A (en) * 2018-10-26 2019-01-29 北京图森未来科技有限公司 A kind of search structure method and device of deep neural network
CN109919304B (en) * 2019-03-04 2021-07-02 腾讯科技(深圳)有限公司 Image processing method, image processing device, readable storage medium and computer equipment
CN109948795B (en) * 2019-03-11 2021-12-14 驭势科技(北京)有限公司 Method and device for determining network structure precision and delay optimization point
CN109978142B (en) * 2019-03-29 2022-11-29 腾讯科技(深圳)有限公司 Neural network model compression method and device
CN110276442B (en) * 2019-05-24 2022-05-17 西安电子科技大学 Searching method and device of neural network architecture
CN110197258B (en) * 2019-05-29 2021-10-29 北京市商汤科技开发有限公司 Neural network searching method, image processing device, neural network searching apparatus, image processing apparatus, and recording medium
WO2020237688A1 (en) * 2019-05-31 2020-12-03 深圳市大疆创新科技有限公司 Method and device for searching network structure, computer storage medium and computer program product
WO2020237687A1 (en) * 2019-05-31 2020-12-03 深圳市大疆创新科技有限公司 Network architecture search method and apparatus, computer storage medium and computer program product
CN112215332B (en) * 2019-07-12 2024-05-14 华为技术有限公司 Searching method, image processing method and device for neural network structure
CN112308200B (en) * 2019-07-30 2024-04-26 华为技术有限公司 Searching method and device for neural network
CN110473195B (en) * 2019-08-13 2023-04-18 中山大学 Medical focus detection framework and method capable of being customized automatically
CN110490323A (en) * 2019-08-20 2019-11-22 腾讯科技(深圳)有限公司 Network model compression method, device, storage medium and computer equipment
CN110428046B (en) * 2019-08-28 2023-12-15 腾讯科技(深圳)有限公司 Method and device for acquiring neural network structure and storage medium
WO2021057690A1 (en) * 2019-09-24 2021-04-01 华为技术有限公司 Neural network building method and device, and image processing method and device
CN110751267B (en) * 2019-09-30 2021-03-30 京东城市(北京)数字科技有限公司 Neural network structure searching method, training method, device and storage medium
CN110826696B (en) * 2019-10-30 2023-06-27 北京百度网讯科技有限公司 Super-network search space construction method and device and electronic equipment
CN111160515B (en) * 2019-12-09 2023-03-21 中山大学 Running time prediction method, model search method and system
CN111191785B (en) * 2019-12-20 2023-06-23 沈阳雅译网络技术有限公司 Structure searching method based on expansion search space for named entity recognition
CN111090673B (en) * 2019-12-20 2023-04-18 北京百度网讯科技有限公司 Cache unit searching method and related equipment
CN111401516B (en) * 2020-02-21 2024-04-26 华为云计算技术有限公司 Searching method for neural network channel parameters and related equipment
CN113361680B (en) * 2020-03-05 2024-04-12 华为云计算技术有限公司 Neural network architecture searching method, device, equipment and medium
CN111797983A (en) * 2020-05-25 2020-10-20 华为技术有限公司 Neural network construction method and device
CN111667057B (en) * 2020-06-05 2023-10-20 北京百度网讯科技有限公司 Method and apparatus for searching model structures
CN111714124B (en) * 2020-06-18 2023-11-03 中国科学院深圳先进技术研究院 Magnetic resonance film imaging method, device, imaging equipment and storage medium
CN111767985B (en) * 2020-06-19 2022-07-22 深圳市商汤科技有限公司 Neural network training method, video identification method and device
CN113379034B (en) * 2021-06-15 2023-10-20 南京大学 Neural network structure optimization method based on network structure search technology

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372721A (en) * 2016-08-29 2017-02-01 中国传媒大学 Large-scale nerve network 3D visualization method
CN107247991A (en) * 2017-06-15 2017-10-13 北京图森未来科技有限公司 A kind of method and device for building neutral net
CN107316079A (en) * 2017-08-08 2017-11-03 珠海习悦信息技术有限公司 Processing method, device, storage medium and the processor of terminal convolutional neural networks
CN107480774A (en) * 2017-08-11 2017-12-15 山东师范大学 Dynamic neural network model training method and device based on integrated study
CN109284820A (en) * 2018-10-26 2019-01-29 北京图森未来科技有限公司 A kind of search structure method and device of deep neural network

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743168A (en) * 2020-05-29 2021-12-03 北京机械设备研究所 Urban flyer identification method based on micro-depth neural network search
CN113743168B (en) * 2020-05-29 2023-10-13 北京机械设备研究所 Urban flyer identification method based on micro-depth neural network search
CN111738418A (en) * 2020-06-19 2020-10-02 北京百度网讯科技有限公司 Training method and device for hyper network
CN111753964A (en) * 2020-06-29 2020-10-09 北京百度网讯科技有限公司 Neural network training method and device
CN112100466A (en) * 2020-09-25 2020-12-18 北京百度网讯科技有限公司 Method, device and equipment for generating search space and storage medium
CN112528123A (en) * 2020-12-18 2021-03-19 北京百度网讯科技有限公司 Model searching method, model searching apparatus, electronic device, storage medium, and program product
CN112560985A (en) * 2020-12-25 2021-03-26 北京百度网讯科技有限公司 Neural network searching method and device and electronic equipment
CN112560985B (en) * 2020-12-25 2024-01-12 北京百度网讯科技有限公司 Neural network searching method and device and electronic equipment
CN112668702A (en) * 2021-01-15 2021-04-16 北京格灵深瞳信息技术股份有限公司 Fixed-point parameter optimization method, system, terminal and storage medium
CN112668702B (en) * 2021-01-15 2023-09-19 北京格灵深瞳信息技术股份有限公司 Fixed-point parameter optimization method, system, terminal and storage medium
CN112966812A (en) * 2021-02-25 2021-06-15 中国人民解放军战略支援部队航天工程大学 Automatic neural network structure searching method for communication signal modulation recognition
CN113326922A (en) * 2021-05-31 2021-08-31 北京市商汤科技开发有限公司 Neural network generation method and device, electronic equipment and storage medium
CN113326922B (en) * 2021-05-31 2023-06-13 北京市商汤科技开发有限公司 Neural network generation method and device, electronic equipment and storage medium
CN113469010A (en) * 2021-06-25 2021-10-01 中国科学技术大学 NOx concentration real-time estimation method based on diesel vehicle black smoke image and storage medium
CN113469010B (en) * 2021-06-25 2024-04-02 中国科学技术大学 NOx concentration real-time estimation method based on black smoke image of diesel vehicle and storage medium

Also Published As

Publication number Publication date
CN109284820A (en) 2019-01-29
CN110717586A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
WO2020082663A1 (en) Structural search method and apparatus for deep neural network
WO2018227800A1 (en) Neural network training method and device
Goodfellow et al. Deep feedforward networks
US11966837B2 (en) Compression of deep neural networks
CN104346629B (en) A kind of model parameter training method, apparatus and system
US20180322383A1 (en) Storage controller accelaration for neural network training and inference
CN112231489B (en) Knowledge learning and transferring method and system for epidemic prevention robot
CN109120462A (en) Prediction technique, device and the readable storage medium storing program for executing of opportunistic network link
US11416743B2 (en) Swarm fair deep reinforcement learning
WO2018227801A1 (en) Method and device for building neural network
CN111602148A (en) Regularized neural network architecture search
CN106953862A (en) The cognitive method and device and sensor model training method and device of network safety situation
JP2017211799A (en) Information processing device and information processing method
CN106326346A (en) Text classification method and terminal device
Ettaouil et al. Architecture optimization model for the multilayer perceptron and clustering.
US20230222325A1 (en) Binary neural network model training method and system, and image processing method and system
US11853896B2 (en) Neural network model, method, electronic device, and readable medium
JP2020027399A (en) Computer system
CN113077237B (en) Course arrangement method and system for self-adaptive hybrid algorithm
CN115809340A (en) Entity updating method and system of knowledge graph
Cheng et al. Swiftnet: Using graph propagation as meta-knowledge to search highly representative neural architectures
US11488007B2 (en) Building of custom convolution filter for a neural network using an automated evolutionary process
Duggal et al. High Performance SqueezeNext for CIFAR-10
JP6993250B2 (en) Content feature extractor, method, and program
Zhan et al. Relationship explainable multi-objective reinforcement learning with semantic explainability generation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19876020

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 160821)

122 Ep: pct application non-entry in european phase

Ref document number: 19876020

Country of ref document: EP

Kind code of ref document: A1