CN112069370A - Neural network structure search method, apparatus, medium, and device - Google Patents


Info

Publication number
CN112069370A
Authority
CN
China
Prior art keywords
neural network
block
layer
blocks
sample data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910503236.XA
Other languages
Chinese (zh)
Inventor
方杰民 (Jiemin Fang)
张骞 (Qian Zhang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority to CN201910503236.XA
Publication of CN112069370A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 — Details of database functions independent of the retrieved data types
    • G06F16/903 — Querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

A neural network structure search method, apparatus, medium, and device are disclosed. The neural network structure search method comprises the following steps: acquiring a first neural network comprising a plurality of blocks with different channel numbers, wherein at least one block in the first neural network is connected with at least three blocks, and at least one block comprises a head layer for performing conversion of block channel number and spatial resolution; performing neural network structure search processing on the first neural network by using a gradient-based search strategy according to sample data in a first data set, so as to obtain structure parameters of the first neural network; and determining the structure of a second neural network obtained by the search according to the structure parameters of the first neural network. The technical solution provided by the present disclosure helps improve the flexibility of neural network structure search, and thus helps improve the performance and diversity of the neural network obtained by the search.

Description

Neural network structure search method, apparatus, medium, and device
Technical Field
The present disclosure relates to computer vision technologies, and in particular, to a neural network structure search method, a neural network structure search device, a storage medium, and an electronic device.
Background
In the field of computer vision technology, designing a neural network structure and adjusting each parameter in the neural network with the designed structure often requires a large amount of labor cost, computational cost and time cost.
How to design the structure of the neural network conveniently is a technical problem of great concern.
Disclosure of Invention
The present disclosure is proposed to solve the above technical problems. Embodiments of the present disclosure provide a neural network structure search method, a neural network structure search apparatus, a storage medium, and an electronic device.
According to an aspect of the embodiments of the present disclosure, there is provided a neural network structure search method, including: acquiring a first neural network comprising a plurality of blocks with different channel numbers, wherein at least one block in the first neural network is connected with at least three blocks, and at least one block comprises a head layer for performing conversion of block channel number and spatial resolution; performing neural network structure search processing on the first neural network by using a gradient-based search strategy according to sample data in a first data set, so as to obtain structure parameters of the first neural network; and determining the structure of a second neural network obtained by the search according to the structure parameters of the first neural network.
According to another aspect of the embodiments of the present disclosure, there is provided a neural network structure search apparatus, including: an obtaining module, configured to obtain a first neural network comprising a plurality of blocks with different channel numbers, where at least one block in the first neural network is connected with at least three blocks, and at least one block comprises a head layer for performing conversion of block channel number and spatial resolution; a searching module, configured to perform, according to sample data in a first data set, neural network structure search processing on the first neural network obtained by the obtaining module by using a gradient-based search strategy, so as to obtain the structure parameters of the first neural network; and a determining module, configured to determine the structure of the second neural network obtained by the search according to the structure parameters of the first neural network obtained by the searching module.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the above neural network structure searching method.
According to still another aspect of an embodiment of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; the processor is used for reading the executable instruction from the memory and executing the instruction to realize the neural network structure searching method.
According to the neural network structure search method and apparatus provided by the above embodiments of the present disclosure, since a block can be connected with at least three blocks, the first neural network of the present disclosure can be considered a search space based on dense connection. Since a block includes a head layer for performing block channel number conversion, the blocks connected to each other may have different channel numbers; the present disclosure therefore allows the first neural network to include a plurality of blocks with different channel numbers. By performing the neural network structure search processing in a first neural network that includes blocks with different channel numbers, not only the depth search of the neural network structure but also the channel number search of the neural network structure can be realized. Therefore, the technical solution provided by the present disclosure helps improve the flexibility of neural network structure search, which in turn helps improve the performance and diversity of the neural network obtained by the search.
The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of one example of a neural network structure search method of the present disclosure;
FIG. 2 is a schematic structural diagram of one example of a first neural network of the present disclosure;
fig. 3 is a schematic structural diagram of an example of an MBConv structure included in a parallel layer according to the present disclosure;
FIG. 4 is a schematic structural diagram of an example of MBConv structure contained in stacked layers of the present disclosure;
FIG. 5 is a schematic block diagram illustrating an embodiment of a variable packet based convolution module according to the present disclosure;
fig. 6 is a schematic structural diagram of an example of the neural network structure search apparatus of the present disclosure;
fig. 7 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning, nor to imply any necessary logical order between them.
It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more than two and "at least one" may refer to one, two or more than two.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.
In addition, the term "and/or" in the present disclosure merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Embodiments of the present disclosure may be implemented in electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with an electronic device, such as a terminal device, computer system, or server, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment. In a distributed cloud computing environment, tasks may be performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Summary of the disclosure
In implementing the present disclosure, the inventors found that performing a neural network structure search (Neural Architecture Search, NAS) process in a search space (Search Space) to obtain the structure of a neural network has become an important issue in computer vision technology.
In the existing neural network structure search technology, a search space is generally represented by a super network (Super Network). The super network includes a plurality of blocks (Blocks), and the blocks are usually connected in a sequential, single-connection manner, that is, one block is usually connected to one upstream block and one downstream block. A block typically includes multiple layers. The layers that the target neural network may include are obtained by searching in the super network. However, this approach can only realize a search over the depth of the neural network, not over its width, which may limit the structure of the target neural network obtained by searching in the super network.
Exemplary overview
It is assumed that the target neural network to be obtained includes a plurality of blocks connected in sequence, and the number of channels and the spatial resolution of all the blocks included in the target neural network need to be obtained by searching. In an environment with low cost, the neural network structure searching technology provided by the disclosure can be used for quickly obtaining the target neural network from the super network. For example, in a computing environment including four GPUs (Graphics Processing units), it is possible to realize a wide search and a deep search of a target neural network at a time cost of only several tens of hours, and finally obtain the target neural network.
Exemplary method
Fig. 1 is a flowchart of a neural network structure search method of the present disclosure. As shown in fig. 1, the method of this embodiment includes the steps of: s100, S101 and S102. The following describes each step.
S100, a first neural network comprising a plurality of blocks with different channel numbers is obtained.
The first neural network in this disclosure may be referred to as a search space or a super network. The first neural network includes: a plurality of blocks. At least one block in the first neural network is connected to a plurality of blocks, for example, one block in the first neural network is connected to at least three blocks. For another example, all blocks except the predetermined block in the first neural network are connected with at least three blocks. The predetermined block may include: two blocks at the head of the first neural network and two blocks at the tail of the first neural network, etc. The first neural network in the present disclosure may be referred to as a dense connection-based super network. The number of channels of all blocks comprised by the first neural network may be different.
At least one block (e.g., every block) in the first neural network of the present disclosure includes: a head layer and at least one stacked layer. The head layer may be considered the first layer in the block. The head layer is used to perform conversion of block channel number and spatial resolution, so that the layers located after the head layer in the block can process the outputs of all upstream blocks connected to the block.
The channel number of a block in this disclosure may be referred to as the width of the block; it may be considered the number of input channels and the number of output channels corresponding to each layer located after the head layer in the block. The spatial resolution of a block may be referred to as the length and height of the block; it may be considered the spatial resolution corresponding to the layers located after the head layer in the block. For example, the channel number and the length and height of the input feature map of each layer located after the head layer may be regarded as the channel number and the length and height of the block. It follows that the layers located after the head layer in a block of the present disclosure correspond to the same channel number and spatial resolution.
S101, according to sample data in the first data set, performing neural network structure search processing on the first neural network by using a gradient-based search strategy, so as to obtain the structure parameters of the first neural network.
The gradient-based search strategy in this disclosure can be thought of as: taking the derivative of the loss function and making the loss gradually approach its minimum value. The present disclosure may provide the sample data in the first data set to the first neural network, obtain the processing result output by the first neural network, and perform a back propagation operation, according to the difference between the processing result and the sample data and the derivative of the loss function, so as to adjust the structure parameters of the first neural network. The structure parameters of the first neural network in the present disclosure may refer to the parameters set for the structure of the first neural network; they generally embody the likelihood that the corresponding structure in the first neural network becomes a structure in the second neural network.
S102, determining a second neural network structure obtained through searching according to the structural parameters of the first neural network.
The second neural network in this disclosure may be referred to as the target neural network, i.e., the neural network obtained by performing the neural network structure search in the search space. The second neural network may be considered a substructure of the first neural network.
Since a block in the present disclosure may be connected with at least three blocks (for example, one block may be connected with at least three upstream blocks), the first neural network of the present disclosure may be considered a densely-connected search space, or a densely-connected super network. By providing within each block a head layer for performing block channel number conversion, the present disclosure allows the upstream blocks connected to one block to have different channel numbers; therefore, all upstream blocks of one block in the first neural network may include a plurality of blocks with different channel numbers. By performing the neural network structure search processing in this densely-connected first neural network, not only the depth search of the neural network but also the channel number search of the neural network can be realized. Therefore, the technical solution provided by the present disclosure helps improve the flexibility of neural network structure search, which in turn helps improve the performance and diversity of the neural network obtained by the search.
In one optional example, the manner in which the present disclosure obtains the first neural network includes, but is not limited to: and obtaining the first neural network in a data import mode or generating the first neural network according to preset information. The preset information may be considered as: input parameters of a software program for generating the first neural network. The preset information may include, but is not limited to: the number of channels in each block, and the number of upstream blocks connected to the block. In addition, the preset information may further include: the spatial resolution of each block, the number of stacked layers included in the block, and the like.
In an alternative example, the process of the present disclosure to obtain a first neural network comprising a plurality of blocks with different channel numbers may comprise the following two steps:
step 1, acquiring the number of channels of all blocks required by the first neural network and the number of upstream/downstream blocks connected with the blocks.
Optionally, the number of channels of the block of the present disclosure may be considered as: the number of input channels and the number of output channels corresponding to each layer in the block located behind the head layer. The spatial resolution of a block may be referred to as the length and height of the block, and the length and height of the block may be equal, although this disclosure does not exclude the case where the length and height are unequal. The spatial resolution of a block may be considered to be the spatial resolution corresponding to the layers in the block that are located after the head layer. For example, the number of channels and the length and height of the input feature map of each layer in the block located after the head layer may be considered as the number of channels and the length and height of the block. Each layer in the block of the present disclosure that is located after the head layer corresponds to the same number of channels and spatial resolution.
Optionally, the channel numbers of all blocks in this disclosure are different, i.e., not all blocks in the present disclosure have the same channel number. For example, one part of the blocks has a first channel number, another part has a second channel number, and the first channel number differs from the second channel number. For another example, no two blocks have the same channel number. That is, assuming that the number of all blocks required for the first neural network is N, among the channel numbers of the N blocks there exist M different channel numbers, where N and M are positive integers greater than 2, and M is less than or equal to N. The block set Arch formed by the N blocks in the present disclosure can be expressed as: Arch = {B_1, B_2, B_3, ..., B_N}.
Optionally, for the ith block B_i among all blocks, the number of upstream blocks connected to a block in the present disclosure refers to: the number of blocks located before, adjacent to, and connected with the ith block. This applies to all blocks other than the predetermined blocks located at the head of the first neural network (for example, the blocks other than the first two blocks), where the number of predetermined blocks depends on the number of upstream blocks (e.g., it is the number of upstream blocks minus one). Accordingly, the number of downstream blocks connected to a block refers to: the number of blocks located after, adjacent to, and connected with the ith block, which applies to all blocks other than the predetermined blocks located at the tail of the first neural network (for example, the blocks other than the last two blocks), the number of predetermined blocks depending on the number of downstream blocks (e.g., it is the number of downstream blocks minus one). When the number of upstream blocks connected to a block is given, the number of upstream blocks of each block other than the first x blocks in the first neural network is the same, where x is the number of upstream blocks minus one. Accordingly, when the number of downstream blocks connected to a block is given, the number of downstream blocks of each block other than the last x blocks in the first neural network is the same, where x is the number of downstream blocks minus one. The present disclosure can set the channel numbers of all blocks required for the first neural network and the number of upstream/downstream blocks connected to the blocks according to actual requirements; the present disclosure is not limited in this respect.
It should be particularly noted that, when the blocks are sorted by channel number, a plurality of adjacent blocks may share the same channel number increment. For example, multiple channel number increments may be preset; after all blocks are sorted by channel number, they may be divided into multiple groups according to the channel number increments, where all blocks in one group correspond to the same channel number increment and different groups correspond to different channel number increments. Optionally, all blocks in the same group generally have the same spatial resolution, and any two blocks in the same group generally have different channel numbers.
Step 2, determining the connection relation of all blocks required by the first neural network according to the number of upstream blocks and the ascending order of the block channel numbers.
Optionally, the present disclosure may sort all the blocks in ascending order of their channel numbers, determine the upstream/downstream blocks of each block using the number of upstream/downstream blocks, and form the block connection architecture of the first neural network by determining the connection relationship between each block and its corresponding upstream/downstream blocks.
Optionally, it is assumed that the number of all blocks required by the first neural network in the present disclosure is n+1, where n is a positive integer greater than 5, and the channel numbers of the n+1 blocks are: C_0, C_0+c, C_0+2c, C_0+3c, C_0+4c, C_0+5c, ..., C_0+nc, where C_0 is a positive integer greater than 1 (e.g., C_0 = 3). Assuming that the number of upstream/downstream blocks connected to a block is 4, one example of the block connection architecture of the first neural network formed by the present disclosure is shown in fig. 2.
The present disclosure obtains the channel numbers of all blocks and the number of upstream/downstream blocks connected to each block, and determines the connection relation of all blocks according to the number of upstream/downstream blocks and the ascending order of block channel numbers. This provides a feasible implementation for forming the first neural network and helps reduce the computation consumed by the neural network structure search processing; a toy sketch of the connection rule follows.
In an alternative example, the present disclosure may divide the formed first neural network into a plurality of stages according to the spatial resolutions and the channel number increments of the blocks, where one stage may include one block or a plurality of blocks. The spatial resolutions of all blocks in a stage are typically the same, while their channel numbers are typically different. The channel number increment is usually the same for all blocks in a stage, i.e., each stage has one channel number increment. A stage in the first neural network may be considered one of the aforementioned groups. In the case where each stage of the first neural network has a channel number increment, for two adjacent stages in the first neural network, the channel number increment of the preceding stage is generally smaller than that of the following stage. Optionally, for the ith block, if the (i-1)th block and the ith block are adjacent and connected, with the (i-1)th block being an upstream block of the ith block, the ratio of the length of the ith block to the length of the (i-1)th block is usually not greater than 2, and the ratio of their heights is likewise usually not greater than 2.
In an alternative example, among three sequentially connected blocks in the first neural network, the channel number increment of the first block relative to the middle block is typically no greater than that of the middle block relative to the last block. For example, if the three sequentially connected blocks belong to the same stage, the channel number increment of the first block relative to the middle block is generally equal to that of the middle block relative to the last block. For another example, if the three sequentially connected blocks span two stages, the channel number increment of the first block relative to the middle block is smaller than that of the middle block relative to the last block.
By making the channel number increment between two blocks arranged earlier in the first neural network smaller than that between two blocks arranged later, the computation consumed by the neural network structure search in the first neural network can be reduced.
In an alternative example, when forming the first neural network, the present disclosure also needs to set a head layer in each block and the layers after the head layer. The head layer in the present disclosure may be composed of a plurality of parallel layers, and the number of parallel layers included in the head layer is generally the same as the preset number of upstream/downstream blocks. Parallel layers in this disclosure are layers arranged side by side, i.e., there is no upstream-downstream relationship between the parallel layers in a head layer. Each parallel layer corresponds to one upstream block that is adjacent to and connected with the block in which the parallel layer is located; different parallel layers in a block correspond to different such upstream blocks, and all parallel layers in a block together correspond to all upstream blocks adjacent to and connected with the block. That is, a block processes the output information (e.g., feature maps) of each upstream block through its head layer, so that each layer located after the head layer in the block can process the output information of each upstream block.
Optionally, for the ith block in the first neural network, the present disclosure may determine the number of parallel layers included in the head layer of the ith block according to the preset number of upstream/downstream blocks, and determine the number of input channels and input spatial resolution, as well as the number of output channels and output spatial resolution, corresponding to each parallel layer in the ith block according to the channel number and spatial resolution of the ith block and of each upstream block adjacent to and connected with the ith block. That is, the number of input channels and input spatial resolution of any parallel layer in the ith block typically depend on the channel number and spatial resolution of the upstream block corresponding to that parallel layer, while its number of output channels and output spatial resolution typically depend on the channel number and spatial resolution of the ith block. The number of input channels corresponding to a parallel layer may be considered the channel number of the input information (e.g., a feature map) received by the parallel layer, and the spatial resolution corresponding to the parallel layer may be considered the spatial resolution of that input information.
As an example, assume that the number of upstream blocks of the ith block B_i is a, and the a upstream blocks are denoted B_{i-1}, B_{i-2}, ..., B_{i-a}. The channel number and spatial resolution of B_i are denoted C_i and H_i × W_i; the channel number and spatial resolution of B_{i-1} are denoted C_{i-1} and H_{i-1} × W_{i-1}; the channel number and spatial resolution of B_{i-2} are denoted C_{i-2} and H_{i-2} × W_{i-2}; and by analogy, the channel number and spatial resolution of B_{i-a} are denoted C_{i-a} and H_{i-a} × W_{i-a}. Under these assumed conditions, the head layer of B_i in the present disclosure may comprise a parallel layers. The first parallel layer corresponds to B_{i-1} (e.g., is connected with B_{i-1}) and converts the C_{i-1} × H_{i-1} × W_{i-1}-based feature map output by B_{i-1} into a C_i × H_i × W_i-based feature map; the second parallel layer corresponds to B_{i-2} and converts the C_{i-2} × H_{i-2} × W_{i-2}-based feature map output by B_{i-2} into a C_i × H_i × W_i-based feature map; and so on, the a-th parallel layer corresponds to B_{i-a} and converts the C_{i-a} × H_{i-a} × W_{i-a}-based feature map output by B_{i-a} into a C_i × H_i × W_i-based feature map.
By arranging a plurality of parallel layers in the head layer of a block, making each parallel layer correspond to one upstream block adjacent to and connected with the block, and converting the information output by those upstream blocks with different channel numbers and spatial resolutions into the channel number and spatial resolution of the block, the first neural network can be formed from a plurality of blocks with different channel numbers and spatial resolutions.
In one optional example, the operations performed by the parallel layers in the head layer of a block in the present disclosure may be set according to a preset candidate operation set. For example, the present disclosure may set the operations included in each parallel layer in the head layer according to all candidate operations in the preset candidate operation set. The present disclosure may set the output of any parallel layer in the ith block in the first neural network as follows: first, a calculation is performed according to the operation weight of each candidate operation in the parallel layer and the output of each candidate operation, obtaining a calculation result; then, the calculation result is processed (e.g., convolution processing), and the processed result is used as the output of the parallel layer. The channel number of the processing result is the same as that of the block in which the parallel layer is located. That is, in processing the calculation result, the channel number and spatial resolution of the calculation result are converted into those of the block in which the parallel layer is located.
In one optional example, the present disclosure is pre-provisioned with a candidate operation set, which may refer to: all candidate operations that may be involved in the parallel layers and the layers following the head layer. The parallel layers and the number of stacked layers specifically included in each block of the second neural network obtained by the neural network structure search, as well as the operations performed by those parallel layers and stacked layers, are generally determined by the candidate operations finally selected from the candidate operation set.
Optionally, the parallel layers in the present disclosure may employ, but are not limited to: MBConv structure. In the case where the MBConv structure is adopted for the parallel layer, an example of the MBConv structure included in the parallel layer in the present disclosure is shown in fig. 3.
The structure in the parallel layer shown in fig. 3 has C input channels and C′ output channels, where C and C′ may be different. The structure comprises 3 parts. The leftmost trapezoid box represents a convolution operation with a convolution kernel size of 1 × 1 followed by a ReLU6 activation function, with C input channels and tC output channels, where t is the expansion coefficient. The middle rectangular box represents a depthwise separable convolution with stride 2 and convolution kernel size k × k followed by a ReLU6 activation function, for which both the number of input channels and the number of output channels are tC. The rightmost trapezoid box represents a convolution operation with a convolution kernel size of 1 × 1, with tC input channels and C′ output channels. Here, t and k × k depend on the candidate operation.
Alternatively, an example of all candidate operations included in the candidate operation set in the present disclosure may be shown in the following table:

TABLE 1

MBConv_k3e3   MBConv_k3e6
MBConv_k5e3   MBConv_k5e6
MBConv_k7e3   MBConv_k7e6
skip
MBConv_k3e3 in Table 1 represents an MBConv convolution operation with a convolution kernel size of 3 × 3 and an expansion coefficient of 3; MBConv_k3e6 represents an MBConv convolution operation with a convolution kernel size of 3 × 3 and an expansion coefficient of 6; and so on for the remaining entries. skip indicates a skip, that is, the candidate operation does not process its input, and the output of the candidate operation can be considered to be its input.
For any parallel layer in the head layer of any block, the output of that parallel layer can be expressed as:

x_{out}^{l} = \sum_{o \in O} \bar{\alpha}_{o}^{l} \cdot o\big(x_{in}^{l}\big) \qquad (1)

In the above formula (1), x_{out}^{l} represents the output of the l-th parallel layer in the head layer of the block; o\big(x_{in}^{l}\big) represents the result of the o-th candidate operation in the l-th parallel layer processing the input x_{in}^{l} of the l-th parallel layer; and \bar{\alpha}_{o}^{l} represents the operation weight of the o-th candidate operation in the l-th parallel layer in the head layer of the block.
Alternatively, \bar{\alpha}_{o}^{l} in this disclosure is usually the normalized operation weight. The present disclosure may obtain the normalized operation weight of the o-th candidate operation in the l-th parallel layer using the following formula:

\bar{\alpha}_{o}^{l} = \frac{\exp(\alpha_{o}^{l})}{\sum_{o' \in O} \exp(\alpha_{o'}^{l})} \qquad (2)

In the above formula (2), \bar{\alpha}_{o}^{l} represents the operation weight (i.e., the operation weight after normalization processing) of the o-th candidate operation in the l-th parallel layer in the head layer; \alpha_{o}^{l} represents the operation weight before normalization processing of the o-th candidate operation in the l-th parallel layer in the head layer; O represents the candidate operation set; and \alpha_{o'}^{l} represents the operation weight before normalization processing of the o'-th candidate operation in the l-th parallel layer in the head layer.
The present disclosure fuses the outputs of the candidate operations in a parallel layer using their operation weights, so that the processing result of the parallel layer can fully reflect all the candidate operations in the parallel layer and their influence on that result. This makes it possible to adjust the operation weights of all candidate operations in the parallel layer using the loss function, which in turn helps search the structure of the second neural network from the first neural network.
In an alternative example, for any block (e.g., the ith block) in the first neural network, since the head layer includes a plurality of parallel layers, the head layer needs to fuse the outputs of the plurality of parallel layers and provide the fused result, as the output of the head layer, to the first stacked layer located after the head layer. In fusing the outputs of all its parallel layers, the head layer may take into account the weights of the blocks connected to the parallel layers; the weights of the blocks in this disclosure may be referred to as block connection weights. That is, the present disclosure may perform a calculation (for example, a weighted average) based on the block connection weight corresponding to each parallel layer in the head layer of the ith block and the output of each parallel layer. The obtained result is used as the output of the head layer of the ith block, which in turn may be used as the input of the first stacked layer located after the head layer in the ith block.
Alternatively, the operation performed by the head layer of the ith block in the present disclosure may be represented by the following formula:

x_{i} = \sum_{k=1}^{m'} \bar{p}_{i-k,k} \cdot H_{k}^{i}(x_{i-k}) \qquad (3)

In the above formula (3), x_{i} represents the output information of the head layer of the ith block; m' represents the number of upstream blocks connected to the ith block; k = 1, ..., m'; \bar{p}_{i-k,k} represents the block connection weight of the (i-k)-th upstream block connected to the ith block; and H_{k}^{i}(x_{i-k}) represents the processing result, by the corresponding parallel layer in the head layer of the ith block, of the output information of the (i-k)-th upstream block, i.e., the information output by that parallel layer.
Alternatively, \bar{p}_{i-k,k} in this disclosure is typically a normalized block connection weight. Assuming that any upstream block connected to the ith block can be denoted as the jth block, the present disclosure may obtain the normalized block connection weight of the jth block using the following formula:

\bar{p}_{ij} = \frac{\exp(\beta_{ij})}{\sum_{k=1}^{m} \exp(\beta_{ik})} \qquad (4)

In the above formula (4), \bar{p}_{ij} represents the block connection weight, after normalization processing, of the jth block for the connection between the jth block and the ith block when the jth block is an upstream block of the ith block; m represents the number of upstream blocks connected to the ith block; k = 1, ..., m; \beta_{ij} represents the block connection weight, before normalization processing, of the jth block for the connection between the jth block and the ith block when the jth block is an upstream block of the ith block; and \beta_{ik} represents the block connection weight, before normalization processing, of the kth block for the connection between the kth block and the ith block when the kth block is an upstream block of the ith block.
According to the present disclosure, the outputs of the parallel layers in the head layer of a block are fused using the block connection weights. On the one hand, each stacked layer in the block can thus process the output information of upstream blocks with different channel numbers and spatial resolutions; on the other hand, the processing result of the stacked layers can fully reflect the output information of a plurality of upstream blocks with different block connection weights and its influence on that result. This makes it possible to adjust the block connection weights using the loss function, which in turn helps search the structure of the second neural network from the first neural network.
In one optional example, any block in the present disclosure may include a plurality of stacked layers, i.e., the layers in the block located after the head layer. The stacked layers may be used to process the information output by all upstream blocks that are adjacent to and connected with the block in which they are located. All the stacked layers included in one block are sequentially connected; the input of the first stacked layer is the output of the head layer of the block, and the output of the last stacked layer may be regarded as the output of the block.
In an alternative example, for the ith block in the first neural network, the present disclosure may determine, according to the number of channels and the spatial resolution of the ith block, the number of input channels and the spatial resolution and the number of output channels and the spatial resolution that each correspond to each of stacked layers sequentially located after the head layer in the ith block. For example, the number of input channels and the number of output channels corresponding to each stacked layer in the ith block are both the number of channels of the ith block, and the input spatial resolution and the output spatial resolution corresponding to each stacked layer in the ith block are both the spatial resolution of the ith block. The number of stacked layers included in the ith block may be set according to actual requirements, for example, the number of stacked layers included in the ith block is usually not less than 3. In addition, the number of stacked layers contained by all blocks in the first neural network is generally the same. Of course, this disclosure does not exclude the case where all blocks in the first neural network contain different numbers of stacked layers.
According to the present disclosure, the formats of the input and output information of the stacked layers in the ith block are set according to the channel number and spatial resolution of the ith block. This normalizes the operations performed by each block in the first neural network as well as the structure of each block, which facilitates quickly forming the first neural network and improves its maintainability.
Alternatively, the present disclosure may set operations included in each stack layer sequentially located after the head layer in each block according to all candidate operations in the preset candidate operation set. An example of all the candidate operations is shown in table 1 above. For any stacked layer in a block, the manner in which the present disclosure sets the output of that stacked layer may include: performing calculation (for example, weighted average calculation) according to the operation weight for each candidate operation in the stack layer and the output of each candidate operation to obtain a calculation result; thereafter, the calculation result may be processed (e.g., convolution processing, etc.), and the processed result may be output as the stack layer.
Alternatively, the operation weight of any candidate operation in the stack layer may represent the likelihood that the candidate operation in the stack layer is selected to be a structure in the second neural network.
For any stacked layer in any block, the output of the stacked layer can be expressed as:

x_{l+1} = \sum_{o \in O} \bar{\alpha}_{o}^{l} \cdot o(x_{l}) \qquad (5)

In the above formula (5), x_{l+1} represents the output of the l-th stacked layer in the block, which can also be considered the input of the (l+1)-th layer in the block; o(x_{l}) represents the result of the o-th candidate operation in the l-th stacked layer processing the output x_{l} of the (l-1)-th layer in the block (which may be a stacked layer or the head layer); and \bar{\alpha}_{o}^{l} represents the operation weight of the o-th candidate operation in the l-th stacked layer in the block.
Alternatively, \bar{\alpha}_{o}^{l} in this disclosure is usually the normalized operation weight. The present disclosure may obtain the normalized operation weight of the o-th candidate operation in the l-th stacked layer using the following formula:

\bar{\alpha}_{o}^{l} = \frac{\exp(\alpha_{o}^{l})}{\sum_{o' \in O} \exp(\alpha_{o'}^{l})} \qquad (6)

In the above formula (6), \bar{\alpha}_{o}^{l} represents the operation weight (after normalization processing) of the o-th candidate operation in the l-th stacked layer in the block; \alpha_{o}^{l} represents the operation weight before normalization processing of the o-th candidate operation in the l-th stacked layer in the block; O represents the candidate operation set; and \alpha_{o'}^{l} represents the operation weight before normalization processing of the o'-th candidate operation in the l-th stacked layer in the block.
Alternatively, the stacked layers of the present disclosure may employ, but are not limited to, an MBConv structure. In the case where the stacked layer employs the MBConv structure, an example of the MBConv structure included in the stacked layer is shown in fig. 4, which includes three parts. The leftmost trapezoid box represents a convolution operation with a convolution kernel size of 1 × 1 followed by a ReLU6 activation function. The middle rectangular box represents a depthwise separable convolution with a convolution kernel size of k × k followed by a ReLU6 activation function. The rightmost trapezoid box represents a convolution operation with a convolution kernel size of 1 × 1. The number of input channels and the number of output channels in fig. 4 are both C.
According to the present disclosure, the outputs of the candidate operations in a stacked layer are fused using the operation weights of the candidate operations, so that the processing result of the stacked layer can fully reflect all the candidate operations in the stacked layer and their influence on that result. This makes it possible to adjust the operation weights of all candidate operations using the loss function, which in turn helps search the structure of the second neural network from the first neural network.
In an alternative example, an example of a first neural network formed by the present disclosure is shown in fig. 5. The lower diagram in fig. 5 is a first neural network comprising 17 blocks. Each block in the first neural network is connected to 3 downstream blocks, i.e., the number of upstream/downstream blocks connected to a block is 3. The first neural network in fig. 5 can be divided into 5 stages, respectively:
1. the first stage located at the far left of fig. 5. The first stage comprises a block, the box labelled 16 in figure 5, with a channel number of 16, and a spatial resolution of 112 x 112.
2. The second stage, located second from the left in fig. 5, comprises one block, namely the box labelled 24 in fig. 5; its channel number is 24 and its spatial resolution is 56 × 56. The channel number increase between the first-stage block and the second-stage block is 8.
3. A third stage, located at the middle left position of fig. 5, comprises three blocks, namely the three blocks labelled 32, 40 and 48 in fig. 5. The number of channels of the first block, the second block and the third block in the third stage is 32, 40 and 48 respectively, and the spatial resolution of the first block, the second block and the third block in the third stage is 28 × 28. The channel number increase between adjacent blocks in the third stage is 8.
4. The fourth stage, located at the middle position of fig. 5, includes six blocks, namely the six blocks labeled 64, 80, 96, 112, 128, and 144 in fig. 5. The number of channels of the first to sixth blocks in the fourth stage is 64, 80, 96, 112, 128, and 144, respectively, and the spatial resolutions of the first to sixth blocks in the fourth stage are all 14 × 14. The number of channels between adjacent blocks in the fourth stage increases by 16.
5. The fifth stage, located at the rightmost position in fig. 5, includes six blocks, namely the six blocks labeled 160, 224, 288, 352, 416, and 480 in fig. 5. The number of channels of the first to sixth blocks in the fifth stage is 160, 224, 288, 352, 416, and 480, respectively, and the spatial resolutions of the first to sixth blocks in the fifth stage are all 7 × 7. The increase in the number of channels between adjacent blocks in the fifth stage is 64.
The structure of any one of the blocks in the first neural network in fig. 5 is shown in the upper left diagram of fig. 5. That is, each block includes a head layer comprising three parallel layers, followed by three sequentially connected stacked layers. The inputs of the three parallel layers are the outputs of the 3 upstream blocks connected to the block; after the outputs of the three parallel layers are weighted-averaged according to the block connection weights of the 3 upstream blocks, the result of the weighted average may be used as the input of the first stacked layer of the block.
The structure contained in any stacked layer within any block in the first neural network in fig. 5, and in any parallel layer of any head layer (as in the middle block of fig. 3 and the middle block of fig. 2), is shown in the upper right diagram of fig. 5. That is, each parallel layer and each stacked layer include all candidate operations, and each candidate operation processes its input to form its output; when a candidate operation is a skip, the output of the candidate operation is its input. After the outputs of the candidate operations are weighted-averaged according to their operation weights, the result of the weighted average may be used to form the output of the layer; for example, convolution processing is performed on the result of the weighted average, and the result of the convolution processing is used as the output of the layer. The output of the last stacked layer may be used as the output of the block.
In an optional example, before performing the neural network structure search processing on the first neural network by using the sample data in the first data set, the present disclosure should ensure that the processing of the input information by the first neural network has a certain accuracy. Therefore, the present disclosure may train the first neural network with the sample data in the second data set before performing the neural network structure search processing on the first neural network, so as to adjust the operation parameters of each candidate operation in each layer (including each parallel layer and each stacked layer) in each block in the first neural network, thereby enabling the first neural network to have a certain accuracy in processing the input information. Since the process of adjusting the structural parameters in the first neural network using the sample data may also be referred to as a training process for the first neural network, the training process for the first neural network of the present disclosure may include two stages, the first stage is a training process for the operational parameters, and the second stage is a training process for the structural parameters.
It should be noted that the second stage may train the structural parameters only, or it may train both the operation parameters and the structural parameters (e.g., using a multi-objective optimization method). In addition, the first and second stages may be performed iteratively, i.e., alternately. For example, the first-stage training runs until a first predetermined iteration condition is reached (e.g., the number of used sample data reaches a predetermined number, or the accuracy of the first neural network's processing of input information meets a certain requirement), then stops; the second-stage training then runs until a second predetermined iteration condition is reached (e.g., the number of used sample data reaches a predetermined number, or the convergence of the structural parameters of the first neural network meets a certain requirement), then stops; the first stage is performed again, and so on, thereby realizing the iteration of the two stages.
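A minimal sketch of this alternation follows, assuming two optimizers built over disjoint parameter groups (one over the operation parameters w, one over the structure parameters alpha/beta); the function and variable names are hypothetical, and a full pass over each loader stands in for the predetermined iteration conditions:

```python
def alternate_search(model, train_loader, val_loader,
                     w_optimizer, arch_optimizer, loss_fn, num_rounds):
    for _ in range(num_rounds):
        # stage one: only the operation parameters w are stepped,
        # even though the loss is computed with alpha and beta in place
        for x, y in train_loader:
            w_optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            w_optimizer.step()
        # stage two: only the structure parameters alpha and beta are stepped
        for x, y in val_loader:
            arch_optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            arch_optimizer.step()
```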
Optionally, the operating parameters of the candidate operations in the present disclosure include, but are not limited to, the convolution kernel weights in the candidate operations. The present disclosure may provide sample data in the second data set to the first neural network and process the sample data via the first neural network; thereafter, according to the processing result of the sample data and the sample data (e.g., the difference between the two), the present disclosure may adjust, by using a second loss function (e.g., a cross-entropy loss function, hereinafter referred to as the second cross-entropy loss function), the operating parameters of each candidate operation in each layer (including each parallel layer and each stacked layer) in each block in the first neural network. Before the first stage of training is performed for the first time, the present disclosure may assign values to the structural parameters of the first neural network in a random initialization manner. The gradient used in the first stage of training can be expressed as

∇_w L_train(w, α, β)

where ∇_w indicates the derivative with respect to the operation parameters w, and L_train(w, α, β) represents the second cross-entropy loss function of the operation parameters w, the operation weights α of the candidate operations, and the block connection weights β. The second cross-entropy loss function uses the operation weight α and the block connection weight β in its calculation, but during back propagation in the first stage usually only the operation parameter w is updated; α and β are not.
Alternatively, before the first stage of training is performed for the first time, initial values (for example, initial values given by a random initialization method) may be given to the operation parameter w, the operation weight α of the candidate operation, and the block connection weight β, respectively. In the first training process, the values of the operation weight α and the block connection weight β in the second cross entropy loss function are still the initial values given above. In the subsequent process of performing the training of the first stage again, the values of the operation weight α and the block connection weight β in the second cross entropy loss function are usually the values obtained in the last training of the second stage.
Optionally, in the present disclosure, the first data set and the second data set generally contain different sample data. For example, the present disclosure may divide a complete sample data set into two parts, one part being a training set (i.e., the second data set) and the other part being a validation set (i.e., the first data set), where the sample data in the training set is used to adjust the operation parameters of each candidate operation in each layer in each block in the first neural network. The sample data in the validation set is used to adjust structural parameters in the first neural network.
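For instance, such a split might be set up as in the following sketch; the toy tensors, dataset size, and batch size are placeholders, not values from the disclosure:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# toy stand-in for a complete sample data set (random inputs and labels)
full_set = TensorDataset(torch.randn(1000, 3, 224, 224),
                         torch.randint(0, 10, (1000,)))
train_set, val_set = random_split(full_set, [500, 500])  # two disjoint parts
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)  # "second data set": tunes w
val_loader = DataLoader(val_set, batch_size=64, shuffle=True)      # "first data set": tunes alpha/beta
```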
By training the first neural network with the sample data in the second data set to adjust the operation parameters of each candidate operation in each layer (including each parallel layer and each stacked layer) in each block, the present disclosure gives the first neural network a certain accuracy in processing input information; as a result, the structural parameters of the first neural network can converge quickly during the second-stage training, which improves the efficiency of the neural network structure search.
In an alternative example, the training of the first stage of the present disclosure may be implemented in two ways:
in a first mode, sample data in a second data set (for example, a training set) is provided to a first neural network, and the sample data is processed through all candidate operations in all layers (including parallel layers and stacked layers) in all paths formed by all blocks in the first neural network, so as to obtain a processing result of the first neural network on the sample data. That is, the input sample data stream has passed through all the block connections in the first neural network, has passed through all the layers in each block, and has been processed by all the candidate operations in each parallel layer and each stacked layer in all the blocks.
In the case of the first stage adopting the first mode, the present disclosure may adjust the operation parameters of all candidate operations in each layer (including each parallel layer and each stacked layer) in each block in the first neural network by using the second loss function and a gradient descent mode, according to the processing result of the sample data and the sample data (such as the difference between the two). That is, during back propagation in training, the operating parameters of all candidate operations in all layers in all blocks are updated.
In the second mode, the sample data in the second data set is provided to the first neural network, and the sample data is processed through one selected candidate operation in each layer (including each parallel layer and each stacked layer) in all paths formed by all blocks in the first neural network, so as to obtain the processing result of the first neural network on the sample data. That is, the input sample data stream passes through all the block connections in the first neural network and through all the layers in each block, but in each parallel layer and each stacked layer it is processed by only the one candidate operation selected for that layer. The candidate operation may be selected at random, or it may be selected according to the operation weights of the candidate operations; the present disclosure is not limited in this respect.
In the case that the second mode is adopted in the first stage, the present disclosure may adjust the operation parameters of the selected candidate operation in each layer in each block in the first neural network by using the second loss function and a gradient descent method, according to the processing result of the first neural network on the sample data and the sample data (e.g., the difference between the two). That is, in a back propagation process of the first-stage training, the operation parameters of the currently selected candidate operation in all layers (including parallel layers and stacked layers) in all blocks are updated, and the operation parameters of the currently unselected candidate operations are not updated. Selecting a single candidate operation to process the sample data during the first-stage training improves the training efficiency of the first stage.
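A sketch of this second mode follows, reusing the hypothetical MixedLayer attributes from the earlier sketch; sampling according to the operation weights is shown, and a purely random selection would use uniform probabilities instead:

```python
import torch
import torch.nn.functional as F

def forward_one_sampled(layer, x):
    """Route x through a single sampled candidate operation, so that only that
    operation's parameters receive gradients in the backward pass."""
    probs = F.softmax(layer.alpha, dim=0).detach()  # used for selection only
    idx = int(torch.multinomial(probs, num_samples=1))
    return layer.ops[idx](x)
```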
In one alternative example, the present disclosure may begin executing the second stage of the training process after completing the first stage of training once. In particular, the present disclosure may provide sample data in a first set of data (e.g., a validation set) to a first neural network, the sample data processed via the first neural network; then, according to the processing result of the first neural network on the sample data and the sample data (such as the difference between the two), the block connection weight in the first neural network and the operation weight of each candidate operation in each parallel layer and each stacked layer in each block are adjusted by a gradient descent method by using a first loss function (for example, a cross entropy loss function, which is hereinafter referred to as a first cross entropy loss function).
Alternatively, the gradient used in the second stage of training may be expressed as

∇_{α,β} L_val(w, α, β)

where ∇_{α,β} indicates the derivative with respect to the operation weights α of the candidate operations (the operation weights before normalization processing) and the block connection weights β (the block connection weights before normalization processing), and L_val(w, α, β) represents the first cross-entropy loss function of the operation parameters w, the operation weights α, and the block connection weights β. The first cross-entropy loss function uses the operation parameter w in its calculation, but during back propagation in the second stage the present disclosure usually updates only the operation weight α and the block connection weight β, and does not update w.
Optionally, when the second stage of training is started, the value of the operating parameter w in the first cross-entropy loss function is usually the value obtained by the last training in the first stage.
By updating the block connection weights in the first neural network and the operation weights of each candidate operation in each parallel layer and each stacked layer in each block with the first loss function, the present disclosure allows these weights to converge along the direction of gradient descent, which facilitates accurately deriving the structure of the second neural network from the first neural network.
In an alternative example, the training of the second stage of the present disclosure may be implemented in two ways:
in the first mode, sample data in the first data set is provided to the first neural network, and the sample data is processed through all candidate operations in all layers (including all parallel layers and all stacked layers) in all paths formed by all blocks in the first neural network, so that a processing result of the first neural network on the sample data is obtained.
In the case of the first mode in the second stage, the present disclosure may adjust the connection weight of each block in the first neural network and the operation weight of each candidate operation in each parallel layer and each stacked layer in each block in a gradient descent mode according to the processing result of the sample data and the sample data (e.g., a difference between the two) by using a first loss function (e.g., a first cross entropy loss function). That is, in the back propagation process of the second stage training, all block connection weights and operation weights of all candidate operations in all parallel layers and all stacked layers in all blocks are updated.
Optionally, the present disclosure may set the delay parameters of each layer (including the parallel layers and the stacked layers) in the first loss function; according to the processing result of the sample data and the sample data (e.g., the difference between the two), the present disclosure may then perform multi-objective optimization processing by gradient descent using the first loss function with the per-layer delay parameters, so as to update the block connection weights in the first neural network and the operation weights of each candidate operation in each layer in each block.
By setting the delay parameters of each layer in the first loss function and updating the block connection weights and the operation weights through a multi-objective optimization method, the processing speed of the finally obtained second neural network can be controlled, thereby improving the applicability of the second neural network to the actual application environment.
Optionally, the first loss function L (w, α, β) provided with the delay parameters of each layer may be expressed as:
L(w,α,β)=LCE+λlogτlatency formula (7)
In the above formula (7), LCELoss functions representing operating parameters, e.g. LCEMay be the second cross entropy loss function described in the above embodiments; λ and τ represent hyper-parameters (hyper-parameters) for controlling the magnitude of the stack-layer delay optimization term (i.e., latency); λ and τ may be known values; the value range of λ may be: 0.05-0.7; for example λ can be 0.2; tau can be in the range of 10-20, for example tau can be 15; λ and τ can be positively correlated with latency; latency represents the delay time of all blocks in the first neural network.
Alternatively, the delay time latency of all blocks in the first neural network may be calculated using the following formula (8):

latency = Σ_l latency_l    formula (8)

In the above formula (8), latency_l represents the delay time of the l-th layer in the block; the minimum value of l may be 1, and the maximum value of l may be the number of layers formed by all the head layers and all the stacked layers in all the blocks in the first neural network.
Optionally, when the l-th layer is a stacked layer, latency_l in the above formula (8) may be calculated by the following formula (9):

latency_l = Σ_{o∈O} α_o^l · latency_o^l    formula (9)

In the above formula (9), α_o^l represents the operation weight (the operation weight after normalization processing) of the o-th candidate operation in the l-th stacked layer in the block; latency_o^l represents the delay time of the o-th candidate operation in the l-th stacked layer in the block; O denotes the set of candidate operations.
Optionally, when the l-th layer is a head layer, latency_l in the above formula (8) may be calculated by the following formula (10):

latency_l = Σ_j p_{j,i} · Σ_{o∈O} α_o^{l,i} · latency_o^{l,i}    formula (10)

In the above formula (10), p_{j,i} represents the block connection weight (the block connection weight after normalization processing) between the i-th block and its j-th adjacent, connected upstream block; α_o^{l,i} represents the operation weight of the o-th candidate operation in the l-th parallel layer in the head layer of the i-th block; latency_o^{l,i} represents the delay time of the o-th candidate operation in the l-th parallel layer in the head layer of the i-th block; O denotes the set of candidate operations.
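The latency terms above might be computed as in the following sketch, which reuses the hypothetical MixedLayer/HeadLayer attributes from the earlier sketches and assumes each layer stores a tensor op_latency of measured per-operation delay times (an assumption; the disclosure does not name such an attribute). The λ and τ defaults are the example values given above:

```python
import math
import torch
import torch.nn.functional as F

def expected_latency(blocks):
    """Formula (8): sum the expected per-layer delays over all head and stacked layers."""
    total = 0.0
    for block in blocks:
        p = F.softmax(block.head.beta, dim=0)           # normalized block connection weights
        for p_j, layer in zip(p, block.head.parallel):  # formula (10): head layer
            a = F.softmax(layer.alpha, dim=0)
            total = total + p_j * (a * layer.op_latency).sum()
        for layer in block.stacked:                     # formula (9): stacked layers
            a = F.softmax(layer.alpha, dim=0)
            total = total + (a * layer.op_latency).sum()
    return total  # a tensor, so gradients flow back to alpha and beta

def multi_objective_loss(ce_loss, latency, lam=0.2, tau=15.0):
    """Formula (7): L = L_CE + lam * log_tau(latency); log base tau via ln(x)/ln(tau)."""
    return ce_loss + lam * torch.log(latency) / math.log(tau)
```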
In the second mode, the sample data in the first data set is provided to the first neural network, and the sample data is processed through two selected candidate operations in each layer in all paths formed by all blocks in the first neural network, so as to obtain a processing result of the sample data. That is, the input sample data stream passes through all the block connections in the first neural network and through all the layers in each block, but in each parallel layer and each stacked layer it is processed by only the two candidate operations selected for that layer. The two candidate operations may be selected at random, or they may be selected according to the operation weights of the candidate operations; the present disclosure is not limited in this respect.
In the second stage, in the case of adopting the second mode, the present disclosure may adjust the connection weight of each block in the first neural network and the operation weight of the two candidate operations selected in each layer in each block by using a gradient descent mode according to the processing result of the first neural network on the sample data and the sample data (e.g., a difference between the two samples). That is, in a back propagation process of the second stage training, the connection weights of all blocks and the operation weights of the two candidate operations selected this time in all layers (including parallel layers and stacked layers) in all blocks are updated, and the operation weights of the candidate operations not selected this time are not updated. According to the method and the device, two candidate operations are selected to process the sample data in the training process of the second stage, so that the training efficiency of the second stage is improved.
In an optional example, in a case that the second mode is adopted in the second stage, after the block connection weights and the operation weights are updated, the present disclosure may further adjust the operation weights of the two selected candidate operations, for example, by using an offset.
Optionally, the present disclosure may calculate the offset of the selected two candidate operations according to the updated operation weight; then, the operation weights of the two selected candidate operations in each layer in each block in the first neural network are adjusted according to the calculated offset, for example, the offset is respectively added to the operation weights of the two selected candidate operations.
Alternatively, the present disclosure may use the following formula (11) to calculate the offset for the two selected candidate operations:

offset = log Σ_{o∈O_s} exp(α_o^l) − log Σ_{o∈O_s} exp(α̃_o^l)    formula (11)

In the above formula (11), O_s represents the set of the two selected candidate operations; o represents the selected o-th candidate operation; α_o^l represents the operation weight of the o-th candidate operation in the l-th layer before the update; and α̃_o^l represents the operation weight of the o-th candidate operation in the l-th layer after the update. Adding this offset to the updated weights keeps the total normalized weight of the two selected candidate operations unchanged by the update.
Adjusting the operation weights of the two selected candidate operations in each layer in each block of the first neural network by this offset makes the operation weights more reasonable and improves the training efficiency of the second stage.
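A sketch of this offset adjustment, following the reconstruction of formula (11) above; the helper name, its arguments, and the in-place update are assumptions:

```python
import torch

def apply_offset(layer, sampled_idx, alpha_before):
    """Add the formula (11) offset to the two sampled operation weights so that
    their combined softmax mass is the same before and after the gradient step.

    alpha_before: a copy of layer.alpha taken before the gradient step.
    sampled_idx:  indices of the two selected candidate operations.
    """
    idx = torch.as_tensor(sampled_idx)
    before = torch.logsumexp(alpha_before[idx], dim=0)
    after = torch.logsumexp(layer.alpha.data[idx], dim=0)
    layer.alpha.data[idx] += before - after  # the offset of formula (11)
```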
Alternatively, the present disclosure may determine the blocks belonging to the second neural network and the parallel layers in the blocks by comparing the block connection weights in the first neural network. The present disclosure may determine the layers in the block belonging to the second neural network and the operations contained in the layers by comparing the operation weights of all the candidate operations in the layers (including the parallel layers and the stacked layers) in the block belonging to the second neural network. Additionally, the present disclosure may determine the location of the upsampling in the second neural network from the parallel layers in the blocks belonging to the second neural network.
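The comparisons described above might be implemented as in the following sketch; the function name, the returned structure, and keeping exactly one strongest upstream connection and one strongest operation per layer are assumptions, since the disclosure does not fix those counts:

```python
import torch
import torch.nn.functional as F

def derive_second_network(blocks, keep_connections=1):
    """Read off the searched structure: the strongest upstream connection(s)
    per block, and the strongest candidate operation in every layer."""
    arch = []
    for block in blocks:
        p = F.softmax(block.head.beta, dim=0)
        upstream = torch.topk(p, keep_connections).indices.tolist()
        layers = list(block.head.parallel) + list(block.stacked)
        ops = [int(F.softmax(layer.alpha, dim=0).argmax()) for layer in layers]
        arch.append({"upstream": upstream, "ops": ops})
    return arch
```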
It should be particularly noted that the present disclosure may preset some layers in the first neural network according to actual requirements, and the preset layers may be carried over directly to become layers of the second neural network. For example, at least one leading layer (e.g., the first two layers) of the first neural network may be preset; as another example, at least one final layer of the first neural network may be preset; and so on. That is, some layers of the second neural network may be obtained without the neural network structure search. The operations performed by the preset layers can be set according to actual requirements. For example, the first two layers in the first/second neural network may be preset, where the first layer may be an ordinary convolutional layer whose output is 16 × 112 × 112 information (such as a 16 × 112 × 112 feature map), and the second layer may adopt an MBConv structure with a 3 × 3 convolution kernel and an expansion coefficient of 1, whose output may be 24 × 56 × 56 information (such as a 24 × 56 × 56 feature map). A preset layer in the first/second neural network may be regarded as a preset block; for example, the first layer may be regarded as the first block of the first/second neural network, and the second layer as the second block. During the first-stage training of the first neural network, the operation parameters in the preset layers may be adjusted.
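As an illustration of the two preset layers in this example, a sketch follows; the strides, padding, normalization, and activations beyond the stated kernel size, expansion coefficient, and output shapes are assumptions:

```python
import torch.nn as nn

stem = nn.Sequential(
    # preset layer 1: ordinary convolution, output 16 x 112 x 112 on a 224 x 224 input
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=True),
    # preset layer 2: MBConv with 3 x 3 kernel and expansion coefficient 1
    # (depthwise 3 x 3 followed by pointwise 1 x 1), output 24 x 56 x 56
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1, groups=16),
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=True),
    nn.Conv2d(16, 24, kernel_size=1, bias=False),
    nn.BatchNorm2d(24),
)
```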
Exemplary devices
Fig. 6 is a schematic structural diagram of an embodiment of a neural network structure search apparatus according to the present disclosure. The apparatus of this embodiment may be used to implement the method embodiments of the present disclosure described above. The apparatus shown in fig. 6 mainly comprises: an acquisition module 600, a search module 601, and a determination module 602.
The obtaining module 600 is configured to obtain a first neural network including a plurality of blocks with different channel numbers. At least one block in the first neural network is connected with at least three blocks, and at least one block comprises a header layer for performing block channel number and spatial resolution conversion.
Optionally, the obtaining module 600 may obtain the number of channels of all blocks required by the first neural network and the number of upstream/downstream blocks connected to the blocks, and determine the connection relationship of all blocks according to the number of channels of the blocks and the number of upstream/downstream blocks, so as to form the first neural network.
Optionally, in the first neural network formed by the obtaining module 600, if there are three blocks connected in sequence, in the three blocks, the channel number increase of the first block relative to the middle block is not greater than the channel number increase of the middle block relative to the last block.
Optionally, for the ith block in the first neural network, the obtaining module 600 may determine, according to the number of upstream blocks of the ith block, the number of parallel layers included in the header layer of the ith block, and determine, according to the number of channels and the spatial resolution of the ith block, and the number of channels and the spatial resolution of each upstream block connected to the ith block, the number of input channels and the spatial resolution, and the number of output channels and the spatial resolution, which correspond to each parallel layer, respectively. Wherein different parallel layers correspond to different upstream blocks.
Optionally, the obtaining module 600 may set operations included in each parallel layer in the header layer according to all candidate operations in the preset candidate operation set; for any parallel layer in the ith block in the first neural network, the process of setting the output of the parallel layer by the obtaining module 600 may include: and calculating according to the operation weight of each candidate operation in the parallel layer and the output of each candidate operation to obtain a calculation result.
Optionally, for the ith block in the first neural network, the process of setting the output of the header layer in the ith block by the obtaining module 600 may include: the obtaining module 600 performs calculation according to the block connection weight corresponding to each parallel layer in the i-th block header layer and the output of each parallel layer, and obtains a calculation result.
Optionally, for the ith block in the first neural network, the obtaining module 600 may determine, according to the number of channels and the spatial resolution of the ith block, the number of input channels and the spatial resolution, and the number of output channels and the spatial resolution, which correspond to each of the stack layers sequentially located after the head layer in the ith block, respectively.
Alternatively, the obtaining module 600 may set, according to all candidate operations in the preset candidate operation set, operations included in each stack layer sequentially located after the head layer in each block. For the jth stack layer in the ith block in the first neural network, the process of setting the output of the jth stack layer by the obtaining module 600 may include: and calculating according to the operation weight of each candidate operation in the jth stack layer and the output of each candidate operation to obtain a calculation result.
Optionally, the searching module 601 in the present disclosure may further perform the following operations before performing the neural network structure search processing on the first neural network by using a gradient-based search strategy according to sample data in the first data set: the searching module 601 provides the sample data in the second data set to the first neural network, and processes the sample data through the first neural network; the search module 601 adjusts the operation parameters of each candidate operation in each layer in each block in the first neural network by using the first loss function according to the processing result of the sample data and the sample data.
Optionally, the search module 601 may adjust the operation parameters of each candidate operation in each layer in each block in the first neural network in two ways to implement the first stage of training.
In the first mode, the search module 601 provides the sample data in the second data set to the first neural network, and processes the sample data through all candidate operations in all layers in all paths formed by all blocks in the first neural network, so as to obtain a processing result of the sample data.
In the second mode, the search module 601 provides the sample data in the second data set to the first neural network, and processes the sample data through the selected one candidate operation in all layers in all paths formed by all blocks in the first neural network, so as to obtain a processing result of the sample data. In the case of the second method, the searching module 601 may adjust the operation parameters of the selected candidate operation in each layer in each block in the first neural network by using a second loss function and a gradient descent method according to the processing result of the sample data and the sample data.
Optionally, the search module 601 may provide sample data in the first data set to the first neural network, process the sample data through the first neural network, and then, according to the processing result of the sample data and the sample data, the search module 601 updates the block connection weight in the first neural network and the operation weight of each candidate operation in each layer in each block by using a gradient descent manner through the first loss function.
Optionally, the search module 601 may implement the second stage of training in the following two ways.
In the first mode, the search module 601 provides the sample data in the first data set to the first neural network, and processes the sample data through all candidate operations in all layers in all paths formed by all blocks in the first neural network, so as to obtain a processing result of the sample data.
In the second mode, the search module 601 provides the sample data in the first data set to the first neural network, and respectively processes the sample data through the selected two candidate operations in all layers in all paths formed by all blocks in the first neural network, so as to obtain a processing result of the sample data.
Optionally, the search module may perform multi-objective optimization processing by using a first loss function including delay parameters of each layer according to a processing result of sample data and the sample data and using a gradient descent method, so as to update the block connection weight in the first neural network and the operation weight of each candidate operation in each layer in each block.
Optionally, when the search module 601 adopts the second method, the search module 601 may further calculate offset amounts of the two selected candidate operations according to the updated operation weights; and adjusting the operation weights of the two selected candidate operations in each layer in each block in the first neural network according to the offset.
The determining module 602 is configured to determine a second neural network structure obtained by the search according to the structure parameter of the first neural network obtained by the searching module 601.
Exemplary electronic device
An electronic device according to an embodiment of the present disclosure is described below with reference to fig. 7. FIG. 7 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure. As shown in fig. 7, the electronic device 71 includes one or more processors 711 and memory 77.
The processor 711 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 71 to perform desired functions.
Memory 77 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory, for example, may include: random Access Memory (RAM) and/or cache memory (cache), etc. The nonvolatile memory, for example, may include: read Only Memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 711 to implement the neural network structure searching methods of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 71 may further include an input device 713 and an output device 714, among other components, interconnected by a bus system and/or another form of connection mechanism (not shown). The input device 713 may include, for example, a keyboard, a mouse, and the like. The output device 714 may output various information to the outside and may include, for example, a display, speakers, a printer, and a communication network and the remote output devices connected thereto.
Of course, for simplicity, only some of the components of the electronic device 71 relevant to the present disclosure are shown in fig. 7, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device 71 may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage medium
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the neural network structure searching method according to various embodiments of the present disclosure described in the "exemplary methods" section above of this specification.
The computer program product may write program code for carrying out operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the neural network structure searching method according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium may include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments; however, it is noted that the advantages, effects, and the like mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not limited to the specific details so described.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to."
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects, and the like, will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (17)

1. A neural network structure searching method includes:
acquiring a first neural network comprising a plurality of blocks with different channel numbers, wherein at least one block in the first neural network is connected with at least three blocks, and at least one block comprises a header layer for performing block channel number and spatial resolution conversion;
according to sample data in a first data set, carrying out neural network structure search processing on the first neural network by utilizing a search strategy based on gradient to obtain structure parameters of the first neural network;
and determining a second neural network structure obtained by searching according to the structure parameters of the first neural network.
2. The method of claim 1, wherein the obtaining a first neural network comprising a plurality of blocks with different channel numbers comprises:
acquiring the number of channels of all blocks required by a first neural network and the number of upstream/downstream blocks connected with the blocks;
and determining the connection relation of all the blocks according to the number of the upstream/downstream blocks and the increment of the number of the channels of the blocks.
3. The method of claim 2, wherein, of the sequentially connected three blocks in the first neural network, the channel number increase of the first block relative to the middle block is not greater than the channel number increase of the middle block relative to the last block.
4. The method of claim 2 or 3, wherein the obtaining a first neural network comprising a plurality of blocks with different numbers of channels comprises:
for an ith block in the first neural network, determining the number of parallel layers included in a header layer of the ith block according to the number of upstream blocks of the ith block, and determining the number of input channels and the spatial resolution, and the number of output channels and the spatial resolution which respectively correspond to each parallel layer according to the number of channels and the spatial resolution of the ith block and the number of channels and the spatial resolution of each upstream block connected with the ith block;
wherein different parallel layers correspond to different upstream blocks.
5. The method of claim 4, wherein the obtaining a first neural network comprising a plurality of blocks with different channel numbers comprises:
setting operations contained in each parallel layer in the head layer according to all candidate operations in a preset candidate operation set;
for any parallel layer in the ith block in the first neural network, setting an output of the parallel layer comprises:
and calculating according to the operation weight of each candidate operation in the parallel layer and the output of each candidate operation to obtain a calculation result.
6. The method of any of claims 1 to 5, wherein the obtaining a first neural network comprising a plurality of blocks with different numbers of channels comprises:
for an ith block in the first neural network, setting an output of a header layer in the ith block comprises:
and calculating according to the block connection weight corresponding to each parallel layer in the head layer of the ith block and the output of each parallel layer to obtain a calculation result.
7. The method of any of claims 1 to 6, wherein the obtaining a first neural network comprising a plurality of blocks with different numbers of channels comprises:
and for the ith block in the first neural network, determining the number of input channels and the spatial resolution, and the number of output channels and the spatial resolution which correspond to each stacking layer sequentially positioned behind the head layer in the ith block according to the number of channels and the spatial resolution of the ith block.
8. The method of any of claims 1 to 7, wherein the obtaining a first neural network comprising a plurality of blocks with different numbers of channels comprises:
setting operations contained in each stacking layer sequentially positioned behind the head layer in each block according to all candidate operations in a preset candidate operation set;
for a jth stack layer in an ith block in the first neural network, setting an output of the jth stack layer comprises: and calculating according to the operation weight of each candidate operation in the jth stack layer and the output of each candidate operation to obtain a calculation result.
9. The method of any of claims 1 to 8, wherein the method further comprises, prior to performing a neural network structure search process on the first neural network using a gradient-based search strategy based on sample data in a first set of data:
providing the sample data in the second data set to the first neural network, and processing the sample data through the first neural network;
and adjusting the operating parameters of each candidate operation in each layer in each block in the first neural network by using a first loss function according to the processing result of the sample data and the sample data.
10. The method of claim 9, wherein said providing sample data in the second set of data to the first neural network, the processing of the sample data via the first neural network comprising:
providing the sample data in the second data set to the first neural network, and respectively processing the sample data through all candidate operations in all layers in all paths formed by all blocks in the first neural network to obtain a sample data processing result; or
Providing the sample data in the second data set to the first neural network, and respectively processing the sample data through the selected candidate operation in all layers in all paths formed by all blocks in the first neural network to obtain a sample data processing result;
the adjusting, according to the processing result of the sample data and the sample data, the operation parameters of each candidate operation in each layer in each block in the first neural network by using a first loss function includes:
and adjusting the operation parameters of the selected candidate operation in each layer in each block in the first neural network by using a second loss function and a gradient descent mode according to the processing result of the sample data and the sample data.
11. The method according to any one of claims 1 to 10, wherein said performing a neural network structure search process on said first neural network using a gradient-based search strategy according to sample data in a first data set comprises:
providing sample data in a first data set to a first neural network, and processing the sample data through the first neural network;
and updating the block connection weight in the first neural network and the operation weight of each candidate operation in each layer in each block by using a gradient descent mode according to the processing result of the sample data and by using a first loss function.
12. The method of claim 11, wherein said providing sample data in the first set of data to a first neural network, said sample data being processed via said first neural network, comprises:
providing sample data in the first data set to a first neural network, and respectively processing the sample data through all candidate operations in all layers in all paths formed by all blocks in the first neural network to obtain a sample data processing result; or
And providing the sample data in the first data set to a first neural network, and respectively processing the sample data through two selected candidate operations in all layers in all paths formed by all blocks in the first neural network to obtain a sample data processing result.
13. The method according to claim 12, wherein the updating the block connection weight in the first neural network and the operation weight of each candidate operation in each layer in each block in a gradient descent manner according to the processing result of the sample data and the sample data by using a first loss function comprises:
and performing multi-objective optimization processing by using a first loss function containing delay parameters of each layer and adopting a gradient descent mode according to the processing result of the sample data and the sample data so as to update the block connection weight in the first neural network and the operation weight of each candidate operation in each layer in each block.
14. The method according to claim 13, wherein the updating the block connection weight in the first neural network and the operation weight of each candidate operation in each layer in each block in a gradient descent manner according to the processing result of the sample data and the sample data by using a second loss function further comprises:
calculating the offset of the two selected candidate operations according to the updated operation weight;
adjusting operation weights of the two selected candidate operations in each layer in each block in the first neural network according to the offset.
15. A neural network structure search apparatus, comprising:
an obtaining module, configured to obtain a first neural network including a plurality of blocks with different channel numbers, where at least one block in the first neural network is connected to at least three blocks, and at least one block includes a header layer for performing block channel number and spatial resolution conversion;
the searching module is used for searching the neural network structure of the first neural network acquired by the acquiring module by utilizing a gradient-based searching strategy according to sample data in the first data set to acquire the structural parameters of the first neural network;
and the determining module is used for determining the second neural network structure obtained by searching according to the structure parameters of the first neural network obtained by the searching module.
16. A computer-readable storage medium, the storage medium storing a computer program for performing the method of any of the preceding claims 1-14.
17. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method of any one of claims 1-14.
CN201910503236.XA 2019-06-11 2019-06-11 Neural network structure search method, apparatus, medium, and device Pending CN112069370A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910503236.XA CN112069370A (en) 2019-06-11 2019-06-11 Neural network structure search method, apparatus, medium, and device


Publications (1)

Publication Number Publication Date
CN112069370A true CN112069370A (en) 2020-12-11

Family

ID=73658516



Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596330A (en) * 2018-05-16 2018-09-28 中国人民解放军陆军工程大学 Parallel characteristic full-convolution neural network and construction method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEN HUANGLU: "Research on Image Classification Algorithm Based on Convolutional Neural Network", China Master's Theses Full-text Database, Information Science and Technology, No. 12, pages 138-1368 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113609310A (en) * 2021-08-25 2021-11-05 上海交通大学 Single-machine large-scale knowledge graph embedding system and method
CN113609310B (en) * 2021-08-25 2023-08-08 上海交通大学 Single-machine large-scale knowledge graph embedding system and method
CN115760777A (en) * 2022-11-21 2023-03-07 脉得智能科技(无锡)有限公司 Hashimoto's thyroiditis diagnostic system based on neural network structure search
CN115760777B (en) * 2022-11-21 2024-04-30 脉得智能科技(无锡)有限公司 Hashimoto thyroiditis diagnosis system based on neural network structure search


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination