CN113780518B - Network architecture optimization method, terminal equipment and computer readable storage medium - Google Patents

Network architecture optimization method, terminal equipment and computer readable storage medium

Info

Publication number
CN113780518B
CN113780518B (application CN202110914528.XA)
Authority
CN
China
Prior art keywords
network
code
sub
network architecture
architecture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110914528.XA
Other languages
Chinese (zh)
Other versions
CN113780518A (en
Inventor
马里佳
李坚强
林秋镇
黄兴
邵增洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202110914528.XA priority Critical patent/CN113780518B/en
Publication of CN113780518A publication Critical patent/CN113780518A/en
Application granted granted Critical
Publication of CN113780518B publication Critical patent/CN113780518B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Error Detection And Correction (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application is applicable to the technical field of computers, and provides a network architecture optimization method, terminal equipment and a computer readable storage medium, comprising the following steps: acquiring a network architecture set, wherein the network architecture set comprises a plurality of network architectures; encoding each network architecture in the network architecture set to obtain an encoding set; iteratively searching out an optimal code in the code set based on a particle swarm algorithm; and decoding the optimal code to obtain an optimized target network architecture. By the method, the optimization efficiency of the network architecture can be effectively improved, and the optimization of the performance of the convolutional neural network is ensured.

Description

Network architecture optimization method, terminal equipment and computer readable storage medium
Technical Field
The application belongs to the technical field of computers, and particularly relates to a network architecture optimization method, terminal equipment and a computer readable storage medium.
Background
With the development of artificial intelligence, deep learning is increasingly widely used. For example, deep learning may be applied in fields such as image detection, gesture recognition, and voice recognition. Convolutional neural networks are a classical and widely used class of networks in deep learning, and their performance is generally closely related to their network architecture.
At present, the network architecture of a convolutional neural network is usually preset, and the network architecture is then manually optimized according to the performance indexes of the network observed during application. This existing network architecture optimization method is inefficient and cannot guarantee that the performance of the convolutional neural network is optimal.
Disclosure of Invention
The embodiment of the application provides a network architecture optimization method, terminal equipment and a computer readable storage medium, which can effectively improve the optimization efficiency of a network architecture and ensure the optimization of the performance of a convolutional neural network.
In a first aspect, an embodiment of the present application provides a network architecture optimization method, including:
acquiring a network architecture set, wherein the network architecture set comprises a plurality of network architectures;
encoding each network architecture in the network architecture set to obtain an encoding set;
iteratively searching out an optimal code in the code set based on a particle swarm algorithm;
and decoding the optimal code to obtain an optimized target network architecture.
In the embodiment of the application, the network architectures are represented by codes, an iterative search is then carried out over the codes based on a particle swarm algorithm to determine the optimal code, and finally the optimal code is decoded into the optimized target network architecture. By encoding the network architectures and organically combining the particle swarm algorithm with the network architecture optimization method, the purpose of optimizing the network architecture with the particle swarm algorithm is realized. In addition, the optimization efficiency of the network architecture is greatly improved owing to the advantages of the particle swarm algorithm, such as low computational cost, few parameters to adjust, and fast convergence.
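To illustrate how these four steps fit together, the following Python sketch outlines the overall flow; the names encode_fn, search_fn and decode_fn are assumptions standing in for the encoding, particle-swarm search and decoding procedures described in the embodiments below, not terms from the disclosure.

```python
# Minimal sketch of the overall flow (helper names are assumptions).
def optimize_architectures(architectures, encode_fn, search_fn, decode_fn):
    code_set = [encode_fn(a) for a in architectures]   # encode each network architecture
    best_code = search_fn(code_set)                    # particle-swarm-based iterative search
    return decode_fn(best_code)                        # decode into the target architecture
```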
In a possible implementation manner of the first aspect, each of the network architectures in the set of network architectures includes at least one sub-network, and each sub-network includes at least one network component therein.
In a possible implementation manner of the first aspect, the encoding each network architecture in the set of network architectures to obtain an encoded set includes:
for each network architecture, respectively encoding the topological structure of each sub-network in the network architecture to obtain the structural code of each sub-network;
coding the number of the network components contained in each sub-network to obtain the depth coding of each sub-network;
and generating codes corresponding to the network architecture according to the structural codes and the depth codes of each sub-network.
In a possible implementation manner of the first aspect, the encoding, for each network architecture, a topology structure of each sub-network in the network architecture, to obtain a structural encoding of each sub-network includes:
generating binary code values for connection relations between every two network components in the sub-network for each sub-network in the network architecture;
and combining the binary code values according to the connection sequence of the network components to generate the structural codes of the sub-network.
In a possible implementation manner of the first aspect, the encoding the number of components of the network component included in each sub-network to obtain a depth code of each sub-network includes:
for each sub-network, calculating a difference between the number of components of the network components contained in the sub-network and the minimum value in a data set, wherein the data set comprises the number of components of the network components contained in each sub-network in the network architecture;
converting the number difference into a binary code;
the converted binary code is determined as the depth code of the sub-network.
In a possible implementation manner of the first aspect, the iterative searching of the optimal code in the code set based on the particle swarm algorithm includes:
searching a first target code in the current code set according to a preset target function in the process of iterative searching each time;
if the number of completed iterative optimization reaches the preset iterative number, determining the first target code as the optimal code;
if the number of the completed iterative optimization does not reach the preset iterative number, updating the codes in the code set according to the first target code to obtain the updated code set;
and continuing searching for a second target code in the updated code set according to the target function until the preset iteration times are reached.
In a possible implementation manner of the first aspect, the objective function is:
F(X) = min{ F(Y) | Y ∈ S_b }, X ∈ S_b;

where X and Y are each codes in the code set, S_b is the code set, F(X) represents the evaluation index of the network architecture corresponding to X, and F(Y) represents the evaluation index of the network architecture corresponding to Y.
In a second aspect, an embodiment of the present application provides a network architecture optimization apparatus, including:
the system comprises an acquisition unit, a storage unit and a control unit, wherein the acquisition unit is used for acquiring a network architecture set, and the network architecture set comprises a plurality of network architectures;
the coding unit is used for coding each network architecture in the network architecture set to obtain a coding set;
the searching unit is used for iteratively searching out the optimal code in the code set based on a particle swarm algorithm;
and the decoding unit is used for decoding the optimal code to obtain an optimized target network architecture.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the network architecture optimization method according to any one of the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium, where a computer program is stored, where the computer program is executed by a processor to implement a network architecture optimization method according to any one of the first aspects.
In a fifth aspect, embodiments of the present application provide a computer program product, which when run on a terminal device, causes the terminal device to perform the network architecture optimization method according to any one of the first aspects above.
It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not described here again.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required for the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a network architecture optimization method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a network architecture provided by an embodiment of the present application;
FIG. 3 is a coding schematic diagram of a network architecture according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a particle location update provided by an embodiment of the present application;
fig. 5 is a block diagram of a network architecture optimization device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a terminal device provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used in this specification and the appended claims, the term "if" may be construed as "when", "once", "in response to a determination", or "in response to detection", depending on the context.
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise.
In the following embodiments of the present application, a neural network is taken as an example to describe the network architecture optimization method. The neural network herein includes various similar neural networks such as convolutional neural networks, recurrent neural networks, BP neural networks, and residual neural networks.
Referring to fig. 1, which is a schematic flow chart of a network architecture optimization method provided in an embodiment of the present application, by way of example and not limitation, the method may include the following steps:
s101, acquiring a network architecture set, wherein the network architecture set comprises a plurality of network architectures.
In an embodiment of the present application, each network architecture in the set of network architectures includes at least one sub-network, and each sub-network includes at least one network component therein.
Network components may be extracted from an existing network architecture with superior performance (e.g., DenseNet, VGGNet), and then at least one network component is combined into a sub-network. For example, a network component may include a convolutional layer, a normalization layer, and an activation layer connected in order.
Referring to fig. 2, a schematic diagram of a network architecture according to an embodiment of the present application is provided. The network architecture shown in fig. 2 includes 4 sub-networks (i.e., stage 1, stage 2, stage 3, and stage 4 shown in fig. 2). In the first subnetwork (stage 1) 4 network components are included (i.e. 4 nodes numbered 1, 2, 3,4 in stage 1 shown in fig. 2); the second subnetwork (stage 2) comprises 5 network components; the third sub-network comprises 3 network components; the fourth subnetwork includes 3 network components. In addition, each subnetwork includes an input node (e.g., the node labeled "I" as shown in fig. 2) and an output node (e.g., the node labeled "O" as shown in fig. 2). The 4 sub-networks are connected in sequence.
In the embodiment of the application, the network components in each sub-network are numbered in order. By default, the input node of each sub-network has a directed connection to every network component in the sub-network that has no predecessor (i.e., is not connected to any earlier network component), and the directed connections between network components within each sub-network always point from the smaller number to the larger number.
Taking the network components of stage 1 in fig. 2 as an example, each network component is numbered in sequence according to the flow direction of the network data: network component 1 is connected to network component 3, network component 3 is connected to network component 4, and network component 2 is connected to network component 4. Since neither network component 1 nor network component 2 has a predecessor, the input node connects to both network components 1 and 2.
The number of sub-networks in the network architecture and the number of network components in each sub-network are not particularly limited. Preferably, to facilitate the subsequent search, the number of sub-networks in each network architecture is set to the same value, e.g., 4, and the number of network components per sub-network is restricted to an optional range, such as 3 to 10.
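To make the stage/component structure concrete, the following Python sketch shows one possible in-memory representation of such an architecture; the class names, field names and example values are assumptions introduced for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SubNetwork:
    """One stage: an ordered set of network components plus directed connections."""
    num_components: int                                           # e.g. between 3 and 10
    edges: List[Tuple[int, int]] = field(default_factory=list)   # (i, j) with i < j

@dataclass
class NetworkArchitecture:
    """A candidate architecture: sub-networks (stages) connected in sequence."""
    stages: List[SubNetwork]

# Stage 1 of Fig. 2: 4 components with connections 1->3, 3->4 and 2->4.
stage1 = SubNetwork(num_components=4, edges=[(1, 3), (3, 4), (2, 4)])
```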
S102, coding each network architecture in the network architecture set to obtain a coding set.
Optimization of a network architecture typically includes optimization of the network depth and optimization of the network topology. In the prior art, these two parts are usually optimized separately. In the embodiment of the application, the two parts are optimized simultaneously.
In one embodiment, the process of encoding may include the following steps for each network architecture:
I. and respectively encoding the topological structure of each sub-network in the network architecture to obtain the structural code of each sub-network.
Alternatively, one implementation of obtaining the structural code may include:
generating binary code values for connection relations between every two network components in the sub-network for each sub-network in the network architecture; and combining the binary code values according to the connection sequence of the network components to generate the structural codes of the sub-network.
Specifically, in the s-th stage, a binary code T_s of length L is adopted to represent the topology of the s-th stage, where L = 1/2 × 10 × 9 = 45. In the s-th stage, the 1st bit of T_s represents the connection state of the node pair (N_s,1, N_s,2), the 2nd bit represents the connection state of the node pair (N_s,1, N_s,3), and so on, where L_s = 1/2 × K_s × (K_s − 1) and L'_s = 1/2 × K_s × (K_s − 1) − K_s + 1. If the bit corresponding to the node pair (N_s,i, N_s,j) is 1, node N_s,i and node N_s,j are connected; otherwise, there is no connection between the two nodes. Finally, the remaining L_r bits of T_s are filled with 0, where L_r = 45 − L_s. For example, a 4-stage topology can be encoded as a binary string of length L_T = 4 × 45 = 180.
For example, in stage 1 shown in fig. 2, node 1 is not connected to node 2, so the 1st bit code value is 0; node 1 is connected to node 3, so the 2nd bit code value is 1; node 1 is not connected to node 4, so the 3rd bit code value is 0; node 2 and node 3 are not connected, so the 4th bit code value is 0; node 2 is connected to node 4, so the 5th bit code value is 1; and so on, with the remaining bits padded with 0, giving 180 bits in total over the 4 stages.
The structure code length of the s-th stage is set to L bits, but the effective code length is only L_s bits. Padding with 0 unifies the structure code length of each stage, so that the code position within each stage can be determined quickly and reliably, which facilitates the computation of the subsequent search algorithm.
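A minimal Python sketch of this structure encoding, assuming the maximum of 10 components per stage used in the example (so L = 45) and component indices starting at 1; the function name is an assumption.

```python
def encode_stage_topology(num_components, edges, max_components=10):
    """Encode one stage's topology as a fixed-length binary string (L = 45 bits).

    Bits enumerate node pairs (1,2), (1,3), ..., (K-1,K) in order; a bit is 1
    when that pair is connected. Unused trailing bits are padded with 0."""
    L = max_components * (max_components - 1) // 2            # 45 when the maximum is 10
    edge_set = {(min(i, j), max(i, j)) for (i, j) in edges}
    bits = []
    for i in range(1, num_components):
        for j in range(i + 1, num_components + 1):
            bits.append('1' if (i, j) in edge_set else '0')
    return ''.join(bits).ljust(L, '0')                        # pad to 45 bits with 0

# Stage 1 of Fig. 2 (edges 1-3, 3-4, 2-4) yields a code starting with 010011.
print(encode_stage_topology(4, [(1, 3), (3, 4), (2, 4)]))
```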
II. The number of components of the network components contained in each sub-network is encoded to obtain a depth code for each sub-network.
Alternatively, one implementation of obtaining depth coding may include:
for each sub-network, calculating a difference between the number of components of the network components contained in the sub-network and the minimum value in the data set, wherein the data set comprises the number of components of the network components contained in each sub-network in the network architecture; converting the quantity difference into binary codes; the converted binary code is determined as a depth code of the sub-network.
Illustratively, as shown in FIG. 2, only the number of active nodes (nodes other than the input and output nodes) of a stage is considered when encoding the number of nodes within each stage. The number of active nodes per stage has a parameter range of 3 to 10, i.e., the parameter pool of node counts has size 8 (3, 4, 5, 6, 7, 8, 9, 10), with a maximum of 10 and a minimum of 3; the maximum difference is 7, whose binary code is 111. The number of nodes in each stage can therefore be represented by a 3-bit binary code. The specific coding is to subtract 3 (the minimum value in the parameter pool) from the number of nodes K_s in the s-th stage to obtain K_s', and then convert K_s' into a binary representation.
When the number of sub-networks is 4, a 12-bit binary code is used to represent the numbers of active nodes in the 4 stages. The 1st to 3rd bits of the binary code represent the number of nodes in the first stage, the next 3 bits represent the number of nodes in the second stage, and so on. As shown in fig. 2, there are 4 network components in stage 1, 4 − 3 = 1, and the corresponding binary code is 001, i.e., the depth code of stage 1 is 001.
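A sketch of the depth encoding, under the assumption of the 3-to-10 parameter pool used in the example above (so each stage needs 3 bits); the function name is illustrative.

```python
def encode_stage_depth(num_components, pool_min=3, code_bits=3):
    """Encode a stage's component count as a short binary code.

    The minimum of the parameter pool is subtracted first, so with a pool of
    3..10 the difference 0..7 fits into 3 bits."""
    diff = num_components - pool_min
    return format(diff, '0{}b'.format(code_bits))

print(encode_stage_depth(4))   # stage 1 of Fig. 2: 4 - 3 = 1 -> '001'
```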
And III, generating codes corresponding to the network architecture according to the structural codes and the depth codes of each sub-network.
The structure codes and the depth codes of each sub-network can be combined into a group of codes in turn according to the connection sequence of the network architecture, and the group of codes are used as codes corresponding to the network architecture.
The method can also be used for combining the structural codes of each sub-network into a group of codes according to the connection sequence of the network architecture, combining the depth codes of each sub-network into a group of codes, and finally combining the two groups of codes into codes corresponding to the network architecture. Referring to fig. 3, a coding schematic diagram of a network architecture according to an embodiment of the present application is provided. As shown in fig. 3, the front part is the topology coding composed of the structural codes of each sub-network, and the rear part is the depth coding composed of the depth codes of each sub-network.
In the embodiment of the present application, the combination order of the codes is not specifically limited. However, the order at the time of encoding and the order at the time of decoding need to be identical.
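Continuing the sketches above (and reusing encode_stage_topology and encode_stage_depth from them), the full code of an architecture can be assembled by concatenation; the ordering below groups all structure codes first and all depth codes second, i.e. the second of the two orderings described, as in Fig. 3.

```python
def encode_architecture(arch):
    """Concatenate per-stage structure codes, then per-stage depth codes.

    With 4 stages this yields 4 * 45 = 180 topology bits plus 4 * 3 = 12 depth
    bits, i.e. a 192-bit binary string (the layout of Fig. 3)."""
    structure = ''.join(encode_stage_topology(s.num_components, s.edges)
                        for s in arch.stages)
    depth = ''.join(encode_stage_depth(s.num_components) for s in arch.stages)
    return structure + depth
```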
S103, iteratively searching out the optimal code in the code set based on a particle swarm algorithm.
First, a search objective function is constructed based on the coding rule.
Illustratively, under the encoding rule shown in the embodiment of FIG. 2, a 192-bit binary code is generated. The network architecture search space is mapped to a binary vector space defined as S_b, where X = (x_1, x_2, x_3, ..., x_192). Accordingly, embodiments of the present application model the problem of optimizing the network architecture as a minimization problem of the objective function F(X):

min F(X), X ∈ S_b

where X is a point in the binary space S_b and corresponds to a network architecture in the network search space. Different points represent different network architectures. F(X) represents the recognition error rate obtained when the convolutional neural network corresponding to X is evaluated on the image test set.
The particle swarm algorithm is described below.
(a) Particle swarm initialization. The population initialization step initializes the position and velocity vectors of the particles. According to the binary coding strategy of the network architecture, the position and velocity of particle i are initialized as 192-bit vectors. The position of particle i is expressed as X_i = (x_i1, x_i2, x_i3, ..., x_i192), where each x_ij = 0 or x_ij = 1 is obtained by sampling from a Bernoulli distribution b_l ~ B(0.5). The velocity of particle i is expressed as V_i = (v_i1, v_i2, v_i3, ..., v_i192), where each v_ij is a real number obtained by sampling from a uniform distribution U(−1, 1). It should be noted that the particle position represents a solution of the objective function F(X). After the population is initialized, the individual best position P_ib of particle i and the global best position P_gb are updated: P_ib is the same as the position of particle i, i.e., P_ib = X_i, and P_gb is set to the best position in the initial population.
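A sketch of step (a) under the description above: positions are 192 Bernoulli(0.5) bits and velocities are 192 uniform samples from (-1, 1); numpy is used for brevity and the function name is an assumption.

```python
import numpy as np

def initialize_swarm(num_particles, code_length=192, rng=None):
    """Initialize positions (Bernoulli(0.5) bits) and velocities (uniform in (-1, 1))."""
    rng = rng or np.random.default_rng()
    positions = rng.integers(0, 2, size=(num_particles, code_length))   # x_ij in {0, 1}
    velocities = rng.uniform(-1.0, 1.0, size=(num_particles, code_length))
    personal_best = positions.copy()      # P_ib starts at each particle's own position
    return positions, velocities, personal_best
```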
(b) Particle evaluation. First, a suitable method is selected to initialize the weights of the convolutional neural network architecture. Because the Xavier weight initialization method helps prevent the convolutional neural network architecture from falling into a local minimum during gradient optimization, the Xavier method is selected to initialize the convolutional neural network weights in the embodiment of the application. The training dataset in the benchmark dataset is divided into two sub-datasets, D_train and D_fitness, with a sample size ratio of 7:3. After the position of a particle is decoded into the corresponding neural network architecture with parameter settings, the neural network architecture is trained on the training dataset D_train for E epochs, the trained neural network is then tested on the dataset D_fitness, and the recognition error rate obtained in the test is taken as the fitness value of the particle.
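The fitness evaluation of step (b) can be sketched as follows; decode_fn, build_fn, train_fn and error_fn are placeholders (assumptions) for decoding the position, building the Xavier-initialized network, training it for E epochs on D_train and measuring its error rate on D_fitness.

```python
def evaluate_particle(position, decode_fn, build_fn, train_fn, error_fn,
                      d_train, d_fitness, epochs_e):
    """Fitness of one particle: recognition error rate after a short training run."""
    arch = decode_fn(position)             # decode the 192-bit position into an architecture
    model = build_fn(arch)                 # assumed to apply Xavier weight initialization
    train_fn(model, d_train, epochs_e)     # short training run of E epochs on D_train
    return error_fn(model, d_fitness)      # recognition error rate on D_fitness (lower is better)
```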
The embodiment of the application thus uses an efficient method for evaluating the particle fitness. This particle fitness evaluation method not only alleviates the problem of limited computing resources in a laboratory setting, but also accelerates the evolution process of the whole population, so that the optimal network architecture can be obtained within an acceptable time.
(c) Updating the individual best particle position P_ib and the global best particle position P_gb. P_ib and P_gb can be updated in the manner of the original binary particle swarm optimization (BPSO) algorithm. Specifically, the fitness value F(X_i(t)) of particle i is first calculated and then compared with the fitness value F(P_ib) of the individual best particle; the lower the value of F(X), the better the performance of the network architecture corresponding to X. If F(X_i(t)) < F(P_ib), the best position P_ib of particle i is changed to the current position X_i(t): P_ib = X_i(t). P_gb is the best position visited by the whole population, so P_gb is updated as follows:

P_gb = argmin { F(P_ib) | i = 1, 2, ..., N }

where N is the population size. The fitness values of all individual best particles in the population are compared, and the individual best particle with the lowest fitness value is the global best particle.
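Step (c) can be sketched as below, assuming the fitness values of all particles have already been computed for the current iteration; variable names are illustrative.

```python
import numpy as np

def update_best_positions(positions, fitness, personal_best, personal_best_fitness):
    """Refresh each individual best P_ib and return the global best P_gb."""
    for i in range(len(positions)):
        if fitness[i] < personal_best_fitness[i]:        # a lower error rate is better
            personal_best[i] = positions[i].copy()
            personal_best_fitness[i] = fitness[i]
    g = int(np.argmin(personal_best_fitness))            # particle with the lowest F(P_ib)
    return personal_best[g], personal_best_fitness[g]
```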
(d) Particle velocity and position update. In the particle swarm algorithm, particles adjust their velocity vectors by learning. The new velocity vector helps the particles fly towards promising areas. The particle state update rule defined in the existing particle swarm optimization algorithm is no longer suitable for the discrete environment of the present application; therefore, the embodiment of the application redefines the update rule according to the newly defined particle state. The redefined particle state update rule is as follows:
V_i = sig(w·V_i + c_1·r_1·(P_gb Δ X_i) + c_2·r_2·(P_ib Δ X_i)),
X_i = X_i Θ V_i
In the above formulas, V_i indicates the change of the current particle position X_i; w represents the inertia weight of the current velocity on the new velocity, takes a value in the range (0, 1), and is set by the user. r_1 and r_2 are two random variables in the range (0, 1) that are updated at each iteration, while c_1 and c_2 are two fixed variables determined by the user. sig is the sigmoid function, which converts the real-valued velocity into the probability of flipping a bit of the particle position; the specific expression of the sigmoid function is sig(v) = 1 / (1 + e^(−v)).
The symbol "Δ" is similar to an exclusive-or operation. Given a best position P_i = (p_i1, p_i2, p_i3, ..., p_i192), the operator "Δ" is defined as follows:
based on the above formula, there are 2 cases of particle velocity update. For P ib,j =X ij (P gb,j =X ij ) The velocity component (the particle current position turnover rate) should be reduced because both the particle i current optimal position (global optimal position) and the particle current position have a common knowledge (position vector j-th bit optimal state is 1 (0)). For P ib,j ≠X ij (P gb,j ≠X ij ) The velocity component (the particle current position turnover rate) should be increased and the particle current position is guided by the particle i best position (global best position) to change to a better position. This is reasonable because the global best position and the particle i best position represent a better solution. From the perspective of the neural network architecture, the defined "delta" operation actually reflects the adjustment of the best network architecture to the current network architecture.
In the embodiment of the application, the position X_i = (x_i1, x_i2, x_i3, ..., x_i192) and the velocity V_i = (v_i1, v_i2, v_i3, ..., v_i192) of particle i are given. The operator "Θ" is defined as follows:
where r_ij is a random number drawn from a uniform distribution on [0, 1], with 1 ≤ j ≤ 192. According to the above formula, when the j-th bit value v_ij(t+1) of the next-generation velocity vector of particle i increases, the probability that the j-th bit x_ij(t+1) of the position vector of particle i flips increases; when v_ij(t+1) decreases, the probability that x_ij(t+1) flips decreases. An illustration of the "Θ" operation is shown as an example in fig. 4.
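Step (d) can be sketched as below. Because the exact expressions of the "Δ" and "Θ" operators are given only informally above, the sketch adopts one reading consistent with the description: "Δ" yields a positive value where the best and current bits differ and a negative value where they agree, and "Θ" flips each bit with the probability given by the sigmoid-transformed velocity; both choices are assumptions.

```python
import numpy as np

def sigmoid(v):
    """Map real-valued velocity to a per-bit flip probability."""
    return 1.0 / (1.0 + np.exp(-v))

def delta(best, position):
    """Assumed 'Δ': positive where bits disagree (raise flip probability), negative where they agree."""
    return np.where(best != position, 1.0, -1.0)

def update_particle(position, velocity, p_ib, p_gb, w, c1, c2, rng):
    """One redefined BPSO update: new velocity = flip probabilities, then 'Θ' flips bits."""
    r1, r2 = rng.random(), rng.random()                   # r_1, r_2 drawn each iteration
    velocity = sigmoid(w * velocity
                       + c1 * r1 * delta(p_gb, position)
                       + c2 * r2 * delta(p_ib, position))
    flip = rng.random(position.shape) < velocity          # r_ij drawn per bit from U[0, 1]
    position = np.where(flip, 1 - position, position)     # assumed 'Θ': flip the selected bits
    return position, velocity
```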
(e) Judging whether the iteration termination condition is met, and stopping iteration if the iteration termination condition is met; otherwise, jumping to step (b).
In this embodiment of the present application, the termination condition may be that the iteration number reaches a preset number.
S104, decoding the optimal code to obtain the optimized target network architecture.
Once the entire population evolution process is complete, many particles have low fitness values. Since the application focuses on the fitness value, the global best particle is selected for complete training in the embodiments of the application. The position vector of the global best particle is first decoded into the network architecture of the corresponding neural network, and this network architecture is then trained. The decoding process corresponds to the encoding process described in the above embodiments, and the rule used for decoding coincides with the rule used for encoding.
The decoded network architecture is trained for 240 epochs using stochastic gradient descent, where the learning rate is 10^-2 for the first 100 epochs, 10^-3 for the next 80 epochs, 10^-4 for the next 40 epochs, and 10^-5 for the last 20 epochs. During training, the momentum factor is set to 0.9 and the weight decay (L2 regularization) is set to 1e-4.
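The piecewise learning-rate schedule described above could be expressed as in the following sketch; it only illustrates the schedule, not a full training loop.

```python
def learning_rate(epoch):
    """Piecewise-constant SGD learning-rate schedule over 240 epochs."""
    if epoch < 100:        # first 100 epochs
        return 1e-2
    elif epoch < 180:      # next 80 epochs
        return 1e-3
    elif epoch < 220:      # next 40 epochs
        return 1e-4
    else:                  # last 20 epochs
        return 1e-5
```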
In the embodiment of the application, the recognition error rate is used as the index for evaluating the neural network, and its calculation formula is:

error rate = n / m

where m is the number of pictures in the test dataset and n is the number of pictures that the convolutional neural network recognizes incorrectly in the test dataset. The smaller this value within the range [0, 1], the better the performance of the neural network on the image classification task.
Alternatively, the resources consumed by the algorithm may also be used as an evaluation index. The calculation formula is as follows:
G_t = G × t;
where G represents the number of graphics processing units (GPUs) used in the experimental setup of the algorithm and t represents the time, in hours, taken to execute the network architecture optimization algorithm once. The lower the value of G_t, the higher the optimization efficiency of the algorithm.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Corresponding to the network architecture optimization method described in the above embodiments, fig. 5 is a block diagram of the network architecture optimization apparatus provided in the embodiment of the present application, and for convenience of explanation, only the portion related to the embodiment of the present application is shown.
Referring to fig. 5, the apparatus includes:
an obtaining unit 51, configured to obtain a network architecture set, where the network architecture set includes a plurality of network architectures;
an encoding unit 52, configured to encode each network architecture in the network architecture set to obtain an encoded set;
a search unit 53, configured to iteratively search out an optimal code in the code set based on a particle swarm algorithm;
and the decoding unit 54 is configured to decode the optimal code to obtain an optimized target network architecture.
Optionally, each of the network architectures in the set of network architectures includes at least one sub-network, and each sub-network includes at least one network component therein.
Optionally, the encoding unit 52 is further configured to:
for each network architecture, respectively encoding the topological structure of each sub-network in the network architecture to obtain the structural code of each sub-network;
coding the number of the network components contained in each sub-network to obtain the depth coding of each sub-network;
and generating codes corresponding to the network architecture according to the structural codes and the depth codes of each sub-network.
Optionally, the encoding unit 52 is further configured to:
generating binary code values for connection relations between every two network components in the sub-network for each sub-network in the network architecture;
and combining the binary code values according to the connection sequence of the network components to generate the structural codes of the sub-network.
Optionally, the encoding unit 52 is further configured to:
for each sub-network, calculating a difference between the number of components of the network components contained in the sub-network and the minimum value in a data set, wherein the data set comprises the number of components of the network components contained in each sub-network in the network architecture;
converting the number difference into a binary code;
the converted binary code is determined as the depth code of the sub-network.
Optionally, the search unit 53 is further configured to:
searching a first target code in the current code set according to a preset target function in the process of iterative searching each time;
if the number of completed iterative optimization reaches the preset iterative number, determining the first target code as the optimal code;
if the number of the completed iterative optimization does not reach the preset iterative number, updating the codes in the code set according to the first target code to obtain the updated code set;
and continuing searching for a second target code in the updated code set according to the target function until the preset iteration times are reached.
Optionally, the objective function is:
F(X) = min{ F(Y) | Y ∈ S_b }, X ∈ S_b;

where X and Y are each codes in the code set, S_b is the code set, F(X) represents the evaluation index of the network architecture corresponding to X, and F(Y) represents the evaluation index of the network architecture corresponding to Y.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein again.
In addition, the network architecture optimization device shown in fig. 5 may be a software unit, a hardware unit, or a unit combining soft and hard, which are built in an existing terminal device, or may be integrated into the terminal device as an independent pendant, or may exist as an independent terminal device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Fig. 6 is a schematic structural diagram of a terminal device provided in an embodiment of the present application. As shown in fig. 6, the terminal device 6 of this embodiment includes: at least one processor 60 (only one shown in fig. 6), a memory 61, and a computer program 62 stored in the memory 61 and executable on the at least one processor 60, the processor 60 implementing the steps in any of the various network architecture optimization method embodiments described above when executing the computer program 62.
The terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The terminal device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that fig. 6 is merely an example of the terminal device 6 and is not meant to be limiting as to the terminal device 6, and may include more or fewer components than shown, or may combine certain components, or different components, such as may also include input-output devices, network access devices, etc.
The processor 60 may be a central processing unit (Central Processing Unit, CPU), the processor 60 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 61 may in some embodiments be an internal storage unit of the terminal device 6, such as a hard disk or a memory of the terminal device 6. The memory 61 may in other embodiments also be an external storage device of the terminal device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal device 6. Further, the memory 61 may also include both an internal storage unit and an external storage device of the terminal device 6. The memory 61 is used for storing an operating system, an application program, a Boot Loader (Boot Loader), data, other programs, etc., such as program codes of the computer program. The memory 61 may also be used for temporarily storing data that has been output or is to be output.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps that may implement the various method embodiments described above.
The present embodiments provide a computer program product which, when run on a terminal device, causes the terminal device to perform steps that enable the respective method embodiments described above to be implemented.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application implements all or part of the flow of the method of the above embodiments, and may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, where the computer program, when executed by a processor, may implement the steps of each of the method embodiments described above. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include at least: any entity or device capable of carrying computer program code to an apparatus/terminal device, recording medium, computer Memory, read-Only Memory (ROM), random access Memory (RAM, random Access Memory), electrical carrier signals, telecommunications signals, and software distribution media. Such as a U-disk, removable hard disk, magnetic or optical disk, etc. In some jurisdictions, computer readable media may not be electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (5)

1. A method for optimizing a network architecture, comprising:
acquiring a network architecture set, wherein the network architecture set comprises a plurality of network architectures;
encoding each network architecture in the network architecture set to obtain an encoding set;
iteratively searching out an optimal code in the code set based on a particle swarm algorithm;
decoding the optimal code to obtain an optimized target network architecture;
each of the network architectures in the set of network architectures includes at least one sub-network, each sub-network including at least one network component therein;
the encoding each network architecture in the network architecture set to obtain an encoded set includes:
for each network architecture, respectively encoding the topological structure of each sub-network in the network architecture to obtain the structural code of each sub-network;
coding the number of the network components contained in each sub-network to obtain the depth coding of each sub-network;
generating codes corresponding to the network architecture according to the structure codes and the depth codes of each sub-network;
the encoding the number of components of the network components contained in each sub-network to obtain a depth code of each sub-network includes:
for each sub-network, calculating a difference between the number of components of the network components contained in the sub-network and the minimum value in a data set, wherein the data set comprises the number of components of the network components contained in each sub-network in the network architecture;
converting the number difference into a binary code;
determining the converted binary code as the depth code of the sub-network;
the iterative search of the optimal code in the code set based on the particle swarm algorithm comprises the following steps:
searching a first target code in the current code set according to a preset target function in the process of iterative searching each time;
if the number of completed iterative optimization reaches the preset iterative number, determining the first target code as the optimal code;
if the number of the completed iterative optimization does not reach the preset iterative number, updating the codes in the code set according to the first target code to obtain the updated code set;
continuing searching a second target code in the updated code set according to the target function until the preset iteration times are reached;
the objective function is:
F(X) = min{ F(Y) | Y ∈ S_b }, X ∈ S_b;

wherein X and Y are each codes in the code set, S_b is the code set, F(X) represents an evaluation index of the network architecture corresponding to X, and F(X) represents the recognition error rate obtained when the convolutional neural network corresponding to X is evaluated on the image test set.
2. The network architecture optimization method of claim 1, wherein for each of the network architectures, the topology of each of the sub-networks in the network architecture is encoded separately to obtain a structural code for each of the sub-networks, comprising:
generating binary code values for connection relations between every two network components in the sub-network for each sub-network in the network architecture;
and combining the binary code values according to the connection sequence of the network components to generate the structural codes of the sub-network.
3. A network architecture optimization apparatus, comprising:
the system comprises an acquisition unit, a storage unit and a control unit, wherein the acquisition unit is used for acquiring a network architecture set, and the network architecture set comprises a plurality of network architectures;
the coding unit is used for coding each network architecture in the network architecture set to obtain a coding set;
the searching unit is used for iteratively searching out the optimal code in the code set based on a particle swarm algorithm;
the decoding unit is used for decoding the optimal code to obtain an optimized target network architecture;
the coding unit is further configured to, in the network architecture set, each network architecture includes at least one sub-network, where each sub-network includes at least one network component;
for each network architecture, respectively encoding the topological structure of each sub-network in the network architecture to obtain the structural code of each sub-network;
coding the number of the network components contained in each sub-network to obtain the depth coding of each sub-network;
generating codes corresponding to the network architecture according to the structure codes and the depth codes of each sub-network;
for each sub-network, calculating a difference between the number of components of the network components contained in the sub-network and the minimum value in a data set, wherein the data set comprises the number of components of the network components contained in each sub-network in the network architecture;
converting the number difference into a binary code;
determining the converted binary code as the depth code of the sub-network;
the searching unit is further used for searching a first target code in the current code set according to a preset target function in the process of each iterative search;
if the number of completed iterative optimization reaches the preset iterative number, determining the first target code as the optimal code;
if the number of the completed iterative optimization does not reach the preset iterative number, updating the codes in the code set according to the first target code to obtain the updated code set;
continuing searching a second target code in the updated code set according to the target function until the preset iteration times are reached;
the objective function is:
F(X) = min{ F(Y) | Y ∈ S_b }, X ∈ S_b;

wherein X and Y are each codes in the code set, S_b is the code set, F(X) represents an evaluation index of the network architecture corresponding to X, and F(X) represents the recognition error rate obtained when the convolutional neural network corresponding to X is evaluated on the image test set.
4. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 2 when executing the computer program.
5. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 2.
CN202110914528.XA 2021-08-10 2021-08-10 Network architecture optimization method, terminal equipment and computer readable storage medium Active CN113780518B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110914528.XA CN113780518B (en) 2021-08-10 2021-08-10 Network architecture optimization method, terminal equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110914528.XA CN113780518B (en) 2021-08-10 2021-08-10 Network architecture optimization method, terminal equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113780518A CN113780518A (en) 2021-12-10
CN113780518B true CN113780518B (en) 2024-03-08

Family

ID=78837288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110914528.XA Active CN113780518B (en) 2021-08-10 2021-08-10 Network architecture optimization method, terminal equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113780518B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287985A (en) * 2019-05-15 2019-09-27 江苏大学 A kind of deep neural network image-recognizing method based on the primary topology with Mutation Particle Swarm Optimizer
CN110443364A (en) * 2019-06-21 2019-11-12 深圳大学 A kind of deep neural network multitask hyperparameter optimization method and device
US10776691B1 (en) * 2015-06-23 2020-09-15 Uber Technologies, Inc. System and method for optimizing indirect encodings in the learning of mappings
CN112836794A (en) * 2021-01-26 2021-05-25 深圳大学 Method, device and equipment for determining image neural architecture and storage medium
CN112906865A (en) * 2021-02-19 2021-06-04 深圳大学 Neural network architecture searching method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3574453A1 (en) * 2017-02-23 2019-12-04 Google LLC Optimizing neural network architectures
US11308399B2 (en) * 2018-01-04 2022-04-19 Jean-Patrice Glafkidès Method for topological optimization of graph-based models

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10776691B1 (en) * 2015-06-23 2020-09-15 Uber Technologies, Inc. System and method for optimizing indirect encodings in the learning of mappings
CN110287985A (en) * 2019-05-15 2019-09-27 江苏大学 A kind of deep neural network image-recognizing method based on the primary topology with Mutation Particle Swarm Optimizer
CN110443364A (en) * 2019-06-21 2019-11-12 深圳大学 A kind of deep neural network multitask hyperparameter optimization method and device
CN112836794A (en) * 2021-01-26 2021-05-25 深圳大学 Method, device and equipment for determining image neural architecture and storage medium
CN112906865A (en) * 2021-02-19 2021-06-04 深圳大学 Neural network architecture searching method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Novel Clonal Selection Algorithm for Community Detection in Complex Networks; Devika Chhachhiya et al.; 2017 7th International Conference on Cloud Computing, Data Science & Engineering - Confluence; 2017-06-08; pp. 442-464 *
A new optimization method for neural tree network models (一种新的神经树网络模型优化方法); Xiang Laisheng; Qi Feng; Liu Xiyu; Control and Decision (控制与决策); 2013-01-15 (01); pp. 76-80, 86 *

Also Published As

Publication number Publication date
CN113780518A (en) 2021-12-10

Similar Documents

Publication Publication Date Title
TWI791610B (en) Method and apparatus for quantizing artificial neural network and floating-point neural network
Kurach et al. Neural random-access machines
JP6989387B2 (en) Quanton representation for emulating quantum similarity computations in classical processors
US11334671B2 (en) Adding adversarial robustness to trained machine learning models
CN110647920A (en) Transfer learning method and device in machine learning, equipment and readable medium
US11321625B2 (en) Quantum circuit optimization using machine learning
US11681796B2 (en) Learning input preprocessing to harden machine learning models
CN112001498A (en) Data identification method and device based on quantum computer and readable storage medium
CN107958285A (en) The mapping method and device of the neutral net of embedded system
US20240095563A1 (en) Quantum convolution operator
CN111435461B (en) Antagonistic input recognition using reduced accuracy deep neural networks
CN111950692B (en) Robust output coding based on hamming distance for improved generalization
CN113298152B (en) Model training method, device, terminal equipment and computer readable storage medium
Douillard et al. Tackling catastrophic forgetting and background shift in continual semantic segmentation
CN114792378A (en) Quantum image identification method and device
CN114358319B (en) Machine learning framework-based classification method and related device
CN114358216B (en) Quantum clustering method based on machine learning framework and related device
CN112446888A (en) Processing method and processing device for image segmentation model
CN117437494A (en) Image classification method, system, electronic equipment and storage medium
CN112086144A (en) Molecule generation method, molecule generation device, electronic device, and storage medium
CN112364198B (en) Cross-modal hash retrieval method, terminal equipment and storage medium
Fonseca et al. Model-agnostic approaches to handling noisy labels when training sound event classifiers
CN111126501B (en) Image identification method, terminal equipment and storage medium
CN113780518B (en) Network architecture optimization method, terminal equipment and computer readable storage medium
WO2023078009A1 (en) Model weight acquisition method and related system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant