CN113780518A - Network architecture optimization method, terminal device and computer-readable storage medium - Google Patents

Network architecture optimization method, terminal device and computer-readable storage medium

Info

Publication number
CN113780518A
CN113780518A
Authority
CN
China
Prior art keywords
network
code
network architecture
sub
architecture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110914528.XA
Other languages
Chinese (zh)
Other versions
CN113780518B (en)
Inventor
马里佳
李坚强
林秋镇
黄兴
邵增洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202110914528.XA priority Critical patent/CN113780518B/en
Publication of CN113780518A publication Critical patent/CN113780518A/en
Application granted granted Critical
Publication of CN113780518B publication Critical patent/CN113780518B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Error Detection And Correction (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application is applicable to the field of computer technologies and provides a network architecture optimization method, a terminal device and a computer-readable storage medium. The method comprises the following steps: acquiring a network architecture set, wherein the network architecture set comprises a plurality of network architectures; encoding each network architecture in the network architecture set to obtain a code set; iteratively searching out an optimal code in the code set based on a particle swarm algorithm; and decoding the optimal code to obtain an optimized target network architecture. This method effectively improves the efficiency of network architecture optimization and ensures that the performance of the convolutional neural network is optimized.

Description

Network architecture optimization method, terminal device and computer-readable storage medium
Technical Field
The present application belongs to the field of computer technologies, and in particular, to a network architecture optimization method, a terminal device, and a computer-readable storage medium.
Background
With the development of artificial intelligence, deep learning is increasingly widely applied, for example in image detection, gesture recognition, speech recognition and other fields. The convolutional neural network is a classic and widely used network in deep learning, and its performance is usually highly dependent on its network architecture.
At present, the network architecture of a convolutional neural network is generally preset and then manually tuned during application according to the network's performance indicators. This existing approach to network architecture optimization is inefficient and cannot guarantee that the performance of the convolutional neural network is optimized.
Disclosure of Invention
The embodiment of the application provides a network architecture optimization method, terminal equipment and a computer readable storage medium, which can effectively improve the optimization efficiency of a network architecture and ensure the optimization of the performance of a convolutional neural network.
In a first aspect, an embodiment of the present application provides a network architecture optimization method, including:
acquiring a network architecture set, wherein the network architecture set comprises a plurality of network architectures;
encoding each network architecture in the network architecture set to obtain an encoded set;
iteratively searching out an optimal code in the code set based on a particle swarm algorithm;
and decoding the optimal code to obtain an optimized target network architecture.
In the embodiment of the application, a network architecture is represented by a code, an iterative search is then carried out over the codes based on a particle swarm algorithm to determine the optimal code, and finally the optimal code is decoded into an optimized target network architecture. By encoding the network architecture and organically combining the particle swarm algorithm with the network architecture optimization method, the network architecture can be optimized using the particle swarm algorithm. In addition, the particle swarm algorithm has low computational cost, few parameters to tune and fast convergence, so the optimization efficiency of the network architecture is greatly improved.
In one possible implementation manner of the first aspect, each of the network architectures in the set of network architectures includes at least one sub-network, and each sub-network includes at least one network component.
In a possible implementation manner of the first aspect, the encoding each network architecture in the network architecture set to obtain an encoding set includes:
for each network architecture, respectively coding the topological structure of each sub-network in the network architecture to obtain the structure code of each sub-network;
coding the number of the network components contained in each sub-network to obtain the depth code of each sub-network;
and generating codes corresponding to the network architecture according to the structure codes and the depth codes of each sub-network.
In a possible implementation manner of the first aspect, the encoding, for each network architecture, a topology structure of each sub-network in the network architecture to obtain a structure code of each sub-network includes:
for each sub-network in the network architecture, generating a binary code value for a connection relationship between each two of the network components in the sub-network;
combining the binary code values in the order of connection of the network components to generate the structural code of the sub-network.
In a possible implementation manner of the first aspect, the encoding the number of components of the network component included in each sub-network to obtain the depth code of each sub-network includes:
for each sub-network, calculating a difference between the number of components of the network component contained in the sub-network and a minimum number in a data set comprising the number of components of the network component contained in each sub-network of the network architecture;
converting the number difference into a binary code;
determining the converted binary code as the depth code of the sub-network.
In a possible implementation manner of the first aspect, the iteratively searching for the optimal encoding in the encoding set based on the particle swarm optimization includes:
in the process of each iterative search, searching a first target code in the current code set according to a preset target function;
if the number of times of the completed iterative optimization reaches a preset iteration number, determining the first target code as the optimal code;
if the number of times of the completed iterative optimization does not reach the preset iteration number, updating the codes in the code set according to the first target code to obtain the updated code set;
and continuing to search a second target code in the updated code set according to the target function until the preset iteration number is reached.
In a possible implementation manner of the first aspect, the objective function is:
F(X) ≤ F(Y), ∀ Y ∈ S_b, X ∈ S_b,
wherein X and Y are each codes in the code set, S_b is the code set, F(X) represents the evaluation index of the network architecture corresponding to X, and F(Y) represents the evaluation index of the network architecture corresponding to Y.
In a second aspect, an embodiment of the present application provides a network architecture optimization apparatus, including:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a network architecture set which comprises a plurality of network architectures;
the encoding unit is used for encoding each network architecture in the network architecture set to obtain an encoding set;
the searching unit is used for iteratively searching out the optimal code in the code set based on a particle swarm algorithm;
and the decoding unit is used for decoding the optimal code to obtain the optimized target network architecture.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the network architecture optimization method according to any one of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the network architecture optimization method according to any one of the foregoing first aspects is implemented.
In a fifth aspect, an embodiment of the present application provides a computer program product, which, when run on a terminal device, causes the terminal device to execute the network architecture optimization method according to any one of the above first aspects.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic flowchart of a network architecture optimization method provided in an embodiment of the present application;
fig. 2 is a schematic diagram of a network architecture provided by an embodiment of the present application;
fig. 3 is a coding diagram of a network architecture provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of particle location update provided by an embodiment of the present application;
fig. 5 is a block diagram illustrating a network architecture optimization apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to a determination" or "in response to a detection".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise.
In the following embodiments of the present application, a neural network is taken as an example to describe an optimization method of a network architecture. The neural network herein includes various similar neural networks such as a convolutional neural network, a cyclic neural network, a BP neural network, and a residual neural network.
Referring to fig. 1, which is a schematic flow chart of a network architecture optimization method provided in the embodiment of the present application, by way of example and not limitation, the method may include the following steps:
s101, a network architecture set is obtained, and the network architecture set comprises a plurality of network architectures.
In an embodiment of the application, each network architecture of the set of network architectures comprises at least one sub-network, each sub-network comprising at least one network component.
Network components can be extracted from an existing network architecture (e.g., DenseNet, VGGNet) with superior performance, and then at least one network component can be combined into a sub-network. For example, a network component may include sequentially connected convolutional layers, normalization layers, and activation layers.
Fig. 2 is a schematic diagram of a network architecture provided in the embodiment of the present application. The network architecture shown in fig. 2 includes 4 sub-networks (i.e., phase 1, phase 2, phase 3 and phase 4 in fig. 2). The first sub-network (phase 1) includes 4 network components (i.e., the 4 nodes numbered 1, 2, 3 and 4 in phase 1 of fig. 2); the second sub-network (phase 2) includes 5 network components; the third sub-network includes 3 network components; and the fourth sub-network includes 3 network components. In addition, each sub-network includes one input node (e.g., the node labeled "I" in fig. 2) and one output node (e.g., the node labeled "O" in fig. 2). The 4 sub-networks are connected in sequence.
In the embodiment of the present application, the network components in each sub-network are numbered sequentially; the input node of each sub-network is by default connected, in a directed manner, to all network components in the sub-network that have no predecessor (i.e., that are not connected to any preceding network component); and the network components in each sub-network are connected in a directed manner according to their numbering order.
Taking the network components of phase 1 in fig. 2 as an example, the network components are numbered sequentially following the flow direction of the network data: network component 3 is connected to network component 1, network component 4 is connected to network component 3, and network component 4 is connected to network component 2. Since neither network component 1 nor network component 2 has a predecessor network component, the input node connects to both network components 1 and 2.
The number of subnetworks in the network architecture and the number of network components in each subnetwork is not particularly limited. Preferably, to facilitate subsequent searches, the number of subnetworks in each network architecture is set to the same number, such as 4; an optional range of the number of network components per sub-network is defined, such as 3 to 10.
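As an illustration of this representation (the class and field names below are hypothetical, not taken from the patent), a network architecture can be held as a list of sub-networks, each recording its component count and its directed connections:

```python
# Hypothetical Python sketch of the representation described above; names are
# illustrative only.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SubNetwork:
    num_components: int                                   # active nodes, e.g. 3 to 10
    edges: List[Tuple[int, int]] = field(default_factory=list)  # directed (i, j) pairs

@dataclass
class NetworkArchitecture:
    stages: List[SubNetwork]                              # e.g. 4 sequentially connected stages

# Stage 1 of Fig. 2: components 1-4 with connections 1-3, 3-4 and 2-4.
stage1 = SubNetwork(num_components=4, edges=[(1, 3), (3, 4), (2, 4)])
```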
S102, each network architecture in the network architecture set is coded to obtain a coding set.
The optimization of a network architecture generally includes the optimization of the network depth and the optimization of the network topology. In the prior art, the two parts are usually optimized separately. In the embodiment of the present application, the two parts are optimized simultaneously.
In one embodiment, for each network architecture, the process of encoding may include the steps of:
I. The topological structure of each sub-network in the network architecture is encoded separately to obtain the structure code of each sub-network.
Optionally, one implementation of obtaining the structural code may include:
for each sub-network in the network architecture, generating a binary code value for the connection relationship between each two network components in the sub-network; the binary code values are combined to generate a structural code of the subnetwork according to the connection order of the network components.
Specifically, for stage s, a binary code T_s of length L is used to represent the topology of the s-th stage, where L = 1/2 × 10 × 9 = 45. Within stage s, the 1st bit of T_s represents the connection state of the node pair (N_s,1, N_s,2), the 2nd bit represents the node pair (N_s,1, N_s,3), and so on. Here L_s = 1/2 × K_s × (K_s − 1) and L'_s = 1/2 × K_s × (K_s − 1) − K_s + 1. If the bit corresponding to the node pair (N_s,i, N_s,j) has the value 1, there is a connection between node N_s,i and node N_s,j; otherwise there is no connection between the two nodes. Finally, the remaining L_r bits of T_s are filled with 0, where L_r = 45 − L_s. Thus a 4-stage topology can be encoded as a binary string of length L_T = 4 × 45 = 180.
For example, in phase 1 shown in fig. 2, node 1 is not connected to node 2, so the 1st bit value is 0; node 1 is connected to node 3, so the 2nd bit value is 1; node 2 is not connected to node 1, so the 3rd bit value is 0; node 2 is not connected to node 3, so the 4th bit value is 0; node 2 is connected to node 4, so the 5th bit value is 1; and so on, with the remaining bits padded with 0 to form 180 bits.
The structure code length of stage s is set to L bits, but the effective code length is only L_s bits. Padding with 0 unifies the structure code length of each stage, so that the code positions within each stage can be determined quickly and reliably, which simplifies the computation of the subsequent search algorithm.
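A minimal sketch of this stage-topology encoding follows (assuming Python; the pair ordering (1,2), (1,3), ..., (2,3), ... and the helper names are assumptions for illustration, not taken verbatim from the patent):

```python
# Sketch of the structure (topology) encoding: one bit per node pair, padded
# with zeros to the fixed per-stage length of 45 bits.
from itertools import combinations
from typing import List, Tuple

MAX_NODES = 10
STAGE_CODE_LEN = MAX_NODES * (MAX_NODES - 1) // 2        # 45 bits per stage

def encode_stage_topology(num_components: int, edges: List[Tuple[int, int]]) -> List[int]:
    edge_set = {tuple(sorted(e)) for e in edges}
    bits = [1 if (i, j) in edge_set else 0
            for i, j in combinations(range(1, num_components + 1), 2)]
    return bits + [0] * (STAGE_CODE_LEN - len(bits))      # zero padding

# Stage 1 of Fig. 2 (edges 1-3, 3-4, 2-4): effective code 0 1 0 0 1 1, then zeros.
print(encode_stage_topology(4, [(1, 3), (3, 4), (2, 4)])[:6])
```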
II. The number of network components contained in each sub-network is encoded to obtain the depth code of each sub-network.
Optionally, one implementation of obtaining depth coding may include:
for each sub-network, calculating the difference value between the number of the network components contained in the sub-network and the minimum value in a data set, wherein the data set comprises the number of the network components contained in each sub-network in the network architecture; converting the quantity difference into binary code; and determining the converted binary codes as the depth codes of the sub-networks.
Illustratively, as shown in FIG. 2, when encoding the number of nodes within each phase, only the active nodes of the phase (nodes other than the input node and the output node) are counted. The number of active nodes per stage ranges from 3 to 10, i.e., the parameter pool for the node count has size 8 (3, 4, 5, 6, 7, 8, 9, 10); the maximum is 10, the minimum is 3, their difference is 7, and 7 corresponds to the binary code 111. The number of nodes in each phase therefore only needs to be represented by a 3-bit binary code. The specific encoding method is to subtract 3 (the minimum value in the parameter pool) from the node count K_s of the s-th stage to obtain K_s', and then convert K_s' into its binary representation.
When the number of sub-networks is 4, a 12-bit binary code is used to represent the number of active nodes in the 4 stages. Bits 1 to 3 of the binary code represent the number of nodes in the first stage, the next 3 bits represent the number of nodes in the second stage, and so on. As shown in fig. 2, there are 4 network components in phase 1, 4 − 3 = 1, and the corresponding binary code is 001, i.e., the depth code of phase 1 is 001.
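A corresponding sketch of the depth encoding, assuming the fixed 3-to-10 component range described above:

```python
# Sketch of the depth encoding: subtract the minimum component count (3) and
# write the result as a 3-bit binary code.
MIN_NODES, DEPTH_BITS = 3, 3

def encode_stage_depth(num_components: int) -> list:
    offset = num_components - MIN_NODES                   # e.g. 4 components -> 1
    return [int(b) for b in format(offset, f"0{DEPTH_BITS}b")]

print(encode_stage_depth(4))   # [0, 0, 1], the depth code of stage 1 in Fig. 2
```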
III. The code corresponding to the network architecture is generated according to the structure codes and depth codes of the sub-networks.
The structure code and the depth code of each sub-network can be combined into a group of codes in sequence according to the connection sequence of the network architecture, and the group of codes are used as the codes corresponding to the network architecture.
According to the connection sequence of the network architecture, the structure codes of each sub-network are combined into a group of codes, the depth codes of each sub-network are combined into a group of codes, and finally the two groups of codes are combined into the codes corresponding to the network architecture. Fig. 3 is a schematic coding diagram of a network architecture provided in the embodiment of the present application. As shown in fig. 3, the front part is topology coding composed of structure coding of each sub-network, and the rear part is depth coding composed of depth coding of each sub-network.
In the embodiment of the present application, the combination order of the codes is not specifically limited. However, the order of encoding and the order of decoding must be the same.
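Reusing the helpers sketched above, the full 192-bit code of Fig. 3 can be assembled as follows; the combination order shown (all structure codes first, then all depth codes) is the layout illustrated in Fig. 3, and the function name is illustrative:

```python
# Sketch of assembling the architecture code as laid out in Fig. 3:
# 4 x 45 = 180 topology bits followed by 4 x 3 = 12 depth bits.
def encode_architecture(arch: "NetworkArchitecture") -> list:
    topo = [b for s in arch.stages
            for b in encode_stage_topology(s.num_components, s.edges)]
    depth = [b for s in arch.stages for b in encode_stage_depth(s.num_components)]
    return topo + depth                                   # 180 + 12 = 192 bits

# Decoding walks the same layout in the same order: bits [0, 180) hold the
# per-stage topologies and bits [180, 192) hold the per-stage depths.
```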
S103, iteratively searching out the optimal code in the code set based on the particle swarm optimization.
A search objective function is constructed based on the coding rule.
Illustratively, under the encoding rule shown in the embodiment of fig. 2, a 192-bit binary code is generated. The search space of network architectures is thereby mapped to a binary vector space S_b, defined as
S_b = { X | X = (x_1, x_2, x_3, ..., x_192), x_i ∈ {0, 1} }.
Accordingly, the optimization problem of the network architecture can be modeled in the embodiments of the present application as the minimization of an objective function F(X):
min F(X), X ∈ S_b,
where X is a point in the binary space S_b and corresponds to a network architecture in the network search space; different points represent different network architectures. F(X) represents the recognition error rate obtained when the convolutional neural network corresponding to X is evaluated on the image test set.
The particle swarm algorithm is described below.
(a) Particle swarm initialization. The population initialization step initializes the particle position and velocity vectors. According to the binary coding strategy of the network architecture, the position and velocity of particle i are initialized as vectors of length 192 bits. The position of particle i is represented as X_i = (x_i1, x_i2, x_i3, ..., x_i192), where each x_ij = 0 or x_ij = 1 is obtained by sampling from a Bernoulli distribution b_l, b_l ~ B(0.5). The velocity of particle i is V_i = (v_i1, v_i2, v_i3, ..., v_i192), where each v_ij is a real number obtained by sampling from the uniform distribution U(−1, 1). It should be noted that the particle position represents a solution of the objective function F(·). After initializing the population, the individual best position P_ib and the global best position P_gb of particle i need to be updated: P_ib is set to the position of particle i, i.e., X_i = P_ib, and P_gb is set to the best position in the initialized population.
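A minimal sketch of this initialization step (the population size and helper names are illustrative):

```python
# Sketch of step (a): positions are 192-bit Bernoulli(0.5) vectors, velocities
# are real vectors drawn from U(-1, 1), and each particle's best position
# starts at its initial position.
import random

CODE_LEN = 192

def init_swarm(n_particles: int):
    positions = [[random.randint(0, 1) for _ in range(CODE_LEN)]
                 for _ in range(n_particles)]
    velocities = [[random.uniform(-1.0, 1.0) for _ in range(CODE_LEN)]
                  for _ in range(n_particles)]
    p_best = [p[:] for p in positions]
    return positions, velocities, p_best

positions, velocities, p_best = init_swarm(20)
```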
(b) Particle evaluation. First, a suitable method is selected to initialize the weights of the convolutional neural network architecture. Xavier weight initialization helps prevent the convolutional neural network architecture from falling into local minima during gradient optimization, so the Xavier weight initialization method is selected in the embodiment of the present application to initialize the convolutional neural network weights. The training data set in the benchmark data set is divided into two sub-data sets, D_train and D_fitness; the sample size ratio of D_train to D_fitness is 7:3. After the particle positions are decoded into the corresponding neural network architectures with parameter settings, each neural network is trained on the data set D_train for E epochs, and the network trained for E epochs is then tested on the data set D_fitness; the recognition error rate obtained from the test is used as the fitness value of the particle.
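A sketch of this evaluation step, assuming PyTorch; decode_architecture and the two data loaders stand in for the decoding step and the D_train / D_fitness splits described above and are not part of the patent text:

```python
# Sketch of step (b): Xavier-initialize the decoded network, train it for E
# epochs on D_train, and use the error rate on D_fitness as the fitness.
import torch
import torch.nn as nn

def xavier_init(model: nn.Module) -> None:
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.xavier_uniform_(m.weight)

def particle_fitness(position, train_loader, fitness_loader, epochs_e: int) -> float:
    model = decode_architecture(position)        # placeholder: 192-bit code -> nn.Module
    xavier_init(model)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs_e):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    wrong = total = 0
    with torch.no_grad():
        for x, y in fitness_loader:
            wrong += (model(x).argmax(dim=1) != y).sum().item()
            total += y.numel()
    return wrong / total                         # recognition error rate
```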
The embodiment of the present application thus uses an efficient method for evaluating particle fitness. This fitness evaluation method not only eases the problem of limited computing resources in a laboratory setting, but also accelerates the evolution of the whole population, so that the optimal network architecture can be obtained within a practical amount of time.
(c) Updating the individual best particle position P_ib and the global best particle position P_gb. P_ib and P_gb can be updated using the method of the original binary particle swarm optimization algorithm (BPSO). Specifically, the fitness value F(X_i(t)) of the current particle i is first calculated and then compared with the fitness value F(P_ib) of the individual best particle; the lower the value of F(X), the better the performance of the network architecture corresponding to X. If F(X_i(t)) < F(P_ib), the best position P_ib of particle i is changed to the current position X_i(t): P_ib = X_i(t). P_gb is the best position visited by the whole population, so P_gb is updated using the following formula:
P_gb = argmin{ F(P_ib) : 1 ≤ i ≤ N },
where N is the size of the population. The fitness values of all individual best particles in the population are compared, and the individual best particle with the lowest fitness value is the global best particle.
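A minimal sketch of this bookkeeping (lower fitness, i.e. lower error rate, is better; the function name is illustrative):

```python
# Sketch of step (c): update each particle's best position, then pick the
# global best as the individual best with the lowest fitness.
def update_bests(positions, fitnesses, p_best, p_best_fit):
    for i, (x, f) in enumerate(zip(positions, fitnesses)):
        if f < p_best_fit[i]:                    # F(X_i(t)) < F(P_ib)
            p_best[i], p_best_fit[i] = x[:], f
    g = min(range(len(p_best_fit)), key=p_best_fit.__getitem__)
    return p_best[g], p_best_fit[g]              # P_gb and its fitness
```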
(d) Particle velocity and position updates. In the particle swarm algorithm, the particles adjust their velocity vectors by learning, and the new velocity vector helps a particle fly toward a more desirable region. The particle state update rule defined in existing particle swarm optimization algorithms is not suited to the discrete environment of the present application, so the embodiment of the present application redefines the update rule according to the newly defined particle state. The redefined particle state update rule is as follows:
V_i = sig(w·V_i + c1·r1·(P_gb Δ X_i) + c2·r2·(P_ib Δ X_i)),
X_i = X_i Θ V_i.
In the above formulas, V_i indicates the change applied to the current particle position X_i, and w represents the inertia weight of the current velocity on the new velocity; the value of w lies in the range (0, 1) and is set by the user. r1 and r2 are two random variables in the range (0, 1) that are updated at each iteration, and c1 and c2 are two fixed variables determined by the user. sig(·) is the sigmoid function, which converts the real value of the velocity into the probability that a bit of the particle position flips. The specific expression of the sigmoid function is given below.
sig(v) = 1 / (1 + e^(−v)).
The symbol "Δ" is analogous to an exclusive-or operation. Given a best position P_i = (p_i1, p_i2, p_i3, ..., p_i192), the operator "Δ" is defined as follows:
P_i Δ X_i = (d_1, d_2, ..., d_192), where d_j = 1 if p_ij ≠ x_ij, and d_j = −1 if p_ij = x_ij.
based on the above formula, there are 2 cases of particle velocity update methods. For Pib,j=Xij(Pgb,j=Xij) In this case, the velocity component (the particle current position turnover rate) should be reduced because there is a consensus on both the current optimal position of particle i (the global optimal position) and the current position of particle i (the position vector j is the best state of 1 (0)). For Pib,j≠Xij(Pgb,j≠Xij) The velocity component (the particle current position turnover rate) should be increased, and the particle current position is guided by the optimal position (global optimal position) of the particle i to change to a more optimal position. This is reasonable because the global optimal position and the particle i optimal position represent a better solution. From the perspective of the neural network architecture, the defined "Δ" operation actually reflects the adjustment of the optimal network architecture to the current network architecture.
In the present embodiment, the position X_i = (x_i1, x_i2, x_i3, ..., x_i192) and velocity V_i = (v_i1, v_i2, v_i3, ..., v_i192) of particle i are given. The operator "Θ" is defined as follows:
x_ij(t+1) = 1 − x_ij(t) if r_ij < v_ij(t+1), and x_ij(t+1) = x_ij(t) otherwise,
where r_ij is a random number in [0, 1] and 1 ≤ j ≤ 192. From the above formula, as the j-th bit value v_ij(t+1) of the next-generation velocity vector of particle i increases, the probability that the j-th bit x_ij(t+1) of the position vector of particle i flips increases; as v_ij(t+1) decreases, the probability that x_ij(t+1) flips decreases. An illustration of the "Θ" operation is shown in the example of fig. 4.
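A sketch of this update step under the operators reconstructed above; the inertia weight w = 0.7 and acceleration constants c1 = c2 = 1.5 are illustrative user-chosen values, not values stated in the patent, and the ±1 convention for the "Δ" operator is an assumption consistent with the description:

```python
# Sketch of step (d): "delta" compares a guide position with the current
# position (+1 where they differ, -1 where they agree), the sigmoid turns the
# velocity into a flip probability, and "theta" flips each position bit with
# that probability.
import math
import random

def sig(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def delta(guide, x):
    return [1 if gj != xj else -1 for gj, xj in zip(guide, x)]

def update_particle(x, v, p_ib, p_gb, w=0.7, c1=1.5, c2=1.5):
    r1, r2 = random.random(), random.random()
    d_g, d_p = delta(p_gb, x), delta(p_ib, x)
    new_v = [sig(w * vj + c1 * r1 * gj + c2 * r2 * pj)
             for vj, gj, pj in zip(v, d_g, d_p)]
    new_x = [1 - xj if random.random() < vj else xj      # the "theta" operator
             for xj, vj in zip(x, new_v)]
    return new_x, new_v
```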
(e) Determine whether the iteration termination condition is met; if so, stop the iteration; otherwise, return to step (b).
In this embodiment of the present application, the termination condition may be that the number of iterations reaches a preset number.
S104, the optimal code is decoded to obtain an optimized target network architecture.
Once the entire population evolution process is complete, many particles have low fitness values. Since the present application focuses on the fitness value, the global best particle is selected for full training in the embodiment of the present application. The position vector of the global best particle is first decoded into the corresponding neural network architecture, and the network architecture is then trained. The decoding process corresponds to the encoding process described in the above embodiments, and the rule used for decoding corresponds to the rule used for encoding.
Illustratively, the decoded network architecture is trained for 240 epochs using stochastic gradient descent, where the learning rate is 10^-2 for the first 100 epochs, 10^-3 for the next 80 epochs, 10^-4 for the next 40 epochs, and 10^-5 for the last 20 epochs. During training, the momentum factor is set to 0.9 and the weight decay (L2 regularization) is set to 1e-4.
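A sketch of this training schedule, assuming PyTorch (the decoded model itself is a placeholder):

```python
# Sketch of the final training setup: SGD with momentum 0.9, weight decay 1e-4,
# and learning rates 1e-2 / 1e-3 / 1e-4 / 1e-5 for 100 / 80 / 40 / 20 epochs
# (milestones at epochs 100, 180 and 220; 240 epochs in total).
import torch

def make_optimizer_and_scheduler(model: torch.nn.Module):
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                                momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[100, 180, 220], gamma=0.1)
    return optimizer, scheduler                           # call scheduler.step() once per epoch
```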
In the embodiment of the application, the recognition error rate is used as an index for evaluating the neural network, and the calculation formula is as follows:
error rate = n / m,
where m is the number of pictures in the test data set and n is the number of pictures in the test data set that are misidentified by the convolutional neural network. The smaller this value within the range [0, 1], the better the performance of the neural network on the image classification task.
Optionally, the resource consumed by the algorithm may also be used as an evaluation index. The calculation formula is as follows:
G_t = G × t,
where G represents the number of graphics processing units (GPUs) used in the experimental setup of the algorithm, and t represents the time, in hours, consumed to execute the network architecture optimization algorithm once. The lower the value of G_t, the more efficient the algorithm optimization.
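Both indices reduce to simple formulas; a trivial sketch:

```python
# Sketch of the evaluation indices: recognition error rate n/m and GPU-hours G x t.
def recognition_error_rate(n_wrong: int, m_total: int) -> float:
    return n_wrong / m_total                              # in [0, 1]; lower is better

def gpu_hours(num_gpus: int, hours: float) -> float:
    return num_gpus * hours                               # lower means a more efficient search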
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Corresponding to the network architecture optimization method described in the foregoing embodiment, fig. 5 is a block diagram of a network architecture optimization device provided in the embodiment of the present application, and for convenience of description, only the relevant portions of the embodiment of the present application are shown.
Referring to fig. 5, the apparatus includes:
an obtaining unit 51, configured to obtain a network architecture set, where the network architecture set includes a plurality of network architectures;
an encoding unit 52, configured to encode each network architecture in the network architecture set to obtain an encoded set;
a searching unit 53, configured to iteratively search out an optimal code in the code set based on a particle swarm algorithm;
and a decoding unit 54, configured to decode the optimal code to obtain an optimized target network architecture.
Optionally, each network architecture of the set of network architectures includes at least one sub-network, and each sub-network includes at least one network component.
Optionally, the encoding unit 52 is further configured to:
for each network architecture, respectively coding the topological structure of each sub-network in the network architecture to obtain the structure code of each sub-network;
coding the number of the network components contained in each sub-network to obtain the depth code of each sub-network;
and generating codes corresponding to the network architecture according to the structure codes and the depth codes of each sub-network.
Optionally, the encoding unit 52 is further configured to:
for each sub-network in the network architecture, generating a binary code value for a connection relationship between each two of the network components in the sub-network;
combining the binary code values in the order of connection of the network components to generate the structural code of the sub-network.
Optionally, the encoding unit 52 is further configured to:
for each sub-network, calculating a difference between the number of components of the network component contained in the sub-network and a minimum number in a data set comprising the number of components of the network component contained in each sub-network of the network architecture;
converting the number difference into a binary code;
determining the converted binary code as the depth code of the sub-network.
Optionally, the searching unit 53 is further configured to:
in the process of each iterative search, searching a first target code in the current code set according to a preset target function;
if the number of times of the completed iterative optimization reaches a preset iteration number, determining the first target code as the optimal code;
if the number of times of the completed iterative optimization does not reach the preset iteration number, updating the codes in the code set according to the first target code to obtain the updated code set;
and continuing to search a second target code in the updated code set according to the target function until the preset iteration number is reached.
Optionally, the objective function is:
F(X) ≤ F(Y), ∀ Y ∈ S_b, X ∈ S_b,
wherein X and Y are each codes in the code set, S_b is the code set, F(X) represents the evaluation index of the network architecture corresponding to X, and F(Y) represents the evaluation index of the network architecture corresponding to Y.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
In addition, the network architecture optimization apparatus shown in fig. 5 may be a software unit, a hardware unit, or a unit combining software and hardware that is built into an existing terminal device, may be integrated into the terminal device as an independent add-on, or may exist as an independent terminal device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 6, the terminal device 6 of this embodiment includes: at least one processor 60 (only one shown in fig. 6), a memory 61, and a computer program 62 stored in the memory 61 and executable on the at least one processor 60, the processor 60 implementing the steps in any of the various network architecture optimization method embodiments described above when executing the computer program 62.
The terminal device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The terminal device may include, but is not limited to, a processor, a memory. Those skilled in the art will appreciate that fig. 6 is only an example of the terminal device 6, and does not constitute a limitation to the terminal device 6, and may include more or less components than those shown, or combine some components, or different components, such as an input/output device, a network access device, and the like.
The Processor 60 may be a Central Processing Unit (CPU), and the Processor 60 may be other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 61 may in some embodiments be an internal storage unit of the terminal device 6, such as a hard disk or a memory of the terminal device 6. The memory 61 may also be an external storage device of the terminal device 6 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are equipped on the terminal device 6. Further, the memory 61 may also include both an internal storage unit and an external storage device of the terminal device 6. The memory 61 is used for storing an operating system, an application program, a Boot Loader (Boot Loader), data, and other programs, such as program codes of the computer programs. The memory 61 may also be used to temporarily store data that has been output or is to be output.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application provide a computer program product, which when running on a terminal device, enables the terminal device to implement the steps in the above method embodiments when executed.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and can implement the steps of the embodiments of the methods described above when the computer program is executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include at least: any entity or device capable of carrying computer program code to an apparatus/terminal device, recording medium, computer Memory, Read-Only Memory (ROM), Random-Access Memory (RAM), electrical carrier wave signals, telecommunications signals, and software distribution medium. Such as a usb-disk, a removable hard disk, a magnetic or optical disk, etc. In certain jurisdictions, computer-readable media may not be an electrical carrier signal or a telecommunications signal in accordance with legislative and patent practice.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method for optimizing a network architecture, comprising:
acquiring a network architecture set, wherein the network architecture set comprises a plurality of network architectures;
encoding each network architecture in the network architecture set to obtain an encoded set;
iteratively searching out an optimal code in the code set based on a particle swarm algorithm;
and decoding the optimal code to obtain an optimized target network architecture.
2. The method of network architecture optimization of claim 1, wherein each of the set of network architectures comprises at least one sub-network, each sub-network comprising at least one network component.
3. The method for optimizing network architecture according to claim 2, wherein said encoding each of the set of network architectures to obtain a set of codes comprises:
for each network architecture, respectively coding the topological structure of each sub-network in the network architecture to obtain the structure code of each sub-network;
coding the number of the network components contained in each sub-network to obtain the depth code of each sub-network;
and generating codes corresponding to the network architecture according to the structure codes and the depth codes of each sub-network.
4. The method according to claim 3, wherein the encoding the topology of each sub-network in the network architecture separately for each network architecture to obtain the structure code of each sub-network comprises:
for each sub-network in the network architecture, generating a binary code value for a connection relationship between each two of the network components in the sub-network;
combining the binary code values in the order of connection of the network components to generate the structural code of the sub-network.
5. The method of claim 3, wherein the encoding the number of components of the network component contained in each sub-network to obtain the depth code for each sub-network comprises:
for each sub-network, calculating a difference between the number of components of the network component contained in the sub-network and a minimum number in a data set comprising the number of components of the network component contained in each sub-network of the network architecture;
converting the number difference into a binary code;
determining the converted binary code as the depth code of the sub-network.
6. The method for optimizing network architecture according to claim 1, wherein the iteratively searching for the optimal code in the set of codes based on the particle swarm optimization comprises:
in the process of each iterative search, searching a first target code in the current code set according to a preset target function;
if the number of times of the completed iterative optimization reaches a preset iteration number, determining the first target code as the optimal code;
if the number of times of the completed iterative optimization does not reach the preset iteration number, updating the codes in the code set according to the first target code to obtain the updated code set;
and continuing to search a second target code in the updated code set according to the target function until the preset iteration number is reached.
7. The method of network architecture optimization of claim 6, wherein the objective function is:
F(X) ≤ F(Y), ∀ Y ∈ S_b, X ∈ S_b,
wherein X and Y are each codes in the code set, S_b is the code set, F(X) represents the evaluation index of the network architecture corresponding to X, and F(Y) represents the evaluation index of the network architecture corresponding to Y.
8. A network architecture optimization apparatus, comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a network architecture set which comprises a plurality of network architectures;
the encoding unit is used for encoding each network architecture in the network architecture set to obtain an encoding set;
the searching unit is used for iteratively searching out the optimal code in the code set based on a particle swarm algorithm;
and the decoding unit is used for decoding the optimal code to obtain the optimized target network architecture.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202110914528.XA 2021-08-10 2021-08-10 Network architecture optimization method, terminal equipment and computer readable storage medium Active CN113780518B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110914528.XA CN113780518B (en) 2021-08-10 2021-08-10 Network architecture optimization method, terminal equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110914528.XA CN113780518B (en) 2021-08-10 2021-08-10 Network architecture optimization method, terminal equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113780518A true CN113780518A (en) 2021-12-10
CN113780518B CN113780518B (en) 2024-03-08

Family

ID=78837288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110914528.XA Active CN113780518B (en) 2021-08-10 2021-08-10 Network architecture optimization method, terminal equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113780518B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190205762A1 (en) * 2018-01-04 2019-07-04 Datavaloris S.A.S. Method for topological optimization of graph-based models
CN110287985A (en) * 2019-05-15 2019-09-27 江苏大学 A kind of deep neural network image-recognizing method based on the primary topology with Mutation Particle Swarm Optimizer
CN110443364A (en) * 2019-06-21 2019-11-12 深圳大学 A kind of deep neural network multitask hyperparameter optimization method and device
US20190370659A1 (en) * 2017-02-23 2019-12-05 Google Llc Optimizing neural network architectures
US10776691B1 (en) * 2015-06-23 2020-09-15 Uber Technologies, Inc. System and method for optimizing indirect encodings in the learning of mappings
CN112836794A (en) * 2021-01-26 2021-05-25 深圳大学 Method, device and equipment for determining image neural architecture and storage medium
CN112906865A (en) * 2021-02-19 2021-06-04 深圳大学 Neural network architecture searching method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DEVIKA CHHACHHIYA ET AL.: "A NOVEL CLONAL SELECTION ALGORITHM FOR COMMUNITY DETECTION IN COMPLEX NETWORKS", 《2017 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING - CONFLUENCE》, 8 June 2017 (2017-06-08), pages 442 - 464 *
XIANG LAISHENG; QI FENG; LIU XIYU: "A New Optimization Method for Neural Tree Network Models", Control and Decision, no. 01, 15 January 2013 (2013-01-15), pages 76 - 80 *

Also Published As

Publication number Publication date
CN113780518B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
CN111639710B (en) Image recognition model training method, device, equipment and storage medium
Wang et al. Kvt: k-nn attention for boosting vision transformers
JP6989387B2 (en) Quanton representation for emulating quantum similarity computations in classical processors
CN111414987B (en) Training method and training device of neural network and electronic equipment
CN110728350B (en) Quantization for machine learning models
CN111382868A (en) Neural network structure search method and neural network structure search device
CN110673840A (en) Automatic code generation method and system based on tag graph embedding technology
CN110138595A (en) Time link prediction technique, device, equipment and the medium of dynamic weighting network
US11900243B2 (en) Spiking neural network-based data processing method, computing core circuit, and chip
US11681796B2 (en) Learning input preprocessing to harden machine learning models
US20220147877A1 (en) System and method for automatic building of learning machines using learning machines
CN115104105A (en) Antagonistic autocoder architecture for graph-to-sequence model approach
CN111950692B (en) Robust output coding based on hamming distance for improved generalization
CN111382555A (en) Data processing method, medium, device and computing equipment
CN112086144B (en) Molecule generation method, device, electronic equipment and storage medium
CN114358319B (en) Machine learning framework-based classification method and related device
WO2022126448A1 (en) Neural architecture search method and system based on evolutionary learning
Sarkar et al. An algorithm for DNA read alignment on quantum accelerators
CN111310743B (en) Face recognition method and device, electronic equipment and readable storage medium
CN114358216B (en) Quantum clustering method based on machine learning framework and related device
US20230289618A1 (en) Performing knowledge graph embedding using a prediction model
Dong et al. Refinement Co‐supervision network for real‐time semantic segmentation
CN112364198B (en) Cross-modal hash retrieval method, terminal equipment and storage medium
CN114764619A (en) Convolution operation method and device based on quantum circuit
CN117351299A (en) Image generation and model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant