CN115329924A - Neural network structure determination method and device and related products - Google Patents

Neural network structure determination method and device and related products

Info

Publication number
CN115329924A
CN115329924A
Authority
CN
China
Prior art keywords: hyper, network, parameter, neural network, sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110509864.6A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed (不公告发明人)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambrian Jixingge Nanjing Technology Co ltd
Original Assignee
Cambrian Jixingge Nanjing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambrian Jixingge Nanjing Technology Co ltd
Priority to CN202110509864.6A
Publication of CN115329924A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure relates to a neural network structure determination method and apparatus, and related products, including a combined processing apparatus that includes a computing processing apparatus, an interface apparatus, other processing apparatus, and a storage apparatus. The computing processing apparatus may include one or more computing devices. The computing processing apparatus may be configured to perform user-specified operations, and may be implemented as a single-core or multi-core artificial intelligence processor; alternatively, one or more computing devices included within the computing processing apparatus may be implemented as an artificial intelligence processor core or a partial hardware structure of an artificial intelligence processor core. By using the above combined processing apparatus, the present disclosure can improve the operation efficiency of the related products when performing operations of a neural network model.

Description

Neural network structure determination method and device and related products
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for determining a neural network structure, and a related product.
Background
Both the network structure and the training hyper-parameters influence the accuracy of a neural network. In a typical training process, however, a single set of hyper-parameters is preset, network weight training and network structure search are performed alternately under that set of hyper-parameters, and an optimal structure is finally obtained. Because different training hyper-parameters yield neural networks with different accuracies, presetting a single set of hyper-parameters may make the final network structure inaccurate.
Disclosure of Invention
The disclosure provides a neural network structure determination method and device, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided a neural network structure determining method including: training a first sub-network of a current first neural network according to a hyper-parameter of a first hyper-parameter adjustment cycle to obtain a second neural network of the first hyper-parameter adjustment cycle, wherein the first neural network comprises a plurality of network levels, each network level comprises a plurality of network blocks, the first sub-network comprises randomly selected network blocks in the plurality of network blocks of each network level of the first neural network, and the hyper-parameter of the first hyper-parameter adjustment cycle is obtained by selection in a current hyper-parameter set; performing hyper-parameter regulation processing according to a first set of a current hyper-parameter regulation cycle and a second set of the current hyper-parameter regulation cycle, determining a hyper-parameter of a next hyper-parameter regulation cycle from the current hyper-parameter set, and determining a second neural network of the next hyper-parameter regulation cycle, wherein the first set of the current hyper-parameter regulation cycle comprises a hyper-parameter of the current hyper-parameter regulation cycle and a hyper-parameter of a historical hyper-parameter regulation cycle, and the second set of the current hyper-parameter regulation cycle comprises a verification result of the second neural network of the current hyper-parameter regulation cycle and a verification result of the second neural network of the historical hyper-parameter regulation cycle; determining a third neural network and an optimized hyper-parameter set according to the first set and the second set of the last hyper-parameter regulation period and a second neural network of a plurality of hyper-parameter regulation periods, wherein the number of hyper-parameters in the optimized hyper-parameter set is smaller than the number of hyper-parameters in the current hyper-parameter set; and obtaining a target neural network according to the third neural network and the optimized hyper-parameter set.
In one possible implementation, the method further includes: randomly sampling network blocks of each network level of the second neural network respectively, and determining a first network block from each network level; determining a second sub-network according to the first network block of each network hierarchy; processing the training samples with the labeling information through the second sub-network, and determining a first processing result of the second sub-network; determining the accuracy of the second sub-network according to the first processing result and the marking information; and determining the verification result of the second neural network according to the accuracy rates of the plurality of second sub-networks.
In one possible implementation, the verification result includes an average accuracy of the plurality of second sub-networks.
In one possible implementation, determining the third neural network and the optimized hyper-parameter set according to the first set and the second set of the last hyper-parameter adjustment cycle and the second neural networks of the plurality of hyper-parameter adjustment cycles includes: determining a target hyper-parameter in the first set of the last hyper-parameter adjustment period according to the second set of the last hyper-parameter adjustment period; determining a second neural network of a hyper-parameter adjustment cycle corresponding to the target hyper-parameter as the third neural network; and according to the second set of the last hyper-parameter adjustment period or the target hyper-parameter, reducing the hyper-parameter set of the current structure adjustment period to obtain the optimized hyper-parameter set.
In a possible implementation manner, narrowing the hyper-parameter set of the current structure adjustment period according to the second set of the last hyper-parameter adjustment period or the target hyper-parameter, and obtaining the optimized hyper-parameter set, includes: and under the condition that the hyper-parameters are continuous variables, keeping the hyper-parameters in a preset neighborhood taking the target hyper-parameters as the center, and removing the hyper-parameters outside the preset neighborhood.
In a possible implementation manner, narrowing the hyper-parameter set of the current structure adjustment period according to the second set of the last hyper-parameter adjustment period or the target hyper-parameter to obtain the optimized hyper-parameter set includes: in the case that the hyper-parameters are discrete variables, removing first hyper-parameters, wherein the verification result of the second neural network corresponding to the first hyper-parameters does not meet the accuracy condition.
In a possible implementation manner, the obtaining a target neural network according to the third neural network and the optimized hyper-parameter set includes: determining the first neural network of the next structural regulation cycle according to the verification result of the third neural network, wherein the number of network blocks included in each network layer of the first neural network of the next structural regulation cycle is less than that of the network blocks included in each network layer of the first neural network of the current structural regulation cycle; and obtaining the target neural network according to the network of the first neural network of the next structure regulation period and the hyper-parameter set of the next structure regulation period, wherein the target neural network is the first neural network with the structure meeting the structure condition.
In a possible implementation manner, the validation result of the third neural network includes validation results of a plurality of third sub-networks, and the third sub-networks include randomly selected network blocks in a plurality of network blocks of each network hierarchy of the third neural network, wherein determining the first neural network of the next structure adjustment cycle according to the validation result of the third neural network includes: determining a fourth sub-network which does not meet the preset accuracy requirement in the plurality of third sub-networks according to the verification results of the plurality of third sub-networks; counting the network blocks included in each network level in the preset number of fourth sub-networks, and determining second network blocks in each network level in the preset number of fourth sub-networks, wherein the second network blocks are the network blocks which meet the number requirement in each network level of the preset number of fourth sub-networks; and in the third neural network, removing the second network block to obtain the first neural network of the next structure regulation period.
In one possible implementation, the plurality of third sub-networks include a sub-network obtained by randomly sampling a plurality of network blocks of each network hierarchy of the second neural network corresponding to the third neural network when determining the verification result in the second set, and a sub-network obtained by randomly sampling a plurality of network blocks of each network hierarchy of the third neural network after determining the third neural network.
In one possible implementation, the method further includes: and training the target neural network through the randomly selected hyper-parameters in the hyper-parameter set of the structure adjusting period corresponding to the target neural network and the training sample with the labeled information to obtain the trained target neural network.
In one possible implementation, the target neural network is configured to perform any one of a text recognition task, an audio processing task, an image processing task, and a video processing task.
According to another aspect of the present disclosure, there is provided a neural network structure determining apparatus, the apparatus including: the first training module is used for training a first sub-network of a current first neural network according to a hyper-parameter of a first hyper-parameter regulation cycle to obtain a second neural network of the first hyper-parameter regulation cycle, wherein the first neural network comprises a plurality of network levels, each network level comprises a plurality of network blocks, the first sub-network comprises randomly selected network blocks in the plurality of network blocks of each network level of the first neural network, and the hyper-parameter of the first hyper-parameter regulation cycle is obtained by selection in a current hyper-parameter set; the hyper-parameter adjusting module is used for performing hyper-parameter adjusting processing according to a first set of a current hyper-parameter adjusting period and a second set of the current hyper-parameter adjusting period, determining hyper-parameters of a next hyper-parameter adjusting period from the current hyper-parameter set, and determining a second neural network of the next hyper-parameter adjusting period, wherein the first set of the current hyper-parameter adjusting period comprises the hyper-parameters of the current hyper-parameter adjusting period and the hyper-parameters of a historical hyper-parameter adjusting period, and the second set of the current hyper-parameter adjusting period comprises the verification result of the second neural network of the current hyper-parameter adjusting period and the verification result of the second neural network of the historical hyper-parameter adjusting period; the first optimization module is used for determining a third neural network and an optimized hyper-parameter set according to the first set and the second set of the last hyper-parameter regulation period and the second neural network of a plurality of hyper-parameter regulation periods, wherein the number of hyper-parameters in the optimized hyper-parameter set is smaller than that in the current hyper-parameter set; and the first obtaining module is used for obtaining the target neural network according to the third neural network and the optimized hyper-parameter set.
In a possible implementation manner, the apparatus further includes a verification result determining module configured to: randomly sampling network blocks of each network level of the second neural network respectively, and determining a first network block from each network level; determining a second sub-network according to the first network blocks of each network hierarchy; processing the training samples with the labeling information through the second sub-network, and determining a first processing result of the second sub-network; determining the accuracy of the second sub-network according to the first processing result and the marking information; and determining the verification result of the second neural network according to the accuracy rates of the plurality of second sub-networks.
In one possible implementation, the verification result includes an average accuracy of the plurality of second sub-networks.
In one possible implementation, the first optimization module is further configured to: determining a target hyper-parameter in the first set of the last hyper-parameter adjustment period according to the second set of the last hyper-parameter adjustment period; determining a second neural network of a hyper-parameter regulation cycle corresponding to the target hyper-parameter as the third neural network; and according to the second set of the last hyper-parameter adjusting period or the target hyper-parameter, reducing the hyper-parameter set of the current structure adjusting period to obtain the optimized hyper-parameter set.
In one possible implementation, the first optimization module is further configured to: and under the condition that the hyper-parameters are continuous variables, preserving the hyper-parameters in a preset neighborhood taking the target hyper-parameters as the center, and removing the hyper-parameters outside the preset neighborhood.
In one possible implementation, the first optimization module is further configured to: and under the condition that the hyper-parameters are discrete variables, removing first hyper-parameters, wherein the verification result of the second neural network corresponding to the first hyper-parameters does not accord with the accuracy condition.
In a possible implementation manner, the structure adjustment cycle includes multiple hyper-parameter adjustment cycles, the current hyper-parameter set is the hyper-parameter set of the current structure adjustment cycle, the optimized hyper-parameter set is the hyper-parameter set of the next structure adjustment cycle, the current first neural network is the first neural network of the current structure adjustment cycle, and the first obtaining module is further configured to: determine the first neural network of the next structure adjustment cycle according to the verification result of the third neural network, wherein the number of network blocks included in each network level of the first neural network of the next structure adjustment cycle is less than that of the network blocks included in each network level of the first neural network of the current structure adjustment cycle; and obtain the target neural network according to the first neural network of the next structure adjustment period and the hyper-parameter set of the next structure adjustment period, wherein the target neural network is the first neural network whose structure meets the structure condition.
In a possible implementation manner, the verification result of the third neural network includes verification results of a plurality of third sub-networks, the third sub-networks include randomly selected network blocks from a plurality of network blocks of each network hierarchy of the third neural network, and the first obtaining module is further configured to: determine, according to the verification results of the plurality of third sub-networks, fourth sub-networks among the plurality of third sub-networks that do not meet the preset accuracy requirement; count the network blocks included in each network hierarchy of the preset number of fourth sub-networks, and determine second network blocks in each network hierarchy of the preset number of fourth sub-networks, wherein the second network blocks are the network blocks whose occurrence count in each network hierarchy of the preset number of fourth sub-networks meets the number requirement; and remove the second network blocks from the third neural network to obtain the first neural network of the next structure adjustment period.
In one possible implementation, the plurality of third sub-networks include a sub-network obtained by randomly sampling a plurality of network blocks of each network hierarchy of the second neural network corresponding to the third neural network when determining the verification result in the second set, and a sub-network obtained by randomly sampling a plurality of network blocks of each network hierarchy of the third neural network after determining the third neural network.
In one possible implementation, the apparatus further includes: and the training module is used for training the target neural network through the hyper-parameters randomly selected from the hyper-parameter set of the structure adjusting period corresponding to the target neural network and the training samples with the labeled information to obtain the trained target neural network.
In one possible implementation, the target neural network is configured to perform any one of a text recognition task, an audio processing task, an image processing task, and a video processing task.
According to another aspect of the present disclosure, there is provided an artificial intelligence chip including the neural network structure determining apparatus.
According to another aspect of the present disclosure, there is provided an electronic device including the artificial intelligence chip.
According to another aspect of the present disclosure, a board card is provided, which includes: the device comprises a storage device, an interface device, a control device and the artificial intelligence chip; wherein, the artificial intelligence chip is respectively connected with the storage device, the control device and the interface device; the storage device is used for storing data; the interface device is used for realizing data transmission between the artificial intelligence chip and external equipment; and the control device is used for monitoring the state of the artificial intelligence chip.
According to another aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the neural network structure determination method.
According to another aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions, characterized in that the computer program instructions, when executed by a processor, implement the neural network structure determination method.
According to the neural network structure determination method of the embodiments of the present disclosure, in each structure adjustment cycle, the first neural network can be optimized through a plurality of hyper-parameter adjustment cycles to obtain the third neural network with the highest accuracy, and the hyper-parameters with poor training effect can be removed based on the target hyper-parameter or the second set, shrinking the hyper-parameter set while keeping the hyper-parameters with better training effect, so that selection is more precise during the hyper-parameter selection process. Furthermore, in each structure adjustment cycle, the network blocks with lower accuracy in the third neural network are removed to obtain the first neural network of the next structure adjustment cycle, and the target neural network is obtained through multiple iterations, so that the network blocks with the highest accuracy are retained, the accuracy of the neural network is improved, and the structure of the neural network is optimized.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a flow diagram of a neural network structure determination method according to an embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of a neural network, according to an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a neural network, according to an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a network block according to an embodiment of the present disclosure;
FIG. 5 illustrates a block diagram of a neural network structure determining apparatus, in accordance with an embodiment of the present disclosure;
FIG. 6 is a block diagram illustrating a combined treatment device according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram illustrating a board card according to an embodiment of the disclosure;
FIG. 8 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be understood that the terms "first," "second," etc. in the claims, description, and drawings of the present disclosure are used for distinguishing between different objects and not for describing a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and claims, the term "if" may be interpreted contextually as "when", "once", "in response to a determination", or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
The neural network structure determination method according to the embodiment of the disclosure can be applied to a processor to improve the processing efficiency of the processor. The processor may be a general purpose processor, such as a CPU (Central Processing Unit), or an artificial Intelligence Processor (IPU) for performing artificial intelligence operations. The artificial intelligence operations may include machine learning operations, brain-like operations, and the like. The machine learning operation comprises neural network operation, k-means operation, support vector machine operation and the like. The artificial intelligence processor may include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processing Unit), and a Field-Programmable Gate Array (FPGA) chip. The present disclosure is not limited to a particular type of processor.
In one possible implementation, the processor referred to in this disclosure may include multiple processing units, each of which may independently run various tasks assigned thereto, such as: a convolution operation task, a pooling task, a full connection task, or the like. The present disclosure is not limited to processing units and tasks executed by processing units.
Fig. 1 shows a flow diagram of a neural network structure determination method according to an embodiment of the present disclosure, the method comprising:
in step S11, training a first sub-network of a current first neural network according to a hyper-parameter of a first hyper-parameter adjusting period, to obtain a second neural network of the first hyper-parameter adjusting period, where the first neural network includes a plurality of network levels, each network level includes a plurality of network blocks, the first sub-network includes randomly selected network blocks from the plurality of network blocks of each network level of the first neural network, and the hyper-parameter of the first hyper-parameter adjusting period is obtained by selecting from a current hyper-parameter set;
in step S12, performing hyper-parameter adjustment processing according to a first set of a current hyper-parameter adjustment cycle and a second set of the current hyper-parameter adjustment cycle, determining a hyper-parameter of a next hyper-parameter adjustment cycle from the current hyper-parameter set, and determining a second neural network of the next hyper-parameter adjustment cycle, where the first set of the current hyper-parameter adjustment cycle includes a hyper-parameter of the current hyper-parameter adjustment cycle and a hyper-parameter of a historical hyper-parameter adjustment cycle, and the second set of the current hyper-parameter adjustment cycle includes a verification result of the second neural network of the current hyper-parameter adjustment cycle and a verification result of the second neural network of the historical hyper-parameter adjustment cycle;
in step S13, determining a third neural network and an optimized hyper-parameter set according to the first set and the second set of the last hyper-parameter adjusting period and the second neural network of the plurality of hyper-parameter adjusting periods, wherein the number of hyper-parameters in the optimized hyper-parameter set is smaller than the number of hyper-parameters in the current hyper-parameter set;
in step S14, a target neural network is obtained according to the third neural network and the optimized hyper-parameter set.
According to the neural network structure determination method of the embodiments of the present disclosure, the hyper-parameters can be optimized and screened over a plurality of hyper-parameter adjustment cycles to obtain optimized hyper-parameters, which can be used during structure adjustment of the neural network to determine a network structure with higher accuracy and thus achieve higher training precision.
In one possible implementation, the neural network structure determining method may be performed by an electronic device such as a terminal device or a server, the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor calling a computer readable instruction stored in a memory. Alternatively, the method may be performed by a server.
In one possible implementation manner, during machine learning, both the hyper-parameters used in the training process (e.g., optimizer type, learning rate, weight decay coefficient, mix ratio, etc.) and the network structure of the neural network influence the final training precision of the neural network. In general, preset hyper-parameters may be used to determine a suitable neural network structure, and training is performed to obtain a trained neural network. However, the preset hyper-parameters are not necessarily accurate, which may result in an inappropriate neural network structure and may affect the accuracy of the neural network. Thus, hyper-parameter optimization (HPO) and neural architecture search (NAS) can be combined.
In one possible implementation, both the hyperparameter optimization and the neural network structure search may perform multiple cycles of iteration and search to determine the appropriate hyperparameter and neural network structure. In an example, the above process may include a plurality of structural tuning cycles (i.e., searching for a suitable structure of the neural network in iterations of a plurality of cycles), and each structural tuning cycle may include a plurality of hyper-parameter tuning cycles (i.e., each hyper-parameter set may be optimized by a plurality of hyper-parameter tuning cycles in each structural tuning cycle, and thus the hyper-parameters may be selected within a more precise range).
In one possible implementation, as described above, each structural adjustment cycle may include a plurality of hyper-parameter adjustment cycles, and in each structural adjustment cycle, the hyper-parameter set may be optimized by selecting a hyper-parameter in the hyper-parameter set, and the structure of the neural network may be optimized by iteration over a plurality of cycles. In the next structure adjustment period, the hyper-parameters can be continuously selected to optimize the structure of the neural network and the hyper-parameter set on the basis of the optimized hyper-parameter set. Through a plurality of iterations of the structure adjustment period, a target neural network meeting the requirements can be finally determined (for example, the structure meets the requirements), and the hyper-parameter set with a smaller selection range and higher selection precision is selected. A group of hyper-parameters can be selected from the hyper-parameter set to train a target neural network meeting requirements, and a better training effect and higher training precision can be obtained.
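As an illustration of the nested iteration described above, the following Python sketch shows one way the two levels of cycles could fit together. Every helper name here (random_choice, train_subnets, validate, propose_hyperparams, shrink_hp_space, prune_blocks) is a hypothetical placeholder rather than part of the disclosure; possible versions of several of them are sketched in later sections.

```python
NUM_HP_CYCLES = 10  # assumed number of hyper-parameter adjustment cycles

def search(supernet, hp_space, structure_condition):
    # Outer loop: structure adjustment cycles.
    while not structure_condition(supernet):
        hp_history, acc_history, candidates = [], [], []  # "first set" / "second set"
        hp = random_choice(hp_space)              # hyper-parameters of the 1st cycle
        # Inner loop: hyper-parameter adjustment cycles.
        for _ in range(NUM_HP_CYCLES):
            second_net = train_subnets(supernet, hp)      # step S11
            acc = validate(second_net)                    # average sub-network accuracy
            hp_history.append(hp)
            acc_history.append(acc)
            candidates.append(second_net)
            hp = propose_hyperparams(hp_history, acc_history, hp_space)  # step S12
        k = max(range(len(acc_history)), key=acc_history.__getitem__)
        third_net = candidates[k]                         # step S13: third neural network
        hp_space = shrink_hp_space(hp_space, hp_history, acc_history)    # optimized set
        supernet = prune_blocks(third_net)                # step S14: drop weak blocks
    return supernet, hp_space
```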
In one possible implementation, a neural network that has not yet undergone any adjustment or training may first be pre-processed. In an example, the neural network may be a deep learning neural network comprising a plurality of network levels, each of which may comprise a plurality of network blocks.
Fig. 2 shows a schematic diagram of a neural network according to an embodiment of the present disclosure. As shown in Fig. 2, the neural network may include a plurality of stages, and each stage may include a plurality of network levels.
Fig. 3 shows a schematic diagram of a neural network according to an embodiment of the disclosure. As shown in Fig. 3, each network level may include a plurality of network blocks. In an example, the network block b_{i,l,n} may represent the nth network block of the lth network level of the ith stage, where i, l, and n are positive integers.
Fig. 4 shows a schematic diagram of network blocks according to an embodiment of the present disclosure. A network block may be any one of the blocks shown in Fig. 4; for example, b_{1,2,3} may be the network block shown in Fig. 4 (a), and b_{2,2,4} may be the network block shown in Fig. 4 (b). In an example, the network block in (a) includes a 1×1 convolutional layer, a normalization layer, an activation layer, a 3×3 convolutional layer, etc., and may also include a skip-connected residual structure. The network block in (b) includes a 1×1 convolutional layer, a normalization layer, an activation layer, a 3×3 convolutional layer with an expansion ratio of 2, and the like, and may further include a skip-connected residual structure. The network block shown in Fig. 4 (c) includes a 1×1 convolutional layer, a normalization layer, an activation layer, a 3×3 convolutional layer with an expansion ratio of 4, and the like, and may further include a skip-connected residual structure. The network block shown in Fig. 4 (d) includes a normalization layer, an activation layer, a 3×3 convolutional layer, and the like, and may further include a skip-connected residual structure. The network block shown in Fig. 4 (e) includes only the skip-connected residual structure (an identity path). The structure of the network block is not limited to the above structures; network blocks with other structures may also be included, and the present disclosure does not limit the structure of the network blocks. In an example, the network blocks of the first network level of each stage may have a down-sampling function, and therefore the first network level of each stage does not include the network block shown in (e).
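For illustration, a block of the kind shown in Fig. 4 (a) might be sketched in PyTorch as below. The channel sizes, the layer ordering, and the reading of "expansion ratio" as the dilation rate of the 3×3 convolution are assumptions of this sketch, not details taken from the disclosure.

```python
import torch
import torch.nn as nn

class BlockA(nn.Module):
    """Sketch of a Fig. 4 (a)-style block: 1x1 conv -> norm -> activation
    -> 3x3 conv, plus a skip-connected residual path. Channel counts and
    layer ordering are illustrative assumptions."""

    def __init__(self, channels: int, dilation: int = 1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3,
                      padding=dilation, dilation=dilation, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)   # residual (skip) connection
```

Under the same assumptions, the blocks of Fig. 4 (b) and (c) would use dilation=2 and dilation=4, and the block of Fig. 4 (e) would reduce to the identity path alone.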
In a possible implementation manner, since the neural network includes numerous network blocks and has a complex structure, the network structure of the neural network can be optimized. During the above pre-processing, a set of hyper-parameters HP_0 may be selected from the not-yet-optimized hyper-parameter set HP_base, a network block may be randomly selected from each network hierarchy of the non-optimized neural network, and the sub-network composed of the randomly selected network blocks may be trained using the hyper-parameters HP_0. For example, samples with labeling information are input into the sub-network, and the network parameters of the sub-network are adjusted using the processing results of the sub-network and the labeling information; during this adjustment, the adjustment is performed according to the selected hyper-parameters HP_0. The process of randomly selecting network blocks to form sub-networks and training according to the hyper-parameters HP_0 may be iterated several times to obtain a preprocessed first neural network SP_0 (i.e., the first neural network of the first structure adjustment cycle). In examples, the training samples may include text samples, audio samples, image samples, video samples, and so on, and the present disclosure does not limit the category of the training samples.
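A minimal sketch of the random sub-network sampling and training step described above might look as follows; the supernet.levels attribute and the hp dict layout are assumptions, and a classification head and per-stage bookkeeping are omitted for brevity.

```python
import random
import torch
import torch.nn.functional as F

def sample_subnetwork(supernet):
    """Randomly pick one network block per network level with equal
    probability. `supernet.levels` (a list of lists of candidate blocks)
    is an assumed layout, not part of the disclosure."""
    return [random.choice(level) for level in supernet.levels]

def train_one_step(supernet, hp, batch):
    """One training step of a sampled first sub-network under the selected
    hyper-parameters `hp` (e.g. {'lr': 0.1, 'weight_decay': 1e-5})."""
    blocks = sample_subnetwork(supernet)
    params = [p for block in blocks for p in block.parameters()]
    optimizer = torch.optim.SGD(params, lr=hp['lr'],
                                weight_decay=hp['weight_decay'])
    x, y = batch                      # samples and their labeling information
    for block in blocks:              # forward through the sampled path
        x = block(x)
    loss = F.cross_entropy(x, y)      # compare processing result with labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                  # only the sampled blocks are updated
    return loss.item()
```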
In an example, each hyper-parameter in the hyper-parameter set HP_base may have a certain range. For example, the optimizer category includes the three categories adam, sgd, and momentum; the learning rate interval is [0.01, 0.5]; the interval of the weight decay coefficient is [1e-6, 5e-4]; and the mix ratio is in the range [0, 0.5]. The present disclosure does not limit the ranges of the hyper-parameters in the hyper-parameter set. In subsequent optimization, the range of the hyper-parameter set of each structure adjustment period can be smaller than that of the hyper-parameter set of the previous structure adjustment period, so the hyper-parameters are selected within a smaller range and the selection becomes more accurate. Further, after each structure adjustment period, each level of the first neural network sheds some poorly performing network blocks; that is, the first neural network of the next structure adjustment period includes fewer network blocks than the first neural network of the current structure adjustment period. Therefore, in the next structure adjustment period, network blocks can be randomly selected from fewer candidates to form sub-networks for training, the selected network blocks perform better, and more accurate hyper-parameters can be selected from a smaller hyper-parameter set for training. This helps the final target neural network include better-performing network blocks and improves the precision of the neural network.
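The example ranges above could, for instance, be encoded as a small search space; the dict layout below is an assumption of this sketch, and random_choice draws one set of hyper-parameters uniformly from it.

```python
import random

# Illustrative encoding of the example hyper-parameter ranges above.
hp_space = {
    'optimizer':    {'type': 'discrete',   'values': ['adam', 'sgd', 'momentum']},
    'lr':           {'type': 'continuous', 'range': (0.01, 0.5)},
    'weight_decay': {'type': 'continuous', 'range': (1e-6, 5e-4)},
    'mix_ratio':    {'type': 'continuous', 'range': (0.0, 0.5)},
}

def random_choice(space):
    """Draw one set of hyper-parameters uniformly from the space."""
    hp = {}
    for name, spec in space.items():
        if spec['type'] == 'discrete':
            hp[name] = random.choice(spec['values'])
        else:
            lo, hi = spec['range']
            hp[name] = random.uniform(lo, hi)
    return hp
```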
In one possible implementation, in step S11, the current structure adjustment period may include a plurality of hyper-parameter adjustment periods (e.g., the 1st, 2nd, …, jth hyper-parameter adjustment periods). During the first hyper-parameter adjustment period, a set of hyper-parameters HP_1 may be randomly selected for training from the hyper-parameter set HP_base of the current structure adjustment period.
In one possible implementation, a network block may be randomly selected (e.g., with equal probability) in each network level of the current first neural network SP_0 (i.e., the first neural network of the current structure adjustment cycle) to form a first sub-network. Further, the first sub-network may be trained based on the above-mentioned hyper-parameters HP_1. The above process of selecting and training network blocks can be executed iteratively multiple times; after the network parameters of a plurality of network blocks in the first neural network are adjusted, a second neural network SP_1 can be obtained.
In one possible implementation, to optimize the hyper-parameter set, i.e., narrow the hyper-parameter set, the second neural network may be verified, the verification result of the second neural network obtained, and the optimization is performed based on the verification result. The method further comprises the following steps: randomly sampling network blocks of each network level of the second neural network respectively, and determining a first network block from each network level; determining a second sub-network according to the first network block of each network hierarchy; processing the training samples with the marking information through the second sub-network, and determining a first processing result of the second sub-network; determining the accuracy of the second sub-network according to the first processing result and the marking information; and determining a verification result of the second neural network according to the accuracy rates of the plurality of second sub-networks.
In a possible implementation manner, in the current structure adjustment cycle, the network structure of the second neural network is the same as that of the first neural network, except that the network parameters of the second neural network are different from those of the first neural network after the training process of the first sub-network. Thus, the second neural network also includes a plurality of network levels, and each network level includes a plurality of network blocks. The network blocks of the network levels of the second neural network may be randomly sampled separately in the same manner as above for determining the first sub-network, e.g., one network block, i.e., the first network block, may be randomly selected per network level. Further, the network composed of the first network blocks is the second sub-network.
In one possible implementation, the training samples may be processed by a second sub-network to obtain the first processing result, for example, the training samples are images, and the second sub-network may process the images to identify the target object in the images. Further, the accuracy of the second sub-network may be determined according to the first processing result and the annotation information, that is, a plurality of training samples may be input into the second sub-network, and the first processing result and the annotation information at each output may be compared to determine the accuracy of the second sub-network. For example, a plurality of images may be input to the second sub-network, the second sub-network may output a plurality of recognition results, and recognition accuracy of the plurality of recognition results may be counted as accuracy of the second sub-network based on the label information of the images.
In one possible implementation, a plurality of second sub-networks may be obtained in the manner described above, and the accuracy of each second sub-network may be determined separately. Further, the verification result of the second neural network may be determined based on the accuracies of the plurality of second sub-networks; the verification result includes the average accuracy of the plurality of second sub-networks. In an example, random sampling obtains 10 second sub-networks, each of which processes 1000 training samples so that the accuracy of each second sub-network can be counted separately; the accuracies of the 10 second sub-networks may then be averaged to obtain the average accuracy, i.e., the verification result of the second neural network. In another example, the number of training samples processed by each second sub-network may differ, e.g., 2000 training samples processed by the first second sub-network, 1000 training samples processed by the second one, and so on. In that case, when determining the average accuracy, the accuracy of each second sub-network may be weighted by the number of training samples it processed to obtain the average accuracy of the plurality of second sub-networks, i.e., the verification result of the second neural network.
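A sketch of this verification procedure, under the same assumed supernet layout as the earlier sampling sketch: pooling correct counts over all processed samples is equivalent to weighting each second sub-network's accuracy by the number of samples it processed. The forward helper is hypothetical.

```python
import random

def validate(second_net, val_batches, num_subnets=10):
    """Verification result of a second neural network: sample several
    second sub-networks, measure their accuracy, and average. `forward`
    is a hypothetical helper that runs a batch through the chosen blocks
    and returns predicted labels."""
    correct, total = 0, 0
    for _ in range(num_subnets):
        blocks = [random.choice(level) for level in second_net.levels]
        for x, y in val_batches:
            pred = forward(blocks, x)
            correct += int((pred == y).sum())
            total += len(y)
    return correct / total   # average accuracy = verification result
```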
In one possible implementation, the first set and the second set of the first hyper-parameter adjustment cycle may be obtained in the manner described above. The first set comprises the hyper-parameters HP_1 used in the training process of the first neural network SP_0 in the first hyper-parameter adjustment cycle, i.e., the first set is {HP_1}. The second set comprises the verification result of the second neural network SP_1 of the first hyper-parameter adjustment cycle, denoted acc_1; that is, the second set is {acc_1}. Based on the first set and the second set, the hyper-parameters HP_2 of the second hyper-parameter adjustment cycle and the second neural network SP_2 of the second hyper-parameter adjustment cycle may be determined.
In one possible implementation manner, the first set and the second set may be used as input parameters of a hyper-parameter optimization process. In an example, the hyper-parameter optimization can be performed by a random search algorithm, a grid search algorithm, a simulated annealing algorithm, a Bayesian optimization algorithm, or the like; that is, with the first set and the second set as input parameters of such an algorithm, the hyper-parameters HP_2 of the second hyper-parameter adjustment cycle can be selected from the hyper-parameter set of the current structure adjustment cycle.
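As one toy stand-in for the search algorithms named above, the proposer below perturbs the best hyper-parameters found so far; a practical system could instead run Bayesian optimization, simulated annealing, or grid/random search over the first set (hp_history) and second set (acc_history). The spec layout follows the earlier assumed hp_space.

```python
import random

def propose_hyperparams(hp_history, acc_history, space):
    """Toy proposer: copy the best hyper-parameters seen so far and
    perturb one randomly chosen dimension. Perturbation scale is an
    assumption of the sketch."""
    best = hp_history[max(range(len(acc_history)),
                          key=acc_history.__getitem__)]
    proposal = dict(best)
    name, spec = random.choice(list(space.items()))
    if spec['type'] == 'discrete':
        proposal[name] = random.choice(spec['values'])
    else:
        lo, hi = spec['range']
        step = 0.1 * (hi - lo)
        proposal[name] = min(hi, max(lo, best[name] + random.uniform(-step, step)))
    return proposal
```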
In a possible implementation, a first sub-network may be obtained by randomly sampling the network blocks of each network level of the first neural network SP_0 in the same way as described above, and the first sub-network may be trained with the hyper-parameters HP_2 of the second hyper-parameter adjustment cycle, i.e., trained with training samples with labeling information based on the hyper-parameters HP_2. After the random sampling process and the training process are iterated multiple times, the network parameters of at least some of the network blocks of the first neural network SP_0 are adjusted, and the second neural network SP_2 of the second hyper-parameter adjustment cycle is obtained.
In a possible implementation, the verification result acc_2 of the second neural network SP_2 may be obtained in the same manner as described above. For example, the network levels of the second neural network SP_2 may be randomly sampled, a first network block selected from each network level to form a second sub-network, and the accuracy of the second sub-network determined through the training samples with labeling information. Further, the accuracies of the plurality of second sub-networks may be averaged or weighted-averaged to determine their average accuracy, i.e., the verification result acc_2 of the second neural network SP_2.
In a possible implementation, the above processing yields the hyper-parameters HP_2 and the verification result acc_2 of the second neural network SP_2. In the second hyper-parameter adjustment cycle, the first set may include the hyper-parameters HP_1 of the first hyper-parameter adjustment cycle and the hyper-parameters HP_2 of the second hyper-parameter adjustment cycle, i.e., the first set is {HP_1, HP_2}. The second set may include the verification result acc_1 of the second neural network SP_1 and the verification result acc_2 of the second neural network SP_2; that is, the second set is {acc_1, acc_2}.
In a possible implementation, the above-described process continues: the hyper-parameters HP_3 of the third hyper-parameter adjustment cycle may be determined from the first set and the second set, the first sub-network of the first neural network SP_0 may be trained based on the hyper-parameters HP_3 to obtain the second neural network SP_3, and the verification result acc_3 of the second neural network SP_3 may then be determined. Thus, in the third hyper-parameter adjustment cycle, the first set is {HP_1, HP_2, HP_3} and the second set is {acc_1, acc_2, acc_3}.
In a possible implementation manner, the above processing may be performed iteratively. That is, in step S12, hyper-parameter adjustment processing is performed according to the first set of the current hyper-parameter adjustment cycle and the second set of the current hyper-parameter adjustment cycle, and the hyper-parameters of the next hyper-parameter adjustment cycle are determined from the current hyper-parameter set. The first sub-network of the first neural network SP_0 is then trained based on the hyper-parameters of the next hyper-parameter adjustment cycle to obtain the second neural network of the next hyper-parameter adjustment cycle. Further, the second neural network of the next hyper-parameter adjustment cycle may be verified: a plurality of second sub-networks of the second neural network are obtained through random sampling, and the average accuracy of the plurality of second sub-networks is determined as the verification result of the second neural network. The first set of the current hyper-parameter adjustment cycle comprises the hyper-parameters of the current hyper-parameter adjustment cycle and the hyper-parameters of the historical hyper-parameter adjustment cycles, and the second set of the current hyper-parameter adjustment cycle comprises the verification result of the second neural network of the current hyper-parameter adjustment cycle and the verification results of the second neural networks of the historical hyper-parameter adjustment cycles.
In one possible implementation, after t iterations through t hyper-parameter adjustment cycles (i.e., the above processing is iterated t times, t being a positive integer), a first set {HP_1, HP_2, …, HP_t} including t sets of hyper-parameters and a second set {acc_1, acc_2, …, acc_t} of t verification results may be obtained, along with the second neural networks SP_1, SP_2, …, SP_t of the t hyper-parameter adjustment cycles.
In one possible implementation, in step S13, a third neural network and an optimized hyper-parameter set may be determined according to the first set and the second set of the last hyper-parameter adjustment cycle (i.e., the t-th hyper-parameter adjustment cycle) and the second neural networks of the plurality of hyper-parameter adjustment cycles, all of which are as shown above. The third neural network may be the second neural network whose verification result has the highest accuracy, i.e., the best-performing one of the plurality of second neural networks. The optimized hyper-parameter set is obtained by reducing the hyper-parameter set of the current structure adjustment period (i.e., removing the hyper-parameters with poor training effect), so the number of hyper-parameters in the optimized hyper-parameter set is smaller than the number in the current hyper-parameter set; the optimized hyper-parameter set may be used as the hyper-parameter set of the next structure adjustment period.
In one possible implementation, step S13 may include: determining a target hyper-parameter in the first set of the last hyper-parameter adjustment period according to the second set of the last hyper-parameter adjustment period; determining a second neural network of a hyper-parameter regulation cycle corresponding to the target hyper-parameter as the third neural network; and according to the second set of the last hyper-parameter adjusting period or the target hyper-parameter, reducing the hyper-parameter set of the current structure adjusting period to obtain the optimized hyper-parameter set.
In one possible implementation, the second set of the last hyper-parameter adjustment cycle, i.e., the t-th hyper-parameter adjustment cycle, is {acc_1, acc_2, …, acc_t}. The verification result with the highest accuracy can be determined from this set; for example, if the verification result of the second neural network SP_k of the k-th hyper-parameter adjustment cycle (k being a positive integer less than or equal to t) has the highest accuracy, the hyper-parameters HP_k of the k-th hyper-parameter adjustment cycle can be considered to have the best training effect. Thus, the hyper-parameters HP_k in the first set {HP_1, HP_2, …, HP_t} of the t-th hyper-parameter adjustment cycle may be determined as the target hyper-parameters HP*, and the second neural network SP_k whose verification result has the highest accuracy may be determined as the third neural network SP*. The target hyper-parameters HP* can be used to optimize the hyper-parameter set of the current structure adjustment period, i.e., to remove hyper-parameters with poor training effect and shrink the set, obtaining the optimized hyper-parameter set, i.e., the hyper-parameter set of the next structure adjustment period. The third neural network SP* can be used, after optimization, to obtain the first neural network SP_0 of the next structure adjustment period.
In a possible implementation manner, narrowing the hyper-parameter set of the current structure adjustment period according to the second set of the last hyper-parameter adjustment period or the target hyper-parameters to obtain the optimized hyper-parameter set includes: in the case that a hyper-parameter is a continuous variable, keeping the hyper-parameter values in a preset neighborhood centered on the target hyper-parameter and removing the values outside the preset neighborhood. In an example, the hyper-parameters may include the optimizer type, the learning rate, the weight decay coefficient, the mix ratio, and the like, where the learning rate, the weight decay coefficient, and the mix ratio are continuous variables; for these, the values in a preset neighborhood centered on the target hyper-parameter may be retained during the reduction. For example, if in the current hyper-parameter set the learning rate interval is [0.01, 0.5] and the learning rate in the target hyper-parameters HP* is 0.3, then 0.3 can be used as the neighborhood center with a preset neighborhood radius (e.g., 0.2), so that the retained interval is [0.1, 0.5] and the removed interval is [0.01, 0.1). In the hyper-parameter set of the next structure adjustment period, the learning rate interval is then [0.1, 0.5]. The present disclosure does not limit the neighborhood center and neighborhood radius.
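A sketch of this neighborhood reduction for a continuous hyper-parameter, using the same assumed spec layout as earlier; with range (0.01, 0.5), center 0.3, and radius 0.2 it reproduces the [0.1, 0.5] interval of the example.

```python
def shrink_continuous(spec, center, radius):
    """Keep only the preset neighborhood [center - radius, center + radius],
    clipped to the original range."""
    lo, hi = spec['range']
    return (max(lo, center - radius), min(hi, center + radius))
```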
In a possible implementation manner, reducing the hyper-parameter set of the current structure adjustment period according to the second set of the last hyper-parameter adjustment period or the target hyper-parameter to obtain the optimized hyper-parameter set includes: in the case that a hyper-parameter is a discrete variable, removing a first hyper-parameter, where the verification result of the second neural network corresponding to the first hyper-parameter does not meet the accuracy condition. That is, where the hyper-parameter is a discrete variable, the first hyper-parameter to be removed may be determined based on the second set of the t-th hyper-parameter adjustment cycle. In an example, one or more verification results with the lowest accuracy are determined in the second set of the t-th hyper-parameter adjustment cycle (the verification results of SP_1, SP_2, …, SP_t), and the hyper-parameters corresponding to those verification results are removed. In an example, the optimizer type is a discrete variable and may include three values: adam, sgd, and momentum. For example, the T verification results with the lowest accuracy (T is a positive integer smaller than t) are determined in the second set, and the T sets of hyper-parameters in the first set corresponding to these T verification results can be determined. Further, the T sets of hyper-parameters may be counted, and the most frequently used value is taken as the hyper-parameter with the worst training effect (i.e., the first hyper-parameter) and removed. For example, if momentum is the most frequently used optimizer type among the T sets of hyper-parameters, momentum can be considered the value with the worst training effect and can be removed. In the hyper-parameter set of the next structure adjustment period, the optimizer type may then include only adam and sgd. The present disclosure does not limit which hyper-parameters are retained.
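A minimal sketch of this discrete reduction under the same assumptions (aligned lists per adjustment cycle; all names hypothetical):

```python
from collections import Counter

# Among the T lowest-accuracy verification results, count how often each
# discrete value (e.g. optimizer type) was used; the most frequent value is
# treated as the first hyper-parameter and removed from the set.
def narrow_discrete(values, used_values, accuracies, T):
    """values:      current discrete hyper-parameter set, e.g. {'adam','sgd','momentum'}
       used_values: value chosen in each hyper-parameter adjustment cycle
       accuracies:  verification accuracy of each cycle, same order
       T:           number of worst results to inspect"""
    worst = sorted(range(len(accuracies)), key=lambda i: accuracies[i])[:T]
    first_hp, _ = Counter(used_values[i] for i in worst).most_common(1)[0]
    return values - {first_hp}

# momentum appears most often among the 2 worst cycles, so it is removed:
print(narrow_discrete({'adam', 'sgd', 'momentum'},
                      ['momentum', 'sgd', 'momentum', 'adam'],
                      [0.71, 0.80, 0.69, 0.83], T=2))  # -> {'adam', 'sgd'}
```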
In this way, the hyper-parameters with poor training effect can be selected and removed based on the target hyper-parameter or the second set, so that the hyper-parameter set is reduced and the hyper-parameters with good training effect are retained. This makes the selection of hyper-parameters more precise and improves the training effect.
In a possible implementation manner, in step S14, the third neural network SP* may be further optimized through the optimized hyper-parameter set, so as to optimize the structure of the third neural network and obtain the target neural network. In an example, the third neural network SP* of the current structure adjustment period may be optimized by reducing the number of network blocks in each network hierarchy, obtaining the first neural network SP_0 of the next structure adjustment period. Further, within the next structure adjustment period, the first neural network SP_0 may be optimized through a plurality of hyper-parameter adjustment cycles (this optimization process is consistent with the iteration over the plurality of hyper-parameter adjustment cycles of the current structure adjustment period) to obtain the third neural network SP* of the next structure adjustment period; after again reducing the number of network blocks in each network hierarchy, the first neural network SP_0 of the following structure adjustment period can be obtained, and so on. The optimization process of the structure adjustment periods can be iterated until a first neural network meeting the structure condition is obtained. In an example, the structure condition may include that each network hierarchy includes only one network block; the present disclosure does not limit the structure condition.
In one possible implementation, step S14 may include: determining the first neural network of the next structure adjustment cycle according to the verification result of the third neural network, where the number of network blocks included in each network hierarchy of the first neural network of the next structure adjustment cycle is smaller than the number of network blocks included in each network hierarchy of the first neural network of the current structure adjustment cycle; and obtaining the target neural network according to the first neural network of the next structure adjustment cycle and the hyper-parameter set of the next structure adjustment cycle, where the target neural network is a first neural network whose structure meets the structure condition.
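The nesting of the two kinds of cycles described above can be sketched as follows; every function argument is a placeholder for the corresponding operation in the text, not an API defined by this disclosure:

```python
# Minimal sketch: each structure adjustment cycle runs the hyper-parameter
# adjustment cycles (steps S11-S13) to get the third neural network SP* and
# the reduced hyper-parameter set, then prunes network blocks (step S14);
# iteration stops when each network hierarchy keeps a single network block.
def determine_structure(first_net, hp_set, run_hp_cycles, prune_blocks,
                        meets_structure_condition):
    while not meets_structure_condition(first_net):
        sp_star, hp_set = run_hp_cycles(first_net, hp_set)  # steps S11-S13
        first_net = prune_blocks(sp_star)                   # step S14
    return first_net  # target neural network
```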
In one possible implementation, as described above, the number of network blocks in each network hierarchy of the third neural network may be reduced based on the verification result of the third neural network. The verification result of the third neural network includes verification results of a plurality of third sub-networks, where a third sub-network includes randomly selected network blocks from the plurality of network blocks of each network hierarchy of the third neural network. Determining the first neural network of the next structure adjustment cycle according to the verification result of the third neural network includes: determining, according to the verification results of the plurality of third sub-networks, fourth sub-networks which do not meet the preset accuracy requirement among the plurality of third sub-networks; counting the network blocks included in each network hierarchy of the preset number of fourth sub-networks, and determining second network blocks in each network hierarchy, where the second network blocks are the network blocks which meet the number requirement in each network hierarchy of the preset number of fourth sub-networks; and removing the second network blocks from the third neural network to obtain the first neural network of the next structure adjustment period.
In a possible implementation manner, the manner of randomly sampling each network level of the third neural network may refer to the manner of obtaining the first sub-network or the second sub-network, and is not described herein again. The plurality of third sub-networks include a sub-network obtained by randomly sampling a plurality of network blocks of each network hierarchy of the second neural network corresponding to the third neural network when determining the verification result in the second set, and a sub-network obtained by randomly sampling a plurality of network blocks of each network hierarchy of the third neural network after determining the third neural network. In an example, p (p is a positive integer) random samples may be taken of the third neural network, resulting in p third subnetworks. In addition, the third neural network is the second neural network with the highest accuracy of the verification result, and when the verification result of the second neural network is determined, the second neural network has been subjected to random sampling for a plurality of times to obtain a plurality of second sub-networks, for example, m (m is a positive integer) second sub-networks, so that m second sub-networks of the third neural network can be used as the third sub-networks to obtain m + p third sub-networks in total.
In one possible implementation, the third sub-networks may be verified, for example, by processing the training samples with the labeling information through the third sub-networks and comparing the processing results with the labeling information to determine the accuracy of each third sub-network. Further, a fourth sub-network that does not meet the accuracy requirement may be determined based on the accuracy of each third sub-network, for example, b (b is a positive integer smaller than m + p) third sub-networks with the lowest accuracy may be determined as the fourth sub-network.
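As a sketch of the sampling and verification just described (sub-networks are represented as one chosen block per hierarchy; `run`, which executes a sub-network on an input, is a hypothetical placeholder):

```python
import random

# One third sub-network keeps a single randomly chosen block per hierarchy.
def sample_subnetwork(levels):
    """levels: list of lists, each holding the candidate network blocks
       of one network hierarchy of the third neural network."""
    return [random.choice(blocks) for blocks in levels]

# Accuracy over labeled training samples: fraction of outputs matching labels.
def subnetwork_accuracy(subnet, samples, run):
    hits = sum(1 for x, y in samples if run(subnet, x) == y)
    return hits / len(samples)
```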
In one possible implementation, the network blocks in the plurality of fourth sub-networks may be counted to determine the second network blocks of each network hierarchy. For example, if b = 1000, that is, a total of 1000 fourth sub-networks are obtained, the network blocks of each network hierarchy in the 1000 fourth sub-networks can be counted separately, and the second network blocks meeting the number requirement can be determined. For example, if, in the a-th network hierarchy (a is a positive integer), the number of fourth sub-networks that use the c-th network block (c is a positive integer) exceeds a number threshold (e.g., 500), the c-th network block of that network hierarchy may be determined to be a second network block. Alternatively, the one or more network blocks with the highest usage rate in each network hierarchy may be counted and determined as the second network blocks.
In a possible implementation, since the accuracy of the fourth sub-networks is low, a second network block used frequently in them can be considered to contribute to the low accuracy and to have a poor usage effect. Therefore, the second network blocks can be removed from the third neural network to obtain the first neural network of the next structure adjustment cycle. The number of network blocks included in the first neural network of the next structure adjustment cycle is smaller than the number of network blocks included in the first neural network of the current structure adjustment cycle.
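A sketch of this counting and removal step, assuming sub-networks are recorded as one block index per hierarchy (the threshold value and all names are hypothetical):

```python
from collections import Counter

# Within the b lowest-accuracy (fourth) sub-networks, count per hierarchy how
# often each block appears; blocks used more often than `threshold` are the
# second network blocks and are removed from the third neural network SP*.
def find_second_blocks(fourth_subnets, num_levels, threshold):
    counts = [Counter() for _ in range(num_levels)]
    for subnet in fourth_subnets:          # subnet: one block index per level
        for level, block in enumerate(subnet):
            counts[level][block] += 1
    return [{blk for blk, n in c.items() if n > threshold} for c in counts]

# e.g. with b = 1000 fourth sub-networks and threshold 500, a block chosen by
# more than 500 of them at some hierarchy is removed from that hierarchy.
```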
In a possible implementation manner, alternatively, fifth sub-networks meeting the accuracy requirement may be determined according to the verification results of the third sub-networks; the third network blocks with a higher usage rate in each network hierarchy of the plurality of fifth sub-networks are counted, these third network blocks are retained in the third neural network, and the other network blocks are removed to obtain the first neural network of the next structure adjustment cycle. The manner in which network blocks are removed is not limited by this disclosure.
In a possible implementation manner, the first neural network of the next structure adjustment cycle may be optimized based on hyper-parameters selected from the hyper-parameter set of the next structure adjustment cycle; the optimization may follow the processing of the plurality of hyper-parameter adjustment cycles described above to obtain the third neural network of the next structure adjustment cycle. Further, the network blocks in the third neural network of the next structure adjustment cycle can be reduced in the above manner to obtain the first neural network of the following structure adjustment cycle, and so on. This process can be executed iteratively until a first neural network conforming to the structure condition, i.e., the target neural network, is obtained. In an example, each network hierarchy of the target neural network includes only one network block. The present disclosure does not limit the network structure of the target neural network.
In this way, the first neural network of the next structure regulation cycle can be obtained by removing the network blocks with lower accuracy in the third neural network, the target neural network can be obtained by iteration processing, and the network blocks with the highest accuracy can be selected from each level of the neural network after multiple iterations, so that the accuracy of the neural network is improved, and the structure of the neural network can be simplified.
In one possible implementation, the target neural network may be further trained to improve its performance. The method further comprises: training the target neural network through a hyper-parameter randomly selected from the hyper-parameter set of the structure adjustment period corresponding to the target neural network and training samples with labeled information, to obtain the trained target neural network. In an example, the structure adjustment period corresponding to the target neural network is the structure adjustment period in which the first neural network meets the structure condition, for example, the period in which each network hierarchy of the first neural network includes only one network block (e.g., the last structure adjustment period). In this period, the hyper-parameter set has been reduced to a small range, and the training effect of the hyper-parameters in it is good. A hyper-parameter can therefore be randomly selected from the hyper-parameter set of this period, and the target neural network can be trained with a plurality of training samples with labeled information based on the selected hyper-parameter, further improving its accuracy and obtaining the trained target neural network.
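A minimal sketch of this final training step (`train` stands for an ordinary supervised training loop and is not defined by this disclosure):

```python
import random

# Draw one hyper-parameter at random from the reduced set of the last
# structure adjustment period and train the target neural network with it.
def final_training(target_net, hp_set, labeled_samples, train):
    hp = random.choice(list(hp_set))
    return train(target_net, hp, labeled_samples)  # trained target network
```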
In one possible implementation, the training samples may include any one of text samples, audio samples, image samples, video samples, and the like, and the present disclosure does not limit the category of the training samples. The target neural network may be used to perform any one of a text recognition task, an audio processing task, an image processing task, and a video processing task. The present disclosure does not limit the use of the target neural network.
According to the neural network structure determining method disclosed by the embodiment of the disclosure, the first neural network can be optimized through a plurality of hyper-parameter adjusting periods in each structure adjusting period to obtain the third neural network with the highest accuracy, and the hyper-parameters with poor training effect can be selected to be removed based on the target hyper-parameters or the second set to reduce the hyper-parameter set and reserve the hyper-parameters with better training effect, so that the selection precision is higher in the process of selecting the hyper-parameters. Furthermore, in each structure regulation period, the network block with lower accuracy in the third neural network can be removed to obtain the first neural network in the next structure regulation period, and the target neural network can be obtained through multiple iterations, so that the network block with the highest accuracy can be reserved, the accuracy of the neural network can be improved, and the structure of the neural network can be optimized.
Fig. 5 shows a block diagram of a neural network structure determining apparatus according to an embodiment of the present disclosure. As shown in Fig. 5, the apparatus includes: a first training module 11, configured to train a first sub-network of a current first neural network according to a hyper-parameter of a first hyper-parameter adjustment cycle, to obtain a second neural network of the first hyper-parameter adjustment cycle, where the first neural network includes a plurality of network levels, each network level includes a plurality of network blocks, the first sub-network includes randomly selected network blocks from the plurality of network blocks of each network level of the first neural network, and the hyper-parameter of the first hyper-parameter adjustment cycle is selected from a current hyper-parameter set; a hyper-parameter adjusting module 12, configured to perform hyper-parameter adjustment processing according to a first set of a current hyper-parameter adjusting cycle and a second set of the current hyper-parameter adjusting cycle, determine a hyper-parameter of a next hyper-parameter adjusting cycle from the current hyper-parameter set, and determine a second neural network of the next hyper-parameter adjusting cycle, where the first set of the current hyper-parameter adjusting cycle includes the hyper-parameter of the current hyper-parameter adjusting cycle and the hyper-parameters of historical hyper-parameter adjusting cycles, and the second set of the current hyper-parameter adjusting cycle includes the verification result of the second neural network of the current hyper-parameter adjusting cycle and the verification results of the second neural networks of historical hyper-parameter adjusting cycles; a first optimization module 13, configured to determine a third neural network and an optimized hyper-parameter set according to the first set and the second set of the last hyper-parameter regulation period and the second neural networks of a plurality of hyper-parameter regulation periods, where the number of hyper-parameters in the optimized hyper-parameter set is smaller than the number of hyper-parameters in the current hyper-parameter set; and a first obtaining module 14, configured to obtain a target neural network according to the third neural network and the optimized hyper-parameter set.
In a possible implementation manner, the apparatus further includes a verification result determining module configured to: randomly sampling network blocks of each network level of the second neural network respectively, and determining a first network block from each network level; determining a second sub-network according to the first network blocks of each network hierarchy; processing the training samples with the labeling information through the second sub-network, and determining a first processing result of the second sub-network; determining the accuracy of the second sub-network according to the first processing result and the marking information; and determining the verification result of the second neural network according to the accuracy rates of the plurality of second sub-networks.
In one possible implementation, the verification result includes an average accuracy rate of the plurality of second subnetworks.
In one possible implementation, the first optimization module is further configured to: determining a target hyper-parameter in the first set of the last hyper-parameter adjustment period according to the second set of the last hyper-parameter adjustment period; determining a second neural network of a hyper-parameter regulation cycle corresponding to the target hyper-parameter as the third neural network; and according to the second set of the last hyper-parameter adjustment period or the target hyper-parameter, reducing the hyper-parameter set of the current structure adjustment period to obtain the optimized hyper-parameter set.
In one possible implementation, the first optimization module is further configured to: and under the condition that the hyper-parameters are continuous variables, preserving the hyper-parameters in a preset neighborhood taking the target hyper-parameters as the center, and removing the hyper-parameters outside the preset neighborhood.
In one possible implementation, the first optimization module is further configured to: and under the condition that the hyper-parameters are discrete variables, removing first hyper-parameters, wherein the verification result of the second neural network corresponding to the first hyper-parameters does not accord with the accuracy condition.
In one possible implementation manner, the structure adjustment cycle includes a plurality of hyper-parameter adjustment cycles, the current hyper-parameter set is a hyper-parameter set of the current structure adjustment cycle, the optimized hyper-parameter set is a hyper-parameter set of a next hyper-parameter adjustment cycle, the current first neural network is a first neural network of the current structure adjustment cycle, and the first obtaining module is further configured to: determining the first neural network of the next structural regulation cycle according to the verification result of the third neural network, wherein the number of network blocks included in each network layer of the first neural network of the next structural regulation cycle is less than that of the network blocks included in each network layer of the first neural network of the current structural regulation cycle; and obtaining the target neural network according to the network of the first neural network of the next structure regulation period and the hyper-parameter set of the next structure regulation period, wherein the target neural network is the first neural network with the structure meeting the structure condition.
In a possible implementation manner, the verification result of the third neural network includes verification results of a plurality of third sub-networks, the third sub-networks include randomly selected network blocks from a plurality of network blocks of each network hierarchy of the third neural network, and the first obtaining module is further configured to: determining a fourth sub-network which does not meet the preset accuracy requirement in the plurality of third sub-networks according to the verification results of the plurality of third sub-networks; counting the network blocks included in each network level in the preset number of fourth sub-networks, and determining second network blocks in each network level in the preset number of fourth sub-networks, wherein the second network blocks are the network blocks which meet the number requirement in each network level of the preset number of fourth sub-networks; and in the third neural network, removing the second network block to obtain the first neural network of the next structure regulation period.
In a possible implementation manner, the plurality of third subnetworks include a subnetwork obtained by randomly sampling a plurality of network blocks of each network hierarchy of the second neural network corresponding to the third neural network when determining the verification result in the second set, and a subnetwork obtained by randomly sampling a plurality of network blocks of each network hierarchy of the third neural network after determining the third neural network.
In one possible implementation, the apparatus further includes: and the training module is used for training the target neural network through the hyper-parameters randomly selected from the hyper-parameter set of the structure adjusting period corresponding to the target neural network and the training samples with the labeled information to obtain the trained target neural network.
In one possible implementation, the target neural network is configured to perform any one of a text recognition task, an audio processing task, an image processing task, and a video processing task.
Fig. 6 is a block diagram illustrating a combined processing device 1200 according to an embodiment of the present disclosure. As shown in fig. 6, the combined processing device 1200 includes a computing processing device 1202, an interface device 1204, other processing devices 1206, and a storage device 1208. Depending on the application scenario, one or more computing devices 1210 may be included in the computing processing device, and may be configured to perform the operations described herein in conjunction with fig. 2.
In various embodiments, the computing processing device of the present disclosure may be configured to perform user-specified operations. In an exemplary application, the computing processing device may be implemented as a single-core artificial intelligence processor or a multi-core artificial intelligence processor. Similarly, one or more computing devices included within a computing processing device may be implemented as an artificial intelligence processor core or as part of a hardware structure of an artificial intelligence processor core. When multiple computing devices are implemented as artificial intelligence processor cores or as part of a hardware structure of an artificial intelligence processor core, computing processing devices of the present disclosure may be considered to have a single core structure or a homogeneous multi-core structure.
In an exemplary operation, the computing processing device of the present disclosure may interact with other processing devices through an interface device to collectively perform user-specified operations. Depending on the implementation, other processing devices of the present disclosure may include one or more types of general-purpose and/or special-purpose processors such as a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an artificial intelligence processor, and the like. These processors may include, but are not limited to, Digital Signal Processors (DSPs), Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic, discrete hardware components, etc., and their number may be determined based on actual needs. As previously mentioned, the computing processing device of the present disclosure alone may be considered to have a single-core structure or a homogeneous multi-core structure. However, when considered together, the computing processing device and other processing devices may be considered to form a heterogeneous multi-core structure.
In one or more embodiments, the other processing devices can serve as an interface between the computing processing device of the present disclosure (which can be embodied as a computing device associated with artificial intelligence, e.g., neural network operations) and external data and control, performing basic control including, but not limited to, data transfer and turning the computing device on and/or off. In further embodiments, other processing devices may also cooperate with the computing processing device to collectively perform computational tasks.
In one or more embodiments, the interface device may be used to transfer data and control instructions between the computing processing device and other processing devices. For example, the computing processing device may obtain input data from other processing devices via the interface device and write the input data into an on-chip storage device (or memory) of the computing processing device. Further, the computing processing device may obtain control instructions from the other processing devices via the interface device and write them into an on-chip control cache of the computing processing device. Alternatively or optionally, the interface device may also read data from the storage device of the computing processing device and transmit the data to the other processing devices.
Additionally or alternatively, the combined processing device of the present disclosure may further include a storage device. As shown in the figure, the storage means is connected to the computing processing means and the further processing means, respectively. In one or more embodiments, the storage device may be used to store data for the computing processing device and/or the other processing devices. For example, the data may be data that is not fully retained within internal or on-chip storage of a computing processing device or other processing device.
In some embodiments, the present disclosure also discloses an artificial intelligence chip (e.g., the chip 1302 shown in Fig. 7) that includes the neural network structure determining apparatus described above. In one implementation, the chip is a System on Chip (SoC) integrated with one or more combined processing devices. The chip may be connected to other associated components through an external interface device, such as the external interface device 1306 shown in Fig. 7. The associated component may be, for example, a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface. In some application scenarios, other processing units (e.g., video codecs) and/or interface modules (e.g., a DRAM interface) may be integrated on the chip. In some embodiments, the present disclosure also discloses a chip packaging structure including the above chip. In some embodiments, the present disclosure further discloses a board card including the above chip packaging structure. The board card will be described in detail below with reference to Fig. 7.
Fig. 7 is a schematic diagram illustrating a structure of a board 1300 according to an embodiment of the present disclosure. As shown in FIG. 7, the board includes a memory device 1304 for storing data, which includes one or more memory cells 1310. The memory device may be coupled to and communicate data with the control device 1308 and the artificial intelligence chip 1302 as described above, such as via a bus. Further, the board card also includes an external interface device 1306 configured for data relay or transfer function between the chip (or chips in the chip package structure) and an external device 1312 (such as a server or a computer). For example, the data to be processed may be transferred to the chip by an external device through an external interface. For another example, the calculation result of the chip may be transmitted back to an external device via the external interface device. According to different application scenarios, the external interface device may have different interface forms, for example, it may adopt a standard PCIE interface or the like.
Each group of the storage units is connected with the artificial intelligence chip through a bus. It is understood that each set of the memory cells may be DDR SDRAM (Double Data Rate SDRAM).
DDR can double the speed of SDRAM without increasing the clock frequency, because it allows data to be read on both the rising and falling edges of the clock pulse; DDR is thus twice as fast as standard SDRAM. In one embodiment, the storage device may include 4 groups of the storage units. Each group of the storage units may include a plurality of DDR4 chips. In one embodiment, the artificial intelligence chip may include four 72-bit DDR4 controllers, of which 64 bits are used for data transmission and 8 bits for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
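The 25600 MB/s figure follows from simple arithmetic over the 64 data bits per transfer at 3200 MT/s (ECC bits excluded; shown here only as a worked check):

```latex
3200\ \mathrm{MT/s} \times \frac{64\ \mathrm{bit}}{8\ \mathrm{bit/byte}} = 25600\ \mathrm{MB/s}
```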
In one embodiment, each group of the storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice within one clock cycle. A controller for controlling the DDR is provided in the chip and is used for controlling the data transmission and data storage of each storage unit.
The interface device is electrically connected with the artificial intelligence chip. The interface device is used for realizing data transmission between the artificial intelligence chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIe interface. For example, the data to be processed is transmitted to the chip by the server through the standard PCIe interface, realizing the data transfer. Preferably, when a PCIe 3.0 x16 interface is adopted for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface, and the present disclosure does not limit the specific form of the other interface, as long as the interface unit can implement the relay function. In addition, the calculation result of the artificial intelligence chip is transmitted back to the external device (e.g., the server) by the interface device.
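The 16000 MB/s figure is consistent with the usual per-lane arithmetic for PCIe 3.0 (8 GT/s per lane with 128b/130b encoding), rounded; this worked check is an assumption, not part of the disclosure:

```latex
16 \times 8\ \mathrm{GT/s} \times \frac{128}{130} \div 8\ \mathrm{bit/byte} \approx 15754\ \mathrm{MB/s} \approx 16000\ \mathrm{MB/s}
```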
The control device is electrically connected with the artificial intelligence chip. The control device is used for regulating the state of the artificial intelligence chip. Specifically, the artificial intelligence chip and the control device can be electrically connected through an SPI interface. The control device may include a single-chip microcomputer (MCU). Since the artificial intelligence chip can comprise a plurality of processing chips, a plurality of processing cores, or a plurality of processing circuits, it can drive a plurality of loads. Therefore, the artificial intelligence chip can be in different working states such as multi-load and light load. The control device can regulate the working states of the plurality of processing chips, the plurality of processing cores, and/or the plurality of processing circuits in the artificial intelligence chip.
From the above description in conjunction with Fig. 6 and Fig. 7, those skilled in the art will understand that the present disclosure also discloses an electronic device or apparatus, which may include one or more of the above board cards, one or more of the above chips, and/or one or more of the above combined processing devices.
According to different application scenarios, the electronic device or apparatus of the present disclosure may include a server, a cloud server, a server cluster, a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a PC device, an internet of things terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a visual terminal, an automatic driving terminal, a vehicle, a household appliance, and/or a medical device. The vehicle comprises an airplane, a ship and/or a vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance instrument, a B ultrasonic instrument and/or an electrocardiograph.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 8 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 8, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may further include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
It is noted that for the sake of brevity, this disclosure presents some methods and embodiments thereof as a series and combination of acts, but those skilled in the art will appreciate that the disclosed aspects are not limited by the order of acts described. Accordingly, one of ordinary skill in the art will appreciate that certain steps may be performed in other sequences or simultaneously, in accordance with the disclosure or teachings of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in this disclosure are capable of being practiced in other than the specifically disclosed embodiments, and that the acts or modules illustrated herein are not necessarily required to practice one or more aspects of the disclosure. In addition, the present disclosure may focus on the description of some embodiments, depending on the solution. In view of the above, those skilled in the art will understand that portions of the disclosure that are not described in detail in one embodiment may also be referred to in the related description of other embodiments.
In particular implementation, based on the disclosure and teachings of the present disclosure, one of ordinary skill in the art will appreciate that the several embodiments disclosed in the present disclosure may be implemented in other ways not disclosed herein. For example, as for each unit in the foregoing embodiments of the electronic device or apparatus, the units are divided based on the logic function, and there may be another division manner in the actual implementation. Also for example, multiple units or components may be combined or integrated with another system or some features or functions in a unit or component may be selectively disabled. The connections discussed above in connection with the figures may be direct or indirect couplings between the units or components in terms of connectivity between the different units or components.
In the present disclosure, units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units. The aforementioned components or units may be co-located or distributed across multiple network elements. In addition, according to actual needs, part or all of the units can be selected to achieve the purpose of the scheme of the embodiment of the disclosure. In addition, in some scenarios, multiple units in embodiments of the present disclosure may be integrated into one unit or each unit may exist physically separately.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments. All possible combinations of the technical features in the above embodiments may not be described for the sake of brevity, but should be construed as being within the scope of the present disclosure as long as there is no contradiction between the combinations of the technical features.
The electronic device or apparatus of the present disclosure may also be applied to the fields of the internet, the internet of things, data centers, energy, transportation, public management, manufacturing, education, power grid, telecommunications, finance, retail, construction site, medical, and the like. Further, the electronic device or apparatus of the present disclosure may also be used in application scenarios related to artificial intelligence, big data, and/or cloud computing, such as a cloud, an edge, and a terminal. In one or more embodiments, an electronic device or apparatus with high computing power according to the present disclosure may be applied to a cloud device (e.g., a cloud server), and an electronic device or apparatus with low power consumption may be applied to a terminal device and/or an edge device (e.g., a smartphone or a camera). In one or more embodiments, the hardware information of the cloud device and the hardware information of the terminal device and/or the edge device are compatible with each other, so that appropriate hardware resources can be matched from the hardware resources of the cloud device according to the hardware information of the terminal device and/or the edge device to simulate the hardware resources of the terminal device and/or the edge device, so as to complete unified management, scheduling and cooperative work of end-cloud integration or cloud-edge-end integration.
The foregoing may be better understood in light of the following clauses:
for example, clause A1, a neural network structure determining method, comprising: training a first sub-network of a current first neural network according to a hyper-parameter of a first hyper-parameter regulation cycle to obtain a second neural network of the first hyper-parameter regulation cycle, wherein the first neural network comprises a plurality of network levels, each network level comprises a plurality of network blocks, the first sub-network comprises randomly selected network blocks from the plurality of network blocks of each network level of the first neural network, and the hyper-parameter of the first hyper-parameter regulation cycle is obtained by selection in a current hyper-parameter set; performing hyper-parameter regulation processing according to a first set of a current hyper-parameter regulation cycle and a second set of the current hyper-parameter regulation cycle, determining a hyper-parameter of a next hyper-parameter regulation cycle from the current hyper-parameter set, and determining a second neural network of the next hyper-parameter regulation cycle, wherein the first set of the current hyper-parameter regulation cycle comprises a hyper-parameter of the current hyper-parameter regulation cycle and a hyper-parameter of a historical hyper-parameter regulation cycle, and the second set of the current hyper-parameter regulation cycle comprises a verification result of the second neural network of the current hyper-parameter regulation cycle and a verification result of the second neural network of the historical hyper-parameter regulation cycle; determining a third neural network and an optimized hyper-parameter set according to the first set and the second set of the last hyper-parameter regulation period and the second neural network of a plurality of hyper-parameter regulation periods, wherein the number of hyper-parameters in the optimized hyper-parameter set is smaller than that in the current hyper-parameter set; and obtaining a target neural network according to the third neural network and the optimized hyper-parameter set.
Clause A2, the method of clause A1, further comprising: randomly sampling network blocks of each network level of the second neural network respectively, and determining a first network block from each network level; determining a second sub-network according to the first network block of each network hierarchy; processing the training samples with the labeling information through the second sub-network, and determining a first processing result of the second sub-network; determining the accuracy of the second sub-network according to the first processing result and the labeling information; and determining the verification result of the second neural network according to the accuracy rates of the plurality of second sub-networks.
Clause A3, the method of clause A2, wherein the validation result comprises an average accuracy rate for the plurality of second subnetworks.
Clause A4, the method of clause A1, wherein determining a third neural network and an optimized hyper-parameter set according to the first set, the second set, and the second neural networks of the plurality of hyper-parameter adjustment cycles includes: determining a target hyper-parameter in the first set of the last hyper-parameter adjustment period according to the second set of the last hyper-parameter adjustment period; determining a second neural network of a hyper-parameter regulation cycle corresponding to the target hyper-parameter as the third neural network; and according to the second set of the last hyper-parameter adjusting period or the target hyper-parameter, reducing the hyper-parameter set of the current structure adjusting period to obtain the optimized hyper-parameter set.
Clause A5, according to the method of clause A4, narrowing the hyper-parameter set of the current structure adjustment cycle according to the second set of the last hyper-parameter adjustment cycle or the target hyper-parameter, to obtain the optimized hyper-parameter set, including: and under the condition that the hyper-parameters are continuous variables, preserving the hyper-parameters in a preset neighborhood taking the target hyper-parameters as the center, and removing the hyper-parameters outside the preset neighborhood.
Clause A6, according to the method of clause A4, narrowing the hyper-parameter set of the current structure adjustment cycle according to the second set of the last hyper-parameter adjustment cycle or the target hyper-parameter, to obtain the optimized hyper-parameter set, including: and under the condition that the hyper-parameters are discrete variables, removing first hyper-parameters, wherein the verification result of the second neural network corresponding to the first hyper-parameters does not meet the accuracy condition.
Clause A7, according to the method described in clause A1, the structural regulation cycle includes multiple hyper-parameter regulation cycles, the current hyper-parameter set is a hyper-parameter set of the current structural regulation cycle, the optimized hyper-parameter set is a hyper-parameter set of the next hyper-parameter regulation cycle, the current first neural network is a first neural network of the current structural regulation cycle, and the target neural network is obtained according to the third neural network and the optimized hyper-parameter set, including: determining the first neural network of the next structural regulation cycle according to the verification result of the third neural network, wherein the number of network blocks included in each network layer of the first neural network of the next structural regulation cycle is less than that of the network blocks included in each network layer of the first neural network of the current structural regulation cycle; and obtaining the target neural network according to the network of the first neural network of the next structure regulation period and the hyper-parameter set of the next structure regulation period, wherein the target neural network is the first neural network with the structure meeting the structure condition.
Clause A8, the method according to clause A7, wherein the validation results of the third neural network include validation results of a plurality of third sub-networks, the third sub-networks including randomly selected network blocks from a plurality of network blocks of each network hierarchy of the third neural network, wherein determining the first neural network for the next structural regulation cycle according to the validation results of the third neural network includes: determining a fourth sub-network which does not meet the preset accuracy requirement in the plurality of third sub-networks according to the verification results of the plurality of third sub-networks; counting the network blocks included in each network level in the preset number of fourth sub-networks, and determining second network blocks in each network level in the preset number of fourth sub-networks, wherein the second network blocks are the network blocks which meet the number requirement in each network level of the preset number of fourth sub-networks; and in the third neural network, removing the second network block to obtain the first neural network of the next structure regulation period.
Clause A9, the method of clause A8, wherein the plurality of third subnetworks include a subnetwork obtained by randomly sampling a plurality of network blocks of each network level of the second neural network corresponding to the third neural network when determining the verification result in the second set, and a subnetwork obtained by randomly sampling a plurality of network blocks of each network level of the third neural network after determining the third neural network.
Clause a10, the method of clause A1, further comprising: and training the target neural network through the randomly selected hyper-parameters in the hyper-parameter set of the structure adjusting period corresponding to the target neural network and the training sample with the labeled information to obtain the trained target neural network.
Clause a11, according to the method of clause A1, the target neural network is configured to perform any one of a text recognition task, an audio processing task, an image processing task, and a video processing task.
Clause a12, a neural network structure determining apparatus, comprising: the first training module is used for training a first sub-network of a current first neural network according to a hyper-parameter of a first hyper-parameter regulation cycle to obtain a second neural network of the first hyper-parameter regulation cycle, wherein the first neural network comprises a plurality of network levels, each network level comprises a plurality of network blocks, the first sub-network comprises randomly selected network blocks in the plurality of network blocks of each network level of the first neural network, and the hyper-parameter of the first hyper-parameter regulation cycle is obtained by selection in a current hyper-parameter set; the hyper-parameter adjusting module is used for carrying out hyper-parameter adjusting processing according to a first set of a current hyper-parameter adjusting period and a second set of the current hyper-parameter adjusting period, determining a hyper-parameter of a next hyper-parameter adjusting period from the current hyper-parameter set, and determining a second neural network of the next hyper-parameter adjusting period, wherein the first set of the current hyper-parameter adjusting period comprises the hyper-parameter of the current hyper-parameter adjusting period and the hyper-parameter of a historical hyper-parameter adjusting period, and the second set of the current hyper-parameter adjusting period comprises a verification result of the second neural network of the current hyper-parameter adjusting period and a verification result of the second neural network of the historical hyper-parameter adjusting period; the first optimization module is used for determining a third neural network and an optimized hyper-parameter set according to the first set and the second set of the last hyper-parameter regulation period and the second neural network of a plurality of hyper-parameter regulation periods, wherein the number of hyper-parameters in the optimized hyper-parameter set is smaller than that in the current hyper-parameter set; and the first obtaining module is used for obtaining a target neural network according to the third neural network and the optimized hyper-parameter set.
Clause A13, the apparatus of clause A12, further comprising a verification result determining module configured to: randomly sampling network blocks of each network level of the second neural network respectively, and determining a first network block from each network level; determining a second sub-network according to the first network block of each network hierarchy; processing the training samples with the labeling information through the second sub-network, and determining a first processing result of the second sub-network; determining the accuracy of the second sub-network according to the first processing result and the labeling information; and determining the verification result of the second neural network according to the accuracy rates of the plurality of second sub-networks.

Clause A14, the apparatus of clause A13, wherein the verification result comprises an average accuracy rate of the plurality of second sub-networks.

Clause A15, the apparatus of clause A12, wherein the first optimization module is further configured to: determining a target hyper-parameter in the first set of the last hyper-parameter adjustment period according to the second set of the last hyper-parameter adjustment period; determining a second neural network of a hyper-parameter regulation cycle corresponding to the target hyper-parameter as the third neural network; and according to the second set of the last hyper-parameter adjusting period or the target hyper-parameter, reducing the hyper-parameter set of the current structure adjusting period to obtain the optimized hyper-parameter set.

Clause A16, the apparatus of clause A15, wherein the first optimization module is further configured to: under the condition that the hyper-parameters are continuous variables, keeping the hyper-parameters in a preset neighborhood taking the target hyper-parameter as the center, and removing the hyper-parameters outside the preset neighborhood.

Clause A17, the apparatus of clause A15, wherein the first optimization module is further configured to: under the condition that the hyper-parameters are discrete variables, removing first hyper-parameters, wherein the verification result of the second neural network corresponding to the first hyper-parameters does not meet the accuracy condition.

Clause A18, the apparatus of clause A12, wherein the structure adjustment cycle includes a plurality of hyper-parameter adjustment cycles, the current hyper-parameter set is a hyper-parameter set of the current structure adjustment cycle, the optimized hyper-parameter set is a hyper-parameter set of a next hyper-parameter adjustment cycle, the current first neural network is a first neural network of the current structure adjustment cycle, and the first obtaining module is further configured to: determining the first neural network of the next structure adjustment cycle according to the verification result of the third neural network, wherein the number of network blocks included in each network hierarchy of the first neural network of the next structure adjustment cycle is less than the number of network blocks included in each network hierarchy of the first neural network of the current structure adjustment cycle; and obtaining the target neural network according to the first neural network of the next structure adjustment cycle and the hyper-parameter set of the next structure adjustment cycle, wherein the target neural network is the first neural network whose structure meets the structure condition.

Clause A19, the apparatus of clause A18, wherein the verification results of the third neural network include verification results of a plurality of third sub-networks, the third sub-networks including randomly selected network blocks from a plurality of network blocks of each network level of the third neural network, and the first obtaining module is further configured to: determining, according to the verification results of the plurality of third sub-networks, fourth sub-networks which do not meet the preset accuracy requirement among the plurality of third sub-networks; counting the network blocks included in each network level in the preset number of fourth sub-networks, and determining second network blocks in each network level in the preset number of fourth sub-networks, wherein the second network blocks are the network blocks which meet the number requirement in each network level of the preset number of fourth sub-networks; and removing the second network blocks from the third neural network to obtain the first neural network of the next structure adjustment cycle.

Clause A20, the apparatus of clause A19, wherein the plurality of third sub-networks include a sub-network obtained by randomly sampling a plurality of network blocks of each network hierarchy of the second neural network corresponding to the third neural network when determining the verification result in the second set, and a sub-network obtained by randomly sampling a plurality of network blocks of each network hierarchy of the third neural network after determining the third neural network.

Clause A21, the apparatus of clause A12, further comprising: a training module, configured to train the target neural network through the hyper-parameters randomly selected from the hyper-parameter set of the structure adjustment period corresponding to the target neural network and the training samples with the labeled information, to obtain the trained target neural network.

Clause A22, the apparatus of clause A12, wherein the target neural network is configured to perform any one of a text recognition task, an audio processing task, an image processing task, and a video processing task.
While various embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous modifications, changes, and substitutions will occur to those skilled in the art without departing from the spirit and scope of the present disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the disclosure. It is intended that the following claims define the scope of the disclosure and that equivalents or alternatives within the scope of these claims be covered thereby.

Claims (14)

1. A neural network structure determining method, comprising:
training a first sub-network of a current first neural network according to a hyper-parameter of a first hyper-parameter adjustment cycle to obtain a second neural network of the first hyper-parameter adjustment cycle, wherein the first neural network comprises a plurality of network levels, each network level comprises a plurality of network blocks, the first sub-network comprises network blocks randomly selected from the plurality of network blocks of each network level of the first neural network, and the hyper-parameter of the first hyper-parameter adjustment cycle is selected from a current hyper-parameter set;
performing hyper-parameter adjustment processing according to a first set of a current hyper-parameter adjustment cycle and a second set of the current hyper-parameter adjustment cycle, determining a hyper-parameter of a next hyper-parameter adjustment cycle from the current hyper-parameter set, and determining a second neural network of the next hyper-parameter adjustment cycle, wherein the first set of the current hyper-parameter adjustment cycle comprises the hyper-parameter of the current hyper-parameter adjustment cycle and the hyper-parameters of historical hyper-parameter adjustment cycles, and the second set of the current hyper-parameter adjustment cycle comprises the verification result of the second neural network of the current hyper-parameter adjustment cycle and the verification results of the second neural networks of the historical hyper-parameter adjustment cycles;
determining a third neural network and an optimized hyper-parameter set according to the first set and the second set of the last hyper-parameter adjustment cycle and the second neural networks of a plurality of hyper-parameter adjustment cycles, wherein the number of hyper-parameters in the optimized hyper-parameter set is smaller than the number of hyper-parameters in the current hyper-parameter set;
and obtaining a target neural network according to the third neural network and the optimized hyper-parameter set.
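For readability, the following is a minimal Python sketch of the loop recited in claim 1. It is illustrative only: the names `sample_sub_network`, `train`, and `validate` are hypothetical callables supplied by the caller, and random selection stands in for the hyper-parameter adjustment processing, whose concrete strategy the claim does not fix.

```python
import random

def hyper_parameter_adjustment(first_network, hyper_param_set, num_cycles,
                               sample_sub_network, train, validate):
    # first_set: hyper-parameters of the current and historical cycles.
    # second_set: verification results of the current and historical cycles.
    first_set, second_set, second_networks = [], [], []
    hp = random.choice(hyper_param_set)  # hyper-parameter of the first cycle
    for _ in range(num_cycles):
        # Train a first sub-network (randomly selected blocks, one per
        # network level) to obtain the second neural network of this cycle.
        sub_network = sample_sub_network(first_network)
        second_network = train(sub_network, hp)
        first_set.append(hp)
        second_set.append(validate(second_network))
        second_networks.append(second_network)
        # Hyper-parameter adjustment processing: pick the next cycle's
        # hyper-parameter from the current set (random stand-in here).
        hp = random.choice(hyper_param_set)
    # Third neural network: the second neural network whose hyper-parameter
    # achieved the best verification result in the last cycle's history.
    best = max(range(num_cycles), key=lambda i: second_set[i])
    return second_networks[best], first_set[best], first_set, second_set
```

The reduction of the current hyper-parameter set into the optimized hyper-parameter set is sketched separately after claims 5 and 6.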
2. The method of claim 1, further comprising:
randomly sampling the network blocks of each network level of the second neural network, and determining a first network block for each network level;
determining a second sub-network according to the first network blocks of the network levels;
processing training samples with labeling information through the second sub-network, and determining a first processing result of the second sub-network;
determining the accuracy of the second sub-network according to the first processing result and the labeling information;
and determining the verification result of the second neural network according to the accuracies of a plurality of second sub-networks.
3. The method of claim 2, wherein the verification result comprises an average accuracy of the plurality of second sub-networks.
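A minimal sketch of the verification procedure of claims 2 and 3, assuming a network is represented as a list of network levels, each level being a list of callable network blocks, and that each training sample is a pair (input, label) whose label can be compared directly with the sub-network output. All names are illustrative.

```python
import random

def sample_sub_network(network_levels):
    """Pick one first network block at random from each network level
    and compose them into a second sub-network (claim 2)."""
    blocks = [random.choice(level) for level in network_levels]
    def sub_network(x):
        for block in blocks:
            x = block(x)
        return x
    return sub_network

def verification_result(network_levels, samples, num_sub_networks=8):
    """Average accuracy over several randomly sampled sub-networks (claim 3)."""
    accuracies = []
    for _ in range(num_sub_networks):
        sub = sample_sub_network(network_levels)
        correct = sum(1 for x, label in samples if sub(x) == label)
        accuracies.append(correct / len(samples))
    return sum(accuracies) / len(accuracies)
```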
4. The method of claim 1, wherein determining the third neural network and the optimized hyper-parameter set according to the first set and the second set of the last hyper-parameter adjustment cycle and the second neural networks of the plurality of hyper-parameter adjustment cycles comprises:
determining a target hyper-parameter in the first set of the last hyper-parameter adjustment cycle according to the second set of the last hyper-parameter adjustment cycle;
determining the second neural network of the hyper-parameter adjustment cycle corresponding to the target hyper-parameter as the third neural network;
and reducing the hyper-parameter set of the current structure adjustment cycle according to the second set of the last hyper-parameter adjustment cycle or the target hyper-parameter, to obtain the optimized hyper-parameter set.
5. The method of claim 4, wherein reducing the hyper-parameter set of the current structure adjustment cycle according to the second set of the last hyper-parameter adjustment cycle or the target hyper-parameter to obtain the optimized hyper-parameter set comprises:
and in a case where the hyper-parameter is a continuous variable, retaining the hyper-parameters within a preset neighborhood centered on the target hyper-parameter, and removing the hyper-parameters outside the preset neighborhood.
6. The method of claim 4, wherein reducing the hyper-parameter set of the current structure adjustment cycle according to the second set of the last hyper-parameter adjustment cycle or the target hyper-parameter to obtain the optimized hyper-parameter set comprises:
and in a case where the hyper-parameter is a discrete variable, removing a first hyper-parameter, wherein the verification result of the second neural network corresponding to the first hyper-parameter does not meet the accuracy condition.
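A sketch of the set reduction of claims 5 and 6. The neighborhood radius and the accuracy threshold are illustrative parameters not fixed by the claims, and the second set is assumed here to be a mapping from each tried hyper-parameter to its verification result.

```python
def reduce_hyper_param_set(hyper_param_set, target_hp, second_set,
                           radius=0.1, accuracy_threshold=0.9):
    if isinstance(target_hp, float):
        # Continuous variable (claim 5): retain hyper-parameters inside a
        # preset neighborhood centered on the target, remove the rest.
        return [hp for hp in hyper_param_set if abs(hp - target_hp) <= radius]
    # Discrete variable (claim 6): remove first hyper-parameters whose
    # corresponding second neural network failed the accuracy condition.
    return [hp for hp in hyper_param_set
            if second_set.get(hp, 0.0) >= accuracy_threshold]
```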
7. The method of claim 1, wherein the structure adjustment cycle comprises a plurality of hyper-parameter adjustment cycles, the current hyper-parameter set is the hyper-parameter set of the current structure adjustment cycle, the optimized hyper-parameter set is the hyper-parameter set of the next structure adjustment cycle, and the current first neural network is the first neural network of the current structure adjustment cycle,
and wherein obtaining the target neural network according to the third neural network and the optimized hyper-parameter set comprises:
determining the first neural network of the next structure adjustment cycle according to the verification result of the third neural network, wherein each network level of the first neural network of the next structure adjustment cycle includes fewer network blocks than the corresponding network level of the first neural network of the current structure adjustment cycle;
and obtaining the target neural network according to the first neural network of the next structure adjustment cycle and the hyper-parameter set of the next structure adjustment cycle, wherein the target neural network is the first neural network whose structure meets the structure condition.
8. The method of claim 7, wherein the verification result of the third neural network comprises verification results of a plurality of third sub-networks, each third sub-network comprising network blocks randomly selected from the plurality of network blocks of each network level of the third neural network,
and wherein determining the first neural network of the next structure adjustment cycle according to the verification result of the third neural network comprises:
determining, according to the verification results of the plurality of third sub-networks, fourth sub-networks that do not meet a preset accuracy requirement among the plurality of third sub-networks;
counting the network blocks included in each network level of a preset number of the fourth sub-networks, and determining a second network block in each network level, wherein the second network block is a network block whose occurrence count in the preset number of fourth sub-networks meets a quantity requirement;
and removing the second network block from the third neural network to obtain the first neural network of the next structure adjustment cycle.
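The pruning step of claim 8 can be sketched as follows, assuming each third sub-network is recorded as a list holding one block index per network level; the preset number, the accuracy requirement, and the quantity requirement are illustrative values.

```python
from collections import Counter

def prune_third_network(third_network, third_sub_networks, verification_results,
                        accuracy_requirement=0.9, preset_number=16,
                        quantity_requirement=4):
    # Fourth sub-networks: third sub-networks failing the accuracy requirement.
    fourth = [sub for sub, acc in zip(third_sub_networks, verification_results)
              if acc < accuracy_requirement][:preset_number]
    next_first_network = []
    for level_idx, level in enumerate(third_network):
        # Count how often each block of this level appears in the fourth
        # sub-networks; blocks meeting the quantity requirement are the
        # second network blocks to remove.
        counts = Counter(sub[level_idx] for sub in fourth)
        removed = {b for b, c in counts.items() if c >= quantity_requirement}
        kept = [blk for i, blk in enumerate(level) if i not in removed]
        next_first_network.append(kept or level[:1])  # keep at least one block
    return next_first_network
```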
9. The method of claim 8, wherein the plurality of third sub-networks comprise sub-networks obtained by randomly sampling the plurality of network blocks of each network level of the second neural network corresponding to the third neural network when the verification results in the second set were determined, and sub-networks obtained by randomly sampling the plurality of network blocks of each network level of the third neural network after the third neural network is determined.
10. The method of claim 1, further comprising:
and training the target neural network using a hyper-parameter randomly selected from the hyper-parameter set of the structure adjustment cycle corresponding to the target neural network and training samples with labeling information, to obtain the trained target neural network.
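Claim 10's final training step admits a very short sketch; `train` is an assumed callable, as in the earlier sketches, and the hyper-parameter is drawn uniformly at random since the claim specifies random selection.

```python
import random

def train_target_network(target_network, hyper_param_set, samples, train):
    # Randomly select a hyper-parameter from the hyper-parameter set of the
    # structure adjustment cycle corresponding to the target neural network,
    # then train on samples carrying labeling information.
    hp = random.choice(hyper_param_set)
    return train(target_network, hp, samples)
```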
11. The method of claim 1, wherein the target neural network is configured to perform any one of a text recognition task, an audio processing task, an image processing task, and a video processing task.
12. A neural network structure determining apparatus, comprising:
a first training module configured to train a first sub-network of a current first neural network according to a hyper-parameter of a first hyper-parameter adjustment cycle to obtain a second neural network of the first hyper-parameter adjustment cycle, wherein the first neural network comprises a plurality of network levels, each network level comprises a plurality of network blocks, the first sub-network comprises network blocks randomly selected from the plurality of network blocks of each network level of the first neural network, and the hyper-parameter of the first hyper-parameter adjustment cycle is selected from a current hyper-parameter set;
a hyper-parameter adjustment module configured to perform hyper-parameter adjustment processing according to a first set of a current hyper-parameter adjustment cycle and a second set of the current hyper-parameter adjustment cycle, determine a hyper-parameter of a next hyper-parameter adjustment cycle from the current hyper-parameter set, and determine a second neural network of the next hyper-parameter adjustment cycle, wherein the first set of the current hyper-parameter adjustment cycle comprises the hyper-parameter of the current hyper-parameter adjustment cycle and the hyper-parameters of historical hyper-parameter adjustment cycles, and the second set of the current hyper-parameter adjustment cycle comprises the verification result of the second neural network of the current hyper-parameter adjustment cycle and the verification results of the second neural networks of the historical hyper-parameter adjustment cycles;
a first optimization module configured to determine a third neural network and an optimized hyper-parameter set according to the first set and the second set of the last hyper-parameter adjustment cycle and the second neural networks of a plurality of hyper-parameter adjustment cycles, wherein the number of hyper-parameters in the optimized hyper-parameter set is smaller than the number of hyper-parameters in the current hyper-parameter set;
and a first obtaining module configured to obtain a target neural network according to the third neural network and the optimized hyper-parameter set.
13. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any one of claims 1 to 11.
14. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 11.
CN202110509864.6A 2021-05-11 2021-05-11 Neural network structure determination method and device and related products Pending CN115329924A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110509864.6A CN115329924A (en) 2021-05-11 2021-05-11 Neural network structure determination method and device and related products


Publications (1)

Publication Number Publication Date
CN115329924A (en) 2022-11-11

Family

ID=83912535


Country Status (1)

Country Link
CN (1) CN115329924A (en)

Similar Documents

Publication Publication Date Title
US20200279187A1 (en) Model and infrastructure hyper-parameter tuning system and method
WO2021185262A1 (en) Computing apparatus and method, board card, and computer readable storage medium
WO2021114904A1 (en) Data processing method and apparatus, computer device and storage medium
CN109740746B (en) Operation method, device and related product
CN112308201A (en) Neural network quantization method, device, chip, electronic equipment and board card
CN115329924A (en) Neural network structure determination method and device and related products
CN113033813A (en) Data processing method, data processing device, computer equipment and storage medium
CN115329925A (en) Neural network structure determination method and device and related products
CN111523653A (en) Arithmetic device and method
JP7060719B2 (en) Methods, equipment, and related products for processing data
CN115549854A (en) Cyclic redundancy check method, cyclic redundancy check device, storage medium and electronic device
CN116797464A (en) Computing method, computing device, computer apparatus, and storage medium
CN110020720B (en) Operator splicing method and device
CN111381875B (en) Data comparator, data processing method, chip and electronic equipment
CN111967588A (en) Quantitative operation method and related product
WO2021169914A1 (en) Data quantification processing method and apparatus, electronic device and storage medium
CN113112009A (en) Method, apparatus and computer-readable storage medium for neural network data quantization
WO2021017546A1 (en) Neural network quantization method and apparatus, chip, electronic device and board card
CN111738428B (en) Computing device, method and related product
JP7072680B2 (en) Methods, equipment, and related products for processing data
CN113111997B (en) Method, apparatus and related products for neural network data quantization
CN113112008B (en) Method, apparatus and computer readable storage medium for neural network data quantization
CN112232498B (en) Data processing device, integrated circuit chip, electronic equipment, board card and method
CN113469365B (en) Reasoning and compiling method based on neural network model and related products thereof
CN111275197B (en) Operation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination