CN113902094A - Structure searching method of double-unit searching space facing language model - Google Patents

Structure searching method of double-unit searching space facing language model

Info

Publication number
CN113902094A
Authority
CN
China
Prior art keywords
unit
search
language model
search space
searching
Prior art date
Legal status
Pending
Application number
CN202111084940.XA
Other languages
Chinese (zh)
Inventor
余正涛 (Yu Zhengtao)
苗育华 (Miao Yuhua)
Current Assignee
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN202111084940.XA
Publication of CN113902094A
Status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a structure search method for a language-model-oriented dual-unit search space, in the field of artificial intelligence. The invention improves the search space of existing search strategies on the language model task and constructs a search space better suited to language modeling. An information storage unit is added inside the recurrent neural network unit to effectively store the front-end information of the sequence, so that the search space better matches the language model task; the added unit alleviates the inability of conventional recurrent unit structures to handle long-sequence dependence and improves the continuity of sequence semantic information. At the same time, adding the unit directly enlarges the search space and raises the probability of finding a better network structure.

Description

Structure searching method of double-unit searching space facing language model
Technical Field
The invention relates to a structure searching method of a language model-oriented double-unit searching space, belonging to the technical field of artificial intelligence.
Background
The design of the search space is the first, and an extremely important, step in neural architecture search research, since the search space determines the upper and lower bounds of model performance. However, the tension between the size of the search space on one hand and search speed and hardware requirements on the other makes its design difficult. On one hand, a huge search space has great potential for exploring networks but demands extremely high hardware support and time consumption; on the other hand, a smaller search space is friendlier to hardware and time but is very limited in its ability to mine network potential. Therefore, how to define a suitable search space that achieves the best search effect has become a problem to be solved in current structure search research.
Research on neural architecture search is still at an early stage, but experts in the field have proposed many excellent structure search methods and achieved good results. DARTS, currently the most popular neural architecture search method, constructs a minimal unit for recurrent structures: a directed acyclic graph is arranged inside the unit, the structure inside the unit is learned by gradient-based optimization, and the learned unit is connected recurrently to form the final model. A model based on such a recurrent unit can handle a certain amount of short-term sequence dependence, but when the sequence is long, the gradient from the far end of the sequence is difficult to propagate back to the current position, so the gradient vanishes and the semantic information of the sequence is interrupted. To address this problem, the invention studies the search space of structure search on the language model task and proposes a structure search method based on a dual-unit expanded search space.
Disclosure of Invention
The invention provides a structure search method for a language-model-oriented dual-unit search space, which is used to solve the problem that, when a sequence is long, the gradient at the far end of the sequence is difficult to back-propagate to the current position, so that the gradient vanishes and the semantic information of the sequence is interrupted.
The technical scheme of the invention is as follows: the structure search method of the language-model-oriented dual-unit search space comprises the following steps: first, construct the dual-unit search space;
secondly, search on the PTB data set, and select the structures with the smallest validation-set loss during the search as candidate unit structures;
and finally, enter an evaluation stage, and evaluate the candidate unit structures obtained in the search stage for a short time on the language model task to obtain the optimal unit structure.
As a further aspect of the present invention, the structure searching method based on the dual-unit search space includes the following specific implementation steps:
Step1, propose a dual-unit search space for the language model task, set up the search unit, and form the final recurrent neural network through the connection of units, thereby constructing the search space;
Step2, carry out the whole search stage on the PTB data set with the input parameters set, and train continuously for 50 epochs in total to obtain several different initial candidate cell structures; select the structures with the smallest validation-set loss during the search as candidate unit structures;
and Step3, evaluate the candidate unit structures obtained in the search stage for a short time on the language model task to obtain the optimal unit structure.
As a further aspect of the present invention, the dual-unit search space proposed in Step1 continues the overall framework of DARTS for the whole search space, i.e. a single unit is searched and the final recurrent neural network is then built by connecting units; unlike DARTS, two sub-units are arranged inside each unit: an information storage unit cell_ct and an information processing unit cell_ht; each sub-unit is a directed acyclic graph containing a plurality of nodes; the input of the information storage unit consists of the inputs at several moments before the current position of the sequence, so that the front-end information of the sequence can be stored effectively.
As a further aspect of the invention, the experimental parameters of the Step2 search phase mostly follow the settings in DARTS, with the following differences: the number of layers of the recurrent neural network is fixed at one, the word embedding size and the hidden layer size are both 300, and the batch size is 256; an information storage unit cell_ct and an information processing unit cell_ht are arranged inside each unit, the information storage unit contains 3 nodes and the information processing unit contains 8 nodes.
As a further scheme of the invention, the edges between nodes use the following four operation functions: tanh, relu, sigmoid and identity.
As a further scheme of the invention, in the Step2 search stage, different algorithms are used for the two alternating optimization steps: the network weights w are optimized with the stochastic gradient descent (SGD) algorithm with a learning rate of 20 and a weight decay of 5e-7; the structure weights alpha are optimized with the Adam algorithm with an initial learning rate of 3e-3 and a weight decay of 1e-3.
As a further scheme of the invention, for the parameter settings of the Step3 evaluation phase, the word embedding size and the hidden layer size of the model are expanded to 850, the batch size is 64, and the weights are optimized with the averaged stochastic gradient descent (ASGD) algorithm with an initial learning rate of 20 and a weight decay of 8e-7.
As a further scheme of the invention, in Step3 the candidate unit structures obtained in the search stage are evaluated for a short time to obtain the optimal unit structure; after the optimal unit structure is obtained, its network weights are randomly re-initialized and it is trained on the training set for a longer time until it converges.
To verify the transferability of the unit structures searched by the present invention, the optimal unit structure searched on the PTB data set is directly transferred to the WT2 data set for evaluation.
The invention has the beneficial effects that:
1. The structure search method of the language-model-oriented dual-unit search space provided by the invention improves the search space of existing search strategies on the language model task and constructs a search space better suited to language modeling. An information storage unit is added inside the recurrent neural network unit to effectively store the front-end information of the sequence, so that the search space better matches the language model task; the added unit alleviates the inability of conventional recurrent unit structures to handle long-sequence dependence and improves the continuity of sequence semantic information. At the same time, adding the unit directly enlarges the search space and raises the probability of finding a better network structure.
2. The invention improves the search unit framework of DARTS and proposes a dual-unit framework. Two sub-units are arranged inside each unit, and each sub-unit is a directed acyclic graph containing a plurality of nodes. Adding the information storage unit also directly enlarges the search space and improves the probability of finding an excellent network structure. Experiments on the Penn Treebank (PTB, vocabulary 10,000) and Wikitext-2 (WT2, vocabulary 33,000) data sets show that on the PTB data set the perplexity is reduced by 0.4 points relative to the baseline method, achieving a better result. The transferability of the invention was also verified on the WT2 data set, where the perplexity on the test set is reduced by 0.2 points compared with the baseline method.
Drawings
FIG. 1 is a model diagram of the structure search method of the language-model-oriented dual-unit search space according to the present invention;
FIG. 2 is a schematic illustration of the perplexity of the five candidate structures of the present invention;
FIG. 3 is a schematic diagram of the information storage unit structure searched on the PTB data set that corresponds to the above perplexity;
FIG. 4 is a schematic diagram of the information processing unit structure searched on the PTB data set that corresponds to the above perplexity;
FIG. 5 is a graph comparing the perplexity performance of the present invention with the DARTS method.
Detailed Description
Example 1: as shown in fig. 1 to 5, the structure search method of the language-model-oriented dual-unit search space includes: first, construct a dual-unit search space; secondly, search on the PTB data set and select the structures with the smallest validation-set loss during the search as candidate unit structures; finally, enter the evaluation stage and evaluate the candidate unit structures obtained in the search stage for a short time on the language model task to obtain the optimal unit structure.
The structure searching method based on the double-unit searching space comprises the following specific implementation steps:
Step1, propose a dual-unit search space for the language model task, set up the search unit, and form the final recurrent neural network through the connection of units, thereby constructing the search space;
the two-cell search space proposed in Step1 is a large frame continuation of the entire search spaceUnlike DARTS, in which a unit is searched and then connected to form a final recurrent neural network, two sub-units are provided in the unit at each time in the recurrent neural network: information storage unit cellctAnd an information processing unit cellhtAs shown in fig. 1, each unit is a directed acyclic graph including a plurality of nodes; the input of the information storage unit is the input x at the first five moments of the sequencet-1,xt-2,xt-3,xt-4,xt-5So as to effectively store the front-end information of the sequence. And performing linear transformation and addition on the five inputs, and then obtaining the input of the first node in the cellct unit through an activation function tanh, wherein the output of the unit is obtained by adding and averaging the outputs of all intermediate nodes. The addition of the information storage unit also directly enlarges the search space and improves the probability of searching the excellent network structure. The input of the information processing unit is the input x of the current time of the sequencetHidden state h at the previous momentt-1And output c of the information storage unitt. The input of the first node in the cell and the output of the cell are processed in the same way as the information storage cell.
Step2, carry out the whole search stage on the PTB data set with the input parameters set, and train continuously for 50 epochs in total to obtain several different initial candidate cell structures; select the structures with the smallest validation-set loss during the search as candidate unit structures;
the experimental parameters for the search phase in Step2 mostly follow the setup in DARTS, which is the first choice because both reinforcement learning and evolutionary algorithms require a large enough GPU cluster to search, and DARTS is much less demanding in terms of hardware and more efficient in search speed than the first two methods. The different parameters are: the number of layers of the recurrent neural network is determined as one layer, the word embedding size and the hidden layer size are both 300, and the batch size is 256; information storage unit cellc is arranged in each unittAnd an information processing unit cellhtThe information storage unit internally comprises 3 nodes, and the information processing unit internally comprises 8 nodes.Edges between nodes are operated by adopting the following four operation functions, wherein the four operation functions are tanh, relu, sigmoid and identity.
In the Step2 search stage, different algorithms are used for the two alternating optimization steps: the network weights w are optimized with the stochastic gradient descent (SGD) algorithm with a learning rate of 20 and a weight decay of 5e-7; the structure weights alpha are optimized with the Adam algorithm with an initial learning rate of 3e-3 and a weight decay of 1e-3.
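For illustration, these two optimizers could be instantiated roughly as follows; the parameter lists are stand-ins for the real weight and structure parameters of the network, and only the algorithm choices and hyperparameters come from the description.

```python
import torch

# Stand-ins for the two parameter groups; in practice these come from the model
w_params = [torch.nn.Parameter(torch.randn(300, 300))]     # network weights w
alpha_params = [torch.nn.Parameter(torch.randn(8, 4))]     # structure weights alpha

# Search-stage optimizers: SGD for w, Adam for alpha
w_optimizer = torch.optim.SGD(w_params, lr=20.0, weight_decay=5e-7)
alpha_optimizer = torch.optim.Adam(alpha_params, lr=3e-3, weight_decay=1e-3)
```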
The searching algorithm mainly comprises four steps:
1) Construct a directed acyclic graph containing a plurality of nodes, with an ordered node set node^(1), node^(2), ..., node^(n).
2) Place all operations that can be taken between every two nodes, thereby making the discrete network structure continuous. Here o^(i,j) (i < j) denotes the operation on the edge from node i to node j, and the input of node j is obtained by applying the operations to all nodes with index smaller than j, as follows:

    x^{(j)} = \sum_{i<j} o^{(i,j)}(x^{(i)})

The operation o^(i,j) is usually selected from a set of candidate operations; for a recurrent neural network, these candidates are a few activation functions.
3) During the joint optimization of the structure weights α and the network weights w, find the operation corresponding to the largest weight α. For each set of operations o^(i,j), the invention defines a set of coefficients α^(i,j) = {α_o^(i,j)}. In practice, a mixed operation \bar{o}^(i,j) is used during search training: softmax is applied to the structure weights and all candidate operations are combined with those weights, as follows:

    \bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}} \frac{\exp(\alpha_o^{(i,j)})}{\sum_{o' \in \mathcal{O}} \exp(\alpha_{o'}^{(i,j)})} \, o(x)
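A minimal sketch of this continuous relaxation follows, assuming the four candidate operations named earlier; the class name MixedOp and the omission of the per-edge affine transforms are simplifications for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Candidate operations on one edge: tanh, relu, sigmoid, identity
OPS = [torch.tanh, torch.relu, torch.sigmoid, lambda x: x]

class MixedOp(nn.Module):
    """One edge under the continuous relaxation: a softmax over the structure
    weights alpha mixes all candidate operations, as in the formula above."""

    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(len(OPS)))  # one weight per operation

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, OPS))
```

At the end of the search, the discrete structure is recovered by keeping, on each edge, the operation whose alpha is largest.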
4) The whole recurrent neural network has two groups of parameters to be trained: the structure parameters α of the network and the weight parameters w of the network. The two groups of parameters are optimized alternately. First, the invention randomly initializes the structure parameters α to obtain an initialized network; then the network weights w are trained on the training set, and w is updated by reducing the training-set loss L_train, while the network structure parameters α are updated according to the validation-set loss L_val. Through this alternating optimization the optimal network structure is obtained, and the search phase of the NAS then ends. The network structure is fixed according to the structure parameters α obtained in the search stage, all network weights w are randomly re-initialized, and the network is trained on the training set again to obtain the final network.
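The alternating optimization can be sketched as a first-order procedure like the following; the function name, the data loaders, and the zipping of training and validation batches are assumptions, and the second-order gradient approximation used by DARTS is omitted for brevity.

```python
import torch

def search_epoch(model, loss_fn, train_loader, val_loader,
                 w_optimizer, alpha_optimizer):
    """One epoch of the alternating optimization: the structure weights alpha
    are updated on the validation loss L_val, then the network weights w are
    updated on the training loss L_train."""
    for (x_tr, y_tr), (x_val, y_val) in zip(train_loader, val_loader):
        # update structure weights alpha using the validation loss
        alpha_optimizer.zero_grad()
        loss_fn(model(x_val), y_val).backward()
        alpha_optimizer.step()

        # update network weights w using the training loss
        w_optimizer.zero_grad()
        loss_fn(model(x_tr), y_tr).backward()
        w_optimizer.step()
```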
Step3, the evaluation stage. For the parameter settings of the evaluation stage, the word embedding size and the hidden layer size of the model are expanded to 850, the batch size is 64, and the weights w are optimized with the averaged stochastic gradient descent (ASGD) algorithm with an initial learning rate of 20 and a weight decay of 8e-7. The invention evaluates the five candidate unit structures obtained in the search stage for a short time on the language model task to obtain the optimal unit structure. The weights w of each candidate unit structure are randomly initialized and trained for 300 epochs on the training set, and the unit structure with the lowest validation perplexity at that point is selected as the optimal structure. Fig. 2 shows the perplexity of the five candidate unit structures when each is trained for 300 epochs; the lowest perplexity is 61.79, and the dotted line is the validation-set perplexity of the structure searched by DARTS when trained for 300 epochs (lower perplexity is better). The unit structures corresponding to this perplexity are shown in figures 3 and 4: fig. 3 is the information storage unit cell_ct searched on the PTB data set, and fig. 4 is the information processing unit cell_ht searched on the PTB data set. After the optimal unit structure is obtained, its network weights w are randomly re-initialized and it is trained on the training set for a longer time until it converges. Table 1 shows the perplexity of the optimal unit structure searched by the present invention after full training, compared with the baseline method and other methods.
Table 1: Perplexity comparison of the present invention with other methods on the PTB data set
In Table 1, the second row shows the hand-designed network, the third row the other NAS methods, and the fourth row the baseline model and the results of the present invention. Compared with the baseline model, the perplexity of the method is reduced by 0.6 points on the validation set and by 0.4 points on the test set, achieving better performance.
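For illustration, the evaluation-stage optimizer settings given in Step3 could be instantiated as follows; the parameter list is a stand-in for the weights of the re-built model, and only the algorithm and hyperparameters come from the description.

```python
import torch

# Stand-in parameters; in the evaluation stage the searched cell is re-built
# with embedding and hidden sizes of 850 and trained with batch size 64
eval_params = [torch.nn.Parameter(torch.randn(850, 850))]

# Evaluation-stage optimizer: averaged stochastic gradient descent (ASGD)
eval_optimizer = torch.optim.ASGD(eval_params, lr=20.0, weight_decay=8e-7)
```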
Step4, to verify the transferability of the unit structure searched by the present invention, the optimal unit structure searched on the PTB data set is directly transferred to the WT2 data set for evaluation. The embedding size and the hidden layer size are both set to 700 and the weight decay is 5e-7. Table 2 shows the perplexity results on the test set after transferring to the WT2 data set and training.
Table 2: Results of directly transferring the structure searched on the PTB data set to the WT2 data set
In Table 2, the second row is the manually designed network, the third row shows network structures searched on the PTB data set by other NAS methods and transferred to the WT2 data set, and the last row is the network structure searched on the PTB data set by the present invention and transferred to the WT2 data set. Compared with the baseline model, the method achieves a better result, with the perplexity on the test set reduced by 0.2 points.
Step5, verify the degree of match of the constructed dual-unit search space: the match between the currently constructed search space and the task is analyzed by examining the perplexity on sentences of different lengths. Specifically, the test set is counted and grouped as shown in Table 3:
table 3: PTB test set size and grouping
As shown in Table 3, the test set contains 3761 sentences; the shortest sentence has 1 word and the longest has 77 words. The invention divides the test set into eight groups according to word count and tests the performance of the model on each group. The result is shown in FIG. 5, where the abscissa is the sequence length and the ordinate is the perplexity. The grouping experiment of FIG. 5 shows that, compared with the DARTS method, the method of the invention models long sequences better, i.e. the model's ability to capture long sequences is enhanced.
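The grouping experiment can be reproduced with a short routine like the one below, which bins test sentences by word count and computes perplexity within each bin; the function and argument names are illustrative, not part of the invention.

```python
import math
from collections import defaultdict

def perplexity_by_length(sentences, nll_per_sentence, bins):
    """Group test sentences by word count and compute perplexity per group.
    `sentences` is a list of token lists, `nll_per_sentence` holds the summed
    negative log-likelihood of each sentence under the model, and `bins` is a
    list of (low, high) word-count ranges."""
    totals = defaultdict(lambda: [0.0, 0])          # bin -> [total NLL, token count]
    for tokens, nll in zip(sentences, nll_per_sentence):
        for low, high in bins:
            if low <= len(tokens) <= high:
                totals[(low, high)][0] += nll
                totals[(low, high)][1] += len(tokens)
                break
    # perplexity = exp(average per-token NLL) within each bin
    return {b: math.exp(nll_sum / count) for b, (nll_sum, count) in totals.items()}
```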
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (8)

1. The structure search method of the language-model-oriented dual-unit search space is characterized in that: first, a dual-unit search space is constructed;
secondly, a search is carried out on the PTB data set, and the structures with the smallest validation-set loss during the search are selected as candidate unit structures;
and finally, an evaluation stage is entered, and the candidate unit structures obtained in the search stage are evaluated for a short time on the language model task to obtain the optimal unit structure.
2. The structure search method of a language model-oriented two-unit search space according to claim 1, wherein: the structure searching method based on the double-unit searching space comprises the following specific implementation steps:
Step1, propose a dual-unit search space for the language model task, set up the search unit, and form the final recurrent neural network through the connection of units, thereby constructing the search space;
Step2, carry out the whole search stage on the PTB data set with the input parameters set, and train continuously for 50 epochs in total to obtain several different initial candidate cell structures; select the structures with the smallest validation-set loss during the search as candidate unit structures;
and Step3, evaluate the candidate unit structures obtained in the search stage for a short time on the language model task to obtain the optimal unit structure.
3. The structure search method of a language model-oriented two-unit search space according to claim 1, wherein: the dual-unit search space proposed in Step1 continues the overall framework of DARTS for the whole search space, i.e. a single unit is searched and the final recurrent neural network is then built by connecting units; unlike DARTS, two sub-units are arranged inside each unit: an information storage unit cell_ct and an information processing unit cell_ht; each sub-unit is a directed acyclic graph containing a plurality of nodes; the input of the information storage unit consists of the inputs at several moments before the current position of the sequence, so that the front-end information of the sequence can be stored effectively.
4. The structure search method of a language model-oriented two-unit search space according to claim 1, wherein: the experimental parameters of the Step2 search phase mostly follow the settings in DARTS, with the following differences: the number of layers of the recurrent neural network is fixed at one, the word embedding size and the hidden layer size are both 300, and the batch size is 256; an information storage unit cell_ct and an information processing unit cell_ht are arranged inside each unit, the information storage unit contains 3 nodes and the information processing unit contains 8 nodes.
5. The structure search method of a language model-oriented two-unit search space according to claim 4, wherein: the edges between nodes use the following four operation functions: tanh, relu, sigmoid and identity.
6. The structure search method of a language model-oriented two-unit search space according to claim 1, wherein: in the Step2 search stage, different algorithms are used for the two alternating optimization steps: the network weights w are optimized with the stochastic gradient descent (SGD) algorithm with a learning rate of 20 and a weight decay of 5e-7; the structure weights alpha are optimized with the Adam algorithm with an initial learning rate of 3e-3 and a weight decay of 1e-3.
7. The structure search method of a language model-oriented two-unit search space according to claim 1, wherein: for the parameter settings of the Step3 evaluation phase, the word embedding size and the hidden layer size of the model are expanded to 850, the batch size is 64, and the weights are optimized with the averaged stochastic gradient descent (ASGD) algorithm with an initial learning rate of 20 and a weight decay of 8e-7.
8. The structure search method of a language model-oriented two-unit search space according to claim 1, wherein: in Step3 the candidate unit structures obtained in the search stage are evaluated for a short time to obtain the optimal unit structure; after the optimal unit structure is obtained, its network weights are randomly re-initialized and the unit is trained on the training set for a longer time until it converges.
CN202111084940.XA 2021-09-16 2021-09-16 Structure searching method of double-unit searching space facing language model Pending CN113902094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111084940.XA CN113902094A (en) 2021-09-16 2021-09-16 Structure searching method of double-unit searching space facing language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111084940.XA CN113902094A (en) 2021-09-16 2021-09-16 Structure searching method of double-unit searching space facing language model

Publications (1)

Publication Number Publication Date
CN113902094A true CN113902094A (en) 2022-01-07

Family

ID=79028602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111084940.XA Pending CN113902094A (en) 2021-09-16 2021-09-16 Structure searching method of double-unit searching space facing language model

Country Status (1)

Country Link
CN (1) CN113902094A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190228035A1 (en) * 2013-03-15 2019-07-25 Locus Lp Weighted analysis of stratified data entities in a database system
JP2016130950A (en) * 2015-01-14 2016-07-21 京セラドキュメントソリューションズ株式会社 Data processing apparatus and data processing method
CN112215332A (en) * 2019-07-12 2021-01-12 华为技术有限公司 Searching method of neural network structure, image processing method and device
WO2021014986A1 (en) * 2019-07-22 2021-01-28 ソニー株式会社 Information processing method, information processing device, and program
US20210056378A1 (en) * 2019-08-23 2021-02-25 Google Llc Resource constrained neural network architecture search
CN111126037A (en) * 2019-12-18 2020-05-08 昆明理工大学 Thai sentence segmentation method based on twin cyclic neural network
CN111428854A (en) * 2020-01-17 2020-07-17 华为技术有限公司 Structure searching method and structure searching device
CN111860495A (en) * 2020-06-19 2020-10-30 上海交通大学 Hierarchical network structure searching method and device and readable storage medium
CN111931901A (en) * 2020-07-02 2020-11-13 华为技术有限公司 Neural network construction method and device
CN113191489A (en) * 2021-04-30 2021-07-30 华为技术有限公司 Training method of binary neural network model, image processing method and device
WO2023035986A1 (en) * 2021-09-10 2023-03-16 Oppo广东移动通信有限公司 Image processing method, electronic device and computer storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
WAN, Q. et al.: "Dual-cell differentiable architecture search for language modeling", Journal of Intelligent & Fuzzy Systems, vol. 41, no. 02, 15 September 2021 (2021-09-15), pages 3985-3992 *
YINQIAO LI et al.: "Learning architectures from an extended search space for language modeling", arXiv, 6 March 2020 (2020-03-06), pages 1-11 *
WAN Quan et al.: "Fully automatic cell structure search for language models" (面向语言模型的全自动单元结构搜索), Journal of Chinese Computer Systems (小型微型计算机系统), vol. 43, no. 11, 8 October 2021 (2021-10-08), pages 2308-2313 *
LI Jianming et al.: "Constrained differentiable neural architecture search in an optimized search space" (优化搜索空间下带约束的可微分神经网络架构搜索), Journal of Computer Applications (计算机应用), vol. 42, no. 01, 17 April 2021 (2021-04-17), pages 44-49 *

Similar Documents

Publication Publication Date Title
Devlin et al. Neural program meta-induction
CN108132927B (en) Keyword extraction method for combining graph structure and node association
CN107729999A (en) Consider the deep neural network compression method of matrix correlation
US20230112223A1 (en) Multi-stage fpga routing method for optimizing time division multiplexing
CN104731882B (en) A kind of adaptive querying method that weighting sequence is encoded based on Hash
CN111144555A (en) Recurrent neural network architecture search method, system and medium based on improved evolutionary algorithm
CN109697289A (en) It is a kind of improved for naming the Active Learning Method of Entity recognition
CN110738362A (en) method for constructing prediction model based on improved multivariate cosmic algorithm
Ding et al. NAP: Neural architecture search with pruning
Nugroho et al. Hyper-parameter tuning based on random search for densenet optimization
CN112381208A (en) Neural network architecture searching method and system with gradual depth optimization
CN111667043B (en) Chess game playing method, system, terminal and storage medium
CN111191785A (en) Structure searching method based on expanded search space
Wan Deep learning: Neural network, optimizing method and libraries review
US11461656B2 (en) Genetic programming for partial layers of a deep learning model
CN113539372A (en) Efficient prediction method for LncRNA and disease association relation
Lan et al. Accelerated device placement optimization with contrastive learning
CN113313250B (en) Neural network training method and system adopting mixed precision quantization and knowledge distillation
Hornby et al. Accelerating human-computer collaborative search through learning comparative and predictive user models
CN108805280A (en) A kind of method and apparatus of image retrieval
CN113902094A (en) Structure searching method of double-unit searching space facing language model
Li et al. Pruner to predictor: An efficient pruning method for neural networks compression
KR102559605B1 (en) Method and apparatus for function optimization
CN115345303A (en) Convolutional neural network weight tuning method, device, storage medium and electronic equipment
Yuan et al. Uncertainty-based network for few-shot image classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination