CN113902094A - Structure searching method of double-unit searching space facing language model - Google Patents
- Publication number
- CN113902094A (application number CN202111084940.XA)
- Authority
- CN
- China
- Prior art keywords
- unit
- search
- language model
- search space
- searching
- Prior art date
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention relates to a structure search method for a language-model-oriented dual-unit search space, in the field of artificial intelligence. The invention improves the search space used by existing search strategies on the language model task and constructs a search space better suited to that task. An information storage unit is added inside the recurrent neural network unit to effectively retain information from the front of the sequence, so that the search space matches the language model task more closely; the added unit addresses the inability of conventional recurrent network unit structures to model long-range sequence dependencies and improves the continuity of sequence semantic information. At the same time, adding this unit directly enlarges the search space and raises the probability of finding a better network structure.
Description
Technical Field
The invention relates to a structure search method for a language-model-oriented dual-unit search space, belonging to the technical field of artificial intelligence.
Background
The design of the search space is the first, and an extremely important, step in neural architecture search research: the search space determines the upper and lower bounds of model performance. However, the tension between the size of the search space on one side and search speed and hardware requirements on the other makes its design difficult. A huge search space has great potential for exploring networks, but demands enormous hardware support and time; a smaller search space is friendlier to hardware and time budgets, but is very limited in its ability to mine network potential. Therefore, how to define a suitable search space that achieves the best search result has become a problem to be solved in current architecture search research.
Research on neural architecture search is still at an early stage, but experts in the field have already proposed many excellent architecture search methods and achieved good results. DARTS, currently the most popular neural architecture search method, constructs a single minimal cell for a recurrent structure: a directed acyclic graph is arranged inside the cell, the structure inside the cell is learned by gradient-based optimization, and the learned cell is connected recurrently to form the final model. A model based on such a recurrent cell can handle short-term sequence dependencies, but when the sequence is long, gradients at the far end of the sequence are difficult to propagate back to the current position, causing vanishing gradients and interrupting the semantic information of the sequence. To address this problem, the invention studies the search space for architecture search on the language model task and proposes a structure search method based on a dual-unit expanded space.
Disclosure of Invention
The invention provides a structure search method for a language-model-oriented dual-unit search space, aimed at the problem that, when a sequence is long, gradients at the far end of the sequence are difficult to back-propagate to the current position, so that gradients vanish and the semantic information of the sequence is interrupted.
The technical scheme of the invention is as follows: the structure search method for the language-model-oriented dual-unit search space comprises the following steps. First, the dual-unit search space is constructed;
secondly, a search is run on the PTB dataset, and the structures with the minimum validation-set loss during the search are selected as candidate unit structures;
finally, in the evaluation stage, the candidate unit structures obtained in the search stage are evaluated for a short time on the language model task to obtain the optimal unit structure.
As a further aspect of the invention, the structure search method based on the dual-unit search space comprises the following implementation steps:
Step 1, a dual-unit search space is proposed for the language model task: a search unit is defined, and the final recurrent neural network is formed by connecting units, thereby constructing the search space;
Step 2, the whole search stage is run on the PTB dataset with the given input parameters, training continuously for 50 epochs in total to obtain several different initial candidate cell structures; the structures with the minimum validation-set loss during the search are selected as candidate unit structures;
Step 3, the candidate unit structures obtained in the search stage are evaluated for a short time on the language model task to obtain the optimal unit structure.
As a further aspect of the invention, the dual-unit search space proposed in Step 1 keeps the overall framework of the search space as arranged in DARTS, i.e. a single unit is searched and the final recurrent neural network is built by connecting units; unlike DARTS, two sub-units are arranged inside each unit: an information storage unit cell_ct and an information processing unit cell_ht. Each sub-unit is a directed acyclic graph containing several nodes. The inputs of the information storage unit are the sequence inputs at several preceding time steps, so that information from the front of the sequence can be stored effectively.
As a further aspect of the invention, the experimental parameters of the Step 2 search stage mostly follow the settings in DARTS, with the following differences: the number of recurrent neural network layers is fixed at one, the word embedding size and hidden layer size are both 300, and the batch size is 256; an information storage unit cell_ct and an information processing unit cell_ht are arranged inside each unit, the information storage unit contains 3 nodes, and the information processing unit contains 8 nodes.
As a further aspect of the invention, the edges between nodes use the following four operation functions: tanh, relu, sigmoid and identity.
As a further aspect of the invention, in the Step 2 search stage the two optimization problems are handled with different algorithms: the network weights w are optimized with the stochastic gradient descent (SGD) algorithm with a learning rate of 20 and a weight decay of 5e-7; the structure weights α are optimized with the Adam algorithm with an initial learning rate of 3e-3 and a weight decay of 1e-3.
As a further aspect of the invention, in the Step 3 evaluation stage the word embedding size and hidden layer size of the model are enlarged to 850, the batch size is 64, the weights are optimized with the averaged stochastic gradient descent (ASGD) algorithm, the initial learning rate is 20, and the weight decay is 8e-7.
As a further aspect of the invention, in Step 3 the candidate unit structures obtained in the search stage are evaluated for a short time to obtain the optimal unit structure; after the optimal unit structure is obtained, its network weights are randomly re-initialized and it is trained on the training set for a longer time until convergence.
To verify the transferability of the unit structures found by the invention, the optimal unit structure searched on the PTB dataset is transferred directly to the WT2 dataset for evaluation.
The invention has the beneficial effects that:
1. The structure search method for the language-model-oriented dual-unit search space provided by the invention improves the search space used by existing search strategies on the language model task and constructs a search space better suited to that task. An information storage unit is added inside the recurrent neural network unit to effectively retain information from the front of the sequence, so that the search space matches the language model task more closely; the added unit addresses the inability of conventional recurrent network unit structures to model long-range sequence dependencies and improves the continuity of sequence semantic information. At the same time, adding this unit directly enlarges the search space and raises the probability of finding a better network structure.
2. The invention improves the search unit framework of DARTS and proposes a dual-unit framework. Two sub-units are arranged inside each unit, each a directed acyclic graph containing several nodes. Adding the information storage unit also directly enlarges the search space and raises the probability of finding an excellent network structure. Experiments on the Penn Treebank (PTB, vocabulary 10000) dataset and the Wikitext-2 (WT2, vocabulary 33000) dataset show that, on the PTB dataset, the perplexity is reduced by 0.4 points relative to the baseline method, achieving a better result. The transferability of the invention was also verified on the WT2 dataset, where the perplexity on the test set is reduced by 0.2 points compared with the baseline method.
Drawings
FIG. 1 is a model diagram of the structure search method for the language-model-oriented dual-unit search space according to the invention;
FIG. 2 is a schematic illustration of the perplexity of the five candidate structures of the invention;
FIG. 3 is a schematic diagram of the information storage unit structure searched on the PTB dataset, corresponding to the perplexity shown;
FIG. 4 is a schematic diagram of the information processing unit structure searched on the PTB dataset, corresponding to the perplexity shown;
FIG. 5 is a graph comparing the perplexity performance of the invention with the DARTS method.
Detailed Description
Example 1: as shown in FIG. 1 to FIG. 5, the structure search method for the language-model-oriented dual-unit search space comprises: first, constructing the dual-unit search space; secondly, running a search on the PTB dataset and selecting the structures with the minimum validation-set loss during the search as candidate unit structures; finally, entering the evaluation stage, in which the candidate unit structures obtained in the search stage are evaluated for a short time on the language model task to obtain the optimal unit structure.
The structure searching method based on the double-unit searching space comprises the following specific implementation steps:
Step 1, a dual-unit search space is proposed for the language model task: a search unit is defined, and the final recurrent neural network is formed by connecting units, thereby constructing the search space;
the two-cell search space proposed in Step1 is a large frame continuation of the entire search spaceUnlike DARTS, in which a unit is searched and then connected to form a final recurrent neural network, two sub-units are provided in the unit at each time in the recurrent neural network: information storage unit cellctAnd an information processing unit cellhtAs shown in fig. 1, each unit is a directed acyclic graph including a plurality of nodes; the input of the information storage unit is the input x at the first five moments of the sequencet-1,xt-2,xt-3,xt-4,xt-5So as to effectively store the front-end information of the sequence. And performing linear transformation and addition on the five inputs, and then obtaining the input of the first node in the cellct unit through an activation function tanh, wherein the output of the unit is obtained by adding and averaging the outputs of all intermediate nodes. The addition of the information storage unit also directly enlarges the search space and improves the probability of searching the excellent network structure. The input of the information processing unit is the input x of the current time of the sequencetHidden state h at the previous momentt-1And output c of the information storage unitt. The input of the first node in the cell and the output of the cell are processed in the same way as the information storage cell.
Step 2, the whole search stage is run on the PTB dataset with the given input parameters, training continuously for 50 epochs in total to obtain several different initial candidate cell structures; the structures with the minimum validation-set loss during the search are selected as candidate unit structures;
the experimental parameters for the search phase in Step2 mostly follow the setup in DARTS, which is the first choice because both reinforcement learning and evolutionary algorithms require a large enough GPU cluster to search, and DARTS is much less demanding in terms of hardware and more efficient in search speed than the first two methods. The different parameters are: the number of layers of the recurrent neural network is determined as one layer, the word embedding size and the hidden layer size are both 300, and the batch size is 256; information storage unit cellc is arranged in each unittAnd an information processing unit cellhtThe information storage unit internally comprises 3 nodes, and the information processing unit internally comprises 8 nodes.Edges between nodes are operated by adopting the following four operation functions, wherein the four operation functions are tanh, relu, sigmoid and identity.
In the Step 2 search stage the two optimization problems are handled with different algorithms: the network weights w are optimized with the stochastic gradient descent (SGD) algorithm with a learning rate of 20 and a weight decay of 5e-7; the structure weights α are optimized with the Adam algorithm with an initial learning rate of 3e-3 and a weight decay of 1e-3.
The searching algorithm mainly comprises four steps:
1) Construct a directed acyclic graph containing several nodes, with an ordered node set node^(1), node^(2), …, node^(n);
2) Place all candidate operations on every edge between two nodes, thereby making the discrete network structure continuous. Here o^{(i,j)} (i < j) denotes the operation on the edge from node i to node j, and the input of node j is obtained by applying the corresponding operations to all nodes with index smaller than j:

node^{(j)} = \sum_{i<j} o^{(i,j)}\big(node^{(i)}\big)

The operation o^{(i,j)} is usually drawn from a set of candidate operations; for a recurrent neural network, these are activation functions.
3) During the joint optimization of the structure weights α and the network weights w, find the operation corresponding to the largest weight α. For each edge (i, j) the invention defines a set of coefficients α^{(i,j)} = {α_o^{(i,j)}}. In practice, the invention uses a mixed operation \bar{o}^{(i,j)} during search training, i.e. the candidate operations are weighted with a softmax over α:

\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}} \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)} \, o(x)
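A minimal numeric sketch of this softmax-weighted mixed operation (plain Python over scalars; the operation set and α values are illustrative, not the patent's):

```python
import math

def relu(x):
    return max(0.0, x)

def mixed_op(x, alphas, ops):
    """DARTS-style continuous relaxation: softmax the architecture
    weights alphas, then return the weighted sum of every candidate
    operation applied to x."""
    exps = [math.exp(a) for a in alphas]
    total = sum(exps)
    softmax = [e / total for e in exps]
    return sum(p * op(x) for p, op in zip(softmax, ops))

ops = [math.tanh, relu, lambda x: x]  # tanh, relu, identity
```

When one α dominates, the softmax concentrates on it and the mixed operation approaches that single operation, which is why the final discrete structure is read off by taking the argmax over α.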
4) The whole recurrent neural network has two groups of parameters to train: the structure parameters α of the network and the weight parameters w of the network. The two groups are optimized alternately. The invention first randomly initializes the structure parameters α to obtain an initialized network, then trains on the training set: the network weights w are updated to reduce the training loss L_train, while the structure parameters α are updated according to the validation loss L_val. Through this alternating optimization the optimal network structure is obtained, and the search phase of NAS ends. The network structure is then fixed according to the structure parameters α obtained in the search stage, all network weights w are randomly initialized, and the network is trained again on the training set to obtain the final network.
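The alternating optimization of w on the training loss and α on the validation loss can be sketched with a toy one-dimensional stand-in for the real network (the loss functions, learning rate, and step count here are illustrative assumptions, not the patent's settings):

```python
def alternating_search(steps=200, lr=0.1):
    """Toy sketch of DARTS-style alternating optimization: w is updated
    to reduce the training loss, alpha to reduce the validation loss,
    in alternation."""
    w, alpha = 0.0, 0.0            # randomly initialised in the real method
    train_target, val_target = 2.0, 1.0

    for _ in range(steps):
        # update network weight w on the training loss (w+alpha-2)^2
        grad_w = 2 * (w + alpha - train_target)
        w -= lr * grad_w
        # update structure parameter alpha on the validation loss (alpha-1)^2
        grad_a = 2 * (alpha - val_target)
        alpha -= lr * grad_a
    return w, alpha
```

Here α converges to the validation optimum and w then tracks the training optimum given that α, mirroring the bilevel scheme described above.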
Step 3, evaluation stage and its parameter settings: the word embedding size and hidden layer size of the model are enlarged to 850, the batch size is 64, the weights w are optimized with the averaged stochastic gradient descent (ASGD) algorithm, the initial learning rate is 20, and the weight decay is 8e-7. The invention evaluates the five candidate unit structures obtained in the search stage for a short time on the language model task to obtain the optimal unit structure. The weights w of each candidate unit structure are randomly initialized and trained on the training set for 300 epochs, and the unit structure with the lowest validation perplexity at that point is selected as the optimal structure. FIG. 2 shows the perplexity of each of the five candidate unit structures when trained for 300 epochs; the lowest perplexity is 61.79, the dotted line is the validation perplexity of the best structure searched by DARTS when trained for 300 epochs, and lower perplexity is better. The unit structures corresponding to this perplexity are shown in FIG. 3 and FIG. 4: FIG. 3 is the information storage unit cell_ct searched on the PTB dataset, and FIG. 4 is the information processing unit cell_ht searched on the PTB dataset. After the optimal unit structure is obtained, its network weights w are randomly re-initialized and it is trained on the training set for a longer time until convergence; Table 1 shows the perplexity of the fully trained optimal unit structure compared with the baseline and other methods.
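Selecting the optimal structure from the short evaluation is a simple argmin over validation perplexity; a sketch with illustrative values (not the patent's measured perplexities):

```python
def select_best(candidates):
    """Return the candidate unit structure whose short evaluation
    produced the lowest validation perplexity."""
    return min(candidates, key=lambda c: c["val_ppl"])

# hypothetical candidates with made-up perplexities
candidates = [
    {"name": "cand_1", "val_ppl": 63.1},
    {"name": "cand_2", "val_ppl": 61.9},
    {"name": "cand_3", "val_ppl": 62.4},
]
```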
Table 1: confusion contrast of the present invention over PTB datasets with other methods
In Table 1, the second row shows hand-designed networks, the third row shows other NAS methods, and the fourth row shows the baseline model and the results of the invention. Compared with the baseline model, the perplexity of the method is reduced by 0.6 points on the validation set and 0.4 points on the test set, achieving better performance.
Step 4, to verify the transferability of the cell structure found by the invention, the optimal cell structure searched on the PTB dataset is transferred directly to the WT2 dataset for evaluation. The embedding and hidden layer sizes are both set to 700 and the weight decay is 5e-7. Table 2 shows the perplexity results on the test set after transferring to the WT2 dataset and training.
Table 2: results of migrating searched structures on PTB dataset directly onto WT2 dataset
In Table 2, the second row shows manually designed networks, the third row shows network structures searched on the PTB dataset by other NAS methods and transferred to the WT2 dataset, and the last row shows the network structure searched on the PTB dataset by the invention and transferred to the WT2 dataset. Compared with the baseline model, the method achieves a better result, reducing the perplexity on the test set by 0.2 points.
Step 5, to verify how well the constructed dual-unit search space matches the task, the match between the current search space and the task is analyzed by measuring perplexity on sentences of different lengths. Specifically, the test set is counted and grouped as shown in Table 3:
table 3: PTB test set size and grouping
As shown in Table 3, the test set contains 3761 sentences; the shortest sentence has 1 word and the longest has 77 words. The invention divides the test set into eight groups by word count and tests the model's performance on each group. The results are shown in FIG. 5, where the abscissa is the sequence length and the ordinate is the perplexity. The grouping experiment in FIG. 5 shows, by comparison with the DARTS method, that the method of the invention models long sequences better, i.e. the model's ability to handle long sequences is enhanced.
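The length-bucketed evaluation above can be sketched as follows (the bucket width is an illustrative assumption; the patent's exact eight groups are not reproduced here):

```python
def group_by_length(sentences, bucket_size=10):
    """Group sentences into buckets by word count so that model
    perplexity can be reported per length range."""
    buckets = {}
    for s in sentences:
        n = len(s.split())
        key = (n - 1) // bucket_size  # bucket 0: 1-10 words, bucket 1: 11-20, ...
        buckets.setdefault(key, []).append(s)
    return buckets
```

Per-bucket perplexity is then computed by evaluating the model separately on each group, which is how a plot like FIG. 5 is produced.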
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.
Claims (8)
1. A structure search method for a language-model-oriented dual-unit search space, characterized in that: first, the dual-unit search space is constructed;
secondly, a search is run on the PTB dataset, and the structures with the minimum validation-set loss during the search are selected as candidate unit structures;
finally, in the evaluation stage, the candidate unit structures obtained in the search stage are evaluated for a short time on the language model task to obtain the optimal unit structure.
2. The structure search method for a language-model-oriented dual-unit search space according to claim 1, characterized in that the method comprises the following implementation steps:
Step 1, a dual-unit search space is proposed for the language model task: a search unit is defined, and the final recurrent neural network is formed by connecting units, thereby constructing the search space;
Step 2, the whole search stage is run on the PTB dataset with the given input parameters, training continuously for 50 epochs in total to obtain several different initial candidate cell structures; the structures with the minimum validation-set loss during the search are selected as candidate unit structures;
Step 3, the candidate unit structures obtained in the search stage are evaluated for a short time on the language model task to obtain the optimal unit structure.
3. The structure search method for a language-model-oriented dual-unit search space according to claim 1, characterized in that: the dual-unit search space proposed in Step 1 keeps the overall framework of the search space as arranged in DARTS, i.e. a single unit is searched and the final recurrent neural network is built by connecting units; unlike DARTS, two sub-units are arranged inside each unit: an information storage unit cell_ct and an information processing unit cell_ht; each sub-unit is a directed acyclic graph containing several nodes; the inputs of the information storage unit are the sequence inputs at several preceding time steps, so that information from the front of the sequence can be stored effectively.
4. The structure search method for a language-model-oriented dual-unit search space according to claim 1, characterized in that: the experimental parameters of the Step 2 search stage mostly follow the settings in DARTS, with the following differences: the number of recurrent neural network layers is fixed at one, the word embedding size and hidden layer size are both 300, and the batch size is 256; an information storage unit cell_ct and an information processing unit cell_ht are arranged inside each unit, the information storage unit contains 3 nodes, and the information processing unit contains 8 nodes.
5. The structure search method for a language-model-oriented dual-unit search space according to claim 4, characterized in that: the edges between nodes use the following four operation functions: tanh, relu, sigmoid and identity.
6. The structure search method for a language-model-oriented dual-unit search space according to claim 1, characterized in that: in the Step 2 search stage the two optimization problems are handled with different algorithms: the network weights w are optimized with the stochastic gradient descent (SGD) algorithm with a learning rate of 20 and a weight decay of 5e-7; the structure weights α are optimized with the Adam algorithm with an initial learning rate of 3e-3 and a weight decay of 1e-3.
7. The structure search method for a language-model-oriented dual-unit search space according to claim 1, characterized in that: in the Step 3 evaluation stage the word embedding size and hidden layer size of the model are enlarged to 850, the batch size is 64, the weights are optimized with the averaged stochastic gradient descent (ASGD) algorithm, the initial learning rate is 20, and the weight decay is 8e-7.
8. The structure search method for a language-model-oriented dual-unit search space according to claim 1, characterized in that: in Step 3, the candidate cell structures obtained in the search stage are evaluated for a short time to obtain the optimal cell structure; after the optimal cell structure is obtained, its network weights are randomly re-initialized and it is trained on the training set for a longer time until convergence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111084940.XA CN113902094A (en) | 2021-09-16 | 2021-09-16 | Structure searching method of double-unit searching space facing language model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113902094A | 2022-01-07
Family
ID=79028602
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111084940.XA Pending CN113902094A (en) | 2021-09-16 | 2021-09-16 | Structure searching method of double-unit searching space facing language model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113902094A (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2016130950A (en) * | 2015-01-14 | 2016-07-21 | Kyocera Document Solutions Inc. | Data processing apparatus and data processing method |
US20190228035A1 (en) * | 2013-03-15 | 2019-07-25 | Locus Lp | Weighted analysis of stratified data entities in a database system |
CN111126037A (en) * | 2019-12-18 | 2020-05-08 | Kunming University of Science and Technology | Thai sentence segmentation method based on twin recurrent neural networks |
CN111428854A (en) * | 2020-01-17 | 2020-07-17 | Huawei Technologies Co., Ltd. | Structure searching method and structure searching device |
CN111860495A (en) * | 2020-06-19 | 2020-10-30 | Shanghai Jiao Tong University | Hierarchical network structure searching method and device and readable storage medium |
CN111931901A (en) * | 2020-07-02 | 2020-11-13 | Huawei Technologies Co., Ltd. | Neural network construction method and device |
CN112215332A (en) * | 2019-07-12 | 2021-01-12 | Huawei Technologies Co., Ltd. | Searching method of neural network structure, image processing method and device |
WO2021014986A1 (en) * | 2019-07-22 | 2021-01-28 | Sony Corporation | Information processing method, information processing device, and program |
US20210056378A1 (en) * | 2019-08-23 | 2021-02-25 | Google Llc | Resource constrained neural network architecture search |
CN113191489A (en) * | 2021-04-30 | 2021-07-30 | Huawei Technologies Co., Ltd. | Training method of binary neural network model, image processing method and device |
WO2023035986A1 (en) * | 2021-09-10 | 2023-03-16 | Guangdong OPPO Mobile Telecommunications Corp., Ltd. | Image processing method, electronic device and computer storage medium |
- 2021-09-16: CN patent application CN202111084940.XA (CN113902094A/en) filed; legal status: Pending
Non-Patent Citations (4)
Title |
---|
WAN, Q. et al.: "Dual-cell differentiable architecture search for language modeling", Journal of Intelligent & Fuzzy Systems, vol. 41, no. 02, 15 September 2021 (2021-09-15), pages 3985 - 3992 *
YINQIAO LI et al.: "Learning architectures from an extended search space for language modeling", arXiv, 6 March 2020 (2020-03-06), pages 1 - 11 *
WAN, QUAN et al.: "Fully automatic cell structure search for language models", Journal of Chinese Computer Systems, vol. 43, no. 11, 8 October 2021 (2021-10-08), pages 2308 - 2313 *
LI, JIANMING et al.: "Constrained differentiable neural architecture search in an optimized search space", Journal of Computer Applications, vol. 42, no. 01, 17 April 2021 (2021-04-17), pages 44 - 49 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Devlin et al. | Neural program meta-induction | |
CN108132927B (en) | Keyword extraction method for combining graph structure and node association | |
CN107729999A | Deep neural network compression method considering matrix correlation | |
US20230112223A1 (en) | Multi-stage fpga routing method for optimizing time division multiplexing | |
CN104731882B | An adaptive query method based on hash encoding of weighted sequences | |
CN111144555A (en) | Recurrent neural network architecture search method, system and medium based on improved evolutionary algorithm | |
CN109697289A | An improved active learning method for named entity recognition | |
CN110738362A | A method for constructing a prediction model based on an improved multi-verse optimization algorithm | |
Ding et al. | NAP: Neural architecture search with pruning | |
Nugroho et al. | Hyper-parameter tuning based on random search for densenet optimization | |
CN112381208A (en) | Neural network architecture searching method and system with gradual depth optimization | |
CN111667043B (en) | Chess game playing method, system, terminal and storage medium | |
CN111191785A (en) | Structure searching method based on expanded search space | |
Wan | Deep learning: Neural network, optimizing method and libraries review | |
US11461656B2 (en) | Genetic programming for partial layers of a deep learning model | |
CN113539372A (en) | Efficient prediction method for LncRNA and disease association relation | |
Lan et al. | Accelerated device placement optimization with contrastive learning | |
CN113313250B (en) | Neural network training method and system adopting mixed precision quantization and knowledge distillation | |
Hornby et al. | Accelerating human-computer collaborative search through learning comparative and predictive user models | |
CN108805280A | An image retrieval method and apparatus | |
CN113902094A (en) | Structure searching method of double-unit searching space facing language model | |
Li et al. | Pruner to predictor: An efficient pruning method for neural networks compression | |
KR102559605B1 (en) | Method and apparatus for function optimization | |
CN115345303A (en) | Convolutional neural network weight tuning method, device, storage medium and electronic equipment | |
Yuan et al. | Uncertainty-based network for few-shot image classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||