CN111598216B - Method, device and equipment for generating student network model and storage medium

Method, device and equipment for generating student network model and storage medium

Info

Publication number
CN111598216B
Authority
CN
China
Prior art keywords
network model
training
basic network
model
training sample
Prior art date
Legal status
Active
Application number
CN202010298183.5A
Other languages
Chinese (zh)
Other versions
CN111598216A (en)
Inventor
牛国成
何伯磊
肖欣延
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010298183.5A
Publication of CN111598216A
Application granted
Publication of CN111598216B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The application discloses a method, a device, equipment and a storage medium for generating a student network model, and relates to the field of natural language processing. The specific implementation scheme is as follows: acquiring a teacher network model of a target field; predicting a first training sample through the teacher network model to generate a first prediction result; and inputting the first training sample and the first prediction result into an automatic machine learning model to generate a student network model corresponding to the teacher network model, wherein the automatic machine learning model is used for training a network structure of the student network model according to the first training sample and the first prediction result. The student network model is thus generated from the teacher network model based on the automatic machine learning model and has excellent performance, so that using it for prediction in the prediction stage improves prediction efficiency and effect.

Description

Method, device and equipment for generating student network model and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating a student network model.
Background
Natural language processing is an important direction in the field of artificial intelligence, with a wide range of applications that include intelligent assistants on terminal devices, translation tools, driverless cars, and the like. However, if a natural language processing model is too large and too computationally complex, it not only consumes a large amount of computing resources during prediction but also directly slows its prediction, which in turn affects the response speed of a terminal device's intelligent assistant, the translation speed of a translation tool, the reaction speed of a driverless car while driving, and so on.
In the related art, knowledge distillation is usually adopted to generate, from a large-scale, computationally complex teacher network model, a student network model whose scale and computational complexity are far lower. However, the structures of such student network models are all designed manually from experience; the generated student network models are fixed and their performance is poor.
Disclosure of Invention
The present application provides a method, a device, equipment and a storage medium for generating a student network model, in which the student network model is generated from a teacher network model based on an automatic machine learning model. The generated student network model has excellent performance, so that using it for prediction in the prediction stage improves prediction efficiency and effect.
According to a first aspect, there is provided a method for generating a student network model, comprising: acquiring a teacher network model of a target field; predicting a first training sample through the teacher network model to generate a first prediction result; and inputting the first training sample and the first prediction result into an automatic machine learning model to generate a student network model corresponding to the teacher network model, wherein the automatic machine learning model is used for training a network structure of the student network model according to the first training sample and the first prediction result.
According to a second aspect, there is provided an apparatus for generating a student network model, comprising: the acquisition module is used for acquiring a teacher network model of the target field; the first generation module is used for predicting the first training sample through the teacher network model so as to generate a first prediction result; and a second generation module, configured to input the first training sample and the first prediction result into an automatic machine learning model to generate a student network model corresponding to the teacher network model, where the automatic machine learning model is used to train a network structure of the student network model according to the first training sample and the first prediction result.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the first aspect.
According to the technology of the application, the student network model is generated according to the teacher network model based on the automatic machine learning model, and the generated student network model is excellent in performance, so that the generated student network model is used for prediction in the prediction stage, and the prediction efficiency and effect are improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present application;
FIG. 2 is a schematic diagram according to a second embodiment of the present application;
FIG. 3 is a schematic illustration according to a third embodiment of the present application;
FIG. 4 is a schematic illustration according to a fourth embodiment of the present application;
FIG. 5 is a schematic illustration according to a fifth embodiment of the present application;
FIG. 6 is a schematic illustration according to a sixth embodiment of the present application;
FIG. 7 is a schematic illustration according to a seventh embodiment of the present application;
fig. 8 is a block diagram of an electronic device for implementing a method for generating a student network model according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of those embodiments to aid understanding, and these details are to be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
It can be understood that, in natural language processing, a natural language processing model is generally used to make predictions on the voice or text input by a user and to execute corresponding operations according to the prediction result. However, if the natural language processing model is too large and too computationally complex, it not only consumes a large amount of computing resources during prediction but also directly slows its prediction, which in turn affects the response speed of a terminal device's intelligent assistant, the translation speed of a translation tool, the reaction speed of a driverless car while driving, and so on.
In the related art, knowledge distillation can generate, from a pre-trained teacher network model that is large in scale and high in computational complexity, a student network model whose scale and computational complexity are far lower, and this student network model is then used to make predictions on the voice or text input by the user. Because the student network model learns task-related knowledge from the teacher network model and imitates the teacher network model's behaviour, it can approach the effect achievable by the teacher network model while reducing the computing resources consumed in the prediction stage and increasing the prediction speed. However, in the related art the structures of the student network models are all designed manually from experience, for example by manually setting the number of neural network layers, the number of neurons in each hidden layer, and so on; the generated student network models are fixed and their performance is poor.
To address these problems, the present application adopts the idea of knowledge distillation: a teacher network model of the target field is first obtained, a first training sample is then predicted through the teacher network model to generate a first prediction result, and the first training sample and the first prediction result are then input into an automatic machine learning model to generate a student network model corresponding to the teacher network model, wherein the automatic machine learning model is used for training a network structure of the student network model according to the first training sample and the first prediction result. The student network model is thus generated from the teacher network model based on the automatic machine learning model and has excellent performance, so that using it for prediction in the prediction stage improves prediction efficiency and effect.
The following describes a method, an apparatus, an electronic device, and a non-transitory computer-readable storage medium for generating a student network model provided in the present application with reference to the drawings.
First, a method for generating a student network model provided in the present application will be described with reference to the drawings.
Fig. 1 is a flowchart illustrating a method for generating a student network model according to a first embodiment of the present application.
As shown in fig. 1, the method for generating a student network model provided by the present application may include:
step 101, a teacher network model of a target field is obtained.
Specifically, the method for generating a student network model provided by the present application can be executed by the apparatus for generating a student network model provided by the present application, hereinafter referred to simply as the generating apparatus. The generating apparatus can be configured in an electronic device to generate, based on an automatic machine learning model and according to the teacher network model, a student network model with excellent performance. The electronic device may be a terminal, a server, or the like, which is not limited in this application.
The target field is used for representing the application fields of the student network model and the teacher network model which need to be generated. In specific implementation, the domain may be divided as needed, for example, the domain may be divided into the domains of weather, music, geography, sports, and the like, and correspondingly, the target domain may be any one of the domains of weather, music, geography, sports, and the like.
In an exemplary embodiment, the teacher network model may be generated by training with a large number of training samples from arbitrary fields and from the target field. The teacher network model may be a multi-layer RNN (Recurrent Neural Network), CNN (Convolutional Neural Network), Transformer (self-attention mechanism), or a network model of any other structure, which is not limited in this application. For the specific process of training the teacher network model, reference may be made to model training methods in the related art, which are not described again here.
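Purely as an illustration (not part of the original disclosure), a teacher network model of the kind described above could be defined along the following lines; the class and parameter names (TeacherModel, vocab_size, num_classes) are hypothetical, and the Transformer-encoder configuration is only one possible choice.

```python
import torch
import torch.nn as nn

class TeacherModel(nn.Module):
    """A large Transformer-encoder classifier used as the teacher network model.
    All sizes below are illustrative; the application does not fix a structure."""
    def __init__(self, vocab_size=30000, d_model=768, num_layers=12, num_classes=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=12,
                                                   dim_feedforward=3072, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, token_ids):
        hidden = self.encoder(self.embed(token_ids))   # (batch, seq, d_model)
        return self.classifier(hidden.mean(dim=1))     # pooled logits

# teacher = TeacherModel()  # would then be pre-trained and fine-tuned on the target field
```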
And 102, predicting the first training sample through the teacher network model to generate a first prediction result.
And 103, inputting the first training sample and the first prediction result into an automatic machine learning model to generate a student network model corresponding to the teacher network model, wherein the automatic machine learning model is used for training a network structure of the student network model according to the first training sample and the first prediction result.
It can be understood that knowledge distillation aims to extract useful information and knowledge from the teacher network model and use it as guidance during the training of the student network model. By training and learning from the information and knowledge extracted from the teacher network model, the student network model can obtain better performance than a student network model trained directly and independently. Using the student network model to predict the voice or text input by the user can therefore achieve the effect achievable by the teacher network model while reducing the computing resources consumed in the prediction stage and increasing the prediction speed.
In the application, in order to extract useful information and knowledge from the teacher network model to be used as guidance in the training process of the student network model, the first training sample can be predicted through the obtained teacher network model to generate a first prediction result, and then the first training sample and the first prediction result are used as training samples for training and generating the student network model.
The first training sample may be a training sample in the same field as the obtained teacher network model. In practical application, the first training sample may be obtained as needed, for example, if the teacher network model and the student network model are translation models for translating english into chinese, the first training sample may include massive english texts in the target field.
Specifically, the first training sample is input into the teacher network model, and a first prediction result corresponding to the first training sample can be generated.
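A minimal sketch of step 102, assuming a PyTorch-style teacher and a list of tokenised first training samples; the names first_training_samples and soft_label are illustrative, and keeping the teacher's full output distribution (rather than only the top prediction) is one common way to carry the teacher's knowledge forward.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_first_samples(teacher, first_training_samples):
    """Run the teacher network model over the first training samples (step 102)
    and collect the first prediction results used later for distillation."""
    teacher.eval()
    first_prediction_results = []
    for sample in first_training_samples:          # sample: LongTensor of token ids
        logits = teacher(sample.unsqueeze(0))      # add a batch dimension
        soft_label = F.softmax(logits, dim=-1)     # teacher's probability distribution
        first_prediction_results.append(soft_label.squeeze(0))
    return first_prediction_results
```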
It is understood that, since the training samples used in generating the student network model include the first prediction result in addition to the first training sample, and the first prediction result is the prediction result of the teacher network model on the first training sample, useful information and knowledge in the teacher network model can be migrated into the student network model, and the student network model can imitate the behavior of the teacher network model through learning and training.
Specifically, when the student network model is generated, the automatic machine learning model included in the generation device may be used to generate the student network model according to the first training sample and the first prediction result. The automatic machine learning model is used for training the network structure of the student network model according to the first training sample and the first prediction result.
In an exemplary embodiment, the student network model may be a network model capable of performing natural language processing and having any structure, such as RNN, CNN, or Transformer, or a network model formed by combining networks of these types, and the present application does not limit the type of network structure, the number of layers, and the like of the student network model.
In an exemplary embodiment, the automatic machine learning model may automatically construct a network structure for the student network model and set the initial parameters of that network structure; for example, it may set the hidden layer size, the depth, the convolution kernel size, the Transformer depth, and other parameters that relate to the network structure and determine the complexity of the generated student network model, and it may automatically select suitable optimizer parameters. The automatic machine learning model may then use the network structure, with these initial parameters, to predict one of the first training samples, say training sample A1, to generate a corresponding prediction result A2, and determine a first correction coefficient according to the difference between prediction result A2 and the teacher network model's prediction for training sample A1 in the first prediction result, so as to perform a first correction on the initial parameters of the network structure using the first correction coefficient.
The network structure, with its parameters set to the first-corrected parameters, may then be used to predict another of the first training samples, say training sample B1, to generate a corresponding prediction result B2. A second correction coefficient may be determined according to the difference between prediction result B2 and the teacher network model's prediction for training sample B1 in the first prediction result, so as to perform a second correction on the first-corrected parameters of the network structure using the second correction coefficient.
By continuing in this manner and making multiple corrections, the parameters of the constructed network structure can be determined.
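The "correction coefficients" above can be read as gradient-based parameter updates; the sketch below makes that assumption explicit and is not taken from the original text. A candidate structure is corrected sample by sample against the teacher's first prediction results.

```python
import torch
import torch.nn.functional as F

def correct_candidate(candidate, first_training_samples, first_prediction_results, lr=1e-3):
    """Repeatedly correct the parameters of a candidate network structure so that
    its predictions approach the teacher's predictions (illustrative only)."""
    optimizer = torch.optim.Adam(candidate.parameters(), lr=lr)
    candidate.train()
    for sample, teacher_probs in zip(first_training_samples, first_prediction_results):
        logits = candidate(sample.unsqueeze(0)).squeeze(0)
        # the "difference" between the candidate's prediction and the teacher's
        # prediction drives the correction of the candidate's parameters
        loss = F.kl_div(F.log_softmax(logits, dim=-1), teacher_probs, reduction="sum")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return candidate
```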
It can be understood that the automatic machine learning model includes a controller which, when the performance of a network structure generated by the automatic machine learning model is low, can learn a network structure with better performance, for example by using a Neural Architecture Search (NAS) algorithm such as reinforcement learning or an evolutionary algorithm; for instance, when the prediction speed of the generated network structure is not high enough, a network unit in the network structure that limits the prediction speed is replaced with a network unit that makes the network structure faster. In the present application, after determining the parameters of the constructed network structure, the automatic machine learning model may therefore also evaluate the performance of that network structure, such as its prediction speed and accuracy, and, when the performance of the generated network structure is low, train a network structure with better performance through the NAS algorithm to serve as the student network model.
It should be noted that the performance of a network structure or model in the present application may include its prediction efficiency and its prediction effect. Prediction efficiency can be characterized by the prediction speed of the network structure or model and its occupation of computing resources; in general, the more complex the network structure or model, the slower its prediction speed, the more memory and disk it occupies, and the worse its prediction efficiency. Prediction effect can be characterized by the accuracy and recall of the network structure or model; in general, the higher the accuracy and recall of the network structure or model, the better its prediction effect.
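One way to obtain the two kinds of measurement mentioned above (prediction speed and accuracy) is sketched here; the samples-per-second unit and the use of a held-out evaluation set are assumptions made for illustration.

```python
import time
import torch

@torch.no_grad()
def measure_operation_indexes(model, eval_samples, eval_labels):
    """Measure a trained structure's prediction speed (samples/s) and accuracy."""
    model.eval()
    correct, start = 0, time.perf_counter()
    for sample, label in zip(eval_samples, eval_labels):
        pred = model(sample.unsqueeze(0)).argmax(dim=-1).item()
        correct += int(pred == label)
    elapsed = time.perf_counter() - start
    return {"speed": len(eval_samples) / elapsed,       # speed operation index
            "accuracy": correct / len(eval_samples)}    # precision operation index
```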
It should also be noted that, when the automatic machine learning model constructs the network structure of the student network model, the network structure may be selected flexibly, for example an RNN or Transformer network structure, or a network structure combining RNN and CNN, and so on.
It can be understood that, in the method for generating a student network model provided by the application, the first training sample and the first prediction result are input into the automatic machine learning model, and the student network model corresponding to the teacher network model is generated through training by the automatic machine learning model.
In the method for generating a student network model of this embodiment, a teacher network model of the target field is first obtained, a first training sample is then predicted through the teacher network model to generate a first prediction result, and the first training sample and the first prediction result are then input into an automatic machine learning model to generate a student network model corresponding to the teacher network model, wherein the automatic machine learning model is used for training a network structure of the student network model according to the first training sample and the first prediction result. The student network model is thus generated from the teacher network model based on the automatic machine learning model and has excellent performance, so that using it for prediction in the prediction stage improves prediction efficiency and effect.
As can be seen from the above analysis, a student network model can be generated from the teacher network model based on the automatic machine learning model, and the student network model can then be used to predict the voice, text, and so on input by the user. In practical applications, a specific scene or task may place specific requirements on the performance of the student network model. For example, the terminal side and the server side tolerate resource consumption differently; since a larger, more computationally complex model usually gives more accurate predictions, the terminal side may place a lower requirement on the accuracy of the student network model's predictions, while the server side may place a higher one. The method for generating a student network model provided by the present application is further described below with reference to fig. 2.
Fig. 2 is a flowchart illustrating a method for generating a student network model according to a second embodiment of the present application.
As shown in fig. 2, the method for generating a student network model provided by the present application may include:
step 201, a teacher network model of the target field is obtained.
Specifically, a pre-training model may first be generated from unlabeled, domain-independent training samples, and the pre-training model may then be trained with labeled training samples of the target field to generate the teacher network model.
In an exemplary embodiment, the pre-training model and the teacher network model may be a multi-layer RNN, CNN, Transformer, or any other network model with any structure, which is not limited in this application.
The specific generation methods of the pre-training model and the teacher network model may refer to a model training method in the related art, and are not described herein again.
The unlabeled, domain-independent training samples may comprise unlabeled corpora from any field. Because such corpora are massive in scale, the pre-training model generated from them has many parameters and consumes considerable computing resources, but it also has high prediction accuracy, high recall, good prediction effect, and strong transfer ability.
And the target field is used for representing the application fields of the student network model and the teacher network model which need to be generated.
The labeled training samples of the target field comprise only specific corpora of that field and are small in scale, so training the pre-training model with them amounts to fine-tuning the parameters of the pre-training model without changing its type or number of parameters. Fine-tuning the pre-training model with the labeled training samples of the target field to generate the teacher network model makes the generated teacher network model better suited to the target field and gives it a better prediction effect in that field.
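A compact sketch of the pre-train-then-fine-tune route to the teacher network model in step 201; the loop, the cross-entropy objective, and the small learning rate are conventional choices rather than details fixed by the original text.

```python
import torch
import torch.nn.functional as F

def fine_tune_teacher(pretrained_model, labeled_samples, labels, epochs=3, lr=2e-5):
    """Fine-tune a pre-trained model on labeled target-field samples to obtain
    the teacher network model; only parameters are adjusted, not the structure."""
    optimizer = torch.optim.AdamW(pretrained_model.parameters(), lr=lr)
    pretrained_model.train()
    for _ in range(epochs):
        for sample, label in zip(labeled_samples, labels):
            logits = pretrained_model(sample.unsqueeze(0))
            loss = F.cross_entropy(logits, torch.tensor([label]))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return pretrained_model
```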
Step 202, predicting the first training sample through the teacher network model to generate a first prediction result.
It should be noted that, in the embodiment of the present application, the implementation process and principle of steps 201-202 may refer to the related description of the above embodiments, and details are not repeated herein.
The first training sample may be an unlabeled training sample of the target field, and may include unlabeled corpora of the target field.
Because the first training sample is unlabeled, its corpora can be of massive scale. In the present application, the massive first training sample and the teacher network model's first prediction result on it are used to generate the student network model, so the generated student network model has more accurate parameters and better performance.
Step 203, a plurality of basic network structures are obtained, wherein each basic network structure comprises a plurality of basic network units.
The basic network structure may be a network structure of any type, such as CNN, RNN, or Transformer. It should be noted that a basic network structure may include only a single type of network structure, for example only a CNN-type structure or only a Transformer-type structure; alternatively, it may include multiple types of network structure, that is, a basic network structure formed by combining several types, for example a combination of a CNN and an RNN, which is not limited in this application.
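To illustrate what "a basic network structure composed of several basic network units" could look like, the sketch below combines a CNN unit and an RNN unit in one structure; the unit classes and sizes are hypothetical and not taken from the original text.

```python
import torch
import torch.nn as nn

class CNNUnit(nn.Module):
    def __init__(self, d_model=256, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                          # x: (batch, seq, d_model)
        return torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)

class RNNUnit(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, x):
        out, _ = self.rnn(x)
        return out

class BasicNetworkStructure(nn.Module):
    """One candidate structure: an embedding, a list of basic network units
    (here CNN + RNN, but any mix is possible), and a classification head."""
    def __init__(self, units, vocab_size=30000, d_model=256, num_classes=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.units = nn.ModuleList(units)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)
        for unit in self.units:
            x = unit(x)
        return self.head(x.mean(dim=1))

# structure_x = BasicNetworkStructure([CNNUnit(), RNNUnit()])
```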
And step 204, training a plurality of basic network units in each basic network structure according to the first training sample and the first prediction result to generate parameters of the plurality of basic network units included in each basic network structure.
Specifically, the automatic machine learning model may first obtain a plurality of basic network structures respectively including a plurality of basic network elements, and then train the plurality of basic network elements in each basic network structure according to the first training sample and the first prediction result to generate parameters of the plurality of basic network elements included in each basic network structure.
Taking one basic network structure X as an example, the automatic machine learning model may first set the initial parameters of the plurality of basic network units included in basic network structure X. It may then use basic network structure X, with these initial parameters, to predict one of the first training samples, say training sample A1, to generate a corresponding prediction result A2, and determine a first correction coefficient according to the difference between prediction result A2 and the teacher network model's prediction for training sample A1 in the first prediction result, so as to perform a first correction on the initial parameters of the plurality of basic network units included in basic network structure X.
Basic network structure X, with its parameters set to the first-corrected parameters, may then be used to predict another of the first training samples, say training sample B1, to generate a corresponding prediction result B2. A second correction coefficient may be determined according to the difference between prediction result B2 and the teacher network model's prediction for training sample B1 in the first prediction result, so as to perform a second correction on the first-corrected parameters of the plurality of basic network units included in basic network structure X.
By continuing in this manner and making multiple corrections, the parameters of the plurality of basic network units included in basic network structure X can be determined.
In a similar manner, the parameters of the plurality of basic network elements included in each basic network structure can be generated by training the plurality of basic network elements in each basic network structure according to the first training sample and the first prediction result.
And step 205, obtaining the evaluation index of the student network model.
The evaluation index is used for evaluating the performance of the student network model, such as the prediction efficiency or the prediction effect of the student network model.
In an exemplary embodiment, the evaluation index may include a speed index and a precision index, so that the prediction speed and prediction precision of the student network model can be evaluated with the speed index and the precision index respectively; alternatively, the evaluation index may include only a speed index, for evaluating the prediction speed of the student network model, or only a precision index, for evaluating its prediction precision.
Specifically, the evaluation index of the student network model may be set by the user as needed; for example, if a student network model with a prediction speed greater than 20 k/s (kilobytes per second) needs to be generated, the user may set the speed index included in the evaluation index to greater than 20 k/s. Alternatively, the evaluation index may be determined by the automatic machine learning model according to the operation indexes, such as operation speed, of the trained plurality of basic network structures, or determined in other ways, which is not limited in this application.
It should be noted that step 205 may be executed before step 201, after step 201 and before step 202, after step 202 and before step 203, and so on; the application does not limit the execution timing of step 205.
And step 206, taking the basic network structure meeting the evaluation index in the trained multiple basic network structures as a student network model.
Specifically, after the parameters of the plurality of basic network units included in each basic network structure are generated, each basic network structure can be used to predict certain training samples, and an operation index for each basic network structure is determined from its behaviour during that prediction. Whether each trained basic network structure satisfies the evaluation index is then judged from its operation index and the evaluation index, and the basic network structures among the trained basic network structures that satisfy the evaluation index are used as the student network model. The training samples used here may be corpora included in the first training sample or other corpora, which is not limited in this application.
That is, before step 206, it may further include:
acquiring operation indexes corresponding to each basic network structure after training;
and judging whether each trained basic network structure meets the evaluation index or not according to the operation index and the evaluation index corresponding to each basic network structure.
The operation index can be determined by the automatic machine learning model according to the evaluation index and the performance of the basic network structure in the process of predicting the training sample. For example, when the evaluation index includes a speed index, the operation index correspondingly includes a speed operation index, when the evaluation index includes a precision index, the operation index correspondingly includes a precision operation index, and when the evaluation index includes a speed index and a precision index, the operation index correspondingly includes a speed operation index and a precision operation index.
In an exemplary embodiment, when the evaluation index includes a speed index, if a speed operation index corresponding to a certain basic network structure is higher than the speed index, it may be determined that the basic network structure satisfies the evaluation index; when the evaluation index comprises a precision index, if the precision operation index corresponding to a certain basic network structure is higher than the precision index, the basic network structure can be determined to meet the evaluation index; when the evaluation index includes a speed index and a precision index, if the speed operation index corresponding to a certain basic network structure is higher than the speed index and the precision operation index is higher than the precision index, it can be determined that the basic network structure meets the evaluation index.
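A hedged sketch of the comparison described above; the dictionary keys and the "higher is better" convention mirror the text, while the concrete thresholds are only examples.

```python
def satisfies_evaluation_index(operation_index, evaluation_index):
    """Return True if a trained basic network structure meets every index that
    the evaluation index actually specifies (speed and/or precision)."""
    for name, threshold in evaluation_index.items():    # e.g. {"speed": 20.0, "accuracy": 0.9}
        if operation_index.get(name, 0.0) <= threshold:
            return False
    return True

# Example (step 206): keep only the structures that satisfy the evaluation index.
evaluation_index = {"speed": 20.0}                       # e.g. prediction speed > 20 k/s
# student_models = [s for s in trained_structures
#                   if satisfies_evaluation_index(measure_operation_indexes(s, x, y),
#                                                 evaluation_index)]
```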
It should be noted that, among the trained plurality of basic network structures, the number of basic network structures satisfying the evaluation index may be one or more; in this embodiment of the application, a preset number of the basic network structures satisfying the evaluation index may be used as student network models as needed.
In this way, the plurality of basic network units in each basic network structure are trained according to the first training sample and the first prediction result, and the basic network structures among the trained plurality of basic network structures that satisfy the evaluation index are used as the student network model according to each structure's operation index and the obtained evaluation index. One or more student network models meeting specific requirements can thus be generated as needed, so the generated student network model can satisfy the performance requirements that a specific scene or task places on it.
In the method for generating a student network model of this embodiment, a teacher network model of the target field is obtained, a first training sample is predicted through the teacher network model to generate a first prediction result, a plurality of basic network structures each including a plurality of basic network units are obtained, the plurality of basic network units in each basic network structure are trained according to the first training sample and the first prediction result to generate the parameters of the plurality of basic network units included in each basic network structure, and the evaluation index of the student network model is obtained, so that the basic network structures among the trained plurality of basic network structures that satisfy the evaluation index are used as the student network model. The student network model is thus generated from the teacher network model based on the automatic machine learning model, and the generated student network model has excellent performance and satisfies the evaluation index, so that using it for prediction in the prediction stage improves prediction efficiency and effect.
As can be seen from the above analysis, in the present application, based on the automatic machine learning model, a plurality of basic network units included in each of the plurality of basic network structures may be trained according to a first prediction result of the teacher network model on the first training sample and the first training sample, and a basic network structure that satisfies the evaluation index among the plurality of basic network structures after training is used as the student network model. In a possible implementation form, the trained multiple basic network structures may not meet the evaluation index, and the generation method of the student network model provided in the present application is further described below with reference to fig. 3 for the above situation.
Fig. 3 is a flowchart illustrating a method for generating a student network model according to a third embodiment of the present application.
As shown in fig. 3, the method for generating a student network model provided by the present application may include:
step 301, a teacher network model of the target field is obtained.
Step 302, the first training sample is predicted through the teacher network model to generate a first prediction result.
Step 303, obtaining a plurality of basic network structures, wherein each basic network structure comprises a plurality of basic network units.
And step 304, training a plurality of basic network units in each basic network structure according to the first training sample and the first prediction result to generate parameters of the plurality of basic network units included in each basic network structure.
And 305, obtaining an evaluation index of the student network model.
In the embodiment of the present application, the evaluation index may include a speed index or a precision index, or include a speed index and a precision index, which is not limited in the present application. The present embodiment will be described by taking an example in which the evaluation index includes a speed index.
And step 306, acquiring the operation index corresponding to each basic network structure after training.
The specific implementation process and principle of the steps 301-306 can refer to the related description of the above embodiments, and are not described herein again.
Step 307, judging whether each trained basic network structure meets the evaluation index according to the operation index and the evaluation index corresponding to each basic network structure, if so, executing step 308, otherwise, executing step 309.
And 308, taking the basic network structure meeting the evaluation index in the trained multiple basic network structures as a student network model.
Specifically, after the parameters of the plurality of basic network units included in each basic network structure are generated, each basic network structure can be used to predict certain training samples, and an operation index for each basic network structure is determined from its behaviour during that prediction, so that whether each trained basic network structure satisfies the evaluation index is judged from its operation index and the evaluation index. The training samples used here may be corpora included in the first training sample or other corpora, which is not limited in this application.
Specifically, if it is determined that at least one basic network structure among the plurality of basic network structures after training satisfies the evaluation index, the basic network structure satisfying the evaluation index may be used as the student network model.
And 309, replacing the basic network unit in each basic network structure by using the alternative basic network unit with higher speed.
It can be understood that the automatic machine learning model includes a controller which, when the performance of a network structure generated by the automatic machine learning model does not meet a preset requirement, can learn a network structure with better performance by using a NAS algorithm such as reinforcement learning or an evolutionary algorithm; for example, when the prediction speed of the generated student network model is not high enough, a network unit in the student network model's structure that limits the prediction speed is replaced with a network unit that makes the structure faster.
In the present application, therefore, if it is determined that none of the trained plurality of basic network structures satisfies the evaluation index, for example none of them satisfies the speed index, the automatic machine learning model may obtain faster alternative basic network units through the NAS algorithm to replace basic network units in each basic network structure and then retrain, until a basic network structure among the trained plurality of basic network structures satisfies the evaluation index, whereupon that basic network structure is used as the student network model.
In a specific implementation, taking the case where the evaluation index includes a speed index as an example, the automatic machine learning model may pre-construct a plurality of basic network units and determine the prediction speed, precision, and so on of each. The pre-constructed basic network units other than those already included in the plurality of basic network structures serve as alternative basic network units. When none of the trained plurality of basic network structures satisfies the speed index, an alternative basic network unit faster than the basic network units currently included in the plurality of basic network structures can be selected, according to the speed, precision, and so on of the pre-constructed basic network units, to replace a basic network unit in each basic network structure.
For example, assume the evaluation index includes a prediction speed greater than 20 k/s, and the automatic machine learning model pre-constructs five basic network units identified as "1" to "5", with basic network unit "5" the fastest, followed in order by units "4", "3" and "2", and unit "1" the slowest. Suppose basic network structure X includes basic network units "1" and "2", basic network structure Y includes basic network units "2" and "3", and the prediction speeds of the trained structures X and Y are 18 k/s and 17 k/s respectively. Basic network unit "1" or "2" in structure X may then be replaced with basic network unit "3", and basic network unit "2" or "3" in structure Y with basic network unit "4", after which structure X (now comprising units "1" and "3", or "2" and "3") and structure Y (now comprising units "2" and "4", or "3" and "4") are retrained. If the prediction speeds of the retrained structures X and Y are still less than 20 k/s, any basic network unit in structure X may be replaced with unit "4" and any basic network unit in structure Y with unit "5", and the replaced structures X and Y retrained, until a trained basic network structure with a prediction speed greater than 20 k/s exists, whereupon that basic network structure is used as the student network model.
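The replace-and-retrain strategy in the example above can be sketched as a simple loop; the faster_alternative_for helper, the unit_speed attribute, and the stopping condition are assumptions made for illustration, not a prescribed NAS algorithm.

```python
def search_student_by_replacement(structures, train_fn, measure_fn,
                                  faster_alternative_for, speed_index=20.0, max_rounds=10):
    """Retrain candidate structures, replacing their slowest basic network unit with a
    faster pre-built alternative, until some structure meets the speed index.
    unit_speed is assumed to be a pre-measured attribute of each basic network unit."""
    for _ in range(max_rounds):
        for structure in structures:
            train_fn(structure)                              # train the units on the first samples
            if measure_fn(structure)["speed"] > speed_index:
                return structure                             # becomes the student network model
        for structure in structures:                         # none met the index: replace units
            idx = min(range(len(structure.units)),
                      key=lambda i: structure.units[i].unit_speed)
            structure.units[idx] = faster_alternative_for(structure.units[idx])
    return None
```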
It should be noted that, in practical applications, the automatic machine learning model may decide as needed which basic network unit in a basic network structure is replaced by a faster alternative basic network unit, and the faster alternative basic network unit used for replacement may likewise be selected as needed: when the gap between a trained basic network structure's operation index and the evaluation index is large, a faster alternative basic network unit may be chosen, and when the gap is small, a relatively slower one may be chosen. In addition, the alternative basic network unit used for replacement may be faster than every basic network unit in the plurality of basic network structures, in which case it can replace any basic network unit they include; or it may be faster than only some of the basic network units in the plurality of basic network structures, in which case it can replace any basic network unit whose speed is lower than its own.
In an exemplary embodiment, if the evaluation index includes a precision index, then after the parameters of the plurality of basic network units in each basic network structure are determined, a precision operation index for each basic network structure may be determined in a similar manner. If at least one of the trained plurality of basic network structures satisfies the precision index, that basic network structure may be used as the student network model. If none of the trained plurality of basic network structures satisfies the precision index, alternative basic network units with higher precision may be obtained to replace basic network units in each basic network structure, and training is performed again until a trained basic network structure satisfying the precision index exists, whereupon that basic network structure is used as the student network model.
In an exemplary embodiment, if the evaluation index includes both a speed index and a precision index, then after the parameters of the plurality of basic network units in each basic network structure are determined, a precision operation index and a speed operation index for each basic network structure may be determined in a similar manner. If at least one of the trained plurality of basic network structures satisfies both the precision index and the speed index, that basic network structure may be used as the student network model. If all of the trained basic network structures satisfy the precision index but not the speed index, faster alternative basic network units may be obtained to replace basic network units in each basic network structure, and training is performed again until a trained basic network structure satisfying both indexes exists, whereupon it is used as the student network model. If all of the trained basic network structures satisfy the speed index but not the precision index, alternative basic network units with higher precision may be obtained to replace basic network units in each basic network structure, and training is performed again until a trained basic network structure satisfying both indexes exists, whereupon it is used as the student network model. If none of the trained basic network structures satisfies either the speed index or the precision index, alternative basic network units that are both faster and more precise may be obtained, for example by selecting the fastest alternative basic network unit among those with higher computational complexity, to replace basic network units in each basic network structure; training is then performed again and, according to the operation index and the evaluation index of each retrained basic network structure, it is again judged whether each satisfies the precision index and the speed index, until a trained basic network structure satisfying both exists, whereupon it is used as the student network model.
When none of the plurality of basic network structures satisfies the evaluation index, replacing basic network units in each basic network structure with alternative basic network units that are faster, more precise, or both ensures that, even when the basic network structures generated with the first training sample and the first prediction result do not satisfy the evaluation index, a basic network structure satisfying the evaluation index is still generated as the student network model based on the automatic machine learning model, so that the generated student network model can satisfy the performance requirements a specific scene or task places on it.
In the method for generating a student network model of this embodiment, a teacher network model of the target field is first obtained; a first training sample is then predicted through the teacher network model to generate a first prediction result; a plurality of basic network structures, each including a plurality of basic network units, are then obtained; the plurality of basic network units in each basic network structure are trained according to the first training sample and the first prediction result to generate the parameters of the plurality of basic network units included in each basic network structure; the evaluation index of the student network model and the operation index corresponding to each trained basic network structure are then obtained; and whether each trained basic network structure satisfies the evaluation index is judged from its operation index and the evaluation index. If at least one basic network structure satisfies the evaluation index, that basic network structure is used as the student network model; if none of the trained basic network structures satisfies the evaluation index, basic network units in each basic network structure are replaced with faster alternative basic network units. The student network model is thus generated from the teacher network model based on automatic machine learning, and the generated student network model has excellent performance and satisfies the evaluation index, so that using it for prediction in the prediction stage improves prediction efficiency and effect.
As can be seen from the above analysis, after a plurality of basic network structures each including a plurality of basic network units are obtained, the plurality of basic network units in each basic network structure are trained according to the first training sample and the first prediction result, and the basic network structures among the trained plurality of basic network structures that satisfy the evaluation index are used as the student network model. In practical applications, if a large-scale training sample is used directly to generate the student network model, the trained basic network structures may still fail to satisfy the evaluation index, and training basic network structures that include many basic network units on such a sample is time-consuming. For this situation, the method for generating a student network model provided by the present application is further described below with reference to fig. 4.
Fig. 4 is a flowchart illustrating a generation method of a student network model according to a fourth embodiment of the present application.
As shown in fig. 4, the method for generating a student network model provided by the present application may include:
step 401, a teacher network model of the target field is obtained.
Step 402, predicting the first training sample through the teacher network model to generate a first prediction result.
Step 403, obtaining a plurality of basic network structures, wherein each basic network structure comprises a plurality of basic network units.
The specific implementation process and principle of the steps 401-403 may refer to the description of the foregoing embodiments, and are not described herein again.
In step 404, a first training sample set and a second training sample set are extracted from the first training sample and the first prediction result.
And the number of training samples in the second training sample set is greater than that of the training samples in the first training sample set.
The first training sample set comprises a part of the first training samples and the corresponding part of the first prediction results, and the second training sample set comprises another part of the first training samples and the corresponding part of the first prediction results.
In an exemplary embodiment, the proportions of the first training sample set and the second training sample set in the first training sample and the first prediction result may be set arbitrarily as needed, provided that the number of training samples in the second training sample set is greater than the number of training samples in the first training sample set.
For example, the first training sample set may include 10% of the first training samples and the corresponding 10% of the first prediction results, and the second training sample set may include the remaining 90% of the first training samples and the corresponding 90% of the first prediction results.
It should be noted that the training samples and the prediction results included in the first training sample set and in the second training sample set correspond to each other; that is, the prediction results in the first training sample set are the predictions of the teacher network model for the training samples in the first training sample set, and the prediction results in the second training sample set are the predictions of the teacher network model for the training samples in the second training sample set.
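To make the pairing and the split concrete, the following is a minimal sketch in Python, assuming in-memory lists of samples and teacher predictions; the function name, the random split, and the 10%/90% ratio are illustrative assumptions rather than the application's implementation.

```python
import random

def split_sample_sets(samples, predictions, small_ratio=0.1, seed=0):
    """Split paired (training sample, teacher prediction) data into a small first
    training sample set and a large second training sample set, keeping each
    sample aligned with its corresponding teacher prediction."""
    assert len(samples) == len(predictions)
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * small_ratio)
    first_set = [(samples[i], predictions[i]) for i in indices[:cut]]
    second_set = [(samples[i], predictions[i]) for i in indices[cut:]]
    return first_set, second_set
```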
In step 405, a plurality of basic network elements in each basic network structure are initially trained according to the first training sample set.
Specifically, the way of primarily training the multiple basic network units in each basic network structure according to the first training sample set is similar to the way of training the multiple basic network units in each basic network structure according to the first training sample and the first prediction result in the foregoing embodiment, and details are not repeated here.
And step 406, obtaining the evaluation index of the student network model.
Step 407, generating a preliminary evaluation index according to the evaluation index.
The preliminary evaluation index may be set as needed; for example, when the evaluation index includes a speed greater than 20 k/s, the preliminary evaluation index may include a speed greater than 20 k/s × 110%, that is, 22 k/s.
It should be noted that step 406 may be executed after step 405 or before step 405, and the execution timing of step 406 is not limited in the present application. However, since the preliminary evaluation index is generated according to the evaluation index, step 406 needs to be performed before step 407.
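As a hedged illustration of the derivation in step 407, the sketch below scales a speed requirement by a margin; the dictionary layout, the key name, and the 110% default are assumptions consistent only with the example given above.

```python
def make_preliminary_index(evaluation_index, margin=1.10):
    """Derive a preliminary evaluation index from the final evaluation index by
    scaling the speed requirement, e.g. a 20 k/s requirement becomes 22 k/s."""
    preliminary = dict(evaluation_index)
    if "speed" in preliminary:
        preliminary["speed"] = evaluation_index["speed"] * margin
    return preliminary

# Example: make_preliminary_index({"speed": 20.0}) -> a speed requirement of about 22 k/s
```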
Step 408, judging whether the plurality of preliminarily trained basic network structures meet the preliminary evaluation index.
Step 409, if the preliminary evaluation index is met, continuing to train, according to the second training sample set, the plurality of basic network units in each basic network structure that meets the preliminary evaluation index, so as to generate parameters of the plurality of basic network units included in each basic network structure.
And step 410, taking the basic network structure meeting the evaluation index in the trained multiple basic network structures as a student network model.
It can be understood that, in the present application, the first training sample set with the smaller number of training samples may be used first to preliminarily train the plurality of basic network units in each basic network structure, after which it is judged whether the preliminarily trained basic network structures meet the preliminary evaluation index. If none of the preliminarily trained basic network structures meets the preliminary evaluation index, the basic network units in each basic network structure may be replaced with alternative basic network units with higher speed, higher precision, or both higher speed and higher precision, and the first training sample set is then used again for training. If part or all of the preliminarily trained basic network structures meet the preliminary evaluation index, the plurality of basic network units in each basic network structure that meets the preliminary evaluation index are further trained according to the second training sample set with the larger number of training samples, and the basic network structure that meets the evaluation index among the trained basic network structures is taken as the student network model.
Therefore, the plurality of basic network units in each basic network structure are preliminarily trained with the small-scale first training sample set, and the plurality of basic network units in each basic network structure that meets the preliminary evaluation index are then further trained with the large-scale second training sample set. This reduces the redundant training that occurs when the basic network structures are trained directly with large-scale training samples yet fail to meet the evaluation index, and thus reduces the training time and computing resource consumption in the generation of the student network model.
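The staged strategy described above can be outlined in code. The sketch below is a hypothetical illustration, not the application's implementation: `train`, `run_index_of`, and `replace_with_faster_units` are assumed callables supplied by the caller, and a single scalar index stands in for the full set of evaluation indexes.

```python
def search_student_structure(structures, first_set, second_set,
                             preliminary_index, evaluation_index,
                             train, run_index_of, replace_with_faster_units):
    """Preliminarily train on the small set, keep only structures meeting the
    preliminary index, continue training them on the large set, and return the
    first structure meeting the evaluation index; otherwise swap in alternative
    basic network units and repeat."""
    while True:
        for structure in structures:
            train(structure, first_set)            # preliminary training
        survivors = [s for s in structures if run_index_of(s) >= preliminary_index]
        if survivors:
            for structure in survivors:
                train(structure, second_set)       # continued training on the large set
            for structure in survivors:
                if run_index_of(structure) >= evaluation_index:
                    return structure               # student network model
        # No structure qualified: replace basic network units and try again.
        structures = [replace_with_faster_units(s) for s in structures]
```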
The method for generating the student network model comprises: firstly acquiring a teacher network model of a target field; then predicting a first training sample through the teacher network model to generate a first prediction result; then acquiring a plurality of basic network structures, wherein each basic network structure comprises a plurality of basic network units; then extracting a first training sample set and a second training sample set from the first training sample and the first prediction result; then preliminarily training the plurality of basic network units in each basic network structure according to the first training sample set; acquiring an evaluation index of the student network model and generating a preliminary evaluation index according to the evaluation index; then judging whether the plurality of preliminarily trained basic network structures meet the preliminary evaluation index; if the preliminary evaluation index is met, further continuing to train, according to the second training sample set, the plurality of basic network units in each basic network structure that meets the preliminary evaluation index, so as to generate parameters of the plurality of basic network units included in each basic network structure; and taking the basic network structure that meets the evaluation index among the trained basic network structures as the student network model. Therefore, the student network model is generated according to the teacher network model based on the automatic machine learning model, and the generated student network model performs well and meets the evaluation index, so that prediction is performed with the generated student network model in the prediction stage, improving prediction efficiency and effect.
The method for generating a student network model provided by the present application is further described below with reference to a flowchart of the method for generating a student network model shown in fig. 5.
Fig. 5 is a flowchart of a method for generating a student network model according to a fifth embodiment of the present application.
As shown in fig. 5, the method for generating a student network model provided in the embodiment of the present application may include three stages, where the first stage is a pre-training stage, the second stage is a fine-tuning stage, and the third stage is an automatic machine learning model distillation stage.
In the first stage, a general large-scale unlabeled corpus, which is an unlabeled and domain-free corpus, may be obtained (step 501), and a pre-training model may then be generated using this general large-scale unlabeled corpus as training samples (step 502). Because the unlabeled, domain-free training samples are massive and large-scale, the pre-training model generated from them has a large number of parameters and consumes substantial computing resources, but its prediction accuracy and recall rate are high, its prediction effect is good, and its transfer capability is strong.
Then, in the second stage, a small amount of domain-specific labeled corpus, that is, labeled corpus of the target domain, may be obtained (step 503), and the pre-training model is then fine-tuned using this small amount of domain-specific labeled corpus as training samples, so as to obtain a domain-fine-tuned large model as the teacher network model (step 504). Because the labeled training samples of the target domain include only corpus of the specific domain and are small in scale, training the pre-training model with them merely fine-tunes its parameters and changes neither the model type nor the parameter quantity. Fine-tuning the pre-training model with the labeled training samples of the target domain to generate the teacher network model therefore makes the generated teacher network model better suited to the target domain, with a better prediction effect in the target domain.
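A minimal fine-tuning sketch, assuming a PyTorch-style classification model and a data loader of labeled target-domain examples; the hyper-parameters and names are illustrative assumptions, not the application's code.

```python
import torch
from torch import nn, optim

def finetune_teacher(pretrained_model: nn.Module, domain_loader, epochs=3, lr=2e-5):
    """Fine-tune a general pre-trained model on a small labeled target-domain corpus;
    only the parameters are adjusted, the model type and parameter count stay the same."""
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.AdamW(pretrained_model.parameters(), lr=lr)
    pretrained_model.train()
    for _ in range(epochs):
        for inputs, labels in domain_loader:
            optimizer.zero_grad()
            loss = criterion(pretrained_model(inputs), labels)
            loss.backward()
            optimizer.step()
    return pretrained_model  # domain-adapted teacher network model
```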
In the third stage, a massive unlabeled corpus of the specific field, that is, the target field, may be obtained (step 505). The unlabeled corpus of the specific field is then used as the first training sample, and the first training sample is predicted by the teacher network model to obtain the prediction result of the teacher network model (step 506), that is, the first prediction result of the present application, after which the first training sample and the first prediction result are input into the automatic machine learning model. After acquiring a demand instruction of the user for speed and precision, the automatic machine learning model may generate a preliminary evaluation index according to the demand instruction, automatically construct a plurality of basic network structures, each of which includes a plurality of basic network units (step 507), automatically adjust the parameters of each basic network unit (step 508), and then perform model training according to the large-scale prediction result and the first training sample (step 509). Specifically, during model training, a first training sample set and a second training sample set may be extracted from the first training sample and the first prediction result, where the number of training samples in the second training sample set is greater than the number of training samples in the first training sample set; the basic network units in each basic network structure are preliminarily trained according to the first training sample set; it is then judged whether the plurality of preliminarily trained basic network structures meet the preliminary evaluation index, and if so, the plurality of basic network units in each basic network structure that meets the preliminary evaluation index are further trained using the second training sample set. After the plurality of basic network structures are trained, it may be determined whether the trained basic network structures meet the requirements for speed and precision (step 510), and the basic network structures meeting the requirements are used as the student network model (step 511). If none of the trained basic network structures meets the speed and precision requirements, the basic network units in the basic network structures are replaced with alternative basic network units with higher speed and precision, and the new basic network structures are then trained again according to the above process until a basic network structure meeting the speed and precision requirements exists among the trained basic network structures, which is then taken as the student network model.
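The third stage can be sketched as follows, again as a hypothetical outline: `teacher` is assumed to be a PyTorch-style module callable on one corpus item, and `automl_search` stands for the structure search sketched earlier; neither name comes from the application.

```python
import torch

def distill_with_automl(teacher, unlabeled_corpus, automl_search):
    """The fine-tuned teacher labels the massive unlabeled target-domain corpus,
    and the (sample, prediction) pairs are handed to the automatic machine learning
    search that builds, trains and evaluates candidate student structures."""
    teacher.eval()
    first_training_samples, first_predictions = [], []
    with torch.no_grad():
        for item in unlabeled_corpus:
            first_training_samples.append(item)
            first_predictions.append(teacher(item))  # teacher's prediction (soft label)
    return automl_search(first_training_samples, first_predictions)
```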
Through the above process, the teacher network model can be generated, and a student network model that performs well and meets the user requirements is generated from the teacher network model based on the automatic machine learning model, so that prediction is performed with the generated student network model in the prediction stage, improving prediction efficiency and effect. Moreover, the plurality of basic network units in each basic network structure are preliminarily trained with the small-scale training sample set, and the plurality of basic network units in each basic network structure that meets the preliminary evaluation index are then further trained with the large-scale training sample set, which reduces the redundant training that occurs when the basic network structures are trained directly with large-scale training samples yet fail to meet the evaluation index, and thus reduces the training time and computing resource consumption in the generation of the student network model.
Next, a device for generating a student network model according to the present application will be described with reference to fig. 6.
Fig. 6 is a schematic structural diagram of a generation apparatus of a student network model according to a sixth embodiment of the present application.
As shown in fig. 6, the device 10 for generating a student network model provided by the present application includes:
the acquisition module 11 is used for acquiring a teacher network model of the target field;
the first generation module 12 is used for predicting the first training sample through the teacher network model to generate a first prediction result; and
and a second generating module 13, configured to input the first training sample and the first prediction result into an automatic machine learning model to generate a student network model corresponding to the teacher network model, where the automatic machine learning model is used to train a network structure of the student network model according to the first training sample and the first prediction result.
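To make the module layout concrete, here is a hypothetical skeleton in Python; the class and attribute names are assumptions, and each module is represented as a plain callable rather than the application's actual implementation.

```python
class StudentModelGenerationDevice:
    """Skeleton of the device: acquisition module, first generation module,
    and second generation module wired together."""

    def __init__(self, acquisition_module, first_generation_module, second_generation_module):
        self.acquisition_module = acquisition_module              # obtains the teacher model
        self.first_generation_module = first_generation_module    # teacher predicts the samples
        self.second_generation_module = second_generation_module  # AutoML builds the student

    def generate(self, target_domain, first_training_samples):
        teacher = self.acquisition_module(target_domain)
        first_predictions = self.first_generation_module(teacher, first_training_samples)
        return self.second_generation_module(first_training_samples, first_predictions)
```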
Specifically, the device for generating the student network model provided by the application can execute the method for generating the student network model provided by the application, and the device can be configured in the electronic equipment to generate the student network model according to the teacher network model by using the automatic machine learning model. The electronic device may be a terminal, a server, or the like, which is not limited in this application.
The description of the method for generating a student network model in the foregoing embodiment is also applicable to the device 10 for generating a student network model in the embodiment of the present application, and is not described herein again.
With the device for generating the student network model provided by the present application, a teacher network model of the target field is acquired, a first training sample is predicted through the teacher network model to generate a first prediction result, and the first training sample and the first prediction result are input into an automatic machine learning model to generate a student network model corresponding to the teacher network model, wherein the automatic machine learning model is used for training the network structure of the student network model according to the first training sample and the first prediction result. Therefore, the student network model is generated according to the teacher network model based on the automatic machine learning model, and the generated student network model performs well, so that prediction is performed with the generated student network model in the prediction stage, improving prediction efficiency and effect.
The generation apparatus of the student network model provided in the present application is further described below with reference to fig. 7.
Fig. 7 is a schematic structural diagram of a generation apparatus of a student network model according to a seventh embodiment of the present application.
As shown in fig. 7, the device 10 for generating a student network model provided by the present application includes:
the acquisition module 11 is used for acquiring a teacher network model of the target field;
the first generation module 12 is used for predicting the first training sample through the teacher network model to generate a first prediction result; and
and a second generating module 13, configured to input the first training sample and the first prediction result into an automatic machine learning model to generate a student network model corresponding to the teacher network model, where the automatic machine learning model is used to train a network structure of the student network model according to the first training sample and the first prediction result.
In an exemplary embodiment, the first training sample may be an unlabeled training sample of the target domain.
In an exemplary embodiment, as shown in fig. 7, the second generating module 13 may include:
a first obtaining unit 131, configured to obtain a plurality of basic network structures, where each basic network structure includes a plurality of basic network elements;
a first generating unit 132, configured to train a plurality of basic network units in each basic network structure according to the first training sample and the first prediction result to generate parameters of the plurality of basic network units included in each basic network structure;
a second obtaining unit 133, configured to obtain an evaluation index of the student network model;
and the first processing unit 134 is configured to use, as the student network model, a basic network structure that meets the evaluation index among the plurality of basic network structures after training.
In an exemplary embodiment, the first generating unit 132 may include:
the extraction subunit is used for extracting a first training sample set and a second training sample set from the first training sample and the first prediction result, wherein the number of training samples in the second training sample set is greater than that of the training samples in the first training sample set;
the first training subunit is used for carrying out preliminary training on a plurality of basic network units in each basic network structure according to a first training sample set;
a generating subunit, configured to generate a preliminary evaluation index according to the evaluation index;
the judging subunit is used for judging whether the plurality of basic network structures after the initial training meet the initial evaluation index; and
and the second training subunit is used for further continuing to train, according to the second training sample set, the plurality of basic network units in each basic network structure that meets the preliminary evaluation index, when at least one of the plurality of preliminarily trained basic network structures meets the preliminary evaluation index.
In an exemplary embodiment, the evaluation index may include a speed index and/or a precision index, and as shown in fig. 7, the second generating module 13 may further include:
a second processing unit 135, configured to, when none of the plurality of basic network structures meets the speed index and/or the accuracy index, replace a basic network element in each basic network structure with an alternative basic network element with a faster speed and/or a higher accuracy.
In an exemplary embodiment, the obtaining module 11 may include:
a second generating unit 111, configured to generate a pre-training model according to the training samples without labels and without fields; and
and a third generating unit 112, configured to train the pre-training model according to the labeled training samples of the target domain to generate a teacher network model.
In an exemplary embodiment, the second generating module may further include:
a third obtaining unit 136, configured to obtain an operation index corresponding to each basic network structure after training;
the judging unit 137 is configured to judge whether each trained basic network structure meets the evaluation index according to the operation index and the evaluation index corresponding to each basic network structure.
The description of the method for generating a student network model in the foregoing embodiment is also applicable to the device 10 for generating a student network model in the embodiment of the present application, and is not described herein again.
With the device for generating the student network model provided by the present application, a teacher network model of the target field is acquired, a first training sample is predicted through the teacher network model to generate a first prediction result, and the first training sample and the first prediction result are input into an automatic machine learning model to generate a student network model corresponding to the teacher network model, wherein the automatic machine learning model is used for training the network structure of the student network model according to the first training sample and the first prediction result. Therefore, the student network model is generated according to the teacher network model based on the automatic machine learning model, and the generated student network model performs well, so that prediction is performed with the generated student network model in the prediction stage, improving prediction efficiency and effect.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 8 is a block diagram of an electronic device according to a method for generating a student network model according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 8, the electronic apparatus includes: one or more processors 801, a memory 802, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). One processor 801 is taken as an example in fig. 8.
The memory 802 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the method for generating a student network model provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the generation method of the student network model provided by the present application.
The memory 802 is a non-transitory computer readable storage medium, and can be used for storing non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (for example, the acquisition module 11, the first generation module 12, and the second generation module 13 shown in fig. 7) corresponding to the generation method of the student network model in the embodiment of the present application. The processor 801 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 802, that is, implements the generation method of the student network model in the above-described method embodiment.
The memory 802 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device of the generation method of the student network model, and the like. Further, the memory 802 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 802 optionally includes memory located remotely from the processor 801, which may be connected over a network to the electronics of the student network model generation method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the generation method of the student network model may further include: an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or other means, and are exemplified by a bus in fig. 8.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device of the generation method of the student network model, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output devices 804 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiment of the application, a teacher network model in a target field is obtained firstly, then a first training sample is predicted through the teacher network model to generate a first prediction result, and then the first training sample and the first prediction result are input into an automatic machine learning model to generate a student network model corresponding to the teacher network model, wherein the automatic machine learning model is used for training a network structure of the student network model according to the first training sample and the first prediction result. Therefore, the student network model is generated according to the teacher network model based on the automatic machine learning model, the generated student network model is excellent in performance, and therefore prediction is carried out by using the generated student network model in the prediction stage, and prediction efficiency and effect are improved.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (16)

1. A method for generating a student network model for natural language processing, the student network model being used for predicting speech or text input by a user, the method comprising:
acquiring a teacher network model of a target field;
predicting a first training sample through the teacher network model to generate a first prediction result, wherein the first training sample is a corpus; and
inputting the first training sample and the first prediction result into an automatic machine learning model to generate a student network model for natural language processing corresponding to the teacher network model, wherein the automatic machine learning model is used for training a network structure of the student network model according to the first training sample and the first prediction result;
wherein the inputting the first training sample and the first prediction result into an automatic machine learning model to generate a student network model for natural language processing corresponding to the teacher network model comprises:
obtaining a plurality of basic network structures, each of which includes a plurality of basic network elements, extracting a first training sample set and a second training sample set from the first training sample and the first prediction result, where the number of training samples in the second training sample set is greater than the number of training samples in the first training sample set, performing preliminary training on the plurality of basic network elements in each of the basic network structures according to the first training sample set, and when the plurality of basic network structures after the preliminary training satisfy a preliminary evaluation index, further performing continuous training on the plurality of basic network elements in each of the basic network structures after the preliminary training satisfies the preliminary evaluation index according to the second training sample set to generate parameters of the plurality of basic network elements included in each of the basic network structures, and using a basic network structure which meets the evaluation index of the student network model in the plurality of trained basic network structures as the student network model.
2. The method of generating a student network model for natural language processing as claimed in claim 1 further comprising:
and obtaining the evaluation index of the student network model.
3. The method of generating a student network model for natural language processing as claimed in claim 2 further comprising:
and generating a preliminary evaluation index according to the evaluation index.
4. The method of generating a student network model for natural language processing as claimed in claim 2 wherein the evaluation index comprises a speed index and/or a precision index, the method further comprising:
and if the plurality of basic network structures do not meet the speed index and/or the precision index, replacing the basic network elements in each basic network structure by using alternative basic network elements with higher speed and/or higher precision.
5. The method of generating a student network model for natural language processing according to claim 1, wherein said obtaining a teacher network model of a target field includes:
generating a pre-training model according to the unmarked and domain-free training sample; and
and training the pre-training model according to the labeled training samples of the target field to generate the teacher network model.
6. The method of generating a student network model for natural language processing as claimed in claim 5 wherein the first training sample is an unlabeled training sample of the target domain.
7. The method for generating a student network model for natural language processing according to claim 1, wherein before said taking as the student network model a basic network structure that satisfies an evaluation index of the student network model among the plurality of basic network structures after training, further comprising:
acquiring operation indexes corresponding to each basic network structure after training;
and judging whether each trained basic network structure meets the evaluation index or not according to the operation index and the evaluation index corresponding to each basic network structure.
8. An apparatus for generating a student network model for natural language processing, the student network model being used for predicting speech or text input by a user, the apparatus comprising:
the acquisition module is used for acquiring a teacher network model of the target field;
the teacher network model is used for predicting a first training sample to generate a first prediction result, wherein the first training sample is a corpus; and
a second generation module, configured to input the first training sample and the first prediction result into an automatic machine learning model to generate a student network model for natural language processing corresponding to the teacher network model, where the automatic machine learning model is used to train a network structure of the student network model according to the first training sample and the first prediction result;
wherein the second generating module comprises:
a first obtaining unit, configured to obtain a plurality of basic network structures, where each of the basic network structures includes a plurality of basic network elements;
a first generating unit, configured to extract a first training sample set and a second training sample set from the first training sample and the first prediction result, where the number of training samples in the second training sample set is greater than the number of training samples in the first training sample set, perform preliminary training on the plurality of basic network units in each basic network structure according to the first training sample set, and when the plurality of basic network structures after the preliminary training satisfy a preliminary evaluation index, further perform continuous training on the plurality of basic network units in each basic network structure after the preliminary evaluation index is satisfied according to the second training sample set, so as to generate parameters of the plurality of basic network units included in each basic network structure;
a first processing unit, configured to use, as the student network model, a basic network structure that satisfies an evaluation index of the student network model among the plurality of basic network structures after training.
9. The apparatus for generating a student network model for natural language processing as claimed in claim 8, wherein the second generating module further comprises:
and the second acquisition unit is used for acquiring the evaluation index of the student network model.
10. The apparatus for generating a student network model for natural language processing according to claim 9, wherein the first generating unit is further configured to:
and generating the preliminary evaluation index according to the evaluation index.
11. The apparatus for generating a student network model for natural language processing according to claim 9, wherein the evaluation index includes a speed index and/or a precision index, the second generation module further comprising:
and the second processing unit is used for replacing the basic network unit in each basic network structure by using a standby basic network unit with higher speed and/or higher precision when the plurality of basic network structures do not meet the speed index and/or the precision index.
12. The apparatus for generating a student network model for natural language processing as claimed in claim 8, wherein said obtaining module comprises:
the second generation unit is used for generating a pre-training model according to the non-labeled and non-domain training sample; and
and the third generation unit is used for training the pre-training model according to the labeled training samples of the target field so as to generate the teacher network model.
13. The apparatus for generating a student network model for natural language processing of claim 12 wherein the first training sample is an unlabeled training sample for the target domain.
14. The apparatus for generating a student network model for natural language processing as claimed in claim 8, wherein the second generating module further comprises:
a third obtaining unit, configured to obtain an operation index corresponding to each basic network structure after training;
and the judging unit is used for judging whether each trained basic network structure meets the evaluation index or not according to the operation index and the evaluation index corresponding to each basic network structure.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-7.
CN202010298183.5A 2020-04-16 2020-04-16 Method, device and equipment for generating student network model and storage medium Active CN111598216B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010298183.5A CN111598216B (en) 2020-04-16 2020-04-16 Method, device and equipment for generating student network model and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010298183.5A CN111598216B (en) 2020-04-16 2020-04-16 Method, device and equipment for generating student network model and storage medium

Publications (2)

Publication Number Publication Date
CN111598216A CN111598216A (en) 2020-08-28
CN111598216B true CN111598216B (en) 2021-07-06

Family

ID=72190319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010298183.5A Active CN111598216B (en) 2020-04-16 2020-04-16 Method, device and equipment for generating student network model and storage medium

Country Status (1)

Country Link
CN (1) CN111598216B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113837374A (en) * 2020-06-23 2021-12-24 中兴通讯股份有限公司 Neural network generation method, device and computer readable storage medium
CN112507294B (en) * 2020-10-23 2022-04-22 重庆交通大学 English teaching system and teaching method based on human-computer interaction
CN112381209B (en) * 2020-11-13 2023-12-22 平安科技(深圳)有限公司 Model compression method, system, terminal and storage medium
CN112257815A (en) * 2020-12-03 2021-01-22 北京沃东天骏信息技术有限公司 Model generation method, target detection method, device, electronic device, and medium
CN112541122A (en) * 2020-12-23 2021-03-23 北京百度网讯科技有限公司 Recommendation model training method and device, electronic equipment and storage medium
CN112734007A (en) * 2020-12-31 2021-04-30 青岛海尔科技有限公司 Method and device for acquiring compression model, storage medium and electronic device
CN113033774A (en) * 2021-03-10 2021-06-25 北京百度网讯科技有限公司 Method and device for training graph processing network model, electronic equipment and storage medium
US11200497B1 (en) * 2021-03-16 2021-12-14 Moffett Technologies Co., Limited System and method for knowledge-preserving neural network pruning
CN113222035B (en) * 2021-05-20 2021-12-31 浙江大学 Multi-class imbalance fault classification method based on reinforcement learning and knowledge distillation
CN113850012A (en) * 2021-06-11 2021-12-28 腾讯科技(深圳)有限公司 Data processing model generation method, device, medium and electronic equipment
CN113610111B (en) * 2021-07-08 2023-11-03 中南民族大学 Fusion method, device, equipment and storage medium of distributed multi-source data
US20230022947A1 (en) * 2021-07-23 2023-01-26 Lemon Inc. Identifying music attributes based on audio data
CN113947196A (en) * 2021-10-25 2022-01-18 中兴通讯股份有限公司 Network model training method and device and computer readable storage medium
CN114241282B (en) * 2021-11-04 2024-01-26 河南工业大学 Knowledge distillation-based edge equipment scene recognition method and device
CN115564024B (en) * 2022-10-11 2023-09-15 清华大学 Characteristic distillation method, device, electronic equipment and storage medium for generating network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108389576A (en) * 2018-01-10 2018-08-10 苏州思必驰信息科技有限公司 The optimization method and system of compressed speech recognition modeling
CN110175628A (en) * 2019-04-25 2019-08-27 北京大学 A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation
CN110807515A (en) * 2019-10-30 2020-02-18 北京百度网讯科技有限公司 Model generation method and device
CN110826344A (en) * 2019-10-24 2020-02-21 北京小米智能科技有限公司 Neural network model compression method, corpus translation method and apparatus thereof
CN110998716A (en) * 2017-08-11 2020-04-10 微软技术许可有限责任公司 Domain adaptation in speech recognition via teacher-student learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11741342B2 (en) * 2018-05-18 2023-08-29 Baidu Usa Llc Resource-efficient neural architects
CN110674880B (en) * 2019-09-27 2022-11-11 北京迈格威科技有限公司 Network training method, device, medium and electronic equipment for knowledge distillation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110998716A (en) * 2017-08-11 2020-04-10 微软技术许可有限责任公司 Domain adaptation in speech recognition via teacher-student learning
CN108389576A (en) * 2018-01-10 2018-08-10 苏州思必驰信息科技有限公司 The optimization method and system of compressed speech recognition modeling
CN110175628A (en) * 2019-04-25 2019-08-27 北京大学 A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation
CN110826344A (en) * 2019-10-24 2020-02-21 北京小米智能科技有限公司 Neural network model compression method, corpus translation method and apparatus thereof
CN110807515A (en) * 2019-10-30 2020-02-18 北京百度网讯科技有限公司 Model generation method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Blockwisely Supervised Neural Architecture Search with Knowledge Distillation;Changlin Li,et al.;《arXiv》;20200306;entire document *
Pre-trained Models for Natural Language Processing: A Survey;Xipeng Qiu,et al.;《arXiv》;20200324;sections 2.3-2.4 *
Search to Distill: Pearls are Everywhere but not the Eyes;Yu Liu,et al.;《arXiv》;20200317;abstract, section 3 *
Training Small Networks for Scene Classification of Remote Sensing Images via Knowledge Distillation;Guanzhou Chen,et al.;《remote sensing》;20180507;section 3.1.4 *

Also Published As

Publication number Publication date
CN111598216A (en) 2020-08-28

Similar Documents

Publication Publication Date Title
CN111598216B (en) Method, device and equipment for generating student network model and storage medium
CN110727806B (en) Text processing method and device based on natural language and knowledge graph
KR102497945B1 (en) Text recognition method, electronic device, and storage medium
CN112270379B (en) Training method of classification model, sample classification method, device and equipment
JP7098853B2 (en) Methods for establishing label labeling models, devices, electronics, programs and readable storage media
CN111539223A (en) Language model training method and device, electronic equipment and readable storage medium
CN111539227A (en) Method, apparatus, device and computer storage medium for training semantic representation model
JP2021099774A (en) Vectorized representation method of document, vectorized representation device of document, and computer device
CN111783981A (en) Model training method and device, electronic equipment and readable storage medium
CN110807331B (en) Polyphone pronunciation prediction method and device and electronic equipment
CN111079945B (en) End-to-end model training method and device
CN111737995A (en) Method, device, equipment and medium for training language model based on multiple word vectors
CN111079938A (en) Question-answer reading understanding model obtaining method and device, electronic equipment and storage medium
CN111737996A (en) Method, device and equipment for obtaining word vector based on language model and storage medium
CN111709252B (en) Model improvement method and device based on pre-trained semantic model
CN111950291A (en) Semantic representation model generation method and device, electronic equipment and storage medium
CN110543558B (en) Question matching method, device, equipment and medium
CN112507101A (en) Method and device for establishing pre-training language model
US11443100B2 (en) Method and apparatus for correcting character errors, electronic device and storage medium
CN111539209A (en) Method and apparatus for entity classification
CN113723278A (en) Training method and device of form information extraction model
CN112528669A (en) Multi-language model training method and device, electronic equipment and readable storage medium
CN111950293A (en) Semantic representation model generation method and device, electronic equipment and storage medium
CN111859953A (en) Training data mining method and device, electronic equipment and storage medium
CN112560499A (en) Pre-training method and device of semantic representation model, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant