CN114168844A - Online prediction method, device, equipment and storage medium - Google Patents

Online prediction method, device, equipment and storage medium Download PDF

Info

Publication number
CN114168844A
CN114168844A
Authority
CN
China
Prior art keywords
model
training
student
student model
teacher
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111335056.9A
Other languages
Chinese (zh)
Inventor
刘文哲 (Liu Wenzhe)
金长虎 (Jin Changhu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shareit Information Technology Co Ltd
Original Assignee
Beijing Shareit Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shareit Information Technology Co Ltd
Priority to CN202111335056.9A
Publication of CN114168844A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the present disclosure disclose an online prediction method, apparatus, device, and storage medium. The method includes: jointly training a teacher model and a student model with a predetermined feature set based on the same target training parameter to obtain a trained student model, wherein the teacher model is used for fine ranking of data, the student model is used for coarse ranking of data, and predetermined parameter information obtained from the teacher model is applied to the student model by distillation transfer; and performing online prediction of a predetermined task based on the trained student model. Compared with training the fine-ranking teacher model and the coarse-ranking student model separately on different target training parameters, the technical solution of the embodiments of the present disclosure can reduce mismatches between coarse-ranking and fine-ranking accuracy, improve the overall prediction accuracy of the model, simplify the model structure, improve model training efficiency, and give users a good experience.

Description

Online prediction method, device, equipment and storage medium
Technical Field
The present disclosure relates to, but is not limited to, the field of online prediction, and in particular to an online prediction method, apparatus, device, and storage medium.
Background
With the rapid development of information technology, the amount of data grows day by day. Recommendation systems emerged to address the problem of information overload. Based on the interactions between users and content, a recommendation system screens the content a user currently prefers out of a rich content pool through stages such as recall, coarse ranking, and fine ranking. With the introduction of deep learning, recommendations have become more personalized, but model complexity has also grown.
In the related art, prediction models are complex and inefficient, so the recommended content is unsatisfactory, which gives users a poor experience.
Disclosure of Invention
The embodiments of the present disclosure disclose an online prediction method, an online prediction apparatus, an online prediction device, and a storage medium.
According to a first aspect of the embodiments of the present disclosure, there is provided an online prediction method, the method including:
jointly training a teacher model and a student model with a predetermined feature set based on the same target training parameter to obtain the trained student model; wherein the teacher model is used for fine ranking of data, the student model is used for coarse ranking of data, and predetermined parameter information obtained from the teacher model is applied to the student model by distillation transfer;
and performing online prediction of a predetermined task based on the trained student model.
In one embodiment, performing online prediction of the predetermined task based on the trained student model includes:
performing online prediction of a data recommendation task based on the trained student model.
In one embodiment, jointly training the teacher model and the student model with the predetermined feature set based on the same target training parameter to obtain the trained student model includes:
simultaneously and jointly training the teacher model and the student model with the predetermined feature set based on the same target training parameter to obtain the trained student model;
or,
jointly training the student model after training the teacher model with the predetermined feature set, based on the same target training parameter, to obtain the trained student model.
In one embodiment, jointly training the student model after training the teacher model with the predetermined feature set to obtain the trained student model includes:
jointly training the student model after training the teacher model for a predetermined number of steps with the predetermined feature set to obtain the trained student model.
In one embodiment, the teacher model is a single-tower model whose input features are cross features of user information and content information; the student model is a two-tower model comprising a user tower model and a content tower model, where the input features of the user tower model are user-information features and the input features of the content tower model are content-information features.
In one embodiment, the loss parameter of the distillation loss function used for distillation transfer is determined from the error between the logits of the teacher model and the student model.
According to a second aspect of the embodiments of the present disclosure, there is provided an online prediction apparatus, the apparatus including:
a training module configured to: jointly train a teacher model and a student model with a predetermined feature set based on the same target training parameter to obtain the trained student model; wherein the teacher model is used for fine ranking of data, the student model is used for coarse ranking of data, and predetermined parameter information obtained from the teacher model is applied to the student model by distillation transfer;
a prediction module configured to: perform online prediction of a predetermined task based on the trained student model.
In one embodiment, the prediction module is further configured to:
perform online prediction of a data recommendation task based on the trained student model.
In one embodiment, the training module is further configured to:
simultaneously and jointly train the teacher model and the student model with the predetermined feature set based on the same target training parameter to obtain the trained student model;
or,
jointly train the student model after training the teacher model with the predetermined feature set, based on the same target training parameter, to obtain the trained student model.
In one embodiment, the training module is further configured to:
jointly train the student model after training the teacher model for a predetermined number of steps with the predetermined feature set to obtain the trained student model.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to implement the method of any embodiment of the present disclosure when executing the executable instructions.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer storage medium storing a computer-executable program which, when executed by a processor, implements the method of any of the embodiments of the present disclosure.
The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects:
In the embodiments of the present disclosure, a teacher model and a student model are jointly trained with a predetermined feature set based on the same target training parameter to obtain the trained student model, wherein the teacher model is used for fine ranking of data, the student model is used for coarse ranking of data, and predetermined parameter information obtained from the teacher model is applied to the student model by distillation transfer; online prediction of a predetermined task is then performed based on the trained student model. Because the teacher model used for fine ranking and the student model used for coarse ranking are trained jointly on the same target training parameter, compared with training them separately on different target training parameters, mismatches between coarse-ranking and fine-ranking accuracy can be reduced, the overall prediction accuracy of the model is improved, the model structure can be simplified, model training efficiency is improved, and users get a good experience.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow diagram illustrating a method of online prediction, according to an example embodiment.
FIG. 2 is a schematic diagram illustrating a machine learning model, according to an exemplary embodiment.
FIG. 3 is a schematic diagram illustrating a machine learning model, according to an exemplary embodiment.
FIG. 4 is a flow diagram illustrating a method of online prediction, according to an example embodiment.
FIG. 5 is a flow diagram illustrating a method of online prediction, according to an example embodiment.
FIG. 6 is a flow diagram illustrating a method of online prediction, according to an example embodiment.
FIG. 7 is a flow diagram illustrating a method of online prediction, according to an example embodiment.
FIG. 8 is a block diagram illustrating an online prediction device, according to an example embodiment.
FIG. 9 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the present invention; rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as recited in the appended claims.
To facilitate understanding of the technical solutions of the embodiments of the present disclosure, several embodiments are listed to explain the solutions clearly. Of course, those skilled in the art will understand that the embodiments provided in the present disclosure can be implemented alone, in combination with other method embodiments of the present disclosure, or in combination with some methods of other related technologies; the embodiments of the present disclosure are not limited in this respect.
For a better understanding of the embodiments of the present disclosure, the following description of the disclosed embodiments is provided by way of some exemplary embodiments:
in one scenario embodiment, the ranking step of recommendation focuses on precise ordering, so with a given feature set, ever more complex models are introduced in the ranking stage to mine the relationship between users and content from deeper information dimensions. The consistency between the coarse-ranking link and the fine-ranking link is especially important. Coarse ranking is the intermediate stage between recall and fine ranking; its goal is to select, under a compute constraint, a hundred-level candidate set consistent with the fine-ranking objective from a ten-million-level content set.
In the fine-ranking stage the industry uses complex network models and rich features, while in the coarse-ranking stage it uses simple network models and basic features to meet latency requirements. This approach leads to two problems: 1. Because coarse ranking and fine ranking train separate models with different network structures and different feature sets, their objectives become inconsistent, and the TopK candidate set output by coarse ranking ultimately cannot meet the precision required by fine ranking. 2. The investment in machine resources and manpower grows sharply, because many models have to be trained in a multi-task learning scenario.
In one embodiment, for the objective-consistency problem, the related-art solution is to train the coarse-ranking model to fit the fine-ranking objective. However, in a multi-task learning scenario where multiple objectives are mutually exclusive, this approach easily causes the coarse-ranking objective to drift, and the final coarse-ranking TopK result may end up far from optimal. Another problem with coarse-fits-fine training is that when the strategy downstream of fine ranking changes frequently, the learning speed of the coarse-ranking model cannot keep up with the fine-ranking model.
In one embodiment, the distillation-transfer framework is applied to coarse ranking. The teacher model then directly guides the student model, so coarse ranking learns the generalization ability of fine ranking, and the resulting coarse-ranking output is better than one that merely fits the training data. In addition, if the entropy of the soft targets transferred by distillation is higher than that of the hard targets, coarse ranking can clearly learn more information; and because coarse and fine ranking are computed simultaneously, computing resources and labor cost are reduced. Moreover, coarse ranking is quickly influenced by any change in fine ranking, the coarse-ranking objective always stays in the service of fine ranking, the features and model structure are more flexible, and the objectives of coarse and fine ranking remain consistent throughout training.
As shown in fig. 1, the present embodiment provides an online prediction method, including:
step 11, jointly training a teacher model and a student model with a predetermined feature set based on the same target training parameter to obtain the trained student model; wherein the teacher model is used for fine ranking of data, the student model is used for coarse ranking of data, and predetermined parameter information obtained from the teacher model is applied to the student model by distillation transfer;
and step 12, performing online prediction of a predetermined task based on the trained student model.
In one embodiment, the online prediction method may be applied to a terminal, and the terminal may be, but is not limited to, a mobile phone, a wearable device, a vehicle-mounted terminal, a Road Side Unit (RSU), a smart home terminal, an industrial sensing device, and/or a medical device.
Illustratively, the predetermined task may be a search task. After the terminal acquires picture information or text information input by the user, it can input the acquired picture or text information into the trained student model, search a massive database for content matching that information, and display the content to the user.
Illustratively, the predetermined task may be a recommendation task. After the terminal acquires the user's personal information (preference information and the like), it can input the acquired personal information into the trained student model, predict content associated with that information, and recommend the content to the user.
In one embodiment, samples in the predetermined feature set are input into the teacher model and/or the student model to train the models. In one embodiment, the samples can be preprocessed in advance according to the input-format requirements of the input layers of the teacher model and the student model to obtain compliant samples.
In one embodiment, the content of the sample is related to the application scenario of the predetermined task. For example, in a search task scenario where the predetermined task is a search based on image information, the content of the sample may be an image. In a recommended task scenario in which the predetermined task is a recommendation based on textual information, the content of the sample may be text, e.g., a text keyword.
In one embodiment, the teacher model and the student model may be deep convolutional neural networks. It should be noted that the teacher model and the student model may also be network models other than deep convolutional neural networks; the embodiments of the present disclosure impose no limitation here.
In one embodiment, referring to fig. 2, which shows a schematic diagram of two-tower coarse-ranking distillation transfer, the body of the transfer model comprises a teacher model and a student model. For guiding coarse ranking, the teacher model may exploit user-content cross features. In one embodiment, the student model is a two-tower model comprising a user tower and a content tower, where the user tower takes user information as input and the content tower takes content information as input. After the two towers process their inputs, the inner product of their outputs gives the output of the student model.
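As an illustration of this structure, here is a minimal sketch of the two models. It is a sketch under stated assumptions: the patent does not prescribe a framework, layer sizes, or embedding dimension, so the PyTorch code, the MLP shapes, and the 64-dimensional embedding below are illustrative choices.

```python
import torch
import torch.nn as nn

class Tower(nn.Module):
    """One tower: an MLP mapping raw features to an embedding."""
    def __init__(self, in_dim: int, emb_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, emb_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class TwoTowerStudent(nn.Module):
    """Student for coarse ranking: user tower plus content tower,
    scored by the inner product of the two embeddings."""
    def __init__(self, user_dim: int, content_dim: int, emb_dim: int = 64):
        super().__init__()
        self.user_tower = Tower(user_dim, emb_dim)
        self.content_tower = Tower(content_dim, emb_dim)

    def forward(self, user_x, content_x):
        u = self.user_tower(user_x)        # embedding from user information only
        c = self.content_tower(content_x)  # embedding from content information only
        return (u * c).sum(dim=-1)         # one logit per (user, content) pair

class SingleTowerTeacher(nn.Module):
    """Teacher for fine ranking: one network over user-content cross features."""
    def __init__(self, cross_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(cross_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, cross_x):
        return self.net(cross_x).squeeze(-1)
```

Because the student's content embeddings depend only on content features, they can be precomputed offline; this is what makes the two-tower form cheap enough for coarse ranking over a ten-million-level content set.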
In one embodiment, please refer to fig. 3, which shows a schematic diagram of DNN coarse-ranking distillation transfer.
In one embodiment, the predetermined task includes the stages of recall, coarse ranking, and fine ranking.
In one embodiment, a teacher model and a student model are jointly trained with a predetermined feature set based on the same target training parameters to obtain the trained student model; the training of the coarse-ranking stage and of the fine-ranking stage can be performed simultaneously, and the predetermined parameter information obtained by the teacher model is applied to the student model by distillation transfer; online prediction of a predetermined task is then performed based on the trained student model.
It should be noted that jointly training the teacher model and the student model may mean training the teacher model and the student model with the same target training parameter.
In one embodiment, the target training parameter may be determined based on the accuracy requirement parameter of the online prediction. The teacher model and the student model are then jointly trained with the predetermined feature set based on the same target training parameter to obtain the trained student model, and online prediction of a predetermined task is performed based on the trained student model.
For example, if the accuracy requirement parameter of the online prediction is greater than an accuracy threshold, it may be determined that the target training parameter is less than a parameter threshold; conversely, if the accuracy requirement parameter is less than the accuracy threshold, it may be determined that the target training parameter is greater than the parameter threshold. In one embodiment, the target training parameter may be an output error.
In one embodiment, the target training parameter is determined based on the number of recommended content items retrieved from the massive database, and the models are then jointly trained and used for online prediction as above.
For example, if the number of recommended content items obtained from the massive database is greater than a predetermined number, it may be determined that the target training parameter is greater than the parameter threshold; conversely, if that number is smaller than the predetermined number, it may be determined that the target training parameter is less than the parameter threshold.
In one embodiment, the target training parameter is determined based on the time required to train the model, and the models are then jointly trained and used for online prediction as above.
Illustratively, if the time required to train the model is greater than a time threshold, the target training parameter is determined to be less than the parameter threshold; conversely, if that time is less than the time threshold, the target training parameter is determined to be greater than the parameter threshold.
In one embodiment, the target training parameter is determined based on the amount of content in the source content database used for content recommendation, and the models are then jointly trained and used for online prediction as above.
Illustratively, if the amount of content in the source content database is greater than a quantity threshold, the target training parameter is determined to be less than the parameter threshold; conversely, if that amount is smaller than the quantity threshold, the target training parameter is determined to be greater than the parameter threshold. In one embodiment, the target training parameter may be an output error.
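For illustration, the threshold logic of these embodiments can be sketched as below; the function name, the arguments, and the idea of the parameter being an output-error bound are hypothetical, since the patent leaves the concrete parameterization open.

```python
def pick_target_training_parameter(requirement: float, threshold: float,
                                   strict_value: float, loose_value: float) -> float:
    """Return the shared target training parameter (e.g. an output-error bound):
    a requirement above its threshold maps to the stricter (smaller) value,
    otherwise to the looser (larger) one."""
    return strict_value if requirement > threshold else loose_value

# e.g. a high online-prediction accuracy requirement -> small target error bound
target_param = pick_target_training_parameter(requirement=0.95, threshold=0.9,
                                              strict_value=0.01, loose_value=0.05)
```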
For a better understanding of the embodiments of the present disclosure, the following further illustrates the technical solution by an exemplary embodiment:
in one example, distillation transfer is shown in figs. 2 and 3: fig. 2 shows two-tower coarse-ranking distillation transfer, and fig. 3 shows DNN coarse-ranking distillation transfer. The bodies of both structures are a student model and a teacher model. The teacher model is a complex fine-ranking model and can use user-content cross features (Other Features), while the student model is a two-tower model (user tower and content tower). The user tower takes only user information, the content tower takes only content information, and the inner product of the two tower outputs gives the output of the student model. Fig. 3 shows DNN coarse-ranking distillation transfer and also uses the previously proposed feature transfer; there the teacher model is a complex fine-ranking model with richer features, while the student model has a shallower network structure and uses only a subset of the features.
In one embodiment, for the above two structures, at training time let x denote the online features, i.e., the features of the student model, let X denote the Other Features, and let y denote the label; L denotes a loss function, where Ld is the distillation loss, Lt the teacher loss, and Ls the student loss; Ws denotes the student model parameters and Wt the teacher model parameters.
In this embodiment, an online prediction method is provided; referring to fig. 4, the method includes:
Step 41: set the training step count to 0 and initialize the student model parameters Ws and the teacher model parameters Wt.
Step 42: train the model on the training data (y, X, x). At the start of training the teacher model has not yet learned well, and letting it guide the student model would likely bias the training; therefore a parameter λ is introduced and initially set to 0.
Step 43: in the initial training stage, while the number of training steps is less than the parameter K, the student model is updated as:

$$\min_{W_s} L_s\big(y,\, f(x;\, W_s)\big)$$

When the number of training steps is greater than K, the student model is updated as:

$$\min_{W_s} L_s\big(y,\, f(x;\, W_s)\big) + \lambda\, L_d\big(f(X, x;\, W_t),\, f(x;\, W_s)\big)$$

At this point, the hyper-parameter λ controls the proportion of the distillation loss in the total loss.
Step 44: throughout training, the teacher model parameters are updated as:

$$\min_{W_t} L_t\big(y,\, f(X, x;\, W_t)\big)$$

Step 45: Ld is obtained from the error between the logits of the teacher network and the student network. The information entropy of the raw logits is sometimes low; in that case it can be raised by distilling with a temperature. The soft targets are computed as:

$$q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$

where qi are the soft targets produced by the teacher model and learned by the student model, and zi are the logits a network outputs before its softmax. The hyper-parameter T is the distillation temperature; it changes the information entropy of a network's output, and this is how the teacher model guides the student model.
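Putting steps 41 to 45 together, the following is a minimal joint-training sketch. It is illustrative only: the use of PyTorch, binary cross-entropy for Ls and Lt, mean-squared logit error for Ld, and the optimizer settings are assumptions the patent does not fix.

```python
import torch
import torch.nn.functional as F

def soften(logits: torch.Tensor, T: float) -> torch.Tensor:
    """Soft targets q_i = exp(z_i/T) / sum_j exp(z_j/T); larger T raises entropy."""
    return F.softmax(logits / T, dim=-1)

def joint_train(teacher, student, loader, K: int = 1000,
                lam: float = 0.5, lr: float = 1e-3):
    """Jointly train the teacher (fine ranking) and the student (coarse ranking).
    loader yields (user_x, content_x, cross_x, y) with y a float label."""
    opt_t = torch.optim.Adam(teacher.parameters(), lr=lr)
    opt_s = torch.optim.Adam(student.parameters(), lr=lr)
    for step, (user_x, content_x, cross_x, y) in enumerate(loader):
        # Step 44: the teacher is updated on its own loss Lt throughout.
        t_logit = teacher(cross_x)
        loss_t = F.binary_cross_entropy_with_logits(t_logit, y)
        opt_t.zero_grad(); loss_t.backward(); opt_t.step()

        # Step 43: before K steps the student trains on Ls alone;
        # afterwards lambda weights in the distillation loss Ld.
        s_logit = student(user_x, content_x)
        loss_s = F.binary_cross_entropy_with_logits(s_logit, y)
        if step >= K:
            # Step 45: Ld from the teacher/student logit error; the teacher
            # logits are detached so guidance flows only teacher -> student.
            loss_s = loss_s + lam * F.mse_loss(s_logit, t_logit.detach())
        opt_s.zero_grad(); loss_s.backward(); opt_s.step()
    return student
```

For multi-class outputs, Ld could instead be a divergence between soften(t_logit, T) and soften(s_logit, T), matching the temperature formula above.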
During online prediction, the student model is used for the coarse-ranking part and the teacher model is used for the fine-ranking part.
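A sketch of that serving split, reusing the hypothetical two-tower student and single-tower teacher defined earlier (the precomputation of content embeddings and the candidate-set size are illustrative assumptions):

```python
import torch

@torch.no_grad()
def coarse_then_fine(student, teacher, user_x, all_content_x, all_cross_x, k=100):
    """Coarse-rank the full content set with the two-tower student, then
    fine-rank only the student's TopK candidates with the teacher."""
    u = student.user_tower(user_x)              # (emb,) for one user
    c = student.content_tower(all_content_x)    # (N, emb); precomputable offline
    coarse_scores = c @ u                       # inner-product score per item
    topk = coarse_scores.topk(k).indices        # hundred-level candidate set
    fine_scores = teacher(all_cross_x[topk])    # teacher re-scores the TopK
    return topk[fine_scores.argsort(descending=True)]
```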
In the embodiments of the present disclosure, a teacher model and a student model are jointly trained with a predetermined feature set based on the same target training parameter to obtain the trained student model, wherein the teacher model is used for fine ranking of data, the student model is used for coarse ranking of data, and predetermined parameter information obtained from the teacher model is applied to the student model by distillation transfer; online prediction of a predetermined task is then performed based on the trained student model. Because the teacher model used for fine ranking and the student model used for coarse ranking are trained jointly on the same target training parameter, compared with training them separately on different target training parameters, mismatches between coarse-ranking and fine-ranking accuracy can be reduced, the overall prediction accuracy of the model is improved, the model structure can be simplified, model training efficiency is improved, and users get a good experience.
It should be noted that, as can be understood by those skilled in the art, the methods provided in the embodiments of the present disclosure can be executed alone or together with some methods in the embodiments of the present disclosure or some methods in the related art.
As shown in fig. 5, this embodiment provides an online prediction method, the method including:
Step 51: performing online prediction of a data recommendation task based on the trained student model.
In one embodiment, in a shopping application, when the user's operation on a candidate object in a picture displayed on the terminal is received, the commodity information the user expects to obtain can be determined; the user's commodity preference information is determined based on the user's historical purchase records; the commodity preference information is input into the trained student model for online prediction of the data recommendation task; and the student model selects a predetermined number of recommended commodities from a commodity database and presents them to the user.
In one embodiment, student models may be trained for different recommendation services to obtain different trained student models; according to the service-type requirement, online prediction of the data recommendation task is performed based on the trained student model corresponding to that requirement. Illustratively, the service-type requirements comprise a first, a second, and a third service-type requirement; when the user's requirement is determined to be the first service-type requirement, online prediction of the data recommendation task is performed based on the trained student model corresponding to the first service-type requirement.
In one embodiment, the service-type requirement may be determined based on characteristic information of the user's input operation. Illustratively, a portrait image, a car image, and a shop image are displayed on the terminal screen. If the user's input operation is a touch on the car image, the service-type requirement can be determined to be a taxi-hailing requirement; a taxi-hailing application is then selected from a massive application database based on the trained student model and recommended to the user. If the input operation is a touch on the shop image, the service-type requirement can be determined to be a shopping requirement; a shopping link is then selected from a massive shopping-link database based on the trained student model and recommended to the user.
In one embodiment, the service-type requirement may be determined from the characteristic information of the user's input operation together with service-type requirement priorities. Illustratively, a portrait image, a car image, and a shop image are displayed on the terminal screen. If the user's input operation is a touch on the portrait image, the service-type requirement may be a communication requirement, a face-recognition requirement, or a purchase requirement for the clothes worn by the person in the portrait image. If the face-recognition requirement has a higher priority than the communication requirement, and the communication requirement a higher priority than the purchase requirement, the face-recognition requirement is served first: the face in the portrait image is matched against a massive face database based on the trained student model, and the recognition result is presented to the user.
It should be noted that, as can be understood by those skilled in the art, the methods provided in the embodiments of the present disclosure can be executed alone or together with some methods in the embodiments of the present disclosure or some methods in the related art.
As shown in fig. 6, the present embodiment provides an online prediction method, including:
step 61, simultaneously and jointly training the teacher model and the student model with the predetermined feature set based on the same target training parameter to obtain the trained student model;
or,
jointly training the student model after training the teacher model with the predetermined feature set, based on the same target training parameter, to obtain the trained student model.
In one embodiment, the teacher model and the student model are simultaneously and jointly trained with the predetermined feature set based on the same target training parameter to obtain the trained student model, wherein the teacher model is used for fine ranking of data, the student model is used for coarse ranking of data, and predetermined parameter information obtained from the teacher model is applied to the student model by distillation transfer; online prediction of a predetermined task is then performed based on the trained student model. It should be noted that joint training can be understood as training the teacher model and the student model based on the same target training parameter.
In one embodiment, the student model is jointly trained after the teacher model has been trained with the predetermined feature set, based on the same target training parameter, to obtain the trained student model; the teacher model is used for fine ranking of data, the student model is used for coarse ranking of data, predetermined parameter information obtained from the teacher model is applied to the student model by distillation transfer, and online prediction of a predetermined task is performed based on the trained student model. It should be noted that jointly training the student model may mean applying the teacher model's training result to the training of the student model.
In one embodiment, the student model is jointly trained after the teacher model has been trained for a predetermined number of steps with the predetermined feature set, based on the same target training parameter, to obtain the trained student model; the teacher model is used for fine ranking of data, the student model is used for coarse ranking of data, predetermined parameter information obtained from the teacher model is applied to the student model by distillation transfer, and online prediction of a predetermined task is performed based on the trained student model.
It should be noted that, as can be understood by those skilled in the art, the methods provided in the embodiments of the present disclosure can be executed alone or together with some methods in the embodiments of the present disclosure or some methods in the related art.
As shown in fig. 7, this embodiment provides an online prediction method, the method including:
Step 71: jointly training the student model after training the teacher model for a predetermined number of steps with the predetermined feature set to obtain the trained student model.
In one embodiment, the student model is jointly trained after the teacher model has been trained for a predetermined number of steps with the predetermined feature set, based on the same target training parameter, to obtain the trained student model; the teacher model is used for fine ranking of data, the student model is used for coarse ranking of data, predetermined parameter information obtained from the teacher model is applied to the student model by distillation transfer, and online prediction of a predetermined task is performed based on the trained student model.
In one embodiment, the predetermined number of steps may be determined based on the efficiency requirement of model training. Illustratively, if the efficiency requirement of model training is less than an efficiency threshold, the predetermined number of steps is determined to be greater than a predetermined number; conversely, if the efficiency requirement is greater than the efficiency threshold, the predetermined number of steps is determined to be less than the predetermined number. In this way, the predetermined number of steps can be adapted to the efficiency requirement of model training.
In one embodiment, the predetermined number of steps may be determined based on the error requirement on the training result. Illustratively, if the error requirement is less than an error threshold, the predetermined number of steps is determined to be greater than a predetermined number; conversely, if the error requirement is greater than the error threshold, the predetermined number of steps is determined to be less than the predetermined number. In this way, the predetermined number of steps can be adapted to the error requirement on the training result.
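A corresponding sketch for choosing the teacher warm-up step count K (the names and the doubling/halving rule are hypothetical):

```python
def pick_warmup_steps(requirement: float, threshold: float,
                      base_steps: int = 1000) -> int:
    """A requirement below its threshold (e.g. a loose efficiency or error
    requirement) tolerates more warm-up steps; above it, fewer."""
    return base_steps * 2 if requirement < threshold else base_steps // 2
```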
It should be noted that, as can be understood by those skilled in the art, the methods provided in the embodiments of the present disclosure can be executed alone or together with some methods in the embodiments of the present disclosure or some methods in the related art.
In one embodiment, referring again to fig. 2, the teacher model is a single-tower model whose input features are cross features of user information and content information; the student model is a two-tower model comprising a user tower model and a content tower model, where the input features of the user tower model are user-information features and the input features of the content tower model are content-information features.
In one embodiment, the loss parameter of the distillation loss function used for distillation transfer is determined from the error between the logits of the teacher model and the student model.
As shown in fig. 8, the present embodiment provides an online prediction apparatus, which includes:
a training module 81 configured to: jointly train a teacher model and a student model with a predetermined feature set based on the same target training parameter to obtain the trained student model; wherein the teacher model is used for fine ranking of data, the student model is used for coarse ranking of data, and predetermined parameter information obtained from the teacher model is applied to the student model by distillation transfer;
a prediction module 82 configured to: perform online prediction of a predetermined task based on the trained student model.
In one embodiment, the prediction module 82 is further configured to:
perform online prediction of a data recommendation task based on the trained student model.
In one embodiment, the training module 81 is further configured to:
simultaneously and jointly train the teacher model and the student model with the predetermined feature set based on the same target training parameter to obtain the trained student model;
or,
jointly train the student model after training the teacher model with the predetermined feature set to obtain the trained student model.
In one embodiment, the training module 81 is further configured to:
jointly train the student model after training the teacher model for a predetermined number of steps with the predetermined feature set to obtain the trained student model.
An embodiment of the present disclosure further provides a communication device, including:
an antenna;
a memory;
and a processor, connected to the antenna and the memory respectively, configured to control the antenna to transmit and receive wireless signals by executing the executable program stored in the memory, and capable of executing the steps of the method provided by any of the foregoing embodiments.
The communication device provided in this embodiment may be the aforementioned terminal or base station. The terminal can be various human-borne terminals or vehicle-borne terminals. The base stations may be various types of base stations, such as 4G base stations or 5G base stations, and so on.
The antenna may be various types of antennas, for example, a mobile antenna such as a 3G antenna, a 4G antenna, or a 5G antenna; the antenna may further include: a WiFi antenna or a wireless charging antenna, etc.
The memory may include various types of storage media, which are non-transitory computer storage media that retain the information stored on them after the communication device is powered down.
The processor may be connected to the antenna and the memory via a bus or the like for reading an executable program stored on the memory, e.g. at least one of the methods shown in any of the embodiments of the present disclosure.
The embodiments of the present disclosure further provide a non-transitory computer-readable storage medium storing an executable program which, when executed by a processor, implements the steps of the method provided in any of the foregoing embodiments, for example at least one of the methods shown in any embodiment of the present disclosure.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 9 is a block diagram illustrating an electronic device 600 according to an example embodiment. For example, the electronic device 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 9, electronic device 600 may include one or more of the following components: processing component 602, memory 604, power component 606, multimedia component 608, audio component 610, input/output (I/O) interface 612, sensor component 614, and communication component 616.
The processing component 602 generally controls overall operation of the electronic device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 602 may include one or more processors 620 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 602 can include one or more modules that facilitate interaction between the processing component 602 and other components. For example, the processing component 602 can include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation at the device 600. Examples of such data include instructions for any application or method operating on the electronic device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 604 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power supply component 606 provides power to the various components of electronic device 600. The power components 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 600.
The multimedia component 608 includes a screen that provides an output interface between the electronic device 600 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 600 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 604 or transmitted via the communication component 616. In some embodiments, audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessment of various aspects of the electronic device 600. For example, the sensor component 614 may detect an open/closed state of the device 600, the relative positioning of components, such as a display and keypad of the electronic device 600, the sensor component 614 may also detect a change in the position of the electronic device 600 or a component of the electronic device 600, the presence or absence of user contact with the electronic device 600, orientation or acceleration/deceleration of the electronic device 600, and a change in the temperature of the electronic device 600. The sensor assembly 614 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate communications between the electronic device 600 and other devices in a wired or wireless manner. The electronic device 600 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 604 comprising instructions, executable by the processor 620 of the electronic device 600 to perform the above-described method, is also provided. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (12)

1. An online prediction method, the method comprising:
jointly training a teacher model and a student model with a predetermined feature set based on the same target training parameter to obtain the trained student model; wherein the teacher model is used for fine ranking of data, the student model is used for coarse ranking of data, and predetermined parameter information obtained from the teacher model is applied to the student model by distillation transfer; and
performing online prediction of a predetermined task based on the trained student model.
2. The method of claim 1, wherein performing online prediction of a predetermined task based on the trained student model comprises:
performing online prediction of a data recommendation task based on the trained student model.
3. The method of claim 1, wherein jointly training the teacher model and the student model with the predetermined feature set based on the same target training parameter to obtain the trained student model comprises:
simultaneously and jointly training the teacher model and the student model with the predetermined feature set based on the same target training parameter to obtain the trained student model;
or,
jointly training the student model after training the teacher model with the predetermined feature set, based on the same target training parameter, to obtain the trained student model.
4. The method of claim 3, wherein jointly training the student model after training the teacher model with the predetermined feature set to obtain the trained student model comprises:
jointly training the student model after training the teacher model for a predetermined number of steps with the predetermined feature set to obtain the trained student model.
5. The method of claim 1, wherein the teacher model is a single-tower model, the input features of the single-tower model being cross features of user information and content information; and the student model is a two-tower model comprising a user tower model and a content tower model, the input features of the user tower model being user-information features and the input features of the content tower model being content-information features.
6. The method of claim 1, wherein the loss parameter of the distillation loss function used for distillation transfer is determined from the error between the logits of the teacher model and the student model.
7. An online prediction apparatus, the apparatus comprising:
a training module configured to: jointly train a teacher model and a student model with a predetermined feature set based on the same target training parameter to obtain the trained student model; wherein the teacher model is used for fine ranking of data, the student model is used for coarse ranking of data, and predetermined parameter information obtained from the teacher model is applied to the student model by distillation transfer; and
a prediction module configured to: perform online prediction of a predetermined task based on the trained student model.
8. The apparatus of claim 7, wherein the prediction module is further configured to:
perform online prediction of a data recommendation task based on the trained student model.
9. The apparatus of claim 7, wherein the training module is further configured to:
simultaneously and jointly train the teacher model and the student model with the predetermined feature set based on the same target training parameter to obtain the trained student model;
or,
jointly train the student model after training the teacher model with the predetermined feature set, based on the same target training parameter, to obtain the trained student model.
10. The apparatus of claim 9, wherein the training module is further configured to:
jointly train the student model after training the teacher model for a predetermined number of steps with the predetermined feature set to obtain the trained student model.
11. An online prediction device, comprising:
a memory;
a processor, coupled to the memory, configured to implement the method of any of claims 1-6 by executing computer-executable instructions stored on the memory.
12. A computer storage medium having stored thereon computer-executable instructions capable, when executed by a processor, of carrying out the method of any one of claims 1 to 6.
CN202111335056.9A 2021-11-11 2021-11-11 Online prediction method, device, equipment and storage medium Pending CN114168844A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111335056.9A CN114168844A (en) 2021-11-11 2021-11-11 Online prediction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111335056.9A CN114168844A (en) 2021-11-11 2021-11-11 Online prediction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114168844A 2022-03-11

Family

ID=80478893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111335056.9A Pending CN114168844A (en) 2021-11-11 2021-11-11 Online prediction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114168844A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114722805A (en) * 2022-06-10 2022-07-08 苏州大学 Little sample emotion classification method based on size instructor knowledge distillation


Similar Documents

Publication Publication Date Title
CN109684510B (en) Video sequencing method and device, electronic equipment and storage medium
CN109859096A (en) Image Style Transfer method, apparatus, electronic equipment and storage medium
CN109543066B (en) Video recommendation method and device and computer-readable storage medium
CN109670077B (en) Video recommendation method and device and computer-readable storage medium
CN106485567B (en) Article recommendation method and device
CN109670632B (en) Advertisement click rate estimation method, advertisement click rate estimation device, electronic device and storage medium
CN107230137A (en) Merchandise news acquisition methods and device
CN111476057B (en) Lane line acquisition method and device, and vehicle driving method and device
CN110781905A (en) Image detection method and device
CN114168844A (en) Online prediction method, device, equipment and storage medium
CN113609380B (en) Label system updating method, searching device and electronic equipment
CN110297970B (en) Information recommendation model training method and device
CN112259122A (en) Audio type identification method and device and storage medium
CN113486978B (en) Training method and device for text classification model, electronic equipment and storage medium
CN112712385A (en) Advertisement recommendation method and device, electronic equipment and storage medium
CN116310633A (en) Key point detection model training method and key point detection method
CN112784151A (en) Method and related device for determining recommendation information
CN111796690A (en) Data processing method and device and electronic equipment
CN114648116A (en) Model quantification method and device, vehicle and storage medium
CN113656637B (en) Video recommendation method and device, electronic equipment and storage medium
CN111597431A (en) Recommendation method and device and electronic equipment
CN115203573A (en) Portrait label generating method, model training method, device, medium and chip
CN114550691A (en) Multi-tone word disambiguation method and device, electronic equipment and readable storage medium
CN111984864A (en) Object recommendation method and device, electronic equipment and storage medium
CN112990240B (en) Method and related device for determining vehicle type

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination