CN114021714A - Transfer learning training method and device, electronic equipment and storage medium - Google Patents

Transfer learning training method and device, electronic equipment and storage medium

Info

Publication number
CN114021714A
Authority
CN
China
Prior art keywords
domain data
source domain
cross entropy
data
calculating
Prior art date
Legal status
Pending
Application number
CN202111096355.1A
Other languages
Chinese (zh)
Inventor
吴学超
周杨
白云龙
秦才霞
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111096355.1A priority Critical patent/CN114021714A/en
Publication of CN114021714A publication Critical patent/CN114021714A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a transfer learning training method and apparatus, an electronic device and a storage medium, which relate to the technical field of data processing and, in particular, to the technical field of deep learning. The method includes: obtaining a source domain sample; calculating a first cross entropy of each source domain data by using a first-stage model and calculating a similarity weight according to the first cross entropy; obtaining a target domain sample; calculating a second cross entropy of each source domain data and each target domain data by using a two-stage model; calculating a third cross entropy of each source domain data and each target domain data according to the second cross entropy and the similarity weight of each source domain data and each target domain data; updating parameters of the two-stage model according to the third cross entropy of each source domain data and each target domain data; and predicting or ranking the service data by the two-stage model after the parameters are updated.

Description

Transfer learning training method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technology, and more particularly, to the field of deep learning technology.
Background
An existing transfer learning training method directly uses the data of the source domain samples and the data of the target domain samples to jointly train the target domain model. However, the data distribution of the source domain scene is inconsistent with, and may even differ greatly from, the data distribution of the current scene, that is, the target domain scene. Direct full joint training therefore produces a negative transfer phenomenon in the model of the current scene, which reduces rather than improves the performance of that model. Sampled joint training controls the quantity of introduced source domain samples, but because the difference between the data distribution of the source domain sample data and that of the current scene is not resolved, it can only alleviate the negative transfer phenomenon to a certain extent and can hardly change the influence of the data difference on the model of the current scene. Using a network structure with a better transfer learning effect can effectively alleviate the negative transfer phenomenon, but a great deal of labor cost is required to adjust the model parameters of the network structure for different business scenarios.
Disclosure of Invention
The disclosure provides a transfer learning training method and device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a transfer learning training method, including:
obtaining a source domain sample, wherein the source domain sample comprises a plurality of source domain data and a label value corresponding to the source domain data;
calculating a first cross entropy of each source domain data by using a first-stage model and calculating similarity weight according to the first cross entropy;
acquiring a target domain sample, wherein the target domain sample comprises a plurality of target domain data and a label value corresponding to the target domain data;
calculating a second cross entropy of each source domain data and each target domain data by using the two-stage model;
calculating a third cross entropy of each source domain data and each target domain data according to the second cross entropy and the similarity weight of each source domain data and each target domain data;
updating parameters of the two-stage model according to the third cross entropy of each source domain data and each target domain data;
and predicting or ranking the service data by the two-stage model after the parameters are updated.
According to another aspect of the present disclosure, there is provided a transfer learning training apparatus, including:
an acquisition module, used for acquiring a source domain sample, wherein the source domain sample comprises a plurality of source domain data and a label value corresponding to the source domain data;
the calculation module is used for calculating a first cross entropy of each source domain data by using the first-stage model and calculating similarity weight according to the first cross entropy;
the acquisition module is further used for acquiring a target domain sample, and the target domain sample comprises a plurality of target domain data and a label value corresponding to the target domain data;
the calculation module is further used for calculating a second cross entropy of each source domain data and each target domain data by using a two-stage model;
the calculation module is further used for calculating a third cross entropy of each source domain data and each target domain data according to the second cross entropy and the similarity weight of each source domain data and each target domain data;
the training module is used for updating parameters of the two-stage model according to the third cross entropy of each source domain data and each target domain data;
and the processing module is used for predicting or ranking the service data by the two-stage model after the parameters are updated.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any of the methods described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any of the above.
According to another aspect of the disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method of any of the above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic flow chart diagram of a transfer learning training method provided according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a transfer learning training apparatus provided according to an embodiment of the present disclosure;
fig. 3 is a block diagram of an electronic device for implementing a transfer learning training method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In order to mitigate the negative transfer effect and improve the stability of the transfer learning effect without consuming a large amount of labor cost, as shown in fig. 1, an embodiment of the present disclosure provides a transfer learning training method, including:
step 101, obtaining a source domain sample, where the source domain sample includes a plurality of source domain data and a tag value corresponding to the source domain data.
A source domain sample is obtained, wherein the source domain sample comprises a plurality of source domain data and label values corresponding to the source domain data, and each label value corresponding to the source domain data is 0 or 1. A model (namely the target domain model, in this embodiment the two-stage model) in a recommendation system (namely the target domain) whose business scene is small in scale or was built only recently has little training data, so its ranking estimation accuracy is low. Transfer learning obtains source domain samples from a source domain with more data to train the target domain model, so as to improve the ranking estimation accuracy of the target domain model.
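As a concrete illustration, the sample structure described above might be represented as follows; the field names and feature values are assumptions introduced for this sketch and are not taken from the disclosure, which only states that each sample pairs a data item with a label value of 0 or 1.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class LabeledSample:
    features: Dict[str, float]  # raw feature data of one data item (hypothetical keys)
    label: int                  # label value, 0 or 1

# Hypothetical source domain samples from a recommendation scene.
source_domain_samples = [
    LabeledSample(features={"user_age": 31.0, "item_ctr": 0.12}, label=1),
    LabeledSample(features={"user_age": 24.0, "item_ctr": 0.03}, label=0),
]
```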
Step 102, calculating a first cross entropy of each source domain data by using a one-stage model and calculating a similarity weight according to the first cross entropy.
A first cross entropy is calculated for each source domain data by using the one-stage model. The first cross entropy L1i can be calculated according to the following formula:

L1i = -[ y1i · log(ŷ1i) + (1 - y1i) · log(1 - ŷ1i) ]

wherein y1i is the label value of the i-th source domain data and ŷ1i is the estimated value of the i-th source domain data given by the one-stage model.

After the first cross entropy of the source domain data is calculated, the similarity weight W1i is calculated from the first cross entropy L1i according to a formula (presented only as an image in the original publication) in which e is a natural constant.
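The per-sample first cross entropy can be computed as in the following sketch. Because the exact similarity-weight formula is given only as an image in the source text, the mapping exp(-L) used below is merely one plausible assumption that decreases monotonically with the cross entropy; it is not asserted to be the patented formula.

```python
import numpy as np

def binary_cross_entropy(y: np.ndarray, y_hat: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-sample binary cross entropy for label values in {0, 1}."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)  # avoid log(0)
    return -(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

# First cross entropy of each source domain data, using the one-stage model's estimates.
y_src = np.array([1.0, 0.0, 1.0, 0.0])        # label values of the source domain data
y_hat_src = np.array([0.9, 0.2, 0.4, 0.7])    # estimated values from the one-stage model
l1 = binary_cross_entropy(y_src, y_hat_src)

# Similarity weight per source domain data; exp(-L) is an assumed mapping in which
# a larger first cross entropy yields a smaller weight.
w1 = np.exp(-l1)
```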
Step 103, obtaining a target domain sample, where the target domain sample includes a plurality of target domain data and a tag value corresponding to the target domain data.
And acquiring a target domain sample, wherein the target domain sample comprises a plurality of target domain data and a label value corresponding to the target domain data, and the label value corresponding to the target domain data is 0 or 1.
Step 104, calculating a second cross entropy of each source domain data and each target domain data by using the two-stage model.
The second cross entropy L2i of each source domain data is calculated according to the following formula:

L2i = -[ y2i · log(ŷ2i) + (1 - y2i) · log(1 - ŷ2i) ]

wherein y2i is the label value of the i-th source domain data and ŷ2i is the estimated value of the i-th source domain data given by the two-stage model.

The second cross entropy L2j of each target domain data is calculated according to the following formula:

L2j = -[ y2j · log(ŷ2j) + (1 - y2j) · log(1 - ŷ2j) ]

wherein y2j is the label value of the j-th target domain data and ŷ2j is the estimated value of the j-th target domain data given by the two-stage model.

Since the estimated value produced by the two-stage model for the source domain data may differ from that of the one-stage model, the second cross entropy of the source domain data also needs to be recalculated here.
Step 105, calculating a third cross entropy of each source domain data and each target domain data according to the second cross entropy and the similarity weight of each source domain data and each target domain data.
The third cross entropy L3i of each source domain data and the third cross entropy L3j of each target domain data are calculated according to the second cross entropy of each source domain data and each target domain data and the similarity weight of each source domain data:

L3i = W1i · L2i

L3j = L2j

wherein W1i is the similarity weight of the i-th source domain data, L2i is the second cross entropy of the i-th source domain data, and L2j is the second cross entropy of the j-th target domain data.
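The weighting expressed by the two formulas above is straightforward to write in code. The following sketch assumes the per-sample second cross entropies and the similarity weights are already available as arrays; the numeric values are illustrative only.

```python
import numpy as np

def third_cross_entropy(l2_src: np.ndarray, w1_src: np.ndarray, l2_tgt: np.ndarray):
    """Weight the source losses by the similarity weights; keep target losses unchanged."""
    l3_src = w1_src * l2_src   # L3i = W1i * L2i for source domain data
    l3_tgt = l2_tgt            # L3j = L2j for target domain data
    return l3_src, l3_tgt

# Example: second cross entropies from the two-stage model and weights from step 102.
l2_src = np.array([0.10, 1.60, 0.45])
w1_src = np.array([0.90, 0.20, 0.65])
l2_tgt = np.array([0.30, 0.80])
l3_src, l3_tgt = third_cross_entropy(l2_src, w1_src, l2_tgt)
# A typical training objective would then sum or average l3_src and l3_tgt
# before back-propagating into the two-stage model (step 106).
```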
Step 106, updating parameters of the two-stage model according to the third cross entropy of each source domain data and each target domain data.
Step 107, predicting or ranking the service data by the two-stage model after the parameters are updated.
The two-stage model is trained by using both the source domain samples of the source domain and the target domain samples of the target domain, so that the ranking estimation accuracy of a model in a target domain whose business scene is small in scale or was built only recently does not suffer from a lack of training data. The one-stage model is used to estimate the source domain samples, the first cross entropy of each source domain sample is calculated from the estimated value and the label value, and the similarity weight of the source domain sample is then calculated from the first cross entropy. When the two-stage model is trained, the second cross entropy of each source domain data and each target domain data is calculated first, the third cross entropy is calculated from the similarity weight and the second cross entropy, and the parameters of the two-stage model are finally updated according to the third cross entropy. This effectively reduces the negative transfer phenomenon caused by the difference between the source domain and the target domain. Compared with the prior art, in which the two-stage model is trained after the source domain samples have merely been screened, the mitigation of the negative transfer phenomenon is better, no specific network structure is required, and no huge labor cost is needed to adjust model parameters of a network structure for different business scenarios.
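As a condensed illustration of the flow above, the following PyTorch-style sketch performs one update step of the two-stage model under several stated assumptions: the one-stage model, two-stage model, optimizer, and batch structures are hypothetical placeholders, and the similarity-weight mapping exp(-L) is assumed as in the earlier sketch rather than taken from the patent.

```python
import torch

def train_step(one_stage_model, two_stage_model, optimizer, source_batch, target_batch):
    """One illustrative update; the optimizer is assumed to cover only the two-stage model."""
    bce = torch.nn.BCELoss(reduction="none")  # per-sample cross entropy

    x_src, y_src = source_batch  # source domain data and float 0/1 labels
    x_tgt, y_tgt = target_batch  # target domain data and float 0/1 labels

    # Step 102: first cross entropy from the one-stage model; no parameter update,
    # so the forward pass is wrapped in no_grad.
    with torch.no_grad():
        l1 = bce(one_stage_model(x_src), y_src)
        w1 = torch.exp(-l1)  # assumed monotone weight mapping

    # Steps 104-105: second cross entropies from the two-stage model, then the
    # weighted third cross entropy for source data and unweighted loss for target data.
    l2_src = bce(two_stage_model(x_src), y_src)
    l2_tgt = bce(two_stage_model(x_tgt), y_tgt)
    loss = torch.cat([w1 * l2_src, l2_tgt]).mean()

    # Step 106: update only the two-stage model's parameters.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```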
In step 102, the calculating of a first cross entropy of each source domain data by using the one-stage model and the calculating of a similarity weight according to the first cross entropy may, in one implementation, include: extracting feature data of each source domain data on preset N dimensions by using the one-stage model;
estimating the source domain data according to the feature data of each source domain data on preset N dimensions to obtain a predicted value corresponding to each source domain data;
calculating a first cross entropy of each source domain data according to the label value and the estimated value of the source domain data;
and calculating the similarity weight of each source domain data according to the first cross entropy of the source domain data.
The feature data of each source domain data on the preset N dimensions is extracted by using the one-stage model. The preset N dimensions may include user features, item features, behavior sequence features, request features, other service-related features and the like, and can be set specifically according to the scene types of the source domain and the target domain. The one-stage model performs estimation according to the feature data of the source domain data on these dimensions, which can improve the estimation accuracy of the one-stage model for the source domain data. After the estimated value is obtained, the first cross entropy of each source domain data is calculated according to the label value and the estimated value of that source domain data, and the similarity weight of each source domain data is then calculated according to its first cross entropy.
In step 104, the calculating of a second cross entropy of each source domain data and each target domain data by using the two-stage model may, in one implementation, include: extracting feature data of each source domain data and each target domain data on preset M dimensions by using the two-stage model, where the preset M dimensions include L more dimensions than the preset N dimensions, and the feature data on these L dimensions can represent differences between the source domain data and the target domain data;
estimating each source domain data and each target domain data according to the characteristic data of each source domain data and each target domain data on preset M dimensions to obtain an estimated value of each source domain data and each target domain data;
and calculating a second cross entropy of each source domain data and each target domain data according to the label value and the estimated value of each source domain data and each target domain data.
The feature data of each source domain data and each target domain data on the preset M dimensions is extracted by using the two-stage model, where the preset M dimensions include L more dimensions than the preset N dimensions, and the feature data on these L dimensions can represent the difference between the source domain data and the target domain data. The additional L dimensions, such as APP feature data and scene feature data, can clearly represent the difference between the source domain and the target domain. When the two-stage model estimates each source domain data and each target domain data, this feature data enables the two-stage model to clearly distinguish source domain data with a larger difference, estimate the data more accurately, and finally obtain a more accurate cross entropy. After the feature data is extracted, each source domain data and each target domain data are estimated according to their feature data on the preset M dimensions to obtain an estimated value of each source domain data and each target domain data, and the second cross entropy of each source domain data and each target domain data is calculated according to the label value and the estimated value of each source domain data and each target domain data.
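One way to picture the relationship between the preset N dimensions and the preset M dimensions is as a feature configuration. The dimension names below are illustrative assumptions drawn loosely from the examples in the text (user, item, behavior sequence, request, APP, and scene features), not a prescribed schema.

```python
# Illustrative configuration only; the concrete dimension names are assumptions.
PRESET_N_DIMENSIONS = [
    "user_features",
    "item_features",
    "behavior_sequence_features",
    "request_features",
]

# The extra L dimensions expose source/target differences, e.g. which APP or
# business scene a sample comes from.
EXTRA_L_DIMENSIONS = [
    "app_features",
    "scene_features",
]

# The two-stage model sees M = N + L dimensions; the one-stage model sees only N.
PRESET_M_DIMENSIONS = PRESET_N_DIMENSIONS + EXTRA_L_DIMENSIONS

def extract_features(raw_sample: dict, dimensions: list) -> list:
    """Hypothetical helper: keep only the configured dimensions of a raw sample."""
    return [raw_sample.get(name) for name in dimensions]
```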
After the first cross entropy of each source domain data is calculated by using the one-stage model in step 102, in one embodiment, no parameter update is performed on the one-stage model.
Because no parameter update is performed on the one-stage model after the first cross entropy of each source domain data has been calculated, the one-stage model is not affected by source domain data with a large difference, so no negative transfer effect is produced in the one-stage model, and the similarity weights of the source domain data are prevented from becoming inaccurate due to a reduction of the one-stage model's estimation capability on the source domain data.
After calculating the first cross entropy of each source domain data by using the one-stage model and calculating the similarity weight according to the first cross entropy in step 102, in an embodiment, the source domain data with the similarity weight less than or equal to a preset threshold in the source domain samples are removed.
Removing the source domain data whose similarity weight is less than or equal to the preset threshold from the source domain samples means that the source domain samples are screened according to the similarity weight: the source domain data with smaller similarity weights, that is, the source domain data that differ most from the target domain, are removed. As a result, the difference between the source domain data finally used to train the two-stage model and the target domain stays within a controllable range, which effectively mitigates the negative transfer phenomenon and further improves the transfer learning effect.
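This screening step can be implemented as a simple filter over the source domain samples; the helper function and the threshold value in the example below are hypothetical.

```python
def filter_source_samples(samples, weights, threshold):
    """Keep only source domain samples whose similarity weight exceeds the preset threshold."""
    return [s for s, w in zip(samples, weights) if w > threshold]

# Example with an arbitrary threshold of 0.3: samples "a" and "c" are kept.
kept = filter_source_samples(samples=["a", "b", "c"], weights=[0.9, 0.2, 0.5], threshold=0.3)
```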
An embodiment of the present disclosure provides a transfer learning training apparatus, as shown in fig. 2, the apparatus includes:
an acquisition module 10, used for acquiring a source domain sample, wherein the source domain sample comprises a plurality of source domain data and a label value corresponding to the source domain data;
a calculating module 20, configured to calculate a first cross entropy of each source domain data by using the first-stage model and calculate a similarity weight according to the first cross entropy;
the acquisition module 10 is further configured to obtain a target domain sample, where the target domain sample includes a plurality of target domain data and a tag value corresponding to the target domain data;
the calculating module 20 is further configured to calculate a second cross entropy of each source domain data and each target domain data by using a two-stage model;
the calculating module 20 is further configured to calculate a third cross entropy of each source domain data and each target domain data according to the second cross entropy and the similarity weight of each source domain data and each target domain data;
the training module 30 is configured to perform parameter updating on the two-stage model according to the third cross entropy of each source domain data and each target domain data;
and the processing module 40 is used for predicting or ranking the service data by the two-stage model after the parameters are updated.
The computing module 20 is further configured to extract feature data of each source domain data in preset N dimensions by using a one-stage model;
the computing module 20 is further configured to estimate the source domain data according to the feature data of each source domain data in preset N dimensions, so as to obtain a pre-estimated value corresponding to each source domain data;
the calculating module 20 is further configured to calculate a first cross entropy of each source domain data according to the label value and the estimated value of the source domain data;
the calculating module 20 is further configured to calculate a similarity weight of each source domain data according to the first cross entropy of the source domain data.
The computing module 20 is further configured to extract feature data of each source domain data and each target domain data in preset M dimensions by using the two-stage model, where the preset M dimensions include L more dimensions than the preset N dimensions, and the feature data in these L dimensions can represent differences between the source domain data and the target domain data;
the calculation module 20 is further configured to estimate each source domain data and each target domain data according to the feature data of each source domain data and each target domain data in preset M dimensions, so as to obtain an estimated value of each source domain data and each target domain data;
the calculating module 20 is further configured to calculate a second cross entropy of each source domain data and each target domain data according to the label value and the estimated value of each source domain data and each target domain data.
The training module 30 is further configured not to perform a parameter update on the one-stage model after the first cross entropy of each source domain data has been calculated by using the one-stage model.
The calculating module 20 is further configured to remove the source domain data with the similarity weight less than or equal to the preset threshold in the source domain sample.
In the technical solution of the present disclosure, the acquisition, storage, application and the like of the personal information of the users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 3 illustrates a schematic block diagram of an example electronic device 300 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 3, the device 300 includes a computing unit 301 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 302 or a computer program loaded from a storage unit 308 into a Random Access Memory (RAM) 303. Various programs and data required for the operation of the device 300 can also be stored in the RAM 303. The computing unit 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
Various components in device 300 are connected to I/O interface 305, including: an input unit 306 such as a keyboard, a mouse, or the like; an output unit 307 such as various types of displays, speakers, and the like; a storage unit 308 such as a magnetic disk, optical disk, or the like; and a communication unit 309 such as a network card, modem, wireless communication transceiver, etc. The communication unit 309 allows the device 300 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 301 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 301 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 301 performs the various methods and processes described above, such as the transfer learning training method. For example, in some embodiments, the transfer learning training method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 308. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 300 via the ROM 302 and/or the communication unit 309. When the computer program is loaded into the RAM 303 and executed by the computing unit 301, one or more steps of the transfer learning training method described above may be performed. Alternatively, in other embodiments, the computing unit 301 may be configured to perform the transfer learning training method in any other suitable manner (e.g., by way of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (10)

1. A transfer learning training method, comprising:
obtaining a source domain sample, wherein the source domain sample comprises a plurality of source domain data and a label value corresponding to the source domain data;
calculating a first cross entropy of each source domain data by using a first-stage model and calculating similarity weight according to the first cross entropy;
acquiring a target domain sample, wherein the target domain sample comprises a plurality of target domain data and a label value corresponding to the target domain data;
calculating a second cross entropy of each source domain data and each target domain data by using the two-stage model;
calculating a third cross entropy of each source domain data and each target domain data according to the second cross entropy and the similarity weight of each source domain data and each target domain data;
updating parameters of the two-stage model according to the third cross entropy of each source domain data and each target domain data;
and predicting or ranking the service data by the two-stage model after the parameters are updated.
2. The method of claim 1, the calculating a first cross entropy for each source domain data using a one-stage model and calculating similarity weights from the first cross entropy, comprising:
extracting feature data of each source domain data on preset N dimensions by using a first-stage model;
estimating the source domain data according to the feature data of each source domain data on preset N dimensions to obtain a predicted value corresponding to each source domain data;
calculating a first cross entropy of each source domain data according to the label value and the estimated value of the source domain data;
and calculating the similarity weight of each source domain data according to the first cross entropy of the source domain data.
3. The method of claim 2, the calculating a second cross entropy for each source domain data and each target domain data using a two-stage model, comprising:
extracting feature data of each source domain data and each target domain data on preset M dimensions by using a two-stage model, wherein the preset M dimensions are more than the preset N dimensions by L dimensions, and the feature data on the L dimensions can represent the difference of the source domain data and the target domain data;
estimating each source domain data and each target domain data according to the characteristic data of each source domain data and each target domain data on preset M dimensions to obtain an estimated value of each source domain data and each target domain data;
and calculating a second cross entropy of each source domain data and each target domain data according to the label value and the estimated value of each source domain data and each target domain data.
4. The method of claim 1, after calculating the first cross entropy of each source domain data using the one-stage model, further comprising:
not performing a parameter update on the one-stage model after the first cross entropy of each source domain data is calculated by the one-stage model.
5. The method of claim 1, after calculating a first cross entropy for each source domain data using the one-phase model and calculating a similarity weight based on the first cross entropy, further comprising:
and eliminating the source domain data with the similarity weight less than or equal to a preset threshold value in the source domain samples.
6. A transfer learning training apparatus comprising:
an acquisition module, used for acquiring a source domain sample, wherein the source domain sample comprises a plurality of source domain data and a label value corresponding to the source domain data;
the calculation module is used for calculating a first cross entropy of each source domain data by using the first-stage model and calculating similarity weight according to the first cross entropy;
the acquisition module is further used for acquiring a target domain sample, and the target domain sample comprises a plurality of target domain data and a label value corresponding to the target domain data;
the calculation module is further used for calculating a second cross entropy of each source domain data and each target domain data by using a two-stage model;
the calculation module is further used for calculating a third cross entropy of each source domain data and each target domain data according to the second cross entropy and the similarity weight of each source domain data and each target domain data;
the training module is used for updating parameters of the two-stage model according to the third cross entropy of each source domain data and each target domain data;
and the processing module is used for predicting or ranking the service data by the two-stage model after the parameters are updated.
7. The apparatus of claim 6, comprising:
the computing module is further used for extracting feature data of each source domain data on preset N dimensions by using a one-stage model;
the computing module is further configured to predict the source domain data according to the feature data of each source domain data in preset N dimensions to obtain a predicted value corresponding to each source domain data;
the calculation module is further used for calculating a first cross entropy of each source domain data according to the label value and the estimated value of the source domain data;
the calculating module is further used for calculating the similarity weight of each source domain data according to the first cross entropy of the source domain data.
8. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
9. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
10. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-5.
CN202111096355.1A 2021-09-17 2021-09-17 Transfer learning training method and device, electronic equipment and storage medium Pending CN114021714A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111096355.1A CN114021714A (en) 2021-09-17 2021-09-17 Transfer learning training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111096355.1A CN114021714A (en) 2021-09-17 2021-09-17 Transfer learning training method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114021714A true CN114021714A (en) 2022-02-08

Family

ID=80054633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111096355.1A Pending CN114021714A (en) 2021-09-17 2021-09-17 Transfer learning training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114021714A (en)

Similar Documents

Publication Publication Date Title
CN113343803A (en) Model training method, device, equipment and storage medium
CN113870334B (en) Depth detection method, device, equipment and storage medium
CN114881129A (en) Model training method and device, electronic equipment and storage medium
CN114282670A (en) Neural network model compression method, device and storage medium
CN113627536A (en) Model training method, video classification method, device, equipment and storage medium
CN112528641A (en) Method and device for establishing information extraction model, electronic equipment and readable storage medium
CN113378855A (en) Method for processing multitask, related device and computer program product
CN112528995A (en) Method for training target detection model, target detection method and device
CN115456167A (en) Lightweight model training method, image processing device and electronic equipment
CN114861059A (en) Resource recommendation method and device, electronic equipment and storage medium
CN114239853A (en) Model training method, device, equipment, storage medium and program product
CN113904943A (en) Account detection method and device, electronic equipment and storage medium
CN114186681A (en) Method, apparatus and computer program product for generating model clusters
CN113204614A (en) Model training method, method and device for optimizing training data set
CN112651453A (en) Loss function adaptive method, device, equipment and storage medium
CN113408641B (en) Training of resource generation model and generation method and device of service resource
CN116933189A (en) Data detection method and device
CN114021714A (en) Transfer learning training method and device, electronic equipment and storage medium
CN114817476A (en) Language model training method and device, electronic equipment and storage medium
CN113361621A (en) Method and apparatus for training a model
CN114048863A (en) Data processing method, data processing device, electronic equipment and storage medium
CN112560437A (en) Text smoothness determination method and device and target model training method and device
CN114139605A (en) Distributed model training method, system, device and storage medium
CN113850072A (en) Text emotion analysis method, emotion analysis model training method, device, equipment and medium
CN113361719A (en) Incremental learning method based on image processing model and image processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination