CN116467629A - Training method of user identification model, user identification method and system - Google Patents

Training method of user identification model, user identification method and system

Info

Publication number
CN116467629A
CN116467629A (application CN202310438653.7A)
Authority
CN
China
Prior art keywords
user
task
target
feature
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310438653.7A
Other languages
Chinese (zh)
Inventor
唐浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AlipayCom Co ltd
Original Assignee
AlipayCom Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AlipayCom Co ltd filed Critical AlipayCom Co ltd
Priority to CN202310438653.7A
Publication of CN116467629A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G06N3/08 Learning methods

Abstract

According to the training method of the user identification model, the user identification method, and the system provided herein, a sample data set of a plurality of user samples is determined based on a pre-trained user identification model corresponding to a first user category, the plurality of user samples including users of the first user category and users of a second user category. The sample data set is then input into a preset multi-task model to obtain a prediction result for each task in a plurality of tasks, the plurality of tasks including predicting users of the first user category and predicting users of the second user category, and the preset multi-task model sharing network parameters with the pre-trained user identification model. The preset multi-task model is updated based on the prediction results to obtain a target user identification model corresponding to the second user category. This scheme can improve the accuracy of user identification.

Description

Training method of user identification model, user identification method and system
Technical Field
The present disclosure relates to the field of user identification, and in particular, to a training method for a user identification model, a user identification method and a system.
Background
In recent years, with the rapid development of internet technology, user data have become increasingly abundant, and users can be divided into a plurality of user categories based on such data. For some of these categories it is easy to accumulate user samples, so user identification can be performed by training a corresponding user identification model on the accumulated samples; for other categories, accumulating samples is difficult.
At present, user identification for categories where samples are hard to accumulate is performed by directly cold-starting an existing model or by approximate modeling with a similar model. However, both approaches produce a large number of sparse samples, which greatly degrades model learning performance and lowers the accuracy of user identification. There is therefore a need for a training method of a user identification model, a user identification method, and a system with higher identification accuracy.
Disclosure of Invention
This specification provides a training method of a user identification model, a user identification method, and a system with higher identification accuracy.
In a first aspect, the present disclosure provides a training method for a user identification model, including: determining a sample dataset of a plurality of user samples based on a pre-trained user recognition model corresponding to a first user category, the plurality of user samples including users of the first user category and users of a second user category; inputting the sample data set into a preset multitasking model to obtain a prediction result of each task in a plurality of tasks, wherein the tasks comprise users predicting a first user class and users predicting a second user class, and the preset multitasking model and the pre-training user identification model share network parameters; and updating the preset multitasking model based on the prediction result to obtain a target user identification model, wherein the target user identification model is configured to identify a target user corresponding to the second user category.
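The three steps of the first aspect (sharing the pre-trained parameters, predicting each task, and updating on the prediction results) can be sketched in miniature. The toy logistic model, learning rate, and category names below are illustrative assumptions for demonstration, not the patent's actual network:

```python
# Hypothetical sketch of the training flow: a multi-task model is
# initialized with (shares) the network parameter of a pre-trained
# single-task model, then updated on samples of both user categories.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_multitask(samples, pretrained_shared, lr=0.1, epochs=200):
    """samples: list of (feature, {task: label}) pairs.
    pretrained_shared: weight migrated from the pre-trained model."""
    shared_w = pretrained_shared           # parameter sharing / transfer
    task_w = {"first_category": 0.0, "second_category": 0.0}
    for _ in range(epochs):
        for x, labels in samples:
            for task, y in labels.items():
                pred = sigmoid(shared_w * x + task_w[task] * x)
                grad = (pred - y) * x      # gradient of the log loss
                shared_w -= lr * grad      # shared part sees every task
                task_w[task] -= lr * grad  # task tower sees only its task
    return shared_w, task_w

# toy data: positive feature -> positive label for both tasks
samples = [(x, {"first_category": 1 if x > 0 else 0,
                "second_category": 1 if x > 0 else 0})
           for x in [-2.0, -1.0, 1.0, 2.0]]
shared_w, task_w = train_multitask(samples, pretrained_shared=0.5)
```

Because every task's gradient flows into `shared_w`, the shared parameter is reinforced by samples of both user categories, which is the point of sharing parameters with the pre-trained model.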
In some embodiments, the users of the first user category include users having target rights directly, the users of the second user category include users having the target rights after reaching a preset condition, and the target rights include rights to open at least one sub-account for the users.
In some embodiments, the determining a sample data set of a plurality of user samples based on a pre-trained user identification model corresponding to a first user category includes: obtaining a user data set of an original user set; inputting the user data set into the pre-training user identification model to obtain a first original user corresponding to the first user category in the original user set; and determining a sample dataset of the plurality of user samples based on the first original user and the user dataset.
In some embodiments, the determining a sample dataset of the plurality of user samples based on the first original user and the user dataset comprises: selecting at least one original user except the first original user from the original user set to obtain a second original user of the second user category; selecting user data corresponding to the second original user from the user data set to obtain target user data; and obtaining current user data of at least one user of the first user category, and taking the current user data and the target user data as a sample data set of the plurality of user samples.
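The selection steps in the two embodiments above can be sketched as follows; the threshold stub standing in for the pre-trained model and all user identifiers are hypothetical:

```python
# Illustrative sketch of sample-set construction: the pre-trained model
# (stubbed here as a score threshold) picks out first-category users,
# the remaining users are taken as second-category users, and their data
# is merged with current data of known first-category users.
def build_sample_dataset(original_users, user_data, first_category_data,
                         is_first_category):
    first_users = {u for u in original_users if is_first_category(u)}
    second_users = original_users - first_users      # second original users
    target_data = {u: user_data[u] for u in second_users}
    return {**first_category_data, **target_data}    # combined sample set

original_users = {"u1", "u2", "u3"}
user_data = {"u1": [0.9], "u2": [0.2], "u3": [0.1]}
dataset = build_sample_dataset(
    original_users, user_data,
    first_category_data={"u9": [0.8]},
    is_first_category=lambda u: user_data[u][0] > 0.5)
```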
In some embodiments, the inputting the sample data set into a preset multitasking model to obtain a prediction result of each task of the plurality of tasks includes: performing multidimensional feature extraction on the sample data set to obtain a user feature set of each user sample in the plurality of user samples; and inputting the user characteristic set into the preset multi-task model to obtain a prediction result of each task in the plurality of tasks.
In some embodiments, the set of user features includes at least one of user explicit features, user implicit features, continuous features, or user behavior sequence features.
In some embodiments, the preset multi-task model includes a feature extraction network group and a prediction network corresponding to each task; and the inputting the user feature set into the preset multi-task model to obtain a prediction result of each task in the plurality of tasks includes: performing feature conversion on the features in the user feature set to obtain a target user feature set; inputting the target user feature set into the feature extraction network group to obtain a task feature corresponding to each task and a sharing feature among the plurality of tasks; fusing the task feature corresponding to each task with the sharing feature to obtain a target task feature corresponding to each task; and inputting each target task feature into the prediction network of the corresponding task to obtain the prediction result of each task.
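As a rough illustration of this structure, the sketch below uses element-wise scaling as a stand-in for the extraction networks, concatenation as the fusion, and a sum as a stand-in for each prediction head; all weights and shapes are assumptions:

```python
# Toy version of the embodiment: per-task feature extractors, a shared
# extractor, fusion of task and sharing features, and per-task heads.
def extract(features, weights):
    # stand-in extraction network: element-wise scaling
    return [f * w for f, w in zip(features, weights)]

def predict_all_tasks(user_features, task_weights, shared_weights):
    shared_feat = extract(user_features, shared_weights)
    results = {}
    for task, w in task_weights.items():
        task_feat = extract(user_features, w)
        fused = task_feat + shared_feat    # fuse task + sharing features
        results[task] = sum(fused)         # toy prediction network
    return results

out = predict_all_tasks(
    user_features=[1.0, 2.0],
    task_weights={"task_a": [0.5, 0.5], "task_b": [1.0, 0.0]},
    shared_weights=[0.1, 0.1])
```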
In some embodiments, the set of feature extraction networks comprises a multi-layer feature extraction network; and inputting the target user feature set into the feature extraction network group to obtain a task feature corresponding to each task and a sharing feature among the tasks, including: selecting a target feature extraction network corresponding to a first layer from the multi-layer feature extraction networks, inputting the target user feature set into the target feature extraction network to obtain initial task features corresponding to each task and initial sharing features among the tasks, fusing the initial sharing features, the initial task features and the target user feature set to obtain target sample features, taking the target sample features as the target user feature set, taking a next-layer feature extraction network of the target feature extraction network as the target feature extraction network, and returning to execute the step of inputting the target user feature set into the target feature extraction network until the target feature extraction network is the last-layer feature extraction network to obtain task features corresponding to each task and sharing features among the tasks.
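The layer-by-layer loop in this embodiment can be sketched as follows, with scaled copies of the input standing in for the real task and sharing sub-networks (an assumption made purely to keep the example self-contained):

```python
# Sketch of multi-layer extraction: at each non-final layer, the task
# features, the sharing feature, and the layer input are fused and fed
# to the next layer; the last layer emits the final features.
def layer(inputs, scale):
    return [x * scale for x in inputs]   # stand-in extraction network

def multi_layer_extract(user_features, num_layers, tasks):
    current = user_features
    for depth in range(num_layers):
        task_feats = {t: layer(current, 0.5) for t in tasks}
        shared_feat = layer(current, 0.25)
        if depth < num_layers - 1:
            # fuse sharing feature, task features and layer input,
            # and take the fusion as input of the next layer
            current = [sum(vals) for vals in
                       zip(current, shared_feat, *task_feats.values())]
    return task_feats, shared_feat

task_feats, shared_feat = multi_layer_extract(
    [1.0], num_layers=2, tasks=["task_a", "task_b"])
```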
In some embodiments, each layer of feature extraction network in the multi-layer feature extraction network includes a task feature extraction sub-network corresponding to each task and a sharing feature extraction sub-network among the plurality of tasks; and the inputting the target user feature set into the target feature extraction network to obtain an initial task feature corresponding to each task and an initial sharing feature among the plurality of tasks includes: taking each task as a target task, inputting the target user feature set into the task feature extraction sub-network corresponding to the target task to obtain an initial task feature corresponding to the target task, and inputting the target user feature set into the sharing feature extraction sub-network to obtain the initial sharing feature.
In some embodiments, the inputting the target user feature set into the task feature extraction sub-network corresponding to the target task to obtain an initial task feature corresponding to the target task includes: performing relative position coding on the features in the target user feature set to obtain a candidate task feature set of the target task; and fusing the features in the candidate task feature set to obtain initial task features corresponding to the target task.
In some embodiments, the encoding the relative positions of the user features in the target user feature set to obtain a candidate task feature set of the target task includes: performing relative position coding on the user features in the target user feature set to obtain an initial user coding feature set; and performing spatial transformation on the features in the initial user coding feature set based on a preset gating linear activation function to obtain a candidate task feature set of the target task.
In some embodiments, the fusing the features in the candidate task feature set to obtain the initial task feature corresponding to the target task includes: based on a sample domain of the user sample, adjusting the coding position corresponding to the candidate task feature set to obtain an adjusted candidate task feature set; performing feature transformation on the adjusted candidate task feature set to obtain a transformed candidate task feature set; and adjusting the coding positions corresponding to the transformed candidate task feature sets to obtain initial task features of the target task.
In some embodiments, the adjusting the encoding position corresponding to the candidate task feature set to obtain an adjusted candidate task feature set includes: obtaining position coding adjustment parameters, and determining the position weight of a coding position corresponding to the candidate task feature set based on the position coding adjustment parameters; and weighting the features in the candidate task feature set based on the position weight to obtain the adjusted candidate task feature set.
In some embodiments, the position-coding adjustment parameters include a feature mapping parameter and a normalization parameter, the feature mapping parameter including a first mapping parameter and a second mapping parameter; and determining the position weight of the coding position corresponding to the candidate task feature set based on the position coding adjustment parameter, including: mapping the features in the candidate task feature set to a preset first feature space based on the first mapping parameter to obtain a first mapping feature set, performing nonlinear change on the features in the first mapping feature set to obtain a nonlinear feature set, mapping the features in the nonlinear feature set to a preset second feature space based on the second mapping parameter to obtain a second mapping feature set, and normalizing the features in the second mapping feature set based on the normalization parameter to obtain the position weight corresponding to the coding position.
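The position-weight computation in this embodiment (first mapping, nonlinear change, second mapping, normalization) can be sketched as a tiny two-layer network with a softmax; all parameter values and the choice of ReLU and softmax are illustrative assumptions:

```python
# Sketch of position weights: map features into a first space, apply a
# nonlinearity, map into a second space, then normalize into weights.
import math

def position_weights(features, w1, w2):
    hidden = [f * w1 for f in features]        # first mapping parameter
    hidden = [max(0.0, h) for h in hidden]     # nonlinear change (ReLU)
    scores = [h * w2 for h in hidden]          # second mapping parameter
    exps = [math.exp(s) for s in scores]       # softmax normalization
    total = sum(exps)
    return [e / total for e in exps]

weights = position_weights([1.0, 2.0, -1.0], w1=1.0, w2=1.0)
```

The resulting weights sum to one and can be used directly to weight the features at each coding position, as the preceding embodiment describes.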
In some embodiments, updating the preset multitasking model based on the prediction result to obtain a target user identification model includes: obtaining task labeling results corresponding to the plurality of user samples in the plurality of tasks; comparing the task labeling result with the prediction result to obtain user joint loss corresponding to each task in the plurality of tasks; and converging the preset multi-task model based on the user joint loss to obtain a target multi-task model, and taking the target multi-task model as the target user identification model.
In some embodiments, comparing the task labeling result with the prediction result to obtain a user joint loss corresponding to each task of the plurality of tasks, including: comparing the task labeling result with the prediction result to obtain task loss of each user sample in the plurality of user samples, wherein the task loss comprises loss corresponding to each task in the plurality of tasks; selecting at least one corresponding user sample from the plurality of user samples based on the task type of each task to obtain a target user sample corresponding to each task; and accumulating the task losses of the target user sample under the corresponding tasks to obtain the user joint losses corresponding to each task.
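The joint-loss computation above (per-task losses, sample selection by task type, accumulation) can be sketched as follows; the squared-error loss and the `domains` field marking which tasks a sample belongs to are illustrative assumptions:

```python
# Sketch of per-task joint loss: compare prediction with labeling result
# for each task, keep only samples relevant to that task, and accumulate.
def joint_losses(samples, tasks):
    losses = {t: 0.0 for t in tasks}
    for s in samples:
        for t in tasks:
            if t in s["domains"]:              # target samples for task t
                err = s["pred"][t] - s["label"][t]
                losses[t] += err * err         # accumulate task loss
    return losses

samples = [
    {"pred": {"a": 0.8, "b": 0.2}, "label": {"a": 1.0, "b": 0.0},
     "domains": {"a"}},
    {"pred": {"a": 0.5, "b": 0.9}, "label": {"a": 0.0, "b": 1.0},
     "domains": {"a", "b"}},
]
losses = joint_losses(samples, tasks=["a", "b"])
```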
In some embodiments, the plurality of tasks further comprises at least one auxiliary task comprising at least one of predicting other categories of users or predicting at least one user behavior of the user.
In a second aspect, the present specification provides a user identification method, including: obtaining user data of each user in a user set; inputting the user data into a target user identification model to obtain a category prediction result of each user, wherein the target user identification model comprises a multi-task model obtained by transfer learning based on a pre-trained user identification model corresponding to a first user category; and selecting at least one user corresponding to the second user category from the user set based on the category prediction result to obtain a target user.
In some embodiments, the users of the first user category include users having target rights directly, the users of the second user category include users having the target rights after reaching a preset condition, and the target rights include rights to open at least one sub-account for the users.
In a third aspect, the present disclosure further provides a training system for a user identification model, including: at least one storage medium storing at least one set of instructions for performing training of a user identification model; and at least one processor communicatively coupled to the at least one storage medium, wherein the at least one processor reads the at least one instruction set and performs the method of training the user identification model described in the first aspect of the specification as indicated by the at least one instruction set when the training system of the user identification model is running.
In a fourth aspect, the present specification further provides a user identification system, including: at least one storage medium storing at least one instruction set for user identification; and at least one processor communicatively coupled to the at least one storage medium, wherein when the user identification system is running, the at least one processor reads the at least one instruction set and, as directed by the at least one instruction set, performs the user identification method described in the second aspect of the present specification.
According to the training method of the user identification model, the user identification method, and the system provided in this specification, a sample data set of a plurality of user samples is determined based on a pre-trained user identification model corresponding to a first user category, the plurality of user samples including users of the first user category and users of a second user category. The sample data set is then input into a preset multi-task model to obtain a prediction result for each task in a plurality of tasks, where the tasks include predicting users of the first user category and predicting users of the second user category, and the preset multi-task model shares network parameters with the pre-trained user identification model. The preset multi-task model is updated based on the prediction results to obtain a target user identification model, which is configured to identify target users corresponding to the second user category. In this scheme, the pre-trained user identification model corresponding to the first user category can be used for a cold start, so that a small number of user samples corresponding to the second user category are accumulated; the network parameters of the pre-trained user identification model are then migrated to the preset multi-task model, and the network parameters of the preset multi-task model are updated with the accumulated sample data of the plurality of user categories. This improves the precision of the trained target user identification model and, in turn, the accuracy of user identification.
Other functions of the training method of the user identification model, the user identification method, and the system provided in this specification will be partially set forth in the following description. The numerical and exemplary content presented below will be apparent to those of ordinary skill in the art in view of the description. The inventive aspects of the training methods, user identification methods, and systems provided herein may be fully explained by practicing or using the methods, apparatuses, and combinations described in the detailed examples below.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present description, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows an application scenario of a user identification system according to an embodiment of the present disclosure;
FIG. 2 illustrates a hardware architecture diagram of a computing device provided in accordance with an embodiment of the present description;
FIG. 3 illustrates a flowchart of a training method for a user identification model provided in accordance with an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a feature extraction sub-network provided in accordance with an embodiment of the present disclosure;
FIG. 5 shows a network architecture comparison schematic of a feature extraction sub-network and a BERT network provided in accordance with an embodiment of the present description;
FIG. 6 shows a schematic diagram of an Adaptor module provided in accordance with an embodiment of the present disclosure;
FIG. 7 illustrates a schematic diagram of a network framework for migration learning of a preset multitasking model provided in accordance with an embodiment of the present disclosure;
FIG. 8 illustrates a training schematic of a preset multitasking model provided in accordance with an embodiment of the present disclosure; and
fig. 9 shows a flowchart of a user identification method according to an embodiment of the present specification.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the disclosure. Thus, the present description is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. For example, as used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. The terms "comprises," "comprising," "includes," and/or "including," when used in this specification, are taken to specify the presence of stated integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
These and other features of the present specification, as well as the operation and functions of the related elements of structure, the combination of parts, and economies of manufacture, will become more apparent from the following description with reference to the accompanying drawings, all of which form a part of this specification. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended to limit the scope of the specification. It should also be understood that the drawings are not drawn to scale.
The flowcharts used in this specification illustrate operations implemented by systems according to some embodiments in this specification. It should be clearly understood that the operations of the flow diagrams may be implemented out of order. Rather, operations may be performed in reverse order or concurrently. Further, one or more other operations may be added to the flowchart. One or more operations may be removed from the flowchart.
For convenience of description, the present specification will explain terms that will appear from the following description as follows:
balance +: a balance upgrading service, wherein a user can open a balance sub-account through signing;
one-touch admittance to the user: the user can directly sign up by clicking the sign-up, so that the authority for opening balance + service is obtained;
non-one-key admittance to the user: after the user needs to complete the corresponding task and reach the condition, the right of balance + service can be opened.
In recent years, with the rapid development of internet technology, user data have become increasingly abundant, and users can be divided into a plurality of user categories based on such data. For some of these categories (hereinafter referred to as first user categories), user samples are easy to accumulate, so user identification can be performed by training a corresponding user identification model on the accumulated samples; for others (hereinafter referred to as second user categories), samples are difficult to accumulate, so a corresponding user identification model cannot be trained for user identification.
At present, users of the second user category are usually identified by directly cold-starting the user identification model corresponding to the first user category, or by approximate modeling with that model. A large number of sparse samples then appear, which greatly reduces model learning performance, lowers the training precision of the user identification model corresponding to the second user category, and thus lowers the accuracy of identifying users of the second user category.
To address the above technical problems, the inventors of this specification propose the following technical idea: perform a cold start with the pre-trained user identification model corresponding to the first user category, thereby accumulating a small number of user samples corresponding to the second user category; then migrate the network parameters of the pre-trained user identification model to a preset multi-task model and update the network parameters of the preset multi-task model with the accumulated sample data of the plurality of user categories, thereby improving the precision of the trained target user identification model and, in turn, the accuracy of user identification.
Fig. 1 shows a schematic application scenario of a user identification system 100 according to an embodiment of the present disclosure. The user identification system (hereinafter referred to as system 100) may be applied to user identification in any scenario, such as user identification in financial scenarios, user identification in medical scenarios, or other user identification in scenarios where there is some data or sample missing, etc. As shown in fig. 1, system 100 may include a user 110, a client 120, a server 130, and a network 140.
User 110 may be the user that triggers user identification, and user 110 may perform user identification for the second user category in client 120.
The client 120 may be a device that performs user identification in response to a user identification operation of the user 110. In some embodiments, the user identification method may be performed on the client 120. At this time, the client 120 may store data or instructions for performing the user identification method described in the present specification, and may perform or be used to perform the data or instructions. In some embodiments, the client 120 may include a hardware device having a data information processing function and a program necessary to drive the hardware device to operate. As shown in fig. 1, a client 120 may be communicatively coupled to a server 130. In some embodiments, the server 130 may be communicatively coupled to a plurality of clients 120. In some embodiments, client 120 may interact with server 130 over network 140 to receive or send messages, etc. In some embodiments, the client 120 may include a mobile device, a tablet, a laptop, a built-in device of a motor vehicle, or the like, or any combination thereof. In some embodiments, the mobile device may include a smart home device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home device may include a smart television, a desktop computer, or the like, or any combination. In some embodiments, the smart mobile device may include a smart phone, personal digital assistant, gaming device, navigation device, etc., or any combination thereof. In some embodiments, the virtual reality device or augmented reality device may include a virtual reality helmet, virtual reality glasses, virtual reality handles, an augmented reality helmet, augmented reality glasses, an augmented reality handle, or the like, or any combination thereof. For example, the virtual reality device or the augmented reality device may include google glass, head mounted display, VR, or the like. 
In some embodiments, the built-in devices in the motor vehicle may include an on-board computer, an on-board television, and the like. In some embodiments, the client 120 may be a device with positioning technology for locating the position of the client 120.
In some embodiments, client 120 may be installed with one or more Applications (APPs). The APP can provide the user 110 with the ability to interact with the outside world via the network 140 as well as an interface. The APP includes, but is not limited to: web browser-like APP programs, search-like APP programs, chat-like APP programs, shopping-like APP programs, video-like APP programs, financial-like APP programs, instant messaging tools, mailbox clients, social platform software, and the like. In some embodiments, the client 120 may have a target APP installed thereon. The target APP is capable of user identification for the client 120. In some embodiments, the user 110 may also trigger a user identification request through the target APP. The target APP may perform the user identification method described in the present specification in response to the user identification request. The user identification method will be described in detail later.
The server 130 may be a server providing various services, such as a server that performs user identification on the user data of a user set obtained by the client 120, or a server that provides other services for the user identification that the client 120 performs on the user data. In some embodiments, the user identification method may be performed on the server 130. In this case, the server 130 may store data or instructions for performing the user identification method described in the present specification, and may execute or be used to execute the data or instructions. In some embodiments, the server 130 may include a hardware device having a data information processing function and the programs necessary to drive the hardware device to operate. The server 130 may be communicatively coupled to a plurality of clients 120 and receive data transmitted by the clients 120.
Network 140 is the medium used to provide communication connections between the clients 120 and the server 130. The network 140 may facilitate the exchange of information or data. As shown in fig. 1, the client 120 and the server 130 may be connected to the network 140 and transmit information or data to each other through the network 140. In some embodiments, the network 140 may be any type of wired or wireless network, or a combination thereof. For example, the network 140 may include a cable network, a wireline network, a fiber optic network, a telecommunications network, an intranet, the Internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a Bluetooth™ network, a ZigBee™ network, a Near Field Communication (NFC) network, or the like. In some embodiments, the network 140 may include one or more network access points. For example, the network 140 may include a wired or wireless network access point, such as a base station or an internet switching point, through which one or more components of the client 120 and the server 130 may connect to the network 140 to exchange data or information.
It should be understood that the number of clients 120, servers 130, and networks 140 in fig. 1 are merely illustrative. There may be any number of clients 120, servers 130, and networks 140, as desired for an implementation.
It should be noted that the user identification method described in the present specification may be performed entirely on the client 120, entirely on the server 130, or partially on the client 120 and partially on the server 130.
An application scenario schematic diagram of a training system of a user identification model may be shown in fig. 1, where the training system of the user identification model may train a target user identification model, and a target user of a second user class is selected from a user set in the system 100 through the target user identification model, and specific content may be referred to the foregoing, which is not described herein.
Fig. 2 illustrates a hardware architecture diagram of a computing device 200 provided in accordance with an embodiment of the present specification. The computing device 200 may perform the training method of the user identification model and/or the user identification method described herein; both methods are described in other parts of the specification. When the training method of the user identification model and/or the user identification method is performed on the client 120, the computing device 200 may be the client 120. When the training method of the user identification model and/or the user identification method is performed on the server 130, the computing device 200 may be the server 130. When the training method of the user identification model and/or the user identification method is partially performed on the client 120 and partially performed on the server 130, the computing device 200 may include both the client 120 and the server 130.
As shown in fig. 2, computing device 200 may include at least one storage medium 230 and at least one processor 220. In some embodiments, computing device 200 may also include a communication port 240 and an internal communication bus 210. Meanwhile, the computing device 200 may also include an I/O component 250.
Internal communication bus 210 may connect the various system components including storage medium 230, processor 220, and communication ports 240.
I/O component 250 supports input/output between computing device 200 and other components.
The communication port 240 is used for data communication between the computing device 200 and the outside world, for example, the communication port 240 may be used for data communication between the computing device 200 and the network 140. The communication port 240 may be a wired communication port or a wireless communication port.
Storage medium 230 may include a data storage device. The data storage device may be a non-transitory storage medium or a transitory storage medium. For example, the data storage device may include one or more of a magnetic disk 232, a Read-Only Memory (ROM) 234, or a Random Access Memory (RAM) 236. The storage medium 230 further includes at least one instruction set stored in the data storage device. The instructions are computer program code, which may include programs, routines, objects, components, data structures, procedures, modules, etc. that perform the training method of the user identification model and/or the user identification method provided herein.
The at least one processor 220 may be communicatively coupled with the at least one storage medium 230 and the communication port 240 via the internal communication bus 210. The at least one processor 220 is configured to execute the at least one instruction set. When the computing device 200 is running, the at least one processor 220 reads the at least one instruction set and, according to the instructions of the at least one instruction set, performs the training method of the user identification model and/or the user identification method provided herein. The processor 220 may perform all steps involved in the training method of the user identification model and/or the user identification method. The processor 220 may be in the form of one or more processors. In some embodiments, the processor 220 may include one or more hardware processors, such as microcontrollers, microprocessors, Reduced Instruction Set Computers (RISC), Application-Specific Integrated Circuits (ASICs), Application-Specific Instruction-set Processors (ASIPs), Central Processing Units (CPUs), Graphics Processing Units (GPUs), Physical Processing Units (PPUs), microcontroller units, Digital Signal Processors (DSPs), Field Programmable Gate Arrays (FPGAs), Advanced RISC Machines (ARM), Programmable Logic Devices (PLDs), any circuit or processor capable of executing one or more functions, or the like, or any combination thereof. For illustrative purposes only, only one processor 220 is depicted for the computing device 200 in this specification. It should be noted, however, that the computing device 200 may also include multiple processors; thus, operations and/or method steps disclosed in this specification may be performed by one processor as described herein, or may be performed jointly by multiple processors.
For example, if the processor 220 of the computing device 200 performs steps a and B in this specification, it should be understood that steps a and B may also be performed by two different processors 220 in combination or separately (e.g., a first processor performs step a, a second processor performs step B, or the first and second processors together perform steps a and B).
Fig. 3 shows a flowchart of a training method 300 for a user identification model provided according to an embodiment of the present disclosure. As before, computing device 200 may perform training method 300 of the user identification model of the present specification. Specifically, the processor 220 may read the instruction set stored in its local storage medium and then execute the training method 300 of the user recognition model of the present specification according to the specification of the instruction set. As shown in fig. 3, the method 300 may include:
s320: a sample dataset of a plurality of user samples is determined based on a pre-trained user recognition model corresponding to the first user category.
The pre-trained user identification model is a model configured to identify users of the first user category.
The plurality of user samples includes users of a first user category and users of a second user category. The users of the first user category may be pre-labeled user samples. The users of the second user category may be user samples accumulated through a cold start, so the number of samples of users of the first user category may be greater than the number of samples of users of the second user category. The users of the first user category include users who are directly provided with a target right; the users of the second user category include users who are provided with the target right after reaching a preset condition; the target right includes the right of a user to open at least one sub-account. The preset condition may include at least one of completing at least one preset task or the user grade reaching a preset grade. The at least one preset task may be associated with the application scenario; taking a financial scenario as an example, the at least one preset task may include a transaction task, an interaction task, a click task, a touch task, or other tasks under the financial scenario, and so on.
The sample data set comprises multi-dimensional user data of each user sample in the plurality of user samples, wherein the multi-dimensional user data can comprise user attribute data, user behavior data or various implicit data of a user, and the implicit data can comprise preference data, willingness data or other implicit data of the user, and the like.
The manner of determining the sample data set of the plurality of user samples based on the pre-trained user identification model corresponding to the first user category may be various, and specifically may be as follows:
for example, the processor 220 may obtain a user dataset of an original user set, input the user dataset to a pre-trained user recognition model to obtain a first original user corresponding to a first user category in the original user set, and determine a sample dataset of a plurality of user samples based on the first original user and the user dataset.
Wherein the set of original users may include a first original user of a first user category and original users of other user categories, which may include a second user category. The manner of determining the sample data set of the plurality of user samples may be various based on the first original user and the user data set, for example, the processor 220 may select at least one original user other than the first original user from the original user set to obtain a second original user of a second user class, select user data corresponding to the second original user from the user data set to obtain target user data, and obtain current user data of at least one user of the first user class, and use the current user data and the target user data as the sample data set of the plurality of user samples.
Because the prediction task for users of the second user category has no labels, a cold-start model is needed to accumulate samples. Considering the requirements of feasibility and rapid deployment, the pre-trained user identification model corresponding to the first user category can be adopted as the cold-start model to accumulate samples: after the users of the first user category are predicted, the remaining users are taken as users of the second user category, so that a small number of samples of the second user category can be obtained, and user prediction for the second user category is then modeled with a transfer learning scheme.
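The cold-start accumulation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; `pretrain_score` is a hypothetical callable returning the pre-trained model's score that a user belongs to the first user category, and the score threshold is an assumption.

```python
def build_sample_dataset(user_ids, user_data, pretrain_score, threshold=0.5):
    """Accumulate cold-start samples with the pre-trained model (sketch).

    Users the pre-trained model scores at or above the threshold are taken as
    first-category samples; the remaining users form the (smaller) pool of
    second-category samples, matching the cold-start flow described above.
    """
    first, second = [], []
    for uid in user_ids:
        bucket = first if pretrain_score(user_data[uid]) >= threshold else second
        bucket.append(uid)
    # The sample dataset keeps the multi-dimensional user data of every sample.
    sample_dataset = {uid: user_data[uid] for uid in first + second}
    labels = {uid: ("first" if uid in first else "second") for uid in sample_dataset}
    return sample_dataset, labels
```

A transfer-learning model for the second user category would then be trained on `sample_dataset` with these accumulated labels.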
S340: and inputting the sample data set into a preset multitask model to obtain a prediction result of each task in the plurality of tasks.
The preset multitask model may be a model that can predict a plurality of tasks; such a multitask model may also be called a multi-objective model. The plurality of tasks may include predicting users of the first user category and predicting users of the second user category. The plurality of tasks may also include at least one auxiliary task, which may include at least one of predicting users of other categories or predicting at least one user behavior of a user. The other categories include user categories other than the first user category and the second user category, and may be various; for example, they may include at least one category classified based on user attributes (such as the number of resources of the user, the location of the user, the consumption scene in which the user is located, or others), at least one category classified based on other information of the user, and so on. The at least one user behavior may include a user click, a touch, or other interactive behavior, and so forth. The preset multitask model and the pre-trained user identification model share network parameters; that is, the initial network parameters of the preset multitask model are the same as the network parameters of the pre-trained user identification model. In addition, the network structure of the preset multitask model may be entirely identical to that of the pre-trained user identification model, may be partially identical to it, or the preset multitask model may contain the pre-trained user identification model as a sub-network, and so on. Thus, the pre-trained user identification model may itself also be a multitask model or a multi-objective model.
The prediction result is the output under the corresponding task. For example, taking the task of predicting users of the second user category as an example, the corresponding prediction result may include a result of whether a user is predicted to be a user of the second user category, or the probability or score that the predicted user is a user of the second user category, and so on.
The sample data set is input to a preset multitasking model, and the manner of obtaining the prediction result of each task in the plurality of tasks may be various, which may be specifically as follows:
for example, the processor 220 may perform multidimensional feature extraction on the sample data set to obtain a user feature set of each of the plurality of user samples, and input the user feature set to a preset multitasking model to obtain a prediction result of each of the plurality of tasks.
The set of user features may include at least one of user explicit features, user implicit features, continuous features, or user behavior sequence features. The user explicit features may include features related to user attributes, such as a user identification, a user level, or other attribute information, and so forth; the user explicit features may be discretely distributed feature information. The user implicit features may include feature information characterizing user preferences, interests, or willingness. The continuous features may include features of continuous information characterizing the user; for example, they may include feature information of the user over a period or cycle, such as the user's activity over a week/month or another period of time, or other continuous feature information, and so forth. The user behavior sequence features may include sequence features corresponding to at least one user behavior of the user, which may include user transaction behavior, in-app behavior, or other interactive behavior, and so forth.
After extracting the user feature set of each user sample, the processor 220 may input the user feature set into the preset multitask model to obtain the prediction result of each task of the plurality of tasks. The preset multitask model may include a feature extraction network group and a prediction network corresponding to each task. The feature extraction network group is configured to extract the task features corresponding to each task and the shared features among the plurality of tasks. The task features may include, for each task, feature information that differs from the other tasks, and the shared features may include feature information shared among the different tasks. The prediction network may be configured to predict the prediction result of the corresponding task based on the task features and the shared features. The manner of inputting the user feature set into the multitask model may be various. For example, the processor 220 may perform feature conversion on the features in the user feature set to obtain a target user feature set, input the target user feature set into the feature extraction network group to obtain the task features corresponding to each task and the shared features among the plurality of tasks, fuse the task features corresponding to each task with the shared features to obtain the target task features corresponding to each task, and input the target task features into the prediction network of the corresponding task to obtain the prediction result of each task.
There may be various ways to perform feature conversion on the features in the user feature set, for example, the processor 220 may convert the features in the user feature set into a query feature (q), a key feature (k), and a value feature (v), so as to obtain the target user feature set.
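The query/key/value conversion above is a standard linear-projection step. A minimal NumPy sketch follows; the randomly initialised projection matrices stand in for learned parameters, and the dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, d_model = 4, 8

# Learned projection matrices; randomly initialised here purely for illustration.
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3))

def to_qkv(features):
    """Convert each user feature vector into query (q), key (k) and value (v)
    features via three linear projections, yielding the target user feature set."""
    return features @ W_q, features @ W_k, features @ W_v

user_features = rng.standard_normal((n_users, d_model))
q, k, v = to_qkv(user_features)
```

The resulting q/k/v triplets are what the attention-based feature extraction networks described below consume.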
After converting the features in the user feature set, the processor 220 may input the converted target user feature set into the feature extraction network group. The feature extraction network group includes a multi-layer feature extraction network. The manner of inputting the target user feature set into the feature extraction network group may be various. For example, the processor 220 may select the target feature extraction network corresponding to the first layer from the multi-layer feature extraction network, input the target user feature set into the target feature extraction network to obtain the initial task features corresponding to each task and the initial shared features among the plurality of tasks, fuse the initial shared features, the initial task features, and the target user feature set to obtain target sample features, take the target sample features as the target user feature set, take the next layer after the target feature extraction network as the new target feature extraction network, and return to the step of inputting the target user feature set into the target feature extraction network until the target feature extraction network is the final-layer feature extraction network, so as to obtain the task features corresponding to each task and the shared features among the plurality of tasks.
Each layer of feature extraction network in the multi-layer feature extraction network comprises a task feature extraction sub-network corresponding to each task and a shared feature extraction sub-network among a plurality of tasks. There may be various ways of inputting the target user feature set into the target feature extraction network, for example, the processor 220 may take each task as a target task, input the user feature set into the task feature extraction sub-network corresponding to the target task to obtain an initial task feature corresponding to the target task, and input the target user feature set into the shared feature extraction sub-network to obtain an initial shared feature.
The method of inputting the target user feature set into the task feature extraction sub-network corresponding to the target task to obtain the initial task feature corresponding to the target task may be various, for example, the processor 220 may perform relative position encoding on the features in the target user feature set to obtain a candidate task feature set of the target task, and fuse the features in the candidate task feature set to obtain the initial task feature corresponding to the target task.
The method for performing the relative position encoding on the features in the target user feature set may be various, for example, the processor 220 may perform the relative position encoding on the features in the target user feature set to obtain an initial user encoded feature set, and perform spatial transformation on the features in the initial user encoded feature set based on a preset gating linear activation function to obtain a candidate task feature set of the target task.
The task feature extraction sub-network of the target task may include an MSA layer (multi-head self-attention layer) in a T5-style network (T5 being a network structure). There may be various ways to perform relative position encoding on the features in the target user feature set; for example, the processor 220 may perform relative position encoding on the features in the target user feature set through the MSA layer, so as to obtain the initial user encoded feature set.
After performing the relative position encoding on the features in the target user feature set, the processor 220 may perform a spatial transformation on the features in the initial user encoded feature set based on a preset gated linear activation function, so as to obtain the candidate task feature set of the target task. The task feature extraction sub-network may also include a feed-forward neural network layer (FFN Layer). There may be various manners of spatially transforming the features in the initial user encoded feature set; for example, the processor 220 may spatially transform the features in the initial user encoded feature set through the FFN layer, so as to obtain the candidate task feature set of the target task. Compared with the traditional T5 structure, the FFN layer in this scheme replaces the first linear layer activated by ReLU (an activation function) with a gated linear unit activated by GELU (an activation function), which increases the parameters of the FFN layer by 50% and yields a more accurate candidate task feature set.
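The GELU-gated FFN variant described above can be sketched as follows. The function names and the tanh approximation of GELU are illustrative assumptions, not the patent's implementation; the point is that the extra gate matrix is what adds the roughly 50% more parameters relative to a two-matrix ReLU FFN.

```python
import numpy as np

def gelu(x):
    """tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def ffn_geglu(x, w_gate, w_up, w_down):
    """FFN layer whose first linear layer is a GELU-activated gated linear
    unit (GEGLU): gate(x) * up(x), then a down projection. Three weight
    matrices instead of the classic two, i.e. ~50% more parameters."""
    return (gelu(x @ w_gate) * (x @ w_up)) @ w_down
```

Replacing the ReLU FFN with this gated form is the specific change this scheme makes to the T5 FFN layer.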
After performing the relative position coding on the features in the target user feature set, the processor 220 may fuse the features in the candidate task feature set to obtain the initial task feature corresponding to the target task. The manner of fusing the features in the candidate task feature set may be various, for example, the processor 220 may adjust the coding position corresponding to the candidate task feature set based on the sample domain of the user sample, to obtain an adjusted candidate task feature set, perform feature transformation on the adjusted candidate task feature set, to obtain a transformed candidate task feature set, and adjust the coding position corresponding to the transformed candidate task feature set, to obtain the initial task feature of the target task.
The method for adjusting the coding position corresponding to the candidate task feature set may be various, for example, the processor 220 may obtain a position coding adjustment parameter, determine a position weight of the coding position corresponding to the candidate task feature set based on the position coding adjustment parameter, and weight the feature in the candidate task feature set based on the position weight, so as to obtain an adjusted candidate task feature set.
Wherein the location weight may characterize the importance of the location of the feature at the time of embedding. The relative coding position of the feature at the time of embedding can be selected by the position weight. The position code adjustment parameters include a feature map parameter and a normalization parameter, the feature map parameter including a first map parameter and a second map parameter. The first mapping parameter and the second mapping parameter may be parameters mapped to a preset feature space. There are various ways to obtain the position code adjustment parameters, for example, the processor 220 may directly generate the position code adjustment parameters through a Hyper-network (super network), or may directly obtain the position code parameters, or the like.
After obtaining the position encoding adjustment parameters, the processor 220 may determine the position weights of the encoding positions corresponding to the candidate task feature set based on the position encoding adjustment parameters. The manner of determining the position weights may be various. For example, the processor 220 may map the features in the candidate task feature set to a preset first feature space based on the first mapping parameter to obtain a first mapped feature set, perform a nonlinear change on the features in the first mapped feature set to obtain a nonlinear feature set, map the features in the nonlinear feature set to a preset second feature space based on the second mapping parameter to obtain a second mapped feature set, and normalize the features in the second mapped feature set based on the normalization parameter to obtain the position weights corresponding to the encoding positions.
After determining the location weight corresponding to the encoding location, the processor 220 may weight the features in the candidate task feature set based on the location weight to obtain an adjusted candidate task feature set. And then, carrying out feature transformation on the adjusted candidate task feature set to obtain a transformed candidate task feature set. There may be various ways of performing feature transformation on the adjusted candidate task feature set, for example, the processor 220 may accumulate features in the adjusted candidate task feature set, normalize the accumulated features, and input the normalized features to the FFN layer, so as to obtain a transformed candidate task feature set.
After performing feature transformation on the adjusted candidate task feature set, the processor 220 may adjust the coding position corresponding to the transformed candidate task feature set, so as to obtain the initial task feature of the target task. The manner of adjusting the coding positions corresponding to the transformed candidate feature sets is similar to the manner of adjusting the coding positions corresponding to the candidate task features, which is described in detail above, and will not be described in detail here.
The task feature extraction sub-network has a similar network structure to the shared feature extraction sub-network, so that the method of extracting the initial shared feature through the shared feature extraction sub-network is similar to the method of extracting the initial task feature through the task extraction sub-network, which is described in detail above, and will not be described in detail here.
It should be further noted that the network structures of the task feature extraction sub-network and the shared feature extraction sub-network may be similar or identical. The task feature extraction sub-network and the shared feature extraction sub-network can each be regarded as a basic feature extraction sub-network (backbone), and the type of network structure of the feature extraction sub-network can be various; for example, it can include SuperMoE (a gated multi-expert network), a CNN, or other network structures. Taking SuperMoE as the backbone as an example, the network structure of the feature extraction sub-network may be as shown in fig. 4. For user pre-training characterization, a conventional feature extraction sub-network may generally adopt BERT (a bi-directional encoding network); in this scheme, the BERT structure is upgraded to the T5 structure, and the multi-source-domain characterization capability of the pre-training model (the preset multitask model) is emphasized. In addition, an adapter module (a module for adjusting encoding positions) is added to the T5 structure and used for multi-source-domain pre-training knowledge transfer; the distinction between the network structure of the present scheme and the BERT network may be as shown in fig. 5. The structure of the adapter module may be as shown in fig. 6: the position encoding adjustment parameters are generated by a Hyper-network; then, based on the position encoding adjustment parameters, the features in the candidate task feature set are mapped to a preset first feature space by a Down Projection Layer (down-projection layer) to obtain a first mapped feature set; the features in the first mapped feature set undergo a nonlinear change through a Nonlinear Layer to obtain a nonlinear feature set; the features in the nonlinear feature set are then mapped to a preset second feature space by an Up Projection Layer (up-projection layer) to obtain a second mapped feature set; the second mapped feature set is then normalized by a Layer Norm (normalization layer) to obtain the position weights of the encoding positions; finally, based on the position weights, the features in the candidate task feature set are weighted to obtain the adjusted candidate task feature set, thereby realizing control or adjustment of the relative encoding positions.
After extracting, through the task feature extraction sub-network and the shared feature extraction sub-network in the first-layer target feature extraction network, the initial task features corresponding to each task and the initial shared features among the plurality of tasks, the processor 220 may fuse the initial shared features, the initial task features, and the target user feature set to obtain the target sample features, take the target sample features as the new target user feature set, then take the next-layer feature extraction network after the target feature extraction network as the target feature extraction network, and return to the step of inputting the target user feature set into the target feature extraction network, thereby obtaining the task features corresponding to each task and the shared features among the plurality of tasks.
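The layer-by-layer iteration described above can be sketched as a simple loop. Here `layer` and `fuse` are hypothetical callables standing in for the per-layer task/shared sub-networks and the fusion step; the real sub-networks are attention-based, as described earlier.

```python
def run_extraction_stack(layers, user_feats, fuse):
    """Iterate the multi-layer feature extraction network: each layer yields
    per-task features and shared features; these are fused with the layer's
    input to form the target sample features, which become the next layer's
    input. The last layer's outputs are the final task/shared features."""
    x = user_feats
    task_feats = shared = None
    for layer in layers:
        task_feats, shared = layer(x)
        x = fuse(shared, task_feats, x)  # target sample features
    return task_feats, shared
```

The loop terminates at the final-layer feature extraction network, returning the task features per task and the shared features among tasks.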
It should be noted that stacking or other ensemble modes of ensemble learning may be adopted between the feature extraction networks of different layers, so that knowledge can be shared between different tasks; and in a multitask model, combining the prediction results of the models corresponding to multiple tasks can yield a better and more robust prediction result than the model corresponding to any single task.
After obtaining the task feature of each task and the sharing feature between the tasks, the processor 220 may fuse the task feature corresponding to each task with the sharing feature, thereby obtaining the target task feature corresponding to each task. The fusion manner may be various, for example, the processor 220 may directly splice or accumulate the task feature of each task and the shared feature of a plurality of tasks, so as to obtain the target task feature corresponding to each task, or may also obtain the task weight, weight the task feature and the shared feature based on the task weight, respectively, so as to obtain the weighted task feature and the weighted shared feature, and splice or accumulate the weighted task feature and the weighted shared feature, so as to obtain the target task feature corresponding to each task.
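The two fusion options above (plain splicing versus weighted splicing) can be sketched as follows; using `1 - task_weight` for the shared part is an illustrative assumption, since the text only says both parts are weighted.

```python
import numpy as np

def fuse_task_features(task_feat, shared_feat, task_weight=None):
    """Fuse one task's features with the shared features. Without a weight,
    the two are simply spliced (concatenated); with a weight, each part is
    scaled first and then spliced."""
    if task_weight is None:
        return np.concatenate([task_feat, shared_feat], axis=-1)
    return np.concatenate([task_weight * task_feat,
                           (1.0 - task_weight) * shared_feat], axis=-1)
```

Accumulation (element-wise addition) instead of concatenation would be the other fusion variant mentioned above.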
After fusing the task features corresponding to each task with the shared features, the processor 220 may input the fused target task features into the prediction network of the corresponding task, so as to obtain the prediction result of each task. Each task corresponds to one prediction network, and the network structure of the preset multitask model may include a dual-tower multi-objective model or a multi-tower multi-objective model. The prediction networks in the dual-tower multi-objective model may include a first prediction network and a second prediction network; the first prediction network may include the prediction network corresponding to the first user category, and the second prediction network may include the prediction network corresponding to the second user category and/or the prediction network corresponding to the at least one auxiliary task. The prediction networks in the multi-tower multi-objective model may include a first prediction network corresponding to the first user category, a second prediction network corresponding to the second user category, and a third prediction network corresponding to the at least one auxiliary task. The type of network structure of the prediction network may be various; for example, it may include a softmax network (a classification network), an MLP (multi-layer perceptron), or another type of classification network, and so on.
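The per-task prediction towers can be sketched as follows; a single sigmoid layer stands in here for the MLP/softmax classification networks named above, and the dictionary shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multi_tower_predict(target_task_feats, towers):
    """One prediction network (tower) per task.

    target_task_feats: {task: (n, d) fused target task features}
    towers:            {task: (W, b) parameters of that task's head}
    Returns a per-task score in (0, 1) for each user sample.
    """
    return {task: sigmoid(target_task_feats[task] @ W + b)
            for task, (W, b) in towers.items()}
```

In the dual-tower variant, the dictionary would hold two towers; in the multi-tower variant, one tower per user category plus one per auxiliary task.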
S360: and updating the preset multitasking model based on the prediction result to obtain a target user identification model.
The target user identification model is configured to identify a target user corresponding to the second user category.
The method for updating the preset multitasking model based on the prediction result may be various, and specifically may be as follows:
for example, the processor 220 may obtain task labeling results corresponding to the plurality of user samples in the plurality of tasks, compare the task labeling results with the prediction results to obtain user joint loss corresponding to each task in the plurality of tasks, converge the preset multitask model based on the user joint loss, obtain a target multitask model, and use the target multitask model as the target user identification model.
The task labeling result may include results of labeling the plurality of user samples under the plurality of tasks. Taking a prediction task of predicting users corresponding to the first user category as an example, the task labeling result may include a label indicating whether each user sample in the plurality of user samples is a user of the first user category. There may be various manners of comparing the task labeling result with the prediction result to obtain the user joint loss corresponding to each task in the plurality of tasks. For example, the processor 220 may compare the task labeling result with the prediction result to obtain a task loss of each user sample in the plurality of user samples, where the task loss includes a loss corresponding to each task in the plurality of tasks; select, based on the task type of each task, at least one corresponding user sample from the plurality of user samples to obtain a target user sample corresponding to each task; and accumulate the task losses of the target user samples under the corresponding task to obtain the user joint loss corresponding to each task.
The manner of selecting at least one user sample from the plurality of user samples based on the task type of each task may be various. For example, the processor 220 may determine a task feature class corresponding to each task based on the task type of that task, match the task feature class with the target task feature corresponding to each user sample of the plurality of user samples, and take the successfully matched user samples as the target user samples of the corresponding task, thereby obtaining the target user sample corresponding to each task.
The task loss of each user sample in the plurality of user samples may include a loss corresponding to each task, and may further include a loss corresponding to a target task, where the target task may include all of the plurality of tasks, or one or more tasks among the plurality of tasks.
After selecting the target user sample corresponding to each task, the processor 220 may accumulate the task losses of the target user samples under the corresponding task, thereby obtaining the user joint loss corresponding to each task. The user joint loss may include the task losses of the at least one user sample related to each task under the corresponding task, so that the accuracy of fine-tuning (Finetune) in the transfer learning can be improved, and the training precision of the target user identification model is further improved.
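The per-task selection and loss accumulation described above can be sketched as follows. The binary cross-entropy loss and the data layout are illustrative assumptions; the patent does not fix a particular loss function:

```python
import math

def bce(pred, label):
    """Binary cross-entropy for a single prediction/label pair."""
    eps = 1e-7
    p = min(max(pred, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def joint_loss_per_task(samples, task_names):
    """Accumulate each task's loss over only the samples selected for it.

    `samples` is a list of dicts holding per-task predictions and labels,
    plus the set of tasks the sample was selected as a target sample for.
    """
    losses = {t: 0.0 for t in task_names}
    for s in samples:
        for t in s["target_tasks"]:  # sample contributes only to these tasks
            losses[t] += bce(s["pred"][t], s["label"][t])
    return losses

samples = [
    {"pred": {"first": 0.9, "second": 0.2}, "label": {"first": 1, "second": 0},
     "target_tasks": ["first"]},
    {"pred": {"first": 0.4, "second": 0.8}, "label": {"first": 0, "second": 1},
     "target_tasks": ["first", "second"]},
]
losses = joint_loss_per_task(samples, ["first", "second"])
```

The key point mirrored here is that a sample only contributes to the loss of tasks it was matched to, rather than every sample contributing to every task.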
After determining the user joint loss corresponding to each task, the processor 220 may converge the preset multitasking model based on the user joint loss, thereby obtaining a target multitasking model, and take the target multitasking model as a target user identification model.
It should be noted that the target multitask model may include a multitask model obtained by performing transfer learning based on the pre-trained user identification model. The training accuracy or training effect of the multitask model in the transfer learning may be judged by the accuracy with which the multitask model predicts users of the first user category. In addition, the preset multitask model may be used for the transfer learning, and its specific framework may include an input layer, an underlying extraction layer, a task feature extraction layer, and a prediction layer, as illustrated in fig. 7. The input layer may be used to input user explicit features (hyperbolic embedding), user behavior sequence features (BERT embedding), user implicit features (speaker embedding), and continuous features of a user sample, and so on. The underlying extraction layer may be used to extract the initial task features of each task and the initial shared features between the plurality of tasks. The task feature extraction layer may be used to perform feature interaction and purification on the extracted initial task features and initial shared features, thereby extracting more accurate task features corresponding to each task in the plurality of user samples and shared features among the plurality of tasks. The prediction layer may output user scores of the plurality of user samples under the plurality of tasks, determine the user joint loss corresponding to each task based on the user scores, and adjust the network parameters of the preset multitask model based on the user joint loss.
Taking the first user category as one-key access users and the second user category as non-one-key access users as an example, the training process of the preset multitask model may be as shown in fig. 8, and may mainly include three parts: feature extraction (Feature Projection), model pre-training (Pretrain), and model fine-tuning (Finetune). In the feature extraction, multi-dimensional user features are extracted from the user data of the user samples, and the user features may include embedded features (feature embedding), relative position (Relative position), relative time (Relative time), sequential features (Month/Week/Hour/Holiday embedding), and so on. These features are taken as the user feature set of each user sample in the plurality of user samples, the features in the user feature set are fused, and the fused user features are converted into query features (Q), key features (K), and value features (V), thereby obtaining the target user feature set of each user sample. In the model pre-training process, the processor 220 may use a MOE Trm (a feature extraction network structure) to extract, from the target user feature set, the task features of the plurality of samples in each task and the shared features between the plurality of tasks; in the extraction process, a Stacking manner may be used to process the features extracted from different layers, so as to output the task features corresponding to each task and the shared features between the plurality of tasks. The task features and the shared features are then input to a softmax network/Pooling layer/MLP, so as to obtain a user score of each user in the plurality of user samples for the plurality of tasks, and the prediction result of each task is determined based on the user scores.
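The Q/K/V conversion step above follows the standard attention projection pattern; a minimal sketch, in which the projection dimension and random weights are illustrative assumptions rather than the patent's parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model = 16

# Learned projection matrices producing query, key, and value features.
W_q = rng.normal(size=(d_model, d_model)) * 0.1
W_k = rng.normal(size=(d_model, d_model)) * 0.1
W_v = rng.normal(size=(d_model, d_model)) * 0.1

def to_qkv(fused_features):
    """Convert fused user features into query/key/value features."""
    return fused_features @ W_q, fused_features @ W_k, fused_features @ W_v

def attention(q, k, v):
    """Scaled dot-product attention over the user feature sequence."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

fused = rng.normal(size=(6, d_model))  # 6 fused feature vectors per sample
q, k, v = to_qkv(fused)
out = attention(q, k, v)
```

In the patent's pipeline the attention itself would live inside the MOE Trm extraction structure; this sketch only shows why the fused features are split into the three Q/K/V roles beforehand.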
In the fine-tuning of the model, the prediction result is compared with the task labeling result to obtain the user joint loss of each task, and the network parameters of the preset multitask model are fine-tuned based on the user joint loss, so as to obtain the target multitask model; the target multitask model is used as the target user identification model for identifying non-one-key access users.
In the identification of non-one-key access users, problems such as no user samples, no accumulated user behaviors, and no labels are faced; user samples are accumulated through the cold-start model, which meets the requirement of rapid launch and lays a solid foundation for subsequent transfer learning work. In the model research and development stage, a severe sample sparsity problem is faced; a characterization + dual-tower (or multi-tower) PLE scheme is adopted, and a multi-objective multi-tower model is used for the transfer learning. In the model training stage, the sample spaces of different training tasks are different but continuous; for different tasks, computing the user joint loss per task can improve the training precision of the model. In the model optimization stage, the problem of multi-point delivery of the target user identification model is faced; performing user sequence characterization on behaviors such as SPM (in-terminal behavior) and the user's rights and interests can improve the effect of the target user identification model. In the model structure, the traditional BERT structure is upgraded to a T5 structure, and an adapter module is added on the basis of the T5 structure, which can improve the pre-training knowledge migration across multiple source domains and further improve the accuracy of the target user identification model.
After performing the transfer learning on the preset multitasking model based on the pre-trained user identification model corresponding to the first user category to obtain the target user identification model, the processor 220 may identify the target user corresponding to the second user category based on the target user identification model. Fig. 9 shows a flowchart of a user identification method 400 provided in accordance with an embodiment of the present description. As before, the computing device 200 may perform the user identification method 400 of the present specification. In particular, the processor 220 may read the instruction set stored in its local storage medium and then perform the user identification method 400 of the present specification, as specified by the instruction set. As shown in fig. 9, the method 400 may include:
S420: User data for each user in the set of users is obtained.
The user data may include user attribute data, user behavior data, or various implicit data of the user, where the implicit data may include preference data, willingness data, or other implicit data of the user, and so on.
The manner of obtaining the user data of each user in the user set may be various, and specifically may be as follows:
for example, the processor 220 may receive user data uploaded by the user 110 through the client 120 or a terminal; or may obtain the user data of at least one user on a user interaction platform; or may obtain the user data of each user in the user set from a network or a database. In addition, when the number of users in the user set is large or the user data occupies a large amount of storage, the processor 220 may receive a user identification request, where the user identification request includes a storage address of the user data, and obtain the user data of each user in the user set based on the storage address, and so on.
S440: and inputting the user data into a target user identification model to obtain a category prediction result of each user.
The target user identification model includes a multitask model obtained by performing transfer learning based on the pre-trained user identification model corresponding to the first user category; the specific training process is described above and will not be repeated here.
Wherein the category prediction result indicates whether each user is predicted to be a user of the second user category, and thus the category prediction result may include one of a user of the second user category and a user of a non-second user category.
The manner of inputting the user data into the target user identification model to obtain the category prediction result of each user may be various, and specifically may be as follows:
for example, the processor 220 may input user data into the target user identification model, derive a user category probability or user category score for each user, and determine a category prediction result for each user based on the user category probability or user category score.
The user category probability may include the probability that the user is a user of the second user category, and the user category score may indicate the degree of confidence that the user is a user of the second user category. There may be various manners of determining the category prediction result of each user based on the user category probability or the user category score. For example, when the user category probability is greater than a preset category probability threshold or the user category score is greater than a preset score threshold, the processor 220 may determine that the corresponding user is a user of the second user category; when the user category probability is less than the preset category probability threshold or the user category score is less than the preset score threshold, the processor 220 may determine that the corresponding user is a user of a non-second user category, and so on.
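The threshold rule above can be sketched as follows; the 0.5 threshold, user identifiers, and label strings are illustrative assumptions, since the patent leaves the preset threshold unspecified:

```python
def classify_users(user_probs, prob_threshold=0.5):
    """Map each user's second-category probability to a category prediction."""
    results = {}
    for user_id, prob in user_probs.items():
        if prob > prob_threshold:
            results[user_id] = "second_category"
        else:
            results[user_id] = "non_second_category"
    return results

# Probabilities as output by the target user identification model.
scores = {"u1": 0.92, "u2": 0.31, "u3": 0.55}
predictions = classify_users(scores, prob_threshold=0.5)

# Step S460: select the users predicted as the second user category.
target_users = [u for u, r in predictions.items() if r == "second_category"]
```

The same structure applies when thresholding a user category score instead of a probability; only the threshold's scale changes.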
S460: and selecting at least one user corresponding to the second user category from the user set based on the category prediction result to obtain a target user.
For example, the processor 220 may select at least one user of the set of users for whom the category prediction result is a second user category, thereby yielding a target user.
The users of the first user category include users directly having target rights, the users of the second user category include users having target rights after reaching preset conditions, and the target rights include rights to open at least one sub-account for the users, which is described in detail above, and will not be described in detail herein.
In some embodiments, after identifying the target users of the second user category, the processor 220 may further push information about opening a sub-account to the target terminal corresponding to each target user of the second user category; the target user may complete at least one preset task at the terminal, and the processor 220 may then open the authority of at least one sub-account for the target user. The at least one preset task may be referred to above, and will not be described in detail herein.
In summary, in the training method 300 for a user identification model, the user identification method 400, and the system 100 provided in the present disclosure, a sample dataset of a plurality of user samples is determined based on the pre-trained user identification model corresponding to the first user category, where the plurality of user samples include users of the first user category and users of the second user category; the sample dataset is then input into the preset multitask model to obtain the prediction result of each task of a plurality of tasks, where the plurality of tasks include predicting users of the first user category and predicting users of the second user category, and the preset multitask model and the pre-trained user identification model share network parameters; and the preset multitask model is updated based on the prediction result to obtain the target user identification model, where the target user identification model is configured to identify target users corresponding to the second user category. According to this scheme, the pre-trained user identification model corresponding to the first user category can be cold-started, so as to accumulate a small number of user samples corresponding to the second user category; then, the network parameters of the pre-trained user identification model are migrated to the preset multitask model, and the network parameters of the preset multitask model are updated with the accumulated sample data of the plurality of user categories, so that the accuracy of the trained target user identification model can be improved, and the accuracy of user identification is thereby improved.
In another aspect, the present description provides a non-transitory storage medium storing at least one set of executable instructions for performing training of a user recognition model and/or user recognition. When executed by a processor, the executable instructions direct the processor to implement the steps of the user identification model training method 300 and/or the user identification method 400 described herein. In some possible implementations, aspects of the specification can also be implemented in the form of a program product including program code. The program code is for causing the computing device 200 to perform the steps of the training method 300 and/or the user identification method 400 of the user identification model described in this specification when the program product is run on the computing device 200. The program product for implementing the methods described above may employ a portable compact disc read only memory (CD-ROM) comprising program code and may run on computing device 200. However, the program product of the present specification is not limited thereto, and in the present specification, the readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system. The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. 
More specific examples of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, Random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Program code for carrying out operations of the present specification may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++, and the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on computing device 200, partly on computing device 200, as a stand-alone software package, partly on computing device 200 and partly on a remote computing device, or entirely on a remote computing device.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In view of the foregoing, it will be evident to a person skilled in the art that the foregoing detailed disclosure is presented by way of example only and is not limiting. Although not explicitly stated herein, those skilled in the art will appreciate that the present specification is intended to encompass various reasonable adaptations, improvements, and modifications of the embodiments. Such alterations, improvements, and modifications are intended to be suggested by this specification, and are within the spirit and scope of the exemplary embodiments of this specification.
Furthermore, certain terms in the present description have been used to describe embodiments of the present description. For example, "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present description. Thus, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined as suitable in one or more embodiments of the invention.
It should be appreciated that in the foregoing description of embodiments of the present specification, various features are sometimes combined in a single embodiment, drawing, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more features. However, this does not mean that the combination of these features is necessary; it is entirely possible for a person skilled in the art, upon reading this specification, to extract some of these features as separate embodiments. That is, embodiments in this specification may also be understood as an integration of multiple secondary embodiments, and each secondary embodiment may be satisfied by fewer than all of the features of a single foregoing disclosed embodiment.
Each patent, patent application, publication of a patent application, and other material, such as articles, books, specifications, publications, documents, and the like, cited herein is hereby incorporated by reference in its entirety, except for any prosecution file history associated therewith, any matter inconsistent with or conflicting with this document, and any matter that may have a limiting effect on the broadest scope of the claims now or later associated with this document. For example, if there is any inconsistency or conflict between the description, definition, and/or use of a term associated with any of the incorporated materials and that associated with this document, the description, definition, and/or use of the term in this document shall prevail.
Finally, it is to be understood that the embodiments of the application disclosed herein are illustrative of the principles of the embodiments of the present specification. Other modified embodiments are also within the scope of this specification. Accordingly, the embodiments disclosed herein are by way of example only and not limitation. Those skilled in the art can adopt alternative arrangements to implement the application in the specification based on the embodiments in the specification. Therefore, the embodiments of the present specification are not limited to the embodiments precisely described in the application.

Claims (21)

1. A training method of a user identification model, comprising:
determining a sample dataset of a plurality of user samples based on a pre-trained user recognition model corresponding to a first user category, the plurality of user samples including users of the first user category and users of a second user category;
inputting the sample data set into a preset multitasking model to obtain a prediction result of each task in a plurality of tasks, wherein the tasks comprise users predicting a first user class and users predicting a second user class, and the preset multitasking model and the pre-training user identification model share network parameters; and
and updating the preset multitasking model based on the prediction result to obtain a target user identification model, wherein the target user identification model is configured to identify a target user corresponding to the second user category.
2. The training method of a user identification model according to claim 1, wherein the users of the first user category include users who directly possess target rights, the users of the second user category include users who possess the target rights after reaching a preset condition, and the target rights include rights to open at least one sub-account for the users.
3. The method for training a user identification model according to claim 1, wherein the determining a sample dataset of a plurality of user samples based on the pre-trained user identification model corresponding to the first user category comprises:
obtaining a user data set of an original user set;
inputting the user data set into the pre-training user identification model to obtain a first original user corresponding to the first user category in the original user set; and
a sample dataset of the plurality of user samples is determined based on the first original user and the user dataset.
4. A method of training a user recognition model according to claim 3, wherein the determining a sample dataset of the plurality of user samples based on the first original user and the user dataset comprises:
Selecting at least one original user except the first original user from the original user set to obtain a second original user of the second user category;
selecting user data corresponding to the second original user from the user data set to obtain target user data; and
current user data of at least one user of the first user category is obtained, and the current user data and the target user data are taken as sample data sets of the plurality of user samples.
5. The training method of a user identification model according to claim 1, wherein the inputting the sample dataset into a preset multitasking model to obtain a prediction result of each task of a plurality of tasks includes:
performing multidimensional feature extraction on the sample data set to obtain a user feature set of each user sample in the plurality of user samples; and
and inputting the user characteristic set into the preset multitasking model to obtain a prediction result of each task in the plurality of tasks.
6. The method of training a user recognition model of claim 5, wherein the set of user features comprises at least one of user explicit features, user implicit features, continuous features, or user behavior sequence features.
7. The training method of a user identification model according to claim 5, wherein the preset multitasking model includes a feature extraction network group and a prediction network corresponding to each task; and
the step of inputting the user feature set to the preset multitasking model to obtain a prediction result of each task in the plurality of tasks includes:
performing feature conversion on the features in the user feature set to obtain the target user feature set,
inputting the target user feature set into the feature extraction network group to obtain task features corresponding to each task and sharing features among the tasks,
fusing the task features corresponding to each task with the sharing features to obtain target task features corresponding to each task, and
and inputting the target task characteristics into a prediction network of the corresponding task to obtain a prediction result of each task.
8. The training method of a user identification model of claim 7, wherein the feature extraction network group comprises a multi-layer feature extraction network; and
the step of inputting the target user feature set into the feature extraction network group to obtain the task feature corresponding to each task and the sharing feature among the tasks, includes:
Selecting a target feature extraction network corresponding to a first layer from the multi-layer feature extraction networks, inputting the target user feature set into the target feature extraction network to obtain initial task features corresponding to each task and initial sharing features among the tasks,
fusing the initial shared feature, the initial task feature and the target user feature set to obtain a target sample feature, taking the target sample feature as the target user feature set,
taking a next-layer feature extraction network of the target feature extraction network as the target feature extraction network, and
and returning to the step of inputting the target user feature set into the target feature extraction network until the target feature extraction network is the last layer of feature extraction network, so as to obtain the task feature corresponding to each task and the sharing feature among the tasks.
9. The training method of a user identification model according to claim 8, wherein each layer of feature extraction network in the multi-layer feature extraction network includes a task feature extraction sub-network corresponding to each task and a shared feature extraction sub-network between the plurality of tasks; and
Inputting the target user feature set to the target feature extraction network to obtain an initial task feature corresponding to each task and an initial sharing feature between the tasks, including:
taking each task as a target task, inputting the user feature set into a task feature extraction sub-network corresponding to the target task to obtain initial task features corresponding to the target task, and
and inputting the target user feature set into the shared feature extraction sub-network to obtain the initial shared feature.
10. The training method of the user identification model according to claim 9, wherein the inputting the target user feature set into the task feature extraction sub-network corresponding to the target task to obtain the initial task feature corresponding to the target task includes:
performing relative position coding on the features in the target user feature set to obtain a candidate task feature set of the target task; and
and fusing the features in the candidate task feature set to obtain initial task features corresponding to the target task.
11. The method for training a user identification model according to claim 10, wherein the performing the relative position coding on the user features in the target user feature set to obtain the candidate task feature set of the target task includes:
Performing relative position coding on the user features in the target user feature set to obtain an initial user coding feature set; and
and carrying out space transformation on the features in the initial user coding feature set based on a preset gating linear activation function to obtain a candidate task feature set of the target task.
12. The method for training the user identification model according to claim 10, wherein the fusing the features in the candidate task feature set to obtain initial task features corresponding to the target task includes:
based on a sample domain of the user sample, adjusting the coding position corresponding to the candidate task feature set to obtain an adjusted candidate task feature set;
performing feature transformation on the adjusted candidate task feature set to obtain a transformed candidate task feature set; and
and adjusting the coding positions corresponding to the transformed candidate task feature sets to obtain initial task features of the target task.
13. The method for training a user identification model according to claim 12, wherein the adjusting the coding positions corresponding to the candidate task feature set to obtain an adjusted candidate task feature set includes:
Obtaining position coding adjustment parameters, and determining the position weight of a coding position corresponding to the candidate task feature set based on the position coding adjustment parameters; and
and weighting the features in the candidate task feature set based on the position weight to obtain the adjusted candidate task feature set.
14. The method of training a user identification model of claim 13, wherein the position coding adjustment parameters include a feature mapping parameter and a normalization parameter, the feature mapping parameter including a first mapping parameter and a second mapping parameter; and
determining the position weights of the coding positions corresponding to the candidate task feature set based on the position coding adjustment parameters includes:
mapping the features in the candidate task feature set to a preset first feature space based on the first mapping parameter to obtain a first mapping feature set,
performing a nonlinear transformation on the features in the first mapping feature set to obtain a nonlinear feature set,
mapping the features in the nonlinear feature set to a preset second feature space based on the second mapping parameter to obtain a second mapping feature set, and
normalizing the features in the second mapping feature set based on the normalization parameter to obtain the position weights corresponding to the coding positions.
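The weight computation of claims 13-14 reads as a small two-stage network: map to a first feature space, apply a nonlinearity, map to a second feature space, then normalize into one weight per coding position. The sketch below assumes ReLU for the nonlinear step and softmax for the normalization, neither of which the claims fix; `w1` and `w2` stand in for the first and second mapping parameters:

```python
import numpy as np

def position_weights(feats, w1, w2):
    """One normalized weight per coding position (ReLU and softmax
    are assumed choices for the claimed nonlinearity/normalization)."""
    h = np.maximum(feats @ w1, 0.0)     # first feature space + ReLU
    scores = (h @ w2).ravel()           # second feature space: one score per position
    e = np.exp(scores - scores.max())   # numerically stable softmax
    return e / e.sum()

def adjust_features(feats, weights):
    """Claim 13: weight each coding position's feature by its weight."""
    return feats * weights[:, None]

rng = np.random.default_rng(2)
feats = rng.normal(size=(5, 4))  # 5 coding positions, dimension 4
w1 = rng.normal(size=(4, 8))     # first mapping parameter (assumed shape)
w2 = rng.normal(size=(8, 1))     # second mapping parameter (assumed shape)
w = position_weights(feats, w1, w2)
adjusted = adjust_features(feats, w)
```

Softmax guarantees the weights are non-negative and sum to one, so the adjustment is a convex re-weighting of the coding positions.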
15. The training method of a user identification model according to claim 1, wherein updating the preset multi-task model based on the prediction result to obtain the target user identification model includes:
obtaining task labeling results corresponding to the plurality of user samples in the plurality of tasks;
comparing the task labeling results with the prediction results to obtain a user joint loss corresponding to each task of the plurality of tasks; and
converging the preset multi-task model based on the user joint loss to obtain a target multi-task model, and taking the target multi-task model as the target user identification model.
16. The method for training a user identification model according to claim 15, wherein comparing the task labeling results with the prediction results to obtain the user joint loss corresponding to each task of the plurality of tasks includes:
comparing the task labeling results with the prediction results to obtain a task loss of each user sample of the plurality of user samples, the task loss including a loss corresponding to each task of the plurality of tasks;
selecting at least one corresponding user sample from the plurality of user samples based on the task type of each task to obtain target user samples corresponding to each task; and
accumulating the task losses of the target user samples under the corresponding tasks to obtain the user joint loss corresponding to each task.
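The per-task accumulation of claim 16 amounts to masking each sample's per-task losses by whether that sample is a target sample for the task, then summing over samples. A minimal sketch, with hypothetical array layouts:

```python
import numpy as np

def user_joint_losses(task_losses, target_mask):
    """task_losses: (n_samples, n_tasks) loss of each sample under each task.
    target_mask:  (n_samples, n_tasks) 1 where the sample is a target
    sample for that task, 0 otherwise.
    Returns the joint loss per task (sum over its target samples)."""
    return (task_losses * target_mask).sum(axis=0)

losses = np.array([[0.5, 1.0],
                   [0.2, 0.3],
                   [0.9, 0.1]])
mask = np.array([[1, 0],    # sample 0 counts only toward task 0
                 [1, 1],    # sample 1 counts toward both tasks
                 [0, 1]])   # sample 2 counts only toward task 1
joint = user_joint_losses(losses, mask)
print(joint)  # [0.7 0.4]
```

The resulting per-task joint losses would then drive the convergence step of claim 15 (e.g. by gradient descent on their sum).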
17. The method of training a user identification model of claim 1, wherein the plurality of tasks further includes at least one auxiliary task, the at least one auxiliary task including at least one of predicting other user categories or predicting at least one user behavior of a user.
18. A user identification method, comprising:
obtaining user data of each user in a user set;
inputting the user data into a target user identification model to obtain a category prediction result of each user, wherein the target user identification model comprises a multi-task model obtained by transfer learning based on a pre-trained user identification model corresponding to a first user category; and
selecting at least one user corresponding to a second user category from the user set based on the category prediction results to obtain a target user.
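The selection step of claim 18 can be sketched as thresholding the model's category scores. The 0.5 cut-off and all names below are assumptions for illustration; the claim does not specify the decision rule:

```python
def select_target_users(users, scores, threshold=0.5):
    """Pick users whose predicted probability of belonging to the
    second user category clears an (assumed) threshold."""
    return [u for u, s in zip(users, scores) if s >= threshold]

target = select_target_users(["u1", "u2", "u3"], [0.9, 0.2, 0.6])
print(target)  # ['u1', 'u3']
```

In practice the rule could equally be a top-k selection; the threshold form is just the simplest instance.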
19. The user identification method of claim 18, wherein users of the first user category include users who directly have target rights, users of the second user category include users who obtain the target rights after satisfying a preset condition, and the target rights include a right to open at least one sub-account for a user.
20. A training system for a user identification model, comprising:
at least one storage medium storing at least one set of instructions for performing training of a user identification model; and
at least one processor communicatively coupled to the at least one storage medium,
wherein, when the training system of the user identification model is running, the at least one processor reads the at least one instruction set and, as directed by the at least one instruction set, performs the training method of the user identification model of any one of claims 1-17.
21. A user identification system, comprising:
at least one storage medium storing at least one instruction set for user identification; and
at least one processor communicatively coupled to the at least one storage medium,
wherein, when the user identification system is running, the at least one processor reads the at least one instruction set and, as directed by the at least one instruction set, performs the user identification method of any one of claims 18-19.
CN202310438653.7A 2023-04-22 2023-04-22 Training method of user identification model, user identification method and system Pending CN116467629A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310438653.7A CN116467629A (en) 2023-04-22 2023-04-22 Training method of user identification model, user identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310438653.7A CN116467629A (en) 2023-04-22 2023-04-22 Training method of user identification model, user identification method and system

Publications (1)

Publication Number Publication Date
CN116467629A true CN116467629A (en) 2023-07-21

Family

ID=87183975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310438653.7A Pending CN116467629A (en) 2023-04-22 2023-04-22 Training method of user identification model, user identification method and system

Country Status (1)

Country Link
CN (1) CN116467629A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116805253A * 2023-08-18 2023-09-26 Tencent Technology (Shenzhen) Co., Ltd. Intervention gain prediction method, device, storage medium and computer equipment
CN116805253B * 2023-08-18 2023-11-24 Tencent Technology (Shenzhen) Co., Ltd. Intervention gain prediction method, device, storage medium and computer equipment
CN117675351A * 2023-12-06 2024-03-08 China Electronics Industry Engineering Co., Ltd. Abnormal flow detection method and system based on BERT model

Similar Documents

Publication Publication Date Title
US20210224306A1 (en) System, Apparatus and Methods for Providing an Intent Suggestion to a User in a Text-Based Conversational Experience with User Feedback
CN116467629A (en) Training method of user identification model, user identification method and system
WO2023168909A1 (en) Pre-training method and model fine-tuning method for geographical pre-training model
CN116720004B (en) Recommendation reason generation method, device, equipment and storage medium
WO2023045605A1 (en) Data processing method and apparatus, computer device, and storage medium
CN110827831A (en) Voice information processing method, device, equipment and medium based on man-machine interaction
CN112395390B (en) Training corpus generation method of intention recognition model and related equipment thereof
WO2022105121A1 (en) Distillation method and apparatus applied to bert model, device, and storage medium
WO2024098524A1 (en) Text and video cross-searching method and apparatus, model training method and apparatus, device, and medium
JP2023017921A (en) Content recommendation and sorting model training method, apparatus, and device and computer program
CN113516480A (en) Payment risk identification method, device and equipment
CN110399479A (en) Search for data processing method, device, electronic equipment and computer-readable medium
JP2024508502A (en) Methods and devices for pushing information
CN115712657A (en) User demand mining method and system based on meta universe
CN116684330A (en) Traffic prediction method, device, equipment and storage medium based on artificial intelligence
CN114357125A (en) Natural language identification method, device and equipment in task type dialogue system
CN110232108B (en) Man-machine conversation method and conversation system
CN114780753A (en) Dialogue recommendation method, device and equipment based on knowledge graph and storage medium
CN117059095B (en) IVR-based service providing method and device, computer equipment and storage medium
CN116680481B (en) Search ranking method, apparatus, device, storage medium and computer program product
CN117235530A (en) Method and device for training intention prediction model and electronic equipment
KR102189558B1 (en) Apparatus, method and system for providing intelligent electric document using voice
US10529323B2 (en) Semantic processing method of robot and semantic processing device
JP2024522358A (en) Machine Learning Assisted Automated Taxonomies for Marketing Automation and Customer Relationship Management Systems
US20220180865A1 (en) Runtime topic change analyses in spoken dialog contexts

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination