CN113762585B - Data processing method, account type identification method and device - Google Patents


Publication number
CN113762585B
Authority
CN
China
Prior art keywords
account
sample
identification
model
models
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110535924.1A
Other languages
Chinese (zh)
Other versions
CN113762585A (en
Inventor
张堃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110535924.1A
Publication of CN113762585A
Application granted
Publication of CN113762585B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The application discloses a data processing method, an account type identification method and device, a computer device, and a storage medium, and belongs to the technical field of computers. In the method, multiple groups of sample service data of a sample account under multiple service reference dimensions, together with the account type labeled for the sample account, are used to jointly train multiple initial identification models and an initial fusion model. While the parameters of any one model are being adjusted, the parameters of the other models are held fixed. A single model is therefore not trained in isolation, layer by layer; instead, an end-to-end training mode is provided, which can greatly improve the identification accuracy of the trained account identification models and fusion model, and thus the accuracy of account type identification.

Description

Data processing method, account type identification method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method, an account type identification method and device, a computer device, and a storage medium.
Background
With the development of computer technology and the advancement of artificial intelligence (AI) technology, machine learning models are applied in more and more application scenarios to perform recognition tasks such as face recognition, speech recognition, and type recognition (i.e., classification).
Take the identification of high-quality accounts among short video accounts as an example. Each user account has multiple business reference dimensions, such as account liveness, account consumption condition, and account mutual power. A high-quality account with high account liveness, a good account consumption condition, and large account mutual power is usually easy for a model to identify. However, there are also high-quality accounts whose account consumption condition is very good but whose account liveness is low, and current models identify such accounts with low accuracy. A method capable of improving the accuracy of account type identification is therefore needed.
Disclosure of Invention
The embodiments of the application provide a data processing method, an account type identification method and device, a computer device, and a storage medium, which can improve the accuracy of account type identification. The technical scheme is as follows:
in one aspect, a method for processing data is provided, including:
acquiring a plurality of groups of sample service data and account types of sample accounts, wherein the sample service data of different groups correspond to different service reference dimensions;
based on the multiple groups of sample service data and the account types, parameters of multiple initial recognition models and initial fusion models are adjusted, wherein when parameters of any initial recognition model or initial fusion model are adjusted, parameters of other models are kept unchanged;
and in response to the iteration meeting the convergence condition, acquiring a plurality of account identification models and a fusion model after parameter adjustment, wherein each account identification model is used for acquiring a predicted account type based on a single group of service data corresponding to one service reference dimension, and the fusion model is used for acquiring a predicted account type based on multiple groups of service data.
In one aspect, a method for identifying an account type is provided, where the method includes:
acquiring multiple groups of service data of a target account, wherein the service data of different groups correspond to different service reference dimensions;
acquiring a plurality of first identification results of the target account based on the plurality of sets of service data, wherein each first identification result is a predicted account type determined based on a single set of service data;
acquiring a second identification result of the target account based on a plurality of first identification results of the target account, wherein the second identification result is a predicted account type determined based on the plurality of sets of business data;
and determining the predicted account type of the target account based on the plurality of first identification results of the target account and the second identification result of the target account.
In one aspect, there is provided a data processing apparatus, the apparatus comprising:
the first acquisition module is used for acquiring a plurality of groups of sample service data and account types of the sample account, wherein the sample service data of different groups correspond to different service reference dimensions;
the adjustment module is used for adjusting parameters of a plurality of initial recognition models and initial fusion models based on the plurality of groups of sample service data and the account types, wherein the parameters of other models are kept unchanged when the parameters of any initial recognition model or the initial fusion model are adjusted;
the second acquisition module is used for acquiring, in response to the iteration meeting the convergence condition, a plurality of account identification models and a fusion model after parameter adjustment, wherein each account identification model is used for acquiring a predicted account type based on a single set of business data corresponding to one business reference dimension, and the fusion model is used for acquiring a predicted account type based on multiple sets of business data.
In one possible implementation, the adjustment module is configured to:
respectively inputting the multiple groups of sample service data into the multiple initial recognition models to obtain multiple first sample recognition results of the sample account;
inputting the plurality of first sample recognition results into the initial fusion model to obtain a second sample recognition result of the sample account;
determining a loss function value based on the plurality of first sample recognition results, the second sample recognition result, and the account type;
in response to the stop condition not being met, adjusting parameters of any one of the plurality of initial recognition models until the stop condition is met, and then adjusting parameters of a next initial recognition model;
and adjusting the parameters of the initial fusion model in response to the completion of the adjustment of the parameters of the plurality of initial recognition models.
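The alternating adjustment described in the steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: each recognition model is assumed to be a one-feature logistic regression, the fusion model a logistic regression over the first sample recognition results, the loss the sum of the cross-entropies of all first results and the second result, and the per-model stop condition a fixed number of gradient steps. The names `forward`, `total_loss`, `adjust_one`, and `train` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(p, y):
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def forward(models, groups):
    # models[:-1] are recognition models [w, b], one per dimension (assumed form).
    # models[-1] is the fusion model [w_1, ..., w_K, b] (assumed form).
    # groups holds one feature value per service reference dimension.
    firsts = [sigmoid(m[0] * x + m[1]) for m, x in zip(models[:-1], groups)]
    fusion = models[-1]
    z = sum(w * p for w, p in zip(fusion[:-1], firsts)) + fusion[-1]
    return firsts, sigmoid(z)

def total_loss(models, data):
    # Assumed loss: cross-entropy of every first result plus the second result.
    loss = 0.0
    for groups, y in data:
        firsts, second = forward(models, groups)
        loss += sum(log_loss(p, y) for p in firsts) + log_loss(second, y)
    return loss / len(data)

def adjust_one(models, idx, data, lr=0.2, steps=20, h=1e-5):
    # Gradient steps on models[idx] only; every other model stays frozen.
    # The gradient is estimated by central finite differences for brevity.
    for _ in range(steps):
        grad = []
        for j in range(len(models[idx])):
            orig = models[idx][j]
            models[idx][j] = orig + h
            up = total_loss(models, data)
            models[idx][j] = orig - h
            down = total_loss(models, data)
            models[idx][j] = orig
            grad.append((up - down) / (2 * h))
        for j, g in enumerate(grad):
            models[idx][j] -= lr * g

def train(models, data, tol=1e-4, max_rounds=30):
    prev = total_loss(models, data)
    cur = prev
    for _ in range(max_rounds):
        # Recognition models are adjusted one after another, the fusion model last.
        for idx in range(len(models)):
            adjust_one(models, idx, data)
        cur = total_loss(models, data)
        if abs(prev - cur) < tol:  # loss change between iterations below threshold
            break
        prev = cur
    return models, cur
```

Because the loss seen by each recognition model includes the fusion model's output, adjusting one model takes the other models' current results into account even though their parameters are frozen during that step.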
In one possible implementation, the adjustment module is further configured to:
carrying out parameter adjustment on a basic identification model based on any group of sample service data of the sample account to obtain one initial identification model, wherein that initial identification model corresponds to the service reference dimension to which the group of sample service data belongs.
In one possible implementation, the adjustment module is further configured to:
respectively inputting the multiple groups of sample service data into the multiple initial recognition models to obtain multiple first sample recognition results;
and carrying out parameter adjustment on the basic fusion model based on the plurality of first sample recognition results to obtain the initial fusion model.
In one possible implementation manner, the convergence condition is that a difference between the loss function values of the current iteration process and the previous iteration process is smaller than a convergence threshold, and the convergence threshold is a value greater than or equal to 0.
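Written as a formula, with $L^{(t)}$ denoting the loss function value of the $t$-th iteration and $\varepsilon$ the convergence threshold (both symbols are introduced here for illustration, and taking the difference as an absolute value is an assumption), the convergence condition reads:

```latex
\left| L^{(t)} - L^{(t-1)} \right| < \varepsilon, \qquad \varepsilon \ge 0
```

Under this reading, a threshold of exactly 0 could never be satisfied by the strict inequality, so a small positive value would presumably be chosen in practice.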
In one aspect, an account type identification device is provided, and the device includes:
the first acquisition module is used for acquiring multiple groups of business data of the target account, wherein the business data of different groups correspond to different business reference dimensions;
the second acquisition module is used for acquiring a plurality of first identification results of the target account based on the plurality of groups of service data, wherein each first identification result is a predicted account type determined based on a single group of service data;
the third acquisition module is used for acquiring a second identification result of the target account based on a plurality of first identification results of the target account, wherein the second identification result is a predicted account type determined based on the plurality of groups of business data;
and the determining module is used for determining the predicted account type of the target account based on the plurality of first identification results of the target account and the second identification result of the target account.
In one possible implementation manner, the second obtaining module is configured to:
for any one set of business data in the plurality of sets of business data, determining a business reference dimension corresponding to the any one set of business data;
determining an account identification model corresponding to a service reference dimension based on a mapping relation between the service reference dimension and the account identification model, wherein the account identification model is used for acquiring a predicted account type based on single-group service data of the service reference dimension;
and inputting the group of business data into the account identification model, and processing the group of business data through the account identification model to obtain a first identification result corresponding to the group of business data.
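The lookup described above, from service reference dimension to account identification model, can be illustrated with a plain dictionary. The sketch below is only an assumption about one possible shape of the mapping; the two stand-in models, the feature names, and `MODEL_BY_DIMENSION` are hypothetical and do not come from the source.

```python
def liveness_model(features):
    # Hypothetical stand-in: share of active days in the last 30 days.
    return min(1.0, features["active_days"] / 30.0)

def consumption_model(features):
    # Hypothetical stand-in: share of watched videos that were finished.
    return features["videos_finished"] / max(1, features["videos_watched"])

# Mapping relation between service reference dimension and identification model.
MODEL_BY_DIMENSION = {
    "account_liveness": liveness_model,
    "account_consumption": consumption_model,
}

def first_identification_results(grouped_data):
    """Run each group of service data through the model of its own dimension."""
    results = {}
    for dimension, features in grouped_data.items():
        model = MODEL_BY_DIMENSION[dimension]  # determine the model by dimension
        results[dimension] = model(features)   # first identification result
    return results
```

For example, passing `{"account_liveness": {"active_days": 15}}` routes that group to `liveness_model` only.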
In one possible implementation manner, the third obtaining module is configured to:
inputting the plurality of first recognition results into a fusion model, weighting the plurality of first recognition results through the fusion model to obtain a plurality of weighted recognition results, wherein the fusion model is used for acquiring a predicted account type based on a plurality of groups of business data;
and carrying out linear mapping on the sum value among the weighted recognition results to obtain the second recognition result.
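The weighting and linear mapping steps can be sketched as follows. This is an illustrative reading rather than the source's definition: the function name `fuse` and the parameters `scale` and `offset` (the coefficients of the linear mapping) are assumptions.

```python
def fuse(first_results, weights, scale=1.0, offset=0.0):
    """Weight each first identification result, sum, then apply a linear map."""
    if len(first_results) != len(weights):
        raise ValueError("one weight is needed per first identification result")
    # Weighted identification results, one per service reference dimension.
    weighted = [w * r for w, r in zip(weights, first_results)]
    # Linear mapping of the sum of the weighted results (assumed form a*s + b).
    return scale * sum(weighted) + offset
```

With first results `[0.9, 0.2, 0.6]` and weights `[0.5, 0.3, 0.2]`, the weighted results are 0.45, 0.06, and 0.12, and their sum, 0.63, is what the linear mapping is applied to.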
In one possible implementation manner, the first obtaining module is configured to:
determining service reference dimensions to which a plurality of service data of the target account belong respectively;
and dividing each service data belonging to the same service reference dimension into the same group of service data.
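A minimal sketch of this grouping step, assuming each item of service data can be looked up in a table that names its service reference dimension. The table `DIMENSION_OF` and the item names in it are hypothetical.

```python
from collections import defaultdict

# Hypothetical table: which service reference dimension each data item belongs to.
DIMENSION_OF = {
    "login_days": "account_liveness",
    "posts_per_week": "account_liveness",
    "videos_watched": "account_consumption",
    "comments_received": "account_mutual_power",
}

def group_by_dimension(service_data):
    """Put every item that shares a service reference dimension into one group."""
    groups = defaultdict(dict)
    for name, value in service_data.items():
        groups[DIMENSION_OF[name]][name] = value
    return dict(groups)
```

Each resulting group can then be passed to the identification model of the matching dimension.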
In one possible implementation, the service reference dimensions include at least two of: account liveness, account influence, account consumption condition, account mutual power, number of associated accounts, and video play completion rate.
In one aspect, a computer device is provided that includes one or more processors and one or more memories having at least one computer program stored therein, the at least one computer program loaded and executed by the one or more processors to implement a method of processing data or a method of identifying account types as described above.
In one aspect, a storage medium is provided, in which at least one computer program is stored, the at least one computer program being loaded and executed by a processor to implement a method of processing data or a method of identifying account types as described above.
In one aspect, a computer program product or computer program is provided, the computer program product or computer program comprising one or more program codes, the one or more program codes being stored in a computer readable storage medium. The one or more processors of the computer device can read the one or more program codes from the computer-readable storage medium, and the one or more processors execute the one or more program codes, so that the computer device can execute the processing method of the data or the identification method of the account type.
The beneficial effects of the technical scheme provided by the embodiments of this application include at least the following:
Multiple groups of sample service data of a sample account under multiple service reference dimensions, together with the account type labeled for the sample account, are used to jointly train multiple initial identification models and an initial fusion model. While the parameters of any one model are being adjusted, the parameters of the other models are held fixed, so a single model is not trained in isolation, layer by layer. Instead, an end-to-end training mode is provided: each model is trained jointly, taking into account not only its own identification results but also the identification results of the other models. This can greatly improve the identification accuracy of the trained account identification models and fusion model, and thus the accuracy of account type identification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an implementation environment of a data processing method according to an embodiment of the present application;
FIG. 2 is a schematic illustration of a cross-validation method provided in an embodiment of the present application;
FIG. 3 is a flowchart of a method for processing data according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for processing data according to an embodiment of the present application;
FIG. 5 is a flowchart of an account type identification method provided in an embodiment of the present application;
FIG. 6 is a schematic flowchart of an account type identification method provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of an account type identification method according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a data processing device according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an account type identification device according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," and the like in this application are used to distinguish between identical or similar items that have substantially the same role and function. It should be understood that there is no logical or chronological dependency among "first," "second," and "nth," and that these terms do not limit the number of items or the order of execution.
The term "at least one" in this application means one or more, and "a plurality of" means two or more; for example, a plurality of first positions means two or more first positions.
Artificial intelligence (AI) is the theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence infrastructure generally includes sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly includes audio processing technology, computer vision technology, natural language processing technology, machine learning/deep learning, and other directions.
Machine learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and how it can reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and advancement of artificial intelligence technology, it is being studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart medical care, and smart customer service. It is believed that, as the technology develops, artificial intelligence will be applied in more fields and will become increasingly valuable.
The solution provided in the embodiments of the present application relates to techniques such as machine learning of artificial intelligence, and mainly relates to how to use a machine learning model to improve accuracy of identifying account types, which will be described in detail in the following embodiments.
FIG. 1 is a schematic diagram of an implementation environment of a data processing method according to an embodiment of the present application. Referring to FIG. 1, the implementation environment includes a terminal 110 and a server 120, each of which is an example of a computer device.
The terminal 110 is configured to provide service data of the target account. A user logs in to the target account on the terminal 110 and may initiate various service requests and service behaviors. The server 120 may collect and compile the service data of the target account; alternatively, the terminal 110 may collect and compile the service data of the target account and then send it to the server 120.
The terminal 110 and the server 120 can be directly or indirectly connected through wired or wireless communication, which is not limited herein.
The server 120 may be configured to provide business services to each terminal 110 and may further be configured to identify the account type of the target account. That is, a plurality of account recognition models and a fusion model may be trained on the server 120, where each account recognition model is configured to obtain a predicted account type of an input account based on a single set of business data corresponding to one business reference dimension, and the fusion model is configured to obtain a predicted account type of the input account based on a plurality of sets of business data. The plurality of first recognition results output by the account recognition models and the second recognition result output by the fusion model may further be provided to a technician, so that the technician can conveniently analyze the account type of the input account. In addition, once the account type is determined, it can guide subsequent resource recommendation work for the target account. Optionally, resource recommendation work has two meanings: how to accurately recommend resources to the target account, and whether to recommend the target account to other accounts so as to improve the exposure rate of the target account. Furthermore, accounts can be ranked as a whole according to their account consumption condition, which makes it convenient to mine accounts with higher consumption potential and to mine or delete high-quality accounts (also called high-grade accounts or core accounts), so the method has wide application value.
It should be noted that, as disclosed in the present application, the service data of each account may be stored on the blockchain.
Optionally, the server 120 maintains the multiple account identification models and the fusion model only on the server side. The server 120 can then group the collected service data of each account by service reference dimension to obtain multiple groups of service data, call the multiple account identification models to obtain multiple first identification results, call the fusion model to obtain a second identification result, and determine the predicted account type of each account based on the first identification results and the second identification result, so that accurate resource recommendation can be performed for the terminal on which each account is logged in.
Optionally, after training to obtain the multiple account identification models and the fusion model, the server 120 sends the multiple account identification models and the fusion model to the terminal 110, so that the terminal 110 can locally invoke the multiple account identification models and the fusion model to determine the predicted account type of the own account by itself, so that the terminal 110 actively requests to recommend corresponding resources to the server 120 according to the predicted account type of the own account.
Alternatively, the terminal 110 trains locally to obtain the multiple account identification models and the fusion model, and then calls them locally to determine the predicted account type of its own account. The terminal 110 can then actively request the server 120 to recommend corresponding resources according to that predicted account type, which reduces the communication overhead between the terminal 110 and the server 120.
Server 120 may include at least one of a server, a plurality of servers, a cloud computing platform, or a virtualization center. Alternatively, the server 120 may undertake primary computing work and the terminal 110 may undertake secondary computing work; alternatively, the server 120 takes on secondary computing work and the terminal 110 takes on primary computing work; alternatively, a distributed computing architecture is employed between both the terminal 110 and the server 120 for collaborative computing.
Optionally, the server 120 is a stand-alone physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
Optionally, the terminal 110 is a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, an electronic book reader, or the like, but is not limited thereto.
Those skilled in the art will appreciate that the terminal 110 may refer broadly to one of a plurality of terminals, and that the number of terminals may be greater or smaller. For example, there may be only one terminal, or there may be tens or hundreds of terminals, or more. The number of terminals and the device type are not limited in the embodiments of the present application.
The stacking method is a multi-model fusion method. When multiple groups of business data are involved, the stacking method trains one or more models on each group of business data and makes predictions on the training set and the test set in preparation for the subsequent model fusion. If multiple models are trained on each group of business data, they can be regarded as a whole and cross-validated to prevent overfitting; if overfitting is not a concern, a single model can be trained on each group of business data.
FIG. 2 is a schematic diagram of a cross-validation method provided in this embodiment. As shown at 200, the figure illustrates the training process of the models corresponding to any one group of service data. The service data (that is, the training data) are randomly divided into 5 parts by data amount, and 5 instances of model 1 are trained from a basic model; these 5 instances can be regarded as one model group 1. When each instance of model 1 is trained, 1 of the 5 parts is selected as the validation set and the remaining 4 parts are combined as the training set; the instance is trained on the training set and its prediction results on the validation set are obtained. The prediction results of the 5 instances on their respective validation sets are then spliced into the complete prediction result set of model group 1. Further, the test set is input into each of the 5 instances to obtain 5 test results, and the average of the 5 test results is used as the final test result of model group 1. This flow is executed on each group of service data to obtain the prediction result set and the final test result of each model group, and a fusion model is trained based on the prediction result sets and the final test results. In actual use, the service data of the different service reference dimensions are input into the corresponding model groups, the outputs of each model group are averaged, and the averaged results are input into the fusion model, which outputs the final service processing result.
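The data flow of FIG. 2 for one group of service data can be sketched as below. The base model here is a deliberately trivial one-feature nearest-centroid classifier, chosen only so that the example is self-contained; it is an assumption, as are the function names `fit`, `predict`, and `cross_validated_group`. The sketch also assumes every training split contains both classes.

```python
import random

def fit(xs, ys):
    # Toy base model (assumption): centroid of each class on a single feature.
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)

def predict(model, x):
    pos_c, neg_c = model
    return 1.0 if abs(x - pos_c) <= abs(x - neg_c) else 0.0

def cross_validated_group(train_x, train_y, test_x, k=5, seed=0):
    """Train k models; return out-of-fold predictions and averaged test results."""
    idx = list(range(len(train_x)))
    random.Random(seed).shuffle(idx)         # random division into k parts
    folds = [idx[i::k] for i in range(k)]
    oof = [None] * len(train_x)              # spliced validation predictions
    test_scores = [0.0] * len(test_x)        # average of the k test results
    for fold in folds:
        held_out = set(fold)                 # this part is the validation set
        tr = [i for i in idx if i not in held_out]  # remaining k-1 parts
        model = fit([train_x[i] for i in tr], [train_y[i] for i in tr])
        for i in fold:
            oof[i] = predict(model, train_x[i])
        for j, x in enumerate(test_x):
            test_scores[j] += predict(model, x) / k
    return oof, test_scores
```

The returned `oof` list plays the role of the complete prediction result set of the model group, and `test_scores` that of its final test result; both would feed the fusion model's training.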
The problem of insufficient information in the training process of the stacking method is described below, taking the task of identifying high-quality accounts among short video accounts as an example. Whether an account is a high-quality account is judged comprehensively across multiple business reference dimensions of the account, such as account liveness, account consumption and account interactivity. In practice, some accounts have very good account consumption but low account liveness and are still finally judged to be high-quality accounts, which shows that the final recognition result is a multi-dimensional judgment that integrates multiple groups of business data. When models are trained separately on each group of business data, a training sample may become noise data under some groups. For example, an account of the above type has good account consumption, so it is a normal sample when training on the account consumption group. However, a typical high-quality account also has high account liveness, whereas this type of account has low account liveness yet carries the high-quality label, exhibiting liveness characteristics opposite to those of a typical high-quality account. It is therefore a noise sample when training on the account liveness group: it introduces new noise into the training process, interferes with training, and reduces the recognition accuracy of the overall model.
In addition, after the models are trained on each group of business data, the stacking method must predict on the same data set to produce the input data of the next layer. To avoid overfitting to the training set, the cross-validation scheme shown in Fig. 2 is adopted, i.e. a model group rather than a single model is trained for each group of business data. As a result, more models must be run in actual use, which greatly reduces computational efficiency.
In view of this, the embodiments of the present application provide a data processing method based on end-to-end training. It avoids the insufficient training information that arises when a model is trained separately on each group of business data and a fusion model is trained afterwards, so that a plurality of account identification models and a fusion model with higher identification accuracy can be obtained.
Fig. 3 is a flowchart of a data processing method provided in an embodiment of the present application. Referring to Fig. 3, the embodiment is applied to a computer device; the following description takes a server as an example of the computer device. The embodiment includes the following steps:
301. The server acquires a plurality of groups of sample service data of a sample account and the account type of the sample account, where different groups of sample service data correspond to different service reference dimensions.
The sample account can be any account registered on a platform provided by the server, and there may be one or more sample accounts. The embodiments of the present application describe only a single iteration for a single sample account as an example, but the number of sample accounts is not limited thereby; when there are multiple sample accounts, each sample account goes through a service data processing flow similar to that of the single sample account.
Optionally, depending on the services provided by the platform on which the account is registered, the sample account may be a short video account, a social media account, an uploader (video uploader) account, an official account, a game account, a comment account, or the like, which is not specifically limited in the embodiments of the present application.
In some embodiments, to acquire the plurality of groups of sample service data, the server first acquires a plurality of pieces of original sample service data of the sample account, determines the service reference dimension to which each piece of sample service data belongs, and divides the sample service data into the plurality of groups based on the service reference dimensions, for example by placing all sample service data belonging to the same service reference dimension in the same group.
It should be noted that a business reference dimension is an evaluation dimension that must be considered when classifying account types, and different business reference dimensions can be set for different account types to be classified. For example, when determining whether a sample account is a high-quality account, the business reference dimensions to be considered include, but are not limited to, account liveness, account influence, account consumption and account interactivity. As another example, when determining whether a sample account is a highly active account, the business reference dimensions may include only account liveness, account interactivity and the like, without considering account consumption or account influence. As a further example, if the sample account is a short video account, video completion rate may be introduced as a business reference dimension, and if the sample account is a social media account, account homepage popularity may be introduced as a business reference dimension. The embodiments of the present application do not specifically limit the kinds of business reference dimension.
In some embodiments, the account type of a sample account, i.e. its classification label, may be labeled manually by a technician. Alternatively, sample accounts may be collected by account type (for example, 1000 high-quality accounts and 1000 non-high-quality accounts), so that the collected sample accounts already carry their account type labels.
In some embodiments, taking a short video account as the sample account, the business reference dimensions include at least two of the following: account liveness, account influence, account consumption, account interactivity, number of associated accounts, video completion rate and the like.
Optionally, the plurality of sample business data of the sample account includes: the account's daily play count, weekly play count and monthly play count; the accumulated play duration, accumulated click count, accumulated comment count and accumulated share count of video works uploaded by the account; the account's accumulated total consumption amount and the consumption amount of its latest order; the accumulated number of comments on all posts of the account and the number of comments on its latest post; the account's follower count and following count; the accumulated completion rate of videos clicked by the account and the completion rate of videos clicked by the account in the last 7 days; and the like.
On this basis, grouping the sample service data requires determining the service reference dimension to which each piece of sample service data belongs, namely: the daily, weekly and monthly play counts belong to account liveness; the accumulated play duration, click count, comment count and share count of uploaded video works belong to account influence; the accumulated total consumption amount and the latest order consumption amount belong to account consumption; the accumulated number of comments on all posts and the number of comments on the latest post belong to account interactivity; the follower count and following count belong to the number of associated accounts; and the accumulated completion rate of clicked videos and the 7-day completion rate of clicked videos belong to video completion rate.
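The grouping of sample service data by service reference dimension can be sketched as a simple lookup. The field names and dimension keys below are hypothetical and chosen only for illustration; the original text does not specify a data format.

```python
from collections import defaultdict

# Hypothetical field names mapped to the dimensions named in the text.
DIMENSION_OF = {
    "daily_play_count": "account_liveness",
    "weekly_play_count": "account_liveness",
    "monthly_play_count": "account_liveness",
    "uploaded_video_play_duration": "account_influence",
    "uploaded_video_clicks": "account_influence",
    "total_consumption": "account_consumption",
    "latest_order_consumption": "account_consumption",
    "follower_count": "associated_accounts",
    "following_count": "associated_accounts",
}

def group_by_dimension(sample, dimension_of=DIMENSION_OF):
    """Split one account's raw service data into one group per dimension."""
    groups = defaultdict(dict)
    for field, value in sample.items():
        groups[dimension_of[field]][field] = value
    return dict(groups)

sample = {
    "daily_play_count": 12,
    "weekly_play_count": 70,
    "total_consumption": 35.0,
    "follower_count": 480,
}
groups = group_by_dimension(sample)
```

Each resulting group is what would later be fed to the recognition model of the matching dimension.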
In some embodiments, taking a social media account as the sample account, the business reference dimensions include at least two of the following: account liveness, account influence, account consumption, account interactivity, number of associated accounts, account homepage popularity and the like.
Optionally, the plurality of sample business data of the sample account includes: the account's daily active duration, weekly active duration and monthly active duration; the accumulated click count and accumulated comment count of posts published by the account; the account's accumulated total consumption amount, the consumption amount of its latest order and its membership level; the accumulated number of comments on all posts of the account, the number of comments on its latest post and the read count of its latest post; the account's follower count and following count; the daily, weekly and monthly visit counts of the account homepage; and the like.
On this basis, grouping the sample service data requires determining the service reference dimension to which each piece of sample service data belongs, namely: the daily, weekly and monthly active durations belong to account liveness; the accumulated click count, accumulated comment count and accumulated share count of published posts belong to account influence; the accumulated total consumption amount, the latest order consumption amount and the membership level belong to account consumption; the accumulated number of comments on all posts, the number of comments on the latest post and the read count of the latest post belong to account interactivity; the follower count and following count belong to the number of associated accounts; and the daily, weekly and monthly visit counts of the account homepage belong to account homepage popularity.
In some embodiments, a user logs in to the sample account on a terminal and initiates various service requests to the server based on the sample account. While serving these requests, the server records each piece of sample service data of the sample account and uses statistical analysis tools to group the sample service data by service reference dimension, obtaining the plurality of groups of sample service data of the sample account.
In other embodiments, the terminal locally records each sample service data of the sample account, groups each sample service data according to the service reference dimension by using various statistical analysis tools, obtains a plurality of groups of sample service data of the sample account, sends the plurality of groups of sample service data of the sample account to the server, and the server receives the plurality of groups of sample service data of the sample account.
Optionally, as configured by a technician, sample accounts may be divided into different account types, for example into high-quality accounts and non-high-quality accounts, or into head accounts, waist accounts and bottom accounts, or into high-credit, medium-credit and low-credit accounts. Accurate classification of account types can guide downstream tasks, such as personalized resource recommendation for accounts of different account types.
The embodiments of the present application describe only the identification of whether a sample account is a high-quality account as an example, but the account types of the sample account are not limited thereby. A high-quality account is a core account of the platform, and must be evaluated comprehensively across multiple business reference dimensions such as account liveness, account consumption and account interactivity.
302. The server adjusts parameters of a plurality of initial recognition models and an initial fusion model based on the plurality of groups of sample service data and the account type, where the parameters of the other models are kept unchanged while the parameters of any one initial recognition model or of the initial fusion model are adjusted.
In some embodiments, when adjusting the parameters of each initial recognition model and of the initial fusion model, the server may perform the following operations in each iteration: inputting the plurality of groups of sample service data into the plurality of initial recognition models respectively to obtain a plurality of first sample recognition results of the sample account; inputting the plurality of first sample recognition results into the initial fusion model to obtain a second sample recognition result of the sample account; determining a loss function value based on the plurality of first sample recognition results, the second sample recognition result and the account type; in response to the stop condition not being met, adjusting the parameters of any one of the plurality of initial recognition models until the stop condition is met, and then adjusting the parameters of the next initial recognition model; and, in response to the parameters of all of the initial recognition models having been adjusted, adjusting the parameters of the initial fusion model.
Optionally, the stop condition may be that the loss function value is smaller than a loss threshold, which may be any value greater than or equal to 0; or the stop condition may be that the number of iterations is greater than a count threshold, which may be any integer greater than or equal to 1.
Taking any one iteration of the above process as an example: if the stop condition is not met, the parameters of the current initial recognition model are adjusted (with the parameters of the other models fixed) until the stop condition is met, which indicates that adjustment of the current initial recognition model is complete. The parameters of the next initial recognition model are then adjusted (again with the parameters of the other models fixed), and this step is repeated until all initial recognition models have been adjusted. Finally, the parameters of the initial fusion model are adjusted (with the parameters of the other models fixed). Once the initial fusion model has been adjusted, the current iteration is complete and the next iteration begins.
Optionally, one or more sample accounts may be input in the current iteration process, one or more sample accounts may also be input in the next iteration process, and the sample accounts in the current iteration process and the sample accounts in the next iteration process may be the same or different, which is not specifically limited in the embodiments of the present disclosure.
In the embodiments of the present application, it is not necessary to train multiple models on each group of service data for cross-validation as in stacking. Instead, all models are trained as a whole in an end-to-end manner in the subsequent steps, which avoids the overfitting problem caused by layered training.
In some embodiments, the plurality of initial recognition models may be basic recognition models which are not pre-trained, or the plurality of initial recognition models may be models which are pre-trained by the plurality of sets of sample business data based on the basic recognition models.
Optionally, the basic recognition model may be a gradient boosting (Gradient Boosting, GB) model, a gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT) model, an extreme gradient boosting (eXtreme Gradient Boosting, XGBoost) model, a light gradient boosting machine (Light Gradient Boosting Machine, LightGBM) model, or the like; the embodiments of the present application do not specifically limit the model structure of the basic recognition model.
In some embodiments, any one of the plurality of initial recognition models is obtained as follows: the parameters of the basic recognition model are adjusted based on one group of sample service data of the sample account to obtain the initial recognition model, which corresponds to the service reference dimension to which that group of sample service data belongs.
In other words, the server pre-trains each initial recognition model under each group of sample service data based on the basic recognition model with the same structure, so that each initial recognition model can correspond to the service reference dimension to which each group of input sample service data belongs, and the initialization work of each initial recognition model is completed.
In some embodiments, the initial fusion model may be a basic fusion model that is not pre-trained, or the initial fusion model may be a model that is pre-trained based on the basic fusion model via the plurality of first sample recognition results.
Optionally, the basic fusion model may be a logistic regression model, a least squares model, or the like; the embodiments of the present application do not specifically limit the model structure of the basic fusion model.
In some embodiments, the obtaining of the initial fusion model includes: respectively inputting the multiple groups of sample service data into the multiple initial recognition models to obtain multiple first sample recognition results; and based on the plurality of first sample recognition results, carrying out parameter adjustment on the basic fusion model to obtain the initial fusion model.
In the process, after each initial recognition model is obtained through pre-training, each group of sample service data is input into each corresponding initial recognition model again, each first sample recognition result output by each initial recognition model is integrated to form a new training set, the training set is used for training the basic fusion model, the initial fusion model is finally obtained, and the initialization work of the initial fusion model is completed.
303. In response to the iteration meeting the convergence condition, the server acquires the plurality of account identification models and the fusion model obtained after parameter adjustment, where each account identification model is used to predict the account type based on a single group of business data of the corresponding business reference dimension, and the fusion model is used to predict the account type based on multiple groups of business data.
In some embodiments, the convergence condition is that a difference between the loss function values of the current iteration process and the previous iteration process is less than a convergence threshold, and the convergence threshold is a value greater than or equal to 0.
In some embodiments, step 302 above is performed iteratively, and one pass of training all the initial recognition models and the initial fusion model is referred to as one iteration. In each iteration, the server obtains the loss function value at the time the initial fusion model is trained in the current iteration and the loss function value at the time the initial fusion model was trained in the previous iteration, and computes the difference between the two. If the difference meets the convergence condition, training stops: the plurality of initial recognition models of the current iteration are determined as the plurality of account identification models, and the initial fusion model of the current iteration is determined as the fusion model. If the difference does not meet the convergence condition, the next iteration is executed, and so on until the convergence condition is met.
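The outer convergence check can be sketched as a small driver loop. In this sketch, `run_one_iteration` is a hypothetical callback that performs one full iteration and returns the loss observed when the fusion model is trained. Using the absolute value of the difference and capping the number of rounds with `max_rounds` are assumptions added for safety, not details from the original text.

```python
def run_until_converged(run_one_iteration, convergence_threshold=1e-4,
                        max_rounds=1000):
    """Repeat full iterations until the loss changes by less than the threshold.

    Returns (number of iterations executed, last loss value).
    """
    prev_loss = None
    for round_no in range(1, max_rounds + 1):
        cur_loss = run_one_iteration()
        # Compare the loss of this iteration with that of the previous one.
        if prev_loss is not None and abs(prev_loss - cur_loss) < convergence_threshold:
            return round_no, cur_loss
        prev_loss = cur_loss
    return max_rounds, prev_loss

# Usage with a canned sequence of loss values standing in for real training.
losses = iter([1.0, 0.5, 0.3, 0.29999, 0.1])
rounds, final_loss = run_until_converged(lambda: next(losses))
```

With the canned sequence, the loop stops at the fourth iteration, where the loss changes by about 1e-5.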
All the above optional solutions can be combined to form an optional embodiment of the present disclosure, which is not described in detail herein.
In the method provided by the embodiments of the present application, the plurality of initial recognition models and the initial fusion model are jointly trained using the plurality of groups of sample service data of the sample account under the plurality of service reference dimensions and the labeled account type of the sample account. While the parameters of any one model are adjusted, the parameters of the other models are kept fixed. A single model is thus not trained in isolation, layer by layer; instead, training is end-to-end: each model is trained not only on its own recognition result but jointly with the recognition results of the other models. This can greatly improve the recognition accuracy of the resulting account identification models and fusion model, and hence the accuracy of account type identification.
Fig. 4 is a flowchart of a data processing method provided in an embodiment of the present application. Referring to Fig. 4, the embodiment is applied to a computer device; the following description takes a server as an example of the computer device. The embodiment includes the following steps:
401. The server acquires a plurality of groups of sample service data of a sample account and the account type of the sample account, where different groups of sample service data correspond to different service reference dimensions.
Step 401 is similar to step 301, and will not be described again.
402. The server inputs the plurality of groups of sample service data into a plurality of initial recognition models respectively, obtaining a plurality of first sample recognition results of the sample account.
In some embodiments, for any one of the plurality of groups of sample service data, in each iteration the server determines the service reference dimension corresponding to that group; determines the initial recognition model corresponding to that service reference dimension based on the mapping between service reference dimensions and initial recognition models; and inputs the group of sample service data into that initial recognition model, which processes it to produce the first sample recognition result corresponding to the group. Performing this operation on every group of sample service data yields the plurality of first sample recognition results.
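The dimension-to-model lookup in step 402 can be sketched as a dictionary dispatch. The two toy models below are hypothetical callables standing in for trained recognition models, and the field names are illustrative assumptions.

```python
def first_sample_results(grouped_sample, model_for_dimension):
    """Route each group of service data to the model mapped to its dimension."""
    results = {}
    for dimension, features in grouped_sample.items():
        model = model_for_dimension[dimension]   # mapping: dimension -> model
        results[dimension] = model(features)     # first sample recognition result
    return results

# Hypothetical stand-ins for trained recognition models.
model_for_dimension = {
    "account_liveness": lambda f: min(1.0, f["daily_play_count"] / 100.0),
    "account_consumption": lambda f: 1.0 if f["total_consumption"] > 50 else 0.0,
}

grouped_sample = {
    "account_liveness": {"daily_play_count": 20},
    "account_consumption": {"total_consumption": 80},
}
results = first_sample_results(grouped_sample, model_for_dimension)
```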
Optionally, the process of acquiring any one of the plurality of initial recognition models is similar to the process of acquiring any one of the initial recognition models in step 302, which is not described herein.
403. The server inputs the plurality of first sample recognition results into the initial fusion model to obtain a second sample recognition result of the sample account.
In some embodiments, the server inputs the plurality of first sample recognition results into the initial fusion model, weights them through the initial fusion model to obtain a plurality of sample weighted recognition results, and linearly maps the sum of the plurality of sample weighted recognition results to obtain the second sample recognition result.
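The weighting-and-summing step can be sketched as below, assuming the fusion model is the logistic regression model mentioned elsewhere in the description. The final sigmoid squashing is an assumption that follows from that choice, and the names `fuse`, `weights` and `intercept` are illustrative.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fuse(first_results, weights, intercept):
    """Weight each first sample recognition result, sum, and map to (0, 1)."""
    weighted = [c * f for c, f in zip(weights, first_results)]
    return sigmoid(sum(weighted) + intercept)

p_neutral = fuse([0.0, 0.0], weights=[1.0, 2.0], intercept=0.0)
p_positive = fuse([1.0, 1.0], weights=[1.0, 2.0], intercept=0.0)
```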
Optionally, the process of obtaining the initial fusion model is similar to the process of obtaining the initial fusion model in step 302, which is not described herein.
404. The server determines a loss function value based on the plurality of first sample identification results, the second sample identification result, and the account type.
The loss function value is the loss function value of the plurality of initial recognition models and the initial fusion model taken as a whole. That is, in each iteration, the first sample recognition result output by each initial recognition model does not directly enter a loss function computed for that single model; instead, it is combined with the first sample recognition results output by the other initial recognition models and the second sample recognition result output by the initial fusion model to obtain a single loss function value.
For example, assume that each initial recognition model is a gradient boosting tree model and the initial fusion model is a logistic regression model. Assume that the current iteration trains on the i-th group of sample service data, that the i-th first sample recognition result output by the gradient boosting tree model for a certain sample account is x, that the account type (i.e. the actual classification label) of the sample account is y, and that the loss function adopted by the gradient boosting tree model is L. If the gradient boosting tree model were trained separately in the stacking manner, the loss function value L(x, y) would have to be computed on its own. In the end-to-end training manner of the embodiments of the present application, L(x, y) is not needed; instead, the second sample recognition result p output by the logistic regression model is determined first, as follows:

$$p = \frac{1}{1 + e^{-\left(\sum_{i=1}^{n} c_i f_i + b\right)}}$$

where $c_i$ denotes the weight parameter of the i-th gradient boosting tree model, $f_i$ denotes the first sample recognition result output by the i-th gradient boosting tree model, i is an integer greater than or equal to 1 and less than or equal to n, n is the number of gradient boosting tree models and is an integer greater than or equal to 1, and b is the intercept (i.e. the bias parameter) of the logistic regression model.

Letting C collect all terms that are constant while the current (i-th) gradient boosting tree model is trained, the above formula can be abbreviated as:

$$p = \frac{1}{1 + e^{-(c_i x + C)}}$$

Then, when training the current gradient boosting tree model, it is not the loss function value L(x, y) of the individual model that is used, but the loss function value L(p, y) of all the models as a whole, so that the current gradient boosting tree model is trained under the overall model framework. Assuming that the loss function adopted is the log-likelihood function, the loss function value is expressed as follows:

$$L = -\left(y \log p + (1 - y)\log(1 - p)\right)$$
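The following sketch checks numerically that the overall loss can be written in terms of the current model's output x and a single constant C. It assumes the usual sigmoid form of logistic regression for the fusion model; the function names and the clipping constant `eps` are illustrative choices, not part of the original text.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_loss(p, y, eps=1e-12):
    """L = -(y log p + (1 - y) log(1 - p)), with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def overall_loss(first_results, weights, intercept, y):
    """Loss of the whole framework: all first results pass through the fusion."""
    z = sum(c * f for c, f in zip(weights, first_results)) + intercept
    return log_loss(sigmoid(z), y)

def overall_loss_for_model(i, x, first_results, weights, intercept, y):
    """Same loss, seen from model i: everything else is folded into C."""
    C = intercept + sum(c * f
                        for j, (c, f) in enumerate(zip(weights, first_results))
                        if j != i)
    return log_loss(sigmoid(weights[i] * x + C), y)

f = [0.2, -1.0, 0.7]          # first sample recognition results of 3 models
w = [0.5, 1.5, -0.3]          # fusion weights c_i
full = overall_loss(f, w, intercept=0.1, y=1)
seen_from_model_1 = overall_loss_for_model(1, f[1], f, w, intercept=0.1, y=1)
```

The two values agree up to floating-point rounding, which is why only x needs to vary while the i-th model is trained.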
405. In response to the stop condition not being met, the server adjusts the parameters of any one of the plurality of initial recognition models until the stop condition is met, and then adjusts the parameters of the next initial recognition model.
Optionally, the stop condition may be that the loss function value is smaller than a loss threshold, which may be any value greater than or equal to 0; or the stop condition may be that the number of iterations is greater than a count threshold, which may be any integer greater than or equal to 1.
In some embodiments, in response to the stop condition not being met, for any one of the plurality of initial recognition models, the server adjusts the parameters of that initial recognition model while keeping the parameters of the other initial recognition models and of the initial fusion model unchanged. When the stop condition is met, parameter adjustment of that initial recognition model is complete, and the server proceeds to adjust the parameters of the next initial recognition model.
In other words, the server traverses the groups of sample service data. While training the initial recognition model corresponding to one group, it fixes the parameters of the initial recognition models corresponding to the other groups and the parameters of the initial fusion model, and trains the current initial recognition model under the overall model framework. Once the current initial recognition model has been trained, the server moves on to the next one, repeating this step until all initial recognition models have been trained, and then performs step 406 below to train the initial fusion model.
406. In response to the parameters of the plurality of initial recognition models having been adjusted, the server adjusts the parameters of the initial fusion model.
In some embodiments, the server begins to adjust the parameters of the initial fusion model only after the parameters of all the initial recognition models have been adjusted. It then keeps the parameters of the adjusted initial recognition models unchanged and adjusts the parameters of the initial fusion model until the stop condition is met. It obtains the loss function value of the current iteration at the time the initial fusion model is trained, computes the difference from the corresponding loss function value of the previous iteration, and executes step 407 below when the difference meets the convergence condition.
In other words, when training the initial fusion model, the server fixes the parameters of all the initial recognition models and trains the initial fusion model under the overall model framework until the stop condition is met. It then obtains the loss function value of the current iteration at the time the initial fusion model is trained, computes the difference from the loss function value of the previous iteration at the time the initial fusion model was trained, and executes step 407 below when the difference meets the convergence condition.
In the steps 402-406, the server adjusts parameters of the plurality of initial recognition models and the initial fusion model based on the plurality of groups of sample service data and the account type, wherein when adjusting parameters of any initial recognition model or the initial fusion model, parameters of other models are kept unchanged.
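Steps 402-406 can be illustrated with a deliberately simplified toy. As a loudly stated assumption, each recognition model here is a one-parameter linear scorer f_i = w_i * x_i trained by gradient descent, swapped in for the gradient boosting models of the description so that the sketch stays short; the fusion model is a logistic regression with weights c_i and intercept b. The inner stop condition is a fixed step count, and all names and hyperparameters are illustrative.

```python
import math
import random

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(xs, w, c, b):
    # First results f_i = w[i] * xs[i]; the fusion combines them into p.
    return sigmoid(sum(ci * wi * xi for ci, wi, xi in zip(c, w, xs)) + b)

def mean_loss(data, w, c, b):
    total = 0.0
    for xs, y in data:
        p = min(max(predict(xs, w, c, b), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)

def train_alternating(data, n_groups, lr=0.5, inner_steps=20,
                      max_rounds=30, tol=1e-6):
    w = [0.1] * n_groups        # parameters of the recognition models
    c = [1.0] * n_groups        # fusion weights
    b = 0.0                     # fusion intercept
    prev = None
    for _round in range(max_rounds):
        # Step 405: adjust one recognition model at a time, others fixed.
        for i in range(n_groups):
            for _step in range(inner_steps):
                g = sum((predict(xs, w, c, b) - y) * c[i] * xs[i]
                        for xs, y in data) / len(data)
                w[i] -= lr * g
        # Step 406: adjust the fusion model with all recognition models fixed.
        for _step in range(inner_steps):
            errs = [(predict(xs, w, c, b) - y, xs) for xs, y in data]
            grad_c = [sum(e * w[i] * xs[i] for e, xs in errs) / len(data)
                      for i in range(n_groups)]
            grad_b = sum(e for e, _ in errs) / len(data)
            c = [ci - lr * gi for ci, gi in zip(c, grad_c)]
            b -= lr * grad_b
        # Convergence check on the overall loss between consecutive iterations.
        cur = mean_loss(data, w, c, b)
        if prev is not None and abs(prev - cur) < tol:
            break
        prev = cur
    return w, c, b, cur

rng = random.Random(0)
data = []
for _ in range(100):
    xs = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    data.append((xs, 1 if xs[0] + xs[1] > 0 else 0))

initial_loss = mean_loss(data, [0.1, 0.1], [1.0, 1.0], 0.0)
w, c, b, final_loss = train_alternating(data, n_groups=2)
```

Every gradient is taken through the overall loss L(p, y), so no model is ever fitted against its own stand-alone loss.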
407. In response to the iteration meeting the convergence condition, the server acquires the plurality of account identification models and the fusion model obtained after parameter adjustment, where each account identification model is used to predict the account type based on a single group of business data of the corresponding business reference dimension, and the fusion model is used to predict the account type based on multiple groups of business data.
In some embodiments, the convergence condition is that a difference between the loss function values of the current iteration process and the previous iteration process is less than a convergence threshold, and the convergence threshold is a value greater than or equal to 0.
Step 407 is similar to step 303, and will not be described here.
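The alternating adjustment of steps 402-407 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: each recognition model is replaced by a hypothetical two-parameter logistic unit instead of a gradient boosting model, the stopping condition of each block is assumed to be a fixed number of gradient steps, gradients are taken numerically, and the names `update_block`, `train`, `threshold`, and `max_rounds` are invented for the example. What it does show is the structure described above: one block of parameters is adjusted while all others stay frozen, the loss is always the final loss of the whole framework, and iteration ends when the loss difference between two rounds falls below the convergence threshold.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(models, fusion, groups):
    # models[i] = [w_i, a_i]: toy stand-in for the i-th recognition model (assumption).
    # fusion = [c_1, ..., c_n, b]: fusion weights followed by the intercept.
    firsts = [sigmoid(w * x + a) for (w, a), x in zip(models, groups)]
    return sigmoid(sum(c * f for c, f in zip(fusion, firsts)) + fusion[-1])

def loss(models, fusion, samples):
    # Mean binary cross-entropy of the fused (second) recognition result.
    eps = 1e-12
    total = 0.0
    for groups, y in samples:
        p = predict(models, fusion, groups)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(samples)

def update_block(block, loss_fn, lr=0.1, steps=20, h=1e-5):
    # Adjust one parameter block in place; every other block stays frozen.
    # The per-block stopping condition is assumed to be a fixed step count.
    for _ in range(steps):
        grads = []
        for k in range(len(block)):
            old = block[k]
            block[k] = old + h
            up = loss_fn()
            block[k] = old - h
            down = loss_fn()
            block[k] = old
            grads.append((up - down) / (2 * h))
        for k, g in enumerate(grads):
            block[k] -= lr * g

def train(samples, n_models, threshold=1e-6, max_rounds=50):
    models = [[0.1, 0.0] for _ in range(n_models)]
    fusion = [1.0] * n_models + [0.0]
    loss_fn = lambda: loss(models, fusion, samples)
    prev = loss_fn()
    for _ in range(max_rounds):
        for block in models:           # recognition models, one at a time
            update_block(block, loss_fn)
        update_block(fusion, loss_fn)  # fusion model last, recognition models frozen
        curr = loss_fn()
        if abs(prev - curr) < threshold:  # convergence condition
            break
        prev = curr
    return models, fusion, curr

# Hypothetical samples: one scalar feature per service reference dimension, plus a label.
samples = [([2.0, 1.5], 1), ([1.5, 2.0], 1), ([1.0, 1.0], 1),
           ([-2.0, -1.0], 0), ([-1.0, -2.0], 0), ([-1.5, -1.5], 0)]
initial_loss = loss([[0.1, 0.0], [0.1, 0.0]], [1.0, 1.0, 0.0], samples)
models, fusion, final_loss = train(samples, n_models=2)
```

Because every block is updated against the same overall loss, no recognition model is fitted in isolation from the fusion model, which is the property the end-to-end training mode relies on.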
In some embodiments, the parameters that must be supplied during training include the first and second derivatives of the loss function value with respect to the model output (i.e., the first sample recognition result x). The expressions for the first and second derivatives are as follows:
In the iterative process, supplying the first and second derivatives computed by the above expressions ensures that model iteration proceeds normally, so that each account identification model and the fusion model are trained in an end-to-end manner.
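As an illustration of what such derivatives can look like, the sketch below assumes a binary cross-entropy loss and a logistic fusion of the form p = sigmoid(c * x + C), where c is the fusion weight of the model being trained and C collects the contributions of all frozen models and the intercept. Both the loss and this form are assumptions made for the example, as are the function names; under them the first derivative with respect to x is c * (p - y) and the second derivative is c^2 * p * (1 - p).

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce_loss(x, y, c, C):
    # Binary cross-entropy of the fused prediction p = sigmoid(c * x + C) (assumed form).
    p = sigmoid(c * x + C)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def grad_hess(x, y, c, C):
    # First and second derivatives of bce_loss with respect to the model output x.
    p = sigmoid(c * x + C)
    grad = c * (p - y)
    hess = c * c * p * (1.0 - p)
    return grad, hess

grad, hess = grad_hess(x=0.3, y=1, c=0.8, C=-0.2)
```

Gradient boosting libraries that accept a custom objective typically expect exactly this pair (gradient and Hessian per sample), which is why passing them in lets the tree models be fitted against the loss of the whole framework.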
In the embodiment of the application, the server finally obtains the plurality of account identification models and the fusion model by adjusting the parameters of the plurality of initial recognition models and the initial fusion model. While the parameters of any one model are adjusted, the parameters of the other models in the overall framework are kept unchanged; the model's own separate loss function does not participate in the calculation, and only the final loss function of the overall model does. Training a single model is therefore not decoupled from the training of the other models, which mitigates model overfitting. In addition, multiple models do not need to be trained for each group of sample service data, which greatly reduces the data size of the model.
All the above optional solutions can be combined to form an optional embodiment of the present disclosure, which is not described in detail herein.
According to the method provided by the embodiment of the invention, the plurality of initial recognition models and the initial fusion model are jointly trained using the plurality of groups of sample service data of the sample account under the plurality of service reference dimensions and the account type labeled for the sample account. While the parameters of any one model are adjusted, the parameters of the other models are held fixed, so that a single model is not trained in isolation, layer by layer. Instead, an end-to-end training mode is provided: each model is trained not only on its own recognition result but also jointly with the recognition results of the other models. This can greatly improve the recognition accuracy of the trained account identification models and fusion model, and thereby the accuracy of account type identification.
In an exemplary scenario, an experiment was performed on the task of judging whether a short video account is a high-quality account. At an accuracy rate of 30% on the test set, the stacking method reached a recall rate of 46.6%, whereas the end-to-end training method provided by the embodiment of the application raised the recall rate to 50.9% at the same 30% accuracy rate.
Fig. 5 is a flowchart of an account type identification method according to an embodiment of the present application. Referring to Fig. 5, this embodiment is applied to a computer device; the following description takes a server as an example of the computer device. The embodiment includes:
501. The server acquires a plurality of groups of service data of the target account, where different groups of service data correspond to different service reference dimensions.
The target account may be any account registered on a platform provided by the server, and there may be one or more target accounts. The embodiment of the application describes the account type identification process of a single target account only as an example, which should not be construed as limiting the number of target accounts; when there are multiple target accounts, an account type identification process similar to that of the single target account may be executed for each of them.
Optionally, depending on the services provided by the platform on which the account is registered, the target account may be a short video account, a social media account, a video uploader ("UP master") account, an official (public) account, a game account, a comment account, and so on, which is not specifically limited in the embodiments of the present application.
In some embodiments, when obtaining the plurality of groups of service data, the server may first obtain a plurality of original service data of the target account, determine the service reference dimension to which each of the service data belongs, and divide the plurality of service data into the plurality of groups based on the service reference dimensions, for example, by placing all service data that belong to the same service reference dimension into the same group.
It should be noted that a business reference dimension is an evaluation dimension that needs to be considered when dividing account types, and different business reference dimensions can be set for different account types to be divided. For example, when determining whether the target account is a high-quality account, the business reference dimensions to be considered include, but are not limited to, account activity, account influence, account consumption, and account interactivity. As another example, when determining whether the target account is a high-activity account, the business reference dimensions may include only account activity and account interactivity, without considering account consumption or account influence. As a further example, if the target account is a short video account, the video completion rate may be introduced as a business reference dimension, and if the target account is a social media account, the popularity of the account homepage may be introduced as a business reference dimension. The embodiments of the present application do not specifically limit the kinds of business reference dimensions.
In some embodiments, taking a short video account as an example of the target account, the business reference dimensions include at least two of: account activity, account influence, account consumption, account interactivity, number of associated accounts, video completion rate, and the like.
Optionally, the plurality of service data of the target account includes: the account's daily play count, weekly play count, and monthly play count; the cumulative play duration, cumulative click count, cumulative comment count, and cumulative share count of the video works uploaded by the account; the account's cumulative total consumption amount and the consumption amount of its most recent order; the cumulative comment count across all of the account's posts and the comment count of its most recent post; the account's follower count and following count; the cumulative completion rate of the videos clicked by the account and the completion rate of the videos clicked by the account in the last 7 days; and the like.
When the service data is grouped on this basis, the service reference dimension to which each service datum belongs needs to be determined, namely: the daily, weekly, and monthly play counts of the account belong to account activity; the cumulative play duration, cumulative click count, cumulative comment count, and cumulative share count of the video works uploaded by the account belong to account influence; the cumulative total consumption amount and the consumption amount of the most recent order belong to account consumption; the cumulative comment count across all posts and the comment count of the most recent post belong to account interactivity; the follower count and the following count belong to the number of associated accounts; and the cumulative completion rate of clicked videos and the completion rate of videos clicked in the last 7 days belong to the video completion rate.
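The grouping described above amounts to a lookup from each service datum to its service reference dimension. A minimal sketch, assuming the service data arrive as a flat dictionary; the key names and the `FEATURE_TO_DIMENSION` table are hypothetical identifiers chosen for the example and cover only part of the list above.

```python
from collections import defaultdict

# Hypothetical field names mapped to their service reference dimension.
FEATURE_TO_DIMENSION = {
    "daily_play_count": "account_activity",
    "weekly_play_count": "account_activity",
    "monthly_play_count": "account_activity",
    "video_cumulative_clicks": "account_influence",
    "video_cumulative_shares": "account_influence",
    "total_consumption": "account_consumption",
    "last_order_amount": "account_consumption",
    "post_cumulative_comments": "account_interactivity",
    "follower_count": "associated_accounts",
    "following_count": "associated_accounts",
    "completion_rate_7d": "video_completion_rate",
}

def group_by_dimension(service_data, mapping=FEATURE_TO_DIMENSION):
    # Place every service datum into the group of the dimension it belongs to.
    groups = defaultdict(dict)
    for name, value in service_data.items():
        groups[mapping[name]][name] = value
    return dict(groups)

groups = group_by_dimension({
    "daily_play_count": 120,
    "weekly_play_count": 800,
    "follower_count": 3000,
    "total_consumption": 58.0,
})
```

An unknown field raises `KeyError` here, which surfaces any service datum that has not yet been assigned a dimension rather than silently dropping it.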
In some embodiments, taking a social media account as an example of the target account, the business reference dimensions include at least two of: account activity, account influence, account consumption, account interactivity, number of associated accounts, account homepage popularity, and the like.
Optionally, the plurality of service data of the target account includes: the account's daily, weekly, and monthly active durations; the cumulative click count and cumulative comment count of the posts published by the account; the account's cumulative total consumption amount, the consumption amount of its most recent order, and its membership level; the cumulative comment count across all of the account's posts, the comment count of its most recent post, and the read count of its most recent post; the account's follower count and following count; the daily, weekly, and monthly visit counts of the account homepage; and the like.
When the service data is grouped on this basis, the service reference dimension to which each service datum belongs needs to be determined, namely: the daily, weekly, and monthly active durations of the account belong to account activity; the cumulative click count, cumulative comment count, and cumulative share count of the posts published by the account belong to account influence; the cumulative total consumption amount, the consumption amount of the most recent order, and the membership level belong to account consumption; the cumulative comment count across all posts, the comment count of the most recent post, and the read count of the most recent post belong to account interactivity; the follower count and the following count belong to the number of associated accounts; and the daily, weekly, and monthly visit counts of the account homepage belong to homepage popularity.
In some embodiments, a user may log in to the target account on a terminal and initiate various service requests to the server based on the target account. When providing the services corresponding to these requests, the server records the service data of the target account and groups the service data by service reference dimension using statistical analysis tools, thereby obtaining the multiple groups of service data of the target account.
In other embodiments, the terminal locally records each service data of the target account, groups each service data according to the service reference dimension by using various statistical analysis tools, obtains multiple groups of service data of the target account, and sends the multiple groups of service data of the target account to the server to request the server to identify the account type of the target account based on the multiple groups of service data of the target account.
Optionally, according to settings made by a technician, the target account may be divided into different account types. For example, the target account may be classified as a high-quality account or a non-high-quality account; as a head account, a waist account, or a bottom account; or as a high-credit, medium-credit, or low-credit account. Accurate division of account types can guide downstream tasks such as personalized resource recommendation for accounts of different types.
The embodiment of the present application describes only the case of identifying whether the target account is a high-quality account, which should not be construed as limiting the account type of the target account. A high-quality account is a core or premium account of the platform, and must be evaluated comprehensively across multiple business reference dimensions such as account activity, account consumption, and account interactivity.
502. The server obtains a plurality of first identification results of the target account based on the plurality of sets of service data, wherein the first identification results are the predicted account types determined based on the single set of service data.
In some embodiments, the server inputs the plurality of sets of service data into a plurality of account identification models respectively, and processes the sets of service data through the respective account identification models to obtain the plurality of first identification results, where each account identification model is used for obtaining a predicted account type based on the single set of service data corresponding to one service reference dimension.
Optionally, after acquiring the multiple sets of service data of the target account, the server performs the following for any set of service data: it determines the service reference dimension corresponding to that set; based on the mapping relation between service reference dimensions and account identification models, it determines, from a plurality of locally pre-stored account identification models, the account identification model corresponding to that service reference dimension, where this account identification model is used for acquiring a predicted account type based on a single set of service data of that dimension; and it inputs the set of service data into the corresponding account identification model and processes it through the model to obtain the first identification result corresponding to that set. The server performs these steps on each set of service data of the target account to obtain the plurality of first identification results.
In the above process, one account identification model is trained for each service reference dimension, and the service data of each dimension is input into its own account identification model. Service data of different dimensions is thus processed by different, dedicated models, so that each prediction result is free from the influence of the service data of other dimensions and the account type of the target account can be predicted accurately within a single service reference dimension.
In some embodiments, the server may pre-store a mapping relationship between service reference dimensions and account identification models, where generally one service reference dimension corresponds to a unique account identification model. According to the service reference dimension to which each input set of service data belongs, the server then determines the account identification model that has a mapping relationship with that dimension. For example, suppose the server pre-stores three account identification models A, B, and C, which correspond respectively to the service reference dimensions account activity, account consumption, and account interactivity, and the input service data of the target account includes an online duration of 10 hours in the last 7 days, a cumulative consumption of 10,000 yuan, and 100 comments posted in the last 7 days. Because the service reference dimension corresponding to the cumulative consumption of 10,000 yuan is account consumption, that service datum is input into account identification model B, which corresponds to account consumption.
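The routing in this example can be sketched as a dictionary from service reference dimension to model. The three functions below are hypothetical stand-ins for the pre-stored models A, B, and C, with made-up scoring rules; only the dispatch logic is the point of the example.

```python
# Hypothetical stand-ins for the pre-stored account identification models A, B, C.
def model_a(group):
    # Account activity: toy score from online hours in the last 7 days.
    return min(1.0, group["online_hours_7d"] / 40.0)

def model_b(group):
    # Account consumption: toy score from cumulative consumption in yuan.
    return min(1.0, group["total_consumption"] / 20000.0)

def model_c(group):
    # Account interactivity: toy score from comments posted in the last 7 days.
    return min(1.0, group["comments_7d"] / 200.0)

DIMENSION_TO_MODEL = {
    "account_activity": model_a,
    "account_consumption": model_b,
    "account_interactivity": model_c,
}

def first_results(groups, models=DIMENSION_TO_MODEL):
    # Send each group of service data to the model mapped to its dimension.
    return {dim: models[dim](data) for dim, data in groups.items()}

groups = {
    "account_activity": {"online_hours_7d": 10},
    "account_consumption": {"total_consumption": 10000},
    "account_interactivity": {"comments_7d": 100},
}
results = first_results(groups)
```

Each model sees only the group of its own dimension, so the cumulative consumption of 10,000 yuan reaches model B and nothing else does.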
Optionally, the account identification model may be any classification model, such as a gradient boosting (GB) model, a gradient boosting decision tree (GBDT) model, an eXtreme Gradient Boosting (XGBoost) model, or a Light Gradient Boosting Machine (LightGBM) model. The embodiments of the present application do not specifically limit the model structure of the account identification model.
In an exemplary embodiment, it is assumed that the account identification model is used to identify whether the input account is a high-quality account, where a high-quality account is a core or premium account of the platform and must be evaluated comprehensively across multiple dimensions such as account activity, account consumption, and account interactivity. Taking an XGBoost model as an example of the account identification model, the XGBoost model is a strong learner formed by integrating a plurality of weak learners, where each weak learner may be a CART (Classification And Regression Tree) or a linear classifier (gblinear), which is not specifically limited in the embodiment of the application. As an ensemble learning method, XGBoost can reduce variance and bias and improve predictive performance; ensemble learning mainly includes the Boosting, Bagging, and Stacking families of algorithms, and XGBoost belongs to the Boosting family.
For the XGBoost model, the server inputs the corresponding group of service data into the XGBoost model, that is, into each of the plurality of weak learners. Each weak learner performs feature splitting on the group of service data to locate the leaf node of its decision tree into which the data falls, and outputs the corresponding leaf node score. Finally, the server weights the leaf node scores output by the plurality of weak learners to obtain a prediction probability (i.e., a first recognition result), which represents the probability that the target account is a high-quality account as judged on this group of service data. Optionally, the prediction probability is a value greater than or equal to 0 and less than or equal to 1; the larger the prediction probability, the more likely the XGBoost model considers the target account to be a high-quality account, and the smaller the prediction probability, the less likely. Optionally, each decision tree may be a binary tree, that is, each feature split of a weak learner divides a node into a left subtree and a right subtree.
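The way leaf scores are combined into a probability can be sketched with depth-one binary trees. This is a toy illustration, not XGBoost itself: the trees, thresholds, and scores are invented, and the weighting step is assumed to be a plain sum of leaf scores passed through the logistic function, which is how a boosted binary classifier with a logistic objective conventionally turns its raw score into a value between 0 and 1.

```python
import math

# Hypothetical stumps: (feature name, split threshold, left-leaf score, right-leaf score).
TREES = [
    ("daily_play_count", 100.0, -0.4, 0.6),
    ("weekly_play_count", 500.0, -0.2, 0.3),
]

def leaf_score(tree, features):
    # Binary split: go left if the feature is below the threshold, else right.
    name, threshold, left, right = tree
    return left if features[name] < threshold else right

def predict_probability(trees, features):
    # Sum the leaf scores of all weak learners, then map the raw score into [0, 1].
    raw = sum(leaf_score(tree, features) for tree in trees)
    return 1.0 / (1.0 + math.exp(-raw))

p = predict_probability(TREES, {"daily_play_count": 150, "weekly_play_count": 300})
```

Here the first stump contributes its right-leaf score and the second its left-leaf score, so the raw score is 0.6 + (-0.2) before the logistic mapping.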
503. The server obtains a second identification result of the target account based on a plurality of first identification results of the target account, wherein the second identification result is a predicted account type determined based on the plurality of sets of business data.
In some embodiments, the server inputs the plurality of first recognition results into a fusion model, where the fusion model is used for acquiring the predicted account type based on multiple groups of service data. The fusion model weights the plurality of first recognition results to obtain a plurality of weighted recognition results, and the sum of the weighted recognition results is linearly mapped to obtain the second recognition result.
In the above process, linear weighted fusion of the first recognition results allows the second recognition result to take the value of every first recognition result into account, and adjusting the weight of each first recognition result dynamically adjusts the importance of each group of service data in the second recognition result, so that the second recognition result has higher accuracy.
Optionally, the fusion model may be a logistic regression (LR) model, in which each account identification model is assigned its own weight parameter. The first identification result output by each account identification model is multiplied by the corresponding weight parameter to obtain a plurality of weighted identification results, the weighted identification results are added to obtain their sum, and the sum is linearly mapped to obtain the second identification result.
In one possible implementation, let c_i denote the weight parameter of the i-th account identification model and f_i denote the first recognition result output by the i-th account identification model, where i is an integer greater than or equal to 1 and less than or equal to n, n is the number of account identification models and is an integer greater than or equal to 1, and b is the intercept (i.e., the bias parameter) of the logistic regression model. The second recognition result p output by the logistic regression model can then be represented as:
Assuming that C collects all constant terms, the above formula can be abbreviated as:
In some embodiments, besides the LR model, the fusion model may be a least squares model. Both the LR model and the least squares model can perform linear weighted fusion on the plurality of first recognition results, so that the final second recognition result integrates the first recognition result of every account identification model, thereby achieving a global prediction of the account type.
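A minimal sketch of the weighted fusion, assuming the conventional logistic regression form p = 1 / (1 + exp(-(c_1 * f_1 + ... + c_n * f_n + b))). The logistic mapping is an assumption based on the model being a logistic regression model, and the function name and the example weights are hypothetical.

```python
import math

def fuse(first_results, weights, intercept):
    # Weight each first recognition result f_i by c_i, add the intercept b,
    # and map the sum into (0, 1) with the logistic function.
    if len(first_results) != len(weights):
        raise ValueError("one weight is required per first recognition result")
    z = sum(c * f for c, f in zip(weights, first_results)) + intercept
    return 1.0 / (1.0 + math.exp(-z))

# Three first recognition results fused with hypothetical weights.
second_result = fuse([0.25, 0.5, 0.5], weights=[1.0, 2.0, 0.5], intercept=-1.0)
```

Raising the weight of one first recognition result increases the influence of that group of service data on the second recognition result, which is the adjustable importance described above.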
Fig. 6 is a schematic flowchart of an account type identification method provided in an embodiment of the present application. As shown at 600, assuming that the server collects n groups of service data (n is greater than or equal to 1) of the target account, the n groups of service data are respectively input into the n corresponding account identification models, each of which is, for example, a gradient boosting tree model, to obtain n first identification results, and the n first identification results are input into the fusion model to obtain a second identification result.
504. The server determines a predicted account type of the target account based on the plurality of first recognition results of the target account and the second recognition result of the target account.
In some embodiments, the server determines that the predicted account type of the target account is a high quality account in response to the second recognition result being greater than the first target threshold. Wherein the first target threshold is any value greater than or equal to 0 and less than or equal to 1.
In some embodiments, the server determines that the predicted account type of the target account is a high-quality account in response to a target recognition result among the plurality of first recognition results being greater than a second target threshold, where the target recognition result is the first recognition result corresponding to the decisive service data among the plurality of sets of service data. The decisive service data may be specified by a technician; for example, account consumption may be designated as decisive service data. The second target threshold is any value greater than or equal to 0 and less than or equal to 1.
In some embodiments, the server determines that the predicted account type of the target account is a high-quality account in response to the number of first recognition results that are greater than a third target threshold being greater than a number threshold. The third target threshold is any value greater than or equal to 0 and less than or equal to 1, and the number threshold is any value greater than or equal to 1.
In some embodiments, the server may determine that the predicted account type of the target account is a high-quality account when at least one of the three conditions above is satisfied. The embodiment of the present application does not specifically limit the manner of determining the account type of the target account.
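The three conditions and their "at least one" combination can be sketched as below. The threshold values, the parameter names, and the choice of account consumption as the decisive dimension are illustrative assumptions, not values given by the embodiment.

```python
def is_high_quality(first_results, second_result, decisive_dimension,
                    first_threshold=0.5, second_threshold=0.8,
                    third_threshold=0.6, number_threshold=2):
    # Condition 1: the fused (second) recognition result exceeds the first target threshold.
    fused_ok = second_result > first_threshold
    # Condition 2: the first recognition result of the decisive dimension
    # exceeds the second target threshold.
    decisive_ok = first_results[decisive_dimension] > second_threshold
    # Condition 3: more than number_threshold first recognition results
    # exceed the third target threshold.
    count = sum(1 for r in first_results.values() if r > third_threshold)
    count_ok = count > number_threshold
    return fused_ok or decisive_ok or count_ok

first = {"account_activity": 0.3, "account_consumption": 0.9, "account_interactivity": 0.4}
decision = is_high_quality(first, second_result=0.45,
                           decisive_dimension="account_consumption")
```

In this example the fused result alone would not qualify the account, but the decisive dimension does, which shows why the first recognition results are consulted alongside the second.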
Fig. 7 is a schematic diagram of an account type identification method provided in an embodiment of the present application. As shown at 700, assuming that the target account is a short video account, the server analyzes the service requests of the target account using statistical analysis tools to obtain multiple service data of the target account, and groups the service data by service reference dimension, for example, into groups such as account activity, account consumption, and account interactivity. Each group of service data is input into the corresponding account identification model (a gradient boosting tree model is taken as an example) to obtain a first identification result, and the first identification results are input into the fusion model to obtain a second identification result. Each first identification result is the predicted probability that the target account is a high-quality account under the corresponding group of service data, and the second identification result is the comprehensively predicted probability that the target account as a whole is a high-quality account. Based on the first identification results and the second identification result, account analysis can be performed to determine the predicted account type to which the target account finally belongs, and downstream applications such as recommendation slot allocation, overall ranking of account consumption, and mining or removal of high-quality accounts can be carried out based on the predicted account type.
For example, short videos published by high-quality accounts may be recommended to other accounts with priority, which increases the exposure of the high-quality accounts and improves metrics such as user duration and retention. As another example, account characteristics obtained by analyzing high-quality accounts may serve as a reference factor for the account analysis and management tools of the platform.
All the above optional solutions can be combined to form an optional embodiment of the present disclosure, which is not described in detail herein.
According to the method provided by the embodiment of the invention, the original service data is grouped by service reference dimension, a first identification result of the target account is predicted separately on the basis of each group of service data, and the first identification results are combined to comprehensively predict a second identification result of the target account. When the final predicted account type is determined, the second identification result and the first identification results are considered together, rather than making a single-dimension judgment based on the second identification result alone, so that the accuracy of account type identification can be greatly improved.
Further, a separate account identification model is built for the service data of each service reference dimension and used to obtain the first identification result of the corresponding group of service data, and a fusion model is built on the first identification results and used to obtain the second identification result. The first and second identification results can thus be obtained automatically by machine learning models, which improves data processing efficiency.
Fig. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. Referring to Fig. 8, the apparatus includes:
a first obtaining module 801, configured to obtain multiple groups of sample service data and account types of a sample account, where different groups of sample service data correspond to different service reference dimensions;
the adjusting module 802 is configured to adjust parameters of a plurality of initial recognition models and initial fusion models based on the plurality of sets of sample service data and the account type, where parameters of other models are kept unchanged when adjusting parameters of any initial recognition model or the initial fusion model;
the second obtaining module 803 is configured to obtain, in response to the iteration meeting the convergence condition, a plurality of account identification models and a fusion model after parameter adjustment, where the account identification model is used to obtain a predicted account type based on a single set of service data corresponding to the service reference dimension, and the fusion model is used to obtain the predicted account type based on multiple sets of service data.
According to the device provided by the embodiment of the application, the plurality of initial recognition models and the initial fusion model are jointly trained using the plurality of groups of sample service data of the sample account under the plurality of service reference dimensions and the account type labeled for the sample account. While the parameters of any one model are adjusted, the parameters of the other models are held fixed, so that a single model is not trained in isolation, layer by layer. Instead, an end-to-end training mode is provided: each model is trained not only on its own recognition result but also jointly with the recognition results of the other models. This can greatly improve the recognition accuracy of the trained account identification models and fusion model, and thereby the accuracy of account type identification.
In one possible implementation, the adjustment module 802 is configured to:
respectively inputting the multiple groups of sample service data into the multiple initial recognition models to obtain multiple first sample recognition results of the sample account;
inputting the plurality of first sample recognition results into the initial fusion model to obtain a second sample recognition result of the sample account;
determining a loss function value based on the plurality of first sample recognition results, the second sample recognition result, and the account type;
in response to the stop condition not being met, adjusting parameters of any one of the plurality of initial recognition models until the stop condition is met, and then adjusting parameters of a next initial recognition model;
and adjusting the parameters of the initial fusion model in response to the parameters of the plurality of initial recognition models being adjusted.
In one possible implementation, the adjustment module 802 is further configured to:
performing parameter adjustment on a basic recognition model based on any group of sample service data of the sample account to obtain an initial recognition model, where that initial recognition model corresponds to the service reference dimension to which that group of sample service data belongs.
In one possible implementation, the adjustment module 802 is further configured to:
respectively inputting the multiple groups of sample service data into the multiple initial recognition models to obtain multiple first sample recognition results;
and based on the plurality of first sample recognition results, carrying out parameter adjustment on the basic fusion model to obtain the initial fusion model.
In one possible implementation manner, the convergence condition is that a difference between the loss function values of the current iteration process and the previous iteration process is smaller than a convergence threshold, and the convergence threshold is a value greater than or equal to 0.
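A minimal sketch of this convergence check follows. The function name `has_converged` is hypothetical, and taking the absolute value of the difference is an assumption (the text only says "difference"). Note that, under this reading with a strict inequality, a threshold of exactly 0 is never satisfied.

```python
def has_converged(previous_loss, current_loss, threshold):
    """Return True when the loss changed by less than `threshold` between iterations."""
    if threshold < 0:
        raise ValueError("convergence threshold must be >= 0")
    # Assumption: the "difference" is taken as an absolute value.
    return abs(previous_loss - current_loss) < threshold

# Example: stop iterating once the loss moves by less than 1e-3.
done = has_converged(0.5200, 0.5195, 1e-3)
```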
All the above optional solutions can be combined to form an optional embodiment of the present disclosure, which is not described in detail herein.
It should be noted that when the data processing device provided in the above embodiment processes service data, the division into the functional modules described above is used only as an example. In practical applications, the functions can be assigned to different functional modules as needed; that is, the internal structure of the computer device can be divided into different functional modules to complete all or part of the functions described above. In addition, the data processing device and the data processing method provided in the foregoing embodiments belong to the same concept; for the specific implementation process of the device, see the data processing method embodiments, which is not repeated here.
Fig. 9 is a schematic structural diagram of an account type identification device provided in an embodiment of the present application. Referring to Fig. 9, the device includes:
the first obtaining module 901 is configured to obtain multiple sets of service data of the target account, where the different sets of service data correspond to different service reference dimensions;
a second obtaining module 902, configured to obtain, based on the multiple sets of service data, multiple first identification results of the target account, where the first identification results are predicted account types determined based on a single set of the service data;
a third obtaining module 903, configured to obtain a second identification result of the target account based on a plurality of first identification results of the target account, where the second identification result is a predicted account type determined based on the plurality of sets of service data;
a determining module 904, configured to determine a predicted account type of the target account based on the plurality of first identification results of the target account and the second identification result of the target account.
In the device provided by this embodiment of the application, the original service data is grouped by service reference dimension, a first identification result of the target account is predicted independently from each group of service data, and a second identification result of the target account is then predicted by combining the first identification results. When the final predicted account type is determined, the second identification result and the first identification results are considered together, rather than making a single-dimension judgment based on the second identification result alone, so the identification accuracy of the account type can be greatly improved.
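How the first and second identification results are combined into the final predicted account type is not specified in this passage. The sketch below shows one possible rule purely as an assumption: each result is treated as a probability-like score, the mean of the first results is blended with the second result, and the blend is thresholded. The function name `decide`, the parameters `second_weight` and `threshold`, and the labels `"target"` and `"other"` are all hypothetical.

```python
def decide(first_results, second_result, second_weight=0.5, threshold=0.5):
    """Blend per-dimension scores with the fused score and threshold the blend."""
    if not first_results:
        raise ValueError("at least one first identification result is required")
    first_mean = sum(first_results) / len(first_results)
    # Assumed rule: weighted average of the fused score and the mean single-dimension score.
    blended = second_weight * second_result + (1.0 - second_weight) * first_mean
    label = "target" if blended >= threshold else "other"
    return label, blended

label, blended = decide([0.9, 0.7], 0.8)
```

The point of the design is only that the final decision depends on both kinds of result; any other combination rule that uses both would fit the description equally well.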
In one possible implementation, the second obtaining module 902 is configured to:
for any one set of business data in the plurality of sets of business data, determining a business reference dimension corresponding to the any one set of business data;
determining an account identification model corresponding to the service reference dimension based on a mapping relation between the service reference dimension and the account identification model, wherein the account identification model is used for acquiring a predicted account type based on single-group service data of the service reference dimension;
and inputting any group of business data into the account identification model, and processing the any group of business data through the account identification model to obtain a first identification result corresponding to the any group of business data.
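The lookup-then-predict steps above can be sketched as a dictionary dispatch. This is an illustration only: the mapping is assumed to be an in-memory dictionary, the models are stand-in callables, and the dimension names and the function `first_identification_results` are hypothetical.

```python
from typing import Callable, Dict, List

Model = Callable[[List[float]], float]

def first_identification_results(groups: Dict[str, List[float]],
                                 model_for_dimension: Dict[str, Model]) -> Dict[str, float]:
    """Run each group of data through the model mapped to its reference dimension."""
    results = {}
    for dimension, data in groups.items():
        model = model_for_dimension[dimension]  # mapping: dimension -> model
        results[dimension] = model(data)
    return results

# Stand-in models; real account identification models would be trained beforehand.
models = {
    "liveness": lambda xs: min(1.0, sum(xs) / 10.0),
    "influence": lambda xs: 1.0 if max(xs) > 1000 else 0.0,
}
groups = {"liveness": [3.0, 2.0], "influence": [1500.0]}
results = first_identification_results(groups, models)
```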
In one possible implementation, the third obtaining module 903 is configured to:
inputting the plurality of first recognition results into a fusion model, weighting the plurality of first recognition results through the fusion model to obtain a plurality of weighted recognition results, wherein the fusion model is used for acquiring a predicted account type based on a plurality of groups of business data;
and linearly mapping the sum value among the weighted recognition results to obtain the second recognition result.
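A minimal sketch of the weighting and linear mapping follows. The weights, `scale`, and `offset` stand for fusion model parameters that would be learned in training; their names and the function name `fuse` are assumptions made for illustration.

```python
def fuse(first_results, weights, scale=1.0, offset=0.0):
    """Weight each first recognition result, sum them, then apply a linear map."""
    if len(first_results) != len(weights):
        raise ValueError("one weight is required per first recognition result")
    weighted = [w * r for w, r in zip(weights, first_results)]
    return scale * sum(weighted) + offset

second_result = fuse([0.8, 0.4], [0.5, 0.5])
```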
In one possible implementation manner, the first obtaining module 901 is configured to:
determining service reference dimensions to which a plurality of service data of the target account belong respectively;
and dividing each service data belonging to the same service reference dimension into the same group of service data.
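The grouping step can be sketched as below, assuming that each piece of service data is a named value and that a lookup table records which service reference dimension each name belongs to. The field names, the table `DIMENSION_OF`, and the function `group_by_dimension` are hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup: service data field -> service reference dimension.
DIMENSION_OF = {
    "login_days": "liveness",
    "posts_per_week": "liveness",
    "followers": "influence",
    "monthly_spend": "consumption",
}

def group_by_dimension(service_data, dimension_of):
    """Place every service data item into the group of its reference dimension."""
    groups = defaultdict(dict)
    for name, value in service_data.items():
        groups[dimension_of[name]][name] = value
    return dict(groups)

groups = group_by_dimension(
    {"login_days": 20, "followers": 1500, "posts_per_week": 3}, DIMENSION_OF)
```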
In one possible implementation, the service reference dimension includes at least two of: account liveness, account influence, account consumption condition, account mutual power, associated account number and video play completion rate.
All the above optional solutions can be combined to form an optional embodiment of the present disclosure, which is not described in detail herein.
It should be noted that when the account type identification device provided in the above embodiment identifies an account type, the division into the functional modules described above is used only as an example. In practical applications, the functions can be assigned to different functional modules as needed; that is, the internal structure of the computer device can be divided into different functional modules to complete all or part of the functions described above. In addition, the account type identification device and the account type identification method provided in the foregoing embodiments belong to the same concept; for the specific implementation process of the device, see the account type identification method embodiments, which is not repeated here.
Fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application. Optionally, the computer device is described by taking a terminal 1000 as an example. Device types of the terminal 1000 include: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal 1000 may also be referred to as user equipment, a portable terminal, a laptop terminal, a desktop terminal, or by other names.
In general, terminal 1000 can include: a processor 1001 and a memory 1002.
Optionally, the processor 1001 includes one or more processing cores, such as a 4-core processor or an 8-core processor. Optionally, the processor 1001 is implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). In some embodiments, the processor 1001 includes a main processor and a coprocessor. The main processor, also referred to as a CPU (Central Processing Unit), is a processor for processing data in an awake state; the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1001 is integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen is required to display. In some embodiments, the processor 1001 further includes an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
In some embodiments, memory 1002 includes one or more computer-readable storage media, optionally non-transitory. The memory 1002 also optionally includes high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1002 is used to store at least one program code for execution by processor 1001 to implement the method of processing data or the method of identifying account types provided by the various embodiments herein.
In some embodiments, terminal 1000 can optionally further include: a peripheral interface 1003, and at least one peripheral. The processor 1001, the memory 1002, and the peripheral device interface 1003 can be connected by a bus or signal line. The individual peripheral devices can be connected to the peripheral device interface 1003 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1004, a display 1005, a camera assembly 1006, audio circuitry 1007, and a power supply 1009.
Peripheral interface 1003 may be used to connect at least one I/O (Input/Output) related peripheral to processor 1001 and memory 1002. In some embodiments, processor 1001, memory 1002, and peripheral interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, memory 1002, and peripheral interface 1003 are implemented on a separate chip or circuit board, which is not limited in this embodiment.
Radio frequency circuit 1004 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. Radio frequency circuit 1004 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1004 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1004 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. Optionally, the radio frequency circuit 1004 communicates with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1004 also includes NFC (Near Field Communication) related circuitry, which is not limited in this application.
The display screen 1005 is used to display a UI (User Interface). Optionally, the UI includes graphics, text, icons, video, and any combination thereof. When the display screen 1005 is a touch screen, the display screen 1005 also has the ability to capture touch signals at or above its surface. The touch signal can be input to the processor 1001 as a control signal for processing. Optionally, the display screen 1005 is also used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there is one display screen 1005, disposed on the front panel of terminal 1000; in other embodiments, there are at least two display screens 1005, disposed on different surfaces of terminal 1000 or in a folded configuration; in still other embodiments, the display screen 1005 is a flexible display disposed on a curved surface or a folded surface of terminal 1000. Alternatively, the display screen 1005 is arranged in a non-rectangular irregular pattern, that is, a specially shaped screen. Optionally, the display screen 1005 is made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 1006 is used to capture images or video. Optionally, camera assembly 1006 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and Virtual Reality (VR) shooting functions or other fused shooting functions. In some embodiments, camera assembly 1006 also includes a flash. Optionally, the flash is a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and is used for light compensation under different color temperatures.
In some embodiments, audio circuit 1007 includes a microphone and a speaker. The microphone is used for collecting sound waves of users and the environment, converting the sound waves into electrical signals, and inputting the electrical signals to the processor 1001 for processing, or inputting the electrical signals to the radio frequency circuit 1004 for voice communication. For purposes of stereo acquisition or noise reduction, a plurality of microphones may be provided at different portions of terminal 1000. Optionally, the microphone is an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 1001 or the radio frequency circuit 1004 into sound waves. Optionally, the speaker is a conventional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert an electrical signal not only into sound waves audible to humans but also into sound waves inaudible to humans, for ranging and other purposes. In some embodiments, audio circuit 1007 also includes a headphone jack.
Power supply 1009 is used to power the various components in terminal 1000. Optionally, the power source 1009 is an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power source 1009 includes a rechargeable battery, the rechargeable battery supports wired or wireless charging. The rechargeable battery is also used to support fast charge technology.
In some embodiments, terminal 1000 can further include one or more sensors 1010. The one or more sensors 1010 include, but are not limited to: acceleration sensor 1011, gyro sensor 1012, pressure sensor 1013, optical sensor 1015, and proximity sensor 1016.
In some embodiments, acceleration sensor 1011 detects the magnitude of acceleration on three coordinate axes of the coordinate system established with terminal 1000. For example, the acceleration sensor 1011 is used to detect components of gravitational acceleration on three coordinate axes. Optionally, the processor 1001 controls the display screen 1005 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 1011. The acceleration sensor 1011 is also used for acquisition of motion data of a game or a user.
In some embodiments, gyro sensor 1012 detects the body direction and rotation angle of terminal 1000, and gyro sensor 1012 and acceleration sensor 1011 cooperate to collect 3D motion of the user on terminal 1000. The processor 1001 realizes the following functions according to the data acquired by the gyro sensor 1012: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
Optionally, pressure sensor 1013 is disposed on a side frame of terminal 1000 and/or under display screen 1005. When the pressure sensor 1013 is provided at a side frame of the terminal 1000, a grip signal of the user on the terminal 1000 can be detected, and the processor 1001 performs left/right hand recognition or shortcut operations based on the grip signal collected by the pressure sensor 1013. When the pressure sensor 1013 is provided at the lower layer of the display screen 1005, the processor 1001 controls the operable controls on the UI according to the user's pressure operation on the display screen 1005. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor 1015 is used to collect ambient light intensity. In one embodiment, the processor 1001 controls the display brightness of the display screen 1005 based on the ambient light intensity collected by the optical sensor 1015. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1005 is turned up; when the ambient light intensity is low, the display brightness of the display screen 1005 is turned down. In another embodiment, the processor 1001 also dynamically adjusts the shooting parameters of the camera assembly 1006 according to the ambient light intensity collected by the optical sensor 1015.
Proximity sensor 1016, also referred to as a distance sensor, is typically located on the front panel of terminal 1000. Proximity sensor 1016 is used to collect the distance between the user and the front of terminal 1000. In one embodiment, when proximity sensor 1016 detects a gradual decrease in the distance between the user and the front of terminal 1000, processor 1001 controls display screen 1005 to switch from the screen-on state to the screen-off state; when proximity sensor 1016 detects a gradual increase in the distance between the user and the front of terminal 1000, processor 1001 controls display screen 1005 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the structure shown in Fig. 10 does not constitute a limitation on terminal 1000, which can include more or fewer components than shown, combine certain components, or adopt a different arrangement of components.
Fig. 11 is a schematic structural diagram of a computer device provided in an embodiment of the present application. The computer device 1100 may vary considerably depending on its configuration or performance. The computer device 1100 includes one or more processors (Central Processing Units, CPU) 1101 and one or more memories 1102, where the memories 1102 store at least one computer program, and the at least one computer program is loaded and executed by the one or more processors 1101 to implement the data processing method or the account type identification method provided in the foregoing embodiments. Optionally, the computer device 1100 further includes a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described herein.
In an exemplary embodiment, a computer readable storage medium is also provided, for example a memory comprising at least one computer program executable by a processor in a terminal to perform the method of processing data or the method of identifying account types in the respective embodiments described above. For example, the computer readable storage medium includes ROM (Read-Only Memory), RAM (Random-Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic tape, floppy disk, optical data storage device, and the like.
In an exemplary embodiment, a computer program product or computer program is also provided, comprising one or more program codes, the one or more program codes being stored in a computer readable storage medium. One or more processors of the computer device can read the one or more program codes from the computer-readable storage medium and execute them, so that the computer device performs the data processing method or the account type identification method in the above embodiments.
Those of ordinary skill in the art will appreciate that all or a portion of the steps implementing the above-described embodiments can be implemented by hardware, or can be implemented by a program instructing the relevant hardware, optionally stored in a computer readable storage medium, optionally a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments is merely exemplary and is not intended to limit the invention; all modifications, equivalents, and improvements that fall within the spirit and scope of the invention are intended to be covered.

Claims (22)

1. A method of processing data, the method comprising:
acquiring a plurality of groups of sample service data and account types of sample accounts, wherein the sample service data of different groups correspond to different service reference dimensions;
based on the multiple groups of sample service data and the account types, parameters of multiple initial recognition models and initial fusion models are adjusted, wherein when parameters of any initial recognition model or initial fusion model are adjusted, parameters of other models are kept unchanged;
and responding to the iteration meeting the convergence condition, acquiring a plurality of account identification models and a fusion model after parameter adjustment, wherein the account identification models are used for acquiring the predicted account types based on single-group service data corresponding to service reference dimensions, and the fusion model is used for acquiring the predicted account types based on multiple groups of service data.
2. The method of claim 1, wherein adjusting parameters of a plurality of initial recognition models and initial fusion models based on the plurality of sets of sample business data and the account type comprises:
respectively inputting the multiple groups of sample service data into the multiple initial recognition models to obtain multiple first sample recognition results of the sample account;
inputting the plurality of first sample recognition results into the initial fusion model to obtain a second sample recognition result of the sample account;
determining a loss function value based on the plurality of first sample recognition results, the second sample recognition result, and the account type;
in response to the stop condition not being met, adjusting the parameters of any one of the plurality of initial recognition models until the stop condition is met, and then adjusting the parameters of a next initial recognition model;
and adjusting the parameters of the initial fusion model in response to the completion of the adjustment of the parameters of the plurality of initial recognition models.
3. The method of claim 1, wherein before adjusting parameters of a plurality of initial identification models and initial fusion models based on the plurality of sets of sample business data and the account type, the method further comprises:
and carrying out parameter adjustment on the basic identification model based on any group of sample service data of the sample account to obtain any initial identification model, wherein any initial identification model corresponds to a service reference dimension to which any group of sample service data belongs.
4. The method of claim 1, wherein before adjusting parameters of a plurality of initial identification models and initial fusion models based on the plurality of sets of sample business data and the account type, the method further comprises:
respectively inputting the multiple groups of sample service data into the multiple initial recognition models to obtain multiple first sample recognition results;
and carrying out parameter adjustment on the basic fusion model based on the plurality of first sample recognition results to obtain the initial fusion model.
5. The method according to any one of claims 1 to 4, wherein the convergence condition is that a difference between a loss function value of the current iteration process and a loss function value of a previous iteration process is smaller than a convergence threshold, and the convergence threshold is a value greater than or equal to 0.
6. An account type identification method, which is characterized by comprising the following steps:
acquiring multiple groups of service data of a target account, wherein the service data of different groups correspond to different service reference dimensions;
acquiring a plurality of first identification results of the target account based on the plurality of sets of service data, wherein the first identification results are predicted account types determined by an account identification model based on a single set of service data;
acquiring a second identification result of the target account based on a plurality of first identification results of the target account, wherein the second identification result is a predicted account type determined by a fusion model based on the plurality of groups of business data; the combined training process of the account identification model and the fusion model comprises the following steps: acquiring a plurality of groups of sample service data and account types of sample accounts, wherein the sample service data of different groups correspond to different service reference dimensions; based on the multiple groups of sample service data and the account types, parameters of multiple initial recognition models and initial fusion models are adjusted, wherein when parameters of any initial recognition model or initial fusion model are adjusted, parameters of other models are kept unchanged; responding to the iteration meeting the convergence condition, and acquiring a plurality of account identification models and fusion models after parameter adjustment;
and determining the predicted account type of the target account based on the first identification results of the target account and the second identification results of the target account.
7. The method of claim 6, wherein the obtaining a plurality of first identification results of the target account based on the plurality of sets of business data comprises:
for any one set of business data in the plurality of sets of business data, determining a business reference dimension corresponding to the any one set of business data;
determining an account identification model corresponding to a service reference dimension based on a mapping relation between the service reference dimension and the account identification model, wherein the account identification model is used for acquiring a predicted account type based on single-group service data of the service reference dimension;
and inputting any group of business data into the account identification model, and processing the any group of business data through the account identification model to obtain a first identification result corresponding to the any group of business data.
8. The method of claim 6, wherein the obtaining a second recognition result of the target account based on the plurality of first recognition results of the target account comprises:
inputting the plurality of first recognition results into a fusion model, weighting the plurality of first recognition results through the fusion model to obtain a plurality of weighted recognition results, wherein the fusion model is used for acquiring a predicted account type based on a plurality of groups of business data;
and carrying out linear mapping on the sum value among the weighted recognition results to obtain the second recognition result.
9. The method of claim 6, wherein the obtaining the plurality of sets of business data for the target account number comprises:
determining service reference dimensions to which a plurality of service data of the target account belong respectively;
and dividing each service data belonging to the same service reference dimension into the same group of service data.
10. The method according to any of claims 6 to 9, wherein the service reference dimension comprises at least two of: account liveness, account influence, account consumption condition, account mutual power, associated account number and video play completion rate.
11. A data processing apparatus, the apparatus comprising:
the first acquisition module is used for acquiring a plurality of groups of sample service data and account types of the sample account, wherein the sample service data of different groups correspond to different service reference dimensions;
the adjustment module is used for adjusting parameters of a plurality of initial recognition models and initial fusion models based on the plurality of groups of sample service data and the account types, wherein the parameters of other models are kept unchanged when the parameters of any initial recognition model or the initial fusion model are adjusted;
the second acquisition module is used for acquiring, in response to the iteration meeting the convergence condition, a plurality of account identification models and a fusion model after parameter adjustment, wherein the account identification models are used for acquiring the predicted account types based on single-set business data corresponding to business reference dimensions, and the fusion model is used for acquiring the predicted account types based on multiple sets of business data.
12. The apparatus of claim 11, wherein the adjustment module is configured to:
respectively inputting the multiple groups of sample service data into the multiple initial recognition models to obtain multiple first sample recognition results of the sample account;
inputting the plurality of first sample recognition results into the initial fusion model to obtain a second sample recognition result of the sample account;
determining a loss function value based on the plurality of first sample recognition results, the second sample recognition result, and the account type;
in response to the stop condition not being met, adjusting the parameters of any one of the plurality of initial recognition models until the stop condition is met, and then adjusting the parameters of a next initial recognition model;
and adjusting the parameters of the initial fusion model in response to the completion of the adjustment of the parameters of the plurality of initial recognition models.
13. The apparatus of claim 11, wherein the adjustment module is further configured to:
and carrying out parameter adjustment on the basic identification model based on any group of sample service data of the sample account to obtain any initial identification model, wherein any initial identification model corresponds to a service reference dimension to which any group of sample service data belongs.
14. The apparatus of claim 11, wherein the adjustment module is further configured to:
respectively inputting the multiple groups of sample service data into the multiple initial recognition models to obtain multiple first sample recognition results;
and performing parameter adjustment on a basic fusion model based on the plurality of first sample recognition results to obtain the initial fusion model.
15. The apparatus according to any one of claims 11 to 14, wherein the convergence condition is that a difference between a loss function value of the current iteration process and a loss function value of a previous iteration process is smaller than a convergence threshold, and the convergence threshold is a value greater than or equal to 0.
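By way of example only, the convergence condition above can be checked as in the following sketch. It assumes that "difference" means the absolute difference between consecutive loss values, which the claim does not specify; the function names and the default threshold are hypothetical.

```python
def has_converged(current_loss, previous_loss, threshold=1e-4):
    # The claim requires a threshold greater than or equal to 0.
    if threshold < 0:
        raise ValueError("threshold must be greater than or equal to 0")
    # Assumption: compare the absolute difference with the threshold.
    return abs(current_loss - previous_loss) < threshold

def first_converged_iteration(losses, threshold=1e-4):
    # Index of the first iteration whose loss differs from the previous
    # iteration's loss by less than the threshold, or None if there is none.
    for i in range(1, len(losses)):
        if has_converged(losses[i], losses[i - 1], threshold):
            return i
    return None
```

Under this absolute-difference reading, a threshold of exactly 0 can never be satisfied by the strict comparison, so a small positive value would be used in practice.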
16. An account type identification device, the device comprising:
the first acquisition module is used for acquiring multiple groups of business data of the target account, wherein the business data of different groups correspond to different business reference dimensions;
the second acquisition module is used for acquiring a plurality of first identification results of the target account based on the plurality of groups of business data, wherein each first identification result is a predicted account type determined by an account identification model based on a single group of business data;
the third acquisition module is used for acquiring a second identification result of the target account based on the plurality of first identification results of the target account, wherein the second identification result is a predicted account type determined by a fusion model based on the plurality of groups of business data; wherein the joint training process of the account identification model and the fusion model comprises: acquiring multiple groups of sample service data and an account type of a sample account, wherein different groups of sample service data correspond to different service reference dimensions; adjusting parameters of a plurality of initial recognition models and an initial fusion model based on the multiple groups of sample service data and the account type, wherein, while the parameters of any one initial recognition model or of the initial fusion model are being adjusted, the parameters of the other models are kept unchanged; and, in response to the iteration satisfying a convergence condition, acquiring the plurality of account identification models and the fusion model after parameter adjustment;
and the determining module is used for determining the predicted account type of the target account based on the plurality of first identification results of the target account and the second identification result of the target account.
17. The apparatus of claim 16, wherein the second acquisition module is configured to:
for any one group of business data among the plurality of groups of business data, determining the business reference dimension corresponding to that group of business data;
determining the account identification model corresponding to the business reference dimension based on a mapping relation between business reference dimensions and account identification models, wherein the account identification model is used for acquiring a predicted account type based on a single group of business data of the business reference dimension;
and inputting that group of business data into the account identification model, and processing it through the account identification model to obtain the first identification result corresponding to that group of business data.
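As a non-limiting illustration, the lookup described above can be sketched with a dictionary that maps each business reference dimension to its account identification model. The dimension labels and the toy scoring functions below are hypothetical placeholders, not the trained models of the claims.

```python
from typing import Callable, Dict, Sequence

# A model is represented here as any callable that turns one group of
# business data into a score for the predicted account type.
Model = Callable[[Sequence[float]], float]

def first_results(grouped_data: Dict[str, Sequence[float]],
                  registry: Dict[str, Model]) -> Dict[str, float]:
    results = {}
    for dimension, data in grouped_data.items():
        # Mapping relation: business reference dimension -> model.
        model = registry[dimension]
        # Single-group inference yields one first identification result.
        results[dimension] = model(data)
    return results

# Toy stand-ins for trained models (assumptions for the example only).
registry = {
    "activity": lambda xs: sum(xs) / len(xs),
    "consumption": lambda xs: max(xs),
}
```

Keeping the mapping in one registry means a group of data never reaches a model trained for a different dimension; a missing dimension raises `KeyError` rather than falling back silently.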
18. The apparatus of claim 16, wherein the third acquisition module is configured to:
inputting the plurality of first recognition results into a fusion model, and weighting the plurality of first recognition results through the fusion model to obtain a plurality of weighted recognition results, wherein the fusion model is used for acquiring a predicted account type based on multiple groups of business data;
and performing linear mapping on the sum of the plurality of weighted recognition results to obtain the second recognition result.
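By way of example only, the weighting and linear mapping above can be sketched as follows. The parameter names `weights`, `scale` and `offset` are hypothetical; in a trained fusion model they would be learned values.

```python
def fuse(first_results, weights, scale=1.0, offset=0.0):
    if len(first_results) != len(weights):
        raise ValueError("one weight is required per first recognition result")
    # Weight each first recognition result.
    weighted = [w * r for w, r in zip(weights, first_results)]
    # Linear mapping applied to the sum of the weighted results.
    return scale * sum(weighted) + offset
```

For instance, `fuse([0.5, 1.0], [0.5, 0.25])` weights the two results to 0.25 each and returns their sum, 0.5, under the default identity mapping.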
19. The apparatus of claim 16, wherein the first acquisition module is configured to:
determining service reference dimensions to which a plurality of service data of the target account belong respectively;
and grouping the service data that belong to the same service reference dimension into the same group of service data.
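As a non-limiting illustration, the grouping above can be sketched as follows. The record names and the `dimension_of` lookup table are hypothetical; how the dimension of each item of service data is determined is not specified here.

```python
from collections import defaultdict

def group_by_dimension(records, dimension_of):
    # records: iterable of (name, value) pairs for one target account.
    # dimension_of: maps each record name to its service reference dimension.
    groups = defaultdict(list)
    for name, value in records:
        groups[dimension_of[name]].append(value)
    return dict(groups)
```

Each resulting list is one group of service data and can then be passed to the model for its dimension.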
20. The apparatus according to any one of claims 16 to 19, wherein the business reference dimensions comprise at least two of: account activity, account influence, account consumption, account interactivity, number of associated accounts, and video play completion rate.
21. A computer device comprising one or more processors and one or more memories, the one or more memories having stored therein at least one computer program loaded and executed by the one or more processors to implement the method of processing data as claimed in any of claims 1 to 5; or to implement the method of identification of account types as claimed in any one of claims 6 to 10.
22. A storage medium having stored therein at least one computer program loaded and executed by a processor to implement the method of processing data according to any one of claims 1 to 5; or to implement the method of identification of account types as claimed in any one of claims 6 to 10.
CN202110535924.1A 2021-05-17 2021-05-17 Data processing method, account type identification method and device Active CN113762585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110535924.1A CN113762585B (en) 2021-05-17 2021-05-17 Data processing method, account type identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110535924.1A CN113762585B (en) 2021-05-17 2021-05-17 Data processing method, account type identification method and device

Publications (2)

Publication Number Publication Date
CN113762585A CN113762585A (en) 2021-12-07
CN113762585B CN113762585B (en) 2023-08-01

Family

ID=78787072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110535924.1A Active CN113762585B (en) 2021-05-17 2021-05-17 Data processing method, account type identification method and device

Country Status (1)

Country Link
CN (1) CN113762585B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114169440A (en) * 2021-12-08 2022-03-11 北京百度网讯科技有限公司 Model training method, data processing method, device, electronic device and medium

Family Cites Families (15)

Publication number Priority date Publication date Assignee Title
CN107346464B (en) * 2016-05-06 2021-04-16 腾讯科技(深圳)有限公司 Service index prediction method and device
CN110598840B (en) * 2018-06-13 2023-04-18 富士通株式会社 Knowledge migration method, information processing apparatus, and storage medium
CN109948670A (en) * 2019-03-04 2019-06-28 腾讯科技(深圳)有限公司 Training method and device, the data processing method and device of data identification model
US20200312307A1 (en) * 2019-03-25 2020-10-01 Microsoft Technology Licensing, Llc Dynamic Combination of Acoustic Model States
US20220318641A1 (en) * 2019-06-07 2022-10-06 The Regents Of The University Of California General form of the tree alternating optimization (tao) for learning decision trees
CN110188836B (en) * 2019-06-21 2021-06-11 西安交通大学 Brain function network classification method based on variational self-encoder
CN110399925B (en) * 2019-07-26 2023-09-19 腾讯科技(武汉)有限公司 Account risk identification method, device and storage medium
CN110738263B (en) * 2019-10-17 2020-12-29 腾讯科技(深圳)有限公司 Image recognition model training method, image recognition method and image recognition device
CN111582694B (en) * 2020-04-29 2023-08-08 腾讯科技(深圳)有限公司 Learning evaluation method and device
CN111783998B (en) * 2020-06-30 2023-08-11 百度在线网络技术(北京)有限公司 Training method and device for illegal account identification model and electronic equipment
CN111708823B (en) * 2020-08-18 2021-05-18 腾讯科技(深圳)有限公司 Abnormal social account identification method and device, computer equipment and storage medium
CN112016633A (en) * 2020-09-25 2020-12-01 北京百度网讯科技有限公司 Model training method and device, electronic equipment and storage medium
CN112221156B (en) * 2020-10-27 2021-07-27 腾讯科技(深圳)有限公司 Data abnormality recognition method, data abnormality recognition device, storage medium, and electronic device
CN112257876B (en) * 2020-11-15 2021-07-30 腾讯科技(深圳)有限公司 Federal learning method, apparatus, computer device and medium
CN112488163A (en) * 2020-11-17 2021-03-12 中国平安财产保险股份有限公司 Abnormal account identification method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113762585A (en) 2021-12-07

Similar Documents

Publication Publication Date Title
CN111298445B (en) Target account detection method and device, electronic equipment and storage medium
CN110458360B (en) Method, device, equipment and storage medium for predicting hot resources
CN112069414A (en) Recommendation model training method and device, computer equipment and storage medium
CN109784351B (en) Behavior data classification method and device and classification model training method and device
CN112163717B (en) Population data prediction method and device, computer equipment and medium
CN111552888A (en) Content recommendation method, device, equipment and storage medium
CN110162604B (en) Statement generation method, device, equipment and storage medium
CN111104980B (en) Method, device, equipment and storage medium for determining classification result
CN112749728A (en) Student model training method and device, computer equipment and storage medium
CN111897996A (en) Topic label recommendation method, device, equipment and storage medium
CN111831917A (en) Content recommendation method, device, equipment and medium
CN110555102A (en) Media title recognition method, device and storage medium
WO2022193973A1 (en) Image processing method and apparatus, electronic device, computer readable storage medium, and computer program product
CN114282587A (en) Data processing method and device, computer equipment and storage medium
CN113762585B (en) Data processing method, account type identification method and device
CN114281936A (en) Classification method and device, computer equipment and storage medium
CN113674856A (en) Medical data processing method, device, equipment and medium based on artificial intelligence
CN111931075A (en) Content recommendation method and device, computer equipment and storage medium
CN114765062A (en) Gene data processing method, gene data processing device, computer equipment and storage medium
CN113486260B (en) Method and device for generating interactive information, computer equipment and storage medium
CN107807940B (en) Information recommendation method and device
CN112232890B (en) Data processing method, device, equipment and storage medium
CN111259252B (en) User identification recognition method and device, computer equipment and storage medium
CN114328948A (en) Training method of text standardization model, text standardization method and device
CN113761195A (en) Text classification method and device, computer equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant