CN112149807A - Method and device for processing user characteristic information

Method and device for processing user characteristic information

Info

Publication number
CN112149807A
Authority
CN
China
Prior art keywords
value
characteristic information
user characteristic
user
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011037919.XA
Other languages
Chinese (zh)
Inventor
陈亮辉
付琰
周洋杰
甘露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011037919.XA
Publication of CN112149807A
Legal status: Pending (Current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The application discloses a method and a device for processing user characteristic information, and relates to the technical fields of deep learning and natural language processing. A specific implementation comprises the following steps: acquiring user characteristic information of a plurality of initial types, and determining, for the user characteristic information of each initial type, a numerical value of a first index to obtain a numerical value set corresponding to the plurality of initial types; combining the numerical values in the numerical value set to obtain at least two numerical value ranges, and taking the at least one initial type corresponding to each numerical value range as a binding type; determining the user characteristic information of the binding type; and determining a training sample of the deep learning model based on the user characteristic information of the binding type. The method and the device can integrate scattered features and effectively improve their user coverage rate, avoiding the problem that features with too little data and low user coverage are ignored or unusable during training. Furthermore, by increasing the amount of data included in a type, the accuracy of training may be improved.

Description

Method and device for processing user characteristic information
Technical Field
The application relates to the field of computer technology, in particular to deep learning and natural language processing, and specifically to a method and a device for processing user characteristic information.
Background
Deep learning models play an important role in many areas. Deep learning is a complex machine learning algorithm whose ultimate goal is to make a machine capable of human-like analysis.
For such a model, the input feature information is important, since it determines the upper limit of the effect of the trained model. The feature information may include continuous features and type features. Each type of type characteristic information carries a certain property; for example, the characteristic information may indicate that a user has installed a certain application. A continuous feature may represent a degree, such as the user's income.
Disclosure of Invention
A method and a device for processing user characteristic information, electronic equipment and a storage medium are provided.
According to a first aspect, a method for processing user feature information is provided, which includes: acquiring user characteristic information of a plurality of initial types, and determining a numerical value of a first index for the user characteristic information of each initial type to obtain a numerical value set corresponding to the plurality of initial types; combining the numerical values in the numerical value set to obtain at least two numerical value ranges, and taking at least one initial type corresponding to each numerical value range as a binding type; determining user characteristic information of a binding type based on user characteristic information included in at least one initial type, wherein the user coverage rate of any user characteristic information in the binding type is greater than or equal to the user coverage rate of the user characteristic information corresponding to any user characteristic information in the initial type; and determining a training sample of the deep learning model based on the user characteristic information of the binding type.
According to a second aspect, there is provided an apparatus for processing user feature information, comprising: the acquisition unit is configured to acquire a plurality of initial types of user characteristic information, and for each initial type of user characteristic information, determine a numerical value of a first index to obtain a numerical value set corresponding to the plurality of initial types; the merging unit is configured to merge the numerical values in the numerical value set to obtain at least two numerical value ranges, and at least one initial type corresponding to each numerical value range is used as a binding type; the characteristic determining unit is configured to determine user characteristic information of one binding type based on the user characteristic information included in at least one initial type, wherein the user coverage rate of any user characteristic information in the binding type is greater than or equal to the user coverage rate of the user characteristic information corresponding to any user characteristic information in the initial type; and the sample determining unit is configured to determine a training sample of the deep learning model based on the user characteristic information of the binding type.
According to a third aspect, there is provided an electronic device comprising: one or more processors; a storage device for storing one or more programs which, when executed by one or more processors, cause the one or more processors to implement a method as in any embodiment of a method for processing user characteristic information.
According to a fourth aspect, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as in any one of the embodiments of the method of processing user characteristic information.
According to the scheme of the application, scattered features can be integrated and the user coverage rate of the features can be effectively improved, avoiding the problem that features with too little data and low user coverage are ignored or unusable during training. Furthermore, by increasing the amount of data included in a type, the accuracy of training the deep neural network may be increased.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram to which some embodiments of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of processing user profile information according to the present application;
FIG. 3a is a schematic diagram of an application scenario of a method for processing user characteristic information according to the present application;
FIG. 3b is a schematic diagram of an application scenario of a method for processing user characteristic information according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method of processing user characteristic information according to the present application;
FIG. 5 is a schematic diagram of an embodiment of a device for processing user profile information according to the present application;
fig. 6 is a block diagram of an electronic device for implementing a method for processing user characteristic information according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of those embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments and the attached drawings.
Fig. 1 shows an exemplary system architecture 100 of an embodiment of a processing method of user characteristic information or a processing apparatus of user characteristic information to which the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as shopping applications, search applications, live applications, instant messaging tools, mailbox clients, social platform software, and the like, may be installed on the terminal devices 101, 102, 103.
Here, the terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not particularly limited herein.
The server 105 may be a server providing various services, such as a background server providing support for the terminal devices 101, 102, 103. The background server may analyze and otherwise process the received data such as the multiple initial types of user feature information, and feed back a processing result (e.g., a trained deep learning model) to the terminal device.
It should be noted that the processing method of the user characteristic information provided in the embodiment of the present application may be executed by the server 105 or the terminal devices 101, 102, and 103, and accordingly, the processing device of the user characteristic information may be disposed in the server 105 or the terminal devices 101, 102, and 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method of processing user profile information in accordance with the present application is shown. The processing method of the user characteristic information comprises the following steps:
step 201, obtaining a plurality of initial types of user characteristic information, and determining a numerical value of a first index for each initial type of user characteristic information to obtain a numerical value set corresponding to the plurality of initial types.
In this embodiment, an execution subject (for example, the server or the terminal device shown in fig. 1) on which the user characteristic information processing method operates may obtain a plurality of initial types of user characteristic information, and determine a numerical value of the first index for each of the plurality of initial types of user characteristic information, so as to obtain a numerical value set corresponding to the plurality of initial types. Specifically, the set of numerical values refers to a set of numerical values of the first index of the plurality of initial types of user characteristic information. The first index here is an index relating to the effect of training the deep learning model (accuracy and/or speed of the model). For example, the first indicator may be a positive sample rate.
Specifically, the user characteristic information is used to indicate attributes or behaviors of the user, such as the gender of the user, whether the user clicks an advertisement of a certain product, and the like. The user feature information may be type feature information, that is, a type feature. Alternatively, the user characteristic information may also include type characteristic information and continuous characteristic information. The user characteristic information can be represented in the form of characters (including words and/or numbers, for example) and also can be represented in the form of vectors.
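As an illustrative sketch of step 201 (not part of the original disclosure): assuming the user characteristic information arrives as labeled records, the positive sample rate of each initial type can be computed to form the value set. The record layout and function name are assumptions.

```python
from collections import defaultdict

def positive_sample_rate_per_type(records):
    """Compute the first-index value (here: positive sample rate) for each
    initial type. `records` is an iterable of (initial_type, label) pairs,
    where label is 1 for a positive sample and 0 for a negative one.
    Returns the value set as a dict: initial type -> positive sample rate."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for initial_type, label in records:
        totals[initial_type] += 1
        positives[initial_type] += int(label)
    return {t: positives[t] / totals[t] for t in totals}
```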
Step 202, combining the values in the value set to obtain at least two value ranges, and using the at least one initial type corresponding to each value range as a binding type.
In this embodiment, the execution body may combine the values in the value set into at least two value ranges. Each numerical range corresponds to at least one initial type, that is, each numerical range is a range in which the numerical value of the user characteristic information in at least one initial type is located. At least one initial type may be bound as a binding type. The number of binding types obtained is less than the number of initial types in the plurality of initial types.
In practice, the execution body described above may combine the values in various ways. For example, for the values in the value set, the execution body may combine the values greater than or equal to a preset value into one value range and the values smaller than the preset value into another value range.
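A minimal sketch of the threshold-based merging just mentioned; the preset value of 0.5 is an arbitrary assumption.

```python
def merge_by_threshold(value_set, preset_value=0.5):
    """Split the value set into two value ranges around a preset value.
    Each returned dict (initial type -> value) corresponds to one binding type."""
    high = {t: v for t, v in value_set.items() if v >= preset_value}
    low = {t: v for t, v in value_set.items() if v < preset_value}
    return high, low
```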
Step 203, determining the user characteristic information of a binding type based on the user characteristic information included in at least one initial type, wherein the user coverage rate of any user characteristic information in the binding type is greater than or equal to the user coverage rate of the user characteristic information corresponding to any user characteristic information in the initial type.
In this embodiment, after obtaining the binding types, the execution body may determine, based on the at least one initial type corresponding to each value range, the user feature information of the binding type corresponding to that at least one initial type. The user coverage rate may be used to indicate the ratio of the users having the feature corresponding to the feature information to the total number of users. The total number of users here may refer to the total number of users that the set of user characteristic information relates to.
For example, if the user characteristic information is whether the user clicks the advertisement of product A, the user coverage rate of this user characteristic information indicates the proportion of users who clicked the advertisement of product A among the total number of users. In practice, the plurality of initial types may be initial types in which the user coverage of the user feature information is low (e.g., lower than a preset coverage threshold).
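For concreteness, the user coverage rate described here could be computed as in the following sketch (the inputs are assumed):

```python
def user_coverage_rate(users_with_feature, all_users):
    """Ratio of users having the feature corresponding to the feature
    information to the total number of users the feature set relates to."""
    return len(set(users_with_feature)) / len(set(all_users))
```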
And step 204, determining a training sample of the deep learning model based on the user characteristic information of the binding type.
In this embodiment, the execution body may determine training samples of the deep learning model based on the user feature information of the binding types, and the training samples may be used to train the deep learning model. The deep learning model may be any of various models, such as a convolutional neural network. The same or different training samples can be determined from the user characteristic information of different binding types.
In practice, the execution subject may determine the training sample in various ways. For example, when the user feature information is expressed in a vector form, the execution subject may label the user feature information directly, and use the labeled user feature information as a training sample of the deep learning model. In addition, when the user feature information is expressed in the form of characters, the execution subject may determine a vector corresponding to each piece of user feature information, label the vector, and use the labeled vector as a training sample.
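A sketch of step 204 under the assumption that each piece of binding-type user characteristic information has already been expressed as a numeric vector and that a label is available for it:

```python
def label_feature_vectors(feature_vectors, labels):
    """Pair binding-type feature vectors with labels to form training samples."""
    if len(feature_vectors) != len(labels):
        raise ValueError("each feature vector needs exactly one label")
    return list(zip(feature_vectors, labels))
```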
The method provided by the embodiment of the application can integrate scattered features and effectively improve the user coverage rate of the features, avoiding the problem that features with too little data and low user coverage are ignored or unusable during training. Furthermore, by increasing the amount of data included in a type, the accuracy of training the deep neural network may be increased.
In another embodiment of the application, the determining a training sample of the deep learning model based on the user feature information of the binding type may include: for each binding type in the respective binding types, determining a value of a second indicator of the user characteristic information of the binding type, wherein the value of the second indicator is obtained based on at least one of the following: user coverage, positive sample rate; and if the numerical value of the second index of each binding type is larger than or equal to the preset numerical value threshold, determining a training sample of the deep learning model based on the user characteristic information of each binding type.
In these implementations, the execution body may determine, for each binding type, the value of the second indicator of the user characteristic information of that binding type, and may compare the value determined for each binding type with a preset value threshold. If every determined value is greater than or equal to the preset value threshold, the execution body may determine a training sample of the deep learning model based on the user feature information of each binding type. Specifically, each binding type herein refers to each binding type corresponding to each value range obtained by merging.
In practice, the execution body may determine the value of the second index in various ways. The second index may be either of user coverage and positive sample rate, or its value may be obtained by processing at least one of them. For example, the value of the second index may be the output of a preset coverage processing model that takes the user coverage as input. The value of the second indicator may also be a weighted value of the user coverage rate and the positive sample rate, in which case the execution body weights the user coverage rate and the positive sample rate of the binding type to obtain the value of the second indicator. The value of the second index may also be a value jointly determined by the user coverage rate and the positive sample rate; for example, the execution body may input the user coverage rate and the positive sample rate of the binding type into a preset index model to obtain the value of the second index, or may look up the value of the second index corresponding to the user coverage rate and the positive sample rate of the binding type in a preset correspondence table.
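As a sketch of one of the possibilities above, the weighted combination; the weights and the preset value threshold are arbitrary assumptions.

```python
def second_index(user_coverage, positive_sample_rate, w_cov=0.5, w_pos=0.5):
    """Weighted value of user coverage rate and positive sample rate."""
    return w_cov * user_coverage + w_pos * positive_sample_rate

def binding_types_valid(binding_stats, preset_threshold=0.1):
    """binding_stats: binding type -> (user_coverage, positive_sample_rate).
    The binding types are considered usable only if every second-index value
    reaches the preset value threshold."""
    return all(second_index(cov, pos) >= preset_threshold
               for cov, pos in binding_stats.values())
```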
These implementations can verify whether the obtained binding types are effective, so that obtaining the binding types improves the user coverage rate and the positive sample rate of the training samples, which in turn improves the effect of training the deep learning model with these training samples.
In some optional implementations of this embodiment, combining the values in the value set to obtain at least two value ranges may include: sorting the values in the value set by magnitude to obtain a value sequence; determining the difference between each group of adjacent values in the value sequence, and determining at least one group of adjacent values in descending order of the differences; for each group of the at least one group of adjacent values, taking the larger value of the group as the minimum value of one value range and the smaller value of the group as the maximum value of another value range, so as to obtain the at least two value ranges.
In these alternative implementations, the execution body may sort the values in the value set by magnitude and take the sorted result as the value sequence. The execution body may then determine the differences between adjacent values in the sequence, where every two adjacent values are considered as one group of adjacent values. The execution body may sort the obtained differences and select a preset number of the largest ones, for example the single largest difference, and take the groups of adjacent values corresponding to these differences as the at least one group of adjacent values.
The execution body may use the two values included in each selected group of adjacent values as boundary values of two adjacent value ranges. The difference within each selected group of adjacent values is greater than or equal to the difference within any other group of adjacent values in the value sequence.
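A sketch of this optional implementation: sort the values, locate the largest differences between adjacent values, and cut the sorted sequence at those positions to form the value ranges. The number of cuts is a parameter assumed here for illustration.

```python
def merge_by_largest_gaps(value_set, num_cuts=1):
    """Merge first-index values into (num_cuts + 1) value ranges by cutting the
    sorted value sequence at the num_cuts largest adjacent differences.
    value_set maps each initial type to its first-index value; the result is a
    list of (min_value, max_value, types) tuples, one per binding type."""
    items = sorted(value_set.items(), key=lambda kv: kv[1])   # ascending by value
    values = [v for _, v in items]
    # Difference of each group of adjacent values in the value sequence.
    gaps = [(values[i + 1] - values[i], i) for i in range(len(values) - 1)]
    # Cut after the positions with the largest differences.
    cut_positions = sorted(i for _, i in sorted(gaps, reverse=True)[:num_cuts])

    ranges, start = [], 0
    for cut in cut_positions + [len(items) - 1]:
        chunk = items[start:cut + 1]
        ranges.append((chunk[0][1], chunk[-1][1], [t for t, _ in chunk]))
        start = cut + 1
    return ranges

# With the example values discussed below (Figs. 3a and 3b), app1-app3 fall
# into the 0.6-0.8 range and app4-app5 into the 0.1 range:
print(merge_by_largest_gaps({"app1": 0.8, "app2": 0.7, "app3": 0.6,
                             "app4": 0.1, "app5": 0.1}))
```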
As shown in fig. 3a and 3b, the user characteristic information may be whether a user installs a certain application, i.e., app. The initial types of user feature information may include 5 initial types, app1 type, app2 type, app3 type, app4 type, app5 type, respectively. These 5 initial types include the following user characteristic information, respectively: whether the user has app1 installed, whether the user has app2 installed, whether the user has app3 installed, whether the user has app4 installed, and whether the user has app5 installed. The values of the first indexes, i.e., the values of the positive sample rates, corresponding to the 5 initial types are 0.8, 0.7, 0.6, 0.1, and 0.1, respectively, and the user coverage rates of the 5 apps are 3%, 4%, and 4%, respectively.
The execution body may merge 0.8, 0.7, and 0.6 into one value range, i.e., 0.6-0.8, corresponding to the first binding type, which involves app1-app3, and merge 0.1 and 0.1 into another value range, i.e., 0.1, corresponding to the second binding type, which involves app4-app5. The user characteristic information of the first binding type may indicate whether the user installs app1, app2, or app3; the user coverage rate corresponding to the first binding type is 8%, and the positive sample rate of the first binding type is 0.7. The user characteristic information of the second binding type may indicate whether the user installs app4 or app5; the user coverage rate corresponding to the second binding type is 8%, and the positive sample rate of the second binding type is 0.1.
These implementations can assign user characteristic information whose index values differ greatly to two different types, thereby better distinguishing the different features input to the deep learning model and helping to obtain a more accurate deep learning model.
In some optional implementations of this embodiment, the value of the first indicator is obtained based on at least one of: positive sample rate, information value, evidence weight.
In these alternative implementations, the first indicator may be any one of a positive sample rate, an information value, and an evidence weight, or a numerical value of the first indicator may be obtained by performing a preset process on at least one of the positive sample rate, the information value, and the evidence weight.
In practice, the execution body described above may determine the value of the first index in various ways. For example, the execution body may determine the Information Value (IV) and the Weight of Evidence (WOE) of the initial type of user feature information from a preset correspondence table used for determining the first index, input the information value and the evidence weight into a preset formula, and use the result of the preset formula as the value of the first index.
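The Information Value and Weight of Evidence mentioned here are standard statistics; a sketch of how they could be computed per initial type is shown below. The per-type positive/negative counts are assumed inputs, a small constant avoids division by zero, and the combining formula itself is not specified by the application.

```python
import math

def woe_and_iv(type_counts, eps=1e-6):
    """type_counts: initial type -> (num_positive, num_negative) samples.
    Returns a dict of per-type WOE values and the overall IV, using
    WOE_i = ln((pos_i/total_pos) / (neg_i/total_neg)) and
    IV = sum_i (pos_i/total_pos - neg_i/total_neg) * WOE_i."""
    total_pos = sum(p for p, _ in type_counts.values()) + eps
    total_neg = sum(n for _, n in type_counts.values()) + eps
    woe, iv = {}, 0.0
    for t, (p, n) in type_counts.items():
        pos_share = p / total_pos + eps
        neg_share = n / total_neg + eps
        woe[t] = math.log(pos_share / neg_share)
        iv += (pos_share - neg_share) * woe[t]
    return woe, iv
```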
These implementations quantitatively differentiate the user characteristic information through the positive sample rate, the information value, and the evidence weight, which helps to obtain different binding types.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method of processing user characteristic information is shown. The deep learning model may be a decision tree model. The process 400 includes the following steps:
step 401, obtaining a plurality of initial types of user characteristic information, and determining a numerical value of a first index for each initial type of user characteristic information to obtain a numerical value set corresponding to the plurality of initial types.
In this embodiment, an execution subject (for example, the server or the terminal device shown in fig. 1) on which the user characteristic information processing method operates may obtain a plurality of initial types of user characteristic information, and determine a numerical value of the first index for each of the plurality of initial types of user characteristic information, so as to obtain a numerical value set corresponding to the plurality of initial types.
Step 402, combining the values in the value set to obtain at least two value ranges, and using the at least one initial type corresponding to each value range as a binding type.
In this embodiment, the execution body may combine the values in the value set into at least two value ranges. Each numerical range corresponds to at least one initial type, that is, each numerical range is a range in which the numerical value of the user characteristic information in at least one initial type is located. At least one initial type may be bound as a binding type. The number of binding types obtained is less than the number of initial types in the plurality of initial types.
Step 403, determining user feature information of a binding type based on the user feature information included in at least one initial type, where the user coverage of any user feature information in the binding type is greater than or equal to the user coverage of the user feature information corresponding to any user feature information in the initial type.
In this embodiment, after obtaining the binding types, the execution body may determine, based on the at least one initial type corresponding to each value range, the user feature information of the binding type corresponding to that at least one initial type. The user coverage rate may be used to indicate the ratio of the users having the feature corresponding to the feature information to the total number of users. The total number of users here may refer to the total number of users that the set of user characteristic information relates to.
Step 404, determining a training sample of the decision tree model based on the user characteristic information of each binding type.
In this embodiment, the executing entity may determine a training sample of the decision tree model based on the user feature information of each binding type, and the training sample may be used to train the deep learning model.
Step 405, inputting the training sample into the decision tree model to train the decision tree model, so as to obtain the trained decision tree model.
In this embodiment, the executing entity may input the determined training sample into a decision tree model to train the decision tree model, so as to obtain a trained decision tree model.
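A sketch of steps 404-405 using scikit-learn's DecisionTreeClassifier as a stand-in for the decision tree model; the library choice and hyperparameters are assumptions, since the application does not name an implementation.

```python
from sklearn.tree import DecisionTreeClassifier

def train_decision_tree(training_samples):
    """training_samples: list of (feature_vector, label) pairs built from
    the binding-type user characteristic information."""
    features = [x for x, _ in training_samples]
    labels = [y for _, y in training_samples]
    model = DecisionTreeClassifier(max_depth=5)  # depth chosen arbitrarily
    model.fit(features, labels)
    return model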
This embodiment can avoid the problem that type features with too little data and low user coverage are ignored or unusable when the decision tree model is trained. Furthermore, by increasing the amount of data included in a type, the accuracy of training the decision tree may be increased.
In some optional implementations of this embodiment, the method may further include: obtaining a prediction result of the trained decision tree model; determining a prediction effect value of the trained decision tree model based on the prediction result and the real data; and if the prediction effect value is greater than a preset effect threshold, taking the trained decision tree model as an applicable model to be deployed.
In these alternative implementations, the executing agent may perform effect verification on the trained decision tree model. Specifically, the execution subject may perform prediction by using the trained decision tree model to obtain a prediction result. Thereafter, the execution subject may determine a prediction effect value based on the prediction result and the real data. Deployment refers to the application of the trained decision tree model in practice. An applicable model refers to a model that can be applied.
In practice, the predicted effect value may be a predicted accuracy value and/or a predicted speed value. For example, the prediction accuracy value may be an inverse of a loss value obtained by inputting the prediction result and the real data into a preset loss function. The predicted speed value may be the inverse of the time it takes to make the prediction using the decision tree model.
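A sketch of the effect verification described above: the prediction accuracy value as the inverse of a loss and the prediction speed value as the inverse of the prediction time. The log-loss function and the thresholds are assumptions.

```python
import time
from sklearn.metrics import log_loss

def prediction_effect_values(model, eval_features, true_labels):
    """Return (prediction_accuracy_value, prediction_speed_value)."""
    start = time.perf_counter()
    predicted = model.predict_proba(eval_features)
    elapsed = time.perf_counter() - start
    accuracy_value = 1.0 / (log_loss(true_labels, predicted) + 1e-9)  # inverse of loss
    speed_value = 1.0 / (elapsed + 1e-9)                              # inverse of time
    return accuracy_value, speed_value

def should_deploy(model, eval_features, true_labels, effect_threshold=1.0):
    """Deploy the trained decision tree model only if the prediction effect
    value exceeds the preset effect threshold."""
    accuracy_value, _ = prediction_effect_values(model, eval_features, true_labels)
    return accuracy_value > effect_threshold
```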
These implementations can verify the effectiveness of the trained decision tree model and, if it is determined that the effectiveness is good, take the model as an applicable model.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of a device for processing user feature information, where the embodiment of the device corresponds to the embodiment of the method shown in fig. 2, and besides the features described below, the embodiment of the device may further include the same or corresponding features or effects as the embodiment of the method shown in fig. 2. The device can be applied to various electronic equipment.
As shown in fig. 5, the apparatus 500 for processing user characteristic information of the present embodiment includes: an acquisition unit 501, a merging unit 502, a feature determination unit 503, and a sample determination unit 504. The obtaining unit 501 is configured to obtain a plurality of initial types of user characteristic information, and determine a numerical value of a first index for each initial type of user characteristic information, to obtain a numerical value set corresponding to the plurality of initial types; a merging unit 502 configured to merge values in the value set to obtain at least two value ranges, and use at least one initial type corresponding to each value range as a binding type; a feature determining unit 503 configured to determine user feature information of a binding type based on user feature information included in at least one initial type, wherein a user coverage rate of any user feature information in the binding type is greater than or equal to a user coverage rate of user feature information corresponding to any user feature information in the initial type; a sample determination unit 504 configured to determine a training sample of the deep learning model based on the user feature information of the binding type.
In this embodiment, specific processes of the obtaining unit 501, the combining unit 502, the feature determining unit 503, and the sample determining unit 504 of the user feature information processing apparatus 500 and technical effects brought by the specific processes can refer to related descriptions of step 201, step 202, step 203, and step 204 in the corresponding embodiment of fig. 2, which are not repeated herein.
In some optional implementations of the embodiment, the sample determining unit is further configured to perform the determining of the training sample of the deep learning model based on the user feature information of the binding type as follows: for each binding type in the respective binding types, determining a value of a second indicator of the user characteristic information of the binding type, wherein the value of the second indicator is obtained based on at least one of the following: user coverage, positive sample rate; and if the numerical value of the second index of each binding type is larger than or equal to the preset numerical value threshold, determining a training sample of the deep learning model based on the user characteristic information of the binding type.
In some optional implementations of this embodiment, the merging unit is further configured to perform the merging of the values in the value set to obtain at least two value ranges as follows: sorting the values in the value set by magnitude to obtain a value sequence; determining the difference between each group of adjacent values in the value sequence, and determining at least one group of adjacent values in descending order of the differences; for each group of the at least one group of adjacent values, taking the larger value of the group as the minimum value of one value range and the smaller value of the group as the maximum value of another value range, so as to obtain the at least two value ranges.
In some optional implementations of this embodiment, the deep learning model is a decision tree model; the device still includes: and the training unit is configured to input the training samples into the decision tree model so as to train the decision tree model to obtain the trained decision tree model.
In some optional implementations of this embodiment, the apparatus further includes: a prediction unit configured to obtain a prediction result of the trained decision tree model; an effect determination unit configured to determine a predicted effect value of the trained decision tree model based on the prediction result and the real data; and the unit to be deployed is configured to take the trained decision tree model as the applicable model to be deployed if the predicted effect value is greater than a preset effect threshold value.
In some optional implementations of this embodiment, the value of the first indicator is obtained based on at least one of: positive sample rate, information value, evidence weight.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 6, the electronic device is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the electronic apparatus includes: one or more processors 601, memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor, so that the at least one processor executes the processing method of the user characteristic information provided by the application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the processing method of user characteristic information provided by the present application.
The memory 602, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the processing method of user feature information in the embodiment of the present application (for example, the acquisition unit 501, the merging unit 502, the feature determination unit 503, and the sample determination unit 504 shown in fig. 5). The processor 601 executes various functional applications of the server and data processing, i.e., a processing method of the user characteristic information in the above-described method embodiments, by executing the non-transitory software programs, instructions, and modules stored in the memory 602.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the processing electronics of the user characteristic information, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, and these remote memories may be connected over a network to processing electronics for user characteristic information. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the user feature information processing method may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the processing electronics for user characteristic information, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, a merging unit, a feature determination unit, and a sample determination unit. Where the names of the units do not in some cases constitute a limitation on the units themselves, for example, the sample determination unit may also be described as a "unit that determines training samples of the deep learning model based on the user feature information of the binding type".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquiring user characteristic information of a plurality of initial types, and determining a numerical value of a first index for the user characteristic information of each initial type to obtain a numerical value set corresponding to the plurality of initial types; combining the numerical values in the numerical value set to obtain at least two numerical value ranges, and taking at least one initial type corresponding to each numerical value range as a binding type; determining user characteristic information of a binding type based on user characteristic information included in at least one initial type, wherein the user coverage rate of any user characteristic information in the binding type is greater than or equal to the user coverage rate of the user characteristic information corresponding to any user characteristic information in the initial type; and determining a training sample of the deep learning model based on the user characteristic information of the binding type.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (14)

1. A method for processing user characteristic information, the method comprising:
acquiring user characteristic information of a plurality of initial types, and determining a numerical value of a first index for the user characteristic information of each initial type to obtain a numerical value set corresponding to the plurality of initial types;
combining the numerical values in the numerical value set to obtain at least two numerical value ranges, and taking at least one initial type corresponding to each numerical value range as a binding type;
determining the user characteristic information of the binding type based on the user characteristic information included in the at least one initial type, wherein the user coverage rate of any user characteristic information in the binding type is greater than or equal to the user coverage rate of the user characteristic information corresponding to the any user characteristic information in the initial type;
and determining a training sample of the deep learning model based on the user characteristic information of the binding type.
2. The method of claim 1, wherein the determining training samples of a deep learning model based on the user feature information of the binding type comprises:
for each binding type in the binding types, determining a value of a second index of the user characteristic information of the binding type, wherein the value of the second index is obtained based on at least one of the following: user coverage, positive sample rate;
and if the value of the second index of each binding type is larger than or equal to a preset value threshold, determining a training sample of the deep learning model based on the user characteristic information of the binding type.
3. The method of claim 1, wherein the combining the values in the set of values to obtain at least two ranges of values comprises:
sorting all numerical values in the numerical value set according to the magnitude of the numerical values to obtain a numerical value sequence;
determining the difference value between each group of adjacent numerical values in the numerical value sequence, and determining at least one group of adjacent numerical values according to the sequence of the difference values from large to small;
for each adjacent value in the at least one group of adjacent values, taking the larger value of the group of adjacent values as the minimum value of one value range, and taking the smaller value of the group of adjacent values as the maximum value of another value range, so as to obtain the at least two value ranges.
4. The method of claim 1, wherein the deep learning model is a decision tree model;
the method further comprises the following steps:
and inputting the training sample into the decision tree model to train the decision tree model to obtain the trained decision tree model.
5. The method of claim 4, wherein the method further comprises:
obtaining a prediction result of the trained decision tree model;
determining a prediction effect value of the trained decision tree model based on the prediction result and the real data;
and if the predicted effect value is larger than a preset effect threshold value, taking the trained decision tree model as an applicable model to be deployed.
6. The method of claim 1, wherein the value of the first indicator is derived based on at least one of: positive sample rate, information value, evidence weight.
7. An apparatus for processing user characteristic information, the apparatus comprising:
the system comprises an acquisition unit, a calculation unit and a display unit, wherein the acquisition unit is configured to acquire a plurality of initial types of user characteristic information, and for each initial type of user characteristic information, determine a numerical value of a first index to obtain a numerical value set corresponding to the plurality of initial types;
a merging unit configured to merge the values in the value set to obtain at least two value ranges, and take at least one initial type corresponding to each value range as a binding type;
the characteristic determining unit is configured to determine the user characteristic information of the one binding type based on the user characteristic information included in the at least one initial type, wherein the user coverage rate of any user characteristic information in the binding type is greater than or equal to the user coverage rate of the user characteristic information corresponding to the any user characteristic information in the initial type;
a sample determining unit configured to determine a training sample of the deep learning model based on the user feature information of the binding type.
8. The apparatus of claim 7, wherein the sample determining unit is further configured to perform the determining of the training sample of the deep learning model based on the user feature information of the binding type as follows:
for each binding type in the binding types, determining a value of a second index of the user characteristic information of the binding type, wherein the value of the second index is obtained based on at least one of the following: user coverage, positive sample rate;
and if the value of the second index of each binding type is larger than or equal to a preset value threshold, determining a training sample of the deep learning model based on the user characteristic information of the binding type.
9. The apparatus of claim 7, wherein the merging unit is further configured to perform the merging of the values in the value set to obtain at least two value ranges as follows:
sorting all numerical values in the numerical value set according to the magnitude of the numerical values to obtain a numerical value sequence;
determining the difference value between each group of adjacent numerical values in the numerical value sequence, and determining at least one group of adjacent numerical values according to the sequence of the difference values from large to small;
for each adjacent value in the at least one group of adjacent values, taking the larger value of the group of adjacent values as the minimum value of one value range, and taking the smaller value of the group of adjacent values as the maximum value of another value range, so as to obtain the at least two value ranges.
10. The apparatus of claim 7, wherein the deep learning model is a decision tree model;
the device further comprises:
and the training unit is configured to input the training samples into the decision tree model so as to train the decision tree model to obtain a trained decision tree model.
11. The apparatus of claim 10, wherein the apparatus further comprises:
a prediction unit configured to obtain a prediction result of the trained decision tree model;
an effect determination unit configured to determine a predicted effect value of the trained decision tree model based on the prediction result and the real data;
and the unit to be deployed is configured to take the trained decision tree model as an applicable model to be deployed if the predicted effect value is greater than a preset effect threshold value.
12. The apparatus of claim 7, wherein the value of the first indicator is derived based on at least one of: positive sample rate, information value, evidence weight.
13. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
14. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1-6.
CN202011037919.XA 2020-09-28 2020-09-28 Method and device for processing user characteristic information Pending CN112149807A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011037919.XA CN112149807A (en) 2020-09-28 2020-09-28 Method and device for processing user characteristic information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011037919.XA CN112149807A (en) 2020-09-28 2020-09-28 Method and device for processing user characteristic information

Publications (1)

Publication Number Publication Date
CN112149807A true CN112149807A (en) 2020-12-29

Family

ID=73895588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011037919.XA Pending CN112149807A (en) 2020-09-28 2020-09-28 Method and device for processing user characteristic information

Country Status (1)

Country Link
CN (1) CN112149807A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017219548A1 (en) * 2016-06-20 2017-12-28 乐视控股(北京)有限公司 Method and device for predicting user attributes
CN109285075A (en) * 2017-07-19 2019-01-29 腾讯科技(深圳)有限公司 A kind of Claims Resolution methods of risk assessment, device and server
WO2019114422A1 (en) * 2017-12-15 2019-06-20 阿里巴巴集团控股有限公司 Model integration method and apparatus
WO2019237657A1 (en) * 2018-06-15 2019-12-19 北京字节跳动网络技术有限公司 Method and device for generating model
CN111225009A (en) * 2018-11-27 2020-06-02 北京沃东天骏信息技术有限公司 Method and apparatus for generating information
CN111428008A (en) * 2020-06-11 2020-07-17 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for training a model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
乔雨; 李玲娟: "Collaborative Filtering Algorithm Fusing User Similarity and Rating Information", Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), no. 03 *
李媛媛; 李旭晖: "Research on Modeling Users' Dynamic Interests by Combining Ontology and Social Tags", Journal of the China Society for Scientific and Technical Information, no. 04 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination