CN115131025A - User type identification method and device, computer equipment and storage medium - Google Patents

User type identification method and device, computer equipment and storage medium

Info

Publication number
CN115131025A
Authority
CN
China
Prior art keywords
sample
user
target
space
sample space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110336507.4A
Other languages
Chinese (zh)
Inventor
杨志欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110336507.4A
Publication of CN115131025A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G06Q20/405 Establishing or using transaction specific rules
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Engineering & Computer Science (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Computer Security & Cryptography (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the application discloses a user type identification method, a user type identification device, computer equipment and a storage medium; the method comprises the steps of generating a sample space of a sample user set, wherein the sample user set comprises at least one sample user, and the sample space comprises user characteristics of each sample user in at least one characteristic dimension; determining a target user set from the sample user set, the target user set comprising at least one target sample user; based on the distribution of the target user set in each feature dimension, carrying out space removal processing on the sample space to obtain a target sample space, wherein the distribution information of the sample user set in the target sample space meets a preset distribution condition; determining a user type identification rule based on the characteristic value range of each characteristic dimension in the target sample space; and identifying the user type of the target user based on the user type identification rule. The scheme can effectively generate and apply the user type identification rule, and improves the accuracy and efficiency of user type identification.

Description

User type identification method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a user type identification method, apparatus, computer device, and storage medium.
Background
Accurately identifying the type of user is beneficial for applications to provide better quality of service. For example, in the financial field, identifying the user type of a target user is an important link in risk control in the process of providing a service by an application; as another example, in the social domain, identifying a user type of a target user facilitates an application to provide high quality service content to the target user on a targeted basis, and so forth.
In researching and practicing the related art, the inventors of the present application found that user types are currently identified by exploring user type identification rules, for example, by a grid method or a machine learning method. These approaches, however, consume considerable resources, such as computing resources, time, and sample data, so the way user types are identified still needs to be improved.
Disclosure of Invention
The embodiment of the application provides a user type identification method, a user type identification device, computer equipment and a storage medium, which can effectively generate and apply a user type identification rule and improve the accuracy and efficiency of user type identification.
The embodiment of the application provides a user type identification method, which comprises the following steps:
generating a sample space of a sample user set, wherein the sample user set comprises at least one sample user, the sample space comprising user features of each of the sample users in at least one feature dimension;
determining a target user set from the sample user set, wherein the target user set comprises at least one target sample user;
based on the distribution of the target user set in each feature dimension, performing space removal processing on the sample space to obtain a target sample space, wherein the distribution information of the sample user set in the target sample space meets a preset distribution condition;
determining a user type identification rule based on the characteristic value range of each characteristic dimension in the target sample space;
and identifying the user type of the target user based on the user type identification rule.
Correspondingly, an embodiment of the present application further provides a user type identification apparatus, including:
a generating unit, configured to generate a sample space of a sample user set, where the sample user set includes at least one sample user, and the sample space includes user features of each of the sample users in at least one feature dimension;
a target determination unit for determining a target user set from the sample user set, wherein the target user set comprises at least one target sample user;
a removing unit, configured to perform spatial removal processing on the sample space based on distribution of the target user set in each feature dimension to obtain a target sample space, where distribution information of the sample user set in the target sample space meets a preset distribution condition;
a rule determining unit, configured to determine a user type identification rule based on a feature value range of each feature dimension in the target sample space;
and the identification unit is used for identifying the user type of the target user based on the user type identification rule.
In one embodiment, the removing unit includes:
the space removal subunit is configured to perform space removal processing on the sample space based on the distribution of the target user set in each feature dimension, so as to obtain a removed sample space;
an information obtaining subunit, configured to obtain distribution information of the sample user set in the removed sample space;
and the space determining subunit is configured to, if the distribution information satisfies the preset distribution condition, use the removed sample space as a target sample space.
In an embodiment, the removing unit further includes:
a target updating subunit, configured to, if the distribution information does not satisfy the preset distribution condition, update the sample space to the removed sample space and return to the step of performing the space removal processing on the sample space based on the distribution of the target user set in each feature dimension.
In an embodiment, the space removal subunit is configured to:
determine, based on the distribution of the sample user set in each feature dimension, a subspace to be removed corresponding to each feature dimension, wherein the subspace to be removed comprises each feature dimension; determine a target removal subspace from the subspaces to be removed based on the distribution of the target user set in each subspace to be removed; and perform space removal processing on the sample space with respect to the target removal subspace to obtain the removed sample space.
In an embodiment, the space removal subunit is specifically configured to:
sort the sample users in the sample user set based on the distribution of the sample user set in each feature dimension to obtain a sorting result corresponding to each feature dimension; select, according to the sorting result, the users to be removed for each feature dimension from the sample user set; and determine the subspace to be removed corresponding to each feature dimension based on the users to be removed.
In an embodiment, the space removal subunit is specifically configured to:
calculate, based on the distribution of the target user set in each subspace to be removed, a space removal contribution degree of each subspace to be removed with respect to the sample space, wherein the space removal contribution degree characterizes the distribution of the target user set in the removed sample space obtained after the subspace to be removed is removed from the sample space; and determine a target removal subspace from the subspaces to be removed according to the space removal contribution degree.
In an embodiment, the space removal subunit is specifically configured to:
obtain attribute information of the target removal subspace in at least one feature dimension; and perform space removal processing on the sample space according to the attribute information to obtain the removed sample space.
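The removal loop described in the embodiments above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application. The names `peel_once` and `peel`, the removal fraction `alpha`, and the parameters `min_concentration` and `min_size` are hypothetical. The space removal contribution degree is assumed here to be the share of target sample users among the users that remain after a candidate removal, and the preset distribution condition is assumed to be a lower bound on that share.

```python
import numpy as np

def peel_once(X, is_target, alpha=0.1):
    """One space-removal step over a sample space X (n users x d feature dimensions).

    Returns a boolean mask of the users kept and the resulting target concentration.
    """
    n, d = X.shape
    k = max(1, int(alpha * n))                     # users removed per candidate (assumed fraction)
    best_keep, best_score = None, -1.0
    for j in range(d):
        order = np.argsort(X[:, j], kind="stable")  # sort sample users along dimension j
        # Candidate subspaces to be removed: the low end or the high end of dimension j.
        for removed in (order[:k], order[-k:]):
            keep = np.ones(n, dtype=bool)
            keep[removed] = False
            if not keep.any():
                continue
            # Assumed contribution degree: target concentration after the removal.
            score = is_target[keep].mean()
            if score > best_score:
                best_keep, best_score = keep, score
    return best_keep, best_score

def peel(X, is_target, alpha=0.1, min_concentration=0.8, min_size=2):
    """Repeat removal until the assumed distribution condition holds.

    Returns the indices of the sample users left in the target sample space.
    """
    idx = np.arange(len(X))
    while len(idx) > min_size and is_target[idx].mean() < min_concentration:
        keep, _ = peel_once(X[idx], is_target[idx], alpha)
        idx = idx[keep]                            # the removed sample space becomes the new sample space
    return idx
```

In this sketch a subspace is removed by dropping the users at one end of a sorted dimension, so tied feature values can be split across the boundary; a threshold on the feature value could be used instead if whole value ranges must be removed.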
In one embodiment, the rule determining unit includes:
a rule determining subunit, configured to determine, based on a feature value range of each feature dimension in the target sample space, a feature identification rule corresponding to each feature dimension;
and the rule combination subunit is used for combining the characteristic identification rules to obtain the combined user type identification rule.
In one embodiment, the identification unit includes:
a feature obtaining subunit, configured to obtain a user feature of the target user in at least one feature dimension;
the feature identification subunit is used for carrying out feature identification on the user features based on the user type identification rule to obtain an identification result;
and the type determining subunit is used for determining the user type of the target user based on the identification result.
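As a rough illustration of the rule determining unit and the identification unit described above, the sketch below derives one value-range rule per feature dimension from the users left in a target sample space, combines the rules, and applies the combined rule to a target user. The names `derive_rule`, `matches`, and `identify`, the dictionary representation of a user, and the feature names in the usage note are hypothetical. Combining the per-dimension rules with a logical AND is an assumption; the text above only states that the feature identification rules are combined.

```python
from typing import Dict, List, Tuple

Rule = Dict[str, Tuple[float, float]]  # feature dimension -> (low, high) value range

def derive_rule(target_space: List[Dict[str, float]]) -> Rule:
    """Build one range rule per feature dimension from the target sample space."""
    if not target_space:
        raise ValueError("target sample space must contain at least one user")
    features = list(target_space[0].keys())
    return {
        f: (min(u[f] for u in target_space), max(u[f] for u in target_space))
        for f in features
    }

def matches(rule: Rule, user: Dict[str, float]) -> bool:
    """Assumed combination: a user matches only if every per-dimension rule holds."""
    return all(low <= user[f] <= high for f, (low, high) in rule.items())

def identify(rule: Rule, user: Dict[str, float],
             hit_type: str = "target", miss_type: str = "non-target") -> str:
    """Map the recognition result to a user type."""
    return hit_type if matches(rule, user) else miss_type
```

For example, if the target sample space holds two users with `loans_per_year` of 8 and 12, the derived rule for that dimension is the range (8, 12), and a target user with `loans_per_year` of 10 satisfies it.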
In one embodiment, the target determination unit includes:
the label obtaining subunit is used for obtaining the real label of each sample user in the sample user set;
a user determination subunit, configured to determine a target sample user from the sample user set according to the real tag;
and the target determining subunit is used for determining a target user set based on the target sample users.
Accordingly, the present application further provides a storage medium having a computer program stored thereon, where the computer program is executed by a processor to implement the steps of the user type identification method as shown in the present application.
Accordingly, the embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the user type identification method according to the embodiment of the present application.
The embodiment of the application can generate a sample space of a sample user set, wherein the sample user set comprises at least one sample user, and the sample space comprises user features of each sample user in at least one feature dimension; determining a target user set from the sample user set, wherein the target user set comprises at least one target sample user; based on the distribution of the target user set in each feature dimension, performing space removal processing on the sample space to obtain a target sample space, wherein the distribution information of the sample user set in the target sample space meets a preset distribution condition; determining a user type identification rule based on the characteristic value range of each characteristic dimension in the target sample space; and identifying the user type of the target user based on the user type identification rule.
In the present application, space removal processing is performed on the high-dimensional sample space so that the concentration of target sample users in the resulting target sample space is far greater than in the sample space as a whole. By focusing on, searching for, and combining such high-concentration spaces, the user type identification rule is obtained and can then be applied to identify user types. Rather than exhaustively enumerating all possible combinations, the method deliberately targets high-concentration local spaces in the sample space and generates the user type identification rule from them, which greatly saves computing resources and time. In addition, the user type identification rule is generated by locating, within the given sample data, local spaces with a high concentration of target sample users, without requiring a large amount of sample data to train a global machine learning model, so the user type of a target user can be identified efficiently and accurately by applying the rule.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a scenario of a user type identification method according to an embodiment of the present application;
FIG. 2 is a flowchart of a user type identification method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a sample space of a user type identification method according to an embodiment of the present application;
FIG. 4 is another schematic flowchart of a user type identification method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another sample space of a user type identification method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of another sample space of a user type identification method according to an embodiment of the present application;
FIG. 7 is another schematic flowchart of a user type identification method according to an embodiment of the present application;
FIG. 8 is another schematic flowchart of a user type identification method according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a user type identification apparatus according to an embodiment of the present application;
FIG. 10 is another schematic structural diagram of a user type identification apparatus according to an embodiment of the present application;
FIG. 11 is another schematic structural diagram of a user type identification apparatus according to an embodiment of the present application;
FIG. 12 is another schematic structural diagram of a user type identification apparatus according to an embodiment of the present application;
FIG. 13 is another schematic structural diagram of a user type identification apparatus according to an embodiment of the present application;
FIG. 14 is another schematic structural diagram of a user type identification apparatus according to an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a computer device according to an embodiment of the present application;
FIG. 16 is a schematic structural diagram of a blockchain system according to an embodiment of the present application;
FIG. 17 is another schematic structural diagram of a blockchain system according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a user type identification method and device. Specifically, the embodiment of the application provides a user type identification device suitable for computer equipment. The computer device may be a terminal or a server, and the terminal may be a mobile phone, a tablet computer, a notebook computer, a vehicle-mounted computer, and the like. The server may be a single server or a server cluster composed of a plurality of servers.
In the embodiments of the present application, the user type identification method is introduced by taking as an example the case where the user type identification apparatus is integrated in a server.
In particular, the server may generate a sample space of a sample user set, wherein the sample user set includes at least one sample user, the sample space including user features of each sample user in at least one feature dimension; determining a target user set from the sample user set, wherein the target user set comprises at least one target sample user; based on the distribution of the target user set in each feature dimension, carrying out space removal processing on the sample space to obtain a target sample space, wherein the distribution information of the sample user set in the target sample space meets a preset distribution condition; determining a user type identification rule based on the characteristic value range of each characteristic dimension in the target sample space; and identifying the user type of the target user based on the user type identification rule.
In an embodiment, referring to fig. 1, a user type identification system provided in an embodiment of the present application may include a server 10, a terminal 20, and the like; the server 10 and the terminal 20 may be connected via a network, such as a wired or wireless network connection.
The terminal 20 may run related applications, such as financial applications and social applications. The terminal 20 may send a sample user set to the server 10, wherein the sample user set includes at least one sample user, and each sample user has user features in at least one feature dimension.
The server 10 may obtain the sample user set and generate a sample space of the sample user set. Further, the server 10 may determine a target user set from the sample user set based on the real labels of the sample users, wherein the target user set includes at least one target sample user. Further, the server 10 may perform space removal processing on the sample space based on the distribution of the target user set in each feature dimension to obtain a target sample space, wherein the distribution information of the sample user set in the target sample space meets a preset distribution condition, and may determine a user type identification rule based on the feature value range of each feature dimension in the target sample space. The server 10 may also obtain, from the terminal 20, a target user whose user type is to be identified, and identify the user type of the target user based on the user type identification rule.
Each of these is described in detail below. Note that the order in which the embodiments are described does not imply any preferred order among them.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use that knowledge to obtain the best result. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, and the like.
The user type identification method provided by the embodiments of the present application may be executed by a terminal, by a server, or by the terminal and the server together. The embodiments of the present application take as an example the case where the method is executed by a server, specifically by a user type identification apparatus integrated in the server. As shown in FIG. 2, the specific flow of the user type identification method may be as follows:
101. A sample space of a sample user set is generated, wherein the sample user set includes at least one sample user, and the sample space includes user features of each sample user in at least one feature dimension.
A sample is the part of the individuals that is actually observed or surveyed, while the population is the entirety of the objects under study. Accordingly, sample users are the users that are observed or surveyed, and the sample user set is the set of sample users.
For example, a financial application usually needs to assess a user's qualifications to determine what risk level of service to provide. A good user may be offered a higher-risk financial service, such as a loan, while a bad user may be offered a lower-risk financial service or no risk-bearing financial service at all. The financial application may therefore select some of its users as sample users to obtain a sample user set, generate a user type identification rule according to the user type identification method described in the present application, and apply the rule to identify the user type of a target user, for example, to identify whether the target user belongs to a good user type or a bad user type.
As another example, a social application usually needs to push content that may interest a user, such as advertising content, based on the user's behavioral characteristics and usage preferences, so as to provide personalized service. The social application may therefore select some of its users as sample users to obtain a sample user set, generate a user type identification rule according to the user type identification method described in the present application, and apply the rule to identify the user type of a target user, for example, to identify whether the target user belongs to the target audience type of an advertisement (i.e., a target promotion user of the advertisement) or the non-audience type of the advertisement (i.e., a non-target promotion user of the advertisement), and so on.
User features are descriptive information about a user. For example, they may include information describing the user's basic attributes, preferences, living habits, behaviors, and the like. As an example, for a financial application, the user features of interest may include the user's basic attribute information, such as age, gender, and city; the user's financial behavior information, such as loan behavior information and consumption behavior information; and so on.
A feature dimension is the feature type to which a user feature belongs; for example, user features of different types may be regarded as belonging to different feature dimensions. As an example, suppose that in a financial application each sample user has the following ten user features: gender, age, city, education level, purchasing preference, risk preference, number of loans in a year, total loan amount in a year, multi-head loan behavior (borrowing from multiple lenders at once), and high-risk device behavior. The sample space constructed by the financial application may then include these ten feature dimensions.
Note that the present application does not limit how feature dimensions are set: user features of different types may be regarded as belonging to different feature dimensions, or user features of different types may be processed to produce more or fewer feature dimensions, as determined by the business scenario.
The sample space is formed by at least one feature dimension, and the sample users are the elements of the sample space. Note that a sample user's user features in the feature dimensions determine that user's position in the sample space, and each sample user may also have a specific value. As an example, referring to FIG. 3, a sample space is defined by two feature dimensions, height and weight, and contains 10 sample users. Each sample user's features in the two dimensions determine the user's position in the sample space, and each sample user has a specific value, which in FIG. 3 is either male or female.
In an embodiment, the server may obtain a sample user set that includes at least one sample user, each sample user having user features in at least one feature dimension. The server may then generate a sample space of the sample user set based on the user features of each sample user. Specifically, the generated sample space is formed by at least one feature dimension and contains the sample users in the sample user set; each sample user's position in the sample space is determined by that user's features in the feature dimensions, and each sample user may also have a specific value.
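One way to picture step 101 is to represent the sample space as a matrix with one row per sample user and one column per feature dimension, plus a parallel array holding each user's value. The sketch below follows the height and weight example of FIG. 3. The function name `build_sample_space`, the dictionary layout of a sample user, and the numbers in the usage note are all hypothetical and serve only as an illustration.

```python
import numpy as np

def build_sample_space(sample_users, feature_dims):
    """Place each sample user at the point given by its features in each dimension.

    Returns (X, values): X has shape (number of users, number of feature dimensions),
    and values[i] is the specific value carried by the i-th sample user.
    """
    if not sample_users:
        raise ValueError("the sample user set must contain at least one sample user")
    X = np.array(
        [[user["features"][dim] for dim in feature_dims] for user in sample_users],
        dtype=float,
    )
    values = np.array([user["value"] for user in sample_users])
    return X, values
```

Calling it with three made-up users, such as `{"features": {"height": 175, "weight": 70}, "value": "male"}`, and the dimensions `["height", "weight"]` yields a 3 x 2 matrix whose rows are the users' positions in the sample space.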
102. A target user set is determined from the sample user set, wherein the target user set includes at least one target sample user.
In the present application, target sample users are the sample users that need attention when generating the user type identification rule. For example, in a financial application, sample users can be divided into two classes. One class is users who are strictly rejected and are not provided with risk-bearing services (hereinafter "strictly rejected users"), for example, users with serious long-term loans or users with high-risk device behavior; the other class is the remaining users, who are not strictly rejected. Therefore, when an identification rule for strictly rejected users is generated, the target sample users are the strictly rejected users among the sample users.
For another example, when content promotion is performed for a social application, for example, advertisement promotion, sample users may be classified into two types, one type being targeted promotion users of advertisements, and the other type being non-targeted promotion users of advertisements. Therefore, when the identification rules of the target promotion users of the advertisement are generated, the target sample users are the target promotion users in the sample users.
Accordingly, the target user set is a set of target sample users, and thus, at least one target sample user is included in the target user set.
The target user set may be determined from the sample user set in various ways. For example, target sample users may be determined from the sample user set based on the real label of each sample user, and the target user set is then obtained. Specifically, the step of "determining a target user set from the sample user set, wherein the target user set includes at least one target sample user" may include:
acquiring real labels of all sample users in a sample user set;
determining target sample users from the sample user set according to the real labels;
based on the target sample users, a set of target users is determined.
In addition to the user features, each sample user has a corresponding real label, which is the real marking information of the sample user. For example, in a financial application, the real label of a sample user may be: strictly rejected user or non-strictly-rejected user; as another example, in a social application, the real label of a sample user may be: target promotion user of the advertisement or non-target promotion user of the advertisement; and so on.
The real label of the sample user can be generated in various ways, for example, the real label of the sample user can be determined by means of manual labeling; for another example, the user characteristics of the sample user may be processed, and the real label of the sample user may be determined based on the processing result; and so on.
Correspondingly, the manner of obtaining the real label of the sample user can be various, for example, the real label of the sample user can be obtained by obtaining the manual labeling result of the sample user; for another example, the real label of the sample user may be obtained by obtaining the feature processing result of the sample user; and so on.
In an embodiment, the server may obtain the real labels of the sample users in the sample user set, and determine the target sample users from the sample user set according to those real labels and the service requirement. For example, in a financial application, a sample user whose real label is "strictly rejected user" may be selected as a target sample user; for another example, in a social application, a sample user whose real label is "target promotion user" may be selected as a target sample user; and so on.
Since the target user set is a set formed by the target sample users, the target user set can be further generated after the target sample users are determined from the sample user set.
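The selection of target sample users by real label can be sketched as follows. This is a minimal illustration only: the function name `select_target_users`, the user IDs, and the encoding of the real label as 1 (target sample user, e.g. a strictly rejected user) and 0 (any other user) are assumptions made for the example.

```python
from typing import Dict, List


def select_target_users(labels: Dict[str, int], target_label: int = 1) -> List[str]:
    """Return the IDs of the sample users whose real label equals target_label."""
    return [user_id for user_id, y in labels.items() if y == target_label]


# Hypothetical real labels: 1 = strictly rejected user, 0 = any other user.
labels = {"u1": 0, "u2": 1, "u3": 0, "u4": 1}
target_user_set = select_target_users(labels)
print(target_user_set)  # ['u2', 'u4']
```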
103. Space removal processing is performed on the sample space based on the distribution of the target user set in each feature dimension to obtain a target sample space, wherein the distribution information of the sample user set in the target sample space meets a preset distribution condition.
The distribution of the target user set in each feature dimension may be determined in various ways. Taking one feature dimension as an example, the distribution may be determined by analyzing the number distribution of the target user set in that feature dimension, for example, the number of target sample users falling in different value ranges of the feature dimension.
Here, space removal processing is a processing means for reducing the sample space. It may be implemented in various ways: for example, the sample space may be gradually reduced by gradually screening out sample users in the sample space; as another example, the sample space may be reduced by reducing the number of feature dimensions contained in the sample space; for another example, the sample space may be reduced by narrowing the value range of a feature dimension of the sample space; and so on.
The distribution information of the sample user set in the target sample space is related information describing the distribution of the sample user set in the target sample space, and it should be noted that the sample user set here refers to a set formed by all initial sample users. The distribution information of the sample user set in the target sample space may include various forms, for example, the sample capacity of the sample user set in the target sample space may be used as the distribution information; for another example, the aggregation of the sample user set on each feature dimension of the target sample space may be analyzed to determine distribution information; as another example, the distribution density of the sample users in the target sample space may be analyzed to determine distribution information; and so on.
In the present application, space removal processing is performed on the sample space to obtain a target sample space with a higher target sample user concentration, from which the user type identification rule for judging whether a user belongs to the user type of the target sample users is generated. Therefore, how to perform space removal processing on the sample space can be determined by analyzing the distribution of the target user set in each feature dimension, so as to obtain the target sample space required for generating the user type identification rule.
In an embodiment, to ensure that the finally obtained target sample space is statistically meaningful, after space removal processing is performed on the sample space to obtain a removed sample space, whether the removed sample space is the required target sample space may be determined by checking the distribution information of the sample user set in the removed sample space. Specifically, the step "performing space removal processing on the sample space based on the distribution of the target user set in each feature dimension to obtain the target sample space" may include:
based on the distribution of the target user set under each characteristic dimension, carrying out space removal processing on the sample space to obtain a removed sample space;
acquiring distribution information of the sample user set in the removed sample space;
and if the distribution information meets the preset distribution condition, taking the removed sample space as a target sample space.
It should be noted that, in practical applications, the target sample space may not be obtained in a single pass of space removal processing; that is, it may be obtained by performing space removal processing on the sample space multiple times. For example, an iterative mechanism may be designed to iteratively perform space removal processing on the sample space to obtain the target sample space. Specifically, the step "performing space removal processing on the sample space based on the distribution of the target user set in each feature dimension to obtain the target sample space" may further include:
if the distribution information does not meet the preset distribution condition, updating the sample space to the removed sample space, and returning to the step of performing space removal processing on the sample space based on the distribution of the target user set in each feature dimension.
Specifically, referring to fig. 4, the server may perform spatial removal processing on the sample space based on the distribution of the target user set in each feature dimension, so as to obtain a removed sample space. Further, the server can obtain the distribution information of the sample user set in the removed sample space, and if the distribution information meets the preset distribution condition, the removed sample space is used as a target sample space, so that a target sample space is obtained; otherwise, updating the sample space into the removed sample space, and returning to execute the step of performing space removal processing on the sample space based on the distribution of the target user set under each characteristic dimension.
The following explains the step of performing spatial removal processing on the sample space based on the distribution of the target user set in each feature dimension to obtain a removed sample space.
Based on the distribution of the target user set in each feature dimension, there are various ways of performing spatial removal processing on the sample space, for example, in each iteration, a target removal subspace may be selected from a plurality of candidate subspaces to be removed in the sample space, and the target removal subspace is removed from the sample space, so as to obtain a removed sample space. Specifically, the step "performing spatial removal processing on the sample space based on the distribution of the target user set in each feature dimension to obtain a removed sample space" may include:
determining a subspace to be removed corresponding to each characteristic dimension based on the distribution of the sample user set under each characteristic dimension, wherein the subspace to be removed comprises each characteristic dimension;
determining a target removal subspace from each subspace to be removed based on the distribution of the target user set in each subspace to be removed;
and carrying out space removal processing on the sample space aiming at the target removal subspace to obtain a removed sample space.
For example, each feature dimension of the sample space may be analyzed, and the subspace to be removed corresponding to each feature dimension is determined based on the distribution, such as the number distribution, of the sample user set in each feature dimension, so as to obtain a plurality of candidate subspaces to be removed of the sample space. Specifically, the step "determining a subspace to be removed corresponding to each feature dimension based on the distribution of the sample user set in each feature dimension, where the subspace to be removed includes each feature dimension", may include:
based on the distribution of the sample user set under each characteristic dimension, sequencing the sample users in the sample user set to obtain a sequencing result corresponding to each characteristic dimension;
selecting users to be removed of each characteristic dimension from the sample user set according to the sorting result;
and determining the subspace to be removed corresponding to each characteristic dimension based on the user to be removed.
In an embodiment, the distribution of the sample user set in each feature dimension may be determined by analyzing user feature values of the sample users in the sample user set in each feature dimension, specifically, since the sample users may have a specific user feature value in each feature dimension, for example, the sample space in fig. 3 includes two feature dimensions: height and weight, taking sample user 1001 as an example, the sample user has a user characteristic value of 180 in the height characteristic dimension and 160 in the weight characteristic dimension, and similarly, each sample user may have a particular user characteristic value in a different characteristic dimension. Therefore, the sample users in the sample user set can be sorted according to the user characteristic value of each sample user in the sample user set on each characteristic dimension, and a sorting result corresponding to each characteristic dimension is obtained.
For example, taking the feature dimension j in the sample space as an example, the sample users in the sample user set may be sorted according to the feature value of the sample user in the sample user set on the feature dimension j, so as to obtain a sorting result corresponding to the feature dimension j. Similarly, the sample users in the sample user set can be sorted based on the feature values of the sample users in the sample user set in each feature dimension, so that the sorting results corresponding to each feature dimension are obtained.
Further, the users to be removed of each feature dimension can be selected from the sample user set according to the sorting result corresponding to each feature dimension. For example, the following explains a characteristic dimension j in the sample space as an example.
In the $m$-th iteration, the sample space is $B_m$. The sample users in the sample user set $x$ can be sorted according to their feature values in feature dimension $j$ to obtain the sorting result $x_j$ corresponding to feature dimension $j$. The users to be removed of the sample user set $x$ in the sample space $B_m$ can then be selected according to the sorting result $x_j$.
For example, the $\alpha$ quantile $x_{jm(\alpha)}$ and the $(1-\alpha)$ quantile $x_{jm(1-\alpha)}$ of the sample user set $x$ in the sample space $B_m$ may be determined according to the sorting result $x_j$. Here $\alpha$ is a hyperparameter representing the proportion of samples removed each time. In practical applications, a relatively small value, for example 0.05 to 0.1, may be selected, which has the advantage that no single local adjustment has a significant effect on the final result. Quantiles (also called fractiles) are the numerical points that divide the probability distribution range of a random variable into several equal parts; commonly used ones are the median (i.e., the 2-quantile), quartiles, percentiles, and the like.
Further, the users to be removed of feature dimension $j$ can be selected from the sample user set $x$ according to the sorting result $x_j$. Specifically, the sample users in $x_j$ whose values are at or below the $\alpha$ quantile $x_{jm(\alpha)}$, or the sample users in $x_j$ whose values are at or above the $(1-\alpha)$ quantile $x_{jm(1-\alpha)}$, are the users to be removed of feature dimension $j$. Therefore, based on the selected users to be removed, the subspaces to be removed corresponding to feature dimension $j$ can be determined as $b_{mj-}$ and $b_{mj+}$, where $b_{mj-} = \{x \mid x_j \le x_{jm(\alpha)}\}$ and $b_{mj+} = \{x \mid x_j \ge x_{jm(1-\alpha)}\}$.
Similar processing may be performed on the other feature dimensions to determine the subspaces to be removed corresponding to each feature dimension, so as to obtain the set of subspaces to be removed $C(b_m) = \{b_{m1-}, b_{m1+}, b_{m2-}, b_{m2+}, \ldots, b_{mp-}, b_{mp+}\}$, where $p$ is the number of feature dimensions of the sample space $B_m$.
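The construction of the candidate subspaces to be removed can be sketched as follows. This is a minimal illustration under stated assumptions: each candidate is represented only by a threshold on one feature dimension, the quantile is taken as a simple order statistic (the k-th smallest or k-th largest value, with k = max(1, int(alpha * n))), and the names `candidate_subspaces`, `height` and `weight` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]


def candidate_subspaces(
    samples: List[Features], dims: List[str], alpha: float
) -> Dict[Tuple[str, str], float]:
    """Return {(dimension, side): threshold} for every candidate subspace.

    Side "-" stands for {x | x_j <= threshold} (low end of dimension j),
    side "+" stands for {x | x_j >= threshold} (high end of dimension j).
    """
    n = len(samples)
    k = max(1, int(alpha * n))  # assumed convention: peel about alpha * n users
    candidates: Dict[Tuple[str, str], float] = {}
    for j in dims:
        values = sorted(s[j] for s in samples)  # sorting result for dimension j
        candidates[(j, "-")] = values[k - 1]    # k-th smallest value
        candidates[(j, "+")] = values[n - k]    # k-th largest value
    return candidates


# Ten hypothetical sample users in a two-dimensional sample space.
samples = [{"height": 150 + 5 * i, "weight": 50 + 10 * i} for i in range(10)]
cands = candidate_subspaces(samples, ["height", "weight"], alpha=0.1)
print(cands)
```

With p feature dimensions the function returns 2p candidates, matching the size of the set C(b_m) described above.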
Further, the target removal subspace may be determined from the plurality of candidate subspaces to be removed in a plurality of manners. For example, each subspace to be removed may be evaluated based on the distribution, such as the number distribution, of the target user set in that subspace, so as to select the target removal subspace such that the concentration of the target user set in the removed sample space is maximal after the target removal subspace is removed from the sample space. Specifically, the step "determining a target removal subspace from each subspace to be removed based on the distribution of the target user set in each subspace to be removed" may include:
calculating the space removal contribution degree of each subspace to be removed to the sample space based on the distribution of the target user set in each subspace to be removed, wherein the space removal contribution degree represents the distribution characteristics of the target user set in the removed sample space after the subspace to be removed is removed from the sample space;
and determining a target removal subspace from the subspace to be removed according to the space removal contribution degree.
The distribution characteristics of the target user set in the removed sample space describe how the target user set is distributed in the removed sample space and can take various forms. For example, the number of target sample users in the removed sample space can be used as the distribution characteristic; for another example, the target sample user concentration of the target user set in the removed sample space may be calculated and used as the distribution characteristic; and so on.
Specifically, the target sample user concentration of the target user set in the sample space $B_m$ may be calculated with reference to the following equation:
$$f(y) = \frac{1}{n_m} \sum_{x_i \in B_m} y_i$$
where $n_m$ represents the number of sample users in the sample space $B_m$, $y_i$ is the real label of sample user $i$, and $y_i \in \{0, 1\}$: $y_i = 1$ indicates that the sample user is a target sample user, and $y_i = 0$ indicates that the sample user is not a target sample user. Therefore, $f(y)$ can be used to represent the target sample user concentration of the target user set in the sample space $B_m$. Similarly, the target sample user concentration of the target user set in the removed sample space can be calculated.
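As a small worked example, the concentration can be computed as the fraction of sample users inside the sample space whose real label is 1. This reading (the mean of the labels y_i over the n_m users in the space) is an assumption consistent with the definitions of n_m and y_i given above; the function name and the label values are hypothetical.

```python
from typing import List


def concentration(labels_in_space: List[int]) -> float:
    """Fraction of sample users in the space that are target sample users.

    labels_in_space holds the real label y_i (0 or 1) of every sample user
    currently inside the sample space, so its length is n_m.
    """
    n_m = len(labels_in_space)
    if n_m == 0:
        return 0.0  # convention chosen for this sketch: empty space has concentration 0
    return sum(labels_in_space) / n_m


# Five hypothetical sample users, three of which are target sample users.
print(concentration([1, 0, 0, 1, 1]))  # 0.6
```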
The contribution degree of the subspace to be removed to the space removal of the sample space represents the distribution characteristics of the target user set in the removed sample space after the subspace to be removed is removed from the sample space.
In an embodiment, when calculating the space removal contribution degree of a subspace to be removed to the sample space, the target sample user concentration of the target user set in the removed sample space may be used as the space removal contribution degree. Specifically, after space removal processing is performed once on the sample space $B_m$, a removed sample space $B_{m+1}$ can be obtained, that is,
$$B_{m+1} = B_m \setminus b_m^*$$
where
$$b_m^* = \arg\max_{b_m \in C(b_m)} f(y \mid x \in B_m \setminus b_m)$$
That is, a target removal subspace $b_m^*$ is selected from the plurality of subspaces to be removed in $C(b_m)$ such that the target sample user concentration in the removed sample space is maximal after the target removal subspace is removed.
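The selection of the target removal subspace can be sketched as follows, assuming each candidate subspace is a one-sided threshold on a single feature dimension and that the space removal contribution degree is the target sample user concentration in the space that remains. The names `pick_target_removal`, `in_subspace` and `score`, and the sample data, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[Dict[str, float], int]   # (user features, real label y_i)
Candidate = Tuple[str, str, float]      # (dimension, "-" or "+", threshold)


def in_subspace(x: Dict[str, float], cand: Candidate) -> bool:
    """True if the user lies in the candidate subspace to be removed."""
    dim, side, threshold = cand
    return x[dim] <= threshold if side == "-" else x[dim] >= threshold


def concentration(space: List[Sample]) -> float:
    """Fraction of target sample users (label 1) among the users in the space."""
    return sum(y for _, y in space) / len(space) if space else 0.0


def pick_target_removal(
    space: List[Sample], candidates: List[Candidate]
) -> Tuple[Optional[Candidate], float]:
    """Return the candidate whose removal leaves the highest concentration."""
    best, best_score = None, -1.0
    for cand in candidates:
        remaining = [(x, y) for x, y in space if not in_subspace(x, cand)]
        contribution = concentration(remaining)  # space removal contribution degree
        if contribution > best_score:
            best, best_score = cand, contribution
    return best, best_score


# Six hypothetical users on one dimension; users with score >= 3 are targets.
space = [({"score": float(v)}, 1 if v >= 3 else 0) for v in range(1, 7)]
candidates = [("score", "-", 1.0), ("score", "+", 6.0)]
print(pick_target_removal(space, candidates))  # (('score', '-', 1.0), 0.8)
```

Removing the low end leaves four target users out of five (0.8), while removing the high end leaves three out of five (0.6), so the low-end subspace is chosen.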
Further, after the target removal subspace is determined, the sample space can be subjected to space removal processing aiming at the target removal subspace, and a removed sample space is obtained. For the target removal subspace, there may be multiple ways of performing spatial removal processing on the sample space, for example, because the target removal subspace is also composed of at least one feature dimension, the range of the target subspace to be removed may be determined according to attribute information of the target removal subspace in the at least one feature dimension, for example, a feature value range, and then the range is removed from the sample space, thereby achieving removal of the target removal subspace from the sample space, specifically, the step "performing spatial removal processing on the sample space for the target removal subspace to obtain a removed sample space" may include:
acquiring attribute information of a target removal subspace on at least one characteristic dimension;
and according to the attribute information, carrying out space removal processing on the sample space to obtain a removed sample space.
The attribute information of the target removal subspace may include, for example, the feature value range of the target removal subspace in at least one feature dimension. The space range occupied by the target removal subspace in the sample space can therefore be determined according to the attribute information, so that space removal processing can be performed on the sample space to obtain the removed sample space.
As an example, referring to fig. 5, in a sample space composed of two feature dimensions, namely a weight feature dimension and a height feature dimension, a target removal subspace is determined to be an area shown by 1002, wherein attribute information of the target removal subspace is: the feature value range of the target removal subspace in the height feature dimension is (182.5, 192.5), and the feature value range in the weight feature dimension is (85, 175), so that the space region occupied by the target removal subspace can be determined in the sample space according to the attribute information of the target removal subspace in at least one feature dimension, and further the sample space can be subjected to space removal processing to obtain a removed sample space.
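Removal by feature value range can be sketched as follows, using the ranges quoted for region 1002. Treating the ranges as open intervals follows the parenthesis notation in the text and is an assumption; the function name `remove_region` and the two sample users other than the (180, 160) user are hypothetical.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]
Region = Dict[str, Tuple[float, float]]  # {dimension: (low, high)}


def remove_region(samples: List[Features], region: Region) -> List[Features]:
    """Drop every sample user lying inside the region on all listed dimensions."""

    def inside(x: Features) -> bool:
        return all(low < x[dim] < high for dim, (low, high) in region.items())

    return [x for x in samples if not inside(x)]


# Attribute information of the target removal subspace (region 1002).
region_1002 = {"height": (182.5, 192.5), "weight": (85.0, 175.0)}

samples = [
    {"height": 180.0, "weight": 160.0},  # outside: height is below 182.5
    {"height": 185.0, "weight": 150.0},  # inside both ranges, so removed
    {"height": 170.0, "weight": 120.0},  # outside
]
removed_space = remove_region(samples, region_1002)
print(len(removed_space))  # 2
```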
It should be noted that space removal processing is performed on the sample space based on the distribution of the target user set in each feature dimension to obtain a target sample space, and that the distribution information of the sample user set in the target sample space should satisfy a preset distribution condition.
Within the step "performing space removal processing on the sample space based on the distribution of the target user set in each feature dimension to obtain the target sample space", the sub-step "performing space removal processing on the sample space based on the distribution of the target user set in each feature dimension to obtain a removed sample space" has been explained above. The following further explains the sub-steps "acquiring distribution information of the sample user set in the removed sample space" and "if the distribution information meets the preset distribution condition, taking the removed sample space as a target sample space".
The distribution information of the sample user set in the target sample space is related information describing the distribution situation of the sample user set in the target sample space, for example, the sample capacity of the sample user set in the target sample space may be used as the distribution information; as another example, the aggregation of the sample user set in each feature dimension of the target sample space may be analyzed to determine distribution information; as another example, the distribution density of the sample users in the target sample space may be analyzed to determine distribution information; and so on.
In order to ensure that the finally obtained target sample space has a statistically useful meaning, the sample space is subjected to space removal processing, and after the removed sample space is obtained, whether the distribution information of the sample user set in the removed sample space meets a preset distribution condition is judged to determine whether the removed sample space is the required target sample space.
In an embodiment, the sample capacity of the sample user set in the target sample space may be used as the distribution information, and specifically, the preset distribution condition may refer to the following settings:
Through continuous iteration, after $k$ rounds of space removal processing are executed (where $k$ is a positive integer), a removed sample space $B_k$ is obtained. If the removed sample space $B_k$ meets the following preset distribution condition, it can be used as the target sample space:
Figure BDA0002997907890000171
In the above equation, the indicator function $I(\cdot)$ takes the value 1 if sample user $x_i$ belongs to the removed sample space $B_k$, and 0 otherwise. $n$ is the initial total number of sample users, i.e., the number of sample users in the whole sample user set. Therefore, $\beta_k$ is the ratio of the number of sample users belonging to the removed sample space $B_k$ to the initial total number of sample users. $\beta_0$ is another hyperparameter, indicating the sample fraction in the target sample space. This parameter should not be too small; otherwise, the result loses statistical significance and becomes unreliable.
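The iterative procedure as a whole can be sketched as follows. Several points are assumptions of this sketch rather than statements of the method: the loop stops once the fraction of the initial sample users still inside the space has fallen to beta0 or below (the direction of the comparison is assumed), about alpha of the remaining users are peeled from one end of one dimension per round, and the concentration is the mean of the 0/1 real labels. The names `peel`, `alpha`, `beta0` and `score` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[Dict[str, float], int]  # (user features, real label y_i)


def concentration(space: List[Sample]) -> float:
    """Fraction of target sample users (label 1) in a non-empty space."""
    return sum(y for _, y in space) / len(space)


def peel(samples: List[Sample], dims: List[str],
         alpha: float = 0.1, beta0: float = 0.3) -> List[Sample]:
    """Iteratively remove subspaces until the support fraction reaches beta0."""
    n = len(samples)               # initial total number of sample users
    space = list(samples)
    while len(space) / n > beta0:  # assumed form of the preset distribution condition
        k = max(1, int(alpha * len(space)))
        best: Optional[List[Sample]] = None
        for j in dims:
            values = sorted(x[j] for x, _ in space)
            low_thr, high_thr = values[k - 1], values[-k]
            # Space left after removing the low end or the high end of dimension j.
            for remaining in (
                [s for s in space if s[0][j] > low_thr],
                [s for s in space if s[0][j] < high_thr],
            ):
                if remaining and (best is None
                                  or concentration(remaining) > concentration(best)):
                    best = remaining
        if best is None:           # nothing can be removed without emptying the space
            break
        space = best
    return space


# Ten hypothetical users on one dimension; users with score >= 8 are targets.
samples = [({"score": float(v)}, 1 if v >= 8 else 0) for v in range(1, 11)]
target_space = peel(samples, ["score"])
print([x["score"] for x, _ in target_space])  # [8.0, 9.0, 10.0]
```

In this example the low end is peeled one user at a time until only three of the ten users remain, at which point the support fraction equals beta0 and the loop stops.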
104. A user type identification rule is determined based on the feature value range of each feature dimension in the target sample space.
For example, in a financial application, users can be classified into two types: one type is a poor-quality user who is strictly rejected and is not provided with a risk service (hereinafter referred to as a "strictly rejected user"), and the other type is a non-strictly-rejected user. When the financial application is about to provide the risk service to a user, the user type to which the user belongs can be identified through the user type identification rule to determine whether the risk service can be provided to the user.
In the present application, space removal processing is performed on the high-dimensional sample space so that the concentration of target sample users in the resulting target sample space is far greater than in the whole sample space. By focusing on such high-concentration spaces, searching for them and combining them, the user type identification rule is obtained, so that user type identification can be carried out by applying the user type identification rule. There may be multiple ways to determine the user type identification rule. For example, the feature identification rule corresponding to each feature dimension may be determined based on the value range of each feature dimension in the target sample space, so as to obtain the user type identification rule. Specifically, the step "determining the user type identification rule based on the feature value range of each feature dimension in the target sample space" may include:
determining a feature identification rule corresponding to each feature dimension based on a feature value range of each feature dimension in a target sample space;
and combining the characteristic identification rules to obtain a combined user type identification rule.
The feature identification rule corresponding to the feature dimension is a rule for verifying the user feature of the user in the feature dimension, and the verification result may include a verification pass and a verification fail.
Therefore, the feature identification rule corresponding to each feature dimension can be determined based on the feature value range of each feature dimension in the target sample space. As an example, the target sample space may include 3 feature dimensions, d1, d2 and d3, whose feature value ranges in the target sample space are determined to be D1, D2 and D3, respectively. The feature identification rule corresponding to feature dimension d1 can then be determined as follows: if the user feature value of the user in feature dimension d1 is within D1, the verification passes; otherwise, the verification fails. Similarly, the feature identification rule for feature dimension d2 is: if the user feature value of the user in feature dimension d2 is within D2, the verification passes; otherwise, the verification fails. The feature identification rule for feature dimension d3 is: if the user feature value of the user in feature dimension d3 is within D3, the verification passes; otherwise, the verification fails.
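Deriving per-dimension feature identification rules from the target sample space can be sketched as follows. The sketch assumes that the feature value range of a dimension is the closed interval between the smallest and largest value observed among the sample users in the target sample space; the names `value_ranges` and `make_feature_rule` and the sample values are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


def value_ranges(target_space: List[Features],
                 dims: List[str]) -> Dict[str, Tuple[float, float]]:
    """Feature value range (min, max) of each dimension in the target sample space."""
    return {
        d: (min(x[d] for x in target_space), max(x[d] for x in target_space))
        for d in dims
    }


def make_feature_rule(dim: str, low: float, high: float) -> Callable[[Features], bool]:
    """Rule that passes when the user's value in `dim` lies within [low, high]."""

    def rule(user: Features) -> bool:
        return low <= user[dim] <= high

    return rule


# Two hypothetical sample users that remain in the target sample space.
target_space = [{"d1": 3, "d2": 10}, {"d1": 5, "d2": 14}]
ranges = value_ranges(target_space, ["d1", "d2"])
rules = {d: make_feature_rule(d, low, high) for d, (low, high) in ranges.items()}

print(ranges)                         # {'d1': (3, 5), 'd2': (10, 14)}
print(rules["d1"]({"d1": 4}))         # True: verification passes
print(rules["d2"]({"d2": 20}))        # False: verification fails
```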
After the feature identification rules corresponding to the feature dimensions are determined, the combined user type identification rules can be obtained by combining the feature identification rules.
For example, the feature identification rules may be combined through logic symbols to obtain a combined user type identification rule. Logic symbols are the artificial-language symbols used in logic to represent logical forms and logical operations. Their main characteristic and function is that they denote the objects they represent precisely and unambiguously, so they can be used to express various logical axioms, theorems and logical operation processes precisely and concisely. By way of example, logic symbols may include logical AND, logical OR, logical NOT, and the like. It should be noted that the logic symbols used for combination, and the combination order and combination level between the feature identification rules, may be set based on the service requirement.
For another example, different weights may be given to different feature recognition rules, and the feature recognition rules may be combined according to the weights to obtain a combined user type recognition rule, and the like.
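The two combination styles can be sketched as follows. The logical combination uses AND and OR over the pass/fail results of the individual rules. For the weighted combination, summing the weights of the rules that pass and comparing the sum with a threshold is an assumption of this sketch, since the text only states that weights are assigned. All rule definitions, weights and feature names are hypothetical.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]
Rule = Callable[[Features], bool]


def combine_and(rules: List[Rule]) -> Rule:
    """Combined rule that passes only if every feature identification rule passes."""
    return lambda user: all(rule(user) for rule in rules)


def combine_or(rules: List[Rule]) -> Rule:
    """Combined rule that passes if at least one feature identification rule passes."""
    return lambda user: any(rule(user) for rule in rules)


def combine_weighted(rules: List[Rule], weights: List[float], threshold: float) -> Rule:
    """Combined rule that passes when the weights of the passing rules reach threshold."""

    def combined(user: Features) -> bool:
        score = sum(w for rule, w in zip(rules, weights) if rule(user))
        return score >= threshold

    return combined


# Two hypothetical feature identification rules.
r1: Rule = lambda u: 18 <= u["age"] <= 30
r2: Rule = lambda u: u["logins_per_week"] >= 5

user = {"age": 25, "logins_per_week": 2}          # passes r1, fails r2
print(combine_and([r1, r2])(user))                 # False
print(combine_or([r1, r2])(user))                  # True
print(combine_weighted([r1, r2], [0.6, 0.4], 0.5)(user))  # True
```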
105. The user type of the target user is identified based on the user type identification rule.
After the user type identification rule is confirmed, the user type of the target user can be identified by applying the user type identification rule. Specifically, the step "identifying the user type of the target user based on the user type identification rule" may include:
acquiring user characteristics of a target user under at least one characteristic dimension;
based on the user type identification rule, carrying out feature identification on the user features to obtain an identification result;
and determining the user type of the target user based on the recognition result.
In an embodiment, the user type identification rule is obtained by combining the feature identification rules corresponding to 4 feature dimensions. Specifically, the user type identification rule may be R1 && R2 && R3 && R4, where && is the logical AND symbol, indicating that the result is true only if both operands are true; R1 is the feature identification rule corresponding to feature dimension A; R2 is the feature identification rule corresponding to feature dimension B; R3 is the feature identification rule corresponding to feature dimension C; and R4 is the feature identification rule corresponding to feature dimension D.
The server can obtain the user feature a of the target user in feature dimension A, the user feature b in feature dimension B, the user feature c in feature dimension C, and the user feature d in feature dimension D, and perform feature identification on the target user based on the user type identification rule R1 && R2 && R3 && R4. Specifically, if R1 is applied to verify a, R2 to verify b, R3 to verify c, and R4 to verify d, and all the verification results are passes, it can be determined that the target user has the same user type as the target sample users; otherwise, it can be determined that the target user has a different user type from the target sample users.
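Applying a rule of the form R1 && R2 && R3 && R4 to a target user can be sketched as follows. The value ranges behind R1 to R4, the feature values a, b, c and d, and the returned type names are all hypothetical and serve only to illustrate the verification flow.

```python
from typing import Callable, Dict

# Hypothetical feature identification rules for feature dimensions A, B, C and D.
RULES: Dict[str, Callable[[float], bool]] = {
    "A": lambda a: 0.0 <= a <= 10.0,      # R1
    "B": lambda b: 0.2 <= b <= 0.8,       # R2
    "C": lambda c: 50.0 <= c <= 150.0,    # R3
    "D": lambda d: 1.0 <= d <= 3.0,       # R4
}


def identify_user_type(user: Dict[str, float]) -> str:
    """Apply R1 && R2 && R3 && R4 and report the identification result."""
    passed = all(rule(user[dim]) for dim, rule in RULES.items())
    return "same type as target sample users" if passed else "different type"


target_user = {"A": 5.0, "B": 0.3, "C": 100.0, "D": 2.0}   # a, b, c, d all pass
other_user = {"A": 5.0, "B": 0.9, "C": 100.0, "D": 2.0}    # b fails R2

print(identify_user_type(target_user))  # same type as target sample users
print(identify_user_type(other_user))   # different type
```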
As can be seen from the above, the present embodiment may generate a sample space of a sample user set, where the sample user set includes at least one sample user, and the sample space includes user features of each sample user in at least one feature dimension; determining a target user set from the sample user set, wherein the target user set comprises at least one target sample user; based on the distribution of the target user set in each feature dimension, performing space removal processing on the sample space to obtain a target sample space, wherein the distribution information of the sample user set in the target sample space meets a preset distribution condition; determining a user type identification rule based on the characteristic value range of each characteristic dimension in the target sample space; and identifying the user type of the target user based on the user type identification rule.
According to the scheme, the space removal processing can be carried out on the high-dimensional space sample, so that the concentration of the target sample user is far greater than that of the whole sample space in the target sample space obtained after the processing, and the user type identification rule is further obtained by searching and combining the high concentration space by paying attention to the high concentration space, so that the user type identification can be carried out by applying the user type identification model. The scheme does not determine the user type prediction rule by violently exhausting all combination possibilities, but purposefully 'aims' at a high-concentration local space in a sample space and generates the user type identification rule based on the local space, so that the calculation resource and the time resource are greatly saved.
In addition, the scheme does not need to train a global model of machine learning by absorbing a large amount of sample data, but generates a user type prediction rule by determining a local space with high user concentration of a target sample in given sample data, so that the user type of the target user is efficiently and accurately predicted by applying the target type prediction rule.
In addition, after the sample user set is obtained, only the required hyper-parameters α and β_0 need to be set manually; the user type identification rule is then generated from the sample user set, so rule generation is semi-automatic. In practical application, the development cycle of user type identification rules was greatly shortened: the time spent on steps such as data preparation, feature processing, and screening of combined feature rules was reduced from one day to one hour. In addition, the method mines the feature space thoroughly, and the resulting feature combinations are satisfactory in interpretability, accuracy, and coverage.
The method described in the above examples is further described in detail below by way of example.
In this embodiment, a user type identification device is integrated in a server and a terminal, for example, the server may be a single server or a server cluster composed of a plurality of servers; the terminal can be a mobile phone, a tablet computer, a notebook computer and other equipment.
As shown in fig. 7, a user type identification method specifically includes the following steps:
201. The server obtains a sample user set sent by the terminal, wherein the sample user set comprises at least one sample user.
In an embodiment, the user type identification method described in the present application may be applied to generate risk control policy rules, and in particular rules capable of identifying malicious users. A malicious user, also called a blacklist user, is a user with lower credit or higher risk in the financial field.
The server may obtain a sample user set sent by the terminal, where the sample user set may include n sample users (n is a positive integer), and each sample user may have user features in at least one feature dimension and a corresponding real label. For example, the feature dimensions may include gender, age, city, education level, purchasing preference, risk preference, number of loans in a year, total loans in a year, multi-platform borrowing (multi-head loans), high-risk device behavior, and the like. The corresponding real labels may include malicious user and non-malicious user.
202. The server generates a sample space of a sample user set, wherein the sample space includes user features of each sample user in at least one feature dimension.
In an embodiment, the sample space generated by the server may be composed of at least one feature dimension. The position of each sample user in the sample space is determined by that user's feature value in each feature dimension, and the value of the sample user in the sample space may correspond to the user's real label.
For example, the server-generated sample space may consist of the following ten feature dimensions: gender, age, city, education level, purchasing preference, risk preference, number of loans in a year, total loans in a year, multi-platform borrowing (multi-head loans), and high-risk device behavior. The position of each sample user in the sample space is determined by the user's feature value in each feature dimension, and the value of the sample user in the sample space is determined by the user's real label, for example malicious user or non-malicious user.
203. The server determines a target user set from the sample user set, wherein the target user set includes at least one target sample user.
In an embodiment, the target sample user may be determined according to the real label of the sample user, and then the target user set is obtained. For example, a sample user whose true label is a malicious user may be determined as a target sample user, resulting in a target user set consisting of target sample users.
204. The server performs space removal processing on the sample space based on the distribution of the target user set in each feature dimension to obtain a removed sample space.
In an embodiment, referring to FIG. 8, the server may iteratively perform space removal processing on the sample space to obtain the target sample space. Specifically, the relevant parameters are first initialized: the sample space is initialized to B_1, the sample removal ratio is α, and the sample fraction in the target sample space is β_0, where α and β_0 are hyper-parameters that may be set based on business requirements.
Further, the server may determine the subspace to be removed corresponding to each feature dimension based on the distribution of the sample user set in that dimension. For example, the server may sort the sample users in the sample user set by their values in each feature dimension to obtain a sorting result for each dimension. Referring to fig. 8 and taking one feature dimension of the sample space as an example, the server may find the α quantile x_(α) and the (1-α) quantile x_(1-α) over all sample users. Similarly, the server may obtain the α quantile and the (1-α) quantile of the sample user set in every feature dimension.
The server can then select, from the subspaces to be removed corresponding to the feature dimensions, a target removal subspace b* whose removal makes the concentration of target sample users in the removed sample space the highest, and remove it, obtaining the removed sample space B - b*.
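A single removal step of this kind can be sketched as follows. This is a minimal sketch under stated assumptions: "concentration" is taken to be the share of target sample users (label 1) among the users that remain, the quantile helper is a simplified empirical quantile, and the names `quantile` and `peel_once` are hypothetical. The application does not fix any of these details.

```python
from typing import List, Optional, Tuple

def quantile(values: List[float], q: float) -> float:
    """Simplified empirical quantile: the value at index floor(q * (n - 1)) after sorting."""
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]

def peel_once(X: List[List[float]], y: List[int],
              alpha: float) -> Tuple[List[List[float]], List[int]]:
    """Remove one candidate subspace (the low or high tail of one feature dimension)
    so that the share of target users (y == 1) among the remaining users is highest."""
    best_keep: Optional[List[bool]] = None
    best_conc = -1.0
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        lo, hi = quantile(col, alpha), quantile(col, 1 - alpha)
        # Candidate 1 removes users below the alpha quantile;
        # candidate 2 removes users above the (1 - alpha) quantile.
        for keep in ([v >= lo for v in col], [v <= hi for v in col]):
            kept = sum(keep)
            if kept == 0 or kept == len(X):
                continue  # removes everything or nothing: not a useful step
            conc = sum(t for t, k in zip(y, keep) if k) / kept
            if conc > best_conc:
                best_keep, best_conc = keep, conc
    if best_keep is None:
        return X, y  # no candidate made progress
    return ([r for r, k in zip(X, best_keep) if k],
            [t for t, k in zip(y, best_keep) if k])
```

Only two candidates per dimension are evaluated, so one step costs roughly one sort per feature dimension rather than an enumeration of feature combinations.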
205. The server acquires the distribution information of the sample user set in the removed sample space.
In one embodiment, the distribution information of the sample user set in the post-removal sample space may be calculated by referring to the following formula:
$$\beta_k = \frac{1}{n}\sum_{i=1}^{n} I(x_i \in B_k)$$
where k denotes the iteration number and is a positive integer, and the indicator function I(·) takes the value 1 if sample user x_i lies in the removed sample space B_k and 0 otherwise. n is the initial total number of sample users, so β_k is the ratio of the number of sample users in the removed sample space B_k to the initial total number of sample users. For the first iteration, β_1 can be calculated with the above formula.
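The ratio β_k described above can be computed with a short sketch like the one below. It assumes, purely for illustration, that the removed sample space B_k is an axis-aligned box stored as per-feature (lower, upper) bounds; the type alias `Box` and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of B_k: feature index -> (lower bound, upper bound).
Box = Dict[int, Tuple[float, float]]

def in_box(x: List[float], box: Box) -> bool:
    """Indicator I(x in B_k): True if every bounded feature lies within its range."""
    return all(lo <= x[j] <= hi for j, (lo, hi) in box.items())

def beta(samples: List[List[float]], box: Box) -> float:
    """Fraction of the initial n sample users that still lie inside the box."""
    return sum(in_box(x, box) for x in samples) / len(samples)
```

Because the denominator is always the initial n, β_k can only stay the same or shrink as more subspaces are removed.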
206. If the distribution information meets the preset distribution condition, the server takes the removed sample space as the target sample space.
In an embodiment, referring to fig. 8, β_k may be compared with β_0. If β_k is less than or equal to β_0, the removed sample space is taken as the target sample space; otherwise, the sample space is updated to the removed sample space, and the process returns to the step of performing space removal processing on the sample space based on the distribution of the target user set in each feature dimension.
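The stop-or-repeat logic of steps 204 to 206 can be sketched as a loop. The removal step itself is passed in as a function so that the sketch stays short; the name `peel_until`, the `max_iter` guard, and the early exit when a step removes nothing are assumptions added for the example.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def peel_until(samples: Sequence[T],
               peel: Callable[[List[T]], List[T]],
               beta0: float,
               max_iter: int = 1000) -> List[T]:
    """Repeat the removal step until the remaining fraction beta_k <= beta0."""
    n = len(samples)              # initial total number of sample users
    current = list(samples)
    for _ in range(max_iter):
        removed = peel(current)   # users left after one space removal step
        if len(removed) == len(current):
            break                 # the step made no progress; stop to avoid looping forever
        current = removed         # update the sample space to the removed sample space
        if len(current) / n <= beta0:
            break                 # beta_k <= beta_0: this is the target sample space
    return current
```

A smaller β_0 lets the loop run longer and yields a smaller, more concentrated target sample space, at the cost of covering fewer users.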
207. The server determines the user type identification rule based on the feature value range of each feature dimension in the target sample space.
In an embodiment, the server may determine, based on a feature value range of each feature dimension in the target sample space, a feature identification rule corresponding to each feature dimension, and combine the feature identification rules to obtain a combined user type identification rule.
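One way to turn the feature value ranges of the target sample space into a combined rule is sketched below. It assumes numeric features and closed ranges taken from the minimum and maximum of the users remaining in the target sample space; the function names and the textual rule format are hypothetical.

```python
from typing import Dict, List, Tuple

Ranges = Dict[str, Tuple[float, float]]

def ranges_from_samples(target_space: List[Dict[str, float]]) -> Ranges:
    """Feature value range of each dimension over the users left in the target sample space."""
    names = target_space[0].keys()
    return {f: (min(u[f] for u in target_space), max(u[f] for u in target_space))
            for f in names}

def rule_text(ranges: Ranges) -> str:
    """Combine the per-dimension rules into one human-readable conjunction."""
    return " && ".join(f"{lo} <= {f} <= {hi}" for f, (lo, hi) in ranges.items())

def matches(user: Dict[str, float], ranges: Ranges) -> bool:
    """Apply the combined rule: the user must fall inside every range."""
    return all(lo <= user[f] <= hi for f, (lo, hi) in ranges.items())
```

Keeping the rule as explicit ranges is what makes the result readable by a risk analyst, in contrast to the parameters of a trained global model.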
208. The server acquires the user information of the target user sent by the terminal.
Wherein the user information of the target user may comprise user characteristics of the target user in at least one characteristic dimension.
209. The server identifies the user type of the target user based on the user type identification rule and the user information.
In an embodiment, the server may obtain the user characteristics of the target user in at least one characteristic dimension through the user information of the target user. Further, feature recognition may be performed on the user features based on the user type recognition rule to obtain a recognition result, and the user type of the target user may be determined based on the recognition result.
Therefore, in the embodiment of the application, space removal processing can be performed on the high-dimensional sample space, so that the concentration of target sample users in the resulting target sample space is far greater than in the whole sample space. By focusing on such high-concentration spaces, searching for them, and combining them, the user type identification rule is obtained, and user type identification can then be performed by applying that rule. The user type identification rule is not determined by brute-force enumeration of all possible combinations; instead, a high-concentration local space in the sample space is purposefully targeted and the user type identification rule is generated based on that local space, which greatly saves computing resources and time.
In addition, according to the embodiment of the application, a user type prediction rule is generated by determining a local space with high target sample user concentration in given sample data without absorbing a large amount of sample data to train a machine learning global model, so that the user type of the target user is efficiently and accurately predicted by applying the target type prediction rule, and therefore, the embodiment of the application also improves a user type identification mode by saving sample resources.
In addition, after the sample user set is obtained, the embodiment of the application only requires the hyper-parameters α and β_0 to be set manually, so that, given the target sample users and the related user features, both the efficiency of developing risk control strategies and the coverage of risks can be improved. In practical application, the development cycle of user type identification rules was greatly shortened: the time spent on steps such as data preparation, feature processing, and screening of combined feature rules was reduced from one day to one hour. In addition, the method mines the feature space thoroughly, and the resulting feature combinations are satisfactory in interpretability, accuracy, and coverage.
In order to better implement the method, correspondingly, the embodiment of the application also provides a user type identification device, wherein the user type identification device can be integrated in a server or a terminal. The server can be a single server or a server cluster consisting of a plurality of servers; the terminal can be a mobile phone, a tablet computer, a notebook computer and other equipment.
For example, as shown in fig. 9, the user type identification apparatus may include a generation unit 301, a target determination unit 302, a removal unit 303, a rule determination unit 304, and an identification unit 305, as follows:
a generating unit 301, configured to generate a sample space of a sample user set, where the sample user set includes at least one sample user, and the sample space includes user features of each sample user in at least one feature dimension;
a target determination unit 302, configured to determine a target user set from the sample user set, wherein the target user set includes at least one target sample user;
a removing unit 303, configured to perform spatial removal processing on the sample space based on the distribution of the target user set in each feature dimension to obtain a target sample space, where distribution information of the sample user set in the target sample space meets a preset distribution condition;
a rule determining unit 304, configured to determine a user type identification rule based on a feature value range of each feature dimension in the target sample space;
an identifying unit 305, configured to identify a user type of the target user based on the user type identification rule.
In an embodiment, referring to fig. 10, the removing unit 303 may include:
a space removal subunit 3031, configured to perform space removal processing on the sample space based on the distribution of the target user set in each feature dimension, to obtain a removed sample space;
an information obtaining subunit 3032, configured to obtain distribution information of the sample user set in the removed sample space;
the space determining subunit 3033 may take the removed sample space as a target sample space if the distribution information satisfies the preset distribution condition.
In an embodiment, referring to fig. 11, the removing unit may further include:
the target updating subunit 3034 may update the sample space to the removed sample space if the distribution information does not satisfy the preset distribution condition, and return to execute the step of performing the spatial removal processing on the sample space based on the distribution of the target user set in each feature dimension.
In an embodiment, the removing subunit 3031 may be configured to:
determining a subspace to be removed corresponding to each characteristic dimension based on the distribution of the sample user set under each characteristic dimension, wherein the subspace to be removed comprises each characteristic dimension; determining a target removal subspace from each subspace to be removed based on the distribution of the target user set in each subspace to be removed; and aiming at the target removal subspace, carrying out space removal processing on the sample space to obtain a removed sample space.
In an embodiment, the removing subunit 3031 may be specifically configured to:
based on the distribution of the sample user set under each characteristic dimension, sequencing the sample users in the sample user set to obtain a sequencing result corresponding to each characteristic dimension; selecting users to be removed of each characteristic dimension from the sample user set according to the sorting result; and determining the subspace to be removed corresponding to each feature dimension based on the user to be removed.
In an embodiment, the removing subunit 3031 may be specifically configured to:
calculating a space removal contribution degree of each subspace to be removed to the sample space based on the distribution of the target user set in each subspace to be removed, wherein the space removal contribution degree characterizes the distribution characteristics of the target user set in the removed sample space after the subspace to be removed is removed from the sample space; and determining a target removal subspace from the subspace to be removed according to the space removal contribution degree.
In an embodiment, the removing subunit 3031 may be specifically configured to:
acquiring attribute information of the target removal subspace on at least one characteristic dimension; and according to the attribute information, carrying out space removal processing on the sample space to obtain a removed sample space.
In an embodiment, referring to fig. 12, the rule determining unit 304 may include:
a rule determining subunit 3041, configured to determine, based on a feature value range of each feature dimension in the target sample space, a feature identification rule corresponding to each feature dimension;
the rule combination subunit 3042 may be configured to combine the feature recognition rules to obtain a combined user type recognition rule.
In an embodiment, referring to fig. 13, the identifying unit 305 may include:
a feature obtaining subunit 3051, configured to obtain a user feature of the target user in at least one feature dimension;
the feature recognition subunit 3052 is configured to perform feature recognition on the user features based on the user type recognition rule, so as to obtain a recognition result;
a type determination subunit 3053, configured to determine a user type of the target user based on the recognition result.
In an embodiment, referring to fig. 14, the target determining unit 302 may include:
a tag obtaining subunit 3021, configured to obtain a real tag of each sample user in the sample user set;
a user determination subunit 3022, which may be configured to determine a target sample user from the sample user set according to the authentic tag;
a target determination subunit 3023, which may be configured to determine a target user set based on the target sample users.
In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.
As can be seen from the above, in the user type identification apparatus of this embodiment, the generation unit 301 generates a sample space of a sample user set, where the sample user set includes at least one sample user, and the sample space includes user features of each sample user in at least one feature dimension; determining, by a target determination unit 302, a target user set from the sample user set, wherein the target user set comprises at least one target sample user; performing, by a removing unit 303, spatial removal processing on the sample space based on the distribution of the target user set in each feature dimension to obtain a target sample space, where distribution information of the sample user set in the target sample space meets a preset distribution condition; determining, by the rule determining unit 304, a user type identification rule based on the feature value ranges of the feature dimensions in the target sample space; the user type of the target user is identified by the identifying unit 305 based on the user type identification rule.
According to the scheme, space removal processing can be performed on the high-dimensional sample space, so that the concentration of target sample users in the resulting target sample space is far greater than in the whole sample space. By focusing on such high-concentration spaces, searching for them, and combining them, the user type identification rule is obtained, and user type identification can then be performed by applying that rule. The scheme does not determine the user type identification rule by brute-force enumeration of all possible combinations; instead, it purposefully targets a high-concentration local space in the sample space and generates the user type identification rule based on that local space, which greatly saves computing resources and time. In addition, the scheme does not need to absorb a large amount of sample data to train a global machine learning model; it generates the user type identification rule by determining a local space with a high concentration of target sample users in the given sample data, so that the user type of the target user is predicted efficiently and accurately by applying that rule. The scheme therefore also improves user type identification by saving sample resources.
In addition, an embodiment of the present application further provides a computer device, where the computer device may be a device such as a server or a terminal, as shown in fig. 15, a schematic structural diagram of the computer device according to the embodiment of the present application is shown, and specifically:
the computer device may include components such as a memory 401 including one or more computer-readable storage media, a processor 402 including one or more processing cores, and a power supply 403. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 15 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. Wherein:
the memory 401 may be used to store software programs and modules, and the processor 402 executes various functional applications and data processing by operating the software programs and modules stored in the memory 401. The memory 401 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the computer device, and the like. Further, the memory 401 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 401 may further comprise a memory controller to provide the processor 402 and the input unit 603 with access to the memory 401.
The processor 402 is the control center of the computer device. It connects the various parts of the entire computer device using various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 401 and calling data stored in the memory 401, thereby monitoring the computer device as a whole. Optionally, the processor 402 may include one or more processing cores; preferably, the processor 402 may integrate an application processor, which primarily handles the operating system, user interfaces, applications, and the like, and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor may also not be integrated into the processor 402.
The computer device also includes a power supply 403 (e.g., a battery) for powering the various components, which may preferably be logically coupled to the processor 402 via a power management system to manage charging, discharging, and power consumption management functions via the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
Although not shown, the computer device may further include a camera, a bluetooth module, etc., which will not be described herein. Specifically, in this embodiment, the processor 402 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 401 according to the following instructions, and the processor 402 runs the application programs stored in the memory 401, so as to implement various functions as follows:
generating a sample space of a sample user set, wherein the sample user set comprises at least one sample user, the sample space comprising user features of each of the sample users in at least one feature dimension; determining a target user set from the sample user set, wherein the target user set comprises at least one target sample user; based on the distribution of the target user set in each feature dimension, performing space removal processing on the sample space to obtain a target sample space, wherein the distribution information of the sample user set in the target sample space meets a preset distribution condition; determining a user type identification rule based on the characteristic value range of each characteristic dimension in the target sample space; and identifying the user type of the target user based on the user type identification rule.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
As can be seen from the above, the computer device of this embodiment may perform space removal processing on the high-dimensional space sample, so that the concentration of the target sample user is far greater than the whole sample space in the target sample space obtained after the processing, and further search and combine the high concentration space to obtain the user type identification rule by paying attention to the high concentration space, so that the user type identification may be performed by applying the user type identification model. The computer device does not determine the user type prediction rule by exhaustively exhausting all combination possibilities, but purposefully "targets" a high-concentration local space in the sample space and generates the user type identification rule based on the local space, thereby greatly saving computing resources and time resources. Moreover, the computer device does not need to train a machine learning global model by absorbing a large amount of sample data, but generates a user type prediction rule by determining a local space with high target sample user concentration in given sample data, so that the user type of the target user is efficiently and accurately predicted by applying the target type prediction rule.
The system related to the embodiment of the application can be a distributed system formed by connecting a client, a plurality of nodes (any form of computing equipment in an access network, such as a server and a user terminal) through a network communication mode.
Taking the case where the distributed system is a blockchain system as an example, referring to fig. 16, fig. 16 is an optional structural schematic diagram of the distributed system 100 applied to a blockchain system according to this embodiment of the present application. The system is formed by a plurality of nodes (computing devices in any form in an access network, such as servers and user terminals) and clients. A peer-to-peer (P2P) network is formed between the nodes, and the P2P protocol is an application layer protocol running on top of the Transmission Control Protocol (TCP). In the distributed system, any machine, such as a server or a terminal, can join and become a node, and a node comprises a hardware layer, a middle layer, an operating system layer, and an application layer.
Referring to the functions of each node in the block chain system shown in fig. 16, the related functions include:
1) routing, the basic function a node has for supporting communication between nodes.
Besides the routing function, the node may also have the following functions:
2) Application: deployed in the blockchain to implement specific services according to actual business requirements. It records data related to the implemented functions to form record data, carries a digital signature in the record data to indicate the source of the task data, and sends the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block when the source and integrity of the record data are successfully verified.
For example, the services implemented by the application include:
2.1) Wallet: provides electronic money transaction functions, including initiating a transaction (that is, sending the transaction record of the current transaction to other nodes in the blockchain system; after the other nodes verify it successfully, the record data of the transaction is stored in a temporary block of the blockchain as a response acknowledging that the transaction is valid). The wallet also supports querying the electronic money remaining at an electronic money address.
2.2) Shared ledger: provides operations such as storing, querying, and modifying account data. Record data of the operations on the account data is sent to other nodes in the blockchain system; after the other nodes verify that it is valid, the record data is stored in a temporary block as a response acknowledging that the account data is valid, and a confirmation may be sent to the node that initiated the operation.
2.3) Smart contract: a computerized agreement that can enforce the terms of a contract. It is implemented by code deployed on the shared ledger that executes when certain conditions are met, and is used to complete automated transactions according to actual business requirements, for example querying the logistics status of goods purchased by a buyer and transferring the buyer's electronic money to the merchant's address after the buyer signs for the goods. Smart contracts are not limited to contracts for trading; they may also execute contracts that process received information.
3) Blockchain: a series of blocks connected to one another in the chronological order of their generation. Once a new block is added to the blockchain it cannot be removed, and the blocks record the record data submitted by nodes in the blockchain system.
Referring to fig. 17, fig. 17 is an optional schematic diagram of a Block Structure (Block Structure) provided in this embodiment, where each Block includes a hash value of a transaction record stored in the Block (hash value of the Block) and a hash value of a previous Block, and the blocks are connected by the hash value to form a Block chain. The block may include information such as a time stamp at the time of block generation. A block chain (Blockchain), which is essentially a decentralized database, is a string of data blocks associated by using cryptography, and each data block contains related information for verifying the validity (anti-counterfeiting) of the information and generating a next block.
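The hash linking between blocks can be illustrated with a small sketch. It is not the block structure of fig. 17: the field names, the use of SHA-256 over a JSON serialization, and the all-zero hash for the first block are assumptions chosen only to show why altering an earlier block breaks the chain.

```python
import hashlib
import json
from typing import Dict, List

def block_hash(block: Dict) -> str:
    """SHA-256 over a canonical JSON serialization of the block (illustrative choice)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_block(chain: List[Dict], records: List[str], timestamp: float) -> None:
    """Append a block that stores the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64  # placeholder for the first block
    chain.append({"prev_hash": prev, "timestamp": timestamp, "records": records})

def verify(chain: List[Dict]) -> bool:
    """Check that every block still points at the current hash of its predecessor."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))
```

Changing the records of any earlier block changes its hash, so the `prev_hash` stored in the following block no longer matches and `verify` returns False.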
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, the present application provides a storage medium, in which a plurality of instructions are stored, where the instructions can be loaded by a processor to execute the steps in any one of the user type identification methods provided in the present application. For example, the instructions may perform the steps of:
generating a sample space of a sample user set, wherein the sample user set comprises at least one sample user, the sample space comprising user features of each of the sample users in at least one feature dimension; determining a target user set from the sample user set, wherein the target user set comprises at least one target sample user; based on the distribution of the target user set in each feature dimension, performing space removal processing on the sample space to obtain a target sample space, wherein the distribution information of the sample user set in the target sample space meets a preset distribution condition; determining a user type identification rule based on the characteristic value range of each characteristic dimension in the target sample space; and identifying the user type of the target user based on the user type identification rule.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
The storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
Because the instructions stored in the storage medium can execute the steps of any user type identification method provided in the embodiments of the present application, they can achieve the beneficial effects of any such method. These effects are detailed in the foregoing embodiments and are not repeated here.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the methods provided in the various optional implementations of the user type identification aspect described above.
The user type identification method, device, computer device, storage medium and system provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method and its core ideas. Those skilled in the art may, following the ideas of the present application, vary the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (13)

1. A method for identifying a user type, comprising:
generating a sample space of a sample user set, wherein the sample user set comprises at least one sample user, and the sample space comprises user features of each of the sample users in at least one feature dimension;
determining a target user set from the sample user set, wherein the target user set comprises at least one target sample user;
based on the distribution of the target user set in each feature dimension, performing space removal processing on the sample space to obtain a target sample space, wherein the distribution information of the sample user set in the target sample space meets a preset distribution condition;
determining a user type identification rule based on the feature value range of each feature dimension in the target sample space;
and identifying the user type of a target user based on the user type identification rule.
2. The method for identifying a user type according to claim 1, wherein performing space removal processing on the sample space based on the distribution of the target user set in each feature dimension to obtain a target sample space comprises:
based on the distribution of the target user set under each feature dimension, carrying out space removal processing on the sample space to obtain a removed sample space;
acquiring distribution information of the sample user set in the removed sample space;
and if the distribution information meets the preset distribution condition, taking the removed sample space as a target sample space.
3. The method of claim 2, further comprising:
if the distribution information does not meet the preset distribution condition, updating the sample space to the removed sample space, and returning to the step of performing space removal processing on the sample space based on the distribution of the target user set in each feature dimension.
4. The method for identifying a user type according to claim 2, wherein performing space removal processing on the sample space based on the distribution of the target user set in each feature dimension to obtain a removed sample space comprises:
determining a subspace to be removed corresponding to each feature dimension based on the distribution of the sample user set in each feature dimension, wherein the subspace to be removed comprises each feature dimension;
determining a target removal subspace from the subspaces to be removed based on the distribution of the target user set in each subspace to be removed;
and performing space removal processing on the sample space with respect to the target removal subspace to obtain a removed sample space.
5. The method according to claim 4, wherein determining the subspace to be removed corresponding to each of the feature dimensions based on the distribution of the sample user set in each of the feature dimensions comprises:
sorting the sample users in the sample user set based on the distribution of the sample user set in each feature dimension to obtain a sorting result corresponding to each feature dimension;
selecting, according to the sorting results, users to be removed for each feature dimension from the sample user set;
and determining the subspace to be removed corresponding to each feature dimension based on the users to be removed.
6. The method according to claim 4, wherein determining a target removal subspace from each subspace to be removed based on the distribution of the target user set in each subspace to be removed comprises:
calculating a space removal contribution degree of each subspace to be removed to the sample space based on the distribution of the target user set in each subspace to be removed, wherein the space removal contribution degree characterizes the distribution characteristics of the target user set in the removed sample space after the subspace to be removed is removed from the sample space;
and determining a target removal subspace from the subspaces to be removed according to the space removal contribution degree.
7. The method according to claim 4, wherein performing space removal processing on the sample space with respect to the target removal subspace to obtain a removed sample space comprises:
acquiring attribute information of the target removal subspace in at least one feature dimension;
and performing space removal processing on the sample space according to the attribute information to obtain a removed sample space.
8. The method according to claim 1, wherein determining a user type identification rule based on the feature value range of each feature dimension in the target sample space comprises:
determining a feature identification rule corresponding to each feature dimension based on a feature value range of each feature dimension in the target sample space;
and combining the feature identification rules to obtain a combined user type identification rule.
9. The method according to claim 1, wherein identifying the user type of the target user based on the user type identification rule comprises:
acquiring user features of the target user in at least one feature dimension;
performing feature identification on the user features based on the user type identification rule to obtain an identification result;
and determining the user type of the target user based on the identification result.
10. The method of claim 1, wherein determining a target user set from the sample user set comprises:
acquiring a real label of each sample user in the sample user set;
determining a target sample user from the sample user set according to the real label;
and determining the target user set based on the target sample users.
11. A user type identification device, comprising:
a generating unit, configured to generate a sample space of a sample user set, wherein the sample user set includes at least one sample user, and the sample space includes user features of each sample user in at least one feature dimension;
a target determination unit, configured to determine a target user set from the sample user set, wherein the target user set includes at least one target sample user;
a removing unit, configured to perform space removal processing on the sample space based on the distribution of the target user set in each feature dimension to obtain a target sample space, wherein distribution information of the sample user set in the target sample space meets a preset distribution condition;
a rule determining unit, configured to determine a user type identification rule based on a feature value range of each feature dimension in the target sample space;
and an identification unit, configured to identify the user type of a target user based on the user type identification rule.
12. An electronic device comprising a memory and a processor; the memory stores an application program, and the processor is configured to execute the application program in the memory to perform the operations of the user type identification method according to any one of claims 1 to 10.
13. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the method for identifying a user type according to any one of claims 1 to 10.
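For illustration only, the rule construction and identification recited in claims 8 and 9 might be sketched as follows. The sketch assumes that each feature identification rule is a closed interval test on one feature dimension and that the rules are combined with a logical AND; the claims fix neither choice, and the function names, feature names, and type labels are hypothetical.

```python
from typing import Callable, Dict, Tuple

User = Dict[str, float]
Rule = Callable[[User], bool]


def feature_rule(dimension: str, low: float, high: float) -> Rule:
    """Rule for one feature dimension: the value must lie within [low, high]."""
    return lambda user: low <= user[dimension] <= high


def combine_rules(ranges: Dict[str, Tuple[float, float]]) -> Rule:
    """Combine the per-dimension rules; a user must satisfy all of them."""
    rules = [feature_rule(dim, low, high) for dim, (low, high) in ranges.items()]
    return lambda user: all(rule(user) for rule in rules)


def identify_user_type(user: User, rule: Rule,
                       target_type: str = "target",
                       other_type: str = "other") -> str:
    """Map the identification result (rule hit or miss) to a user type."""
    return target_type if rule(user) else other_type


# Hypothetical feature value ranges of a target sample space.
ranges = {"amount": (60, 100), "logins": (0, 9)}
rule = combine_rules(ranges)
print(identify_user_type({"amount": 75, "logins": 5}, rule))
print(identify_user_type({"amount": 30, "logins": 2}, rule))
```

Keeping one rule per feature dimension makes the combined rule easy to read back as a list of value ranges, which is one reason an interval-based form is a natural fit for a box-shaped target sample space.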
CN202110336507.4A 2021-03-29 2021-03-29 User type identification method and device, computer equipment and storage medium Pending CN115131025A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110336507.4A CN115131025A (en) 2021-03-29 2021-03-29 User type identification method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115131025A true CN115131025A (en) 2022-09-30

Family

ID=83375421

Country Status (1)

Country Link
CN (1) CN115131025A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination