CN111178421B - Method, device, medium and electronic equipment for detecting user state - Google Patents


Publication number
CN111178421B
Authority
CN
China
Prior art keywords
user
state
cluster
users
business
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911352487.9A
Other languages
Chinese (zh)
Other versions
CN111178421A (en)
Inventor
李嘉晨
郭凯
付东东
刘雷
刘洋
胡磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beike Technology Co Ltd
Original Assignee
Beike Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beike Technology Co Ltd
Priority to CN201911352487.9A
Publication of CN111178421A
Application granted
Publication of CN111178421B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features

Abstract

A method, apparatus, medium and electronic device for detecting a user state are disclosed. The method comprises: acquiring respective business behavior characteristic information of a plurality of users to be detected according to business data of the plurality of users to be detected; clustering the business behavior characteristic information of the plurality of users to be detected to obtain at least one cluster; acquiring a cluster representative user of the cluster; determining the user state of the cluster representative user according to the business behavior characteristic information of the cluster representative user; and determining the user state of each user to be detected in the cluster in which the cluster representative user is located according to the user state of the cluster representative user. The technical solution provided by the disclosure helps improve the accuracy of the user state detection result.

Description

Method, device, medium and electronic equipment for detecting user state
Technical Field
The present disclosure relates to network technologies, and in particular, to a method for detecting a user state, an apparatus for detecting a user state, a storage medium, and an electronic device.
Background
In some business fields, a service provider typically provides business services to users through an APP (application program), a website, a client, or the like. For a service provider, each user currently receiving its services is typically in a certain user state within a user life cycle.
Service providers often need to know the current user state of each user so as to provide different services to different users.
How to accurately detect the current user state of a user, so as to provide the user with appropriate business services, is a technical problem worth attention.
Disclosure of Invention
The present disclosure has been made in order to solve the above technical problems. Embodiments of the present disclosure provide a method for detecting a user state, an apparatus for detecting a user state, a storage medium, and an electronic device.
According to one aspect of the embodiments of the present disclosure, there is provided a method of detecting a user state, the method comprising: acquiring respective business behavior characteristic information of a plurality of users to be detected according to business data of the plurality of users to be detected; clustering the business behavior characteristic information of the plurality of users to be detected to obtain at least one cluster; acquiring a cluster representative user of the cluster; determining the user state of the cluster representative user according to the business behavior characteristic information of the cluster representative user; and determining the user state of each user to be detected in the cluster in which the cluster representative user is located according to the user state of the cluster representative user.
In an embodiment of the present disclosure, the acquiring respective business behavior characteristic information of a plurality of users to be detected according to business data of the plurality of users to be detected includes: acquiring, according to the business data of the plurality of users to be detected, at least one of the following for the plurality of users to be detected: business behavior characteristic information based on time-invariant attributes; business behavior characteristic information based on statistics of business behavior counts and/or business behavior resource consumption; and business behavior characteristic information based on time-series variables.
In yet another embodiment of the present disclosure, the acquiring a cluster representative user of the cluster includes: screening, from the at least one cluster, clusters containing no more than a preset number of cluster nodes; and taking the user to be detected corresponding to the central node of each screened cluster as a cluster representative user.
In still another embodiment of the present disclosure, the determining the user state of the cluster representative user according to the business behavior characteristic information of the cluster representative user includes: providing the business behavior characteristic information of the cluster representative user to a classifier for predicting user states, as the input of the classifier; and determining the user state of the cluster representative user according to the classification prediction result output by the classifier.
In yet another embodiment of the present disclosure, the method further comprises: respectively acquiring business data of seed users in each user state; forming business behavior characteristic information of the seed users in each user state according to the business data of the seed users in each user state; generating a plurality of training samples according to the business behavior characteristic information of the seed users in each user state; and training the classifier by using the plurality of training samples; wherein, for any user state, a seed user of that user state is a user who was in that user state at a historical time.
In still another embodiment of the present disclosure, before the respectively acquiring business data of seed users in each user state, the method further includes: determining, according to preset state mark information or a preset state condition corresponding to each user state, the users whose business data contains the state mark information or whose business data satisfies the state condition, and taking the determined users as seed users of the corresponding user state.
In still another embodiment of the present disclosure, the forming business behavior characteristic information of the seed users in each user state according to the business data of the seed users in each user state includes: for any seed user in any user state, acquiring the business behavior characteristic information of the seed user according to, among the business data of the seed user, the business data before the time point of the state mark information or the business data before the time point at which the state condition is satisfied.
In still another embodiment of the present disclosure, the generating a plurality of training samples according to the business behavior characteristic information of the seed users in each user state includes: for any state mark information or any state condition, setting a user state label for the business behavior characteristic information of the corresponding seed user according to the user state corresponding to the state mark information or the state condition, thereby generating a training sample.
In yet another embodiment of the present disclosure, the training the classifier by using the plurality of training samples includes: respectively providing the training samples corresponding to each user state to the classifier; and adjusting the model parameters of the classifier according to the difference between the classification prediction result output by the classifier for each training sample and the user state label of that training sample; wherein the number of training samples provided to the classifier is the same for each user state.
According to another aspect of the embodiments of the present disclosure, there is provided an apparatus for detecting a user state, the apparatus including: an information acquisition module, configured to acquire respective business behavior characteristic information of a plurality of users to be detected according to business data of the plurality of users to be detected; a clustering processing module, configured to cluster the business behavior characteristic information of the plurality of users to be detected to obtain at least one cluster; an acquisition representative user module, configured to acquire a cluster representative user of the cluster; a first determining state module, configured to determine the user state of the cluster representative user according to the business behavior characteristic information of the cluster representative user; and a second determining state module, configured to determine the user state of each user to be detected in the cluster in which the cluster representative user is located according to the user state of the cluster representative user.
In an embodiment of the present disclosure, the information acquisition module is further configured to: acquire, according to the business data of the plurality of users to be detected, at least one of the following for the plurality of users to be detected: business behavior characteristic information based on time-invariant attributes; business behavior characteristic information based on statistics of business behavior counts and/or business behavior resource consumption; and business behavior characteristic information based on time-series variables.
In yet another embodiment of the present disclosure, the acquisition representative user module is further configured to: screen, from the at least one cluster, clusters containing no more than a preset number of cluster nodes; and take the user to be detected corresponding to the central node of each screened cluster as a cluster representative user.
In yet another embodiment of the present disclosure, the first determining state module is further configured to: provide the business behavior characteristic information of the cluster representative user to a classifier for predicting user states, as the input of the classifier; and determine the user state of the cluster representative user according to the classification prediction result output by the classifier.
In yet another embodiment of the present disclosure, the apparatus further includes a training module configured to: respectively acquire business data of seed users in each user state; form business behavior characteristic information of the seed users in each user state according to the business data of the seed users in each user state; generate a plurality of training samples according to the business behavior characteristic information of the seed users in each user state; and train the classifier by using the plurality of training samples; wherein, for any user state, a seed user of that user state is a user who was in that user state at a historical time.
In yet another embodiment of the present disclosure, the apparatus further includes a seed user determining module configured to: determine, according to preset state mark information or a preset state condition corresponding to each user state, the users whose business data contains the state mark information or whose business data satisfies the state condition, and take the determined users as seed users of the corresponding user state.
In yet another embodiment of the present disclosure, the training module is further configured to: for any seed user in any user state, acquire the business behavior characteristic information of the seed user according to, among the business data of the seed user, the business data before the time point of the state mark information or the business data before the time point at which the state condition is satisfied.
In yet another embodiment of the present disclosure, the training module is further configured to: for any state mark information or any state condition, set a user state label for the business behavior characteristic information of the corresponding seed user according to the user state corresponding to the state mark information or the state condition, thereby generating a training sample.
In yet another embodiment of the present disclosure, the training module is further configured to: respectively provide the training samples corresponding to each user state to the classifier; and adjust the model parameters of the classifier according to the difference between the classification prediction result output by the classifier for each training sample and the user state label of that training sample; wherein the number of training samples provided to the classifier is the same for each user state.
According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for performing the above-described method of detecting a user state.
According to still another aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method for detecting a user state described above.
Based on the method and apparatus for detecting a user state provided by the embodiments of the present disclosure, clustering the business behavior characteristic information of a plurality of users to be detected gathers together users to be detected whose business behavior characteristic information differs only slightly (for example, by a small number of days). The user state of the cluster representative user of a cluster obtained by clustering is taken as the user state of each user to be detected in that cluster, which avoids two users to be detected in the same cluster being assigned different user states merely because of a tiny difference in their business behavior characteristic information. The technical solution of the present disclosure thus helps make the detected user state robust to small changes in business behavior characteristic information, and therefore helps improve the accuracy of the user state detection result.
The technical solution of the present disclosure is described in further detail below with reference to the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The disclosure may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic diagram of one embodiment of an applicable scenario of the present disclosure;
FIG. 2 is a flow chart of one embodiment of a method of detecting a user state of the present disclosure;
FIG. 3 is a schematic diagram of one embodiment of a cluster obtained by the clustering process of the present disclosure;
FIG. 4 is a flow chart of one embodiment of training a classifier of the present disclosure;
FIG. 5 is a schematic diagram illustrating the structure of an embodiment of an apparatus for detecting a user state of the present disclosure;
FIG. 6 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present disclosure and not all of the embodiments of the present disclosure, and that the present disclosure is not limited by the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.
It will be appreciated by those of skill in the art that the terms "first," "second," etc. in embodiments of the present disclosure are used merely to distinguish between different steps, devices or modules, etc., and do not represent any particular technical meaning nor necessarily logical order between them.
It should also be understood that in embodiments of the present disclosure, "plurality" may refer to two or more, and "at least one" may refer to one, two or more.
It should also be appreciated that any component, data, or structure referred to in the presently disclosed embodiments may be generally understood as one or more without explicit limitation or the contrary in the context.
In addition, the term "and/or" in this disclosure merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" in the present disclosure generally indicates an "or" relationship between the objects before and after it.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and that the same or similar features may be referred to each other, and for brevity, will not be described in detail.
Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Embodiments of the present disclosure are applicable to electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with the terminal device, computer system, or server, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment. In a distributed cloud computing environment, tasks may be performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.
Summary of the disclosure
In the process of implementing the present disclosure, the inventors found that, when detecting user states, two different users with very similar business behavior characteristic information may be detected as being in different user states. That is, a small change in business behavior characteristic information (e.g., a difference of a few days) may lead to a change in the detected user state. This phenomenon adversely affects the accuracy of user state detection.
If, in the process of detecting user states, a change in user state caused by a small change in business behavior characteristic information can be avoided, the accuracy of the user state detection result improves, and the service provider can therefore provide more appropriate services to the user.
Exemplary overview
An example of applying the user state detection technique provided by the present disclosure to the field of real estate is shown in FIG. 1.
In FIG. 1, a house renting service provider provides house renting services to users through a server 102. Suppose that an APP for house renting is installed on the smartphone 100 of the user 101. When the user 101 needs to rent a house, the user opens the APP on the smartphone 100, the server 102 of the house renting service provider pushes the main page of the APP to the smartphone 100, and the server 102 can then push corresponding page information to the smartphone 100 according to the tap or swipe operations of the user 101. The server 102 of the service provider can obtain the business behavior characteristic information of the user 101 from the historical operation log of the user 101 using the APP, and determine the current user state of the user 101 from that information, so as to provide a more accurate house renting service for the user 101 and ultimately help recommend houses that meet the needs of the user 101.
Exemplary method
FIG. 2 is a flow chart of one embodiment of a method of detecting a user status of the present disclosure. The method of the embodiment shown in fig. 2 comprises the steps of: s200, S201, S202, S203, and S204. The steps are described separately below.
S200, acquiring respective business behavior characteristic information of a plurality of users to be detected according to business data of the plurality of users to be detected.
A user to be detected in the present disclosure may refer to a user whose user state needs to be detected. The business data of a user to be detected in the present disclosure may include information formed from the various actions the user performs with respect to a service provided by a service provider. These actions may include online actions and offline actions. An online action may refer to an action formed by a network access operation of the user (e.g., browsing, leaving a message, logging in, etc.). An offline action may refer to a physical action of the user (e.g., viewing a property on site, consulting by phone, consulting at a store, etc.). The offline actions of a user may be entered into the corresponding system by a staff member of the service provider (e.g., a property broker), thereby forming the user's business data.
In one example, the business data of a user to be detected may include all records in the service provider's log that relate to the user to be detected. The business behavior characteristic information of a user to be detected in the present disclosure may refer to information describing the behavior of the user to be detected with respect to a service provided by a service provider.
The present disclosure may acquire, from the data warehouse of the service provider, the business data of users who performed a corresponding action within a recent time period (such as the last 30 days); the acquired business data is the business data of the users to be detected. The present disclosure may also obtain the business data of the users to be detected in other ways. For example, the business data of each user to be detected may be retrieved from the data warehouse of the service provider by searching (such as searching by the identifier of the user to be detected).
For any user to be detected, the present disclosure may obtain the business behavior characteristic information of the user by looking up the corresponding fields in the user's business data and performing calculation, statistics, or aggregation on those fields (such as date calculation, frequency statistics, and traffic aggregation). The specific content of the business behavior characteristic information of a user to be detected can be set according to actual business conditions.
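As a minimal sketch of the field lookup and statistics described for S200, the following Python snippet derives a few features from log records. The record layout, the action labels, and the feature names are illustrative assumptions and are not taken from the disclosure; the sketch also assumes the user has at least one record.

```python
from collections import Counter
from datetime import date

# Hypothetical log records: (user_id, action, day). The layout and the
# action labels are assumptions made for illustration only.
RECORDS = [
    ("u1", "browse", date(2019, 12, 1)),
    ("u1", "browse", date(2019, 12, 3)),
    ("u1", "message", date(2019, 12, 3)),
    ("u2", "browse", date(2019, 12, 20)),
]

def extract_features(records, user_id, today):
    """Build a small feature dict for one user from that user's records."""
    rows = [r for r in records if r[0] == user_id]
    counts = Counter(action for _, action, _ in rows)  # frequency statistics
    days = [day for _, _, day in rows]
    return {
        "browse_count": counts["browse"],
        "message_count": counts["message"],
        "active_days": len(set(days)),  # number of distinct active days
        "days_since_last_action": (today - max(days)).days,  # date calculation
    }

features_u1 = extract_features(RECORDS, "u1", date(2019, 12, 24))
```

In practice the set of features would be chosen according to actual business conditions, as the description notes.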
S201, clustering the business behavior characteristic information of the plurality of users to be detected to obtain at least one cluster.
The present disclosure may use a density-based clustering algorithm to cluster the business behavior characteristic information of the plurality of users to be detected and obtain at least one cluster. For example, the present disclosure may use the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm to cluster the business behavior characteristic information of the plurality of users to be detected, thereby obtaining one or more clusters. The distance between each node in a cluster and the cluster center node is typically less than a preset distance threshold. For example, the present disclosure may perform the clustering operation according to hyperparameters of the clustering algorithm, such as a preset minimum number of nodes in a cluster and a distance threshold, so as to form clusters in which the distance between each node and the cluster center node is smaller than the preset distance threshold. The distance threshold can be set relatively small and the minimum number of nodes in a cluster relatively large, so that each cluster obtained by the clustering process is a high-density cluster whose nodes are compactly distributed.
One cluster obtained by the clustering process of the present disclosure is shown in fig. 3. The nodes within circle 300 in fig. 3 form a cluster, and the black node 301 therein is the cluster center node.
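The density-based clustering of S201 can be illustrated with a compact, pure-Python DBSCAN. This is a sketch under stated assumptions rather than the implementation used in the disclosure: feature vectors are plain numeric tuples, distance is Euclidean, and the names `eps` (the distance threshold) and `min_pts` (the minimum number of nodes) are chosen here for illustration.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster_id += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point joins the cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point: expand
                queue.extend(j_neighbors)
    return labels

# Two tight groups of users plus one isolated user (made-up feature vectors).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (9.0, 0.0)]
labels = dbscan(points, eps=0.2, min_pts=3)
```

A smaller `eps` together with a larger `min_pts` yields fewer but denser clusters, which matches the hyperparameter guidance in the description.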
S202, acquiring a cluster representative user of the cluster.
Each node in a cluster in the present disclosure corresponds to one user, and different nodes correspond to different users. The cluster representative user of a cluster must first correspond to a node in that cluster; second, that node should be located as close to the center of the cluster as possible. For example, the present disclosure may take the user corresponding to the cluster center node of a cluster as the cluster representative user of the cluster. For another example, the present disclosure may take the user corresponding to the node closest to the cluster center node as the cluster representative user of the cluster.
In general, one cluster has one cluster representative user. The present disclosure does not, however, exclude the case where one cluster has two, three, or more cluster representative users. In addition, the present disclosure may record the correspondence among the user identifier of the cluster representative user, the cluster identifier of the cluster in which the cluster representative user is located, and the user identifiers of the users to be detected corresponding to the other nodes of that cluster, so that in subsequent steps the user states of the corresponding users to be detected can be set according to the user state of the cluster representative user.
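One way to select a cluster representative user and record the correspondence described above is sketched below. Treating the arithmetic mean of the feature vectors as the cluster center is an assumption made for illustration; the disclosure does not specify how the center node is computed. The user identifiers and vectors are made up.

```python
import math

def pick_representative(cluster):
    """cluster maps user_id -> feature vector; return the user nearest the centroid."""
    vectors = list(cluster.values())
    dim = len(vectors[0])
    # Assumed definition of the cluster center: the mean of all feature vectors.
    centroid = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
    return min(cluster, key=lambda uid: math.dist(cluster[uid], centroid))

cluster = {"u1": (0.0, 0.0), "u2": (1.0, 0.0), "u3": (0.0, 1.0), "u4": (0.4, 0.4)}
representative = pick_representative(cluster)

# Record which users to be detected belong with this representative so that
# their user states can be set in a later step.
members_by_representative = {
    representative: [uid for uid in cluster if uid != representative]
}
```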
S203, determining the user state of the cluster representative user according to the business behavior characteristic information of the cluster representative user.
The user life cycle can be set according to actual business conditions, and it may be a non-closed-loop user life cycle. A user life cycle comprises a plurality of user states; that is, the plurality of user states together form the user life cycle.
Each user state in the present disclosure has corresponding features. When the features of a user's business behavior characteristic information are similar to the features corresponding to a certain user state, the user can be considered to be in that user state. The present disclosure may determine the user state of the cluster representative user by performing processing such as classification on the business behavior characteristic information of the cluster representative user.
In the case where one cluster has two, three, or more cluster representative users, the present disclosure may randomly select one of them and determine the user state using the business behavior characteristic information of the randomly selected cluster representative user. The present disclosure may also determine the user state of each cluster representative user according to that user's own business behavior characteristic information.
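The classification step of S203 can be sketched as a thin wrapper around any classifier that returns one score per user state. The `toy_classifier` below is a purely hypothetical stand-in, as are the state names "active" and "dormant" and the 14-day rule; the disclosure does not name specific states or a specific model at this point.

```python
def predict_state(classifier, features):
    """Return (state, score) for the highest-scoring user state."""
    scores = classifier(features)  # expected shape: {state_name: score}
    state = max(scores, key=scores.get)
    return state, scores[state]

def toy_classifier(features):
    """Hypothetical stand-in for a trained classifier (illustration only)."""
    if features["days_since_last_action"] > 14:
        return {"active": 0.2, "dormant": 0.8}
    return {"active": 0.9, "dormant": 0.1}

state, score = predict_state(toy_classifier, {"days_since_last_action": 21})
```

Returning the score alongside the state is a design choice of this sketch; it makes the result usable when several cluster representative users of one cluster need to be compared.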
S204, determining the user state of each user to be detected in the cluster in which the cluster representative user is located according to the user state of the cluster representative user.
A cluster representative user in the present disclosure corresponds to only one cluster. The users corresponding to the nodes in a cluster include cluster representative users and non-cluster-representative users, both of which are users to be detected. All users corresponding to the nodes in one cluster typically have one and the same user state.
In the case where a cluster has one cluster representative user, the present disclosure may take the user state of that cluster representative user as the user state of all non-representative users in the cluster.
In the case where one cluster has two, three, or more cluster representative users, the present disclosure may compare the credibility of the user states of these cluster representative users and take the user state with the highest credibility as the user state of all users to be detected corresponding to the nodes in the cluster. For example, assume that one cluster has three cluster representative users and that, according to their service behavior feature information, the present disclosure determines that the user state of the first cluster representative user is user state 1 with a credibility of 0.7, the user state of the second is user state 2 with a credibility of 0.65, and the user state of the third is user state 3 with a credibility of 0.8. In this case, the present disclosure may take user state 3 as the user state of all users to be detected corresponding to the nodes in the cluster where the cluster representative users are located.
In the case where one cluster has two, three, or more cluster representative users, the present disclosure may also take the majority user state as the user state of all users to be detected corresponding to the nodes in the cluster. For example, assume that one cluster has three cluster representative users, two of which have user state 1 and the third of which has user state 2. Since user state 1 is the majority user state, the present disclosure may take user state 1 as the user state of all users to be detected corresponding to the nodes in the cluster where the cluster representative users are located.
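The two selection rules described above (highest credibility and majority state) can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names and the (state, credibility) pair layout are hypothetical, and the disclosure does not say how a tie in the majority rule is resolved (here the state seen first wins).

```python
from collections import Counter

def state_by_highest_credibility(predictions):
    """predictions: list of (user_state, credibility) pairs, one per
    cluster representative user (hypothetical layout)."""
    state, _ = max(predictions, key=lambda p: p[1])
    return state

def state_by_majority(predictions):
    """Return the user state held by most representatives.
    Assumption: on a tie, the state encountered first is returned."""
    counts = Counter(state for state, _ in predictions)
    return counts.most_common(1)[0][0]

# Example from the text: credibilities 0.7, 0.65 and 0.8.
reps = [("user_state_1", 0.70), ("user_state_2", 0.65), ("user_state_3", 0.80)]
chosen = state_by_highest_credibility(reps)

# Example from the text: two representatives in state 1, one in state 2.
majority = state_by_majority(
    [("user_state_1", 0.6), ("user_state_1", 0.7), ("user_state_2", 0.9)]
)
```

With the inputs above, the first rule selects user state 3 and the second selects user state 1, matching the two worked examples in the text.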
Where the present disclosure records the correspondence among the user identifier of the cluster representative user, the cluster identifier of the cluster where the cluster representative user is located, and the user identifiers of the users to be detected corresponding to the other nodes of that cluster, the user identifiers of the users to be detected corresponding to the other nodes can be obtained from the user identifier of the cluster representative user and the cluster identifier, and user states can be set for those users. The present disclosure may also record the correspondence among the user identifier of the cluster representative user, the cluster identifier, the user identifiers of the users to be detected corresponding to the other nodes of the cluster, and the user states.
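As a rough sketch of how such a recorded correspondence might be used, the snippet below propagates a per-cluster user state to every member. The dictionary layout and all identifiers are hypothetical; the disclosure does not prescribe a storage format.

```python
def propagate_states(cluster_members, cluster_state):
    """cluster_members: cluster id -> list of user ids in that cluster.
    cluster_state: cluster id -> user state determined from its
    representative user(s). Both layouts are assumptions."""
    user_state = {}
    for cluster_id, user_ids in cluster_members.items():
        if cluster_id not in cluster_state:
            continue  # no representative state was determined for this cluster
        for user_id in user_ids:
            user_state[user_id] = cluster_state[cluster_id]
    return user_state

cluster_members = {"c1": ["u1", "u2", "u3"], "c2": ["u4"]}
cluster_state = {"c1": "online_active"}
assigned = propagate_states(cluster_members, cluster_state)
```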
By clustering the business behavior feature information of the users to be detected, the present disclosure gathers together users to be detected whose business behavior feature information differs only slightly (for example, small day-to-day changes). The user state of the cluster representative user is then used as the user state of every user to be detected in the cluster where the cluster representative user is located. This avoids the situation in which two users to be detected that belong to the same cluster are assigned different user states merely because of small changes in their business behavior feature information, and so helps make the user state robust to such small changes. The technical solution provided by the present disclosure therefore helps improve the accuracy of the user state detection result.
In an optional example, the disclosure may obtain one or more of business behavior feature information based on time invariant attribute, business behavior feature information based on business behavior times and/or business behavior resource consumption statistics, and business behavior feature information based on time sequence variables of a user to be detected by performing calculation, statistical summary (such as date calculation, times statistics, flow summary, etc.) and the like on corresponding fields in the business data of the user to be detected. The content specifically included in the service behavior feature information of the user to be detected may be set according to an actual service condition, which is only illustrated by way of example.
Optionally, the business behavior feature information of the user to be detected based on the time invariant attribute may include: basic information about the user-side software and hardware used, within a time period, when the actions performed by the user to be detected generated the business data, where this basic information does not change over time within the time period. For example, the business behavior feature information based on the time invariant attribute may include: the type of terminal device used by the user to be detected for network access (such as the model of a smartphone), the source channel of the application used by the user to be detected (such as the download source of the user's APP), and the like.
Optionally, the business behavior feature information of the user to be detected based on the business behavior times and/or the business behavior resource consumption statistics may refer to: the results obtained by separately performing statistical processing on a plurality of business-behavior-related fields in the business data of the user to be detected, according to the actual requirements of the specific business. For example, for the real estate field, this feature information may include: the user's PV (Page View) count, the user's page browsing time, the number of house viewings by the user in the last N days, the number of delegations by the user in the last N days, the number of times the user generated business opportunities, the number of business opportunities the user generated in the last N days, and the like.
Optionally, the business behavior feature information of the user to be detected based on time sequence variables may refer to: information that reflects, on the time axis, when the user to be detected performed the corresponding actions. For example, for the real estate field, this feature information may include: the time of the user's most recent online or on-site visit, the time of the user's first house viewing, and the like.
By processing the business data of the user to be detected to obtain business behavior feature information based on the time invariant attribute, business behavior feature information based on the business behavior times and/or the business behavior resource consumption statistics, and business behavior feature information based on time sequence variables, the present disclosure can reflect the business behavior features of the user to be detected from multiple angles, which helps improve the accuracy of the clustering processing.
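The three kinds of feature information described above can be illustrated with a small sketch that derives them from a list of event records. The record keys ("time", "action", "device", "channel"), the action names and the chosen features are hypothetical placeholders, not fields defined by the disclosure.

```python
from datetime import datetime, timedelta

def build_features(events, now, n_days=7):
    """events: list of dicts with hypothetical keys
    'time', 'action', 'device' and 'channel'."""
    recent_start = now - timedelta(days=n_days)
    viewings = [e["time"] for e in events if e["action"] == "house_viewing"]
    return {
        # Time-invariant attributes (assumed constant within the period).
        "device": events[0]["device"],
        "channel": events[0]["channel"],
        # Behavior counts / resource consumption statistics.
        "pv_count": sum(1 for e in events if e["action"] == "page_view"),
        "viewings_last_n_days": sum(1 for t in viewings if t >= recent_start),
        # Time sequence variables.
        "days_since_last_visit": (now - max(e["time"] for e in events)).days,
        "days_since_first_viewing": (now - min(viewings)).days if viewings else None,
    }

base = {"device": "phone_model_x", "channel": "app_store"}
events = [
    {**base, "time": datetime(2019, 12, 1), "action": "page_view"},
    {**base, "time": datetime(2019, 12, 10), "action": "page_view"},
    {**base, "time": datetime(2019, 12, 20), "action": "house_viewing"},
    {**base, "time": datetime(2019, 12, 22), "action": "page_view"},
]
features = build_features(events, now=datetime(2019, 12, 24))
```

In practice the resulting dictionary would still need to be encoded numerically (for example, categorical values mapped to integers) before clustering; that step is omitted here.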
In an optional example, the present disclosure may screen all clusters obtained by the clustering processing and select cluster representative users only from the clusters that satisfy the screening requirement. The screening requirement may be a requirement on the number of cluster nodes contained in a cluster.
Optionally, the present disclosure may screen out, from all clusters obtained by the clustering processing, the clusters that contain no more than a predetermined number (e.g., 100) of cluster nodes; the screened-out clusters may be referred to as high-density small clusters. The user to be detected corresponding to the central node of a screened-out high-density small cluster may then be used as the cluster representative user. Of course, the present disclosure may also use the users to be detected corresponding to the central node of a screened-out high-density small cluster and the n (e.g., n=1 or 2) nodes nearest to the central node as cluster representative users. In addition, the present disclosure may use the users to be detected corresponding to the n (e.g., n=1 or 2) nodes nearest to the central node of a screened-out high-density small cluster as cluster representative users. For clusters that contain more than the predetermined number of cluster nodes, the present disclosure may skip the selection of cluster representative users.
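The screening and selection described above might look like the following sketch. It assumes, for illustration only, that the "central node" is the member closest to the cluster's mean vector and that distance is Euclidean; the disclosure does not fix either choice, and the function and variable names are hypothetical.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def pick_representatives(clusters, max_nodes=100, n_extra=0):
    """clusters: cluster id -> non-empty list of (user_id, feature_vector).
    Returns cluster id -> [central user] + the n_extra users nearest to it.
    Clusters with more than max_nodes nodes are skipped."""
    reps = {}
    for cluster_id, members in clusters.items():
        if len(members) > max_nodes:
            continue  # not a high-density small cluster
        dim = len(members[0][1])
        centroid = [sum(vec[i] for _, vec in members) / len(members)
                    for i in range(dim)]
        # Assumption: the central node is the member nearest to the centroid.
        central = min(members, key=lambda m: euclidean(m[1], centroid))
        others = sorted((m for m in members if m is not central),
                        key=lambda m: euclidean(m[1], central[1]))
        reps[cluster_id] = [central[0]] + [uid for uid, _ in others[:n_extra]]
    return reps

clusters = {
    "c1": [("u1", (0.0, 0.0)), ("u2", (1.0, 0.0)), ("u3", (2.0, 0.0))],
    "c2": [("u4", (5.0, 5.0)), ("u5", (5.0, 6.0)),
           ("u6", (6.0, 5.0)), ("u7", (6.0, 6.0))],
}
# A tiny threshold of 3 is used here only so that "c2" gets filtered out.
reps = pick_representatives(clusters, max_nodes=3, n_extra=1)
```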
By limiting the number of cluster nodes contained in a cluster, the present disclosure helps ensure that the business behavior feature information of all users to be detected in the same cluster differs only slightly. This avoids the situation in which a cluster containing too many cluster nodes includes users to be detected whose business behavior feature information differs considerably, and thereby helps improve the accuracy of the user state detection result.
In an optional example, the present disclosure may use a preset classifier to determine the user state of the cluster representative user. Specifically, the present disclosure may provide the service behavior feature information of the cluster representative user as input to a classifier for predicting user states, so that the classifier performs state classification prediction on that information and outputs a classification prediction result. The present disclosure may then determine the user state of the cluster representative user according to the classification prediction result output by the classifier.
Optionally, the classifier of the present disclosure may be a machine-learning-based multi-class classifier. For example, the classifier may be an XGBoost-based multi-class classifier. The types of user states that the classifier can predict are generally determined by how the classifier is trained; the training process is described below with reference to fig. 4.
Optionally, the classification prediction result output by the classifier of the present disclosure may be the respective credibilities (i.e., confidence levels) of all user states. For example, assuming that the preset user states are user state 1, user state 2 and user state 3, the classifier outputs three credibilities for the input business behavior feature information of the cluster representative user: the first represents the probability that the cluster representative user is in user state 1, the second the probability that the cluster representative user is in user state 2, and the third the probability that the cluster representative user is in user state 3. The main objective of training the classifier in the present disclosure is to optimize the credibilities output by the classifier.
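One plausible way to turn the per-state credibilities into a single user state is to take the state with the highest credibility. The disclosure only says the state is determined "according to" the classification prediction result, so the arg-max rule and the names below are assumptions made for illustration.

```python
STATES = ["user_state_1", "user_state_2", "user_state_3"]

def state_from_credibilities(credibilities, states=STATES):
    """credibilities: one probability per state, in the same order as states.
    Assumption: the predicted state is the one with the highest credibility."""
    best = max(range(len(states)), key=lambda i: credibilities[i])
    return states[best], credibilities[best]

# Hypothetical classifier output for one cluster representative user.
state, credibility = state_from_credibilities([0.15, 0.25, 0.60])
```

Returning the credibility alongside the state keeps the value available for the highest-credibility comparison among several cluster representative users.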
By using the classifier to predict the user state, the present disclosure can obtain the user state of the cluster representative user conveniently and accurately.
In an optional example, one example of training the classifier in the present disclosure is shown in fig. 4.
In fig. 4, S400, service data of seed users in each user state is acquired respectively.
Optionally, where the business data of a user indicates that the user was once in a user state, the present disclosure may regard the user as a seed user of that user state. That is, for any user state, a seed user of that user state is a user who was in that user state at some historical time.
Optionally, the present disclosure may determine the seed users of each user state by using preset state flag information or state conditions corresponding to each user state. For example, for any user, the present disclosure may determine whether the business data of the user includes state flag information, or whether related information in the business data satisfies a state condition. If the business data includes the state flag information of a user state, or related information in the business data satisfies the state condition of a user state, the present disclosure may take the user as a seed user of the user state corresponding to that state flag information or state condition. The state flag information may be, for example, a specific value of a field in the business data. The state condition may be, for example, that the actual value of a corresponding field (e.g., a time field) in the business data satisfies a corresponding condition (e.g., a time condition). The state flag information and state conditions should be set according to actual business conditions, which is not limited by the present disclosure.
Optionally, for the field of real estate, assume that all user states in the present disclosure include: an online lead-in period, a silent period, an online active period, an online maturity period, an offline lead-in period, an offline active period, and an offline maturity period. If the business data of a user includes information indicating that the user accessed the house renting service through an APP or a similar tool for the first time within a predetermined time range (e.g., 180 days), the user may be a seed user of the online lead-in period. If the business data of a user indicates that the time (e.g., in days) between the user's last access to the house renting service through an APP or other tool and the current time exceeds a predetermined interval (e.g., 14 days), the user may be a seed user of the silent period. If the business data of a user indicates that the user performed business activities (e.g., accessing the house renting service through an APP or the like, or visiting on site) within a predetermined time range before the current time (e.g., the previous week), the user may be a seed user of the online active period. A user may jump from the online lead-in period to the online active period. If the business data of a user includes information indicating that the user's first online delegation has occurred (e.g., through online activity, the user makes a property broker the user's dedicated agent), the user may be a seed user of the online maturity period. A user may jump from the online active period to the online maturity period. If the business data of a user includes information indicating that the user's first offline delegation has occurred (e.g., a property broker sets the user as the broker's exclusive user), the user may be a seed user of the offline lead-in period. A user may jump from the online active period to the offline lead-in period. If the business data of a user includes information indicating that the user has viewed a listing on site for the first time, the user may be a seed user of the offline active period. A user may jump from the online maturity period or the offline lead-in period to the offline active period. If the business data of a user includes information indicating that the user has completed a rental or sale transaction for a listing, the user may be a seed user of the offline maturity period. A user may jump from the offline active period to the offline maturity period. In addition, a user may jump from the silent period to the online active period or the offline lead-in period. The above seven user states, their seed users, and the jumps between them are only examples; the user states, the seed users of each user state, and the jumps between user states may be set according to actual business conditions, which is not limited by the present disclosure.
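The example jumps between the seven user states listed above can be written down as a small transition table. This is only an illustrative encoding: the state names are hypothetical identifiers, and the table contains just the jumps the text mentions (the text does not list jumps into the silent period, so none are included).

```python
# Jumps named in the example above: source state -> allowed target states.
ALLOWED_JUMPS = {
    "online_lead_in": {"online_active"},
    "online_active": {"online_maturity", "offline_lead_in"},
    "online_maturity": {"offline_active"},
    "offline_lead_in": {"offline_active"},
    "offline_active": {"offline_maturity"},
    "offline_maturity": set(),
    "silent": {"online_active", "offline_lead_in"},
}

def can_jump(src, dst):
    """Return True if the example life cycle lists a jump from src to dst."""
    return dst in ALLOWED_JUMPS.get(src, set())
```

For example, `can_jump("silent", "online_active")` is true, while `can_jump("online_lead_in", "offline_maturity")` is false because the example lists no direct jump between those two states.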
Optionally, the present disclosure may first obtain the business data of a plurality of users from the data warehouse of a service provider, and then use the state flag information or state conditions of each user state to determine whether each user can be a seed user of the corresponding user state, thereby obtaining the business data of the seed users of the various user states. By setting seed users for the various user states, the present disclosure helps form training samples quickly and accurately.
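A minimal sketch of seed-user determination follows, with one state flag ("deal_closed") and two time-based state conditions. The record keys, the rule set and the use of the 180-day and 14-day example thresholds are all illustrative assumptions rather than the disclosure's actual rules.

```python
from datetime import date

def seed_states(record, today):
    """Return the user states for which this user qualifies as a seed user.
    record uses hypothetical keys: 'first_access', 'last_access', 'deal_closed'."""
    rules = {
        # State condition: first access within the last 180 days.
        "online_lead_in": lambda r: (today - r["first_access"]).days <= 180,
        # State condition: no access for more than 14 days.
        "silent": lambda r: (today - r["last_access"]).days > 14,
        # State flag: a field value marking a completed transaction.
        "offline_maturity": lambda r: r.get("deal_closed", False),
    }
    return [state for state, rule in rules.items() if rule(record)]

record = {
    "first_access": date(2019, 1, 1),
    "last_access": date(2019, 11, 1),
    "deal_closed": True,
}
matched = seed_states(record, today=date(2019, 12, 24))
```

A single user can match several rules, which is consistent with the text's remark that one user's business data may show the user passing through several user states in sequence.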
In addition, it should be noted that the business data collection time windows of seed users in different user states are usually different. Within collection time windows of the same duration, some user states have many seed users while others (such as the offline maturity period) have few, which may leave some user states with few training samples; if the classifier is sensitive to imbalance in the number of samples across classes, the training effect of the classifier may be greatly affected. By using different business data collection time windows for seed users in different user states, for example a longer collection time window for the offline maturity period than for the other user states, the present disclosure helps avoid having too few training samples for some user states, and thereby helps improve the training effect of the classifier.
S401, forming service behavior characteristic information of the seed users in each user state according to service data of the seed users in each user state.
For any seed user in any user state, the present disclosure may obtain the service behavior feature information of the seed user from the service data that precedes the time point of the state flag information in the service data of the seed user. The present disclosure may also obtain the service behavior feature information of the seed user from the service data that precedes the time point at which the state condition is satisfied. Where the service data of one user indicates that the user has been in a plurality of user states in sequence, the service data between the time points of two consecutive user states can form the service behavior feature information of the seed user for the latter user state.
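The windowing rule above can be sketched as follows: each state's features are built only from data recorded before that state's time point and, when several states occur in sequence, after the previous state's time point. Times are plain day numbers for brevity, and whether the boundary instants are included is an assumption of this sketch.

```python
def feature_windows(events, state_times):
    """events: list of (time, payload) tuples.
    state_times: list of (time, user_state) tuples sorted by time.
    Returns user_state -> events usable for that state's features."""
    windows = {}
    previous_time = None
    for state_time, state in state_times:
        windows[state] = [
            e for e in events
            # Assumption: include the previous state's instant, exclude this one.
            if e[0] < state_time and (previous_time is None or e[0] >= previous_time)
        ]
        previous_time = state_time
    return windows

events = [(1, "a"), (3, "b"), (5, "c"), (8, "d")]
state_times = [(4, "online_active"), (7, "online_maturity")]
windows = feature_windows(events, state_times)
```

Excluding data at or after the state's time point keeps information that only exists once the state has been reached out of the training features.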
Optionally, for any seed user, the disclosure may obtain the service behavior feature information of the seed user by searching a corresponding field in the service data of the seed user, and performing calculation, statistics and summarization (such as date calculation, frequency statistics and flow summarization) and the like on the searched corresponding field. The content specifically included in the service behavior characteristic information of the seed user can be set according to the actual service condition. For example, the present disclosure may obtain the business behavior feature information of the seed user based on the time-invariant attribute, the business behavior feature information based on the business behavior times and/or the business behavior resource consumption statistics, and the business behavior feature information based on the time-series variable by performing computation, statistics summary (such as date computation, times statistics, traffic summary, etc.) and the like on the corresponding fields in the business data of the seed user.
Optionally, the business behavior feature information of the seed user based on the time invariant attribute may include: basic information about the user-side software and hardware used, within a time period, when the actions performed by the seed user generated the business data, where this basic information does not change over time within the time period. For example, the business behavior feature information based on the time invariant attribute may include: the type of terminal device used by the seed user for network access (such as the model of a smartphone), the source channel of the application used by the seed user (such as the download source of the seed user's APP), and the like.
Optionally, the business behavior feature information of the seed user based on the business behavior times and/or the business behavior resource consumption statistics may refer to: the results obtained by separately performing statistical processing on a plurality of business-behavior-related fields in the business data of the seed user, according to the actual requirements of the specific business. For example, for the real estate field, this feature information may include: the seed user's PV count, the seed user's page browsing time, the number of house viewings by the seed user in the last N days, the number of delegations by the seed user in the last N days, the number of times the seed user generated business opportunities in the last N days, the number of business opportunities the seed user generated in the last N days, and the like.
Optionally, the business behavior feature information of the seed user based on time sequence variables may refer to: information that reflects, on the time axis, when the seed user performed the corresponding actions. For example, for the real estate field, this feature information may include: the time of the seed user's most recent online or on-site visit, the time of the seed user's first house viewing, and the like.
S402, generating a plurality of training samples according to the business behavior characteristic information of the seed users in each user state.
Optionally, the business behavior feature information of each seed user corresponds to a piece of state flag information or to a state condition, and each piece of state flag information and each state condition corresponds to a user state. The present disclosure may determine the user state corresponding to the business behavior feature information of a seed user from the corresponding state flag information or state condition, and set the corresponding user state label for that feature information, so that the business behavior feature information of the seed user and the user state label together form a training sample.
S403, training the classifier by using a plurality of training samples.
Optionally, the present disclosure may provide the plurality of training samples corresponding to each user state to the classifier. The classifier performs classification prediction on each input training sample and outputs a classification prediction result for it, that is, the probability (which may be regarded as the credibility) that the training sample belongs to each user state. The present disclosure may use a corresponding loss function to calculate the loss from the difference between the classification prediction result output by the classifier for each training sample and the user state label of that training sample, and adjust the model parameters of the classifier using the calculated loss.
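The text does not name the loss function. As one common choice for multi-class probability outputs, the sketch below computes the mean multi-class log loss (cross-entropy); treating this as the "corresponding loss function" is purely an assumption for illustration.

```python
import math

def cross_entropy(probabilities, labels):
    """probabilities: per-sample list of predicted probabilities per user state.
    labels: per-sample index of the labeled user state.
    Returns the mean negative log-probability assigned to the true labels."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for probs, label in zip(probabilities, labels):
        total -= math.log(max(probs[label], eps))
    return total / len(labels)

# Two samples, two states: the first is predicted perfectly, the second at 0.5.
loss = cross_entropy([[1.0, 0.0], [0.5, 0.5]], [0, 1])
```

The loss shrinks as the probability assigned to each labeled state approaches 1, which is in line with the stated training goal of optimizing the credibilities output by the classifier.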
Optionally, the number of training samples provided to the classifier is the same for each user state. For example, where the user states in the present disclosure are the seven user states exemplified above, the total number of training samples provided to the classifier is 7×M (M is an integer greater than 0, e.g., M is 30,000), and the number of training samples provided to the classifier for each user state is M.
Of course, the numbers of training samples provided to the classifier for the individual user states may be approximately the same rather than exactly the same. For example, the numbers of training samples for any two user states do not differ by more than a predetermined difference. As another example, the ratio of the numbers of training samples for any two user states is no greater than a predetermined ratio.
Under the condition that the classifier is sensitive to unbalanced quantity of training samples of different categories, the training effect of the classifier is improved by enabling the quantity of the training samples corresponding to each user state provided to the classifier to be identical or approximately identical.
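One simple way to obtain the same number of training samples per user state is to randomly down-sample every state to a common size M. The sketch below does this; down-sampling is an assumed technique (the disclosure also mentions longer collection windows for rare states), and the names are hypothetical.

```python
import random

def balance_samples(samples_by_state, m=None, seed=0):
    """samples_by_state: user state -> list of training samples.
    Down-samples each state to m samples; if m is None, the size of the
    smallest state is used. A fixed seed keeps the draw reproducible."""
    rng = random.Random(seed)
    if m is None:
        m = min(len(samples) for samples in samples_by_state.values())
    return {
        state: rng.sample(samples, min(m, len(samples)))
        for state, samples in samples_by_state.items()
    }

samples_by_state = {
    "online_active": list(range(10)),
    "offline_maturity": list(range(100, 104)),
}
balanced = balance_samples(samples_by_state)
```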
Optionally, the present disclosure may divide the training samples into a training set and a test set, and the present disclosure may train the classifier using the training samples in the training set, and detect the classification effect of the classifier using the training samples in the test set. When the detection result does not meet the requirement, the classifier can be continuously trained by using the training samples in the training set.
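The division into a training set and a test set could be done with a shuffled split such as the one below. The 80/20 ratio, the fixed seed and the function name are assumptions; the disclosure does not specify how the samples are divided.

```python
import random

def split_samples(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and split it into (training set, test set)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train_set, test_set = split_samples(list(range(10)))
```

The classifier would then be fitted on `train_set` and evaluated on `test_set`, with training continuing while the evaluation result does not meet the requirement.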
It should be noted that more training samples are not always better: once the number of training samples exceeds a certain amount, the classifier can no longer learn new knowledge from them, so the number of training samples should be kept within a certain limit (e.g., 30,000). Furthermore, the present disclosure may perform clustering processing on the training samples to obtain a plurality of clusters and, through multidimensional analysis of the training samples in each cluster, remove training samples that contradict expectations or correct the user state labels of some training samples, so as to eliminate the influence of unsuitable training samples on the training result of the classifier. In addition, the user corresponding to a training sample of the present disclosure may also be used as a user to be detected, that is, the training sample may also be used as the business behavior feature information of a user to be detected.
Exemplary apparatus
Fig. 5 is a schematic structural diagram of an embodiment of an apparatus for detecting a user status of the present disclosure. The apparatus of this embodiment may be used to implement the method embodiments of the present disclosure described above.
As shown in fig. 5, the apparatus of this embodiment may include: an acquisition information module 500, a cluster processing module 501, an acquisition representative user module 502, a first determination status module 503, and a second determination status module 504. Optionally, the apparatus of this embodiment may further include: training module 505 and determine seed user module 506.
The acquiring information module 500 is configured to acquire respective service behavior feature information of a plurality of users to be detected according to service data of the plurality of users to be detected. For example, the acquiring information module 500 may acquire at least one of the business behavior feature information based on the time invariant attribute, the business behavior feature information based on the business behavior times and/or the business behavior resource consumption statistics, and the business behavior feature information based on the time sequence variable of the plurality of users to be detected according to the business data of the plurality of users to be detected.
The clustering module 501 is configured to perform clustering processing on the service behavior feature information of each of the plurality of users to be detected, which is acquired by the acquiring information module 500, to obtain at least one cluster.
The acquisition representative user module 502 is configured to acquire the cluster representative users of the clusters obtained by the cluster processing module 501. For example, the acquisition representative user module 502 may screen out clusters including no more than a predetermined number of cluster nodes from all clusters obtained by the cluster processing module 501; the acquisition representative user module 502 may use the user to be detected corresponding to the central node of a screened cluster as a cluster representative user.
The first determining state module 503 is configured to determine the user state of the cluster representative user according to the service behavior feature information of the cluster representative user obtained by the acquisition representative user module 502. For example, the first determining state module 503 may provide the business behavior feature information of the cluster representative user as input to a classifier for predicting the user state; the first determining state module 503 may determine the user state of the cluster representative user according to the classification prediction result output by the classifier.
The second determining state module 504 is configured to determine, according to the user state of the cluster representative user determined by the first determining state module 503, the user state of each user to be detected in the cluster where the cluster representative user is located.
The training module 505 is configured to obtain service data of the seed users in each user state, form service behavior feature information of the seed users in each user state according to that service data, generate a plurality of training samples according to the service behavior feature information of the seed users in each user state, and train the classifier by using the plurality of training samples. For any user state, a seed user of that user state is a user who was in that user state at a historical time.
Optionally, for any seed user in any user state, the training module 505 may obtain the service behavior feature information of the seed user according to the service data that, in the service data of the seed user, precedes the time point of the state flag information or precedes the time point at which the state condition is satisfied.
Optionally, for any state flag information or any state condition, the training module 505 may set a user state label for the service behavior feature information of the corresponding seed user according to the user state corresponding to the state flag information or the state condition, so as to generate a training sample.
Optionally, the training module 505 may provide the training samples corresponding to each user state to the classifier, and adjust the model parameters of the classifier according to the differences between the classification prediction results output by the classifier for the training samples and the user state labels of the corresponding training samples. The number of training samples provided to the classifier by the training module 505 is the same for each user state.
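The balanced training described above might be sketched as follows. The disclosure does not name a classifier type; a multi-class perceptron is substituted here purely as an assumption because its update rule directly reflects "adjusting parameters according to the difference between prediction and label". The helper names `balance`, `train_perceptron`, and `predict`, and the toy samples, are likewise hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, user state label)


def balance(samples: List[Sample], seed: int = 0) -> List[Sample]:
    """Downsample so that every user state contributes the same number of samples."""
    by_state: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_state[sample[1]].append(sample)
    n = min(len(group) for group in by_state.values())
    rng = random.Random(seed)
    balanced: List[Sample] = []
    for group in by_state.values():
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced


def train_perceptron(
    samples: List[Sample], states: List[str], epochs: int = 10
) -> Dict[str, List[float]]:
    """Multi-class perceptron: weights change only when a prediction is wrong."""
    dim = len(samples[0][0])
    weights = {state: [0.0] * dim for state in states}
    for _ in range(epochs):
        for x, label in samples:
            predicted = predict(weights, x)
            if predicted != label:
                for i, value in enumerate(x):
                    weights[label][i] += value      # pull the true state closer
                    weights[predicted][i] -= value  # push the wrong state away
    return weights


def predict(weights: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Return the state whose weight vector scores the input highest."""
    return max(weights, key=lambda s: sum(w * v for w, v in zip(weights[s], x)))


# Hypothetical, deliberately unbalanced labelled samples (3 vs. 2).
raw = [
    ([2.0, 0.0], "active"), ([3.0, 1.0], "active"), ([4.0, 1.0], "active"),
    ([0.0, 2.0], "churned"), ([1.0, 3.0], "churned"),
]
balanced = balance(raw)
weights = train_perceptron(balanced, ["active", "churned"])
```

Downsampling is only one way to equalize the per-state sample counts; oversampling the smaller states would satisfy the same condition.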
The seed user determining module 506 is configured to determine, according to preset state flag information or a preset state condition corresponding to each user state, users whose business data contains the state flag information or satisfies the state condition, and to use the determined users as seed users of the corresponding user state.
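For illustration, the selection performed by the seed user determining module 506 could be sketched as below, with each user state mapped to a predicate over a user's business data. The record layout, the state names (`deal_closed`, `high_intent`), and the two rules are hypothetical examples of a state flag and a state condition, not values from the disclosure.

```python
from typing import Callable, Dict, List, Set

Record = Dict[str, str]  # one business-data entry; hypothetical layout
Rule = Callable[[List[Record]], bool]


def find_seed_users(
    business_data: Dict[str, List[Record]],
    state_rules: Dict[str, Rule],
) -> Dict[str, Set[str]]:
    """Return, for each user state, the users whose data matches its rule."""
    seeds: Dict[str, Set[str]] = {state: set() for state in state_rules}
    for user, records in business_data.items():
        for state, rule in state_rules.items():
            if rule(records):
                seeds[state].add(user)
    return seeds


# Hypothetical rules: a state flag (a "contract_signed" event) and a state
# condition (at least three "view_listing" events).
state_rules: Dict[str, Rule] = {
    "deal_closed": lambda recs: any(r["event"] == "contract_signed" for r in recs),
    "high_intent": lambda recs: sum(r["event"] == "view_listing" for r in recs) >= 3,
}
business_data = {
    "u1": [{"event": "search"}, {"event": "contract_signed"}],
    "u2": [{"event": "view_listing"}] * 3,
    "u3": [{"event": "search"}],
}
seeds = find_seed_users(business_data, state_rules)
```

Users that match no rule, such as `u3` here, are simply not used as seed users.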
Exemplary electronic device
An electronic device according to an embodiment of the present disclosure is described below with reference to Fig. 6. Fig. 6 shows a block diagram of an electronic device according to an embodiment of the disclosure. As shown in Fig. 6, the electronic device 61 includes one or more processors 611 and memory 612.
Processor 611 may be a Central Processing Unit (CPU) or other form of processing unit having the capability to detect user status and/or instruction execution capability, and may control other components in electronic device 61 to perform desired functions.
Memory 612 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 611 to implement the methods for detecting a user state of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, and the like may also be stored in the computer-readable storage medium.
In one example, the electronic device 61 may further include: input device 613, and output device 614, etc., interconnected by a bus system and/or other forms of connection mechanisms (not shown). In addition, the input device 613 may include, for example, a keyboard, a mouse, and the like. The output device 614 can output various information to the outside. The output devices 614 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.
Of course, for simplicity, only some of the components of the electronic device 61 that are relevant to the present disclosure are shown in Fig. 6; components such as buses and input/output interfaces are omitted. In addition, the electronic device 61 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer readable storage Medium
In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform steps in a method of detecting a user state according to various embodiments of the present disclosure described in the above "exemplary methods" section of this specification.
The computer program product may include program code for performing the operations of embodiments of the present disclosure, written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium, having stored thereon computer program instructions, which when executed by a processor, cause the processor to perform steps in a method of detecting a user state according to various embodiments of the present disclosure described in the above "exemplary method" section of the present disclosure.
The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present disclosure have been described above in connection with specific embodiments; however, it should be noted that the advantages, benefits, and effects mentioned in the present disclosure are merely examples and are not limiting, and they are not to be considered as necessarily possessed by every embodiment of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only and are not intended to be limiting, since the disclosure is not necessarily limited to practice with the specific details described.
In this specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. The system embodiments are described relatively briefly because they essentially correspond to the method embodiments; for relevant points, refer to the description of the method embodiments.
The block diagrams of the devices, apparatuses, equipment, and systems referred to in this disclosure are merely illustrative examples and are not intended to require or imply that connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," and "having" are open-ended words that mean "including but not limited to" and are used interchangeably therewith. The terms "or" and "and" as used herein refer to, and are used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
The methods and apparatus of the present disclosure may be implemented in a number of ways, for example by software, hardware, firmware, or any combination thereof. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the apparatus, devices, and methods of the present disclosure, components or steps may be decomposed and/or recombined. Such decomposition and/or recombination should be considered equivalent to the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects, and the like, will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims (18)

1. A method of detecting a user state, comprising:
acquiring respective business behavior characteristic information of a plurality of users to be detected according to business data of the plurality of users to be detected;
clustering the business behavior characteristic information of each of the plurality of users to be detected to obtain at least one cluster;
acquiring a cluster representative user of the cluster;
determining the user state of the cluster representative user according to the business behavior characteristic information of the cluster representative user;
determining the user state of each user to be detected in the cluster where the cluster representative user is located according to the user state of the cluster representative user;
wherein the acquiring a cluster representative user of the cluster comprises: screening, from the at least one cluster, clusters containing no more than a preset number of cluster nodes; and taking the user to be detected corresponding to the central node of the screened cluster as a cluster representative user.
2. The method of claim 1, wherein the acquiring respective business behavior characteristic information of the plurality of users to be detected according to the business data of the plurality of users to be detected comprises:
acquiring, according to the business data of the plurality of users to be detected, at least one of the following for the plurality of users to be detected: business behavior characteristic information based on time-invariant attributes; business behavior characteristic information based on the number of business behaviors and/or on business behavior resource consumption statistics; and business behavior characteristic information based on time-series variables.
3. The method according to claim 1 or 2, wherein the determining the user state of the cluster representative user according to the business behavior characteristic information of the cluster representative user comprises:
providing the business behavior characteristic information of the cluster representative user to a classifier for predicting the user state, as an input of the classifier;
and determining the user state of the cluster representative user according to the classification prediction result output by the classifier.
4. A method according to claim 3, wherein the method further comprises:
respectively acquiring service data of seed users in each user state;
forming service behavior characteristic information of the seed users in each user state according to the service data of the seed users in each user state;
generating a plurality of training samples according to the business behavior characteristic information of the seed users in each user state;
training the classifier by using the plurality of training samples;
wherein, for any user state, the seed user of the user state is the user in the user state at a historical time.
5. The method of claim 4, wherein the method further comprises, before the step of separately acquiring service data of the seed user of each user state:
determining, according to preset state mark information or a preset state condition corresponding to each user state, users whose service data contains the state mark information or whose service data meets the state condition, and taking the determined users as seed users of the corresponding user states.
6. The method of claim 5, wherein the forming the service behavior feature information of the seed user in each user state according to the service data of the seed user in each user state includes:
for any seed user in any user state, acquiring the business behavior characteristic information of the seed user according to the business data, among the business data of the seed user, that precedes the time point of the state mark information or precedes the time point at which the state condition is met.
7. The method according to claim 5 or 6, wherein the generating a plurality of training samples according to the business behavior feature information of the seed user of each user state includes:
for any state mark information or any state condition, setting a user state label for the business behavior characteristic information of the corresponding seed user according to the user state corresponding to the state mark information or the state condition, and generating a training sample.
8. The method of any of claims 4-6, wherein the training the classifier with the plurality of training samples comprises:
respectively providing training samples corresponding to each user state to the classifier;
according to the difference between the classification prediction result respectively output by the classifier for each training sample and the user state label of the corresponding training sample, adjusting the model parameters of the classifier;
wherein the number of training samples corresponding to each user state provided to the classifier is the same.
9. An apparatus for detecting a user status, wherein the apparatus comprises:
the information acquisition module is used for acquiring the respective service behavior characteristic information of the plurality of users to be detected according to the service data of the plurality of users to be detected;
the clustering processing module is used for clustering the business behavior characteristic information of each of the plurality of users to be detected to obtain at least one cluster;
the acquisition representative user module is used for acquiring a cluster representative user of the cluster;
the first determining state module is used for determining the user state of the cluster representative user according to the business behavior characteristic information of the cluster representative user;
the second determining state module is used for determining the user state of each user to be detected in the cluster where the cluster representative user is located according to the user state of the cluster representative user;
the acquisition representative user module is further configured to:
screening clusters containing no more than a preset number of cluster nodes from the at least one cluster;
and taking the user to be detected corresponding to the central node of the screened cluster as a cluster representative user.
10. The apparatus of claim 9, wherein the information acquisition module is further used for:
acquiring, according to the business data of the plurality of users to be detected, at least one of the following for the plurality of users to be detected: business behavior characteristic information based on time-invariant attributes; business behavior characteristic information based on the number of business behaviors and/or on business behavior resource consumption statistics; and business behavior characteristic information based on time-series variables.
11. The apparatus of claim 9 or 10, wherein the first determining state module is further used for:
providing the business behavior characteristic information of the cluster representative user to a classifier for predicting the user state, as an input of the classifier;
and determining the user state of the cluster representative user according to the classification prediction result output by the classifier.
12. The apparatus of claim 11, wherein the apparatus further comprises a training module for:
respectively acquiring service data of seed users in each user state;
forming service behavior characteristic information of the seed users in each user state according to the service data of the seed users in each user state;
generating a plurality of training samples according to the business behavior characteristic information of the seed users in each user state;
training the classifier by using the plurality of training samples;
wherein, for any user state, the seed user of the user state is the user in the user state at a historical time.
13. The apparatus of claim 12, wherein the apparatus further comprises:
a seed user determining module used for determining, according to preset state mark information or a preset state condition corresponding to each user state, users whose service data contains the state mark information or whose service data meets the state condition, and taking the determined users as seed users of the corresponding user states.
14. The apparatus of claim 13, wherein the training module is further used for:
for any seed user in any user state, acquiring the business behavior characteristic information of the seed user according to the business data, among the business data of the seed user, that precedes the time point of the state mark information or precedes the time point at which the state condition is met.
15. The apparatus of claim 13 or 14, wherein the training module is further used for:
for any state mark information or any state condition, setting a user state label for the business behavior characteristic information of the corresponding seed user according to the user state corresponding to the state mark information or the state condition, and generating a training sample.
16. The apparatus of any of claims 12 to 14, wherein the training module is further used for:
respectively providing training samples corresponding to each user state to the classifier;
according to the difference between the classification prediction result respectively output by the classifier for each training sample and the user state label of the corresponding training sample, adjusting the model parameters of the classifier;
wherein the number of training samples corresponding to each user state provided to the classifier is the same.
17. A computer readable storage medium storing a computer program for performing the method of any one of the preceding claims 1-8.
18. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor being configured to read the executable instructions from the memory and execute the instructions to implement the method of any of the preceding claims 1-8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911352487.9A CN111178421B (en) 2019-12-25 2019-12-25 Method, device, medium and electronic equipment for detecting user state

Publications (2)

Publication Number Publication Date
CN111178421A CN111178421A (en) 2020-05-19
CN111178421B true CN111178421B (en) 2023-10-20

Family

ID=70655666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911352487.9A Active CN111178421B (en) 2019-12-25 2019-12-25 Method, device, medium and electronic equipment for detecting user state

Country Status (1)

Country Link
CN (1) CN111178421B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115134665B (en) * 2021-03-22 2024-03-01 中国电信股份有限公司 Data processing method and device based on set top box, storage medium and electronic equipment
CN113610175A (en) * 2021-08-16 2021-11-05 上海冰鉴信息科技有限公司 Service strategy generation method and device and computer readable storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0893894A2 (en) * 1997-07-24 1999-01-27 AT&T Corp. A method for designing sonet ring networks suitable for local access
US6061335A (en) * 1997-07-24 2000-05-09 At&T Corp Method for designing SONET ring networks suitable for local access
CN102087576A (en) * 2009-12-04 2011-06-08 索尼公司 Display control method, image user interface, information processing apparatus and information processing method
CN103927309A (en) * 2013-01-14 2014-07-16 阿里巴巴集团控股有限公司 Method and device for marking information labels for business objects
CN106603324A (en) * 2015-10-20 2017-04-26 富士通株式会社 Training set acquisition device and training set acquisition method
CN106529711A (en) * 2016-11-02 2017-03-22 东软集团股份有限公司 Method and apparatus for predicting user behavior
CN106455056A (en) * 2016-11-14 2017-02-22 百度在线网络技术(北京)有限公司 Positioning method and device
CN108710894A (en) * 2018-04-17 2018-10-26 中国科学院软件研究所 A kind of Active Learning mask method and device based on cluster representative point

Also Published As

Publication number Publication date
CN111178421A (en) 2020-05-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant