CN114218500B - User mining method, system, device and storage medium - Google Patents

Info

Publication number
CN114218500B
Authority
CN
China
Prior art keywords
user
call
calling
data
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111527510.0A
Other languages
Chinese (zh)
Other versions
CN114218500A (en)
Inventor
陈国言
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iMusic Culture and Technology Co Ltd
Original Assignee
iMusic Culture and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iMusic Culture and Technology Co Ltd
Priority to CN202111527510.0A
Publication of CN114218500A
Application granted
Publication of CN114218500B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations

Abstract

The invention discloses a user mining method, system, device and storage medium. The method comprises the following steps: acquiring call data of users, the call data being data of calls made by calling users to called users; performing data cleaning on the call data to generate a call behavior list; pruning the call behavior list based on the three-degree influence principle to generate a call network; generating a random call sequence set from the call network based on a random walk algorithm; inputting the random call sequence set into a trained user mining model, which is a Skip-gram model, to obtain a user vector set corresponding to the calling users; and finally, calculating the similarity between the user vector of the current calling user and the user vectors of the other calling users to determine a similar user group for the current calling user. The embodiments of the application can be widely applied in the technical field of the Internet.

Description

User mining method, system, device and storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a user mining method, system, device, and storage medium.
Background
On the Internet, managing network users in a grid-like manner with communities as the unit is of great significance for public opinion management and control and for precision marketing, so mining community users is an important means of Internet management and control.
In the related art, a social network is generally searched with the social relationships of network users at its center. However, because user attributes are diverse and networks are becoming increasingly complex, the related art faces considerable algorithmic and engineering challenges when mining network users.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art. Therefore, the application provides a user mining method, system, device and storage medium.
In a first aspect, an embodiment of the present application provides a user mining method, including: acquiring call data of users, wherein the call data is data of calls made by a plurality of calling users to called users within a specified time period; performing data cleaning on the call data to generate a call behavior list; pruning the call behavior list based on the three-degree influence principle to generate a call network, wherein the call network is a directed graph; generating a random call sequence set from the call network based on a random walk algorithm; inputting the random call sequence set into a trained user mining model to obtain a user vector set corresponding to the users, wherein the user mining model is a Skip-gram model and the user vector set comprises a plurality of user vectors corresponding to the users; and calculating the similarity between a current user vector and the other user vectors in the user vector set, and determining a similar user group of users similar to the user corresponding to the current user vector.
Optionally, the method further comprises: acquiring high-frequency order lists of all the users in the similar user group; and performing order recommendation on all the users in the similar user group according to the high-frequency order list.
Optionally, the performing data cleaning on the call data to generate a call behavior list includes: defining a plurality of dialing action types and defining first weight values corresponding to the dialing action types; merging the call data belonging to the same calling user in the call data; calculating call weighting scores of all the called users corresponding to the current calling user according to the dialing action types, the first weight values and the merged call data; and sorting all the called users corresponding to the current calling user according to the call weighting scores to generate the call behavior list.
Optionally, the performing data cleaning on the call data to generate a call behavior list includes: merging the call data belonging to the same calling user in the call data; after the merging is completed, sorting the call data corresponding to the current calling user by call time; and after the sorting is completed, segmenting the call data corresponding to the current calling user according to a preset continuous call interval to generate the call behavior list comprising a plurality of continuous call units, wherein each continuous call unit comprises a plurality of called users.
Optionally, the pruning the call behavior list based on the three-degree influence principle to generate a call network includes: determining the relationship level between each called user and the calling user according to the call data, wherein the relationship levels are divided into a primary relationship, a secondary relationship, a tertiary relationship and other relationships; deleting from the call behavior list the call data corresponding to the called users belonging to the other relationships; and generating the call network according to the pruned call behavior list.
Optionally, the generating a random call sequence set according to the call network based on a random walk algorithm includes: determining second weight values in the call network; and performing, in the call network, a random walk algorithm according to the second weight values to generate the random call sequence set.
In a second aspect, an embodiment of the present application provides a user mining system, including: a first module, configured to acquire call data of users, wherein the call data is data of calls made by a plurality of calling users to called users within a specified time period; a second module, configured to perform data cleaning on the call data to generate a call behavior list; a third module, configured to prune the call behavior list based on the three-degree influence principle to generate a call network, wherein the call network is a directed graph; a fourth module, configured to generate a random call sequence set from the call network based on a random walk algorithm; a fifth module, configured to input the random call sequence set into a trained user mining model to obtain a user vector set corresponding to the users, wherein the user mining model is a Skip-gram model and the user vector set comprises a plurality of user vectors corresponding to the users; and a sixth module, configured to calculate the similarity between a current user vector and the other user vectors in the user vector set, and determine a similar user group of users similar to the user corresponding to the current user vector.
In a third aspect, an embodiment of the present application provides a device, including: at least one processor; and at least one memory for storing at least one program; wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the user mining method of the first aspect.
In a fourth aspect, the present application provides a computer storage medium, in which a program executable by a processor is stored, and the program executable by the processor is used for implementing the user mining method according to the first aspect when executed by the processor.
The beneficial effects of the embodiments of the application are as follows. In the user mining method provided by the embodiments of the application, call data of users is first acquired, the call data being data of calls made by calling users to called users; data cleaning is then performed on the call data to generate a call behavior list; the call behavior list is pruned based on the three-degree influence principle to generate a call network; a random call sequence set is generated from the call network based on a random walk algorithm; the random call sequence set is input into a trained user mining model, which is a Skip-gram model, to obtain a user vector set corresponding to the calling users; and finally, the similarity between the user vector of the current calling user and the user vectors of the other calling users is calculated to determine a similar user group for the current calling user. Compared with the related art, the method generates call sequences by random walks over the call network, which helps compensate for the sparsity of call data caused by low call frequency. In addition, because the user vectors representing the users are generated by the user mining model, the limitation of the undirected social network graphs used in the related art is avoided during data mining, and user groups similar to the current user can be mined more effectively, which facilitates activities such as public opinion management and control and precision marketing.
Drawings
The accompanying drawings are included to provide a further understanding of the claimed subject matter and are incorporated in and constitute a part of this specification, illustrate embodiments of the subject matter and together with the description serve to explain the principles of the subject matter and not to limit the subject matter.
Fig. 1 is a flowchart illustrating steps of a user mining method according to an embodiment of the present application;
Fig. 2 is a flowchart of a first set of steps for generating a call behavior list according to an embodiment of the present application;
Fig. 3 is a flowchart of a second set of steps for generating a call behavior list according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a call network provided in an embodiment of the present application;
Fig. 5 is a schematic diagram of a user mining system provided in an embodiment of the present application;
Fig. 6 is a schematic diagram of a device provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It should be noted that although functional block divisions are provided in the system drawings and logical orders are shown in the flowcharts, in some cases, the steps shown and described may be performed in different orders than the block divisions in the systems or in the flowcharts. The terms first, second and the like in the description and in the claims, and the drawings described above, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
On the Internet, managing network users in a grid-like manner with communities as the unit is of great significance for public opinion management and control and for precision marketing, so mining community users is an important means of Internet management and control. In the related art, a social network is generally searched with the social relationships of network users at its center. However, because user attributes are diverse and networks are becoming increasingly complex, the related art faces considerable algorithmic and engineering challenges when mining network users.
Based on this, the embodiments of the present application provide a user mining method, system, device and storage medium. The method includes: first, acquiring call data of users, the call data being data of calls made by calling users to called users; then performing data cleaning on the call data to generate a call behavior list; pruning the call behavior list based on the three-degree influence principle to generate a call network; generating a random call sequence set from the call network based on a random walk algorithm; inputting the random call sequence set into a trained user mining model, which is a Skip-gram model, to obtain a user vector set corresponding to the calling users; and finally, calculating the similarity between the user vector of the current calling user and the user vectors of the other calling users to determine a similar user group for the current calling user. Compared with the related art, the method generates call sequences by random walks over the call network, which helps compensate for the sparsity of call data caused by low call frequency. In addition, because the user vectors representing the users are generated by the user mining model, the limitation of the undirected social network graphs used in the related art is avoided during data mining, and user groups similar to the current user can be mined more effectively, which facilitates activities such as public opinion management and control and precision marketing.
The embodiments of the present application will be further explained with reference to the drawings.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of a user mining method provided in an embodiment of the present application, where the method includes, but is not limited to, steps S100 to S150:
S100, acquiring call data of a user;
Specifically, call data of a plurality of users within a specified time period is acquired. Because the embodiment of the application is used for mining users that are similar to each other, in some embodiments the call data of users who are likely to be similar may be acquired by region. For example, the call data of all users in cell A over the past month is acquired.
It can be understood that the call data is communication data of calls initiated by different calling users to different called users within the specified time period. For example, the call data may be represented as
$D^{k}_{U_m\_U_n}$
where D represents the call data, the superscript k of D indicates the k-th piece of call data, and the subscript Um_Un of D indicates that the calling user is Um and the called user is Un; k, m and n are all natural numbers, and m is not equal to n.
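As an illustration of the notation above, one piece of call data could be held in a small record type. This is only a sketch: the class name `CallRecord` and the fields `start` and `duration_s` are hypothetical and are not specified in the application, which only fixes the index k, the calling user Um and the called user Un.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """One call-data entry D^k_{Um_Un}: the k-th record, caller Um -> callee Un."""
    k: int            # index of the record in the data set
    caller: str       # Um, the calling user
    callee: str       # Un, the called user (must differ from the caller)
    start: datetime   # hypothetical field: time at which the call was placed
    duration_s: int   # hypothetical field: 0 means the call was not connected

    def __post_init__(self):
        # The application requires m != n, i.e. nobody calls themselves.
        if self.caller == self.callee:
            raise ValueError("caller and callee must differ (m != n)")


# A tiny, made-up data set for one specified time period.
records = [
    CallRecord(1, "U1", "U2", datetime(2021, 11, 1, 9, 0), 120),
    CallRecord(2, "U1", "U4", datetime(2021, 11, 1, 9, 10), 0),
    CallRecord(3, "U2", "U3", datetime(2021, 11, 2, 18, 30), 45),
]
```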
S110, performing data cleaning on the calling data to generate a calling behavior list;
Specifically, data cleaning is performed on the acquired call data, and a call behavior list corresponding to each calling user is generated. For example, the call behavior list of user U1 may be represented as U1:
[Formula image RE-GDA0003478805790000042: the call behavior list of U1, whose entries are call data D indexed by a superscript j]
where the superscript j of D indicates the j-th piece of call data in the call behavior list of U1, and j is a natural number. The call behavior lists of other users may be represented in the same form.
In some embodiments, the process of generating a call activity list may be described by the method steps in fig. 2. Referring to fig. 2, fig. 2 is a flowchart illustrating a first step of generating a call behavior list according to an embodiment of the present application, where the method includes, but is not limited to, steps S200 to S230:
S200, defining a plurality of dialing action types and defining first weight values corresponding to the dialing action types;
Specifically, different dialing action types may be defined according to the call data generated when the calling user calls the called user. For example, the dialing action types include, but are not limited to, whether the call is connected, the duration of the call, the frequency of calls to the current called user, the average call duration with the current called user, the concentrated call duration with the current called user, and the like. For example, if, in the process of mining similar users, a higher call frequency between the current calling user and a certain called user is taken to reflect a higher similarity between them, then the dialing action type representing the call frequency to the current called user is given a higher first weight value.
S210, merging the call data belonging to the same calling user in the call data;
Specifically, the call data acquired in step S100 is represented in the form
$D^{k}_{U_m\_U_n}$
As can be seen from the subscript of D, a given user may appear in the call data as either a calling user or a called user. In this step, each user involved in the call data is taken in turn as a calling user, and the call data belonging to the same calling user is merged, yielding a plurality of groups of call data, one for each user. Accordingly, when the calling user is U1, the corresponding call data can be represented as U1:
[Formula image RE-GDA0003478805790000052: the merged call data whose calling user is U1]
It should be noted that, unlike the call behavior list in the embodiment of the present application, the call data corresponding to U1 is not yet ordered at this point; it is further sorted in the following steps to generate the call behavior list.
S220, calculating call weighting scores of all called users corresponding to the current calling user according to the dialing action types, the first weight values and the merged call data;
Specifically, taking the current calling user U1 as an example, assume that the dialing action types T comprise $\{T_1, T_2, T_3, \ldots, T_q\}$, where $T_q$ denotes the q-th dialing action type, and that the corresponding first weight values are $\{w_1, w_2, w_3, \ldots, w_q\}$. The call weighting score S of user U1 for user U2 can then be determined according to the following formula:
$S_{U1\_U2} = T_1 w_1 + T_2 w_2 + \ldots + T_q w_q$
where $S_{U1\_U2}$ denotes the call weighting score of calling user U1 for called user U2. It can be understood that $\{T_1, T_2, T_3, \ldots, T_q\}$ take different values for different called users, so U1 obtains a different call weighting score for each called user.
S230, sorting all called users corresponding to the current calling user according to the call weighting scores to generate a call behavior list;
Specifically, the call weighting scores of all called users corresponding to each calling user are calculated in step S220, so all called users corresponding to the current calling user can be sorted by call weighting score, thereby generating the call behavior list corresponding to each calling user. For example, the call behavior list of calling user U1 may be represented as U1: $\{U_2: S_{U1\_U2},\ U_4: S_{U1\_U4},\ \ldots,\ U_n: S_{U1\_Un}\}$, in which the called users are arranged in descending order of call weighting score.
Through steps S200 to S230, the embodiment of the present application provides a scheme for determining a called user weight value according to a called user dialing action type, and then generating a call behavior list.
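Steps S200 to S230 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the dialing action types, the weight values in `WEIGHTS` and the per-pair feature values are all invented for the example, and the function name `call_behavior_list` is hypothetical.

```python
from collections import defaultdict

# Hypothetical dialing action types with their first weight values (w_1..w_q).
WEIGHTS = {"connected": 1.0, "call_count": 2.0, "avg_minutes": 0.5}


def call_behavior_list(features_by_pair):
    """features_by_pair maps (caller, callee) -> {action_type: value T_i}.

    Returns caller -> [(callee, score), ...] sorted by score, descending.
    """
    merged = defaultdict(list)  # S210: group entries by calling user
    for (caller, callee), feats in features_by_pair.items():
        # S220: S = T_1*w_1 + T_2*w_2 + ... + T_q*w_q
        score = sum(WEIGHTS[name] * feats.get(name, 0.0) for name in WEIGHTS)
        merged[caller].append((callee, score))
    # S230: sort each caller's callees by call weighting score
    return {
        caller: sorted(entries, key=lambda pair: pair[1], reverse=True)
        for caller, entries in merged.items()
    }


features = {
    ("U1", "U4"): {"connected": 1, "call_count": 1, "avg_minutes": 2.0},
    ("U1", "U2"): {"connected": 1, "call_count": 5, "avg_minutes": 4.0},
    ("U2", "U3"): {"connected": 0, "call_count": 2, "avg_minutes": 0.0},
}
behavior = call_behavior_list(features)
print(behavior["U1"])  # [('U2', 13.0), ('U4', 4.0)]
```

With these made-up values, U2 scores 1·1.0 + 5·2.0 + 4.0·0.5 = 13.0 and U4 scores 1·1.0 + 1·2.0 + 2.0·0.5 = 4.0, so U2 is listed first for U1.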
In other embodiments, the call behavior list may also be generated based on continuous calls to the called user. Referring to fig. 3, fig. 3 is a flowchart illustrating a second step of generating a call behavior list according to an embodiment of the present application, where the method includes, but is not limited to, steps S300 to S320:
S300, merging the call data belonging to the same calling user in the call data;
Specifically, the call data belonging to the same calling user is merged as in step S210, and the description is not repeated here.
S310, after the merging is completed, sorting the call data corresponding to the current calling user by call time;
Specifically, once the merging is completed, the call data of the plurality of called users corresponding to each calling user is obtained; in this step, the call data corresponding to the current calling user is sorted by call time. As mentioned above, the call data is user call data within a specified time period. Counting from the start time of the specified time period, the first call made by the current calling user comes first, and so on, so that all call data of the current calling user is sorted in chronological order. For example, the call data of user U1 may be expressed as U1: $\{U_2: \text{time}_1,\ U_4: \text{time}_2,\ \ldots,\ U_n: \text{time}_n\}$, where $U_2: \text{time}_1$ indicates that user U1 called the called user U2 at call time $\text{time}_1$. The call data of all calling users can be sorted in the same way.
S320, after the sorting is completed, segmenting the call data corresponding to the current calling user according to a preset continuous call interval to generate a call behavior list comprising a plurality of continuous call units;
Specifically, in the embodiment of the present application, the called users who are called in succession by a calling user are considered to have considerable similarity with that calling user. Therefore, in this step, once the call data of all calling users has been sorted, a continuous call interval is set and the call data corresponding to the current calling user is divided into a plurality of continuous call units. For example, if the continuous call interval is set to 30 min, then, counting from the start time of the specified time period, all called users called within the first 30 min are placed in the same continuous call unit, and so on until the end time of the specified time period, so that all call data corresponding to the current calling user is divided into a plurality of continuous call units and a call behavior list is generated. For example, the call behavior list of user U1 may be represented as U1: $\{[U_2, U_4],\ [U_4],\ \ldots,\ [U_3, U_2, U_4]\}$, where each "[ ]" denotes a distinct continuous call unit.
Through steps S300-S320, the embodiment of the present application provides a scheme for generating a call behavior list according to a continuous call situation from a calling user to a called user.
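The segmentation in steps S310 and S320 can be sketched as fixed 30-minute windows counted from the start of the specified time period. The function name `continuous_call_units`, the sample call times, and the choice to skip windows that contain no calls are assumptions made for this example.

```python
from datetime import datetime, timedelta


def continuous_call_units(calls, period_start, interval=timedelta(minutes=30)):
    """calls: list of (callee, call_time) pairs for one calling user.

    Sorts the calls by time (S310) and groups the callees into fixed windows
    of length `interval`, counted from period_start (S320). Windows without
    any call are skipped, which is an assumption of this sketch.
    """
    units = {}
    for callee, call_time in sorted(calls, key=lambda c: c[1]):
        window = (call_time - period_start) // interval  # integer window index
        units.setdefault(window, []).append(callee)
    return [units[w] for w in sorted(units)]


start = datetime(2021, 11, 1, 9, 0)
u1_calls = [
    ("U4", datetime(2021, 11, 1, 9, 50)),
    ("U2", datetime(2021, 11, 1, 9, 5)),
    ("U4", datetime(2021, 11, 1, 9, 20)),
    ("U3", datetime(2021, 11, 1, 11, 0)),
    ("U2", datetime(2021, 11, 1, 11, 10)),
    ("U4", datetime(2021, 11, 1, 11, 25)),
]
print(continuous_call_units(u1_calls, start))
# [['U2', 'U4'], ['U4'], ['U3', 'U2', 'U4']]
```

A sliding gap rule (start a new unit whenever two consecutive calls are more than 30 min apart) would be a different design; the fixed-window reading is used here because the text counts windows from the start time of the period.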
Step S110 has been described above; step S120 is described below.
S120, based on the three-degree influence principle, performing a pruning operation on the call behavior list to generate a call network, wherein the call network is a directed graph;
Specifically, the three-degree influence principle holds that a connection between two people in a social network whose distance is within three degrees can be regarded as a strong connection, and a strong connection can trigger behavior. Taking user A as an example, if user A has a friend user B, the distance between user A and user B is one degree, and the relationship between them is called a primary relationship in the embodiment of the application; if user B has a friend user C, the distance between user A and user C is two degrees, and the relationship between user A and user C is called a secondary relationship; similarly, the distance between user A and user D, a friend of user C, is three degrees, and the relationship between user A and user D is called a tertiary relationship. Based on the three-degree influence principle, if the distance between a user and user A exceeds three degrees, the relationship between user A and that user is called an other relationship.
Based on this theory, it is considered in the embodiment of the present application that users within three degrees of distance from the current user can be included in the range of users that may be similar to the current user for further evaluation. Thus, user data that is more than three degrees away from the current user needs to be purged. In this step, the relationship between the current calling subscriber and all the called subscribers is determined, and then the call data corresponding to the called subscribers belonging to other relationships are deleted from the call behavior list, so that only the called subscribers within three degrees of the distance from the current calling subscriber remain in the deleted call behavior list, and therefore, a similar subscriber group similar to the current calling subscriber can be determined more accurately.
A call network is then generated from the pruned call behavior list and is represented in the form of a directed graph. Referring to fig. 4, fig. 4 is a schematic diagram of a call network provided in the embodiment of the present application. As shown in fig. 4, each node in the network is a user, who may act as a calling user or as a called user, and the nodes are connected by directed edges, each pointing from the calling user to the called user.
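The pruning and graph construction of step S120 can be sketched with a breadth-first search that keeps only users within three degrees of a chosen user. The application does not say how degrees are computed, so measuring them by following caller-to-callee edges is an assumption of this sketch, as are the function names `build_call_network`, `within_three_degrees` and `prune`.

```python
from collections import deque


def build_call_network(behavior):
    """behavior: caller -> iterable of callees. Returns node -> set of callees."""
    graph = {}
    for caller, callees in behavior.items():
        graph.setdefault(caller, set()).update(callees)
        for callee in callees:
            graph.setdefault(callee, set())  # callees are nodes too
    return graph


def within_three_degrees(graph, source, max_degree=3):
    """Breadth-first search from source; returns node -> degree (1..max_degree)."""
    degree = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if degree[node] == max_degree:
            continue  # do not expand beyond the third degree
        for nxt in graph.get(node, ()):
            if nxt not in degree:
                degree[nxt] = degree[node] + 1
                queue.append(nxt)
    del degree[source]
    return degree


def prune(graph, source, max_degree=3):
    """Drop every user whose relationship to source is an 'other relationship'."""
    keep = set(within_three_degrees(graph, source, max_degree)) | {source}
    return {u: {v for v in vs if v in keep} for u, vs in graph.items() if u in keep}


network = build_call_network({"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"]})
print(within_three_degrees(network, "A"))  # {'B': 1, 'C': 2, 'D': 3}
pruned = prune(network, "A")               # E is four degrees away and is removed
```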
S130, generating a random call sequence set from the call network based on a random walk algorithm;
Specifically, in this step the weights between the nodes of the call network must first be determined; such a weight is referred to as a second weight value. The call networks obtained from the same batch of call data may differ depending on how the call behavior list was generated. It should be noted that, since the call network is generated from the call behavior list, the second weight values in the call network should also be related to the order within the call behavior list. For example, if the call behavior list is generated by the method shown in fig. 2, the second weight values may be determined with reference to the first weight values; if the call behavior list is generated by the method shown in fig. 3, the second weight values may be determined with reference to the frequency with which the calling user calls the same called user within the same continuous call unit. The embodiment of the present application does not specifically limit how the second weight values are determined.
After the second weight values are determined, a random walk algorithm is performed in the call network according to the second weight values, and a random call sequence set is generated. The random call sequence set includes a plurality of random call sequences; for example, one of them may be: user U1 calls user U2, U2 calls U3, U3 calls U5, and the call sequence then ends. Through the random walk algorithm, the embodiment of the application enriches the call data and, to a certain extent, compensates for the low frequency of calls in the data, which helps similar users to be mined more effectively.
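A weighted random walk of this kind can be sketched as below. The edge weights stand in for the second weight values; the graph, the walk length, the number of walks per node and the function names are all assumptions of the example, since the application leaves these choices open.

```python
import random


def random_walk(graph, start, max_len, rng):
    """graph: node -> {neighbor: second weight value}. Returns one call sequence."""
    walk = [start]
    while len(walk) < max_len:
        neighbors = graph.get(walk[-1], {})
        if not neighbors:
            break  # the current user called nobody, so the sequence ends
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        # Pick the next callee with probability proportional to its edge weight.
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


def random_call_sequences(graph, walks_per_node, max_len, seed=0):
    """Start `walks_per_node` walks from every node of the call network."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [
        random_walk(graph, node, max_len, rng)
        for node in graph
        for _ in range(walks_per_node)
    ]


call_graph = {
    "U1": {"U2": 3.0, "U4": 1.0},
    "U2": {"U3": 1.0},
    "U3": {"U5": 1.0},
    "U4": {},
    "U5": {},
}
sequences = random_call_sequences(call_graph, walks_per_node=2, max_len=4)
```

Every sequence follows directed edges only, so a walk such as U1, U2, U3, U5 is possible here while U2, U1 is not.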
S140, inputting the random call sequence set into the trained user mining model to obtain a user vector set corresponding to the users;
Specifically, the user mining model in the embodiment of the application is a Skip-gram model, which is one of the Word2Vec models. In brief, a Word2Vec model learns from text to represent the semantic information of words as word vectors; that is, it maps each word to a vector, so that the semantic similarity between different words can be expressed through their vectors. The Skip-gram variant of Word2Vec predicts the context from a given input word. In the embodiment of the application, the user mining model built on the Skip-gram model generates a user vector for each user from the random call sequence set, and these user vectors form the user vector set.
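To make the Skip-gram idea concrete for call sequences, the sketch below only builds the (input user, context user) training pairs that a Skip-gram model learns from; training the embedding itself, for instance with an existing Word2Vec implementation, is not shown. The window size and the function name `skipgram_pairs` are assumptions of the example.

```python
def skipgram_pairs(sequences, window=2):
    """Return (center, context) pairs: each user in a sequence is paired with
    the users at most `window` positions before or after it."""
    pairs = []
    for seq in sequences:
        for i, center in enumerate(seq):
            lo = max(0, i - window)
            hi = min(len(seq), i + window + 1)
            pairs.extend((center, seq[j]) for j in range(lo, hi) if j != i)
    return pairs


pairs = skipgram_pairs([["U1", "U2", "U3", "U5"]], window=1)
print(pairs)
# [('U1', 'U2'), ('U2', 'U1'), ('U2', 'U3'), ('U3', 'U2'), ('U3', 'U5'), ('U5', 'U3')]
```

Users treated as "words" and random call sequences treated as "sentences" is the analogy the paragraph above relies on: users that appear in similar call contexts end up with similar vectors.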
S150, calculating the similarity between the current user vector and the other user vectors in the user vector set, and determining a similar user group of users similar to the user corresponding to the current user vector;
Specifically, once the user vector corresponding to each user has been obtained in step S140, similarity calculations can be performed on the user vectors to determine which users are similar. For example, to find all users similar to the current user, the cosine similarity between the user vector of the current user and the user vectors of the other users can be calculated. If the similarity between user vector $u_1$ and user vector $u_2$ is greater than a preset similarity threshold, user U1 and user U2 are considered similar and user U2 is added to the similar user group of user U1; proceeding in the same way for the remaining users determines the similar user group of user U1.
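The thresholded cosine-similarity search of step S150 can be sketched as follows. The two-dimensional vectors and the threshold of 0.8 are made up for illustration, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_user_group(vectors, user, threshold):
    """Return the users whose vector has cosine similarity > threshold with `user`."""
    target = vectors[user]
    return sorted(
        other
        for other, vec in vectors.items()
        if other != user and cosine(target, vec) > threshold
    )


user_vectors = {
    "U1": [1.0, 0.0],
    "U2": [0.9, 0.1],   # nearly parallel to U1
    "U3": [0.0, 1.0],   # orthogonal to U1
    "U4": [-1.0, 0.0],  # opposite to U1
}
group = similar_user_group(user_vectors, "U1", threshold=0.8)
print(group)  # ['U2']
```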
After the similar user group is determined, the high-frequency order lists of all users in the similar user group can be obtained. For example, if user A often purchases clothes of a certain brand, order recommendations for the other users in the similar user group are generated according to the high-frequency order list of user A, and these recommendations include clothes of that brand, so that precision marketing is achieved within the similar user group.
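One possible way to turn the high-frequency order lists into recommendations is sketched below. The source does not say how recommendations are ranked or filtered, so the ranking by number of group members and the exclusion of items a user already orders are assumptions, as are the names `recommend_orders` and `top_k`.

```python
from collections import Counter


def recommend_orders(group_order_lists, top_k=3):
    """Recommend to each user the group's popular items they do not order yet.

    group_order_lists: dict mapping each user in the similar user group to
                       that user's high-frequency order items.
    """
    counts = Counter()
    for items in group_order_lists.values():
        counts.update(set(items))  # count each item once per user
    # Rank by popularity, breaking ties alphabetically for determinism.
    popular = sorted(counts, key=lambda item: (-counts[item], item))
    recommendations = {}
    for user, items in group_order_lists.items():
        owned = set(items)
        recommendations[user] = [i for i in popular if i not in owned][:top_k]
    return recommendations


group_order_lists = {
    "A": ["brand_x_clothes", "shoes"],
    "B": ["shoes"],
    "C": ["hat"],
}
recommendations = recommend_orders(group_order_lists)
```

In this toy group, user A's frequent purchase of the brand's clothes is surfaced to users B and C, matching the example in the text.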
Through steps S100 to S150, the embodiment of the present application first acquires call data of a user, the call data being data of calls made by a calling user to a called user; then performs data cleaning on the call data to generate a call behavior list; prunes the call behavior list based on the three-degree influence principle to generate a call network; generates a random call sequence set from the call network based on a random walk algorithm; inputs the random call sequence set into the trained user mining model, which is a Skip-gram model, to obtain a user vector set corresponding to the calling user; and finally calculates the similarity between the user vector corresponding to the current calling user and the user vectors corresponding to other calling users to determine a similar user group similar to the current calling user. Compared with the related art, the method generates random call sequences over the call network by random walk, which helps to mitigate the low-frequency nature of call data; in addition, because the user mining model generates a user vector representing each user, data mining avoids the limitations of the undirected social network graphs used in the related art, and user groups similar to the current user can be mined more effectively, which facilitates activities such as public opinion management and control and precision marketing.
Referring to fig. 5, fig. 5 is a schematic diagram of a user mining system provided in an embodiment of the present application. The system 500 includes, but is not limited to, a first module 510, a second module 520, a third module 530, a fourth module 540, a fifth module 550, and a sixth module 560. The first module is used for acquiring call data of a user, the call data being data of calls made by a calling user to a called user; the second module is used for performing data cleaning on the call data to generate a call behavior list; the third module is used for pruning the call behavior list based on a three-degree influence principle to generate a call network; the fourth module is used for generating a random call sequence set according to the call network based on a random walk algorithm; the fifth module is used for inputting the random call sequence set into the trained user mining model to obtain a user vector set corresponding to the calling user, wherein the user mining model is a Skip-gram model; and the sixth module is used for calculating the similarity between the user vector corresponding to the current calling user and the user vectors corresponding to other calling users, and determining a similar user group similar to the current calling user.
Referring to fig. 6, fig. 6 is a schematic diagram of an apparatus 600 provided in an embodiment of the present application, where the apparatus 600 includes at least one processor 610 and at least one memory 620 for storing at least one program; one processor and one memory are exemplified in fig. 6.
The processor and the memory may be connected by a bus or by other means; connection by a bus is taken as the example in fig. 6.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Another embodiment of the present application also provides an apparatus that may be used to perform the user mining method as in any of the embodiments above, for example, performing the method steps of fig. 1 described above.
The above-described embodiments of the apparatus are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The embodiment of the application also discloses a computer storage medium in which a processor-executable program is stored; when executed by a processor, the program implements the method provided by the application.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
While the preferred embodiments of the present invention have been described, the present invention is not limited to the above embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and such equivalent modifications or substitutions are included in the scope of the present invention defined by the claims.

Claims (8)

1. A user mining method, comprising:
acquiring call data of a user; the call data is the call data of a plurality of calling users to called users in a specified time period;
carrying out data cleaning on the call data to generate a call behavior list;
based on a three-degree influence principle, pruning the call behavior list according to the relation level of the calling user and the called user to generate a call network; wherein the call network is a directed graph;
generating a random call sequence set according to the call network based on a random walk algorithm;
inputting the random call sequence set into a trained user mining model to obtain a user vector set corresponding to the user; the user mining model is a Skip-gram model, and the user vector set comprises a plurality of user vectors corresponding to the users;
calculating the similarity between the current user vector and other user vectors in the user vector set, and determining a similar user group similar to the user corresponding to the current user vector;
wherein the generating a random call sequence set according to the call network based on the random walk algorithm includes:
determining a second weight value in the call network; wherein the second weight value is the weight of each node in the call network;
and in the call network, realizing a random walk algorithm according to the second weight value, and generating the random call sequence set.
2. The user mining method of claim 1, the method further comprising:
acquiring high-frequency order lists of all the users in the similar user group;
and performing order recommendation on all the users in the similar user group according to the high-frequency order list.
3. The user mining method according to any one of claims 1-2, wherein the performing data cleansing on the call data to generate a call behavior list comprises:
defining a plurality of dialing action types and defining a first weight value corresponding to the dialing action types;
merging the call data belonging to the same calling user in the call data;
calculating call weighting scores of all the called users corresponding to the current calling user according to the dialing action type, the first weight value and the combined call data;
and sequencing all the called users corresponding to the current calling user according to the call weighting score to generate the call behavior list.
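The scoring and sorting steps of claim 3 can be illustrated with the sketch below. The dialing action types and their weights are hypothetical, and the claim does not fix the scoring formula; summing the first weight value over all calls from a caller to a called user is an assumption made here for illustration.

```python
from collections import defaultdict

# Hypothetical dialing action types and their first weight values.
FIRST_WEIGHT = {"answered": 1.0, "missed": 0.5, "rejected": 0.2}


def build_call_behavior_list(call_data, first_weight=FIRST_WEIGHT):
    """Merge records per caller, score each called user, and sort by score.

    call_data: iterable of (caller, callee, action_type) records.
    Returns a dict mapping each caller to a list of (callee, score) tuples,
    highest score first.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for caller, callee, action in call_data:
        # Assumption: the weighted score is the sum of per-call weights.
        scores[caller][callee] += first_weight[action]
    return {
        caller: sorted(callees.items(), key=lambda kv: (-kv[1], kv[0]))
        for caller, callees in scores.items()
    }


call_data = [
    ("U1", "U2", "answered"),
    ("U1", "U2", "missed"),
    ("U1", "U3", "answered"),
    ("U2", "U3", "rejected"),
]
behavior = build_call_behavior_list(call_data)
```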
4. The user mining method according to any one of claims 1-2, wherein the performing data cleansing on the call data to generate a call behavior list comprises:
merging the call data belonging to the same calling user in the call data;
after the merging is completed, sorting the call data corresponding to the current calling user according to the call time;
after the sorting is completed, segmenting the call data corresponding to the current calling user according to a preset continuous call interval to generate the call behavior list comprising a plurality of continuous call units; wherein each continuous call unit comprises a plurality of called users.
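The segmentation of claim 4 can be sketched as follows for the records of a single calling user, assumed to be already merged. Timestamps are taken to be seconds, and a new unit is assumed to start whenever the gap between consecutive calls exceeds the preset interval; the names `split_continuous_call_units` and `max_gap` are hypothetical.

```python
def split_continuous_call_units(calls, max_gap):
    """Split one caller's calls into continuous call units.

    calls:   list of (timestamp_seconds, callee) records for one caller.
    max_gap: preset continuous call interval, in seconds.
    Returns a list of units, each a list of called users in call order.
    """
    ordered = sorted(calls, key=lambda record: record[0])  # sort by call time
    units = []
    current = []
    last_time = None
    for timestamp, callee in ordered:
        if last_time is not None and timestamp - last_time > max_gap:
            units.append(current)  # gap too large: close the current unit
            current = []
        current.append(callee)
        last_time = timestamp
    if current:
        units.append(current)
    return units


# Unsorted toy records: two bursts of calls separated by a long pause.
calls = [(0, "U2"), (400, "U5"), (60, "U3"), (430, "U6")]
units = split_continuous_call_units(calls, max_gap=120)
```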
5. The user mining method according to any one of claims 1-2, wherein the pruning the call behavior list according to the relationship level between the calling user and the called user based on a three-degree influence principle to generate a call network comprises:
determining the relation level of all the called users and the calling users according to the call data; the relationship level is divided into a first-level relationship, a second-level relationship, a third-level relationship and other relationships;
deleting the call data corresponding to the called user belonging to the other relationship from the call behavior list;
and generating the call network according to the pruned call behavior list.
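The pruning of claim 5 can be illustrated as below. The claim does not state how the relation level is computed; this sketch assumes it is the hop distance from a chosen calling user along directed call edges, so that users more than three hops away fall into the "other" relationship and are removed. The names `relation_levels` and `prune_to_call_network` are hypothetical.

```python
from collections import deque


def relation_levels(call_list, root, max_level=3):
    """Breadth-first hop distance from `root`, explored up to `max_level`."""
    levels = {root: 0}
    queue = deque([root])
    while queue:
        user = queue.popleft()
        if levels[user] == max_level:
            continue  # do not expand beyond the third-level relationship
        for callee in call_list.get(user, []):
            if callee not in levels:
                levels[callee] = levels[user] + 1
                queue.append(callee)
    return levels


def prune_to_call_network(call_list, root, max_level=3):
    """Keep only callers and called users within `max_level` hops of `root`."""
    levels = relation_levels(call_list, root, max_level)
    network = {}
    for caller, callees in call_list.items():
        if caller not in levels:
            continue  # caller belongs to the "other" relationship
        network[caller] = [c for c in callees if c in levels]
    return network


# A chain of calls: U5 and U6 are more than three hops away from U1.
call_list = {"U1": ["U2"], "U2": ["U3"], "U3": ["U4"], "U4": ["U5"], "U5": ["U6"]}
network = prune_to_call_network(call_list, "U1")
```

The result is still a directed adjacency mapping, which matches the requirement in claim 1 that the call network is a directed graph.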
6. A user mining system, comprising:
the first module is used for acquiring call data of a user; the call data is the call data of a plurality of calling users to called users in a specified time period;
the second module is used for carrying out data cleaning on the call data and generating a call behavior list;
a third module, configured to perform pruning operation on the call behavior list according to a relationship level between the calling user and the called user based on a three-degree influence principle, so as to generate a call network; wherein the call network is a directed graph;
a fourth module, configured to generate a random call sequence set according to the call network based on a random walk algorithm;
a fifth module, configured to input the random call sequence set into a trained user mining model, to obtain a user vector set corresponding to the user; the user mining model is a Skip-gram model, and the user vector set comprises a plurality of user vectors corresponding to the users;
a sixth module, configured to calculate similarity between the current user vector and other user vectors in the user vector set, and determine a similar user group that is similar to the user corresponding to the current user vector;
wherein the generating a random call sequence set according to the call network based on the random walk algorithm includes:
determining a second weight value in the call network; wherein the second weight value is the weight of each node in the call network;
and in the call network, realizing a random walk algorithm according to the second weight value, and generating the random call sequence set.
7. A user mining device, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the user mining method of any of claims 1-5.
8. A computer storage medium having stored therein a processor-executable program, wherein the processor-executable program, when executed by the processor, is for implementing the user mining method of any of claims 1-5.
CN202111527510.0A 2021-12-14 2021-12-14 User mining method, system, device and storage medium Active CN114218500B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111527510.0A CN114218500B (en) 2021-12-14 2021-12-14 User mining method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN114218500A CN114218500A (en) 2022-03-22
CN114218500B true CN114218500B (en) 2023-03-24

Family

ID=80701830

Country Status (1)

Country Link
CN (1) CN114218500B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant