CN112417076A - Building personnel affiliation identification method based on big data mining technology - Google Patents

Building personnel affiliation identification method based on big data mining technology

Publication number
CN112417076A
Authority
CN
China
Prior art keywords
user
users
enterprise
building
base station
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011330345.5A
Other languages
Chinese (zh)
Other versions
CN112417076B (en
Inventor
王彦青
张清竹
严莲
郑紫薇
赵海秀
高梓枫
王为强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinxun Digital Technology Hangzhou Co ltd
Original Assignee
EB INFORMATION TECHNOLOGY Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by EB INFORMATION TECHNOLOGY Ltd
Priority to CN202011330345.5A
Publication of CN112417076A
Application granted
Publication of CN112417076B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs

Abstract

A building personnel affiliation identification method based on big data mining technology comprises the following steps: extracting each user's base station data during the working period, determining the base station to which the user belongs while working, obtaining the building in which the user works, and dividing all users into different building user groups; constructing and training a building-user grouping model whose input is the user feature data in each building user group and whose output is the enterprise user groups into which the users are divided, the model working as follows: calculating the enterprise similarity between every two users, constructing a graph with each user as a node and the enterprise similarity between users as an edge, and applying the Louvain community detection algorithm to divide all users into a plurality of communities; and inputting the user feature data of the building user group to be identified into the building-user grouping model, and outputting the enterprise user groups to which the users in that building user group respectively belong. The invention belongs to the field of communication and can automatically identify enterprise user groups in a building by using user data and signaling data.

Description

Building personnel affiliation identification method based on big data mining technology
Technical Field
The invention relates to a building personnel affiliation identification method based on a big data mining technology, and belongs to the field of communication.
Background
After some thirty years of development, e-commerce has reached households throughout China and has become an irreplaceable form of industry. At present, the mainstream and mature way of grouping users in the e-commerce market is based on user behavior, and its main application is product recommendation. Enterprises use big data to build user profiles, divide users into groups with different characteristics, and provide tailored marketing services for each group. Personalized recommendation approaches fall mainly into two types:
(a) Classifying and labeling products, identifying each user's categories of interest from behavior such as browsing and favoriting, and recommending products of the same categories to the user.
(b) Profiling all users, adopting different recommendation methods and content for different types of users, and recommending to users of the same type the products that other users of that type are interested in.
These approaches focus mainly on the individual behavior of users and its effects over time. User groups are divided only according to online behavior, so the available information is limited and the dimension is relatively narrow. A large amount of information in user behavior and related big data remains to be discovered and applied, the interactions among group behavior characteristics and the dimensions for dividing user groups need comprehensive study, and finding and applying new user grouping dimensions has become an active topic.
Group users have long been a focus of enterprise customer maintenance. Unlike individual users, group users are convenient to maintain centrally, bring high benefit, and cost little to maintain. Research shows an inverted U-shaped relationship between customer concentration and an enterprise's financial performance: as customer concentration increases, financial performance first rises and then falls. Developing enterprise group users appropriately and raising customer concentration therefore helps improve the financial performance of e-commerce businesses and enterprises. In addition, because users in the same group have similar characteristics and are geographically close, managing existing group users in a unified way reduces customer maintenance and supply costs. Dividing customers into groups has thus become a key part of enterprise customer maintenance.
Research also shows that the characteristics of the local social environment influence how people think and behave, which is known as the neighborhood effect. Groups strongly influence individual behavior, sharing of online purchases can greatly increase users' latent demand, and the shopping behaviors of neighboring groups influence one another so that product choices converge. Analyzing the geographic concentration of user groups and the space they occupy therefore helps reveal the mutual influence among the shopping behaviors of neighboring groups, assists e-commerce businesses and enterprises in precise recommendation and marketing, adds a reference dimension for user recommendation, and makes deeper use of big data, which improves customer satisfaction and customer retention.
However, group affiliation is complicated to define, so grouping users by affiliation is difficult, and how to reasonably determine a user's group affiliation, and whether an algorithm for doing so is effective, remain open problems. How to make full use of user data and signaling data to group users by affiliated enterprise and geographic location, and thereby automatically identify enterprise user groups in a building, is therefore a technical problem of key concern to those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides a building personnel affiliation identification method based on big data mining technology, which makes full use of user data and signaling data to group users by affiliated enterprise and geographic location, thereby automatically identifying enterprise user groups in a building.
In order to achieve the purpose, the invention provides a building personnel affiliation identification method based on big data mining technology, which comprises the following steps:
step one, setting a working period, extracting the base station data of each user in the working period, determining from the base station data the base station to which each user belongs while working, obtaining the building in which each user works from the building name contained in the base station information, and finally dividing all users into different building user groups according to the building in which each user works;
step two, constructing and training a building-user grouping model, wherein the input of the model is the feature data of all users in a building user group, the output of the model is a plurality of enterprise user groups into which all users in the building user group are divided, and the model works as follows: calculating the enterprise similarity between every two users from the input feature data of each user, constructing a graph with each user as a node and the enterprise similarity between every two users as an edge, and applying the Louvain community detection algorithm so that all users in the building user group are divided into a plurality of communities, each community being one enterprise user group;
and step three, inputting the feature data of all users in the building user group to be identified into the trained building-user grouping model, and outputting the enterprise user groups to which the users in that building user group respectively belong.
Compared with the prior art, the invention has the following beneficial effects: because users affiliated with different enterprises are mixed in the same building, the invention carries out drill-down and affiliation analysis of users based on signaling data and big data mining technology, uses a graph-theoretic community detection algorithm to group the users in the same building by affiliated enterprise and geographic location, and finally assigns each user to the enterprise to which the user belongs, which supports precise marketing and personalized recommendation and improves customer satisfaction.
Drawings
FIG. 1 is a flow chart of a building personnel affiliation identification method based on big data mining technology.
FIG. 2 is a detailed flowchart of extracting, in step one of FIG. 1, the base station data of each user in the working period to determine the base station to which each user belongs while working.
FIG. 3 is a detailed flowchart of calculating, in step two of FIG. 1, the enterprise similarity between every two users from the input feature data of each user.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the accompanying drawings.
As shown in fig. 1, the building personnel affiliation identification method based on big data mining technology of the present invention includes:
step one, setting a working period, extracting the base station data of each user in the working period, determining from the base station data the base station to which each user belongs while working, obtaining the building in which each user works from the building name contained in the base station information, and finally dividing all users into different building user groups according to the building in which each user works;
step two, constructing and training a building-user grouping model, wherein the input of the model is the feature data of all users in a building user group, the output of the model is a plurality of enterprise user groups into which all users in the building user group are divided, and the model works as follows: calculating the enterprise similarity between every two users from the input feature data of each user, constructing a graph with each user as a node and the enterprise similarity between every two users as an edge, and applying the Louvain community detection algorithm so that all users in the building user group are divided into a plurality of communities, each community being one enterprise user group;
and step three, inputting the feature data of all users in the building user group to be identified into the trained building-user grouping model, and outputting the enterprise user groups to which the users in that building user group respectively belong.
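The grouping in step two can be illustrated with a minimal sketch. It is not the patent's implementation: it runs only the first (local-moving) phase of the Louvain method, whereas a full Louvain run would also aggregate the resulting communities into super-nodes and repeat. The function name, the six users and the similarity values 0.9 and 0.05 are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def louvain_local_moving(edges, max_passes=20):
    """Run the local-moving phase of Louvain on a weighted undirected graph.

    edges maps (user_a, user_b) to an enterprise similarity used as edge weight.
    Returns a dict mapping each user to a community label.
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    nodes = sorted(adj)
    degree = {u: sum(adj[u].values()) for u in nodes}
    two_m = sum(degree.values())  # twice the total edge weight
    community = {u: u for u in nodes}  # every user starts alone
    comm_degree = dict(degree)  # summed degree of each community

    for _ in range(max_passes):
        moved = False
        for u in nodes:
            old = community[u]
            comm_degree[old] -= degree[u]  # take u out of its community
            links = defaultdict(float)  # weight from u to each neighbouring community
            for v, w in adj[u].items():
                links[community[v]] += w

            def gain(c):
                # Modularity gain, up to a constant factor, of placing u in c.
                return links.get(c, 0.0) - comm_degree[c] * degree[u] / two_m

            best = old
            for c in sorted(links):
                if gain(c) > gain(best) + 1e-12:
                    best = c
            community[u] = best
            comm_degree[best] += degree[u]
            moved = moved or best != old
        if not moved:
            break
    return community


# Hypothetical building with two enterprises of three users each.
enterprise_1 = {"a", "b", "c"}
all_users = ["a", "b", "c", "d", "e", "f"]
edges = {}
for i, u in enumerate(all_users):
    for v in all_users[i + 1:]:
        same = (u in enterprise_1) == (v in enterprise_1)
        edges[(u, v)] = 0.9 if same else 0.05  # invented similarity values

groups = louvain_local_moving(edges)
```

In this toy case users a, b, c end up with one label and d, e, f with another, which corresponds to two enterprise user groups.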
In step one, a user who is at home or at work generally stays in one place for a long time, so the time attached to a base station can be used as a feature for judging the user's state. The 24 hours of a day are first divided into 24 periods, starting from 0:00 and denoted T1, T2, ..., T24 in order. Based on research, and so that the data cover most people while keeping feature extraction accurate, T9-T12 and T14-T17 can be selected as the working periods. Then, for each user, the base stations to which the user may belong during working hours are screened according to the time the user is attached to each base station during the working periods. The membership degree expresses the degree to which an element belongs to a fuzzy set and is a key concept in fuzzy pattern recognition; the invention can convert the attribute vector of each user for each staying base station into a membership vector by means of a membership function. Therefore, as shown in FIG. 2, extracting the base station data of each user in the working period in step one of FIG. 1, to determine the base station to which each user belongs while working, may further include:
step 11, obtaining the base stations at which each user stays during the working period, and constructing an attribute vector of each user for each staying base station: $X_{ij} = (x_{ij1}, x_{ij2}, \ldots, x_{ijn})^{T}$, where $X_{ij}$ is the attribute vector of user i for its j-th staying base station, $x_{ij1}, x_{ij2}, \ldots, x_{ijn}$ are the 1st, 2nd, ..., n-th base station data items of the j-th staying base station of user i, and n is the total number of base station data items. The base station data include but are not limited to: the number of calls in the time period, the number of calls in the time period, the number of basic position updates in the time period, the number of periodic position updates, the number of short messages received in the time period, the number of short messages sent in the time period, the total communication time in the time period, and the stay time in the time period. The following table shows the base station data of one user for each staying base station during the working period:
[Table image not reproduced]
step 12, calculating, from the attribute vector of each user for each staying base station, the membership vector of each user for each staying base station relative to a standard working-state base station: $U_{ij} = (\mu_{ij1}, \mu_{ij2}, \ldots, \mu_{ijn})^{T}$, where $U_{ij}$ is the membership vector of user i for its j-th staying base station relative to the standard working-state base station. Each element of $U_{ij}$ is calculated as follows:
[Formula image not reproduced]
where $\mu_{ijz}$ is the z-th element of $U_{ij}$, $z \in [1, n]$; $x_{ijz}$ is the z-th base station data item of user i for its j-th staying base station; $a_z$ is the standard value of the z-th base station data item; $\sigma_z$ is the standard deviation of the z-th base station data item; and the values of $a_z$ and $\sigma_z$ can be calculated from the mean of the z-th base station data item over all staying base stations of all users in the sample data;
step 13, calculating the membership evaluation value of each user for each staying base station relative to the standard working-state base station:
[Formula image not reproduced]
where $N_{ij}$ is the membership evaluation value of user i for its j-th staying base station relative to the standard working-state base station, and $\alpha_z$ is the weight of the z-th base station data item, whose value may be determined by training the building-user grouping model; then selecting the minimum among the membership evaluation values of all staying base stations of each user, the staying base station corresponding to the minimum being the base station to which the user belongs while working.
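Steps 11 to 13 can be sketched as follows. The formula images are not available in this text, so the sketch assumes a form for each one: the membership element is taken to be the deviation of a data item from its standard value in units of its standard deviation, and the evaluation value is taken to be a weighted sum. Both are assumptions chosen to agree with the rule that the minimum evaluation value marks the working base station, and they may differ from the patent's actual formulas. The period indexing (T1 covering 00:00 to 01:00) is also an assumption, and all function names and sample numbers are hypothetical.

```python
WORK_PERIODS = set(range(9, 13)) | set(range(14, 18))  # T9-T12 and T14-T17


def in_working_period(hour):
    """hour is 0-23; period T(hour + 1) is assumed to cover [hour:00, hour+1:00)."""
    return hour + 1 in WORK_PERIODS


def membership_vector(x, a, sigma):
    # ASSUMED form: deviation from the standard value in units of sigma.
    return [abs(xz - az) / sz for xz, az, sz in zip(x, a, sigma)]


def evaluation_value(mu, alpha):
    # ASSUMED form: weighted sum of the membership elements.
    return sum(w * m for w, m in zip(alpha, mu))


def work_base_station(attributes, a, sigma, alpha):
    """attributes maps each staying base station to its attribute vector X_ij."""
    scores = {
        station: evaluation_value(membership_vector(x, a, sigma), alpha)
        for station, x in attributes.items()
    }
    # Step 13: the staying base station with the minimum evaluation value.
    return min(scores, key=scores.get), scores


# Hypothetical data with two base station data items: call count, stay minutes.
attributes = {"BS1": [9, 230], "BS2": [2, 30]}
best, scores = work_base_station(
    attributes, a=[10, 240], sigma=[2, 60], alpha=[0.5, 0.5]
)
```

With these invented numbers BS1 lies closer to the standard working-state values than BS2 and is therefore selected as the working base station.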
The feature data of users can be obtained from several dimensions of the relationships between enterprise members, such as call interactions, payroll income, group building, and dining location. As shown in FIG. 3, calculating the enterprise similarity between every two users from the input feature data of each user in step two of FIG. 1, described here taking users p and q as an example, may further include:
step 21, calculating the call feature similarity of users p and q:
[Formula image not reproduced]
where $\theta_c$ is the weight of the c-th call feature, whose value may be determined by training the building-user grouping model; [symbol images not reproduced] are the attribute values of users p and q on the c-th call feature; and C is the number of call features, which may include but are not limited to: the total number of calls, the total call duration, the number of common contacts, and the number of calls with common contacts;
step 22, calculating the payroll income feature similarity of users p and q:
[Formula image not reproduced]
where $\delta_b$ is the weight of the b-th payroll income feature, whose value may be determined by training the building-user grouping model; [symbol image not reproduced] is the similarity of users p and q on the b-th payroll income feature; and B is the number of payroll income features, which may include but are not limited to: the three most frequently used bank short-message interfaces, the fixed payroll date, and the number of payroll payments per month;
since a payroll income feature may be discrete or continuous attribute data, taking the b-th payroll income feature as an example: when the b-th payroll income feature is discrete attribute data, the similarity of users p and q on that feature [symbol image not reproduced] is calculated as follows:
[Formula image not reproduced]
where [symbol images not reproduced] are the b-th payroll income feature values of users p and q respectively; when the b-th payroll income feature is continuous attribute data, the similarity [symbol image not reproduced] is calculated as follows:
[Formula image not reproduced]
where ubanmax and ubanmin are the maximum and minimum values of the b-th payroll income feature, which can be set according to actual business needs;
step 23, calculating the group building feature similarity of users p and q [symbol image not reproduced]:
extracting, for each of users p and q, the TM base stations with the longest stay time in a certain period of each historical holiday, and sorting the TM base stations extracted for each user by stay time from longest to shortest; comparing, position by position in the ranking, whether the staying base stations of users p and q are the same, to obtain the number of identical staying base stations of users p and q; calculating the group building feature similarity value of users p and q on each historical holiday as the ratio of the number of identical staying base stations to TM; and finally calculating the group building feature similarity of users p and q as the average of their group building feature similarity values over all historical holidays. TM can be set according to actual business needs. For example, if the three base stations with the longest stay time between T13 and T17 of each holiday are selected, and on a certain historical holiday the first-ranked base station of user p is the same as that of user q but the second- and third-ranked base stations differ, then the group building feature similarity value of users p and q on that holiday is 1/3;
step 24, calculating the dinner party feature similarity of users p and q [symbol image not reproduced]:
comparing, day by day, whether the base stations with the longest stay time of users p and q in a certain period of each working day within a statistical period are the same, and counting the number of days on which they are the same; the dinner party feature similarity of users p and q is then the ratio of that number of days to the total number of working days in the statistical period;
step 25, calculating the enterprise similarity of users p and q:
[Formula image not reproduced]
where $\rho_1$, $\rho_2$, $\rho_3$, $\rho_4$ are the weights of the call feature similarity, the payroll income feature similarity, the group building feature similarity and the dinner party feature similarity respectively, whose values may be determined by training the building-user grouping model.
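Steps 23 to 25 can be sketched as follows. The group building and dinner party similarities follow the procedures written out in the text. The combination in step 25 is assumed here to be a weighted sum, because the formula image is not available. The function names, the base station labels, the weights and the call and payroll similarity values passed in are all hypothetical.

```python
def group_building_similarity(tops_p, tops_q, tm):
    """Step 23. tops_p and tops_q hold one list per historical holiday with that
    user's top-TM base stations ordered by stay time, longest first."""
    per_holiday = []
    for stations_p, stations_q in zip(tops_p, tops_q):
        # Compare the two rankings position by position.
        same = sum(1 for sp, sq in zip(stations_p[:tm], stations_q[:tm]) if sp == sq)
        per_holiday.append(same / tm)
    return sum(per_holiday) / len(per_holiday)


def dinner_similarity(longest_p, longest_q):
    """Step 24. Each list holds, per working day, the base station with the
    longest stay time in the chosen period."""
    same_days = sum(1 for sp, sq in zip(longest_p, longest_q) if sp == sq)
    return same_days / len(longest_p)


def enterprise_similarity(call, payroll, group, dinner, rho):
    # ASSUMED form for step 25: weighted sum of the four feature similarities.
    return rho[0] * call + rho[1] * payroll + rho[2] * group + rho[3] * dinner


# The holiday example from the text: first rank matches, second and third differ.
group_sim = group_building_similarity([["A", "B", "C"]], [["A", "C", "D"]], tm=3)

# Hypothetical four working days; the users share a base station on three of them.
dinner_sim = dinner_similarity(["X", "Y", "X", "Z"], ["X", "Y", "W", "Z"])

# Hypothetical call and payroll similarities and weights.
rho = (0.4, 0.2, 0.2, 0.2)
total = enterprise_similarity(0.5, 1.0, group_sim, dinner_sim, rho)
```

The group building value for the single holiday reproduces the 1/3 given in the text's example.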
After the building-user grouping model has divided all users in a building user group into several communities with the Louvain community detection algorithm, intra-group splitting and inter-group aggregation can be applied to handle two cases: an enterprise user group that contains personnel of several enterprises, and personnel of one enterprise who are spread across several enterprise user groups. This allows each single-enterprise user group in the building to be identified accurately, wherein:
1) for the case where one enterprise user group contains personnel of several enterprises, the method further includes:
step A1, according to the enterprise similarity between every two users in each enterprise user group of the building user group, selecting from each enterprise user group the users with low enterprise similarity as reselected users, forming a reselected user group from all reselected users, and deleting the reselected users from the enterprise user groups to which they belonged;
step A2, calculating the similarity between each user in the reselected user group and each enterprise user group in the building user group, the similarity between a user and an enterprise user group being the mean of the enterprise similarities between that user and all users in the group; selecting for each reselected user the enterprise user group with the highest similarity; then judging whether the similarity between the user and the selected enterprise user group is greater than the enterprise similarity between a certain number of users in the selected group; if so, adding the user to the selected enterprise user group; if not, creating a new enterprise user group for the user and adding the user to it.
2) for the case where personnel of one enterprise are spread across several enterprise user groups, the method further includes:
step B1, calculating the similarity between every two enterprise user groups in the building, the similarity between two enterprise user groups being the mean of the enterprise similarities between all users of the two groups, and then merging enterprise user groups with high similarity into one enterprise user group;
and step B2, judging whether the number of users in each enterprise user group is smaller than a headcount threshold; if so, calculating the similarity between that enterprise user group and the other enterprise user groups in the building, and merging it into the other enterprise user group with the highest similarity.
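The intra-group splitting and inter-group aggregation above can be sketched as follows. The text leaves several criteria open (which users count as having low similarity, how the join test in step A2 is evaluated, and when two groups are similar enough to merge), so this sketch substitutes simple thresholds: `low`, `join`, `merge_at` and `min_size` are hypothetical parameters, and the group-to-group similarity is taken as the mean over all cross pairs. It illustrates the flow of steps A1, A2, B1 and B2 under those assumptions only.

```python
def mean_sim(users_a, users_b, sim):
    """Mean enterprise similarity over all cross pairs of two user collections."""
    pairs = [(u, v) for u in users_a for v in users_b if u != v]
    return sum(sim(u, v) for u, v in pairs) / len(pairs) if pairs else 0.0


def split_and_reassign(groups, sim, low, join):
    groups = {name: set(members) for name, members in groups.items()}
    reselected = []
    for name in sorted(groups):  # step A1: pull out weakly attached users
        weak = [u for u in sorted(groups[name]) if mean_sim([u], groups[name], sim) < low]
        groups[name] -= set(weak)
        reselected.extend(weak)
    for u in reselected:  # step A2: rejoin the best group or start a new one
        candidates = [n for n in sorted(groups) if groups[n]]
        best = max(candidates, key=lambda n: mean_sim([u], groups[n], sim), default=None)
        if best is not None and mean_sim([u], groups[best], sim) >= join:
            groups[best].add(u)
        else:
            groups["new_" + str(u)] = {u}
    return {n: m for n, m in groups.items() if m}


def merge_groups(groups, sim, merge_at, min_size):
    groups = {name: set(members) for name, members in groups.items()}
    while len(groups) > 1:  # step B1: merge the most similar pair while similar enough
        names = sorted(groups)
        score, a, b = max(
            (mean_sim(groups[a], groups[b], sim), a, b)
            for i, a in enumerate(names) for b in names[i + 1:]
        )
        if score < merge_at:
            break
        groups[a] |= groups.pop(b)
    for name in sorted(groups):  # step B2: fold undersized groups into the closest one
        if name in groups and len(groups) > 1 and len(groups[name]) < min_size:
            others = [n for n in sorted(groups) if n != name]
            target = max(others, key=lambda n: mean_sim(groups[name], groups[n], sim))
            groups[target] |= groups.pop(name)
    return groups


# Hypothetical ground truth used only to fabricate similarity values.
enterprise = {"a": 1, "b": 1, "c": 1, "x": 2, "z": 3, "d": 2, "e": 2}


def sim(u, v):
    return 0.9 if enterprise[u] == enterprise[v] else 0.1


split = split_and_reassign(
    {"g1": {"a", "b", "c", "x", "z"}, "g2": {"d", "e"}}, sim, low=0.4, join=0.5
)
merged = merge_groups(
    {"g1": {"a", "b"}, "g2": {"c"}, "g3": {"d", "e"}}, sim, merge_at=0.5, min_size=2
)
```

In the toy data, user x is moved from g1 to g2, user z gets a group of its own, and the fragments g1 and g2 of enterprise 1 are merged.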
After the weight parameters of the building-user grouping model have been determined by training the model in step two, the model can be evaluated with test samples, which further includes:
step C1, inputting the feature data of all users in the test building user group into the trained building-user grouping model, and outputting the enterprise user groups to which those users respectively belong;
step C2, obtaining, for the users in each enterprise user group of the test building user group, the name of the enterprise to which each user belongs, and taking the enterprise name shared by the largest number of users in each enterprise user group as the name of that group;
step C3, calculating the grouping accuracy and the mixing rate:
[Formula image not reproduced]
where Accuracy is the grouping accuracy, Mess is the grouping mixing rate, $N_{uc}$ is the number of correctly grouped users, $N_u$ is the number of test users, NC is the number of test enterprises, $x \in [1, NC]$, $m_x$ is the number of users in the x-th enterprise user group who do not belong to the enterprise that the group is named after, and $M_x$ is the number of users in the x-th enterprise user group;
and step C4, judging whether the calculated grouping accuracy is greater than the accuracy threshold and the mixing rate is less than the mixing rate threshold; if not, the model does not meet the requirements and needs further adjustment.
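Steps C2 and C3 can be sketched as follows. Because the formula image is not available, the sketch assumes that the accuracy is the share of test users whose enterprise matches the name of their group, and that the mixing rate is the average over groups of the share of members who do not match the group name. These forms are consistent with the symbol definitions above but are not confirmed by the source. The function name, user names and enterprise names are hypothetical.

```python
from collections import Counter


def evaluate_grouping(groups, true_enterprise):
    """groups: list of user lists produced by the model.
    true_enterprise: dict mapping each user to its real enterprise name."""
    correct = 0
    total = 0
    mixing = []
    for members in groups:
        counts = Counter(true_enterprise[u] for u in members)
        # Step C2: the group is named after its most common enterprise.
        _, majority = counts.most_common(1)[0]
        correct += majority
        total += len(members)
        mixing.append((len(members) - majority) / len(members))
    accuracy = correct / total  # ASSUMED form of Accuracy
    mess = sum(mixing) / len(mixing)  # ASSUMED form of Mess
    return accuracy, mess


# Hypothetical test building with two predicted groups.
truth = {"a": "E1", "b": "E1", "c": "E1", "x": "E2", "d": "E2", "e": "E2"}
accuracy, mess = evaluate_grouping([["a", "b", "c", "x"], ["d", "e"]], truth)
```

Here five of the six users sit in a group named after their own enterprise, and only the first group contains a mismatched member.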
If the model does not meet the requirements, it can be optimized by improving the system of similarity measurement features and refining the subdivision rules. On the one hand, more non-contact features are introduced to describe the similarity between users who have no direct contact; on the other hand, rules are defined for user grouping and subdivision to describe and separate users in different departments.
The above description only illustrates preferred embodiments of the present invention and is not to be construed as limiting it; any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims (7)

1. A building personnel affiliation identification method based on big data mining technology is characterized by comprising the following steps:
step one, setting a working period, extracting the base station data of each user in the working period, determining from the base station data the base station to which each user belongs while working, obtaining the building in which each user works from the building name contained in the base station information, and finally dividing all users into different building user groups according to the building in which each user works;
step two, constructing and training a building-user grouping model, wherein the input of the model is the feature data of all users in a building user group, the output of the model is a plurality of enterprise user groups into which all users in the building user group are divided, and the model works as follows: calculating the enterprise similarity between every two users from the input feature data of each user, constructing a graph with each user as a node and the enterprise similarity between every two users as an edge, and applying the Louvain community detection algorithm so that all users in the building user group are divided into a plurality of communities, each community being one enterprise user group;
and step three, inputting the feature data of all users in the building user group to be identified into the trained building-user grouping model, and outputting the enterprise user groups to which the users in that building user group respectively belong.
2. The method of claim 1, wherein in step one, extracting the base station data of each user in the working period to determine the base station to which each user belongs while working further comprises:
step 11, obtaining the base stations at which each user stays during the working period, and constructing an attribute vector of each user for each staying base station: $X_{ij} = (x_{ij1}, x_{ij2}, \ldots, x_{ijn})^{T}$, where $X_{ij}$ is the attribute vector of user i for its j-th staying base station, $x_{ij1}, x_{ij2}, \ldots, x_{ijn}$ are the 1st, 2nd, ..., n-th base station data items of the j-th staying base station of user i, and n is the total number of base station data items, the base station data comprising: the number of calls in the time period, the number of calls in the time period, the number of basic position updates in the time period, the number of periodic position updates, the number of short messages received in the time period, the number of short messages sent in the time period, the total communication time in the time period, and the stay time in the time period;
step 12, calculating, from the attribute vector of each user for each staying base station, the membership vector of each user for each staying base station relative to a standard working-state base station: $U_{ij} = (\mu_{ij1}, \mu_{ij2}, \ldots, \mu_{ijn})^{T}$, where $U_{ij}$ is the membership vector of user i for its j-th staying base station relative to the standard working-state base station, each element of $U_{ij}$ being calculated as follows:
[Formula image not reproduced]
where $\mu_{ijz}$ is the z-th element of $U_{ij}$, $z \in [1, n]$, $x_{ijz}$ is the z-th base station data item of user i for its j-th staying base station, $a_z$ is the standard value of the z-th base station data item, and $\sigma_z$ is the standard deviation of the z-th base station data item;
step 13, calculating the membership evaluation value of each user for each staying base station relative to the standard working-state base station:
[Formula image not reproduced]
where $N_{ij}$ is the membership evaluation value of user i for its j-th staying base station relative to the standard working-state base station, and $\alpha_z$ is the weight of the z-th base station data item; then selecting the minimum among the membership evaluation values of all staying base stations of each user, the staying base station corresponding to the minimum being the base station to which the user belongs while working.
3. The method according to claim 1, wherein in step two, calculating the enterprise similarity between every two users according to the input feature data of each user, described by taking users p and q as an example, further comprises:
step 21, calculating the call feature similarity of the users p and q:
Figure DEST_PATH_IMAGE003
wherein $\theta_c$ is the weight of the c-th call feature,
Figure DEST_PATH_IMAGE004
are the attribute values of users p and q on the c-th call feature, C is the number of call features, and the call features comprise: the total number of calls, the total call duration, the number of common contacts, and the number of calls with common contacts;
step 22, calculating the salary income characteristic similarity of the users p and q:
Figure DEST_PATH_IMAGE005
wherein $\delta_b$ is the weight of the b-th payroll income characteristic,
Figure FDA0002795592410000025
is the similarity value of users p and q on the b-th payroll income characteristic, B is the number of payroll income characteristics, and the payroll income characteristics comprise: the three most frequently used bank short message interfaces, the fixed payment date, and the number of payments per month;
step 23, calculating the group-building feature similarity of the users p and q
Figure DEST_PATH_IMAGE006
as follows: for each historical holiday, extracting the TM base stations at which users p and q respectively stayed longest during a certain time period, and sorting the TM base stations extracted for each of users p and q by stay time from longest to shortest; then comparing, position by position, whether the dwell base stations of users p and q at each rank are the same, thereby obtaining the number of identical dwell base stations of users p and q; then calculating the group-building feature similarity value of users p and q on each historical holiday, which is the ratio of the number of identical dwell base stations of users p and q to TM; and finally calculating the group-building feature similarity of users p and q, which is the average of their group-building feature similarity values over all historical holidays;
step 24, calculating the dinner-party feature similarity of the users p and q
Figure DEST_PATH_IMAGE007
as follows: for each working day in a statistical period, comparing whether the base stations at which users p and q stayed longest during a certain time period of that day are the same, and counting the number of days on which they are the same; then calculating the dinner-party feature similarity of users p and q, which is the ratio of the number of days with the same base station to the total number of working days in the statistical period;
step 25, calculating the enterprise similarity of the users p and q:
Figure DEST_PATH_IMAGE008
where $\rho_1$, $\rho_2$, $\rho_3$, $\rho_4$ are respectively the weights of the call feature similarity, the payroll income feature similarity, the group-building feature similarity, and the dinner-party feature similarity.
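Steps 23 to 25 can be sketched in Python as follows. The group-building and dinner-party calculations follow the prose of the claim; the combination in step 25 is assumed to be a weighted sum, because its formula image is not reproduced here. Restricting step 23 to holidays present for both users, and all function names, data layouts and values, are assumptions made for illustration.

```python
def group_building_similarity(stays_p, stays_q, tm):
    """Step 23. stays_p / stays_q map holiday -> {base station: stay time}."""
    holidays = sorted(stays_p.keys() & stays_q.keys())
    if not holidays:
        return 0.0
    values = []
    for day in holidays:
        # Top-TM stations of each user, longest stay first.
        top_p = sorted(stays_p[day], key=stays_p[day].get, reverse=True)[:tm]
        top_q = sorted(stays_q[day], key=stays_q[day].get, reverse=True)[:tm]
        # Compare rank by rank and count identical stations.
        same = sum(1 for a, b in zip(top_p, top_q) if a == b)
        values.append(same / tm)
    return sum(values) / len(values)

def dinner_party_similarity(longest_p, longest_q):
    """Step 24. Each list holds, per working day, the longest-stay station."""
    if not longest_p or len(longest_p) != len(longest_q):
        raise ValueError("need one entry per working day for both users")
    same_days = sum(1 for a, b in zip(longest_p, longest_q) if a == b)
    return same_days / len(longest_p)

def enterprise_similarity(call, payroll, group, dinner, rho):
    """Step 25, ASSUMED to be a weighted sum with weights rho_1..rho_4."""
    return rho[0] * call + rho[1] * payroll + rho[2] * group + rho[3] * dinner

# Hypothetical data for one holiday and four working days.
stays_p = {"2020-10-01": {"BS1": 5.0, "BS2": 3.0, "BS3": 1.0}}
stays_q = {"2020-10-01": {"BS1": 4.0, "BS3": 2.0, "BS2": 1.0}}
group_sim = group_building_similarity(stays_p, stays_q, tm=2)
dinner_sim = dinner_party_similarity(
    ["BS1", "BS1", "BS2", "BS4"], ["BS1", "BS3", "BS2", "BS4"]
)
total_sim = enterprise_similarity(0.5, 1.0, group_sim, dinner_sim,
                                  rho=(0.25, 0.25, 0.25, 0.25))
```

With TM = 2, only the first-ranked station matches on the holiday, so the group-building similarity is 1/2; three of the four working days match, so the dinner-party similarity is 3/4.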
4. The method of claim 3, wherein, in step 22, when the b-th payroll income characteristic is discrete attribute data,
Figure DEST_PATH_IMAGE009
is calculated as follows:
Figure 2
wherein
Figure DEST_PATH_IMAGE011
are the b-th payroll income characteristic values of users p and q, respectively; and when the b-th payroll income characteristic is continuous attribute data,
Figure DEST_PATH_IMAGE012
is calculated as follows:
Figure 1
wherein ubanmax and ubanmin are the maximum value and the minimum value of the b-th payroll income characteristic, respectively.
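The distinction between discrete and continuous characteristics can be illustrated with the Python sketch below. The two formula images are not reproduced in this text, so the forms used are assumptions: exact-match similarity (1 or 0) for discrete data and range-normalised similarity, 1 - |u_p - u_q| / (max - min), for continuous data, combined by the weights $\delta_b$ as an assumed weighted sum. The feature values, the `BANK_A` identifier and the choice of which features are continuous are hypothetical.

```python
def discrete_similarity(u_p, u_q):
    """ASSUMED form: 1 when the two discrete values are equal, else 0."""
    return 1.0 if u_p == u_q else 0.0

def continuous_similarity(u_p, u_q, u_max, u_min):
    """ASSUMED form: 1 minus the difference normalised by the value range."""
    if u_max == u_min:
        return 1.0  # a constant feature cannot distinguish users
    return 1.0 - abs(u_p - u_q) / (u_max - u_min)

def payroll_income_similarity(feats_p, feats_q, specs, delta):
    """Weighted sum over features; specs[b] is None for a discrete feature
    or a (min, max) tuple for a continuous one."""
    total = 0.0
    for u_p, u_q, spec, weight in zip(feats_p, feats_q, specs, delta):
        if spec is None:
            s = discrete_similarity(u_p, u_q)
        else:
            u_min, u_max = spec
            s = continuous_similarity(u_p, u_q, u_max, u_min)
        total += weight * s
    return total

# Hypothetical features: bank SMS interface, pay day of month, payments/month.
feats_p = ["BANK_A", 10, 1]
feats_q = ["BANK_A", 25, 1]
specs = [None, (1, 31), None]
payroll_sim = payroll_income_similarity(feats_p, feats_q, specs,
                                        delta=[0.5, 0.25, 0.25])
```

In this example the pay days differ by 15 over a range of 30, giving 0.5 for that feature, while the two discrete features match exactly.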
5. The method as claimed in claim 1, wherein the building-user clustering model, after dividing all users in the building user group into a plurality of communities by using the Louvain community discovery algorithm, further performs:
step A1, according to the enterprise similarity between every two users in each enterprise user group in the building user group, selecting from each enterprise user group a plurality of users with low enterprise similarity as reselected users, forming a reselected user group from all reselected users, and deleting each reselected user from the enterprise user group to which it belongs;
step A2, calculating the similarity between each user in the reselected user group and each enterprise user group in the building user group, wherein the similarity between a user and an enterprise user group is the mean of the enterprise similarities between the user and all users in that enterprise user group; selecting, for each user in the reselected user group, the enterprise user group with the highest similarity; then judging whether the similarity between the user and the selected enterprise user group is greater than the enterprise similarity between a certain number of users in the selected enterprise user group; if so, adding the user to the selected enterprise user group; if not, constructing a new enterprise user group for the user and adding the user to the new enterprise user group.
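Step A2 can be sketched in Python as shown below. The user-to-group similarity is the mean defined in the claim; the acceptance test, however, is simplified to a fixed `threshold`, which is an assumption standing in for the claim's comparison against the similarity among "a certain number of users" in the selected group. The similarity table, user identifiers and helper names are hypothetical.

```python
from statistics import mean

def user_group_similarity(user, group, sim):
    """Mean enterprise similarity between `user` and every member of `group`."""
    return mean(sim(user, member) for member in group)

def reassign(reselected, groups, sim, threshold):
    """Step A2 sketch: join the most similar group, or start a new one.
    `threshold` is an ASSUMED stand-in for the claim's acceptance test."""
    groups = [list(g) for g in groups]  # work on copies
    for user in reselected:
        best = max(groups, key=lambda g: user_group_similarity(user, g, sim))
        if user_group_similarity(user, best, sim) > threshold:
            best.append(user)
        else:
            groups.append([user])
    return groups

# Hypothetical pairwise enterprise similarities; missing pairs default to 0.
table = {
    frozenset(("x", "a1")): 0.9, frozenset(("x", "a2")): 0.7,
    frozenset(("x", "b1")): 0.1, frozenset(("y", "a1")): 0.1,
    frozenset(("y", "a2")): 0.2, frozenset(("y", "b1")): 0.3,
}
sim = lambda a, b: table.get(frozenset((a, b)), 0.0)

result = reassign(["x", "y"], [["a1", "a2"], ["b1"]], sim, threshold=0.5)
```

User `x` is close enough to the first group to join it, while user `y` is not close to any group and therefore receives a group of its own.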
6. The method as claimed in claim 1, wherein the building-user clustering model, after dividing all users in the building user group into a plurality of communities by using the Louvain community discovery algorithm, further performs:
step B1, calculating the similarity between every two enterprise user groups in the building user group, wherein the similarity between two enterprise user groups is the mean of the enterprise similarities between all users in the two enterprise user groups, and then merging a plurality of enterprise user groups with high similarity into one enterprise user group;
step B2, judging, one by one, whether the number of users in each enterprise user group is smaller than a headcount threshold; if so, calculating the similarity between that enterprise user group and the other enterprise user groups in the building user group, and merging that enterprise user group into the other enterprise user group with the highest similarity.
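Step B2 can be sketched in Python as follows. The group-to-group similarity is taken here as the mean over all cross-group user pairs, which is one reading of the definition in step B1 and should be treated as an assumption. The similarity function, which in this toy example depends only on the first letter of each user identifier, and all names and values are hypothetical.

```python
from statistics import mean

def group_similarity(g1, g2, sim):
    """ASSUMED reading: mean enterprise similarity over cross-group pairs."""
    return mean(sim(a, b) for a in g1 for b in g2)

def merge_small_groups(groups, sim, min_size):
    """Step B2 sketch: fold each undersized group into its most similar peer."""
    groups = [list(g) for g in groups]  # work on copies
    for g in [g for g in groups if len(g) < min_size]:
        if len(g) >= min_size:
            continue  # grew past the threshold through an earlier merge
        others = [h for h in groups if h is not g]
        if not others:
            break
        target = max(others, key=lambda h: group_similarity(g, h, sim))
        target.extend(g)
        groups = others
    return groups

# Hypothetical similarities keyed by the first letter of each user id.
table = {frozenset("ab"): 0.1, frozenset("ac"): 0.2, frozenset("bc"): 0.6}
sim = lambda a, b: table.get(frozenset((a[0], b[0])), 1.0)

merged = merge_small_groups(
    [["a1", "a2", "a3"], ["b1", "b2", "b3"], ["c1"]], sim, min_size=2
)
```

The single-user group `["c1"]` falls below the headcount threshold and is merged into the second group, with which it has the higher mean similarity.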
7. The method of claim 1, wherein after the building-user clustering model is trained in step two and the weight parameters in the building-user clustering model are determined, the method further comprises:
step C1, inputting the feature data of all users in a test building user group into the trained building-user clustering model, and obtaining as output the plurality of enterprise user groups to which the users in the test building user group respectively belong;
step C2, acquiring the name of the enterprise to which each user in each enterprise user group of the test building user group belongs, and selecting, for each enterprise user group, the enterprise name shared by the largest number of its users as the name of that enterprise user group;
step C3, calculating the clustering accuracy and the mixing rate:
Figure DEST_PATH_IMAGE014
wherein Accuracy is the clustering accuracy, Mess is the clustering mixing rate, $N_{uc}$ is the number of correctly grouped users, $N_u$ is the number of tested users, NC is the number of tested enterprises, $x \in [1, NC]$, $m_x$ is the number of users in the x-th enterprise user group who do not belong to the enterprise corresponding to the name of that enterprise user group, and $M_x$ is the number of users in the x-th enterprise user group;
step C4, judging whether the calculated clustering accuracy is greater than an accuracy threshold and the clustering mixing rate is less than a mixing rate threshold; if not, the model does not meet the requirement and continues to be adjusted.
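Steps C2 to C4 can be sketched in Python as follows. The formula image for step C3 is not reproduced in this text, so the sketch assumes Accuracy = $N_{uc} / N_u$ and Mess = $\frac{1}{NC}\sum_x m_x / M_x$, which is consistent with the symbol definitions but not confirmed by the source. It also assumes that a user counts as correctly grouped when the user's true enterprise equals the name of the user's group. All identifiers and data are hypothetical.

```python
from collections import Counter

def name_groups(groups, true_enterprise):
    """Step C2: label each group with its most common true enterprise name."""
    return [Counter(true_enterprise[u] for u in g).most_common(1)[0][0]
            for g in groups]

def accuracy_and_mixing(groups, true_enterprise, n_enterprises):
    """Step C3 under the ASSUMED formulas described above."""
    names = name_groups(groups, true_enterprise)
    # m_x: users in group x whose true enterprise differs from the group name.
    wrong = [sum(1 for u in g if true_enterprise[u] != name)
             for g, name in zip(groups, names)]
    n_users = sum(len(g) for g in groups)
    accuracy = (n_users - sum(wrong)) / n_users
    mixing = sum(m / len(g) for m, g in zip(wrong, groups)) / n_enterprises
    return accuracy, mixing

def meets_requirement(accuracy, mixing, acc_threshold, mix_threshold):
    """Step C4: both thresholds must be satisfied."""
    return accuracy > acc_threshold and mixing < mix_threshold

# Hypothetical test set: six users from two enterprises.
truth = {"u1": "A", "u2": "A", "u3": "A", "u4": "B", "u5": "B", "u6": "B"}
groups = [["u1", "u2", "u3", "u4"], ["u5", "u6"]]
accuracy, mixing = accuracy_and_mixing(groups, truth, n_enterprises=2)
ok = meets_requirement(accuracy, mixing, acc_threshold=0.8, mix_threshold=0.2)
```

In this example one of the six users sits in the wrong group, so the accuracy is 5/6 and the mixing rate is (1/4 + 0) / 2 = 0.125.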
CN202011330345.5A 2020-11-24 2020-11-24 Building personnel affiliation identification method based on big data mining technology Active CN112417076B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011330345.5A CN112417076B (en) 2020-11-24 2020-11-24 Building personnel affiliation identification method based on big data mining technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011330345.5A CN112417076B (en) 2020-11-24 2020-11-24 Building personnel affiliation identification method based on big data mining technology

Publications (2)

Publication Number Publication Date
CN112417076A (en) 2021-02-26
CN112417076B (en) 2022-08-05

Family

ID=74777572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011330345.5A Active CN112417076B (en) 2020-11-24 2020-11-24 Building personnel affiliation identification method based on big data mining technology

Country Status (1)

Country Link
CN (1) CN112417076B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130138479A1 (en) * 2010-05-24 2013-05-30 Telefonaktiebolaget Lm Ericsson (Publ) Classification of network users based on corresponding social network behavior
CN103700018A (en) * 2013-12-16 2014-04-02 华中科技大学 Method for dividing users in mobile social network
CN111221879A (en) * 2020-04-22 2020-06-02 南京柏跃软件有限公司 Potential community member detection method and detection model based on track similarity

Also Published As

Publication number Publication date
CN112417076B (en) 2022-08-05

Similar Documents

Publication Publication Date Title
Jooa et al. Implementation of a recommendation system using association rules and collaborative filtering
CN111178624B (en) New product demand prediction method
CN110415091A (en) Shop and Method of Commodity Recommendation, device, equipment and readable storage medium storing program for executing
CN111177559B (en) Text travel service recommendation method and device, electronic equipment and storage medium
Ye et al. Telecom customer segmentation with K-means clustering
Qiu et al. Clustering Analysis for Silent Telecom Customers Based on K-means++
CN107527240A (en) A kind of operator's industry product Praise effect identification system and method
CN112184484A (en) Differentiated service method and system for power users
CN109254909A (en) A kind of test big drawing generating method and system
CN106776859A (en) Mobile solution App commending systems based on user preference
Qiuru et al. Telecom customer segmentation based on cluster analysis
CN111352976A (en) Search advertisement conversion rate prediction method and device for shopping nodes
CN112149352A (en) Prediction method for marketing activity clicking by combining GBDT automatic characteristic engineering
Liu et al. A hybrid book recommendation algorithm based on context awareness and social network
CN111428092B (en) Bank accurate marketing method based on graph model
CN113343077A (en) Personalized recommendation method and system integrating user interest time sequence fluctuation
CN112417076B (en) Building personnel affiliation identification method based on big data mining technology
Wang et al. A Comparative Study on Contract Recommendation Model: Using Macao Mobile Phone Datasets
Lewaaelhamd Customer segmentation using machine learning model: an application of RFM analysis
Piatykop et al. Model Selection of the Target Audience in Social Networks in Order to Promote the Product.
Zhu et al. Context-aware restaurant recommendation for group of people
WO2024001102A1 (en) Method and apparatus for intelligently identifying family circle in communication industry, and device
Ou et al. On data mining for direct marketing
Yuan et al. Location recommendation algorithm based on temporal and geographical similarity in location-based social networks
Mustafa et al. Recommendation system based on item and user similarity on restaurants directory online

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 310013 4th floor, No.398 Wensan Road, Xihu District, Hangzhou City, Zhejiang Province

Patentee after: Xinxun Digital Technology (Hangzhou) Co.,Ltd.

Address before: 310013 4th floor, No.398 Wensan Road, Xihu District, Hangzhou City, Zhejiang Province

Patentee before: EB Information Technology Ltd.