CN107563429A - Method and device for classifying network user groups - Google Patents

Method and device for classifying network user groups

Info

Publication number
CN107563429A
Authority
CN
China
Prior art keywords
feature
user
network user
sorted
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710743140.1A
Other languages
Chinese (zh)
Other versions
CN107563429B (en)
Inventor
孙波
房婧
杜雄杰
姚珊
张伟
司成祥
李应博
刘成
姜栋
王亿芳
张建松
董建武
张文学
杜晓梦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Percent Technology Group Co ltd
National Computer Network and Information Security Management Center
Original Assignee
Beijing Baifendian Information Science & Technology Co Ltd
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baifendian Information Science & Technology Co Ltd and National Computer Network and Information Security Management Center
Publication of CN107563429A
Application granted
Publication of CN107563429B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An embodiment of the present application discloses a method and device for classifying network user groups. The method includes: obtaining historical data of network access by a network user to be classified, where the network user to be classified is a user whose historical data matches a predetermined target user feature; and determining, according to a pre-established user classification model, the historical data, and the target user feature, the category information to which the network user to be classified belongs. With this embodiment, the matching historical data is selected by means of the predetermined target user feature, and the category information of the network user to be classified is determined from the full content of that historical data. All of the historical data of the network user to be classified is thus mined as a whole before the category information is determined, so that the resulting target users meet the target criteria more closely and the accuracy of the category information is improved.

Description

Method and device for classifying network user groups
Technical field
The present application relates to the field of computer technology, and in particular to a method and device for classifying network user groups.
Background
With the continuous development of network technology, networks are becoming increasingly complex and website content increasingly rich, so recommending information to users, network security, and related tasks have become particularly important. Information recommendation, network monitoring, and website optimization can usually be achieved through users' behavioral features and patterns. Before that, however, how to determine the target users is an important problem that needs to be solved.
Target users can usually be determined by determining the category to which a network user belongs, so as to achieve information recommendation, network monitoring, and website optimization. Specifically, the category information of a network user is usually determined from the business experience of the website or service provider. For example, when a promotion for a certain e-commerce product category or brand needs to recommend corresponding information to target users, network users who have browsed products of that brand or of a similar category are usually taken as target users, and the category information of the target users is determined accordingly. Alternatively, when judging whether a network user is relevant to network security, the judgment is made by checking whether the user has visited certain sensitive websites or used certain sensitive applications, and the category information of the target user is determined accordingly.
However, determining the category information of target users through business experience in this way uses only part of the behavior data among all of a user's operation behavior data. For example, a user who has browsed products of the brand or of a similar category only once is still treated as a target user. As a result, the degree to which the obtained target users meet the target criteria varies, and the accuracy of the determined category information is poor.
Summary of the invention
The purpose of the embodiments of the present application is to provide a method and device for classifying network user groups, so as to solve the problem in the prior art that only part of the behavior data among all of a user's operation behavior data is used to determine the category information of target users, so that the obtained target users meet the target criteria to varying degrees and the determined category information has poor accuracy.
To solve the above technical problem, the embodiments of the present application are implemented as follows:
An embodiment of the present application provides a method for classifying network user groups, the method including:
obtaining historical data of network access by a network user to be classified, where the network user to be classified is a user whose historical data matches a predetermined target user feature; and
determining, according to a pre-established user classification model, the historical data, and the target user feature, the category information to which the network user to be classified belongs.
Optionally, obtaining the historical data of network access by the network user to be classified includes:
obtaining behavior sample data of network users;
performing feature extraction on the behavior sample data and taking the extracted effective features as the target user feature, where an effective feature is a feature that can characterize the category of the corresponding network user; and
obtaining, from a historical network access database, historical network access data that matches the target user feature, taking the network user to whom the obtained historical network access data belongs as the network user to be classified, and taking the obtained historical network access data as the historical data of network access by the network user to be classified.
Optionally, taking the extracted effective features as the target user feature includes:
calculating, based on a chi-square statistic method and/or an information gain method, the chi-square statistic value and/or the information gain value of each extracted feature; and
taking the features whose chi-square statistic value and/or information gain value exceeds the corresponding predetermined threshold as the target user feature, where the chi-square statistic method is used to determine the dependence between an extracted feature and a category, and the information gain method is used to characterize the increase in information in the classification system before and after a predetermined feature is added.
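The threshold-based selection described above can be sketched roughly as follows. This is only an illustration: the function name, the feature names, the scores, and the threshold values are all hypothetical, and the `require_both` flag is one possible reading of the "and/or" wording.

```python
def select_target_features(chi2_scores, ig_scores, chi2_threshold, ig_threshold,
                           require_both=False):
    """Return the features whose scores exceed the predetermined thresholds.

    With require_both=False a feature is kept if either score exceeds its
    threshold ("or"); with require_both=True both scores must exceed ("and").
    """
    selected = []
    for feature in sorted(set(chi2_scores) | set(ig_scores)):
        passes_chi2 = chi2_scores.get(feature, 0.0) > chi2_threshold
        passes_ig = ig_scores.get(feature, 0.0) > ig_threshold
        if require_both:
            keep = passes_chi2 and passes_ig
        else:
            keep = passes_chi2 or passes_ig
        if keep:
            selected.append(feature)
    return selected


# Hypothetical scores, for illustration only.
chi2_scores = {"bought_wristband": 12.4, "viewed_homepage": 0.3, "bought_basketball": 25.1}
ig_scores = {"bought_wristband": 0.08, "viewed_homepage": 0.001, "bought_basketball": 0.21}

either = select_target_features(chi2_scores, ig_scores, 3.84, 0.1)
both = select_target_features(chi2_scores, ig_scores, 3.84, 0.1, require_both=True)
```

Requiring both scores to pass gives a stricter feature set; accepting either gives a broader one.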
Optionally, calculating the information gain value of each feature in the behavior sample data based on the information gain method includes:
obtaining, from the behavior sample data, the number of network users of a predetermined category and the number of network users not of the predetermined category;
determining the information entropy according to the number of network users of the predetermined category and the number of network users not of the predetermined category;
obtaining the frequency with which each feature is accessed by specified users and by non-specified users, and the frequency with which each feature is not accessed by specified users and by non-specified users; and
determining the information gain value of each feature according to the information entropy, the frequency with which each feature is accessed by specified users and by non-specified users, and the frequency with which each feature is not accessed by specified users and by non-specified users.
Optionally, calculating the chi-square statistic value of each feature in the behavior sample data based on the chi-square statistic method includes:
obtaining a first number, which is the number of behavior sample data items that access a target feature and belong to specified users; a second number, which is the number of behavior sample data items that access the target feature and belong to non-specified users; a third number, which is the number of behavior sample data items that do not access the target feature and belong to specified users; and a fourth number, which is the number of behavior sample data items that do not access the target feature and belong to non-specified users; and
determining the chi-square statistic value of the target feature according to the first number, the second number, the third number, and the fourth number,
where the target feature is any feature contained in the behavior sample data.
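For reference, the four numbers above form a 2×2 contingency table, and one standard way to compute a chi-square statistic from such a table is shown below. The symbols A, B, C, D, and N are introduced here for illustration only; the text above does not state which formula is used, so this should be read as the common textbook form and not necessarily the exact expression intended.

```latex
% A = first number  (accesses the target feature, specified user)
% B = second number (accesses the target feature, non-specified user)
% C = third number  (does not access the target feature, specified user)
% D = fourth number (does not access the target feature, non-specified user)
\[
  \chi^2 = \frac{N\,(AD - BC)^2}{(A+B)\,(C+D)\,(A+C)\,(B+D)},
  \qquad N = A + B + C + D .
\]
```

A larger value indicates a stronger dependence between the target feature and the specified-user category.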
Optionally, the method further includes:
obtaining behavior sample data of network users that includes category information; and
establishing a user classification model and training it on the behavior sample data of the network users that includes the category information, to obtain a trained user classification model.
Optionally, the user classification model is a random forest model, a GBDT model, or a bagging model.
Optionally, the network user to be classified includes multiple network users, and the method further includes:
grouping the network users to be classified according to a predetermined k-means clustering algorithm to obtain at least one network user group, where the number of users in each network user group does not exceed a user-count threshold.
Optionally, determining, according to the pre-established user classification model, the historical data, and the target user feature, the category information to which the network user to be classified belongs includes:
determining, according to the pre-established user classification model, the historical data, and the target user feature, the predicted probability that the network user to be classified belongs to each of the different categories; and
determining the category information to which the network user to be classified belongs according to the predicted probabilities that the network user belongs to the different categories.
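The step from predicted probabilities to a single category can be illustrated with a minimal sketch. It assumes the simplest rule, picking the category with the highest predicted probability; the text above does not fix the rule, and the function name and the probability values are hypothetical.

```python
def pick_category(probabilities):
    """Return the category with the highest predicted probability.

    `probabilities` maps a category name to the predicted probability that
    the user belongs to it. Ties go to the category listed first.
    """
    if not probabilities:
        raise ValueError("no predicted probabilities were supplied")
    return max(probabilities, key=probabilities.get)


# Hypothetical model output for one user.
predicted = {"A": 0.2, "B": 0.7, "C": 0.1}
category = pick_category(predicted)
```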
An embodiment of the present application provides a device for classifying network user groups, the device including:
a historical data acquisition module, configured to obtain historical data of network access by a network user to be classified, where the network user to be classified is a user whose historical data matches a predetermined target user feature; and
a category determination module, configured to determine, according to a pre-established user classification model, the historical data, and the target user feature, the category information to which the network user to be classified belongs.
Optionally, the historical data acquisition module includes:
a behavior sample acquiring unit, configured to obtain behavior sample data of network users that includes category information;
a feature extraction unit, configured to perform feature extraction on the behavior sample data and to take, from the extracted features, the features that meet a predetermined selection condition as the target user feature; and
a feature matching unit, configured to obtain, from a historical network access database, historical network access data that matches the target user feature, as the historical data of network access by the network user to be classified.
Optionally, the feature extraction unit is configured to: calculate, based on a chi-square statistic method and/or an information gain method, the chi-square statistic value and/or the information gain value of each extracted feature; and take the features whose chi-square statistic value and/or information gain value exceeds the corresponding predetermined threshold as the target user feature, where the chi-square statistic method is used to determine the dependence between an extracted feature and a category, and the information gain method is used to characterize the increase in information in the classification system before and after a predetermined feature is added.
As can be seen from the technical solution provided by the above embodiments, the embodiments of the present application obtain historical data of network access by a network user to be classified, where the network user to be classified is a user whose historical data matches a predetermined target user feature, and then determine, according to a pre-established user classification model, the historical data, and the target user feature, the category information to which the network user to be classified belongs. In this way, the matching historical data is selected by means of the predetermined target user feature, and the category information of the network user to be classified is determined from the full content of that historical data. All of the historical data of the network user to be classified is mined as a whole, and the category information is determined on that basis, so that the obtained target users meet the target criteria more closely and the accuracy of the category information is improved.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 shows an embodiment of a method for classifying network user groups according to the present application;
Fig. 2 shows another embodiment of a method for classifying network user groups according to the present application;
Fig. 3 is a schematic diagram of a method for obtaining historical data according to the present application;
Fig. 4 shows an embodiment of a device for classifying network user groups according to the present application;
Fig. 5 shows an embodiment of equipment for classifying network user groups according to the present application.
Detailed description
The embodiments of the present application provide a method and device for classifying network user groups.
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present application.
Embodiment 1
As shown in Fig. 1, an embodiment of the present application provides a method for classifying network user groups. The method can be executed by a server or a terminal device. The server may be a standalone server or a server cluster composed of multiple servers. The terminal device may be any electronic device, for example a mobile terminal such as a mobile phone or tablet computer, or a terminal device such as a personal computer. The method can be used in processing such as determining the category to which a network user group belongs and obtaining the network user group of a specified category. The method may specifically include the following steps:
In step S101, historical data of network access by a network user to be classified is obtained, where the network user to be classified is a user whose historical data matches a predetermined target user feature.
Here, the network user may be one specific user or a network user group made up of multiple network users. The network users may be users of any website, for example users of a shopping website, a forum website, or a resource website. A network user group may be all or some of the users of a website, a group formed by dividing certain users, a group of users belonging to the same category, or a group of users sharing common characteristics. The target user feature may be any feature, and may include one feature or multiple features, for example a feature formed by information about goods purchased by a network user (such as chiffon cropped trousers or an embroidered lapel shirt) or a feature formed by information provided in a merchant's marketing campaign (such as a Mother's Day promotion or a 516 Category Day promotion).
In implementation, different network users tend to differ in interests, hobbies, and habits, while a given hobby or habit may be shared by multiple network users. For example, user A likes playing football and badminton and user B likes playing football and table tennis, so user A and user B share the hobby of playing football. To improve user experience, many websites or service providers divide network users into multiple categories. For example, a shopping website may divide network users, according to the goods they like or buy, into users interested in electronic products, users interested in clothing, users interested in cosmetics, and so on. Information, such as product information on the shopping website above, can then be recommended to the corresponding network users according to their categories. It is therefore important to classify a network user or network user group (that is, the network user to be classified).
When a network user or network user group (that is, the network user to be classified) needs to be classified, the data generated when that network user, or each network user in the group, accesses the network can be used as the analysis data. In practice, the volume of data generated by each network user's network access is often large. Therefore, when classifying network users, the analysis can focus on the data for certain hidden or potential features contained in it (that is, target user features, such as features that have no direct association with the purpose to be achieved). In other words, one or more hidden or potential features can be preset as target user features, for example the features formed by information provided in a merchant's marketing campaign as in the example above (such as a Mother's Day promotion or a 516 Category Day promotion). The data matching the target user features can then be extracted from the data generated by the network access of the network user, or of each network user in the group, and used as the historical data of network access by the network user to be classified.
In addition, before the data generated by a network user's network access is matched against the target user features, that data can be preprocessed. Data unrelated to network access can be deleted, for example network parameter data and system parameter data. Impurity data can also be deleted, for example erroneous data that was generated and impurity data mixed in during transmission. This improves the efficiency of the subsequent target user feature matching.
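The preprocessing step can be sketched as a simple filter. The record layout (a dict with `kind`, `user_id`, and `action` keys) and the rule that an incomplete record counts as impurity data are assumptions made for this illustration; the text above does not specify a record format.

```python
# Hypothetical record kinds that are unrelated to network access behavior.
IRRELEVANT_KINDS = {"network_parameter", "system_parameter"}


def preprocess(records):
    """Drop records unrelated to network access and records that look corrupt."""
    cleaned = []
    for record in records:
        if record.get("kind") in IRRELEVANT_KINDS:
            continue  # network or system parameter data
        if not record.get("user_id") or not record.get("action"):
            continue  # incomplete record, treated here as impurity data
        cleaned.append(record)
    return cleaned


raw_records = [
    {"kind": "access", "user_id": "u1", "action": "click:/item/1"},
    {"kind": "network_parameter", "user_id": "u1", "action": "mtu=1500"},
    {"kind": "access", "user_id": "", "action": "click:/item/2"},
    {"kind": "access", "user_id": "u2", "action": "order:/item/3"},
]
cleaned_records = preprocess(raw_records)
```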
In step S102, the category information to which the network user to be classified belongs is determined according to a pre-established user classification model, the historical data, and the target user feature.
Here, the category information may be information about the category to which the network user belongs. For example, the category information may be "fan of brand A mobile phones", "photography enthusiast", or "travel enthusiast".
In implementation, a user classification model for classifying network users can be established from experience in practical applications. That is, while collecting the historical data of network users' network access, the patterns and common characteristics in the historical data are counted and summarized, an association analysis is performed between the summarized content and the categories to which the network users belong to determine the association between the two, and a user classification model is established on the basis of that association. In practice, besides building it from experience, the user classification model can be established in various other ways. For example, a suitable classification model can be chosen from commonly used classification models as an initial classification model, corresponding training data can be collected for it, and the initial classification model can be trained on that data to obtain a trained classification model. The accuracy of the trained classification model can then be verified with the training data or with test data. If verification passes, the trained classification model is used as the user classification model; if it fails, collection of training data and training of the classification model continue until verification passes.
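The train, verify, and collect-more-data loop described above can be sketched as follows. A 1-nearest-neighbour lookup is used purely as a stand-in model so that the example stays self-contained; it is not one of the models named in this document, and the function names, the accuracy threshold, and the toy data are all hypothetical.

```python
def nearest_neighbour_model(training_set):
    """Return a predict function backed by a 1-nearest-neighbour lookup."""
    def predict(vector):
        def squared_distance(sample):
            return sum((a - b) ** 2 for a, b in zip(sample[0], vector))
        return min(training_set, key=squared_distance)[1]
    return predict


def accuracy(predict, labelled):
    """Fraction of labelled (vector, label) pairs that are predicted correctly."""
    correct = sum(1 for vector, label in labelled if predict(vector) == label)
    return correct / len(labelled)


def train_until_verified(batches, validation_set, min_accuracy):
    """Add training batches one at a time until validation accuracy is reached."""
    training_set = []
    for batch in batches:
        training_set.extend(batch)  # "continue collecting training data"
        predict = nearest_neighbour_model(list(training_set))
        score = accuracy(predict, validation_set)
        if score >= min_accuracy:
            return predict, score  # verification passed
    raise RuntimeError("verification did not pass with the available training data")


# Toy data: feature vectors paired with category labels.
batches = [[((0, 0), "A")], [((5, 5), "B")]]
validation_set = [((0, 1), "A"), ((5, 4), "B")]
model, score = train_until_verified(batches, validation_set, min_accuracy=1.0)
```

With only the first batch the stand-in model labels everything as category A and fails verification; after the second batch it passes.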
Because the historical data matches the target user features, the historical data can be represented in terms of those features. For example, if there are n target user features f1, f2, ..., fn, where n ≥ 1 and n is a positive integer, the feature vector corresponding to the target user features can be expressed as (f1, f2, ..., fn). If the network user to be classified has triggered feature f1 three times and feature f2 five times, the feature vector corresponding to that user's historical data can be expressed as (3, 5, 0, ..., 0). The feature vector corresponding to the historical data of the network user to be classified can then be fed into the user classification model as the input value, and the result of the calculation indicates the category to which the network user to be classified belongs. For example, a numeric code can be assigned to each category, such as 1 for category A, 2 for category B, and 3 for category C. The calculation of the user classification model then yields a code such as 1, 2, or 3, and the category information corresponding to that code can be looked up in the mapping above to obtain the category information to which the network user to be classified belongs. For example, if the result of the calculation is 1, the category information to which the network user to be classified belongs is determined to be category A.
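The feature-vector construction and the code-to-category lookup can be illustrated with a short sketch. The event list, the choice of four target features, and the helper names are hypothetical; the model itself is left out, and `model_output` simply stands for whatever numeric code the classification model returns.

```python
from collections import Counter


def to_feature_vector(events, target_features):
    """Count how often each target feature was triggered, in a fixed order."""
    counts = Counter(events)
    return [counts[feature] for feature in target_features]


# Mapping from the numeric code produced by the model to category information.
CATEGORY_BY_CODE = {1: "A", 2: "B", 3: "C"}


def decode_category(code):
    """Look up the category information for a numeric model output."""
    return CATEGORY_BY_CODE[code]


target_features = ["f1", "f2", "f3", "f4"]
# Hypothetical history: f1 triggered 3 times, f2 triggered 5 times.
events = ["f1", "f2", "f1", "f2", "f2", "f1", "f2", "f2", "other"]
vector = to_feature_vector(events, target_features)

model_output = 1  # placeholder for the classification model's result
category = decode_category(model_output)
```

Events that are not target features, such as `"other"` here, are ignored, which mirrors the idea that only matching historical data is kept.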
After the category information to which the network user to be classified belongs is obtained, corresponding recommendation information can subsequently be selected from the information to be recommended according to that category information and sent to the network user. The structure of the network system can also be improved, and network resources optimized, according to the historical data of network users or network user groups of the different categories.
The embodiment of the present application provides a method for classifying network user groups. Historical data of network access by a network user to be classified is obtained, where the network user to be classified is a user whose historical data matches a predetermined target user feature, and the category information to which the network user to be classified belongs is then determined according to a pre-established user classification model, the historical data, and the target user feature. In this way, the matching historical data is selected by means of the predetermined target user feature, and the category information of the network user to be classified is determined from the full content of that historical data. All of the historical data of the network user to be classified is mined as a whole, and the category information is determined on that basis, so that the obtained target users meet the target criteria more closely and the accuracy of the category information is improved.
Embodiment 2
As shown in Fig. 2, an embodiment of the present application provides a method for classifying network user groups. The method can be executed by a server or a terminal device. The server may be a standalone server or a server cluster composed of multiple servers. The terminal device may be any electronic device, for example a mobile terminal such as a mobile phone or tablet computer, or a terminal device such as a personal computer. This embodiment is described taking a server as the executing entity.
The method provided by this embodiment can be used in processing such as determining the category to which a network user group belongs and obtaining the network user group of a specified category. The network user in this embodiment may be one user or a network user group made up of multiple users. To better illustrate the method, the following description takes as its example a network user group made up of multiple users; the case of a single user can be handled by reference to the same content and is not repeated here. The method may specifically include the following steps:
In step S201, behavior sample data of network users that includes category information is obtained.
Here, behavior sample data may be sample data formed from the data of one or more operation behaviors that users perform on a website during network access. An operation behavior may be any operation a user can perform, such as clicking a link, placing an order, or downloading.
In implementation, a data recording mechanism can be set up in the back-end server of a website or application. After a user logs in to the server with an account, the server can record the various operation behaviors the user performs and the data those behaviors generate, and can store the recorded operation behaviors and their data together with the user's identifier (such as a registered user name or code). A database (that is, the historical network access database), such as a MySQL database, can be set up on the server to hold the stored data. When the data is needed, a certain amount of it can be extracted from the database and analyzed to determine the category to which the corresponding network user belongs, and the determined category can be stored in the database together with the user identifier. Alternatively, the category to which the corresponding network user belongs can be analyzed and determined at the time the data generated by the user's operation behaviors is stored.
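A minimal sketch of such a recording mechanism is given below. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL database mentioned above so that the example runs without a server; the table name, column names, and helper functions are all hypothetical.

```python
import sqlite3


def open_history_db(path=":memory:"):
    """Open (or create) a toy historical network access database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access_history ("
        "user_id TEXT NOT NULL, action TEXT NOT NULL, payload TEXT, category TEXT)"
    )
    return conn


def record_action(conn, user_id, action, payload=""):
    """Store one operation behavior together with the user identifier."""
    conn.execute(
        "INSERT INTO access_history (user_id, action, payload) VALUES (?, ?, ?)",
        (user_id, action, payload),
    )
    conn.commit()


def set_category(conn, user_id, category):
    """Attach a determined category to every stored record of a user."""
    conn.execute(
        "UPDATE access_history SET category = ? WHERE user_id = ?",
        (category, user_id),
    )
    conn.commit()


def fetch_sample(conn, limit):
    """Extract up to `limit` records, oldest first, as behavior sample data."""
    cursor = conn.execute(
        "SELECT user_id, action, payload, category FROM access_history "
        "ORDER BY rowid LIMIT ?",
        (limit,),
    )
    return cursor.fetchall()
```

A typical flow would call `record_action` as users act, `set_category` once a user's category has been determined, and `fetch_sample` when behavior sample data is needed.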
When sample data is needed to extract features and train the model, a predetermined amount of data (such as 10 GB of data or 100,000 records) can be extracted from the data stored in the database. This data may include the related data of multiple users, and each user's related data may include the user identifier, information about the user's operation behaviors, the data generated by each operation behavior, and the category information to which the user belongs. The extracted data can be used as the behavior sample data.
In step S202, feature extraction is performed on the behavior sample data, and the extracted effective features are taken as the target user feature.
Here, an effective feature is a feature that can characterize the category of the corresponding network user, and can be determined according to the actual situation. For example, an effective feature may be one of the extracted features that has no direct association with the purpose to be achieved, that is, a hidden or potential feature for achieving that purpose. In some cases, an effective feature may also be one of the extracted features that differs from the features of network users under normal circumstances.
In implementation, take the case where the effective feature is a hidden or potential feature for achieving a certain purpose. When screening network users, a website or service provider usually has a purpose or reason in mind, for example counting the network users who like basketball. Feature extraction can therefore be performed on the behavior sample data of the network users to extract the features corresponding to each network user. These features may include ones that are obviously associated with the purpose, such as "network user who has bought a basketball" or "network user whose personal profile lists basketball as a hobby". In practice, however, it may be found that network users who like basketball have usually bought wristbands, so "has bought a wristband" becomes a key feature for counting the network users who like basketball. Because buying a wristband has no direct association with liking basketball, the feature of a network user who has bought a wristband is a hidden or potential feature for the purpose of counting the network users who like basketball, that is, an effective feature (which can be used as a target user feature). If no hidden or potential feature serving the purpose is found for a network user, the above determination continues with the next network user, until it has been completed for all network users.
The processing in step S202 of taking the extracted valid features as target user features can be implemented in many ways. One optional way is provided below, which may specifically include the processing of the following Step 1 and Step 2:
Step 1: based on the chi-square statistic method and/or the information gain method, calculate the chi-square statistic value and/or the information gain value of each extracted feature.
Here, the chi-square statistic method is used to determine the dependence between an extracted feature and a category. The basic idea of the chi-square statistic method is to judge whether a conclusion is correct by observing the deviation between actual values and theoretical values. In practice, it is usually first assumed that the two variables are independent (the null hypothesis), and then the degree of deviation between the actual values (also called observed values) and the theoretical values is observed. If the deviation is sufficiently small, the error can be regarded as natural sampling error, that is, caused by insufficiently accurate measurement or by chance; the null hypothesis is then accepted and the two variables are considered independent. If the deviation is so large that it is unlikely to be caused by chance or by inaccurate measurement, the two variables can be considered actually related, that is, the null hypothesis is rejected.
The information gain method is used to characterize the increase in information in the classification system before and after a given feature is added. In information gain, the criterion is the amount of information that a feature brings to the classification system: the more information it brings, the more important the feature.
In implementation, for the chi-square statistic method, the dependence between each extracted feature and any preset category can be calculated by the chi-square statistic algorithm to obtain the corresponding chi-square statistic value. For example, suppose the extracted features include feature 1, feature 2 and feature 3, and the categories include category A and category B. The dependence between feature 1 and category A can be calculated to obtain chi-square statistic value 1; between feature 1 and category B, to obtain chi-square statistic value 2; between feature 2 and category A, to obtain chi-square statistic value 3; between feature 2 and category B, to obtain chi-square statistic value 4; between feature 3 and category A, to obtain chi-square statistic value 5; and between feature 3 and category B, to obtain chi-square statistic value 6.
The above chi-square statistic method can specifically be performed through the following Step 1 and Step 2, as described below:
Step 1: obtain a first number, which is the number of behavior sample data items that accessed the target feature and belong to a specified user; obtain a second number, which is the number of behavior sample data items that accessed the target feature and belong to a non-specified user; obtain a third number, which is the number of behavior sample data items that did not access the target feature and belong to a specified user; and obtain a fourth number, which is the number of behavior sample data items that did not access the target feature and belong to a non-specified user.
Here, the specified user may be any user and may be determined according to the actual situation. The target feature is any feature contained in the behavior sample data, for example, a feature of a certain commodity.
In implementation, one or more network users who are representative, or whose user identity satisfies a predetermined condition (such as VIP users), can be chosen from the behavior sample data as specified users, and the remaining network users are non-specified users. The following data can then be counted from the behavior sample data: the first number of behavior sample data items that accessed the target feature and belong to a specified user; the second number of behavior sample data items that accessed the target feature and belong to a non-specified user; the third number of behavior sample data items that did not access the target feature and belong to a specified user; and the fourth number of behavior sample data items that did not access the target feature and belong to a non-specified user.
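The counting described above can be sketched as follows. This is a minimal illustration only: the record layout (a user identifier paired with the set of features that the record accessed), the function name `contingency_counts` and the sample data are assumptions made for the example and do not appear in the original text.

```python
def contingency_counts(samples, target_feature, specified_users):
    """Return (A, B, C, D) for one target feature.

    samples: iterable of (user_id, features) pairs, where features is the
             set of features accessed in that behavior sample (assumed layout).
    A: accessed, specified user        B: accessed, non-specified user
    C: not accessed, specified user    D: not accessed, non-specified user
    """
    a = b = c = d = 0
    for user_id, features in samples:
        accessed = target_feature in features
        specified = user_id in specified_users
        if accessed and specified:
            a += 1
        elif accessed:
            b += 1
        elif specified:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Hypothetical behavior samples; u1 and u2 play the role of specified users.
samples = [
    ("u1", {"basketball", "wrist_guard"}),
    ("u2", {"wrist_guard"}),
    ("u3", {"novel"}),
    ("u4", {"wrist_guard", "novel"}),
    ("u5", {"novel"}),
]
counts = contingency_counts(samples, "wrist_guard", specified_users={"u1", "u2"})
print(counts)
```

The four counts form the 2x2 table from which the chi-square statistic value of the target feature is determined in the next step.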
Step 2: determine the chi-square statistic value of the target feature according to the first number, the second number, the third number and the fourth number.
In implementation, to simplify the following description, the first number is denoted by A, the second number by B, the third number by C and the fourth number by D. The chi-square statistic value of the target feature can then be calculated by the following formula:
where N is the number of behavior sample data items and χ² is the chi-square statistic value of the target feature. Performing the above calculation for each network user in the behavior sample data yields the chi-square statistic value corresponding to that network user.
It should be noted that if the feature corresponding to a network user and a category are independent of each other, the chi-square statistic value corresponding to that network user is 0.
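The formula itself is not reproduced in this text. As a hedged illustration, the sketch below assumes the standard chi-square statistic for a 2x2 contingency table, χ² = N(AD − BC)² / ((A + C)(B + D)(A + B)(C + D)) with N = A + B + C + D. This assumed form is consistent with the note above, since it yields 0 when the feature and the category are independent (AD = BC). The function name `chi_square` is hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a 2x2 table (assumed standard form).

    a, b, c, d correspond to the first to fourth numbers A, B, C, D.
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # A row or column of the table is empty; no dependence can be measured.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


# Independent case: A*D equals B*C, so the statistic is 0.
print(chi_square(10, 20, 30, 60))
# Dependent case, using hypothetical counts.
print(chi_square(2, 1, 0, 2))
```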
For the information gain method, the increase in information before and after each extracted feature is added to the classification system can be calculated by the information gain algorithm to obtain the corresponding information gain value. For example, if the extracted features include feature 1, feature 2 and feature 3, the increase in information before and after feature 1 is added to the classification system can be calculated to obtain information gain value 1; the increase before and after feature 2 is added, to obtain information gain value 2; and the increase before and after feature 3 is added, to obtain information gain value 3.
The above information gain method can specifically be performed through the following Steps 1 to 4, as described below:
Step 1: obtain, from the behavior sample data, the number of network users of a predetermined category and the number of network users of non-predetermined categories.
In implementation, one or more categories that are representative, or that reflect the purpose the website or service provider wants to achieve, can be chosen from the multiple preset categories as the predetermined category. The user identifiers of network users whose category information is the predetermined category can be extracted from the behavior sample data, and the number of extracted user identifiers is counted to determine the corresponding number of users. Likewise, the user identifiers of network users whose category information is a non-predetermined category are extracted from the behavior sample data, and the number of extracted user identifiers is counted to determine the corresponding number of users.
Step 2: determine the information entropy according to the number of network users of the predetermined category and the number of network users of non-predetermined categories.
Here, the information entropy can be understood as an amount of information.
In implementation, to simplify the following description, the number of network users of the predetermined category is denoted by N1 and the number of network users of non-predetermined categories by N2. The information entropy can then be calculated by the following formula:
where Entropy(S) is the information entropy.
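The entropy formula is not reproduced in this text. The sketch below assumes the usual two-class Shannon entropy, Entropy(S) = −p1·log2(p1) − p2·log2(p2) with p1 = N1/(N1 + N2) and p2 = N2/(N1 + N2). The choice of base-2 logarithm and the function name `binary_entropy` are assumptions for illustration.

```python
import math


def binary_entropy(n1, n2):
    """Two-class Shannon entropy in bits (assumed form of Entropy(S)).

    n1: number of network users of the predetermined category (N1)
    n2: number of network users of non-predetermined categories (N2)
    """
    total = n1 + n2
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in (n1, n2):
        if count > 0:  # a zero-probability class contributes nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


print(binary_entropy(5, 5))   # evenly split categories
print(binary_entropy(10, 0))  # all users in one category
```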
Step 3: obtain the frequency with which each feature is accessed by specified users and by non-specified users, and the frequency with which each feature is not accessed by specified users and by non-specified users.
In implementation, as described for the chi-square statistic method above, one or more network users who are representative, or whose user identity satisfies a predetermined condition, can be chosen from the behavior sample data as specified users, and the remaining network users are non-specified users. The following data can then be counted from the behavior sample data: the frequency with which each feature is accessed by specified users and by non-specified users, and the frequency with which each feature is not accessed by specified users and by non-specified users. For example, the frequency with which each commodity is accessed by specified users and by non-specified users, and the frequency with which each commodity is not accessed by specified users and by non-specified users, can be counted.
Step 4: determine the information gain value of each feature according to the information entropy, the frequency with which each feature is accessed by specified users and by non-specified users, and the frequency with which each feature is not accessed by specified users and by non-specified users.
In implementation, to simplify the following description, the frequency with which a feature is accessed by specified users is denoted by A, the frequency with which it is accessed by non-specified users by B, the frequency with which it is not accessed by specified users by C, and the frequency with which it is not accessed by non-specified users by D. The information gain value of each feature can then be calculated by the following formula:
where InfoGain is the information gain value of a feature and Entropy(S) is the information entropy.
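The information gain formula is likewise not reproduced here. As a sketch under stated assumptions, the code below uses the common form InfoGain = Entropy(S) − ((A + B)/n)·H(A, B) − ((C + D)/n)·H(C, D), where n = A + B + C + D and H is the Shannon entropy of the split inside each branch (feature accessed, feature not accessed). The helper names `entropy` and `info_gain` are hypothetical, and the source formula may be written differently.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a tuple of non-negative counts."""
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            result -= p * math.log2(p)
    return result


def info_gain(entropy_s, a, b, c, d):
    """Information gain of one feature (assumed standard form).

    entropy_s: Entropy(S) obtained in Step 2
    a, b: frequency of access by specified / non-specified users
    c, d: frequency of non-access by specified / non-specified users
    """
    n = a + b + c + d
    if n == 0:
        return 0.0
    accessed = (a + b) / n * entropy((a, b))
    not_accessed = (c + d) / n * entropy((c, d))
    return entropy_s - accessed - not_accessed


entropy_s = entropy((10, 10))
print(info_gain(entropy_s, 5, 0, 0, 5))  # feature separates the users perfectly
print(info_gain(entropy_s, 5, 5, 5, 5))  # feature carries no information
```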
Step 2: take the features whose chi-square statistic value and/or information gain value exceeds the corresponding predetermined threshold as the target user features.
In implementation, a corresponding threshold can be set for each of the chi-square statistic method and the information gain method in order to select hidden or latent features (that is, valid features); the predetermined thresholds can be set according to the actual situation. For example, the predetermined threshold for the chi-square statistic method may be M, and the predetermined threshold for the information gain method may be N. Based on the above examples, for the chi-square statistic method, chi-square statistic values 1 to 6 can each be compared with the predetermined threshold M, and the features of the network users corresponding to the chi-square statistic values greater than M are taken as the target user features. For the information gain method, information gain values 1 to 3 can each be compared with the predetermined threshold N, and the features of the network users corresponding to the information gain values greater than N are taken as the target user features.
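The threshold comparison can be sketched as below. The score values, the dictionary layout and the function name `select_target_features` are hypothetical; the `require_both` flag is an assumption that covers both the "and/or" wording of this step and the stricter variant in which a feature must pass both thresholds.

```python
def select_target_features(chi2_scores, gain_scores, m, n, require_both=False):
    """Select features whose scores exceed the predetermined thresholds.

    chi2_scores: {feature: {category: chi-square statistic value}}
    gain_scores: {feature: information gain value}
    m, n: thresholds for the chi-square and information gain methods
    """
    by_chi2 = {
        feature
        for feature, per_category in chi2_scores.items()
        if any(value > m for value in per_category.values())
    }
    by_gain = {feature for feature, value in gain_scores.items() if value > n}
    return by_chi2 & by_gain if require_both else by_chi2 | by_gain


# Hypothetical scores for features 1 to 3 and categories A and B.
chi2_scores = {
    "feature1": {"A": 7.2, "B": 0.4},
    "feature2": {"A": 1.1, "B": 0.9},
    "feature3": {"A": 0.3, "B": 5.0},
}
gain_scores = {"feature1": 0.30, "feature2": 0.02, "feature3": 0.01}

either = select_target_features(chi2_scores, gain_scores, m=3.84, n=0.1)
both = select_target_features(chi2_scores, gain_scores, m=3.84, n=0.1,
                              require_both=True)
print(sorted(either), sorted(both))
```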
After the hidden or latent target user features have been obtained in the above manner, a user classification model can be built based on them in order to classify network users; see the processing of step S203 below.
In step S203, a user classification model is established, and the user classification model is trained based on the behavior sample data of the network users containing category information, to obtain the trained user classification model.
Here, the user classification model is a random forest model, a GBDT (Gradient Boosting Decision Tree) model or a bagging model. In machine learning, a random forest model is a classifier that contains multiple decision trees, and the category it outputs is determined by the mode of the categories output by the individual decision trees. A GBDT model is a supervised learning model whose training result is a decision forest. Training a GBDT model requires multiple iterations: after N iterations the decision forest contains N trees, each tree contains several leaves, and each leaf corresponds to a specific score. The final result learned by a GBDT model consists of the score of each leaf and the structure of each decision tree. A bagging model is a model used to improve the accuracy of a learning algorithm; it constructs a series of prediction functions and combines them in a certain way into a single prediction function.
In implementation, taking the case where the user classification model is a random forest model as an example, all or part of the above behavior sample data can be used as training data to train the random forest model and obtain the trained random forest model. To make the output of the trained random forest model more accurate, behavior sample data with as large a data volume as possible can be chosen as the training data.
In addition, to make the output of the resulting random forest model more accurate, five-fold cross-validation can be used to divide the above behavior sample data into training data and test data. That is, the behavior sample data are divided into 5 equal parts, 4 of which are chosen as training data and the remaining 1 as test data. The random forest model is trained with the training data to obtain the trained random forest model. The trained random forest model is then tested with the test data to obtain the corresponding test values, from which the average accuracy of the random forest model is calculated. An accuracy threshold can be preset according to the actual situation, such as 0.8 or 0.7, and the average accuracy is compared with it. If the average accuracy is not less than the accuracy threshold, the trained random forest model can be considered a trusted model. If the average accuracy is less than the accuracy threshold, another model can be selected, created and trained. If the average accuracy of every type of model after training is below the accuracy threshold, the target user features can be determined again.
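The five-fold procedure can be sketched in plain Python as follows. To keep the example self-contained, a trivial majority-class predictor (`train_majority`) stands in for the random forest model; it is a placeholder and not the model described above. The function names, the fixed random seed and the sample data are likewise assumptions.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train_fn, k=5, seed=0):
    """Mean accuracy over k folds; each fold is the test data exactly once."""
    accuracies = []
    for held_out in k_fold_indices(len(samples), k, seed):
        test_idx = set(held_out)
        train_x = [s for i, s in enumerate(samples) if i not in test_idx]
        train_y = [y for i, y in enumerate(labels) if i not in test_idx]
        predict = train_fn(train_x, train_y)
        correct = sum(predict(samples[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


def train_majority(train_x, train_y):
    """Stand-in model: always predicts the most common training label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return lambda sample: majority


samples = list(range(20))             # hypothetical feature vectors
labels = ["A"] * 16 + ["B"] * 4       # hypothetical category information
mean_accuracy = cross_validate(samples, labels, train_majority)
accuracy_threshold = 0.7
is_trusted = mean_accuracy >= accuracy_threshold
print(mean_accuracy, is_trusted)
```

If `is_trusted` is false, the text above suggests trying another model type and, failing that, determining the target user features again.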
In step S204, historical network access data matching the target user features are obtained from a historical network access database; the network users to whom the obtained historical network access data belong are taken as the network users to be classified, and the obtained historical network access data are taken as the historical data of network access by the network users to be classified.
In implementation, the valid features (that is, the target user features) have been obtained through the processing of steps S201 and S202. The obtained target user features can be used to look up matching historical network access data in a database (that is, the historical network access database), so as to obtain the network users who satisfy the condition (that is, the network users to be classified) and the historical data of network access by those network users.
For example, as shown in Figure 3, there may be one target user feature or several. If there are several target user features, including feature A, feature B and feature C, feature A can be matched against the historical network access data of each network user in the historical network access database to obtain historical network access data 1 matching feature A. In the same way, feature B is matched against the historical network access data of each network user in the database to obtain historical network access data 2 matching feature B, and feature C is matched to obtain historical network access data 3 matching feature C. Historical network access data 1, 2 and 3 can be merged to obtain the historical data, and the network users corresponding to the historical data are the network users to be classified.
Considering that the volume of historical data obtained in the above manner may still be large, part of the historical data can be selected from it. The processing of step S204 can then be implemented as follows: obtain, from the historical network access database, the historical network access data matching the target user features; and select, from the matching historical network access data, a predetermined number of historical network access data as the historical data of network access by the network users to be classified.
Here, the predetermined number can be set according to the actual situation, and the embodiment of the present application does not limit it.
For the above processing of obtaining, from the historical network access database, the historical network access data matching the target user features, refer to the related content above, which is not repeated here.
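The matching, merging and optional truncation described for step S204 can be sketched as follows. The in-memory dictionary standing in for the historical network access database, the record fields `feature` and `ts`, and the function name `match_history` are assumptions for illustration. How the predetermined number of records is chosen is not specified in the text; taking the first matches is only one possibility.

```python
def match_history(history_db, target_features, limit=None):
    """Collect historical access records that match any target user feature.

    history_db: {user_id: [record, ...]}, each record a dict with a
                "feature" key (assumed layout).
    limit: optional predetermined number of matching records to keep.
    Returns {user_id: [matching records]}; the keys are the network users
    to be classified.
    """
    matches = []
    for feature in target_features:  # feature A, then B, then C
        for user_id, records in history_db.items():
            for record in records:
                if record["feature"] == feature:
                    matches.append((user_id, record))
    if limit is not None:
        matches = matches[:limit]    # assumption: keep the first matches
    merged = {}
    for user_id, record in matches:
        merged.setdefault(user_id, []).append(record)
    return merged


history_db = {
    "u1": [{"feature": "A", "ts": 1}, {"feature": "X", "ts": 2}],
    "u2": [{"feature": "B", "ts": 3}, {"feature": "C", "ts": 4}],
    "u3": [{"feature": "X", "ts": 5}],
}
all_matches = match_history(history_db, ["A", "B", "C"])
limited = match_history(history_db, ["A", "B", "C"], limit=2)
print(sorted(all_matches), len(limited["u2"]))
```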
In step S205, the prediction probabilities that the network user to be classified belongs to different categories are determined according to the pre-established user classification model, the historical data and the target user features.
In implementation, the historical data can be represented in terms of the target user features to obtain the feature vector corresponding to the historical data of each network user to be classified; see the related content of step S102 in Embodiment One above, which is not repeated here. The feature vector corresponding to the historical data of each network user to be classified can then be used as the input value and fed into the user classification model for calculation, to obtain the prediction probabilities that the network user to be classified belongs to different categories. For example, if the categories include category A, category B and category C, the prediction probabilities that each network user to be classified belongs to category A, category B and category C can be calculated; for instance, the prediction probability that network user 1 to be classified belongs to category A is 0.8, to category B is 0.2, and to category C is 0.1.
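One possible way to represent the historical data in terms of the target user features is sketched below. It assumes a count-based encoding (one vector component per target user feature); the actual encoding is defined in step S102 of Embodiment One, which is outside this section, so the encoding, the record layout and the function name `to_feature_vector` are all assumptions.

```python
def to_feature_vector(records, target_features):
    """Count how often each target user feature occurs in one user's history."""
    counts = dict.fromkeys(target_features, 0)
    for record in records:
        feature = record["feature"]
        if feature in counts:
            counts[feature] += 1
    # Keep the component order fixed so vectors of different users line up.
    return [counts[feature] for feature in target_features]


target_features = ["A", "B", "C"]
history = [{"feature": "A"}, {"feature": "C"}, {"feature": "A"}, {"feature": "X"}]
vector = to_feature_vector(history, target_features)
print(vector)
```

A vector of this kind would then be passed to the trained user classification model to obtain one prediction probability per category.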
In step S206, the category information to which the network user to be classified belongs is determined according to the prediction probabilities that the network user to be classified belongs to different categories.
In implementation, for a given network user to be classified, the prediction probabilities that the user belongs to the different categories can be compared, and the category with the largest prediction probability is taken as the category to which that user belongs, so as to obtain the category information of that user. The category information of all network users to be classified can be obtained in this way. For example, based on the example in step S205, if the prediction probability that network user 1 to be classified belongs to category A is 0.8, to category B is 0.2, and to category C is 0.1, then the category information to which network user 1 to be classified belongs is category A.
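The selection of the category with the largest prediction probability can be sketched as follows, using the probabilities from the example above. The function name `assign_category` is hypothetical.

```python
def assign_category(probabilities):
    """Return the category whose prediction probability is largest."""
    return max(probabilities, key=probabilities.get)


# Prediction probabilities of network user 1 to be classified (from the text).
user1 = {"A": 0.8, "B": 0.2, "C": 0.1}
category = assign_category(user1)
print(category)
```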
In addition to the above manner, a prediction threshold can also be preset, and its value can be set according to the actual situation. For a given network user to be classified, the prediction probabilities that the user belongs to the different categories can each be compared with the prediction threshold, and the category whose prediction probability is greater than the prediction threshold is taken as the category to which that user belongs, so as to obtain the category information of that user. If the prediction probabilities that a network user to be classified belongs to two or more different categories are all greater than the prediction threshold, the category with the largest prediction probability among them can be selected as the category to which that user belongs, or the user can be assigned to each of those two or more categories.
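The threshold-based variant, including both options for handling several categories above the threshold, can be sketched as follows. The function name `categories_above_threshold`, the `single` flag and the example probabilities are assumptions for illustration.

```python
def categories_above_threshold(probabilities, threshold, single=True):
    """Return the categories whose prediction probability exceeds the threshold.

    single=True  keeps only the most probable of the passing categories;
    single=False assigns the user to every passing category.
    """
    passing = [c for c, p in probabilities.items() if p > threshold]
    if single and len(passing) > 1:
        return [max(passing, key=probabilities.get)]
    return passing


user = {"A": 0.8, "B": 0.6, "C": 0.1}   # hypothetical prediction probabilities
best_only = categories_above_threshold(user, threshold=0.5)
all_passing = categories_above_threshold(user, threshold=0.5, single=False)
print(best_only, all_passing)
```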
In the case where the network users to be classified include multiple network users, the network users to be classified can also be divided into multiple different network user groups; see the processing of step S207 below.
In step S207, the network users to be classified are grouped according to a predetermined k-means clustering algorithm to obtain at least one network user group, where the number of users contained in a network user group does not exceed a user number threshold.
Here, the k-means clustering algorithm is a hard clustering algorithm and a typical representative of prototype-based objective function clustering methods. It takes a certain distance from the data points to the prototypes as the objective function to be optimized, and obtains the adjustment rule of the iterative operation by finding the extremum of that function. The k-means clustering algorithm uses the Euclidean distance as the similarity measure; its aim is to find the optimal classification corresponding to a given initial cluster center vector so that the evaluation index is minimized. The k-means clustering algorithm uses the sum of squared errors criterion function as the clustering criterion function.
In implementation, the network users to be classified can be further divided according to the prediction probabilities that they belong to different categories. According to a jumping degree statistic algorithm, the prediction probabilities whose jumping degree exceeds a predetermined jumping degree threshold can be determined and used as the basis for the grouping processing, and the network users to be classified are grouped according to that basis to obtain at least one network user group. The jumping degree statistic is defined as follows: let X(1), X(2), …, X(n) be the order statistics of a sample of size n from the population distribution F(x, θ), and consider a point estimate of the expectation μ that depends only on X(1), X(2), …, X(k).
The k-point jumping degree of μ is then defined on this basis, and it can be measured by the following formula:
where k ≤ n.
Specifically, a jumping degree threshold can be preset and determined according to the actual situation. According to the jumping degree statistic algorithm, the jumping degree of the prediction probabilities is calculated by the above formula and the obtained prediction probabilities are sorted. The points (or positions) at which an obvious or large jump occurs in the sorted prediction probabilities are determined from the sorting result, and the coordinates of those points (or the prediction probabilities corresponding to those points or positions) can be used as the basis for the grouping processing, for example 0.3 and 0.9. If the prediction probability that a network user to be classified belongs to a certain category is greater than 0.9, the user can be considered close to the target group corresponding to that category. If the prediction probability is less than 0.3, the user can be considered far from the target group corresponding to that category. If the prediction probability is between 0.3 and 0.9, the user lies between the two. In this way, the network users to be classified can be divided into multiple groups.
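Because the jumping degree formula is not reproduced in this text, the sketch below does not implement it. As a clearly labeled stand-in, it treats the difference between adjacent sorted prediction probabilities as the jump and uses the probability at which a large jump lands as the cut point. The function names, the sample probabilities and the threshold value are assumptions for illustration only.

```python
from bisect import bisect_right


def find_jump_points(probabilities, jump_threshold):
    """Find cut points where sorted probabilities jump by more than a threshold.

    Stand-in for the jumping degree statistic: the jump is measured as the
    difference between neighbouring values in the sorted sequence.
    """
    ordered = sorted(probabilities)
    cuts = []
    for lower, upper in zip(ordered, ordered[1:]):
        if upper - lower > jump_threshold:
            cuts.append(upper)  # probability at the position of the jump
    return cuts


def bucket(probability, cut_points):
    """Index of the group: 0 is farthest from the target group."""
    return bisect_right(cut_points, probability)


probabilities = [0.05, 0.1, 0.12, 0.45, 0.5, 0.55, 0.95, 0.97]  # hypothetical
cut_points = find_jump_points(probabilities, jump_threshold=0.2)
groups = [bucket(p, cut_points) for p in probabilities]
print(cut_points, groups)
```

With two cut points the users fall into three groups, which mirrors the far, in-between and close groups described for the example values 0.3 and 0.9.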
It should be noted that, to avoid groups that contain so many network users that the system resource overhead becomes excessive, and to avoid a number so small that the grouping becomes too coarse, the number of groups can be determined in advance. A limit value can be preset, such as 1000 or 1500. When the result of dividing the number of users by 1000 is less than 1000, that result is taken as the basis for grouping. In this way, target audiences of 10,000 to 10 million can be covered.
Cluster calculation can then be performed on the network users to be classified according to the predetermined k-means clustering algorithm. During the cluster calculation, the feature vector corresponding to the historical data of each network user to be classified, obtained in the above manner, can be substituted into the k-means clustering algorithm formula for calculation, and network users to be classified of the same class are divided into the same network user group, so as to obtain at least one network user group. Subsequently, corresponding recommendation information can be selected from the information to be recommended according to the network user groups and their category information, and the selected recommendation information can be sent to the above network user groups. The network system structure can also be improved and the network resources optimized according to the historical data of the above network user groups of different categories.
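The cluster calculation can be illustrated with a small, self-contained version of Lloyd's k-means iteration using the Euclidean distance. The function name `k_means`, the two-dimensional sample vectors and the choice of initial centers are assumptions; the user number threshold per group mentioned in step S207 is not enforced in this sketch.

```python
import math


def k_means(points, initial_centers, max_iter=100):
    """Plain k-means: assign each point to the nearest center, then update centers.

    Returns (assignment, centers), where assignment[i] is the index of the
    group that points[i] belongs to.
    """
    centers = [list(center) for center in initial_centers]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))
            for point in points
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the centers no longer move
        assignment = new_assignment
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old center if a group becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


# Hypothetical feature vectors of five network users to be classified.
points = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.1), (0.1, 0.9), (0.2, 0.8)]
assignment, centers = k_means(points, initial_centers=[points[0], points[3]])
print(assignment)
```

A group that ends up larger than the user number threshold could, for instance, be clustered again with more centers; that follow-up step is an assumption and is not described in the text.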
The embodiment of the present application provides a method for classifying a network user group. Historical data of network access by the network users to be classified are obtained, where a network user to be classified is a user whose historical data match predetermined target user features. The category information to which the network users to be classified belong is then determined according to the pre-established user classification model, the historical data and the target user features. In this way, matching historical data are screened by the predetermined target user features, and the category information of the network users to be classified is determined based on the full content of those historical data. Because all the historical data of the network users to be classified are mined as a whole and the category information is determined on that basis, the obtained target users satisfy the conditions to a higher degree, which improves the accuracy of the category information.
Embodiment three
This embodiment describes in detail, in combination with a specific application scenario, the method for classifying a network user group provided by the embodiment of the present invention. The application scenario is to extract the hidden or latent features contained in all the purchase information of the network users of an e-commerce provider, and to determine the category information to which the corresponding network users belong.
The execution subject of the method for classifying a network user group provided by the embodiment of the present application may be a server. The network users in the embodiment of the present application are a network user group composed of multiple users. The method may specifically include the following steps:
In step S401, behavior sample data of network users containing category information are obtained.
In step S402, a first number is obtained, which is the number of behavior sample data items that accessed commodities of a target brand and belong to a specified user; a second number is obtained, which is the number of behavior sample data items that accessed commodities of the target brand and belong to a non-specified user; a third number is obtained, which is the number of behavior sample data items that did not access commodities of the target brand and belong to a specified user; and a fourth number is obtained, which is the number of behavior sample data items that did not access commodities of the target brand and belong to a non-specified user. Here, the target brand is any brand.
In step S403, the chi-square statistic value of the commodities of the target brand is determined according to the first number, the second number, the third number and the fourth number.
The chi-square statistic value of any commodity of each brand can be obtained by the above processing, and the processing of step S404 below is performed based on the obtained chi-square statistic values.
In step s 404, the commodity that chi value exceedes corresponding predetermined threshold are obtained.
For example, the commodities obtained above may be a certain commodity of a certain high-end brand of the above e-commerce provider, or multiple commodities of that high-end brand. Alternatively, the commodities obtained above may involve multiple high-end brands of the e-commerce provider, each high-end brand including one or more commodities, in which case the commodities obtained above are those multiple commodities.
In step S405, the number of network users of a predetermined category and the number of network users of non-predetermined categories are obtained from the behavior sample data.
In step S406, the information entropy is determined according to the number of network users of the predetermined category and the number of network users of non-predetermined categories.
In step S 407, the commodity for obtaining each brand are designated the frequency of user and unspecified persons access, and obtain Take the frequency that the commodity of each brand are not designated user and unspecified persons access.
In step S408, the information gain value of the commodities of each brand is determined according to the information entropy, the frequency with which the commodities of each brand are accessed by specified users and by non-specified users, and the frequency with which the commodities of each brand are not accessed by specified users and by non-specified users.
In step S409, the commodity that information gain value exceedes corresponding predetermined threshold are obtained.
In step S410, the commodities whose chi-square statistic value exceeds the corresponding predetermined threshold and whose information gain value exceeds the corresponding predetermined threshold are taken as the target user features.
In this way, through the above processing, the potential, salient, or typical features contained in certain commodities of each brand (or of certain high-end brands), in other similar commodities, and in commodities of other classes or categories can be found. All data of the corresponding network users can then be fully used to extract these potential, salient, or typical features for subsequent model training and audience expansion, as described in the following steps.
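The information gain computation of steps S405 to S409 can be sketched as follows. This is a minimal illustration only: the function names and the four count arguments are hypothetical, and the sketch assumes the usual definition of information gain as the class entropy minus the class entropy conditioned on whether the feature (here, a commodity) was accessed.

```python
import math


def entropy(counts):
    """Shannon entropy (base 2) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(a, b, c, d):
    """Information gain of one feature from a 2x2 table of counts.

    a: designated users who accessed the feature
    b: non-designated users who accessed the feature
    c: designated users who did not access the feature
    d: non-designated users who did not access the feature
    (These argument names are assumptions made for this sketch.)
    """
    n = a + b + c + d
    if n == 0:
        return 0.0
    # Entropy of the class split alone (designated vs. non-designated).
    class_entropy = entropy([a + c, b + d])
    # Entropy of the class split once we know whether the feature was accessed.
    conditional_entropy = (
        (a + b) / n * entropy([a, b]) + (c + d) / n * entropy([c, d])
    )
    return class_entropy - conditional_entropy
```

A feature would then be kept when `information_gain(...)` exceeds its predetermined threshold; the threshold value itself is not given in this section.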
In step S411, a user classification model is established and trained on the behavior sample data of the network users that include category information, yielding a trained user classification model.
In step S412, historical network access data matching the target user features is obtained from a historical network access database; the network users to whom the obtained historical network access data belongs are taken as the network users to be classified, and the obtained historical network access data is taken as the historical data of network access by the network users to be classified.
In step S413, the prediction probabilities that the network users to be classified belong to different categories are determined according to the pre-established user classification model, the historical data, and the target user features.
In step S414, the category information of the network users to be classified is determined according to the prediction probabilities that the network users to be classified belong to different categories.
In step S415, the prediction probabilities are sorted according to a jump-degree statistic algorithm, and the point (or position) at which a data jump occurs in the sorted prediction probabilities is determined. The coordinates of a point with an obvious or large jump (or the prediction probability corresponding to that point or position) can serve as the basis for the grouping processing.
In step S416, the network users to be classified are grouped according to a predetermined k-means clustering algorithm to obtain at least one network user group, where the number of users contained in each network user group does not exceed a user-count threshold.
For the processing of steps S401 to S416 above, reference may be made to the related content in embodiment two above, which is not repeated here.
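The jump detection of step S415 can be illustrated with a short sketch. The jump-degree statistic algorithm itself is not spelled out in this section, so the code below substitutes a simple stand-in: it sorts the prediction probabilities in descending order and reports the largest gap between neighbours. The function name `largest_jump` and the return shape are assumptions for illustration.

```python
def largest_jump(probabilities):
    """Locate the largest drop between adjacent sorted prediction probabilities.

    Returns (position, probability), where `position` is the number of users
    ranked above the jump and `probability` is the score of the last user
    before the jump. Returns None when fewer than two values are given.
    """
    ordered = sorted(probabilities, reverse=True)
    if len(ordered) < 2:
        return None
    # Gap between each value and the next lower one.
    gaps = [ordered[i] - ordered[i + 1] for i in range(len(ordered) - 1)]
    index = max(range(len(gaps)), key=gaps.__getitem__)
    return index + 1, ordered[index]
```

For instance, with scores 0.95, 0.90, 0.88, 0.40 and 0.35 the largest drop lies between 0.88 and 0.40, so the first three users would form the high-probability side of the split.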
In this way, corresponding recommendation information can subsequently be selected from the information to be recommended according to the network user groups and their category information, and the selected recommendation information can be sent to those network user groups. In addition, the structure of the network system can be improved and network resources can be optimized according to the historical data of the network user groups of different categories.
The embodiments of the present application provide a classification method for a network user group. Historical data of network access by network users to be classified is obtained, where a network user to be classified is a user whose historical data matches predetermined target user features. The category information of the network users to be classified is then determined according to a pre-established user classification model, the historical data, and the target user features. In this way, matching historical data is screened by the predetermined target user features, and the category information of the network users to be classified is determined on the basis of the full content of that historical data. All historical data of the network users to be classified is thus mined as a whole to determine their category information, so that the resulting target users meet the conditions to a higher degree and the accuracy of the category information is improved.
Embodiment four
The above is the classification method for a network user group provided by the embodiments of the present application. Based on the same idea, the embodiments of the present application also provide a classification apparatus for a network user group, as shown in Fig. 4.
The classification apparatus for a network user group includes a historical data acquisition module 501 and a category determination module 502, wherein:
The historical data acquisition module 501 is configured to obtain historical data of network access by network users to be classified, a network user to be classified being a user whose historical data matches predetermined target user features;
The category determination module 502 is configured to determine the category information of the network users to be classified according to a pre-established user classification model, the historical data, and the target user features.
In the embodiments of the present application, the historical data acquisition module 501 includes:
A behavior sample acquisition unit, configured to obtain behavior sample data of network users;
A feature extraction unit, configured to perform feature extraction on the behavior sample data and take the extracted effective features as the target user features, an effective feature being a feature that can characterize the category of the corresponding network user;
A feature matching unit, configured to obtain, from a historical network access database, historical network access data matching the target user features, take the network users to whom the obtained historical network access data belongs as the network users to be classified, and take the obtained historical network access data as the historical data of network access by the network users to be classified.
In the embodiments of the present application, the feature extraction unit is configured to calculate, based on a chi-square statistic device and/or an information gain device, the chi-square statistic and/or the information gain value of each extracted feature, and to take the features whose chi-square statistic and/or information gain value exceeds the corresponding predetermined threshold as the target user features. The chi-square statistic device is used to determine the dependence between an extracted feature and a category, and the information gain device is used to characterize the change in the amount of information in the classification system before and after a predetermined feature is added.
In the embodiments of the present application, the feature matching unit is configured to obtain, from the historical network access database, historical network access data matching the target user features, and to select a predetermined number of historical network access data items from the matching historical network access data as the historical data of network access by the network users to be classified.
In the embodiments of the present application, the feature extraction unit is configured to: obtain, from the behavior sample data, the number of network users of a predetermined category and the number of network users of non-predetermined categories; determine the information entropy according to the number of network users of the predetermined category and the number of network users of non-predetermined categories; obtain the frequency with which each feature is accessed by designated users and by non-designated users, and the frequency with which each feature is not accessed by designated users and by non-designated users; and determine the information gain value of each feature according to the information entropy, the frequency with which each feature is accessed by designated users and by non-designated users, and the frequency with which each feature is not accessed by designated users and by non-designated users.
In the embodiments of the present application, the feature extraction unit is configured to: obtain a first number, being the number of behavior sample data items in which the target feature is accessed and the behavior sample data belongs to a designated user; obtain a second number, being the number of behavior sample data items in which the target feature is accessed and the behavior sample data belongs to a non-designated user; obtain a third number, being the number of behavior sample data items in which the target feature is not accessed and the behavior sample data belongs to a designated user; obtain a fourth number, being the number of behavior sample data items in which the target feature is not accessed and the behavior sample data belongs to a non-designated user; and determine the chi-square statistic of the target feature according to the first number, the second number, the third number, and the fourth number; wherein the target feature is any feature contained in the behavior sample data.
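The way the four numbers combine into a chi-square statistic can be sketched as follows. This section does not restate the formula, so the sketch assumes the standard chi-square statistic for a 2x2 contingency table, $\chi^2 = \frac{N(AD - BC)^2}{(A+B)(C+D)(A+C)(B+D)}$, where A to D are the first to fourth numbers and N is their sum. The function name `chi_square` is hypothetical.

```python
def chi_square(first, second, third, fourth):
    """Chi-square statistic of one target feature from four sample counts.

    first:  feature accessed,     sample belongs to a designated user
    second: feature accessed,     sample belongs to a non-designated user
    third:  feature not accessed, sample belongs to a designated user
    fourth: feature not accessed, sample belongs to a non-designated user
    """
    n = first + second + third + fourth
    denominator = (
        (first + second)    # all samples that accessed the feature
        * (third + fourth)  # all samples that did not access it
        * (first + third)   # all designated-user samples
        * (second + fourth) # all non-designated-user samples
    )
    if denominator == 0:
        # A row or column of the table is empty, so no dependence is measurable.
        return 0.0
    return n * (first * fourth - second * third) ** 2 / denominator
```

A larger value indicates a stronger dependence between accessing the feature and being a designated user; a value of zero indicates that the two appear independent in the sample.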
In the embodiments of the present application, the apparatus further includes:
A sample data acquisition module, configured to obtain behavior sample data of network users that include category information;
A model building module, configured to establish a user classification model and train the user classification model on the behavior sample data of the network users that include category information, to obtain a trained user classification model.
In the embodiments of the present application, the user classification model is a random forest model, a GBDT model, or a bagging model.
In the embodiments of the present application, the network users to be classified include multiple network users, and the apparatus further includes:
A grouping module, configured to group the network users to be classified according to a predetermined k-means clustering algorithm to obtain at least one network user group, the number of users contained in each network user group not exceeding a user-count threshold.
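A grouping of this kind can be sketched in a few lines. The sketch below makes several assumptions that are not stated in this section: each user is represented by a single score (for example a prediction probability), the k-means centres are initialised deterministically from evenly spaced positions in the sorted scores, and the user-count threshold is enforced by splitting any oversized cluster into chunks. The function name `group_users` and its parameters are hypothetical.

```python
def group_users(scores, k, max_size, iterations=20):
    """Group users by 1-D k-means on a score, capping each group's size.

    scores:   dict mapping user id -> numeric score
    k:        requested number of clusters
    max_size: user-count threshold for any returned group
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if not scores:
        return []
    users = sorted(scores, key=scores.get)
    k = max(1, min(k, len(users)))
    # Deterministic initial centres taken from evenly spaced sorted scores.
    centres = [scores[users[(2 * i + 1) * len(users) // (2 * k)]] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each user to the nearest centre.
        clusters = [[] for _ in range(k)]
        for user in users:
            nearest = min(range(k), key=lambda j: abs(scores[user] - centres[j]))
            clusters[nearest].append(user)
        # Update step: move each centre to the mean of its members.
        centres = [
            sum(scores[u] for u in members) / len(members) if members else centres[j]
            for j, members in enumerate(clusters)
        ]
    # Enforce the user-count threshold by splitting oversized clusters.
    groups = []
    for members in clusters:
        for start in range(0, len(members), max_size):
            groups.append(members[start:start + max_size])
    return groups
```

Splitting after clustering keeps similar users together while guaranteeing the cap; other ways of honouring the threshold, such as increasing k, would be equally consistent with the text.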
In the embodiments of the present application, the category determination module 502 includes:
A category prediction unit, configured to determine the prediction probabilities that the network users to be classified belong to different categories according to the pre-established user classification model, the historical data, and the target user features;
A category determination unit, configured to determine the category information of the network users to be classified according to the prediction probabilities that the network users to be classified belong to different categories.
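The step from prediction probabilities to category information can be illustrated with a minimal sketch. The decision rule is not specified in this section; the code assumes the simplest one, namely assigning each user to the category with the highest predicted probability. The function name `assign_category` and the dictionary input format are hypothetical.

```python
def assign_category(probabilities):
    """Pick the category with the highest predicted probability.

    probabilities: dict mapping category name -> predicted probability
    for a single network user to be classified.
    """
    if not probabilities:
        raise ValueError("at least one category probability is required")
    return max(probabilities, key=probabilities.get)
```

In practice the probabilities would come from the trained user classification model (random forest, GBDT, or bagging), one dictionary per user.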
The embodiments of the present application provide a classification apparatus for a network user group. Historical data of network access by network users to be classified is obtained, where a network user to be classified is a user whose historical data matches predetermined target user features. The category information of the network users to be classified is then determined according to a pre-established user classification model, the historical data, and the target user features. In this way, matching historical data is screened by the predetermined target user features, and the category information of the network users to be classified is determined on the basis of the full content of that historical data. All historical data of the network users to be classified is thus mined as a whole to determine their category information, so that the resulting target users meet the conditions to a higher degree and the accuracy of the category information is improved.
Embodiment five
The above is the classification apparatus for a network user group provided by the embodiments of the present application. Based on the same idea, the embodiments of the present application also provide a classification device for a network user group, as shown in Fig. 5.
The classification device for a network user group may be the server or terminal device provided in the above embodiments.
The classification device for a network user group may vary considerably depending on configuration or performance, and may include one or more processors 601 and a memory 602, the memory 602 storing one or more application programs or data. The memory 602 may be transient storage or persistent storage. An application program stored in the memory 602 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the classification device. Further, the processor 601 may be configured to communicate with the memory 602 and to execute, on the classification device, the series of computer-executable instructions in the memory 602. The classification device may also include one or more power supplies 603, one or more wired or wireless network interfaces 604, one or more input/output interfaces 605, and one or more keyboards 606.
Specifically, in this embodiment, the classification device for a network user group includes a memory and one or more programs, wherein the one or more programs are stored in the memory and may include one or more modules, each module may include a series of computer-executable instructions for the classification device, and the one or more programs are configured to be executed by one or more processors and contain computer-executable instructions for performing the following:
Obtaining historical data of network access by network users to be classified, a network user to be classified being a user whose historical data matches predetermined target user features;
Determining the category information of the network users to be classified according to a pre-established user classification model, the historical data, and the target user features.
Optionally, the executable instructions, when executed, may further cause the processor to:
Obtain behavior sample data of network users;
Perform feature extraction on the behavior sample data and take the extracted effective features as the target user features, an effective feature being a feature that can characterize the category of the corresponding network user;
Obtain, from a historical network access database, historical network access data matching the target user features, take the network users to whom the obtained historical network access data belongs as the network users to be classified, and take the obtained historical network access data as the historical data of network access by the network users to be classified.
Optionally, the executable instructions, when executed, may further cause the processor to:
Calculate, based on a chi-square statistic method and/or an information gain method, the chi-square statistic and/or the information gain value of each extracted feature;
Take the features whose chi-square statistic and/or information gain value exceeds the corresponding predetermined threshold as the target user features, the chi-square statistic method being used to determine the dependence between an extracted feature and a category, and the information gain method being used to characterize the change in the amount of information in the classification system before and after a predetermined feature is added.
Optionally, the executable instructions, when executed, may further cause the processor to:
Obtain, from the historical network access database, historical network access data matching the target user features;
Select a predetermined number of historical network access data items from the matching historical network access data as the historical data of network access by the network users to be classified.
Optionally, the executable instructions, when executed, may further cause the processor to:
Obtain, from the behavior sample data, the number of network users of a predetermined category and the number of network users of non-predetermined categories;
Determine the information entropy according to the number of network users of the predetermined category and the number of network users of non-predetermined categories;
Obtain the frequency with which each feature is accessed by designated users and by non-designated users, and the frequency with which each feature is not accessed by designated users and by non-designated users;
Determine the information gain value of each feature according to the information entropy, the frequency with which each feature is accessed by designated users and by non-designated users, and the frequency with which each feature is not accessed by designated users and by non-designated users.
Optionally, the executable instructions, when executed, may further cause the processor to:
Obtain a first number, being the number of behavior sample data items in which the target feature is accessed and the behavior sample data belongs to a designated user; obtain a second number, being the number of behavior sample data items in which the target feature is accessed and the behavior sample data belongs to a non-designated user; obtain a third number, being the number of behavior sample data items in which the target feature is not accessed and the behavior sample data belongs to a designated user; and obtain a fourth number, being the number of behavior sample data items in which the target feature is not accessed and the behavior sample data belongs to a non-designated user;
Determine the chi-square statistic of the target feature according to the first number, the second number, the third number, and the fourth number;
Wherein the target feature is any feature contained in the behavior sample data.
Optionally, the executable instructions, when executed, may further cause the processor to:
Obtain behavior sample data of network users that include category information;
Establish a user classification model and train the user classification model on the behavior sample data of the network users that include category information, to obtain a trained user classification model.
Optionally, the user classification model is a random forest model, a GBDT model, or a bagging model.
Optionally, the executable instructions, when executed, may further cause the processor to:
Where the network users to be classified include multiple network users,
Group the network users to be classified according to a predetermined k-means clustering algorithm to obtain at least one network user group, the number of users contained in each network user group not exceeding a user-count threshold.
Optionally, the executable instructions, when executed, may further cause the processor to:
Determine the prediction probabilities that the network users to be classified belong to different categories according to the pre-established user classification model, the historical data, and the target user features;
Determine the category information of the network users to be classified according to the prediction probabilities that the network users to be classified belong to different categories.
The embodiments of the present application provide a classification device for a network user group. Historical data of network access by network users to be classified is obtained, where a network user to be classified is a user whose historical data matches predetermined target user features. The category information of the network users to be classified is then determined according to a pre-established user classification model, the historical data, and the target user features. In this way, matching historical data is screened by the predetermined target user features, and the category information of the network users to be classified is determined on the basis of the full content of that historical data. All historical data of the network users to be classified is thus mined as a whole to determine their category information, so that the resulting target users meet the conditions to a higher degree and the accuracy of the category information is improved.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described as being divided into various units by function. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include non-persistent memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and for relevant details reference may be made to the description of the method embodiment.
The foregoing is merely embodiments of the present application and is not intended to limit the present application. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Claims (12)

1. A classification method for a network user group, characterized in that the method comprises:
Obtaining historical data of network access by network users to be classified, a network user to be classified being a user whose historical data matches predetermined target user features;
Determining the category information of the network users to be classified according to a pre-established user classification model, the historical data, and the target user features.
2. The method according to claim 1, characterized in that obtaining the historical data of network access by the network users to be classified comprises:
Obtaining behavior sample data of network users;
Performing feature extraction on the behavior sample data and taking the extracted effective features as the target user features, an effective feature being a feature that can characterize the category of the corresponding network user;
Obtaining, from a historical network access database, historical network access data matching the target user features, taking the network users to whom the obtained historical network access data belongs as the network users to be classified, and taking the obtained historical network access data as the historical data of network access by the network users to be classified.
3. The method according to claim 2, characterized in that taking the extracted effective features as the target user features comprises:
Calculating, based on a chi-square statistic method and/or an information gain method, the chi-square statistic and/or the information gain value of each extracted feature;
Taking the features whose chi-square statistic and/or information gain value exceeds the corresponding predetermined threshold as the target user features, the chi-square statistic method being used to determine the dependence between an extracted feature and a category, and the information gain method being used to characterize the change in the amount of information in the classification system before and after a predetermined feature is added.
4. according to the method for claim 3, it is characterised in that based on Information Gain Method, calculate the behavior sample number The information gain value of each feature in, including:
The number of users of the other network user of predetermined class, and the network of non-predetermined classification are obtained from the behavior sample data The number of users of user;
According to the number of users of the other network user of the predetermined class and the number of users of the network user of non-predetermined classification, it is determined that Comentropy;
Obtain each feature and be designated the frequency that user and unspecified persons access, and obtain each feature be not designated user and The frequency that unspecified persons access;
The frequency that user and unspecified persons access is designated according to described information entropy, each feature, and each feature not by The frequency of user and unspecified persons access is specified, it is determined that the information gain value of each feature.
5. The method according to claim 3, characterized in that calculating, based on the chi-square statistic method, the chi-square value of each feature in the behavior sample data comprises:
Obtaining a first number of behavior sample data items that access a target feature and belong to designated users; obtaining a second number of behavior sample data items that access the target feature and belong to non-designated users; obtaining a third number of behavior sample data items that do not access the target feature and belong to designated users; and obtaining a fourth number of behavior sample data items that do not access the target feature and belong to non-designated users;
Determining the chi-square value of the target feature according to the first number, the second number, the third number and the fourth number;
Wherein the target feature is any feature contained in the behavior sample data.
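The four numbers in this claim form a 2x2 contingency table. A minimal sketch of the chi-square value, assuming the standard 2x2 formula without continuity correction (the claim does not spell out the formula, and the parameter names are chosen here for illustration):

```python
def chi_square(first, second, third, fourth):
    """Chi-square statistic of a 2x2 contingency table.

    first:  samples that access the feature and belong to designated users
    second: samples that access the feature and belong to non-designated users
    third:  samples that do not access the feature, designated users
    fourth: samples that do not access the feature, non-designated users
    """
    n = first + second + third + fourth
    denominator = (
        (first + second)    # all samples that access the feature
        * (third + fourth)  # all samples that do not access it
        * (first + third)   # all designated-user samples
        * (second + fourth) # all non-designated-user samples
    )
    if denominator == 0:
        # A row or column of the table is empty, so no dependence can be measured.
        return 0.0
    return n * (first * fourth - second * third) ** 2 / denominator
```

A larger value indicates a stronger dependence between accessing the feature and belonging to the designated category; a value of 0 means the counts are exactly what independence would predict.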
6. The method according to claim 1, characterized in that the method further comprises:
Obtaining behavior sample data of network users that includes category information;
Establishing a user classification model, and training the user classification model based on the behavior sample data of the network users that includes the category information, to obtain a trained user classification model.
7. The method according to claim 1, characterized in that the user classification model is a random forest model, a GBDT model or a bagging model.
8. The method according to claim 1, characterized in that the network users to be classified include a plurality of network users, and the method further comprises:
Grouping the network users to be classified according to a predetermined k-means clustering algorithm to obtain at least one network user group, wherein the number of users contained in each network user group does not exceed a user-number threshold.
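Plain k-means does not by itself limit cluster sizes, so the claim leaves open how the user-number threshold is enforced. One possible approach, sketched below under that assumption, is to split any oversized group again with 2-means until every group fits. The function names, the tuple-of-numbers feature representation and the fixed random seed are all choices made for this illustration.

```python
import random


def _dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def _mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means; returns k lists of point indices (some may be empty)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda j: _dist2(p, centers[j]))
            groups[nearest].append(i)
        # Recompute each center; keep the old one if its group is empty.
        centers = [
            _mean([points[i] for i in g]) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return groups


def group_users(points, max_size, seed=0):
    """Split users into groups of at most max_size by repeated 2-means."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    if not points:
        return []
    pending = [list(range(len(points)))]
    done = []
    while pending:
        idx = pending.pop()
        if len(idx) <= max_size:
            done.append(idx)
            continue
        parts = kmeans([points[i] for i in idx], 2, seed=seed)
        parts = [[idx[i] for i in part] for part in parts if part]
        if len(parts) < 2:
            # Degenerate case (for example identical points): split in half.
            half = len(idx) // 2
            parts = [idx[:half], idx[half:]]
        pending.extend(parts)
    return done
```

Each returned group is a list of indices into `points`. The recursion always terminates because every split produces two non-empty, strictly smaller groups.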
9. The method according to claim 1, characterized in that determining the category information to which the network user to be classified belongs, according to the pre-established user classification model, the historical data and the target user feature, comprises:
Determining, according to the pre-established user classification model, the historical data and the target user feature, the predicted probabilities that the network user to be classified belongs to different categories;
Determining the category information to which the network user to be classified belongs according to the predicted probabilities that the network user to be classified belongs to different categories.
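The second step of this claim, turning predicted probabilities into a category, can be sketched as picking the most probable category. The claim does not say which decision rule is used, so the arg-max rule, the function name `assign_category` and the category labels in the usage note are assumptions.

```python
def assign_category(probabilities):
    """Return the category with the highest predicted probability.

    probabilities maps a category label to the model's predicted probability
    that the user belongs to that category. Ties go to the label that was
    inserted first.
    """
    if not probabilities:
        raise ValueError("at least one category probability is required")
    return max(probabilities, key=probabilities.get)
```

For instance, `assign_category({"designated": 0.7, "non_designated": 0.3})` returns `"designated"`. In practice the probabilities would come from the trained user classification model rather than being written by hand.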
10. A device for classifying network user groups, characterized in that the device comprises:
A historical data acquisition module, configured to obtain historical data of network access by a network user to be classified, the network user to be classified being a user whose historical data matches a predetermined target user feature;
A category determination module, configured to determine the category information to which the network user to be classified belongs according to a pre-established user classification model, the historical data and the target user feature.
11. The device according to claim 10, characterized in that the historical data acquisition module comprises:
A behavior sample acquisition unit, configured to obtain behavior sample data of network users;
A feature extraction unit, configured to perform feature extraction on the behavior sample data and take the extracted effective features as the target user features, an effective feature being a feature that can characterize the category of the corresponding network user;
A feature matching unit, configured to obtain, from a historical network access database, historical network access data matching the target user feature, to take the network user to whom the obtained historical network access data belongs as the network user to be classified, and to take the obtained historical network access data as the historical data of network access by the network user to be classified.
12. The device according to claim 11, characterized in that the feature extraction unit is configured to calculate, based on a chi-square statistic method and/or an information gain method, the chi-square value and/or the information gain value of each extracted feature respectively, and to take the features whose chi-square value and/or information gain value exceeds the corresponding predetermined threshold as the target user features, wherein the chi-square statistic method is used to determine the dependence between an extracted feature and a category, and the information gain method is used to characterize the information increment in the classification system before and after a predetermined feature is added.
CN201710743140.1A 2017-07-27 2017-08-25 Method and device for classifying network user groups Active CN107563429B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710621667 2017-07-27
CN2017106216677 2017-07-27

Publications (2)

Publication Number Publication Date
CN107563429A true CN107563429A (en) 2018-01-09
CN107563429B CN107563429B (en) 2020-11-10

Family

ID=60976882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710743140.1A Active CN107563429B (en) 2017-07-27 2017-08-25 Method and device for classifying network user groups

Country Status (1)

Country Link
CN (1) CN107563429B (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108766558A (en) * 2018-05-15 2018-11-06 京东方科技集团股份有限公司 A kind of method, apparatus of information processing, computer storage media and terminal
CN108985950A (en) * 2018-07-13 2018-12-11 平安科技(深圳)有限公司 Electronic device, user's insurance fraud method for prewarning risk and storage medium
CN109299728A (en) * 2018-08-10 2019-02-01 深圳前海微众银行股份有限公司 Federal learning method, system and readable storage medium storing program for executing
CN109376759A (en) * 2018-09-10 2019-02-22 平安科技(深圳)有限公司 User information classification method, device, computer equipment and storage medium
CN109409949A (en) * 2018-10-17 2019-03-01 北京字节跳动网络技术有限公司 Determination method, apparatus, electronic equipment and the storage medium of user group's classification
CN109933744A (en) * 2018-08-10 2019-06-25 深信服科技股份有限公司 Target identification method and device, equipment and computer readable storage medium
CN110245787A (en) * 2019-05-24 2019-09-17 阿里巴巴集团控股有限公司 A kind of target group's prediction technique, device and equipment
CN110532460A (en) * 2019-04-18 2019-12-03 国家计算机网络与信息安全管理中心 Classification method, device, electronic equipment and the medium of network access user
CN110598157A (en) * 2019-09-20 2019-12-20 北京字节跳动网络技术有限公司 Target information identification method, device, equipment and storage medium
CN111049809A (en) * 2019-11-27 2020-04-21 深圳壹账通智能科技有限公司 Risk user identification method and device, computer equipment and storage medium
CN111225009A (en) * 2018-11-27 2020-06-02 北京沃东天骏信息技术有限公司 Method and apparatus for generating information
TWI696194B (en) * 2018-02-05 2020-06-11 香港商阿里巴巴集團服務有限公司 Sorting method and device of complaint report type
CN111314102A (en) * 2018-12-11 2020-06-19 北京嘀嘀无限科技发展有限公司 Group identification method and device, electronic equipment and computer readable storage medium
CN111385136A (en) * 2018-12-29 2020-07-07 华为技术服务有限公司 Method and device for determining user communication identifier
CN111382210A (en) * 2018-12-27 2020-07-07 中国移动通信集团山西有限公司 Classification method, device and equipment
CN111522812A (en) * 2020-03-25 2020-08-11 平安科技(深圳)有限公司 User intelligent layering method and device, electronic equipment and readable storage medium
CN111629319A (en) * 2019-02-28 2020-09-04 中国移动通信有限公司研究院 Position prediction method and device
CN111753023A (en) * 2020-06-23 2020-10-09 中国联合网络通信集团有限公司 Method and device for determining type of internet private line
CN112351004A (en) * 2020-10-23 2021-02-09 烟台南山学院 Computer network based information security event processing system and method
CN112488140A (en) * 2019-09-12 2021-03-12 北京国双科技有限公司 Data association method and device
CN113254644A (en) * 2021-06-07 2021-08-13 成都数之联科技有限公司 Model training method, non-complaint work order processing method, system, device and medium
CN113516513A (en) * 2021-07-20 2021-10-19 重庆度小满优扬科技有限公司 Data analysis method and device, computer equipment and storage medium
CN117557306A (en) * 2024-01-09 2024-02-13 北京信索咨询股份有限公司 Management system for classifying consumers based on behaviors and characteristics
CN111522812B (en) * 2020-03-25 2024-06-28 平安科技(深圳)有限公司 User intelligent layering method and device, electronic equipment and readable storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110154216A1 (en) * 2009-12-18 2011-06-23 Hitachi, Ltd. Gui customizing method, system and program
CN102937951A (en) * 2011-08-15 2013-02-20 北京百度网讯科技有限公司 Method for building internet protocol (IP) address classification model, user classifying method and device
CN105701498A (en) * 2015-12-31 2016-06-22 腾讯科技(深圳)有限公司 User classification method and server
CN106295702A (en) * 2016-08-15 2017-01-04 西北工业大学 A kind of social platform user classification method analyzed based on individual affective behavior

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CHEN ZHANG ET AL: "A Users Clustering Algorithm for Group Recommendation", 《2016 4TH INTL CONF ON APPLIED COMPUTING AND INFORMATION TECHNOLOGY/3RD INTL CONF ON COMPUTATIONAL SCIENCE/INTELLIGENCE AND APPLIED INFORMATICS/1ST INTL CONF ON BIG DATA, CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (ACIT-CSII-BCD)》 *
YING WANG ET AL: "Mining characteristics of network community user group based on CDFP-tree algorithm", 《2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING》 *
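张彤 et al.: "Research on the classification of network user behavior based on log analysis", 《广东公安科技》 *
徐明 et al.: "A microblog feature extraction method based on improved chi-square statistics", 《计算机工程与应用》 *
陈红涛: "Research and application of user behavior based on search logs", 《万方学位论文库》 *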

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI696194B (en) * 2018-02-05 2020-06-11 香港商阿里巴巴集團服務有限公司 Sorting method and device of complaint report type
US10915706B2 (en) 2018-02-05 2021-02-09 Advanced New Technologies Co., Ltd. Sorting text report categories
CN108766558A (en) * 2018-05-15 2018-11-06 京东方科技集团股份有限公司 A kind of method, apparatus of information processing, computer storage media and terminal
CN108985950A (en) * 2018-07-13 2018-12-11 平安科技(深圳)有限公司 Electronic device, user's insurance fraud method for prewarning risk and storage medium
CN108985950B (en) * 2018-07-13 2023-04-18 平安科技(深圳)有限公司 Electronic device, user fraud protection risk early warning method and storage medium
WO2020010712A1 (en) * 2018-07-13 2020-01-16 平安科技(深圳)有限公司 Electronic device, user insurance fraud risk pre-warning method, and storage medium
CN109299728A (en) * 2018-08-10 2019-02-01 深圳前海微众银行股份有限公司 Federal learning method, system and readable storage medium storing program for executing
CN109933744A (en) * 2018-08-10 2019-06-25 深信服科技股份有限公司 Target identification method and device, equipment and computer readable storage medium
CN109376759A (en) * 2018-09-10 2019-02-22 平安科技(深圳)有限公司 User information classification method, device, computer equipment and storage medium
CN109409949A (en) * 2018-10-17 2019-03-01 北京字节跳动网络技术有限公司 Determination method, apparatus, electronic equipment and the storage medium of user group's classification
CN111225009A (en) * 2018-11-27 2020-06-02 北京沃东天骏信息技术有限公司 Method and apparatus for generating information
CN111314102A (en) * 2018-12-11 2020-06-19 北京嘀嘀无限科技发展有限公司 Group identification method and device, electronic equipment and computer readable storage medium
CN111382210A (en) * 2018-12-27 2020-07-07 中国移动通信集团山西有限公司 Classification method, device and equipment
CN111382210B (en) * 2018-12-27 2023-11-10 中国移动通信集团山西有限公司 Classification method, device and equipment
CN111385136A (en) * 2018-12-29 2020-07-07 华为技术服务有限公司 Method and device for determining user communication identifier
CN111629319B (en) * 2019-02-28 2022-05-31 中国移动通信有限公司研究院 Position prediction method and device
CN111629319A (en) * 2019-02-28 2020-09-04 中国移动通信有限公司研究院 Position prediction method and device
CN110532460A (en) * 2019-04-18 2019-12-03 国家计算机网络与信息安全管理中心 Classification method, device, electronic equipment and the medium of network access user
CN110245787B (en) * 2019-05-24 2023-11-17 创新先进技术有限公司 Target group prediction method, device and equipment
CN110245787A (en) * 2019-05-24 2019-09-17 阿里巴巴集团控股有限公司 A kind of target group's prediction technique, device and equipment
CN112488140A (en) * 2019-09-12 2021-03-12 北京国双科技有限公司 Data association method and device
CN110598157A (en) * 2019-09-20 2019-12-20 北京字节跳动网络技术有限公司 Target information identification method, device, equipment and storage medium
CN111049809A (en) * 2019-11-27 2020-04-21 深圳壹账通智能科技有限公司 Risk user identification method and device, computer equipment and storage medium
CN111522812A (en) * 2020-03-25 2020-08-11 平安科技(深圳)有限公司 User intelligent layering method and device, electronic equipment and readable storage medium
CN111522812B (en) * 2020-03-25 2024-06-28 平安科技(深圳)有限公司 User intelligent layering method and device, electronic equipment and readable storage medium
CN111753023B (en) * 2020-06-23 2023-06-06 中国联合网络通信集团有限公司 Method and device for determining type of internet private line
CN111753023A (en) * 2020-06-23 2020-10-09 中国联合网络通信集团有限公司 Method and device for determining type of internet private line
CN112351004A (en) * 2020-10-23 2021-02-09 烟台南山学院 Computer network based information security event processing system and method
CN113254644B (en) * 2021-06-07 2021-09-17 成都数之联科技有限公司 Model training method, non-complaint work order processing method, system, device and medium
CN113254644A (en) * 2021-06-07 2021-08-13 成都数之联科技有限公司 Model training method, non-complaint work order processing method, system, device and medium
CN113516513A (en) * 2021-07-20 2021-10-19 重庆度小满优扬科技有限公司 Data analysis method and device, computer equipment and storage medium
CN113516513B (en) * 2021-07-20 2023-04-07 重庆度小满优扬科技有限公司 Data analysis method and device, computer equipment and storage medium
CN117557306A (en) * 2024-01-09 2024-02-13 北京信索咨询股份有限公司 Management system for classifying consumers based on behaviors and characteristics
CN117557306B (en) * 2024-01-09 2024-04-19 北京信索咨询股份有限公司 Management system for classifying consumers based on behaviors and characteristics

Also Published As

Publication number Publication date
CN107563429B (en) 2020-11-10

Similar Documents

Publication Publication Date Title
CN107563429A (en) A kind of sorting technique and device of network user colony
CN106485562B (en) Commodity information recommendation method and system based on user historical behaviors
CN107341716A (en) A kind of method, apparatus and electronic equipment of the identification of malice order
CN109063966B (en) Risk account identification method and device
CN107273436A (en) The training method and trainer of a kind of recommended models
CN103810162B (en) The method and system of recommendation network information
CN107818344A (en) The method and system that user behavior is classified and predicted
CN109325691A (en) Abnormal behaviour analysis method, electronic equipment and computer program product
CN108351985A (en) Method and apparatus for large-scale machines study
CN108804704A (en) A kind of user's depth portrait method and device
CN107016569A (en) The targeted customer's account acquisition methods and device of a kind of networking products
CN107368519A (en) A kind of cooperative processing method and system for agreeing with user interest change
CN109409928A (en) A kind of material recommended method, device, storage medium, terminal
CN107368856A (en) Clustering method and device, the computer installation and readable storage medium storing program for executing of Malware
CN107818491A (en) Electronic installation, Products Show method and storage medium based on user's Internet data
CN108304853A (en) Acquisition methods, device, storage medium and the electronic device for the degree of correlation of playing
CN112329816A (en) Data classification method and device, electronic equipment and readable storage medium
CN107592296A (en) The recognition methods of rubbish account and device
CN107368526A (en) A kind of data processing method and device
Ko et al. Keeping our rivers clean: Information-theoretic online anomaly detection for streaming business process events
CN115545103A (en) Abnormal data identification method, label identification method and abnormal data identification device
WO2023024408A1 (en) Method for determining feature vector of user, and related device and medium
CN110232154A (en) Products Show method, apparatus and medium based on random forest
CN111860598B (en) Data analysis method and electronic equipment for identifying sports behaviors and relationships
CN111784360A (en) Anti-fraud prediction method and system based on network link backtracking

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100029 Beijing city Chaoyang District Yumin Road No. 3

Patentee after: NATIONAL COMPUTER NETWORK AND INFORMATION SECURITY MANAGEMENT CENTER

Patentee after: Beijing PERCENT Technology Group Co.,Ltd.

Address before: 100029 Beijing city Chaoyang District Yumin Road No. 3

Patentee before: NATIONAL COMPUTER NETWORK AND INFORMATION SECURITY MANAGEMENT CENTER

Patentee before: BEIJING BAIFENDIAN INFORMATION SCIENCE & TECHNOLOGY Co.,Ltd.

CP01 Change in the name or title of a patent holder