CN105491444B - Data identification processing method and device - Google Patents
Data identification processing method and device
- Publication number: CN105491444B
- Application number: CN201510835028.1A
- Authority: CN (China)
- Prior art keywords: feature vector, target feature, user, information, target
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/435—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
- H04N21/441—Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card
Abstract
Embodiments of the invention disclose a data identification processing method and device. The method includes: constructing a target feature vector corresponding to a client from the collected device information and user information of the client together with service feature information; creating, based on the user type labels carried by the feature vectors in a labeled data set, a classification model for classifying the feature vectors in the labeled data set, and identifying the user type corresponding to the target feature vector according to the classification model and the feature values in the target feature vector; and setting, for the target feature vector, the user type label corresponding to its user type, and adding the target feature vector carrying that label to the labeled data set. With the invention, it is possible to identify accurately and at low cost whether a streamer client is cheating by illegitimate means.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a data identification processing method and device.
Background art
In recent years, comprehensive rich-media clients that integrate functions such as online karaoke, online video live streaming, online game live streaming, and online education live streaming have developed on an unprecedented scale, so that viewers can conveniently watch, through a viewer client, the content broadcast live by a streamer client. At present, however, some illegitimate users use protocol accounts to help streamer clients illegitimately inflate their popularity, inflate virtual-item counts, and so on, in order to obtain illegitimate gains. A protocol account is a cheating program that logs in to the client by sending network packets; such programs are mostly used in game live-streaming services.
Currently, viewer clients that are protocol accounts are usually found manually: staff analyze the relevant features of each viewer client on the basis of business experience, determine whether it is a protocol-account client, and handle protocol-account clients accordingly. Because the number of viewer clients is very large, analyzing them manually one by one incurs a huge labor cost. Moreover, for viewer clients whose features are not obvious, manual analysis is difficult and easily leads to misjudgment.
Summary of the invention
Embodiments of the present invention provide a data identification processing method and device that can identify accurately and at low cost whether a streamer client is cheating by illegitimate means.
An embodiment of the present invention provides a data identification processing method, including:
collecting device information and user information of a client, and constructing a target feature vector corresponding to the client according to the device information, the user information, and service feature information, where the target feature vector includes the feature values corresponding to the device information, the user information, and the service feature information;
creating, based on the user type labels carried by the feature vectors in a labeled data set, a classification model for classifying the feature vectors in the labeled data set, and identifying the user type corresponding to the target feature vector according to the classification model and the feature values in the target feature vector; and
setting, for the target feature vector, the user type label corresponding to its user type, and adding the target feature vector carrying the user type label to the labeled data set, so that the classification model can subsequently be updated according to the new labeled data set in order to identify new target feature vectors, where the user type labels include a legitimate user label and an illegitimate user label.
Correspondingly, an embodiment of the present invention further provides a data identification processing device, including:
a collection and construction module, configured to collect device information and user information of a client, and to construct a target feature vector corresponding to the client according to the device information, the user information, and service feature information, where the target feature vector includes the feature values corresponding to the device information, the user information, and the service feature information;
a creation and identification module, configured to create, based on the user type labels carried by the feature vectors in a labeled data set, a classification model for classifying the feature vectors in the labeled data set, and to identify the user type corresponding to the target feature vector according to the classification model and the feature values in the target feature vector; and
a setting and adding module, configured to set, for the target feature vector, the user type label corresponding to its user type, and to add the target feature vector carrying the user type label to the labeled data set, so that the classification model can subsequently be updated according to the new labeled data set in order to identify new target feature vectors, where the user type labels include a legitimate user label and an illegitimate user label.
In the embodiments of the present invention, the device information and user information of a client are collected and, together with service feature information, used to construct the target feature vector corresponding to the client; the user type corresponding to the target feature vector is then identified according to the created classification model and the feature values in the target feature vector. If the user type is the illegitimate user type, the client is a protocol-account client. Whether a viewer client is a protocol-account client can thus be identified automatically, which reduces labor cost. Further, the user type label corresponding to the user type of the target feature vector can be set for the target feature vector, and the target feature vector carrying that label can be added to the labeled data set, so that the classification model can subsequently be updated according to the new labeled data set to identify new target feature vectors. As the number of feature vectors in the labeled data set grows, the created classification model becomes more and more accurate, so that even target feature vectors whose features are not obvious can be identified accurately; that is, the accuracy of identifying protocol accounts is improved.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a data identification processing method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another data identification processing method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data identification processing device provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a collection and construction module provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a creation and identification module provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of another data identification processing device provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of yet another data identification processing device provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments. The described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, which is a schematic flowchart of a data identification processing method provided by an embodiment of the present invention, the method may include:
S101: collect device information and user information of a client, and construct the target feature vector corresponding to the client according to the device information, the user information, and service feature information.
Specifically, a data identification processing device applied to a background server can collect the device information and user information of the client, where the user information may include user identity information and user behavior information. The device information may refer to information about the user's device environment, specifically including the characteristics of the running process, the parent process that invoked it, the protocol used to send data packets, and so on. The user identity information may refer to the user's records in the client (such as a viewer client), specifically including the user name, age, gender, place of registration, registration IP (Internet Protocol) address, level, nickname, profile, client login status, and other information. The user behavior information may refer to the user's behavior in each channel as recorded by the game live-streaming platform, specifically including the user's login information, viewing information, consumption information (such as sending flowers or virtual items), and interaction behavior information (such as leaving messages). The login information may include the user's cumulative login count, number of login days, and login duration over the i days preceding the statistics date, as well as the login time periods, the login IP addresses, and the related frequencies. The viewing information may include the cumulative count, number of days, duration, and time periods of watching live streams. The consumption information may include the count, number of days, amount, and time periods of consumption. The interaction behavior information may include the time periods in which messages were left. Here, a time period refers to the specific time at which a behavior occurs.
The data identification processing device then creates the target feature vector corresponding to the client, using the feature values corresponding to the device information, the user identity information, the user behavior information, and the service feature information as the elements of the target feature vector. The service feature information may include: whether the account name is longer than 15 characters; whether the account name mixes letters and digits; whether the account name contains the pinyin of a Chinese personal name (for example, as obtained from a population database); whether the account name contains an English name or common English words; whether other accounts have been registered from the account's registration IP address; whether other accounts have logged in from the account's login IP address; whether the account is bound to a mobile phone and an email address; whether the account has set security questions; whether the account's nickname is identical to its user name; whether the account's personal signature and profile are empty; the account's level and points; and so on.
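A few of the service features listed above can be sketched as simple yes/no checks. The following is only an illustrative sketch: the function name `service_features`, its parameters, and the dictionary keys are hypothetical and do not come from the patent, and checks that need external data (name databases, IP registration history) are omitted.

```python
def service_features(account_name, nickname, signature, profile):
    """Derive a few 0/1 service features from account fields (illustrative only)."""
    has_letter = any(ch.isalpha() for ch in account_name)
    has_digit = any(ch.isdigit() for ch in account_name)
    return {
        # whether the account name is longer than 15 characters
        "name_longer_than_15": int(len(account_name) > 15),
        # whether the account name mixes letters and digits
        "name_mixes_letters_and_digits": int(has_letter and has_digit),
        # whether the nickname is identical to the user name
        "nickname_equals_username": int(nickname == account_name),
        # whether both the personal signature and the profile are empty
        "signature_and_profile_empty": int(
            not signature.strip() and not profile.strip()
        ),
    }


features = service_features("abc123xyz4567890q", "abc", "", "  ")
```

Each value is already 0 or 1, so it can be placed directly into the target feature vector alongside the normalized quantitative features.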
The dimension of the target feature vector is the total number of feature values in it. The raw value ranges of different features are not uniform; for example, login duration may range from 1 to 3600, whereas login count may range from 1 to 100. Therefore, all feature values of the quantitative type in the target feature vector are obtained by normalization. The normalization formula may be: normalized feature value = (raw feature value - minimum of the feature's raw value range) / (maximum of the feature's raw value range - minimum of the feature's raw value range), so that every normalized feature value lies in [0, 1]. In addition, feature values of non-quantitative types in the target feature vector are obtained by assigning preset values; that is, for a feature of a non-quantitative type, a value can be assigned to each of its categories and used as an element of the target feature vector. For example, the categories "male" and "female" of the gender feature are assigned 0 and 1 respectively.
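The normalization and category assignment described above can be illustrated with a short sketch. The function names, the `GENDER_CODES` table, and the choice of three features are assumptions made for illustration; the value ranges (1 to 3600 and 1 to 100) are the examples given in the text.

```python
def min_max_normalize(raw, lo, hi):
    """Map a raw quantitative value from [lo, hi] into [0, 1]."""
    if hi == lo:
        return 0.0  # degenerate range: no spread to normalize
    return (raw - lo) / (hi - lo)


# Preset values for a non-quantitative feature, as in the "male"/"female" example.
GENDER_CODES = {"male": 0, "female": 1}


def build_vector(login_duration, login_count, gender):
    """Assemble a tiny three-element feature vector (illustrative only)."""
    return [
        min_max_normalize(login_duration, 1, 3600),
        min_max_normalize(login_count, 1, 100),
        GENDER_CODES[gender],
    ]


vector = build_vector(3600, 1, "female")
```

After normalization, a login duration and a login count contribute on the same scale, so neither dominates a distance computed over the vector.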
S102: based on the user type labels carried by the feature vectors in a labeled data set, create a classification model for classifying the feature vectors in the labeled data set, and identify the user type corresponding to the target feature vector according to the classification model and the feature values in the target feature vector.
Specifically, after the data identification processing device obtains the target feature vector, it can calculate the position of the target feature vector in the hyperplane of the vector space from its feature values. Calculating the coordinates of a multidimensional vector in a hyperplane is prior art and is not described here. The data identification processing device can then, based on an SVM (Support Vector Machine) classifier and the user type labels carried by the feature vectors in the labeled data set (the user type labels include a legitimate user label and an illegitimate user label), create in the hyperplane a classification model for classifying the feature vectors in the labeled data set. The classification model includes a legitimate user region and an illegitimate user region in the hyperplane. The legitimate user region contains the feature vectors carrying the legitimate user label, and the illegitimate user region contains the feature vectors carrying the illegitimate user label; both regions may also contain feature vectors that do not carry a user type label. The positions in the hyperplane of the feature vectors in the labeled data set and of the feature vectors without a user type label are all calculated in advance by the data identification processing device from the feature values of each feature vector. The feature vectors without a user type label include at least the target feature vector.
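The patent does not specify how the SVM classifier is trained. As a minimal sketch of the idea, the code below trains a linear soft-margin SVM by stochastic subgradient descent on the hinge loss, using only the standard library. The encoding of the legitimate user label as +1 and the illegitimate user label as -1, the hyperparameters, and the sample vectors are all assumptions made for illustration.

```python
def train_linear_svm(vectors, labels, epochs=200, lr=0.1, reg=0.01):
    """Train a linear SVM; labels are +1 (legitimate) or -1 (illegitimate)."""
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularization shrinks the weights on every step.
            w = [wi * (1 - lr * reg) for wi in w]
            if margin < 1:
                # Hinge-loss subgradient step for a margin violation.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def classify(w, b, x):
    """Return +1 for the legitimate region, -1 for the illegitimate region."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical normalized feature vectors from a labeled data set.
labeled_vectors = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.9],
                   [0.1, 0.2], [0.2, 0.1], [0.1, 0.1]]
labeled_types = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(labeled_vectors, labeled_types)
```

The learned boundary `w · x + b = 0` plays the role of the separating hyperplane, and the two sides of it correspond to the legitimate user region and the illegitimate user region. A production system would more likely rely on an established SVM library, possibly with a non-linear kernel.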
After creating the classification model, the data identification processing device can calculate the Euclidean distance between every feature vector in the hyperplane that does not carry a user type label and each of the feature vectors in the labeled data set. For example, if there are two feature vectors without a user type label, A and B, and three feature vectors in the labeled data set, C, D, and E, then the Euclidean distances between A and C, A and D, A and E, B and C, B and D, and B and E need to be calculated. The Euclidean distance between two feature vectors is calculated as $D = \sqrt{\sum_{i=1}^{n} (X_{i1} - X_{i2})^2}$, where $X_{i1}$ is the value of the i-th feature in one feature vector and $X_{i2}$ is the value of the same feature in the other feature vector. When one of the Euclidean distances associated with the target feature vector is the shortest of all the calculated Euclidean distances, the region of the classification model in which the target feature vector lies can be determined from its position in the hyperplane, and the corresponding user type can thereby be identified. If the position of the target feature vector in the hyperplane falls within the legitimate user region of the classification model, the user type corresponding to the target feature vector is identified as a legitimate user, meaning that the corresponding client is not a protocol-account client. If the position falls within the illegitimate user region, the user type is identified as an illegitimate user, meaning that the corresponding client is a protocol-account client. Further, when none of the Euclidean distances associated with the target feature vector is the shortest of all the calculated Euclidean distances, the target feature vector is not identified for the time being; only the unlabeled feature vector that has the shortest Euclidean distance is identified at this point. For example, suppose there are two feature vectors without a user type label, A and B (A being the target feature vector), and three feature vectors in the labeled data set, C, D, and E. If the distances between A and C, A and D, A and E, B and C, B and D, and B and E are calculated and the distance between A and C is found to be the shortest of all, then the user type of A can be identified by the classification model.
The purpose of selecting the shortest Euclidean distance is to pick, among all feature vectors that currently carry no user type label, the one whose features are most obvious. The shorter the Euclidean distance, the closer the unlabeled feature vector is to a feature vector that carries a user type label, and hence the closer its feature values are to those of the labeled feature vector; in other words, the more obvious its features are. Identifying the feature vector with the most obvious features ensures that the current identification is as accurate as possible.
S103: set, for the target feature vector, the user type label corresponding to its user type, and add the target feature vector carrying the user type label to the labeled data set, so that the classification model can subsequently be updated according to the new labeled data set in order to identify new target feature vectors.
Specifically, after identifying the user type corresponding to the target feature vector, the data identification processing device can set for the target feature vector the user type label corresponding to that user type and add the target feature vector carrying the label to the labeled data set, so that the classification model can subsequently be updated according to the new labeled data set to identify new target feature vectors. A small number of feature vectors in the initial labeled data set can be given their user type labels manually. As large numbers of unlabeled feature vectors are identified and labeled, the labeled data set contains more and more feature vectors, so a new classification model built from the new labeled data set is more accurate than the previous one, and new target feature vectors can be identified accurately with it. A new target feature vector is the feature vector with the shortest Euclidean distance selected from the remaining feature vectors without a user type label. Because the feature vector identified in each round is the one with the most obvious features among the remaining unlabeled feature vectors, the data identification processing method provided by the present invention defers feature vectors with less obvious features to later rounds, by which time the classification model is more accurate. Every feature vector can therefore be identified accurately; that is, with only a small amount of manual labeling, all protocol-account clients can be found among a large number of viewer clients. For example, the full user base of a game live-streaming service may exceed 3,000,000, while the manually labeled feature vectors in the initial labeled data set need only comprise 100 feature vectors carrying the illegitimate user label and 100 feature vectors carrying the legitimate user label; starting from this initial labeled data set, the data identification processing device can identify and label the clients of all users one by one.
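The round-by-round labeling described above can be sketched as a loop. To keep the example short and self-contained, the classifier is a stand-in that returns the label of the nearest labeled vector; the patent instead rebuilds an SVM-based classification model from the growing labeled data set. All names and sample values here are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(vec, labeled):
    """Stand-in classifier (assumption): label of the closest labeled vector."""
    return min(labeled, key=lambda item: euclidean(vec, item[0]))[1]


def label_all(unlabeled, labeled, classify=nearest_label):
    """Label every unlabeled vector, most obvious (closest) first."""
    labeled = list(labeled)
    pending = list(unlabeled)
    order = []
    while pending:
        # Pick the pending vector with the shortest distance to the labeled set.
        target = min(
            pending,
            key=lambda v: min(euclidean(v, lv) for lv, _ in labeled),
        )
        user_type = classify(target, labeled)
        # The newly labeled vector joins the labeled set for later rounds.
        labeled.append((target, user_type))
        pending.remove(target)
        order.append((target, user_type))
    return order


seed = [([0.0, 0.0], "illegitimate"), ([1.0, 1.0], "legitimate")]
result = label_all([[0.4, 0.4], [0.1, 0.1], [0.9, 0.8]], seed)
```

In this run the ambiguous vector `[0.4, 0.4]` is handled last, after the labeled set has grown, which mirrors the deferral of vectors with less obvious features.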
In the embodiments of the present invention, the device information and user information of a client are collected and, together with service feature information, used to construct the target feature vector corresponding to the client; the user type corresponding to the target feature vector is then identified according to the created classification model and the feature values in the target feature vector. If the user type is the illegitimate user type, the client is a protocol-account client. Whether a viewer client is a protocol-account client can thus be identified automatically, which reduces labor cost. Further, the user type label corresponding to the user type of the target feature vector can be set for the target feature vector, and the target feature vector carrying that label can be added to the labeled data set, so that the classification model can subsequently be updated according to the new labeled data set to identify new target feature vectors. As the number of feature vectors in the labeled data set grows, the created classification model becomes more and more accurate, so that even target feature vectors whose features are not obvious can be identified accurately; that is, the accuracy of identifying protocol accounts is improved.
Referring next to Fig. 2, which is a schematic flowchart of another data identification processing method provided by an embodiment of the present invention, the method may include:
S201: collect device information and user information of a client, and construct the target feature vector corresponding to the client according to the device information, the user information, and service feature information.
Specifically, a data identification processing device applied to a background server can collect the device information and user information of the client, where the user information may include user identity information and user behavior information. The device information may refer to information about the user's device environment, specifically including the characteristics of the running process, the parent process that invoked it, the protocol used to send data packets, and so on. The user identity information may refer to the user's records in the client (such as a viewer client), specifically including the user name, age, gender, place of registration, registration IP address, level, nickname, profile, client login status, and other information. The user behavior information may refer to the user's behavior in each channel as recorded by the game live-streaming platform, specifically including the user's login information, viewing information, consumption information (such as sending flowers or virtual items), and interaction behavior information (such as leaving messages). The login information may include the user's cumulative login count, number of login days, and login duration over the i days preceding the statistics date, as well as the login time periods, the login IP addresses, and the related frequencies. The viewing information may include the cumulative count, number of days, duration, and time periods of watching live streams. The consumption information may include the count, number of days, amount, and time periods of consumption. The interaction behavior information may include the time periods in which messages were left. Here, a time period refers to the specific time at which a behavior occurs.
The data identification processing device then creates the target feature vector corresponding to the client, using the feature values corresponding to the device information, the user identity information, the user behavior information, and the service feature information as the elements of the target feature vector. The service feature information may include: whether the account name is longer than 15 characters; whether the account name mixes letters and digits; whether the account name contains the pinyin of a Chinese personal name (for example, as obtained from a population database); whether the account name contains an English name or common English words; whether other accounts have been registered from the account's registration IP address; whether other accounts have logged in from the account's login IP address; whether the account is bound to a mobile phone and an email address; whether the account has set security questions; whether the account's nickname is identical to its user name; whether the account's personal signature and profile are empty; the account's level and points; and so on.
The dimension of the target feature vector is the total number of feature values in it. The raw value ranges of different features are not uniform; for example, login duration may range from 1 to 3600, whereas login count may range from 1 to 100. Therefore, all feature values of the quantitative type in the target feature vector are obtained by normalization. The normalization formula may be: normalized feature value = (raw feature value - minimum of the feature's raw value range) / (maximum of the feature's raw value range - minimum of the feature's raw value range), so that every normalized feature value lies in [0, 1]. In addition, feature values of non-quantitative types in the target feature vector are obtained by assigning preset values; that is, for a feature of a non-quantitative type, a value can be assigned to each of its categories and used as an element of the target feature vector. For example, the categories "male" and "female" of the gender feature are assigned 0 and 1 respectively.
S202: calculate the position of the target feature vector in the hyperplane of the vector space according to the feature values in the target feature vector.
Specifically, after the data recognition process unit obtains the target feature vector, it can be according to target spy
The characteristic value of sign vector calculates the position in the hyperplane of vector space. The method for calculating the coordinates of a multi-dimensional vector in a hyperplane is prior art and is not discussed here. Optionally, to improve the efficiency of coordinate calculation in the hyperplane, valid feature values can be screened out from the feature values of the target feature vector according to thresholds for judging feature value validity and according to the feature value correlation between different feature vectors, and the position of the target feature vector in the hyperplane can then be calculated from the valid feature values. Because there are fewer valid feature values than total feature values in the target feature vector, the coordinate calculation in the hyperplane becomes more efficient. Since important features carry more information, that is, their feature values differ more widely, screening valid feature values by threshold may specifically include: 1. if the coefficient of variation of a numeric feature is greater than a predetermined threshold, the feature can be used as a valid feature value; 2. if the standard deviation of a numeric feature is greater than a predetermined threshold, the feature can be used as a valid feature value; 3. if the count of a certain class label of a categorical feature is less than a predetermined threshold, the feature can be used as a valid feature value; 4. if the number of class labels of a categorical feature is less than a predetermined threshold, the feature can be used as a valid feature value. Here, the coefficient of variation = standard deviation / mean of the normal distribution. In addition, by comparing the target feature vector with the feature vectors in the labeled data set, it can be known that the closer the correlation values of the two are, the more important the feature is, so features with high feature value correlation can be used as valid feature values. Detecting feature value correlation may involve three kinds of tests: the Pearson correlation coefficient test, the analysis of variance test, and the chi-square test.
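The threshold screening of numeric features described above (criteria 1 and 2) can be sketched as follows. This is only an illustrative sketch: the function name `select_valid_features`, the sample feature names and the threshold values are assumptions, and combining the two criteria with "or" is one possible reading of the text.

```python
from statistics import mean, pstdev


def select_valid_features(columns, cv_threshold, std_threshold):
    """Return the names of numeric features that pass criterion 1 or 2.

    columns maps a feature name to its values across all feature vectors.
    Both thresholds are hypothetical tuning parameters.
    """
    valid = []
    for name, values in columns.items():
        avg = mean(values)
        std = pstdev(values)  # population standard deviation
        if avg == 0:
            # Avoid dividing by zero; a constant all-zero feature carries no information.
            cv = 0.0 if std == 0 else float("inf")
        else:
            cv = std / abs(avg)  # coefficient of variation = standard deviation / mean
        if cv > cv_threshold or std > std_threshold:
            valid.append(name)
    return valid


columns = {
    "login_days": [1.0, 1.0, 1.0, 1.0],    # constant, so it is screened out
    "watch_hours": [0.5, 8.0, 0.1, 12.0],  # varies widely, so it is kept
}
valid = select_valid_features(columns, cv_threshold=0.5, std_threshold=2.0)
```

Only the features returned in `valid` would then be used when computing positions in the hyperplane.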
S203, based on a support vector machine (SVM) classifier and the user type identifiers respectively carried by multiple feature vectors in the labeled data set, create in the hyperplane a classification model for classifying the multiple feature vectors in the labeled data set;
Specifically, the data identification processing device may, based on the SVM classifier and the user type identifiers respectively carried by the multiple feature vectors in the labeled data set (the user type identifiers include a legitimate user identifier and an illegal user identifier), create in the hyperplane a classification model for classifying the multiple feature vectors in the labeled data set. The classification model includes a legitimate user region and an illegal user region in the hyperplane; the legitimate user region contains the feature vectors carrying the legitimate user identifier, and the illegal user region contains the feature vectors carrying the illegal user identifier. The legitimate user region and the illegal user region may also contain multiple feature vectors that do not carry a user type identifier. The positions in the hyperplane of the feature vectors in the labeled data set and of the feature vectors not carrying a user type identifier are all calculated in advance by the data identification processing device from the feature values (or valid feature values) of each feature vector, and the feature vectors not carrying a user type identifier include at least the target feature vector.
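The idea of splitting the space into a legitimate user region and an illegal user region can be illustrated with a small linear classifier trained on the hinge loss. This is a simplified stand-in for the SVM classifier, not the implementation of the embodiment: the text does not specify a kernel or a training procedure, and the names `train_linear_svm` and `region_of`, the hyperparameters and the sample vectors are all assumptions.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Train weights w and bias b by subgradient descent on the regularized hinge loss.

    labels are +1 (legitimate user identifier) or -1 (illegal user identifier).
    epochs, lr and reg are assumed values chosen for this toy example.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if margin < 1.0:
                # Hinge-loss step: the sample is misclassified or inside the margin.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def region_of(w, b, x):
    """Return the region of the classification model that contains vector x."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "legitimate" if score >= 0 else "illegal"


# Labeled data set: two legitimate and two illegal feature vectors (normalized values).
labeled_vectors = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.2), (0.2, 0.1)]
labeled_types = [1, 1, -1, -1]
w, b = train_linear_svm(labeled_vectors, labeled_types)
target_region = region_of(w, b, (0.15, 0.15))
```

A production system would more likely rely on an established SVM library; the hand-written trainer is used here only to keep the sketch self-contained.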
S204, calculate the Euclidean distances between all feature vectors in the hyperplane that do not carry a user type identifier and the multiple feature vectors in the labeled data set;
Specifically, after creating the classification model, the data identification processing device may calculate the Euclidean distances between all feature vectors in the hyperplane that do not carry a user type identifier and the multiple feature vectors in the labeled data set. For example, if there are two feature vectors, A and B, that do not carry a user type identifier, and three feature vectors, C, D and E, in the labeled data set, then the Euclidean distances between A and C, A and D, A and E, B and C, B and D, and B and E need to be calculated. The Euclidean distance between two feature vectors is calculated as d = sqrt(∑(X_i1 - X_i2)^2), i = 1, 2, ..., n, where X_i1 is the feature value of the i-th feature in one feature vector and X_i2 is the feature value of the same feature in the other feature vector.
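The pairwise calculation in the A, B versus C, D, E example can be sketched as below. The coordinates are made-up values chosen for illustration, and the helper name `euclidean` is an assumption.

```python
import math


def euclidean(u, v):
    """d = sqrt(sum((X_i1 - X_i2)^2)) over all n features."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical positions: A and B carry no user type identifier, C, D and E are labeled.
unlabeled = {"A": (0.0, 0.0), "B": (2.0, 2.0)}
labeled = {"C": (3.0, 4.0), "D": (1.0, 0.0), "E": (4.0, 5.0)}

# One distance per (unlabeled, labeled) pair: A-C, A-D, A-E, B-C, B-D, B-E.
distances = {
    (u_name, l_name): euclidean(u_vec, l_vec)
    for u_name, u_vec in unlabeled.items()
    for l_name, l_vec in labeled.items()
}
```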
S205, when the Euclidean distance corresponding to the target feature vector is the shortest of all the calculated Euclidean distances, determine the region of the target feature vector in the classification model according to the position of the target feature vector in the hyperplane, so as to identify the user type corresponding to the target feature vector;
Specifically, when the Euclidean distance corresponding to the target feature vector is the shortest of all the calculated Euclidean distances, that is, when one of the Euclidean distances associated with the target feature vector is the shortest of all the calculated Euclidean distances, the region of the target feature vector in the classification model can be determined from the position of the target feature vector in the hyperplane, so as to identify the user type corresponding to the target feature vector. If the position of the target feature vector in the hyperplane belongs to the legitimate user region of the classification model, the user type corresponding to the target feature vector is identified as a legitimate user, meaning that the client corresponding to the target feature vector is not a protocol number client. If the position of the target feature vector in the hyperplane belongs to the illegal user region of the classification model, the user type corresponding to the target feature vector is identified as an illegal user, meaning that the client corresponding to the target feature vector is a protocol number client. Further, when the Euclidean distance corresponding to the target feature vector is not the shortest of all the calculated Euclidean distances, the target feature vector is not identified for the time being; at this point only the feature vector without a user type identifier that has the shortest Euclidean distance is identified. For example, if there are two feature vectors, A and B, that do not carry a user type identifier, and three feature vectors, C, D and E, in the labeled data set, the Euclidean distances between A and C, A and D, A and E, B and C, B and D, and B and E are calculated; if the distance between A and C is detected to be the shortest of all the Euclidean distances, the user type of A is identified first through the classification model.
The purpose of selecting the shortest Euclidean distance is to pick, from all feature vectors that currently do not carry a user type identifier, the one whose features are most distinct. The shorter the Euclidean distance, the closer the feature vector without a user type identifier is to a feature vector carrying a user type identifier, that is, the closer its feature values are to those of a labeled feature vector, and the more distinct its features are. Identifying the feature vector with the most distinct features ensures that the current identification is the most accurate.
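Choosing which unlabeled feature vector to identify next can be sketched as a search for the globally shortest distance. The helper name `pick_next` and the coordinates are assumptions made for illustration.

```python
import math


def pick_next(unlabeled, labeled):
    """Return (name, distance) of the unlabeled vector closest to any labeled vector."""
    best_name, best_dist = None, math.inf
    for name, vec in unlabeled.items():
        for labeled_vec in labeled.values():
            d = math.dist(vec, labeled_vec)  # Euclidean distance (Python 3.8+)
            if d < best_dist:
                best_name, best_dist = name, d
    return best_name, best_dist


unlabeled = {"A": (0.0, 0.0), "B": (2.0, 2.0)}
labeled = {"C": (0.0, 0.5), "D": (5.0, 5.0), "E": (6.0, 1.0)}
next_name, next_dist = pick_next(unlabeled, labeled)
```

With these values the shortest distance is the one between A and C, so A would be identified first and B would wait for a later round.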
S206, set for the target feature vector a user type identifier corresponding to the user type of the target feature vector, and add the target feature vector carrying the user type identifier to the labeled data set;
Specifically, after identifying the user type corresponding to the target feature vector, the data identification processing device may set for the target feature vector a user type identifier corresponding to its user type and add the target feature vector carrying the user type identifier to the labeled data set, so that the classification model can subsequently be updated from the new labeled data set in order to identify a new target feature vector. The small number of feature vectors in the initial labeled data set may have their user type identifiers labeled manually. As more and more feature vectors that do not carry a user type identifier are identified and labeled, the labeled data set grows, so a new classification model built from the new labeled data set is more accurate than the original one, and the new target feature vector can be identified accurately with the new classification model. The new target feature vector may be the feature vector with the shortest Euclidean distance selected from the remaining feature vectors that do not carry a user type identifier. Because the feature vector identified each time is the one with the most distinct features among the remaining unlabeled feature vectors, the data identification processing method provided by the present invention leaves feature vectors with less distinct features to be identified later, when the classification model is more accurate. This ensures that every feature vector is identified accurately; that is, with a small amount of manual labeling, all protocol number clients can be found among a large number of viewer clients. For example, if the labeled data set contains feature vectors A, B and C, and the feature vectors currently not carrying a user type identifier are D, E and F, then classification model a1 is first created from feature vectors A, B and C in the labeled data set. If feature vector D is then detected to have the shortest Euclidean distance, D is identified and labeled through classification model a1, and D, now carrying a user type identifier, is added to the labeled data set. Classification model a2 is then created from feature vectors A, B, C and D in the labeled data set; if feature vector F is detected to have the shortest Euclidean distance, F is identified and labeled through classification model a2 and added to the labeled data set. Finally, classification model a3 is created from feature vectors A, B, C, D and F; feature vector E now has the shortest Euclidean distance, so E is identified and labeled through classification model a3 and added to the labeled data set, which then contains feature vectors A, B, C, D, E and F. As another example, the full user base of a game live-streaming service exceeds 3,000,000, while the initial labeled data set only needs to contain a small number of manually labeled feature vectors: 100 carrying the illegal user identifier and 100 carrying the legitimate user identifier. Starting from this initial labeled data set, the data identification processing device can identify and label the clients of all users one by one.
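The iterative labeling in the A, B, C and D, E, F example can be sketched as a loop that repeatedly picks the closest unlabeled vector, classifies it with a model built from the current labeled data set, and adds it to that set. The names `self_label` and `nearest_label` are hypothetical, the coordinates are made up, and `nearest_label` is a deliberately simple placeholder (nearest labeled neighbor) standing in for the SVM-based classification model.

```python
import math


def nearest_label(labeled, vec):
    """Placeholder classifier (NOT an SVM): copy the label of the nearest labeled vector."""
    return min(labeled.values(), key=lambda item: math.dist(item[0], vec))[1]


def self_label(labeled, unlabeled, classify):
    """labeled maps name -> (vector, label); unlabeled maps name -> vector.

    Returns the final labeled set and the order in which vectors were labeled.
    """
    labeled = dict(labeled)
    unlabeled = dict(unlabeled)
    order = []
    while unlabeled:
        # Pick the unlabeled vector with the shortest distance to any labeled vector.
        name = min(
            unlabeled,
            key=lambda n: min(math.dist(unlabeled[n], v) for v, _ in labeled.values()),
        )
        vec = unlabeled.pop(name)
        label = classify(labeled, vec)  # model rebuilt from the current labeled set
        labeled[name] = (vec, label)    # the newly labeled vector joins the set
        order.append((name, label))
    return labeled, order


labeled = {
    "A": ((0.1, 0.1), "illegal"),
    "B": ((0.2, 0.1), "illegal"),
    "C": ((0.9, 0.9), "legitimate"),
}
unlabeled = {"D": (0.15, 0.2), "E": (0.5, 0.55), "F": (0.8, 0.7)}
final_set, order = self_label(labeled, unlabeled, nearest_label)
```

With these coordinates the vectors are labeled in the order D, F, E, mirroring the a1, a2, a3 sequence in the example.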
S207, when the user type identifier of the target feature vector is the illegal user identifier, calculate the Euclidean distances between the target feature vector and each of the feature vectors carrying the illegal user identifier in the labeled data set, to obtain an average Euclidean distance;
Specifically, after all feature vectors that did not carry a user type identifier have been identified and labeled, the data identification processing device can take corresponding punitive measures against the clients corresponding to the feature vectors carrying the illegal user identifier. Taking the client corresponding to the target feature vector as an example again, when the user type identifier of the target feature vector is the illegal user identifier, the Euclidean distances between the target feature vector and each of the feature vectors carrying the illegal user identifier in the labeled data set are calculated to obtain an average Euclidean distance. Likewise, the corresponding average Euclidean distance also needs to be calculated for the other feature vectors carrying the illegal user identifier, in the same way as for the target feature vector. For example, if there are three feature vectors carrying the illegal user identifier in the labeled data set, A, B and C (A being the target feature vector), then the Euclidean distances between A and B, B and C, and A and C (denoted AB, BC and AC respectively) are calculated first; the average Euclidean distance of A is then (AB+AC)/2, that of B is (AB+BC)/2, and that of C is (AC+BC)/2.
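The average-distance calculation for A, B and C can be sketched as follows. The function name `average_distances` and the coordinates are assumptions, and returning 0.0 when there is only one illegal vector is a choice made here, since the text does not cover that case.

```python
import math


def average_distances(vectors):
    """Map each vector name to its mean Euclidean distance to all the other vectors."""
    result = {}
    for name, vec in vectors.items():
        others = [math.dist(vec, v) for n, v in vectors.items() if n != name]
        # Assumption: a lone vector has no neighbors, so report 0.0 rather than fail.
        result[name] = sum(others) / len(others) if others else 0.0
    return result


# Hypothetical positions giving AB = 5, BC = 5, AC = 8.
illegal = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (0.0, 8.0)}
avg = average_distances(illegal)
```

Here the average for A is (AB+AC)/2 = 6.5, for B it is (AB+BC)/2 = 5.0, and for C it is (AC+BC)/2 = 6.5.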
S208, calculate the confidence corresponding to the target feature vector from the average Euclidean distance, and sort the confidence corresponding to the target feature vector together with the confidences corresponding to the feature vectors carrying the illegal user identifier in the labeled data set;
Specifically, the data identification processing device then calculates the confidence corresponding to the target feature vector from the average Euclidean distance of the target feature vector; the longer the average Euclidean distance, the lower the confidence. Likewise, the data identification processing device also calculates the corresponding confidence for the other feature vectors carrying the illegal user identifier. The data identification processing device then sorts the confidence corresponding to the target feature vector together with the confidences corresponding to the other feature vectors carrying the illegal user identifier, specifically in descending order of confidence.
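The text only requires that a longer average Euclidean distance yield a lower confidence; it gives no formula. The sketch below assumes the mapping 1 / (1 + d) purely for illustration, and the name `rank_by_confidence` and the input values are likewise hypothetical.

```python
def rank_by_confidence(avg_distances):
    """Return (name, confidence) pairs sorted from highest to lowest confidence."""
    # Assumed mapping: any strictly decreasing function of the distance would do.
    confidences = {name: 1.0 / (1.0 + d) for name, d in avg_distances.items()}
    return sorted(confidences.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_confidence({"A": 6.5, "B": 5.0, "C": 9.0})
ranked_names = [name for name, _ in ranking]
```

B has the shortest average distance and therefore ranks first, followed by A and then C.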
S209, determine the illegal grade corresponding to the target feature vector according to the sorting position of the confidence corresponding to the target feature vector, and handle the client according to the strategy processing mode corresponding to the illegal grade;
Specifically, the data identification processing device may determine the illegal grade corresponding to the target feature vector according to the sorting position of the confidence corresponding to the target feature vector, and handle the client according to the strategy processing mode corresponding to that illegal grade. For example, the data identification processing device may preset four illegal grades, namely heavy-risk user, medium-risk user, light-risk user and suspected user. In the confidence ranking, the feature vectors in the top 10% are determined to be heavy-risk users, those in the top 10% to 30% medium-risk users, those in the top 30% to 60% light-risk users, and those in the top 60% to 100% suspected users. The strategy processing mode corresponding to a suspected user may be: kick the user offline and require a verification code to be entered. The strategy processing mode corresponding to a light-risk user may be: kick the user offline and require verification by mobile phone number, for example the user enters a mobile phone number and then the verification code sent to the phone. The strategy processing mode corresponding to a medium-risk user may be: kick the user offline and require the password to be changed via mobile phone. The strategy processing mode corresponding to a heavy-risk user may be: ban the account directly; if the user gives feedback and asks for the account to be restored, manual review is required. It can be seen that by calculating the confidence corresponding to each feature vector carrying the illegal user identifier, the illegal grade of each such feature vector can be determined, so that the client corresponding to each feature vector carrying the illegal user identifier can be punished more reasonably.
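The mapping from ranking position to illegal grade and strategy can be sketched as below. The percentage bands come from the example above; the names `grade_for_rank` and `STRATEGIES`, the grade labels as strings, and the treatment of exact boundary positions (a position exactly at 10%, 30% or 60% falls into the higher grade) are assumptions.

```python
STRATEGIES = {
    "heavy-risk": "ban the account; restoring it requires manual review",
    "medium-risk": "kick offline and require a password change via mobile phone",
    "light-risk": "kick offline and require mobile phone number verification",
    "suspected": "kick offline and require a verification code",
}


def grade_for_rank(position, total):
    """position is the 0-based index in the descending confidence ranking."""
    # Integer arithmetic avoids floating-point surprises at the band boundaries.
    scaled = (position + 1) * 100
    if scaled <= 10 * total:
        return "heavy-risk"
    if scaled <= 30 * total:
        return "medium-risk"
    if scaled <= 60 * total:
        return "light-risk"
    return "suspected"


grades = [grade_for_rank(i, 10) for i in range(10)]
actions = [STRATEGIES[g] for g in grades]
```

For ten ranked vectors this yields one heavy-risk, two medium-risk, three light-risk and four suspected users.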
In the embodiment of the present invention, by collecting the device information and user information of a client together with service feature information, the target feature vector corresponding to the client can be constructed, and the user type corresponding to the target feature vector can be identified from the created classification model and the feature values in the target feature vector. If the user type is the illegal user type, the client is a protocol number client, so whether a viewer client is a protocol number client can be identified automatically, which reduces labor costs. Further, a user type identifier corresponding to the user type of the target feature vector can be set for the target feature vector, and the target feature vector carrying the user type identifier can be added to the labeled data set, so that the classification model can subsequently be updated from the new labeled data set to identify a new target feature vector. As the number of feature vectors in the labeled data set increases, the created classification model becomes more and more accurate, so that even target feature vectors with indistinct features can be identified accurately, which improves the accuracy of identifying protocol numbers. In addition, by calculating the confidence corresponding to each feature vector carrying the illegal user identifier, the illegal grade of each such feature vector can be determined, so that the client corresponding to each feature vector carrying the illegal user identifier can be punished more reasonably.
Refer to Fig. 3, which is a schematic structural diagram of a data identification processing device provided by an embodiment of the present invention. The data identification processing device 1 can be applied to a background server and may include: a collection constructing module 10, a creation identification module 20, and a setting adding module 30;
The collection constructing module 10 is configured to collect the device information and user information of a client, and to construct the target feature vector corresponding to the client from the device information, the user information and service feature information; the target feature vector includes the feature values respectively corresponding to the device information, the user information and the service feature information;
Specifically, the collection constructing module 10 may collect the device information and user information of the client, and construct the target feature vector corresponding to the client from the device information, the user information and the service feature information. Further, refer also to Fig. 4, which is a schematic structural diagram of the collection constructing module 10. The collection constructing module 10 includes: a collection unit 101 and a vector creation unit 102;
The collection unit 101 is configured to collect the device information and user information of the client; the user information includes user identity information and user behavior information;
The vector creation unit 102 is configured to create the target feature vector corresponding to the client, and to use the feature values respectively corresponding to the device information, the user identity information, the user behavior information and the service feature information as the elements of the target feature vector;
The user information may include user identity information and user behavior information. The device information may refer to information about the user's device environment, specifically including the characteristics of the running process, the parent process that was invoked, the protocol used to transmit data packets, and so on. The user identity information may refer to the user's records on the client (such as a viewer client), specifically including information such as user name, age, gender, place of registration, registration IP, level, nickname, profile, and client login status. The user behavior information may refer to the user's behavior in each channel as recorded by the game live-streaming platform, specifically including the user's login information, viewing information, consumption information (such as sending flowers or sending props) and interaction behavior information (such as leaving messages). The login information may include the user's cumulative number of logins, days, and duration in the i days before the statistics date, the login periods, the login IPs and the related frequencies; the viewing information may include the cumulative number of times, days, duration and periods of watching live streams; the consumption information may include the number of times, days, amounts and periods of consumption; the interaction behavior information may include the periods in which messages were left, and so on. Here, a period refers to the specific time at which a behavior occurred.
The vector creation unit 102 may create the target feature vector corresponding to the client, and use the feature values respectively corresponding to the device information, the user identity information, the user behavior information and the service feature information as the elements of the target feature vector. The service feature information may include: whether the account name is longer than 15 characters, whether the account name mixes letters and digits, whether the account name contains the pinyin of a Chinese name (obtained, for example, from a population database), whether the account name contains an English name or common English words, whether other accounts were registered from the account's registration IP, whether other accounts log in from the account's login IP, whether the account has a bound mobile phone and mailbox, whether the account has security questions set, whether the account's nickname is the same as its user name, whether the account's signature and profile are empty, the account's level and points, and so on.
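Two of the service features listed above can be derived directly from the account name, as in the following sketch. The function name `account_name_features`, the feature keys and the 0/1 encoding are assumptions made for illustration; the other listed features would need registration and login data that is not modeled here.

```python
def account_name_features(name):
    """Derive two binary service features from an account name."""
    has_letter = any(ch.isalpha() for ch in name)
    has_digit = any(ch.isdigit() for ch in name)
    return {
        # Whether the account name is longer than 15 characters.
        "name_longer_than_15": int(len(name) > 15),
        # Whether the account name mixes letters and digits.
        "name_mixes_letters_and_digits": int(has_letter and has_digit),
    }


machine_like = account_name_features("abc12345xyz67890q")
human_like = account_name_features("alice")
```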
The dimension of the target feature vector is the total number of feature values in the target feature vector. Because the raw value ranges of the original feature values differ, for example the login duration may range from 1 to 3600 while the number of logins may range from 1 to 100, all feature values of quantitative type in the target feature vector are obtained through normalization. The normalization formula may be: normalized feature value of a feature = (original feature value of the feature - minimum of the feature's raw value range) / (maximum of the feature's raw value range - minimum of the feature's raw value range), so that the normalized feature values fall within [0, 1]. In addition, feature values of non-quantitative type in the target feature vector are obtained by assignment with preset specified values; that is, for a feature of non-quantitative type, its categories can be assigned values that serve as an element value of the target feature vector, for example the feature "male/female" is assigned 0 and 1 respectively.
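The normalization formula and the category assignment can be sketched as follows. The helper name `min_max_normalize`, the `GENDER_CODES` table and the sample values are assumptions, as is returning 0.0 when the raw value range has zero width.

```python
def min_max_normalize(value, range_min, range_max):
    """(value - range minimum) / (range maximum - range minimum), giving a value in [0, 1]."""
    if range_max == range_min:
        return 0.0  # assumption: a degenerate range maps to 0.0
    return (value - range_min) / (range_max - range_min)


# Non-quantitative feature: each category is assigned a preset value.
GENDER_CODES = {"male": 0, "female": 1}

target_feature_vector = [
    min_max_normalize(1800.5, 1, 3600),  # login duration, raw range 1 to 3600
    min_max_normalize(100, 1, 100),      # number of logins, raw range 1 to 100
    GENDER_CODES["female"],              # categorical feature encoded as 0 or 1
]
```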
The creation identification module 20 is configured to create, based on the user type identifiers respectively carried by multiple feature vectors in the labeled data set, a classification model for classifying the multiple feature vectors in the labeled data set, and to identify the user type corresponding to the target feature vector from the classification model and the feature values in the target feature vector;
Specifically, after the target feature vector is obtained, the creation identification module 20 may create, based on the user type identifiers respectively carried by the multiple feature vectors in the labeled data set, a classification model for classifying the multiple feature vectors in the labeled data set, and identify the user type corresponding to the target feature vector from the classification model and the feature values in the target feature vector. Further, refer also to Fig. 5, which is a schematic structural diagram of the creation identification module 20. The creation identification module 20 may include: a position calculation unit 201, a model creation unit 202, a distance calculation unit 203, and an identification unit 204;
The position calculation unit 201 is configured to calculate, from the feature values in the target feature vector, the position in the hyperplane of the vector space;
Specifically, to improve the efficiency of coordinate calculation in the hyperplane, the position calculation unit 201 may be specifically configured to screen out valid feature values from the feature values of the target feature vector according to thresholds for judging feature value validity and according to the feature value correlation between different feature vectors, and to calculate the position of the target feature vector in the hyperplane from the valid feature values. Because there are fewer valid feature values than total feature values in the target feature vector, the coordinate calculation in the hyperplane becomes more efficient. Since important features carry more information, that is, their feature values differ more widely, the way in which the position calculation unit 201 screens valid feature values by threshold may specifically include: 1. if the coefficient of variation of a numeric feature is greater than a predetermined threshold, the feature can be used as a valid feature value; 2. if the standard deviation of a numeric feature is greater than a predetermined threshold, the feature can be used as a valid feature value; 3. if the count of a certain class label of a categorical feature is less than a predetermined threshold, the feature can be used as a valid feature value; 4. if the number of class labels of a categorical feature is less than a predetermined threshold, the feature can be used as a valid feature value. Here, the coefficient of variation = standard deviation / mean of the normal distribution. In addition, by comparing the target feature vector with the feature vectors in the labeled data set, it can be known that the closer the correlation values of the two are, the more important the feature is, so the position calculation unit 201 can use features with high feature value correlation as valid feature values. Detecting feature value correlation may involve three kinds of tests: the Pearson correlation coefficient test, the analysis of variance test, and the chi-square test.
The model creation unit 202 is configured to create in the hyperplane, based on a support vector machine (SVM) classifier and the user type identifiers respectively carried by multiple feature vectors in the labeled data set, a classification model for classifying the multiple feature vectors in the labeled data set; the classification model includes a legitimate user region and an illegal user region in the hyperplane;
Specifically, the model creation unit 202 may, based on the SVM classifier and the user type identifiers respectively carried by the multiple feature vectors in the labeled data set (the user type identifiers include a legitimate user identifier and an illegal user identifier), create in the hyperplane a classification model for classifying the multiple feature vectors in the labeled data set. The classification model includes a legitimate user region and an illegal user region in the hyperplane; the legitimate user region contains the feature vectors carrying the legitimate user identifier, and the illegal user region contains the feature vectors carrying the illegal user identifier. The legitimate user region and the illegal user region may also contain multiple feature vectors that do not carry a user type identifier. The positions in the hyperplane of the feature vectors in the labeled data set and of the feature vectors not carrying a user type identifier are all calculated in advance by the position calculation unit 201 from the feature values (or valid feature values) of each feature vector, and the feature vectors not carrying a user type identifier include at least the target feature vector.
The distance calculation unit 203 is configured to calculate the Euclidean distances between all feature vectors in the hyperplane that do not carry a user type identifier and the multiple feature vectors in the labeled data set;
Specifically, after the model creation unit 202 creates the classification model, the distance calculation unit 203 may calculate the Euclidean distances between all feature vectors in the hyperplane that do not carry a user type identifier and the multiple feature vectors in the labeled data set. For example, if there are two feature vectors, A and B, that do not carry a user type identifier, and three feature vectors, C, D and E, in the labeled data set, then the distance calculation unit 203 needs to calculate the Euclidean distances between A and C, A and D, A and E, B and C, B and D, and B and E. The Euclidean distance between two feature vectors is calculated as d = sqrt(∑(X_i1 - X_i2)^2), i = 1, 2, ..., n, where X_i1 is the feature value of the i-th feature in one feature vector and X_i2 is the feature value of the same feature in the other feature vector.
The identification unit 204 is configured to, when the Euclidean distance corresponding to the target feature vector is the shortest of all the calculated Euclidean distances, determine the region of the target feature vector in the classification model according to the position of the target feature vector in the hyperplane, so as to identify the user type corresponding to the target feature vector;
Specifically, when the Euclidean distance corresponding to the target feature vector is the shortest of all the calculated Euclidean distances, that is, when one of the Euclidean distances associated with the target feature vector is the shortest of all the calculated Euclidean distances, the identification unit 204 may determine the region of the target feature vector in the classification model from the position of the target feature vector in the hyperplane, so as to identify the user type corresponding to the target feature vector. If the position of the target feature vector in the hyperplane belongs to the legitimate user region of the classification model, the identification unit 204 identifies the user type corresponding to the target feature vector as a legitimate user, meaning that the client corresponding to the target feature vector is not a protocol number client. If the position of the target feature vector in the hyperplane belongs to the illegal user region of the classification model, the identification unit 204 identifies the user type corresponding to the target feature vector as an illegal user, meaning that the client corresponding to the target feature vector is a protocol number client. Further, when the Euclidean distance corresponding to the target feature vector is not the shortest of all the calculated Euclidean distances, the target feature vector is not identified for the time being; at this point only the feature vector without a user type identifier that has the shortest Euclidean distance is identified. For example, if there are two feature vectors, A and B, that do not carry a user type identifier, and three feature vectors, C, D and E, in the labeled data set, the Euclidean distances between A and C, A and D, A and E, B and C, B and D, and B and E are calculated; if the distance between A and C is detected to be the shortest of all the Euclidean distances, the identification unit 204 first identifies the user type of A through the classification model.
The purpose of selecting the shortest Euclidean distance is to pick, from all feature vectors that currently do not carry a user type identifier, the one whose features are most distinct. The shorter the Euclidean distance, the closer the feature vector without a user type identifier is to a feature vector carrying a user type identifier, that is, the closer its feature values are to those of a labeled feature vector, and the more distinct its features are. Identifying the feature vector with the most distinct features ensures that the current identification is the most accurate.
The setting adding module 30 is configured to set for the target feature vector a user type identifier corresponding to the user type of the target feature vector, and to add the target feature vector carrying the user type identifier to the labeled data set, so that the classification model can subsequently be updated from the new labeled data set to identify a new target feature vector;
Specifically, after the user type corresponding to the target feature vector has been identified, the setting and adding module 30 can set, for the target feature vector, a user type identifier corresponding to its user type, and add the target feature vector carrying the user type identifier to the labeled data set, so that the classification model can subsequently be updated according to the new labeled data set to identify a new target feature vector. The small number of feature vectors in the initial labeled data set can have their user type identifiers assigned manually. As the large number of feature vectors that do not carry the user type identifier are identified and labeled one by one, the labeled data set contains more and more feature vectors, so the new classification model built from the new labeled data set will be more accurate than the original one, and the new target feature vector can be identified accurately based on the new classification model. The new target feature vector can be the feature vector with the shortest Euclidean distance selected from the remaining feature vectors that do not carry the user type identifier. Because the feature vector identified in each round is the one with the most distinct features among the remaining unlabeled feature vectors, feature vectors with less distinct features are identified later, when the classification model has become more accurate. This helps each feature vector to be identified accurately, so that all protocol number clients among a large number of viewer clients can be found with only a small amount of manual labeling. For example,
if the labeled data set contains feature vectors A, B and C, and the feature vectors that currently do not carry the user type identifier are D, E and F, the creation and identification module 20 can first create classification model a1 from feature vectors A, B and C in the labeled data set. If feature vector D is then found to have the shortest Euclidean distance, feature vector D can be identified and labeled through classification model a1, and the setting and adding module 30 adds feature vector D, now carrying the user type identifier, to the labeled data set. The creation and identification module 20 then creates classification model a2 from feature vectors A, B, C and D in the labeled data set. If feature vector F is then found to have the shortest Euclidean distance, feature vector F can be identified and labeled through classification model a2, and the setting and adding module 30 adds feature vector F, now carrying the user type identifier, to the labeled data set. Finally, the creation and identification module 20 creates classification model a3 from feature vectors A, B, C, D and F in the labeled data set. Feature vector E, the only one remaining, now has the shortest Euclidean distance, so it can be identified and labeled through classification model a3, and the setting and adding module 30 adds feature vector E, now carrying the user type identifier, to the labeled data set, which then contains feature vectors A, B, C, D, E and F. As another example, if a game live stream has more than 3,000,000 users in total, the manually labeled feature vectors in the initial labeled data set may need to include only 100 feature vectors carrying the illegal user identifier and 100 feature vectors carrying the legitimate user identifier; starting from this initial labeled data set, the data identification processing device 1 can identify and label the clients of all users one by one.
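The round-by-round procedure in the A to F example can be sketched as a loop that rebuilds the model, labels the closest unlabeled vector, and grows the labeled data set. To keep the sketch self-contained, a nearest-centroid rule stands in for the classification model; this is a deliberate substitution for illustration and is not the support vector machine classifier of the embodiment. All names and coordinates are assumptions.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def build_model(labeled):
    """Stand-in model (NOT an SVM): one centroid per user type."""
    groups = {}
    for vec, user_type in labeled.values():
        groups.setdefault(user_type, []).append(vec)
    return {
        user_type: tuple(sum(col) / len(vecs) for col in zip(*vecs))
        for user_type, vecs in groups.items()
    }

def classify(model, vec):
    """Assign the user type whose centroid is nearest."""
    return min(model, key=lambda user_type: euclidean(vec, model[user_type]))

def label_all(labeled, unlabeled):
    """Label one vector per round, always the one closest to the labeled set."""
    labeled, pending, order = dict(labeled), dict(unlabeled), []
    while pending:
        model = build_model(labeled)  # a1, a2, a3, ... rebuilt every round
        name = min(
            pending,
            key=lambda n: min(euclidean(pending[n], v) for v, _ in labeled.values()),
        )
        vec = pending.pop(name)
        labeled[name] = (vec, classify(model, vec))
        order.append(name)
    return labeled, order

# Hypothetical coordinates; "legal" / "illegal" are the two user types.
seed = {"A": ((0.0, 0.0), "legal"), "B": ((1.0, 0.0), "legal"), "C": ((10.0, 10.0), "illegal")}
todo = {"D": (1.0, 1.0), "E": (6.0, 6.0), "F": (9.0, 9.0)}
result, order = label_all(seed, todo)
```

With these coordinates the vectors are labeled in the order D, F, E, matching the order used in the example, and the ambiguous vector E is classified last by the model built from the largest labeled set.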
In the embodiments of the present invention, by collecting the device information and user information of a client together with service feature information, a target feature vector corresponding to the client can be constructed, and the user type corresponding to the target feature vector can be identified according to the created classification model and the feature values in the target feature vector. If the user type is the illegal user type, the client can be regarded as a protocol number client, so whether a viewer client is a protocol number client can be identified automatically, which reduces labor cost. Further, a user type identifier corresponding to the user type of the target feature vector can be set for the target feature vector, and the target feature vector carrying the user type identifier can be added to the labeled data set, so that the classification model can subsequently be updated according to the new labeled data set to identify new target feature vectors. As the number of feature vectors in the labeled data set increases, the created classification model becomes more and more accurate, so that target feature vectors with indistinct features can also be identified accurately, which improves the accuracy of identifying protocol numbers.
Referring to Fig. 6, which is a schematic structural diagram of another data identification processing device provided in an embodiment of the present invention, the data identification processing device 1 can be applied to a background server. The data identification processing device 1 may include the collection and construction module 10, the creation and identification module 20 and the setting and adding module 30 of the embodiment corresponding to Fig. 3 above. Further, the data identification processing device 1 may also include a calculation module 40, a sorting module 50 and a policy processing module 60;
The calculation module 40 is configured to, when the user type identifier of the target feature vector is the illegal user identifier, calculate the Euclidean distance between the target feature vector and each feature vector in the labeled data set that carries the illegal user identifier, so as to obtain an average Euclidean distance;
Specifically, after all feature vectors that did not carry the user type identifier have been identified and labeled, the data identification processing device 1 can impose corresponding penalties on the clients corresponding to the feature vectors carrying the illegal user identifier. Taking the client corresponding to the target feature vector as an example again, when the user type identifier of the target feature vector is the illegal user identifier, the calculation module 40 can calculate the Euclidean distance between the target feature vector and each feature vector in the labeled data set that carries the illegal user identifier, so as to obtain the average Euclidean distance. Likewise, a corresponding average Euclidean distance also needs to be calculated for every other feature vector carrying the illegal user identifier, in the same way as for the target feature vector. For example, if the labeled data set contains three feature vectors A, B and C carrying the illegal user identifier (A being the target feature vector), the calculation module 40 first calculates the Euclidean distances between A and B, B and C, and A and C (denoted AB, BC and AC respectively), and then calculates the average Euclidean distance of A as (AB+AC)/2, that of B as (AB+BC)/2, and that of C as (AC+BC)/2.
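The averaging in the A, B, C example can be written out as a short sketch. The coordinates are made up so that AB = 5, BC = 5 and AC = 10; the function names are illustrative assumptions.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_distances(illegal_vectors):
    """For each vector, average its Euclidean distance to every other
    vector that carries the illegal user identifier."""
    averages = {}
    for name, vec in illegal_vectors.items():
        others = [euclidean(vec, v) for n, v in illegal_vectors.items() if n != name]
        averages[name] = sum(others) / len(others) if others else 0.0
    return averages

# Hypothetical coordinates: AB = 5, BC = 5, AC = 10.
illegal = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 8.0)}
avg = average_distances(illegal)
# avg["A"] is (AB + AC) / 2, avg["B"] is (AB + BC) / 2, avg["C"] is (AC + BC) / 2
```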
The calculation module 40 is further configured to calculate the confidence corresponding to the target feature vector according to the average Euclidean distance;
The sorting module 50 is configured to sort the confidence corresponding to the target feature vector together with the confidences corresponding to the feature vectors in the labeled data set that carry the illegal user identifier;
Specifically, the calculation module 40 further calculates the confidence corresponding to the target feature vector according to the average Euclidean distance of the target feature vector; the longer the average Euclidean distance, the lower the confidence. Likewise, the calculation module 40 also calculates the corresponding confidence for every other feature vector carrying the illegal user identifier. The sorting module 50 then sorts the confidence corresponding to the target feature vector together with the confidences corresponding to the other feature vectors carrying the illegal user identifier, for example in descending order of confidence.
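The text states only that a longer average Euclidean distance gives a lower confidence; it does not give a formula. The sketch below therefore assumes the mapping 1 / (1 + d), chosen purely for illustration because it decreases monotonically with d, and then sorts in descending order of confidence.

```python
def confidence(avg_distance):
    """Assumed mapping (not from the source): decreases as distance grows."""
    return 1.0 / (1.0 + avg_distance)

def rank_by_confidence(avg_distances):
    """Return (name, confidence) pairs sorted from highest to lowest confidence."""
    scored = [(name, confidence(d)) for name, d in avg_distances.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical average distances for three illegal-user feature vectors.
ranking = rank_by_confidence({"A": 7.5, "B": 5.0, "C": 9.0})
names = [name for name, _ in ranking]
```

Any other strictly decreasing function of the average distance would produce the same ordering, so the ranking does not depend on this particular choice.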
The policy processing module 60 is configured to determine the illegality level corresponding to the target feature vector according to the position of its confidence in the sorted order, and to process the client according to the policy processing mode corresponding to that illegality level;
Specifically, the policy processing module 60 can determine the illegality level corresponding to the target feature vector according to the position of its confidence in the sorted order, and process the client according to the policy processing mode corresponding to that illegality level. For example, four illegality levels are preset: heavy-risk user, medium-risk user, light-risk user and suspected user. The policy processing module 60 classifies the feature vectors in the top 10% of the confidence ranking as heavy-risk users, those between 10% and 30% as medium-risk users, those between 30% and 60% as light-risk users, and those between 60% and 100% as suspected users. The policy processing mode for a suspected user can be: kick the user offline and require a verification code to be entered. The policy processing mode for a light-risk user can be: kick the user offline and require verification by mobile phone number, for example the user enters a mobile phone number and then the verification code sent to that phone. The policy processing mode for a medium-risk user can be: kick the user offline and require the password to be changed via mobile phone. The policy processing mode for a heavy-risk user can be: ban the account directly; if the user appeals and the account needs to be restored, manual review is required. It can be seen that, by calculating the confidence corresponding to each feature vector carrying the illegal user identifier, the illegality level of each such feature vector can be determined, so that a more reasonable penalty can be imposed on the client corresponding to each feature vector carrying the illegal user identifier.
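The percentile bands in the example can be sketched as a lookup from ranking position to level and policy. The band edges (10%, 30%, 60%) and the four policies come from the example above; the function name, the level keys and the choice to treat each upper edge as inclusive are assumptions.

```python
# Policy text paraphrased from the example; keys are illustrative names.
POLICIES = {
    "heavy": "ban the account; manual review required to restore it",
    "medium": "kick offline and require a password change via mobile phone",
    "light": "kick offline and require mobile phone number verification",
    "suspected": "kick offline and require a verification code",
}

def illegality_level(rank, total):
    """Map a 1-based position in the descending confidence ranking to a level.
    Integer arithmetic avoids floating-point edge cases at the band limits."""
    if rank * 100 <= 10 * total:
        return "heavy"
    if rank * 100 <= 30 * total:
        return "medium"
    if rank * 100 <= 60 * total:
        return "light"
    return "suspected"

levels = [illegality_level(r, 10) for r in range(1, 11)]
policy_for_first = POLICIES[illegality_level(1, 10)]
```

With ten ranked feature vectors this yields one heavy-risk, two medium-risk, three light-risk and four suspected users.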
In the embodiments of the present invention, by collecting the device information and user information of a client together with service feature information, a target feature vector corresponding to the client can be constructed, and the user type corresponding to the target feature vector can be identified according to the created classification model and the feature values in the target feature vector. If the user type is the illegal user type, the client can be regarded as a protocol number client, so whether a viewer client is a protocol number client can be identified automatically, which reduces labor cost. Further, a user type identifier corresponding to the user type of the target feature vector can be set for the target feature vector, and the target feature vector carrying the user type identifier can be added to the labeled data set, so that the classification model can subsequently be updated according to the new labeled data set to identify new target feature vectors. As the number of feature vectors in the labeled data set increases, the created classification model becomes more and more accurate, so that target feature vectors with indistinct features can also be identified accurately, which improves the accuracy of identifying protocol numbers. In addition, by calculating the confidence corresponding to each feature vector carrying the illegal user identifier, the illegality level of each such feature vector can be determined, so that a more reasonable penalty can be imposed on the client corresponding to each feature vector carrying the illegal user identifier.
Referring to Fig. 7, which is a schematic structural diagram of yet another data identification processing device provided in an embodiment of the present invention, the data identification processing device 1000 may include a processor 1001, a communication interface 1002 and a memory 1003 (the data identification processing device 1000 may contain one or more processors 1001; Fig. 7 takes one processor as an example). In some embodiments of the present invention, the processor 1001, the communication interface 1002 and the memory 1003 can be connected by a communication bus or in other ways; Fig. 7 takes connection by a communication bus as an example.
The communication interface 1002 is configured to communicate with the client;
The memory 1003 is configured to store a program;
The processor 1001 is configured to execute the program so as to implement the following:
collecting the device information and user information of a client, and constructing the target feature vector corresponding to the client according to the device information, the user information and service feature information, the target feature vector including the feature values corresponding to the device information, the user information and the service feature information;
creating, based on the user type identifiers respectively carried by multiple feature vectors in a labeled data set, a classification model for classifying the multiple feature vectors in the labeled data set, and identifying the user type corresponding to the target feature vector according to the classification model and the feature values in the target feature vector;
setting, for the target feature vector, a user type identifier corresponding to the user type of the target feature vector, and adding the target feature vector carrying the user type identifier to the labeled data set, so that the classification model can subsequently be updated according to the new labeled data set to identify a new target feature vector, the user type identifier including a legitimate user identifier and an illegal user identifier.
In one embodiment, the processor 1001 is further configured to:
when the user type identifier of the target feature vector is the illegal user identifier, calculate the Euclidean distance between the target feature vector and each feature vector in the labeled data set that carries the illegal user identifier, so as to obtain an average Euclidean distance;
calculate the confidence corresponding to the target feature vector according to the average Euclidean distance, and sort the confidence corresponding to the target feature vector together with the confidences corresponding to the feature vectors in the labeled data set that carry the illegal user identifier;
determine the illegality level corresponding to the target feature vector according to the position of its confidence in the sorted order, and process the client according to the policy processing mode corresponding to that illegality level.
In one embodiment, when collecting the device information and user information of the client and constructing the target feature vector corresponding to the client according to the device information, the user information and the service feature information, the processor 1001 is specifically configured to:
collect the device information and user information of the client, the user information including user identity information and user behavior information;
create the target feature vector corresponding to the client, and use the feature values corresponding to the device information, the user identity information, the user behavior information and the service feature information as the elements of the target feature vector;
wherein the dimension of the target feature vector is the total number of feature values in the target feature vector; the feature values of numeric type in the target feature vector are obtained by normalization, and the feature values of non-numeric type are obtained by assigning preset specified numerical values.
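A minimal sketch of this construction step follows. The source does not name the normalization method or any concrete feature, so min-max scaling, the feature names (`device_type`, `login_count`, `watch_minutes`), the value ranges and the preset codes are all hypothetical choices made for illustration.

```python
# Hypothetical preset numbers for non-numeric feature values.
PRESET_CODES = {"device_type": {"pc": 0.0, "android": 0.5, "ios": 1.0}}
# Hypothetical (low, high) ranges used to normalize numeric feature values.
NUMERIC_RANGES = {"login_count": (0, 200), "watch_minutes": (0, 600)}

def min_max(value, low, high):
    """Min-max scaling to [0, 1]; an assumed normalization method."""
    if high == low:
        return 0.0
    return (value - low) / (high - low)

def build_feature_vector(raw):
    """Numeric features are normalized; non-numeric ones receive preset numbers.
    Keys are visited in sorted order so every client yields the same layout."""
    vector = []
    for key in sorted(raw):
        if key in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[key]
            vector.append(min_max(raw[key], low, high))
        else:
            vector.append(PRESET_CODES[key][raw[key]])
    return vector

vec = build_feature_vector(
    {"device_type": "android", "login_count": 50, "watch_minutes": 300}
)
```

The dimension of the resulting vector equals the number of feature values supplied, here three.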
In one embodiment, when creating, based on the user type identifiers respectively carried by the multiple feature vectors in the labeled data set, the classification model for classifying the multiple feature vectors in the labeled data set, and identifying the user type corresponding to the target feature vector according to the classification model and the feature values in the target feature vector, the processor 1001 is specifically configured to:
calculate, according to the feature values in the target feature vector, the position of the target feature vector in a hyperplane of the vector space;
create in the hyperplane, based on a support vector machine classifier and the user type identifiers respectively carried by the multiple feature vectors in the labeled data set, the classification model for classifying the multiple feature vectors in the labeled data set, the classification model including a legitimate user region and an illegal user region in the hyperplane;
calculate the Euclidean distances between all feature vectors distributed in the hyperplane that do not carry the user type identifier and the multiple feature vectors in the labeled data set, the feature vectors that do not carry the user type identifier including at least the target feature vector;
when the Euclidean distance corresponding to the target feature vector is the shortest among all the calculated Euclidean distances, determine, according to the position of the target feature vector in the hyperplane, the region of the classification model in which the target feature vector is located, so as to identify the user type corresponding to the target feature vector.
In one embodiment, when calculating the position in the hyperplane of the vector space according to the feature values in the target feature vector, the processor 1001 is specifically configured to:
screen out valid feature values from the feature values of the target feature vector according to a threshold for judging feature value validity and the feature value correlation between different feature vectors, and calculate the position of the target feature vector in the hyperplane according to the valid feature values.
In the embodiments of the present invention, by collecting the device information and user information of a client together with service feature information, a target feature vector corresponding to the client can be constructed, and the user type corresponding to the target feature vector can be identified according to the created classification model and the feature values in the target feature vector. If the user type is the illegal user type, the client can be regarded as a protocol number client, so whether a viewer client is a protocol number client can be identified automatically, which reduces labor cost. Further, a user type identifier corresponding to the user type of the target feature vector can be set for the target feature vector, and the target feature vector carrying the user type identifier can be added to the labeled data set, so that the classification model can subsequently be updated according to the new labeled data set to identify new target feature vectors. As the number of feature vectors in the labeled data set increases, the created classification model becomes more and more accurate, so that target feature vectors with indistinct features can also be identified accurately, which improves the accuracy of identifying protocol numbers. In addition, by calculating the confidence corresponding to each feature vector carrying the illegal user identifier, the illegality level of each such feature vector can be determined, so that a more reasonable penalty can be imposed on the client corresponding to each feature vector carrying the illegal user identifier.
A person of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium can be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
What is disclosed above is merely the preferred embodiments of the present invention and certainly cannot be used to limit the scope of rights of the present invention; equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope of the present invention.
Claims (10)
1. A data identification processing method, characterized by comprising:
collecting the device information and user information of a client, and constructing the target feature vector corresponding to the client according to the device information, the user information and service feature information; the target feature vector includes the feature values corresponding to the device information, the user information and the service feature information;
creating, based on the user type identifiers respectively carried by multiple feature vectors in a labeled data set, a classification model for classifying the multiple feature vectors in the labeled data set, and, when the Euclidean distances between the target feature vector distributed in a hyperplane and the multiple feature vectors in the labeled data set include the shortest Euclidean distance, identifying the user type corresponding to the target feature vector according to the classification model and the feature values in the target feature vector; wherein the position of the target feature vector in the hyperplane is calculated according to the valid feature values in the target feature vector; the valid feature values are screened out according to a threshold for judging feature value validity and the feature value correlation between different feature vectors, and the feature value correlation is obtained based on a Pearson correlation coefficient test, an analysis-of-variance test and a chi-square test;
setting, for the target feature vector, a user type identifier corresponding to the user type of the target feature vector, and adding the target feature vector carrying the user type identifier to the labeled data set, so that the classification model can subsequently be updated according to the new labeled data set to identify a new target feature vector; the user type identifier includes a legitimate user identifier and an illegal user identifier.
2. The method according to claim 1, characterized by further comprising:
when the user type identifier of the target feature vector is the illegal user identifier, calculating the Euclidean distance between the target feature vector and each feature vector in the labeled data set that carries the illegal user identifier, so as to obtain an average Euclidean distance;
calculating the confidence corresponding to the target feature vector according to the average Euclidean distance, and sorting the confidence corresponding to the target feature vector together with the confidences corresponding to the feature vectors in the labeled data set that carry the illegal user identifier;
determining the illegality level corresponding to the target feature vector according to the position of its confidence in the sorted order, and processing the client according to the policy processing mode corresponding to that illegality level.
3. The method according to claim 1, characterized in that collecting the device information and user information of the client, and constructing the target feature vector corresponding to the client according to the device information, the user information and the service feature information, comprises:
collecting the device information and user information of the client; the user information includes user identity information and user behavior information;
creating the target feature vector corresponding to the client, and using the feature values corresponding to the device information, the user identity information, the user behavior information and the service feature information as the elements of the target feature vector;
wherein the dimension of the target feature vector is the total number of feature values in the target feature vector; the feature values of numeric type in the target feature vector are obtained by normalization, and the feature values of non-numeric type are obtained by assigning preset specified numerical values.
4. The method according to claim 1, characterized in that creating, based on the user type identifiers respectively carried by the multiple feature vectors in the labeled data set, the classification model for classifying the multiple feature vectors in the labeled data set, and identifying the user type corresponding to the target feature vector according to the classification model and the feature values in the target feature vector, comprises:
calculating, according to the feature values in the target feature vector, the position in the hyperplane of the vector space;
creating in the hyperplane, based on a support vector machine classifier and the user type identifiers respectively carried by the multiple feature vectors in the labeled data set, the classification model for classifying the multiple feature vectors in the labeled data set; the classification model includes a legitimate user region and an illegal user region in the hyperplane;
calculating the Euclidean distances between all feature vectors distributed in the hyperplane that do not carry the user type identifier and the multiple feature vectors in the labeled data set; the feature vectors that do not carry the user type identifier include at least the target feature vector;
when the Euclidean distance corresponding to the target feature vector is the shortest among all the calculated Euclidean distances, determining, according to the position of the target feature vector in the hyperplane, the region of the classification model in which the target feature vector is located, so as to identify the user type corresponding to the target feature vector.
5. The method according to claim 4, characterized in that calculating, according to the feature values in the target feature vector, the position in the hyperplane of the vector space specifically comprises:
screening out valid feature values from the feature values of the target feature vector according to the threshold for judging feature value validity and the feature value correlation between different feature vectors, and calculating the position of the target feature vector in the hyperplane according to the valid feature values.
6. A data identification processing device, characterized by comprising:
a collection and construction module, configured to collect the device information and user information of a client, and to construct the target feature vector corresponding to the client according to the device information, the user information and service feature information; the target feature vector includes the feature values corresponding to the device information, the user information and the service feature information;
a creation and identification module, configured to create, based on the user type identifiers respectively carried by multiple feature vectors in a labeled data set, a classification model for classifying the multiple feature vectors in the labeled data set, and, when the Euclidean distances between the target feature vector distributed in a hyperplane and the multiple feature vectors in the labeled data set include the shortest Euclidean distance, to identify the user type corresponding to the target feature vector according to the classification model and the feature values in the target feature vector; wherein the position of the target feature vector in the hyperplane is calculated according to the valid feature values in the target feature vector; the valid feature values are screened out according to a threshold for judging feature value validity and the feature value correlation between different feature vectors, and the feature value correlation is obtained based on a Pearson correlation coefficient test, an analysis-of-variance test and a chi-square test;
a setting and adding module, configured to set, for the target feature vector, a user type identifier corresponding to the user type of the target feature vector, and to add the target feature vector carrying the user type identifier to the labeled data set, so that the classification model can subsequently be updated according to the new labeled data set to identify a new target feature vector; the user type identifier includes a legitimate user identifier and an illegal user identifier.
7. The device according to claim 6, characterized by further comprising:
a calculation module, configured to, when the user type identifier of the target feature vector is the illegal user identifier, calculate the Euclidean distance between the target feature vector and each feature vector in the labeled data set that carries the illegal user identifier, so as to obtain an average Euclidean distance;
the calculation module being further configured to calculate the confidence corresponding to the target feature vector according to the average Euclidean distance;
a sorting module, configured to sort the confidence corresponding to the target feature vector together with the confidences corresponding to the feature vectors in the labeled data set that carry the illegal user identifier;
a policy processing module, configured to determine the illegality level corresponding to the target feature vector according to the position of its confidence in the sorted order, and to process the client according to the policy processing mode corresponding to that illegality level.
8. The device according to claim 6, characterized in that the collection and construction module comprises:
A collection unit, configured to collect device information and user information of the client; the user information includes user identity information and user behavior information;
A vector creation unit, configured to create the target feature vector corresponding to the client, and to use the feature values corresponding to the device information, the user identity information, the user behavior information and the service feature information as the elements of the target feature vector;
Wherein the dimension of the target feature vector is the total number of feature values in the target feature vector; feature values of a quantitative type in the target feature vector are obtained by normalization, and feature values of a non-quantitative type are obtained by assigning preset specified numerical values.
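The vector construction in claim 8 can be sketched as follows. The feature names, the numeric ranges, the preset codes, and the choice of min-max scaling are hypothetical; the claim only requires that quantitative values be normalized and that non-quantitative values receive preset numbers.

```python
def min_max(value, lo, hi):
    """Scale a numeric value into [0, 1]; min-max scaling is an assumed choice."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)

# Hypothetical preset numeric codes for non-quantitative features.
PRESET_CODES = {
    "device_type": {"pc": 0.0, "mobile": 0.5, "emulator": 1.0},
    "account_verified": {False: 0.0, True: 1.0},
}

# Hypothetical value ranges for quantitative features.
NUMERIC_RANGES = {
    "logins_per_day": (0, 100),
    "watch_minutes": (0, 600),
}

# Fixed element order; the vector dimension equals the number of feature values.
FEATURE_ORDER = ["device_type", "account_verified", "logins_per_day", "watch_minutes"]

def build_feature_vector(raw):
    """Turn one client's collected information into a numeric feature vector."""
    vector = []
    for name in FEATURE_ORDER:
        value = raw[name]
        if name in NUMERIC_RANGES:
            lo, hi = NUMERIC_RANGES[name]
            vector.append(min_max(value, lo, hi))    # quantitative: normalize
        else:
            vector.append(PRESET_CODES[name][value])  # non-quantitative: preset code
    return vector

client = {
    "device_type": "mobile",
    "account_verified": True,
    "logins_per_day": 25,
    "watch_minutes": 300,
}
target_vector = build_feature_vector(client)
```

Bringing every element onto a comparable scale matters here because the later steps rely on Euclidean distance, which would otherwise be dominated by the features with the largest raw values.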
9. The device according to claim 6, characterized in that the creation and identification module comprises:
A position calculation unit, configured to calculate, according to the feature values in the target feature vector, the position of the target feature vector in the hyperplane of the vector space;
A model creation unit, configured to create in the hyperplane, based on a support vector machine (SVM) classifier and on the user type identifiers respectively carried by the multiple feature vectors in the labeled data set, a classification model for classifying the multiple feature vectors in the labeled data set; the classification model includes a legitimate user region and an illegal user region in the hyperplane;
A distance calculation unit, configured to calculate the Euclidean distances between all feature vectors distributed in the hyperplane that do not carry a user type identifier and the multiple feature vectors in the labeled data set; the feature vectors that do not carry a user type identifier include at least the target feature vector;
A recognition unit, configured to, when the Euclidean distance corresponding to the target feature vector is the shortest of all the calculated Euclidean distances, determine the region of the classification model in which the target feature vector lies according to the position of the target feature vector in the hyperplane, so as to identify the user type corresponding to the target feature vector.
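The units of claim 9 can be sketched as follows. This is an illustration under assumptions: the claim names an SVM classifier but gives no training details, so a small linear SVM fitted by sub-gradient descent on the hinge loss stands in for it, and the toy data, the hyperparameters, and the function names are hypothetical.

```python
import math

def train_linear_svm(samples, labels, epochs=500, lr=0.05, reg=0.01):
    """Fit w and b by sub-gradient descent on the L2-regularized hinge loss.

    Labels are +1 (legitimate user) or -1 (illegal user).
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1 - lr * reg) for wi in w]  # regularization shrinkage
            if margin < 1:                          # sample violates the margin
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def region(w, b, x):
    """Side of the separating hyperplane on which x lies."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "legitimate" if score >= 0 else "illegal"

def nearest_unlabeled(unlabeled, labeled):
    """Index of the unlabeled vector with the shortest distance to any labeled vector."""
    return min(
        range(len(unlabeled)),
        key=lambda i: min(math.dist(unlabeled[i], v) for v in labeled),
    )

# Toy labeled data set: two legitimate and two illegal feature vectors.
labeled = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(labeled, labels)

# The unlabeled vector closest to the labeled set is identified first.
unlabeled = [[0.5, 0.6], [0.9, 0.9], [0.4, 0.3]]
target_index = nearest_unlabeled(unlabeled, labeled)
target_type = region(w, b, unlabeled[target_index])
```

Identifying the closest unlabeled vector first means the model is applied where it is most reliable, namely near data whose type is already known.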
10. The device according to claim 9, characterized in that:
The position calculation unit is specifically configured to select the valid feature values from the feature values of the target feature vector according to a threshold for judging feature value validity and the correlation of feature values between different feature vectors, and to calculate the position of the target feature vector in the hyperplane according to the valid feature values.
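One possible reading of the selection in claim 10 is sketched below. The claim does not define the validity threshold or how correlation is measured, so this example assumes a variance threshold (a nearly constant feature carries no information) and a Pearson correlation cutoff (a feature that duplicates one already kept is redundant). Both criteria, their default values, and the function names are assumptions.

```python
import math
import statistics

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def select_valid_features(vectors, min_variance=0.01, max_corr=0.95):
    """Return the indices of the features kept as valid."""
    columns = list(zip(*vectors))  # one column per feature
    kept = []
    for j, col in enumerate(columns):
        if statistics.pvariance(col) < min_variance:
            continue  # nearly constant across vectors: treated as invalid
        if any(abs(pearson(col, columns[k])) > max_corr for k in kept):
            continue  # strongly correlated with a kept feature: redundant
        kept.append(j)
    return kept

vectors = [
    [0.1, 0.05, 0.5, 0.9],
    [0.4, 0.20, 0.5, 0.1],
    [0.9, 0.45, 0.5, 0.3],
    [0.2, 0.10, 0.5, 0.7],
]
valid = select_valid_features(vectors)

# Only the valid feature values are used to locate the target vector.
target = [0.3, 0.15, 0.5, 0.6]
reduced_target = [target[j] for j in valid]
```

In this toy data the second feature is a scaled copy of the first and the third is constant, so only the first and fourth features remain.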
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510835028.1A CN105491444B (en) | 2015-11-25 | 2015-11-25 | A kind of data identifying processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105491444A CN105491444A (en) | 2016-04-13 |
CN105491444B CN105491444B (en) | 2018-11-06 |
Family
ID=55678102
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510835028.1A Active CN105491444B (en) | 2015-11-25 | 2015-11-25 | A kind of data identifying processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105491444B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180089581A1 (en) * | 2016-09-27 | 2018-03-29 | Futurewei Technologies, Inc. | Apparatus and method for dataset model fitting using a classifying engine |
CN108268877A (en) * | 2016-12-30 | 2018-07-10 | 中国移动通信集团黑龙江有限公司 | A kind of method and apparatus for identifying target terminal |
CN108399418B (en) * | 2018-01-23 | 2021-09-03 | 北京奇艺世纪科技有限公司 | User classification method and device |
CN110166344B (en) * | 2018-04-25 | 2021-08-24 | 腾讯科技(深圳)有限公司 | Identity identification method, device and related equipment |
CN110557447B (en) * | 2019-08-26 | 2022-06-10 | 腾讯科技(武汉)有限公司 | User behavior identification method and device, storage medium and server |
CN111417021B (en) * | 2020-03-16 | 2022-07-08 | 广州虎牙科技有限公司 | Plug-in identification method and device, computer equipment and readable storage medium |
CN111766487A (en) * | 2020-07-31 | 2020-10-13 | 南京南瑞继保电气有限公司 | Cable partial discharge defect type identification method based on multiple quality characteristic quantities |
CN113521751B (en) * | 2021-07-27 | 2023-11-14 | 腾讯科技(深圳)有限公司 | Operation test method and device, storage medium and electronic equipment |
CN114466358B (en) * | 2022-01-30 | 2023-10-31 | 全球能源互联网研究院有限公司 | User identity continuous authentication method and device based on zero trust |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101600178A (en) * | 2009-06-26 | 2009-12-09 | 成都市华为赛门铁克科技有限公司 | Junk information confirmation method and device, terminal |
CN102708186A (en) * | 2012-05-11 | 2012-10-03 | 上海交通大学 | Identification method of phishing sites |
CN102768659A (en) * | 2011-05-03 | 2012-11-07 | 阿里巴巴集团控股有限公司 | Method and system for identifying repeated account |
CN104471501A (en) * | 2012-06-12 | 2015-03-25 | 西门子公司 | Generalized pattern recognition for fault diagnosis in machine condition monitoring |
CN104579773A (en) * | 2014-12-31 | 2015-04-29 | 北京奇虎科技有限公司 | Domain name system analysis method and device |
CN104933082A (en) * | 2014-03-21 | 2015-09-23 | 华为技术有限公司 | Evaluation information processing method and apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN105491444A (en) | 2016-04-13 |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CP02 | Change in the address of a patent holder | Address after: 519000 High-tech Zone, Zhuhai City, Guangdong Province, Unit 1, Fourth Floor C, Building A, Headquarters Base No. 1, Qianwan Third Road, Tangjiawan Town. Patentee after: ZHUHAI DUOWAN INFORMATION TECHNOLOGY LIMITED. Address before: 510000 Nancun Town Wanbo Business Center, Panyu District, Guangzhou City, Guangdong Province, 29 floors of B-1 Building, Wanda Business Plaza North District. Patentee before: ZHUHAI DUOWAN INFORMATION TECHNOLOGY LIMITED |