CN108228450A - Machine-learning-based method for detecting ranking-manipulation apps in app stores - Google Patents

Info

Publication number
CN108228450A
Authority
CN
China
Prior art keywords
application
scoring
commentator
detected
brush list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711265795.9A
Other languages
Chinese (zh)
Inventor
何道敬
洪凯
唐宗力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University
Priority to CN201711265795.9A
Publication of CN108228450A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3608 Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The invention discloses a machine-learning-based method for detecting ranking-manipulation ("brush list") apps in an app store. The method comprises: defining, for the app to be detected, a feature vector V = [maximum extreme-rating run length, extreme-rating rate, number of deleted ratings, rating deletion rate, total number of ratings, number of abnormal ranking fluctuations, number of new suspicious reviewers per day, suspicious-reviewer ratio]; obtaining the feature vectors of training samples from the app store and learning from them with a machine learning method to generate a ranking-manipulation detection model; and using the detection model to examine the app to be detected and determine whether it is a ranking-manipulation app. The method can detect ranking-manipulation apps in an app store efficiently and quickly.

Description

A machine-learning-based method for detecting ranking-manipulation apps in app stores
Technical field
The present invention relates to the technical field of ranking-manipulation app detection, and in particular to a machine-learning-based method for detecting ranking-manipulation apps in app stores.
Background
With the spread of smartphones, mobile app markets such as Apple's App Store and Google's Google Play have flourished by offering users a wide variety of mobile apps. These markets provide ranking charts that reflect how popular each app is. A chart not only shows how well each app in the store is received by users; it also brings more user traffic to higher-ranked apps. As a result, many app developers buy ranking-manipulation services from ranking-manipulation agencies in order to raise their own apps' positions on the store's charts. These agencies can mobilize a large number of manipulating users within a short time and, by generating huge download counts and large numbers of favorable reviews for the target app, markedly raise its ranking in a short time; some apps even reach first place.
The unchecked activity of these agencies introduces unfairness into the competition among legitimate apps in the store. How to effectively detect ranking-manipulation apps in an app store and take effective measures in time has therefore become a thorny problem in the field of ranking-manipulation app detection.
Summary of the invention
The object of the invention is to address the shortcomings of existing ranking-manipulation detection techniques by providing a machine-learning-based method for detecting ranking-manipulation apps in app stores. The method applies machine learning to ranking-manipulation app detection for the first time. It is practical and, compared with earlier detection by manual inspection or heuristic rules, considerably more efficient. It gives app store operators a powerful tool and can play an important role in maintaining normal order in the app store.
The specific technical solution that achieves the object of the invention is:
A machine-learning-based method for detecting ranking-manipulation apps in app stores, comprising the following steps:
Step 1: Define the feature vector of the app to be detected, V = [maximum extreme-rating run length, extreme-rating rate, number of deleted ratings, rating deletion rate, total number of ratings, number of abnormal ranking fluctuations, number of new suspicious reviewers per day, suspicious-reviewer ratio];
Step 2: Obtain the feature vectors of training samples from the app store and learn from them using a machine learning method such as a support vector machine, generating a ranking-manipulation detection model;
Step 3: Use the detection model to examine the app to be detected and determine whether it is a ranking-manipulation app; wherein:
The maximum extreme-rating run length is the length of the longest run of consecutive highest ratings, or of consecutive lowest ratings, that the app to be detected receives within the detection period;
The extreme-rating rate is the rate of highest ratings or the rate of lowest ratings that the app to be detected receives within the detection period;
The number of new suspicious reviewers per day is the number of suspicious reviewers newly added each day;
The number of abnormal ranking fluctuations is the number of times within the detection period that the day-to-day change in the app's ranking exceeds threshold f1;
A suspicious reviewer is a user who satisfies any one of the following conditions:
The user's total number of ratings exceeds threshold f2 and the user's average time to first review is below threshold f3;
The user's total number of ratings exceeds threshold f2 and the user's historical extreme-rating rate exceeds threshold f4;
The user's total number of app downloads exceeds threshold f5 and the user's rating conversion rate exceeds threshold f6.
The average time to first review is, over all apps the reviewer has ever reviewed, the average interval between downloading an app and posting the first review of it. A user's rating conversion rate is the user's total number of ratings divided by the user's total number of app downloads.
The thresholds f1, f2, f3, f4, f5 and f6 used above can be determined experimentally.
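The three suspicious-reviewer conditions can be sketched as a simple predicate. This is only an illustration under stated assumptions: the names `ReviewerStats`, `Thresholds`, `rating_conversion_rate` and `is_suspicious` are hypothetical, the time unit (hours) is assumed, and the patent gives no concrete threshold values.

```python
from dataclasses import dataclass


@dataclass
class ReviewerStats:
    """Per-user statistics (hypothetical container, not from the patent)."""
    total_ratings: int                   # total number of ratings by the user
    avg_first_review_delay_hours: float  # average time to first review (unit assumed)
    historical_extreme_rate: float       # share of the user's ratings that are extreme, 0..1
    total_downloads: int                 # total number of app downloads


@dataclass
class Thresholds:
    """Thresholds f2..f6; the patent says they are determined experimentally."""
    f2: int
    f3: float
    f4: float
    f5: int
    f6: float


def rating_conversion_rate(r: ReviewerStats) -> float:
    """Total ratings divided by total downloads (0.0 if nothing was downloaded)."""
    if r.total_downloads == 0:
        return 0.0
    return r.total_ratings / r.total_downloads


def is_suspicious(r: ReviewerStats, t: Thresholds) -> bool:
    """True if the user meets any one of the three conditions."""
    fast_first_reviews = (r.total_ratings > t.f2
                          and r.avg_first_review_delay_hours < t.f3)
    mostly_extreme = (r.total_ratings > t.f2
                      and r.historical_extreme_rate > t.f4)
    rates_almost_everything = (r.total_downloads > t.f5
                               and rating_conversion_rate(r) > t.f6)
    return fast_first_reviews or mostly_extreme or rates_almost_everything
```

Keeping the thresholds in one object makes it easy to re-tune them when new experimental data arrive.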
The machine learning method in Step 2 is linear regression, logistic regression, a decision tree, a support vector machine, naive Bayes, the K-nearest-neighbors algorithm, the K-means algorithm, the random forest algorithm, a dimensionality-reduction algorithm, Gradient Boosting or the AdaBoost algorithm.
The invention can identify the set of ranking-manipulation apps in an app store, giving store management a reliable safeguard. Given enough training samples, the method can detect ranking-manipulation apps in the store with high accuracy. As manipulation techniques keep evolving, agencies often adopt new ones; in that case, the method can adapt to the new patterns by taking new training samples as input.
Description of the drawings
Fig. 1 is a flow chart of the invention.
Detailed description
The invention is further described below with reference to the drawing and a specific embodiment, but embodiments of the invention are not limited thereto.
The invention comprises the following specific steps:
Embodiment 1
Define the feature vector of the app to be detected, V = [maximum extreme-rating run length, extreme-rating rate, number of deleted ratings, rating deletion rate, total number of ratings, number of abnormal ranking fluctuations, number of new suspicious reviewers per day, suspicious-reviewer ratio]. The data used by the feature vector are explained in detail below.
In the feature vector defined above:
The maximum extreme-rating run length is the length of the longest run of consecutive highest ratings, or of consecutive lowest ratings, that the app to be detected receives within the detection period.
The extreme-rating rate is the rate of highest ratings or the rate of lowest ratings that the app to be detected receives within the detection period.
The number of deleted ratings is the number of the app's ratings that have been deleted.
The rating deletion rate is the ratio of the number of deleted ratings to the total number of ratings.
The number of abnormal ranking fluctuations is the number of times within the detection period that the day-to-day change in the app's ranking exceeds threshold f1.
The number of new suspicious reviewers per day is the number of suspicious reviewers newly added each day.
The suspicious-reviewer ratio is the ratio of each day's new suspicious reviewers to the total number of that day's new reviewers.
A suspicious reviewer is a user who satisfies any one of the following conditions: the user's total number of ratings exceeds threshold f2 and the user's average time to first review is below threshold f3; the user's total number of ratings exceeds threshold f2 and the user's historical extreme-rating rate exceeds threshold f4; or the user's total number of app downloads exceeds threshold f5 and the user's rating conversion rate exceeds threshold f6.
The average time to first review is, over all apps the reviewer has ever reviewed, the average interval between downloading an app and posting the first review of it.
A user's rating conversion rate is the user's total number of ratings divided by the user's total number of app downloads.
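Several of these features can be computed directly from an app's rating sequence and daily ranks. The sketch below is one possible reading of the definitions, not the patent's implementation: the function names are hypothetical, a 1-to-5 rating scale is assumed, and where the text says "highest or lowest" the larger of the two values is taken.

```python
from typing import Sequence


def max_extreme_run_length(ratings: Sequence[int],
                           lowest: int = 1, highest: int = 5) -> int:
    """Longest run of consecutive identical extreme ratings (all highest or all lowest)."""
    best = run = 0
    prev = None
    for r in ratings:
        if r in (lowest, highest):
            # Extend the run only if the previous rating was the same extreme value.
            run = run + 1 if r == prev else 1
        else:
            run = 0
        prev = r
        best = max(best, run)
    return best


def extreme_rating_rate(ratings: Sequence[int],
                        lowest: int = 1, highest: int = 5) -> float:
    """Larger of the highest-rating share and the lowest-rating share."""
    if not ratings:
        return 0.0
    high = sum(1 for r in ratings if r == highest)
    low = sum(1 for r in ratings if r == lowest)
    return max(high, low) / len(ratings)


def rating_deletion_rate(deleted_ratings: int, total_ratings: int) -> float:
    """Deleted ratings divided by total ratings (0.0 if there are no ratings)."""
    return deleted_ratings / total_ratings if total_ratings else 0.0


def ranking_anomaly_count(daily_ranks: Sequence[int], f1: int) -> int:
    """Number of day-to-day rank changes whose magnitude exceeds threshold f1."""
    return sum(1 for a, b in zip(daily_ranks, daily_ranks[1:]) if abs(b - a) > f1)
```

For instance, the rating sequence `[5, 5, 5, 3, 1, 1, 5]` contains a run of three consecutive highest ratings and a run of two consecutive lowest ratings, so its maximum extreme-rating run length under this reading is 3.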
The feature vectors (column vectors) of the training samples are obtained from the app store, for example:
$v_1 = (30,\ 75\%,\ 50,\ 33\%,\ 150,\ 20,\ 10,\ 50\%)^T$
$v_2 = (10,\ 61\%,\ 3,\ 3\%,\ 100,\ 10,\ 2,\ 2\%)^T$
$v_3 = (19,\ 5\%,\ 50,\ 33\%,\ 150,\ 8,\ 2,\ 50\%)^T$
where the superscript T denotes the transpose.
All the feature vectors are then assembled into a matrix.
After the feature vector of each app has been obtained, each feature vector is labeled manually. In this example, the apps corresponding to vectors $v_1$ and $v_3$ are labeled as ranking-manipulation apps, and the app corresponding to $v_2$ as a non-manipulation app, that is, $y = (1,\ 0,\ 1)^T$, where the superscript T denotes the transpose.
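The feature order and the labeling convention (1 for a ranking-manipulation app, 0 otherwise) can be sketched as follows. The names `AppFeatures` and `build_training_set` are hypothetical, percentages are written as fractions, and the values are the first two example vectors as read from the text above.

```python
from typing import List, NamedTuple, Sequence, Tuple


class AppFeatures(NamedTuple):
    """The eight features, in the order used by the feature vector V."""
    max_extreme_run_length: float
    extreme_rating_rate: float
    deleted_ratings: float
    rating_deletion_rate: float
    total_ratings: float
    ranking_anomaly_count: float
    daily_new_suspicious_reviewers: float
    suspicious_reviewer_ratio: float


def build_training_set(
    labeled: Sequence[Tuple[AppFeatures, int]]
) -> Tuple[List[List[float]], List[int]]:
    """Split (features, label) pairs into a row-per-app matrix and a label list."""
    rows = [list(features) for features, _ in labeled]
    labels = [label for _, label in labeled]
    return rows, labels


# Example vectors from the embodiment, percentages expressed as fractions.
v1 = AppFeatures(30, 0.75, 50, 0.33, 150, 20, 10, 0.50)  # labeled 1 (manipulated)
v2 = AppFeatures(10, 0.61, 3, 0.03, 100, 10, 2, 0.02)    # labeled 0 (normal)
X, y = build_training_set([(v1, 1), (v2, 0)])
```

Storing one app per row matches the later convention in which the superscript (i) selects a row and the subscript j selects a column.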
Logistic regression is used to learn from the feature vectors of the training samples and generate the ranking-manipulation detection model. The specific operations are as follows:
The prediction function is defined as:
$$h_\theta(V) = \frac{1}{1 + e^{-\theta^{T} V}}$$
where V is the feature vector, θ is a parameter column vector with the same dimension as V, e is the base of the natural logarithm, and T denotes the transpose.
The features are then learned: gradient descent is used to iterate on the parameter vector θ. The specific operation is as follows:
(every θj is updated simultaneously in each iteration)
where m is the number of feature vectors, α is the learning rate, the superscript (i) denotes the i-th row of the matrix, and the subscript j denotes the j-th column of the matrix.
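A sketch of the update, assuming the standard batch gradient-descent rule for logistic regression with a cross-entropy cost. This is an assumption rather than a formula taken from the patent: the symbols m, α, (i) and j are used as defined above, and the cost function J is introduced here only for illustration.

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y^{(i)} \log h_\theta\big(V^{(i)}\big)
            + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(V^{(i)})\big) \Big]

\theta_j := \theta_j - \alpha \frac{\partial J}{\partial \theta_j}
          = \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m}
            \big( h_\theta(V^{(i)}) - y^{(i)} \big)\, V^{(i)}_j
```

Under this reading, each iteration computes all m prediction errors with the current θ and only then updates every θj, which is what "updated simultaneously" requires.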
When the iteration converges, the parameter vector θ = [θ1, θ2, θ3, θ4, θ5, θ6, θ7, θ8] is obtained.
With the logistic regression parameters obtained, the learning process is complete.
Detection then proceeds as follows. When the feature vector V′ = [a, b, c, d, e, f, g, h] of the app to be tested is input, the decision value is calculated:
$$h_\theta(V') = \frac{1}{1 + e^{-\theta^{T} V'}}$$
If h_θ(V′) > 0.5, the app is a ranking-manipulation app; otherwise it is a normal app.
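The training and detection steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, learning rate and iteration count are hypothetical, the update is the standard batch gradient-descent rule for logistic regression, and the toy feature values are assumed to have been standardized beforehand (a preprocessing step the patent does not mention). As in the text, θ has the same dimension as V, so no bias term is used.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^-z), arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def h(theta: Sequence[float], v: Sequence[float]) -> float:
    """Prediction function h_theta(V) = sigmoid(theta^T V)."""
    return sigmoid(sum(t * x for t, x in zip(theta, v)))


def train(samples: Sequence[Sequence[float]], labels: Sequence[int],
          alpha: float = 0.1, iterations: int = 1000) -> List[float]:
    """Batch gradient descent; every theta_j is updated simultaneously."""
    m, n = len(samples), len(samples[0])
    theta = [0.0] * n
    for _ in range(iterations):
        # Errors are computed with the current theta before any component changes.
        errors = [h(theta, v) - y for v, y in zip(samples, labels)]
        theta = [
            theta[j] - alpha / m * sum(e * v[j] for e, v in zip(errors, samples))
            for j in range(n)
        ]
    return theta


def is_ranking_manipulation(theta: Sequence[float], v: Sequence[float]) -> bool:
    """Decision rule from the text: manipulated if h_theta(V') > 0.5."""
    return h(theta, v) > 0.5


# Hypothetical standardized training data: two manipulated apps, two normal apps.
samples = [
    [1.2, 0.9, 1.0, 0.8, 0.3, 1.1, 0.7, 1.0],
    [0.8, 0.4, 1.1, 0.9, 0.2, 0.6, 0.5, 1.2],
    [-0.9, -0.5, -1.0, -0.8, -0.2, -0.7, -0.6, -1.1],
    [-1.1, -0.8, -1.1, -0.9, -0.3, -1.0, -0.6, -1.1],
]
labels = [1, 1, 0, 0]
theta = train(samples, labels)
```

With real, unscaled features (counts in the hundreds next to rates below 1), a fixed learning rate may converge slowly or diverge, which is why this sketch assumes standardized inputs.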
Although an illustrative embodiment of the invention has been described above so that those skilled in the art can understand the invention, the described embodiment is evidently only one of the embodiments of the invention rather than all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the invention, without creative effort, fall within the scope of protection of the invention.

Claims (4)

1. A machine-learning-based method for detecting ranking-manipulation apps in app stores, the method comprising:
Step 1: Define the feature vector of the app to be detected, V = [maximum extreme-rating run length, extreme-rating rate, number of deleted ratings, rating deletion rate, total number of ratings, number of abnormal ranking fluctuations, number of new suspicious reviewers per day, suspicious-reviewer ratio];
Step 2: Obtain the feature vectors of training samples from the app store and learn from the feature vector of each training sample using a machine learning method, generating a ranking-manipulation detection model;
Step 3: Use the detection model to examine the app to be detected and determine whether it is a ranking-manipulation app; wherein:
The maximum extreme-rating run length is the length of the longest run of consecutive highest ratings, or of consecutive lowest ratings, that the app to be detected receives within the detection period;
The extreme-rating rate is the rate of highest ratings or the rate of lowest ratings that the app to be detected receives within the detection period;
The number of new suspicious reviewers per day is the number of suspicious reviewers newly added each day;
The number of abnormal ranking fluctuations is the number of times within the detection period that the day-to-day change in the app's ranking exceeds threshold f1;
A suspicious reviewer is a user who satisfies any one of the following conditions:
The user's total number of ratings exceeds threshold f2 and the user's average time to first review is below threshold f3;
The user's total number of ratings exceeds threshold f2 and the user's historical extreme-rating rate exceeds threshold f4;
The user's total number of app downloads exceeds threshold f5 and the user's rating conversion rate exceeds threshold f6.
2. The detection method according to claim 1, characterized in that the average time to first review is, over all apps the reviewer has ever reviewed, the average interval between downloading an app and posting the first review of it.
3. The detection method according to claim 1, characterized in that the user's rating conversion rate is the user's total number of ratings divided by the user's total number of app downloads.
4. The detection method according to claim 1, characterized in that the machine learning method is linear regression, logistic regression, a decision tree, a support vector machine, naive Bayes, the K-nearest-neighbors algorithm, the K-means algorithm, the random forest algorithm, a dimensionality-reduction algorithm, Gradient Boosting or the AdaBoost algorithm.
CN201711265795.9A 2017-12-05 2017-12-05 Machine-learning-based method for detecting ranking-manipulation apps in app stores Pending CN108228450A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711265795.9A CN108228450A (en) 2017-12-05 2017-12-05 Machine-learning-based method for detecting ranking-manipulation apps in app stores

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711265795.9A CN108228450A (en) 2017-12-05 2017-12-05 Machine-learning-based method for detecting ranking-manipulation apps in app stores

Publications (1)

Publication Number Publication Date
CN108228450A (en) 2018-06-29

Family

ID=62653767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711265795.9A Pending CN108228450A (en) 2017-12-05 2017-12-05 Machine-learning-based method for detecting ranking-manipulation apps in app stores

Country Status (1)

Country Link
CN (1) CN108228450A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110096632A (en) * 2019-04-16 2019-08-06 East China Normal University Ranking-manipulator detection method based on a sparse autoencoder

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912599A (en) * 2016-03-31 2016-08-31 维沃移动通信有限公司 Ranking method and terminal of terminal application programs
CN107391548A (en) * 2017-04-06 2017-11-24 East China Normal University Method and system for detecting ranking-manipulation user groups in mobile app markets

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAO CHEN: "Toward Detecting Collusive Ranking Manipulation", 《ACM》 *
HENGSHU ZHU: "Ranking Fraud Detection for Mobile Apps: A Holistic View", 《ACM》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 2018-06-29)