CN108229131A - Counterfeit APP recognition method and device - Google Patents

Counterfeit APP recognition method and device

Info

Publication number
CN108229131A
Authority
CN
China
Prior art keywords
app
identified
feature
information
legal
Prior art date
Legal status
Pending
Application number
CN201611153579.0A
Other languages
Chinese (zh)
Inventor
常玲
邱勤
薛姗
赵蓓
杜雪涛
张琳
马力鹏
吴日切夫
于雷
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Design Institute Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Design Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Design Institute Co Ltd
Priority to CN201611153579.0A
Publication of CN108229131A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/44 Program or device authentication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention disclose a counterfeit APP recognition method and device. The method includes: step S1, obtaining the installation package of an APP to be identified and extracting target feature information from the installation package; step S2, performing, according to the target feature information, similarity analysis between the APP to be identified and the legitimate APPs in a pre-established APP information library, to obtain the similarity between the APP to be identified and the legitimate APPs; step S3, judging, according to the similarity and a preset decision rule, whether the APP to be identified is a counterfeit APP. The embodiments automatically obtain the target feature information of the APP to be identified and analyze the APP based on that information to judge whether it is counterfeit. Compared with the prior art, this reduces manual intervention in the system and offers high recognition efficiency.

Description

Counterfeit APP recognition method and device
Technical field
The embodiments of the present invention relate to the field of communication technology, and in particular to a counterfeit APP recognition method and device.
Background
The mobile application market is developing rapidly and attracting more and more developers. Individual developers and small teams are numerous, which has led to market disorder and the appearance of a large number of counterfeit APPs. The typical behaviors of counterfeit APPs fall into the following three categories:
1. Copying the icon or name of a popular application to increase the popularity of one's own application;
2. Replacing the advertising supplier of the original application with another supplier to obtain the advertising revenue;
3. Inserting malicious code into a plagiarized application to obtain users' private data, steal users' money, account passwords, and so on.
The social harm caused by counterfeit APPs falls into the following three categories:
1. Counterfeit APPs are used for fraud: for example, a knock-off "Alipay" makes users' shopping payments flow directly into the fraudster's personal account;
2. Counterfeit APPs are used for theft: after obtaining users' various account passwords, criminals not only use them to steal but also sell them to others, causing users heavy losses;
3. Counterfeit APPs easily leak users' personal privacy, including mailing addresses, phone numbers, and browsing history, which criminals sell to others for profit.
In the course of realizing the embodiments of the present invention, the inventors found that existing counterfeit APP identification schemes usually rely on dedicated monitoring personnel who judge from experience whether an APP is counterfeit. However, subjective factors strongly affect the results of such schemes, which reduces the accuracy of the judgment, and the judgment efficiency is also low.
Summary of the invention
One objective of the embodiments of the present invention is to solve the problems of low judgment accuracy and low judgment efficiency caused by prior-art counterfeit APP identification schemes that rely on expert experience.
An embodiment of the present invention provides a counterfeit APP recognition method, including:
Step S1: obtaining the installation package of an APP to be identified, and extracting target feature information from the installation package;
Step S2: performing, according to the target feature information, similarity analysis between the APP to be identified and the legitimate APPs in a pre-established APP information library, to obtain the similarity between the APP to be identified and the legitimate APPs;
Step S3: judging, according to the similarity and a preset decision rule, whether the APP to be identified is a counterfeit APP.
Optionally, before step S1, the method further includes:
Step S0: crawling the pages of all newly added APPs on APP publishing channels, and taking all crawled newly added APPs as APPs to be identified;
extracting the basic information from the page of each APP to be identified;
Correspondingly, obtaining the installation package of the APP to be identified includes:
obtaining the installation package of the APP to be identified according to its basic information.
Optionally, after step S0 and before step S1, the method further includes:
Step S0': screening related APPs out of the APPs to be identified according to their basic information and a preset matching strategy.
Optionally, the basic information includes: the application name;
Before step S0', the method further includes:
performing word segmentation on the application names of all legitimate APPs in the pre-established APP information library, to obtain the feature word sequence corresponding to each application name;
performing statistical analysis on the feature word sequences of the application names according to the positional order of the feature words, and selecting feature words for each position;
Correspondingly, step S0' includes:
matching, according to the positional order of the feature words, the feature words in the feature word sequence of the APP to be identified against the feature words selected for each position, to obtain a matching degree;
if it is determined that the matching degree is greater than a first preset threshold, confirming that the APP to be identified is a related APP.
Optionally, the basic information includes: the application description;
Before step S0', the method further includes:
classifying all legitimate APPs in the pre-established APP information library according to business function;
performing word segmentation on the application descriptions of each class of legitimate APPs, to obtain a full feature set;
performing statistical analysis on the feature words in the full feature set, to obtain the feature set of each class of legitimate APPs and the weight of each feature word in the feature set;
Correspondingly, step S0' includes:
matching the feature words in the feature set of the APP to be identified separately against the feature words in the feature set of each class of legitimate APPs;
if it is determined that the sum of the weights of the matched feature words is greater than a second preset threshold, confirming that the APP to be identified is a related APP.
Optionally, step S3 includes:
if it is determined that the similarity between the APP to be identified and the legitimate APP is less than a first similarity threshold, judging that the APP to be identified is a counterfeit APP.
Optionally, step S2 includes:
performing, according to the target feature information, similarity analysis between the APP to be identified and both the legitimate APPs and the counterfeit APPs in the pre-established APP information library, to obtain a first similarity between the APP to be identified and the legitimate APPs and a second similarity between the APP to be identified and the counterfeit APPs.
Optionally, step S3 includes:
if it is determined that the first similarity is less than a second similarity threshold and the second similarity is greater than or equal to a third similarity threshold, judging that the APP to be identified is a counterfeit APP;
Alternatively,
if it is determined that the first similarity is greater than or equal to the third similarity threshold and the second similarity is less than the second similarity threshold, judging that the APP to be identified is a legitimate APP.
Optionally, the pre-established APP information library includes a legitimate APP library and a counterfeit APP library;
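The two-library decision rule can be sketched as a small function. This is an illustration under stated assumptions, not the patented implementation: the names `Verdict` and `judge_two_libraries` are hypothetical, and the `UNDECIDED` outcome is an assumed fallback, since the text does not say what happens when neither condition holds.

```python
from enum import Enum


class Verdict(Enum):
    COUNTERFEIT = "counterfeit"
    LEGITIMATE = "legitimate"
    # Hypothetical fallback: the text does not say what happens when neither
    # condition holds, so this sketch leaves the APP undecided.
    UNDECIDED = "undecided"


def judge_two_libraries(first_sim: float, second_sim: float,
                        second_threshold: float, third_threshold: float) -> Verdict:
    """first_sim: similarity to the legitimate APP library.
    second_sim: similarity to the counterfeit APP library."""
    # Far from the legitimate APPs and close to known counterfeits.
    if first_sim < second_threshold and second_sim >= third_threshold:
        return Verdict.COUNTERFEIT
    # Close to the legitimate APPs and far from known counterfeits.
    if first_sim >= third_threshold and second_sim < second_threshold:
        return Verdict.LEGITIMATE
    return Verdict.UNDECIDED
```

With illustrative thresholds of 0.4 and 0.8, an APP scoring 0.2 against the legitimate library and 0.9 against the counterfeit library is judged counterfeit.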
Correspondingly, the method further includes:
Step S4: storing, according to the judgment result, the APP to be identified and its related information in the legitimate APP library or the counterfeit APP library.
An embodiment of the present invention further provides a counterfeit APP recognition device, including:
an acquisition module, configured to obtain the installation package of an APP to be identified and extract target feature information from the installation package;
an analysis module, configured to perform, according to the target feature information, similarity analysis between the APP to be identified and the legitimate APPs in a pre-established APP information library, to obtain the similarity between the APP to be identified and the legitimate APPs;
a judgment module, configured to judge, according to the similarity and a preset decision rule, whether the APP to be identified is a counterfeit APP.
Optionally, the device further includes a preprocessing module;
the preprocessing module is configured to crawl the pages of all newly added APPs on APP publishing channels, take all crawled newly added APPs as APPs to be identified, and extract the basic information from the page of each APP to be identified.
Optionally, the device further includes a screening module;
the screening module is configured to screen related APPs out of the APPs to be identified according to their basic information and a preset matching strategy;
Correspondingly, the acquisition module is configured to obtain the installation package of each related APP according to the basic information of the related APP.
Optionally, the basic information includes: the application name;
the device further includes a first processing module;
the first processing module is configured to perform word segmentation on the application names of all legitimate APPs in the pre-established APP information library to obtain the feature word sequence of each application name, perform statistical analysis on these sequences according to the positional order of the feature words, and select feature words for each position;
Correspondingly, the screening module is configured to match, according to the positional order of the feature words, the feature words in the feature word sequence of the APP to be identified against the selected feature words to obtain a matching degree, and, if the matching degree is greater than the first preset threshold, confirm that the APP to be identified is a related APP.
Optionally, the basic information includes: the application description;
the device further includes a second processing module;
the second processing module is configured to classify all legitimate APPs in the pre-established APP information library according to business function; perform word segmentation on the application descriptions of each class of legitimate APPs to obtain a full feature set; and perform statistical analysis on the feature words in the full feature set to obtain the feature set of each class of legitimate APPs and the weight of each feature word in the feature set;
Correspondingly, the screening module is configured to match the feature words in the feature set of the APP to be identified separately against the feature words in the feature set of each class of legitimate APPs, and, if the sum of the weights of the matched feature words is greater than the second preset threshold, confirm that the APP to be identified is a related APP.
Optionally, the pre-established APP information library includes a legitimate APP library and a counterfeit APP library;
Correspondingly, the device further includes an optimization module;
the optimization module is configured to store, according to the judgment result, the APP to be identified and its related information in the legitimate APP library or the counterfeit APP library.
As can be seen from the above technical solutions, the counterfeit APP recognition method and device provided by the embodiments of the present invention automatically obtain the target feature information of an APP to be identified and analyze the APP based on that information to judge whether it is a counterfeit APP. Compared with the prior art, this reduces manual intervention in the system and improves recognition efficiency.
Description of the drawings
The features and advantages of the present invention can be understood more clearly with reference to the accompanying drawings, which are schematic and should not be construed as limiting the present invention in any way. In the drawings:
Fig. 1 is a schematic flowchart of a counterfeit APP recognition method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a counterfeit APP recognition method provided by another embodiment of the present invention;
Fig. 3 is a schematic flowchart of the matching strategy in a counterfeit APP recognition method provided by an embodiment of the present invention;
Fig. 4 is a schematic flowchart of the application name matching strategy in a counterfeit APP recognition method provided by an embodiment of the present invention;
Fig. 5 is a schematic flowchart of the application description matching strategy in a counterfeit APP recognition method provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the generation principle of the application description matching strategy in a counterfeit APP recognition method provided by an embodiment of the present invention;
Fig. 7 is a schematic flowchart of a counterfeit APP recognition method provided by yet another embodiment of the present invention;
Fig. 8 is a structural block diagram of a counterfeit APP recognition device provided by an embodiment of the present invention;
Fig. 9 is a structural block diagram of a counterfeit APP recognition device provided by another embodiment of the present invention;
Fig. 10 is a structural block diagram of a counterfeit APP recognition device provided by yet another embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.
Embodiment one
Fig. 1 is a schematic flowchart of a counterfeit APP recognition method provided by an embodiment of the present invention. Referring to Fig. 1, the method is implemented by a processor and specifically includes the following steps:
110. Obtain the installation package of the APP to be identified, and extract target feature information from the installation package.
It should be noted that a functional module of the processor automatically downloads the installation package of the APP to be identified through an available channel and extracts the target feature information from it based on preconfigured parameters. The target feature information here includes: the main configuration file, resource files, decompiled Smali files, key sequence boundary graphs, developer and certificate information, and so on.
120. Perform, according to the target feature information, similarity analysis between the APP to be identified and the legitimate APPs in the pre-established APP information library, to obtain the similarity between the APP to be identified and each legitimate APP.
It should be noted that the similarity analysis compares the target feature information of the APP to be identified with that of the legitimate APPs. That is, the main configuration file, resource files, decompiled Smali files, key sequence boundary graphs, developer and certificate information, etc. of the APP to be identified are compared item by item with the corresponding items of one APP among the legitimate and counterfeit APPs, yielding the similarity between the APP to be identified and that APP. The APP to be identified is then compared with the next legitimate APP, and so on, until it has been compared with every legitimate APP, giving the similarity between each legitimate APP and the APP to be identified.
130. Judge, according to the similarity and the preset decision rule, whether the APP to be identified is a counterfeit APP.
It should be noted that, based on the similarities obtained in step 120, the preset decision rule may compare the similarity between each legitimate APP and the APP to be identified with preset similarity thresholds: if the similarity between the APP to be identified and the legitimate APP is less than a first similarity threshold, the APP to be identified is judged to be a counterfeit APP; if the similarity is greater than or equal to a second similarity threshold, it is judged to be a legitimate APP; and if the similarity is greater than or equal to the first similarity threshold but less than the second, the APP is passed on for manual identification.
It can be seen that this embodiment automatically obtains the target feature information of the APP to be identified and analyzes the APP based on that information to judge whether it is counterfeit. Compared with the prior art, this reduces manual intervention in the system and improves recognition efficiency.
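An Android installation package (APK) is a ZIP archive, so some of the raw material for the target feature information can be pulled out with a standard ZIP reader. The sketch below is an assumption-laden illustration, not the patent's extractor. The function name `extract_apk_features`, the SHA-256 fingerprint for each entry, and the grouping of entries are all hypothetical choices. Decompiled Smali files would come from a separate decompiler such as apktool. Key sequence boundary graphs and certificate parsing are not covered here. Only the v1 (JAR) signature files under `META-INF/` are listed.

```python
import hashlib
import io
import zipfile


def extract_apk_features(apk_bytes: bytes) -> dict:
    """Group APK entries into rough feature items, fingerprinting each by SHA-256."""
    features = {"manifest": None, "resources": {}, "dex": {}, "signature_files": []}
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        for info in apk.infolist():
            if info.is_dir():
                continue
            name = info.filename
            digest = hashlib.sha256(apk.read(name)).hexdigest()
            if name == "AndroidManifest.xml":      # main configuration file
                features["manifest"] = digest
            elif name.startswith("res/") or name == "resources.arsc":
                features["resources"][name] = digest
            elif name.endswith(".dex"):            # bytecode that Smali is decompiled from
                features["dex"][name] = digest
            elif name.startswith("META-INF/"):     # v1 signature / certificate entries
                features["signature_files"].append(name)
    return features
```

Hashing whole entries only catches byte-identical copies. That is enough to spot reused resources, but a real system would also need fuzzier comparisons for repackaged code.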
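The item-by-item comparison can be sketched by treating each feature item as a set of fingerprints and scoring each item with the Jaccard index. The patent does not say how the per-item comparisons are combined. Here they are assumed to be averaged with equal weight, and the function names are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def app_similarity(candidate: dict, reference: dict) -> float:
    """Average the per-item Jaccard scores over every feature item either APP has."""
    items = candidate.keys() | reference.keys()
    if not items:
        return 0.0
    total = sum(jaccard(set(candidate.get(k, ())), set(reference.get(k, ())))
                for k in items)
    return total / len(items)


def similarity_to_library(candidate: dict, library: dict) -> dict:
    """Compare the candidate with each library APP in turn, as step 120 describes."""
    return {app_id: app_similarity(candidate, features)
            for app_id, features in library.items()}
```

For example, a candidate that shares all resource fingerprints with a library APP but carries a different certificate scores 0.5 against it under this equal-weight assumption.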
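The three-way rule of step 130 can be sketched as follows. One assumption needs stating up front. The text compares the candidate with every legitimate APP, but an unrelated legitimate APP would almost always fall below the first threshold. So this sketch bases the verdict on the single most similar legitimate APP. The function name `decide` is hypothetical.

```python
def decide(similarities: dict, first_threshold: float, second_threshold: float):
    """Return (closest legitimate APP, verdict) for one APP to be identified.

    Assumption: the verdict is based on the most similar legitimate APP.
    """
    if not similarities:
        raise ValueError("no legitimate APPs to compare against")
    closest, score = max(similarities.items(), key=lambda item: item[1])
    if score < first_threshold:
        return closest, "counterfeit"
    if score >= second_threshold:
        return closest, "legitimate"
    # Between the two thresholds: hand over to a human reviewer.
    return closest, "manual review"
```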
Embodiment two
Fig. 2 is a schematic flowchart of a counterfeit APP recognition method provided by another embodiment of the present invention. Referring to Fig. 2, the method is implemented by a processor and specifically includes the following steps:
210. Crawl the pages of all newly added APPs on APP publishing channels, and take all crawled newly added APPs as APPs to be identified.
It should be noted that, based on preconfigured parameters, a functional module of the processor automatically and periodically crawls the pages of all newly added APPs on specified publishing channels, for example crawling a given publishing channel during a scheduled time period each day.
It can be understood that there are many APP publishing channels, which are not enumerated here.
220. Extract the basic information from the page of each APP to be identified.
It can be understood that the page of an APP carries a lot of basic information, such as the APP's identifier, application name, application description, and graphical interface; this basic information can be extracted by image recognition or by scanning the page.
230. Screen related APPs out of the APPs to be identified according to their basic information and a preset matching strategy.
It can be understood that a very large number of newly added APPs may be crawled from one or several publishing channels within a scheduled time period. Only some of them need to be monitored, while the rest do not affect the legitimate APPs owned by the enterprise. The APPs to be identified are therefore screened first, which reduces the number of APPs to be identified and hence the amount of data to process, improving recognition efficiency.
In addition, the preset matching strategy can be determined according to the basic information.
240. Obtain the installation package of each related APP according to its basic information, and extract target feature information from the installation package.
It should be noted that, owing to the screening in step 230, the processor only needs to perform feature extraction on the related APPs.
250. Perform, according to the target feature information, similarity analysis between each related APP and the legitimate and counterfeit APPs in the pre-established APP information library, to obtain the similarity between the related APP and the legitimate APPs.
260. Judge, according to the similarity and the preset decision rule, whether the related APP is a counterfeit APP.
It should be noted that steps 250 and 260 are the same as steps 120 and 130 in Embodiment one, respectively, and are therefore not described again here.
It can be seen that this embodiment automatically crawls APPs to be identified and screens them, and the screening does not require downloading their installation files. This improves both the degree of automation of the identification scheme and the efficiency of identification.
Embodiment three
Fig. 3 is a schematic flowchart of the matching strategy in a counterfeit APP recognition method provided by an embodiment of the present invention. The preset matching strategy of Embodiment two is described in detail below with reference to Fig. 3:
301. Obtain the basic information of the APP to be identified, including the application name, application description, and developer information.
302. Preprocess the basic information of the newly added APP: perform Chinese word segmentation on the application name to obtain feature word set A, on the application description to obtain feature word set B, and on the developer information to obtain feature word set C.
303. Match feature word set A of the application name against the preset application name matching strategy.
304. Judge whether the number of successfully matched feature words exceeds the threshold preset by the system, i.e., whether the similarity exceeds the preset threshold. If so, confirm that the newly added APP is related to the enterprise; if not, perform step 307.
305. Match feature word set B of the application description and feature word set C of the developer information against the preset developer information matching strategy.
306. Judge whether the number of successfully matched feature words exceeds the threshold preset by the system. If so, the newly added APP is considered related to the enterprise; if not, perform step 307.
307. Match the application name against the preset similar application name matching strategy.
308. Judge whether the number of successfully matched feature words exceeds the threshold preset by the system. If so, perform step 309; if not, confirm that the newly added APP is unrelated to the enterprise.
309. Match feature word set B of the application description against the preset application description matching strategy, and compute its matching degree with each class in that strategy.
310. If the matching degree for at least one class reaches the threshold, confirm that the application description falls within the enterprise's business scope, and finally determine that the newly added APP is related to the enterprise.
It can be seen that in this embodiment, matching and screening the APPs to be identified does not require downloading their installation files, which improves recognition efficiency. Moreover, configuring a different matching strategy for each item of basic information of the APP to be identified improves recognition efficiency further.
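The control flow of steps 303 to 310 can be sketched with each preset matching strategy passed in as a function. The sketch is deliberately agnostic about how each strategy scores. The dataclass `Strategies`, its field names, and the use of one shared `threshold` for every step are hypothetical simplifications. In the text, each step has its own system-preset threshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Words = List[str]


@dataclass
class Strategies:
    # Each callable is a hypothetical stand-in for one preset matching strategy.
    app_name: Callable[[Words], float]                  # step 303, uses set A
    developer: Callable[[Words, Words], float]          # step 305, uses sets B and C
    similar_name: Callable[[Words], float]              # step 307, uses set A
    intro_classes: Callable[[Words], Dict[str, float]]  # step 309, score per class


def is_related(a: Words, b: Words, c: Words, s: Strategies, threshold: float) -> bool:
    # Steps 303-306: a name match or a developer-information match is enough.
    if s.app_name(a) > threshold or s.developer(b, c) > threshold:
        return True
    # Steps 307-308: otherwise require a similar-name hit first.
    if s.similar_name(a) <= threshold:
        return False
    # Steps 309-310: related if any business class reaches the threshold.
    return any(score >= threshold for score in s.intro_classes(b).values())
```

Keeping the strategies pluggable mirrors the text's point that each item of basic information gets its own matching strategy.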
Embodiment four
Fig. 4 is a schematic flowchart of the application name matching strategy in a counterfeit APP recognition method provided by an embodiment of the present invention. Referring to Fig. 4, the application name matching strategy includes:
410. Perform word segmentation on the application names of all legitimate APPs in the pre-established APP information library, to obtain the feature word sequence corresponding to each application name.
It should be noted that there are many word segmentation methods, such as string-matching segmentation, semantic segmentation, and statistical segmentation.
420. Perform statistical analysis on the feature word sequences of the application names according to the positional order of the feature words, and select feature words for each position.
It should be noted that a feature word sequence contains multiple feature words together with the position of each. For example, segmenting "don't know what you are saying" yields a feature word sequence whose first feature word is "don't know", second is "you", third is "are", and fourth is "saying what".
Then, based on the feature word sequences of the legitimate APPs, statistical analysis is performed separately on the feature words at each position, as follows:
Count the legitimate APPs whose application names share the same first feature word; if the number of such APPs is greater than a threshold N, generate a similar application name matching strategy from this feature word;
Count the legitimate APPs whose application names share the same second feature word; if the number of such APPs is greater than the threshold N, generate a similar application name matching strategy from this feature word;
Count the legitimate APPs whose application names share the same third feature word; if the number of such APPs is greater than the threshold N, generate a similar application name matching strategy from this feature word;
Combining the three cases, in which the feature word occupies the first, second, or third position of the application name, forms the similar application name matching strategy set.
430. Match, according to the positional order of the feature words, the feature words in the feature word sequence of the APP to be identified against the feature words selected for each position, to obtain a matching degree.
It should be noted that the application name of the APP to be identified is segmented to obtain its feature word sequence, and the feature words at corresponding positions are matched respectively.
440. If it is determined that the matching degree is greater than the first preset threshold, confirm that the APP to be identified is a related APP, where the first preset threshold is a configurable parameter.
It can be seen that the application name matching strategy proposed in this embodiment segments the application name of the APP to be identified and matches on both the segmented feature words and their positions. This effectively improves matching precision, so that the relevant APPs are screened out of all APPs to be identified more accurately, which in turn improves the efficiency of APP identification.
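Forward maximum matching is one classic string-matching segmentation method. It scans left to right and takes the longest dictionary word at each position. The sketch below uses a tiny hypothetical dictionary, and `forward_max_match` is an illustrative name. A production system would use a full lexicon or an established Chinese segmenter instead.

```python
def forward_max_match(text: str, dictionary: set, max_len: int) -> list:
    """Greedy left-to-right segmentation using the longest dictionary word at each step."""
    words, i = [], 0
    while i < len(text):
        # Try the longest window first, shrinking until a dictionary word is found;
        # a single character is always accepted so the scan keeps moving.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words
```

The index of each word in the returned list is exactly the "position" that step 420 analyzes.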
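Building the similar application name matching strategy set amounts to a per-position frequency count over the segmented names of the legitimate APPs. The sketch assumes names are already segmented into word lists. The function name and the example tokens are hypothetical. Each APP contributes at most one word per position, so each count equals the number of legitimate APPs sharing that word at that position.

```python
from collections import Counter


def build_positional_strategies(segmented_names: list, n_threshold: int,
                                positions: int = 3) -> dict:
    """Map each position (0 = first word) to the words shared by more than N APPs."""
    strategies = {}
    for pos in range(positions):
        counts = Counter(words[pos] for words in segmented_names if len(words) > pos)
        strategies[pos] = {word for word, count in counts.items() if count > n_threshold}
    return strategies
```

With names such as ["Acme", "Pay"], ["Acme", "Music"], ["Acme", "Video"], and ["Star", "Music"] and N = 1, the first position keeps "Acme" and the second keeps "Music".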
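The text does not define how the matching degree of step 430 is computed. One plausible reading is sketched below under that assumption. The matching degree is the share of checked positions whose candidate word appears in that position's strategy set. Only positions that have strategy words and that the candidate name reaches are checked. Both function names are hypothetical.

```python
def positional_match_degree(candidate_words: list, strategies: dict) -> float:
    """Share of checked positions whose candidate word is in that position's strategy set.

    Assumption: only positions with at least one strategy word that the candidate
    name actually reaches are checked.
    """
    checked = [pos for pos, words in strategies.items()
               if words and pos < len(candidate_words)]
    if not checked:
        return 0.0
    hits = sum(1 for pos in checked if candidate_words[pos] in strategies[pos])
    return hits / len(checked)


def is_related_by_name(candidate_words: list, strategies: dict,
                       first_threshold: float) -> bool:
    # Step 440: related only if the matching degree exceeds the first preset threshold.
    return positional_match_degree(candidate_words, strategies) > first_threshold
```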
Embodiment five
Fig. 5, which shows to apply in a kind of counterfeit APP recognition methods that one embodiment of the invention provides, introduces matching strategy Flow diagram referring to Fig. 5, includes using matching strategy is introduced:
510th, classified according to business function to all legal copy APP in pre-established APP information banks;
It should be noted that processor introduces according to business the application of all legal copy APP that the corresponding function module is sent Function is classified, such as the legal APP that telecom operators have by oneself classifies according to business function, can be divided into content class, Game class, tool-class, news category, social class, financial payment class and business hall class etc.;
520. Perform word segmentation on the application introductions of each class of legitimate APP to obtain the lemmas of all application introductions, and from these obtain the full feature set;
530. Perform statistical analysis on the feature words in the full feature set to obtain the feature set of each class of legitimate APP and the weight of each feature word in that feature set;
It should be noted that the various frequencies with which all lemmas occur are counted, such as the total frequency and the per-category frequency. These statistics serve as the basis for feature selection. The frequency of a given lemma can be calculated from the number of times the lemma occurs and the total number of lemmas.
Then, the feature words and weights of each category are extracted from the full feature set by a feature-word and weight selection algorithm, forming the application-introduction matching strategy set.
540. Match the feature words in the feature set of the APP to be identified separately against the feature words in the feature set of each class of legitimate APP;
550. If the sum of the weights of the matched feature words is determined to be greater than a second preset threshold, confirm that the APP to be identified is a related APP, where the second preset threshold is a configurable parameter.
It can be seen that the application-introduction matching strategy proposed in this embodiment uses word segmentation to process the application introduction of the APP to be identified, and performs matching based on the segmented feature words and their weights. This effectively improves matching precision, so that related APPs are filtered out of all APPs to be identified more accurately, which in turn improves the efficiency of APP identification.
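Steps 540 and 550 can be sketched as follows, assuming each category's feature set has already been reduced to a dictionary mapping feature word to weight and that the introduction of the APP to be identified has already been segmented. The category names, weights, and function names below are hypothetical examples, not values from the source.

```python
def introduction_scores(candidate_words, category_features):
    """Per category, sum the weights of feature words that also occur in the candidate."""
    words = set(candidate_words)
    return {
        category: sum(weight for word, weight in features.items() if word in words)
        for category, features in category_features.items()
    }


def is_related_by_introduction(candidate_words, category_features, second_threshold):
    """Step 550: related if any category's matched-weight sum exceeds the threshold."""
    scores = introduction_scores(candidate_words, category_features)
    return any(score > second_threshold for score in scores.values())


category_features = {
    "financial_payment": {"pay": 0.6, "wallet": 0.5, "bill": 0.3},
    "game": {"level": 0.7, "play": 0.4},
}
print(is_related_by_introduction(["mobile", "pay", "wallet"], category_features, 1.0))  # True
```

Each category is scored independently, so a single strongly matching category is enough to mark the APP as related, which matches the "any one category" rule described for this strategy.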
Embodiment six
Fig. 6 is a schematic diagram of the generation principle of the application-introduction matching strategy in a counterfeit APP recognition method provided by an embodiment of the present invention. Referring to Fig. 6, the generation principle is as follows:
Design idea: this scheme is a method for generating the application-introduction matching strategy set based on a text classification algorithm. The method uses the classified application introductions of all legitimate APPs as the training text set and, through algorithmic training, obtains the application-introduction matching strategy set, which is used to match the application introductions of newly added APPs. It includes:
620. Perform Chinese word segmentation on the text content: the text is input and the segmented lemmas are output. Common Chinese segmenters include IK Analyzer.
630. Count the various frequencies with which all lemmas of the training text set occur, such as the total frequency and the per-category frequency.
640. Use these statistics as the basis for feature selection.
650. Extract the feature words and weights of each category from the full feature set of the training texts using feature and weight selection algorithms, forming the feature space of each category.
660. Represent each text as a text vector.
It should be noted that, before step 620, the method further includes:
680. Collecting the training text set;
Building on the above processing steps, the method further includes:
610. Obtaining the application introduction to be matched;
670. Predicting the category of the text to be classified. This method only needs to determine whether the APP is an enterprise-related APP, so it suffices for any one category to reach the threshold set by the system.
The steps of this method are described in detail below:
1. Step 660 specifically includes:
The words or phrases obtained after segmenting a Chinese text are called feature lemmas or feature items. They are the basis for text classification: these particular items can represent the text. Suppose the training text set contains m texts and n distinct items; then a text is represented as $D_i = (t_1, t_2, \ldots, t_k, \ldots, t_n)$, $1 \le i \le m$. Each item $t_k$ ($1 \le k \le n$) is assigned a value $w_k$ expressing its importance in the text, commonly called the weight of $t_k$. Thus $D_i = (t_1 w_1, t_2 w_2, \ldots, t_k w_k, \ldots, t_n w_n)$ is the vector representation of text $D_i$.
2. Step 650 specifically includes:
(1) Feature selection
Feature selection removes items that carry little information, reducing computational complexity and improving classification efficiency. Dimensionality-reduction methods of this type include document frequency (DF), information gain (IG), and mutual information (MI).
The feature selection algorithm used in this method combines information gain (IG) with an improved mutual information (MI) measure on a weighted basis.
1) Information gain (IG)
Information gain is the expected amount of information useful for classification that is obtained by observing whether an item appears in a text.
Suppose the text collection D is divided into K classes, denoted $c_1, c_2, \ldots, c_j, \ldots, c_K$. The information gain of lemma t with respect to the categories $c_j$ is:
$$IG(t) = -\sum_{j=1}^{K} P(c_j)\log P(c_j) + P(t)\sum_{j=1}^{K} P(c_j \mid t)\log P(c_j \mid t) + P(\bar{t})\sum_{j=1}^{K} P(c_j \mid \bar{t})\log P(c_j \mid \bar{t}) \qquad (1)$$
In formula (1), $P(c_j)$ can be estimated from the fraction of texts in the whole training set that belong to category $c_j$; $P(t)$ can be estimated from the fraction of training texts in which item t appears, and $P(\bar{t})$ from the fraction in which t does not appear. In addition, $P(c_j \mid t)$ is the probability that a text in which t appears at least once belongs to $c_j$, and $P(c_j \mid \bar{t})$ is the probability that a text not containing t belongs to $c_j$.
A higher IG value indicates that item t is more concentrated in its distribution across the categories of the training text set. The IG method selects features with higher IG values, on the principle that features with a more concentrated distribution are more important.
2) Improved mutual information (MI)
The basic idea of mutual information (MI) is that the more strongly a feature is correlated with a category, the more important it is; the MI method extracts features with higher mutual information values. Suppose the text collection D is divided into K classes, denoted $c_1, c_2, \ldots, c_j, \ldots, c_K$. The traditional formula for the mutual information $MI(t, c_j)$ of lemma t with category $c_j$ is:
$$MI(t, c_j) = \log \frac{P(t, c_j)}{P(t)\,P(c_j)} \qquad (2)$$
For ease of calculation, this can be approximated as:
$$MI(t, c_j) \approx \log \frac{A \times N}{(A + C)(A + B)} \qquad (3)$$
where N is the total number of texts in the training text set, A is the number of texts in which t and $c_j$ occur together, B is the number of texts in which lemma t occurs but $c_j$ does not, and C is the number of texts in which $c_j$ occurs but lemma t does not. This formula gives the mutual information between a lemma and each category.
The average mutual information between lemma t and all categories can be calculated with the following formula:
From a statistical standpoint, lemmas t whose mutual information values are more dispersed across the categories $c_j$ should be preferred as features; that is, it is more reasonable to choose as features the lemmas whose mutual information has a larger standard deviation across categories. The improved mutual information formula is as follows:
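Because the average and improved formulas are not reproduced in this text, the following is only a sketch under explicit assumptions: per-category MI uses the simplified approximation $\log(A N / ((A+C)(A+B)))$ with the counts A, B, C, and N defined above, the average is taken as an unweighted mean over categories, and the improved score is taken as the population standard deviation of the per-category MI values. The small constant `eps` guarding against $\log 0$ is also an assumption, not part of the source.

```python
import math
import statistics


def mutual_information(term, docs, labels, eps=1e-9):
    """Per-category MI(t, c_j) ~ log(A*N / ((A+C)*(A+B))); eps avoids log(0) (assumption)."""
    n = len(docs)
    scores = {}
    for cat in sorted(set(labels)):
        a = sum(1 for doc, lab in zip(docs, labels) if lab == cat and term in doc)
        b = sum(1 for doc, lab in zip(docs, labels) if lab != cat and term in doc)
        c = sum(1 for doc, lab in zip(docs, labels) if lab == cat and term not in doc)
        scores[cat] = math.log((a * n + eps) / ((a + c) * (a + b) + eps))
    return scores


def improved_mi(term, docs, labels):
    """(mean MI, population std of MI) across categories; both forms are assumptions."""
    values = list(mutual_information(term, docs, labels).values())
    return statistics.mean(values), statistics.pstdev(values)


docs = [{"pay", "app"}, {"pay", "app"}, {"game", "app"}, {"game", "app"}]
labels = ["fin", "fin", "game", "game"]
print(improved_mi("app", docs, labels)[1])  # 0.0: "app" is spread evenly across categories
```

A lemma such as "app" that occurs uniformly in every category has zero dispersion and is therefore ranked low, while "pay", which occurs only in one category, has a large dispersion and is preferred.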
3) Weighted combined feature selection algorithm
When two feature selection methods are combined to improve results, one method may completely suppress the other. This algorithm therefore uses a weighted combined feature selection algorithm: in the feature set formed for each category, a proportion K of the lemmas is selected by one feature selection method, and the remaining lemmas are selected by the other method. K can be configured in the system according to actual conditions.
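The weighted combination can be sketched as follows. The source does not say which method supplies the proportion K, so this sketch assumes IG supplies the first $\lfloor K \times \text{size} \rfloor$ lemmas and improved MI fills the remaining slots, skipping lemmas already chosen. The scores passed in are hypothetical precomputed values.

```python
def combined_select(ig_scores, mi_scores, size, k_ratio):
    """Take floor(k_ratio * size) lemmas by IG, then fill the rest by improved MI."""
    chosen = sorted(ig_scores, key=ig_scores.get, reverse=True)[: int(k_ratio * size)]
    taken = set(chosen)
    for lemma in sorted(mi_scores, key=mi_scores.get, reverse=True):
        if len(chosen) >= size:
            break
        if lemma not in taken:  # avoid counting a lemma twice
            chosen.append(lemma)
            taken.add(lemma)
    return chosen


ig = {"pay": 0.9, "wallet": 0.5, "bill": 0.2, "game": 0.1}
mi = {"game": 3.0, "pay": 2.0, "level": 1.5, "bill": 0.1}
print(combined_select(ig, mi, size=4, k_ratio=0.5))  # ['pay', 'wallet', 'game', 'level']
```

Reserving a fixed share of slots for each method guarantees that neither ranking can crowd the other out, which is the suppression problem the combined algorithm is meant to avoid.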
(2) Feature weight calculation
Once feature selection for each category is complete, the feature set is formed. Feature selection only determines $t_1, t_2, \ldots, t_i, \ldots, t_n$, whereas feature weight calculation determines $\omega_1, \omega_2, \ldots, \omega_i, \ldots, \omega_n$.
This algorithm computes feature weights using the traditional TFIDF scheme, taking into account the strengths and shortcomings of IDF weights and TF weights. The basic idea is: if feature item $t_i$ appears in few texts of a given class but occurs frequently across the entire text set, then $t_i$ contributes little to classification and should be assigned a lower weight; otherwise, it should be assigned a higher weight.
TFIDF weights combine two kinds of information, document frequency and term frequency, i.e. the IDF weight and the TF weight. The formula is as follows:
In formula (6), $f_{ik}$ is the number of occurrences of the i-th feature word in the k-th text, N is the total number of texts in the training set, and $n_i$ is the number of texts in the training set in which the i-th feature word appears.
Here, $W_{ki}$ is directly proportional to the number of times the i-th feature lemma occurs in text k, and inversely proportional to the number of texts in the training set in which this feature lemma occurs ($n_i$ in total).
To reduce the phenomenon of low-frequency features being excessively suppressed by high-frequency features, the weight is corrected as follows:
In the formula, N is the total number of feature lemmas in text k.
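Since formulas (6) and (7) are not reproduced here, the sketch below assumes the common textbook forms: a raw weight of $f_{ik} \times \log(N / n_i)$ followed by L2 normalisation of each text's weight vector, which is one standard way to keep a few high-frequency lemmas from dominating. The exact logarithm and correction used by the source may differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {lemma: weight} dict per text, weights L2-normalised within the text."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # n_i: number of texts containing lemma i
    vectors = []
    for doc in docs:
        # raw weight f_ik * log(N / n_i) (assumed form of formula (6))
        raw = {t: f * math.log(n_docs / doc_freq[t]) for t, f in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: (w / norm if norm else 0.0) for t, w in raw.items()})
    return vectors


vecs = tfidf_vectors([["pay", "pay", "app"], ["game", "app"]])
print(vecs[0]["app"])  # 0.0: "app" occurs in every text, so its IDF is log(1) = 0
```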
3. Step 630 specifically includes:
The text statistics module serves the feature and weight selection module. The feature extraction algorithms used in this algorithm are IG and improved MI; their calculation formulas show that only five statistics are required:
(1) The total number of training texts.
(2) The number of categories in the training texts.
(3) The number of texts in each category of the training texts.
(4) The number of training-set texts in which a given lemma $t_i$ occurs (only presence is recorded).
(5) The number of texts in each category of the training set in which a given lemma $t_i$ occurs (only presence is recorded).
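The five statistics can be collected in a single pass over the training set, as in the sketch below. It assumes each training text is a list of lemmas with one category label; the dictionary keys are illustrative names only. Each lemma is counted at most once per text, matching the "only presence is recorded" rule.

```python
from collections import Counter


def corpus_statistics(docs, labels):
    """Collect the five statistics; a lemma is counted at most once per text."""
    doc_freq = Counter()
    doc_freq_by_class = Counter()
    for doc, label in zip(docs, labels):
        for lemma in set(doc):
            doc_freq[lemma] += 1                    # statistic (4)
            doc_freq_by_class[(lemma, label)] += 1  # statistic (5)
    return {
        "total_texts": len(docs),                   # statistic (1)
        "num_classes": len(set(labels)),            # statistic (2)
        "texts_per_class": dict(Counter(labels)),   # statistic (3)
        "doc_freq": dict(doc_freq),
        "doc_freq_by_class": dict(doc_freq_by_class),
    }


stats = corpus_statistics([["pay", "pay", "app"], ["pay", "bill"], ["game", "app"]], ["fin", "fin", "game"])
print(stats["doc_freq"]["pay"])  # 2 (repetitions inside one text are not counted)
```

The counts A, B, and C used for mutual information, and the probabilities used for information gain, can all be derived from these five values.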
4. Step 670 specifically includes:
Predict the category of the text to be classified. This algorithm only needs to determine whether the APP is an enterprise-related APP, so it suffices for any one category to reach the threshold.
Suppose the vector of the text to be classified is $U_i = (t_1 w_1, t_2 w_2, \ldots, t_k w_k, \ldots, t_n w_n)$ and the feature set of a category D is $D_i = (t_1 w_1, t_2 w_2, \ldots, t_k w_k, \ldots, t_m w_m)$, where $m > n$.
If, after Chinese word segmentation, K feature lemmas of the text U to be classified belong to category D, then the similarity between U and D is $L = t_1 + t_2 + \cdots + t_K$, the sum being taken over the weights of the K matched lemmas. If L reaches the predetermined threshold, the text U is considered to belong to an enterprise-related APP.
Alternatively, step 670 can be implemented as follows:
Text classification is performed with the KNN algorithm. The main idea of KNN is to compute the distance between the text to be classified and every sample in the training set and, based on the results, obtain the set U of texts whose distance lies within a certain range. According to the categories of all samples in U, the scores of the samples of each category in U are summed to obtain the similarity score of the text to be classified for each category; finally, the text is assigned to the category with the highest similarity score.
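The distance between samples is defined by the angle between vectors. For two points $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$, the cosine of the vector angle is:
$$\cos(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
A higher cosine value means a smaller angle and therefore a higher vector similarity.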
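The KNN variant can be sketched in Python as follows. The source selects samples whose distance lies "within a certain range"; this sketch assumes the more common choice of the k most similar samples instead, and scores each category by summing the cosine similarities of its samples among those neighbours. Vectors, category names, and `k` are hypothetical.

```python
import math
from collections import defaultdict


def cosine(x, y):
    """Cosine of the angle between x and y; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def knn_classify(u, samples, k=3):
    """Assign u to the category with the highest summed similarity among its k nearest samples."""
    nearest = sorted(samples, key=lambda s: cosine(u, s[0]), reverse=True)[:k]
    scores = defaultdict(float)
    for vector, category in nearest:
        scores[category] += cosine(u, vector)
    return max(scores, key=scores.get)


training = [([1.0, 0.0], "financial_payment"), ([0.9, 0.1], "financial_payment"), ([0.0, 1.0], "game")]
print(knn_classify([1.0, 0.05], training, k=2))  # financial_payment
```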
Embodiment seven
Fig. 7 is a schematic flowchart of a counterfeit APP recognition method provided by a further embodiment of the present invention. Referring to Fig. 7, the method includes:
710. Obtain the installation package of the APP to be identified, and extract target feature information from the installation package;
720. According to the target feature information, perform similarity analysis between the APP to be identified and the legitimate APPs and counterfeit APPs in the pre-established APP information bank, obtaining a first similarity between the APP to be identified and the legitimate APPs, and a second similarity between the APP to be identified and the counterfeit APPs;
It should be noted that the legitimate APPs in some APP information banks may be incomplete. For example, the APP information bank of enterprise A may contain the complete set of legitimate APPs of the parent company but an incomplete set for its subsidiaries. This embodiment therefore considers, in both directions, the similarity of the APP to be identified to legitimate APPs and to counterfeit APPs, to further improve identification precision.
In addition, the principle of the similarity analysis is: apply a corresponding analysis method to each type of feature information, and assign a weight to each category of feature information, in order to calculate the similarity between each legitimate or counterfeit APP and the APP to be identified;
For image-type information in the target feature information, such as key interface images, an image comparison algorithm is used to compare the key interface images of the APP to be identified one by one with those of the legitimate and counterfeit APPs, obtaining one group of similarities;
For text-type information in the target feature information, such as the application name, application introduction, and signing certificate information, word segmentation can first be performed on the text, and each resulting feature word is then compared with the corresponding feature words in the text information of the legitimate and counterfeit APPs, obtaining another group of similarities;
For code-type information in the target feature information, such as resource files and the main configuration file, tools such as Beyond Compare 3 can be used for comparison, obtaining another group of similarities.
Based on the obtained similarities and the pre-configured weight for each similarity, the similarity between the APP to be identified and one legitimate or counterfeit APP is calculated; the similarities between the APP to be identified and all legitimate and counterfeit APPs are obtained in the same way.
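Combining the three groups of similarities with their pre-configured weights can be sketched as a weighted average. The source does not give the combination formula, so the weighted average, the feature-type keys, and the weight values below are all assumptions for illustration.

```python
def weighted_similarity(type_similarities, type_weights):
    """Combine per-type similarities using pre-configured weights (assumed weighted average)."""
    total_weight = sum(type_weights[t] for t in type_similarities)
    if total_weight == 0:
        return 0.0
    return sum(s * type_weights[t] for t, s in type_similarities.items()) / total_weight


sims = {"image": 0.9, "text": 0.6, "code": 0.3}     # one value per comparison group
weights = {"image": 0.5, "text": 0.3, "code": 0.2}  # hypothetical configuration
print(round(weighted_similarity(sims, weights), 2))  # 0.69
```

Dividing by the total weight keeps the result in the same 0 to 1 range as the inputs, so it can be compared directly with percentage thresholds.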
730. Determine whether the APP to be identified is a counterfeit APP according to the similarities and a preset decision rule.
It should be noted that if the first similarity is determined to be less than a second similarity threshold and the second similarity is greater than or equal to a third similarity threshold, the APP to be identified is judged to be a counterfeit APP; if the first similarity is greater than or equal to the third similarity threshold and the second similarity is less than the second similarity threshold, the APP to be identified is judged to be a legitimate APP; if neither condition is met, manual identification is performed.
Using the second similar threshold value as 20%, third similar threshold value is described as follows for being 80%:
If the maximum value in the first similarity between APP to be identified and legal copy APP is less than 20%, and APP to be identified with The maximum value in the second similarity between counterfeit APP is more than or equal to 80%, then it is counterfeit APP to judge APP to be identified.
If the maximum value in the first similarity between APP to be identified and legal copy APP is more than or equal to 80%, and wait to know The maximum value in the second similarity between other APP and counterfeit APP is less than 20%, then judges APP to be identified for legal APP.
If the value of the first similarity and/or the second similarity is in the range of 20%-80%, manual identified is carried out.
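The bidirectional decision rule of step 730 can be sketched directly from the thresholds above. The defaults use the 20% and 80% example values; treating an empty list of similarities as 0.0 is an assumption of this sketch, and the return labels are illustrative.

```python
def judge(first_similarities, second_similarities, second_threshold=0.20, third_threshold=0.80):
    """Bidirectional decision rule of step 730, using the maximum similarity on each side."""
    s1 = max(first_similarities, default=0.0)   # vs. legitimate APPs
    s2 = max(second_similarities, default=0.0)  # vs. counterfeit APPs
    if s1 < second_threshold and s2 >= third_threshold:
        return "counterfeit"
    if s1 >= third_threshold and s2 < second_threshold:
        return "legitimate"
    return "manual review"


print(judge([0.10, 0.15], [0.85]))  # counterfeit
print(judge([0.92], [0.05]))        # legitimate
print(judge([0.50], [0.30]))        # manual review
```

Any combination not covered by the two clear-cut cases, including one where both maxima fall below 20%, is routed to manual review, as the rule requires.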
It can be seen that this embodiment performs bidirectional identification of the APP to be identified, further improving identification precision.
Embodiment eight
Fig. 8 is a structural diagram of a counterfeit APP identification device provided by an embodiment of the present invention. Referring to Fig. 8, the device includes an acquisition module 810, an analysis module 820, and a judgment module 830, where:
The acquisition module 810 is configured to obtain the installation package of the APP to be identified and extract target feature information from the installation package;
The analysis module 820 is configured to perform, according to the target feature information, similarity analysis between the APP to be identified and the legitimate APPs in the pre-established APP information bank, obtaining the similarity corresponding to the APP to be identified;
The judgment module 830 is configured to determine, according to the similarity and a preset decision rule, whether the APP to be identified is a counterfeit APP.
It should be noted that, upon receiving an instruction to start identification, the acquisition module 810 automatically obtains the installation package of the APP to be identified from a specified address and sends the target feature information extracted from it to the analysis module 820. The analysis module 820 analyzes the similarity of the APP to be identified to the legitimate and counterfeit APPs based on the target feature information and sends the analysis result to the judgment module 830, which determines, based on the similarity, whether the APP to be identified is a counterfeit APP.
It can be seen that this embodiment automatically obtains the target feature information of the APP to be identified and analyzes the APP on that basis to determine whether it is counterfeit. Compared with the prior art, this reduces manual intervention in the system and offers high identification efficiency.
In this embodiment, the device further includes a crawling module;
The crawling module is configured to crawl the pages of all newly added APPs in the target channel, take all crawled newly added APPs as APPs to be identified, and extract the basic information from the page of each APP to be identified;
The acquisition module 810 is configured to obtain the installation package of the APP to be identified according to its basic information.
Embodiment nine
Fig. 9 is a structural diagram of a counterfeit APP identification device provided by another embodiment of the present invention. Referring to Fig. 9, the device includes a preprocessing module 910, a first processing module 920, a second processing module 930, a screening module 940, an acquisition module 950, an analysis module 960, a judgment module 970, and an optimization module 980, where:
The analysis module 960 and the judgment module 970 correspond to the analysis module 820 and the judgment module 830 of Embodiment eight and work on the same principle, so they are not described again here.
The preprocessing module 910 is configured to, upon receiving an instruction to start identification, crawl the pages of all newly added APPs in the APP publishing channels, take all crawled newly added APPs as APPs to be identified, extract the basic information from the page of each APP to be identified, and send the obtained information to the screening module 940.
The screening module 940 filters related APPs out of the APPs to be identified by combining the received information with the preset matching strategies of the first processing module 920 and the second processing module 930;
Correspondingly, the acquisition module 950 is configured to obtain the installation packages of the related APPs according to their basic information.
The basic information includes the application name, the application introduction, etc. The screening principle of the screening module 940 specifically includes:
The first processing module 920 is configured to perform word segmentation on the application names of all legitimate APPs in the pre-established APP information bank, obtaining the feature phrase corresponding to each application name, and, according to the positional order of the feature words in the feature phrases, perform statistical analysis on the feature phrases of all application names and select one feature word for each position;
Correspondingly, the screening module 940 is configured to match, according to the positional order of the feature words, the feature words in the feature phrase of the APP to be identified against the selected feature words to obtain a matching degree, and, if the matching degree is determined to be greater than the first preset threshold, confirm that the APP to be identified is a related APP.
Alternatively,
The second processing module 930 is configured to classify all legitimate APPs in the pre-established APP information bank according to business function; perform word segmentation on the application introductions of each class of legitimate APP to obtain the full feature set; and perform statistical analysis on the feature words in the full feature set to obtain the feature set of each class of legitimate APP and the weight of each feature word in that feature set;
Correspondingly, the screening module 940 is configured to match the feature words in the feature set of the APP to be identified separately against the feature words in the feature set of each class of legitimate APP, and, if the sum of the weights of the matched feature words is determined to be greater than the second preset threshold, confirm that the APP to be identified is a related APP.
In addition, the pre-established APP information bank includes a legitimate APP library and a counterfeit APP library;
Correspondingly, the optimization module 980 is configured to store the APP to be identified and its related information in the legitimate APP library or the counterfeit APP library according to the judgment result, so that the first processing module 920 and the second processing module 930 can update the preset matching strategies based on the updated data.
Embodiment ten
Fig. 10 is a structural diagram of a counterfeit APP identification device provided by a further embodiment of the present invention. Referring to Fig. 10, the device includes: a website to be monitored 100, a related-APP crawling and download module 110, a related-APP analysis module 120, a monitoring report generation module 130, an upload module 140 for APPs to be monitored, and an automated policy generation module 150, where:
The related-APP crawling and download module 110 includes a channel library management unit 111, a monitoring crawler unit 112, an APP relevance determination unit 113, and a related-APP download unit 114;
The related-APP crawling and download module 110 works as follows:
The channel library management unit 111 manages the channel information library. Channel attributes mainly include the channel name, URL address, region, importance (high, medium, low), channel description, logo information, and other basic attributes. Each channel needs its own protocol analysis, page parsing, and similar settings. The channel state can be configured as either available or unavailable; a channel configured as unavailable is no longer monitored.
The monitoring crawler unit 112 automatically crawls, at a scheduled time each day according to the system configuration, the pages of all newly added APPs in the specified channels. It performs deduplication and page parsing on the crawled APP pages and extracts the basic information from each newly added APP page, including the application name, application introduction, and developer information. It then calls the APP relevance determination unit to filter out the newly added APPs related to the enterprise's own APPs, saves their basic information, and sends it to the related-APP download unit.
The APP relevance determination unit 113 analyzes the application name, application introduction, and developer information obtained by parsing the newly added APP page, and concludes whether the newly added APP is related to this enterprise.
The related-APP download unit 114 downloads the installation package files of the enterprise-related APPs sent by the monitoring crawler unit 112, extracts their feature information, including the main configuration file, resource files, decompiled Smali files, key interface images, developer and certificate information, etc., and sends the information of these newly obtained enterprise-related APPs to the related-APP analysis module 120.
The automated policy generation module 150 works as follows:
It receives the legitimate APP information sent by the upload module 140 for APPs to be monitored and generates the various policy sets required by the APP relevance determination unit, including the application name matching strategy, the similar application name matching strategy, the application introduction matching strategy, and the developer information matching strategy. The application name matching strategy set contains the feature word set obtained by Chinese word segmentation of the application names of all legitimate APPs. The developer information matching strategy set contains the feature word set obtained by Chinese word segmentation of the developer information of all legitimate APPs. The similar application name matching strategy set contains all strategies capable of similarity matching against legitimate APP application names. The application introduction matching strategy set contains all strategies capable of matching against legitimate APP application introductions. Different matching strategies may be executed by different matching units or by the same matching unit. Taking Fig. 1 as an example, the automated policy generation module includes an application name matching unit 151, a similar application name matching unit 152, an application introduction matching unit 153, and a developer information matching unit 154.
The related-APP analysis module 120 works as follows:
Automatic analysis unit 121: performs similarity analysis between each APP to be identified sent by the related-APP crawling and download module 110 and the APPs in the known APP information bank 123 in turn, and automatically determines, according to the system's decision rules, whether it is a legitimate or counterfeit APP. APPs that the system cannot judge directly are passed to the manual analysis unit 122 for analysis and judgment by system operators. The similarity analysis comprehensively examines the APP's installation package name, main configuration file, resource files, decompiled Smali files, key interface images, developer and certificate information, etc., to determine the sample attribute: a legitimate APP is added to the legitimate APP library, and a counterfeit APP is added to the counterfeit APP library.
Manual analysis unit 122: allows system operators to manually examine APPs suspected to be related to the enterprise and determine the sample attribute: a legitimate APP is added to the legitimate APP library, and a counterfeit APP is added to the counterfeit APP library.
The APP information bank includes the legitimate APP library and the counterfeit APP library. Attributes of the known APP information bank include the application name, application introduction, application version number, signing certificate information, installation package file, main configuration file, resource files, decompiled Smali files, key interface images, etc. The legitimate APP library has three main sources: first, the initial build; second, sample APPs to be monitored uploaded by users; and third, APPs determined to be legitimate after being downloaded from monitored channels during system operation. During system operation and maintenance, the latest APPs are continually downloaded from the channels and analyzed automatically and manually; APPs determined to be legitimate are added to the legitimate APP library, and so are older versions of legitimate APPs. Continually enriching the legitimate APP library strengthens the system's monitoring capability. The counterfeit APP library is populated mainly during system operation, with APPs determined to be counterfeit after being downloaded from monitored channels.
The upload module 140 for APPs to be monitored works as follows:
System operators submit APP monitoring requests through the upload module for APPs to be monitored. The module extracts the feature information of the APP to be monitored, including the application name, application introduction, application version number, developer information, signing certificate information, installation package file, main configuration file, resource files, decompiled Smali files, key interface images, etc. It sends the application name, application introduction, application version number, signing certificate information, installation package file, main configuration file, resource files, decompiled Smali files, key interface images, and other related information of the APP to be monitored to the enterprise-related APP analysis module for storage in the legitimate APP library. At the same time, it sends the application name, application introduction, and developer information of the APP to be monitored to the automated policy generation module 150, to update the various policy sets required by the APP relevance determination unit.
The monitoring report generation module 130 works as follows:
The first report generation unit 131 compiles statistics per monitored APP and generates a report listing the details of its counterfeit APPs. The statistics include the number of counterfeit APPs found for each monitored APP, and the number of suspected APPs found and counterfeit APPs confirmed in each channel; the details of each counterfeit APP include the sample crawl time, crawl channel, package name, security and virus names, etc.
The second report generation unit 132 compiles statistics per APP publishing channel and generates a report listing the details of counterfeit APPs. The statistics include the number of counterfeit APPs in the channel, the number of suspected APPs found, and the number of confirmed counterfeit APPs; the details of each counterfeit APP include the sample crawl time, crawl channel, package name, security and virus names, etc.
In the above embodiments, for brevity, the method embodiments are expressed as a series of combined actions, but those skilled in the art should understand that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments some steps may be performed in other orders or simultaneously. Further, those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
It should be noted that the components of the device of the present invention are logically divided according to the functions they implement; however, the present invention is not limited thereto, and the components may be re-divided or combined as needed.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. In this device, a PC remotely controls the equipment or apparatus over the Internet, precisely controlling each operation step of the equipment or apparatus. The present invention may also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing part or all of the methods described herein. Such programs of the present invention may be stored on a computer-readable medium, and the files or documents generated by the programs can be processed statistically to generate data reports, CPK reports, etc., allowing batch testing and statistics to be performed on power amplifiers. It should be noted that the above embodiments illustrate rather than limit the present invention, and those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in the claims. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any order; these words may be interpreted as names.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the present invention, and such modifications and variations all fall within the scope defined by the appended claims.

Claims (15)

1. A counterfeit APP identification method, characterized by comprising:
Step S1: obtaining the installation package of an APP to be identified, and extracting target feature information from the installation package;
Step S2: performing, according to the target feature information, similarity analysis between the APP to be identified and the legitimate APPs in a pre-established APP information library, to obtain the similarity between the APP to be identified and the legitimate APP;
Step S3: determining, according to the similarity and a preset decision rule, whether the APP to be identified is a counterfeit APP.
2. The method according to claim 1, characterized in that, before step S1, the method further comprises:
Step S0: crawling the pages of all newly added APPs in APP distribution channels, and taking all the crawled newly added APPs as APPs to be identified;
extracting basic information from the page of each APP to be identified.
3. The method according to claim 2, characterized in that, after step S0 and before step S1, the method further comprises:
Step S0': screening related APPs out of the APPs to be identified according to the basic information of the APPs to be identified and a preset matching strategy;
correspondingly, step S1 comprises:
obtaining the installation package of each related APP according to the basic information of the related APP.
4. The method according to claim 3, characterized in that the basic information comprises: an application name;
before step S0', the method further comprises:
performing word segmentation on the application name of each legitimate APP in the pre-established APP information library, to obtain a feature phrase corresponding to each application name;
performing statistical analysis on the feature phrases corresponding to the application names according to the position order of the feature words in the feature phrases, and selecting one feature word for each position;
correspondingly, step S0' comprises:
matching, according to the position order of the feature words, the feature words in the feature phrase corresponding to the APP to be identified against the selected feature words, to obtain a matching degree;
if it is determined that the matching degree is greater than a first preset threshold, confirming that the APP to be identified is a related APP.
5. The method according to claim 3, characterized in that the basic information comprises: an application description;
before step S0', the method further comprises:
classifying all legitimate APPs in the pre-established APP information library according to business function;
performing word segmentation on the application descriptions of each class of legitimate APPs, to obtain a full feature set;
performing statistical analysis on the feature words in the full feature set, to obtain a feature set for each class of legitimate APPs and a weight value for each feature word in the feature set;
step S0' comprises:
matching the feature words in the feature set of the APP to be identified separately against the feature words in the feature set of each class of legitimate APPs;
if it is determined that the sum of the weight values of the matched feature words is greater than a second preset threshold, confirming that the APP to be identified is a related APP.
6. The method according to claim 1, characterized in that step S3 comprises:
if it is determined that the similarity between the APP to be identified and the legitimate APP is less than a first similarity threshold, determining that the APP to be identified is a counterfeit APP.
7. The method according to claim 1, characterized in that step S2 comprises:
performing, according to the target feature information, similarity analysis between the APP to be identified and, respectively, the legitimate APPs and the counterfeit APPs in the pre-established APP information library, to obtain a first similarity between the APP to be identified and the legitimate APP and a second similarity between the APP to be identified and the counterfeit APP.
8. The method according to claim 7, characterized in that step S3 comprises:
if it is determined that the first similarity is less than a second similarity threshold and the second similarity is greater than or equal to a third similarity threshold, determining that the APP to be identified is a counterfeit APP;
alternatively,
if it is determined that the first similarity is greater than or equal to the third similarity threshold and the second similarity is less than the second similarity threshold, determining that the APP to be identified is a legitimate APP.
9. The method according to any one of claims 1-8, characterized in that the pre-established APP information library comprises: a legitimate APP sub-library and a counterfeit APP sub-library;
correspondingly, the method further comprises:
Step S4: storing the APP to be identified and its related information in the legitimate APP sub-library or the counterfeit APP sub-library according to the determination result.
10. A counterfeit APP identification device, characterized by comprising:
an acquisition module, configured to obtain the installation package of an APP to be identified and to extract target feature information from the installation package;
an analysis module, configured to perform, according to the target feature information, similarity analysis between the APP to be identified and the legitimate APPs in a pre-established APP information library, to obtain the similarity between the APP to be identified and the legitimate APP;
a judgment module, configured to determine, according to the similarity and a preset decision rule, whether the APP to be identified is a counterfeit APP.
11. The device according to claim 10, characterized in that the device further comprises: a preprocessing module;
the preprocessing module is configured to crawl the pages of all newly added APPs in APP distribution channels, to take all the crawled newly added APPs as APPs to be identified, and to extract basic information from the page of each APP to be identified.
12. The device according to claim 11, characterized in that the device further comprises: a screening module;
the screening module is configured to screen related APPs out of the APPs to be identified according to the basic information of the APPs to be identified and a preset matching strategy;
correspondingly, the acquisition module is configured to obtain the installation package of each related APP according to the basic information of the related APP.
13. The device according to claim 12, characterized in that the basic information comprises: an application name;
the device further comprises: a first processing module;
the first processing module is configured to perform word segmentation on the application name of each legitimate APP in the pre-established APP information library to obtain a feature phrase corresponding to each application name, to perform statistical analysis on the feature phrases according to the position order of the feature words in the feature phrases, and to select one feature word for each position;
correspondingly, the screening module is configured to match, according to the position order of the feature words, the feature words in the feature phrase corresponding to the APP to be identified against the selected feature words to obtain a matching degree, and, if it is determined that the matching degree is greater than the first preset threshold, to confirm that the APP to be identified is a related APP.
14. The device according to claim 12, characterized in that the basic information comprises: an application description;
the device further comprises: a second processing module;
the second processing module is configured to classify all legitimate APPs in the pre-established APP information library according to business function, to perform word segmentation on the application descriptions of each class of legitimate APPs to obtain a full feature set, and to perform statistical analysis on the feature words in the full feature set to obtain a feature set for each class of legitimate APPs and a weight value for each feature word in the feature set;
correspondingly, the screening module is configured to match the feature words in the feature set of the APP to be identified separately against the feature words in the feature set of each class of legitimate APPs, and, if it is determined that the sum of the weight values of the matched feature words is greater than the second preset threshold, to confirm that the APP to be identified is a related APP.
15. The device according to any one of claims 10-14, characterized in that the pre-established APP information library comprises: a legitimate APP sub-library and a counterfeit APP sub-library;
correspondingly, the device further comprises: an optimization module;
the optimization module is configured to store the APP to be identified and its related information in the legitimate APP sub-library or the counterfeit APP sub-library according to the determination result.
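The similarity-and-decision flow of claims 1 and 6 to 9 can be illustrated with the minimal Python sketch below. The claims do not say which installation-package features are extracted or how similarity is computed, so this sketch assumes that the target feature information is a set of strings (for example a signing-certificate digest, package name and permissions) and uses the Jaccard index as a stand-in similarity measure. The names AppRecord, best_similarity, judge_single, judge_dual and identify, and the threshold values 0.3 and 0.7, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AppRecord:
    name: str
    # Hypothetical "target feature information" extracted from the installation
    # package (e.g. signing-certificate digest, package name, permissions).
    features: frozenset


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard index |A & B| / |A | B|; an assumed similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_similarity(app: AppRecord, library: list) -> float:
    """Highest similarity between app and any record in library (0.0 if empty)."""
    return max((jaccard(app.features, r.features) for r in library), default=0.0)


def judge_single(first_sim: float, t1: float) -> bool:
    """Claim 6: counterfeit if the similarity to the legitimate APP is below t1."""
    return first_sim < t1


def judge_dual(first_sim: float, second_sim: float, t2: float, t3: float) -> str:
    """Claim 8: combine the first (legitimate) and second (counterfeit) similarities."""
    if first_sim < t2 and second_sim >= t3:
        return "counterfeit"
    if first_sim >= t3 and second_sim < t2:
        return "legitimate"
    return "undetermined"  # the claims do not specify this case


def identify(app, legit_lib, fake_lib, t2=0.3, t3=0.7):
    """Steps S2 to S4: score, judge, then store the app in the matching sub-library."""
    verdict = judge_dual(best_similarity(app, legit_lib),
                         best_similarity(app, fake_lib), t2, t3)
    if verdict == "legitimate":
        legit_lib.append(app)
    elif verdict == "counterfeit":
        fake_lib.append(app)
    return verdict


legit = [AppRecord("BankApp", frozenset({"cert:AA11", "pkg:com.bank.app", "perm:INTERNET"}))]
fake = [AppRecord("BankApp Pro", frozenset({"cert:FF99", "pkg:com.bank.pro", "perm:SEND_SMS"}))]
suspect = AppRecord("BankApp", frozenset({"cert:FF99", "pkg:com.bank.pro",
                                          "perm:SEND_SMS", "perm:INTERNET"}))
print(identify(suspect, legit, fake))  # counterfeit
```

In this example the suspect shares only one of six features with the legitimate record (similarity about 0.17) but three of four with the known counterfeit (0.75), so the claim 8 rule classifies it as counterfeit and step S4 adds it to the counterfeit sub-library. The sketch returns "undetermined" for the case that neither branch of claim 8 covers; how a real system would handle that case is not stated in the claims.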

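The screening step S0' of claims 4 and 5 can be illustrated in the same hedged way. The claims say only that "statistical analysis" selects one feature word per name position and assigns weights to description words. This sketch therefore assumes a majority vote per position for claim 4. For claim 5 it assumes a weight equal to the fraction of a class's descriptions that contain the word. It also assumes that names and descriptions have already been word-segmented; a real system would first run a Chinese word segmenter. All function names, sample data and thresholds are illustrative.

```python
from collections import Counter


def select_position_words(segmented_names):
    """Claim 4: pick one feature word per position across legitimate APP names.

    Assumption: the statistical analysis is a majority vote (most frequent word).
    """
    if not segmented_names:
        return []
    selected = []
    for i in range(max(len(n) for n in segmented_names)):
        counts = Counter(n[i] for n in segmented_names if len(n) > i)
        selected.append(counts.most_common(1)[0][0])
    return selected


def name_matching_degree(candidate, selected):
    """Share of positions where the candidate's word equals the selected word."""
    if not selected:
        return 0.0
    hits = sum(1 for i, word in enumerate(candidate[:len(selected)]) if word == selected[i])
    return hits / len(selected)


def category_weights(descriptions_by_category):
    """Claim 5: per-class feature sets with a weight for each feature word.

    Assumed weight: fraction of the class's descriptions that contain the word.
    """
    weights = {}
    for category, docs in descriptions_by_category.items():
        doc_freq = Counter(word for doc in docs for word in set(doc))
        weights[category] = {w: c / len(docs) for w, c in doc_freq.items()}
    return weights


def related_by_description(candidate_words, weights, threshold):
    """Match against each class separately; related if matched weights exceed threshold."""
    words = set(candidate_words)
    for category, feature_set in weights.items():
        score = sum(feature_set[w] for w in words if w in feature_set)
        if score > threshold:
            return True, category
    return False, None


# Names and descriptions are assumed to be already word-segmented.
legit_names = [["China", "Mobile", "Wallet"], ["China", "Mobile", "Mall"],
               ["China", "Unicom", "Wallet"]]
selected = select_position_words(legit_names)
print(selected)  # ['China', 'Mobile', 'Wallet']
print(name_matching_degree(["China", "Mobile", "Wallet", "Plus"], selected))  # 1.0

weights = category_weights({"payment": [["pay", "bill", "transfer"], ["pay", "wallet"]],
                            "shopping": [["buy", "cart", "pay"]]})
print(related_by_description(["pay", "transfer", "lottery"], weights, 1.2))  # (True, 'payment')
```

An APP whose matching degree exceeds the first preset threshold, or whose summed description weights exceed the second preset threshold, would then proceed to step S1. Per-class matching keeps a payment-style description from being diluted by unrelated classes such as shopping.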
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611153579.0A CN108229131A (en) 2016-12-14 2016-12-14 Counterfeit APP recognition methods and device

Publications (1)

Publication Number Publication Date
CN108229131A true CN108229131A (en) 2018-06-29

Family

ID=62638991

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611153579.0A Pending CN108229131A (en) 2016-12-14 2016-12-14 Counterfeit APP recognition methods and device

Country Status (1)

Country Link
CN (1) CN108229131A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102222199A (en) * 2011-06-03 2011-10-19 奇智软件(北京)有限公司 Method and system for identifying identification of application program
EP2693356A2 (en) * 2012-08-02 2014-02-05 Google Inc. Detecting pirated applications
CN103823751A (en) * 2013-12-13 2014-05-28 国家计算机网络与信息安全管理中心 Counterfeit application program monitoring method based on characteristic implantation
CN104123493A (en) * 2014-07-31 2014-10-29 百度在线网络技术(北京)有限公司 Method and device for detecting safety performance of application program
CN104133832A (en) * 2014-05-15 2014-11-05 腾讯科技(深圳)有限公司 Pirate application identification method and device
CN104424402A (en) * 2013-08-28 2015-03-18 卓易畅想(北京)科技有限公司 Method and device for detecting pirated application program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
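Gao Shang (author): "Estimation of Distribution Algorithms and Their Applications" (《分布估计算法及其应用》), 31 January 2016, National Defense Industry Press (国防工业出版社) *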

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897739A (en) * 2018-07-20 2018-11-27 西安交通大学 A kind of intelligentized application traffic identification feature automatic mining method and system
CN108897739B (en) * 2018-07-20 2020-06-26 西安交通大学 Intelligent automatic mining method and system for application flow identification characteristics
WO2020177116A1 (en) * 2019-03-07 2020-09-10 华为技术有限公司 Counterfeit app identification method and apparatus
CN112016580A (en) * 2019-05-31 2020-12-01 北京百度网讯科技有限公司 Application program name identification method and device and terminal
CN112016580B (en) * 2019-05-31 2023-07-25 北京百度网讯科技有限公司 Application program name identification method, device and terminal
CN112149101A (en) * 2019-06-28 2020-12-29 北京智明星通科技股份有限公司 False game APP identification method and system
CN112348104A (en) * 2020-11-17 2021-02-09 百度在线网络技术(北京)有限公司 Counterfeit program identification method, apparatus, device and storage medium
CN112348104B (en) * 2020-11-17 2023-08-18 百度在线网络技术(北京)有限公司 Identification method, device, equipment and storage medium for counterfeit program
CN112507182A (en) * 2020-12-17 2021-03-16 上海连尚网络科技有限公司 Application screening method and device
CN114612118A (en) * 2022-03-17 2022-06-10 杭州云深科技有限公司 Counterfeit app identification system
CN114612118B (en) * 2022-03-17 2024-05-28 杭州云深科技有限公司 Counterfeit app identification system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180629
