CN109144831B - Method and device for acquiring APP identification rule

Publication number
CN109144831B
CN109144831B CN201710453676.XA CN201710453676A CN109144831B CN 109144831 B CN109144831 B CN 109144831B CN 201710453676 A CN201710453676 A CN 201710453676A CN 109144831 B CN109144831 B CN 109144831B
Authority
CN
China
Prior art keywords
app
word segmentation
characteristic
segmentation unit
apps
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710453676.XA
Other languages
Chinese (zh)
Other versions
CN109144831A (en)
Inventor
储晶星
邓圆
傅一平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Zhejiang Co Ltd
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Zhejiang Co Ltd
Priority to CN201710453676.XA
Publication of CN109144831A
Application granted
Publication of CN109144831B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466: Performance evaluation by tracing or monitoring
    • G06F11/3476: Data logging
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/36: Preventing errors by testing or debugging software
    • G06F11/3604: Software analysis for verifying properties of programs
    • G06F11/3612: Software analysis for verifying properties of programs by runtime analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00: Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/865: Monitoring of software

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a method and a device for acquiring APP identification rules. The method comprises: acquiring the feature fields included in the access logs generated by running a plurality of to-be-acquired APPs within a preset time period, wherein the feature fields include a URL field and a UA field; acquiring the word segmentation units included in the feature fields, and calculating a feature score for each word segmentation unit in each to-be-acquired APP; and generating an identification rule for each to-be-acquired APP according to the feature scores. The device is used to execute the method. The method and the device improve the efficiency of acquiring APP identification rules.

Description

Method and device for acquiring APP identification rule
Technical Field
The embodiments of the invention relate to the technical field of the Internet, and in particular to a method and a device for acquiring APP identification rules.
Background
With the rapid development of the mobile Internet, people increasingly install various applications (APPs) on their mobile phones, and APPs have become an important entry point for online behavior. Feature analysis and traffic identification of APPs on the network side are therefore of great value, and methods for acquiring APP identification rules are attracting more and more attention.
In the prior art, APP identification rules are mainly acquired by a decompilation method or a manual packet capture method. The decompilation method decompiles an APP installation package and takes static features in the package, such as the application name and version number, as identification rules. In the manual packet capture method, the to-be-acquired APP is installed and run manually, the Uniform Resource Locator (URL) and User Agent (UA) information generated while the APP is in use is captured with packet capture software, and typical features of the APP in use are summarized by manual analysis to serve as identification rules. However, the decompilation method obtains only static features such as the application name and version number, cannot obtain the dynamic features of the APP in use, and can only analyze the small number of APPs whose installation packages have obvious features. The manual packet capture method requires manual download and installation, manual packet capture and analysis, and other steps; identifying one APP takes 10 to 20 minutes, the result depends heavily on the experience of the analyst, and some identification rules are easily misjudged or missed. In addition, both methods can only acquire identification rules for a single APP at a time. These problems greatly reduce the efficiency of acquiring APP identification rules.
Therefore, how to improve the efficiency of acquiring APP identification rules is an important problem to be solved in the industry.
Disclosure of Invention
In view of the defects in the prior art, embodiments of the invention provide a method and a device for acquiring APP identification rules.
In one aspect, an embodiment of the present invention provides a method for acquiring APP identification rules, including:
acquiring the feature fields included in the access logs generated by running a plurality of to-be-acquired APPs within a preset time period, wherein the feature fields include a URL field and a UA field;
acquiring the word segmentation units included in the feature fields, and calculating a feature score for each word segmentation unit in each to-be-acquired APP;
and generating an identification rule for each to-be-acquired APP according to the feature scores.
In another aspect, an embodiment of the present invention provides a device for acquiring APP identification rules, including:
an acquisition unit, configured to acquire the feature fields included in the access logs generated by running a plurality of to-be-acquired APPs within a preset time period, wherein the feature fields include a URL field and a UA field;
a calculation unit, configured to acquire the word segmentation units included in the feature fields and calculate a feature score for each word segmentation unit in each to-be-acquired APP;
and a processing unit, configured to generate an identification rule for each to-be-acquired APP according to the feature scores.
In another aspect, an embodiment of the present invention provides an electronic device, including a processor, a memory, and a bus, where:
the processor and the memory complete mutual communication through a bus;
the processor may invoke a computer program in memory to perform the steps of the above-described method.
In yet another aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the above-mentioned method.
The method and the device for acquiring APP identification rules provided by the embodiments of the invention acquire the feature fields included in the access logs generated by running a plurality of to-be-acquired APPs within a preset time period, acquire the word segmentation units included in the feature fields, and calculate the feature score of each word segmentation unit in each to-be-acquired APP, so that an identification rule is generated for each to-be-acquired APP according to the feature scores. This improves the efficiency of acquiring APP identification rules.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for acquiring APP identification rules according to an embodiment of the present invention;
Fig. 2 is a schematic overall flowchart of a method for acquiring APP identification rules according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a device for acquiring APP identification rules according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a device for acquiring APP identification rules according to another embodiment of the present invention;
Fig. 5 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of a method for acquiring APP identification rules according to an embodiment of the present invention. As shown in Fig. 1, the method provided by this embodiment includes:
S101, acquiring the feature fields included in the access logs generated by running a plurality of to-be-acquired APPs within a preset time period, wherein the feature fields include a URL field and a UA field;
Specifically, the device for acquiring APP identification rules (hereinafter, the device) acquires the access logs generated by running the plurality of to-be-acquired APPs within the preset time period and extracts the URL field and the UA field from the access logs as the feature fields. The feature fields may also include other fields, which can be adjusted according to the actual situation and are not specifically limited here.
S102, acquiring the word segmentation units included in the feature fields, and calculating a feature score for each word segmentation unit in each to-be-acquired APP;
Specifically, the device preprocesses the URL field and the UA field corresponding to each to-be-acquired APP, for example by word segmentation and stop-word removal, to obtain the word segmentation units included in the feature fields. The device then counts the number of times each word segmentation unit appears in the feature fields corresponding to each to-be-acquired APP and the number of target to-be-acquired APPs corresponding to each word segmentation unit, and calculates the feature score of each word segmentation unit in each to-be-acquired APP from these two counts. A target to-be-acquired APP of a word segmentation unit is a to-be-acquired APP whose feature fields include that word segmentation unit.
S103, generating an identification rule for each to-be-acquired APP according to the feature scores.
Specifically, the device sorts the word segmentation units corresponding to each to-be-acquired APP by feature score from high to low, takes a preset number of top-ranked word segmentation units as the feature keywords of that APP, and generates the identification rule of the APP from the feature keywords.
The method for acquiring APP identification rules provided by this embodiment acquires the feature fields included in the access logs generated by running a plurality of to-be-acquired APPs within a preset time period, acquires the word segmentation units included in the feature fields, and calculates the feature score of each word segmentation unit in each to-be-acquired APP, so that an identification rule is generated for each to-be-acquired APP according to the feature scores. This improves the efficiency of acquiring APP identification rules.
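The word segmentation and stop-word removal applied to the URL and UA fields can be sketched as follows. This is only an illustrative sketch: the patent does not specify the segmentation rule or the stop-word list, so the delimiter pattern, the `STOP_WORDS` set, the `segment` function name, and the sample URL and UA strings are all assumptions.

```python
import re
from typing import List

# Hypothetical stop-word list; the source text does not specify one.
STOP_WORDS = {"http", "https", "www", "com", "cn", "net", "html", "php"}

def segment(field: str) -> List[str]:
    """Split a URL or UA string into lowercase word segmentation units."""
    # Assumption: any run of non-alphanumeric characters is a delimiter.
    tokens = re.split(r"[^0-9A-Za-z]+", field.lower())
    # Drop empty strings, stop words and purely numeric tokens.
    return [t for t in tokens if t and t not in STOP_WORDS and not t.isdigit()]

# Made-up feature fields for one access-log record.
url = "http://api.example-app.com/v2/feed?uid=1001"
ua = "ExampleApp/3.1 (Linux; Android 7.0)"
units = segment(url) + segment(ua)
print(units)
```

Splitting on punctuation keeps host-name parts and path segments as separate units, which is one plausible reading of "word segmentation" for URL and UA strings; a real deployment would likely tune both the delimiters and the stop-word list.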
On the basis of the foregoing embodiment, further, acquiring the word segmentation units included in the feature fields and calculating the feature score of each word segmentation unit in each to-be-acquired APP includes:
acquiring the word segmentation units included in the feature fields corresponding to each to-be-acquired APP;
counting the number of times each word segmentation unit appears in the feature fields corresponding to each to-be-acquired APP and the number of target to-be-acquired APPs corresponding to each word segmentation unit, wherein a target to-be-acquired APP is a to-be-acquired APP whose feature fields include the word segmentation unit;
and calculating the feature score of each word segmentation unit in each to-be-acquired APP according to the number of occurrences and the number of target to-be-acquired APPs.
Specifically, the device acquires the word segmentation units included in the feature fields corresponding to each to-be-acquired APP, counts the number of times each word segmentation unit appears in the feature fields corresponding to each to-be-acquired APP and the number of target to-be-acquired APPs corresponding to each word segmentation unit, and calculates the feature score of each word segmentation unit in each to-be-acquired APP from these two counts. A target to-be-acquired APP of a word segmentation unit is a to-be-acquired APP whose feature fields include that word segmentation unit. For example, Table 1 lists the to-be-acquired APPs, their word segmentation units, and the number of times each word segmentation unit appears in each to-be-acquired APP. Taking word segmentation unit a as an example: it appears 2 times in APP1, and it appears in APP1, APP2 and APP4, so the number of target to-be-acquired APPs corresponding to word segmentation unit a is 3. The feature score of word segmentation unit a in APP1 is calculated from the occurrence count 2 and the APP count 3. The feature scores of word segmentation units b and c in APP1 are calculated in the same way.
TABLE 1
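[Table 1 (image BDA0001323161130000051 in the original publication): to-be-acquired APPs, their word segmentation units, and the number of times each word segmentation unit appears in each to-be-acquired APP.]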
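The two counts described above can be sketched with Python's `collections.Counter`. The sketch is illustrative only: Table 1 is not reproduced in this text, so only APP1's counts (a = 2, b = 1, c = 3), the APPs containing unit a (APP1, APP2, APP4) and the total of 5 APPs follow the description; every other entry in `app_units`, and all variable names, are made up.

```python
from collections import Counter

# Word segmentation units per APP. APP1's counts and the set of APPs that
# contain "a" follow the text; all other entries are hypothetical.
app_units = {
    "APP1": ["a", "a", "b", "c", "c", "c"],
    "APP2": ["a", "d"],
    "APP3": ["d", "e"],
    "APP4": ["a", "e", "e"],
    "APP5": ["f"],
}

# occurrences[app][unit]: times the unit appears in that APP's feature fields.
occurrences = {app: Counter(units) for app, units in app_units.items()}

# target_app_count[unit]: number of APPs whose feature fields contain the unit.
target_app_count = Counter()
for counts in occurrences.values():
    target_app_count.update(counts.keys())

print(occurrences["APP1"]["a"], target_app_count["a"])  # 2 3
```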
On the basis of the foregoing embodiment, further, calculating the feature score of each word segmentation unit in each to-be-acquired APP according to the number of occurrences and the number of target to-be-acquired APPs includes:
according to the number of times each word segmentation unit appears in the feature fields corresponding to each to-be-acquired APP, calculating the occurrence feature value of each word segmentation unit in each to-be-acquired APP by the formula:
$$F_{i,j} = \frac{N_{i,j}}{N_i}$$
wherein $F_{i,j}$ is the occurrence feature value of the j-th word segmentation unit in the i-th to-be-acquired APP, $N_{i,j}$ is the number of times the j-th word segmentation unit appears in the feature fields corresponding to the i-th to-be-acquired APP, and $N_i$ is the total number of occurrences of all word segmentation units in the feature fields corresponding to the i-th to-be-acquired APP;
according to the number of target to-be-acquired APPs corresponding to each word segmentation unit, calculating the number feature value of the target to-be-acquired APPs corresponding to each word segmentation unit by the formula:
$$M_j = \log\frac{P}{P_j + 1}$$
wherein $M_j$ is the number feature value of the target to-be-acquired APPs corresponding to the j-th word segmentation unit, $P$ is the total number of to-be-acquired APPs, and $P_j$ is the number of target to-be-acquired APPs corresponding to the j-th word segmentation unit;
and calculating the feature score of each word segmentation unit in each to-be-acquired APP by the formula:
$$\Delta_{i,j} = F_{i,j} \times M_j$$
wherein $\Delta_{i,j}$ is the feature score of the j-th word segmentation unit in the i-th to-be-acquired APP, $F_{i,j}$ is the occurrence feature value of the j-th word segmentation unit in the i-th to-be-acquired APP, and $M_j$ is the number feature value of the target to-be-acquired APPs corresponding to the j-th word segmentation unit.
Specifically, for each to-be-acquired APP, the device counts the number of times each word segmentation unit included in the corresponding feature fields appears and the total number of occurrences of all word segmentation units, and calculates the occurrence feature value of each word segmentation unit in each to-be-acquired APP by the formula:
$$F_{i,j} = \frac{N_{i,j}}{N_i}$$
wherein $F_{i,j}$ is the occurrence feature value of the j-th word segmentation unit in the i-th to-be-acquired APP, $N_{i,j}$ is the number of times the j-th word segmentation unit appears in the feature fields corresponding to the i-th to-be-acquired APP, and $N_i$ is the total number of occurrences of all word segmentation units in the feature fields corresponding to the i-th to-be-acquired APP.
The device acquires the number of target to-be-acquired APPs corresponding to each word segmentation unit and the total number of to-be-acquired APPs, and calculates the number feature value of the target to-be-acquired APPs corresponding to each word segmentation unit by the formula:
$$M_j = \log\frac{P}{P_j + 1}$$
wherein $M_j$ is the number feature value of the target to-be-acquired APPs corresponding to the j-th word segmentation unit, $P$ is the total number of to-be-acquired APPs, and $P_j$ is the number of target to-be-acquired APPs corresponding to the j-th word segmentation unit.
From the occurrence feature value of each word segmentation unit in each to-be-acquired APP and the number feature value of the target to-be-acquired APPs corresponding to each word segmentation unit, the device calculates the feature score of each word segmentation unit in each to-be-acquired APP by the formula $\Delta_{i,j} = F_{i,j} \times M_j$, wherein $\Delta_{i,j}$ is the feature score of the j-th word segmentation unit in the i-th to-be-acquired APP, $F_{i,j}$ is the occurrence feature value of the j-th word segmentation unit in the i-th to-be-acquired APP, and $M_j$ is the number feature value of the target to-be-acquired APPs corresponding to the j-th word segmentation unit.
For example, continuing with Table 1 and taking the feature score of word segmentation unit a in APP1 as an example: the total number of occurrences of word segmentation units a, b and c in the feature fields corresponding to APP1 is 2+1+3 = 6, and word segmentation unit a appears 2 times in APP1, so the occurrence feature value of word segmentation unit a in APP1 is 2/(2+1+3) = 1/3. The total number of to-be-acquired APPs is 5, and the number of to-be-acquired APPs whose feature fields include word segmentation unit a is 3, that is, the number of target to-be-acquired APPs corresponding to word segmentation unit a is 3, so the number feature value of the target to-be-acquired APPs corresponding to word segmentation unit a is log[5/(3+1)] = log 1.25. The feature score of word segmentation unit a in APP1 is therefore (1/3) × log 1.25, which is given as 0.0291 (this figure corresponds to a base-10 logarithm with 1/3 rounded to 0.3). The device can calculate the feature score of each word segmentation unit in each to-be-acquired APP in the same way, which is not repeated here.
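The worked example can be checked with a few lines of Python. This is a sketch under stated assumptions: the text does not name the logarithm base, and base 10 is assumed here because it is the base that reproduces the quoted figures; the function name `feature_score` and its parameter names are illustrative.

```python
import math

def feature_score(n_ij, n_i, p, p_j):
    """Return (F_ij, M_j, delta_ij) for one word segmentation unit in one APP."""
    f_ij = n_ij / n_i                # share of the unit among the APP's units
    m_j = math.log10(p / (p_j + 1))  # assumption: base-10 logarithm
    return f_ij, m_j, f_ij * m_j

# Unit a in APP1: 2 occurrences out of 6, found in 3 of 5 APPs.
f, m, delta = feature_score(n_ij=2, n_i=6, p=5, p_j=3)
print(round(f, 4), round(m, 4), round(delta, 4))  # 0.3333 0.0969 0.0323
```

With the exact fraction 1/3 the score comes out at about 0.0323, whereas the figure 0.0291 quoted in the text matches 0.3 × log10(1.25), i.e. 1/3 rounded to 0.3 before multiplying. Either way the ranking of the units within APP1 is unchanged.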
On the basis of the foregoing embodiment, further, generating the identification rule of each to-be-acquired APP according to the feature scores includes:
sorting the word segmentation units included in the feature fields corresponding to each to-be-acquired APP by feature score from high to low, and taking a preset number of top-ranked word segmentation units as the feature keywords of the to-be-acquired APP;
and generating the identification rule of the to-be-acquired APP according to the feature keywords.
Specifically, the device sorts the word segmentation units included in the feature fields corresponding to each to-be-acquired APP by feature score from high to low, takes a preset number of top-ranked word segmentation units as the feature keywords of the to-be-acquired APP, and generates the identification rule of each to-be-acquired APP from the feature keywords. The preset number can be set according to the actual situation and is not specifically limited here.
For example, continuing with Table 1 and taking APP1 as an example: the feature scores of word segmentation units a, b and c included in the feature fields corresponding to APP1, calculated as described above, are 0.0291, 0.0162 and 0.0485 respectively. The device ranks the word segmentation units by feature score as c > a > b. The device may take the two top-ranked word segmentation units, c and a, as the feature keywords of APP1 and generate the following identification rule for APP1 from them: an APP whose feature fields in the run log match both word segmentation unit a and word segmentation unit c is identified as APP1. The device can generate the identification rules of the other to-be-acquired APPs in the same way; the specific flow is not repeated here.
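Keyword selection and rule matching for the APP1 example can be sketched as below. The scores are the ones quoted in the text; the function names `top_keywords` and `matches`, and the representation of a rule as a list of keywords that must all be present, are assumptions made for illustration.

```python
def top_keywords(scores, k):
    """Return the k word segmentation units with the highest feature scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

def matches(rule_keywords, units):
    """A record matches when every rule keyword appears among its units."""
    return set(rule_keywords) <= set(units)

app1_scores = {"a": 0.0291, "b": 0.0162, "c": 0.0485}  # values from the text
rule = top_keywords(app1_scores, k=2)
print(rule)                                   # ['c', 'a']
print(matches(rule, ["x", "c", "a"]))         # True
print(matches(rule, ["a", "b"]))              # False
```

Requiring all keywords to match mirrors the example rule, in which both unit a and unit c must be present; the preset number `k` trades rule specificity against the risk of missing records that contain only some of the keywords.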
On the basis of the foregoing embodiments, the method further includes:
acquiring the installation package of the to-be-acquired APP, installing the to-be-acquired APP, and running it in simulation;
acquiring the access log generated by the simulated running of the to-be-acquired APP within the preset time period;
and acquiring the feature fields from the access log and storing the feature fields.
Specifically, the device may collect information about the to-be-acquired APP (such as its name, site category and download link) from an application market, download the installation package of the to-be-acquired APP, install the APP on a simulator, set the networking IP of the simulator to the IP of a proxy server, and start the APP. A log monitoring program monitors the access log output by the proxy server. After the preset time period, the device stops the to-be-acquired APP, collects all access logs generated by the APP while running within the preset time period, extracts the URL field and the UA field from the access logs as the feature fields, and stores the feature fields in a specified storage location. When the device acquires the identification rule of the to-be-acquired APP, it can read the feature fields from the specified storage location. The device obtains and stores the feature fields of the plurality of to-be-acquired APPs in the same way, and details are not repeated here.
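Extracting the URL and UA fields from the proxy's access log might look like the sketch below. The patent does not describe the log format, so this assumes, purely for illustration, that the proxy writes one JSON object per line with hypothetical keys `url` and `user_agent`; the sample lines are made up.

```python
import json

def extract_feature_fields(log_lines):
    """Collect (URL, UA) pairs from proxy access-log lines."""
    fields = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting the run
        if not isinstance(record, dict):
            continue
        url = record.get("url")          # hypothetical key name
        ua = record.get("user_agent")    # hypothetical key name
        if url and ua:
            fields.append((url, ua))
    return fields

sample = [
    '{"url": "http://api.example-app.com/v2/feed", "user_agent": "ExampleApp/3.1"}',
    "not json",
    '{"url": "http://cdn.example-app.com/img/1.png", "user_agent": "ExampleApp/3.1"}',
]
feature_fields = extract_feature_fields(sample)
print(len(feature_fields))  # 2
```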
The method for acquiring APP identification rules provided by this embodiment acquires the feature fields included in the access logs generated by running a plurality of to-be-acquired APPs within a preset time period, acquires the word segmentation units included in the feature fields, and calculates the feature score of each word segmentation unit in each to-be-acquired APP, so that an identification rule is generated for each to-be-acquired APP according to the feature scores. This improves the efficiency of acquiring APP identification rules.
Fig. 2 is a schematic overall flowchart of a method for acquiring APP identification rules according to an embodiment of the present invention. As shown in Fig. 2, the method specifically includes the following steps:
S201, collecting information about the to-be-acquired APP; the device can collect the information about the to-be-acquired APP from an application market, including its name, site category, download link and other information; then step S202 is executed;
S202, downloading the to-be-acquired APP and installing it on the simulator; the device downloads the installation package of the to-be-acquired APP, installs the APP on the simulator, and sets the networking IP of the simulator to the IP of the proxy server; then step S203 is executed;
S203, running the to-be-acquired APP in simulation; the to-be-acquired APP is started, and a log monitoring program monitors the access log output by the proxy server; then step S204 is executed;
S204, judging whether the running time has reached the preset time period; if the device judges that the running time has reached the preset time period, it stops the to-be-acquired APP and then executes step S205; otherwise, it returns to step S203;
S205, acquiring the access logs; the device collects all access logs generated by running the to-be-acquired APP within the preset time period, and then executes step S206;
S206, extracting and storing the feature fields of the to-be-acquired APP; the device extracts the URL field and the UA field from the access logs as the feature fields, saves them to a specified storage location, and then executes step S207;
S207, judging whether the number of to-be-acquired APPs that have been run in simulation has reached a threshold; if the device judges that the number has reached the threshold, it executes step S208; otherwise, it returns to step S201;
S208, acquiring the feature fields of the plurality of to-be-acquired APPs; the device reads the feature fields of the plurality of to-be-acquired APPs from the specified storage location, and then executes step S209;
S209, acquiring the word segmentation units included in the feature fields corresponding to each to-be-acquired APP; the device performs word segmentation and stop-word removal on the URL field and the UA field corresponding to each to-be-acquired APP to obtain the word segmentation units included in the feature fields, and then executes step S210;
S210, counting the number of times each word segmentation unit appears in the feature fields corresponding to each to-be-acquired APP, and counting the number of target to-be-acquired APPs corresponding to each word segmentation unit; then step S211 is executed;
S211, calculating the feature score of each word segmentation unit in each to-be-acquired APP; the device calculates the feature score of each word segmentation unit in each to-be-acquired APP from the number of times the word segmentation unit appears in the feature fields corresponding to each to-be-acquired APP and the number of target to-be-acquired APPs corresponding to the word segmentation unit, and then executes step S212;
S212, generating the identification rule of each to-be-acquired APP according to the feature scores; the device sorts the word segmentation units corresponding to each to-be-acquired APP by feature score from high to low, takes a preset number of top-ranked word segmentation units as the feature keywords of the to-be-acquired APP, and generates the identification rule of the to-be-acquired APP from the feature keywords.
The method for acquiring APP identification rules provided by this embodiment acquires the feature fields included in the access logs generated by running a plurality of to-be-acquired APPs within a preset time period, acquires the word segmentation units included in the feature fields, and calculates the feature score of each word segmentation unit in each to-be-acquired APP, so that an identification rule is generated for each to-be-acquired APP according to the feature scores. This improves the efficiency of acquiring APP identification rules.
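The collection loop of the overall flow (gather one APP at a time until a threshold number of APPs has been run, then hand the stored fields to the scoring steps) can be outlined as follows. This is a control-flow sketch only: `collect_until_threshold`, the `run_and_collect` callable, and the `fake_run` stand-in are hypothetical, and the real installation, simulation and proxy logging are not modelled.

```python
def collect_until_threshold(candidates, run_and_collect, threshold):
    """Run candidate APPs one by one until `threshold` of them have been simulated."""
    store = {}                                # stands in for the storage location
    for app in candidates:                    # next APP's information
        store[app] = run_and_collect(app)     # install, run, extract, save
        if len(store) >= threshold:           # threshold check on simulated APPs
            break
    return store                              # fields read back for scoring

def fake_run(app):
    """Hypothetical stand-in that returns one (URL, UA) pair per APP."""
    return [("http://%s.example/api" % app.lower(), "%s/1.0" % app)]

store = collect_until_threshold(["APP1", "APP2", "APP3"], fake_run, threshold=2)
print(sorted(store))  # ['APP1', 'APP2']
```

Passing the per-APP work in as a callable keeps the loop testable without a simulator; in the described flow that callable would wrap the download, installation, timed run and field extraction.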
Fig. 3 is a schematic structural diagram of an apparatus for obtaining an APP identification rule according to an embodiment of the present invention, and as shown in fig. 3, the apparatus for obtaining an APP identification rule according to an embodiment of the present invention includes an obtaining unit 301, a calculating unit 302, and a processing unit 303, where:
the obtaining unit 301 is configured to obtain a feature field included in an access log generated by running multiple APPs to be obtained within a preset time period, where the feature field includes a URL field and a UA field; the calculating unit 302 is configured to obtain word segmentation units included in the feature field, and calculate a feature score corresponding to each word segmentation unit in each APP to be obtained; the processing unit 303 is configured to generate an identification rule corresponding to each APP to be acquired according to the feature score.
Specifically, the obtaining unit 301 obtains a plurality of access logs generated by running the APPs to be obtained within a preset time period, and extracts the URL field and the UA field from the access logs as the feature fields; the feature fields may also include other fields, adjusted according to the actual situation, which is not specifically limited here. The calculating unit 302 preprocesses the URL field and the UA field corresponding to each APP to be obtained, for example by word segmentation and stop-word removal, to obtain the word segmentation units included in the feature fields. The calculating unit 302 then counts the number of times each word segmentation unit appears in the feature field corresponding to each APP to be obtained, and the number of target APPs to be obtained corresponding to each word segmentation unit, and calculates from these two counts the feature score of each word segmentation unit in each APP to be obtained. A target APP to be obtained for a word segmentation unit is an APP to be obtained whose corresponding feature field includes that word segmentation unit. The processing unit 303 sorts the word segmentation units corresponding to each APP to be obtained by feature score from high to low, takes the preset number of top-ranked word segmentation units as the feature keywords of that APP, and generates the identification rule of the APP from those feature keywords.
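The preprocessing performed by the calculating unit 302 might look like the following Python sketch. The log record layout (a dict with `url` and `ua` keys), the stop-word list, the splitting on non-alphanumeric characters, and the dropping of purely numeric tokens are all assumptions; the source only states that word segmentation and stop-word removal are applied.

```python
import re
from typing import Dict, List

# Illustrative stop words only; a real list would be tuned to the log data.
STOP_WORDS = {"http", "https", "www", "com", "cn", "net", "mozilla", "html"}


def tokenize(field: str) -> List[str]:
    """Split a URL or UA string into lower-case word segmentation units."""
    tokens = re.split(r"[^0-9a-z]+", field.lower())
    # Drop empty strings, stop words, and purely numeric tokens (an assumption).
    return [t for t in tokens if t and t not in STOP_WORDS and not t.isdigit()]


def feature_tokens(log: Dict[str, str]) -> List[str]:
    """Tokenize the URL and UA feature fields of one access-log record."""
    return tokenize(log.get("url", "")) + tokenize(log.get("ua", ""))
```

For example, the record `{"url": "http://api.weibo.cn/2/statuses?id=123", "ua": "Weibo/7.0 (iPhone)"}` yields `["api", "weibo", "statuses", "id", "weibo", "iphone"]`.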
According to the device for acquiring the APP identification rules, the characteristic fields included in the access logs generated by the operation of the multiple APPs to be acquired within the preset time period are acquired, the word segmentation units included in the characteristic fields are acquired, and the characteristic scores of the word segmentation units corresponding to the APPs to be acquired are calculated, so that the identification rules corresponding to the APPs to be acquired are generated according to the characteristic scores, and the acquisition efficiency of the APP identification rules is improved.
Fig. 4 is a schematic structural diagram of an apparatus for obtaining an APP identification rule according to another embodiment of the present invention. As shown in fig. 4, the apparatus for obtaining an APP identification rule according to the embodiment of the present invention includes an obtaining unit 401, a calculating unit 402, and a processing unit 403, where the obtaining unit 401 and the processing unit 403 are the same as the obtaining unit 301 and the processing unit 303 in the foregoing embodiment, and the calculating unit 402 includes an obtaining subunit 404, a statistics subunit 405, and a calculating subunit 406, where:
the obtaining subunit 404 is configured to obtain the word segmentation units included in the feature field corresponding to each APP to be obtained; the statistics subunit 405 is configured to count the number of times each word segmentation unit appears in the feature field corresponding to each APP to be obtained, and the number of target APPs to be obtained corresponding to each word segmentation unit, where a target APP to be obtained is an APP to be obtained whose corresponding feature field includes the word segmentation unit; and the calculating subunit 406 is configured to calculate, according to the number of times and the number, the feature score corresponding to each word segmentation unit in each APP to be obtained.
Specifically, the obtaining subunit 404 obtains the word segmentation units included in the feature field corresponding to each APP to be obtained. The statistics subunit 405 counts the number of times each word segmentation unit appears in the feature field corresponding to each APP to be obtained, and the number of target APPs to be obtained corresponding to each word segmentation unit. The calculating subunit 406 then calculates the feature score of each word segmentation unit in each APP to be obtained from these two counts. A target APP to be obtained for a word segmentation unit is an APP to be obtained whose corresponding feature field includes that word segmentation unit.
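The two statistics gathered by the statistics subunit 405 can be sketched in Python as follows. The function name `count_units` and the input layout (a mapping from APP name to its list of word segmentation units) are hypothetical; only the two quantities being counted come from the text.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_units(
    tokens_by_app: Dict[str, List[str]],
) -> Tuple[Dict[str, Counter], Counter]:
    """Count per-APP occurrences and, per unit, the number of target APPs."""
    # occurrences[app][unit]: times the unit appears in that APP's feature fields.
    occurrences = {app: Counter(tokens) for app, tokens in tokens_by_app.items()}
    # target_apps[unit]: number of APPs whose feature fields include the unit.
    target_apps: Counter = Counter()
    for counter in occurrences.values():
        target_apps.update(counter.keys())
    return occurrences, target_apps
```

The total number of occurrences for one APP is then `sum(occurrences[app].values())`.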
According to the device for acquiring the APP identification rules, the characteristic fields included in the access logs generated by the operation of the multiple APPs to be acquired within the preset time period are acquired, the word segmentation units included in the characteristic fields are acquired, and the characteristic scores of the word segmentation units corresponding to the APPs to be acquired are calculated, so that the identification rules corresponding to the APPs to be acquired are generated according to the characteristic scores, and the acquisition efficiency of the APP identification rules is improved.
On the basis of the foregoing embodiment, further, the calculating subunit 406 is specifically configured to:
according to the number of times each word segmentation unit appears in the characteristic field corresponding to each APP to be obtained, and according to the formula:
(formula image BDA0001323161130000131, giving $F_{i,j}$ in terms of $N_{i,j}$ and $N_i$)
calculating the times characteristic value corresponding to each word segmentation unit in each APP to be obtained; wherein $F_{i,j}$ is the times characteristic value of the jth word segmentation unit in the ith APP to be obtained, $N_{i,j}$ is the number of times the jth word segmentation unit appears in the characteristic field corresponding to the ith APP to be obtained, and $N_i$ is the total number of occurrences of all word segmentation units in the characteristic field corresponding to the ith APP to be obtained;
according to the number of target APPs to be obtained corresponding to each word segmentation unit, and according to the formula:
(formula image BDA0001323161130000132, giving $M_j$ in terms of $P$ and $P_j$)
calculating the number characteristic value of the target APPs to be obtained corresponding to each word segmentation unit; wherein $M_j$ is the number characteristic value of the target APPs to be obtained corresponding to the jth word segmentation unit, $P$ is the total number of APPs to be obtained, and $P_j$ is the number of target APPs to be obtained corresponding to the jth word segmentation unit;
according to the formula:
$\Delta_{i,j} = F_{i,j} \times M_j$
calculating the characteristic score corresponding to each word segmentation unit in each APP to be obtained; wherein $\Delta_{i,j}$ is the characteristic score of the jth word segmentation unit in the ith APP to be obtained, $F_{i,j}$ is the times characteristic value of the jth word segmentation unit in the ith APP to be obtained, and $M_j$ is the number characteristic value of the target APPs to be obtained corresponding to the jth word segmentation unit.
Specifically, for each APP to be obtained, the calculating subunit 406 counts the number of times each word segmentation unit included in the corresponding characteristic field appears, and the total number of occurrences of all word segmentation units, and, according to the formula:
(formula image BDA0001323161130000133, giving $F_{i,j}$ in terms of $N_{i,j}$ and $N_i$)
calculates the times characteristic value corresponding to each word segmentation unit in each APP to be obtained; wherein $F_{i,j}$ is the times characteristic value of the jth word segmentation unit in the ith APP to be obtained, $N_{i,j}$ is the number of times the jth word segmentation unit appears in the characteristic field corresponding to the ith APP to be obtained, and $N_i$ is the total number of occurrences of all word segmentation units in the characteristic field corresponding to the ith APP to be obtained.
The calculating subunit 406 obtains the number of target APPs to be obtained corresponding to each word segmentation unit and the total number of APPs to be obtained, and, according to the formula:
(formula image BDA0001323161130000141, giving $M_j$ in terms of $P$ and $P_j$)
calculates the number characteristic value of the target APPs to be obtained corresponding to each word segmentation unit; wherein $M_j$ is the number characteristic value of the target APPs to be obtained corresponding to the jth word segmentation unit, $P$ is the total number of APPs to be obtained, and $P_j$ is the number of target APPs to be obtained corresponding to the jth word segmentation unit.
From the times characteristic value of each word segmentation unit in each APP to be obtained and the number characteristic value of the target APPs to be obtained corresponding to each word segmentation unit, the calculating subunit 406 calculates, according to the formula $\Delta_{i,j} = F_{i,j} \times M_j$, the characteristic score corresponding to each word segmentation unit in each APP to be obtained; wherein $\Delta_{i,j}$ is the characteristic score of the jth word segmentation unit in the ith APP to be obtained, $F_{i,j}$ is the times characteristic value of the jth word segmentation unit in the ith APP to be obtained, and $M_j$ is the number characteristic value of the target APPs to be obtained corresponding to the jth word segmentation unit.
According to the device for acquiring the APP identification rules, the characteristic fields included in the access logs generated by the operation of the multiple APPs to be acquired within the preset time period are acquired, the word segmentation units included in the characteristic fields are acquired, and the characteristic scores of the word segmentation units corresponding to the APPs to be acquired are calculated, so that the identification rules corresponding to the APPs to be acquired are generated according to the characteristic scores, and the acquisition efficiency of the APP identification rules is improved.
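The score calculation performed by the calculating subunit 406 can be sketched in Python under stated assumptions. The two formula images are not available in this text, so the sketch assumes the conventional TF-IDF forms, $F_{i,j} = N_{i,j} / N_i$ and $M_j = \log(P / P_j)$ with a natural logarithm; only the product $\Delta_{i,j} = F_{i,j} \times M_j$ is taken directly from the description. The function name and input layout are hypothetical.

```python
import math
from typing import Dict


def feature_scores(
    occurrences: Dict[str, Dict[str, int]],
    target_apps: Dict[str, int],
) -> Dict[str, Dict[str, float]]:
    """Compute delta[i][j] = F[i][j] * M[j] for every APP i and unit j."""
    total_apps = len(occurrences)  # P
    scores: Dict[str, Dict[str, float]] = {}
    for app, counts in occurrences.items():
        n_i = sum(counts.values())  # N_i, total occurrences in this APP
        scores[app] = {
            # Assumed forms: F = N_ij / N_i and M = ln(P / P_j).
            unit: (n_ij / n_i) * math.log(total_apps / target_apps[unit])
            for unit, n_ij in counts.items()
        }
    return scores
```

Under these assumed forms, a word segmentation unit that appears in every APP to be obtained gets a score of zero, which matches the intent of favouring units that are frequent in one APP and rare in the others.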
The embodiment of the apparatus provided in the present invention may be specifically configured to execute the processing flows of the above method embodiments, and the functions of the apparatus are not described herein again, and refer to the detailed description of the above method embodiments.
Fig. 5 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in fig. 5, the electronic device may include a processor 501, a memory 502 and a bus 503, wherein the processor 501 and the memory 502 communicate with each other through the bus 503. The processor 501 may call the computer program in the memory 502 to perform the following method: obtaining characteristic fields included in access logs generated by running a plurality of APPs to be obtained within a preset time period, wherein the characteristic fields include a URL field and a UA field; acquiring word segmentation units included in the characteristic fields, and calculating a characteristic score corresponding to each word segmentation unit in each APP to be obtained; and generating an identification rule corresponding to each APP to be obtained according to the characteristic scores.
An embodiment of the present invention discloses a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions; when the program instructions are executed by a computer, the computer can execute the methods provided by the above method embodiments, for example: obtaining characteristic fields included in access logs generated by running a plurality of APPs to be obtained within a preset time period, wherein the characteristic fields include a URL field and a UA field; acquiring word segmentation units included in the characteristic fields, and calculating a characteristic score corresponding to each word segmentation unit in each APP to be obtained; and generating an identification rule corresponding to each APP to be obtained according to the characteristic scores.
An embodiment of the present invention provides a non-transitory computer-readable storage medium, which stores a computer program that causes a computer to execute the methods provided by the foregoing method embodiments, for example: obtaining characteristic fields included in access logs generated by running a plurality of APPs to be obtained within a preset time period, wherein the characteristic fields include a URL field and a UA field; acquiring word segmentation units included in the characteristic fields, and calculating a characteristic score corresponding to each word segmentation unit in each APP to be obtained; and generating an identification rule corresponding to each APP to be obtained according to the characteristic scores.
In addition, the logic instructions in the memory 502 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (6)

1. A method for acquiring an APP identification rule is characterized by comprising the following steps:
the method comprises the steps of obtaining characteristic fields included in access logs generated by running of a plurality of to-be-obtained APPs in a preset time period, wherein the characteristic fields include URL fields and UA fields;
acquiring word segmentation units included in the characteristic field, and calculating a characteristic score corresponding to each word segmentation unit in each APP to be acquired;
generating an identification rule corresponding to each APP to be obtained according to the characteristic score;
the obtaining of the word segmentation units included in the characteristic field and the calculation of the corresponding characteristic score of each word segmentation unit in each APP to be obtained includes:
respectively acquiring word segmentation units included in the characteristic fields corresponding to the to-be-acquired APPs;
counting the number of times each word segmentation unit appears in the characteristic field corresponding to each APP to be obtained and the number of target APPs to be obtained corresponding to each word segmentation unit, wherein a target APP to be obtained is an APP to be obtained whose corresponding characteristic field includes the word segmentation unit;
calculating the corresponding characteristic score of each word segmentation unit in each APP to be obtained according to the times and the number;
calculating the corresponding feature score of each word segmentation unit in each APP to be obtained according to the times and the number, wherein the calculation comprises the following steps:
according to the number of times each word segmentation unit appears in the characteristic field corresponding to each APP to be obtained, and according to the formula:
(formula image FDA0003165070550000011, giving $F_{i,j}$ in terms of $N_{i,j}$ and $N_i$)
calculating the times characteristic value corresponding to each word segmentation unit in each APP to be obtained; wherein $F_{i,j}$ is the times characteristic value of the jth word segmentation unit in the ith APP to be obtained, $N_{i,j}$ is the number of times the jth word segmentation unit appears in the characteristic field corresponding to the ith APP to be obtained, and $N_i$ is the total number of occurrences of all word segmentation units in the characteristic field corresponding to the ith APP to be obtained;
according to the number of target APPs to be obtained corresponding to each word segmentation unit, and according to the formula:
(formula image FDA0003165070550000021, giving $M_j$ in terms of $P$ and $P_j$)
calculating the number characteristic value of the target APPs to be obtained corresponding to each word segmentation unit; wherein $M_j$ is the number characteristic value of the target APPs to be obtained corresponding to the jth word segmentation unit, $P$ is the total number of APPs to be obtained, and $P_j$ is the number of target APPs to be obtained corresponding to the jth word segmentation unit;
according to the formula:
$\Delta_{i,j} = F_{i,j} \times M_j$
calculating the characteristic score corresponding to each word segmentation unit in each APP to be obtained; wherein $\Delta_{i,j}$ is the characteristic score of the jth word segmentation unit in the ith APP to be obtained, $F_{i,j}$ is the times characteristic value of the jth word segmentation unit in the ith APP to be obtained, and $M_j$ is the number characteristic value of the target APPs to be obtained corresponding to the jth word segmentation unit.
2. The method according to claim 1, wherein the generating identification rules corresponding to the APPs to be obtained according to the feature scores includes:
sorting word segmentation units included in the characteristic field corresponding to each APP to be obtained according to the characteristic score from high to low, and taking the word segmentation units with the preset number in the front of the sorting as the characteristic keywords of the APP to be obtained;
and generating identification rules of the APP to be obtained according to the characteristic keywords.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
acquiring the installation package of the APP to be obtained, installing the APP to be obtained, and running it in simulation;
obtaining an access log generated by the simulation operation of the APP to be obtained in a preset time period;
and acquiring the characteristic field according to the access log, and storing the characteristic field.
4. An apparatus for obtaining APP identification rules, comprising:
an obtaining unit, configured to obtain characteristic fields included in access logs generated by running a plurality of APPs to be obtained within a preset time period, wherein the characteristic fields include a URL field and a UA field;
the calculation unit is used for acquiring word segmentation units included in the characteristic field and calculating the corresponding characteristic score of each word segmentation unit in each APP to be acquired;
the processing unit is used for generating identification rules corresponding to the APPs to be obtained according to the characteristic scores;
the calculation unit includes:
the obtaining subunit is configured to obtain word segmentation units included in the feature field corresponding to each APP to be obtained, respectively;
a counting subunit, configured to count the number of times each word segmentation unit appears in the characteristic field corresponding to each APP to be obtained, and the number of target APPs to be obtained corresponding to each word segmentation unit, where a target APP to be obtained is an APP to be obtained whose corresponding characteristic field includes the word segmentation unit;
the calculating subunit is configured to calculate, according to the times and the number, a feature score corresponding to each word segmentation unit in each APP to be acquired;
the calculation subunit is specifically configured to:
according to the number of times each word segmentation unit appears in the characteristic field corresponding to each APP to be obtained, and according to the formula:
(formula image FDA0003165070550000031, giving $F_{i,j}$ in terms of $N_{i,j}$ and $N_i$)
calculating the times characteristic value corresponding to each word segmentation unit in each APP to be obtained; wherein $F_{i,j}$ is the times characteristic value of the jth word segmentation unit in the ith APP to be obtained, $N_{i,j}$ is the number of times the jth word segmentation unit appears in the characteristic field corresponding to the ith APP to be obtained, and $N_i$ is the total number of occurrences of all word segmentation units in the characteristic field corresponding to the ith APP to be obtained;
according to the number of target APPs to be obtained corresponding to each word segmentation unit, and according to the formula:
(formula image FDA0003165070550000032, giving $M_j$ in terms of $P$ and $P_j$)
calculating the number characteristic value of the target APPs to be obtained corresponding to each word segmentation unit; wherein $M_j$ is the number characteristic value of the target APPs to be obtained corresponding to the jth word segmentation unit, $P$ is the total number of APPs to be obtained, and $P_j$ is the number of target APPs to be obtained corresponding to the jth word segmentation unit;
according to the formula:
$\Delta_{i,j} = F_{i,j} \times M_j$
calculating the characteristic score corresponding to each word segmentation unit in each APP to be obtained; wherein $\Delta_{i,j}$ is the characteristic score of the jth word segmentation unit in the ith APP to be obtained, $F_{i,j}$ is the times characteristic value of the jth word segmentation unit in the ith APP to be obtained, and $M_j$ is the number characteristic value of the target APPs to be obtained corresponding to the jth word segmentation unit.
5. An electronic device comprising a processor, a memory, and a bus, wherein:
the processor and the memory complete mutual communication through a bus;
the processor may invoke a computer program in memory to perform the steps of the method of any of claims 1-3.
6. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 3.
CN201710453676.XA 2017-06-15 2017-06-15 Method and device for acquiring APP identification rule Active CN109144831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710453676.XA CN109144831B (en) 2017-06-15 2017-06-15 Method and device for acquiring APP identification rule

Publications (2)

Publication Number Publication Date
CN109144831A CN109144831A (en) 2019-01-04
CN109144831B true CN109144831B (en) 2021-10-29

Family

ID=64830160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710453676.XA Active CN109144831B (en) 2017-06-15 2017-06-15 Method and device for acquiring APP identification rule

Country Status (1)

Country Link
CN (1) CN109144831B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245273B (en) * 2019-06-21 2021-04-30 武汉绿色网络信息服务有限责任公司 Method for acquiring APP service feature library and corresponding device
CN112839004B (en) * 2019-11-22 2022-09-06 中国电信股份有限公司 Application identification method and device
CN111740923A (en) * 2020-06-22 2020-10-02 北京神州泰岳智能数据技术有限公司 Method and device for generating application identification rule, electronic equipment and storage medium
CN114500309B (en) * 2022-04-13 2022-07-08 南京华飞数据技术有限公司 Network application flow automatic configuration recognition system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298735A (en) * 2014-09-30 2015-01-21 北京金山安全软件有限公司 Method and device for identifying application program type
CN104331662A (en) * 2013-07-22 2015-02-04 深圳市腾讯计算机系统有限公司 Method and device for detecting Android malicious application
CN104618132A (en) * 2014-12-16 2015-05-13 北京神州绿盟信息安全科技股份有限公司 Generation method and generation device for application program recognition rule

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9152674B2 (en) * 2012-04-27 2015-10-06 Quixey, Inc. Performing application searches

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
程骏.面向移动互联网的文本分类技术应用研究.《中国优秀硕士学位论文全文数据库信息科技辑》.2017,(第02期), *
面向移动互联网的文本分类技术应用研究;程骏;《中国优秀硕士学位论文全文数据库信息科技辑》;20170215(第02期);I138-4563 *

Also Published As

Publication number Publication date
CN109144831A (en) 2019-01-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant