CN108376071B - APP identification method and system - Google Patents

APP identification method and system

Info

Publication number
CN108376071B
Authority
CN
China
Legal status
Active
Application number
CN201610994224.8A
Other languages
Chinese (zh)
Other versions
CN108376071A (en)
Inventor
楼弘
庞夫星
许鑫伶
许大虎
杜建雄
李晓平
梅铮
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Hangzhou Information Technology Co Ltd
Application filed by China Mobile Communications Group Co Ltd and China Mobile Hangzhou Information Technology Co Ltd
Priority to CN201610994224.8A
Publication of CN108376071A
Application granted
Publication of CN108376071B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/53 Decompilation; Disassembly
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiments of the invention disclose an APP identification method and system. In the method, the system crawls an APP download website in a preset manner to obtain APP download links; downloads the APP compressed packages via those links; parses each compressed package by decompilation to obtain the URL data inside the APP, and establishes an APP_URL correspondence between the APP and that URL data; and, on receiving user data containing URL request information, identifies the APP corresponding to the user data by matching the URL request information against the APP_URL correspondence. This addresses the low efficiency, difficulty of automation and incomplete coverage of existing user APP identification.

Description

APP identification method and system
Technical Field
The invention relates to the field of big data, in particular to an APP identification method and system.
Background
With advances in smartphone hardware and software and in internet technology, the mobile internet has developed rapidly thanks to its convenience, speed and anytime, anywhere access. As the largest mobile operator, China Mobile holds massive volumes of user internet access data. Identifying the mobile phone applications (APPs) that users run from this data supports the analysis of user online behavior, precision marketing and similar uses, and therefore has both theoretical research value and practical application value.
At present, APPs are mainly identified as follows: the traffic of each APP is captured and analyzed manually to obtain the Uniform Resource Locator (URL) data the APP uses; URLs that cannot be attributed are discarded once accurate URL data has been obtained, and an APP_URL database is built from the attributable URLs. When Deep Packet Inspection (DPI) data is analyzed, the URL request information contained in the user data is compared against this APP_URL database to identify the APP the user is running.
Although, with extensive manual auditing, the prior art can accurately determine which APP a portion of the URLs in user data belongs to, manual identification is inefficient, hard to automate and incomplete; against the massive number of APPs on the mobile internet it is a drop in the bucket.
Disclosure of Invention
To solve these technical problems, embodiments of the present invention provide an APP identification method and system that address the low efficiency, difficulty of automation and incomplete coverage of user APP identification.
The technical scheme of the invention is realized as follows:
In a first aspect, an embodiment of the present invention provides an APP identification method for use in an APP identification system. The method includes:
the system crawls an APP download website in a preset manner to obtain APP download links;
the system downloads the APP compressed package via each APP download link;
the system parses the APP compressed package by decompilation, obtains the URL data inside the APP, and establishes an APP_URL correspondence between the APP and the URL data inside the APP;
the system receives user data containing URL request information and identifies the APP corresponding to the user data according to the URL request information in the user data and the APP_URL correspondence.
In the above scheme, crawling in the preset manner proceeds in four steps: first-level classification definition, second-level classification crawling, list information crawling and detail information crawling.
Further, crawling the APP download website in the preset manner to obtain APP download links specifically includes:
the system obtains an APP download website and uses the website's page address as the input to crawling;
the system classifies the applications on the website pages in the preset manner;
when the system requests a specific application category, it obtains all APPs in that category;
when the system requests the detail page of a specific APP in that category, it obtains the APP download link from that detail page.
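The four crawling stages can be sketched as below. This is a minimal illustration, not the patented crawler: the link patterns (`/category/...`, `/app/...`, links ending in `.apk`) and the class name `AppLinkCrawler` are hypothetical, and page fetching is injected as a function so the sketch needs no network access.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical four-stage crawler for one APP download website. */
final class AppLinkCrawler {
    // ASSUMED HTML link shapes; every real download site needs its own patterns.
    private static final Pattern CATEGORY_LINK = Pattern.compile("href=\"(/category/[^\"]+)\"");
    private static final Pattern DETAIL_LINK = Pattern.compile("href=\"(/app/[^\"]+)\"");
    private static final Pattern DOWNLOAD_LINK = Pattern.compile("href=\"([^\"]+\\.apk)\"");

    private final Function<String, String> fetch; // page address -> HTML, or null if unavailable

    AppLinkCrawler(Function<String, String> fetch) {
        this.fetch = fetch;
    }

    /** Offline fetcher backed by an in-memory map of page address -> HTML. */
    static AppLinkCrawler fromPages(Map<String, String> pages) {
        return new AppLinkCrawler(pages::get);
    }

    List<String> crawl(String siteAddress) {
        Set<String> downloadLinks = new LinkedHashSet<>();
        // S2011/S2012: the site page address is the crawl input; its category links give the classification.
        for (String category : extract(CATEGORY_LINK, fetch.apply(siteAddress))) {
            // S2013: requesting one category page lists all APPs in that category.
            for (String detail : extract(DETAIL_LINK, fetch.apply(siteAddress + category))) {
                // S2014: requesting an APP detail page yields its download link.
                downloadLinks.addAll(extract(DOWNLOAD_LINK, fetch.apply(siteAddress + detail)));
            }
        }
        return new ArrayList<>(downloadLinks);
    }

    private static List<String> extract(Pattern pattern, String html) {
        List<String> found = new ArrayList<>();
        if (html == null) {
            return found;
        }
        Matcher m = pattern.matcher(html);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }
}
```

Keeping the site-specific knowledge in three patterns is one way to get the "unified crawling mode" the description claims: a new site would, under this assumption, only need new patterns rather than a new crawler.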
In the above scheme, the decompilation uses the decompression tools of Java's java.util.zip package to parse the files in the APP installation package into character code streams.
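A minimal sketch of this java.util.zip step, assuming an Android APK (which is a ZIP archive) and assuming that "character code stream" means each entry's bytes decoded as text and split into lines. The class `ApkTextDumper` is hypothetical. Note that this is archive-level parsing, not true decompilation: strings that a package stores as UTF-16 (for example in binary XML) will not appear as contiguous ASCII text here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ApkTextDumper {
    /** Reads every file entry of an APK (a ZIP archive) and returns its content decoded as text lines. */
    static List<String> dumpLines(InputStream apk) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(apk)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                byte[] bytes = zip.readAllBytes(); // stops at the end of the current entry
                // ISO-8859-1 maps each byte to one char, so binary files such as classes.dex decode without errors.
                String text = new String(bytes, StandardCharsets.ISO_8859_1);
                for (String line : text.split("\\R")) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    /** Convenience overload for a package already held in memory after download. */
    static List<String> dumpLines(byte[] apkBytes) {
        try {
            return dumpLines(new ByteArrayInputStream(apkBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```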
In the above scheme, parsing the APP compressed package by decompilation, obtaining the URL data inside the APP and establishing the APP_URL correspondence specifically includes:
the system parses the APP compressed package by decompilation to obtain the character code stream of the package;
the system builds a regular expression for matching URL data inside an APP and matches it against the character code stream line by line;
when the regular expression matches the character code stream, the system takes the matched content as URL data inside the APP;
the system establishes the APP_URL correspondence between the APP and the URL data inside the APP.
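The line-by-line regular-expression matching and the resulting APP_URL correspondence might look as follows. The pattern is an assumption (an http/https scheme, a host, an optional port and an optional path); the patent only says the expression contains the key characters of a URL. `UrlExtractor` and the `Map<String, Set<String>>` representation of the correspondence are likewise illustrative choices.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlExtractor {
    // ASSUMED "key characters": an http/https scheme, a host, an optional port and an optional path/query.
    private static final Pattern URL =
            Pattern.compile("https?://[\\w.-]+(?::\\d+)?(?:/[\\w./?%&=~+#-]*)?");

    /** Match each line against the regex and keep only the matched substrings as in-APP URLs. */
    static Set<String> extract(List<String> lines) {
        Set<String> urls = new LinkedHashSet<>();
        for (String line : lines) {
            Matcher m = URL.matcher(line);
            while (m.find()) {
                urls.add(m.group());
            }
        }
        return urls;
    }

    /** Add the URLs found in one APP's package to the APP_URL correspondence (APP -> URLs). */
    static void record(Map<String, Set<String>> appUrl, String app, List<String> lines) {
        appUrl.computeIfAbsent(app, k -> new LinkedHashSet<>()).addAll(extract(lines));
    }
}
```

Keeping only the matched substring, rather than the whole line, follows the clarification given later for step S2033.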
In the above scheme, receiving user data containing URL request information and identifying the corresponding APP according to that information and the APP_URL correspondence specifically includes:
the system receives user data containing URL request information, parses it and obtains the URL request information in the user data;
when the URL request information in the user data matches URL data in the APP_URL correspondence, the system identifies the APP corresponding to the user data according to the matched URL data.
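One way to make the matching step fast is to invert the APP_URL correspondence into a URL-to-APP index, as sketched below under the assumption of exact string matching (the embodiment later states that a match means the two are "the same"). `AppIdentifier` is a hypothetical name; returning a set shows why a URL shared by several APPs calls for the data cleaning described next.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class AppIdentifier {
    private final Map<String, Set<String>> urlToApps = new HashMap<>();

    /** Inverts an APP_URL correspondence (APP -> URLs) into URL -> APPs for constant-time lookup. */
    AppIdentifier(Map<String, Set<String>> appUrl) {
        appUrl.forEach((app, urls) -> {
            for (String url : urls) {
                urlToApps.computeIfAbsent(url, k -> new TreeSet<>()).add(app);
            }
        });
    }

    /** Exact match of a request URL; returns every APP whose package contains that URL. */
    Set<String> identify(String requestUrl) {
        return Collections.unmodifiableSet(urlToApps.getOrDefault(requestUrl, Set.of()));
    }
}
```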
In the above scheme, when the system has obtained APP_URL correspondences for multiple APPs and the same URL data corresponds to more than one APP, the method further includes: the system performs data cleaning on the obtained APP_URL correspondences.
Further, the data cleaning specifically includes:
the system collects, from the APP_URL correspondences, those that share the same URL data;
for any two APPs sharing the same URL data, the system strips the suffixes from their names by string matching, obtaining two suffix-free names;
the system queries the minimum string of the two suffix-free names to obtain the minimum string length;
the system determines which of the two suffix-free names is shorter and obtains its length;
the system takes the ratio of the minimum string length to the length of the shorter name as the similarity;
when the similarity is smaller than a preset threshold, the system replaces the correspondences between the two APPs and the URL data with a single APP_URL correspondence between the APP with the shorter suffix-free name and the URL data.
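The text does not define "minimum string". The sketch below ASSUMES it means the minimum edit (Levenshtein) distance between the two suffix-free names, which fits the rule that APPs are merged when the ratio falls *below* the threshold (a small distance means similar names). The suffix rule (a trailing `.apk`, then a version such as `_v3.2`), the threshold value and the greedy extension from pairs to several APPs are also assumptions, and `AppNameCleaner` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class AppNameCleaner {
    /** Hypothetical suffix rule: drop ".apk", then a trailing version such as "_v3.2" or " 8.0.1". */
    static String stripSuffix(String name) {
        String s = name.replaceFirst("(?i)\\.apk$", "");
        String t = s.replaceFirst("(?i)[ _-]v?\\d+(\\.\\d+)*$", "");
        return t.isEmpty() ? s : t;
    }

    /** Levenshtein distance: the ASSUMED reading of the "minimum string length". */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitute = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(substitute, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] swap = prev;
            prev = cur;
            cur = swap;
        }
        return prev[b.length()];
    }

    /** Similarity = distance / length of the shorter suffix-free name; smaller means more alike. */
    static double similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        return shorter == 0 ? Double.POSITIVE_INFINITY : (double) editDistance(a, b) / shorter;
    }

    /** For the APPs sharing one URL, keep the shorter-named APP whenever two names fall below the threshold. */
    static Set<String> clean(List<String> appsSharingUrl, double threshold) {
        List<String> byLength = new ArrayList<>(appsSharingUrl);
        byLength.sort(Comparator.comparingInt(app -> stripSuffix(app).length())); // stable: ties keep input order
        Set<String> kept = new LinkedHashSet<>();
        for (String app : byLength) {
            boolean merged = false;
            for (String shorterApp : kept) {
                if (similarity(stripSuffix(shorterApp), stripSuffix(app)) < threshold) {
                    merged = true; // app is a variant of an already-kept, shorter-named APP
                    break;
                }
            }
            if (!merged) {
                kept.add(app);
            }
        }
        return kept;
    }
}
```

With a threshold of 0.3, `MusicBox_v3.2.apk` and `MusicBox2.apk` reduce to `MusicBox` and `MusicBox2` (distance 1, ratio 0.125), so only the shorter-named APP keeps the shared URL, while an unrelated name stays separate.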
In the above scheme, when the system has obtained APP_URL correspondences for multiple APPs and the same APP corresponds to multiple pieces of URL data, the method further includes: the system performs external-link exclusion on the obtained APP_URL correspondences.
Further, the external-link exclusion specifically includes:
the system collects, from the APP_URL correspondences, those belonging to the same APP;
when the system finds that an APP in these correspondences has an open interface, it obtains, through the APP interface requests of that open interface, the URL data corresponding to those requests;
the system excludes the URL data corresponding to the APP interface requests.
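A sketch of external-link exclusion, ASSUMING the URL data of open-interface requests is already known as a list of URL prefixes (how the system collects it is not specified). Only APPs mapped to several URLs are examined, following the condition stated above. `ExternalLinkFilter` and the prefixes in the usage are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ExternalLinkFilter {
    /** Removes open-interface request URLs from every APP that is mapped to more than one URL. */
    static Map<String, Set<String>> exclude(Map<String, Set<String>> appUrl, List<String> openInterfacePrefixes) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        appUrl.forEach((app, urls) -> {
            Set<String> kept = new LinkedHashSet<>(urls);
            if (urls.size() > 1) { // only APPs with several URLs are examined
                kept.removeIf(url -> openInterfacePrefixes.stream().anyMatch(url::startsWith));
            }
            result.put(app, kept);
        });
        return result;
    }
}
```

The idea is that a URL which merely calls another party's open interface says little about which APP made the request, so leaving it in would mislabel traffic.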
In a second aspect, an embodiment of the present invention provides an APP identification system comprising an acquisition module, a downloading module, a parsing module and an identification module, wherein:
the acquisition module is configured to crawl an APP download website in a preset manner to obtain APP download links;
the downloading module is configured to download the APP compressed package via each APP download link;
the parsing module is configured to parse the APP compressed package by decompilation, obtain the URL data inside the APP, and establish the APP_URL correspondence between the APP and the URL data inside the APP;
the identification module is configured to receive user data containing URL request information and identify the APP corresponding to the user data according to the URL request information in the user data and the APP_URL correspondence.
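The four modules can be wired together roughly as follows. The interface names and signatures are hypothetical (for example, the acquisition module is assumed to return APP names mapped to download links), and the identification module is reduced to an exact-match lookup table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical module interfaces mirroring the four modules of the second aspect. */
interface AcquisitionModule { Map<String, String> downloadLinks(String siteAddress); } // APP name -> download link
interface DownloadModule { byte[] download(String link); }
interface ParsingModule { Set<String> urlsInside(byte[] apk); }

final class AppIdentificationSystem {
    private final AcquisitionModule acquisition;
    private final DownloadModule downloader;
    private final ParsingModule parser;
    private final Map<String, String> urlToApp = new HashMap<>(); // lookup table used by the identification module

    AppIdentificationSystem(AcquisitionModule acquisition, DownloadModule downloader, ParsingModule parser) {
        this.acquisition = acquisition;
        this.downloader = downloader;
        this.parser = parser;
    }

    /** Acquisition -> download -> parsing, filling the APP_URL correspondence (stored as URL -> APP). */
    void build(String siteAddress) {
        acquisition.downloadLinks(siteAddress).forEach((app, link) -> {
            for (String url : parser.urlsInside(downloader.download(link))) {
                urlToApp.putIfAbsent(url, app); // first APP wins; conflicts are left to data cleaning
            }
        });
    }

    /** Identification module: the APP for a request URL, or null when no in-APP URL matches. */
    String identify(String requestUrl) {
        return urlToApp.get(requestUrl);
    }
}
```

Passing the modules in as interfaces keeps each one replaceable, for instance a different crawler per download site.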
In the above scheme, crawling in the preset manner proceeds in four steps: first-level classification definition, second-level classification crawling, list information crawling and detail information crawling.
Further, the acquisition module is specifically configured to:
obtain an APP download website and use the website's page address as the input to crawling;
classify the applications on the website pages in the preset manner;
when a specific application category is requested, obtain all APPs in that category;
when the detail page of a specific APP in that category is requested, obtain the APP download link from that detail page.
In the above scheme, the decompilation uses the decompression tools of Java's java.util.zip package to parse the files in the APP installation package into character code streams.
In the above scheme, the parsing module includes a decompression sub-module, a first establishing sub-module, a first matching sub-module and a second establishing sub-module, wherein:
the decompression sub-module is configured to parse the APP compressed package by decompilation to obtain the character code stream of the package;
the first establishing sub-module is configured to build a regular expression for matching URL data inside an APP and match it against the character code stream line by line;
the first matching sub-module is configured to take the content that matches the regular expression as URL data inside the APP when the match succeeds;
the second establishing sub-module is configured to establish the APP_URL correspondence between the APP and the URL data inside the APP.
In the above scheme, the identification module includes a receiving sub-module and a second matching sub-module, wherein:
the receiving sub-module is configured to receive user data containing URL request information, parse it and obtain the URL request information in the user data;
the second matching sub-module is configured to identify the APP corresponding to the user data according to the matched URL data when the URL request information in the user data matches URL data in the APP_URL correspondence.
In the above scheme, when the system has obtained APP_URL correspondences for multiple APPs and the same URL data corresponds to more than one APP, the system further includes a data cleaning module.
Further, the data cleaning module is specifically configured to:
collect, from the APP_URL correspondences, those that share the same URL data;
for any two APPs sharing the same URL data, strip the suffixes from their names by string matching, obtaining two suffix-free names;
query the minimum string of the two suffix-free names to obtain the minimum string length;
determine which of the two suffix-free names is shorter and obtain its length;
take the ratio of the minimum string length to the length of the shorter name as the similarity;
when the similarity is smaller than a preset threshold, replace the correspondences between the two APPs and the URL data with a single APP_URL correspondence between the APP with the shorter suffix-free name and the URL data.
In the above scheme, when the system has obtained APP_URL correspondences for multiple APPs and the same APP corresponds to multiple pieces of URL data, the system further includes an external-link exclusion module.
Further, the external-link exclusion module is specifically configured to:
collect, from the APP_URL correspondences, those belonging to the same APP;
when an APP in the queried correspondences has an open interface, obtain, through the APP interface requests of that open interface, the URL data corresponding to those requests;
exclude the URL data corresponding to the APP interface requests.
The embodiments of the invention provide an APP identification method and system in which the system crawls an APP download website in a preset manner to obtain APP download links; downloads the APP compressed packages via those links; parses each compressed package by decompilation to obtain the URL data inside the APP, and establishes an APP_URL correspondence between the APP and that URL data; and, on receiving user data containing URL request information, identifies the APP corresponding to the user data according to the URL request information and the APP_URL correspondence. This addresses the low efficiency, difficulty of automation and incomplete coverage of user APP identification.
Drawings
Fig. 1 is a schematic diagram of an APP identification system according to an embodiment of the present invention;
Fig. 2 is a flowchart of an APP identification method according to an embodiment of the present invention;
Fig. 3 is a flowchart of obtaining an APP download link according to an embodiment of the present invention;
Fig. 4 is a flowchart of establishing an APP_URL correspondence between an APP and the URL data inside the APP according to an embodiment of the present invention;
Fig. 5 is a flowchart of identifying the APP corresponding to user data according to an embodiment of the present invention;
Fig. 6 is a detailed flowchart of an APP identification method according to an embodiment of the present invention;
Fig. 7 is a flowchart of data cleaning according to an embodiment of the present invention;
Fig. 8 is a flowchart of external-link exclusion according to an embodiment of the present invention;
Fig. 9 is a block diagram of an APP identification system according to an embodiment of the present invention;
Fig. 10 is a block diagram of a parsing module according to an embodiment of the present invention;
Fig. 11 is a block diagram of an identification module according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an APP identification system. As Fig. 1 shows, the system comprises three processes: information crawling, establishment of the APP_URL correspondence, and identification of the APP by matching URL request information against the URL data in the APP_URL correspondence.
Based on the system diagram of Fig. 1, the basic idea of the embodiments of the present invention is as follows: the system crawls an APP download website in a preset manner to obtain APP download links; downloads the APP compressed packages via those links; parses each compressed package by decompilation to obtain the URL data inside the APP, and establishes an APP_URL correspondence between the APP and that URL data; and, on receiving user data containing URL request information, identifies the APP corresponding to the user data according to the URL request information and the APP_URL correspondence. This addresses the low efficiency, difficulty of automation and incomplete coverage of user APP identification.
Example one
Referring to Fig. 2, an APP identification method for use in an APP identification system is shown; the method comprises:
S201: the system crawls an APP download website in a preset manner to obtain APP download links;
S202: the system downloads the APP compressed package via each APP download link;
S203: the system parses the APP compressed package by decompilation, obtains the URL data inside the APP, and establishes the APP_URL correspondence between the APP and the URL data inside the APP;
S204: the system receives user data containing URL request information and identifies the APP corresponding to the user data according to the URL request information in the user data and the APP_URL correspondence.
For step S201, information crawling is the process of collecting information about the APPs offered by APP download websites on the internet; for example, through crawling the system obtains each APP's download link, size, version and the like.
Preferably, crawling in the preset manner proceeds in four steps: first-level classification definition, second-level classification crawling, list information crawling and detail information crawling.
Further, referring to Fig. 3, crawling the APP download website in the preset manner to obtain APP download links specifically includes:
S2011: the system obtains an APP download website and uses the website's page address as the input to crawling;
S2012: the system classifies the applications on the website pages in the preset manner;
S2013: when the system requests a specific application category, it obtains all APPs in that category;
S2014: when the system requests the detail page of a specific APP in that category, it obtains the APP download link from that detail page.
It should be noted that, within step S201, first-level classification definition is implemented by step S2011, second-level classification crawling by step S2012, list information crawling by step S2013 and detail information crawling by step S2014.
Further, by obtaining APP download links through first-level classification definition, second-level classification crawling, list information crawling and detail information crawling, the system unifies the way APP download websites are crawled, reduces the complexity for developers of building crawlers for multiple sites, and shortens the development cycle of a crawler for a given website.
For step S203, preferably, the decompilation uses the decompression tools of Java's java.util.zip package to parse the files in the APP installation package into character code streams.
For step S203, referring to Fig. 4, parsing the APP compressed package by decompilation, obtaining the URL data inside the APP and establishing the APP_URL correspondence specifically includes:
S2031: the system parses the APP compressed package by decompilation to obtain the character code stream of the package;
S2032: the system builds a regular expression for matching URL data inside an APP and matches it against the character code stream line by line;
S2033: when the regular expression matches the character code stream, the system takes the matched content as URL data inside the APP;
S2034: the system establishes the APP_URL correspondence between the APP and the URL data inside the APP.
For step S2032, specifically, the regular expression takes a form that contains the key characters required of URL data inside an APP.
It should be noted that, in step S2033, when a line of the character code stream contains text conforming to the regular expression, the system takes the matched string in that line, rather than the whole line, as URL data inside the APP.
For step S204, referring to Fig. 5, receiving user data containing URL request information and identifying the corresponding APP according to that information and the APP_URL correspondence specifically includes:
S2041: the system receives user data containing URL request information, parses it and obtains the URL request information in the user data;
S2042: when the URL request information in the user data matches URL data in the APP_URL correspondence, the system identifies the APP corresponding to the user data according to the matched URL data.
For step S2041, the user data may be DPI data or data of another type; the present invention does not limit the type of user data.
For step S2042, specifically, when the URL request information in the user data is identical to URL data in the APP_URL correspondence, the system identifies the APP corresponding to the user data according to that identical URL data.
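Steps S2041 and S2042 might be sketched as follows for one DPI-style record. The record layout (tab-separated user ID, host and URI) and the `http://` scheme used to rebuild the request URL are HYPOTHETICAL; real DPI record formats vary and are not given in the text. Matching is exact, as step S2042 specifies, and `DpiRecordMatcher` is an illustrative name.

```java
import java.util.Map;
import java.util.Optional;

final class DpiRecordMatcher {
    /** S2041: rebuild the request URL from a HYPOTHETICAL "userId \t host \t uri" record. */
    static Optional<String> requestUrl(String record) {
        String[] fields = record.split("\t", -1);
        if (fields.length < 3 || fields[1].isEmpty()) {
            return Optional.empty(); // malformed record: no URL request information
        }
        return Optional.of("http://" + fields[1] + fields[2]);
    }

    /** S2042: identify the APP only when the rebuilt URL is identical to an in-APP URL. */
    static Optional<String> identify(String record, Map<String, String> urlToApp) {
        return requestUrl(record).map(urlToApp::get);
    }
}
```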
For this embodiment, it should be noted that when the system has obtained APP_URL correspondences for multiple APPs and the same URL data corresponds to more than one APP, the technical scheme shown in Fig. 2 may further include: the system performs data cleaning on the obtained APP_URL correspondences.
Further, the data cleaning specifically includes:
the system collects, from the APP_URL correspondences, those that share the same URL data;
for any two APPs sharing the same URL data, the system strips the suffixes from their names by string matching, obtaining two suffix-free names;
the system queries the minimum string of the two suffix-free names to obtain the minimum string length;
the system determines which of the two suffix-free names is shorter and obtains its length;
the system takes the ratio of the minimum string length to the length of the shorter name as the similarity;
when the similarity is smaller than a preset threshold, the system replaces the correspondences between the two APPs and the URL data with a single APP_URL correspondence between the APP with the shorter suffix-free name and the URL data.
For this embodiment, it should be noted that when the system has obtained APP_URL correspondences for multiple APPs and one APP corresponds to multiple pieces of URL data, the technical scheme shown in Fig. 2 may further include: the system performs external-link exclusion on the obtained APP_URL correspondences.
Further, the external-link exclusion specifically includes:
the system collects, from the APP_URL correspondences, those belonging to the same APP;
when the system finds that an APP in these correspondences has an open interface, it obtains, through the APP interface requests of that open interface, the URL data corresponding to those requests;
the system excludes the URL data corresponding to the APP interface requests.
For this embodiment, when the APP_URL correspondences have undergone data cleaning and/or external-link exclusion, the APP_URL correspondence used in step S204 is the one that has undergone data cleaning and/or external-link exclusion.
This embodiment provides an APP identification method in which the system crawls an APP download website in a preset manner to obtain APP download links; downloads the APP compressed packages via those links; parses each compressed package by decompilation to obtain the URL data inside the APP, and establishes an APP_URL correspondence between the APP and that URL data; and, on receiving user data containing URL request information, identifies the APP corresponding to the user data according to the URL request information and the APP_URL correspondence. This addresses the low efficiency, difficulty of automation and incomplete coverage of user APP identification.
Example two
Based on the same technical concept as the foregoing embodiment, and referring to FIG. 6, a specific APP identification method is shown. The method includes:
S601: the system acquires an APP download website and uses the page address of the APP download website as the input information for information crawling;
S602: the system classifies the applications on the APP download website page in a preset manner;
S603: when the system requests a specific application category, it obtains all APPs in that category;
S604: when the system requests the detail page of a specific APP among all APPs of that category, it obtains the APP download link from that detail page;
S605: the system downloads the APP compressed package according to the APP download link;
S606: the system parses the APP compressed package by decompilation to obtain the character code stream of the package;
S607: the system establishes a regular expression for matching the URL data inside an APP, and matches the regular expression against the character code stream line by line;
S608: when the regular expression successfully matches the character code stream, the system takes the matched content of the character code stream as URL data inside the APP;
S609: the system establishes an APP_URL correspondence between the APP and its internal URL data;
S610: when the system has acquired APP_URL correspondences for multiple groups of APPs and their internal URL data, and the same URL data corresponds to multiple APPs in these correspondences, the system performs data cleaning on the acquired APP_URL correspondences;
S611: when the system has acquired APP_URL correspondences for multiple groups of APPs and their internal URL data, and the same APP corresponds to multiple pieces of URL data in these correspondences, the system performs external link exclusion on the acquired APP_URL correspondences;
S612: the system receives user data containing URL request information, parses the received user data, and acquires the URL request information in the user data;
S613: when the URL request information in the user data matches URL data in an APP_URL correspondence that has undergone data cleaning and/or external link exclusion, the system identifies the APP corresponding to the user data according to the matched URL data.
Specifically, for step S601, the APP download website page address may be the page address of Gfan, PP Assistant, an intranet site, the Android Market, or the OPPO App Market;
specifically, for step S602, the system classifies the applications on the APP download website page by application category, for example "book", "business", "education", and "entertainment";
specifically, for step S603, when the system requests the "education" category, it obtains the series of APPs in that category;
specifically, for step S604, when the system requests the "Youdao Dictionary" detail page, it obtains the APP download link of "Youdao Dictionary".
It should be noted that steps S601 to S604 together obtain the APP download link. They can therefore be summarized as: the system crawls information from the APP download website in a preset manner to obtain the APP download link. Information crawling in the preset manner consists of four levels: first-level classification definition, second-level classification crawling, list information crawling, and detail information crawling;
further, first-level classification definition corresponds to step S601, second-level classification crawling to step S602, list information crawling to step S603, and detail information crawling to step S604.
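The four crawling levels can be sketched as a small driver. This is a minimal sketch under several assumptions:
- `fetch` is a hypothetical callable that returns each page already parsed into a dictionary;
- the keys `categories`, `apps` and `download_link`, the `wanted_categories` filter, and every URL are illustrative;
- none of these names come from the source.

```python
from typing import Callable, Dict, List

# Hypothetical page model: fetch(url) returns an already-parsed page as a dict.
# A real crawler would issue HTTP requests and parse the site's HTML instead.
Fetch = Callable[[str], dict]


def crawl_download_links(site_url: str, fetch: Fetch,
                         wanted_categories: List[str]) -> Dict[str, str]:
    """Four-level crawl: site entry -> categories -> APP list -> detail page."""
    links: Dict[str, str] = {}
    # Level 1, first-level classification definition (S601): the site page
    # address is the input of the crawl.
    home = fetch(site_url)
    # Level 2, second-level classification crawling (S602): site categories.
    for category, category_url in home["categories"].items():
        if category not in wanted_categories:
            continue
        # Level 3, list information crawling (S603): all APPs of the category.
        app_list = fetch(category_url)
        for app_name, detail_url in app_list["apps"].items():
            # Level 4, detail information crawling (S604): the download link.
            links[app_name] = fetch(detail_url)["download_link"]
    return links


# Usage with an in-memory stand-in for a download site (all URLs are made up).
FAKE_SITE = {
    "https://store.example/": {"categories": {
        "education": "https://store.example/c/education",
        "book": "https://store.example/c/book"}},
    "https://store.example/c/education": {"apps": {
        "Youdao Dictionary": "https://store.example/app/youdao"}},
    "https://store.example/app/youdao": {
        "download_link": "https://store.example/dl/youdao.apk"},
}
links = crawl_download_links("https://store.example/", FAKE_SITE.__getitem__,
                             ["education"])
```

Injecting `fetch` keeps the four-level skeleton identical across download sites; only the per-site page parsing would differ. This matches the unification benefit described for steps S601 to S604.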
It should be noted that obtaining the APP download link through steps S601 to S604 has three benefits:
it unifies the crawling approach across APP download sites;
it reduces the complexity for technical staff of developing crawlers for multiple sites;
it shortens the development cycle of a crawler for a designated site.
Preferably, for step S606, the decompiling mode uses a decompression tool java.util.zip provided by Java to parse the file in the APP installation package into a character code stream.
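Step S606 can be approximated in Python, with the standard `zipfile` module standing in for `java.util.zip`. APK installation packages are ZIP archives, so the substitution is straightforward.

This is a sketch under that substitution:
- each archive entry is decoded byte-for-byte (Latin-1), so that ASCII string constants inside binary entries such as `classes.dex` survive;
- the decoded text is then split into lines for the matching of step S607;
- the function name `iter_code_stream_lines` is hypothetical.

```python
import io
import zipfile
from typing import Iterator, Tuple


def iter_code_stream_lines(apk_bytes: bytes) -> Iterator[Tuple[str, str]]:
    """Yield (entry_name, line) pairs for every file entry in an APP package.

    zipfile stands in for java.util.zip; Latin-1 maps every byte to a
    character, so decoding never fails on binary entries.
    """
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories carry no content to scan
            text = archive.read(info).decode("latin-1")
            for line in text.splitlines():
                yield info.filename, line
```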
For step S607, specifically, the regular expression takes a form that contains the key characters required of URL data inside the APP.
It should be noted that, for step S608, when a line of the character code stream contains content conforming to the regular expression, the system takes the string in that line that matches the regular expression as URL data inside the APP.
Specifically, for steps S606 to S608, the system establishes a regular expression for matching URL data inside the APP and matches it line by line against the character code stream parsed from the APP. When a line of the character code stream contains content of the form given by the regular expression, the system takes that content as URL data inside the APP. For example, suppose the regular expression has the form "www.aXbXc", where each "X" represents any character. It is compared line by line with the character code stream parsed from the APP; if a line contains "www.aabbc", then "www.aabbc" in that line is taken as URL data inside the APP.
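The line-by-line matching of steps S606 to S608 can be sketched with Python's `re` module. Two patterns are shown:
- `EXAMPLE_PATTERN` encodes the "www.aXbXc" example, with `.` playing the role of "X";
- `URL_PATTERN` is an assumed, more general pattern for URL-like strings, given for illustration only and not taken from the source.

```python
import re
from typing import Iterable, List

# The example above: each "X" in www.aXbXc stands for any single character.
EXAMPLE_PATTERN = re.compile(r"www\.a.b.c")

# Assumed general pattern for URL-like strings (illustrative, not the source's):
# optional scheme, dotted host name, optional path up to a quote or whitespace.
URL_PATTERN = re.compile(r"(?:https?://)?(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}(?:/[^\s\"'<>]*)?")


def extract_urls(lines: Iterable[str], pattern: re.Pattern = URL_PATTERN) -> List[str]:
    """Match the pattern against each line and keep every matched substring."""
    urls: List[str] = []
    for line in lines:
        urls.extend(m.group(0) for m in pattern.finditer(line))
    return urls
```

For instance, `extract_urls(["foo www.aabbc bar"], EXAMPLE_PATTERN)` yields `["www.aabbc"]`, mirroring the example in the text.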
For step S610, and referring to FIG. 7, the following applies when the system has acquired APP_URL correspondences for multiple groups of APPs and their internal URL data, and the same URL data corresponds to multiple APPs in these correspondences. The system then performs data cleaning on the acquired APP_URL correspondences, which specifically includes:
S6101: the system counts, among the APP_URL correspondences between multiple groups of APPs and their internal URL data, those correspondences that share the same URL data;
S6102: for any two APPs sharing the same URL data, the system removes their suffixes by string matching to obtain two un-suffixed APP names;
S6103: the system queries the minimum string of the two un-suffixed APP names to obtain the minimum string length;
S6104: the system determines which of the two un-suffixed APP names is shorter and obtains the length of that shorter name;
S6105: the system takes the ratio of the minimum string length to the length of the shorter APP name as the similarity;
S6106: when the similarity is smaller than the preset threshold of the system, the system replaces the correspondences between the two APPs and the URL data with a single APP_URL correspondence between the URL data and the APP whose un-suffixed name is shorter.
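The decision rule of steps S6103 to S6106 can be written compactly as follows. Here $L_{\min}$ is the minimum string length obtained in step S6103, $L_{\mathrm{short}}$ is the length of the shorter un-suffixed APP name from step S6104, and $T$ is the preset threshold of the system:

$$
s = \frac{L_{\min}}{L_{\mathrm{short}}}, \qquad \text{if } s < T \text{, replace both correspondences with (shorter-named APP, URL)}
$$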
Exemplarily, for step S610, assume that information crawling yields APP_URL correspondences for multiple groups of APPs and their internal URL data, in which the same URL data corresponds to multiple APPs: Sina Weibo HD-URL1 and Sina Weibo-URL1. Data cleaning is then performed on Sina Weibo HD and Sina Weibo through steps S6101 to S6106. Suppose the similarity between Sina Weibo and Sina Weibo HD is smaller than the preset threshold of the system, and the name of Sina Weibo is shorter than that of Sina Weibo HD. The system then merges Sina Weibo-URL1 and Sina Weibo HD-URL1 into a single APP_URL correspondence, namely Sina Weibo-URL1.
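A minimal sketch of steps S6101 to S6106 follows. It rests on three assumptions, each labelled in the code:
- The source does not define the "minimum string length". Because two APPs are merged when the "similarity" is *smaller* than the threshold, a distance-like quantity is assumed here, namely the Levenshtein edit distance.
- The suffix list `SUFFIXES` is hypothetical.
- The default threshold of 0.2 is hypothetical.

When the un-suffixed names tie in length, the shorter original name is kept, which matches the Sina Weibo example.

```python
from itertools import combinations
from typing import Dict, Set

# Hypothetical suffix list: the source does not enumerate the suffixes it strips.
SUFFIXES = ("HD", "Lite")


def strip_suffix(name: str) -> str:
    """S6102: remove one known edition suffix, e.g. 'Sina Weibo HD' -> 'Sina Weibo'."""
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)].rstrip()
    return name


def edit_distance(a: str, b: str) -> int:
    """ASSUMED reading of the 'minimum string length' (S6103): Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def clean(url_to_apps: Dict[str, Set[str]],
          threshold: float = 0.2) -> Dict[str, Set[str]]:
    """S6101-S6106: collapse APPs sharing a URL into the shorter-named APP."""
    cleaned: Dict[str, Set[str]] = {}
    for url, apps in url_to_apps.items():
        kept = set(apps)
        for a, b in combinations(sorted(apps), 2):
            if a not in kept or b not in kept:
                continue  # one of the pair was already merged away
            sa, sb = strip_suffix(a), strip_suffix(b)
            shorter_len = min(len(sa), len(sb))                # S6104
            if shorter_len == 0:
                continue
            similarity = edit_distance(sa, sb) / shorter_len  # S6105
            if similarity < threshold:                        # S6106
                # Keep the shorter un-suffixed name; break ties on the full name.
                keep = min((a, b), key=lambda n: (len(strip_suffix(n)), len(n), n))
                kept.discard(b if keep == a else a)
        cleaned[url] = kept
    return cleaned
```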
For step S611, and referring to FIG. 8, the following applies when the system has acquired APP_URL correspondences for multiple groups of APPs and their internal URL data, and the same APP corresponds to multiple pieces of URL data in these correspondences. The system then performs external link exclusion on the acquired APP_URL correspondences, which specifically includes:
S6111: the system counts, among the APP_URL correspondences between multiple groups of APPs and their internal URL data, those correspondences that share the same APP;
S6112: when the system finds that an APP open interface exists in the APP_URL correspondence between an APP and its internal URL data, the system obtains, through the APP interface request corresponding to the APP open interface, the URL data corresponding to that APP interface request;
S6113: the system excludes the URL data corresponding to the APP interface request.
Exemplarily, for step S611, assume that the system extracts APP_URL correspondences sharing the same APP, specifically Tencent News-URL1 and Tencent News-URL3, and performs external link exclusion on each of them through steps S6111 to S6113. If URL1 in Tencent News-URL1 turns out to be the URL corresponding to an open interface, URL1 is an external link, and the system excludes the Tencent News-URL1 correspondence. If no open interface is found in Tencent News-URL3, the system keeps the Tencent News-URL3 correspondence.
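A minimal sketch of steps S6111 to S6113 follows. It assumes that the URL data reached through each APP open interface has already been collected into a set. That set, `open_interface_urls`, is a hypothetical input: the source does not say how the interface requests are enumerated.

```python
from typing import Dict, Set


def exclude_outer_links(app_to_urls: Dict[str, Set[str]],
                        open_interface_urls: Set[str]) -> Dict[str, Set[str]]:
    """Drop URL data that belongs to an APP open interface (external links).

    open_interface_urls is a hypothetical input standing for the URL data
    obtained from the APP interface requests behind each open interface (S6112).
    """
    result: Dict[str, Set[str]] = {}
    for app, urls in app_to_urls.items():
        # S6111: only APPs mapped to several URLs are candidates for exclusion.
        if len(urls) > 1:
            urls = urls - open_interface_urls  # S6113: exclude external links
        result[app] = set(urls)
    return result
```

With the Tencent News example, `exclude_outer_links({"Tencent News": {"URL1", "URL3"}}, {"URL1"})` keeps only `URL3`.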
For step S612, the user data may specifically be DPI (deep packet inspection) data or data of another type; the present invention does not limit the type of user data.
For step S613, specifically, when the URL request information in the user data is identical to URL data in an APP_URL correspondence, the system identifies the APP corresponding to the user data according to the URL data that is identical to the URL request information.
For step S613, exemplarily, suppose the URL request information acquired by the system is "www.aabbc". The system compares "www.aabbc" with the URL data in the APP_URL correspondences that have undergone data cleaning and/or external link exclusion. Assuming that the URLi data in the APPi-URLi correspondence is "www.aabbc", the system identifies the APP corresponding to the user data as APPi according to URLi.
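Steps S612 and S613 reduce to two operations:
- inverting the cleaned APP_URL correspondences into a URL-to-APP index;
- looking up each request by exact match, as step S613 specifies.

The record layout, a dictionary with a `url` key, is a hypothetical stand-in for DPI records, whose real layout is operator-specific.

```python
from typing import Dict, Iterable, List, Optional, Set


def build_index(app_to_urls: Dict[str, Set[str]]) -> Dict[str, str]:
    """Invert cleaned APP_URL correspondences into a URL -> APP lookup table.

    If a URL still maps to several APPs, the first APP encountered is kept.
    """
    index: Dict[str, str] = {}
    for app, urls in app_to_urls.items():
        for url in urls:
            index.setdefault(url, app)
    return index


def identify(records: Iterable[dict], index: Dict[str, str]) -> List[Optional[str]]:
    """Return the identified APP (or None) for each user-data record.

    Each record is assumed to carry its URL request information under a
    hypothetical "url" key.
    """
    return [index.get(record.get("url", "")) for record in records]
```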
This embodiment provides a specific APP identification method in which the system:
acquires an APP download website and uses its page address as the input for information crawling;
classifies the applications on the website page in a preset manner;
obtains all APPs of a specific application category when that category is requested;
obtains the APP download link from a specific APP detail page when that page is requested;
downloads the APP compressed package according to the download link;
parses the package by decompilation into a character code stream;
establishes a regular expression for matching URL data inside an APP and matches it line by line against the character code stream;
takes the content that matches the regular expression as URL data inside the APP;
establishes an APP_URL correspondence between the APP and its internal URL data;
performs data cleaning on the acquired APP_URL correspondences when the same URL data corresponds to multiple APPs;
performs external link exclusion on the acquired APP_URL correspondences when the same APP corresponds to multiple pieces of URL data;
receives user data containing URL request information and parses it to acquire the URL request information;
and, when that URL request information matches URL data in an APP_URL correspondence that has undergone data cleaning and/or external link exclusion, identifies the APP corresponding to the user data according to the matched URL data.
This addresses the low efficiency, difficulty of automation, and incomplete coverage of existing user APP identification.
Example three
Referring to FIG. 9, the structure of an APP identification system 90 is shown. The system comprises an obtaining module 901, a downloading module 902, a parsing module 903, and an identification module 904; wherein,
the obtaining module 901 is configured to crawl information from the APP download website in a preset manner and obtain the APP download link;
the downloading module 902 is configured to download the APP compressed package according to the APP download link;
the parsing module 903 is configured to parse the APP compressed package by decompilation, acquire the URL data inside the APP, and establish an APP_URL correspondence between the APP and its internal URL data;
the identification module 904 is configured to receive user data containing URL request information, and identify the APP corresponding to the user data according to the URL request information in the user data and the APP_URL correspondence between the APP and its internal URL data.
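The division into modules 901 to 904 can be sketched as a class whose collaborators are injected:
- the three callables are hypothetical stand-ins for the obtaining, downloading, and parsing modules;
- only the identification step is implemented directly, as an exact URL lookup;
- the class and attribute names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class AppIdentificationSystem:
    """Wiring of modules 901-904; each callable is a hypothetical stand-in."""
    obtain: Callable[[], List[str]]                # 901: crawl -> download links
    download: Callable[[str], bytes]               # 902: link -> package bytes
    parse: Callable[[bytes], Dict[str, Set[str]]]  # 903: package -> {APP: URLs}
    index: Dict[str, str] = field(default_factory=dict)

    def build(self) -> None:
        """Run 901 -> 902 -> 903 and store the APP_URL correspondences."""
        for link in self.obtain():
            for app, urls in self.parse(self.download(link)).items():
                for url in urls:
                    self.index.setdefault(url, app)

    def identify(self, url_request: str) -> Optional[str]:
        """904: exact-match lookup of a URL request against the correspondences."""
        return self.index.get(url_request)
```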
For the obtaining module 901, information crawling is the process of obtaining APP information from APP download websites on the Internet; for example, the system obtains APP download links, APP sizes, APP versions, and the like through information crawling;
preferably, information crawling in the preset manner consists of four levels: first-level classification definition, second-level classification crawling, list information crawling, and detail information crawling;
further, the obtaining module 901 is specifically configured to:
acquire an APP download website and use the page address of the APP download website as the input information for information crawling;
classify the applications on the APP download website page in a preset manner;
when a specific application category is requested, obtain all APPs of that category;
and when the detail page of a specific APP among all APPs of that category is requested, obtain the APP download link from that detail page.
It should be noted that, for the obtaining module 901, first-level classification definition corresponds to step SA011, second-level classification crawling to step SA012, list information crawling to step SA013, and detail information crawling to step SA014;
further, the obtaining module 901 has three benefits: it unifies the crawling approach across APP download sites, reduces the complexity for technical staff of developing crawlers for multiple sites, and shortens the development cycle of a crawler for a designated site.
For the parsing module 903, and referring to FIG. 10, the parsing module 903 includes a decompression sub-module 9031, a first establishing sub-module 9032, a first matching sub-module 9033, and a second establishing sub-module 9034; wherein,
the decompression sub-module 9031 is configured to parse the APP compressed package by decompilation to obtain the character code stream of the package;
the first establishing sub-module 9032 is configured to establish a regular expression for matching URL data inside an APP, and to match the regular expression against the character code stream line by line;
the first matching sub-module 9033 is configured to, when the regular expression successfully matches the character code stream, take the matched content of the character code stream as URL data inside the APP;
the second establishing sub-module 9034 is configured to establish an APP_URL correspondence between the APP and its internal URL data.
For the first establishing sub-module 9032, specifically, the regular expression takes a form that contains the key characters required of URL data inside the APP.
It should be noted that, for the first matching sub-module 9033, when a line of the character code stream contains content conforming to the regular expression, the system takes the string in that line that matches the regular expression as URL data inside the APP.
For the identification module 904, and referring to FIG. 11, the identification module 904 includes a receiving sub-module 9041 and a second matching sub-module 9042; wherein,
the receiving sub-module 9041 is configured to receive user data containing URL request information, parse the received user data, and acquire the URL request information in the user data;
and the second matching sub-module 9042 is configured to, when the URL request information in the user data matches URL data in an APP_URL correspondence between an APP and its internal URL data, identify the APP corresponding to the user data according to the matched URL data.
For the receiving sub-module 9041, the user data may specifically be DPI (deep packet inspection) data or data of another type; the present invention does not limit the type of user data.
For the second matching sub-module 9042, specifically, when the URL request information in the user data is identical to URL data in an APP_URL correspondence, the system identifies the APP corresponding to the user data according to the URL data that is identical to the URL request information.
For this embodiment, it should be noted that the technical solution may further include a data cleaning module 905. It applies when the system has acquired APP_URL correspondences for multiple groups of APPs and their internal URL data, and the same URL data corresponds to multiple APPs in these correspondences; wherein,
the data cleaning module 905 is configured to perform data cleaning on the acquired APP_URL correspondences between multiple groups of APPs and their internal URL data;
further, the data cleaning module 905 is specifically configured to:
count, among the APP_URL correspondences between multiple groups of APPs and their internal URL data, those correspondences that share the same URL data;
for any two APPs sharing the same URL data, remove their suffixes by string matching to obtain two un-suffixed APP names;
query the minimum string of the two un-suffixed APP names to obtain the minimum string length;
determine which of the two un-suffixed APP names is shorter and obtain the length of that shorter name;
take the ratio of the minimum string length to the length of the shorter APP name as the similarity;
and when the similarity is smaller than the preset threshold of the system, replace the correspondences between the two APPs and the URL data with a single APP_URL correspondence between the URL data and the APP whose un-suffixed name is shorter.
For this embodiment, it should be noted that the technical solution may further include an external link exclusion module 906. It applies when the system has acquired APP_URL correspondences for multiple groups of APPs and their internal URL data, and the same APP corresponds to multiple pieces of URL data in these correspondences; wherein,
the external link exclusion module 906 is configured to perform external link exclusion on the acquired APP_URL correspondences between multiple groups of APPs and their internal URL data;
further, the external link exclusion module 906 is specifically configured to:
count, among the APP_URL correspondences between multiple groups of APPs and their internal URL data, those correspondences that share the same APP;
when an APP open interface is found in the APP_URL correspondence between an APP and its internal URL data, obtain, through the APP interface request corresponding to the APP open interface, the URL data corresponding to that APP interface request;
and exclude the URL data corresponding to the APP interface request.
For this embodiment, it should be noted that when the APP_URL correspondences between the APPs and their internal URL data have undergone data cleaning and/or external link exclusion, the identification module 904 uses the cleaned and/or link-excluded correspondences.
This embodiment provides an APP identification system with four modules:
the obtaining module 901 is configured to crawl information from the APP download website in a preset manner and obtain the APP download link;
the downloading module 902 is configured to download the APP compressed package according to the APP download link;
the parsing module 903 is configured to parse the APP compressed package by decompilation, acquire the URL data inside the APP, and establish an APP_URL correspondence between the APP and its internal URL data;
the identification module 904 is configured to receive user data containing URL request information, and identify the APP corresponding to the user data according to the URL request information in the user data and the APP_URL correspondence between the APP and its internal URL data.
This addresses the low efficiency, difficulty of automation, and incomplete coverage of existing user APP identification.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (16)

1. An APP identification method applied to an APP identification system, the method comprising:
the system crawls information from an APP download website in a preset manner to obtain an APP download link;
the system downloads the APP compressed package according to the APP download link;
the system parses the APP compressed package by decompilation, acquires the URL data inside the APP, and establishes an APP_URL correspondence between the APP and the URL data inside the APP;
the system receives user data containing URL request information, and identifies the APP corresponding to the user data according to the URL request information in the user data and the APP_URL correspondence between the APP and the URL data inside the APP; the user data comprises the user's Internet access data;
when the system has acquired APP_URL correspondences between multiple groups of APPs and the URL data inside the APPs, and the same APP corresponds to multiple pieces of URL data in these correspondences, the method further comprises:
the system counts, among the APP_URL correspondences between multiple groups of APPs and the URL data inside the APPs, the correspondences that share the same APP;
when the system finds that an APP open interface exists in the APP_URL correspondence between an APP and the URL data inside the APP, the system acquires, through the APP interface request corresponding to the APP open interface, the URL data corresponding to the APP interface request;
and the system excludes the URL data corresponding to the APP interface request.
2. The method according to claim 1, wherein the information crawling in the preset manner consists of first-level classification definition, second-level classification crawling, list information crawling, and detail information crawling.
3. The method according to claim 2, wherein the system crawling information from the APP download website in the preset manner to obtain the APP download link specifically comprises:
the system acquires an APP download website and uses the page address of the APP download website as the input information for information crawling;
the system classifies the applications on the APP download website page in a preset manner;
when the system requests a specific application category, all APPs of that category are obtained;
and when the system requests the detail page of a specific APP among all APPs of that category, the APP download link of that detail page is obtained.
4. The method of claim 1, wherein the decompilation parses the files in the APP installation package into a character code stream using a decompression tool provided by Java.
5. The method of claim 1, wherein the system parsing the APP compressed package by decompilation, acquiring the URL data inside the APP, and establishing the APP_URL correspondence between the APP and the URL data inside the APP specifically comprises:
the system parses the APP compressed package by decompilation to obtain the character code stream of the APP compressed package;
the system establishes a regular expression for matching URL data inside an APP, and matches the regular expression against the character code stream line by line;
when the regular expression successfully matches the character code stream, the system takes the character code stream successfully matched by the regular expression as the URL data inside the APP;
the system establishes the APP_URL correspondence between the APP and the URL data inside the APP.
6. The method of claim 1, wherein the system receiving user data containing URL request information and identifying the APP corresponding to the user data specifically comprises:
the system receives user data containing URL request information, parses the received user data, and acquires the URL request information in the user data;
when the URL request information in the user data matches URL data in the APP_URL correspondence between the APP and the URL data inside the APP, the system identifies the APP corresponding to the user data according to the matched URL data.
7. The method of claim 1, wherein when the system has acquired APP_URL correspondences between multiple groups of APPs and the URL data inside the APPs, and the same URL data corresponds to multiple APPs in these correspondences, the method further comprises: the system performs data cleaning on the acquired APP_URL correspondences between the multiple groups of APPs and the URL data inside the APPs.
8. The method of claim 7, wherein the system performing data cleaning on the obtained APP_URL correspondences between the multiple groups of APPs and the URL data inside those APPs specifically comprises:
the system counts, among the APP_URL correspondences of the multiple groups of APPs, the correspondences that share the same URL data;
the system removes the suffixes from the names of any two APPs sharing the same URL data by a string matching method, obtaining two suffix-free APP names;
the system queries the minimum string of the two suffix-free APP names to obtain the minimum string length;
the system determines which of the two suffix-free APP names is shorter and obtains the length of that shorter name;
the system takes the ratio of the minimum string length to the length of the shorter APP name as the similarity; and
when the similarity is smaller than a threshold preset in the system, the system replaces the correspondences between the two APPs and the URL data with a single APP_URL correspondence between the URL data and the APP whose suffix-free name is shorter.
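The cleaning rule of claim 8 can be sketched as below, under loudly stated assumptions. The phrase "minimum string length" is ambiguous in the translated claim; this sketch interprets it as the minimum edit (Levenshtein) distance between the two suffix-free names, which makes "similarity smaller than the threshold" read as "similar enough to merge". The suffix rule (`SUFFIX_PATTERN`) and the threshold value `THRESHOLD` are hypothetical.

```python
import re
from typing import Dict, Set

# Hypothetical suffix rule: the claims do not list the suffixes, so this strips
# trailing edition words and version tags such as " HD" or " v2.1".
SUFFIX_PATTERN = re.compile(r"(\s+(HD|Lite|Pro|v?\d+(\.\d+)*))+$", re.IGNORECASE)
THRESHOLD = 0.5  # hypothetical preset threshold


def strip_suffix(name: str) -> str:
    return SUFFIX_PATTERN.sub("", name.strip())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the assumed meaning of 'minimum string length'."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(app_a: str, app_b: str) -> float:
    """Ratio of the edit distance to the length of the shorter suffix-free name."""
    a, b = strip_suffix(app_a), strip_suffix(app_b)
    shorter = min(len(a), len(b))
    if shorter == 0:
        return float("inf")  # nothing left to compare, so never merge
    return edit_distance(a, b) / shorter


def clean_pair(app_a: str, app_b: str, url: str,
               app_url: Dict[str, Set[str]], threshold: float = THRESHOLD) -> Dict[str, Set[str]]:
    """Keep the shared URL only under the APP with the shorter suffix-free name."""
    result = {app: set(urls) for app, urls in app_url.items()}
    if similarity(app_a, app_b) < threshold:
        # Ties on suffix-free length fall back to the shorter original name.
        keep = min((app_a, app_b), key=lambda n: (len(strip_suffix(n)), len(n), n))
        drop = app_b if keep == app_a else app_a
        result.setdefault(keep, set()).add(url)
        result.get(drop, set()).discard(url)
    return result


if __name__ == "__main__":
    print(similarity("WeChat", "WeChat HD"))  # prints 0.0
```

Under this reading, "WeChat HD" and "WeChat" collapse to the same suffix-free name, so a URL both claim is reassigned to "WeChat" alone.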
9. An APP identification system, comprising an acquisition module, a downloading module, a parsing module, and an identification module, wherein:
the acquisition module is configured to crawl information from an APP download website in a preset manner to obtain APP download links;
the downloading module is configured to download the APP compressed package according to the APP download link;
the parsing module is configured to parse the APP compressed package by decompilation, obtain the URL data inside the APP, and establish an APP_URL correspondence between the APP and the URL data inside the APP;
the identification module is configured to receive user data containing URL request information and identify the APP corresponding to the user data according to the URL request information in the user data and the APP_URL correspondence between the APP and the URL data inside the APP, the user data comprising user Internet access data;
when the system obtains APP_URL correspondences between multiple groups of APPs and the URL data inside those APPs, and the same APP corresponds to multiple pieces of URL data in those correspondences, the system further comprises an external link exclusion module;
the external link exclusion module is specifically configured to:
count, among the APP_URL correspondences of the multiple groups of APPs, the correspondences that involve the same APP;
when an APP open interface is found in the queried APP_URL correspondence, obtain the URL data corresponding to the APP interface request through the APP interface request associated with that open interface; and
exclude the URL data corresponding to the APP interface request.
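A minimal sketch of the external link exclusion in claim 9. The claims do not say how requests made through an APP open interface are recognized, so this example assumes a hypothetical registry, `OPEN_INTERFACE_PREFIXES`, of URL prefixes for such requests (for instance a share or payment SDK embedded in many APPs), and treats prefix matching as the recognition rule.

```python
from typing import Dict, Set

# Hypothetical registry of URL prefixes reached through other platforms'
# open interfaces; real deployments would need their own list.
OPEN_INTERFACE_PREFIXES: Dict[str, Set[str]] = {
    "example_share_sdk": {"https://open.share.example.com/"},
    "example_pay_sdk": {"https://api.pay.example.org/gateway"},
}


def exclude_external_links(app_url: Dict[str, Set[str]],
                           registry: Dict[str, Set[str]] = OPEN_INTERFACE_PREFIXES
                           ) -> Dict[str, Set[str]]:
    """Drop open-interface request URLs from APPs that map to several URLs."""
    prefixes = tuple(p for group in registry.values() for p in group)
    cleaned: Dict[str, Set[str]] = {}
    for app, urls in app_url.items():
        if len(urls) > 1:
            # The same APP maps to several URLs: remove those issued via an open interface.
            cleaned[app] = {u for u in urls if not u.startswith(prefixes)}
        else:
            cleaned[app] = set(urls)
    return cleaned
```

Excluding these URLs matters because an SDK endpoint shared by many APPs would otherwise identify traffic as belonging to whichever APP happened to be indexed.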
10. The system of claim 9, wherein crawling information in the preset manner specifically comprises: defining first-level classifications, crawling second-level classifications, crawling list information, and crawling detail information.
11. The system of claim 10, wherein the acquisition module is specifically configured to: obtain an APP download website and use the page address of the APP download website as the input information for information crawling;
classify the applications at the APP download website page address in a preset manner;
when a specific application category is requested, obtain all APPs in that category; and
when the detail page of a specific APP among all the APPs in that category is requested, obtain the APP download link from that detail page.
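The four crawl stages of claims 10 and 11 can be sketched as nested link-following. This is a minimal offline illustration: pages are fetched through an injected `fetch` function so no network access is assumed, and the page structure is marked with hypothetical CSS classes (`category`, `subcategory`, `app`, `download`); real download sites will need their own selectors.

```python
from html.parser import HTMLParser
from typing import Callable, List, Tuple


class LinkCollector(HTMLParser):
    """Collect (class, href) pairs for every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            if attributes.get("href"):
                self.links.append((attributes.get("class") or "", attributes["href"]))


def links(html: str, css_class: str) -> List[str]:
    collector = LinkCollector()
    collector.feed(html)
    return [href for cls, href in collector.links if cls == css_class]


def crawl_download_links(entry_url: str, fetch: Callable[[str], str]) -> List[str]:
    """Entry page -> categories -> subcategories -> APP list -> detail page -> download link."""
    download_links: List[str] = []
    for category in links(fetch(entry_url), "category"):  # first-level classification
        for sub in links(fetch(category), "subcategory"):  # second-level classification
            for app_page in links(fetch(sub), "app"):  # list information
                download_links.extend(links(fetch(app_page), "download"))  # detail information
    return download_links
```

Injecting `fetch` keeps the traversal logic testable against a dictionary of canned pages and lets a real HTTP client with rate limiting be swapped in later.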
12. The system of claim 9, wherein the decompilation comprises parsing the files in the APP installation package into a character stream using a Java decompression tool.
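An APK installation package is a ZIP archive, so the "parse files into a character stream" step of claim 12 can be approximated as below. This Python sketch is a stand-in for the Java tool named in the claim: it only decompresses each entry and decodes its bytes leniently into lines. It does not decompile DEX bytecode, and the lenient UTF-8 decoding is an assumption chosen so that string constants embedded in binary files survive.

```python
import io
import zipfile
from typing import Iterator


def apk_char_stream(apk_bytes: bytes) -> Iterator[str]:
    """Yield the decompressed contents of every file in the package, line by line."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Decode leniently: binary entries (e.g. classes.dex) still expose
            # their embedded ASCII string constants such as URLs.
            text = archive.read(info).decode("utf-8", errors="ignore")
            yield from text.splitlines()
```

The generator output can be fed straight into a line-by-line regular-expression matcher such as the one described in claims 5 and 13.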
13. The system of claim 9, wherein the parsing module comprises a decompression sub-module, a first establishing sub-module, a first matching sub-module, and a second establishing sub-module, wherein:
the decompression sub-module is configured to parse the APP compressed package by decompilation to obtain the character code stream of the parsed package;
the first establishing sub-module is configured to establish a regular expression for matching URL data inside an APP and to match the regular expression against the character code stream line by line;
the first matching sub-module is configured to, when the regular expression successfully matches the character code stream, take the matched portion of the character code stream as URL data inside the APP; and
the second establishing sub-module is configured to establish the APP_URL correspondence between the APP and the URL data inside the APP.
14. The system of claim 9, wherein the identification module comprises a receiving sub-module and a second matching sub-module, wherein:
the receiving sub-module is configured to receive the user data containing URL request information, parse the received user data, and obtain the URL request information in the user data; and
the second matching sub-module is configured to, when the URL request information in the user data matches URL data in the APP_URL correspondence between the APP and the URL data inside the APP, identify the APP corresponding to the user data according to the matched URL data.
15. The system of claim 9, wherein, when the system obtains APP_URL correspondences between multiple groups of APPs and the URL data inside those APPs, and the same URL data corresponds to multiple APPs in those correspondences, the system further comprises a data cleaning module.
16. The system of claim 15, wherein the data cleaning module is specifically configured to:
count, among the APP_URL correspondences of the multiple groups of APPs, the correspondences that share the same URL data;
remove the suffixes from the names of any two APPs sharing the same URL data by a string matching method, obtaining two suffix-free APP names;
query the minimum string of the two suffix-free APP names to obtain the minimum string length;
determine which of the two suffix-free APP names is shorter and obtain the length of that shorter name;
take the ratio of the minimum string length to the length of the shorter APP name as the similarity; and
when the similarity is smaller than a threshold preset in the system, replace the correspondences between the two APPs and the URL data with a single APP_URL correspondence between the URL data and the APP whose suffix-free name is shorter.
CN201610994224.8A, priority date 2016-11-11, filing date 2016-11-11: APP identification method and system (Active, CN108376071B)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610994224.8A CN108376071B (en) 2016-11-11 2016-11-11 APP identification method and system

Publications (2)

Publication Number Publication Date
CN108376071A 2018-08-07
CN108376071B 2021-08-24

Family

ID=63016030

Country Status (1)

Country Link
CN (1) CN108376071B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245273B (en) * 2019-06-21 2021-04-30 武汉绿色网络信息服务有限责任公司 Method for acquiring APP service feature library and corresponding device
CN115022216A (en) * 2022-05-27 2022-09-06 中国电信股份有限公司 Installed APP detection method and device, and network side equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102591992A (en) * 2012-02-15 2012-07-18 苏州亚新丰信息技术有限公司 Webpage classification identifying system and method based on vertical search and focused crawler technology
CN102938789A (en) * 2012-11-19 2013-02-20 江苏省公用信息有限公司 Download combination analysis method and device for mobile internet mobile phone applications
CN104700289A (en) * 2015-03-17 2015-06-10 中国联合网络通信集团有限公司 Advertising method and device
CN105022832A (en) * 2015-08-07 2015-11-04 广东欧珀移动通信有限公司 Method for safely downloading APP (application), mobile terminal and download server

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US7752207B2 (en) * 2007-05-01 2010-07-06 Oracle International Corporation Crawlable applications
US20120185342A1 (en) * 2011-01-13 2012-07-19 Michael Onghai Systems and methods for utilizing customer-provided information within social media applications
US20140074603A1 (en) * 2012-09-11 2014-03-13 Millmobile Bv Consumer advertisement targeting platform system
CN103136342B (en) * 2013-02-04 2016-06-15 百度在线网络技术(北京)有限公司 Search method and system for application APPs, and search server
CN104980409A (en) * 2014-04-11 2015-10-14 中兴通讯股份有限公司 Internet behavior management method and device
CN104507073A (en) * 2014-12-18 2015-04-08 北京大唐智能卡技术有限公司 Method and system for downloading application
CN104504335B (en) * 2014-12-24 2017-12-05 中国科学院深圳先进技术研究院 Phishing APP detection method and system based on page features and URL features
CN105808276A (en) * 2014-12-30 2016-07-27 乐视致新电子科技(天津)有限公司 Application management method and apparatus
CN106022127B (en) * 2016-05-10 2019-07-16 江苏通付盾科技有限公司 APK file safety detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 310012 Building A01, No. 1600 Yuhangtang Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province

Applicant after: CHINA MOBILE (HANGZHOU) INFORMATION TECHNOLOGY Co.,Ltd.

Applicant after: China Mobile Communications Corp.

Address before: 310012, No. 14, building three, Chang Torch Hotel, No. 259, Wensanlu Road, Xihu District, Zhejiang, Hangzhou

Applicant before: CHINA MOBILE (HANGZHOU) INFORMATION TECHNOLOGY Co.,Ltd.

Applicant before: China Mobile Communications Corp.

GR01 Patent grant