CN109104381A - Mobile application identification method based on third-party traffic HTTP messages - Google Patents

Mobile application identification method based on third-party traffic HTTP messages

Info

Publication number
CN109104381A
Authority
CN
China
Prior art keywords
application
message
party
flow
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810670461.8A
Other languages
Chinese (zh)
Other versions
CN109104381B (en)
Inventor
杨明
王姗
吴嘉楠
吴文甲
凌振
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN201810670461.8A
Publication of CN109104381A
Application granted
Publication of CN109104381B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2475 Traffic characterised by specific attributes, e.g. priority or QoS for supporting traffic characterised by the type of applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Abstract

The present invention discloses a mobile application identification method based on third-party traffic HTTP messages. The steps are as follows: the user collects traffic samples with an automated traffic collection platform, which labels the traffic automatically; the user counts how HTTP message keyword sequences occur in the data set to judge whether the messages corresponding to a sequence are third-party traffic; the user then counts HTTP message composition sequences and, from how each value occurs within the same application and across different applications, judges whether the value has a mapping relationship with an application, thereby building a third-party fingerprint library. After a message under test is captured, it is first determined whether the message is third-party traffic; the third-party fingerprint library is then consulted to find the value that identifies the application, i.e. the application ID, and the source application of the message is identified through the mapping between IDs and applications. The method uses statistics to detect third-party traffic messages and to extract the application IDs they carry, and establishes the mapping between IDs and applications in order to identify applications.

Description

Mobile application identification method based on third-party traffic HTTP messages
Technical field
The invention belongs to the technical field of mobile application identification, and in particular relates to a mobile application identification method based on third-party traffic HTTP messages.
Background
With the spread of mobile smart terminals and the growth of the mobile application market, mobile traffic accounts for a steadily increasing share of overall network traffic, and how to supervise it effectively is attracting growing attention. Fine-grained monitoring of mobile traffic requires identifying attributes such as the source and function of the traffic, and mobile application identification technology has drawn wide attention because it addresses this need.
A common approach to mobile application identification is to recognize application features in third-party traffic such as advertising traffic. Specifically, third-party services often need to identify the application, either for functional reasons or for profit, so third-party traffic messages often carry a value that identifies the application and serves as its ID. There is a clear mapping relationship between these values and applications, so they can be used to identify applications. However, third-party service providers are numerous and the traffic each generates follows its own pattern, so it is difficult to establish the mapping between ID values and applications automatically. In addition, current methods for extracting application IDs from third-party traffic are mostly based on grammatical analysis, which is time-consuming and prone to misjudgment.
Summary of the invention
The purpose of the present invention is to provide a mobile application identification method based on third-party traffic HTTP messages that uses statistical methods to detect third-party traffic messages and extract the application IDs they carry, and that establishes the mapping between IDs and applications in order to identify applications.
To achieve the above purpose, the solution of the invention is:
A mobile application identification method based on third-party traffic HTTP messages, comprising the following steps:
Step 1, the user collects traffic samples with an automated traffic collection platform, which labels the traffic automatically;
Step 2, the user counts how HTTP message keyword sequences occur in the data set and judges whether the messages corresponding to a sequence are third-party traffic;
Step 3, HTTP message composition sequences are counted, and how each value occurs within the same application and across different applications is used to judge whether the value has a mapping relationship with an application, thereby building a third-party fingerprint library; then, after a message under test is captured, it is first determined whether the message is third-party traffic, the third-party fingerprint library is consulted to find the value that identifies the application, i.e. the application ID, and the source application of the message is identified through the mapping between IDs and applications.
In step 1 above, an automated test platform is built with an Android virtual machine and Monkey, ensuring that at most one application under test is installed on a given emulator at any one time, so that the traffic triggered by the test platform can be labeled by emulator number and application running time period.
In step 2 above, each message is characterized by the keyword sequence that remains after the values are removed from the HTTP message, and the number of different applications in which the sequence occurs is used to judge whether the corresponding messages come from a third-party service.
The keyword sequence is formed from the domain name, the resource path, and the parameter names in the query and content fields.
In step 3 above, if the value at a given position of the message composition sequence is the same across different messages of the same application but differs between different applications, the value at that position is considered to be an application ID that has a mapping relationship with the application.
The message composition sequence is formed from the domain name, the path, and the parameter names and parameter values in the query and content fields.
Our investigation shows that the traffic a given service provider generates when serving different applications is identical except for the values filled in at certain positions, and that these values include an application ID used to identify the application. On this basis, the identification method provided by the present invention has the following advantages over the prior art:
(1) The present invention uses a statistical method to judge whether an HTTP message is third-party service traffic;
(2) The present invention uses a statistical method to extract the application ID from third-party traffic messages and uses this ID value to identify the application; the method is simple and computationally light.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Detailed description of embodiments
The technical solution and beneficial effects of the present invention are described in detail below with reference to the accompanying drawing.
As shown in Fig. 1, the present invention provides a mobile application identification method based on third-party traffic HTTP messages, comprising the following steps:
(1) Automated traffic data collection:
The user collects traffic samples with an automated traffic collection platform, which labels the traffic automatically. The automated test platform is built with an Android virtual machine and a fuzz testing tool such as Monkey, and ensures that at most one application under test is installed on a given emulator at any one time, so that the traffic triggered by the test platform can be labeled by emulator number and application running time period. A large number of labeled traffic samples are needed before application identification can be performed.
(2) Identifying third-party traffic:
The user counts how HTTP message keyword sequences occur in the data set and judges whether the messages corresponding to a sequence are third-party traffic. Each message is characterized by the keyword sequence that remains after its values are removed: the domain name, the resource path, and the parameter names in the query and content fields of the message are concatenated in order to form the keyword sequence. Once a third-party message keyword sequence library has been established, a message under test can be matched against it to judge whether the message belongs to a third-party service. If the same keyword sequence occurs in multiple different applications, the messages corresponding to that sequence are considered to be traffic from a third-party service. A third-party traffic library is built on this basis; when the keyword sequence of a message under test is in the library, the message is considered to be third-party service traffic.
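The keyword sequence construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `keyword_sequence`, the `|` separator, and the assumption that the content field is form-encoded (`name=value&...`) are all hypothetical choices that the text does not specify.

```python
from urllib.parse import urlsplit, parse_qsl


def keyword_sequence(host: str, target: str, body: str = "") -> str:
    """Concatenate domain name, resource path and parameter names (values dropped).

    Assumption: `body` is form-encoded; other content types would need their
    own parser. The "|" separator is an arbitrary illustrative choice.
    """
    parts = urlsplit(target)
    # keep_blank_values=True so a parameter with an empty value still yields its name
    query_names = [name for name, _ in parse_qsl(parts.query, keep_blank_values=True)]
    body_names = [name for name, _ in parse_qsl(body, keep_blank_values=True)]
    return "|".join([host, parts.path, *query_names, *body_names])
```

Two requests to the same endpoint that differ only in their parameter values map to the same sequence, which is what allows messages of one service to be grouped regardless of the information they carry.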
(3) Identifying applications using third-party traffic:
The user counts HTTP message composition sequences and, from how each value occurs within the same application and across different applications, judges whether the value has a mapping relationship with an application, thereby building a third-party fingerprint library. Application identification is then completed by examining the special values in third-party traffic. The domain name, the resource path, and the parameter names and parameter values in the query and content fields of the message are concatenated in order to form the sequence. The sequence is compared across multiple applications: if the value at a given position of the sequence is constant within the same application but differs between applications, that value can be considered an application ID that the third-party service uses to identify the application. The mapping between application IDs and applications is established according to this rule, so that the application to which a message under test belongs can be identified by its application ID.
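The comparison rule above can be sketched as a small position scan. This is an illustrative sketch under stated assumptions: `find_id_positions` is a hypothetical name, every sequence is assumed to come from the same message type and so to have the same length, and "differs between applications" is interpreted here as pairwise distinct, which is what a one-to-one mapping would require.

```python
def find_id_positions(sequences_by_app):
    """Return positions whose value is constant within each app and distinct across apps.

    sequences_by_app: {app_name: [composition_sequence, ...]} where every
    composition sequence is a list of tokens of equal length (assumption).
    """
    apps = list(sequences_by_app)
    if len(apps) < 2:
        return []  # a cross-application comparison needs at least two apps
    length = len(sequences_by_app[apps[0]][0])
    positions = []
    for i in range(length):
        # Set of values observed at position i, per application
        per_app = [{seq[i] for seq in sequences_by_app[app]} for app in apps]
        if not all(len(values) == 1 for values in per_app):
            continue  # the value varies inside one application
        fixed = [next(iter(values)) for values in per_app]
        if len(set(fixed)) == len(apps):
            positions.append(i)  # one distinct constant per application
    return positions
```

With few samples per application a volatile field can look constant by chance, so a scan like this is only as reliable as the amount of labeled traffic behind it.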
Embodiment:
The mobile application identification method based on third-party traffic HTTP messages in this embodiment comprises the following steps:
1. Automated traffic data collection:
First, a large number of mobile applications are downloaded with a crawler tool. Then a mobile application automated test platform, based on an Android virtual machine and the fuzz testing tool Monkey, selects applications from the application library and automatically installs and runs them to generate traffic. Next, the man-in-the-middle proxy tool MITMPROXY monitors and saves the traffic generated by the applications on the virtual machine and records a traffic log. Finally, a script determines the source application from the traffic log, labels the traffic accordingly, and stores it in the traffic database. In particular, because an emulator runs at most one application at any one time, the application that generated a message can be determined from the time at which the message was captured and the emulator it came from, which makes application labeling possible.
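The labeling rule (one application per emulator at a time, so emulator plus capture time determines the source) might be sketched as below. The `AppRun` record and `label_message` function are hypothetical names introduced for illustration; the text does not describe the actual log format or script.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AppRun:
    """One test run: which app ran on which emulator, and when (assumed schema)."""
    emulator_id: str
    app: str
    start: float  # seconds since epoch
    end: float


def label_message(emulator_id: str, timestamp: float, runs: Iterable[AppRun]) -> Optional[str]:
    """Return the app that was running on the emulator when the message was captured."""
    for run in runs:
        if run.emulator_id == emulator_id and run.start <= timestamp <= run.end:
            return run.app
    return None  # captured outside any recorded run; leave unlabeled
```

Because run intervals on a single emulator do not overlap under the stated constraint, the first matching run is the only one.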
2. Identifying third-party traffic:
The user counts how HTTP message keyword sequences occur in the data set and judges whether the messages corresponding to a sequence are third-party traffic. Once a third-party message keyword sequence library has been established, a message under test can be matched against it to judge whether the message belongs to a third-party service.
Because the interaction protocol of a given third-party service is fixed, its format, i.e. its keyword sequence, never changes; only the values change, according to the information they carry. The values in each message are therefore removed, leaving the keyword sequence made up of the domain name, the resource path, and the keys of the query and content fields. If the sequence occurs in three or more applications, the message is considered to belong to third-party traffic, and the keyword sequence thirdPktStr is stored.
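The three-application threshold can be sketched as a simple count of distinct applications per keyword sequence. The function name and the input shape (pairs of application label and keyword sequence) are assumptions for illustration, and this sketch treats every application label as distinct.

```python
from collections import defaultdict

MIN_APPS = 3  # threshold stated in the text: three or more applications


def find_third_party_sequences(samples, min_apps=MIN_APPS):
    """Return the keyword sequences seen in at least `min_apps` distinct apps.

    samples: iterable of (app_name, keyword_sequence) pairs from labeled traffic.
    """
    apps_per_sequence = defaultdict(set)
    for app, sequence in samples:
        apps_per_sequence[sequence].add(app)
    return {seq for seq, apps in apps_per_sequence.items() if len(apps) >= min_apps}
```

Counting distinct applications rather than messages matters here: a chatty first-party endpoint produces many messages but still only one application.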
In particular, when these applications are different versions of the same application, or different products in the same series from the same vendor, they are very likely to use the vendor's internal public services, and the similar traffic they generate should not be classified as third-party traffic. Developers usually name an apk in the form 'domain1.domain2...name_version.apk', and 'domain1.domain2' is often identical across the product names of one vendor. For example, com.youdao.dict_6070000 and com.youdao.note_65 are two NetEase products, where 'com.youdao' indicates the vendor and product series, and '_6070000' and '_65' are version numbers. Application vendor and version are judged on this basis: if the 'domain1.domain2' parts of two applications are identical, they come from the same vendor; if the applications differ only in the '_version' part, they are considered to be the same application. Algorithm 1 describes the whole process:
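Algorithm 1 itself is not reproduced in this text. As a rough illustration of the naming rule only, the following sketch parses apk names of the form 'domain1.domain2...name_version' and counts distinct vendors. The helper names are hypothetical, and the sketch assumes every name ends in a '_version' suffix or has no underscore at all.

```python
def parse_apk_name(apk_name: str):
    """Split 'domain1.domain2...name_version[.apk]' into (vendor, package, version)."""
    stem = apk_name[:-4] if apk_name.endswith(".apk") else apk_name
    package, _, version = stem.rpartition("_")
    if not package:  # no '_version' suffix present
        package, version = stem, ""
    vendor = ".".join(package.split(".")[:2])  # 'domain1.domain2'
    return vendor, package, version


def same_vendor(a: str, b: str) -> bool:
    return parse_apk_name(a)[0] == parse_apk_name(b)[0]


def same_app(a: str, b: str) -> bool:
    """True when the names differ at most in the '_version' part."""
    return parse_apk_name(a)[1] == parse_apk_name(b)[1]


def distinct_vendor_count(apk_names) -> int:
    """Number of different vendors among the given apk names."""
    return len({parse_apk_name(name)[0] for name in apk_names})
```

One plausible way to apply the rule is to compare `distinct_vendor_count` rather than the raw application count against the threshold, so that several products of one vendor count once; whether Algorithm 1 does exactly this is not stated in the text.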
3. Identifying applications using third-party traffic:
HTTP message composition sequences are counted, and how each value occurs within the same application and across different applications is used to judge whether the value has a mapping relationship with an application, thereby building a third-party fingerprint library. Application identification is then completed by examining the special values in third-party traffic.
Further study of third-party traffic shows that, for some specific messages, the value at a certain position has a one-to-one correspondence with the application, so it can serve as an effective feature identifier that helps identify the application. The present invention designs a third-party identifier extraction algorithm to extract such values and builds a mapping table between identifiers and applications, thereby identifying applications.
An identifier value that corresponds to an application has the following characteristics:
Here the message type can be represented by thirdPktStr. Let consistList be the sequence of keys and values of a message; the identifier extraction algorithm is as follows, and examples of thirdPktStr and consistList are shown in Table 1.
Table 1: thirdPktStr and consistList examples
The third-party identifier extraction method is shown in Algorithm 2; the mapping between IDs and applications is finally recorded in the thirdIdTable table:
In the message identification phase, the structure sequence thirdPktStr of the message is extracted first, and a lookup checks whether a record for that thirdPktStr exists. If it does, the elements listed in consistList are extracted from the message and concatenated to form a feature, and the application corresponding to that feature is finally found by querying thirdIdTable.
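The identification phase can be sketched as two dictionary lookups. The data layout is an assumption made for illustration: `third_pkt_positions` maps each stored thirdPktStr to the parameter names whose values form the ID (standing in for consistList), and `third_id_table` maps (thirdPktStr, feature) to an application. The text does not specify how these tables are actually stored, and the content field is again assumed to be form-encoded.

```python
from urllib.parse import urlsplit, parse_qsl


def identify_app(host, target, body, third_pkt_positions, third_id_table):
    """Return the application a message belongs to, or None if it cannot be attributed."""
    parts = urlsplit(target)
    pairs = parse_qsl(parts.query, keep_blank_values=True) + parse_qsl(
        body, keep_blank_values=True
    )
    # Structure sequence: domain, path and parameter names, values removed
    third_pkt_str = "|".join([host, parts.path, *(name for name, _ in pairs)])
    id_params = third_pkt_positions.get(third_pkt_str)
    if id_params is None:
        return None  # not a known third-party message type
    values = dict(pairs)
    # Concatenate the ID-bearing values to form the lookup feature
    feature = "|".join(values.get(name, "") for name in id_params)
    return third_id_table.get((third_pkt_str, feature))
```

Returning `None` for both an unknown message type and an unknown ID keeps the caller's handling simple; a real system might want to distinguish the two cases.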
In summary, the mobile application identification method based on third-party traffic HTTP messages of the present invention collects and labels application traffic by building an automated traffic collection platform based on an Android virtual machine and a fuzz testing tool, which automates the acquisition of data samples. On the basis of the collected traffic data set, a statistics-based method for identifying third-party traffic HTTP messages is used to identify third-party service traffic, and the correspondence between the value at a specific position in third-party traffic and the application is established automatically, so as to identify the application. The present invention allows the user to automatically identify the HTTP messages in mobile application traffic that belong to third-party services, and to identify applications through these messages.
The above embodiment only illustrates the technical idea of the present invention and does not limit its scope of protection; any change made on the basis of the technical solution in accordance with the technical idea provided by the invention falls within the scope of protection of the present invention.

Claims (6)

1. A mobile application identification method based on third-party traffic HTTP messages, characterized by comprising the following steps:
Step 1, the user collects traffic samples with an automated traffic collection platform, which labels the traffic automatically;
Step 2, the user counts how HTTP message keyword sequences occur in the data set and judges whether the messages corresponding to a sequence are third-party traffic;
Step 3, HTTP message composition sequences are counted, and how each value occurs within the same application and across different applications is used to judge whether the value has a mapping relationship with an application, thereby building a third-party fingerprint library; then, after a message under test is captured, it is first determined whether the message is third-party traffic, the third-party fingerprint library is consulted to find the value that identifies the application, i.e. the application ID, and the source application of the message is identified through the mapping between IDs and applications.
2. The mobile application identification method based on third-party traffic HTTP messages according to claim 1, characterized in that: in step 1, an automated test platform is built with an Android virtual machine and Monkey, ensuring that at most one application under test is installed on a given emulator at any one time, so that the traffic triggered by the test platform is labeled by emulator number and application running time period.
3. The mobile application identification method based on third-party traffic HTTP messages according to claim 1, characterized in that: in step 2, each message is characterized by the keyword sequence that remains after the values are removed from the HTTP message, and the number of different applications in which the sequence occurs is used to judge whether the corresponding messages come from a third-party service.
4. The mobile application identification method based on third-party traffic HTTP messages according to claim 3, characterized in that: the keyword sequence is formed from the domain name, the resource path, and the parameter names in the query and content fields.
5. The mobile application identification method based on third-party traffic HTTP messages according to claim 1, characterized in that: in step 3, if the value at a given position of the message composition sequence is the same across different messages of the same application but differs between different applications, the value at that position is considered to be an application ID that has a mapping relationship with the application.
6. The mobile application identification method based on third-party traffic HTTP messages according to claim 5, characterized in that: the message composition sequence is formed from the domain name, the path, and the parameter names and parameter values in the query and content fields.
CN201810670461.8A 2018-06-26 2018-06-26 Mobile application identification method based on third-party traffic HTTP message Active CN109104381B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810670461.8A CN109104381B (en) 2018-06-26 2018-06-26 Mobile application identification method based on third-party traffic HTTP message

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810670461.8A CN109104381B (en) 2018-06-26 2018-06-26 Mobile application identification method based on third-party traffic HTTP message

Publications (2)

Publication Number Publication Date
CN109104381A 2018-12-28
CN109104381B 2021-11-02

Family

ID=64844985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810670461.8A Active CN109104381B (en) 2018-06-26 2018-06-26 Mobile application identification method based on third-party traffic HTTP message

Country Status (1)

Country Link
CN (1) CN109104381B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111222547A (en) * 2019-12-30 2020-06-02 中国人民解放军国防科技大学 Traffic feature extraction method and system for mobile application
CN111371700A (en) * 2020-03-11 2020-07-03 武汉思普崚技术有限公司 Traffic identification method and device applied to forward proxy environment
CN112671671A (en) * 2021-03-16 2021-04-16 北京邮电大学 Third party flow identification method, device and equipment based on third party library

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6870830B1 (en) * 2000-11-30 2005-03-22 3Com Corporation System and method for performing messaging services using a data communications channel in a data network telephone system
CN102065017A (en) * 2010-12-31 2011-05-18 成都市华为赛门铁克科技有限公司 Message processing method and device
CN103312565A (en) * 2013-06-28 2013-09-18 南京邮电大学 Independent learning based peer-to-peer (P2P) network flow identification method
CN105099803A (en) * 2014-05-15 2015-11-25 中国移动通信集团公司 Traffic identification method, application server, and network element equipment
US20170026407A1 (en) * 2013-11-25 2017-01-26 Imperva, Inc. Coordinated detection and differentiation of denial of service attacks
CN107357612A (en) * 2017-06-27 2017-11-17 聚好看科技股份有限公司 Application program updating detection method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6870830B1 (en) * 2000-11-30 2005-03-22 3Com Corporation System and method for performing messaging services using a data communications channel in a data network telephone system
CN102065017A (en) * 2010-12-31 2011-05-18 成都市华为赛门铁克科技有限公司 Message processing method and device
CN103312565A (en) * 2013-06-28 2013-09-18 南京邮电大学 Independent learning based peer-to-peer (P2P) network flow identification method
US20170026407A1 (en) * 2013-11-25 2017-01-26 Imperva, Inc. Coordinated detection and differentiation of denial of service attacks
CN105099803A (en) * 2014-05-15 2015-11-25 中国移动通信集团公司 Traffic identification method, application server, and network element equipment
CN107357612A (en) * 2017-06-27 2017-11-17 聚好看科技股份有限公司 Application program updating detection method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WENJIA WU,JIANAN WU,YANHAO WANG,ZHEN LING;MING YANG: "Efficient Fingerprinting-Based Android Device Identification With Zero-Permission Identifiers", 《IEEE ACCESS,2016,PP(99):1-1》 *
李冰,金志刚,舒炎泰: "基于流量统计实时识别QQ语音通信的方法", 《高技术通讯》 *
黄健文,黄健,蔡秋艳,李俊磊,严冬: "UICC卡非接触应用隐式选择识别技术研究", 《微型机与应用》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111222547A (en) * 2019-12-30 2020-06-02 中国人民解放军国防科技大学 Traffic feature extraction method and system for mobile application
CN111222547B (en) * 2019-12-30 2021-08-17 中国人民解放军国防科技大学 Traffic feature extraction method and system for mobile application
CN111371700A (en) * 2020-03-11 2020-07-03 武汉思普崚技术有限公司 Traffic identification method and device applied to forward proxy environment
CN112671671A (en) * 2021-03-16 2021-04-16 北京邮电大学 Third party flow identification method, device and equipment based on third party library
CN112671671B (en) * 2021-03-16 2021-06-29 北京邮电大学 Third party flow identification method, device and equipment based on third party library

Also Published As

Publication number Publication date
CN109104381B (en) 2021-11-02

Similar Documents

Publication Publication Date Title
CN105022960B (en) Multiple features mobile terminal from malicious software detecting method and system based on network traffics
CN105809035B (en) The malware detection method and system of real-time behavior is applied based on Android
CN107515915B (en) User identification association method based on user behavior data
CN109726744A (en) A kind of net flow assorted method
CN109120429B (en) Risk identification method and system
CN102469117B (en) Method and device for identifying abnormal access action
CN106354797B (en) Data recommendation method and device
CN105491444B (en) A kind of data identifying processing method and device
CN110648172B (en) Identity recognition method and system integrating multiple mobile devices
CN109104381A (en) A kind of mobile application recognition methods based on third party's flow HTTP message
CN106878108B (en) Network flow playback test method and device
CN105376223B (en) The reliability degree calculation method of network identity relationship
CN110245273B (en) Method for acquiring APP service feature library and corresponding device
CN106301980A (en) A kind of brush amount tool detection method and apparatus
CN106998336B (en) Method and device for detecting user in channel
CN110011860A (en) Android application and identification method based on network traffic analysis
CN106067879B (en) The detection method and device of information
CN106301975A (en) A kind of data detection method and device thereof
CN110891071A (en) Network traffic information acquisition method, device and related equipment
Zungur et al. Libspector: Context-aware large-scale network traffic analysis of android applications
CN107704494B (en) User information collection method and system based on application software
CN102469450B (en) Method and device for recognizing virus characteristics of mobile phone
CN108650145A (en) Phone number characteristic automatic extraction method under a kind of home broadband WiFi
CN106161403A (en) Application program restored method, device and system
CN106897619B (en) Mobile terminal from malicious software cognitive method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant