CN110661796A - User action flow identification method and device - Google Patents

Publication number
CN110661796A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910896941.0A
Other languages
Chinese (zh)
Other versions
CN110661796B (en)
Inventor
熊威
叶志钢
黄华桥
王赟
程波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Greenet Information Service Co Ltd
Original Assignee
Wuhan Greenet Information Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Greenet Information Service Co Ltd
Priority to CN201910896941.0A
Publication of CN110661796A
Application granted
Publication of CN110661796B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method and a device for identifying user action flow, wherein the identification method comprises the following steps: triggering the full operation of the target application under the target operating system, and determining the mapping relation between the target application and the target operating system to obtain a first classification set; triggering at least one target operation of the target application under the target operating system, and determining the mapping relation between each target operation and the target operating system to obtain a second classification set; acquiring user action flow, classifying the user action flow through a first classification set, and determining an application and an operating system to which the user action flow belongs; and classifying the user action flow through a second classification set matched with the user action flow to obtain the operation corresponding to the user action flow. In the invention, through layer-by-layer screening and filtering, the problem of identification errors caused by different operating systems can be avoided, so that the fine action packet can be identified more accurately, and the user behavior can be extracted more accurately through the fine action packet.

Description

User action flow identification method and device
Technical Field
The invention belongs to the field of computer application, and particularly relates to a method and a device for identifying user action flow.
Background
With the rapid growth of the internet, users bring huge profits to enterprises. In the internet era, analyzing user behavior helps enterprises optimize and customize user access so as to provide better service; it can also be used to construct behavior models, distinguish different types of users, and screen out malicious and abnormal users.
Currently, one identification method works as follows: traffic of the application software is first acquired and analyzed, protocol features are extracted and summarized from the network traffic, a matching expression for protocol identification is set, and traffic captured in the network is identified against it. In an actual application scenario, however, the traffic generated by fine actions of the same application may differ across operating systems. This method therefore identifies action traffic inaccurately. If a feature code must match the traffic of all systems, the feature code has to be weakened, or no feature code satisfying all systems can be extracted at all, and the weakened feature code leads to inaccurate identification. Alternatively, the feature codes of the same application under different systems interfere with each other and reduce identification accuracy. For example, the feature code of another action of the application under the iOS system may happen to match the traffic of a fine action (such as an attention action) of the application under the Android system, which causes identification errors and errors in the extracted application information.
In view of the above, overcoming the drawbacks of the prior art is an urgent problem in the art.
Disclosure of Invention
In view of the defects of the prior art and the need for improvement, the invention provides a method and a device for identifying user action traffic. Through layer-by-layer screening and filtering, the application and the operating system are identified first, and differentiated identification is then applied to the differences between the fine action traffic of an application under different systems. The problem of identification errors caused by different operating systems can thereby be avoided or reduced, which solves the technical problem that user actions are currently identified inaccurately.
To achieve the above object, according to an aspect of the present invention, there is provided a method for recognizing user action traffic, the method including:
triggering the full operation of a target application under a target operating system, and determining the mapping relation between the target application and the target operating system to obtain a first classification set;
triggering at least one target operation of a target application under a target operating system, and determining the mapping relation between each target operation and the target operating system to obtain a second classification set;
acquiring user action traffic, classifying the user action traffic through the first classification set, and determining an application and an operating system to which the user action traffic belongs;
and acquiring a second classification set matched with the user action flow based on the application and the operating system to which the user action flow belongs, classifying the user action flow through the second classification set matched with the user action flow to obtain an operation corresponding to the user action flow so as to identify the user action.
Preferably, the triggering the full operation of the target application under the target operating system, and determining the mapping relationship between the target application and the target operating system to obtain the first classification set includes:
triggering the full operation of the target application under a target operating system to obtain the application characteristics of the target application under the target operating system and the operating system characteristics of the target operating system;
and establishing a mapping relation between the application characteristics of the target application and the operating system characteristics of the target operating system to obtain a first classification set.
Preferably, the triggering the full operation of the target application under the target operating system to obtain the application features of the target application under the target operating system and the operating system features of the target operating system includes:
constructing various testing environments based on various terminals, various IP addresses and/or various account numbers;
under each test environment, triggering the full operation of the target application under the target operating system, and capturing corresponding data traffic to obtain a plurality of groups of data packets;
and analyzing the protocol type, the protocol port and/or the data load part of the data packet to obtain the application characteristics and the operating system characteristics of the target application under the target operating system.
Preferably, the analyzing the protocol type, the protocol port and/or the data load part of the data packet to obtain the application characteristic and the operating system characteristic of the target application under the target operating system includes:
traversing data load parts of a plurality of data packets, determining whether the data load parts have fixed bytes which are always kept consistent, and if so, taking the fixed bytes which are always kept consistent as application characteristics of the target application under the target operating system; and/or,
acquiring the data load length of each data packet and the bytes at the designated position, determining whether a proportional relation exists between the data load length of the data packet and the bytes at the designated position, and if so, taking the proportional relation as the application characteristic of the target application under the target operating system.
Preferably, the analyzing the protocol type, the protocol port and/or the data load part of the data packet to obtain the application characteristic and the operating system characteristic of the target application under the target operating system includes:
analyzing a data load part of the data packet, and determining whether a character string representing the target operating system exists in the data load part;
and if so, taking the character string as the operating system characteristic of the target operating system.
Preferably, the analyzing the protocol type, the protocol port and/or the data load part of the data packet to obtain the application characteristic and the operating system characteristic of the target application under the target operating system further includes:
if not, acquiring a first data packet generated by the target application under a first operating system and a second data packet generated by the target application under a second operating system under the same test environment;
comparing the first data packet with the second data packet to determine whether fixed bytes which are changed along with the operating system exist in the first data packet and the second data packet;
and if so, taking the fixed byte which is changed along with the operating system as the operating system characteristic of the corresponding operating system.
Preferably, the triggering the target application to perform at least one target operation under a target operating system, and determining a mapping relationship between each target operation and the target operating system to obtain the second classification set includes:
triggering at least one target operation of a target application under a target operating system, and acquiring operation characteristics of the target operation after each target operation is triggered;
and establishing a mapping relation among the target operation, the operation characteristics corresponding to the target operation and the operating system characteristics of the target operating system to obtain a second classification set.
Preferably, the triggering the target application to at least one target operation under a target operating system, and after each target operation is triggered, acquiring the operating characteristics of the target operation includes:
constructing various testing environments based on various terminals, various IP addresses and/or various account numbers;
under each test environment, triggering target operation of the target application under a target operating system, and capturing corresponding data traffic to obtain a plurality of groups of data packets;
and analyzing the protocol type, the protocol port and/or the data load part of the data packet to obtain the operation characteristics of the target operation under the target operating system.
Preferably, the target operating system includes an iOS operating system, an Android operating system, and a Windows operating system.
According to another aspect of the present invention, there is provided an identification apparatus comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor and programmed to perform the identification method of the present invention.
Generally, compared with the prior art, the technical scheme of the invention has the following beneficial effects: the invention provides a method and a device for identifying user action flow, wherein the identification method comprises the following steps: triggering the full operation of the target application under the target operating system, and determining the mapping relation between the target application and the target operating system to obtain a first classification set; triggering at least one target operation of the target application under the target operating system, and determining the mapping relation between each target operation and the target operating system to obtain a second classification set; acquiring user action flow, classifying the user action flow through a first classification set, and determining an application and an operating system to which the user action flow belongs; and acquiring a second classification set matched with the user action flow based on the application and the operating system to which the user action flow belongs, classifying the user action flow through the second classification set matched with the user action flow to obtain an operation corresponding to the user action flow so as to identify the user action.
In the scheme of the invention, a first classification set and a second classification set are established based on simulated data. The operating system and the application to which user action traffic belongs are determined through the first classification set, and the user action traffic is then finely classified through the second classification set. Through layer-by-layer screening and filtering, the application and the operating system are identified first, and differentiated identification is then applied to the differences between the fine action traffic of an application under different systems. The problem of identification errors caused by different operating systems can thereby be avoided or reduced, fine action packets are identified more accurately, and user behavior is extracted more accurately from the fine action packets.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic flowchart of a method for identifying user action traffic according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a partial implementation process of step 10 in FIG. 1 according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a partial implementation process of step 11 in FIG. 1 according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a data packet generated by application A under the Android operating system according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a data packet generated by application A under the iOS operating system according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of another data packet generated by application A under the Android operating system according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of another data packet generated by application A under the iOS operating system according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the action traffic generated by an attention action of application A under the Android operating system according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the action traffic generated by a cancel-attention action under the Android operating system according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of the action traffic generated by an attention action of application A under the iOS operating system according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of the action traffic generated by a cancel-attention action of application A under the iOS operating system according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of an identification apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1:
Referring to FIG. 1, the present embodiment provides a method for identifying user action traffic, where the method includes the following steps:
Step 10: trigger the full operation of the target application under a target operating system, and determine the mapping relation between the target application and the target operating system to obtain a first classification set.
The target operating system includes the iOS operating system, the Android operating system, the Windows operating system and the like, and may also be another type of operating system.
The target application may be various types of applications currently on the market, such as chat software, financial software, and shopping software.
The full operation comprises a plurality of target operations and can cover all functions of the target application. A large amount of data traffic is obtained by triggering all functions of the target application; this traffic is analyzed to determine the mapping relation between the target application and the target operating system, yielding the first classification set.
On the interactive interface of a target application, a target operation corresponds to a function button that responds to a fine action of the user, where a fine action is a specific, fine-grained behavior. Examples include sending a picture, sending a video, receiving a picture or initiating a voice call in chat software; paying attention to a stock, canceling attention to a stock, trading and searching in financial software; playing a video and downloading a video in video software; and logging in to an account, adding an item to the shopping cart, and placing an order in shopping software.
Step 11: trigger at least one target operation of the target application under a target operating system, and determine the mapping relation between each target operation and the target operating system to obtain a second classification set.
In this embodiment, the first classification set includes a plurality of applications of different types and the mapping relation between each application and an operating system. There may be multiple second classification sets, in which case each type of application corresponds to one second classification set that includes the mapping relations between the different operations of that application and the operating system. Alternatively, there may be a single second classification set shared by multiple different types of applications, which includes the mapping relations between the different operations of each application and the corresponding operating system.
The first classification set is mainly used to identify the operating system and the application to which the user action traffic belongs so that a suitable second classification set can be selected. The second classification set is mainly used to finely classify the user action traffic and determine the specific operation of the application to which the user action traffic belongs.
In this embodiment, the key to establishing the first classification set and the second classification set is to collect a large amount of simulated data and to perform inductive analysis on it to obtain the corresponding classification sets.
Step 13: acquire user action traffic, classify the user action traffic through the first classification set, and determine the application and the operating system to which the user action traffic belongs.
The user action traffic is traffic generated by a certain operation. It uniquely indicates that a certain target operation of a target application has been triggered and represents a certain behavior of the user.
The user action traffic generally carries user behavior information. For example, attention traffic carries the code of the stock concerned, account login traffic carries the user's account information, and video playback traffic carries the name of the video being watched. The specific behavior of the user can be counted and analyzed from the information carried in these messages, such as which types of stocks the user likes, which types of articles the user likes to buy, and which video content the user likes to watch.
Step 14: acquire a second classification set matched with the user action traffic based on the application and the operating system to which the user action traffic belongs, and classify the user action traffic through that second classification set to obtain the operation corresponding to the user action traffic, so as to identify the user action.
In an actual application scenario, if differentiated identification is not performed by operating system and application, the following situations may occur. (1) In terms of feature code strength: if a feature code must match the traffic of all systems, it can only be weakened, or no feature code matching the traffic of all systems can be extracted at all. (2) In terms of efficiency: the full set of rules has to be loaded every time, which increases the load on the identification processor. (3) In terms of accuracy: (3.1) if fine actions are identified directly without first identifying the application, identification is inaccurate. A fine action rule is a byte feature, generally one or more bytes, so traffic of another application protocol whose bytes happen to satisfy the byte feature of the target application protocol is mistakenly identified as traffic of application A. (3.2) The action feature codes of another system may interfere with the fine action traffic of the current system. For example, the feature code of another action of the application under the iOS system may happen to match the traffic of a fine action (such as an attention action) under the Android system, which causes identification errors and errors in the extracted application information.
To solve the foregoing problems, the scheme of this embodiment has the following advantages. (1) A feature code only needs to match what the user traffic has in common under a single operating system, rather than under all systems, which strengthens the feature code. (2) A first classification set and a second classification set are established based on simulated data; the operating system and the application to which user action traffic belongs are first determined through the first classification set, and the user action traffic is then finely classified through the second classification set. When classifying with the second classification set, fine action identification of the traffic of a single system only needs to load the fine action feature codes of that system, not those of other systems. This reduces the number of feature codes loaded and the load on the identification engine, reduces interference from the feature codes of other systems, and avoids or reduces identification errors caused by different operating systems. Through layer-by-layer screening and filtering, the application and the operating system are identified first, and differentiated identification is then applied to the differences between the fine action traffic of an application under different systems, so that fine action packets are identified more accurately and user behavior is extracted more accurately from them.
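The layer-by-layer screening described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the rule format (`Signature`, a byte pattern at a fixed payload offset), the set contents, and all byte values are hypothetical and do not come from the patent. The example deliberately gives byte 0x11 at offset 4 a different meaning on each system, which is the interference case that a single flat rule set would mishandle.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Signature:
    """Hypothetical rule format: a byte pattern expected at a fixed payload offset."""
    offset: int
    pattern: bytes

    def matches(self, payload: bytes) -> bool:
        end = self.offset + len(self.pattern)
        return payload[self.offset:end] == self.pattern


# First classification set: (application, operating system) -> signatures.
# All values are made up for illustration.
FIRST_SET = {
    ("app_a", "android"): [Signature(0, b"\xa1\x01")],
    ("app_a", "ios"): [Signature(0, b"\xa1\x02")],
}

# Second classification sets, one per (application, operating system).
# Note that 0x11 at offset 4 means different operations on the two systems.
SECOND_SETS = {
    ("app_a", "android"): {
        "attention": [Signature(4, b"\x10")],
        "cancel_attention": [Signature(4, b"\x11")],
    },
    ("app_a", "ios"): {
        "attention": [Signature(4, b"\x11")],
        "cancel_attention": [Signature(4, b"\x12")],
    },
}


def classify_app_os(payload: bytes) -> Optional[Tuple[str, str]]:
    """Stage 1: find the application and operating system the traffic belongs to."""
    for key, signatures in FIRST_SET.items():
        if all(sig.matches(payload) for sig in signatures):
            return key
    return None


def classify_operation(payload: bytes) -> Optional[Tuple[str, str, str]]:
    """Stage 2: consult only the second set selected by the stage-1 result."""
    key = classify_app_os(payload)
    if key is None:
        return None
    for operation, signatures in SECOND_SETS.get(key, {}).items():
        if all(sig.matches(payload) for sig in signatures):
            return (*key, operation)
    return None


android_packet = b"\xa1\x01\x00\x00\x11"
ios_packet = b"\xa1\x02\x00\x00\x11"
print(classify_operation(android_packet))
print(classify_operation(ios_packet))
```

Because stage 2 only looks up the rules for the system found in stage 1, the same operation byte is resolved to "cancel_attention" for the Android packet and to "attention" for the iOS packet, and rules of other systems are never evaluated.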
In an alternative scheme, with reference to fig. 2, step 10 specifically includes: triggering the full operation of the target application under a target operating system to obtain the application characteristics of the target application under the target operating system and the operating system characteristics of the target operating system; and establishing a mapping relation between the application characteristics of the target application and the operating system characteristics of the target operating system to obtain a first classification set.
Specifically, a plurality of test environments are constructed based on a plurality of terminals, a plurality of IP addresses and/or a plurality of account numbers; under each test environment, triggering the full operation of the target application under the target operating system, and capturing corresponding data traffic to obtain a plurality of groups of data packets; and analyzing the protocol type, the protocol port and/or the data load part of the data packet to obtain the application characteristics and the operating system characteristics of the target application under the target operating system.
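As a rough illustration of how the test environments might be enumerated, the sketch below builds every combination of terminal, IP address and account. The identifiers are placeholders (the IP addresses come from a documentation-only range), and taking the full Cartesian product is an assumption; the text only requires that several environments be constructed from these dimensions.

```python
from itertools import product

# Placeholder inventory; none of these values come from the patent.
terminals = ["phone_1", "phone_2"]
ip_addresses = ["192.0.2.10", "192.0.2.20"]
accounts = ["account_a", "account_b"]


def build_test_environments(terminals, ip_addresses, accounts):
    """Return one environment per (terminal, IP address, account) combination."""
    return [
        {"terminal": terminal, "ip": ip, "account": account}
        for terminal, ip, account in product(terminals, ip_addresses, accounts)
    ]


environments = build_test_environments(terminals, ip_addresses, accounts)
print(len(environments))
```

Varying the terminal, address and account between captures helps ensure that bytes which depend on those values are not mistaken for application or operating system features.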
Wherein the application characteristics of the target application under the target operating system can be obtained as follows.
Mode one: traverse the data load parts of the data packets and determine whether they contain fixed bytes that always remain consistent; if so, take those fixed bytes as the application characteristics of the target application under the target operating system.
Mode two: acquire the data load length of each data packet and the byte at a designated position, and determine whether a proportional relation exists between the data load length and the byte at the designated position; if so, take the proportional relation as the application characteristic of the target application under the target operating system.
In an actual application scenario, if the data in a data packet satisfies both mode one and mode two, the fixed bytes of mode one can be combined with the proportional relation of mode two to obtain the application characteristics. Using the combined features strengthens the judgment, which addresses the problem that a weak fine action rule leads to false identification or an inability to distinguish actions, and thereby improves the identification of fine action traffic and the accuracy of user behavior analysis.
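The two ways of deriving application characteristics, and their combination, might look like the following Python sketch. The function names, the sample payloads and the choice of a single-byte field at the designated position are assumptions made for illustration; the patent does not specify a concrete packet layout.

```python
from fractions import Fraction


def fixed_bytes(payloads):
    """Offsets whose byte value is identical in every captured payload."""
    if not payloads:
        return {}
    shortest = min(len(p) for p in payloads)
    first = payloads[0]
    return {
        i: first[i]
        for i in range(shortest)
        if all(p[i] == first[i] for p in payloads)
    }


def length_ratio(payloads, position):
    """Return payload length / byte at `position` if it is constant, else None."""
    ratios = set()
    for p in payloads:
        if position >= len(p) or p[position] == 0:
            return None
        ratios.add(Fraction(len(p), p[position]))
    return ratios.pop() if len(ratios) == 1 else None


def matches_application(payload, fixed, position, ratio):
    """Combined check: every fixed byte must match and the ratio must hold."""
    for offset, value in fixed.items():
        if offset >= len(payload) or payload[offset] != value:
            return False
    if ratio is None:
        return True
    if position >= len(payload) or payload[position] == 0:
        return False
    return Fraction(len(payload), payload[position]) == ratio


# Made-up captures: two constant header bytes, then a byte equal to half the length.
captures = [
    b"\xab\xcd\x03abc",
    b"\xab\xcd\x04hello",
    b"\xab\xcd\x05goodbye",
]
fixed = fixed_bytes(captures)
ratio = length_ratio(captures, position=2)
print(fixed, ratio)
print(matches_application(b"\xab\xcd\x02z", fixed, 2, ratio))
```

Requiring both conditions makes the rule stricter than either one alone, which is the point of the combined feature: a packet that merely shares the two header bytes but has an inconsistent length field is rejected.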
In this embodiment, the operating system characteristics of the target operating system may be obtained in the following manner. Specifically, a data load part of the data packet is analyzed, and whether a character string representing the target operating system exists in the data load part is determined; and if so, taking the character string as the operating system characteristic of the target operating system.
If not, a first data packet generated by the target application under a first operating system and a second data packet generated by the target application under a second operating system are acquired under the same test environment, where the application, the operation and the account corresponding to the first data packet and the second data packet are the same. The first data packet is compared with the second data packet to determine whether they contain fixed bytes that change with the operating system; if so, the fixed bytes that change with the operating system are taken as the operating system characteristics of the corresponding operating system. The first operating system and the second operating system are two different operating systems: for example, the first operating system is the iOS operating system and the second is the Android operating system, or the first is the iOS operating system and the second is the Windows operating system.
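A minimal Python sketch of both ways of obtaining operating system characteristics is given below. The marker strings in `OS_MARKERS` and the sample bytes are hypothetical. The comparison also generalizes the single packet pair described above to small groups of packets per system, which is an assumption made here so that bytes varying within one system are not mistaken for operating system features.

```python
# Hypothetical marker strings that might appear in a payload in clear text.
OS_MARKERS = {
    "android": [b"Android"],
    "ios": [b"iPhone", b"iOS"],
}


def os_from_string(payload):
    """First approach: look for a character string that names the operating system."""
    for os_name, markers in OS_MARKERS.items():
        if any(marker in payload for marker in markers):
            return os_name
    return None


def os_dependent_bytes(packets_os1, packets_os2):
    """Fallback: offsets that are constant within each system but differ between them.

    Both lists are assumed to come from the same test environment, application,
    operation and account, so the operating system is the only variable.
    """
    if not packets_os1 or not packets_os2:
        return {}
    shortest = min(len(p) for p in packets_os1 + packets_os2)
    result = {}
    for i in range(shortest):
        values_1 = {p[i] for p in packets_os1}
        values_2 = {p[i] for p in packets_os2}
        if len(values_1) == 1 and len(values_2) == 1 and values_1 != values_2:
            result[i] = (values_1.pop(), values_2.pop())
    return result


android_packets = [b"\x01\x0a\x55", b"\x01\x0a\x66"]
ios_packets = [b"\x01\x0b\x55", b"\x01\x0b\x77"]
os_features = os_dependent_bytes(android_packets, ios_packets)
print(os_from_string(b"AppA/1.0 (Android 9)"))
print(os_features)
```

In this example offset 0 is the same on both systems and offset 2 varies within a system, so only offset 1 is reported as an operating system feature.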
In this embodiment, step 11 includes: triggering at least one target operation of a target application under a target operation system, and acquiring operation characteristics of the target operation after each target operation is triggered; and establishing a mapping relation among the target operation, the operation characteristics corresponding to the target operation and the operating system characteristics of the target operating system to obtain a second classification set.
With reference to fig. 3, the main process of establishing the second classification set is briefly illustrated, taking as an example target operations that include an attention operation, a cancel-attention operation, and a search operation.
Specifically, a plurality of test environments are constructed based on a plurality of terminals, a plurality of IP addresses and/or a plurality of account numbers; under each test environment, the target operation of the target application under the target operating system is triggered and the corresponding data traffic is captured to obtain a plurality of groups of data packets; and the protocol type, the protocol port and/or the data load part of the data packets are analyzed to obtain the operation characteristics of the target operation under the target operating system.
The identification method is mainly applied to the fine action identification of private protocols. It distinguishes cases in which the fine action traffic of the same private protocol differs under different operating systems, which improves the accuracy of fine action identification for private protocols and allows user behavior to be analyzed more accurately.
Example 2:
The following explains the main implementation process of the identification method of the present invention through a specific application example, in combination with embodiment 1.
In an actual application scenario, a user's fine operations on a certain financial software (assumed to be application A) need to be analyzed based on big data. The fine operations (which may be understood as the target operations in embodiment 1) include a search operation, a stock attention operation, and a cancel-attention operation. The user action traffic is analyzed to finally obtain which stocks the user pays attention to, which stocks the user has searched for, and which stocks the user has removed from attention, so as to obtain a user portrait corresponding to the financial software.
In the identification process of user actions, a first classification set and a second classification set are established, wherein the first classification set comprises a plurality of applications of different types and a mapping relation between each application and an operating system; the number of the second classification sets may be multiple, each type of application corresponds to one second classification set, and the second classification set includes mapping relationships between different operations of the application and the operating system, or the number of the second classification sets is one, and multiple different types of applications share the same second classification set, and the second classification set includes mapping relationships between different operations of each application and the corresponding operating system.
The first classification set is mainly used to identify the operating system and the application to which the user action traffic belongs, so that a suitable second classification set can be selected. The second classification set is mainly used to finely classify the user action traffic and to determine the specific operation of the application to which the traffic belongs.
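The two-stage lookup described above can be sketched as follows. This is a minimal illustration and not the patent's implementation: the dictionary layout, the names `FIRST_SET`, `SECOND_SETS` and `classify_flow`, and the operation labels are hypothetical. The byte values are the ones reported later in this example, and the assumption that the abbreviation 'xzzt' and the operating system string appear in the same packet (and that 'xzzt' also appears under the IOS operating system) is made only for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Matcher = Callable[[bytes], bool]
Key = Tuple[str, str]  # (application, operating system)

# First classification set: maps (application, OS) to a payload matcher.
# Assumption: the application string and the OS string share one packet.
FIRST_SET: Dict[Key, Matcher] = {
    ("app_a", "Android"): lambda p: b"xzzt" in p and b"Android" in p,
    ("app_a", "IOS"): lambda p: b"xzzt" in p and b"iphone" in p,
}

# Second classification sets: one per (application, OS), mapping each
# operation to a matcher on bytes counted from the end of the payload.
SECOND_SETS: Dict[Key, Dict[str, Matcher]] = {
    ("app_a", "Android"): {
        "attention": lambda p: p[-16:-14] == b"\x8d\x00",
        "cancel_attention": lambda p: p[-16:-14] == b"\x9d\x01",
    },
    ("app_a", "IOS"): {
        "attention": lambda p: p[-32:-30] == b"\x8c\x10",
        "cancel_attention": lambda p: p[-32:-30] == b"\x9c\x11",
    },
}


def classify_flow(packets: List[bytes]) -> Optional[Tuple[str, str, Optional[str]]]:
    """Stage 1 picks (application, OS); stage 2 picks the operation."""
    for key, matches in FIRST_SET.items():
        if any(matches(p) for p in packets):
            for operation, op_matches in SECOND_SETS[key].items():
                if any(op_matches(p) for p in packets):
                    return key[0], key[1], operation
            return key[0], key[1], None  # application known, operation unknown
    return None  # traffic does not belong to a known application
```

Selecting the second set only after the first stage has fixed the operating system keeps the Android and IOS operation signatures from being tested against each other's traffic.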
In this embodiment, the key to establishing the first classification set and the second classification set is to collect a large amount of simulation data and to perform inductive analysis on it to obtain the corresponding classification set.
In the process of collecting the simulation data, the influence of irrelevant background traffic is first eliminated as far as possible (for example, by closing processes or prohibiting the traffic of other applications), and then a plurality of different test environments are constructed by combining a plurality of different terminals, a plurality of different IP addresses and/or different account numbers. For each test environment, the full operation of application A under the target operating system is triggered and the data packets corresponding to application A are captured, thereby obtaining a plurality of groups of test packets corresponding to the different test environments. The target operating system may be an IOS operating system, a Windows operating system, an Android operating system, or another type of operating system. Here, the IOS, Windows, and Android operating systems are used as the target operating systems for explanation.
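As an illustration of how the test environments might be enumerated, the sketch below takes every combination of terminal, IP address and account. The function name, the field names and the sample values are hypothetical; the IP addresses come from a range reserved for documentation.

```python
from itertools import product
from typing import Dict, List


def build_test_environments(
    terminals: List[str], ip_addresses: List[str], accounts: List[str]
) -> List[Dict[str, str]]:
    """Return one environment per (terminal, IP address, account) combination."""
    return [
        {"terminal": terminal, "ip": ip, "account": account}
        for terminal, ip, account in product(terminals, ip_addresses, accounts)
    ]


# Hypothetical sample values, used only to show the shape of the result.
environments = build_test_environments(
    terminals=["android_phone", "iphone"],
    ip_addresses=["192.0.2.10", "192.0.2.11"],
    accounts=["account_1", "account_2"],
)
```

Capturing the same operation in every environment is what later allows bytes that depend on the terminal, address or account to be ruled out as features.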
After acquiring a plurality of groups of data packets of the application A under the target operating system, analyzing each data packet to obtain a protocol type, a protocol port and a data load part corresponding to the data packet so as to determine the application characteristics of the application A under different operating systems. Comparing whether the protocol types of the multiple groups of data packets have a rule, if so, summarizing and inducing the application characteristics of the application A under a certain operating system according to the rule; comparing whether the protocol ports of the multiple groups of data packets have a rule, if so, summarizing and inducing the application characteristics of the application A under a certain operating system according to the rule; and comparing whether the data load part of the multiple groups of data packets has a rule, if so, summarizing and inducing the application characteristics of the application A under a certain operating system according to the rule.
Generally, the protocol types and protocol ports of different packets may change, and may not be the focus analysis object. The data load part generally has byte data which can reflect the application characteristics, and in the actual analysis, the data load part can be taken as an important analysis object.
When analyzing the data load part, the packets are compared to determine whether byte features occur regularly in the data load part, for example by checking whether the data load part carries character string information of application A. As another example, a rule may be observed in the length of the data packet, in the load length of the data packet, or in the variation of certain bytes of the data load part. If there are leading bytes that are not affected by the test environment, these bytes are extracted and summarized into a byte feature, and further packets are then captured for verification. If the captured packets always contain the byte feature and the feature does not conflict with other protocols, it is taken as the application characteristic of application A and finally summarized into an expression used to identify the application.
Referring to fig. 4, a data packet generated by application A when running under the Android operating system is shown. The following conclusions are obtained by comparing and verifying a large number of captured data packets: (1) the first 4 bytes of the first few data packets of the protocol of application A are always 0x78798402; (2) in the first few session data packets that contain payload data, the 5th and 6th bytes have a fixed relation to the data load length of the packet. In the first packet, the 5th and 6th bytes are 0x004d, which is 77 in decimal, while the data load length of the packet is 83, a difference of 6. In the second packet containing a data load, the 5th and 6th bytes are 0x0014, which is 20 in decimal, while the data load length is 26, again a difference of 6. Continued verification shows that the first few session packets of the protocol all satisfy this rule, as does a large number of further captures; (3) the data load part is checked for plaintext character string information of application A, and if such a string exists, its position or position range is recorded. Comparing the groups of captured data packets shows that the data load part of the data packets corresponding to application A contains the abbreviation 'xzzt' of application A, located within the recorded position range.
Referring to FIG. 5, a data packet generated by application A when running under the IOS operating system is shown; it can be analyzed in the manner described above.
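The search for bytes that stay constant across captures can be sketched as below. This is a simplified illustration under stated assumptions: the function name and the `window` parameter are hypothetical, only leading bytes are examined, and the result is a list of candidates that would still need the verification against further captures and other protocols described above.

```python
from typing import Dict, List


def invariant_prefix_bytes(payloads: List[bytes], window: int = 16) -> Dict[int, int]:
    """Map each offset in the first `window` bytes to its value when that
    value is identical in every payload; other offsets are omitted."""
    if not payloads:
        return {}
    limit = min(window, min(len(p) for p in payloads))
    reference = payloads[0]
    return {
        offset: reference[offset]
        for offset in range(limit)
        if all(p[offset] == reference[offset] for p in payloads)
    }
```

With only a few samples, some offsets can match by coincidence, which is one reason a large number of captures from different test environments is compared before a feature is accepted.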
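Rules (1) and (2) can be combined into a single check, sketched below. The constant and function names are hypothetical, and the sketch assumes that the 5th and 6th bytes form a big-endian integer, which is consistent with 0x004d being read as 77, and that both rules hold in the same packet.

```python
APP_A_HEADER = bytes.fromhex("78798402")  # rule (1): fixed first 4 bytes
LENGTH_DIFFERENCE = 6                     # rule (2): load length minus field value


def matches_app_a(payload: bytes) -> bool:
    """Return True when the payload satisfies both the fixed-byte rule and
    the length rule observed for application A under Android."""
    if len(payload) < 6 or payload[:4] != APP_A_HEADER:
        return False
    declared = int.from_bytes(payload[4:6], "big")  # 5th and 6th bytes
    return declared + LENGTH_DIFFERENCE == len(payload)
```

For the two packets in the text, 77 + 6 = 83 and 20 + 6 = 26, so both satisfy the check, while a packet with the right header but an inconsistent length field is rejected.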
The foregoing mainly describes a method for determining application characteristics of application a, and the following describes an identification method for obtaining operating system characteristics from a message.
First, the influence of irrelevant background traffic is eliminated as far as possible (for example, by closing processes or prohibiting the traffic of other applications). The packets are analyzed to determine which protocol application A is based on, and the data load part is then analyzed to determine whether it contains a character string that clearly represents the operating system. In this example, the analysis shows that clear operating system information exists in the TCP load part of the protocol corresponding to application A. As shown in fig. 6 and 7, the string "Android" in fig. 6 represents the Android operating system and the string "iphone" in fig. 7 represents the IOS operating system. The fixed string in the load part thus indicates the operating system, and it is used as the operating system feature.
If no clear operating system information exists, a first data packet generated by application A under the first operating system and a second data packet generated by application A under the second operating system are acquired under the same test environment. The first data packet is compared with the second data packet to determine whether there are bytes that change with the operating system but remain unchanged under the same operating system. Such bytes are candidates for the operating system feature; the candidate is verified through a large number of repeated packet captures, and the bytes are used as the operating system feature once the verification passes.
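The string-based check can be sketched as a simple substring search over the TCP load. The table and function names are hypothetical; the two marker strings are the ones shown in fig. 6 and fig. 7, and matching them case-sensitively is an assumption.

```python
from typing import Optional

# Marker strings taken from the example; the mapping itself is illustrative.
OS_MARKERS = {
    b"Android": "Android",
    b"iphone": "IOS",
}


def detect_os_by_string(payload: bytes) -> Optional[str]:
    """Return the operating system whose marker string occurs in the
    payload, or None when no marker is present."""
    for marker, os_name in OS_MARKERS.items():
        if marker in payload:
            return os_name
    return None
```

A `None` result corresponds to the case in which no clear operating system information exists and the packet comparison across operating systems is needed instead.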
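The comparison can be sketched as a per-offset check: an offset is a candidate when its value is constant within each operating system but differs between the two. The function name is hypothetical, and the sketch assumes the packets are aligned from the start of the load and compares only up to the shortest packet; real captures may need alignment from the end instead.

```python
from typing import Dict, List, Tuple


def os_dependent_offsets(
    first_os_payloads: List[bytes], second_os_payloads: List[bytes]
) -> Dict[int, Tuple[int, int]]:
    """Map each candidate offset to (value under first OS, value under second OS)."""
    if not first_os_payloads or not second_os_payloads:
        return {}
    limit = min(len(p) for p in first_os_payloads + second_os_payloads)
    candidates: Dict[int, Tuple[int, int]] = {}
    for offset in range(limit):
        first_values = {p[offset] for p in first_os_payloads}
        second_values = {p[offset] for p in second_os_payloads}
        # Constant within each OS, different between the two.
        if len(first_values) == 1 and len(second_values) == 1 and first_values != second_values:
            candidates[offset] = (next(iter(first_values)), next(iter(second_values)))
    return candidates
```

Offsets that also vary within one operating system are excluded, since they more likely depend on the account, the request or the session than on the operating system.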
After the operating system features and the application features are obtained according to the foregoing process, a first classification set based on the application features and the operating system features is established, and an establishing process of a second classification set is described below by way of example.
Simulation data is collected for the attention action of application A. The influence of irrelevant background traffic is eliminated as far as possible (for example, by closing processes or prohibiting the traffic of other applications), and various test environments are then constructed by combining different terminals, different IP addresses and/or different account numbers.
For each test environment, the target operation of application A under the target operating system is triggered and the data packets corresponding to the target operation are captured, thereby obtaining a plurality of groups of test packets corresponding to the different test environments. Each data packet is parsed to obtain its protocol type, protocol port and data load part. Because the target operation is a fine action, it usually produces 1 packet (or a few packets), and the data packet includes the request information of the target operation. The data load parts of the groups of data packets are compared to determine whether invariant bytes exist, and if so, these bytes are taken as the candidate operation characteristic of the target operation. The candidate is then verified: with all other conditions unchanged, packets of actions other than the target operation are captured. If the other actions produce the same bytes, the verification fails and the bytes cannot be used as the operation characteristic of the target operation; if the other actions do not produce the same bytes and the bytes appear only when the target operation is performed under the operating system, the verification passes and the bytes are taken as the operation characteristic of the target operation.
With reference to fig. 8, the target operation is the attention action and the target operating system is the Android operating system. The data packets corresponding to the attention action are acquired and their data load parts are compared and analyzed. In the attention action traffic, the 16th-to-last and 15th-to-last bytes of the TCP load part are always 0x8d00, and a large number of repeated packet captures verify that no other action under the Android operating system generates this type of traffic, so these 2 bytes are used as the operation characteristic of the attention action. In addition, the data packet corresponding to the attention action includes request information (for example, a stock code) for the attention action, and the byte feature together with the stock code may be used as the operation characteristic of the attention action. Because the stock code recorded in the actual action is the same as the stock code character string contained in the last 8 bytes, these 8 bytes are extracted as the stock code of the attention action.
With reference to fig. 10, the target operation is the attention action and the target operating system is the IOS operating system. The data packets corresponding to the attention action are acquired and their data load parts are compared and analyzed. In the attention action traffic, the 32nd-to-last and 31st-to-last bytes of the TCP load part are always 0x8c10, and a large number of repeated packet captures verify that no other action under the IOS operating system generates this type of traffic, so these 2 bytes are used as the operation characteristic of the attention action. In addition, the data packet corresponding to the attention action includes request information (for example, a stock code) for the attention action, and the byte feature together with the stock code may be used as the operation characteristic of the attention action. Because the stock code recorded in the actual action is the same as the stock code character string contained in the 24th-to-last through 17th-to-last bytes, these 8 bytes are extracted as the stock code of the attention action.
With reference to fig. 9, the target operation is the cancel-attention action and the target operating system is the Android operating system. The data packets corresponding to the cancel-attention action are acquired and their data load parts are compared and analyzed. In the cancel-attention action traffic, the 16th-to-last and 15th-to-last bytes of the TCP load part are always 0x9d01, and a large number of repeated packet captures verify that no other action under the Android operating system generates this type of traffic, so these 2 bytes are used as the operation characteristic of the cancel-attention action. Because the stock code recorded in the actual cancel-attention action is the same as the stock code character string contained in the last 8 bytes, the last 8 bytes are extracted as the stock code of the cancel-attention action.
With reference to fig. 11, the target operation is the cancel-attention action and the target operating system is the IOS operating system. The data packets corresponding to the cancel-attention action are acquired and their data load parts are compared and analyzed. In the cancel-attention action traffic, the 32nd-to-last and 31st-to-last bytes of the TCP load part are always 0x9c11, and a large number of repeated packet captures verify that no other action under the IOS operating system generates this type of traffic, so these 2 bytes are used as the operation characteristic of the cancel-attention action. Because the stock code recorded in the actual cancel-attention action is the same as the stock code character string contained in the 24th-to-last through 17th-to-last bytes, these 8 bytes are extracted as the stock code of the cancel-attention action.
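The two-step check (find invariant bytes, then confirm that other actions do not produce them) can be sketched as follows. The function names are hypothetical, and the byte range is passed as a Python slice so that positions counted from the end of the load can be expressed with negative indices.

```python
from typing import List, Optional


def candidate_feature(target_payloads: List[bytes], start: int, end: int) -> Optional[bytes]:
    """Return the bytes at payload[start:end] when they are identical and
    non-empty in every capture of the target operation, else None."""
    values = {p[start:end] for p in target_payloads}
    if len(values) != 1:
        return None
    value = values.pop()
    return value or None


def verify_operation_feature(
    target_payloads: List[bytes], other_payloads: List[bytes], start: int, end: int
) -> Optional[bytes]:
    """Accept the candidate only when no other action yields the same bytes."""
    feature = candidate_feature(target_payloads, start, end)
    if feature is None:
        return None
    if any(p[start:end] == feature for p in other_payloads):
        return None  # verification fails: another action produces the bytes
    return feature
```

For example, `verify_operation_feature(captures, others, -16, -14)` would test the 16th-to-last and 15th-to-last bytes of each load.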
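The four signatures found in this example can be collected into one table and applied as in the sketch below. The class, table and function names are hypothetical, the stock codes used in any test data are invented, and positions are counted from the end of the TCP load as in the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class OperationSignature:
    marker: bytes         # the 2 fixed bytes that identify the operation
    marker_from_end: int  # 16 means the marker starts at the 16th-to-last byte
    code_from_end: int    # where the 8-byte stock code starts, from the end
    code_length: int = 8


SIGNATURES: Dict[Tuple[str, str], OperationSignature] = {
    ("Android", "attention"): OperationSignature(bytes.fromhex("8d00"), 16, 8),
    ("Android", "cancel_attention"): OperationSignature(bytes.fromhex("9d01"), 16, 8),
    ("IOS", "attention"): OperationSignature(bytes.fromhex("8c10"), 32, 24),
    ("IOS", "cancel_attention"): OperationSignature(bytes.fromhex("9c11"), 32, 24),
}


def field_from_end(payload: bytes, from_end: int, length: int) -> bytes:
    """Return `length` bytes starting `from_end` bytes before the end,
    or empty bytes when the payload is too short."""
    start = len(payload) - from_end
    if start < 0:
        return b""
    return payload[start:start + length]


def identify_operation(os_name: str, payload: bytes) -> Optional[Tuple[str, bytes]]:
    """Return (operation, stock code bytes) for the given OS, or None."""
    for (signature_os, operation), signature in SIGNATURES.items():
        if signature_os != os_name:
            continue
        marker = field_from_end(payload, signature.marker_from_end, len(signature.marker))
        if marker == signature.marker:
            code = field_from_end(payload, signature.code_from_end, signature.code_length)
            return operation, code
    return None
```

Because the lookup is keyed by operating system, an Android attention packet checked against the IOS signatures yields no match, which is why the operating system has to be identified before the fine classification.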
As can be seen from the above specific examples, corresponding operation characteristics of the same operation of the same application under different operating systems are different, and therefore, when identifying the user action traffic, it is necessary to identify the operating system to which the user action traffic belongs first, and then perform fine classification on the user action traffic, otherwise, there is a high possibility that an identification error occurs and the user action traffic cannot be identified.
In an actual application scenario, the traffic of other fine actions is analyzed in the same way: the data load parts of a large number of captured packets are compared, and plaintext character strings or byte features carried in the data load part are observed to obtain the corresponding operation characteristics. These cases are not listed one by one.
After the operation characteristics corresponding to all target operations of the application A are obtained, a mapping relation is established among the target operations, the operation characteristics and the operating system characteristics to obtain a second classification set so as to accurately identify the user action flow.
In an actual application scenario, after the user action traffic is accurately identified, the required information is extracted from the fine action packets and output to a ticket (record), and user behavior is counted and analyzed through the output tickets. A ticket includes the user's IP information, account information and behavior action information, so that user behavior can be finely classified and counted.
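A ticket and a simple count over tickets might look like the sketch below. The field names, the `detail` field for the stock code, and the sample values are all hypothetical; the text only states that a ticket carries IP, account and behavior action information.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Ticket:
    user_ip: str
    account: str
    action: str  # for example "attention", "cancel_attention", "search"
    detail: str  # hypothetical field, for example the stock code involved


def count_actions(tickets: Iterable[Ticket]) -> Counter:
    """Count how often each account performed each action."""
    return Counter((ticket.account, ticket.action) for ticket in tickets)


# Hypothetical sample tickets.
sample_tickets = [
    Ticket("192.0.2.10", "account_1", "attention", "SH600000"),
    Ticket("192.0.2.10", "account_1", "attention", "SZ000001"),
    Ticket("192.0.2.11", "account_2", "cancel_attention", "SH600000"),
]
counts = count_actions(sample_tickets)
```

Grouping by other ticket fields, such as the stock code, would follow the same pattern.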
Example 3:
Referring to fig. 12, fig. 12 is a schematic structural diagram of an identification device according to an embodiment of the present invention. The identification device of the present embodiment comprises one or more processors 41 and a memory 42. In fig. 12, one processor 41 is taken as an example.
The processor 41 and the memory 42 may be connected by a bus or other means, and fig. 12 illustrates the connection by a bus as an example.
The memory 42, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions corresponding to the identification methods of user action traffic in embodiments 1 and 2. By running the non-volatile software programs, instructions, and modules stored in the memory 42, the processor 41 executes the various functional applications and data processing of the identification method, that is, it implements the functions of the identification methods of user action traffic of embodiments 1 and 2.
The memory 42 may include, among other things, high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory 42 may optionally include memory located remotely from processor 41, which may be connected to processor 41 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Please refer to fig. 1 to 11 and the related text description, which are not repeated herein.
It should be noted that, because the information interaction and execution processes between the modules and units in the apparatus and system are based on the same concept as the method embodiments of the present invention, the specific contents may refer to the description of the method embodiments and are not repeated here.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the embodiments may be implemented by associated hardware as instructed by a program, which may be stored on a computer-readable storage medium, which may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and the like.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for identifying user action traffic is characterized by comprising the following steps:
triggering the full operation of a target application under a target operating system, and determining the mapping relation between the target application and the target operating system to obtain a first classification set;
triggering at least one target operation of a target application under a target operating system, and determining the mapping relation between each target operation and the target operating system to obtain a second classification set;
acquiring user action traffic, classifying the user action traffic through the first classification set, and determining an application and an operating system to which the user action traffic belongs;
and acquiring a second classification set matched with the user action flow based on the application and the operating system to which the user action flow belongs, classifying the user action flow through the second classification set matched with the user action flow to obtain an operation corresponding to the user action flow so as to identify the user action.
2. The identification method according to claim 1, wherein the triggering of the full operation of the target application under the target operating system, and the determining of the mapping relationship between the target application and the target operating system to obtain the first classification set comprises:
triggering the full operation of the target application under a target operating system to obtain the application characteristics of the target application under the target operating system and the operating system characteristics of the target operating system;
and establishing a mapping relation between the application characteristics of the target application and the operating system characteristics of the target operating system to obtain a first classification set.
3. The identification method of claim 2, wherein the triggering of the full operation of the target application under the target operating system to obtain the application characteristics of the target application under the target operating system and the operating system characteristics of the target operating system comprises:
constructing various testing environments based on various terminals, various IP addresses and/or various account numbers;
under each test environment, triggering the full operation of the target application under the target operating system, and capturing corresponding data traffic to obtain a plurality of groups of data packets;
and analyzing the protocol type, the protocol port and/or the data load part of the data packet to obtain the application characteristics and the operating system characteristics of the target application under the target operating system.
4. The identification method according to claim 3, wherein parsing the protocol type, the protocol port and/or the data payload portion of the data packet to obtain the application characteristic and the operating system characteristic of the target application under the target operating system comprises:
traversing data load parts of a plurality of data packets, determining whether the data load parts have fixed bytes which are always kept consistent, and if so, taking the fixed bytes which are always kept consistent as application characteristics of the target application under the target operating system; and/or,
acquiring the data load length of each data packet and the bytes at the designated position, determining whether a proportional relation exists between the data load length of the data packet and the bytes at the designated position, and if so, taking the proportional relation as the application characteristic of the target application under the target operating system.
5. The identification method according to claim 3, wherein parsing the protocol type, the protocol port and/or the data payload portion of the data packet to obtain the application characteristic and the operating system characteristic of the target application under the target operating system comprises:
analyzing a data load part of the data packet, and determining whether a character string representing the target operating system exists in the data load part;
and if so, taking the character string as the operating system characteristic of the target operating system.
6. The method according to claim 5, wherein parsing the protocol type, the protocol port, and/or the data payload of the packet to obtain the application characteristic and the operating system characteristic of the target application under the target operating system further comprises:
if not, acquiring a first data packet generated by the target application under a first operating system and a second data packet generated by the target application under a second operating system under the same test environment;
comparing the first data packet with the second data packet to determine whether fixed bytes which are changed along with the operating system exist in the first data packet and the second data packet;
and if so, taking the fixed byte which is changed along with the operating system as the operating system characteristic of the corresponding operating system.
7. The method according to claim 1, wherein the triggering the target application to perform at least one target operation under a target operating system, and determining a mapping relationship between each target operation and the target operating system to obtain the second classification set comprises:
triggering at least one target operation of a target application under a target operating system, and acquiring operation characteristics of the target operation after each target operation is triggered;
and establishing a mapping relation among the target operation, the operation characteristics corresponding to the target operation and the operating system characteristics of the target operating system to obtain a second classification set.
8. The method of claim 7, wherein the triggering the target application to perform at least one target operation under a target operating system, and after each target operation is triggered, acquiring the operating characteristics of the target operation comprises:
constructing various testing environments based on various terminals, various IP addresses and/or various account numbers;
under each test environment, triggering target operation of the target application under a target operating system, and capturing corresponding data traffic to obtain a plurality of groups of data packets;
and analyzing the protocol type, the protocol port and/or the data load part of the data packet to obtain the operation characteristics of the target operation under the target operating system.
9. The identification method according to any one of claims 1 to 8, wherein the target operating system comprises an IOS operating system, an Android operating system and a Windows operating system.
10. An identification device comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor and programmed to perform the identification method of any of claims 1 to 9.
CN201910896941.0A 2019-09-23 2019-09-23 User action flow identification method and device Active CN110661796B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910896941.0A CN110661796B (en) 2019-09-23 2019-09-23 User action flow identification method and device

Publications (2)

Publication Number Publication Date
CN110661796A (en) 2020-01-07
CN110661796B (en) 2022-02-01

Family

ID=69038346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910896941.0A Active CN110661796B (en) 2019-09-23 2019-09-23 User action flow identification method and device

Country Status (1)

Country Link
CN (1) CN110661796B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111953563A (en) * 2020-07-31 2020-11-17 中国移动通信集团江苏有限公司 User identification method, device, equipment and computer storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104243618A (en) * 2014-07-02 2014-12-24 北京润通丰华科技有限公司 Method and system based on client behaviour identification network sharing
US20150146973A1 (en) * 2013-11-27 2015-05-28 Adobe Systems Incorporated Distributed similarity learning for high-dimensional image features
US20160110232A1 (en) * 2014-10-17 2016-04-21 International Business Machines Corporation Integrated support for application porting transparency and streamlined system migration in heterogeneous platform environments
CN106778264A (en) * 2016-11-24 2017-05-31 北京金山安全管理系统技术有限公司 The application program analysis method and analysis system of a kind of mobile client
CN106936667A (en) * 2017-04-17 2017-07-07 东南大学 A kind of main frame real-time identification method based on application rs traffic distributed analysis
CN110011860A (en) * 2019-04-16 2019-07-12 湖南警察学院 Android application and identification method based on network traffic analysis

Also Published As

Publication number Publication date
CN110661796B (en) 2022-02-01

Similar Documents

Publication Publication Date Title
US10009358B1 (en) Graph based framework for detecting malicious or compromised accounts
CN109063745B (en) Network equipment type identification method and system based on decision tree
CN108768883B (en) Network traffic identification method and device
CN113706149A (en) Big data risk control processing method and system for dealing with online payment data threats
CN109086422B (en) Machine bullet screen user identification method, device, server and storage medium
CN115398860A (en) Session detection method, device, detection equipment and computer storage medium
CN113271237B (en) Industrial control protocol analysis method and device, storage medium and processor
CN112134893B (en) Internet of things safety protection method and device, electronic equipment and storage medium
CN110768875A (en) Application identification method and system based on DNS learning
CN104252592A (en) Method and device for identifying plug-in application program
CN110245273B (en) Method for acquiring APP service feature library and corresponding device
EP3905084A1 (en) Method and device for detecting malware
CN108055166B (en) Nested application layer protocol state machine extraction system and extraction method thereof
CN110661796B (en) User action flow identification method and device
CN111552626A (en) Method and system for testing developed system using real transaction data
CN116134785A (en) Low latency identification of network device attributes
CN114697066A (en) Network threat detection method and device
CN109190408B (en) Data information security processing method and system
CN115865525A (en) Log data processing method and device, electronic equipment and storage medium
CN114095235B (en) System identification method, device, computer equipment and medium
CN113032836B (en) Data desensitization method and apparatus
CN107229865B (en) Method and device for analyzing Webshell intrusion reason
CN111079144B (en) Virus propagation behavior detection method and device
JP2018121262A (en) Security monitoring server, security monitoring method, program
CN110377499A (en) The method and device that a kind of pair of application program is tested

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right
Denomination of invention: A method and device for identifying user action flow
Effective date of registration: 20220620
Granted publication date: 20220201
Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.
Pledgor: WUHAN GREENET INFORMATION SERVICE Co.,Ltd.
Registration number: Y2022420000171
PC01 Cancellation of the registration of the contract for pledge of patent right
Date of cancellation: 20230704
Granted publication date: 20220201
Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.
Pledgor: WUHAN GREENET INFORMATION SERVICE Co.,Ltd.
Registration number: Y2022420000171