CN112053145B - Network red packet action identification method and device and storage medium - Google Patents

Info

Publication number
CN112053145B
Authority
CN
China
Prior art keywords
red packet
session
data
network
actions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010697887.XA
Other languages
Chinese (zh)
Other versions
CN112053145A (en)
Inventor
王敏
程涛木
陈鑫
王可锋
刘怡
吴艾伦
王京辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Broid Technology Co ltd
Original Assignee
Broid Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Broid Technology Co ltd
Priority to CN202010697887.XA
Publication of CN112053145A
Application granted
Publication of CN112053145B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/04 Payment circuits
    • G06Q20/06 Private payment circuits, e.g. involving electronic currency used among participants of a common payment scheme
    • G06Q20/065 Private payment circuits, e.g. involving electronic currency used among participants of a common payment scheme using e-cash
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a method, a device and a storage medium for identifying network red packet actions. The method comprises the following steps: based on traditional machine learning and deep learning, performing modeling training on traffic data related to network red packets acquired by local packet capture to obtain an initial red packet identification model; performing real-time online learning on the initial red packet identification model based on traffic data related to network red packets acquired in the current network to obtain a deep red packet identification model; and identifying traffic data in the current network based on the deep red packet identification model and outputting the network red packet action corresponding to the traffic data. The invention improves the accuracy of identifying network red packet actions, ensures adaptability to real-time services, and offers high flexibility.

Description

Network red packet action identification method and device and storage medium
Technical Field
The invention relates to the technical field of data action recognition, in particular to a method, a device and a storage medium for recognizing network red packet actions.
Background
Network red packets are a new way of distributing red envelopes, for example WeChat red packets and Alipay red packets. A network red packet is both an internet tool for exchanging blessings among friends and an internet tool that operators and merchants use to hand out red packets and money through online promotional activities.
Among mainstream internet applications, the number of WeChat and Alipay red packet users is growing daily, and during holidays such as the Spring Festival, the Mid-Autumn Festival and the Lantern Festival it surges explosively. Operators therefore identify network red packet actions in traffic and study in depth the scenarios in which network red packets succeed or fail, in order to improve the experience perceived by red packet users.
Traditional traffic analysis distinguishes services by transport-layer port numbers, i.e., service traffic is classified and counted by identifying port numbers. However, as demand for mobile internet content grows rapidly, traffic from HTTP- and P2P-based applications, which port-number-based identification cannot classify, now accounts for most of the traffic on mobile data networks. Deep Packet Inspection (DPI) extends traditional service identification based on the IP five-tuple (source IP address, source port number, destination IP address, destination port number and transport protocol) by further inspecting the application-layer data. The industry generally divides DPI techniques into three categories: signature (feature-word) based identification, application-layer gateway identification, and behavior pattern identification. To identify the service of a data flow with DPI, a traffic signature library must be built and matched against the flow to be identified using a pattern-matching algorithm; if the match succeeds, the flow is identified as the corresponding service. For HTTP flows, the service signature may reside in fields such as the URL, Host or User-Agent; for P2P-based applications, the signature is usually a byte pattern. For example, for TCP-based WeChat message packets on port 80 or 8080, the first 3 bytes of the first upstream packet carrying a payload are "060104" and the last 4 bytes are "04010000".
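The signature-matching step described above can be sketched in Python for the WeChat example. This is a minimal illustration, not the patent's method: the dictionary layout, the function name `matches_signature`, the assumption that the port checked is the server-side port, and the minimum-length guard are all hypothetical.

```python
# Hypothetical signature-library entry built from the WeChat example:
# TCP, port 80 or 8080, first 3 payload bytes 06 01 04, last 4 bytes 04 01 00 00.
WECHAT_MSG_SIGNATURE = {
    "ports": {80, 8080},
    "prefix": bytes.fromhex("060104"),
    "suffix": bytes.fromhex("04010000"),
}


def matches_signature(server_port: int, first_upstream_payload: bytes, sig: dict) -> bool:
    """Return True if the first upstream payload-carrying packet matches `sig`."""
    if server_port not in sig["ports"]:
        return False
    # Assumed guard: the payload must be long enough that prefix and suffix
    # cannot overlap each other.
    if len(first_upstream_payload) < len(sig["prefix"]) + len(sig["suffix"]):
        return False
    return (first_upstream_payload.startswith(sig["prefix"])
            and first_upstream_payload.endswith(sig["suffix"]))
```

Because the entry is a fixed set of ports and bytes, any change the provider makes to either silently breaks the match, which is the weakness of feature-value identification raised in the following paragraph.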
However, identifying network red packet actions with DPI has two problems. First, red packet data is defined by feature values (for example, it is identified by IP address), but other action scenarios in the same application (for example, an Alipay transfer) may use the same IP, so those scenarios are mistakenly identified as network red packet actions. Second, feature values such as the IP address and server name can be changed at any time according to the service provider's needs, so the accuracy of data identified as network red packet actions is not verified, and the feature values cannot be guaranteed to keep up with the live service.
Therefore, there is a need for an improved method for identifying network red packet actions.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a network red packet action identification method, device and storage medium that overcome the low accuracy of existing network red packet action identification methods.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
the first aspect of the embodiments of the present invention provides a method for identifying a network red packet action, including the following steps:
based on traditional machine learning and deep learning, carrying out modeling training on traffic data which is acquired by local packet capturing and related to a network red packet to obtain an initial red packet identification model;
performing real-time online learning on the initial red packet identification model based on the acquired traffic data related to the network red packet in the current network to obtain a deep red packet identification model;
and identifying the flow data in the current network based on the deep red packet identification model, and outputting a network red packet action corresponding to the flow data.
In some embodiments, performing real-time online learning on the initial red packet identification model based on the acquired traffic data related to the network red packet in the current network to obtain the deep red packet identification model specifically includes the following steps:
acquiring all original code streams containing network red packet actions in the current network according to flow data related to the network red packet in the current network;
converting the original code stream into a hexadecimal file taking a session as a unit and a log file which corresponds to the hexadecimal file and is provided with a timestamp mark by using a preset file conversion tool;
recording the occurrence time of the network red packet action, and obtaining a first session corresponding to the network red packet action according to the log file with the timestamp label and the corresponding relation between the log file and the session;
filtering the first session to obtain a second session;
screening the second session to obtain a third session;
adjusting the data in the third session into integer data which can be used for training to obtain a fourth session;
and training the initial red packet recognition model by using the integer data in the fourth session to obtain a deep red packet recognition model.
In some embodiments, the obtaining all original code streams in the current network, which include the network red packet action, according to the traffic data related to the network red packet in the current network specifically includes the following steps:
controlling a mobile phone group to do various network red packet actions by utilizing a preset automatic dial testing program to obtain current network flow data corresponding to each network red packet action;
recording the numbers of the base stations covering the mobile phone group and the users' mobile phone numbers, in correspondence with the current network traffic data;
and acquiring all original code streams containing network red packet actions according to the corresponding relation between the base station number and the mobile phone number of the user.
In some embodiments, the filtering the first session to obtain the second session specifically includes the following steps:
according to a preset data filtering standard, preliminarily filtering the first session to obtain a preliminarily filtered first session;
and performing deep filtering on the preliminarily filtered first session to obtain a second session, specifically by filtering out all currently known sessions in the preliminarily filtered first session that are irrelevant to network red packet actions.
In some embodiments, the preset data filtering criteria are: filtering out data of other actions in the first session that overlap heavily with the network red packet action; filtering out data generated by link establishment in the first session; filtering out, from the encrypted data in the first session, the data used to store the encryption key; and filtering out non-user application data in the first session.
In some embodiments, the screening the second session to obtain a third session specifically includes the following steps:
and screening out sessions related to network red packet actions from the second session according to the distribution concentration of each statistical analysis feature in the second session to obtain a third session, wherein the statistical analysis features comprise at least session_num, total_len, app_data_len, tcp_len, http_len and ssl_handoff_len.
In some embodiments, the adjusting the data in the third session to be integer data that can be used for training to obtain a fourth session specifically includes the following steps:
respectively judging whether the text length of the data in each third session is greater than a specified length;
for each third session whose data text is longer than the specified length, deleting the part of the data beyond the specified length;
for each third session whose data text is shorter than the specified length, padding the data text with 0s up to the specified length;
and converting the data in the third session, whose text length now equals the specified length, into integer data usable for training to obtain a fourth session.
In some embodiments, the training of the initial red packet recognition model by using the integer data in the fourth session to obtain the deep red packet recognition model specifically includes the following steps:
fitting the integer data in the fourth session by using a convolutional neural network or the XGBoost algorithm, and classifying it according to the various network red packet actions;
and optimizing the initial red packet identification model by using Dropout or Batch Normalization based on the fourth session fitted and classified by the convolutional neural network or the XGBoost algorithm, to obtain a deep red packet identification model.
A second aspect of the embodiments of the present invention provides an apparatus for identifying a network red packet action, including:
the modeling module is used for carrying out modeling training on traffic data which is acquired by local packet capturing and related to the network red packet based on traditional machine learning and deep learning to obtain an initial red packet identification model;
the training module is used for carrying out real-time online learning on the initial red packet recognition model based on the acquired traffic data related to the network red packet in the current network to obtain a deep red packet recognition model;
and the identification module is used for identifying the traffic data in the current network based on the deep red packet identification model and outputting the network red packet action corresponding to the traffic data.
A third aspect of embodiments of the present invention provides a storage medium having stored thereon executable instructions that, when executed, perform a method according to the first aspect of embodiments of the present invention.
From the above description, compared with the prior art, the invention has the following beneficial effects:
firstly, modeling training is carried out on traffic data which is acquired by local packet capturing and is related to a network red packet by utilizing traditional machine learning and deep learning to obtain an initial red packet identification model; secondly, performing real-time online training on the initial red packet recognition model by using flow data related to the network red packet in the current network to obtain a deep red packet recognition model; and finally, identifying the flow data in the current network by using a deep red packet identification model. The invention not only improves the accuracy of identifying the network red packet action, but also ensures the adaptability of real-time service and has high flexibility.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is to be understood that the drawings in the following description are of some, but not all, embodiments of the invention. For a person skilled in the art, without inventive effort, further figures can also be obtained from the provided figures.
Fig. 1 is a schematic flowchart of a method for identifying a network red packet action according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of step S2 in fig. 1 according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of step S21 in fig. 2 according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of step S24 in fig. 2 according to an embodiment of the present invention;
fig. 5 is a schematic flowchart of step S25 in fig. 2 according to an embodiment of the present invention;
fig. 6 is a schematic flowchart of step S26 in fig. 2 according to an embodiment of the present invention;
fig. 7 is a schematic flowchart of step S27 in fig. 2 according to an embodiment of the present invention;
fig. 8 is a block diagram of an apparatus for identifying a network red packet action according to an embodiment of the present invention;
fig. 9 is a block diagram of a storage medium according to an embodiment of the present invention.
Detailed Description
For purposes of promoting a clear understanding of the objects, aspects and advantages of the invention, reference will now be made in detail to the present embodiments of the invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements throughout. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1
Referring to fig. 1, fig. 1 is a schematic flow chart illustrating a method for identifying a network red packet action according to an embodiment of the present invention.
As shown in fig. 1, a method for identifying a network red packet action according to a first embodiment of the present invention includes the following steps:
S1, based on traditional machine learning and deep learning, carrying out modeling training on traffic data which is acquired through local packet capturing and related to a network red packet to obtain an initial red packet identification model;
S2, performing real-time online learning on the initial red packet identification model based on the acquired traffic data related to the network red packet in the current network to obtain a deep red packet identification model;
and S3, identifying the flow data in the current network based on the deep red packet identification model, and outputting a network red packet action corresponding to the flow data.
It should be noted that network red packet actions are not limited to a single type; examples include a red packet sending success action, a red packet sending failure action, a red packet receiving success action and a red packet receiving failure action.
It should be further noted that the initial red packet identification model of step S1 is obtained by offline training. In step S1, traffic data is obtained by local packet capture; once a sufficient amount has accumulated, it is used as raw data for modeling training to obtain the initial red packet identification model and construct an initial credit. In the offline training process, first, the network red packet operation flow is analyzed to learn the different types of network red packet actions, such as red packet payment and payment completion; second, traditional machine learning and deep learning are fused, and network red packet actions are identified based on feature data such as session_num, total_len and app_data_len together with text similarity features; finally, dimensionality reduction, filtering and screening are performed on the model based on algorithms such as rule discrimination and XGBoost.
The method for identifying the network red packet action provided by the first embodiment of the invention comprises the steps of firstly, utilizing the traditional machine learning and deep learning to carry out modeling training on traffic data which is acquired by local packet capturing and is related to the network red packet to obtain an initial red packet identification model; secondly, performing real-time online training on the initial red packet recognition model by using flow data related to the network red packet in the current network to obtain a deep red packet recognition model; and finally, identifying the flow data in the current network by using a deep red packet identification model. The embodiment not only improves the accuracy of identifying the network red packet action, but also ensures the adaptability of real-time service, and has high flexibility.
Example 2
Referring to fig. 2, fig. 2 is a schematic flowchart of step S2 in fig. 1 according to an embodiment of the present invention.
Different from the method for identifying the network red packet action according to the first embodiment of the present invention, the second embodiment of the present invention provides a specific flow of step S2.
As shown in fig. 2, step S2 specifically includes the following steps:
S21, acquiring all original code streams containing network red packet actions in the current network according to traffic data related to network red packets in the current network;
S22, converting the original code stream into a hexadecimal file taking a session as a unit and a log file which corresponds to the hexadecimal file and is provided with a timestamp mark by using a preset file conversion tool;
S23, recording the occurrence time of the network red packet action, and obtaining a first session corresponding to the network red packet action according to a log file with a timestamp mark and the corresponding relation between the log file and the session;
S24, filtering the first session to obtain a second session;
S25, screening the second session to obtain a third session;
S26, adjusting the data in the third session into integer data which can be used for training to obtain a fourth session;
and S27, training the initial red packet recognition model by using integer data in the fourth session to obtain a deep red packet recognition model.
It should be noted that the per-session hexadecimal files and the timestamped log files correspond one to one; put simply, each log file holds the statistics of its corresponding session (hexadecimal file).
It should also be noted that, in step S23, the occurrence times of all network red packet actions are recorded; the correspondence between sessions and network red packet actions can then be determined easily by matching these occurrence times against the timestamps in the log files and using the correspondence between log files and sessions.
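The time-based matching of step S23 can be sketched as follows. This is an illustration under assumptions, not the patent's tool: the `SessionLog` structure, the idea of summarizing each log file by its first and last timestamp, and the `tolerance` window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SessionLog:
    # Hypothetical summary of one timestamped log file; each log file
    # corresponds one-to-one with a per-session hexadecimal file.
    session_id: str
    start_ts: float  # earliest timestamp in the log (seconds)
    end_ts: float    # latest timestamp in the log (seconds)


def sessions_for_actions(action_times, logs, tolerance=1.0):
    """Map each recorded action time to the IDs of sessions active at that time.

    A session matches when the action time lies in
    [start_ts - tolerance, end_ts + tolerance]; the tolerance is an assumed
    allowance for clock skew between the dial-test log and the capture.
    """
    result = {}
    for t in action_times:
        result[t] = [log.session_id for log in logs
                     if log.start_ts - tolerance <= t <= log.end_ts + tolerance]
    return result
```

The matched sessions form the first session set for each action; an action time that matches no session returns an empty list and can be flagged for inspection.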
Example 3
Referring to fig. 3 to 7, fig. 3 is a schematic flowchart of step S21 in fig. 2 according to an embodiment of the present invention, fig. 4 is a schematic flowchart of step S24 in fig. 2 according to an embodiment of the present invention, fig. 5 is a schematic flowchart of step S25 in fig. 2 according to an embodiment of the present invention, fig. 6 is a schematic flowchart of step S26 in fig. 2 according to an embodiment of the present invention, and fig. 7 is a schematic flowchart of step S27 in fig. 2 according to an embodiment of the present invention.
Unlike the method for identifying a network red packet action according to the second embodiment of the present invention, the third embodiment of the present invention provides specific flows of steps S21, S24, S25, S26, and S27.
As shown in fig. 3, step S21 specifically includes the following steps:
S211, controlling the mobile phone group to perform various network red packet actions by using a preset automatic dial-testing program to obtain the current network traffic data corresponding to each network red packet action;
S212, recording the numbers of the base stations covering the mobile phone group and the users' mobile phone numbers, in correspondence with the current network traffic data;
S213, acquiring all original code streams containing network red packet actions according to the correspondence between the base station numbers and the user mobile phone numbers.
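Steps S212 and S213 amount to selecting, from live-network captures, only the records that belong to the dial-test handsets. A minimal sketch, assuming a hypothetical `Record` layout with a base station number, a phone number, a timestamp and raw bytes per captured record:

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Record(NamedTuple):
    # Hypothetical captured-traffic record.
    cell_id: str      # number of the serving base station
    msisdn: str       # user mobile phone number
    timestamp: float  # capture time (seconds)
    raw: bytes        # original code stream bytes


def select_test_streams(records: Iterable[Record],
                        test_pairs: Set[Tuple[str, str]]) -> List[Record]:
    """Keep records whose (base station, phone number) pair was recorded for
    the dial-test handset group, ordered by capture time."""
    selected = [r for r in records if (r.cell_id, r.msisdn) in test_pairs]
    return sorted(selected, key=lambda r: r.timestamp)
```

Keying on the (base station, phone number) pair rather than the phone number alone follows the correspondence named in S213; whether a real system needs both keys depends on how the capture is indexed.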
As shown in fig. 4, step S24 specifically includes the following steps:
S241, preliminarily filtering the first session according to a preset data filtering standard to obtain a preliminarily filtered first session;
and S242, performing deep filtering on the preliminarily filtered first session to obtain a second session, specifically by filtering out all currently known sessions in the preliminarily filtered first session that are irrelevant to network red packet actions.
It should be understood that the preset data filtering criteria here are: filtering out data of other actions in the first session that overlap heavily with the network red packet action; filtering out data generated by link establishment in the first session; filtering out, from the encrypted data in the first session, the data used to store the encryption key; and filtering out non-user application data in the first session. Non-user application data includes, but is not limited to, data containing device information in the HTTP protocol and resource locator data.
To make step S24 in this embodiment easier to understand, it is illustrated below for WeChat and Alipay respectively.
For WeChat, the specific flow of step S24 is as follows:
first, filter out all unidentified TCP traffic;
second, filter the original code stream with the rule payload[0] == 17;
third, filter all HTTP data whose host is szextshort.weixin.qq.com;
fourth, the TCP payload length of packets satisfying tcp payload[0] == 17;
fifth, when the host is szextshort.weixin.qq.com, the POST packet satisfies data.len >= 540 and data.len <= 565, and the HTTP response packet satisfies data.len >= 210 and data.len <= 220, the session can be judged to be red packet traffic;
and sixth, in an HTTP session whose host is szextshort.weixin.qq.com, if the POST packet satisfies data.len > 1300 and the HTTP response packet satisfies data.len >= 320 and data.len <= 400, the session can be judged to be WeChat red packet failure traffic.
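The fifth and sixth WeChat rules above can be expressed as a small rule-based classifier. This is a sketch under assumptions: the function name, the return labels and the reading of the garbled thresholds in the source (POST length 540 to 565 with response length 210 to 220; POST length above 1300 with response length 320 to 400) are an interpretation of the rules as listed.

```python
from typing import Optional

WECHAT_HOST = "szextshort.weixin.qq.com"


def classify_wechat_http_session(host: str, post_len: int,
                                 response_len: int) -> Optional[str]:
    """Apply the fifth and sixth WeChat rules to one HTTP session.

    post_len is data.len of the POST packet and response_len is data.len of
    the HTTP response packet. Returns a hypothetical label or None.
    """
    if host != WECHAT_HOST:
        return None
    # Fifth rule: red packet traffic.
    if 540 <= post_len <= 565 and 210 <= response_len <= 220:
        return "red_packet"
    # Sixth rule: WeChat red packet failure traffic.
    if post_len > 1300 and 320 <= response_len <= 400:
        return "red_packet_failure"
    return None
```

Hard-coded length windows like these are exactly the kind of feature values that drift with the service, which is why the method retrains the model online instead of relying on such rules alone.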
For Alipay, the specific flow of step S24 is as follows:
First, the server must be one of the following domain names: sender.
Second, filter out the sessions whose server is render.
Third, filter out the sessions with 3 or fewer application data packets when the server is mdap.
Fourth, filter out the sessions whose server is mcgw.
Fifth, the session is judged to be Alipay red packet traffic, where the downstream TCP payload length of the application data under render.
Sixth, when the number of application data packets under mdap-age.com equals 2, the session can be judged to be traffic of a successfully sent Alipay red packet if the TCP payload length of the uplink packet lies in the interval (80, 200), and traffic of a failed Alipay red packet send if it lies in the interval [0, 80].
Seventh, when the number of application data packets under mdap.
Eighth, when the number of application data packets under mcgw.
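The sixth Alipay rule is the only one stated completely above, so only it is sketched here. The function name and return labels are hypothetical, and the server-domain check is left to the caller because the domain names in the steps above are truncated.

```python
from typing import Optional


def classify_alipay_mdap_session(app_data_packet_count: int,
                                 uplink_payload_len: int) -> Optional[str]:
    """Sixth Alipay rule, for a session already known to be on the mdap server.

    Returns a hypothetical label, or None when the rule does not apply.
    """
    if app_data_packet_count != 2:
        return None
    if 80 < uplink_payload_len < 200:    # open interval (80, 200)
        return "send_success"
    if 0 <= uplink_payload_len <= 80:    # closed interval [0, 80]
        return "send_failure"
    return None
```

Note the boundary handling: a payload length of exactly 80 falls in the failure interval, and exactly 200 matches neither.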
As shown in fig. 5, step S25 specifically includes the following steps:
and S251, screening out sessions related to network red packet actions from the second session according to the distribution concentration of each statistical analysis feature in the second session to obtain a third session, wherein the statistical analysis features include but are not limited to session_num, total_len, app_data_len, tcp_len, http_len and ssl_handoff_len.
Experiments show that network red packet actions are highly concentrated in the statistical analysis feature session_num: red packet payment actions mainly take the value 5, and payment completion actions mainly fall between 4 and 7. They are also highly concentrated in the feature app_data_len: red packet payment actions mainly lie around 35000, and payment completion actions mainly lie around 5000.
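The source does not define "distribution concentration" precisely. One plausible reading, sketched below purely as an assumption, is the share of sessions falling in the most populated bin of a feature: if that share is high, sessions outside the bin are screened out. The bin width and the `min_share` threshold are hypothetical parameters.

```python
from collections import Counter


def modal_bin(values, bin_width):
    """Return (bin_start, share) for the most populated bin of width bin_width."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    start, count = bins.most_common(1)[0]
    return start, count / len(values)


def screen_sessions(sessions, feature, bin_width, min_share=0.5):
    """Keep sessions whose `feature` value lies in the modal bin.

    If the modal bin holds less than min_share of all sessions, the feature is
    treated as not concentrated and the sessions are returned unscreened.
    """
    values = [s[feature] for s in sessions]
    start, share = modal_bin(values, bin_width)
    if share < min_share:
        return list(sessions)
    return [s for s in sessions if start <= s[feature] < start + bin_width]
```

For session_num, a bin width of 1 would isolate the value 5 reported for red packet payment; for app_data_len a wider bin (for example 5000) would be needed, and choosing it is a tuning decision the text leaves open.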
As shown in fig. 6, step S26 specifically includes the following steps:
S261, judging, for each third session, whether the text length of its data is greater than the specified length;
S262, for each third session whose data text is longer than the specified length, deleting the part of the data beyond the specified length;
S263, for each third session whose data text is shorter than the specified length, padding the data text with 0s up to the specified length;
and S264, converting the data in the third session, whose text length now equals the specified length, into integer data usable for training to obtain a fourth session.
It should be understood that the data in the third sessions differ in text length. Every piece of data must have the same length so that parallel computation can be used in step S27 to speed up training. That is, when the text length of the data exceeds the specified length, the excess text is discarded; when it is shorter than the specified length, the data is padded with 0 up to the specified length.
It should also be understood that if the data in a third session is encrypted, opening it directly yields garbled characters. These characters cannot be used for training in step S27, so the data in the third session must first be converted into integer data.
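Steps S261 to S264 amount to fixing every session's payload to one length and reading it as a sequence of integers. Here is a minimal sketch. It assumes that the "text length" is measured in bytes of the session payload and that each byte becomes one integer from 0 to 255. The function names are hypothetical. Reading raw bytes as integers also sidesteps the garbled-text problem described above, because encrypted bytes never need to be decoded as characters.

```python
from typing import List


def to_fixed_length_ints(payload: bytes, length: int) -> List[int]:
    """Truncate or zero-pad a session payload to `length` bytes, then return byte values."""
    clipped = payload[:length]                             # S262: drop bytes past the limit
    padded = clipped + b"\x00" * (length - len(clipped))   # S263: pad with 0 to the limit
    return list(padded)                                    # S264: bytes -> ints in 0..255


def hex_session_to_ints(hex_text: str, length: int) -> List[int]:
    """Same as above, starting from a session's hexadecimal text (e.g. "de ad be ef")."""
    return to_fixed_length_ints(bytes.fromhex(hex_text), length)
```

`hex_session_to_ints` accepts space-separated hexadecimal text, on the assumption that this is how a per-session hexadecimal file stores its payload.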
As shown in fig. 7, step S27 specifically includes the following steps:
S271, fitting the integer data in the fourth sessions using a convolutional neural network or the XGBoost algorithm, and classifying it by network red packet action;
and S272, optimizing the initial red packet recognition model using Dropout or Batch Normalization, based on the fourth sessions fitted and classified by the convolutional neural network or XGBoost algorithm, to obtain a deep red packet recognition model.
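Step S272 names Dropout and batch normalization as the optimization techniques. Both are standard neural-network operations. The pure-Python sketch below shows what each one computes on a single feature or activation vector. It assumes inverted dropout and per-feature mini-batch statistics. It is not the patent's model and does not show the CNN or XGBoost fitting of S271. A real implementation would use the corresponding layers of a deep-learning framework.

```python
import math
import random
from typing import List


def batch_norm(batch: List[float], gamma: float = 1.0, beta: float = 0.0,
               eps: float = 1e-5) -> List[float]:
    """Normalise one feature over a mini-batch to zero mean / unit variance, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased (population) variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


def dropout(activations: List[float], p: float, rng: random.Random,
            training: bool = True) -> List[float]:
    """Inverted dropout: zero each unit with probability p and rescale survivors by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)  # identity at inference time
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Rescaling during training (inverted dropout) keeps the expected activation unchanged. This lets inference skip dropout entirely.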
Example 4
To give a clear understanding of the network red packet action identification methods provided in the first to third embodiments of the present invention, the fourth embodiment combines them and provides another network red packet action identification method. It comprises the following specific steps:
S101, based on traditional machine learning and deep learning, performing modeling training on traffic data related to network red packets acquired through local packet capture, to obtain an initial red packet identification model;
S102, using a preset automatic dial-test program to control a group of mobile phones to perform various network red packet actions, to obtain the current network traffic data corresponding to each network red packet action;
S103, recording the numbers of the base stations covering the mobile phone group and the users' mobile phone numbers, in correspondence with the current network traffic data;
S104, acquiring all original code streams containing network red packet actions according to the correspondence between the base station numbers and the user mobile phone numbers;
S105, using a preset file conversion tool to convert the original code streams into hexadecimal files, each taking a session as a unit, and corresponding log files carrying timestamp marks;
S106, recording the occurrence time of each network red packet action, and obtaining the first sessions corresponding to that action from the timestamped log files and their correspondence with the sessions;
S107, preliminarily filtering the first sessions according to a preset data filtering standard;
S108, performing deep filtering on the preliminarily filtered first sessions to obtain second sessions, specifically by filtering out all currently known sessions unrelated to network red packet actions;
S109, screening out the sessions related to network red packet actions from the second sessions according to the distribution concentration of each statistical analysis feature in the second sessions, to obtain third sessions, wherein the statistical analysis features include at least session_num, total_len, app_data_len, tcp_len, http_len and ssl_handoff_len;
S110, judging, for each third session, whether the text length of its data is greater than the specified length;
S111, for each third session whose data text length is greater than the specified length, deleting the data beyond the specified length;
S112, for each third session whose data text length is smaller than the specified length, padding the data with 0 up to the specified length;
S113, converting the data of the third sessions, now at the specified length, into integer data for training, to obtain fourth sessions;
S114, fitting the integer data in the fourth sessions using a convolutional neural network or the XGBoost algorithm, and classifying it by network red packet action;
S115, optimizing the initial red packet recognition model using Dropout or Batch Normalization, based on the fourth sessions fitted and classified by the convolutional neural network or XGBoost algorithm, to obtain a deep red packet recognition model;
and S116, identifying the traffic data in the current network based on the deep red packet identification model, and outputting the network red packet action corresponding to the traffic data.
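Steps S105 and S106 link each recorded red packet action to its sessions through the timestamped log. The sketch below assumes that each log entry holds:

- a session ID (the key of that session's hexadecimal file);
- first-packet and last-packet timestamps, in seconds.

A session matches when the action time falls inside its span widened by a tolerance. The data class, the function names, and the 2-second default tolerance are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SessionLogEntry:
    session_id: str  # key of the per-session hexadecimal file
    start: float     # timestamp of the first packet, in seconds
    end: float       # timestamp of the last packet, in seconds


def sessions_for_action(log: List[SessionLogEntry], action_time: float,
                        slack: float = 2.0) -> List[str]:
    """IDs of sessions whose [start - slack, end + slack] span contains the action time."""
    return [e.session_id for e in log
            if e.start - slack <= action_time <= e.end + slack]


def to_hex_file_text(payload: bytes) -> str:
    """Render one session's payload as space-separated hex, as a per-session hex file might."""
    return payload.hex(" ")  # bytes.hex(sep) requires Python 3.8+
```

The tolerance absorbs small clock differences between the dial-test program that records the action time and the capture point that stamps the log. Its value would have to be tuned against real captures.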
The network red packet action identification method of the fourth embodiment can identify deeper application action content through machine learning. Unlike traditional DPI traffic analysis, it uses a self-developed code stream file conversion tool to convert the original code stream into hexadecimal files, each taking a session as a unit, together with corresponding timestamped log files. It then obtains the deep red packet identification model through filtering, feature engineering analysis and model training. This model can accurately determine whether a user's behavior is an operation on a network red packet, and whether that operation succeeded or failed.
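The preliminary filtering of S107 uses preset data filtering criteria, which claim 4 spells out. Two of them are dropping link-establishment data and dropping data that carries encryption keys. For TLS traffic, one way to approximate these two criteria is:

- keep TLS application-data records (content type 23);
- drop handshake (22), change-cipher-spec (20) and alert (21) records.

This sketch assumes each session's TCP payload has already been reassembled into a byte stream. The function names are hypothetical, and the other two criteria are not covered. Under TLS 1.3, handshake messages after the ServerHello are also carried in records whose outer type is application data, so the filter is only approximate there.

```python
import struct
from typing import List, Tuple

TLS_CHANGE_CIPHER_SPEC = 20
TLS_ALERT = 21
TLS_HANDSHAKE = 22
TLS_APPLICATION_DATA = 23


def split_tls_records(stream: bytes) -> List[Tuple[int, bytes]]:
    """Split a reassembled TLS byte stream into (content_type, fragment) records."""
    records = []
    offset = 0
    # Each record header is 5 bytes: type (1), legacy version (2), length (2), big-endian.
    while offset + 5 <= len(stream):
        content_type, _version, length = struct.unpack_from("!BHH", stream, offset)
        fragment = stream[offset + 5:offset + 5 + length]
        if len(fragment) < length:
            break  # truncated record at the end of the capture
        records.append((content_type, fragment))
        offset += 5 + length
    return records


def keep_application_data(stream: bytes) -> List[bytes]:
    """Drop handshake, change-cipher-spec and alert records; keep application data."""
    return [frag for ctype, frag in split_tls_records(stream)
            if ctype == TLS_APPLICATION_DATA]
```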
In addition, identifying user behavior automatically with machine learning saves research and development cost and improves the efficiency and accuracy of content identification. In manual dial-test verification, the machine-learning-based method for identifying encrypted and unencrypted network red packet traffic improved accuracy by 27% over prior-art identification based on feature rules such as IP address and server name.
Example 5
Referring to fig. 8, fig. 8 is a block diagram of an apparatus for identifying a network red packet according to an embodiment of the present invention.
As shown in fig. 8, the network red packet identification device 100 provided by the fifth embodiment of the present invention corresponds to the network red packet identification method of the first embodiment. It includes:
the modeling module 101, configured to perform modeling training, based on traditional machine learning and deep learning, on traffic data related to network red packets acquired through local packet capture, to obtain an initial red packet identification model;
the training module 102, configured to perform real-time online learning on the initial red packet recognition model based on the acquired traffic data related to network red packets in the current network, to obtain a deep red packet recognition model;
and the identification module 103, configured to identify traffic data in the current network based on the deep red packet identification model and output the network red packet action corresponding to the traffic data.
Example 6
Referring to fig. 9, fig. 9 is a block diagram of a storage medium according to an embodiment of the present invention.
As shown in fig. 9, the storage medium 200 provided by the sixth embodiment of the present invention stores executable instructions 201. When executed, these instructions perform the method of any one of the first to fourth embodiments of the present invention.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
In the above embodiments, the implementation may be realized wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partly in the form of a computer program product, which includes one or more computer instructions. When these instructions are loaded and executed on a computer, they produce, wholly or partly, the processes or functions described in the embodiments of the invention. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk), among others.
It should be noted that the embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the same or similar parts may be cross-referenced among the embodiments. The method embodiments are similar to the product embodiments, so they are described only briefly; for related points, refer to the description of the product embodiments.
It is further noted that, in this disclosure, relational terms such as first and second are used solely to distinguish one entity or action from another. They do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion. A process, method, article, or apparatus that comprises a list of elements therefore includes not only those elements but may also include other elements that are not expressly listed or that are inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art. The general principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the embodiments shown herein. It is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A network red packet action recognition method, characterized by comprising the following steps:
based on traditional machine learning and deep learning, performing modeling training on traffic data related to network red packets acquired through local packet capture, to obtain an initial red packet identification model;
acquiring all original code streams containing network red packet actions in the current network according to traffic data related to network red packets in the current network;
converting, by using a preset file conversion tool, the original code streams into hexadecimal files, each taking a session as a unit, and log files that correspond to the hexadecimal files and carry timestamp marks;
recording the occurrence time of the network red packet action, and obtaining a first session corresponding to the network red packet action according to the timestamped log files and the correspondence between the log files and the sessions;
filtering the first session to obtain a second session;
screening out sessions related to network red packet actions from the second session according to the distribution concentration of each statistical analysis feature in the second session to obtain a third session, wherein the statistical analysis features at least comprise session_num, total_len, app_data_len, tcp_len, http_len and ssl_handover_len;
converting the data in the third session into integer data usable for training to obtain a fourth session;
training the initial red packet recognition model by using the integer data in the fourth session to obtain a deep red packet recognition model;
and identifying traffic data in the current network based on the deep red packet identification model, and outputting the network red packet action corresponding to the traffic data.
2. The method for identifying a network red packet action according to claim 1, wherein the step of obtaining all original code streams containing the network red packet action in the current network according to the traffic data related to the network red packet in the current network specifically comprises the following steps:
controlling, by using a preset automatic dial-test program, a group of mobile phones to perform various network red packet actions, to obtain current network traffic data corresponding to each network red packet action;
recording the numbers of the base stations covering the mobile phone group and the users' mobile phone numbers, in correspondence with the current network traffic data;
and acquiring all original code streams containing network red packet actions according to the correspondence between the base station numbers and the user mobile phone numbers.
3. The method for identifying a network red packet action according to claim 1, wherein the filtering the first session to obtain a second session specifically comprises the following steps:
preliminarily filtering the first session according to a preset data filtering criterion to obtain a preliminarily filtered first session;
and performing deep filtering on the preliminarily filtered first sessions to obtain second sessions, specifically by filtering out, from the preliminarily filtered first sessions, all currently known sessions that are unrelated to network red packet actions.
4. The method for identifying a network red packet action according to claim 3, wherein the preset data filtering criteria are: filtering out, from the first session, data of other actions that highly coincide with the network red packet actions; filtering out data generated by link establishment in the first session; filtering out data in the first session that is used for storing the keys of the encrypted data; and filtering out non-user application data in the first session.
5. The method for identifying a network red packet action according to claim 1, wherein the step of adjusting the data in the third session to integer data that can be used for training to obtain a fourth session specifically comprises the following steps:
respectively judging whether the text length of the data in each third session is greater than a specified length;
deleting, from each third session whose data text length is greater than the specified length, the data beyond the specified length;
padding with 0, up to the specified length, the data of each third session whose data text length is smaller than the specified length;
and converting the data of the third sessions whose text length has reached the specified length into integer data usable for training, to obtain a fourth session.
6. The method according to claim 1, wherein the training of the initial red packet recognition model by using the integer data in the fourth session to obtain a deep red packet recognition model specifically comprises the following steps:
fitting the integer data in the fourth session by using a convolutional neural network or the XGBoost algorithm, and classifying it based on the various network red packet actions;
and optimizing the initial red packet identification model by using Dropout or Batch Normalization, based on the fourth session fitted and classified by the convolutional neural network or XGBoost algorithm, to obtain a deep red packet identification model.
7. An apparatus for recognizing network red packet actions, comprising:
the modeling module, configured to perform modeling training, based on traditional machine learning and deep learning, on traffic data related to network red packets acquired through local packet capture, to obtain an initial red packet identification model;
the training module, configured to perform the following operations:
acquire all original code streams containing network red packet actions in the current network according to traffic data related to network red packets in the current network;
convert, by using a preset file conversion tool, the original code streams into hexadecimal files, each taking a session as a unit, and log files that correspond to the hexadecimal files and carry timestamp marks;
record the occurrence time of the network red packet action, and obtain a first session corresponding to the network red packet action according to the timestamped log files and the correspondence between the log files and the sessions;
filter the first session to obtain a second session;
screen out sessions related to network red packet actions from the second session according to the distribution concentration of each statistical analysis feature in the second session to obtain a third session, wherein the statistical analysis features at least comprise session_num, total_len, app_data_len, tcp_len, http_len and ssl_handover_len;
convert the data in the third session into integer data usable for training to obtain a fourth session;
and train the initial red packet recognition model by using the integer data in the fourth session to obtain a deep red packet recognition model;
and the identification module is used for identifying the traffic data in the current network based on the deep red packet identification model and outputting the network red packet action corresponding to the traffic data.
8. A storage medium having stored thereon executable instructions that, when executed, perform the method of any of claims 1-6.
CN202010697887.XA 2020-07-20 2020-07-20 Network red packet action identification method and device and storage medium Active CN112053145B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010697887.XA CN112053145B (en) 2020-07-20 2020-07-20 Network red packet action identification method and device and storage medium

Publications (2)

Publication Number Publication Date
CN112053145A (en) 2020-12-08
CN112053145B (en) 2023-01-31

Family

ID=73601090

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010697887.XA Active CN112053145B (en) 2020-07-20 2020-07-20 Network red packet action identification method and device and storage medium

Country Status (1)

Country Link
CN (1) CN112053145B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108921530A (en) * 2018-06-22 2018-11-30 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Information judgment method, device, storage medium and terminal
CN109076013A (en) * 2016-05-10 2018-12-21 Huawei Technologies Co., Ltd. Packet switching service recognition method and terminal
CN109213843A (en) * 2018-07-23 2019-01-15 Beijing Mijing Hefeng Technology Co., Ltd. Method and device for detecting spam text information
CN110874723A (en) * 2018-09-04 2020-03-10 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Electronic red packet detection method, electronic red packet detection device and mobile terminal
CN111061815A (en) * 2019-12-13 2020-04-24 Ctrip Computer Technology (Shanghai) Co., Ltd. Conversation data classification method

Also Published As

Publication number Publication date
CN112053145A (en) 2020-12-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 401120 No.2, 7th floor, Fenghuang a building, No.18, Qingfeng North Road, Yubei District, Chongqing

Applicant after: Broid Technology Co.,Ltd.

Address before: No.1, area a, building B1, Shenzhen digital technology park, No.002, Gaoxin South 7th Road, Nanshan District, Shenzhen, Guangdong 518000

Applicant before: SHENZHEN BROADTECH CO.,LTD.

GR01 Patent grant