CN116723238A - API encrypted traffic collection and labeling method based on a man-in-the-middle agent


Info

Publication number
CN116723238A
Authority
CN
China
Prior art keywords
api
traffic
application
request
network access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310769946.3A
Other languages
Chinese (zh)
Inventor
朱宇坤
赵毅卓
周玉祥
宁延硕
陈瑞东
张小松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202310769946.3A
Publication of CN116723238A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/565 Conversion or adaptation of application format or content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses an API encrypted traffic collection and labeling method based on a man-in-the-middle agent, and belongs to the technical field of network traffic data collection. A man-in-the-middle agent client detects all network access requests of the application terminal, parses the traffic of interest, and extracts URL links, parameters, and other information from it to form an API interface document. Using the mapping between the process ID of the target application and the source port of the API traffic, the method matches each encrypted network flow to a specific application, thereby labeling the API encrypted traffic and generating a corresponding log document. Because it is organized in a client-server mode, the method improves the efficiency and scalability of traffic collection and labeling: it supports a distributed structure and can collect the API encrypted traffic of applications at large scale together with the corresponding keys. The invention can also automatically simulate a variety of user API request parameters, and generate and collect API request traffic in a targeted manner.

Description

API encrypted traffic collection and labeling method based on a man-in-the-middle agent
Technical Field
The invention belongs to the technical field of network traffic data collection, and particularly relates to an API encrypted traffic collection and labeling method based on a man-in-the-middle agent.
Background
With the rapid development of the internet, applications of all kinds, including web applications, mobile applications, and PC applications, have become part of daily life. People use them to browse web pages, shop online, chat with friends, and so on. This raises a security concern: whether private information such as a user's access behavior can be inferred by studying traffic, particularly the traffic between a target application and the server API (Application Program Interface). Although most applications now encrypt their traffic, encryption cannot completely hide the user behavior characteristics and application traffic characteristics behind it. Any analysis of application network traffic must first solve the problem of how to collect application API encrypted traffic at large scale and label it in a targeted manner; a system for collecting and labeling API encrypted traffic addresses this need.
The existing mainstream means of collecting application traffic is passive collection, most of which is performed at backbone network nodes of network service providers, or at other network interfaces through which specific organizations provide network access services to the public. Web traffic can be captured at large scale at these key network nodes, but no specific information about the traffic is available, in particular which API interface it originates from, so the help this provides for traffic research is limited.
The existing mainstream means of labeling application traffic relies on DPI (Deep Packet Inspection): an existing DPI tool parses the captured data packets and extracts fields with obvious classification characteristics in order to classify the traffic. However, such methods mainly target unencrypted HTTP traffic and perform poorly on HTTPS traffic, because encrypted packets cannot be parsed. Such methods also struggle to achieve complete accuracy, and are not applicable at all to unfamiliar traffic from unknown types of web application.
Existing collection techniques mainly capture traffic at large scale on backbone network nodes and cannot isolate the factors that influence the traffic data, so environment parameters cannot be customized. The collected traffic data is generally disturbed by unknown request parameters, and its statistical distribution is either uniform or fluctuates markedly, which seriously hampers later research. The traditional techniques are therefore no longer suitable for labeling HTTPS traffic.
Disclosure of Invention
The invention aims to provide an API encrypted traffic collection and labeling method based on a man-in-the-middle agent, in order to solve the technical problem that existing traffic collection and labeling methods provide no specific parameter information, or only incomplete information, about encrypted API traffic, which makes encrypted traffic labeling inaccurate and narrows its range of application.
In order to achieve the above purpose, the invention adopts the following technical scheme:
an API encrypted traffic collection and labeling method based on a man-in-the-middle agent, carried out in a system comprising a cloud server and an application terminal on which a man-in-the-middle agent client is installed, comprises the following steps:
the man-in-the-middle agent client detects and intercepts a network access request (carrying the user's real parameters) sent by an application program of the application terminal to an application server (such as a web server), and parses the network access request to generate a corresponding API interface document, wherein the API interface document comprises the network request links and request parameters of the different APIs of the target application server; according to the real traffic information of the application terminal, the man-in-the-middle agent client generates simulated network access requests within the variable range of the parameters, impersonates the application program to establish a connection and communicate with the application server, and sends the content returned by the application server to the corresponding application program, while recording and storing a communication log file and the API interface document and uploading them to the cloud server;
the cloud server generates corresponding script code for each API interface document, randomly selects parameters from an interface parameter dictionary to replace the parameters of each API interface document, and generates simulated network access requests in batches and sends them to the application server; that is, the cloud server generates a large volume of communication traffic for the corresponding APIs from the API interface documents and the parameter dictionary; while sending the simulated network access requests to the application server, the cloud server collects the encrypted communication traffic and the communication private key generated by the same API interface, generates a traffic log file for traffic labeling as the encrypted communication traffic is collected, and records the collected communication private key for decrypting the encrypted communication traffic.
Further, the request parameters in the API interface document include: the encryption algorithm and key information used for each handshake.
Further, the application program of the application terminal holds user preference settings and a cookie cache (data stored on the user's local terminal to identify the user and track the session), and is able to access different web applications.
Further, the parsing of the network access request by the man-in-the-middle agent client, and the generation and storage of the corresponding API interface document, specifically include:
screening the traffic in the network access requests by protocol type, and retaining only http and https traffic;
parsing the network data packets of the network access request according to the corresponding protocol format, extracting the URL links and parameter fields from them, and storing these in an XML file.
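The screening and extraction steps above can be sketched as follows. This is a minimal illustration only: the element names (api_document, api, url, param), the function name, and the example URLs are assumptions made for this sketch and are not specified by the invention.

```python
from urllib.parse import urlsplit, parse_qsl
import xml.etree.ElementTree as ET


def build_api_document(requests):
    """Build an XML API interface document from (method, url) pairs.

    Only http and https requests are kept; other protocols are dropped.
    The XML layout used here is hypothetical.
    """
    root = ET.Element("api_document")
    for method, url in requests:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            continue  # protocol screening step
        api = ET.SubElement(root, "api", method=method)
        # Store the link without its query string; parameters are kept
        # separately so they can be replaced later.
        link = f"{parts.scheme}://{parts.netloc}{parts.path}"
        ET.SubElement(api, "url").text = link
        for name, value in parse_qsl(parts.query, keep_blank_values=True):
            ET.SubElement(api, "param", name=name).text = value
    return root


# Made-up captured requests for illustration.
captured = [
    ("GET", "https://api.example.com/v1/search?q=book&page=2"),
    ("GET", "ftp://files.example.com/readme.txt"),
]
doc = build_api_document(captured)
xml_text = ET.tostring(doc, encoding="unicode")
```

Keeping the link and the parameters in separate elements is one way to let a later stage vary the parameters while leaving the link untouched.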
Further, the man-in-the-middle agent client obtains the process of the current application program by reading its process ID (PID) from the operating system of the application terminal, and obtains the traffic of the current application program through the source port numbers used by that process. The invention maps the PID to a specific system application through the package management application program interface of the operating system, thereby obtaining the mapping between traffic and application. That is, labeling relies on a fine-grained correspondence from application to process to source port to API traffic.
Further, the man-in-the-middle agent client comprises an API request positioning and intercepting module, an API request forwarding module, and an API request parameter recording module;
the API request positioning and intercepting module is used for detecting and intercepting a network access request sent by an application program to the application server, and parsing the network access request to generate a corresponding API interface document;
the API request forwarding module is used for generating simulated network access requests, impersonating the application program to establish a connection and communicate with the application server, and sending the content returned by the application server to the corresponding application program;
the API request parameter recording module is used for recording the communication log file and the API interface document.
Further, the cloud server is deployed at a physical network location from which a stable connection to the application server can be established, and comprises a packet sender and a traffic collector;
the packet sender is used for generating corresponding script code for each API interface document, randomly selecting parameters from an interface parameter dictionary to replace the parameters of each API interface document, and generating simulated network access requests in batches and sending them to the application server;
the traffic collector uses a preset traffic capture tool (such as tcpdump) to monitor and collect the encrypted communication traffic and the communication private key generated by the same API interface.
Further, the packet sender of the cloud server generates simulated network access requests in batches and sends them to the application server specifically as follows:
communication traffic of the corresponding APIs is generated in batches from the API interface documents and a preset parameter dictionary:
the API interface document is parsed, and the URL links and request parameters in it are read; the parameter dictionary on the cloud server is loaded and the relevant request parameters are replaced; corresponding python code is generated and executed to produce simulated network access requests, which are sent to the application server.
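A minimal sketch of the parameter replacement step, assuming the API interface document has already been parsed into a dictionary. The record layout, the parameter dictionary layout, the function name, and the example values are all hypothetical. The sketch only builds request objects and does not send them.

```python
import random
from urllib.parse import urlencode
from urllib.request import Request


def generate_requests(api_record, param_dictionary, count, seed=None):
    """Build `count` simulated requests for one API.

    Each request keeps the URL link of the record and draws its
    parameters and headers at random from the parameter dictionary.
    """
    rng = random.Random(seed)
    batch = []
    for _ in range(count):
        params = dict(api_record["params"])
        for name, candidates in param_dictionary.get("params", {}).items():
            if name in params:
                params[name] = rng.choice(candidates)
        headers = dict(api_record["headers"])
        for name, candidates in param_dictionary.get("headers", {}).items():
            headers[name] = rng.choice(candidates)
        url = api_record["url"] + "?" + urlencode(params)
        batch.append(Request(url, headers=headers, method=api_record["method"]))
    return batch


# Hypothetical record and dictionary for illustration.
record = {
    "url": "https://api.example.com/v1/search",
    "method": "GET",
    "params": {"q": "book", "page": "1"},
    "headers": {"User-Agent": "original-agent"},
}
dictionary = {
    "params": {"q": ["book", "phone", "shoes"], "page": ["1", "2", "3"]},
    "headers": {"User-Agent": ["agent-a", "agent-b"]},
}
batch = generate_requests(record, dictionary, 5, seed=0)
```

Each element of batch could then be passed to urllib.request.urlopen to actually issue the request; that step is left out here so the sketch runs without network access.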
Further, the traffic collector stores the collected encrypted traffic in a designated traffic storage format.
Further, the traffic storage format is: each API request of a web application corresponds to one recorded pcap traffic file.
Further, the traffic collector labels the collected encrypted communication traffic against the API log document uploaded by the man-in-the-middle agent client by comparing URL links. Because only the parameters are modified and the links themselves are unchanged, only the URL links need to be compared.
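The URL comparison can be sketched as below. The sketch assumes that the collector knows the URL of each simulated request (for example because the packet sender generated it) and that the uploaded log maps application names to URL links; the function names and data are hypothetical.

```python
from urllib.parse import urlsplit


def link_key(url):
    """Reduce a URL to its link part, ignoring query parameters."""
    parts = urlsplit(url)
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path)


def label_traffic(request_urls, app_log):
    """Label each request URL with the application whose log entry
    has the same link; unmatched URLs are labeled "unknown"."""
    index = {link_key(u): app for app, urls in app_log.items() for u in urls}
    return {u: index.get(link_key(u), "unknown") for u in request_urls}


# Hypothetical log and simulated request URLs.
app_log = {
    "application 1": ["https://request1.com/api/items"],
    "application 2": ["https://request2.com/login", "https://request3.com/feed"],
}
labels = label_traffic(
    ["https://request1.com/api/items?id=7", "https://other.com/x"],
    app_log,
)
```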
The technical scheme provided by the invention has at least the following beneficial effects:
(1) The invention can collect a complete, decryptable traffic data set together with the plaintext of the API requests. Through the man-in-the-middle agent, the private key of the encrypted communication between the application terminal (user side) and the application server can be obtained; this private key is the user-side private key and can decrypt the server's response messages. The man-in-the-middle agent client also monitors and intercepts the plaintext messages from the application, so the plaintext sent by the user-side application is available without decryption. Holding a large number of plaintext messages and encrypted messages at the same time facilitates network traffic security research;
(2) The invention can simulate API requests with customized parameters. An API interface document can be generated for the communication between the client application and the server, covering the different API interfaces of the server with their interface addresses and request parameters. By continuously varying the request parameters according to a pre-stored parameter dictionary, large-scale encrypted traffic of the same interface under different devices and network environments is simulated, and this encrypted traffic is then captured with a traffic collection tool.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a block diagram of the API encrypted traffic collection and labeling system based on a man-in-the-middle agent provided by the invention;
FIG. 2 is a flowchart of the API encrypted traffic collection and labeling method based on a man-in-the-middle agent provided by the invention;
FIG. 3 is a functional block diagram of the certificate replacement performed by the man-in-the-middle agent client and of its hijacking, forwarding, and recording of requests, as provided by the invention.
The reference numerals are as follows:
1-PC, 2-man-in-the-middle agent client, 3-cloud server, 4-traffic collector, 5-browser, 6-packet sender, 2-1-API request forwarding module, 2-2-API request positioning and intercepting module, and 2-3-API request parameter recording device.
Description of the preferred embodiments
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
The embodiment of the invention discloses an API encrypted traffic collection and labeling method based on a man-in-the-middle agent. The method uses a man-in-the-middle agent client to detect all network access requests of the applications on an application terminal (such as a PC), parses the http/https traffic of interest, and extracts URL links, parameters, and other information from it to form an API interface document. The mapping between the process ID (PID) of the target application and the source port of the API traffic is used to match each TLS (Transport Layer Security) encrypted network flow to a specific application, thereby labeling the API encrypted traffic and generating a corresponding log document. Because it is organized in a client-server mode, the method improves the efficiency and scalability of traffic collection and labeling: it supports a distributed structure, can collect the API encrypted traffic of applications at large scale together with the corresponding keys, and obtains traffic classification labels at application and API level on top of large-scale traffic collection. The invention can automatically simulate a variety of user API request parameters, and generate and collect API request traffic in a targeted manner. Preserving the keys of the encrypted traffic also supports research into the factors that influence encrypted traffic, which is still a gap in the field of encrypted traffic analysis.
As shown in FIG. 1, the API encrypted traffic collection and labeling method based on a man-in-the-middle agent provided by the embodiment of the present invention is carried out in a system including a cloud server and a PC on which a man-in-the-middle agent client is installed; various applications (e.g., a browser) and the forged certificate substituted by the man-in-the-middle agent client are also installed on the PC. The man-in-the-middle agent client relies on session hijacking and certificate replacement. In session hijacking, the client impersonates the application software to establish a connection and communicate with the remote web server, and at the same time impersonates the web server to forward messages to the PC. The man-in-the-middle agent client therefore holds the session keys and certificates of both connections, the one between the PC and itself and the one between itself and the web server, so the encrypted traffic of both parties can be decrypted and parsed, and the URL links and parameters of the http/https traffic of interest can be extracted to form an API interface document. The man-in-the-middle agent client aims to make the PC believe that it is connected to the web server, and establishes the two connections using two sets of certificates. However, if a certificate's signature does not match or comes from an untrusted party, a secure client will simply disconnect and refuse to continue.
Certificate replacement solves this problem: a complete certificate is generated manually and added to the trusted root certificate directory so that the PC trusts the man-in-the-middle agent. A certificate is the mechanism used to verify identity in the TLS protocol on which HTTPS relies. It is a digitally signed file that contains the public key of the certificate owner and information about the third-party issuer. Certificates fall into two types, self-signed certificates and CA certificates; self-signed certificates typically cannot be used for identity authentication. The basic principle of CA-based identity authentication between the application client and the application server during TLS key negotiation is as follows. First, the application client must trust the CA that issued the application server's certificate (for example, the CA certificate is in the operating system's list of trusted certificates, or the user has added the CA certificate, which contains the CA's public key, to the trusted list by "installing a root certificate"). The CA signs the original certificate of the application server with its private key to produce the final certificate. After obtaining the final certificate, the application client verifies the signature with the CA's public key. Taking RSA as an example, if decrypting the signature with the CA's public key succeeds, the certificate was indeed signed with the CA's private key, and the application server can be considered trusted.
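The RSA sign-then-verify principle described above can be illustrated with a toy example. This is a textbook sketch under strong simplifying assumptions: the key is a tiny, insecure demonstration key, the digest is reduced modulo the key, and no real certificate format or padding scheme is involved. All names are hypothetical.

```python
import hashlib

# Toy RSA key of a hypothetical CA (p = 61, q = 53). Far too small for
# real use; it only illustrates the sign/verify relationship.
N, E, D = 3233, 17, 2753


def digest(data: bytes) -> int:
    """Hash the certificate body and reduce it into the toy key range."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def ca_sign(cert_body: bytes) -> int:
    """CA side: 'encrypt' the digest with the private exponent."""
    return pow(digest(cert_body), D, N)


def client_verify(cert_body: bytes, signature: int) -> bool:
    """Client side: 'decrypt' the signature with the CA public key and
    compare the result with a freshly computed digest."""
    return pow(signature, E, N) == digest(cert_body)


body = b"CN=app-server.example.com, public key = ..."
signature = ca_sign(body)
is_trusted = client_verify(body, signature)
```

A client that has installed the man-in-the-middle agent's root certificate performs the same check against the agent's public key, which is why the substituted certificate is accepted.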
In addition, the invention is organized in a distributed client-cloud server mode: it monitors the network traffic of applications on the PC, parses the http/https traffic of interest, extracts API interface documents, and uploads them to the server for traffic replay, so that traffic classification labels at application and API interface granularity are obtained on top of large-scale traffic collection.
As a possible implementation, the API encrypted traffic collection and labeling method based on a man-in-the-middle agent provided by the embodiment of the present invention is carried out in a system including a PC 1 and a cloud server 3, where a browser 5 and a man-in-the-middle agent client 2 are installed on the PC 1. The browser 5 holds user preference settings and a cookie cache and is able to access different web applications. The man-in-the-middle agent client 2 includes an API request positioning and intercepting module 2-2, an API request forwarding module 2-1, and an API request parameter recording module (i.e., the API request parameter recording device 2-3 in FIG. 3). As shown in FIG. 3, a request sent by the browser to the web server (the detected network request carries the real user's parameters) is intercepted by the API request positioning and intercepting module 2-2; the API request forwarding module 2-1 then impersonates the browser to communicate with the web server and passes the content returned by the web server back to the browser, while the API request parameter recording device 2-3 records the request parameters. The man-in-the-middle agent client 2 replaces the CA certificate on the PC 1 side with the man-in-the-middle CA certificate; its API request positioning and intercepting module 2-2 intercepts all network requests with which the browser installed on the PC 1 accesses web applications; the API request forwarding module 2-1 communicates with the web server in place of the PC 1 and forwards the network access requests of the PC 1; and the API request parameter recording device 2-3 records the encryption algorithm and key information used in each handshake so that the hijacked session can be decrypted. The complete request data and request parameters of the application are therefore available on the man-in-the-middle agent client 2.
The cloud server 3 is deployed at a physical network location (as shown in FIG. 3) from which a stable connection to the web server can be established, and comprises a packet sender 6 and a traffic collector 4. The packet sender 6 generates simulated user traffic in batches based on real user request data, a parameter dictionary, and the time delay recorded by the man-in-the-middle agent client 2. The traffic collector 4 collects the traffic generated by the packet sender 6 with a traffic capture tool such as tcpdump, generates a log file at the same time so that the collected traffic can be labeled, and records the session key used when the packet sender 6 generates each network request so that the encrypted traffic can be decrypted and analyzed.
In this API encrypted traffic collection and labeling method, the man-in-the-middle agent client 2 first detects all network access requests of the user's applications, parses the http and https traffic of interest, extracts the URL links and parameters of the requests, and generates the corresponding interface documents, so that parameters such as User-Agent can be adjusted to simulate, at large scale, the traffic produced by different network devices in different network environments. While detecting http/https traffic, the man-in-the-middle agent client 2 also generates a log file that records the application name and the URL links of the application's requests, in a format such as ("application 1": {"https://request1.com"}, "application 2": {"https://request2.com", "https://request3.com"}), so that applications can be labeled quickly during subsequent traffic collection. The client-server design greatly improves the efficiency of traffic collection and labeling: API interface documents and log files are extracted from a small amount of application traffic on the PC 1, and the cloud server 3 then simulates traffic at large scale.
As a possible implementation manner, as shown in fig. 2, the specific implementation steps of the API encrypted traffic collection and labeling method based on the broker agent provided in the embodiment of the present invention include:
Step S1: the network access requests of the applications installed on the PC 1 are monitored using the man-in-the-middle agent client 2.
To collect clean application API traffic, the application traffic must be forwarded through a proxy: all traffic of the target application process is proxied to a designated port, and the man-in-the-middle agent client 2 listens on that port in real time to parse and forward the traffic.
The proxy forwarding method varies with the application type. It is relatively simple for common Web applications, and clearly favorable for traffic collection and labeling, because browsers generally implement http/https proxying and can proxy all of their traffic to a specified IP and port. A proxy plug-in is used to build a routing table that forwards API requests, and configuration rules forward the traffic of different Web applications to different ports.
For general user applications and system applications, a traffic proxy tool can be configured with corresponding rules to forward the traffic of the target application to the listening port of the man-in-the-middle agent client 2.
In order for the man-in-the-middle agent client 2 to hijack the session between the target application and the target application server, the target application needs to trust and use the certificate of the man-in-the-middle agent client. The session key between the target application and the man-in-the-middle agent client 2 and the session key between the man-in-the-middle agent client 2 and the target application server (web server) are then both available, so the encrypted traffic can be decrypted.
Step S2: the man-in-the-middle agent client 2 exports and stores the API interface document generated by parsing the network access requests, together with the log information obtained by comparing PIDs.
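The rule-based forwarding described above can be sketched as a small routing table that maps host patterns to listening ports. The patterns, port numbers, and function name are made up for this sketch; real proxy plug-ins and traffic proxy tools have their own rule formats.

```python
from fnmatch import fnmatch

# Hypothetical routing rules: (host pattern, listening port of the
# man-in-the-middle agent client for that Web application).
RULES = [
    ("*.shop.example.com", 8081),
    ("api.example.com", 8082),
]
DEFAULT_PORT = 8080


def listening_port_for(host, rules=RULES, default=DEFAULT_PORT):
    """Return the port to which traffic for `host` is forwarded.

    The first matching rule wins; unmatched hosts use the default port.
    """
    for pattern, port in rules:
        if fnmatch(host.lower(), pattern):
            return port
    return default


shop_port = listening_port_for("cart.shop.example.com")
other_port = listening_port_for("news.example.org")
```

Giving each Web application its own port means the listening port alone already identifies which application a forwarded request belongs to.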
The man-in-the-middle agent client 2 parses all network request data packets according to the TCP/IP protocol stack format and first determines the protocol type of each packet; since this embodiment mainly collects http/https traffic, traffic of other protocol types is ignored. For an http/https data packet, the man-in-the-middle agent client 2 determines whether the packet is encrypted with a TLS/SSL cipher suite. If it is not encrypted, the application layer of the packet is parsed according to the http protocol format to obtain the required URL link, request method, request header fields, and other content. If it is encrypted, the man-in-the-middle agent client 2 obtains the communication key through session hijacking based on certificate replacement, decrypts the encrypted application layer payload with that key, and then parses the decrypted application layer plaintext in the same way as for http.
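The branch between encrypted and unencrypted payloads can be sketched as follows. The check relies on the TLS record header (a content type byte between 0x14 and 0x17 followed by the major version byte 0x03); the function names and the sample request are hypothetical, and the decryption of TLS payloads is outside the scope of the sketch.

```python
def looks_like_tls(payload: bytes) -> bool:
    """Heuristic check for a TLS record header at the start of a payload."""
    return (
        len(payload) >= 3
        and 0x14 <= payload[0] <= 0x17  # record content type
        and payload[1] == 0x03          # protocol major version
    )


def parse_http_request(payload: bytes) -> dict:
    """Parse a plaintext HTTP/1.x request into URL, method, and headers."""
    head, _, body = payload.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    # Origin-form targets are combined with the Host header.
    if "://" in target:
        url = target
    else:
        url = f"http://{headers.get('Host', '')}{target}"
    return {"method": method, "url": url, "headers": headers, "body": body}


# Made-up plaintext request for illustration.
request = (
    b"GET /v1/search?q=book HTTP/1.1\r\n"
    b"Host: api.example.com\r\n"
    b"User-Agent: demo\r\n\r\n"
)
parsed = None if looks_like_tls(request) else parse_http_request(request)
```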
The man-in-the-middle agent client 2 assembles the URL links, request header fields, and other content obtained by parsing the http/https data packets into the corresponding API interface documents, stored in the format {URL link of the API request, parameter form, traffic network delay}, so that all information of the packet request is recorded. Based on this record format, the packet sender randomly varies the parameter form and the traffic network delay, which gives it the ability to generate API requests in batches. The parameter form comprises the request method, the request headers, and the request data: the request method is one of the http request methods such as GET, POST, and HEAD; the request headers comprise header information such as Cookie, User-Agent, and Host; and the request data comprises important data that needs to be encrypted, such as data submitted by POST.
The man-in-the-middle agent client 2 generates a specific log file by comparing the mapping between the application process PID and the source port of each data packet. For example, the command netstat -aon | findstr "source port" shows which network connection occupies a port and the PID that owns it, and the command tasklist | findstr "PID" shows the process that corresponds to that PID. This yields a one-to-one mapping between API interface traffic packets and application process PIDs (the mapping is stored only locally on the PC 1), which in turn enables accurate traffic collection.
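The port-to-process mapping can be sketched by parsing command output. To keep the sketch runnable on any platform it parses sample text rather than invoking the commands; the sample output is made up, and the use of CSV output from tasklist (tasklist /FO CSV /NH) is an assumption of this sketch rather than something stated in the embodiment.

```python
import csv
import io


def ports_to_pids(netstat_output):
    """Map local TCP source ports to owning PIDs from `netstat -aon` text."""
    mapping = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have five columns: proto, local, foreign, state, PID.
        if len(fields) == 5 and fields[0] == "TCP":
            local_port = int(fields[1].rsplit(":", 1)[1])
            mapping[local_port] = int(fields[4])
    return mapping


def pids_to_names(tasklist_csv):
    """Map PIDs to image names from `tasklist /FO CSV /NH` text."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    return {int(row[1]): row[0] for row in rows if len(row) >= 2}


# Made-up sample output for illustration.
NETSTAT = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    192.168.1.5:50123      93.184.216.34:443      ESTABLISHED     4321
  TCP    192.168.1.5:50200      93.184.216.34:443      ESTABLISHED     5555
"""
TASKLIST = '"browser.exe","4321","Console","1","150,000 K"\n'

port_pid = ports_to_pids(NETSTAT)
pid_name = pids_to_names(TASKLIST)
# Source port -> application name, where the process is known.
port_app = {port: pid_name.get(pid, "unknown") for port, pid in port_pid.items()}
```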
Step S3: the man-in-the-middle agent client 2 uploads the saved API interface document and log file to the cloud server 3.
The man-in-the-middle agent client 2 writes the relation between API interface and application, acquired in real time, into a log file in the form "application name - list of flow five-tuples (source address, destination address, source port, destination port, protocol type) - API interface information", and uploads it to the cloud server 3 together with the API interface document generated in the process.
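One possible shape for such a log entry is sketched below, serialized as one JSON object per record. The class names, field names, and the choice of JSON are assumptions of this sketch; the embodiment only specifies which pieces of information the entry carries.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class FlowTuple:
    """Five-tuple that identifies one network flow."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: str


@dataclass
class LabelRecord:
    """Application name, its flows, and the API links it requested."""
    application: str
    flows: list = field(default_factory=list)
    api_urls: list = field(default_factory=list)


# Made-up record for illustration.
record = LabelRecord(
    application="browser",
    flows=[FlowTuple("192.168.1.5", "93.184.216.34", 50123, 443, "TCP")],
    api_urls=["https://api.example.com/v1/search"],
)
log_line = json.dumps(asdict(record))
```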
Step S4: the cloud server 3 receives the API interface documents sent by the man-in-the-middle agent clients 2, stores all API records, and merges the parameter forms and network delays in the records to obtain a parameter dictionary and a network delay variation range. For each API interface document it can generate corresponding python script code, which combines the stored API records with the parameter dictionary and the delay variation range to generate new API requests in batches. The cloud server executes the python code to send the corresponding network requests, and then uses tcpdump to capture the encrypted communication traffic and the communication private key generated for the same interface.
That is, in step S4, the cloud server 3 receives the API interface documents and log files sent from a plurality of man-in-the-middle agent clients 2, and the packet sender 6 on the cloud server 3 generates python script code from the information in the API interface documents. Because an API interface document contains all the information needed for a network request, the python packet-sending script can be generated automatically with the Postman tool. The packet sender 6 then loads a parameter dictionary library and a delay factor library, repeatedly replaces the relevant parameters of the corresponding API, and executes the python script to send different data packets. The parameter dictionary mainly covers fields such as the GET parameters in the API's URL and the User-Agent in the HTTP/HTTPS headers, which are used to simulate different users and different devices initiating network requests in different environments. The TLS/SSL certificates used by the python script of the packet sender 6 are controllable and are replaced by the present invention, so that the present invention can decrypt the encrypted traffic collected by the traffic collector, which facilitates analysis of the encrypted traffic.
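The merging of records into a parameter dictionary and delay range, followed by batch generation of varied requests, might look like the Python sketch below. It is illustrative only: the function names, the tuple layout of a record and the dictionary keys of a generated request are hypothetical, and the sketch builds request descriptions without sending them, leaving transmission to whatever HTTP client the generated script uses.

```python
import random
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_dictionary(records):
    """Merge records into (GET parameter dictionary, User-Agent list, delay range).

    `records` is a non-empty iterable of (url, user_agent, delay_ms) tuples.
    """
    params, agents, delays = {}, set(), []
    for url, user_agent, delay_ms in records:
        for key, value in parse_qsl(urlsplit(url).query):
            params.setdefault(key, set()).add(value)
        agents.add(user_agent)
        delays.append(delay_ms)
    merged = {key: sorted(values) for key, values in params.items()}
    return merged, sorted(agents), (min(delays), max(delays))


def generate_requests(base_url, params, agents, delay_range_ms, count, rng=random):
    """Build `count` request descriptions with randomly substituted parameters."""
    parts = urlsplit(base_url)
    batch = []
    for _ in range(count):
        query = urlencode({key: rng.choice(values) for key, values in params.items()})
        url = urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
        batch.append({
            "url": url,
            "headers": {"User-Agent": rng.choice(agents)},
            # Delay to wait before sending, converted from milliseconds to seconds.
            "delay_s": rng.uniform(*delay_range_ms) / 1000.0,
        })
    return batch
```

Each generated entry can then be passed to the sending script, which sleeps for delay_s and issues the request with the given URL and headers.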
The traffic collector 4 of the cloud server 3 is implemented mainly with tcpdump. It captures the network requests sent by the python script and generates a raw traffic (pcap) file, compares that file with the log file corresponding to the API interface document to obtain, through statistics, the communication traffic of the target application's API interface, and at the same time obtains the private key of the certificate used by the packet sender 6. The API interface documents of one application are stored in the same folder, which contains the pcap files collected for the different API interfaces together with their corresponding private keys.
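The comparison step can be sketched in Python as a lookup from request URL to application name. This sketch assumes that each pcap file has already been associated with the URL of the request that produced it (for example from the packet sender's own send log); how that association is obtained is not specified in the text. The function names and the "unlabeled" marker are hypothetical.

```python
from urllib.parse import urlsplit


def endpoint(url):
    """Reduce a URL to scheme://host/path so varied query parameters still match."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}{parts.path}"


def label_flows(flows, api_log):
    """Label captured flows with application names by comparing URLs.

    flows:   list of (pcap_name, request_url) pairs from the cloud server.
    api_log: list of (app_name, api_url) pairs from the uploaded log file.
    """
    index = {endpoint(api_url): app_name for app_name, api_url in api_log}
    return {pcap: index.get(endpoint(url), "unlabeled") for pcap, url in flows}
```

Matching on the endpoint rather than the full URL is a design choice of this sketch: the batch requests deliberately vary their query parameters, so an exact string comparison would miss them.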
The invention makes it possible to analyze the encrypted API traffic belonging to a given type of application and to mine that application's traffic characteristics and behavior patterns. In particular, the collected encrypted traffic can be decrypted and analyzed to study the factors that influence it, to mine latent features of user behavior patterns, and to help judge whether a given type of application leaks user privacy.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
What has been described above is merely some embodiments of the present invention. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit of the invention.

Claims (10)

1. An API encrypted traffic collection and labeling method based on a man-in-the-middle agent, characterized in that, in a system comprising a cloud server and an application terminal on which a man-in-the-middle agent client is installed, the following steps are executed:
the man-in-the-middle agent client detects and intercepts a network access request sent by an application program of the application terminal to an application server, and parses the network access request to generate a corresponding API interface document, wherein the API interface document comprises the network request links and request parameters of the different APIs of the application server; the man-in-the-middle agent client generates, according to the real traffic information of the application terminal, a simulated network access request within the variable range of the parameters, establishes a connection and communicates with the application server while masquerading as the application program, sends the content returned by the application server to the corresponding application program, records and stores a communication log file and the API interface document, and uploads the communication log file and the API interface document to the cloud server;
the cloud server generates corresponding script code for each API interface document, randomly selects parameters from an interface parameter dictionary to replace the parameters of each API interface document, and generates simulated network access requests in batches and sends them to the application server; when the simulated network access requests are sent to the application server, the cloud server collects the encrypted communication traffic and the communication private key generated for the same API interface, generates a traffic log file for traffic labeling while collecting the encrypted communication traffic, and records the collected communication private key for use in decrypting the encrypted communication traffic.
2. The method of claim 1, wherein the request parameters in the API interface document comprise: the encryption algorithm and key information used for each handshake.
3. The method of claim 1, wherein the application program of the application terminal includes user preference settings and cookie caching and is capable of accessing different web applications.
4. The method of claim 1, wherein the man-in-the-middle agent client parsing the network access request and generating and storing the corresponding API interface document specifically comprises:
screening and filtering the traffic protocol types in the network access request, and reserving http and https protocol traffic;
analyzing the network data packet of the network access request according to the corresponding protocol format, extracting url links and parameter fields in the network data packet, and storing the url links and the parameter fields in an xml file format.
5. The method of claim 1, wherein the man-in-the-middle agent client obtains the process corresponding to the current application by reading the process ID in the operating system of the application terminal, obtains the traffic corresponding to the current application from the source port number of the process, and maps the process ID to a specific system application through the package management application program interface of the operating system, thereby obtaining the mapping between traffic and applications.
6. The method of claim 1, wherein the man-in-the-middle agent client comprises an API request locating and intercepting module, an API request forwarding module, and an API request parameter recording module;
the API request locating and intercepting module is used for detecting and intercepting a network access request sent by an application program to the application server, parsing the network access request and generating a corresponding API interface document;
the API request forwarding module is used for generating a simulated network access request, establishing a connection and communicating with the application server while masquerading as the application program, and sending the content returned by the application server to the corresponding application program;
the API request parameter recording module is used for recording the communication log file and the API interface document.
7. The method of claim 1, wherein the cloud server is deployed at a physical network location capable of establishing a stable connection with the application server, and comprises a packet sender and a traffic collector;
the packet sender is used for generating corresponding script code for each API interface document, randomly selecting parameters from an interface parameter dictionary to replace the parameters of each API interface document, and generating simulated network access requests in batches and sending them to the application server;
the traffic collector monitors and collects encrypted communication traffic and a communication private key generated by the same API interface based on a preset traffic capturing tool.
8. The method of claim 7, wherein the packet sender of the cloud server generating the simulated network access requests in batches and sending them to the application server specifically comprises:
generating the communication traffic of the corresponding APIs in batches according to the API interface documents and a preset parameter dictionary;
parsing the API interface document, reading the URL links and request parameters in it, loading the parameter dictionary on the cloud server, replacing the relevant request parameters, generating the corresponding python code, and executing the python code to generate simulated network access requests and send them to the application server.
9. The method of claim 7, wherein the traffic collector stores the collected encrypted traffic in the following format: one API request of a web application corresponds to one pcap traffic record.
10. The method of claim 7, wherein the collected encrypted communication traffic is labeled against the API log document uploaded by the man-in-the-middle agent client by comparing URL links.
CN202310769946.3A 2023-06-27 2023-06-27 API encrypted flow collection and labeling method based on man-in-the-middle agent Pending CN116723238A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310769946.3A CN116723238A (en) 2023-06-27 2023-06-27 API encrypted flow collection and labeling method based on man-in-the-middle agent

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310769946.3A CN116723238A (en) 2023-06-27 2023-06-27 API encrypted flow collection and labeling method based on man-in-the-middle agent

Publications (1)

Publication Number Publication Date
CN116723238A true CN116723238A (en) 2023-09-08

Family

ID=87875025

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310769946.3A Pending CN116723238A (en) 2023-06-27 2023-06-27 API encrypted flow collection and labeling method based on man-in-the-middle agent

Country Status (1)

Country Link
CN (1) CN116723238A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117834253A (en) * 2023-12-29 2024-04-05 北京天融信网络安全技术有限公司 Method and device for analyzing TLS (transport layer security) traffic and TLS communication traffic analysis system

Similar Documents

Publication Publication Date Title
Husák et al. HTTPS traffic analysis and client identification using passive SSL/TLS fingerprinting
Anderson et al. Identifying encrypted malware traffic with contextual flow data
US11838330B2 (en) Selective information extraction from network traffic traces both encrypted and non-encrypted
US10965704B2 (en) Identifying self-signed certificates using HTTP access logs for malware detection
US9137215B2 (en) Methods and systems for non-intrusive analysis of secure communications
US6928471B2 (en) Method and apparatus for measurement, analysis, and optimization of content delivery
WO2020076388A1 (en) Triggering targeted scanning to detect rats and other malware
Ling et al. Novel packet size-based covert channel attacks against anonymizer
CN111147305A (en) Network asset portrait extraction method
CN107463848B (en) Application-oriented ciphertext search method, device, proxy server and system
Datta et al. Network traffic classification in encrypted environment: a case study of google hangout
CN116723238A (en) API encrypted flow collection and labeling method based on man-in-the-middle agent
KR101996044B1 (en) ICAP protocol extension method for providing network forensic service of encrypted traffic, network forensic device supporting it and web proxy
EP4106268B1 (en) Method for detecting anomalies in ssl and/or tls communications, corresponding device, and computer program product
Špaček et al. HTTPS event-flow correlation: improving situational awareness in encrypted web traffic
KR101919762B1 (en) An encrypted traffic management apparatus and method for decrypting encrypted traffics
Gancheva et al. TLS Fingerprinting Techniques
Song et al. Evaluating the Distinguishability of Tor Traffic over Censorship Circumvention Tools
Mohammed Network-Based Detection and Prevention System Against DNS-Based Attacks
Koshy et al. Privacy Leaks Via SNI and Certificate Parsing
CN117879932A (en) Encryption traffic detection method and device, storage medium and terminal
Hartmond et al. Client Monitoring with HTTPS
Victor Technical Report: PC Browser and Android Applications Fingerprinting
Biß et al. Device discovery and identification in industrial networks: Geräteerkennung und-identifizierung in industriellen Netzen
Carvalho Is Web Browsing Secure?

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination