[ summary of the invention ]
The technical problem to be solved by the invention is as follows. An existing transparent proxy system must deliver the messages generated by the application-layer proxy program to the network, and correct message forwarding can be achieved only by building the transparent proxy system on a Linux bridge device and configuring the bridge device with an IP address, a correct gateway address, and a routing table. The bridge device therefore usually needs an IP address so that it can take part in IP-based routing, and in complex networking environments, configuring the IP address and the related routing table for the bridge device is tedious and error-prone.
The invention adopts the following technical scheme:
In a first aspect, the present invention provides a bidirectional transparent proxy method for HTTP to HTTPS, including:
after receiving a first HTTP request sent by a client, the proxy system parses the HTTP header fields to obtain the Host field, compares the Host field with a built-in domain name list, and stores the content of the first HTTP request; the domain name list is used for storing domain names that meet the HTTPS redirection condition;
if the Host domain name is found in the list, the proxy system initiates a TCP handshake with a target port of 443 to the server, so as to establish a first TCP channel and perform a TLS negotiation process through the first TCP channel; wherein the IP address of the client is used in the TCP handshake;
the proxy system sends a first HTTPS request to the server through a first TLS channel established by the TLS negotiation; the first HTTPS request carries the encrypted content of the first HTTP request;
and the proxy system receives a first HTTPS response returned by the server, deletes the Secure attribute of the cookie field in the HTTP header of the first HTTPS response and the Strict-Transport-Security field used by HSTS, decrypts the content of the first HTTPS response into plaintext, and then transmits the plaintext to the client.
Preferably, if the Host domain name is not found in the list, the method further comprises:
the proxy system initiates a TCP handshake with a target port of 80 to the server so as to establish a second TCP channel;
the proxy system sends a second HTTP request to the server through the second TCP channel, wherein the second HTTP request carries the content of the first HTTP request;
the proxy system receives a second HTTP response returned by the server and checks whether the second HTTP response is an HTTPS redirection;
if the response is an HTTPS redirection, the proxy system adds the Host to the domain name list, discards the second HTTP response, sends a TCP Reset message to the server, and closes the first TCP channel.
Preferably, the method further comprises:
the proxy system initiates a TCP handshake with a target port of 443 to the server, so as to establish a second TCP channel and perform a TLS negotiation process through the second TCP channel; wherein the IP address of the client is used in the TCP handshake;
the proxy system sends a second HTTPS request to the server through a second TLS channel established by the TLS negotiation; the second HTTPS request carries the encrypted content of the second HTTP request;
and the proxy system receives a second HTTPS response returned by the server, deletes the Secure attribute of the cookie field in the HTTP header of the second HTTPS response and the Strict-Transport-Security field used by HSTS, decrypts the content of the second HTTPS response into plaintext, and then transmits the plaintext to the client.
Preferably, the checking whether the second HTTP response is an HTTPS redirection includes:
checking whether the status code of the second HTTP response is between 300 and 399, and checking that the redirection target address is the HTTPS version of the Host before redirection; wherein, according to the HTTP protocol specification, response status codes in this interval represent redirection.
Preferably, if it is not an HTTPS redirect, the method further includes:
and the proxy system transmits the received second HTTP response to the client.
Preferably, the proxy system includes a data packet transceiver module, a virtual network card module, a data packet routing module, and a proxy program module, specifically:
the data packet transceiver module is used for receiving and sending messages at the bottom layer and consists of a data packet receiving module and a data packet transmitting module; the data packet receiving module is responsible for receiving data packets from the physical network card and forwarding them to the virtual network card;
the virtual network card module is a standard TUN virtual network card working at the IP layer, and is used for passing the received IP data packets into the protocol stack of the operating system for parsing;
and the data packet routing module is used for routing target messages to the proxy program module for processing, and comprises iptables rule configuration and policy routing configuration.
Preferably, the proxy program module operates in the application layer of the operating system and supports the Tproxy mechanism, specifically:
the IP_TRANSPARENT option is set on the socket, so that socket connections to any destination IP address are accepted and data messages can be generated with any IP address as the source IP; that is, the socket can bind to any IP address.
Preferably, the method further comprises:
adding a first rule to the iptables rules:
iptables -t mangle -A PREROUTING -p tcp --dport 80 -j TPROXY --tproxy-mark 0x1/0x1 --on-port 0;
the first rule indicates that messages whose transport protocol is TCP and whose destination port is 80 are marked with a label value of 1;
adding a second rule to the iptables rules:
iptables -t mangle -A PREROUTING -p tcp --sport 443 -j MARK --set-mark 1;
the second rule indicates that messages whose transport protocol is TCP and whose source port is 443 are marked with a label value of 1;
adding a third rule to the iptables rules:
iptables -t mangle -A OUTPUT -p tcp --dport 443 -j MARK --set-mark 2;
the third rule indicates that messages whose transport protocol is TCP and whose destination port is 443 are marked with a label value of 2;
adding a fourth rule to the iptables rules:
iptables -t mangle -A OUTPUT -p tcp --sport 80 -j MARK --set-mark 2;
the fourth rule indicates that messages whose transport protocol is TCP and whose source port is 80 are marked with a label value of 2.
Preferably, the method further comprises policy routing configuration, specifically:
messages with a label value of 1 are looked up in routing table 100:
ip rule add fwmark 1 lookup 100;
routing table number 100 is established, and its content is set to route messages from the virtual network card of the proxy system to the application layer of the proxy system:
ip route add local 0.0.0.0/0 dev lo table 100;
messages with a label value of 2 are looked up in routing table 200:
ip rule add fwmark 0x2 table 200;
routing table number 200 is established, and messages are sent out through the virtual network card tun1:
ip route add default via 10.0.0.2 dev tun1 table 200.
In a second aspect, the present invention further provides an HTTP to HTTPS bidirectional transparent proxy apparatus for implementing the HTTP to HTTPS bidirectional transparent proxy method described in the first aspect, where the apparatus includes:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor for performing the HTTP to HTTPS bidirectional transparent proxy method of the first aspect.
In a third aspect, the present invention further provides a non-volatile computer storage medium, where the computer storage medium stores computer-executable instructions which, when executed by one or more processors, perform the HTTP to HTTPS bidirectional transparent proxy method of the first aspect.
The invention realizes a bidirectional transparent proxy method and apparatus for HTTP to HTTPS. At the data link layer, the program directly controls the receiving and sending of messages and does not change the MAC addresses of the original messages, so that bidirectional transparency toward the upstream and downstream routing devices is achieved at the data link layer. At the IP layer, the virtual network card technology and the Tproxy technology are combined so that the source and destination IP addresses are not changed, which achieves bidirectional transparency in both the client direction and the server direction. At the application layer, an HTTP-to-HTTPS proxy is realized: an HTTP connection is maintained between the proxy system and the client, and an HTTPS connection is maintained between the proxy system and the server, so that a bidirectional transparent proxy is achieved at the application layer.
[ detailed description ] embodiments
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the description of the present invention, the terms "inner", "outer", "longitudinal", "lateral", "upper", "lower", "top", "bottom", and the like indicate orientations or positional relationships based on those shown in the drawings, and are for convenience only to describe the present invention without requiring the present invention to be necessarily constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
The method used in the invention to build the transparent proxy neither modifies IP addresses nor uses a network bridge device. Instead, it reads data packets from the network card directly at the bottom layer and then proxies them in combination with the virtual network card technology. Its characteristic is that a program directly controls the receiving and sending of data messages on the network card, so message routing does not depend on the operating system. This has two benefits:
firstly, a large amount of routing configuration work is avoided, such as configuring an IP address for a network bridge, configuring the operating system routing table, configuring a default gateway address for the network bridge, and configuring the ARP entry of the default gateway in the operating system;
and secondly, there is no need to obtain address information about the routing nodes upstream and downstream of the proxy system. The proxy system does not need to know in advance the IP address and MAC address of the next-hop node of a data message, because the MAC addresses of the original message are retained inside the proxy system and are not modified when the message is received and sent. The proxy system is therefore transparent to the upstream and downstream routing devices and can be conveniently embedded into a complex network environment.
A traditional HTTPS proxy generally only relays data at the TCP layer, simply forwarding data between the HTTPS client and the HTTPS server. This is because HTTPS itself prevents a man-in-the-middle from modifying the data: if a man-in-the-middle proxies HTTPS traffic without a legitimate certificate, the HTTPS client reports an error. A typical HTTPS client is a browser, which warns the end user that the traffic may have been hijacked. To solve the problem that HTTPS traffic is difficult to decrypt and proxy, the invention provides an HTTP-to-HTTPS proxy, which bypasses the HTTPS encryption of client traffic and avoids the certificate warning that a client browser raises when the certificate of the proxy system is not trusted under an HTTPS proxy.
The invention realizes a bidirectional transparent proxy system for HTTP to HTTPS. At the data link layer, the program directly controls the receiving and sending of messages and does not change the MAC addresses of the original messages, so that bidirectional transparency toward the upstream and downstream routing devices is achieved at the data link layer. At the IP layer, the virtual network card technology and the Tproxy technology are combined so that the source and destination IP addresses are not changed, which achieves bidirectional transparency in both the client direction and the server direction. At the application layer, an HTTP-to-HTTPS proxy is realized: an HTTP connection is maintained between the proxy system and the client, and an HTTPS connection is maintained between the proxy system and the server, so that a bidirectional transparent proxy is achieved at the application layer.
The description below has two parts. The first part is the low-level implementation of the transparent proxy technology, including message receiving and sending and IP address transparency; the second part is how the proxy system exchanges messages with the client and the server.
Low-level implementation of the transparent proxy:
The overall roles fall into three categories: client, proxy system, and server. The proxy system maintains two connections simultaneously, one with the client and one with the server, and relays data between the two connections.
As shown in fig. 1, the proxy system as a whole is divided into four modules, namely a data packet transceiver module, a virtual network card module, a data packet routing module, and a proxy program module.
The data packet transceiver module is mainly responsible for receiving and sending messages at the bottom layer and consists of a data packet receiving module and a data packet transmitting module, where the data packet receiving module is responsible for receiving data packets from the network card and passing them to the virtual network card device. The virtual network card module is a standard TUN virtual network card working at the IP layer and is mainly responsible for passing received IP data packets into the protocol stack of the operating system for parsing. The data packet routing module is mainly responsible for routing target messages to the proxy program module for processing, and specifically comprises iptables rule configuration, policy routing configuration, and the like. By default, when the operating system finds that the destination IP address of a data packet is not one of its own IP addresses, it does not deliver the packet to the application layer but discards or forwards it. The proxy program module is an application program working in the application layer of the operating system. To process messages with any destination IP address, that is, to support the Tproxy mechanism, it sets the IP_TRANSPARENT option on the socket, so that socket connections to any destination IP address can be accepted and data messages can be generated with any IP address as the source IP; in other words, the socket can bind to any IP address.
The flow of data packets in the proxy system is shown in fig. 2. The proxy program module works in the application layer and simultaneously maintains two socket connections, one with the client and one with the server. The flow formed by the lines numbered 1, 2, 3, 4, 5, and 6 represents the connection maintained between the proxy system and the client; the flow formed by the lines numbered 7, 8, 9, 10, 11, and 12 represents the connection maintained between the proxy system and the server; the line numbered 13 represents non-HTTP traffic, which is not delivered to the application layer for processing but is forwarded directly by the network card. The data packet transceiver module handles the traffic numbered 2, 5, 8, 11, and 13: specifically, the data packet receiving module handles the traffic numbered 2, 11, and 13, and the data packet transmitting module handles the traffic numbered 5 and 8. The data packet routing module handles the traffic numbered 3, 4, 7, and 12.
Message processing flow of the proxy program module:
The HTTP-to-HTTPS proxy of the invention essentially hijacks the HTTP redirection jump. Modern websites increasingly adopt HTTPS to protect transmitted content from modification, but to remain compatible with the HTTP traffic that users generate by default, they generally keep an HTTP service on port 80. When a user accesses the HTTP service on port 80, the site uses the redirection mechanism of the HTTP protocol to send a 301/302 redirection response that guides the user to the corresponding HTTPS service, and after receiving the redirection, the user's HTTP client accesses the redirected HTTPS service as required by the HTTP protocol specification. The HTTP-to-HTTPS proxy function realized by the invention essentially exploits and hijacks this redirection mechanism.
Message relaying and interaction between the client and the server by the proxy program module:
When a client initiates an HTTP request, the proxy system first probes the target domain name. The probing process is to send the HTTP request to the corresponding domain name and check whether the HTTP response status code is a redirection status code such as 301/302. If it is a redirection status code and the redirected address is the HTTPS version of the same website address, the proxy system initiates an HTTPS request to the corresponding domain name and passes the content of the returned HTTPS response to the client through the HTTP channel. In this way, the proxy system maintains an HTTP connection with the client but an HTTPS connection with the server; the redirection of the client is hijacked, and the client is never upgraded to an HTTPS connection. If the probing result is that the target domain name does not meet the HTTPS redirection condition, the proxy system communicates with both the client and the server over HTTP channels and relays data between them; in this case the proxy system works in HTTP proxy mode.
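As an illustrative, non-limiting sketch, the initial decision described above (extract the Host field, then consult the domain name list) might look as follows. The function names `extract_host` and `choose_upstream_port` and the use of a Python set for the domain name list are assumptions made for illustration only; IPv6 literal hosts are not handled.

```python
from typing import Optional, Set


def extract_host(request: bytes) -> Optional[str]:
    """Return the Host header value (lower-cased, port removed), or None."""
    head = request.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            host = value.strip().decode("ascii", "replace").lower()
            # Drop an explicit port such as "example.com:80".
            return host.rsplit(":", 1)[0] if ":" in host else host
    return None


def choose_upstream_port(request: bytes, https_domains: Set[str]) -> int:
    """Return 443 if the Host is already in the domain name list, else 80."""
    host = extract_host(request)
    return 443 if host is not None and host in https_domains else 80
```

A request whose Host is absent from the list would be probed over port 80 first, and the list would be updated only if the probe turns out to be an HTTPS redirection.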
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Example 1:
Embodiment 1 of the present invention provides a bidirectional transparent proxy method for HTTP to HTTPS, as shown in fig. 3, including:
In step 101, after receiving a first HTTP request sent by a client, the proxy system parses the HTTP header fields to obtain the Host field, compares the Host field with a built-in domain name list, and stores the content of the first HTTP request; the domain name list is used for storing domain names that meet the HTTPS redirection condition.
In step 102, if the Host domain name is found in the list, the proxy system initiates a TCP handshake with a target port of 443 to the server, so as to establish a first TCP channel and perform a TLS negotiation process through the first TCP channel; wherein the IP address of the client is used in the TCP handshake; wherein the TLS negotiation process comprises: negotiating a cipher suite, transferring certificates, verifying certificates, and computing keys.
In step 103, the proxy system sends a first HTTPS request (referred to as an HTTP request in other embodiments of the present invention; it is described here as an HTTPS request to make the technical feature more intuitive) to the server through a first TLS channel established by the TLS negotiation; the first HTTPS request carries the encrypted content of the first HTTP request.
In step 104, the proxy system receives the first HTTPS response returned by the server, deletes the Secure attribute of the cookie field in the HTTP header of the first HTTPS response and the Strict-Transport-Security field used by HSTS, decrypts the content of the first HTTPS response into plaintext, and passes the plaintext to the client.
The embodiment of the invention realizes a bidirectional transparent proxy method for HTTP to HTTPS. At the data link layer, the program directly controls the receiving and sending of messages and does not change the MAC addresses of the original messages, so that bidirectional transparency toward the upstream and downstream routing devices is achieved at the data link layer. At the IP layer, the virtual network card technology and the Tproxy technology are combined so that the source and destination IP addresses are not changed, which achieves bidirectional transparency in both the client direction and the server direction. At the application layer, an HTTP-to-HTTPS proxy is realized: an HTTP connection is maintained between the proxy system and the client, and an HTTPS connection is maintained between the proxy system and the server, so that a bidirectional transparent proxy is achieved at the application layer.
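The header rewriting of step 104 can be sketched as below, purely for illustration. The function name `downgrade_response_headers` and the representation of headers as a list of (name, value) tuples are assumptions and are not prescribed by the method.

```python
from typing import List, Tuple

Header = Tuple[str, str]


def downgrade_response_headers(headers: List[Header]) -> List[Header]:
    """Drop Strict-Transport-Security and strip Secure from Set-Cookie."""
    out = []
    for name, value in headers:
        lname = name.lower()
        if lname == "strict-transport-security":
            continue  # remove the HSTS header entirely
        if lname == "set-cookie":
            parts = [p.strip() for p in value.split(";")]
            # Keep every cookie attribute except the bare "Secure" flag.
            parts = [p for p in parts if p and p.lower() != "secure"]
            value = "; ".join(parts)
        out.append((name, value))
    return out
```

Without this rewriting, a browser would refuse to send a Secure cookie over the plaintext HTTP connection, and an HSTS header would cause it to switch to HTTPS on later visits.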
In combination with the embodiment of the present invention, there is also an extended implementation. As an alternative parallel to step 102 in embodiment 1, if the Host domain name is not found in the list, as shown in fig. 4, the method further includes:
In step 105, the proxy system initiates a TCP handshake with a target port of 80 to the server, so as to establish a second TCP channel.
In step 106, the proxy system sends a second HTTP request to the server through the second TCP channel, where the second HTTP request carries the content of the first HTTP request.
In step 107, the proxy system receives a second HTTP response returned by the server and checks whether the second HTTP response is an HTTPS redirection.
Wherein, checking whether the second HTTP response is an HTTPS redirection specifically includes:
checking whether the status code of the second HTTP response is between 300 and 399, and checking that the redirection target address is the HTTPS version of the Host before redirection (the second check is needed because the redirection mechanism can redirect to any URL, and HTTPS redirection is only one of its applications); wherein, according to the HTTP protocol specification, response status codes in this interval represent redirection.
In step 108, if the response is an HTTPS redirection, the proxy system adds the Host to the domain name list, discards the second HTTP response, and sends a TCP Reset message to the server to close the first TCP channel.
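The check performed in step 107 can be sketched as follows. This is a minimal illustration that assumes "the HTTPS version of the Host" means the same host name with the `https` scheme; the function name `is_https_redirect` is hypothetical, and a redirection to a different host name (for example a `www.` variant) would not match under this strict reading.

```python
from typing import Optional
from urllib.parse import urlsplit


def is_https_redirect(status: int, location: Optional[str], host: str) -> bool:
    """True if the response redirects to the HTTPS version of `host`."""
    if not (300 <= status <= 399) or not location:
        return False
    target = urlsplit(location)
    # urlsplit lower-cases the scheme and the hostname.
    return target.scheme == "https" and (target.hostname or "") == host.lower()
```

Only when this check succeeds would the Host be added to the domain name list and the probe response be discarded.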
In combination with the embodiment of the present invention, there is also an extended implementation, as shown in fig. 5. Preferably, after step 108 is performed, the method further includes:
In step 109, the proxy system initiates a TCP handshake with a target port of 443 to the server, so as to establish a second TCP channel and perform a TLS negotiation process through the second TCP channel; wherein the IP address of the client is used in the TCP handshake; wherein the TLS negotiation process comprises: negotiating a cipher suite, transferring certificates, verifying certificates, and computing keys.
In step 110, the proxy system sends a second HTTPS request to the server through a second TLS channel established by the TLS negotiation; the second HTTPS request carries the encrypted content of the second HTTP request.
In step 111, the proxy system receives the second HTTPS response returned by the server, deletes the Secure attribute of the cookie field in the HTTP header of the second HTTPS response and the Strict-Transport-Security field used by HSTS, decrypts the content of the second HTTPS response into plaintext, and passes the plaintext to the client.
For completeness of the technical solution, the other outcome of the determination in step 108 is handled as follows: if the response is not an HTTPS redirection, the method further comprises: the proxy system transmits the received second HTTP response to the client.
In the embodiment of the present invention, a preferred proxy system framework is further provided, as shown in fig. 2, including a data packet transceiver module, a virtual network card module, a data packet routing module, and a proxy program module, specifically:
the data packet transceiver module is used for receiving and sending messages at the bottom layer and consists of a data packet receiving module and a data packet transmitting module; the data packet receiving module is responsible for receiving data packets from the physical network card and forwarding them to the virtual network card;
the virtual network card module is a standard TUN virtual network card working at the IP layer, and is used for passing the received IP data packets into the protocol stack of the operating system for parsing;
and the data packet routing module is used for routing target messages to the proxy program module for processing, and comprises iptables rule configuration and policy routing configuration. By default, when the operating system finds that the destination IP address of a data packet is not one of its own IP addresses, it does not deliver the packet to the application layer but discards or forwards it.
The proxy program module works in the application layer of the operating system and supports the Tproxy mechanism, specifically:
the IP_TRANSPARENT option is set on the socket, so that socket connections to any destination IP address are accepted and data messages can be generated with any IP address as the source IP; that is, the socket can bind to any IP address.
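As a minimal sketch of the socket option described above, assuming a Linux host: the function name `make_transparent_listener` is hypothetical, the numeric fallback 19 is the value of `IP_TRANSPARENT` in the Linux headers, and setting the option normally requires root or the CAP_NET_ADMIN capability.

```python
import socket

# Use the constant from the socket module if present; 19 is the Linux value.
IP_TRANSPARENT = getattr(socket, "IP_TRANSPARENT", 19)


def make_transparent_listener(port: int) -> socket.socket:
    """Create a TCP listener that may accept connections for any destination IP."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # Raises PermissionError (an OSError) without sufficient privileges.
        s.setsockopt(socket.IPPROTO_IP, IP_TRANSPARENT, 1)
        s.bind(("0.0.0.0", port))
        s.listen(128)
    except OSError:
        s.close()
        raise
    return s
```

On an accepted connection, `getsockname()` returns the original destination address of the client's request rather than an address of the proxy host. The same option, set on an outgoing socket before `bind()`, allows the client's IP address to be used as the source address toward the server.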
As an underlying mechanism supporting the implementation of the method steps in embodiment 1 of the present invention, the method further includes:
adding a first rule to the iptables rules:
iptables -t mangle -A PREROUTING -p tcp --dport 80 -j TPROXY --tproxy-mark 0x1/0x1 --on-port 0;
the first rule indicates that messages whose transport protocol is TCP and whose destination port is 80 are marked with a label value of 1;
adding a second rule to the iptables rules:
iptables -t mangle -A PREROUTING -p tcp --sport 443 -j MARK --set-mark 1;
the second rule indicates that messages whose transport protocol is TCP and whose source port is 443 are marked with a label value of 1;
adding a third rule to the iptables rules:
iptables -t mangle -A OUTPUT -p tcp --dport 443 -j MARK --set-mark 2;
the third rule indicates that messages whose transport protocol is TCP and whose destination port is 443 are marked with a label value of 2;
adding a fourth rule to the iptables rules:
iptables -t mangle -A OUTPUT -p tcp --sport 80 -j MARK --set-mark 2;
the fourth rule indicates that messages whose transport protocol is TCP and whose source port is 80 are marked with a label value of 2.
To work together with the iptables rules, the corresponding policy routing configuration is also required, specifically:
messages with a label value of 1 are looked up in routing table 100 (the name 100 is used only for convenience in presenting the following commands; in actual use the routing table may be given another user-defined name, so it should not be taken as a special limitation of the protection scope of the present invention):
ip rule add fwmark 1 lookup 100;
routing table number 100 is established, and its content is set to route messages from the virtual network card of the proxy system to the application layer of the proxy system:
ip route add local 0.0.0.0/0 dev lo table 100;
messages with a label value of 2 are looked up in routing table 200:
ip rule add fwmark 0x2 table 200;
routing table number 200 is established, and messages are sent out through the virtual network card tun1 (in practice this may also be a virtual network card with another number, such as tun2 or tun5, which is not specifically limited here):
ip route add default via 10.0.0.2 dev tun1 table 200.
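For illustration only, the four iptables rules and the four policy routing commands above can be assembled programmatically as argument lists. The function name `build_commands` and its parameters are assumptions; the sketch only builds the commands and does not execute them, since applying them requires root privileges on a Linux host.

```python
from typing import List


def build_commands(tun_dev: str = "tun1", gateway: str = "10.0.0.2") -> List[List[str]]:
    """Return the marking rules and policy routes as argv lists."""
    ipt = ["iptables", "-t", "mangle", "-A"]
    return [
        # Mark 1: traffic that must be delivered to the local proxy program.
        ipt + ["PREROUTING", "-p", "tcp", "--dport", "80", "-j", "TPROXY",
               "--tproxy-mark", "0x1/0x1", "--on-port", "0"],
        ipt + ["PREROUTING", "-p", "tcp", "--sport", "443", "-j", "MARK", "--set-mark", "1"],
        # Mark 2: traffic generated by the proxy program, sent out via the TUN device.
        ipt + ["OUTPUT", "-p", "tcp", "--dport", "443", "-j", "MARK", "--set-mark", "2"],
        ipt + ["OUTPUT", "-p", "tcp", "--sport", "80", "-j", "MARK", "--set-mark", "2"],
        ["ip", "rule", "add", "fwmark", "1", "lookup", "100"],
        ["ip", "route", "add", "local", "0.0.0.0/0", "dev", "lo", "table", "100"],
        ["ip", "rule", "add", "fwmark", "0x2", "table", "200"],
        ["ip", "route", "add", "default", "via", gateway, "dev", tun_dev, "table", "200"],
    ]
```

Each entry could then be passed to `subprocess.run(cmd, check=True)` by a privileged setup script, with `tun_dev` and `gateway` adapted to the actual virtual network card.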
Next, the method execution processes of the main function modules of the proxy system are described one by one through several embodiments. The "first" and "second" expressions of embodiment 1 are not continued in these embodiments. It should be noted that the prefixes "first", "second", and "third" used in the embodiments of the present invention serve only to distinguish objects more clearly when they are referred to and carry no special limiting meaning. Even without these prefixes, the objects in the following embodiments can be associated from context with the corresponding objects of the embodiments of the present invention, so this point is not repeated below.
Example 2:
The workflow of the data packet receiving module is shown in fig. 6:
In step S121, a message is received from the physical network card. The network cards in this step include an uplink network card and a downlink network card. The program does not rely on the operating system to parse and process the message but reads it directly from the network card, so the message is a complete frame that includes the Ethernet header, the IP header, and so on.
In step S122, the message is parsed to obtain its source MAC address, source IP address, source port, destination MAC address, destination IP address, and destination port; at the same time, the correspondence between IP addresses and MAC addresses is recorded and stored in the IP-MAC correspondence table.
In step S123, it is checked whether the message is a target message. Two conditions are checked in this step: whether the destination port is TCP port 80, and whether the source port is TCP port 443. The two conditions are in an OR relationship; that is, a message that hits either condition is a target message.
In step S124, the Ethernet header of a message hit in step S123 is stripped off, and only the part from the IP header upward is retained.
In step S125, the IP data message generated in step S124 is sent to the virtual network card device.
In step S126, a message that is not hit in step S123 is passed directly to the physical network card of the opposite end. Here, the physical network card of the opposite end means that a message received from the uplink network card is sent out from the downlink network card, and conversely, a message received from the downlink network card is sent out from the uplink network card.
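Steps S122 to S126 can be sketched as follows for an untagged IPv4 Ethernet frame. This is an illustrative sketch only: the names `ip_mac_table` and `classify_frame` are assumptions, and VLAN tags, IPv6, and IP fragments are not handled.

```python
import socket
import struct

ip_mac_table = {}  # IP address (str) -> MAC address (6 bytes), filled in S122


def classify_frame(frame: bytes):
    """Return ("tun", ip_packet) for target messages, else ("forward", frame)."""
    if len(frame) < 14 + 20:
        return ("forward", frame)
    dst_mac, src_mac = frame[0:6], frame[6:12]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != 0x0800:  # not IPv4
        return ("forward", frame)
    ip = frame[14:]  # S124: the frame with the Ethernet header stripped
    ihl = (ip[0] & 0x0F) * 4
    # S122: record the IP-MAC correspondence for later header restoration.
    ip_mac_table[socket.inet_ntoa(ip[12:16])] = src_mac
    ip_mac_table[socket.inet_ntoa(ip[16:20])] = dst_mac
    if ip[9] != 6 or ihl < 20 or len(ip) < ihl + 4:  # protocol 6 is TCP
        return ("forward", frame)
    src_port, dst_port = struct.unpack("!HH", ip[ihl:ihl + 4])
    # S123: destination port 80 OR source port 443 makes a target message.
    if dst_port == 80 or src_port == 443:
        return ("tun", ip)  # S125: hand the IP packet to the TUN device
    return ("forward", frame)  # S126: pass to the opposite physical network card
```

The caller would write the returned IP packet to the TUN device in the first case and to the opposite physical network card in the second.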
Example 3:
The workflow of the data packet receiving module is shown in fig. 7:
In step S131, a message is read from the virtual network card device. The virtual network card is a TUN-type network card, so the received message is an IP data message that has an IP header but no Ethernet header.
In step S132, an Ethernet header, including the source MAC address and the destination MAC address, is added to the IP data message. The addresses are obtained by querying the IP-MAC correspondence table generated by the data packet receiving module: the source MAC is set to the MAC address corresponding to the source IP, and the destination MAC is set to the MAC address corresponding to the destination IP.
In step S133, it is detected whether the destination port of the message is 443; if so, the determination condition is satisfied.
In step S134, the Ethernet data message is sent to the physical network card of the downstream port.
In step S135, the Ethernet data message, with its Ethernet header filled in, is sent to the physical network card of the upstream port.
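The header reconstruction of step S132 and the port test of step S133 might look like the following Python sketch. It assumes IPv4 and a table keyed by dotted IPv4 strings; the function names are hypothetical, and the choice of outgoing network card is left to the caller.

```python
import struct

ETHERTYPE_IPV4 = 0x0800


def add_ethernet_header(ip_packet: bytes, ip_mac_table: dict) -> bytes:
    """S132: prepend an Ethernet header using the IP-MAC correspondence table.

    Raises KeyError when an address was never recorded; a real system would
    need a fallback for that case, which is outside this sketch.
    """
    src_ip = ".".join(str(b) for b in ip_packet[12:16])
    dst_ip = ".".join(str(b) for b in ip_packet[16:20])
    src_mac = ip_mac_table[src_ip]
    dst_mac = ip_mac_table[dst_ip]
    # Ethernet II layout: destination MAC, source MAC, EtherType, payload.
    return dst_mac + src_mac + struct.pack("!H", ETHERTYPE_IPV4) + ip_packet


def destination_port(ip_packet: bytes) -> int:
    """S133: read the TCP destination port that follows the IP header."""
    ihl = (ip_packet[0] & 0x0F) * 4
    return struct.unpack("!H", ip_packet[ihl + 2:ihl + 4])[0]
```

The caller compares `destination_port(...)` with 443 to decide between steps S134 and S135.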
Example 4:
the configuration process involves several aspects such as route forwarding configuration, virtual network card configuration, iptables rule configuration, policy routing configuration, and the like, as shown in fig. 8.
S141 is the route forwarding configuration, which turns on the route forwarding function of the Linux system. By default, the operating system discards a data message whose destination IP address is not one of its own IP addresses. If the route forwarding function is turned on, the operating system forwards such a message instead of discarding it, which is equivalent to the operating system routing the message. Turning on route forwarding is simple: it can be configured with a command or by writing a configuration file, and the names and locations of the network configuration files may differ between Linux distributions. The command-line configuration is as follows:
sysctl -w net.ipv4.ip_forward=1;
S142 is the virtual network card configuration, which creates a virtual network card for receiving IP data messages so that the IP data messages enter the local protocol stack for processing.
A typical configuration method is as follows:
First, a TUN-type virtual network card device named tun5 is created:
ip tuntap add mode tun tun5;
The virtual network card is activated:
ip link set tun5 up;
An IP address is added to the virtual network card:
ip addr add 10.0.0.2/24 dev tun5;
The IP address of the virtual network card does not affect the function of the proxy system as a whole and can be set arbitrarily.
S143 is the iptables rule configuration, whose main purpose is to label the target messages. The target messages consist of two parts: the first part is the traffic represented by line No. 3 and line No. 12 in fig. 2, and the second part is the traffic represented by line No. 4 and line No. 7 in fig. 2. The traffic represented by lines No. 3 and No. 12 is labeled so that messages whose destination address is not a local IP address can be routed to the local application layer for processing instead of being forwarded. The traffic represented by lines No. 4 and No. 7 is labeled because this traffic would otherwise query the default routing table and be routed to the physical network card; in the invention it is labeled so that it is routed to the virtual network card instead.
The first type of rule labels the traffic represented by line No. 3 and line No. 12 in fig. 2. A typical configuration adds the following rules with the iptables tool:
iptables -t mangle -A PREROUTING -p tcp --dport 80 -j TPROXY --tproxy-mark 0x1/0x1 --on-port 0;
This rule indicates that a message whose transport protocol is TCP and whose destination port is 80 is labeled with the value 1 and delivered to the application-layer program listening on port 80 for processing; this part of the traffic matches the traffic represented by line No. 3 in fig. 2.
iptables -t mangle -A PREROUTING -p tcp --sport 443 -j MARK --set-mark 1;
This rule indicates that a message whose transport protocol is TCP and whose source port is 443 is labeled with the value 1; this part of the traffic matches the traffic represented by line No. 12 in fig. 2.
The second type of rule labels the traffic represented by line No. 4 and line No. 7 in fig. 2 with the value 2:
iptables -t mangle -A OUTPUT -p tcp --dport 443 -j MARK --set-mark 2;
iptables -t mangle -A OUTPUT -p tcp --sport 80 -j MARK --set-mark 2;
S144 is the policy routing configuration, which is used together with the iptables rules. First, it allows messages from the virtual network card to be delivered to the application-layer program for processing; that is, the traffic represented by line No. 3 and line No. 12 in fig. 2 matches this policy route. Second, it allows messages generated by the application program to be delivered to the virtual network card instead of the physical network card; the traffic represented by line No. 4 and line No. 7 in fig. 2 matches this policy route.
The specific configuration method is as follows. Messages with the label value 1 are made to query routing table 100:
ip rule add fwmark 1 lookup 100;
Routing table 100 is established, and its content is set to route to the application layer:
ip route add local 0.0.0.0/0 dev lo table 100;
Messages with the label value 2 are made to query routing table 200:
ip rule add fwmark 0x2 table 200;
Routing table 200 is established, and messages are sent out through the virtual network card tun5:
ip route add default via 10.0.0.2 dev tun5 table 200;
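For convenience, the commands of S141 to S144 can be collected into one ordered list, as in the following Python sketch. The helper `apply_config` and its `dry_run` flag are hypothetical; with `dry_run=False` the commands are executed and therefore require root privileges on a Linux host where `tun5` does not yet exist.

```python
import shlex
import subprocess

# Commands from S141-S144, in the order given in this example.
COMMANDS = [
    "sysctl -w net.ipv4.ip_forward=1",
    "ip tuntap add mode tun tun5",
    "ip link set tun5 up",
    "ip addr add 10.0.0.2/24 dev tun5",
    "iptables -t mangle -A PREROUTING -p tcp --dport 80 -j TPROXY --tproxy-mark 0x1/0x1 --on-port 0",
    "iptables -t mangle -A PREROUTING -p tcp --sport 443 -j MARK --set-mark 1",
    "iptables -t mangle -A OUTPUT -p tcp --dport 443 -j MARK --set-mark 2",
    "iptables -t mangle -A OUTPUT -p tcp --sport 80 -j MARK --set-mark 2",
    "ip rule add fwmark 1 lookup 100",
    "ip route add local 0.0.0.0/0 dev lo table 100",
    "ip rule add fwmark 0x2 table 200",
    "ip route add default via 10.0.0.2 dev tun5 table 200",
]


def apply_config(commands=COMMANDS, dry_run=True):
    """Split each command into an argument list; run it unless dry_run is set."""
    argv_list = [shlex.split(command) for command in commands]
    if not dry_run:
        for argv in argv_list:
            subprocess.run(argv, check=True)  # stops at the first failure
    return argv_list
```

Calling `apply_config()` without arguments only returns the parsed argument lists, which is useful for reviewing the configuration before applying it.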
Example 5:
Fig. 9 shows how the proxy module handles messages. The proxy module maintains an HTTP or HTTPS connection with the client and with the server at the same time, and relays data between the client and the server.
Roles:
C: Client, representing the client side; a typical HTTP client is a browser;
P: Proxy, representing the proxy system;
S: Server, representing the server side, typically an HTTP server such as one of the websites being accessed;
Interaction direction:
->: one party sends a message to the other in one direction;
-: the two parties send messages to and receive messages from each other;
Terminology:
TCP: Transmission Control Protocol;
HTTP: HyperText Transfer Protocol;
TLS: Transport Layer Security;
HSTS: HTTP Strict Transport Security;
TCP handshake: the three-way handshake that the two sides perform when establishing a TCP connection;
TLS handshake: the negotiation by which a client and a server establish a TLS connection; it involves several message exchanges;
HTTP GET: a resource request initiated by a client in the HTTP protocol;
HTTP Response: the response content with which the server replies to the client in the HTTP protocol;
Host: an HTTP header field in the HTTP request message whose value is the target domain name being accessed;
As shown in fig. 9, the message interaction process between the proxy system and the client and server is as follows:
In step S201, the client initiates a TCP handshake with the proxy system and a TCP connection is established; the target port is port 80. To the client, it appears that the TCP connection is being made with the server, but because the traffic has already been routed to the proxy system, the client actually establishes the TCP connection with the proxy system.
In step S202, the client initiates an HTTP request to the proxy system; the request is in plaintext and the destination port is port 80. To the client, it appears that the HTTP request is being sent to the server, but because the traffic has already been routed to the proxy system, the client actually sends the request to the proxy system.
In step S203, after receiving the HTTP request sent by the client, the proxy system parses the HTTP header fields to obtain the Host field and compares it with the built-in domain name list. The domain name list contains all domain names that meet the HTTPS redirection condition; if a domain name is in the list, it has previously been probed for HTTPS redirection and found to redirect to HTTPS. If the Host domain name is found in the list, the process proceeds to step S204; otherwise, it proceeds to step S210.
In step S204, the proxy system initiates a TCP handshake to the server, with target port 443. The IP address seen by the server is the IP address of the client, so the server cannot detect the existence of the proxy system.
In step S205, the proxy system and the server perform the TLS negotiation process over the TCP channel established in step S204. Several messages are exchanged during the TLS negotiation, in which the two parties negotiate the cipher suite, transfer and verify the certificate, compute the keys, and so on.
In step S206, the proxy system sends an HTTP request to the server over the TLS channel established in step S205. The request is encrypted, and its content is taken from step S202.
In step S207, the server replies to the proxy system with the HTTP response content over the TLS channel established in step S205; the response is encrypted, not plaintext.
In step S208, the HTTP response is processed. After receiving the HTTP response from the server, the proxy system processes the response content before sending it to the client, because the response must be adapted from HTTPS to HTTP. Some HTTP header fields carry attributes that relate to HTTPS transmission; two are typical. The first is the secure attribute of the cookie field, which indicates that the cookie may only be transmitted over an HTTPS channel and not over an HTTP channel; if the attribute is not deleted, the cookie cannot be transmitted over a plaintext channel such as HTTP, which causes various server authentication problems. The second is HSTS, which forces the client to use HTTPS when connecting to the server and is set by including a Strict-Transport-Security field in the HTTP response header. The HSTS specification states that an HSTS field received over an unencrypted transmission is invalid, but an HSTS field appearing in a plaintext HTTP channel may arouse the client's suspicion, so this field is deleted.
In step S209, the proxy system returns the HTTP response to the client. The content of the HTTP response may have been adapted in step S208 or may be unchanged; if this step is reached from step S213, the HTTP response content does not need to be modified.
In step S210, the proxy system initiates a TCP handshake to the server, with target port 80.
In step S211, the proxy system sends an HTTP request to the server, the HTTP request being based on the TCP channel established in step S210, the HTTP request being a clear text request, and the specific request content being obtained from step S202.
In step S212, the server replies to the proxy system with HTTP response content, which is transmitted in clear text based on the TCP channel established in step S210.
In step S213, it is checked whether the HTTP response is an HTTPS redirect. This step examines the HTTP response content obtained in step S212. First, it checks whether the status code of the HTTP response is between 300 and 399; according to the HTTP protocol specification, a status code in this range represents redirection. Second, it checks whether the redirect target is the HTTPS version of the Host before redirection, because redirection can point to any URL and HTTPS redirection is only one of its applications. If the check finds an HTTPS redirect, the process jumps to step S214; otherwise it jumps to step S209.
In step S214, the proxy system disconnects the TCP connection with the server. Having found that the HTTP response is an HTTPS redirect, the proxy system does not forward the response to the client but discards it, sends a TCP Reset message to the server, and closes the TCP channel.
In step S215, the Host is added to the domain name list. At this point the proxy system has finished probing the target domain name and knows that it meets the HTTPS redirection condition, so the domain name is added to the list; the next time the proxy system handles this domain name, it can proxy it directly without probing again.
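The Host lookup of step S203 and the redirect test of step S213 can be sketched in Python as follows. The sketch assumes that "the HTTPS version of the Host" means an `https` URL whose host name equals the request Host exactly, and that the Host value is not an IPv6 literal; the names `parse_host`, `is_https_redirect` and `https_domains` are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical in-memory domain name list (S203 reads it, S215 adds to it).
https_domains = set()


def parse_host(request: bytes) -> str:
    """S203: extract the Host header value, lower-cased and without a port."""
    head = request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip().split(":")[0].lower()
    return ""


def is_https_redirect(status: int, location: str, host: str) -> bool:
    """S213: a 3xx status whose Location is the HTTPS version of the Host."""
    if not 300 <= status <= 399:
        return False
    target = urlsplit(location)
    return target.scheme == "https" and (target.hostname or "") == host.lower()
```

When `is_https_redirect` returns true, the caller would add the host to `https_domains` (S215) so that later requests for it go straight to step S204.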
All traffic passing through the proxy system falls into one of three interaction flows:
First, interaction flow A: the HTTP proxy mode, in which the proxy system uses an HTTP connection with the client and also an HTTP connection with the server; the proxy system first probes the server, finds that it does not meet the HTTPS redirection condition, and then switches to the HTTP proxy mode;
Second, interaction flow B: the HTTP-to-HTTPS proxy mode, in which the proxy system uses an HTTP connection with the client and an HTTPS connection with the server; the proxy system has probed the domain name before, found that it meets the HTTPS redirection condition, and performs HTTPS proxying directly;
Third, interaction flow C: the HTTP-to-HTTPS proxy mode with the initial probing flow added; the proxy system first probes the server, finds that it meets the HTTPS redirection condition, and then switches to HTTPS proxying;
When the proxy system uses interaction flow A, the working steps are S201, S202, S203, S210, S211, S212, S213, and S209, and the message interaction flow is as shown in fig. 10. Suppose the client uses IP address 1.1.1.1 and port 1024, the server uses IP address 2.2.2.2 and port 80, and the proxy system's own IP address is 3.3.3.3. The connection between the client and the proxy system uses IP addresses 1.1.1.1 and 2.2.2.2 and ports 1024 and 80; here the proxy system impersonates the server to establish the connection with the client. The connection between the proxy system and the server uses IP addresses 1.1.1.1 and 2.2.2.2 and ports 2000 and 80; here the proxy system impersonates the client to establish the connection with the server. Throughout the communication, the proxy system's own IP address never appears and only the source port changes; because a source port is chosen randomly anyway, the proxy system remains a bidirectional transparent proxy system even though the source port changes. After a message passes through the proxy system, its MAC addresses, IP addresses, HTTP message content, and so on remain unchanged; only the source port field of the TCP protocol changes.
When the proxy system uses interaction flow B, the working steps are S201, S202, S203, S204, S205, S206, S207, S208, and S209, and the message interaction flow is as shown in fig. 11. Suppose the client uses IP address 1.1.1.1 and port 1024, the server uses IP address 2.2.2.2 and ports 80 and 443, and the proxy system's own IP address is 3.3.3.3. When the client initiates HTTP access to the server, the proxy system establishes an HTTP connection with the client using IP address 2.2.2.2 and port 80, and at the same time establishes an HTTPS connection with the real server using IP address 1.1.1.1 and a random source port. Throughout the communication, the connection between the client and the proxy system is HTTP and the connection between the proxy system and the server is HTTPS; the HTTPS connection is trusted because the proxy system acts as the client and the certificate presented is the server's own trusted certificate. The proxy system's own IP address never appears and only the source port changes; because a source port is chosen randomly by the client anyway, the proxy system remains a bidirectional transparent proxy system even though the source port changes. After a message passes through the proxy system, its MAC addresses and IP addresses remain unchanged, the source port field of the TCP protocol changes, the HTTP protocol is changed into the HTTPS protocol, and HTTP header fields may be changed.
Fig. 13 shows the original HTTP response header fields sent by the server to the client; note that they include a Strict-Transport-Security header field and that the Set-Cookie field sets the secure attribute for the cookie.
Fig. 14 shows the HTTP response header after modification by the proxy system; the proxy system deletes the secure attribute of the cookie together with the Strict-Transport-Security header field.
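The header adaptation of step S208 can be illustrated with a short Python sketch. It assumes the response headers have already been parsed into a list of (name, value) pairs; the function name `adapt_response_headers` and the sample header values are hypothetical and are not taken from fig. 13 or fig. 14.

```python
def adapt_response_headers(headers):
    """Drop Strict-Transport-Security and strip the secure attribute from
    every Set-Cookie header; all other headers pass through unchanged."""
    adapted = []
    for name, value in headers:
        lowered = name.lower()
        if lowered == "strict-transport-security":
            continue  # HSTS field is deleted
        if lowered == "set-cookie":
            # Cookie attributes are separated by ";"; keep all but "Secure".
            parts = [part.strip() for part in value.split(";")]
            parts = [part for part in parts if part and part.lower() != "secure"]
            value = "; ".join(parts)
        adapted.append((name, value))
    return adapted
```

For example, `("Set-Cookie", "sid=abc; Path=/; Secure; HttpOnly")` becomes `("Set-Cookie", "sid=abc; Path=/; HttpOnly")`, and a `Strict-Transport-Security` entry disappears from the list.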
When the proxy system uses interaction flow C, the working steps are S201, S202, S203, S210, S211, S212, S213, S214, S215, S204, S205, S206, S207, S208, and S209, and the message interaction flow is as shown in fig. 12. Suppose the client uses IP address 1.1.1.1 and port 1024, the server uses IP address 2.2.2.2 and ports 80 and 443, and the proxy system's own IP address is 3.3.3.3. When the client initiates HTTP access to the server, the proxy system uses IP address 2.2.2.2 and port 80 to establish an HTTP connection with the client; at the same time, it uses IP address 1.1.1.1 and a random source port to establish an HTTP connection with the real server and probes whether the HTTP response is an HTTPS redirect. Once probing is complete and the server is found to meet the HTTPS redirection condition, the proxy system closes the HTTP connection with the server and establishes an HTTPS connection with the server using IP address 1.1.1.1 and a random source port. In the subsequent communication, the connection between the client and the proxy system is HTTP and the connection between the proxy system and the server is HTTPS; the HTTPS connection is trusted because the proxy system acts as the client. Throughout the communication the source port changes, but a source port is chosen randomly by the client and is allowed to change. In the connection between the proxy system and the client the server port is 80, and in the connection between the proxy system and the server the server port is 443; the proxy system's own IP address never appears during the whole communication, so the proxy system is a bidirectional transparent proxy system.
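The three step sequences listed above can be summarized by a small Python sketch that selects the interaction flow from the two decisions made in steps S203 and S213. The function `interaction_flow` and its parameter names are hypothetical.

```python
COMMON = ["S201", "S202", "S203"]
PROBE = ["S210", "S211", "S212", "S213"]
HTTPS_PATH = ["S204", "S205", "S206", "S207", "S208"]


def interaction_flow(host_in_list: bool, probe_is_https_redirect: bool = False):
    """Return (flow name, ordered step list) for one client request."""
    if host_in_list:
        # Flow B: the domain was probed earlier, proxy over HTTPS directly.
        return "B", COMMON + HTTPS_PATH + ["S209"]
    if probe_is_https_redirect:
        # Flow C: probe over HTTP, record the domain, then proxy over HTTPS.
        return "C", COMMON + PROBE + ["S214", "S215"] + HTTPS_PATH + ["S209"]
    # Flow A: plain HTTP proxying, response returned unmodified.
    return "A", COMMON + PROBE + ["S209"]
```

The second argument is ignored when the Host is already in the domain name list, because no probing takes place in that case.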
Example 6:
Fig. 15 is a schematic diagram of the architecture of an HTTP-to-HTTPS bidirectional transparent proxy apparatus according to an embodiment of the present invention. The HTTP-to-HTTPS bidirectional transparent proxy apparatus of this embodiment includes one or more processors 21 and a memory 22. In fig. 15, one processor 21 is taken as an example.
The processor 21 and the memory 22 may be connected by a bus or other means, and the bus connection is exemplified in fig. 15.
The memory 22 is a non-volatile computer-readable storage medium and can be used to store non-volatile software programs and non-volatile computer-executable programs, such as the HTTP-to-HTTPS bidirectional transparent proxy method in embodiment 1. The processor 21 performs the HTTP-to-HTTPS bidirectional transparent proxy method by executing the non-volatile software programs and instructions stored in the memory 22.
The memory 22 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 22 may optionally include memory located remotely from the processor 21, and these remote memories may be connected to the processor 21 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The program instructions/modules are stored in the memory 22 and, when executed by the one or more processors 21, perform the HTTP-to-HTTPS bidirectional transparent proxy method of embodiment 1 described above, for example, the steps shown in fig. 3 to 9 described above.
It should be noted that, for the information interaction, execution process and other contents between the modules and units in the apparatus and system, the specific contents may refer to the description in the embodiment of the method of the present invention because the same concept is used as the embodiment of the processing method of the present invention, and are not described herein again.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the embodiments may be implemented by associated hardware as instructed by a program, which may be stored on a computer-readable storage medium, which may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.