CN109672650A - Website classification system, method and data processing method - Google Patents

Publication number
CN109672650A
CN109672650A CN201710965355.8A CN201710965355A CN109672650A CN 109672650 A CN109672650 A CN 109672650A CN 201710965355 A CN201710965355 A CN 201710965355A CN 109672650 A CN109672650 A CN 109672650A
Authority
CN
China
Prior art keywords
targeted website
website
data flow
data
targeted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710965355.8A
Other languages
Chinese (zh)
Inventor
孙建亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201710965355.8A
Publication of CN109672650A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/02: Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/04: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1095: Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Abstract

The invention discloses a website classification system, a website classification method and a data processing method. The method comprises: intercepting a data stream at the network entrance of a public cloud, wherein the data stream is the data stream generated when at least one client device accesses at least one target website; determining the at least one target website based on the data stream; and classifying the determined at least one target website. The invention solves the technical problem in the prior art that websites whose traffic is transmitted over a security protocol are difficult to classify accurately.

Description

Website classification system, method and data processing method
Technical field
The present invention relates to the Internet field, and in particular to a website classification system, a website classification method and a data processing method.
Background
Current website classification is aimed mainly at the HTTP protocol. The main reason is that HTTP is a plaintext protocol, so the URL and the related GET, POST and Response information can be extracted, and HTTP websites can be classified along multiple dimensions according to this information.
However, as HTTPS becomes increasingly widespread, many websites have begun to provide their services over HTTPS. Because the content of an HTTPS website is encrypted, HTTPS traffic looks like a pile of random numbers to a bypass observer who does not hold the certificate private key and the related random numbers, so it is difficult to classify HTTPS websites.
No effective solution has yet been proposed for the problem in the prior art that websites whose traffic is transmitted over a security protocol are difficult to classify accurately.
Summary of the invention
Embodiments of the present invention provide a website classification system, a website classification method and a data processing method, so as to at least solve the technical problem in the prior art that websites whose traffic is transmitted over a security protocol are difficult to classify accurately.
According to one aspect of the embodiments of the present invention, a website classification system is provided, comprising: at least one client device, configured to access at least one target website through the network entrance of a public cloud; a bypass mirroring system, configured to receive the data stream obtained by mirroring the original data stream at the network entrance, and to determine, based on the received data stream, the at least one target website accessed by the at least one client device; and a classification server, configured to classify the determined at least one target website.
According to another aspect of the embodiments of the present invention, a website classification method is further provided, comprising: intercepting a data stream at the network entrance of a public cloud, wherein the data stream is the data stream generated when at least one client device accesses at least one target website; determining the at least one target website based on the data stream; and classifying the determined at least one target website.
According to another aspect of the embodiments of the present invention, a data processing method is further provided, comprising: obtaining application-layer encrypted data, wherein the application-layer encrypted data includes an HTTPS data stream and the HTTPS data stream includes a Server Name Indication (SNI) field; and determining, based on the application-layer encrypted data, the target website to be accessed.
According to another aspect of the embodiments of the present invention, a storage medium is further provided. The storage medium includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the website classification method or the data processing method.
According to another aspect of the embodiments of the present invention, a processor is further provided. The processor is configured to run a program, wherein the program, when running, executes the website classification method or the data processing method.
According to another aspect of the embodiments of the present invention, a system is further provided, comprising: a processor; and a memory connected to the processor and configured to provide the processor with instructions for the following processing steps: intercepting a data stream at the network entrance of a public cloud, wherein the data stream is the data stream generated when at least one client device accesses at least one target website; determining the at least one target website based on the data stream; and classifying the determined at least one target website.
In the prior art, only websites that use the HTTP protocol or another unencrypted protocol can be classified. For a website that uses a secure transport protocol, the data stream appears to a third party as nothing but garbled data, so such a website cannot be classified. In contrast, the above embodiments of the present application intercept a data stream at the network entrance of a public cloud, wherein the data stream is the data stream generated when at least one client device accesses at least one target website, determine the at least one target website based on the data stream, and classify the determined at least one target website. The above scheme mirrors the original data stream generated when the client device accesses the target website, uses the mirrored data stream to identify the target website accessed by the client device, and can then classify the target website.
The above scheme of the present application thereby solves the technical problem in the prior art that websites whose traffic is transmitted over a security protocol are difficult to classify accurately.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of a website classification system according to Embodiment 1 of the present application;
Fig. 2 is a schematic diagram of an optional website classification system according to Embodiment 1 of the present application;
Fig. 3 is a hardware block diagram of a computer terminal (or mobile device) for implementing the website classification method;
Fig. 4 is a flowchart of the website classification method according to Embodiment 2 of the present application;
Fig. 5 is a flowchart of obtaining the SNI field according to Embodiment 2 of the present application;
Fig. 6 is a flowchart of obtaining the security certificate of a target website according to Embodiment 2 of the present application;
Fig. 7 is a schematic diagram of classifying target websites according to security certificates according to Embodiment 2 of the present application;
Fig. 8 is a schematic diagram of identifying counterfeit websites according to Embodiment 2 of the present application;
Fig. 9 is a schematic diagram of a crawling device crawling web pages according to Embodiment 2 of the present application;
Fig. 10 is a schematic diagram of classifying target websites according to crawled web page content according to Embodiment 2 of the present application;
Fig. 11 is a flowchart of crawling web page content according to Embodiment 2 of the present application;
Fig. 12 is a flowchart of the data processing method according to Embodiment 3 of the present application;
Fig. 13 is a schematic diagram of a website classification apparatus according to Embodiment 4 of the present application;
Fig. 14 is a schematic diagram of a data processing apparatus according to Embodiment 5 of the present application; and
Fig. 15 is a structural block diagram of a computer terminal according to Embodiment 6 of the present application.
Detailed description
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and are not used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
First, some of the nouns and terms that appear in the description of the embodiments of the present application are explained as follows:
HTTP: The HTTP protocol (HyperText Transfer Protocol) is the transfer protocol used to transmit hypertext from a World Wide Web server to a local browser.
SSL: The SSL protocol (Secure Sockets Layer) is a security protocol intended to provide security and data integrity for Internet communication. It encrypts network connections at the transport layer. SSL sits between TCP/IP and the various application-layer protocols and provides security support for data communication.
TLS: The TLS protocol (Transport Layer Security), whose predecessor is SSL, likewise aims to provide security support for data communication and encrypts network connections at the transport layer.
HTTPS: The HTTPS protocol (HyperText Transfer Protocol Secure) is an HTTP channel designed for security, often called HTTP over TLS, HTTP over SSL or HTTP Secure. HTTPS uses SSL/TLS to encrypt packets, and the protocol carried on top is HTTP.
SNI: SNI (Server Name Indication) is an extension field of the TLS protocol. Through SNI, the client can tell the server at the start of the handshake which website domain name it wants to access. When one host serves multiple domain names, the server can therefore use the SNI field to determine the specific domain name the client is accessing and return the corresponding certificate.
Digital certificate authority: A digital certificate authority (Certificate Authority, abbreviated CA), also called an e-commerce certification center or e-commerce certification authority, is the authoritative body responsible for issuing and managing digital certificates. As a trusted third party in e-commerce transactions, it bears responsibility for verifying the legitimacy of public keys in a public key infrastructure. The CA issues a digital certificate to each user of a public key, and the purpose of the digital certificate is to attest that the user named in the certificate legitimately owns the public key listed in the certificate.
Digital certificate: A digital certificate provides electronic authentication for secure communication between two parties. On the Internet, a company intranet or an extranet, digital certificates are used for identification and for encrypting electronic information. A digital certificate contains the identification information of the owner of the key pair (public key and private key), and the identity of the certificate holder is authenticated by verifying that this identification information is genuine.
Embodiment 1
Current website classification is aimed mainly at the HTTP protocol. The main reason is that HTTP is a plaintext protocol, so the URL and the related GET, POST and Response information can be extracted, and HTTP websites can be classified along multiple dimensions. However, as HTTPS becomes increasingly widespread, many websites have begun to provide their services over HTTPS. Because the content of an HTTPS website is encrypted, HTTPS traffic looks like a pile of random numbers to a bypass observer who does not hold the certificate private key and the related random numbers, so it is difficult to classify HTTPS websites.
To solve the above technical problem, the present application provides a corresponding solution, namely a website classification system. The system is described below and is shown in Fig. 1:
At least one client device 10, configured to access at least one target website through the network entrance of a public cloud.
The client device may be a terminal device that a user uses to access a target website through the network entrance of the public cloud, and the target website may be any website that the public cloud platform can provide.
Fig. 2 is a schematic diagram of an optional website classification system according to Embodiment 1 of the present application. As shown in Fig. 2, the client device connects to the Internet and, through the network entrance provided by the public cloud platform, accesses the target websites that the platform provides. These target websites may be websites that use a security protocol, for example websites that use the HTTPS protocol.
A bypass mirroring system 20, configured to receive the data stream obtained by mirroring the original data stream at the network entrance, and to determine, based on the received data stream, the at least one target website accessed by the at least one client device.
Specifically, the bypass mirroring system may include network devices such as switches whose ports support mirroring. The network entrance is the network entrance provided by the public cloud platform, through which client devices access target websites. The original data stream is the data stream generated when a client device accesses a target website, and the bypass mirroring system mirrors the original data stream to obtain the data stream used to determine the target website.
After obtaining the data stream, the bypass mirroring system can obtain the Server Name Indication (SNI) field from it and determine, based on the SNI field, the website domain name of the target website in the data stream, thereby determining the at least one target website.
A classification server 30, configured to classify the determined at least one target website.
After the at least one target website has been determined from the data stream, the target websites can be classified along multiple dimensions.
In the prior art, only websites that use the HTTP protocol or another unencrypted protocol can be classified. For a website that uses a secure transport protocol, the data stream appears to a third party as nothing but garbled data, so such a website cannot be classified. In contrast, the scheme provided by the above embodiments of the present application mirrors the original data stream generated when the client device accesses the target website, uses the mirrored data stream to identify the target website accessed by the client device, and can then classify the target website.
The above scheme of the present application thereby solves the technical problem in the prior art that websites whose traffic is transmitted over a security protocol are difficult to classify accurately.
As an optional embodiment, the bypass mirroring system is further configured to, when the data stream is an HTTPS data stream, obtain the Server Name Indication (SNI) field from the received data stream and determine the target website based on the SNI field.
Specifically, the SSL protocol can be identified in the data stream and the TCP packets of the original data stream obtained. The handshake packet (the Client Hello packet) is then identified among those TCP packets, and the corresponding SNI is extracted according to the TLS specification.
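The SNI extraction step described above can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: it assumes the Client Hello fits in a single TLS record that has already been reassembled from the TCP payload, and the function names `extract_sni` and `build_client_hello` are hypothetical. The second function only builds a synthetic Client Hello so the parser can be exercised without captured traffic.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the host name carried in the SNI extension of a TLS
    Client Hello record, or None if the record does not contain one."""
    try:
        if record[0] != 22:                 # record type 22 = handshake
            return None
        pos = 5                             # skip type(1) version(2) length(2)
        if record[pos] != 1:                # handshake type 1 = Client Hello
            return None
        pos += 4                            # skip msg_type(1) length(3)
        pos += 2 + 32                       # client_version + random
        pos += 1 + record[pos]              # session_id
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len               # cipher_suites
        pos += 1 + record[pos]              # compression_methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:               # extension 0 = server_name
                # list length(2), name_type(1), name length(2), name
                if record[pos + 2] != 0:    # name_type 0 = host_name
                    return None
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                name = record[pos + 5 : pos + 5 + name_len]
                if len(name) != name_len:   # truncated record
                    return None
                return name.decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None


def build_client_hello(host: str) -> bytes:
    """Build a minimal synthetic Client Hello (illustration only)."""
    name = host.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name
    sni = struct.pack("!H", len(entry)) + entry
    ext = struct.pack("!HH", 0, len(sni)) + sni
    body = (b"\x03\x03" + bytes(32) + b"\x00"      # version, random, no session id
            + struct.pack("!H", 2) + b"\x13\x01"   # one cipher suite
            + b"\x01\x00"                          # null compression
            + struct.pack("!H", len(ext)) + ext)
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


print(extract_sni(build_client_hello("shop.example.com")))
```

Because the SNI extension is sent before encryption starts, a bypass observer can read it without the certificate private key, which is why the scheme can identify HTTPS target websites from mirrored traffic.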
As an optional embodiment, the bypass mirroring system is further configured to extract from the data stream trust information that reflects the trustworthiness of the target website, and the classification server is further configured to classify the target website according to the trust information.
Specifically, the trust information of the target website may be the trust level of the target website's digital certificate authority: a well-known digital certificate authority has a higher trust level, and an obscure digital certificate authority has a lower trust level. The classification server classifies the target websites along the dimension of trust information.
As an optional embodiment, the trust information includes the security certificate of the target website, and the classification server is further configured to classify the target website according to the type of the authority that issued the security certificate.
As shown in Fig. 2, in an optional embodiment the classification server includes an algorithm center and an HTTPS website classification knowledge base, and the algorithm center stores in advance the trust levels of different digital certificate authorities. The bypass mirroring system mirrors the original data stream at the network entrance to obtain the data stream, extracts the website certificate information of the target website from the data stream, and uploads the extracted digital certificate to the algorithm center. The algorithm center determines from the digital certificate which digital certificate authority the target website uses and, based on the pre-stored trust model of digital certificate authorities, determines the trust level of that authority. Each trust level of digital certificate authorities can serve as one category: the algorithm center places the target website in the category corresponding to the level of its digital certificate authority, which completes the classification of the target website.
For example, a well-known digital certificate authority has a higher trust level and corresponds to the first category; an ordinary digital certificate authority has a lower trust level and corresponds to the second category; and a self-signed digital certificate has the lowest trust level and corresponds to the third category. On this basis, the algorithm center can divide target websites into these three categories along the trust dimension, based on the security certificates the target websites use.
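The three-category example above might be sketched as follows. This is an assumption-laden illustration: the issuer list `WELL_KNOWN_ISSUERS` and the function `classify_by_certificate` are hypothetical, the issuer names are placeholders, and treating "subject equals issuer" as self-signed is a simplification (a full check would also verify the signature against the certificate's own public key).

```python
from enum import Enum


class TrustCategory(Enum):
    WELL_KNOWN_CA = 1   # first category: higher trust level
    ORDINARY_CA = 2     # second category: lower trust level
    SELF_SIGNED = 3     # third category: lowest trust level


# Hypothetical pre-stored trust data held by the algorithm center.
WELL_KNOWN_ISSUERS = {"Example Trusted Root CA", "Example Global CA"}


def classify_by_certificate(subject: str, issuer: str) -> TrustCategory:
    """Place a site in a trust category from its certificate's subject and issuer."""
    if subject == issuer:                 # simplified self-signed test
        return TrustCategory.SELF_SIGNED
    if issuer in WELL_KNOWN_ISSUERS:
        return TrustCategory.WELL_KNOWN_CA
    return TrustCategory.ORDINARY_CA


print(classify_by_certificate("shop.example.com", "Example Trusted Root CA"))
```

A finer-grained trust model would replace the single set with a mapping from issuer to level, which matches the more detailed division mentioned in the following paragraph.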
The algorithm center can also make a finer division according to the pre-stored information about digital certificate authorities, and thereby classify target websites more finely.
After the algorithm center has classified the target websites, the classification results it obtains can be stored in the HTTPS website classification knowledge base, which provides services directly to external parties.
It should be noted that an additional counterfeit category may also be set in the above classification process: security certificates whose similarity to a higher-trust security certificate exceeds a preset value are identified, and websites that use such certificates are placed in the counterfeit category.
The security certificate of the target website can be compared with the security certificates in a security certificate information base by computing certificate similarity. The security certificate information base stores the security certificates of well-known websites, that is, security certificates with a higher trust level. When the resulting similarity is higher than the preset value, the current target website is confirmed to be a counterfeit website and is placed in the counterfeit category.
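The patent does not specify how certificate similarity is computed. The sketch below is one possible reading, using string similarity over a few certificate fields from Python's standard `difflib`; the field names (`common_name`, `organization`, `fingerprint`), the threshold value and the rule that an exact fingerprint match is the genuine certificate rather than a counterfeit are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable

Cert = Dict[str, str]


def certificate_similarity(a: Cert, b: Cert) -> float:
    """Average string similarity over selected certificate fields (0.0 to 1.0)."""
    fields = ("common_name", "organization")
    scores = [SequenceMatcher(None, a.get(f, ""), b.get(f, "")).ratio()
              for f in fields]
    return sum(scores) / len(scores)


def is_counterfeit(candidate: Cert, trusted: Iterable[Cert],
                   threshold: float = 0.8) -> bool:
    """Flag a certificate that closely imitates, but is not, a trusted one."""
    trusted = list(trusted)
    fingerprint = candidate.get("fingerprint")
    for cert in trusted:
        # Assumption: an identical fingerprint means the genuine certificate.
        if fingerprint is not None and fingerprint == cert.get("fingerprint"):
            return False
    return any(certificate_similarity(candidate, cert) > threshold
               for cert in trusted)


trusted_base = [{"common_name": "www.example.com",
                 "organization": "Example Inc", "fingerprint": "aa"}]
lookalike = {"common_name": "www.examp1e.com",
             "organization": "Example Inc", "fingerprint": "bb"}
print(is_counterfeit(lookalike, trusted_base))
```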
As an optional embodiment, the above system further includes:
A crawling device, configured to crawl the web pages of the determined at least one target website.
The classification server is further configured to classify the target website according to the content of the web pages.
Specifically, the crawling device may be a crawler. The access log of HTTPS websites can be obtained through the bypass mirroring system, a crawling policy is determined based on that access log, and crawling is carried out according to the policy to obtain the web pages of the target websites.
After the classification server obtains the web pages of the at least one target website, it can classify the target website along multiple preset dimensions according to the content of the web pages.
As an optional embodiment, the above system further includes a policy server, configured to obtain the access volume of the target website and to generate a crawl instruction according to the access volume, wherein the crawl instruction is used to trigger the crawling device to crawl web pages.
In an optional embodiment, as shown in Fig. 2, the policy server may be a crawler knowledge base. The crawler knowledge base obtains the access log from the bypass mirroring system, determines a crawling policy from the access log and generates crawl instructions. The HTTPS website deep-crawling system then schedules the crawling device according to the crawl instructions and uploads the crawl results to the classification server.
The classification server may include an algorithm center and an HTTPS website classification knowledge base. The crawling device uploads the crawl results to the algorithm center, and the algorithm center classifies the websites according to the crawled web page content. Specifically, the algorithm center can use algorithms such as word segmentation and clustering to classify the target websites along different dimensions.
For example, the crawler knowledge base uses the access log to sort the web pages of the target websites by access volume from high to low, takes the top n web pages in the sorted result and generates crawl instructions for crawling these n web pages. The HTTPS website deep-crawling system schedules the crawling device according to the crawl instructions to crawl these n web pages.
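The top-n selection in this example can be sketched with Python's `collections.Counter`. The function name `select_crawl_targets` and the log format (one domain name per access) are assumptions made for illustration; a real access log would first need the domain extracted from each entry, for example from the SNI field.

```python
from collections import Counter
from typing import Iterable, List


def select_crawl_targets(access_log: Iterable[str], n: int) -> List[str]:
    """Return the n most frequently accessed entries, highest volume first."""
    counts = Counter(access_log)
    return [site for site, _ in counts.most_common(n)]


log = ["a.example", "b.example", "a.example",
       "c.example", "a.example", "b.example"]
print(select_crawl_targets(log, 2))
```

Each returned entry would then become one crawl instruction handed to the deep-crawling system.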
The algorithm center performs word segmentation on the text content in the crawl results and clusters the segmentation results, thereby classifying the target websites.
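The patent names word segmentation and clustering but does not say which algorithms are used. The sketch below substitutes deliberately simple stand-ins: a regular-expression tokenizer in place of a real word segmenter (Chinese text would need a dedicated segmentation library) and a greedy single-pass grouping by Jaccard similarity in place of a full clustering algorithm. The function names and the threshold are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Stand-in for word segmentation: lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sites(pages: Dict[str, str], threshold: float = 0.3) -> List[List[str]]:
    """Greedily group sites whose page text overlaps a cluster's first member."""
    clusters: List[Tuple[Set[str], List[str]]] = []
    for site, text in pages.items():
        tokens = tokenize(text)
        for representative, members in clusters:
            if jaccard(tokens, representative) >= threshold:
                members.append(site)
                break
        else:
            clusters.append((tokens, [site]))   # start a new cluster
    return [members for _, members in clusters]


pages = {
    "shop-a.example": "buy shoes online cheap shoes store",
    "shop-b.example": "online store buy cheap bags",
    "news.example": "latest world news headlines today",
}
print(cluster_sites(pages))
```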
It should be noted that an untargeted deep crawl of the entire network lacks focus, so crawling efficiency is low and the resulting classification is inaccurate. In the above scheme, therefore, the policy server specifies crawl instructions and the crawling device is scheduled according to those instructions to crawl at least one web page. Different crawl instructions can be set for different crawling purposes, so crawling in this way is targeted and purposeful. Classifying on the basis of the at least one crawled web page then yields targeted classification results.
Embodiment 2
According to an embodiment of the present invention, an embodiment of a website classification method is further provided. It should be noted that the steps shown in the flowcharts of the drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described can be executed in an order different from the one given here.
The method embodiment provided in Embodiment 1 of the present application can be executed in a mobile terminal, a computer terminal or a similar computing device. Fig. 3 shows a hardware block diagram of a computer terminal (or mobile device) for implementing the website classification method. As shown in Fig. 3, the computer terminal 30 (or mobile device 30) may include one or more processors 302 (shown in the figure as 302a, 302b, ..., 302n; a processor 302 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 304 for storing data, and a transmission module 306 for communication functions. In addition, it may include a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply and/or a camera. A person of ordinary skill in the art will understand that the structure shown in Fig. 3 is only illustrative and does not limit the structure of the above electronic device. For example, the computer terminal 30 may include more or fewer components than shown in Fig. 3, or have a configuration different from that shown in Fig. 3.
It should be noted that the one or more processors 302 and/or other data processing circuits described above may generally be referred to herein as a "data processing circuit". The data processing circuit may be embodied wholly or partly as software, hardware, firmware or any combination of these. In addition, the data processing circuit may be a single independent processing module, or may be wholly or partly integrated into any of the other elements in the computer terminal 30 (or mobile device). As involved in the embodiments of the present application, the data processing circuit acts as a kind of processor control (for example, selection of a variable-resistance terminal path connected to an interface).
The memory 304 can be used to store software programs and modules of application software, such as the program instructions/data storage corresponding to the website classification method in the embodiments of the present invention. By running the software programs and modules stored in the memory 304, the processor 302 executes various functional applications and data processing, that is, implements the above website classification method. The memory 304 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. In some examples, the memory 304 may further include memory located remotely from the processor 302, and such remote memory can be connected to the computer terminal 30 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network and combinations of these.
The transmission device 306 is used to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the computer terminal 30. In one example, the transmission device 306 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 306 may be a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.
The display may be, for example, a touch-screen liquid crystal display (LCD), which enables the user to interact with the user interface of the computer terminal 30 (or mobile device).
It should be noted here that, in some optional embodiments, the computer device (or mobile device) shown in Fig. 3 may include hardware elements (including circuits), software elements (including computer code stored on a computer-readable medium) or a combination of hardware and software elements. It should be pointed out that Fig. 3 is only one example of a particular embodiment and is intended to show the types of components that may be present in the above computer device (or mobile device).
In the above operating environment, the present application provides the website classification method shown in Fig. 4. Fig. 4 is a flowchart of the website classification method according to Embodiment 2 of the present application.
Step S41: intercept a data stream at the Web portal of the public cloud, wherein the data stream is the data stream generated when at least one client device accesses at least one target website.
In the above step, the data stream may be obtained by a bypass mirroring device that mirrors the original data stream generated when the client device accesses the at least one target website.
Step S43: determine the at least one target website based on the data stream, and classify the at least one determined target website.
In the prior art, only websites that use the HTTP protocol, that is, websites that use an unencrypted protocol, can be classified. For a website that uses a secure transport protocol, the data stream appears to a third party only as unintelligible ciphertext, so the website cannot be classified. In contrast, the above embodiment of the present application intercepts a data stream at the Web portal of the public cloud, wherein the data stream is generated when at least one client device accesses at least one target website, determines the at least one target website based on the data stream, and classifies the at least one determined target website. In this scheme, the original data stream of the client device accessing the target website is mirrored to obtain the data stream, the target website accessed by the client device is obtained from the data stream, and the target website can then be classified.
The above scheme of the present application thus solves the technical problem in the prior art that websites transmitting over a secure protocol are difficult to classify accurately.
As an optional embodiment, intercepting the data stream at the Web portal of the public cloud includes:
Step S411: obtain the original data stream from the Web portal, mirror the original data stream, and use the data stream obtained by mirroring as the intercepted data stream.
As shown in Fig. 2, the bypass mirroring system can intercept the original data stream at the Web portal of the public cloud and then mirror it using its mirroring function to obtain the above data stream.
As an optional embodiment, the data stream includes an HTTPS data stream, and determining the at least one target website based on the data stream includes:
Step S431: obtain the Server Name Indication (SNI) field from the received data stream, and determine the target website based on the SNI field.
The SNI field is an extension field of the TLS protocol. Through SNI, the client informs the server at the start of the handshake of the website domain name it wants to access, so the target website can be determined by obtaining the SNI field.
Specifically, the SSL protocol can be identified in the data stream and the TCP packets of the original data stream obtained; the handshake packet (the Client Hello packet) is identified among those TCP packets, and the corresponding SNI is extracted according to the TLS specification.
Fig. 5 is a flowchart of obtaining the SNI field according to Embodiment 2 of the present application. How the SNI field is obtained is described below in conjunction with Fig. 2.
S51: receive a TCP packet. The TCP packet can be obtained from the original data stream.
S52: judge whether the packet uses the SSL protocol. The SSL protocol in the original data stream can be identified by the SSL protocol identification module, and the TCP packet is passed to the SNI extraction module. If the judgment result is that the SSL protocol is used, proceed to step S53; if the judgment result is that the SSL protocol is not used, return to step S51 and obtain a TCP packet again.
S53: judge whether the packet is a Client Hello packet. The content of the TCP packet is identified; if the content is a Client Hello packet, proceed to step S54; if it is not, return to S53 and obtain the next message for judgment.
S54: judge whether the SNI can be extracted correctly. If so, proceed to step S55; otherwise, execute S54 again and continue extracting.
S55: extract the SNI field. Because the SNI field carries the website domain name that the client device tells the server it wants to access at the start of the handshake, the domain name of the target website can be obtained by extracting the SNI field.
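The flow of steps S52 to S55 can be sketched in Python as follows. This is only a minimal illustration, not the implementation of the SSL protocol identification module or the SNI extraction module: it assumes that a complete Client Hello arrives in a single TCP payload, and the names `extract_sni` and `build_client_hello` are hypothetical helpers introduced here. The byte layout follows the public TLS specification (record header, handshake header, Client Hello fields, then the extension list in which extension type 0 is server_name).

```python
import struct
from typing import Optional


def extract_sni(payload: bytes) -> Optional[str]:
    """Return the SNI host name in a TLS Client Hello, or None if absent."""
    try:
        # S52: a TLS handshake record starts with content type 0x16.
        if payload[0] != 0x16:
            return None
        # S53: handshake message type 0x01 is Client Hello.
        if payload[5] != 0x01:
            return None
        pos = 9                          # 5-byte record header + 4-byte handshake header
        pos += 2 + 32                    # client version + random
        pos += 1 + payload[pos]          # session id
        (suites_len,) = struct.unpack_from("!H", payload, pos)
        pos += 2 + suites_len            # cipher suites
        pos += 1 + payload[pos]          # compression methods
        (ext_total,) = struct.unpack_from("!H", payload, pos)
        pos += 2
        end = min(pos + ext_total, len(payload))
        # S54/S55: walk the extension list looking for server_name (type 0).
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", payload, pos)
            pos += 4
            if ext_type == 0:
                # Layout: list length (2), name type (1), name length (2), name.
                if payload[pos + 2] != 0:   # 0 means host_name
                    return None
                (name_len,) = struct.unpack_from("!H", payload, pos + 3)
                name = payload[pos + 5:pos + 5 + name_len]
                if len(name) != name_len:
                    return None
                return name.decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None


def build_client_hello(host: str) -> bytes:
    """Build a tiny synthetic Client Hello carrying only an SNI extension."""
    name = host.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name
    sni = struct.pack("!H", len(entry)) + entry
    ext = struct.pack("!HH", 0, len(sni)) + sni
    body = (b"\x03\x03" + bytes(32) + b"\x00"
            + struct.pack("!H", 2) + b"\x13\x01"
            + b"\x01\x00"
            + struct.pack("!H", len(ext)) + ext)
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake
```

Calling `extract_sni(build_client_hello("example.com"))` on the synthetic packet yields the host name, while a plain HTTP payload yields `None`. A real deployment would also need TCP stream reassembly, since a Client Hello may span several segments.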
As an optional embodiment, classifying the at least one determined target website includes:
Step S433: extract, from the data stream, reliability information that reflects the credibility of the target website, and classify the target website according to the reliability information.
Specifically, the reliability information of the target website may be the credibility level of the digital certificate authority (CA) of the target website: a well-known certificate authority has higher credibility, and a little-known certificate authority has lower credibility. The classification server classifies the target website along the dimension of credibility according to the reliability information.
As an optional embodiment, the reliability information includes the security certificate of the target website, and classifying the target website according to the reliability information includes: classifying the target website according to the type of the certificate authority of the security certificate.
Specifically, the type of a certificate authority can be determined from its pre-stored popularity, or it can be the result of classifying certificate authorities by their credibility. Classifying target websites according to the type of certificate authority means that target websites whose certificate authorities belong to the same category are placed in one class.
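As a rough illustration of grouping websites by the category of their certificate authority, the sketch below uses a pre-stored lookup table. The table `CA_CREDIBILITY`, the issuer names in it, the level labels, and the function `classify_by_issuer` are all hypothetical and are not taken from the application; an unknown issuer is assumed to fall into the lowest level.

```python
from typing import Dict, List

# Hypothetical pre-stored table: certificate authority name -> credibility level.
CA_CREDIBILITY: Dict[str, str] = {
    "Example Trusted CA": "high",
    "Example Regional CA": "medium",
}


def classify_by_issuer(site_issuers: Dict[str, str],
                       default: str = "low") -> Dict[str, List[str]]:
    """Group domains so that sites whose issuers share a level form one class."""
    groups: Dict[str, List[str]] = {}
    for domain, issuer in site_issuers.items():
        level = CA_CREDIBILITY.get(issuer, default)  # unknown issuers: assumed low
        groups.setdefault(level, []).append(domain)
    return groups
```

For example, two domains whose certificates come from "Example Trusted CA" end up together in the "high" group, and a domain with an unlisted issuer ends up in "low".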
Fig. 6 is a flowchart of obtaining the security certificate of the target website according to Embodiment 2 of the present application, and is described below with reference to Fig. 6.
S61: receive a TCP packet. The TCP packet can be obtained from the original data stream.
S62: judge whether the packet uses the SSL protocol. The SSL protocol in the original data stream can be identified by the SSL protocol identification module, and the TCP packet is passed to the SNI extraction module. If the judgment result is that the SSL protocol is used, proceed to step S63; if the judgment result is that the SSL protocol is not used, return to step S61 and obtain a TCP packet again.
S63: judge whether the packet is a Client Hello packet. The content of the TCP packet is identified; if the content is a Client Hello packet, proceed to step S64; if it is not, return to S63 and obtain the next message for judgment.
S64: judge whether the security certificate can be extracted correctly. If so, proceed to step S65; otherwise, execute S64 again and continue extracting.
S65: extract the security certificate.
Fig. 7 is a schematic diagram of classifying a target website according to its security certificate according to Embodiment 2 of the present application. After the security certificate of the target website is extracted, the HTTP website credibility classification result is obtained from the HTTP certificate information (the security certificate of the target website) and a preset credibility model of certificate authorities. In an optional embodiment, the reliability information of the security certificate is input into the credibility model of certificate authorities, the HTTP website credibility classification is computed, and the category corresponding to the resulting credibility is taken as the category to which the target website belongs.
It should be noted that an additional counterfeit category can also be set in the above classification process: security certificates whose similarity to a high-credibility security certificate exceeds a preset value are identified, and websites using such certificates are placed in the counterfeit category.
Fig. 8 is a schematic diagram of identifying a counterfeit website according to Embodiment 2 of the present application. As shown in Fig. 8, a certificate similarity is calculated between the HTTP certificate information (the security certificate of the target website) and the security certificates in a security certificate information base, which stores the security certificates of well-known websites, that is, security certificates with high credibility. When the resulting similarity is higher than a preset value, the current target website is confirmed to be a counterfeit website and is placed in the counterfeit category.
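The application does not specify how certificate similarity is computed, so the following is only one possible sketch. It assumes that the comparison is made on certificate subject strings using the ratio from Python's standard `difflib.SequenceMatcher`, and that an exact match with a stored certificate is the genuine site rather than a counterfeit. The list `KNOWN_SUBJECTS`, the threshold value, and the function name `is_counterfeit` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List

# Hypothetical security certificate information base: subjects of well-known sites.
KNOWN_SUBJECTS: List[str] = ["CN=www.example.com, O=Example Inc"]


def is_counterfeit(subject: str, threshold: float = 0.8) -> bool:
    """Flag a subject that closely imitates, but does not equal, a known one."""
    if subject in KNOWN_SUBJECTS:
        return False  # assumption: an exact match is the genuine website
    return any(
        SequenceMatcher(None, subject, known).ratio() > threshold
        for known in KNOWN_SUBJECTS
    )
```

With this sketch, a subject that differs from a stored one by a single look-alike character (such as `examp1e` for `example`) is flagged, whereas an unrelated subject is not. A production system would likely compare more certificate fields than the subject alone.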
As an optional embodiment, classifying the at least one determined target website includes:
crawling the web page content of the at least one target website, and classifying the at least one target website based on the web page content.
In the above step, the web pages of the target website to be crawled can be determined according to the access volume of the target website, and a crawling device can be invoked by issuing a crawl instruction to crawl the determined web pages.
Specifically, crawling can follow the flowchart shown in Fig. 9.
S91: the crawler knowledge base issues a crawl instruction.
S92: judge whether the priority of the target website is higher than that of the current crawl task. If it is, proceed to step S93; otherwise, proceed to step S94.
S93: crawl immediately according to the crawl instruction.
S94: place the crawl instruction in the crawl queue.
In the above steps, a crawl instruction placed in the crawl queue waits to be executed; after finishing the current crawl instruction, the crawling device goes on to execute the next crawl instruction in the queue.
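The scheduling decision in steps S92 to S94 can be sketched as follows. The class `CrawlScheduler` and its methods are hypothetical names. Two details are assumptions not stated in the text: a task that is preempted by a higher-priority instruction is put back into the queue, and the queue is served in priority order (ties in arrival order).

```python
import heapq
from typing import List, Optional, Tuple


class CrawlScheduler:
    def __init__(self) -> None:
        self.current: Optional[Tuple[int, str]] = None   # (priority, url) in progress
        self._queue: List[Tuple[int, int, str]] = []     # (-priority, seq, url)
        self._seq = 0

    def _push(self, priority: int, url: str) -> None:
        heapq.heappush(self._queue, (-priority, self._seq, url))
        self._seq += 1

    def submit(self, priority: int, url: str) -> str:
        """S92: compare the new instruction with the current crawl task."""
        if self.current is None or priority > self.current[0]:
            if self.current is not None:
                self._push(*self.current)   # assumption: preempted task is re-queued
            self.current = (priority, url)
            return "crawl_now"              # S93
        self._push(priority, url)
        return "queued"                     # S94

    def finish_current(self) -> Optional[str]:
        """After the current task completes, take the next queued instruction."""
        if not self._queue:
            self.current = None
            return None
        neg_priority, _, url = heapq.heappop(self._queue)
        self.current = (-neg_priority, url)
        return url
```

A typical sequence would be: submit a task with priority 5 (crawled at once), submit one with priority 3 (queued), then submit one with priority 9 (crawled at once, with the priority 5 task returned to the queue).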
Classifying the at least one target website can be carried out in several steps, namely word segmentation, determination of the website's topic words, and topic-word clustering, which are described in detail below with reference to Fig. 10.
Fig. 10 is a schematic diagram of classifying target websites according to crawled web page content according to an embodiment of the present application. As shown in Fig. 10, the text in the crawled web page content is segmented into words to obtain a segmentation result, and topic words are then computed from the segmentation result; the topic words can be determined from the frequency with which each word occurs. The topic words of the target websites are compared so that the target websites are clustered by topic word, yielding the classification result of the target websites.
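The three steps of Fig. 10 (word segmentation, topic-word calculation by frequency, and topic-word clustering) can be illustrated with the small sketch below. It makes several simplifying assumptions: segmentation is a regular-expression split that only suits space-delimited text (Chinese pages would need a proper word segmenter), the stop-word list is a toy one, and clustering is a greedy grouping by Jaccard overlap of topic-word sets. The function names and the `min_overlap` parameter are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for"}


def topic_words(text: str, k: int = 3) -> Set[str]:
    """Segment the text and keep the k most frequent non-stop-words."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return {word for word, _ in Counter(tokens).most_common(k)}


def cluster_sites(pages: Dict[str, str], min_overlap: float = 0.5) -> List[List[str]]:
    """Greedily group sites whose topic-word sets overlap enough (Jaccard)."""
    clusters: List[Tuple[Set[str], List[str]]] = []
    for site, text in pages.items():
        words = topic_words(text)
        for seed_words, members in clusters:
            union = words | seed_words
            if union and len(words & seed_words) / len(union) >= min_overlap:
                members.append(site)
                break
        else:
            clusters.append((words, [site]))   # no similar cluster: start a new one
    return [members for _, members in clusters]
```

Two pages dominated by the words "football", "match" and "sports" would fall into one cluster, while a page dominated by "price", "cart" and "checkout" would form its own.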
As an optional embodiment, crawling the web page content of the at least one target website includes: crawling the content of the current web page; when the content includes a website link, judging whether the website link is a same-site link; and, if the judgment result is yes, continuing to crawl the web page corresponding to the website link.
Fig. 11 is a flowchart of crawling web page content according to Embodiment 2 of the present application. As shown in Fig. 11, it includes the following steps:
S111: crawl the current web page.
S112: extract all HTTPS links in the current web page.
S113: judge whether a link is a same-site link. If it is, proceed to step S114; otherwise, proceed to step S115.
S114: continue with in-depth crawling.
S115: put the link into the crawler knowledge base.
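The link handling in steps S112 to S115 can be sketched with Python's standard `urllib.parse` module. The function `split_links` is a hypothetical helper, and treating "same-site" as "same host name" is an assumption; the text does not define the comparison rule.

```python
from typing import List, Tuple
from urllib.parse import urljoin, urlparse


def split_links(page_url: str, hrefs: List[str]) -> Tuple[List[str], List[str]]:
    """Return (same-site links to crawl deeper, other links for the knowledge base)."""
    site_host = urlparse(page_url).hostname
    same_site: List[str] = []
    external: List[str] = []
    for href in hrefs:
        absolute = urljoin(page_url, href)        # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme != "https":              # S112: only HTTPS links are kept
            continue
        if parsed.hostname == site_host:          # S113: same host, assumed same site
            same_site.append(absolute)            # S114: crawl in depth
        else:
            external.append(absolute)             # S115: hand to crawler knowledge base
    return same_site, external
```

For a page at `https://www.example.com/index.html`, the relative link `/about` resolves to a same-site URL, while `https://other.example.org/x` goes to the second list.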
Embodiment 3
According to an embodiment of the present invention, an embodiment of a data processing method is also provided. Fig. 12 is a flowchart of the data processing method according to Embodiment 3 of the present application.
Step S121: obtain application-layer encrypted data, wherein the application-layer encrypted data includes an HTTPS data stream, and the HTTPS data stream includes a Server Name Indication (SNI) field.
Specifically, the application-layer encrypted data may be a data stream intercepted at the Web portal of the public cloud, for example a data stream obtained by a bypass mirroring device mirroring the original data stream generated when a client device accesses at least one target website.
Step S123: determine the target website to be accessed based on the application-layer encrypted data.
As an optional embodiment, after the target website to be accessed is determined based on the application-layer encrypted data, the method further includes: classifying the determined target website.
In the prior art, only websites that use the HTTP protocol, that is, websites that use an unencrypted protocol, can be classified. For a website that uses a secure transport protocol, the data stream appears to a third party only as unintelligible ciphertext, so the website cannot be classified. In contrast, the above embodiment of the present application obtains application-layer encrypted data, determines the target website to be accessed based on the application-layer encrypted data, and classifies the determined target website. In this scheme, the target website that the client device is about to access is determined from the application-layer encrypted data, and the target website can then be classified.
The above scheme of the present application thus solves the technical problem in the prior art that websites transmitting over a secure protocol are difficult to classify accurately.
As an optional embodiment, the application-layer encrypted data includes an HTTPS data stream, and the HTTPS data stream includes a Server Name Indication (SNI) field.
The SNI field is an extension field of the TLS protocol. Through SNI, the client can inform the server at the start of the handshake of the website domain name it wants to access, so the target website can be determined by obtaining the SNI field.
As an optional embodiment, the data stream is an HTTPS data stream, and classifying the at least one determined target website includes:
Step S1251: extract, from the data stream, reliability information that reflects the credibility of the target website, and classify the target website according to the reliability information; or,
Step S1253: crawl the web pages of the at least one determined target website, and classify the target website according to the content of the web pages.
The above scheme provides two ways of classifying a target website: the first classifies according to the extracted reliability information, and the second classifies the target website using the crawled web pages of the target website.
In the first way, the reliability information of the target website may be the credibility of the digital certificate authority of the target website: a well-known certificate authority has higher credibility, and a little-known certificate authority has lower credibility. The classification server classifies the target website along the dimension of credibility according to the reliability information.
In the second way, the access volume of web pages in the target website can be obtained from access logs, a crawl instruction is generated according to the access volume, and a crawling device is scheduled according to the crawl instruction to crawl the web pages. The crawled content is then analyzed: the text in it is segmented into words, the topic words of the target website are obtained from the frequency of each word, and the topic words of the target websites are then clustered to obtain the classification result of the target websites.
It should be noted that, for simplicity of description, each of the above method embodiments is stated as a series of combined actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. In addition, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
Embodiment 4
According to an embodiment of the present invention, a website classification device for implementing the above website classification method is also provided. As shown in Fig. 13, the device includes:
An interception module 130, configured to intercept a data stream at the Web portal of the public cloud, wherein the data stream is the data stream generated when at least one client device accesses at least one target website.
A determining module 132, configured to determine the at least one target website based on the data stream and to classify the at least one determined target website.
It should be noted here that the interception module 130 and the determining module 132 correspond to steps S41 to S43 in Embodiment 2; the two modules share the examples and application scenarios of the corresponding steps, but are not limited to the content disclosed in Embodiment 1 above. It should be noted that, as part of the device, the above modules may run in the computer terminal 15 provided in Embodiment 1.
Embodiment 5
According to an embodiment of the present invention, a data processing device for implementing the above data processing method is also provided. As shown in Fig. 14, the device includes:
An obtaining module 140, configured to obtain application-layer encrypted data, wherein the application-layer encrypted data includes an HTTPS data stream, and the HTTPS data stream includes a Server Name Indication (SNI) field.
A determining module 142, configured to determine the target website to be accessed based on the application-layer encrypted data.
It should be noted here that the obtaining module 140 and the determining module 142 correspond to steps S121 to S123 in Embodiment 3; the two modules share the examples and application scenarios of the corresponding steps, but are not limited to the content disclosed in Embodiment 1 above. It should be noted that, as part of the device, the above modules may run in the terminal 15 provided in Embodiment 1.
Embodiment 6
An embodiment of the present invention may provide a terminal, which may be any computer terminal in a group of terminals. Optionally, in this embodiment, the terminal may also be replaced by a terminal device such as a mobile terminal.
Optionally, in this embodiment, the terminal may be located in at least one of a plurality of network devices of a computer network.
In this embodiment, the terminal may execute program code for the following steps of the website classification method: intercepting a data stream at the Web portal of the public cloud, wherein the data stream is the data stream generated when at least one client device accesses at least one target website; and determining the at least one target website based on the data stream and classifying the at least one determined target website.
Optionally, Fig. 15 is a structural block diagram of a terminal according to Embodiment 6 of the present application. As shown in Fig. 15, the terminal A may include one or more processors 1502 (only one is shown in the figure), a memory 1504, and a peripheral interface 1506.
The memory can be used to store software programs and modules, such as the program instructions/modules corresponding to the website classification method and device in the embodiments of the present invention. The processor runs the software programs and modules stored in the memory to execute various functional applications and data processing, thereby implementing the above website classification method. The memory may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and such remote memory may be connected to the terminal 15 through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call, through the transmission device, the information and application programs stored in the memory to execute the following steps: intercepting a data stream at the Web portal of the public cloud, wherein the data stream is the data stream generated when at least one client device accesses at least one target website; and determining the at least one target website based on the data stream and classifying the at least one determined target website.
Optionally, the processor may also execute program code for the following steps: obtaining the original data stream from the Web portal; and mirroring the original data stream and using the data stream obtained by mirroring as the intercepted data stream.
Optionally, the processor may also execute program code for the following steps: obtaining the Server Name Indication (SNI) field from the received data stream; and determining the target website based on the SNI field.
Optionally, the processor may also execute program code for the following steps: extracting, from the data stream, reliability information that reflects the credibility of the target website; and classifying the target website according to the reliability information.
Optionally, the processor may also execute program code for the following steps: the reliability information includes the security certificate of the target website; and classifying the target website according to the reliability information includes classifying the target website according to the type of the certificate authority of the security certificate.
Optionally, the processor may also execute program code for the following steps: crawling the web page content of the at least one target website; and classifying the at least one target website based on the web page content.
Optionally, the processor may also execute program code for the following steps: crawling the content of the current web page; when the content includes a website link, judging whether the website link is a same-site link; and, if the judgment result is yes, continuing to crawl the web page corresponding to the website link.
In the prior art, only websites that use the HTTP protocol, that is, websites that use an unencrypted protocol, can be classified. For a website that uses a secure transport protocol, the data stream appears to a third party only as unintelligible ciphertext, so the website cannot be classified. In contrast, the above embodiment of the present application intercepts a data stream at the Web portal of the public cloud, wherein the data stream is generated when at least one client device accesses at least one target website, determines the at least one target website based on the data stream, and classifies the at least one determined target website. In this scheme, the original data stream of the client device accessing the target website is mirrored to obtain the data stream, the target website accessed by the client device is obtained from the data stream, and the target website can then be classified.
The above scheme of the present application thus solves the technical problem in the prior art that websites transmitting over a secure protocol are difficult to classify accurately.
Those of ordinary skill in the art can understand that the structure shown in Fig. 15 is only illustrative. The terminal may also be a terminal device such as a smartphone (for example an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID), or a PAD. Fig. 15 does not limit the structure of the above electronic device. For example, the terminal 15 may include more or fewer components than shown in Fig. 15 (such as a network interface or a display device), or have a configuration different from that shown in Fig. 15.
Those of ordinary skill in the art can understand that all or part of the steps in the methods of the above embodiments can be completed by a program instructing the hardware of a terminal device. The program can be stored in a computer-readable storage medium, and the storage medium may include a flash disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or the like.
Embodiment 7
An embodiment of the present invention also provides a storage medium. Optionally, in this embodiment, the storage medium can be used to save the program code executed by the website classification method provided in Embodiment 1 above.
Optionally, in this embodiment, the storage medium may be located in any computer terminal in a group of computer terminals in a computer network, or in any mobile terminal in a group of mobile terminals.
Optionally, in this embodiment, the storage medium is configured to store program code for executing the following steps: intercepting a data stream at the Web portal of the public cloud, wherein the data stream is the data stream generated when at least one client device accesses at least one target website; and determining the at least one target website based on the data stream and classifying the at least one determined target website.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (19)

1. A website classification system, characterized by comprising:
at least one client device, configured to access at least one target website through the Web portal of a public cloud;
a bypass mirroring system, configured to receive, from the Web portal, a data stream obtained by mirroring an original data stream, and to determine, based on the received data stream, the at least one target website accessed by the at least one client device;
a classification server, configured to classify the at least one determined target website.
2. The system according to claim 1, characterized in that the bypass mirroring system is further configured to, when the data stream is an HTTPS data stream, obtain a Server Name Indication (SNI) field from the received data stream, and determine the target website based on the SNI field.
3. The system according to claim 1, characterized in that the bypass mirroring system is further configured to extract, from the data stream, reliability information that reflects the credibility of the target website; and the classification server is further configured to classify the target website according to the reliability information.
4. The system according to claim 3, characterized in that the reliability information includes the security certificate of the target website; and the classification server is further configured to classify the target website according to the type of the certificate authority of the security certificate.
5. The system according to claim 1, characterized in that
the system further comprises: a crawling device, configured to crawl a web page of the determined at least one target website; and
the classification server is further configured to classify the target website according to the content of the web page.
6. The system according to claim 5, characterized in that the system further comprises: a policy server, configured to obtain the access volume of the target website and to generate a crawl instruction according to the access volume, wherein the crawl instruction is used to trigger the crawling device to crawl the web page.
7. A website classification method, characterized by comprising:
intercepting a data stream at a network entry point of a public cloud, wherein the data stream is the data stream generated when at least one client device accesses at least one target website; and
determining the at least one target website based on the data stream, and classifying the determined at least one target website.
8. The method according to claim 7, characterized in that intercepting the data stream at the network entry point of the public cloud comprises:
obtaining an original data stream from the network entry point; and mirroring the original data stream, and using the data stream obtained by mirroring as the intercepted data stream.
9. The method according to claim 7, characterized in that the data stream comprises an HTTPS data stream, and determining the at least one target website based on the data stream comprises:
obtaining a Server Name Indication (SNI) field from the received data stream; and determining the target website based on the SNI field.
10. The method according to claim 7, characterized in that classifying the determined at least one target website comprises:
extracting, from the data stream, credibility information reflecting the trustworthiness of the target website; and classifying the target website according to the credibility information.
11. The method according to claim 10, characterized in that the credibility information comprises a security certificate of the target website, and classifying the target website according to the credibility information comprises:
classifying the target website according to the type of the certification authority of the security certificate.
12. The method according to claim 7, characterized in that classifying the determined at least one target website comprises:
crawling the web page content of the at least one target website; and classifying the at least one target website based on the web page content.
13. The method according to claim 12, characterized in that crawling the web page content of the at least one target website comprises:
crawling the content of a current web page; when the content includes a website link, judging whether the website link is a same-site link; and, if so, continuing to crawl the web page corresponding to the website link.
14. A data processing method, characterized by comprising:
obtaining application-layer encrypted data, wherein the application-layer encrypted data comprises an HTTPS data stream, and the HTTPS data stream comprises a Server Name Indication (SNI) field; and
determining, based on the application-layer encrypted data, a target website to be accessed.
15. The method according to claim 14, characterized in that, after determining the target website to be accessed based on the application-layer encrypted data, the method further comprises:
classifying the determined target website.
16. The method according to claim 14, characterized in that the data stream is an HTTPS data stream, and classifying the determined at least one target website comprises:
extracting, from the data stream, credibility information reflecting the trustworthiness of the target website, and classifying the target website according to the credibility information; or,
crawling a web page of the determined at least one target website, and classifying the target website according to the content of the web page.
17. A storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program runs, the device on which the storage medium is located is controlled to execute the website classification method according to any one of claims 7 to 10, or the data processing method according to any one of claims 14 to 16.
18. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, executes the website classification method according to any one of claims 7 to 10, or the data processing method according to any one of claims 14 to 16.
19. A system, characterized by comprising:
a processor; and
a memory, connected to the processor and configured to provide the processor with instructions for the following processing steps: intercepting a data stream at a network entry point of a public cloud, wherein the data stream is the data stream generated when at least one client device accesses at least one target website; and determining the at least one target website based on the data stream, and classifying the determined at least one target website.
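
Claims 2, 9, and 14 rely on reading the Server Name Indication (SNI) field from an HTTPS data stream to determine the target website. The following Python sketch illustrates one way this could be done on a mirrored packet payload; it is not taken from the patent. The function name `extract_sni` is hypothetical, and the sketch assumes the TLS ClientHello arrives complete in a single TLS record and carries the SNI extension in cleartext.

```python
import struct
from typing import Optional


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI host name from a TLS ClientHello record, or None.

    Assumption: `record` holds one complete TLS record (e.g. the first
    TCP payload of a mirrored HTTPS connection).
    """
    try:
        # TLS record header: content type (1), version (2), length (2).
        if record[0] != 0x16:            # 0x16 = handshake record
            return None
        pos = 5
        if record[pos] != 0x01:          # 0x01 = ClientHello
            return None
        pos += 4                         # handshake type (1) + length (3)
        pos += 2 + 32                    # client version + random
        pos += 1 + record[pos]           # session ID (length-prefixed)
        (suites_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + suites_len            # cipher suites
        pos += 1 + record[pos]           # compression methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        # Walk the extension list: type (2), length (2), data.
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:            # 0 = server_name extension
                # Layout: list length (2), name type (1), name length (2), name.
                name_type = record[pos + 2]
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                name = record[pos + 5 : pos + 5 + name_len]
                if name_type != 0 or len(name) != name_len:
                    return None
                return name.decode("ascii")
            pos += ext_len
        return None
    except (IndexError, struct.error, UnicodeDecodeError):
        # Truncated or malformed input: treat as "no SNI found".
        return None
```

In a bypass deployment of the kind the claims describe, such a function would be applied to the mirrored copy of the traffic rather than to the live connection, so a parsing failure does not affect the client. The sketch does not handle a ClientHello split across several records or an encrypted ClientHello.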
CN201710965355.8A 2017-10-17 2017-10-17 Websites collection system, method and data processing method Pending CN109672650A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710965355.8A CN109672650A (en) 2017-10-17 2017-10-17 Websites collection system, method and data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710965355.8A CN109672650A (en) 2017-10-17 2017-10-17 Websites collection system, method and data processing method

Publications (1)

Publication Number Publication Date
CN109672650A true CN109672650A (en) 2019-04-23

Family

ID=66140355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710965355.8A Pending CN109672650A (en) 2017-10-17 2017-10-17 Websites collection system, method and data processing method

Country Status (1)

Country Link
CN (1) CN109672650A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115766204A (en) * 2022-11-14 2023-03-07 电子科技大学 Dynamic IP equipment identification system and method for encrypted flow
CN115766204B (en) * 2022-11-14 2024-04-26 电子科技大学 Dynamic IP equipment identification system and method for encrypted traffic

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7533179B2 (en) * 2001-09-20 2009-05-12 Hitwise Pty, Ltd. Method and system for characterization of online behavior
CN101630330A (en) * 2009-08-14 2010-01-20 苏州锐创通信有限责任公司 Method for webpage classification
CN101977235A (en) * 2010-11-03 2011-02-16 北京北信源软件股份有限公司 URL (Uniform Resource Locator) filtering method aiming at HTTPS (Hypertext Transport Protocol Server) encrypted website access
CN103229479A (en) * 2012-12-28 2013-07-31 华为技术有限公司 Website identification method and device and network system
CN105117434A (en) * 2015-08-07 2015-12-02 北京品友互动信息技术有限公司 Webpage classification method and webpage classification system
CN105141575A (en) * 2015-06-25 2015-12-09 北京网康科技有限公司 Encrypted application identification and encrypted webpage content classification methods, and encrypted application identification and/or encrypted webpage content classification devices

Similar Documents

Publication Publication Date Title
CN109672651A (en) Intercepting processing method, system and the data processing method of website visiting
CN106710017B (en) Identity verification method, device and system for logistics signing
CN102710770A (en) Identification method for network access equipment and implementation system for identification method
CN110351228A (en) Remote entry method, device and system
CN104660557B (en) operation processing method and device
CN107431712A (en) Network flow daily record for multi-tenant environment
CN109428878A (en) Leak detection method, detection device and detection system
CN109167797A (en) Analysis of Network Attack method and apparatus
CN110247934A (en) The method and system of internet-of-things terminal abnormality detection and response
IL295578B1 (en) Secure methods and systems for environmental credit scoring
CN105554009B (en) A method of passing through Network Data Capture device operating system information
CN105245489B (en) Verification method and device
CN110399225A (en) Monitoring information processing method, system and computer system
CN106850687A (en) Method and apparatus for detecting network attack
CN112532605B (en) Network attack tracing method and system, storage medium and electronic device
CN108429653A (en) A kind of test method, equipment and system
US20180302437A1 (en) Methods of identifying and counteracting internet attacks
Medhat et al. Testing techniques in IoT-based systems
CN107666471A (en) Method and apparatus for protecting website
CN108337235A (en) A kind of method and system executing safety operation using safety equipment
CN110536118A (en) A kind of data capture method, device and computer storage medium
CN111385272A (en) Weak password detection method and device
CN109446807A (en) The method, apparatus and electronic equipment of malicious robot are intercepted for identification
CN110196920A (en) The treating method and apparatus and storage medium and electronic device of text data
CN109413004A (en) Verification method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190423