CN109905372A - Transmission system Data Transport Protocol identifying and analyzing method - Google Patents


Info

Publication number
CN109905372A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910069203.9A
Other languages
Chinese (zh)
Inventor
李惠英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-01-24
Filing date
2019-01-24
Publication date
2019-06-18
Application filed by Individual
Priority to CN201910069203.9A
Publication of CN109905372A

Abstract

The present invention relates to the technical field of transmission systems, and in particular to a data transport protocol identification and analysis method for a transmission system, comprising the following steps. Step S101: parse network packet data into session data. Step S102: judge whether the entity part of the session data conforms to a preset data format. Step S103: perform multi-pattern matching on the session data that conforms to the preset data format. Step S104: determine the extraction function according to the data format label of the session data and the hit position of the preset feature string. With this method, judging the parsed session data against the preset data format filters out invalid data that does not conform to it, which shortens data extraction time. In addition, performing multi-pattern matching on the conforming session data further reduces the search-and-match time during extraction and improves the efficiency of full-text extraction from massive data.

Description

Transmission system Data Transport Protocol identifying and analyzing method
Technical field
The present invention relates to the technical field of transmission systems, and in particular to a data transport protocol identification and analysis method for a transmission system.
Background art
The rapid development of the Internet has carried data into every industry and business function, where it is becoming an important factor of production. With it has come a massive volume of data that people can analyze and process. In cities of medium size or larger, such as Beijing and Shanghai, the data generated by network activity each day already exceeds the petabyte (PB) scale. Mobile phone applications, for example, generate several terabytes (TB) of submitted data every day. These data contain information such as longitude and latitude, mobile phone serial number, subscriber identity module (SIM) card number, and unique device identifier, all of which is highly useful in the security monitoring industry. Extracting this information at massive scale has therefore become an important and complex task.
There are two main traditional full-text extraction methods. The first is template-based extraction, which suits information extraction from specific websites but is powerless against the data produced by ever-changing mobile apps and by many different websites. The second extracts all content with regular expressions, which suits offline full-text extraction on small data volumes but becomes inefficient when faced with massive volumes of app-submitted data. With large data volumes, both methods consume considerable manpower and are too inefficient to meet the demand.
Chinese invention patent CN 108063741 A discloses a transport protocol conversion method, comprising: receiving a transmission data packet sent by a client or by a video networking server; when the transmission data packet is sent by the client, converting it into a video networking data packet and sending it to the video networking server; and when the transmission data packet is sent by the video networking server, converting it into an Ethernet data packet and sending it to the client.
Summary of the invention
The technical problem to be solved by the present invention is to provide a data transport protocol identification and analysis method that can extract data information efficiently.
To solve the above technical problem, the data transport protocol identification and analysis method for a transmission system of the present invention includes the following steps:
Step S101: parse network packet data into session data;
Step S102: judge whether the entity part of the session data conforms to a preset data format and, if so, label the session data with its data format;
Step S103: perform multi-pattern matching on the session data that conforms to the preset data format, judge whether a preset feature string is hit, and obtain the hit position of the preset feature string when it is hit;
Step S104: according to the data format label of the session data and the hit position of the preset feature string, determine the extraction function corresponding to the session data, and extract data from the session data using that extraction function.
Preferably, in step S101 the network packet data uses the Hypertext Transfer Protocol (HTTP) and is parsed into HTTP POST session data using an HTTP protocol stack; the parsed session data includes an HTTP header and an HTTP entity part.
Preferably, in step S101, parsing into HTTP POST session data using the HTTP protocol stack first reassembles the Transmission Control Protocol (TCP) stream and then parses it as an HTTP session.
Preferably, in step S103 the preset feature strings are managed through a configuration file, and the number of preset feature strings is greater than or equal to 1.
Preferably, in step S103, after the configuration file is read to obtain the preset feature strings, the state tree of the multi-pattern matching algorithm is generated.
Preferably, in step S104, invalid data that does not conform to the preset data format is filtered out by judging the parsed session data against the preset data format.
With the above method, judging the parsed session data against the preset data format filters out invalid data that does not conform to it, which shortens data extraction time. In addition, performing multi-pattern matching on the session data that conforms to the preset data format further reduces the search-and-match time during extraction and improves the efficiency of full-text extraction from massive data.
Brief description of the drawings
The present invention is described in further detail below with reference to the accompanying drawing and specific embodiments.
Fig. 1 illustrates the data transport protocol identification and analysis method for a transmission system of the present invention.
Detailed description of the embodiments
As shown in Fig. 1, the data transport protocol identification and analysis method for a transmission system of the present invention includes the following steps.
Step S101: parse network packet data into session data. This step can extract data communicated over various protocols; Hypertext Transfer Protocol (HTTP) data is used as the example in the detailed description below. The network packet data obtained from the data source is first parsed into session data in text format. HTTP data is parsed into HTTP POST session data using an HTTP protocol stack. The parsed session data includes the HTTP header and the HTTP entity part. Restoring HTTP POST session data with an HTTP protocol stack requires first reassembling the Transmission Control Protocol (TCP) stream and then parsing it as an HTTP session; the open-source software Snort, for example, can be used to implement this function.
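Step S101 can be illustrated with a minimal Python sketch. It is not the Snort-based implementation mentioned above: the function names `reassemble` and `parse_http_post`, the `HttpPostSession` type, and the representation of TCP segments as `(sequence number, payload)` pairs are assumptions made for illustration, and the sketch ignores sequence-number wraparound and chunked transfer encoding.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class HttpPostSession(NamedTuple):
    """Session data in text form: request path, header fields, entity part."""
    path: str
    headers: Dict[str, str]
    entity: bytes


def reassemble(segments: List[Tuple[int, bytes]]) -> bytes:
    """Join TCP payloads in sequence-number order, skipping retransmitted bytes."""
    out = bytearray()
    next_seq: Optional[int] = None
    for seq, payload in sorted(segments):
        if next_seq is None:
            next_seq = seq
        if seq + len(payload) <= next_seq:
            continue  # segment was already seen in full (retransmission)
        if seq > next_seq:
            break  # gap in the stream: stop at the last contiguous byte
        out += payload[next_seq - seq:]  # drop any overlapping prefix
        next_seq = seq + len(payload)
    return bytes(out)


def parse_http_post(stream: bytes) -> Optional[HttpPostSession]:
    """Parse one HTTP POST request from a reassembled stream, else return None."""
    head, sep, rest = stream.partition(b"\r\n\r\n")
    if not sep:
        return None  # header block is not terminated
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or parts[0] != "POST":
        return None  # only POST sessions are kept
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        if colon:
            headers[name.strip().lower()] = value.strip()
    try:
        length = int(headers.get("content-length", "0"))
    except ValueError:
        return None
    if len(rest) < length:
        return None  # entity part is truncated
    return HttpPostSession(parts[1], headers, rest[:length])
```

Calling `parse_http_post(reassemble(segments))` yields the header and entity part that the later steps operate on, or `None` for streams that are not complete POST requests.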
Step S102: judge whether the entity part of the session data conforms to a preset data format and, if so, label the session data with its data format.
After the network packet data has been parsed into session data, the entity part of the session data is examined to judge whether it conforms to a preset data format. The preset data formats can be customized as needed. In general they are chosen so that the data to be extracted appears only in those formats. This operation filters out session data that does not conform to any preset data format, which avoids running extraction on invalid data and saves extraction time. When the entity part of the session data conforms to a preset data format, the session data is labeled with its data format to identify which preset data format it belongs to.
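The format check and labeling of step S102 can be sketched as follows. The source does not name the preset data formats, so the two used here (JSON and URL-encoded form data), the label strings `"json"` and `"form"`, and the function name `label_format` are assumptions chosen for illustration.

```python
import json
from typing import Optional
from urllib.parse import parse_qsl


def label_format(entity: bytes) -> Optional[str]:
    """Return a data format label for the entity part, or None to filter it out."""
    try:
        text = entity.decode("utf-8")
    except UnicodeDecodeError:
        return None  # binary payloads match no preset format here
    stripped = text.strip()
    if not stripped:
        return None
    if stripped[0] in "{[":
        try:
            json.loads(stripped)
        except ValueError:
            return None  # looks like JSON but does not parse
        return "json"
    try:
        # strict_parsing rejects fields that are not of the form key=value
        pairs = parse_qsl(stripped, strict_parsing=True)
    except ValueError:
        return None
    return "form" if pairs else None
```

Session data for which `label_format` returns `None` would be discarded before the multi-pattern matching step, which is where the time saving described above comes from.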
Step S103: perform multi-pattern matching on the session data that conforms to the preset data format, judge whether a preset feature string is hit, and obtain the hit position of the preset feature string when it is hit.
The preset feature strings are managed through a configuration file, and the number of preset feature strings is greater than or equal to 1. A preset feature string can be, for example, "phonenumber" (telephone number), "MAC" (hardware address), or "mac". Managing the preset feature strings through a configuration file, which takes place before the multi-pattern matching is performed, allows them to be added and deleted flexibly. The multi-pattern matching algorithm used here can be, for example, the Aho-Corasick (AC) algorithm; other open-source algorithms can also be used.
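A configuration file for the preset feature strings could be read as in the sketch below. The source does not specify the file layout, so the one assumed here (one feature string per line, `#` for comment lines) and the function name `load_feature_strings` are hypothetical.

```python
from typing import Iterable, List


def load_feature_strings(lines: Iterable[str]) -> List[str]:
    """Collect preset feature strings, one per line, ignoring blanks and comments."""
    features: List[str] = []
    seen = set()
    for raw in lines:
        item = raw.strip()
        if not item or item.startswith("#"):
            continue
        if item not in seen:  # keep the first occurrence, preserve file order
            seen.add(item)
            features.append(item)
    if not features:
        # the method requires at least one preset feature string
        raise ValueError("at least one preset feature string is required")
    return features
```

Because the function accepts any iterable of lines, it can be given an open file object directly; adding or deleting a feature string then only requires editing the file and reloading it.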
The preset feature strings are obtained by reading the configuration file, and the state tree of the multi-pattern matching algorithm is then generated from them so that the subsequent multi-pattern matching operation can be carried out.
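The state tree generation and the matching pass can be sketched with a compact Aho-Corasick automaton. This is a textbook form of the AC algorithm rather than the implementation used by the invention; the class name `AhoCorasick` and its attribute names are assumptions.

```python
from collections import deque
from typing import Dict, List, Tuple


class AhoCorasick:
    """Multi-pattern matcher: a trie (state tree) plus failure links."""

    def __init__(self, patterns: List[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # per-state transitions
        self.fail: List[int] = [0]              # failure link per state
        self.out: List[List[str]] = [[]]        # patterns ending at each state
        for pattern in patterns:
            self._insert(pattern)
        self._build_failure_links()

    def _insert(self, pattern: str) -> None:
        state = 0
        for ch in pattern:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[state][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            state = nxt
        self.out[state].append(pattern)

    def _build_failure_links(self) -> None:
        # Breadth-first traversal; depth-1 states keep failure link 0.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # inherit patterns that end at the failure target
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text: str) -> List[Tuple[int, str]]:
        """Return (hit position, feature string) pairs in one pass over text."""
        hits: List[Tuple[int, str]] = []
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                hits.append((i - len(pattern) + 1, pattern))
        return hits
```

The automaton is built once when the configuration is loaded and then reused for every session, so all preset feature strings are searched in a single scan of each entity part instead of one scan per string.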
Step S104: according to the data format label of the session data and the hit position of the preset feature string, determine the extraction function corresponding to the session data, and extract data from the session data using that extraction function.
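One way to picture step S104 is a lookup table from data format label to extraction function, with the hit position telling the function where to start reading. The sketch below assumes the hypothetical labels `"form"` and `"json"`; the function names and the `EXTRACTORS` table are illustrative, and the JSON extractor handles only simple quoted string values without escaped quotes.

```python
import re
from typing import Callable, Dict, Optional

Extractor = Callable[[str, int, str], Optional[str]]

# Matches the tail of a JSON key and a quoted string value: ": "value"
_JSON_VALUE = re.compile(r'"\s*:\s*"([^"]*)"')


def extract_form_value(text: str, pos: int, feature: str) -> Optional[str]:
    """Read the value that follows `feature=` up to the next '&'."""
    start = pos + len(feature)
    if text[start:start + 1] != "=":
        return None  # the hit is not a form field name
    end = text.find("&", start)
    return text[start + 1:end if end != -1 else len(text)]


def extract_json_value(text: str, pos: int, feature: str) -> Optional[str]:
    """Read a quoted string value that follows the key `"feature"`."""
    match = _JSON_VALUE.match(text, pos + len(feature))
    return match.group(1) if match else None


# Hypothetical mapping from data format label to extraction function.
EXTRACTORS: Dict[str, Extractor] = {
    "form": extract_form_value,
    "json": extract_json_value,
}


def extract(format_label: str, text: str, pos: int, feature: str) -> Optional[str]:
    """Select the extraction function by format label and apply it at the hit."""
    extractor = EXTRACTORS.get(format_label)
    return extractor(text, pos, feature) if extractor else None
```

Because the hit position is already known from the matching step, each extraction function only inspects the bytes next to the hit instead of searching the whole entity part again.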
Performing multi-pattern matching on the session data that conforms to the preset data format reduces the search-and-match time during extraction and improves the efficiency of full-text extraction from massive data.
The data extraction method provided by the present invention is suitable not only for extracting data from specific websites and offline data, but even more so for extracting data from unspecified websites and high-volume traffic. A single processing thread can handle up to 10,000 valid HTTP sessions per second, which achieves the goal of extracting full-text data at massive scale.
Although specific embodiments of the present invention have been described above, those skilled in the art should understand that they are merely illustrative. Various changes or modifications can be made to these embodiments without departing from the principle and substance of the present invention, and the protection scope of the present invention is limited only by the appended claims.

Claims (6)

1. A data transport protocol identification and analysis method for a transmission system, characterized by comprising the following steps:
Step S101: parsing network packet data into session data;
Step S102: judging whether the entity part of the session data conforms to a preset data format and, if so, labeling the session data with its data format;
Step S103: performing multi-pattern matching on the session data that conforms to the preset data format, judging whether a preset feature string is hit, and obtaining the hit position of the preset feature string when it is hit;
Step S104: according to the data format label of the session data and the hit position of the preset feature string, determining the extraction function corresponding to the session data, and extracting data from the session data using the extraction function.
2. The data transport protocol identification and analysis method for a transmission system according to claim 1, characterized in that: in step S101 the network packet data uses the Hypertext Transfer Protocol and is parsed into HTTP POST session data using an HTTP protocol stack; the parsed session data includes an HTTP header and an HTTP entity part.
3. The data transport protocol identification and analysis method for a transmission system according to claim 2, characterized in that: in step S101, parsing into HTTP POST session data using the HTTP protocol stack first reassembles the Transmission Control Protocol stream and then parses it as an HTTP session.
4. The data transport protocol identification and analysis method for a transmission system according to claim 1, characterized in that: in step S103 the preset feature strings are managed through a configuration file, and the number of preset feature strings is greater than or equal to 1.
5. The data transport protocol identification and analysis method for a transmission system according to claim 1, characterized in that: in step S103, after the configuration file is read to obtain the preset feature strings, the state tree of the multi-pattern matching algorithm is generated.
6. The data transport protocol identification and analysis method for a transmission system according to claim 1, characterized in that: in step S104, invalid data that does not conform to the preset data format is filtered out by judging the parsed session data against the preset data format.
CN201910069203.9A 2019-01-24 2019-01-24 Transmission system Data Transport Protocol identifying and analyzing method Pending CN109905372A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910069203.9A CN109905372A (en) 2019-01-24 2019-01-24 Transmission system Data Transport Protocol identifying and analyzing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910069203.9A CN109905372A (en) 2019-01-24 2019-01-24 Transmission system Data Transport Protocol identifying and analyzing method

Publications (1)

Publication Number Publication Date
CN109905372A true CN109905372A (en) 2019-06-18

Family

ID=66944133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910069203.9A Pending CN109905372A (en) 2019-01-24 2019-01-24 Transmission system Data Transport Protocol identifying and analyzing method

Country Status (1)

Country Link
CN (1) CN109905372A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN105302885A (en) * | 2015-10-15 | 2016-02-03 | 北京锐安科技有限公司 | Full-text data extraction method and device
US20170208037A1 (en) * | 2014-06-23 | 2017-07-20 | Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. | Method and system for providing deep packet inspection as a service
CN108287887A (en) * | 2018-01-16 | 2018-07-17 | 北京奇艺世纪科技有限公司 | A kind of multi-mode matching method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20170208037A1 (en) * | 2014-06-23 | 2017-07-20 | Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. | Method and system for providing deep packet inspection as a service
CN105302885A (en) * | 2015-10-15 | 2016-02-03 | 北京锐安科技有限公司 | Full-text data extraction method and device
CN108287887A (en) * | 2018-01-16 | 2018-07-17 | 北京奇艺世纪科技有限公司 | A kind of multi-mode matching method and device

Similar Documents

Publication Publication Date Title
US9893970B2 (en) Data loss monitoring of partial data streams
CN105162626B (en) Network flow depth recognition system and recognition methods based on many-core processor
CN101741644B (en) Flow detection method and apparatus
CN102664935B (en) Method and system for associated output of WEB class user behavior and user information
WO2019114700A1 (en) Traffic analysis method, public service traffic attribution method and corresponding computer system
CN102655482B (en) HTTP (hyper text transport protocol) protocol analysis based web E-mail recovering method
CN103384213B (en) A kind of detected rule Optimal Configuration Method and equipment
CN106101015A (en) A kind of mobile Internet traffic classes labeling method and system
CN110519298A (en) A kind of Tor method for recognizing flux and device based on machine learning
CN106330584A (en) Identification method and identification device of business flow
CN102098331A (en) Method and system for reducing WEB type application contents
CN102129528A (en) WEB page tampering identification method and system
CN103401850A (en) Message filtering method and device
CN103905482B (en) Method, push server and the system of pushed information
CN105302885A (en) Full-text data extraction method and device
CN105429950A (en) Network flow identification system and method based on dynamic data packet sampling
CN109275045B (en) DFI-based mobile terminal encrypted video advertisement traffic identification method
CN108289125A (en) TCP sessions recombination based on Stream Processing and statistical data extracting method
CN108462615A (en) A kind of network user's group technology and device
CN104079450B (en) Feature mode set creation method and device
CN109905372A (en) Transmission system Data Transport Protocol identifying and analyzing method
CN104270358B (en) Trustable network transaction system client monitor and its implementation
CN103051501B (en) Detection method for identifying network data according to network data recovery manner
CN103634164B (en) A kind of method and system for obtaining flow information
CN109995784A (en) A kind of data extraction accelerated method based on UDP

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190618