CN109905372A - Data transmission protocol identification and analysis method for a transmission system - Google Patents
- Publication number: CN109905372A
- Application number: CN201910069203.9A
- Authority: CN (China)
- Prior art keywords: data, session, transmission system, session data, feature string
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention relates to the technical field of transmission systems, and in particular to a data transmission protocol identification and analysis method for a transmission system. The method comprises the following steps. Step S101: parse network packet data into session data. Step S102: judge whether the entity part of the session data conforms to a preset data format. Step S103: perform multi-pattern matching on the session data that conforms to the preset data format. Step S104: determine the extraction function corresponding to the session data according to the data format label of the session data and the hit position of the preset feature string, and extract data accordingly. With this method, the parsed session data is checked against the preset data format, so invalid data that does not conform to the format can be filtered out, which shortens data extraction time. In addition, performing multi-pattern matching on the session data that conforms to the preset data format further reduces the search and matching time during extraction and improves the efficiency of full-text extraction from massive data.
Description
Technical field
The invention relates to the technical field of transmission systems, and in particular to a data transmission protocol identification and analysis method for a transmission system.
Background art
The rapid development of the Internet has carried data into every industry and business function, where it is becoming an important factor of production. With it comes a mass of data that people can analyze and process. In medium-sized and larger cities such as Beijing and Shanghai, the data generated by network activity already exceeds the petabyte (PB) scale every day. Mobile applications, for example, generate several terabytes of submitted data per day. This data contains information such as longitude and latitude, handset serial numbers, subscriber identity module (SIM) card numbers and unique device identifiers, which is very useful in the security monitoring industry, so extracting this information at massive scale has become an important and complicated task.
There are two main traditional full-text extraction methods. The first is template-based extraction, which suits information extraction from specific websites but is powerless against the data produced by ever-changing mobile apps and diverse websites. The second extracts the entire content with regular expressions; it suits offline full-text extraction on small data volumes, but its efficiency is low when faced with massive app submission data. With large data volumes, both methods therefore consume a great deal of labor and are too inefficient to meet the need.
Chinese invention patent CN 108063741 A discloses a transmission protocol conversion method, comprising: receiving a transmission data packet sent by a client or a video-networking server; when the transmission data packet is sent by the client, converting it into a video-networking data packet and sending it to the video-networking server; and when the transmission data packet is sent by the video-networking server, converting it into an Ethernet data packet and sending it to the client.
Summary of the invention
The technical problem to be solved by the invention is to provide a data transmission protocol identification and analysis method that can extract data information efficiently.
To solve this technical problem, the data transmission protocol identification and analysis method for a transmission system of the invention comprises the following steps:
Step S101: parse network packet data into session data;
Step S102: judge whether the entity part of the session data conforms to a preset data format and, if so, label the session data with its data format;
Step S103: perform multi-pattern matching on the session data that conforms to the preset data format, judge whether a preset feature string is hit, and obtain the hit position of the preset feature string when it is hit;
Step S104: determine the extraction function corresponding to the session data according to the data format label of the session data and the hit position of the preset feature string, and extract data from the session data with that extraction function.
Preferably, in step S101 the network packet data uses the Hypertext Transfer Protocol and is parsed into HTTP POST session data by an HTTP protocol stack; the parsed session data includes an HTTP header and an HTTP entity part.
Preferably, in step S101, parsing into HTTP POST session data with the HTTP protocol stack first reassembles the Transmission Control Protocol stream and then parses it as an HTTP session.
Preferably, in step S103 the preset feature strings are managed through a configuration file, and the number of preset feature strings is greater than or equal to 1.
Preferably, in step S103 the state tree of the multi-pattern matching algorithm is generated after the configuration file has been read to obtain the preset feature strings.
Preferably, in step S104, invalid data that does not conform to the preset data format is filtered out by judging the parsed session data against the preset data format.
With this method, the parsed session data is checked against the preset data format, so invalid data that does not conform to the format can be filtered out, which shortens data extraction time. In addition, performing multi-pattern matching on the session data that conforms to the preset data format further reduces the search and matching time during extraction and improves the efficiency of full-text extraction from massive data.
Brief description of the drawings
The invention is described in further detail below with reference to the accompanying drawing and specific embodiments.
Fig. 1 shows the data transmission protocol identification and analysis method for a transmission system of the invention.
Detailed description of the embodiments
As shown in Fig. 1, the data transmission protocol identification and analysis method for a transmission system of the invention comprises the following steps.
Step S101: parse network packet data into session data. This step can extract data communicated over various protocols; Hypertext Transfer Protocol (HTTP) data is used below as an example. The network packet data obtained from the data source is first parsed into session data in text format. HTTP data is parsed into HTTP POST session data by an HTTP protocol stack. The parsed session data includes an HTTP header and an HTTP entity part. Restoring HTTP POST session data with the HTTP protocol stack requires first reassembling the Transmission Control Protocol (TCP) stream and then parsing it as an HTTP session; this function can be implemented, for example, with the open-source software Snort.
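The two stages of step S101 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation and not Snort's API: the function names `reassemble_stream` and `parse_http_post` are hypothetical, segments are assumed to arrive as (sequence number, payload) pairs for one direction of one connection, and sequence-number wraparound and chunked transfer encoding are ignored.

```python
from typing import Dict, List, Optional, Tuple


def reassemble_stream(segments: List[Tuple[int, bytes]]) -> bytes:
    """Rebuild one direction of a TCP stream from (sequence number, payload) pairs."""
    stream = bytearray()
    next_seq: Optional[int] = None
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        if next_seq is None:
            next_seq = seq
        if seq + len(payload) <= next_seq:
            continue  # retransmission that is already fully covered
        if seq > next_seq:
            break  # a segment is missing; keep only the contiguous prefix
        stream += payload[next_seq - seq:]  # skip any overlapping bytes
        next_seq = seq + len(payload)
    return bytes(stream)


def parse_http_post(stream: bytes) -> Optional[Tuple[Dict[str, str], bytes]]:
    """Split a reassembled stream into HTTP headers and entity part.

    Returns None when the stream is not a complete HTTP POST request head.
    """
    head, sep, rest = stream.partition(b"\r\n\r\n")
    if not sep:
        return None
    lines = head.decode("iso-8859-1").split("\r\n")
    if not lines[0].startswith("POST "):
        return None
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        if colon:
            headers[name.strip().lower()] = value.strip()
    raw_len = headers.get("content-length", "")
    length = int(raw_len) if raw_len.isdigit() else len(rest)
    return headers, rest[:length]
```

Sorting by sequence number before concatenating is what lets out-of-order and retransmitted segments be handled before any HTTP parsing takes place.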
Step S102: judge whether the entity part of the session data conforms to a preset data format and, if so, label the session data with its data format.
After the network packet data has been parsed into session data, the entity part of the session data is examined to judge whether it conforms to a preset data format. The preset data formats can be customized as needed. In general, the preset data formats are chosen as needed, and the data to be extracted is usually found only in those formats, so this operation filters out session data that does not conform to any preset data format. Extraction from invalid data is thereby avoided, which saves data extraction time. When the entity part of the session data conforms to a preset data format, the session data is labeled with its data format to identify which preset data format it belongs to.
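The format check and labeling of step S102 might look like the sketch below. The source does not name the preset data formats, so the two used here (JSON and URL-encoded form data), the labels `"json"` and `"form"`, and the function name `label_data_format` are all assumptions made for illustration.

```python
import json
from typing import Optional
from urllib.parse import parse_qsl


def label_data_format(entity: bytes) -> Optional[str]:
    """Return a format label for the entity part, or None to filter the session out."""
    try:
        text = entity.decode("utf-8")
    except UnicodeDecodeError:
        return None  # binary payloads are treated as invalid data here
    stripped = text.strip()
    if not stripped:
        return None
    if stripped[0] in "{[":
        try:
            json.loads(stripped)
        except json.JSONDecodeError:
            return None
        return "json"
    try:
        # strict_parsing makes parse_qsl raise ValueError on fields without "="
        pairs = parse_qsl(stripped, strict_parsing=True)
    except ValueError:
        return None
    return "form" if pairs else None
```

A session whose entity part yields `None` would be dropped before any pattern matching, which is where the time saving described above comes from.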
Step S103: perform multi-pattern matching on the session data that conforms to the preset data format, judge whether a preset feature string is hit, and obtain the hit position of the preset feature string when it is hit.
The preset feature strings are managed through a configuration file, and their number is greater than or equal to 1. A preset feature string may be, for example, "phonenumber" (telephone number), "MAC" (hardware address) or "mac". Optionally, before the multi-pattern matching is performed on the session data that conforms to the preset data format, the method further includes managing the preset feature strings through a configuration file. Managing them this way allows feature strings to be added and deleted flexibly. The multi-pattern matching algorithm used in this work may be, for example, the AC (Aho-Corasick) algorithm; other open-source algorithms may also be used.
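As a sketch of configuration-file management, the snippet below reads feature strings from a plain-text file. The file layout (one feature string per line, `#` starting a comment) and the function names are assumptions; the source does not describe the configuration format.

```python
from pathlib import Path
from typing import List


def parse_feature_strings(config_text: str) -> List[str]:
    """Parse feature strings from configuration text, one per line, '#' for comments."""
    features: List[str] = []
    for line in config_text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry and entry not in features:  # skip blanks and duplicates
            features.append(entry)
    if not features:
        # The method requires the number of feature strings to be >= 1.
        raise ValueError("configuration must define at least one feature string")
    return features


def load_feature_strings(path: str) -> List[str]:
    """Read the configuration file at `path` (hypothetical location) and parse it."""
    return parse_feature_strings(Path(path).read_text(encoding="utf-8"))
```

Adding or deleting a feature string then means editing one line of the file and reloading, with no change to the matching code.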
The preset feature strings are obtained by reading the configuration file, and the state tree of the multi-pattern matching algorithm is then generated from them, which makes the subsequent multi-pattern matching operation possible.
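Taking the AC (Aho-Corasick) algorithm as the multi-pattern matcher, state-tree generation and matching can be sketched as follows. The class name `AhoCorasick` and its method names are hypothetical, and this compact version is written for readability rather than throughput.

```python
from collections import deque
from typing import Dict, List, Tuple


class AhoCorasick:
    """Minimal Aho-Corasick automaton reporting (hit position, feature string)."""

    def __init__(self, patterns: List[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # trie edges per state
        self.fail: List[int] = [0]              # failure link per state
        self.out: List[List[str]] = [[]]        # patterns ending at each state
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern: str) -> None:
        state = 0
        for ch in pattern:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[state][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            state = nxt
        self.out[state].append(pattern)

    def _build_failure_links(self) -> None:
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                fallback = self.fail[state]
                while fallback and ch not in self.goto[fallback]:
                    fallback = self.fail[fallback]
                self.fail[nxt] = self.goto[fallback].get(ch, 0)
                # Inherit matches that end at the failure target.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text: str) -> List[Tuple[int, str]]:
        hits: List[Tuple[int, str]] = []
        state = 0
        for index, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                hits.append((index - len(pattern) + 1, pattern))
        return hits
```

The automaton is built once from the feature strings and then scans each session body in a single pass, regardless of how many feature strings are configured; this is the property that separates it from running one regular expression per feature string.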
Step S104: determine the extraction function corresponding to the session data according to the data format label of the session data and the hit position of the preset feature string, and extract data from the session data with that extraction function.
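One way to read step S104 is as a dispatch table keyed by the data format label, with each extraction function starting from the hit position. The sketch below assumes two hypothetical labels, `"form"` and `"json"`, and hypothetical extractor names; the source does not specify how its extraction functions are organized.

```python
import json
from typing import Callable, Dict, Optional
from urllib.parse import unquote_plus

Extractor = Callable[[str, int, str], Optional[str]]


def extract_form_value(text: str, hit: int, feature: str) -> Optional[str]:
    """Read the value after 'feature=' up to the next '&' in form-encoded data."""
    start = hit + len(feature)
    if text[start:start + 1] != "=":
        return None
    end = text.find("&", start)
    end = len(text) if end == -1 else end
    return unquote_plus(text[start + 1:end])


def extract_json_value(text: str, hit: int, feature: str) -> Optional[str]:
    """Read the JSON value that follows a key found at the hit position."""
    key_end = hit + len(feature)
    colon = text.find(":", key_end)
    if colon == -1 or text[key_end:colon].strip() != '"':
        return None  # the hit was not a complete JSON key
    pos = colon + 1
    while pos < len(text) and text[pos] in " \t\r\n":
        pos += 1  # raw_decode does not skip leading whitespace
    try:
        value, _ = json.JSONDecoder().raw_decode(text, pos)
    except json.JSONDecodeError:
        return None
    return str(value)


# Hypothetical registry: data format label -> extraction function.
EXTRACTORS: Dict[str, Extractor] = {
    "form": extract_form_value,
    "json": extract_json_value,
}


def extract(format_label: str, text: str, hit: int, feature: str) -> Optional[str]:
    """Pick the extraction function by format label and apply it at the hit position."""
    extractor = EXTRACTORS.get(format_label)
    return extractor(text, hit, feature) if extractor else None
```

Because the hit position is already known from the matching step, each extractor only inspects the few characters around the hit instead of rescanning the whole session body.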
Performing multi-pattern matching on the session data that conforms to the preset data format reduces the search and matching time during data extraction and improves the efficiency of full-text extraction from massive data.
The data extraction method provided by the invention is applicable not only to specific websites and offline data extraction but also, and more so, to data extraction from unspecified websites and high-volume traffic. A single processing thread can handle up to 10,000 valid HTTP sessions per second, which achieves the goal of extracting full-text data at massive scale.
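As a rough reading of this throughput figure, and assuming (the source states only a peak rate) that 10,000 sessions per second could be sustained for a full day, a single thread would process on the order of:

$$10^{4}\ \text{sessions/s} \times 86{,}400\ \text{s/day} = 8.64 \times 10^{8}\ \text{sessions/day}$$

Actual daily volume would depend on session size and on how much traffic survives the format filter in step S102.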
Although specific embodiments of the invention have been described above, those skilled in the art should understand that they are merely illustrative. Various changes or modifications can be made to these embodiments without departing from the principle and substance of the invention, and the scope of protection of the invention is limited only by the appended claims.
Claims (6)
1. A data transmission protocol identification and analysis method for a transmission system, characterized by comprising the following steps:
Step S101: parse network packet data into session data;
Step S102: judge whether the entity part of the session data conforms to a preset data format and, if so, label the session data with its data format;
Step S103: perform multi-pattern matching on the session data that conforms to the preset data format, judge whether a preset feature string is hit, and obtain the hit position of the preset feature string when it is hit;
Step S104: determine the extraction function corresponding to the session data according to the data format label of the session data and the hit position of the preset feature string, and extract data from the session data with that extraction function.
2. The data transmission protocol identification and analysis method for a transmission system according to claim 1, characterized in that: in step S101 the network packet data uses the Hypertext Transfer Protocol and is parsed into HTTP POST session data by an HTTP protocol stack; the parsed session data includes an HTTP header and an HTTP entity part.
3. The data transmission protocol identification and analysis method for a transmission system according to claim 2, characterized in that: in step S101, parsing into HTTP POST session data with the HTTP protocol stack first reassembles the Transmission Control Protocol stream and then parses it as an HTTP session.
4. The data transmission protocol identification and analysis method for a transmission system according to claim 1, characterized in that: in step S103 the preset feature strings are managed through a configuration file, and the number of preset feature strings is greater than or equal to 1.
5. The data transmission protocol identification and analysis method for a transmission system according to claim 1, characterized in that: in step S103 the state tree of the multi-pattern matching algorithm is generated after the configuration file has been read to obtain the preset feature strings.
6. The data transmission protocol identification and analysis method for a transmission system according to claim 1, characterized in that: in step S104, invalid data that does not conform to the preset data format is filtered out by judging the parsed session data against the preset data format.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910069203.9A CN109905372A (en) | 2019-01-24 | 2019-01-24 | Data transmission protocol identification and analysis method for a transmission system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910069203.9A CN109905372A (en) | 2019-01-24 | 2019-01-24 | Data transmission protocol identification and analysis method for a transmission system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109905372A (en) | 2019-06-18 |
Family ID: 66944133
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910069203.9A Pending CN109905372A (en) | Data transmission protocol identification and analysis method for a transmission system | 2019-01-24 | 2019-01-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109905372A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105302885A (en) * | 2015-10-15 | 2016-02-03 | 北京锐安科技有限公司 | Full-text data extraction method and device |
US20170208037A1 (en) * | 2014-06-23 | 2017-07-20 | Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. | Method and system for providing deep packet inspection as a service |
CN108287887A (en) * | 2018-01-16 | 2018-07-17 | 北京奇艺世纪科技有限公司 | A kind of multi-mode matching method and device |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170208037A1 (en) * | 2014-06-23 | 2017-07-20 | Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. | Method and system for providing deep packet inspection as a service |
CN105302885A (en) * | 2015-10-15 | 2016-02-03 | 北京锐安科技有限公司 | Full-text data extraction method and device |
CN108287887A (en) * | 2018-01-16 | 2018-07-17 | 北京奇艺世纪科技有限公司 | A kind of multi-mode matching method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9893970B2 (en) | Data loss monitoring of partial data streams | |
CN105162626B (en) | Network flow depth recognition system and recognition methods based on many-core processor | |
CN101741644B (en) | Flow detection method and apparatus | |
CN102664935B (en) | Method and system for associated output of WEB class user behavior and user information | |
WO2019114700A1 (en) | Traffic analysis method, public service traffic attribution method and corresponding computer system | |
CN102655482B (en) | HTTP (hyper text transport protocol) protocol analysis based web E-mail recovering method | |
CN103384213B (en) | A kind of detected rule Optimal Configuration Method and equipment | |
CN106101015A (en) | A kind of mobile Internet traffic classes labeling method and system | |
CN110519298A (en) | A kind of Tor method for recognizing flux and device based on machine learning | |
CN106330584A (en) | Identification method and identification device of business flow | |
CN102098331A (en) | Method and system for reducing WEB type application contents | |
CN102129528A (en) | WEB page tampering identification method and system | |
CN103401850A (en) | Message filtering method and device | |
CN103905482B (en) | Method, push server and the system of pushed information | |
CN105302885A (en) | Full-text data extraction method and device | |
CN105429950A (en) | Network flow identification system and method based on dynamic data packet sampling | |
CN109275045B (en) | DFI-based mobile terminal encrypted video advertisement traffic identification method | |
CN108289125A (en) | TCP sessions recombination based on Stream Processing and statistical data extracting method | |
CN108462615A (en) | A kind of network user's group technology and device | |
CN104079450B (en) | Feature mode set creation method and device | |
CN109905372A (en) | Transmission system Data Transport Protocol identifying and analyzing method | |
CN104270358B (en) | Trustable network transaction system client monitor and its implementation | |
CN103051501B (en) | Detection method for identifying network data according to network data recovery manner | |
CN103634164B (en) | A kind of method and system for obtaining flow information | |
CN109995784A (en) | A kind of data extraction accelerated method based on UDP |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190618 |