CN109086064A - General extraction method for HTTP protocol elements based on a custom tag language
- Publication number: CN109086064A
- Application number: CN201810860708.2A
- Authority: CN (China)
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/60—Software deployment
- G06F8/65—Updates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/71—Version control; Configuration management
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a general extraction method for HTTP protocol elements based on a custom tag language, comprising the following steps. S1, interaction splicing stage: if a response packet exists, the request packet and the response packet are combined and spliced into one interaction unit; if no response packet exists, the request packet alone forms an interaction unit. S2, decoding stage: the interaction unit obtained in the interaction splicing stage is decoded. S3, rule matching stage: the decoded interaction unit is matched against the rule set to obtain the element extraction rules. S4, element extraction stage: elements are extracted from the decoded interaction unit according to the element extraction rules obtained in the rule matching stage. S5, ticket output stage: the extracted elements are filled into the corresponding fields of the ticket structure and output to a database. The invention greatly reduces development and maintenance workload.
Description
Technical field
The present invention relates to general extraction methods for HTTP protocol elements, and more particularly to a general extraction method for HTTP protocol elements based on a custom tag language.
Background art
With the development of the Internet, more and more network applications are built on HTTP (Hypertext Transfer Protocol), and they are updated more and more frequently, which poses a major challenge for cyberspace security monitoring. Traditional element extraction methods for HTTP-based network applications customize, for the HTTP message format of each application, a complete set of code covering interaction splicing, decoding, rule matching, protocol element extraction, and ticket output. This requires a large development and maintenance effort. Each newly supported HTTP-based application requires recompiling and releasing a new version, which significantly affects on-site use by customers. In addition, the extraction modules of the individual HTTP-based applications can affect one another's stability, which has a large impact on the protocol element extraction platform as a whole. The root cause is that traditional methods write a full set of custom code for the HTTP message characteristics of each application. They neither abstract the stages of HTTP-based element extraction into a general processing platform at the macro level, nor abstract the element extraction rules into a custom tag language, so element extraction for HTTP-based network applications cannot be handled generically or managed as a platform.
Summary of the invention
Purpose of the invention: in view of the deficiencies of the prior art, the purpose of the present invention is to propose a general extraction method for HTTP protocol elements based on a custom tag language.
Technical solution: the general extraction method for HTTP protocol elements based on a custom tag language according to the present invention comprises the following steps:
S1: interaction splicing stage: if a response packet exists, the request packet and the response packet are combined and spliced into one interaction unit; if no response packet exists, the request packet alone forms an interaction unit;
S2: decoding stage: the interaction unit obtained in the interaction splicing stage is decoded;
S3: rule matching stage: the decoded interaction unit is matched against the rule set to obtain the element extraction rules;
S4: element extraction stage: elements are extracted from the decoded interaction unit according to the element extraction rules obtained in the rule matching stage;
S5: ticket output stage: the elements extracted in the element extraction stage are filled into the corresponding fields of the ticket structure and output to a database.
Further, in step S3 the matching rules are configured. The matching rules comprise a first-level tag webapp, a second-level tag website, and a third-level tag url, as follows:
webapp tag: indicates the configuration scope;
website tag: indicates the configured site;
url tag: indicates url information.
Further, the website tag includes a sitename attribute, which indicates the site name.
Further, the url tag includes the following three attributes:
url attribute: indicates the url content of the website;
method attribute: indicates the request method of the url;
host attribute: indicates the host information of the web page.
Further, the element extraction rules in step S4 comprise a fourth-level tag info or a fourth-level tag entry, and further comprise a fourth-level tag data, a fourth-level tag hash, a fifth-level tag const, and a fifth-level tag kcm, as follows:
info tag: indicates web page information content;
entry tag: indicates web page condition information;
data tag: indicates data information;
hash tag: indicates hash record information;
const tag: configures a constant value;
kcm tag: searches for complex keywords.
Further, the info tag includes the following two attributes:
data_type attribute: indicates the business function;
oper_id attribute: indicates the operation type.
Further, the entry tag includes the following five attributes:
entry attribute: indicates the condition string;
encode_type attribute: indicates the encoding mode;
data_type attribute: indicates the business function;
oper_id attribute: indicates the operation type;
proto_type attribute: indicates the protocol type.
Further, the data tag includes a position attribute, which indicates location information; the hash tag includes a hash_type attribute and a vtag attribute, where hash_type indicates the hash lookup type and vtag indicates the identifiers of the fields to be stored.
Further, the const tag includes the following two attributes:
value attribute: indicates the constant value;
tagname attribute: indicates the field identifier.
Further, the kcm tag includes the following three attributes:
key attribute: indicates keyword information;
tagname attribute: indicates the field identifier;
encode attribute: indicates the encoding mode.
Beneficial effects: the invention discloses a general extraction method for HTTP protocol elements based on a custom tag language. Compared with the prior art, the present invention has the following beneficial effects:
(1) The invention describes the element extraction rules of HTTP-based applications in a custom tag language. To support a new application or an updated application, only the configuration file needs to be modified, which greatly reduces development and maintenance workload. For a version upgrade, only the modified configuration file needs to be replaced, without recompiling and releasing a new version, which saves considerable time and reduces the impact on the customer's on-site use.
(2) The invention uses a general HTTP protocol element extraction platform that decouples the applications from one another, so the stability of one application does not affect the processing of other applications.
Brief description of the drawings
Fig. 1 is a flowchart of the method in a specific embodiment of the invention.
Detailed description of the embodiments
This embodiment discloses a general extraction method for HTTP protocol elements based on a custom tag language which, as shown in Fig. 1, comprises the following steps:
S1: interaction splicing stage: if a response packet exists, the request packet and the response packet are combined and spliced into one interaction unit; if no response packet exists, the request packet alone forms an interaction unit;
S2: decoding stage: the interaction unit obtained in the interaction splicing stage is decoded;
S3: rule matching stage: the decoded interaction unit is matched against the rule set to obtain the element extraction rules;
S4: element extraction stage: elements are extracted from the decoded interaction unit according to the element extraction rules obtained in the rule matching stage;
S5: ticket output stage: the elements extracted in the element extraction stage are filled into the corresponding fields of the ticket structure and output to a database.
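The five stages S1 to S5 can be pictured as a small driver that chains them. The following is a minimal Python sketch, not the patented implementation: the names `InteractionUnit`, `splice`, and `run_pipeline` are hypothetical, the stage functions are supplied by the caller, and dropping units that match no rule is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class InteractionUnit:
    """One request, optionally paired with its response (hypothetical type)."""
    request: bytes
    response: Optional[bytes] = None


def splice(request: bytes, response: Optional[bytes] = None) -> InteractionUnit:
    # S1: pair request and response if a response exists, else the request alone.
    return InteractionUnit(request, response)


def run_pipeline(
    request: bytes,
    response: Optional[bytes],
    decode: Callable[[InteractionUnit], InteractionUnit],
    match_rule: Callable[[InteractionUnit], Optional[Any]],
    extract: Callable[[Any, InteractionUnit], dict],
    output: Callable[[dict], Any],
) -> Any:
    unit = splice(request, response)     # S1: interaction splicing
    decoded = decode(unit)               # S2: decoding
    rule = match_rule(decoded)           # S3: rule matching
    if rule is None:
        return None                      # assumption: unmatched units are dropped
    elements = extract(rule, decoded)    # S4: element extraction
    return output(elements)              # S5: ticket output
```

Because each stage is passed in as a function, supporting a new application in this sketch only changes what `match_rule` and `extract` read from configuration, which mirrors the decoupling the method aims for.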
Step S1 supports six message scenarios:
First: both the upstream and downstream messages are complete;
Second: the upstream message is complete, and the downstream message is incomplete but contains the status line;
Third: the downstream message is complete, and the upstream message is incomplete but contains the request line;
Fourth: both the upstream and downstream messages are incomplete, but the upstream message contains the request line and the downstream message contains the status line;
Fifth: the upstream message is complete and there is no downstream message;
Sixth: the upstream message is incomplete.
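The six scenarios can be told apart from two facts per message: whether it is complete, and whether its start line (request line or status line) was captured. The sketch below is illustrative only; `Msg` and `classify` are hypothetical names, and treating the sixth scenario as "incomplete upstream message with no downstream message" is an assumption, since the text does not say so explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    complete: bool
    has_start_line: bool  # request line (upstream) or status line (downstream)


def classify(up: Msg, down: Optional[Msg]) -> Optional[int]:
    """Return the scenario number 1-6, or None if the pair is not supported."""
    if down is None:
        # Scenarios 5 and 6: no downstream message (assumption for scenario 6).
        return 5 if up.complete else 6
    if up.complete and down.complete:
        return 1
    if up.complete and down.has_start_line:
        return 2
    if down.complete and up.has_start_line:
        return 3
    if up.has_start_line and down.has_start_line:
        return 4
    return None
```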
In step S3, the matching rules are configured. The matching rules comprise a first-level tag webapp, a second-level tag website, and a third-level tag url, as follows:
webapp tag: indicates the configuration scope;
website tag: indicates the configured site;
url tag: indicates url information.
The website tag includes a sitename attribute, which indicates the site name.
The url tag includes the following three attributes:
url attribute: indicates the url content of the website; it may consist of parts of the url, with multiple parts separated by ";"; required;
method attribute: indicates the request method of the url (GET/POST); an auxiliary condition for matching the url; required;
host attribute: indicates the host information of the web page; matched as a substring; an auxiliary condition for matching the url; optional.
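The three url-tag attributes suggest a simple matching predicate. The following Python sketch is one possible reading, not the patented matcher: the function name `url_rule_matches` and the dictionary layout are hypothetical, and requiring every ";"-separated part to occur in the request url is an assumption.

```python
def url_rule_matches(rule: dict, method: str, url: str, host: str) -> bool:
    """Check one url rule against the method, url, and host of a decoded unit."""
    # url attribute: every ';'-separated part must occur in the request url (assumed).
    parts = [p for p in rule["url"].split(";") if p]
    if not all(part in url for part in parts):
        return False
    # method attribute: required auxiliary condition, compared case-insensitively.
    if rule["method"].upper() != method.upper():
        return False
    # host attribute: optional auxiliary condition, matched as a substring.
    rule_host = rule.get("host")
    return rule_host is None or rule_host in host
```

For example, a rule with url `events;ref_dashboard_filter=upcoming` and method `GET` would match any GET request whose url contains both parts, and an omitted host attribute places no constraint on the host.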
The element extraction rules in step S4 comprise a fourth-level tag info or a fourth-level tag entry, and further comprise a fourth-level tag data, a fourth-level tag hash, a fifth-level tag const, and a fifth-level tag kcm, as follows:
info tag: indicates web page information content;
entry tag: indicates web page condition information;
data tag: indicates data information;
hash tag: indicates hash record information; the configured information is stored in a hash table and serves data association across flows;
const tag: configures a constant value;
kcm tag: searches for complex keywords.
The info tag includes the following two attributes:
data_type attribute: indicates the business function, that is, the broad category of the web page; required;
oper_id attribute: indicates the operation type, that is, the number filled into the action type field of the ticket; required.
The entry tag includes the following five attributes:
entry attribute: indicates the condition string; required; when the decoded interaction unit contains this string, the entry condition is matched;
encode_type attribute: indicates the encoding mode; optional;
data_type attribute: indicates the business function, that is, the broad category of the web page; required;
oper_id attribute: indicates the operation type, that is, the number filled into the action type field of the ticket; required;
proto_type attribute: indicates the protocol type; required.
The data tag includes the following attribute:
position attribute: indicates location information; required.
The hash tag includes the following two attributes:
hash_type attribute: indicates the hash lookup type; required;
vtag attribute: indicates the identifiers of the fields to be stored; required.
The const tag includes the following two attributes:
value attribute: indicates the constant value; required;
tagname attribute: indicates the field identifier; required.
The kcm tag includes the following three attributes:
key attribute: indicates keyword information; a regular expression; required;
tagname attribute: indicates the field identifier; required;
encode attribute: indicates the encoding mode; a string; optional.
An example follows. Suppose the system needs to support content restoration for the HTTP-based application Facebook, which requires extracting Facebook's HTTP protocol elements. With the present invention, it is only necessary to configure the matching rules and element extraction rules for Facebook's HTTP interaction units. After an HTTP interaction unit is processed by the interaction splicing stage, decoding stage, rule matching stage, element extraction stage, and ticket output stage of the general extraction platform, a ticket composed of the protocol elements of interest is obtained.
1. Interaction splicing stage: without modifying code or configuration files, HTTP protocol messages are spliced pairwise into interaction units. Because packets may be lost in the network or routed over other links, the messages the system receives may be incomplete. The system currently supports splicing the six message scenarios described above into interaction units.
2. Decoding stage: without modifying code or configuration files, the request body or response body is decoded according to the Transfer-Encoding and Content-Encoding fields in the request header or response header, turning the interaction unit into a decoded interaction unit. Transfer-Encoding decoding currently supports gzip, compress, deflate, chunked, and others; Content-Encoding decoding currently supports gzip, compress, deflate, and others.
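To make the decoding stage concrete, the sketch below removes chunked transfer coding and then undoes a gzip or deflate content coding using the Python standard library. It is an illustration under stated assumptions, not the platform's decoder: `dechunk` and `decode_body` are hypothetical names, header names are assumed to be lower-cased already, the `compress` (LZW) coding is omitted because the standard library has no decoder for it, and `deflate` is assumed to be zlib-wrapped.

```python
import gzip
import zlib


def dechunk(body: bytes) -> bytes:
    """Reassemble a body sent with Transfer-Encoding: chunked."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # The chunk-size line is hexadecimal and may carry ';'-separated extensions.
        size = int(body[pos:eol].split(b";")[0].strip(), 16)
        if size == 0:
            break  # last chunk; any trailers are ignored in this sketch
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


def decode_body(headers: dict, body: bytes) -> bytes:
    """Decode a message body according to its transfer and content codings."""
    if "chunked" in headers.get("transfer-encoding", "").lower():
        body = dechunk(body)
    coding = headers.get("content-encoding", "").lower()
    if coding == "gzip":
        body = gzip.decompress(body)
    elif coding == "deflate":
        body = zlib.decompress(body)  # assumes the zlib-wrapped form of deflate
    return body
```

The transfer coding is removed first because it frames the bytes on the wire, while the content coding applies to the reassembled payload.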
3. Rule matching stage: the matching rules for Facebook need to be configured. After a decoded interaction unit enters the rule matching stage, its url, method, and host fields are matched against the matching rule set. If a matching rule is hit, its child tags are the selected element extraction rules. The configured rules are as follows:
(1) webapp tag, no attributes
(2) website tag, with the following attribute:
sitename: www.facebook.com
(3) First url tag, with the following attributes:
url: /settings?tab=account
method: GET
host: www.facebook.com
(4) Second url tag, with the following attributes:
url: events;ref_dashboard_filter=upcoming
method: GET
host: www.facebook.com
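The text lists tags and attributes but does not show the concrete syntax of the configuration file. Assuming, purely for illustration, that the custom tag language is written in an XML-like form, the matching rules above might be laid out and loaded as in the following Python sketch; the XML layout and the function name `load_url_rules` are assumptions, not taken from the patent.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the webapp / website / url hierarchy.
RULES_XML = """
<webapp>
  <website sitename="www.facebook.com">
    <url url="/settings?tab=account" method="GET" host="www.facebook.com"/>
    <url url="events;ref_dashboard_filter=upcoming" method="GET" host="www.facebook.com"/>
  </website>
</webapp>
"""


def load_url_rules(xml_text: str) -> list:
    """Flatten the tag hierarchy into one dict per url rule."""
    root = ET.fromstring(xml_text.strip())
    rules = []
    for site in root.findall("website"):
        for url in site.findall("url"):
            # Keep the url tag's attributes and remember the owning site.
            rules.append({"sitename": site.get("sitename"), **url.attrib})
    return rules
```

Keeping the rules in a file of this kind is what lets a new application be supported by editing configuration rather than code.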
4. Element extraction stage: the element extraction rules for the relevant Facebook urls need to be configured. Protocol elements are extracted according to the element extraction rules identified in the rule matching stage, so that the subsequent ticket output stage can output them as the corresponding ticket. The rules configured in the child tags of the first url tag are as follows:
(1) info tag, with the following attributes:
data_type: 146, representing the user identity class
oper_id: 1, representing login
(2) data tag, with the following attribute:
position: cookie, meaning the search is performed in the cookie
(3) kcm tag under the data tag, with the following attributes:
key: .*c_user=\d+.*
tagname: USER_ID, meaning the element name is user ID
(4) const tag under the data tag, with the following attributes:
value: 1
tagname: ACTION_TYPE, meaning the element name is action type
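The effect of the kcm and const tags for the first url can be sketched in a few lines of Python. This is an illustration only: the function name `extract_settings_elements` is hypothetical, and the capture group around the digits is an assumption, because the text does not say how the value is cut out of the text matched by the key expression.

```python
import re


def extract_settings_elements(cookie: str) -> dict:
    """Apply the kcm and const rules of the first url tag to a Cookie header value."""
    elements = {}
    # kcm rule: find c_user=<digits> in the cookie; the capture group is an assumption.
    match = re.search(r"c_user=(\d+)", cookie)
    if match:
        elements["USER_ID"] = match.group(1)
    # const rule: a fixed value stored under the ACTION_TYPE field identifier.
    elements["ACTION_TYPE"] = "1"
    return elements
```

The returned dictionary is keyed by tagname, which is the form the ticket output stage needs in order to place each element in its field.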
The rules configured in the child tags of the second url tag are as follows:
(1) entry tag, with the following attributes:
entry: event
data_type: 150, representing the blog class
oper_id: 5, representing file download
(2) hash tag, with the following attributes:
hash_type: 1, representing the four-tuple
vtag: LIKE_COUNT, FORWARD_COUNT, COMMENT_COUNT, the fields to be stored
(3) data tag under the entry tag, with the following attribute:
position: resp_body, meaning the search is performed in the response body
(4) First kcm tag under the data tag under the entry tag, with the following attributes:
key: .*"likecount":\d+.*
tagname: LIKE_COUNT, meaning the element name is like count
encode: url, meaning url encoding
(5) Second kcm tag under the data tag under the entry tag, with the following attributes:
key: .*id="pageTitle"<\w+>.*
tagname: EVENT_NAME, meaning the element name is event title
(6) const tag under the data tag under the entry tag, with the following attributes:
value: Send
tagname: ACTION_TYPE, meaning the element name is action type
5. Ticket output stage: without modifying code or configuration files, the elements extracted in the element extraction stage are filled into the corresponding fields of the ticket structure according to their tagname and output to a database, for storage and for viewing by customers.
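The ticket output stage amounts to mapping each extracted element to the field named by its tagname and writing one row. The following minimal Python sketch uses SQLite purely as a stand-in, since the text does not name the database; the table name `ticket`, the field list, and the function `write_ticket` are all hypothetical.

```python
import sqlite3

# Hypothetical ticket fields; the names follow the tagname values in the example.
TICKET_FIELDS = ("USER_ID", "ACTION_TYPE", "EVENT_NAME", "LIKE_COUNT")


def write_ticket(conn: sqlite3.Connection, elements: dict) -> None:
    """Fill the ticket fields by tagname and store the ticket as one row."""
    columns = ", ".join(f"{name} TEXT" for name in TICKET_FIELDS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS ticket ({columns})")
    placeholders = ", ".join("?" for _ in TICKET_FIELDS)
    # Each element lands in the field named by its tagname; missing fields stay NULL.
    row = [elements.get(name) for name in TICKET_FIELDS]
    conn.execute(f"INSERT INTO ticket VALUES ({placeholders})", row)
    conn.commit()
```

A call such as `write_ticket(sqlite3.connect(":memory:"), {"USER_ID": "100001234", "ACTION_TYPE": "1"})` stores one ticket whose remaining fields are empty.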
Claims (10)
1. A general extraction method for HTTP protocol elements based on a custom tag language, characterized in that it comprises the following steps:
S1: interaction splicing stage: if a response packet exists, the request packet and the response packet are combined and spliced into one interaction unit; if no response packet exists, the request packet alone forms an interaction unit;
S2: decoding stage: the interaction unit obtained in the interaction splicing stage is decoded;
S3: rule matching stage: the decoded interaction unit is matched against the rule set to obtain the element extraction rules;
S4: element extraction stage: elements are extracted from the decoded interaction unit according to the element extraction rules obtained in the rule matching stage;
S5: ticket output stage: the elements extracted in the element extraction stage are filled into the corresponding fields of the ticket structure and output to a database.
2. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 1, characterized in that: in step S3 the matching rules are configured, the matching rules comprising a first-level tag webapp, a second-level tag website, and a third-level tag url, as follows:
webapp tag: indicates the configuration scope;
website tag: indicates the configured site;
url tag: indicates url information.
3. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 2, characterized in that: the website tag includes a sitename attribute, which indicates the site name.
4. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 2, characterized in that: the url tag includes the following three attributes:
url attribute: indicates the url content of the website;
method attribute: indicates the request method of the url;
host attribute: indicates the host information of the web page.
5. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 1, characterized in that: the element extraction rules in step S4 comprise a fourth-level tag info or a fourth-level tag entry, and further comprise a fourth-level tag data, a fourth-level tag hash, a fifth-level tag const, and a fifth-level tag kcm, as follows:
info tag: indicates web page information content;
entry tag: indicates web page condition information;
data tag: indicates data information;
hash tag: indicates hash record information;
const tag: configures a constant value;
kcm tag: searches for complex keywords.
6. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 5, characterized in that: the info tag includes the following two attributes:
data_type attribute: indicates the business function;
oper_id attribute: indicates the operation type.
7. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 5, characterized in that: the entry tag includes the following five attributes:
entry attribute: indicates the condition string;
encode_type attribute: indicates the encoding mode;
data_type attribute: indicates the business function;
oper_id attribute: indicates the operation type;
proto_type attribute: indicates the protocol type.
8. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 5, characterized in that: the data tag includes a position attribute, which indicates location information; the hash tag includes a hash_type attribute and a vtag attribute, where hash_type indicates the hash lookup type and vtag indicates the identifiers of the fields to be stored.
9. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 5, characterized in that: the const tag includes the following two attributes:
value attribute: indicates the constant value;
tagname attribute: indicates the field identifier.
10. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 5, characterized in that: the kcm tag includes the following three attributes:
key attribute: indicates keyword information;
tagname attribute: indicates the field identifier;
encode attribute: indicates the encoding mode.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810860708.2A CN109086064B (en) | 2018-08-01 | 2018-08-01 | General extraction method of HTTP (hyper text transport protocol) protocol elements based on custom tag language |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109086064A (en) | 2018-12-25 |
CN109086064B (en) | 2022-01-14 |
Family
ID=64831169
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201810860708.2A | Active | CN109086064B (en) | 2018-08-01 | 2018-08-01 | General extraction method of HTTP (hyper text transport protocol) protocol elements based on custom tag language |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109086064B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109857958A (en) * | 2019-02-13 | 2019-06-07 | 杭州孝道科技有限公司 | A kind of method that http input point is searched |
CN112287177A (en) * | 2020-11-25 | 2021-01-29 | 城云科技(中国)有限公司 | Method and device for creating, changing, displaying and inquiring object label |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101924656A (en) * | 2010-08-26 | 2010-12-22 | 北京天融信科技有限公司 | Method and device for realizing network equipment CLI (Command Line Interface for batch scripti) based on dynamic configuration |
CN103491089A (en) * | 2013-09-22 | 2014-01-01 | 北京锐安科技有限公司 | Transcoding method and system of data recovery based on HTTP |
CN103544178A (en) * | 2012-07-13 | 2014-01-29 | 百度在线网络技术(北京)有限公司 | Method and equipment for providing reconstruction page corresponding to target page |
CN104320454A (en) * | 2014-10-23 | 2015-01-28 | 北京锐安科技有限公司 | Method and system for realizing user-defined output in HTTP protocol recovery |
CN105224622A (en) * | 2015-09-22 | 2016-01-06 | 中国搜索信息科技股份有限公司 | The place name address extraction of Internet and standardized method |
CN106959944A (en) * | 2017-02-14 | 2017-07-18 | 中国电子科技集团公司第二十八研究所 | A kind of Event Distillation method and system based on Chinese syntax rule |
Also Published As
Publication number | Publication date |
---|---|
CN109086064B (en) | 2022-01-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||