CN109086064A - General extraction method for HTTP protocol elements based on a custom tag language - Google Patents

Info

Publication number
CN109086064A
Authority
CN
China
Prior art keywords
label
attribute
stage
indicated
http protocol
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810860708.2A
Other languages
Chinese (zh)
Other versions
CN109086064B (en)
Inventor
王丽雪
王恒亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Mao Yu Tong Software Technology Co Ltd
Original Assignee
Nanjing Mao Yu Tong Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Mao Yu Tong Software Technology Co Ltd filed Critical Nanjing Mao Yu Tong Software Technology Co Ltd
Priority to CN201810860708.2A priority Critical patent/CN109086064B/en
Publication of CN109086064A publication Critical patent/CN109086064A/en
Application granted granted Critical
Publication of CN109086064B publication Critical patent/CN109086064B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/65 Updates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a general extraction method for HTTP protocol elements based on a custom tag language, comprising the following steps. S1, interaction splicing stage: if a response packet exists, the request packet and the response packet are combined and spliced into one interaction unit; if no response packet exists, the request packet alone forms an interaction unit. S2, decoding stage: the interaction unit obtained in the interaction splicing stage is decoded. S3, rule matching stage: the decoded interaction unit is matched against a rule set to obtain the element extraction rules. S4, element extraction stage: elements are extracted from the decoded interaction unit according to the element extraction rules obtained in the rule matching stage. S5, detail record output stage: the elements extracted in the element extraction stage are filled into the corresponding fields of the detail record structure and output to a database. The invention greatly reduces development and maintenance workload.

Description

General extraction method for HTTP protocol elements based on a custom tag language
Technical field
The present invention relates to general extraction methods for HTTP protocol elements, and in particular to a general extraction method for HTTP protocol elements based on a custom tag language.
Background art
With the development of the Internet, more and more network applications are implemented on top of HTTP (Hypertext Transfer Protocol), and they are updated more and more frequently, which poses a great challenge to cyberspace security monitoring. Traditional element extraction methods for HTTP-based network applications customize, for the HTTP message format of each application, a complete set of code covering interaction splicing, decoding, rule matching, protocol element extraction and detail record output. This requires a large amount of development and maintenance work. Each newly supported HTTP-based application requires recompiling and releasing a new version, which significantly affects use at the customer site. In addition, the extraction modules of the different HTTP-based applications can affect each other's stability, with a large impact on the whole protocol element extraction platform. The root cause is that traditional methods customize a whole set of code around the HTTP message characteristics of each application: they neither abstract the stages of HTTP-based element extraction into a general processing platform, nor abstract the element extraction rules into a custom tag language, so element extraction for HTTP-based applications is neither handled generically nor managed as a platform.
Summary of the invention
Object of the invention: in view of the deficiencies of the prior art, the invention proposes a general extraction method for HTTP protocol elements based on a custom tag language.
Technical solution: the general extraction method for HTTP protocol elements based on a custom tag language according to the invention comprises the following steps:
S1: interaction splicing stage: if a response packet exists, the request packet and the response packet are combined and spliced into one interaction unit; if no response packet exists, the request packet alone forms an interaction unit;
S2: decoding stage: the interaction unit obtained in the interaction splicing stage is decoded;
S3: rule matching stage: the decoded interaction unit is matched against a rule set to obtain the element extraction rules;
S4: element extraction stage: elements are extracted from the decoded interaction unit according to the element extraction rules obtained in the rule matching stage;
S5: detail record output stage: the elements extracted in the element extraction stage are filled into the corresponding fields of the detail record structure and output to a database.
Further, in step S3, matching rules are configured. A matching rule comprises the first-level tag webapp, the second-level tag website and the third-level tag url, as follows:
webapp tag: indicates the configuration scope;
website tag: indicates the configured site;
url tag: indicates url information.
Further, the website tag has a sitename attribute, which gives the site name.
Further, the url tag has the following three attributes:
url attribute: the url content of the site;
method attribute: the request method of the url;
host attribute: the host information of the web page.
Further, the element extraction rules in step S4 comprise the fourth-level tag info or the fourth-level tag entry, and further comprise the fourth-level tag data, the fourth-level tag hash, the fifth-level tag const and the fifth-level tag kcm, as follows:
info tag: indicates web page information content;
entry tag: indicates web page condition information;
data tag: indicates data information;
hash tag: indicates hash record information;
const tag: configures a constant value;
kcm tag: searches for compound keywords.
Further, the info tag has the following two attributes:
data_type attribute: the business function;
oper_id attribute: the operation type.
Further, the entry tag has the following five attributes:
entry attribute: the condition string;
encode_type attribute: the encoding mode;
data_type attribute: the business function;
oper_id attribute: the operation type;
proto_type attribute: the protocol type.
Further, the data tag has a position attribute, which gives the location information; the hash tag has a hash_type attribute and a vtag attribute, where hash_type gives the hash lookup type and vtag gives the identifiers of the fields to store.
Further, the const tag has the following two attributes:
value attribute: the constant value;
tagname attribute: the field identifier.
Further, the kcm tag has the following three attributes:
key attribute: the keyword information;
tagname attribute: the field identifier;
encode attribute: the encoding mode.
Beneficial effects: the invention discloses a general extraction method for HTTP protocol elements based on a custom tag language. Compared with the prior art, it has the following beneficial effects:
(1) The invention describes the element extraction rules of HTTP-based applications in a custom tag language. When a new application needs to be supported or an application is updated, only the configuration file needs to be modified, which greatly reduces development and maintenance workload. For a version upgrade, only the modified configuration file needs to be replaced, with no recompilation or new release, which saves a great deal of time and reduces the impact on use at the customer site.
(2) The invention uses a general extraction platform for HTTP protocol elements, which decouples the applications from one another: the stability of one application does not affect the processing of the others.
Brief description of the drawings
Fig. 1 is a flow chart of the method in a specific embodiment of the invention.
Detailed description of the embodiments
This embodiment discloses a general extraction method for HTTP protocol elements based on a custom tag language which, as shown in Fig. 1, comprises the following steps:
S1: interaction splicing stage: if a response packet exists, the request packet and the response packet are combined and spliced into one interaction unit; if no response packet exists, the request packet alone forms an interaction unit;
S2: decoding stage: the interaction unit obtained in the interaction splicing stage is decoded;
S3: rule matching stage: the decoded interaction unit is matched against a rule set to obtain the element extraction rules;
S4: element extraction stage: elements are extracted from the decoded interaction unit according to the element extraction rules obtained in the rule matching stage;
S5: detail record output stage: the elements extracted in the element extraction stage are filled into the corresponding fields of the detail record structure and output to a database.
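The five stages can be read as a fixed pipeline in which only the rule set varies per application. The following Python sketch illustrates that reading; it is not the patented implementation. The names `InteractionUnit`, `splice` and `run_pipeline`, and the idea of passing the later stages in as callables, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class InteractionUnit:
    """One request plus its response, if a response packet was captured."""
    request: bytes
    response: Optional[bytes] = None


def splice(request: bytes, response: Optional[bytes]) -> InteractionUnit:
    # S1: combine request and response, or keep the request on its own.
    return InteractionUnit(request, response)


def run_pipeline(
    request: bytes,
    response: Optional[bytes],
    decode: Callable[[InteractionUnit], InteractionUnit],
    match_rules: Callable[[InteractionUnit], Optional[Any]],
    extract: Callable[[InteractionUnit, Any], dict],
    output: Callable[[dict], Any],
) -> Any:
    unit = splice(request, response)      # S1: interaction splicing
    decoded = decode(unit)                # S2: decoding
    rules = match_rules(decoded)          # S3: rule matching
    if rules is None:                     # no rule hit: nothing to extract
        return None
    elements = extract(decoded, rules)    # S4: element extraction
    return output(elements)               # S5: detail record output
```

Under this reading, supporting a new application changes only what `match_rules` and `extract` load from configuration; the pipeline code itself stays untouched.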
Step S1 supports six message scenarios:
First: both the uplink and downlink messages are complete;
Second: the uplink message is complete, and the downlink message is incomplete but contains the status line;
Third: the downlink message is complete, and the uplink message is incomplete but contains the request line;
Fourth: both the uplink and downlink messages are incomplete, but the uplink message contains the request line and the downlink message contains the status line;
Fifth: the uplink message is complete and there is no downlink message;
Sixth: the uplink message is incomplete.
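The six scenarios can be expressed as a small classifier. The sketch below is only one possible reading: the `MessageState` type and `classify` function are hypothetical, and it assumes that the sixth scenario means an incomplete uplink message with no downlink message, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MessageState:
    complete: bool        # whole message received
    has_start_line: bool  # request line (uplink) or status line (downlink) present


def classify(up: MessageState, down: Optional[MessageState]) -> Optional[int]:
    """Return the scenario number 1..6, or None if the pair is not spliceable."""
    if down is None:
        # Scenarios 5 and 6: no downlink message (assumed for scenario 6).
        return 5 if up.complete else 6
    if up.complete and down.complete:
        return 1
    if up.complete and down.has_start_line:
        return 2
    if down.complete and up.has_start_line:
        return 3
    if up.has_start_line and down.has_start_line:
        return 4
    return None
```

A pair that falls outside the six scenarios, for example a downlink message without a status line, returns `None` here; how the real platform handles such input is not described in the source.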
In step S3, matching rules are configured. A matching rule comprises the first-level tag webapp, the second-level tag website and the third-level tag url, as follows:
webapp tag: indicates the configuration scope;
website tag: indicates the configured site;
url tag: indicates url information.
The website tag has a sitename attribute, which gives the site name.
The url tag has the following three attributes:
url attribute: the url content of the site; it may consist of parts of the url, with multiple parts separated by ";"; required;
method attribute: the request method of the url; get/post; an auxiliary condition for url matching; required;
host attribute: the host information of the web page; matched as a substring; an auxiliary condition for url matching; optional.
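A minimal sketch of how the three url attributes might be evaluated against a request follows. The function name `url_rule_matches` and the dictionary form of the rule are hypothetical. Two points are assumptions rather than statements from the text: that every ";"-separated part must appear in the request url, and that the method comparison ignores case.

```python
def url_rule_matches(rule: dict, method: str, url: str, host: str) -> bool:
    """Check one url rule against the request's method, url and host."""
    # url: every ";"-separated part must occur in the request url (assumed AND).
    parts = [part for part in rule["url"].split(";") if part]
    if not all(part in url for part in parts):
        return False
    # method: auxiliary condition, required.
    if rule["method"].upper() != method.upper():
        return False
    # host: auxiliary condition, optional, substring match.
    rule_host = rule.get("host")
    return rule_host is None or rule_host in host
```

For example, a rule with `url` set to `events;ref_dashboard_filter=upcoming` matches a GET request for `/events/?ref_dashboard_filter=upcoming` on `www.facebook.com`, because both parts occur in the request url.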
The element extraction rules in step S4 comprise the fourth-level tag info or the fourth-level tag entry, and further comprise the fourth-level tag data, the fourth-level tag hash, the fifth-level tag const and the fifth-level tag kcm, as follows:
info tag: indicates web page information content;
entry tag: indicates web page condition information;
data tag: indicates data information;
hash tag: indicates hash record information; the configured information is stored in a hash table to support cross-data association;
const tag: configures a constant value;
kcm tag: searches for compound keywords.
The info tag has the following two attributes:
data_type attribute: the business function; indicates the major category of the web page; required;
oper_id attribute: the operation type; the number filled into the action type field of the detail record; required.
The entry tag has the following five attributes:
entry attribute: the condition string; required; the entry condition is matched when the decoded interaction unit contains this string;
encode_type attribute: the encoding mode; optional;
data_type attribute: the business function; indicates the major category of the web page; required;
oper_id attribute: the operation type; the number filled into the action type field of the detail record; required;
proto_type attribute: the protocol type; required.
The data tag has the following attribute:
position attribute: the location information; required.
The hash tag has the following two attributes:
hash_type attribute: the hash lookup type; required;
vtag attribute: the identifiers of the fields to store; required.
The const tag has the following two attributes:
value attribute: the constant value; required;
tagname attribute: the field identifier; required.
The kcm tag has the following three attributes:
key attribute: the keyword information, given as a regular expression; required;
tagname attribute: the field identifier; required;
encode attribute: the encoding mode, given as a string; optional.
An example follows. Suppose the current system needs to support content restoration for Facebook, an HTTP-based application, and therefore needs to extract Facebook's HTTP protocol elements. With the invention, it is only necessary to configure the matching rules and element extraction rules for Facebook's HTTP interaction units. After an HTTP interaction unit passes through the interaction splicing, decoding, rule matching, element extraction and detail record output stages of the general extraction platform, a detail record composed of the protocol elements of interest is obtained.
1. Interaction splicing stage: without modifying code or configuration files, HTTP messages are spliced pair by pair into interaction units. Because messages may be lost in the network or routed over other links, the messages the system receives may be incomplete. The system currently supports splicing messages in the six scenarios described above into interaction units.
2. Decoding stage: without modifying code or configuration files, the request body or response body is decoded according to the Transfer-Encoding and Content-Encoding fields in the request or response headers, turning the interaction unit into a decoded interaction unit. Transfer-Encoding decoding currently supports gzip, compress, deflate, chunked and others; Content-Encoding decoding currently supports gzip, compress, deflate and others.
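The decoding stage can be sketched with the Python standard library. The functions `dechunk` and `decode_body` are hypothetical names, header names are assumed to be lower-cased already, and the sketch covers only chunked, gzip and deflate; compress (LZW) is left out because the standard library has no decoder for it.

```python
import gzip
import zlib


def dechunk(body: bytes) -> bytes:
    """Reassemble a Transfer-Encoding: chunked body."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # The size line is hexadecimal and may carry ";extension" parameters.
        size = int(body[pos:eol].split(b";", 1)[0], 16)
        if size == 0:
            break
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


def decode_body(headers: dict, body: bytes) -> bytes:
    """Undo the transfer encoding first, then the content encoding."""
    if "chunked" in headers.get("transfer-encoding", "").lower():
        body = dechunk(body)
    content_encoding = headers.get("content-encoding", "").lower()
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    elif content_encoding == "deflate":
        body = zlib.decompress(body)  # assumes the zlib-wrapped form of deflate
    return body
```

The order matters: the chunk framing wraps the compressed bytes, so it has to be removed before decompression.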
3. Rule matching stage: the matching rules for Facebook need to be configured. After a decoded interaction unit enters the rule matching stage, its url, method and host fields are matched against the matching rule set. If a matching rule is hit, the tags beneath it are the selected element extraction rules. The rules are configured as follows:
(1) webapp tag, no attributes
(2) website tag, with the attribute:
sitename: www.facebook.com
(3) first url tag, with the attributes:
url: /settings?tab=account
method: GET
host: www.facebook.com
(4) second url tag, with the attributes:
url: events;ref_dashboard_filter=upcoming
method: GET
host: www.facebook.com
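The text does not give the concrete syntax of the tag language. Assuming, purely for illustration, that it is written as XML, the Facebook matching rules above could be loaded and applied as in the sketch below. `CONFIG` and `find_url_rule` are hypothetical, and treating the ";"-separated url parts as all required is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed XML rendering of the matching rules listed above.
CONFIG = """
<webapp>
  <website sitename="www.facebook.com">
    <url url="/settings?tab=account" method="GET" host="www.facebook.com"/>
    <url url="events;ref_dashboard_filter=upcoming" method="GET" host="www.facebook.com"/>
  </website>
</webapp>
"""


def find_url_rule(config_xml: str, method: str, url: str, host: str) -> Optional[ET.Element]:
    """Return the first url element hit by the request, or None."""
    root = ET.fromstring(config_xml)
    for site in root.iter("website"):
        for rule in site.iter("url"):
            parts = [p for p in rule.get("url", "").split(";") if p]
            if (all(p in url for p in parts)
                    and rule.get("method", "").upper() == method.upper()
                    and rule.get("host", "") in host):
                return rule  # its child tags would be the extraction rules
    return None
```

Returning the element itself fits the description: once a url tag is hit, the tags beneath it are the element extraction rules to apply.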
4. Element extraction stage: the element extraction rules for the relevant Facebook urls need to be configured. Protocol elements are extracted according to the element extraction rules identified in the rule matching stage, so that the subsequent detail record output stage can output them as the corresponding detail record. The tags under the first url tag are configured as follows:
(1) info tag, with the attributes:
data_type: 146, representing the user identity category
oper_id: 1, representing login
(2) data tag, with the attribute:
position: cookie, meaning the search is performed in the cookie
(3) kcm tag under the data tag, with the attributes:
key: .*c_user=\d+.*
tagname: USER_ID, meaning the element name is the user ID
(4) const tag under the data tag, with the attributes:
value: 1
tagname: ACTION_TYPE, meaning the element name is the action type
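The rules under the first url tag can be illustrated with a small extractor. This is a sketch under stated assumptions: `DATA_RULE` and `extract_elements` are hypothetical, the decoded interaction unit is modelled as a dictionary keyed by position, and the configured key is rewritten with a capture group because the text does not say how the value is cut out of the regular expression match.

```python
import re

# Hypothetical in-memory form of the data tag and its kcm and const children.
DATA_RULE = {
    "position": "cookie",
    # The capture group around \d+ is an assumption of this sketch.
    "kcm": [{"key": r"c_user=(\d+)", "tagname": "USER_ID"}],
    "const": [{"value": "1", "tagname": "ACTION_TYPE"}],
}


def extract_elements(unit: dict, rule: dict) -> dict:
    """Apply one data rule to a decoded interaction unit."""
    text = unit.get(rule["position"], "")  # position selects where to search
    elements = {}
    for kcm in rule.get("kcm", []):
        found = re.search(kcm["key"], text)
        if found:
            elements[kcm["tagname"]] = found.group(1)
    for const in rule.get("const", []):
        elements[const["tagname"]] = const["value"]  # constants are always filled
    return elements
```

With a cookie of `datr=abc; c_user=100001234; xs=1`, the extractor yields `USER_ID` set to `100001234` and `ACTION_TYPE` set to `1`.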
The tags under the second url tag are configured as follows:
(1) entry tag, with the attributes:
entry: event
data_type: 150, representing the blog category
oper_id: 5, representing file download
(2) hash tag, with the attributes:
hash_type: 1, representing the four-tuple
vtag: LIKE_COUNT, FORWARD_COUNT, COMMENT_COUNT, the fields to be stored
(3) data tag under the entry tag, with the attribute:
position: resp_body, meaning the search is performed in the response body
(4) first kcm tag under the data tag under the entry tag, with the attributes:
key: .*"likecount":\d+.*
tagname: LIKE_COUNT, meaning the element name is the like count
encode: url, meaning url encoding
(5) second kcm tag under the data tag under the entry tag, with the attributes:
key: .*id="pageTitle"<\w+>.*
tagname: EVENT_NAME, meaning the element name is the event title
(6) const tag under the data tag under the entry tag, with the attributes:
value: Send
tagname: ACTION_TYPE, meaning the element name is the action type
5. Detail record output stage: without modifying code or configuration files, the elements extracted in the element extraction stage are filled into the corresponding fields of the detail record structure according to their tagname and output to a database, for storage and for viewing by the customer.
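The output stage amounts to mapping each tagname to a column of the detail record. The sketch below uses SQLite only as a stand-in database; the table name `detail_record`, the field list `FIELDS` and both function names are assumptions, since the text does not describe the detail record structure.

```python
import sqlite3

# Hypothetical detail record structure: one column per known tagname.
FIELDS = ("USER_ID", "ACTION_TYPE", "EVENT_NAME", "LIKE_COUNT")


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    columns = ", ".join(f"{name} TEXT" for name in FIELDS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS detail_record ({columns})")
    return conn


def write_record(conn: sqlite3.Connection, elements: dict) -> None:
    """Fill each field from the element with the same tagname; others stay NULL."""
    placeholders = ", ".join("?" for _ in FIELDS)
    row = [elements.get(name) for name in FIELDS]
    conn.execute(
        f"INSERT INTO detail_record ({', '.join(FIELDS)}) VALUES ({placeholders})",
        row,
    )
    conn.commit()
```

Because the mapping is driven by tagname alone, adding an application that reuses existing field names needs no change in this stage, which matches the claim that no code or configuration changes are required here.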

Claims (10)

1. A general extraction method for HTTP protocol elements based on a custom tag language, characterized by comprising the following steps:
S1: interaction splicing stage: if a response packet exists, the request packet and the response packet are combined and spliced into one interaction unit; if no response packet exists, the request packet alone forms an interaction unit;
S2: decoding stage: the interaction unit obtained in the interaction splicing stage is decoded;
S3: rule matching stage: the decoded interaction unit is matched against a rule set to obtain the element extraction rules;
S4: element extraction stage: elements are extracted from the decoded interaction unit according to the element extraction rules obtained in the rule matching stage;
S5: detail record output stage: the elements extracted in the element extraction stage are filled into the corresponding fields of the detail record structure and output to a database.
2. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 1, characterized in that: in step S3, matching rules are configured; a matching rule comprises the first-level tag webapp, the second-level tag website and the third-level tag url, as follows:
webapp tag: indicates the configuration scope;
website tag: indicates the configured site;
url tag: indicates url information.
3. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 2, characterized in that: the website tag has a sitename attribute, which gives the site name.
4. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 2, characterized in that: the url tag has the following three attributes:
url attribute: the url content of the site;
method attribute: the request method of the url;
host attribute: the host information of the web page.
5. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 1, characterized in that: the element extraction rules in step S4 comprise the fourth-level tag info or the fourth-level tag entry, and further comprise the fourth-level tag data, the fourth-level tag hash, the fifth-level tag const and the fifth-level tag kcm, as follows:
info tag: indicates web page information content;
entry tag: indicates web page condition information;
data tag: indicates data information;
hash tag: indicates hash record information;
const tag: configures a constant value;
kcm tag: searches for compound keywords.
6. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 5, characterized in that: the info tag has the following two attributes:
data_type attribute: the business function;
oper_id attribute: the operation type.
7. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 5, characterized in that: the entry tag has the following five attributes:
entry attribute: the condition string;
encode_type attribute: the encoding mode;
data_type attribute: the business function;
oper_id attribute: the operation type;
proto_type attribute: the protocol type.
8. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 5, characterized in that: the data tag has a position attribute, which gives the location information; the hash tag has a hash_type attribute and a vtag attribute, where hash_type gives the hash lookup type and vtag gives the identifiers of the fields to store.
9. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 5, characterized in that: the const tag has the following two attributes:
value attribute: the constant value;
tagname attribute: the field identifier.
10. The general extraction method for HTTP protocol elements based on a custom tag language according to claim 5, characterized in that: the kcm tag has the following three attributes:
key attribute: the keyword information;
tagname attribute: the field identifier;
encode attribute: the encoding mode.
CN201810860708.2A 2018-08-01 2018-08-01 General extraction method of HTTP (hyper text transport protocol) protocol elements based on custom tag language Active CN109086064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810860708.2A CN109086064B (en) 2018-08-01 2018-08-01 General extraction method of HTTP (hyper text transport protocol) protocol elements based on custom tag language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810860708.2A CN109086064B (en) 2018-08-01 2018-08-01 General extraction method of HTTP (hyper text transport protocol) protocol elements based on custom tag language

Publications (2)

Publication Number Publication Date
CN109086064A (en) 2018-12-25
CN109086064B CN109086064B (en) 2022-01-14

Family

ID=64831169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810860708.2A Active CN109086064B (en) 2018-08-01 2018-08-01 General extraction method of HTTP (hyper text transport protocol) protocol elements based on custom tag language

Country Status (1)

Country Link
CN (1) CN109086064B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109857958A (en) * 2019-02-13 2019-06-07 杭州孝道科技有限公司 A kind of method that http input point is searched
CN112287177A (en) * 2020-11-25 2021-01-29 城云科技(中国)有限公司 Method and device for creating, changing, displaying and inquiring object label

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101924656A (en) * 2010-08-26 2010-12-22 北京天融信科技有限公司 Method and device for realizing network equipment CLI (Command Line Interface for batch scripti) based on dynamic configuration
CN103491089A (en) * 2013-09-22 2014-01-01 北京锐安科技有限公司 Transcoding method and system of data recovery based on HTTP
CN103544178A (en) * 2012-07-13 2014-01-29 百度在线网络技术(北京)有限公司 Method and equipment for providing reconstruction page corresponding to target page
CN104320454A (en) * 2014-10-23 2015-01-28 北京锐安科技有限公司 Method and system for realizing user-defined output in HTTP protocol recovery
CN105224622A (en) * 2015-09-22 2016-01-06 中国搜索信息科技股份有限公司 The place name address extraction of Internet and standardized method
CN106959944A (en) * 2017-02-14 2017-07-18 中国电子科技集团公司第二十八研究所 A kind of Event Distillation method and system based on Chinese syntax rule

Also Published As

Publication number Publication date
CN109086064B (en) 2022-01-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant