CN105991369B - Message information extraction method and device - Google Patents

Message information extraction method and device

Info

Publication number
CN105991369B
Authority
CN
China
Prior art keywords
message
response
response message
content
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510130058.2A
Other languages
Chinese (zh)
Other versions
CN105991369A (en)
Inventor
王奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou DPTech Technologies Co Ltd
Original Assignee
Hangzhou DPTech Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2015-03-23
Publication date
2020-03-06
Application filed by Hangzhou DPTech Technologies Co Ltd
Priority to CN201510130058.2A
Publication of CN105991369A
Application granted
Publication of CN105991369B
Active
Anticipated expiration

Abstract

The invention provides a message information extraction method and a device, wherein the method comprises the following steps: acquiring a response message returned to the client by the server in response to the request message sent by the client; filtering out response messages which do not include the target information in the response messages; and extracting the target information from the response message which is not filtered. The message information extraction device only analyzes the response message, and filters the response message without the target information before analyzing the response message, so that the message analysis speed can be increased, and the speed of acquiring the network monitoring information by a manager can be increased.

Description

Message information extraction method and device
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for extracting message information.
Background
With the development of industrial technologies, technology research and development organizations increasingly pay more attention to the protection of proprietary intellectual property rights. In order to ensure that the technology independently developed by the organization is not easily leaked, the internet needs to be monitored, so that an administrator can timely know the internet content on the server accessed by the organization user through the client.
However, in the prior art, all the interactive messages between the client and the server are usually analyzed, so the analysis speed is slow, and the speed of acquiring the network monitoring information by the administrator is slow.
Disclosure of Invention
In view of this, the present invention provides a method and an apparatus for extracting message information, so as to solve the problem that the network monitoring information acquisition speed is slow.
According to a first aspect of the embodiments of the present invention, the present invention provides a method for extracting message information, where the method includes:
acquiring a request message sent by a client;
judging whether a suffix identifier of a Uniform Resource Locator (URL) in the request message indicates that the response message returned to the client by the server in response to the request message includes plain text content;
if yes, acquiring a response message returned to the client by the server in response to the request message sent by the client;
filtering out response messages which do not include target information in the response messages;
and extracting the target information from the response message which is not filtered.
According to a second aspect of the embodiments of the present invention, the present invention provides a message information extraction apparatus, including:
the acquiring unit is used for acquiring a request message sent by a client and, when it is judged that a suffix identifier of a Uniform Resource Locator (URL) in the request message indicates that the response message returned to the client by the server in response to the request message includes plain text content, acquiring the response message returned to the client by the server in response to the request message;
a judging unit, configured to judge whether the suffix identifier of the URL in the request message indicates that the response message returned to the client by the server in response to the request message includes plain text content;
the filtering unit is used for filtering response messages which do not include the target information;
and the extracting unit is used for extracting the target information from the response message which is not filtered out.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
in the embodiment of the invention, the message information extraction device can improve the message analysis speed by analyzing only the response message and filtering the response message without the target information before analyzing the response message, thereby improving the speed of acquiring the network monitoring information by the administrator.
Drawings
Fig. 1 is a schematic diagram of an application scenario for implementing message information extraction by applying an embodiment of the present invention;
Fig. 2 is a flowchart of an embodiment of the message information extraction method of the present invention;
Fig. 3 is a flowchart of another embodiment of the message information extraction method of the present invention;
Fig. 4 is a hardware structure diagram of the device in which the message information extraction apparatus of the present invention is located;
Fig. 5 is a block diagram of an embodiment of a message information extraction apparatus according to the present invention.
Detailed Description
In order to make the technical solutions in the embodiments of the present invention better understood and make the above objects, features and advantages of the embodiments of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, a schematic view of an application scenario for implementing message information extraction by applying the embodiment of the present invention is shown. In fig. 1, the client may be a computer, a mobile phone, an iPad, and the like, and the message information extraction device may be a router, a switch, and the like. The user can request to access the resource provided in the server through the client, and the message information extraction device is used for acquiring a request message sent by the client to the server and a response message returned by the server to the client.
In the embodiment of the invention, the message information extraction device first acquires the response message returned to the client by the server in response to the request message sent by the client; it then filters out the response messages that do not include the target information and extracts the target information from the response messages that are not filtered out. Because the device analyzes only response messages, and filters out those without the target information before analyzing them, the message analysis speed is increased, and so is the speed at which the administrator acquires the network monitoring information.
Referring to fig. 2, a flowchart of an embodiment of the message information extraction method of the present invention includes:
step 201, obtaining a response message returned to the client by the server in response to the request message sent by the client.
In the preferred embodiment of the present invention, when a user requests to access a resource in a server through a client, the client sends a request message to the server, and the server returns a response message to the client after receiving the request message. According to research, only when the response message returned to the client by the server includes the plain text content, the message information extraction device can obtain the information required by the administrator to realize network monitoring according to the plain text content in the response message. Whether the response message returned by the server to the client includes the plain text content is determined by the type of the response message (for example, the html-type response message may include the plain text content, and the JavaScript-type response message does not include the plain text content), and according to different response mechanisms of the server, for the same type of response message, the server may return the content in the web page response message in the plain text form or return the content in the web page response message in the non-plain text form. In addition, the request message sent by the client to the server usually includes an identifier for indicating the type of its corresponding response message.
In summary, in this embodiment, the message information extraction apparatus may first obtain the request message sent by the client, and then determine whether the response message returned to the client by the server in response to the request message may include plain text content according to the identifier in the request message, which is used to indicate the type of the response message returned to the client by the server in response to the request message. When the server responds to the request message and the response message returned to the client may include plain text content, the message information extraction device may obtain the response message returned to the client by the server in response to the request message. In this embodiment, the message information extraction device only obtains the response message corresponding to the request message when the identifier in the request message indicates that the corresponding response message may include plain text content.
For example, when a user accesses a web page on a server through a client over HTTP (HyperText Transfer Protocol), the message information extraction device may first acquire the HTTP request message sent by the client. An HTTP request message consists of a start line, a message header, and a message body, and the suffix identifier of the URL (Uniform Resource Locator) in the start line may be used to indicate the type of the response message that the server returns to the client in response to the HTTP request message. After acquiring the HTTP request message, the device may therefore find the URL in the start line and determine whether its suffix identifier is html (HyperText Markup Language). When the suffix identifier is html, the response message returned to the client by the server in response to the request message may include plain text content, and the device may then obtain the HTTP response message returned to the client by the server in response to the HTTP request message.
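The suffix check described in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `url_suffix` and `may_return_plain_text` are hypothetical, the start line is assumed to be available as a decoded string, and the handling of query strings is a choice made here rather than something the description specifies.

```python
from urllib.parse import urlsplit


def url_suffix(request_line: str) -> str:
    """Return the lower-cased suffix of the URL in an HTTP request start line.

    Hypothetical helper: returns "" when the line is malformed or the last
    path segment has no suffix.
    """
    # A start line has the form: METHOD SP request-target SP HTTP-version
    parts = request_line.strip().split()
    if len(parts) != 3:
        return ""
    # urlsplit separates the path from any query string or fragment.
    path = urlsplit(parts[1]).path
    last_segment = path.rsplit("/", 1)[-1]
    if "." not in last_segment:
        return ""
    return last_segment.rsplit(".", 1)[-1].lower()


def may_return_plain_text(request_line: str) -> bool:
    """True when the URL suffix is html, the check used in the example."""
    return url_suffix(request_line) == "html"
```

With this sketch, `may_return_plain_text("GET /news/index.html?id=3 HTTP/1.1")` is true, while a request for `/static/app.js` or for `/` is rejected, so the corresponding responses would not be acquired for analysis.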
It should be noted that: when the message information extraction device is a router or a switch, the message interaction between the client and the server adopts a session flow mode, so that the message information extraction device can easily distinguish the corresponding relation between each request message and each response message, and can accurately acquire the response message which possibly comprises plain text content. When the message information extraction device is not a router or a switch, the message information extraction device can acquire an Internet Protocol (IP) address of the client and an IP address of the server while acquiring a request message sent by the client and a response message sent by the server, so as to determine a corresponding relationship between each request message and each response message according to the IP addresses of the client and the server, and further accurately acquire a response message which may include plain text content.
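One way to picture the IP-address-based correspondence is a small lookup table keyed by the client and server addresses. The sketch below is an assumption-laden illustration, not the patented mechanism: the class name `RequestResponseMatcher` is hypothetical, and it additionally assumes that responses on one client/server pair arrive in the same order as the requests, which the description does not state.

```python
from collections import defaultdict, deque
from typing import Optional


class RequestResponseMatcher:
    """Pairs responses with earlier requests using client and server IP addresses.

    Assumption (not from the source): requests between the same two
    addresses are answered in first-in, first-out order.
    """

    def __init__(self) -> None:
        # (client_ip, server_ip) -> start lines of requests awaiting a response
        self._pending = defaultdict(deque)

    def on_request(self, client_ip: str, server_ip: str, request_line: str) -> None:
        """Record a request observed travelling from client to server."""
        self._pending[(client_ip, server_ip)].append(request_line)

    def on_response(self, server_ip: str, client_ip: str) -> Optional[str]:
        """Return the request this response answers, or None if none is pending."""
        queue = self._pending.get((client_ip, server_ip))
        if not queue:
            return None
        return queue.popleft()
```

The returned start line can then be passed to whatever suffix check the device applies, so that only responses to html requests are kept for further analysis.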
Step 202, filtering out response messages which do not include the target information.
In a preferred embodiment of the present invention, the web page response message returned by the server to the client includes an identifier indicating whether the content in the response message is plain text, so the message information extraction device may determine from this identifier whether the content in the response message is plain text. In this embodiment, the message information extraction device extracts the target information only from response messages that definitely include plain text content, so that the speed of acquiring the network monitoring information by the administrator can be increased. It should be noted that the content in the response message in this embodiment may refer to part or all of the content of the response message.
For example, when a user accesses a web page on a server through a client over HTTP, the Content-Type field in the header of the HTTP response message may be used to indicate whether the content in the HTTP response message is plain text, so the message information extraction device may first determine whether the Content-Type field in the header of the HTTP response message is text/html. When the Content-Type field is text/html, the content in the HTTP response message is plain text and the HTTP response message includes the target information required by the administrator to implement network monitoring, so the device may keep the HTTP response message. When the Content-Type field is not text/html, the content in the HTTP response message is not plain text and the HTTP response message does not include the target information, so the device may filter out the HTTP response message. In this example, the content of the HTTP response message refers to the content included in the message body of the HTTP response message.
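The Content-Type filter in this example might look like the following Python sketch. The names `parse_headers` and `is_html_text` are hypothetical, the response head is assumed to be a decoded string with CRLF line endings, and ignoring parameters such as `charset` is an assumption added here; the description only says the field is compared with text/html.

```python
def parse_headers(raw_head: str) -> dict:
    """Parse the header lines of an HTTP response head into a dict.

    Header names are lower-cased because HTTP field names are
    case-insensitive. The first line (the status line) is skipped.
    """
    headers = {}
    for line in raw_head.split("\r\n")[1:]:
        if not line:  # blank line marks the end of the header section
            break
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def is_html_text(raw_head: str) -> bool:
    """Keep the response only when its Content-Type media type is text/html."""
    content_type = parse_headers(raw_head).get("content-type", "")
    # Assumption: parameters after ";" (for example charset) are ignored.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/html"
```

A response for which `is_html_text` returns false would be filtered out before any attempt to read its body, which is where the saving in analysis time comes from.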
Step 203, extracting the target information from the response message not filtered out.
In the preferred embodiment of the present invention, since the web page title included in the web page response message returned from the server to the client can reflect the internet content that the user is accessing, the message information extraction device can extract the web page title from the response message that is not filtered out, thereby implementing network monitoring.
For example, when a user accesses a web page on a server through a client over HTTP, the title of the web page is usually marked by the title tag in the HTTP message, so the message information extraction device may first search the body of the HTTP response message for the identifier <title>…</title>, and then read the content between <title> and </title>, thereby obtaining the web page title in the HTTP response message.
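Reading the text between the opening and closing title tags can be sketched with a regular expression. The function name `extract_title` is hypothetical, the body is assumed to be already decoded to a string, and decoding character references with `html.unescape` is a convenience added here. A regular expression is a simplification and not a full HTML parser.

```python
import html
import re
from typing import Optional

# Matches the first <title ...> ... </title> pair, case-insensitively and
# across line breaks.
_TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def extract_title(body: str) -> Optional[str]:
    """Return the web page title found in an HTML body, or None if absent."""
    match = _TITLE_RE.search(body)
    if match is None:
        return None
    # Decode references such as &amp; and trim surrounding whitespace.
    return html.unescape(match.group(1)).strip()
```

Stopping at the first match keeps the scan short, since the title normally appears near the start of the document.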
As can be seen from the above embodiments, the message information extraction device in the embodiments of the present invention only analyzes the response message, and filters out the response message that does not include the target information before analyzing the response message, thereby increasing the message analysis speed, and thus increasing the speed of acquiring the network monitoring information by the administrator.
Referring to fig. 3, a flowchart of another embodiment of the message information extraction method of the present invention is shown, where the embodiment takes a message information extraction device as a switch or a router as an example to describe in detail a message information extraction process in the embodiment of the present invention:
Step 301, obtaining an HTTP request message sent by a client to a server.
Step 302, judging whether the suffix identifier of the URL in the HTTP request message is html; if so, executing step 303, otherwise executing step 307.
Step 303, acquiring the HTTP response message returned to the client by the server in response to the HTTP request message, and executing step 304.
Step 304, judging whether the Content-Type field in the message header of the HTTP response message is text/html; if so, executing step 305, otherwise executing step 306.
Step 305, extracting the web page title from the HTTP response message, and executing step 306.
Step 306, forwarding the HTTP response message to the client.
Step 307, forwarding the HTTP request message to the server.
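The branching in steps 301 to 307 can be traced with a short Python sketch that records which steps are visited for a given request and response. Everything here is illustrative: `trace_flow` and `_content_type` are hypothetical names, messages are assumed to be well-formed decoded strings, and forwarding is represented only by the step number rather than by real network I/O.

```python
import re
from typing import List, Optional, Tuple

_TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def _content_type(head: str) -> str:
    """Return the lower-cased media type from a response head, or ""."""
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-type":
            return value.split(";", 1)[0].strip().lower()
    return ""


def trace_flow(request_line: str, response: str) -> Tuple[List[int], Optional[str]]:
    """Return the step numbers visited and the extracted title, if any."""
    steps = [301, 302]
    # Step 302: inspect the URL suffix in the request start line.
    path = request_line.split()[1].split("?", 1)[0]
    if not path.lower().endswith(".html"):
        steps.append(307)  # forward the request; the response is not analyzed
        return steps, None
    steps.append(303)  # acquire the corresponding response
    head, _, body = response.partition("\r\n\r\n")
    steps.append(304)  # check the Content-Type field
    title = None
    if _content_type(head) == "text/html":
        steps.append(305)  # extract the web page title
        match = _TITLE_RE.search(body)
        title = match.group(1).strip() if match else None
    steps.append(306)  # forward the response to the client
    return steps, title
```

Both checks are cheap string comparisons, so most traffic leaves the flow at step 302 or step 304 without its body being scanned.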
Corresponding to the embodiment of the message information extraction method, the invention also provides an embodiment of a message information extraction device.
The embodiment of the message information extraction device can be implemented in software, in hardware, or in a combination of the two. Taking a software implementation as an example, the device, as a logical device, is formed by the processor of the equipment in which it is located reading the corresponding computer program instructions from the nonvolatile memory into the memory and running them. At the hardware level, fig. 4 shows the hardware structure of the equipment in which the message information extraction device of the present invention is located. In addition to the processor, network interface, memory, and nonvolatile memory shown in fig. 4, the equipment may also include other hardware, such as a forwarding chip responsible for processing messages. In terms of hardware structure, the equipment may also be a distributed device that includes multiple interface cards, so that message processing can be extended at the hardware level.
Referring to fig. 5, a block diagram of an embodiment of a message information extraction apparatus according to the present invention is shown, where the apparatus includes:
an obtaining unit 510, configured to obtain a response packet returned to a client by a server in response to a request packet sent by the client;
a filtering unit 520, configured to filter out response packets that do not include the target information from the response packets;
an extracting unit 530, configured to extract the target information from the unfiltered response packet.
In an alternative implementation form of the present invention,
the obtaining unit 510 is further configured to obtain a request packet sent by a client before obtaining a response packet returned to the client by a server in response to the request packet sent by the client;
the apparatus may further include: a determining unit 540, configured to determine, according to an identifier in the request packet, that is used to indicate a type of a response packet returned to the client by the server in response to the request packet, whether the response packet returned to the client by the server in response to the request packet may include plain text content;
the obtaining unit 510 is specifically configured to, when the response message returned to the client by the server in response to the request message may include plain text content, obtain the response message returned to the client by the server in response to the request message.
In a further alternative implementation form of the invention,
the determining unit 540 is specifically configured to determine whether a suffix identifier of a URL in the request packet indicates that a response packet returned to the client by the server in response to the request packet may include plain text content;
the obtaining unit 510 is specifically configured to, if the suffix identifier of the URL in the request message indicates that the response message returned to the client by the server in response to the request message may include plain text content, obtain the response message returned to the client by the server in response to the request message.
In a further alternative implementation form of the invention,
the filtering unit 520 is specifically configured to determine whether the content in the response message is a plain text according to an identifier in the response message, where the identifier is used to indicate whether the content in the response message is a plain text; if yes, indicating that the response message comprises the target information, and not filtering the response message; otherwise, the response message does not include the target information, and the response message is filtered.
In a further alternative implementation form of the invention,
the filtering unit 520 is specifically configured to determine whether a Content-Type field in a header of the response packet indicates that the Content in the response packet is a plain text.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
As can be seen from the above embodiments, the message information extraction device in the embodiments of the present invention only analyzes the response message, and filters out the response message that does not include the target information before analyzing the response message, thereby increasing the message analysis speed, and thus increasing the speed of acquiring the network monitoring information by the administrator.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (6)

1. A message information extraction method is characterized by comprising the following steps:
acquiring a request message sent by a client;
judging whether a suffix identifier of a Uniform Resource Locator (URL) in the request message indicates that the response message returned to the client by the server in response to the request message comprises plain text content;
if yes, acquiring a response message returned to the client by the server in response to the request message;
filtering out response messages which do not include target information in the response messages, wherein the target information includes a webpage title;
and extracting the target information from the response message which is not filtered.
2. The method of claim 1, wherein the filtering out response messages that do not include target information from the response messages comprises:
judging whether the content in the response message is a plain text or not according to an identifier which is used for indicating whether the content in the response message is the plain text or not in the response message;
if yes, indicating that the response message comprises the target information, and not filtering the response message; otherwise, the response message does not include the target information, and the response message is filtered.
3. The method of claim 2, wherein the determining whether the content in the response message is plain text according to an identifier in the response message indicating whether the content in the response message is plain text comprises:
judging whether the Content-Type field in the message header of the response message indicates that the content in the response message is plain text.
4. An apparatus for extracting message information, the apparatus comprising:
the acquiring unit is used for acquiring a request message sent by a client and, when it is judged that a suffix identifier of a Uniform Resource Locator (URL) in the request message indicates that the response message returned to the client by the server in response to the request message comprises plain text content, acquiring the response message returned to the client by the server in response to the request message;
a judging unit, configured to judge whether the suffix identifier of the URL in the request message indicates that the response message returned to the client by the server in response to the request message comprises plain text content;
the filtering unit is used for filtering out response messages which do not include target information in the response messages, and the target information includes a webpage title;
and the extracting unit is used for extracting the target information from the response message which is not filtered out.
5. The apparatus of claim 4,
the filtering unit is specifically configured to determine whether the content in the response message is a plain text according to an identifier in the response message, where the identifier is used to indicate whether the content in the response message is a plain text; if yes, indicating that the response message comprises the target information, and not filtering the response message; otherwise, the response message does not include the target information, and the response message is filtered.
6. The apparatus of claim 4,
the filtering unit is specifically configured to determine whether a Content-Type field in a header of the response packet indicates that the Content in the response packet is a plain text.
CN201510130058.2A 2015-03-23 2015-03-23 Message information extraction method and device Active CN105991369B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510130058.2A CN105991369B (en) 2015-03-23 2015-03-23 Message information extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510130058.2A CN105991369B (en) 2015-03-23 2015-03-23 Message information extraction method and device

Publications (2)

Publication Number Publication Date
CN105991369A CN105991369A (en) 2016-10-05
CN105991369B true CN105991369B (en) 2020-03-06

Family

ID=57040407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510130058.2A Active CN105991369B (en) 2015-03-23 2015-03-23 Message information extraction method and device

Country Status (1)

Country Link
CN (1) CN105991369B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102882703A (en) * 2012-08-31 2013-01-16 赛尔网络有限公司 Hyper text transfer protocol (HTTP)-analysis-based uniform resource locator (URL) automatically classifying and grading system and method
CN103118007A (en) * 2013-01-06 2013-05-22 瑞斯康达科技发展股份有限公司 Method and system of acquiring user access behavior
CN104348642A (en) * 2013-07-31 2015-02-11 华为技术有限公司 A spam information filtering method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
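"Network Information Monitoring System Based on Application Protocol Analysis"; Shi Yi (史轶); China Master's Theses Full-text Database, Information Science and Technology; 2010-05-16; pages 22-25 of the thesis *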

Also Published As

Publication number Publication date
CN105991369A (en) 2016-10-05

Similar Documents

Publication Publication Date Title
CN107341160B (en) Crawler intercepting method and device
US9954886B2 (en) Method and apparatus for detecting website security
US9794242B2 (en) Method, apparatus and application platform for realizing logon to an application service website
RU2610827C2 (en) Method and device for router-based control of operation in network
CN103780714B (en) The detection method of a kind of dns server and device
CN106656666B (en) Method and device for acquiring first screen time of webpage
CN105245550B (en) Domain Hijacking determination method and device
CN108259425A (en) The determining method, apparatus and server of query-attack
KR102090982B1 (en) How to identify malicious websites, devices and computer storage media
CN101159762B (en) Method and device of accelerating download of web page contents
CN105635073B (en) Access control method and device and network access equipment
CN105635064B (en) CSRF attack detection method and device
CN103064979A (en) Router and method for implementing same to process web page data
CN109802919B (en) Web page access intercepting method and device
EP2857987A1 (en) Acquiring method, device and system of user behavior
CN102857369A (en) Website log saving system, method and apparatus
US20150271245A1 (en) Web application interaction method, apparatus, and system
CN103825919A (en) Method, device and system for data resource caching
KR20180083897A (en) Method and apparatus for obtaining IP address
US10225358B2 (en) Page push method, device, server and system
US20130268662A1 (en) Hypertext transfer protocol http stream association method and device
CN105991369B (en) Message information extraction method and device
CN111193771A (en) Mobile-end enterprise browser-based access method and device
CN107483294B (en) Method and device for monitoring network request
CN113395278B (en) Method and system for detecting data packet grabbing of Burpesite packet grabbing tool

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Binjiang District and Hangzhou city in Zhejiang Province Road 310051 No. 68 in the 6 storey building

Applicant after: Hangzhou Dipu Polytron Technologies Inc

Address before: Binjiang District and Hangzhou city in Zhejiang Province Road 310051 No. 68 in the 6 storey building

Applicant before: Hangzhou Dipu Technology Co., Ltd.

GR01 Patent grant