CN106777362A - A kind of information collecting method of the html pages - Google Patents
A kind of information collecting method of the html pages
- Publication number
- CN106777362A
- Authority
- CN
- China
- Prior art keywords
- plug
- information
- units
- collecting method
- html pages
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
- G06F16/986—Document structures and storage, e.g. HTML extensions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses an information collecting method for HTML pages, comprising the following steps: S1. using a BHO plug-in listening mode or an NPAPI plug-in listening mode, listening for web form submission events and obtaining form element information in the event handler; S2. encrypting, by the BHO plug-in or the NPAPI plug-in, the element information obtained; S3. transmitting the encrypted information to the server. The method only requires installing a plug-in on the client, with no change to the website's server side, to obtain form information and send it to a server for statistics. The cost of information extraction is low, and the speed and accuracy of data entry are greatly improved.
Description
Technical field
The invention belongs to the field of information and data processing, and in particular relates to an information collecting method for HTML pages.
Background art
As informatization deepens, enterprises' demand for information integration grows ever stronger. The continuously growing information resources of the Internet contain a vast amount of commercially valuable information and have become an important information source. At present, products for collecting information from HTML pages are few in number. Such products place high demands on the user's basic information infrastructure, take a long time to implement, and are expensive to build and maintain; their main customers are very large enterprises and governments, and ordinary enterprises cannot afford them.
At present, under the client/server (C/S) model, the server can directly obtain the forms submitted by the client. For third-party application development, however, the server cannot be modified, so statistics on HTML forms cannot be gathered directly.
Summary of the invention
To solve the above problems, the object of the invention is to provide a method that collects information from HTML pages without changing the server.
To achieve the above object, the technical solution of the invention is:
An information collecting method for HTML pages, comprising the following steps:
S1. Using a BHO plug-in listening mode or an NPAPI plug-in listening mode, listen for web form submission events, and obtain form element information in the event handler;
S2. The BHO plug-in or the NPAPI plug-in encrypts the element information obtained;
S3. The encrypted information is transmitted to the server.
Further, in S1, the BHO plug-in listening mode is used for browsers with the IE kernel, and the NPAPI plug-in listening mode is used for browsers with a non-IE kernel.
Further, the BHO plug-in listening mode comprises the following steps:
S111. The BHO plug-in listens for the HTML document load event of the browser;
S112. In the HTML document load event, the element that triggered the mouse click event is obtained;
S113. The element information is obtained through the COM interface of the element;
S114. The element information obtained is stored in the BHO plug-in, ready to be sent to the server.
Preferably, in S112, if the mouse click event is triggered by a form submit button that corresponds to multiple elements, the elements needed from the current HTML document are selected.
Further, the NPAPI plug-in listening mode comprises the following steps:
S121. The browser extension references the NPAPI plug-in, starts an instance of a class in the NPAPI plug-in, and injects and executes a JavaScript script;
S122. While the JavaScript script executes, it listens for the click event of the form submit button;
S123. In the click event handler of the JavaScript script, the information of the element that triggered the mouse click event is obtained;
S124. The JavaScript script passes the element information to the NPAPI plug-in, ready to be sent to the server.
Further, in S2, the encryption is as follows:
The BHO plug-in or the NPAPI plug-in formats the information, encrypts the sensitive fields in it, and signs the formatted message; the server verifies the signature.
Preferably, the information is formatted as JSON.
Preferably, sensitive fields are encrypted with an RSA public key, and the server decrypts them with the private key; the message is signed with an RSA private key, and the server verifies the signature with the RSA public key.
Preferably, in S3, the transport protocol is HTTP.
The beneficial effects of the invention are:
(1) The method only requires installing a plug-in on the client, with no change to the website's server side, to obtain form information and send it to a server for statistics. The cost of information extraction is low, and the speed and accuracy of data entry are greatly improved.
(2) The method is website-independent: for any HTML page from which information needs to be collected, the method can be used to collect the information directly.
Brief description of the drawings
Fig. 1 is a flow chart of HTML information collection in an embodiment of the invention;
Fig. 2 is a flow chart of the BHO plug-in listening mode in an embodiment of the invention;
Fig. 3 is a flow chart of the NPAPI plug-in listening mode in an embodiment of the invention.
Detailed description of the embodiments
To make the object, technical solution and advantages of the invention clearer, the invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
On the contrary, the invention covers any alternative, modification, equivalent method and scheme made within the spirit and scope of the invention as defined by the claims. Further, to give the public a better understanding of the invention, some specific details are described in detail below. A person skilled in the art can fully understand the invention even without the description of these details.
The flow of the HTML information collecting method of the embodiment of the invention is shown in Fig. 1.
An information collecting method for HTML pages, comprising the following steps:
S1. Using a BHO plug-in listening mode or an NPAPI plug-in listening mode, listen for web form submission events, and obtain form element information in the event handler;
S2. The BHO plug-in or the NPAPI plug-in encrypts the element information obtained;
S3. The encrypted information is transmitted to the server.
In S1, the BHO plug-in listening mode is used for browsers with the IE kernel, and the NPAPI plug-in listening mode is used for browsers with a non-IE kernel.
The BHO plug-in is a plug-in for the IE browser. A BHO plug-in is a COM component that implements the IObjectWithSite interface. Depending on whether the browser is 32-bit or 64-bit, the GUID of the BHO plug-in must be registered under the registry key HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Browser Helper Objects.
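As a rough illustration of why the browser's bitness matters for registration, the sketch below only builds the registry path as a string and does not touch the registry. It assumes the standard Windows behaviour that 32-bit programs on 64-bit Windows see HKLM\SOFTWARE redirected under Wow6432Node; the function name, its parameters and the sample GUID are hypothetical and are not taken from the patent.

```python
# Subkey named in the description, relative to HKLM\SOFTWARE.
BHO_SUBKEY = r"Microsoft\Windows\CurrentVersion\Explorer\Browser Helper Objects"


def bho_registry_key(guid, browser_is_64bit, os_is_64bit=True):
    """Return the registry path under which a BHO GUID would be registered.

    Assumption: a 32-bit browser on 64-bit Windows reads the redirected
    Wow6432Node view of HKLM\\SOFTWARE; every other combination reads the
    plain view.
    """
    parts = ["HKLM", "SOFTWARE"]
    if os_is_64bit and not browser_is_64bit:
        parts.append("Wow6432Node")
    parts.extend([BHO_SUBKEY, guid])
    return "\\".join(parts)


if __name__ == "__main__":
    sample_guid = "{00000000-0000-0000-0000-000000000000}"  # placeholder GUID
    print(bho_registry_key(sample_guid, browser_is_64bit=True))
    print(bho_registry_key(sample_guid, browser_is_64bit=False))
```

An installer following this idea would write the GUID under both paths when it ships both a 32-bit and a 64-bit build of the plug-in.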
The BHO plug-in listening mode comprises the following steps:
S111. The BHO plug-in listens for the HTML document load event of the browser; in practice this is the click event of a submit button, with event id DISPID_HTMLDOCUMENTEVENTS2_ONCLICK.
S112. In the HTML document load event, the element that triggered the mouse click event is obtained; if the mouse click event is triggered by a form submit button that corresponds to multiple elements, the elements needed from the current HTML document are selected.
S113. The element information is obtained through the COM interface of the element: the get_all function of IHTMLDocument2 yields all the element interfaces (IHTMLElement), and the getAttribute method of that interface yields the element content, that is, the element information.
S114. The element information obtained is stored in the BHO plug-in, ready to be sent to the server.
NPAPI is the Netscape Plugin Application Programming Interface, a set of simple C plugin application programming interfaces that follow the standard established by Netscape Communications Corporation, aimed mainly at non-IE browsers. At the end of 2004, the browser vendors (IE, Opera, Mozilla and others) all agreed to support the NPRuntime extension API (application programming interface) in order to support scriptability, so a plugin currently has to be based on the NPRuntime API to work across browsers.
When an extension containing the NPAPI plug-in is installed in the browser, the browser, after starting, loads the NPAPI plug-in from the default plugins folder, reads the MimeType attribute of the plug-in, and stores it internally. During plug-in initialization, the browser passes its own interfaces to the NPAPI plug-in through the NP_Initialize interface, and the NPAPI plug-in passes its interfaces to the browser through the NP_GetEntryPoints interface, so that each side can call the other.
The NPAPI plug-in listening mode comprises the following steps:
S121. The browser extension references the NPAPI plug-in in its background HTML and starts an instance of a class in the NPAPI plug-in; when the target website matches the matches definition in the browser extension, a JavaScript script is injected and executed;
S122. While the JavaScript script executes, it uses the addListener function to listen for the click event of the form submit button;
S123. In the click event handler of the JavaScript script, the information of the element that triggered the mouse click event is obtained;
S124. The JavaScript script passes the element information to the NPAPI plug-in, ready to be sent to the server.
In S2, the encryption is as follows:
The BHO plug-in or the NPAPI plug-in formats the obtained information as JSON and encrypts the sensitive fields in it with an RSA public key, which the server issues at system initialization. The server decrypts with the private key, which ensures that the data cannot be read by others. The finished JSON message is signed with an RSA private key, and the server verifies the signature with the RSA public key, which ensures that the data cannot be tampered with.
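The encrypt-then-sign flow of S2 can be sketched as follows. This is a toy illustration, not the patent's implementation: it uses unpadded textbook RSA with small keys built from known Mersenne primes, which is insecure and only meant to show the data flow. The two key pairs (one for the server, one for the client), the field names, and every function name are assumptions; the source does not say which private key signs the message or how keys are sized. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import hashlib
import json

E = 65537  # commonly used public exponent


def make_key(p, q):
    """Return (n, e, d) for textbook RSA from two primes (toy sizes only)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # private exponent
    return n, E, d


# Hypothetical key pairs; real deployments need large random primes and padding.
SERVER_N, SERVER_E, SERVER_D = make_key(2**61 - 1, 2**89 - 1)
CLIENT_N, CLIENT_E, CLIENT_D = make_key(2**107 - 1, 2**127 - 1)


def encrypt_field(value):
    """Client side: encrypt one sensitive field with the server's public key."""
    m = int.from_bytes(value.encode("utf-8"), "big")
    if m >= SERVER_N:
        raise ValueError("field too long for the toy key")
    return format(pow(m, SERVER_E, SERVER_N), "x")


def decrypt_field(hex_cipher):
    """Server side: decrypt one field with the server's private key."""
    m = pow(int(hex_cipher, 16), SERVER_D, SERVER_N)
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode("utf-8")


def digest(body):
    """SHA-256 of the JSON text, reduced to fit the toy signing modulus."""
    raw = hashlib.sha256(body.encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % CLIENT_N


def build_message(fields, sensitive):
    """Format fields as JSON, encrypt the sensitive ones, sign the result."""
    payload = {k: encrypt_field(v) if k in sensitive else v
               for k, v in fields.items()}
    body = json.dumps(payload, sort_keys=True)
    signature = format(pow(digest(body), CLIENT_D, CLIENT_N), "x")
    return {"body": body, "signature": signature}


def verify_message(message):
    """Server side: check the signature against the received JSON text."""
    recovered = pow(int(message["signature"], 16), CLIENT_E, CLIENT_N)
    return recovered == digest(message["body"])
```

In this sketch the signature covers the JSON text after the sensitive fields have been encrypted, so the server can reject a modified message before decrypting anything.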
In addition, in S3, the transport protocol is HTTP.
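A minimal sketch of the S3 transfer, assuming the message is posted as JSON over HTTP using Python's standard library. The endpoint URL, the message keys used in the usage lines, and the function names are hypothetical; the source only states that HTTP is the transport protocol.

```python
import json
import urllib.request

# Hypothetical collection endpoint; the source does not name one.
COLLECT_URL = "http://collector.example/forms"


def build_request(message, url=COLLECT_URL):
    """Wrap an already encrypted and signed message in an HTTP POST request."""
    data = json.dumps(message).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the HTTP status code (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    req = build_request({"body": "{}", "signature": "ab"})
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the formatting step easy to check without a live server.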
The foregoing is only a preferred embodiment of the invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (10)
1. An information collecting method for HTML pages, characterized by comprising the following steps:
S1. Using a BHO plug-in listening mode or an NPAPI plug-in listening mode, listen for web form submission events, and obtain form element information in the event handler;
S2. The BHO plug-in or the NPAPI plug-in encrypts the element information obtained;
S3. The encrypted information is transmitted to the server.
2. The information collecting method for HTML pages as claimed in claim 1, characterized in that, in S1, the BHO plug-in listening mode is used for browsers with the IE kernel.
3. The information collecting method for HTML pages as claimed in claim 1, characterized in that, in S1, the NPAPI plug-in listening mode is used for browsers with a non-IE kernel.
4. The information collecting method for HTML pages as claimed in claim 2, characterized in that the BHO plug-in listening mode comprises the following steps:
S111. The BHO plug-in listens for the HTML document load event of the browser;
S112. In the HTML document load event, the element that triggered the mouse click event is obtained;
S113. The element information is obtained through the COM interface of the element;
S114. The element information obtained is stored in the BHO plug-in, ready to be sent to the server.
5. The information collecting method for HTML pages as claimed in claim 4, characterized in that, in S112, if the mouse click event is triggered by a form submit button that corresponds to multiple elements, the elements needed from the current HTML document are selected.
6. The information collecting method for HTML pages as claimed in claim 3, characterized in that the NPAPI plug-in listening mode comprises the following steps:
S121. The browser extension references the NPAPI plug-in, starts an instance of a class in the NPAPI plug-in, and injects and executes a JavaScript script;
S122. While the JavaScript script executes, it listens for the click event of the form submit button;
S123. In the click event handler of the JavaScript script, the information of the element that triggered the mouse click event is obtained;
S124. The JavaScript script passes the element information to the NPAPI plug-in, ready to be sent to the server.
7. The information collecting method for HTML pages as claimed in claim 1, characterized in that, in S2, the encryption is as follows:
The BHO plug-in or the NPAPI plug-in formats the information, encrypts the sensitive fields in it, and signs the formatted message; the server verifies the signature.
8. The information collecting method for HTML pages as claimed in claim 7, characterized in that the information is formatted as JSON.
9. The information collecting method for HTML pages as claimed in claim 7, characterized in that sensitive fields are encrypted with an RSA public key, and the server decrypts them with the private key; the message is signed with an RSA private key, and the server verifies the signature with the RSA public key.
10. The information collecting method for HTML pages as claimed in claim 1, characterized in that, in S3, the transport protocol is HTTP.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710043553.9A CN106777362A (en) | 2017-01-19 | 2017-01-19 | A kind of information collecting method of the html pages |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710043553.9A CN106777362A (en) | 2017-01-19 | 2017-01-19 | A kind of information collecting method of the html pages |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106777362A true CN106777362A (en) | 2017-05-31 |
Family
ID=58943773
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710043553.9A Pending CN106777362A (en) | 2017-01-19 | 2017-01-19 | A kind of information collecting method of the html pages |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106777362A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102789561A (en) * | 2012-06-29 | 2012-11-21 | 奇智软件(北京)有限公司 | Method and device for utilizing camera in browser |
CN104023013A (en) * | 2014-05-30 | 2014-09-03 | 上海帝联信息科技股份有限公司 | Data transmission method, server side and client |
CN104750471A (en) * | 2013-12-30 | 2015-07-01 | 上海格尔软件股份有限公司 | WEB page performance detection and analysis plug-in and method based on browser |
CN105426549A (en) * | 2015-12-29 | 2016-03-23 | 北京金山安全软件有限公司 | Method and device for reading webpage resources and electronic equipment |
CN106250437A (en) * | 2016-07-27 | 2016-12-21 | 长沙麦斯森信息科技有限公司 | A kind of electronic monitoring front end data acquisition method and system |
-
2017
- 2017-01-19 CN CN201710043553.9A patent/CN106777362A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102789561A (en) * | 2012-06-29 | 2012-11-21 | 奇智软件(北京)有限公司 | Method and device for utilizing camera in browser |
CN104750471A (en) * | 2013-12-30 | 2015-07-01 | 上海格尔软件股份有限公司 | WEB page performance detection and analysis plug-in and method based on browser |
CN104023013A (en) * | 2014-05-30 | 2014-09-03 | 上海帝联信息科技股份有限公司 | Data transmission method, server side and client |
CN105426549A (en) * | 2015-12-29 | 2016-03-23 | 北京金山安全软件有限公司 | Method and device for reading webpage resources and electronic equipment |
CN106250437A (en) * | 2016-07-27 | 2016-12-21 | 长沙麦斯森信息科技有限公司 | A kind of electronic monitoring front end data acquisition method and system |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239546A (en) * | 2017-06-05 | 2017-10-10 | 成都知道创宇信息技术有限公司 | A kind of method of webpage local content tracking with reminding |
CN108415804A (en) * | 2018-01-23 | 2018-08-17 | 平安普惠企业管理有限公司 | Obtain method, terminal device and the computer readable storage medium of information |
CN108415804B (en) * | 2018-01-23 | 2021-06-04 | 平安普惠企业管理有限公司 | Method for acquiring information, terminal device and computer readable storage medium |
CN108681605A (en) * | 2018-05-24 | 2018-10-19 | 四川物联亿达科技有限公司 | A kind of file data acquisition method based on e-government Intranet |
CN108540501A (en) * | 2018-07-18 | 2018-09-14 | 郑州云海信息技术有限公司 | A kind of method and apparatus of asymmetric cryptosystem |
CN108540501B (en) * | 2018-07-18 | 2021-07-27 | 郑州云海信息技术有限公司 | Asymmetric encryption method and device |
CN110955531A (en) * | 2018-09-27 | 2020-04-03 | 长沙博为软件技术股份有限公司 | Method for realizing communication among multiple tabs based on browser-Based (BHO) technology |
CN110119634A (en) * | 2018-11-28 | 2019-08-13 | 熵加网络科技(北京)有限公司 | A method of with browser plug-in to text encryption and decryption |
CN110083755A (en) * | 2019-04-29 | 2019-08-02 | 北京脉冲星科技有限公司 | A kind of high emulation parsing web-page approach, device and electronic equipment |
CN111741030A (en) * | 2020-08-26 | 2020-10-02 | 北京赛宁网安科技有限公司 | Website security detection system and method combining Web automation and agent interception |
CN111741030B (en) * | 2020-08-26 | 2020-12-04 | 北京赛宁网安科技有限公司 | Website security detection system and method combining Web automation and agent interception |
CN113343159A (en) * | 2021-08-06 | 2021-09-03 | 万商云集(成都)科技股份有限公司 | Method and system for rapidly acquiring data from any channel, analyzing and storing data |
CN114676330A (en) * | 2022-03-30 | 2022-06-28 | 南京厚建软件有限责任公司 | Method for uniformly recovering interactive data of Internet platform |
CN114676330B (en) * | 2022-03-30 | 2023-12-08 | 南京厚建软件有限责任公司 | Method for uniformly recovering interactive data of Internet platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170531 |