CN106777362A - An information collection method for HTML pages

Info

Publication number: CN106777362A
Authority: CN (China)
Prior art keywords: plug, information, units, collecting method, html pages
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201710043553.9A
Other languages: Chinese (zh)
Inventor: 杨伟丽
Current Assignee: Hangzhou Yun Ling Science And Technology Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Hangzhou Yun Ling Science And Technology Ltd
Priority date: 2017-01-19 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2017-01-19
Publication date: 2017-05-31
Application filed by Hangzhou Yun Ling Science And Technology Ltd
Priority to CN201710043553.9A
Publication of CN106777362A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses an information collection method for HTML pages, comprising the following steps: S1. using a BHO plug-in listening mode or an NPAPI plug-in listening mode, listen for web form submission events and obtain the form element information in the event handler; S2. the BHO plug-in or NPAPI plug-in encrypts the obtained element information; S3. the encrypted information is transmitted to the server. The method only requires installing a plug-in on the client, with no changes to the website's server side, to obtain form information and send it to a server for statistics. Information extraction is low-cost, and the speed and accuracy of data entry are greatly improved.

Description

An information collection method for HTML pages
Technical field
The invention belongs to the field of information and data processing, and in particular relates to an information collection method for HTML pages.
Background
As informatization deepens, enterprises' demand for information integration grows ever stronger. The continuously growing information resources carried by the internet contain an enormous amount of commercially valuable information and have become an important information source. At present, there are few products for collecting information from HTML pages, and those products place high demands on the user's basic information infrastructure, take a long time to deploy, and are costly to build and maintain. Their main customers are very large enterprises and governments; ordinary enterprises cannot afford them.
In the current client/server (C/S) model, the server can directly obtain the forms submitted from the client. For third-party application development, however, the server cannot be modified, so HTML form statistics cannot be collected directly.
Summary of the invention
To solve the above problems, the object of the invention is to provide a method that collects HTML page information without any modification to the server.
To achieve this object, the technical solution of the invention is as follows:
An information collection method for HTML pages, comprising the following steps:
S1. Using a BHO plug-in listening mode or an NPAPI plug-in listening mode, listen for web form submission events and obtain the form element information in the event handler;
S2. The BHO plug-in or NPAPI plug-in encrypts the obtained element information;
S3. The encrypted information is transmitted to the server.
Further, in S1, the BHO plug-in listening mode is used for browsers with the IE kernel, and the NPAPI plug-in listening mode is used for browsers with a non-IE kernel.
Further, the BHO plug-in listening mode comprises the following steps:
S111. The BHO plug-in listens for the browser's HTML document loading event;
S112. In the HTML document loading event, the element that triggered the mouse click event is obtained;
S113. The element information is obtained through the element's COM interface;
S114. The obtained element information is stored in the BHO plug-in, ready to be sent to the server.
Preferably, in S112, if the mouse click event was triggered by a form submit button that corresponds to multiple elements, the corresponding elements required by the current HTML document are selected.
Further, the NPAPI plug-in listening mode comprises the following steps:
S121. The browser extension references the NPAPI plug-in, starts an instance of a class in the NPAPI plug-in, and injects and executes a JavaScript script;
S122. While the JavaScript script executes, it listens for the click event of the form submit button;
S123. In the click event handler of the JavaScript script, the information of the element that triggered the mouse click event is obtained;
S124. The JavaScript script passes the element information to the NPAPI plug-in, ready to be sent to the server.
Further, in S2, the encryption is as follows:
The BHO plug-in or NPAPI plug-in formats the information, encrypts the sensitive fields in it, and signs the formatted message; the server verifies the signature.
Preferably, the information is formatted as JSON.
Preferably, sensitive fields are encrypted with an RSA public key and decrypted by the server with the private key; the message is signed with an RSA private key, and the server verifies the signature with the RSA public key.
Preferably, in S3, the transport protocol is HTTP.
The beneficial effects of the invention are:
(1) The method only requires installing a plug-in on the client, with no changes to the website's server side, to obtain form information and send it to a server for statistics. Information extraction is low-cost, and the speed and accuracy of data entry are greatly improved.
(2) The method is website-independent: for any HTML page from which information needs to be collected, the method can be used to collect it directly.
Brief description of the drawings
Fig. 1 is a flowchart of HTML information collection in an embodiment of the invention;
Fig. 2 is a flowchart of the BHO plug-in listening mode in an embodiment of the invention;
Fig. 3 is a flowchart of the NPAPI plug-in listening mode in an embodiment of the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
On the contrary, the invention covers any alternatives, modifications, and equivalent methods and schemes made within the spirit and scope of the invention as defined by the claims. Further, to give the public a better understanding of the invention, some specific details are described at length in the detailed description below. Those skilled in the art can fully understand the invention even without the description of these details.
The flow of the HTML information collection method of this embodiment is shown in Fig. 1.
An information collection method for HTML pages, comprising the following steps:
S1. Using a BHO plug-in listening mode or an NPAPI plug-in listening mode, listen for web form submission events and obtain the form element information in the event handler;
S2. The BHO plug-in or NPAPI plug-in encrypts the obtained element information;
S3. The encrypted information is transmitted to the server.
In S1, the BHO plug-in listening mode is used for browsers with the IE kernel, and the NPAPI plug-in listening mode is used for browsers with a non-IE kernel.
Here, a BHO (Browser Helper Object) plug-in is a plug-in for the IE browser. It is a COM component that implements the IObjectWithSite interface. Depending on whether the browser is 32-bit or 64-bit, the GUID of the BHO plug-in must be registered under the registry key HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Browser Helper Objects.
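The dependence of the registration location on browser bitness can be sketched as follows. This is a minimal illustration, not the patent's registration code: it only builds the key path as a string, the function name and the all-zero CLSID are hypothetical, and it assumes the usual Windows behaviour in which a 32-bit process on 64-bit Windows sees HKLM\SOFTWARE redirected to Wow6432Node.

```python
BHO_SUBKEY = r"Microsoft\Windows\CurrentVersion\Explorer\Browser Helper Objects"


def bho_registry_key(clsid, browser_bits, os_bits=64):
    """Return the HKLM key path under which a BHO with this CLSID is registered.

    Hypothetical helper: it only computes the path and does not touch the registry.
    """
    if browser_bits not in (32, 64) or os_bits not in (32, 64):
        raise ValueError("bitness must be 32 or 64")
    if browser_bits > os_bits:
        raise ValueError("a 64-bit browser cannot run on a 32-bit system")
    # A 32-bit browser on 64-bit Windows reads the redirected Wow6432Node view.
    if os_bits == 64 and browser_bits == 32:
        software = r"SOFTWARE\Wow6432Node"
    else:
        software = "SOFTWARE"
    return "\\".join(["HKLM", software, BHO_SUBKEY, clsid])


# Placeholder CLSID, used only for illustration.
EXAMPLE_CLSID = "{00000000-0000-0000-0000-000000000000}"
print(bho_registry_key(EXAMPLE_CLSID, browser_bits=32))
print(bho_registry_key(EXAMPLE_CLSID, browser_bits=64))
```

An installer that has to serve both 32-bit and 64-bit IE on the same machine would therefore typically register the plug-in in both views.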
The BHO plug-in listening mode comprises the following steps:
S111. The BHO plug-in listens for the browser's HTML document loading event. In practice this is the click event of a submit button, with event ID DISPID_HTMLDOCUMENTEVENTS2_ONCLICK.
S112. In the HTML document loading event, the element that triggered the mouse click event is obtained. If the mouse click event was triggered by a form submit button that corresponds to multiple elements, the corresponding elements required by the current HTML document are selected.
S113. The element information is obtained through the element's COM interface. Calling the get_all function of IHTMLDocument2 yields the IHTMLElement interfaces of all elements; calling the getAttribute method of that interface yields the element content, i.e. the element information.
S114. The obtained element information is stored in the BHO plug-in, ready to be sent to the server.
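The selection logic of S112 and S113 can be modelled outside the browser. The sketch below is an analogy only: a real BHO would walk the live document through COM interfaces, whereas this code parses static markup with Python's standard html.parser. The class name, the function name, and the idea of identifying the clicked button by its id attribute are assumptions made for illustration.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect name/value pairs of the form that contains a given submit button."""

    def __init__(self, submit_id):
        super().__init__()
        self.submit_id = submit_id    # id of the clicked submit button (assumed)
        self.current = None           # fields of the form currently being parsed
        self.contains_submit = False  # whether that form holds the clicked button
        self.result = None            # fields of the matching form, once found

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.current, self.contains_submit = {}, False
        elif tag in ("input", "button") and self.current is not None:
            if attributes.get("type") == "submit":
                if attributes.get("id") == self.submit_id:
                    self.contains_submit = True
            elif attributes.get("name"):
                # Analogue of reading an attribute from each element in S113.
                self.current[attributes["name"]] = attributes.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self.current is not None:
            if self.contains_submit and self.result is None:
                self.result = self.current
            self.current = None


def collect_form_fields(html, submit_id):
    """Return the fields of the form whose submit button was clicked, or None."""
    parser = FormCollector(submit_id)
    parser.feed(html)
    parser.close()
    return parser.result


page = (
    '<form><input name="q" value="x"><input type="submit" id="go1"></form>'
    '<form><input name="user" value="alice">'
    '<input name="password" type="password" value="s3cret">'
    '<input type="submit" id="go2"></form>'
)
print(collect_form_fields(page, "go2"))
```

When a page holds several forms, only the fields of the form owning the clicked button are kept, which mirrors the choice described in S112.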
Here, NPAPI is the Netscape Plugin Application Programming Interface, a simple set of C plug-in application interfaces following the standard established by Netscape Communications Corporation, aimed mainly at non-IE browsers. At the end of 2004, the browser vendors (IE, Opera, Mozilla, etc.) agreed to support the NPRuntime extension API (application programming interface) in order to support scriptability, so a plug-in must now be based on the NPRuntime API to work across browsers.
An extension containing the NPAPI plug-in is installed in the browser. After the browser starts, it loads the NPAPI plug-in from the default plugins folder, reads the plug-in's MimeType attribute, and stores it inside the browser. During plug-in initialization, the browser passes its own interfaces to the NPAPI plug-in through the NP_Initialize interface, and the NPAPI plug-in passes its own interfaces to the browser through the NP_GetEntryPoints interface, so that each side can call the other.
The NPAPI plug-in listening mode comprises the following steps:
S121. The browser extension references the NPAPI plug-in in its background HTML page and starts an instance of a class in the NPAPI plug-in; when the target website matches the matches definition in the browser extension, a JavaScript script is injected and executed;
S122. While the JavaScript script executes, it listens for the click event of the form submit button using the addListener function;
S123. In the click event handler of the JavaScript script, the information of the element that triggered the mouse click event is obtained;
S124. The JavaScript script passes the element information to the NPAPI plug-in, ready to be sent to the server.
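The injection condition in S121 depends on whether the page URL matches a pattern declared by the extension. The following is a rough sketch of such a check, assuming patterns of the form scheme://host/path with "*" wildcards. It is a simplified approximation written for illustration, not the matching algorithm of any particular browser, and the function names and example patterns are hypothetical.

```python
import re
from urllib.parse import urlsplit


def matches_pattern(pattern, url):
    """Simplified check of a URL against an extension-style match pattern."""
    scheme_pat, rest = pattern.split("://", 1)
    host_pat, _, path_pat = rest.partition("/")
    parts = urlsplit(url)
    host = parts.hostname or ""

    # "*" as the scheme is treated here as "http or https".
    if scheme_pat == "*":
        if parts.scheme not in ("http", "https"):
            return False
    elif parts.scheme != scheme_pat:
        return False

    # "*" matches any host; "*.example.com" matches the domain and its subdomains.
    if host_pat.startswith("*."):
        base = host_pat[2:]
        if host != base and not host.endswith("." + base):
            return False
    elif host_pat != "*" and host != host_pat:
        return False

    # In the path, "*" matches any run of characters.
    pieces = ("/" + path_pat).split("*")
    path_regex = ".*".join(re.escape(piece) for piece in pieces)
    return re.fullmatch(path_regex, parts.path or "/") is not None


def should_inject(url, patterns):
    """Decide whether the listening script would be injected into this page."""
    return any(matches_pattern(pattern, url) for pattern in patterns)


PATTERNS = ["*://*.shop.example/checkout/*", "https://forms.example/*"]
print(should_inject("https://www.shop.example/checkout/step1", PATTERNS))
print(should_inject("http://forms.example/a", PATTERNS))
```

Restricting injection to declared patterns keeps the listening script off pages whose forms are not of interest.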
In S2, the encryption is as follows:
The BHO plug-in or NPAPI plug-in formats the obtained information as JSON and encrypts the sensitive fields in it with an RSA public key, which the server issues at system initialization. The server decrypts with the private key, ensuring that the data cannot be read by others. The completed JSON message is signed with an RSA private key, and the server verifies the signature with the RSA public key, ensuring that the data cannot be tampered with.
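The data flow of this step (format as JSON, encrypt sensitive fields, sign the message, verify on the server) can be illustrated with a deliberately simplified sketch. Everything here is an assumption made for illustration: the envelope layout with "body" and "signature" keys, the function names, the use of two separate key pairs, and above all the toy textbook RSA (small known primes, no padding), which is insecure and stands in for a vetted cryptographic library.

```python
import hashlib
import json


def make_keypair(p, q, e=65537):
    """Return ((n, e), (n, d)) built from two primes. Toy key sizes, insecure."""
    n, phi = p * q, (p - 1) * (q - 1)
    return (n, e), (n, pow(e, -1, phi))


def rsa_apply(key, value):
    """Modular exponentiation shared by encrypt, decrypt, sign, and verify."""
    n, exponent = key
    return pow(value, exponent, n)


def encrypt_field(public_key, text):
    m = int.from_bytes(text.encode("utf-8"), "big")
    if m >= public_key[0]:
        raise ValueError("field too long for this toy key")
    return format(rsa_apply(public_key, m), "x")


def decrypt_field(private_key, hex_cipher):
    m = rsa_apply(private_key, int(hex_cipher, 16))
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode("utf-8")


def digest_int(body, n):
    """SHA-256 of the body, reduced modulo n so the toy key can sign it."""
    return int.from_bytes(hashlib.sha256(body.encode("utf-8")).digest(), "big") % n


def build_message(fields, sensitive, server_public, client_private):
    """Client side: encrypt sensitive fields, serialize as JSON, then sign."""
    payload = {
        name: encrypt_field(server_public, value) if name in sensitive else value
        for name, value in fields.items()
    }
    body = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    signature = rsa_apply(client_private, digest_int(body, client_private[0]))
    return json.dumps({"body": body, "signature": format(signature, "x")})


def verify_message(message, client_public):
    """Server side: recompute the digest and compare it with the signature."""
    envelope = json.loads(message)
    expected = digest_int(envelope["body"], client_public[0])
    return rsa_apply(client_public, int(envelope["signature"], 16)) == expected


# Mersenne primes serve as small, well-known primes for the demonstration keys.
server_public, server_private = make_keypair(2**107 - 1, 2**127 - 1)
client_public, client_private = make_keypair(2**61 - 1, 2**89 - 1)
message = build_message(
    {"user": "alice", "password": "s3cret"}, {"password"}, server_public, client_private
)
print(verify_message(message, client_public))
```

Signing the serialized body rather than the individual fields means any change to the transmitted JSON, including to the unencrypted fields, invalidates the signature.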
In addition, in S3, the transport protocol is HTTP.
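As a small illustration of the transport step, the sketch below wraps an already prepared JSON message in an HTTP POST request using Python's standard urllib. The collector URL, the function name, and the choice of POST with a JSON content type are assumptions; the text only states that HTTP is used. The request is built but not sent, so the example has no network side effects.

```python
import json
import urllib.request


def build_upload_request(url, message):
    """Wrap an already encrypted and signed JSON message in an HTTP POST request."""
    return urllib.request.Request(
        url,
        data=message.encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Hypothetical collector endpoint and placeholder message, for illustration only.
request = build_upload_request(
    "http://collector.example/forms",
    json.dumps({"body": "{}", "signature": "0"}),
)
print(request.get_method(), request.full_url)
# Passing the request to urllib.request.urlopen would actually transmit it.
```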
The above are only preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (10)

1. An information collection method for HTML pages, characterized by comprising the following steps:
S1. Using a BHO plug-in listening mode or an NPAPI plug-in listening mode, listen for web form submission events and obtain the form element information in the event handler;
S2. The BHO plug-in or NPAPI plug-in encrypts the obtained element information;
S3. The encrypted information is transmitted to the server.
2. The information collection method for HTML pages according to claim 1, characterized in that, in S1, the BHO plug-in listening mode is used for browsers with the IE kernel.
3. The information collection method for HTML pages according to claim 1, characterized in that, in S1, the NPAPI plug-in listening mode is used for browsers with a non-IE kernel.
4. The information collection method for HTML pages according to claim 2, characterized in that the BHO plug-in listening mode comprises the following steps:
S111. The BHO plug-in listens for the browser's HTML document loading event;
S112. In the HTML document loading event, the element that triggered the mouse click event is obtained;
S113. The element information is obtained through the element's COM interface;
S114. The obtained element information is stored in the BHO plug-in, ready to be sent to the server.
5. The information collection method for HTML pages according to claim 4, characterized in that, in S112, if the mouse click event was triggered by a form submit button that corresponds to multiple elements, the corresponding elements required by the current HTML document are selected.
6. The information collection method for HTML pages according to claim 3, characterized in that the NPAPI plug-in listening mode comprises the following steps:
S121. The browser extension references the NPAPI plug-in, starts an instance of a class in the NPAPI plug-in, and injects and executes a JavaScript script;
S122. While the JavaScript script executes, it listens for the click event of the form submit button;
S123. In the click event handler of the JavaScript script, the information of the element that triggered the mouse click event is obtained;
S124. The JavaScript script passes the element information to the NPAPI plug-in, ready to be sent to the server.
7. The information collection method for HTML pages according to claim 1, characterized in that, in S2, the encryption is as follows:
The BHO plug-in or NPAPI plug-in formats the information, encrypts the sensitive fields in it, and signs the formatted message; the server verifies the signature.
8. The information collection method for HTML pages according to claim 7, characterized in that the information is formatted as JSON.
9. The information collection method for HTML pages according to claim 7, characterized in that sensitive fields are encrypted with an RSA public key and decrypted by the server with the private key; the message is signed with an RSA private key, and the server verifies the signature with the RSA public key.
10. The information collection method for HTML pages according to claim 1, characterized in that, in S3, the transport protocol is HTTP.
CN201710043553.9A 2017-01-19 2017-01-19 A kind of information collecting method of the html pages Pending CN106777362A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710043553.9A CN106777362A (en) 2017-01-19 2017-01-19 A kind of information collecting method of the html pages

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710043553.9A CN106777362A (en) 2017-01-19 2017-01-19 A kind of information collecting method of the html pages

Publications (1)

Publication Number Publication Date
CN106777362A true CN106777362A (en) 2017-05-31

Family

ID=58943773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710043553.9A Pending CN106777362A (en) 2017-01-19 2017-01-19 A kind of information collecting method of the html pages

Country Status (1)

Country Link
CN (1) CN106777362A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102789561A (en) * 2012-06-29 2012-11-21 奇智软件(北京)有限公司 Method and device for utilizing camera in browser
CN104023013A (en) * 2014-05-30 2014-09-03 上海帝联信息科技股份有限公司 Data transmission method, server side and client
CN104750471A (en) * 2013-12-30 2015-07-01 上海格尔软件股份有限公司 WEB page performance detection and analysis plug-in and method based on browser
CN105426549A (en) * 2015-12-29 2016-03-23 北京金山安全软件有限公司 Method and device for reading webpage resources and electronic equipment
CN106250437A (en) * 2016-07-27 2016-12-21 长沙麦斯森信息科技有限公司 A kind of electronic monitoring front end data acquisition method and system

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239546A (en) * 2017-06-05 2017-10-10 成都知道创宇信息技术有限公司 A kind of method of webpage local content tracking with reminding
CN108415804A (en) * 2018-01-23 2018-08-17 平安普惠企业管理有限公司 Obtain method, terminal device and the computer readable storage medium of information
CN108415804B (en) * 2018-01-23 2021-06-04 平安普惠企业管理有限公司 Method for acquiring information, terminal device and computer readable storage medium
CN108681605A (en) * 2018-05-24 2018-10-19 四川物联亿达科技有限公司 A kind of file data acquisition method based on e-government Intranet
CN108540501A (en) * 2018-07-18 2018-09-14 郑州云海信息技术有限公司 A kind of method and apparatus of asymmetric cryptosystem
CN108540501B (en) * 2018-07-18 2021-07-27 郑州云海信息技术有限公司 Asymmetric encryption method and device
CN110955531A (en) * 2018-09-27 2020-04-03 长沙博为软件技术股份有限公司 Method for realizing communication among multiple tabs based on browser-Based (BHO) technology
CN110119634A (en) * 2018-11-28 2019-08-13 熵加网络科技(北京)有限公司 A method of with browser plug-in to text encryption and decryption
CN110083755A (en) * 2019-04-29 2019-08-02 北京脉冲星科技有限公司 A kind of high emulation parsing web-page approach, device and electronic equipment
CN111741030A (en) * 2020-08-26 2020-10-02 北京赛宁网安科技有限公司 Website security detection system and method combining Web automation and agent interception
CN111741030B (en) * 2020-08-26 2020-12-04 北京赛宁网安科技有限公司 Website security detection system and method combining Web automation and agent interception
CN113343159A (en) * 2021-08-06 2021-09-03 万商云集(成都)科技股份有限公司 Method and system for rapidly acquiring data from any channel, analyzing and storing data
CN114676330A (en) * 2022-03-30 2022-06-28 南京厚建软件有限责任公司 Method for uniformly recovering interactive data of Internet platform
CN114676330B (en) * 2022-03-30 2023-12-08 南京厚建软件有限责任公司 Method for uniformly recovering interactive data of Internet platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20170531)