CN103034711A - Form recognition method and device

Info

Publication number
CN103034711A
Authority
CN
China
Legal status
Granted
Application number
CN2012105299114A
Other languages
Chinese (zh)
Other versions
CN103034711B (en)
Inventor
蔡磊
张骏
万振
傅盛
徐鸣
王昆
Current Assignee
Zhuhai Baohaowan Technology Co Ltd
Original Assignee
Beijing Kingsoft Internet Security Software Co Ltd
Conew Network Technology Beijing Co Ltd
Shell Internet Beijing Security Technology Co Ltd
Beijing Kingsoft Internet Science and Technology Co Ltd
Application filed by Beijing Kingsoft Internet Security Software Co Ltd, Conew Network Technology Beijing Co Ltd, Shell Internet Beijing Security Technology Co Ltd, Beijing Kingsoft Internet Science and Technology Co Ltd
Priority to CN201210529911.4A
Publication of CN103034711A
Application granted
Publication of CN103034711B
Legal status: Active

Abstract

The invention discloses a form recognition method and device. The method comprises: receiving an access instruction; loading the web page corresponding to the access instruction; scanning the web page code of the loaded page; determining whether the scanned web page code contains an element whose attribute is a first preset attribute; determining whether the scanned web page code contains an element whose attribute is a second preset attribute; and, if the scanned web page code contains both an element with the first preset attribute and an element with the second preset attribute, determining that the loaded web page is a form page. The method and device solve the problem of the low form recognition rate in the prior art and thereby improve the form recognition rate.

Description

Form recognition method and device
Technical field
The present invention relates to the field of data processing, and in particular to a form recognition method and device.
Background
A dual-core browser is a browser with two rendering kernels, the Trident kernel and the WebKit kernel. The Trident kernel is used by Internet Explorer (IE), which has a very high market share in China; many websites, such as online banking and online payment sites, consider compatibility with IE only and do not conform to World Wide Web Consortium (W3C) standards. The WebKit kernel supports W3C standards very thoroughly and is also fast. By combining the compatibility of the Trident kernel with the speed of the WebKit kernel, a dual-core browser meets the needs of different users.
In the prior art, form recognition in a Trident/WebKit dual-core browser targets forms in Hypertext Markup Language (HTML) web pages. It works as follows: the user fills in form information on the page and clicks the submit button; after the submit event has executed, its result is used to judge whether the form was submitted successfully. If it was, the form data is stored in a database, which may store multiple fields of the form, and the record is treated as a successfully recognized form. Prior-art recognition therefore has to check multiple fields of the form after successful submission, and the form is recognized only when all of those fields satisfy the conditions. Having to check multiple fields lowers the form recognition rate. Moreover, when the user later fills in the form, the multiple fields stored in the database must each be matched before the browser can judge whether the form being filled in is the form of the current page, and the form can be filled in normally only once it has been judged to be the current form. This is inconvenient for the user and degrades the user experience.
No effective solution has yet been proposed for the low form recognition rate in the related art.
Summary of the invention
The main object of the present invention is to provide a form recognition method and device that solve the problem of the low form recognition rate in the prior art.
To achieve this object, according to one aspect of the present invention, a form recognition method is provided, comprising: receiving an access instruction; loading the web page corresponding to the access instruction; scanning the web page code of the loaded page; determining whether the scanned web page code contains an element whose attribute is a first preset attribute, the element corresponding to the first preset attribute being the password element; determining whether the scanned web page code contains an element whose attribute is a second preset attribute, the element corresponding to the second preset attribute being the username element; and, if the scanned web page code contains both an element with the first preset attribute and an element with the second preset attribute, determining that the loaded web page is a form page.
Further, scanning the web page code of the loaded page comprises: obtaining the type of kernel that generated the access instruction; if the kernel is the Trident kernel, injecting preset script code into the web page code in order to scan it; and if the kernel is the WebKit kernel, scanning the input controls in the DOM tree of the web page code.
Further, after the loaded web page is determined to be a form page, the method further comprises: determining whether a trigger instruction is received, the trigger instruction being used to submit the form page; and, if the trigger instruction is received, determining that the form page is a valid form.
Further, when the kernel that generated the access instruction is the Trident kernel, determining whether the trigger instruction is received comprises: obtaining the element of the web page code whose attribute is a third preset attribute as a first element, the element corresponding to the third preset attribute being the submit event; copying the first element to obtain a second element; covering the first element with the second element; and determining whether the second element is executed, and, if it is, determining that the trigger instruction has been received.
Further, when the kernel that generated the access instruction is the WebKit kernel, determining whether the trigger instruction is received comprises: obtaining the element of the web page code whose attribute is the third preset attribute as a first element, the element corresponding to the third preset attribute being the submit event; and determining whether the first element is executed, and, if it is, determining that the trigger instruction has been received.
Further, after the loaded web page is determined to be a form page and before it is determined whether the trigger instruction is received, the method further comprises: obtaining the element whose attribute is the first preset attribute, i.e. the password element; obtaining the element whose attribute is the second preset attribute, i.e. the username element; querying a preset database to determine whether the password data and the username data are both saved in it, the password data being the data corresponding to the password element and the username data being the data corresponding to the username element; and, if both are saved, filling the password data into the password element of the loaded page and the username data into the username element of the loaded page.
Further, after the loaded web page is determined to be a form page and before it is determined whether the trigger instruction is received, the method further comprises: obtaining the element whose attribute is the first preset attribute, i.e. the password element; obtaining the element whose attribute is the second preset attribute, i.e. the username element; querying the preset database to determine whether the password data and the username data are both saved in it, the password data being the data corresponding to the password element and the username data being the data corresponding to the username element; if the username data is saved but the password data is not, filling the username data into the username element of the loaded page and receiving the password data entered by the user; and, if neither the username data nor the password data is saved, receiving the password data and username data entered by the user.
Further, after it is determined that the trigger instruction has been received, the method further comprises: displaying a preset pop-up window carrying prompt content that asks the user whether to save the password data and the username data, or whether to save the password data; receiving a selection instruction from the user; and, when the selection instruction indicates saving, saving the password data and the username data, or the password data, to the preset database.
To achieve this object, according to another aspect of the present invention, a form recognition device is provided, configured to perform any of the form recognition methods provided above.
To achieve this object, according to a further aspect of the present invention, a form recognition device is provided, comprising: a receiving unit configured to receive an access instruction; a loading unit configured to load the web page corresponding to the access instruction; a scanning unit configured to scan the web page code of the loaded page; a first judging unit configured to determine whether the scanned web page code contains an element whose attribute is the first preset attribute, the element corresponding to the first preset attribute being the password element; a second judging unit configured to determine whether the scanned web page code contains an element whose attribute is the second preset attribute, the element corresponding to the second preset attribute being the username element; and a determining unit configured to determine that the loaded web page is a form page if the scanned web page code contains both an element with the first preset attribute and an element with the second preset attribute.
Further, the scanning unit comprises: a first obtaining subunit configured to obtain the type of kernel that generated the access instruction; a first scanning subunit configured to inject preset script code into the web page code in order to scan it when the kernel is the Trident kernel; and a second scanning subunit configured to scan the input controls in the DOM tree of the web page code when the kernel is the WebKit kernel.
With the present invention, an access instruction is received; the web page corresponding to it is loaded; the web page code of the loaded page is scanned; it is determined whether the scanned code contains an element with the first preset attribute (the password element) and an element with the second preset attribute (the username element); and, if it contains both, the loaded page is determined to be a form page. Scanning the web page code of the page the user loads allows the code, and therefore the attributes of each element in it, to be monitored, so the browser can quickly detect whether the loaded page contains elements satisfying the preset attributes, i.e. a password element and a username element. Because only the username and password fields need to be monitored, scanning the web page code of the loaded page is enough to recognize whether it is a form page. Compared with the prior-art approach of checking multiple fields of a form after it has been submitted successfully, this reduces the complexity of form recognition, solves the problem of the low form recognition rate in the prior art, and thereby improves the form recognition rate.
Brief description of the drawings
The accompanying drawings, which form part of this application, are provided for a further understanding of the present invention; the illustrative embodiments of the invention and their descriptions explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the form recognition method according to an embodiment of the invention;
Fig. 2 is a flowchart of scanning for the password element and the username element in a WebKit-kernel browser in the form recognition method according to an embodiment of the invention;
Fig. 3 is a flowchart of determining whether a trigger instruction is received in a Trident-kernel browser in the form recognition method according to an embodiment of the invention;
Fig. 4 illustrates the form recognition method applied to a Trident-kernel browser according to an embodiment of the invention;
Fig. 5 illustrates the form recognition method applied to a WebKit-kernel browser according to an embodiment of the invention; and
Fig. 6 is a schematic diagram of the form recognition device according to an embodiment of the invention.
Embodiments
It should be noted that, where no conflict arises, the embodiments of this application and the features in them may be combined with one another. The invention is described in detail below with reference to the drawings and in conjunction with the embodiments.
An embodiment of the invention provides a form recognition method, which is described in detail below.
Fig. 1 is a flowchart of the form recognition method according to an embodiment of the invention. As shown in Fig. 1, the method comprises the following steps S101 to S107:
S101: Receive the access instruction from the user. Specifically, when the user wants to visit a website, the user types in the site's URL or opens it through a link in order to open the web page, at which point the user's access instruction is received.
S102: Load the web page corresponding to the access instruction, i.e. load the HTML at the URL corresponding to the access instruction to obtain the web page.
S103: Scan the web page code of the loaded page.
S104: Determine whether the scanned web page code contains an element whose attribute is the first preset attribute, the element corresponding to the first preset attribute being the password element. Specifically, the web page code of the loaded page is scanned, and the scan checks whether an element with the first preset attribute can be found. In this embodiment, the first preset attribute may be defined as the attribute type="password": if an input element with type="password" is found in the web page code of the loaded page, the scanned web page code is determined to contain an element with the first preset attribute.
S105: Determine whether the scanned web page code contains an element whose attribute is the second preset attribute, the element corresponding to the second preset attribute being the username element. Specifically, the scan checks whether an element with the second preset attribute can be found. In this embodiment, the second preset attribute may be defined as being the element nearest to the password element that satisfies the attribute type="text": if such an input element with type="text" is found in the web page code of the loaded page, the scanned web page code is determined to contain an element with the second preset attribute.
S106: If the scanned web page code contains both an element with the first preset attribute and an element with the second preset attribute, determine that the web page loaded in step S102 is a form page. In other words, once the web page code is found to contain both a password element and a username element, the loaded page can be determined to be a form page, which completes the recognition of the form.
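The detection in steps S103 to S106 can be sketched as follows. This is a minimal Python model rather than the patent's implementation (which runs as injected JavaScript under Trident or as a DOM scan under WebKit): it parses the HTML with the standard-library html.parser, treats an input with type="password" as the password element, and takes the closest preceding editable text input as the username element. Reading "nearest" as "closest preceding in document order" and skipping readonly/disabled inputs are assumptions; the names InputCollector, find_login_fields and is_form_page are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional

Attrs = dict[str, str]


class InputCollector(HTMLParser):
    """Collects the attributes of every <input> element in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.inputs: list[Attrs] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        # HTMLParser lower-cases tag and attribute names; valueless attributes
        # such as `readonly` arrive with value None and are stored as "".
        if tag == "input":
            self.inputs.append({name: value or "" for name, value in attrs})


def input_type(attrs: Attrs) -> str:
    # An <input> without a type attribute behaves as type="text" in HTML.
    return attrs.get("type", "text").strip().lower()


def find_login_fields(html: str) -> Optional[tuple[Attrs, Attrs]]:
    """Return (username element, password element), or None if either is missing."""
    collector = InputCollector()
    collector.feed(html)
    collector.close()
    inputs = collector.inputs
    for i, attrs in enumerate(inputs):
        if input_type(attrs) != "password":  # S104: first preset attribute
            continue
        # S105 (assumed reading): closest preceding editable text input.
        for prev in reversed(inputs[:i]):
            if input_type(prev) == "text" and "readonly" not in prev and "disabled" not in prev:
                return prev, attrs
    return None


def is_form_page(html: str) -> bool:
    # S106: a form page needs both a password element and a username element.
    return find_login_fields(html) is not None
```

For example, a page containing a text box named "user" followed by a password box is reported as a form page, while a search page with only a text box and a submit button is not.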
By scanning the web page code of the page the user loads, the form recognition method of this embodiment monitors the web page code and hence the attributes of each element in it, so it can quickly detect whether the loaded page contains elements satisfying the preset attributes, namely a password element and a username element. Because only the username and password fields are monitored, scanning the web page code of the loaded page is sufficient to recognize whether it is a form page. Compared with the prior-art approach of checking multiple fields of a form after successful submission, this reduces the complexity of form recognition, solves the problem of the low form recognition rate in the prior art, and thereby improves the form recognition rate.
Specifically, because users issue access instructions through browsers with different kernel types, step S103 of the recognition method provided by this embodiment is processed differently for each kernel.
When the web page code of the loaded page is scanned, the type of kernel that generated the access instruction is obtained first, and a different scanning method is used depending on that type. If the kernel is the Trident kernel, then after the page corresponding to the access instruction has finished loading, preset JavaScript code (JS script code) is injected into the web page code of the loaded page, and the JS code scans and monitors the page's HTML. If the kernel is the WebKit kernel, the input controls in the DOM tree of the web page code can be scanned directly. Here the DOM tree refers to the HTML Document Object Model (HTML DOM), the document object model defined specifically for HTML/XHTML. The flow of scanning the input controls in the DOM tree to determine the password element and the username element is shown in Fig. 2. When the DOM tree contains generic input controls, the input element with type="password" is directly taken as the password input box element, and the nearest editable input element with type="text" above the password input box in the DOM tree is taken as the username input box. When the input controls in the DOM tree are not generic, and the page is one for which the automatic form filling function can be enabled, the username element and the password element are located by a compound condition built from the page's origin URL, the element id, the element name attribute and other elements.
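The locating logic described above, including the fallback for non-generic input controls, can be sketched as below. The sketch assumes the input controls have already been enumerated in DOM order as attribute maps (by injected script under Trident or by the DOM scan under WebKit; that step is not modeled). The per-site rule table keyed by origin and matched on element id or name is a hypothetical stand-in for the compound condition, whose exact form the text does not specify; locate_login_elements and site_rules are illustrative names.

```python
from typing import Optional
from urllib.parse import urlsplit

Element = dict[str, str]  # attribute map of one input control, in DOM order
Rule = tuple[str, str]    # hypothetical per-site rule: (username id/name, password id/name)


def _editable_text(e: Element) -> bool:
    return (e.get("type", "text").lower() == "text"
            and "readonly" not in e and "disabled" not in e)


def locate_login_elements(url: str, inputs: list[Element],
                          site_rules: dict[str, Rule]) -> Optional[tuple[Element, Element]]:
    """Return (username element, password element), or None if they cannot be located."""
    # Generic controls: password box by type; username is the nearest editable text box above it.
    for i, e in enumerate(inputs):
        if e.get("type", "").lower() == "password":
            for prev in reversed(inputs[:i]):
                if _editable_text(prev):
                    return prev, e
    # Non-generic controls: compound condition of page origin plus element id or name.
    parts = urlsplit(url)
    rule = site_rules.get(f"{parts.scheme}://{parts.netloc}")
    if rule is None:
        return None

    def by_id_or_name(key: str) -> Optional[Element]:
        return next((e for e in inputs if key in (e.get("id"), e.get("name"))), None)

    user, pwd = by_id_or_name(rule[0]), by_id_or_name(rule[1])
    if user is None or pwd is None:
        return None
    return user, pwd
```

Keeping the generic rule first means the per-site table only has to cover pages whose controls break the type-based convention.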
After the loaded page is determined to be a form page, i.e. after the form has been recognized, the recognition method of this embodiment further comprises determining whether a trigger instruction issued by the user is received, the trigger instruction being used to submit the form page. Once the trigger instruction is received, the form page is determined to be a valid form: after the user submits the loaded form page, the browser treats the submitted form as a valid form, so that the form can be recognized more quickly and accurately when the page is loaded again later. Specifically, when the user submits a form page containing the password element and the username element, the corresponding submit event in the web page code is triggered, so whether the trigger instruction issued by the user has been received can be determined by checking whether this submit event has been triggered.
If the kernel that generated the access instruction is the Trident kernel, the flow for determining whether the trigger instruction is received is shown in Fig. 3. First, the web page code of the loaded page is scanned by the JS code to obtain the element whose attribute is the third preset attribute (hereinafter the first element). When the submit button in the web page code follows the generic convention, the third preset attribute may be defined as type="submit", so obtaining the first element means obtaining the element with type="submit", i.e. finding the submit button in the loaded page; when the submit button does not follow the generic convention, it is located by a condition built from the element, its attributes and other elements. Next, the first element is copied to obtain a second element, and the second element covers the first: the submit button is cloned using the clone method in the preset JavaScript code, the cloned button is placed in front of the original submit button, and the clone is given a unique id formed from the submit button's name and a timestamp. Finally, it is determined whether the second element is executed. Because submitting the password element and the username element necessarily goes through triggering the submit event, at which point the second element is executed, whether the user has issued the trigger instruction can be determined by checking whether the second element is executed; when it is executed, the trigger instruction issued by the user is determined to have been received. After the cloned submit button has been executed, the events of the original submit button are also executed in their original order: if the submit button's own event returns true, execution continues with the form event in the web page code; if it returns false, execution stops. Note that terms such as "first element" and "second element" serve only to distinguish different elements and do not imply any order; the same applies to similar wording below.
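The clone-and-cover interception on the Trident path can be modeled in Python as follows. This sketches the control flow only: in a real page the injected script would operate on DOM nodes (for example with the DOM methods cloneNode and insertBefore), whereas here a button is a small hypothetical Button record. The id format (name, underscore, nanosecond timestamp) is an assumption, and clone_and_cover and click are illustrative names.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Button:
    name: str
    element_id: str
    on_click: Callable[[], bool]  # page's own handler; returning False cancels submission


def clone_and_cover(original: Button, on_trigger: Callable[[str], None]) -> Button:
    """Return a clone of `original` that reports the click before running the page's handler."""
    # Unique id built from the button name plus a timestamp (assumed format).
    clone_id = f"{original.name}_{time.time_ns()}"

    def cloned_handler() -> bool:
        on_trigger(clone_id)        # the clone ran, so the trigger instruction was received
        return original.on_click()  # then the original handler runs as before

    return Button(original.name, clone_id, cloned_handler)


def click(button: Button, submit_form: Callable[[], None]) -> None:
    # Continue with the form's submit event only if the button handler returns True.
    if button.on_click():
        submit_form()
```

Because the clone reports the click before delegating to the original handler, the trigger is recorded even when the page's own handler then cancels submission by returning False.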
If the kernel that generated the access instruction is the WebKit kernel, the input controls in the DOM tree of the web page code are scanned to obtain the element of the loaded page whose attribute is the third preset attribute, i.e. the first element, which is the submit button of the page in the WebKit-kernel browser. Whether the trigger instruction issued by the user has been received is then determined by checking whether the first element is executed; if it is, the trigger instruction is determined to have been received.
Further, the form recognition method of this embodiment also includes steps for filling in and saving forms. When the browser kernel is the Trident kernel, the complete method is shown in Fig. 4; when it is the WebKit kernel, the complete method is shown in Fig. 5. As Figs. 4 and 5 show, the form filling and form saving steps are the same for browsers with either kernel.
Specifically, form filling is performed after the loaded page is determined to be a form page and before it is determined whether the trigger instruction is received, as follows. Once the web page code is found to contain both an element with the first preset attribute and an element with the second preset attribute, and the user clicks the username login box, the Renderer process is triggered to capture the element with the first preset attribute and the element with the second preset attribute, obtaining the password element and the username element. The Renderer process then sends an IPC request to the main process to query the main process's preset database and determine whether the password data corresponding to the password element and the username data corresponding to the username element are both saved in it. The preset database is the database that stores form information; keeping form information in this database lets form data be shared between the two kernels of the dual-core browser. Finally, if the password data and the username data are both saved in the preset database, the main process sends an IPC request to the Renderer process, which selects the best-matching saved username data and password data, fills the selected password data into the password element of the loaded page, and fills the username data into the username element of the loaded page. The Renderer process selects and matches username data and password data by the following rule. It first checks whether a saved form exists under the current URL; if so, the password data and username data of that form are preferentially filled into the corresponding elements on the page, giving an exact match. If no form is saved under the current URL, it looks up the forms saved under the main domain of the current URL and fills the password data and username data of those forms into the corresponding elements on the page, giving a fuzzy match. For example, if form A is saved under URL "a.xxx.com" and form B under URL "b.xxx.com", then when the user opens "a.xxx.com" the username data and password data of both forms A and B are candidates, but those of form A are preferred as the best match; if no form is saved under "a.xxx.com" and form B is saved under "b.xxx.com", then when the user opens "a.xxx.com" the username data and password data of form B are selected.
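The exact-then-fuzzy matching rule can be sketched as follows, reproducing the a.xxx.com / b.xxx.com example. Assumptions: saved forms are keyed by host name rather than by full URL, and the "main domain" is approximated by the last two host labels; a real browser would need the Public Suffix List to handle domains such as example.com.cn. The names main_domain and ranked_candidates are hypothetical.

```python
Credential = tuple[str, str]  # (username, password)


def main_domain(host: str) -> str:
    # Simplification: keep the last two labels ("a.xxx.com" -> "xxx.com").
    # A production browser would consult the Public Suffix List instead.
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def ranked_candidates(current_host: str, saved: dict[str, Credential]) -> list[Credential]:
    """Saved credentials usable on current_host, best match first."""
    # Exact match: a form saved under the current host.
    exact = [saved[current_host]] if current_host in saved else []
    # Fuzzy match: forms saved under other hosts of the same main domain.
    domain = main_domain(current_host)
    fuzzy = [cred for host, cred in saved.items()
             if host != current_host and main_domain(host) == domain]
    return exact + fuzzy
```

The first entry of the returned list is what would be filled in; the remaining entries are the other candidates the user could choose from.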
Form saving proceeds as follows. First, after the loaded page is determined to be a form page and before it is determined whether the trigger instruction is received, when the user clicks the username login box, the Renderer process is triggered to capture the element with the first preset attribute and the element with the second preset attribute, obtaining the password element and the username element, and sends an IPC request to the main process to query the main process's preset database and determine whether the password data corresponding to the password element and the username data corresponding to the username element are both saved in it. If the username data is saved but the password data is not, the main process sends an IPC request to the Renderer process, which selects the best-matching saved username data and fills it into the username element of the loaded page, while the password data entered by the user is received. If neither the username data nor the password data is saved, the username data and password data entered by the user are received directly. Up to this point only the form filling step has been performed. Second, after the trigger instruction is received, i.e. after the user has triggered login, the Renderer process sends an IPC request to the main process, which triggers a pop-up prompt asking the user whether to save the password data and the username data, or whether to save the password data. Finally, the user's selection instruction is received, and when it indicates saving, the password data and the username data, or the password data, are saved to the preset database.
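The fill-and-save decisions above can be summarized in a small sketch. The in-memory store dict stands in for the preset database, the Renderer/main-process IPC is omitted, and all names (FillPlan, plan_fill, save_prompt, on_submit) are hypothetical. The text does not say what happens when only a password is saved; the sketch treats that like the case where nothing is saved. It also assumes that the password-only save prompt is the one shown when the username is already stored.

```python
from dataclasses import dataclass
from typing import Optional

Store = dict[str, dict[str, str]]  # hypothetical: host -> {"username": ..., "password": ...}


@dataclass
class FillPlan:
    username: Optional[str]  # value to pre-fill; None means the user types it
    password: Optional[str]


def plan_fill(store: Store, host: str) -> FillPlan:
    entry = store.get(host, {})
    user = entry.get("username")
    if user is None:
        # Nothing saved (or, by assumption, only a password): the user enters both.
        return FillPlan(None, None)
    # Username saved: fill it, and fill the password only if it is saved too.
    return FillPlan(user, entry.get("password"))


def save_prompt(store: Store, host: str) -> str:
    # Assumption: ask about the password alone when the username is already stored.
    if "username" in store.get(host, {}):
        return "Save password?"
    return "Save username and password?"


def on_submit(store: Store, host: str, username: str, password: str, accepted: bool) -> None:
    # Called after the trigger instruction; persist only if the user accepted the prompt.
    if accepted:
        store.setdefault(host, {}).update(username=username, password=password)
```

A declined prompt leaves the store untouched, so the next visit falls back to manual entry exactly as before.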
An embodiment of the present invention further provides a form recognition device, which is described in detail below.
Fig. 6 is a schematic diagram of the form recognition device according to an embodiment of the present invention. As shown in Fig. 6, the form recognition device of this embodiment comprises a receiving unit 10, a loading unit 20, a scanning unit 30, a first judging unit 40, a second judging unit 50 and a determining unit 60.
The receiving unit 10 is configured to receive an access instruction. Specifically, when the user wants to visit a website, the user can type the website's address or follow a link to it in order to open the webpage; the receiving unit 10 receives the user's access instruction by receiving the typed or linked address.
The loading unit 20 is configured to load the webpage corresponding to the access instruction; specifically, it loads the HTML at the address corresponding to the access instruction to obtain that webpage.
The scanning unit 30 is configured to scan the web page code of the loaded webpage.
The first judging unit 40 is configured to judge whether the scanned web page code contains an element whose attribute is the first preset attribute, where the element corresponding to the first preset attribute is the password element. Specifically, the web page code of the loaded webpage is scanned, and it is detected whether an element with the first preset attribute is found during the scan. In this embodiment, the first preset attribute may be defined as type="password": if an input element with type="password" is found in the web page code of the loaded webpage, the scanned web page code is determined to contain an element whose attribute is the first preset attribute.
The second judging unit 50 is configured to judge whether the scanned web page code contains an element whose attribute is the second preset attribute, where the element corresponding to the second preset attribute is the username element. Specifically, it detects whether an element with the second preset attribute is found during the scan. In this embodiment, the second preset attribute may be defined as being the input nearest to the password element and having type="text": if such an input element with type="text" is found in the web page code of the loaded webpage, the scanned web page code is determined to contain an element whose attribute is the second preset attribute.
The determining unit 60 is configured to determine that the webpage loaded by the loading unit 20 is a form webpage if the scanned web page code is judged to contain both an element whose attribute is the first preset attribute and an element whose attribute is the second preset attribute. That is, when the web page code is judged to contain both a password element and a username element, the loaded webpage can be determined to be a form webpage, which accomplishes form recognition.
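A minimal sketch of the checks made by the first judging unit 40, the second judging unit 50 and the determining unit 60, using Python's standard `html.parser`. It assumes that "nearest to the password element" means nearest in the document order of `<input>` elements and that a missing `type` attribute means `text` (the HTML default); the function names are hypothetical, and a browser would inspect its live DOM rather than re-parse HTML source.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class InputTypeCollector(HTMLParser):
    """Record the type of every <input> element in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.input_types: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            # A missing or valueless type attribute means "text" in HTML.
            value = dict(attrs).get("type") or "text"
            self.input_types.append(value.lower())


def find_login_fields(html: str) -> Optional[Tuple[int, int]]:
    """Return (username_index, password_index) among the inputs, or None."""
    collector = InputTypeCollector()
    collector.feed(html)
    collector.close()
    types = collector.input_types
    text_indices = [i for i, t in enumerate(types) if t == "text"]
    for pw_index, t in enumerate(types):
        # First preset attribute: type="password" marks the password element.
        if t == "password" and text_indices:
            # Second preset attribute: the type="text" input nearest to it;
            # on a tie, min() keeps the earlier (preceding) input.
            user_index = min(text_indices, key=lambda i: abs(i - pw_index))
            return user_index, pw_index
    return None


def is_form_webpage(html: str) -> bool:
    # Determining unit: both the password and the username element must exist.
    return find_login_fields(html) is not None


page = ('<form><input name="user"><input type="password" name="pass">'
        '<input type="submit" value="Log in"></form>')
print(find_login_fields(page))  # (0, 1)
```

A page with only a password input (for example a "confirm password" dialog) is rejected, matching the requirement that both elements be present.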
By scanning the web page code of the webpage loaded for the user's visit, the form recognition device of this embodiment monitors the web page code, and thus the attributes of each element in it, and can quickly detect whether it contains elements satisfying the preset attributes (that is, quickly detect the password element and the username element). Because this method monitors only the username and password fields of the webpage, recognizing whether the loaded webpage is a form webpage requires only a scan of its web page code. Compared with prior-art recognition methods, which must judge several fields of a form after it has been successfully submitted, this effectively reduces the complexity of form recognition, solves the problem of the low form recognition rate in the prior art, and thereby improves the form recognition rate.
Specifically, when the user issues the access instruction through browsers with different kernels, the scanning unit 30 scans the web page code of the loaded webpage differently for each kernel.
When the web page code of the loaded webpage is scanned, a first obtaining subunit in the scanning unit 30 first obtains the kernel type that produced the access instruction. If the obtained kernel type is the Trident kernel, a first scanning subunit in the scanning unit 30 injects preset script code into the web page code to scan it; if the obtained kernel type is the WebKit kernel, a second scanning subunit in the scanning unit 30 scans the input controls in the DOM tree of the web page code.
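The kernel-dependent behaviour of the scanning unit 30 amounts to a dispatch on the kernel type. The sketch below assumes a hypothetical `Page` interface with `run_script` (standing in for script injection into a Trident page) and `dom_input_types` (standing in for walking the WebKit DOM tree); neither is a real browser API, and the injected JavaScript string is only an illustrative guess at the preset script code, not the code used by the embodiment.

```python
from enum import Enum
from typing import Callable, Dict, List, Protocol


class Kernel(Enum):
    TRIDENT = "Trident"
    WEBKIT = "WebKit"


class Page(Protocol):
    """Hypothetical host-side handle on a loaded page (not a real browser API)."""

    def run_script(self, source: str) -> List[str]: ...

    def dom_input_types(self) -> List[str]: ...


# Illustrative stand-in for the preset script code injected into Trident pages:
# it evaluates to the type of every <input> element.
PRESET_SCRIPT = (
    "Array.prototype.map.call(document.getElementsByTagName('input'),"
    " function (e) { return e.type; })"
)


def scan_trident(page: Page) -> List[str]:
    # First scanning subunit: inject the preset script into the web page code.
    return page.run_script(PRESET_SCRIPT)


def scan_webkit(page: Page) -> List[str]:
    # Second scanning subunit: read the input controls of the DOM tree directly.
    return page.dom_input_types()


SCANNERS: Dict[Kernel, Callable[[Page], List[str]]] = {
    Kernel.TRIDENT: scan_trident,
    Kernel.WEBKIT: scan_webkit,
}


def scan(page: Page, kernel: Kernel) -> List[str]:
    # The first obtaining subunit supplies `kernel`, which selects the strategy.
    return SCANNERS[kernel](page)


class FakePage:
    """Test double that records injected scripts."""

    def __init__(self) -> None:
        self.injected: List[str] = []

    def run_script(self, source: str) -> List[str]:
        self.injected.append(source)
        return ["text", "password"]

    def dom_input_types(self) -> List[str]:
        return ["text", "password"]
```

Both strategies return the same list of input types, so the first and second judging units can be shared across kernels; only the way the list is obtained differs.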
Further, after receiving the trigger instruction for submitting the form webpage, the form recognition device of this embodiment can determine that the form webpage is a valid form, so that the form can be recognized more quickly and accurately when the form webpage is loaded again later. The method for judging whether the trigger instruction has been received was described in detail in the form recognition method provided by the above embodiment of the invention and is not repeated here.
In addition, the form recognition device of this embodiment can also save and fill in the recognized form. The specific method by which the device saves and fills in the form is the same as the saving and filling steps of the form recognition method provided by the above embodiment, and is likewise not repeated here.
As can be seen from the above description, by quickly detecting whether the web page code contains a password element and a username element, the present invention effectively reduces the complexity of form recognition and improves the form recognition rate. At the same time, by saving the form data, it allows form data to be shared between the two kernels of a dual-kernel browser, improving the applicability of forms.
It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one given here.
Obviously, those skilled in the art should understand that the above modules or steps of the present invention may be implemented with general-purpose computing devices. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they may be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A form recognition method, characterized by comprising:
Receiving an access instruction;
Loading the webpage corresponding to the access instruction;
Scanning the web page code of the loaded webpage;
Judging whether the scanned web page code contains an element whose attribute is a first preset attribute, wherein the element corresponding to the first preset attribute is a password element;
Judging whether the scanned web page code contains an element whose attribute is a second preset attribute, wherein the element corresponding to the second preset attribute is a username element; and
If it is judged that the scanned web page code contains an element whose attribute is the first preset attribute and an element whose attribute is the second preset attribute, determining that the loaded webpage is a form webpage.
2. The form recognition method according to claim 1, characterized in that scanning the web page code of the loaded webpage comprises:
Obtaining the kernel type that produced the access instruction;
If the obtained kernel type is the Trident kernel, injecting preset script code into the web page code so as to scan the web page code; and
If the obtained kernel type is the WebKit kernel, scanning the input controls in the DOM tree of the web page code.
3. The form recognition method according to claim 1, characterized in that, after determining that the loaded webpage is a form webpage, the form recognition method further comprises:
Judging whether a trigger instruction has been received, wherein the trigger instruction is used to submit the form webpage; and
If it is judged that the trigger instruction has been received, determining that the form webpage is a valid form.
4. The form recognition method according to claim 3, characterized in that, when the kernel type that produced the access instruction is the Trident kernel, judging whether the trigger instruction has been received comprises:
Obtaining the element whose attribute is a third preset attribute in the web page code to obtain a first element, wherein the element corresponding to the third preset attribute is a submit event;
Copying the first element to obtain a second element;
Covering the first element with the second element; and
Judging whether the second element is executed, and if it is judged that the second element is executed, determining that the trigger instruction has been received.
5. The form recognition method according to claim 3, characterized in that, when the kernel type that produced the access instruction is the WebKit kernel, judging whether the trigger instruction has been received comprises:
Obtaining the element whose attribute is a third preset attribute in the web page code to obtain a first element, wherein the element corresponding to the third preset attribute is a submit event; and
Judging whether the first element is executed, and if it is judged that the first element is executed, determining that the trigger instruction has been received.
6. The form recognition method according to claim 3, characterized in that, after determining that the loaded webpage is a form webpage and before judging whether the trigger instruction has been received, the form recognition method further comprises:
Obtaining the element whose attribute is the first preset attribute to obtain the password element;
Obtaining the element whose attribute is the second preset attribute to obtain the username element;
Querying a preset database to judge whether password data and username data have both been saved in the preset database, wherein the password data is the data corresponding to the password element and the username data is the data corresponding to the username element; and
If it is judged that the password data and the username data have both been saved in the preset database, adding the password data to the password element of the loaded webpage and adding the username data to the username element of the loaded webpage.
7. The form recognition method according to claim 3, characterized in that, after determining that the loaded webpage is a form webpage and before judging whether the trigger instruction has been received, the form recognition method further comprises:
Obtaining the element whose attribute is the first preset attribute to obtain the password element;
Obtaining the element whose attribute is the second preset attribute to obtain the username element;
Querying a preset database to judge whether password data and username data have both been saved in the preset database, wherein the password data is the data corresponding to the password element and the username data is the data corresponding to the username element;
If it is judged that the username data has been saved in the preset database but the password data has not, adding the username data to the username element of the loaded webpage and receiving the password data entered by the user; and
If it is judged that neither the username data nor the password data has been saved in the preset database, receiving the password data and the username data entered by the user.
8. The form recognition method according to claim 7, characterized in that, after it is judged that the trigger instruction has been received, the form recognition method further comprises:
Displaying a preset pop-up window, wherein the preset pop-up window carries prompt content used to prompt the user to choose whether to save the password data and the username data, or to choose whether to save the password data;
Receiving a selection instruction from the user; and
When the selection instruction indicates a choice to save the password data and the username data, saving the password data and the username data to the preset database, or saving the password data to the preset database.
9. A form recognition device, characterized by comprising:
A receiving unit, configured to receive an access instruction;
A loading unit, configured to load the webpage corresponding to the access instruction;
A scanning unit, configured to scan the web page code of the loaded webpage;
A first judging unit, configured to judge whether the scanned web page code contains an element whose attribute is a first preset attribute, wherein the element corresponding to the first preset attribute is a password element;
A second judging unit, configured to judge whether the scanned web page code contains an element whose attribute is a second preset attribute, wherein the element corresponding to the second preset attribute is a username element; and
A determining unit, configured to determine that the loaded webpage is a form webpage if it is judged that the scanned web page code contains an element whose attribute is the first preset attribute and an element whose attribute is the second preset attribute.
10. The form recognition device according to claim 9, characterized in that the scanning unit comprises:
A first obtaining subunit, configured to obtain the kernel type that produced the access instruction;
A first scanning subunit, configured to inject preset script code into the web page code so as to scan the web page code when the obtained kernel type is the Trident kernel; and
A second scanning subunit, configured to scan the input controls in the DOM tree of the web page code when the obtained kernel type is the WebKit kernel.
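Claims 4 and 5 differ only in whether the submit event is copied before it is observed. As a rough Python analogue, assuming the page's submit handlers can be modelled as a dictionary of callables (a hypothetical stand-in for the element carrying the third preset attribute), the Trident branch of claim 4 can be sketched by wrapping the original handler in a recording copy and installing that copy in its place. The function names are illustrative, not taken from the patent.

```python
from typing import Callable, Dict, List

Handler = Callable[[], None]


def hook_submit_trident(handlers: Dict[str, Handler], form_id: str,
                        fired: List[str]) -> None:
    """Claim 4 analogue: copy the submit handler and cover the original."""
    first = handlers[form_id]  # the "first element" (submit event)

    def second() -> None:  # the "second element": a recording copy of the first
        fired.append(form_id)  # record that the copy was executed
        first()  # preserve the page's own submit behaviour

    handlers[form_id] = second  # cover the first element with the second


def trigger_received(fired: List[str], form_id: str) -> bool:
    """The trigger instruction counts as received once the copy has run."""
    return form_id in fired


sent: List[str] = []
handlers: Dict[str, Handler] = {"login": lambda: sent.append("POST /login")}
fired: List[str] = []
hook_submit_trident(handlers, "login", fired)
handlers["login"]()  # the user submits the form
print(trigger_received(fired, "login"), sent)  # True ['POST /login']
```

In the WebKit branch of claim 5 no copy is installed; the execution of the first element is observed directly, so only the equivalent of `trigger_received` would apply.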
CN201210529911.4A 2012-12-10 2012-12-10 Form recognition method and device Active CN103034711B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210529911.4A CN103034711B (en) 2012-12-10 2012-12-10 Form recognition method and device

Publications (2)

Publication Number Publication Date
CN103034711A (en) 2013-04-10
CN103034711B (en) 2016-08-03

Family

ID=48021605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210529911.4A Active CN103034711B (en) 2012-12-10 2012-12-10 Form recognition method and device

Country Status (1)

Country Link
CN (1) CN103034711B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104571903A (en) * 2013-10-28 2015-04-29 腾讯科技(深圳)有限公司 Input box switching method and input box switching device
CN109246069A (en) * 2018-06-15 2019-01-18 华为技术有限公司 Webpage login method, device and readable storage medium storing program for executing
CN109460522A (en) * 2018-10-30 2019-03-12 北京网众共创科技有限公司 The acquisition methods and device of site information
CN114510930A (en) * 2022-03-31 2022-05-17 北京圣博润高新技术股份有限公司 Method, device, electronic equipment and medium for auditing operation document

Citations (3)

Publication number Priority date Publication date Assignee Title
US20100037219A1 (en) * 2008-08-05 2010-02-11 International Buisness Machines Corporation Predictive logic for automatic web form completion
CN102651019A (en) * 2012-03-30 2012-08-29 奇智软件(北京)有限公司 Method and device for parsing tagged file
CN102663130A (en) * 2012-04-27 2012-09-12 华为技术有限公司 Method and device for submitting webpage data

Non-Patent Citations (2)

Title
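Zhang Zhong (张忠): "A General Information Extraction Model for Web Forms", China Master's Theses Full-text Database, Information Science and Technology Series, 15 August 2007 (2007-08-15) *
Zhang Chong (张翀): "Automatic Filling of Deep Web Entry Forms", China Master's Theses Full-text Database, Information Science and Technology Series, 15 September 2007 (2007-09-15) *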

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN104571903A (en) * 2013-10-28 2015-04-29 腾讯科技(深圳)有限公司 Input box switching method and input box switching device
CN109246069A (en) * 2018-06-15 2019-01-18 华为技术有限公司 Webpage login method, device and readable storage medium storing program for executing
CN109246069B (en) * 2018-06-15 2020-10-16 华为技术有限公司 Webpage login method and device and readable storage medium
CN109460522A (en) * 2018-10-30 2019-03-12 北京网众共创科技有限公司 The acquisition methods and device of site information
CN114510930A (en) * 2022-03-31 2022-05-17 北京圣博润高新技术股份有限公司 Method, device, electronic equipment and medium for auditing operation document
CN114510930B (en) * 2022-03-31 2022-07-15 北京圣博润高新技术股份有限公司 Method, device, electronic equipment and medium for auditing operation document

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 12th Floor, Fosun International Center, No. 237 Chaoyang North Road, Chaoyang District, Beijing 100022

Applicant after: BEIJING KINGSOFT INTERNET SECURITY SOFTWARE Co.,Ltd.

Applicant after: Beijing Cheetah Network Technology Co.,Ltd.

Applicant after: Beijing Cheetah Mobile Technology Co.,Ltd.

Applicant after: CONEW NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.

Address before: 12th Floor, Fosun International Center, No. 237 Chaoyang North Road, Chaoyang District, Beijing 100022

Applicant before: BEIJING KINGSOFT INTERNET SECURITY SOFTWARE Co.,Ltd.

Applicant before: BEIJING KINGSOFT NETWORK TECHNOLOGY Co.,Ltd.

Applicant before: SHELL INTERNET (BEIJING) SECURITY TECHNOLOGY Co.,Ltd.

Applicant before: CONEW NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.

COR Change of bibliographic data
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20181224

Address after: Room 105-53967, No. 6 Baohua Road, Hengqin New District, Zhuhai City, Guangdong Province

Patentee after: Zhuhai Leopard Fun Technology Co.,Ltd.

Address before: 12th Floor, Fosun International Center, No. 237 Chaoyang North Road, Chaoyang District, Beijing 100022

Co-patentee before: Beijing Cheetah Network Technology Co.,Ltd.

Patentee before: BEIJING KINGSOFT INTERNET SECURITY SOFTWARE Co.,Ltd.

Co-patentee before: Beijing Cheetah Mobile Technology Co.,Ltd.

Co-patentee before: CONEW NETWORK TECHNOLOGY (BEIJING) Co.,Ltd.