Summary of the invention
The main object of the present invention is to provide a form recognition method and apparatus, so as to solve the problem of the low form recognition rate in the prior art.
To achieve the above object, according to one aspect of the present invention, a form recognition method is provided, comprising: receiving an access instruction; loading the web page corresponding to the access instruction; scanning the web page code of the loaded web page; judging whether the scanned web page code contains an element whose attribute is a first preset attribute, wherein the element corresponding to the first preset attribute is a password element; judging whether the scanned web page code contains an element whose attribute is a second preset attribute, wherein the element corresponding to the second preset attribute is a username element; and, if the scanned web page code is judged to contain both an element whose attribute is the first preset attribute and an element whose attribute is the second preset attribute, determining that the loaded web page is a form web page.
Further, scanning the web page code of the loaded web page comprises: obtaining the type of the kernel that generated the access instruction; if the obtained kernel type is the Trident kernel, injecting preset script code into the web page code so as to scan the web page code; and, if the obtained kernel type is the Webkit kernel, scanning the input controls in the DOM tree of the web page code.
Further, after determining that the loaded web page is a form web page, the form recognition method further comprises: judging whether a trigger instruction is received, wherein the trigger instruction is used to submit the form web page; and, if the trigger instruction is judged to have been received, determining that the form web page is a valid form.
Further, when the kernel type that generated the access instruction is the Trident kernel, judging whether the trigger instruction is received comprises: obtaining, from the web page code, the element whose attribute is a third preset attribute, to obtain a first element, wherein the element corresponding to the third preset attribute is a submit event; copying the first element to obtain a second element; covering the first element with the second element; and judging whether the second element is executed, and, if the second element is judged to have been executed, determining that the trigger instruction is received.
Further, when the kernel type that generated the access instruction is the Webkit kernel, judging whether the trigger instruction is received comprises: obtaining, from the web page code, the element whose attribute is the third preset attribute, to obtain a first element, wherein the element corresponding to the third preset attribute is a submit event; and judging whether the first element is executed, and, if the first element is judged to have been executed, determining that the trigger instruction is received.
Further, after determining that the loaded web page is a form web page and before judging whether the trigger instruction is received, the form recognition method further comprises: obtaining the element whose attribute is the first preset attribute, to obtain the password element; obtaining the element whose attribute is the second preset attribute, to obtain the username element; querying a preset database to judge whether password data and username data have both been saved in the preset database, wherein the password data is the data corresponding to the password element and the username data is the data corresponding to the username element; and, if the password data and the username data are both judged to have been saved in the preset database, adding the password data to the password element of the loaded web page and adding the username data to the username element of the loaded web page.
Further, after determining that the loaded web page is a form web page and before judging whether the trigger instruction is received, the form recognition method further comprises: obtaining the element whose attribute is the first preset attribute, to obtain the password element; obtaining the element whose attribute is the second preset attribute, to obtain the username element; querying a preset database to judge whether password data and username data have both been saved in the preset database, wherein the password data is the data corresponding to the password element and the username data is the data corresponding to the username element; if the username data is judged to have been saved in the preset database and the password data is not, adding the username data to the username element of the loaded web page and receiving the password data entered by the user; and, if neither the username data nor the password data is judged to have been saved in the preset database, receiving the password data and the username data entered by the user.
Further, after judging that the trigger instruction is received, the form recognition method further comprises: displaying a preset pop-up window, wherein the preset pop-up window carries prompt content, and the prompt content prompts the user to choose whether to save the password data and the username data, or prompts the user to choose whether to save the password data; receiving a selection instruction from the user; and, when the selection instruction indicates that the data is to be saved, saving the password data and the username data to the preset database, or saving the password data to the preset database.
To achieve the above object, according to another aspect of the present invention, a form recognition apparatus is provided, which is configured to carry out any of the form recognition methods provided above by the present invention.
To achieve the above object, according to another aspect of the present invention, a form recognition apparatus is provided, comprising: a receiving unit, configured to receive an access instruction; a loading unit, configured to load the web page corresponding to the access instruction; a scanning unit, configured to scan the web page code of the loaded web page; a first judging unit, configured to judge whether the scanned web page code contains an element whose attribute is a first preset attribute, wherein the element corresponding to the first preset attribute is a password element; a second judging unit, configured to judge whether the scanned web page code contains an element whose attribute is a second preset attribute, wherein the element corresponding to the second preset attribute is a username element; and a determining unit, configured to determine that the loaded web page is a form web page if the scanned web page code is judged to contain both an element whose attribute is the first preset attribute and an element whose attribute is the second preset attribute.
Further, the scanning unit comprises: a first obtaining subunit, configured to obtain the type of the kernel that generated the access instruction; a first scanning subunit, configured to inject preset script code into the web page code so as to scan the web page code when the obtained kernel type is the Trident kernel; and a second scanning subunit, configured to scan the input controls in the DOM tree of the web page code when the obtained kernel type is the Webkit kernel.
The present invention receives an access instruction; loads the web page corresponding to the access instruction; scans the web page code of the loaded web page; judges whether the scanned web page code contains an element whose attribute is a first preset attribute, wherein the element corresponding to the first preset attribute is a password element; judges whether the scanned web page code contains an element whose attribute is a second preset attribute, wherein the element corresponding to the second preset attribute is a username element; and, if the scanned web page code is judged to contain both, determines that the loaded web page is a form web page. By scanning the web page code of the page loaded for the user's visit, the invention monitors the web page code and, through it, the attributes of each element in the code, so that it can quickly detect whether the loaded web page contains elements that satisfy the preset attributes (that is, quickly detect the password element and the username element). Because only the username field and the password field of the web page are monitored, scanning the web page code of the loaded page is enough to recognize whether it is a form web page. Compared with prior-art recognition methods, which must judge several fields of the form after it has been successfully submitted, this effectively reduces the complexity of form recognition, solves the problem of the low form recognition rate in the prior art, and thereby improves the form recognition rate.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of the present application and the features of those embodiments may be combined with one another. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
An embodiment of the present invention provides a form recognition method, which is described in detail below:
Fig. 1 is a flowchart of the form recognition method according to an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps S101 to S107:
S101: Receive an access instruction from the user. Specifically, when the user wants to visit a website, the user may open the web page by typing the website's address or by following a link to it; at that point, the user's access instruction is received.
S102: Load the web page corresponding to the access instruction, that is, load the HTML at the address corresponding to the access instruction to obtain the web page corresponding to the access instruction.
S103: Scan the web page code of the loaded web page.
S104: Judge whether the scanned web page code contains an element whose attribute is the first preset attribute, wherein the element corresponding to the first preset attribute is a password element. Specifically, the web page code of the loaded web page is scanned, and the scan checks whether an element whose attribute is the first preset attribute can be found. In this embodiment, the first preset attribute may be defined as the attribute type="password": if an input element with the attribute type="password" is found in the web page code of the loaded web page, the scanned web page code is determined to contain an element whose attribute is the first preset attribute.
S105: Judge whether the scanned web page code contains an element whose attribute is the second preset attribute, wherein the element corresponding to the second preset attribute is a username element. Specifically, the scan checks whether an element whose attribute is the second preset attribute can be found. In this embodiment, the second preset attribute may be defined as being nearest to the password element and having the attribute type="text": if an input element with the attribute type="text" is found in the web page code of the loaded web page, the scanned web page code is determined to contain an element whose attribute is the second preset attribute.
S106: If the scanned web page code is judged to contain both an element whose attribute is the first preset attribute and an element whose attribute is the second preset attribute, determine that the web page loaded in step S102 is a form web page. That is, once the web page code is judged to contain both a password element and a username element, the loaded web page can be determined to be a form web page, which completes the recognition of the form.
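The detection in steps S104 to S106 can be illustrated with a minimal sketch. It is not the implementation described in this specification: it uses Python's standard `html.parser` module on a static HTML string instead of injected script or a live DOM tree, the names `InputCollector` and `find_login_fields` are hypothetical, and "nearest to the password element" is assumed here to mean the closest preceding input in document order.

```python
from html.parser import HTMLParser
from typing import Optional


class InputCollector(HTMLParser):
    """Collects the attributes of every <input> element in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.inputs: list[dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            # Bare attributes have the value None; normalise them to "".
            self.inputs.append({name: (value or "") for name, value in attrs})


def find_login_fields(html: str) -> Optional[tuple[dict[str, str], dict[str, str]]]:
    """Return (username_input, password_input), or None if the page is not a form page."""
    collector = InputCollector()
    collector.feed(html)
    for index, element in enumerate(collector.inputs):
        # First preset attribute: type="password".
        if element.get("type", "").lower() != "password":
            continue
        # Second preset attribute (assumed reading): the nearest preceding
        # input whose type is "text".
        for previous in reversed(collector.inputs[:index]):
            if previous.get("type", "").lower() == "text":
                return previous, element
    return None
```

A page is treated as a form web page only when `find_login_fields` returns a pair, mirroring the requirement that both elements be present.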
By scanning the web page code of the page loaded for the user's visit, the form recognition method of this embodiment monitors the web page code and, through it, the attributes of each element in the code, so that it can quickly detect whether the loaded web page contains elements that satisfy the preset attributes (that is, quickly detect the password element and the username element). Because only the username field and the password field of the web page are monitored, scanning the web page code of the loaded page is enough to recognize whether it is a form web page. Compared with prior-art recognition methods, which must judge several fields of the form after it has been successfully submitted, this effectively reduces the complexity of form recognition, solves the problem of the low form recognition rate in the prior art, and thereby improves the form recognition rate.
Specifically, the user may issue the access instruction through browser kernels of different types, and step S103 of the recognition method provided by this embodiment is carried out differently for browsers with different kernels.
When the web page code of the loaded web page is scanned, the type of the kernel that generated the access instruction is obtained first, and a different scanning method is then used depending on the kernel type obtained. Specifically, when the obtained kernel type is the Trident kernel, preset JavaScript script code (JS script code) is injected into the web page code of the loaded web page after the web page corresponding to the access instruction has finished loading, and the JS script code is then used to scan and monitor the HTML code of the loaded web page. When the obtained kernel type is the Webkit kernel, the input controls in the DOM tree of the web page code can be scanned directly. Here the DOM tree refers to the HTML Document Object Model (HTML DOM for short), a document object model specifically adapted to and optimized for HTML/XHTML. The flow for scanning the input controls in the DOM tree to determine the password element and the username element is shown in Fig. 2. When the scan finds ordinary input controls in the DOM tree, the input element with the attribute type="password" is directly determined to be the password input box element, and the nearest editable input element above the password input box in the DOM tree with the attribute type="text" is the username input box. When the scan finds that the controls in the DOM tree are not ordinary input controls, and the web page is one for which the automatic form filling function can be enabled, the username element and the password element are located by a compound condition composed of the web page's origin URL, the element id, the element name attribute, and other elements.
After the loaded web page is determined to be a form web page, that is, after the form has been recognized, the recognition method of this embodiment further comprises: judging whether a trigger instruction issued by the user is received, wherein the trigger instruction is used to submit the form web page; and, after the trigger instruction is judged to have been received, determining that the form web page is a valid form. In other words, once the loaded form web page has been submitted by the user, the browser determines that the submitted form is a valid form, so that the form can be recognized more quickly and accurately when the form web page is loaded again. Specifically, when the user submits a form web page containing a password element and a username element, the corresponding submit event in the web page code is triggered, so whether a trigger instruction issued by the user has been received can be judged by judging whether that submit event has been triggered.
When the kernel type that generated the access instruction is the Trident kernel, the flow for judging whether a trigger instruction is received is shown in Fig. 3. Specifically, as shown in Fig. 3, the web page code of the loaded web page is first scanned by the JS script code to obtain the element whose attribute is the third preset attribute (hereinafter the first element). Where the submit button in the web page code follows the ordinary convention, the third preset attribute may be defined as the attribute type="submit"; obtaining the first element then means obtaining the element in the web page code whose attribute is type="submit", that is, finding the submit button in the loaded web page. Where the submit button in the web page code does not follow the ordinary convention, the submit button is located by a compound condition composed of the element, its attributes, and other elements. Next, the first element is copied to obtain the second element, and the first element is covered with the second element. That is, the submit button is cloned with the clone method in the preset JavaScript script code, the cloned submit button is placed in front of the original submit button, and the cloned submit button is given a complex, unique id using a timestamp-based naming scheme. Finally, it is judged whether the second element is executed. When the user submits the password element and the username element, the submission must pass through the submit event, at which point the second element is executed; whether a trigger instruction issued by the user has been received can therefore be judged by judging whether the second element is executed, and when the second element is executed, the trigger instruction issued by the user is determined to have been received. After the handling attached to the cloned submit button has run, the events of the original submit button are still executed in their original order: if the return value after the submit button's own event has run is true, execution continues with the form event in the web page code; if the return value is false, execution stops. The terms "first element" and "second element" are used only to distinguish different elements and do not limit the order of the elements; similar expressions appearing below likewise serve only to distinguish and do not limit the order.
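The ordering described above (the clone is executed first, then the original button's own event, then the form event only if that event returns true) can be modelled with a short sketch. This is a plain-Python model of the control flow under stated assumptions, not browser code: `make_hooked_submit`, `original_handler`, `on_trigger`, and `submit_form` are hypothetical names, and a button with no handler of its own is assumed to behave as if it returned true.

```python
from typing import Callable, Optional


def make_hooked_submit(
    original_handler: Optional[Callable[[], bool]],
    on_trigger: Callable[[], None],
    submit_form: Callable[[], None],
) -> Callable[[], None]:
    """Build the handler attached to the cloned submit button (illustrative model)."""

    def hooked() -> None:
        # The clone was executed, so the trigger instruction has been received.
        on_trigger()
        # The original button's own event still runs, in its original order.
        # Assumption: a missing handler counts as returning True.
        result = original_handler() if original_handler is not None else True
        if result:
            # A true return value lets the form event proceed.
            submit_form()
        # A false return value stops processing here.

    return hooked
```

Calling the returned function with an original handler that returns false records the trigger but never reaches `submit_form`, which matches the stop condition in the description.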
When the kernel type that generated the access instruction is the Webkit kernel, the input controls in the DOM tree of the web page code are scanned to obtain the element in the web page code of the loaded web page whose attribute is the third preset attribute, giving the first element; that is, the submit button in the web page of the Webkit-kernel browser is obtained. Whether a trigger instruction issued by the user has been received is then judged by judging whether the first element is executed: if the first element is judged to have been executed, the trigger instruction is determined to have been received.
Further, the form recognition method of this embodiment also comprises steps for filling in and saving the form. When the browser kernel is the Trident kernel, the overall form recognition method is as shown in Fig. 4; when the browser kernel is the Webkit kernel, the overall form recognition method is as shown in Fig. 5. As Figs. 4 and 5 show, the form filling and form saving steps are the same for browsers with different kernels.
Specifically, form filling is performed after the loaded web page is determined to be a form web page and before judging whether a trigger instruction is received, as follows. After the web page code is judged to contain both an element whose attribute is the first preset attribute and an element whose attribute is the second preset attribute, and when the user clicks the username input box, the Renderer process is triggered to capture the element whose attribute is the first preset attribute and the element whose attribute is the second preset attribute, obtaining the password element and the username element. The Renderer process then sends an IPC request to the main process, and the preset database of the main process is queried to judge whether the password data corresponding to the password element and the username data corresponding to the username element have both been saved in the preset database. The preset database is a database that stores form information; keeping form information in the preset database allows form data to be shared between the kernels of a dual-kernel browser. Finally, when the password data and the username data are both judged to have been saved in the preset database, the main process sends an IPC request to the Renderer process, and the Renderer process selects the best-matching saved username data and password data, adds the selected password data to the password element of the loaded web page, and adds the username data to the username element of the loaded web page. The Renderer process selects and matches the username data and password data on the following principle. It first judges whether a saved form exists under the current URL. If so, the password data and username data of that form are preferentially added to the corresponding elements on the web page, giving an exact match. If no saved form exists under the current URL, the forms saved under the main domain of the current URL are looked up, and the password data and username data of a form saved under the main domain are added to the corresponding elements on the web page, giving a fuzzy match. For example, if form A has been saved under the URL "a.xxx.com" and form B has been saved under the URL "b.xxx.com", then when the user opens "a.xxx.com" the username data and password data of both form A and form B are candidates, but the username data and password data of form A are preferentially selected as the best match. If no form has been saved under the URL "a.xxx.com" and form B has been saved under the URL "b.xxx.com", then when the user opens "a.xxx.com" the username data and password data of form B are used as the selected username data and password data.
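The exact-then-fuzzy matching principle can be sketched as follows. This is a minimal illustration, not the Renderer process's actual logic: `SavedForm`, `main_domain`, and `best_match` are hypothetical names, and the main domain is assumed here to be simply the last two labels of the host, which ignores multi-part suffixes such as "co.uk".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SavedForm:
    host: str       # host the form was saved under, e.g. "a.xxx.com"
    username: str
    password: str


def main_domain(host: str) -> str:
    """Naive main-domain guess (assumption): the last two labels of the host."""
    return ".".join(host.split(".")[-2:])


def best_match(host: str, saved: list[SavedForm]) -> Optional[SavedForm]:
    """Prefer a form saved under the current host, else one under its main domain."""
    # Exact match: a form saved under the current host wins.
    for form in saved:
        if form.host == host:
            return form
    # Fuzzy match: fall back to a form saved under the same main domain.
    for form in saved:
        if main_domain(form.host) == main_domain(host):
            return form
    return None
```

With forms saved under "a.xxx.com" and "b.xxx.com", opening "a.xxx.com" selects the first; with only the second saved, the second is selected through the main-domain fallback, as in the example given in the text.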
Form saving proceeds as follows. First, after the loaded web page is determined to be a form web page and before judging whether a trigger instruction is received, when the user clicks the username input box, the Renderer process is triggered to capture the element whose attribute is the first preset attribute and the element whose attribute is the second preset attribute, obtaining the password element and the username element. The Renderer process then sends an IPC request to the main process, and the preset database of the main process is queried to judge whether the password data corresponding to the password element and the username data corresponding to the username element have both been saved in the preset database. When the username data is judged to have been saved in the preset database but the password data is not, the main process sends an IPC request to the Renderer process, the Renderer process selects the best-matching saved username data and adds it to the username element of the loaded web page, and the password data entered by the user is received. When neither the username data nor the password data is judged to have been saved in the preset database, the username data and password data entered by the user are received directly. Up to this point only the form filling step has been completed. Second, after the trigger instruction is received, that is, after the user has triggered the login, the Renderer process sends an IPC request to the main process, and the main process triggers a pop-up prompt asking the user whether to save the password data and the username data, or whether to save the password data. Finally, the user's selection instruction is received, and when the selection instruction indicates that the data is to be saved, the password data and the username data are saved to the preset database, or the password data is saved to the preset database.
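The fill and save decisions described above reduce to a small amount of branching, sketched below. The names `FillAction`, `decide_fill`, and `save_on_submit` are hypothetical, an in-memory dictionary stands in for the preset database, and the case of a saved password without a saved username, which the text does not address, is assumed here to fall back to waiting for user input.

```python
from enum import Enum
from typing import Optional


class FillAction(Enum):
    FILL_BOTH = "fill username and password"
    FILL_USERNAME_ONLY = "fill username, receive typed password"
    FILL_NOTHING = "receive typed username and password"


def decide_fill(saved_username: Optional[str], saved_password: Optional[str]) -> FillAction:
    """Choose what to pre-fill from what the database query returned."""
    if saved_username is not None and saved_password is not None:
        return FillAction.FILL_BOTH
    if saved_username is not None:
        return FillAction.FILL_USERNAME_ONLY
    # Neither saved, or (assumption) a password without a username.
    return FillAction.FILL_NOTHING


def save_on_submit(
    database: dict[str, tuple[str, str]],
    host: str,
    username: str,
    password: str,
    user_agrees: bool,
) -> bool:
    """After the trigger instruction, store the data only if the user chose to save."""
    if not user_agrees:
        return False
    database[host] = (username, password)
    return True
```

Keeping the save step behind an explicit user choice reflects the pop-up prompt in the description: nothing is written to the database unless the selection instruction indicates saving.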
An embodiment of the present invention also provides a form recognition apparatus, which is described in detail below:
Fig. 6 is a schematic diagram of the form recognition apparatus according to an embodiment of the present invention. As shown in Fig. 6, the form recognition apparatus of this embodiment comprises a receiving unit 10, a loading unit 20, a scanning unit 30, a first judging unit 40, a second judging unit 50, and a determining unit 60.
The receiving unit 10 is configured to receive an access instruction. Specifically, when the user wants to visit a website, the user may open the web page by typing the website's address or by following a link to it; the receiving unit 10 receives the user's access instruction by receiving the address that the user typed or the address of the link that the user followed.
The loading unit 20 is configured to load the web page corresponding to the access instruction; specifically, it loads the HTML at the address corresponding to the access instruction to obtain the web page corresponding to the access instruction.
The scanning unit 30 is configured to scan the web page code of the loaded web page.
The first judging unit 40 is configured to judge whether the scanned web page code contains an element whose attribute is the first preset attribute, wherein the element corresponding to the first preset attribute is a password element. Specifically, the web page code of the loaded web page is scanned, and the scan checks whether an element whose attribute is the first preset attribute can be found. In this embodiment, the first preset attribute may be defined as the attribute type="password": if an input element with the attribute type="password" is found in the web page code of the loaded web page, the scanned web page code is determined to contain an element whose attribute is the first preset attribute.
The second judging unit 50 is configured to judge whether the scanned web page code contains an element whose attribute is the second preset attribute, wherein the element corresponding to the second preset attribute is a username element. Specifically, the scan checks whether an element whose attribute is the second preset attribute can be found. In this embodiment, the second preset attribute may be defined as being nearest to the password element and having the attribute type="text": if an input element with the attribute type="text" is found in the web page code of the loaded web page, the scanned web page code is determined to contain an element whose attribute is the second preset attribute.
The determining unit 60 is configured to determine that the web page loaded by the loading unit 20 is a form web page if the scanned web page code is judged to contain both an element whose attribute is the first preset attribute and an element whose attribute is the second preset attribute. That is, once the web page code is judged to contain both a password element and a username element, the loaded web page can be determined to be a form web page, which completes the recognition of the form.
By scanning the web page code of the page loaded for the user's visit, the form recognition apparatus of this embodiment monitors the web page code and, through it, the attributes of each element in the code, so that it can quickly detect whether the page contains elements that satisfy the preset attributes (that is, quickly detect the password element and the username element). Because only the username field and the password field of the web page are monitored, scanning the web page code of the loaded page is enough to recognize whether it is a form web page. Compared with prior-art recognition methods, which must judge several fields of the form after it has been successfully submitted, this effectively reduces the complexity of form recognition, solves the problem of the low form recognition rate in the prior art, and thereby improves the form recognition rate.
Specifically, the user may issue the access instruction through browser kernels of different types, and the scanning unit 30 scans the web page code of the loaded web page differently for browsers with different kernels.
When the web page code of the loaded web page is scanned, the first obtaining subunit of the scanning unit 30 first obtains the type of the kernel that generated the access instruction. When the obtained kernel type is the Trident kernel, the first scanning subunit of the scanning unit 30 injects preset script code into the web page code so as to scan the web page code. When the obtained kernel type is the Webkit kernel, the second scanning subunit of the scanning unit 30 scans the input controls in the DOM tree of the web page code.
Further, after receiving the trigger instruction used to submit the form web page, the form recognition apparatus of this embodiment can determine that the form web page is a valid form, so that the form can be recognized more quickly and accurately when the form web page is loaded again. The method of judging whether the trigger instruction is received has been described in detail in the form recognition method provided by the above embodiment of the present invention and is not repeated here.
In addition, the form recognition apparatus of this embodiment can also save and fill in the recognized form. The specific way in which the apparatus saves and fills in the form is the same as the form saving and form filling steps of the form recognition method provided by the above embodiment of the present invention, and is likewise not repeated here.
As can be seen from the above description, by quickly detecting whether the web page code contains a password element and a username element, the present invention effectively reduces the complexity of form recognition and improves the form recognition rate. At the same time, by saving form data, it allows form data to be shared between the kernels of a dual-kernel browser, which improves the applicability of the form.
It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one given here.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented with general-purpose computing devices. They may be concentrated on a single computing device or distributed over a network formed by a plurality of computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may each be made into a separate integrated circuit module, or several of the modules or steps may be made into a single integrated circuit module. The present invention is thus not limited to any specific combination of hardware and software.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.