Summary of the invention
In view of the above technical problems, the embodiments of this specification provide a page feature recognition method and device. The technical solution is as follows:
According to a first aspect of the embodiments of this specification, a page feature recognition method is provided, the method comprising:
Determining a parent page to be identified, the parent page being composed of multi-level blocks; traversing the parent page in order from higher-level blocks to lower-level blocks; and performing the following operation for each block:
Matching the page structure feature of the block against a plurality of preset key point structure features using a preset matching algorithm, and, if the matching degree between the structure feature of the block and any preset page key point structure feature meets a preset condition, identifying the block as the corresponding page key point;
Determining the identified page key points as the page feature of the page.
According to a second aspect of the embodiments of this specification, a user behavior collection method based on the page feature recognition method is provided, the method comprising:
After detecting that a user has performed a click, obtaining the page element clicked by the user;
Obtaining the page key point of the block in which the page element is located, and taking that page key point as the home page key point of the page element, that is, the page key point to which the element belongs;
Reporting the page element information and the corresponding home page key point information, and taking the reported result as the result of this user behavior collection.
According to a third aspect of the embodiments of this specification, a page feature recognition device is provided, the device comprising:
A parent page obtaining module, configured to determine a parent page to be identified, the parent page being composed of multi-level blocks, and to traverse the parent page in order from higher-level blocks to lower-level blocks, the following operation being performed for each block:
A structure feature matching module, configured to match the page structure feature of the block against a plurality of preset key point structure features using a preset matching algorithm, and, if the matching degree between the structure feature of the block and any preset page key point structure feature meets a preset condition, to identify the block as the corresponding page key point;
A page feature determining module, configured to determine the identified page key points as the page feature of the page.
According to a fourth aspect of the embodiments of this specification, a user behavior collection device based on the page feature recognition device is provided, the device comprising:
A user behavior monitoring module, configured to obtain the page element clicked by a user after detecting that the user has performed a click;
A page key point determining module, configured to obtain the page key point of the block in which the page element is located, and to take that page key point as the home page key point of the page element;
A user behavior reporting module, configured to report the page element information and the corresponding home page key point information, and to take the reported result as the result of this user behavior collection.
According to a fifth aspect of the embodiments of this specification, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements a page feature recognition method, the method comprising:
Determining a parent page to be identified, the parent page being composed of multi-level blocks; traversing the parent page in order from higher-level blocks to lower-level blocks; and performing the following operation for each block:
Matching the page structure feature of the block against a plurality of preset key point structure features using a preset matching algorithm, and, if the matching degree between the structure feature of the block and any preset page key point structure feature meets a preset condition, identifying the block as the corresponding page key point;
Determining the identified page key points as the page feature of the page.
According to a sixth aspect of the embodiments of this specification, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements a user behavior collection method based on the page feature recognition method, the method comprising:
After detecting that a user has performed a click, obtaining the page element clicked by the user;
Obtaining the page key point of the block in which the page element is located, and taking that page key point as the home page key point of the page element;
Reporting the page element information and the corresponding home page key point information, and taking the reported result as the result of this user behavior collection.
The technical solution provided by the embodiments of this specification is a page feature recognition method in which the structural patterns of several kinds of page key points are preset, and the structure features of the blocks at each level of a web page are then matched in turn against the structure features of the preset page key points, so that the page key points in the web page are identified automatically. When a user clicks any element on the page, the corresponding automatically generated page key point is obtained and reported. The elements clicked by the user are thus identified and classified without page key points or tracked elements having to be marked in advance.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the embodiments of this specification.
In addition, no single embodiment of this specification needs to achieve all of the above effects.
Detailed description
Example embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification. Rather, they are merely examples of devices and methods consistent with some aspects of this specification, as detailed in the appended claims.
The terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit this specification. The singular forms "a", "said", and "the" used in this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
In Web user behavior analysis, it is usually necessary to analyze behaviors such as browsing and clicking that a user performs on a page. When a user's click behavior is analyzed, it is generally necessary to obtain information about the element the user clicked and about the logical block in which that element is located, in order to identify and classify the click behavior.
In traditional user behavior information collection tools, the key points of a page generally have to be marked manually. When the user clicks, the clicked element and the key point corresponding to that element are reported. This traditional scheme requires tedious manual marking, and the marking work is cumbersome, time-consuming, and imprecise.
In view of the above problems, the embodiments of this specification provide a page feature recognition method and a page feature recognition device for executing the method. The page feature recognition method of this embodiment is described in detail below. As shown in Fig. 1, the method may include the following steps:
S101: determine a parent page to be identified, the parent page being composed of multi-level blocks, and traverse the parent page in order from higher-level blocks to lower-level blocks;
It can be understood that "multi-level blocks" means that the blocks can be divided into first-level blocks, second-level blocks, third-level blocks, and so on, and that each higher-level block may include multiple lower-level blocks.
Structure feature recognition starts from the first-level blocks; this recognition process may also be called a pattern matching process. After a first-level block has been identified, the second-level blocks contained in that first-level block are identified in turn, and so on, until all blocks under the parent page have been traversed and identified.
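The top-down order described for S101 can be sketched as a pre-order walk over a tree of blocks. The following Python sketch is illustrative only: the `Block` class, its fields, and the `traverse` function are hypothetical names introduced here, and the sample page is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Block:
    """Hypothetical block of a page; children are its lower-level blocks."""
    name: str
    children: List["Block"] = field(default_factory=list)


def traverse(block: Block, level: int = 1) -> Iterator[Tuple[int, Block]]:
    """Yield (level, block) pairs, visiting each block before its lower-level blocks."""
    yield level, block
    for child in block.children:
        yield from traverse(child, level + 1)


# Assumed sample page: one first-level block with nested lower-level blocks.
page = Block("first-level", [
    Block("second-level A", [Block("third-level A1")]),
    Block("second-level B"),
])

# Each block is visited only after the higher-level block that contains it.
order = [(level, block.name) for level, block in traverse(page)]
```

A generator keeps the traversal separate from whatever matching is applied to each visited block.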
S102: for each block, match the page structure feature of the block against a plurality of preset key point structure features using a preset matching algorithm; if the matching degree between the structure feature of the block and any preset page key point structure feature meets a preset condition, identify the block as the corresponding page key point;
The plurality of preset key point structure features may include a title structure feature, a list structure feature, and a content structure feature, as well as the pairwise combinations of these three, namely a title+list structure feature, a title+content structure feature, and a content+list structure feature.
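One way to read "the matching degree meets a preset condition" in S102 is as a score compared with a threshold. The Python sketch below works under that assumption. The scorers, the feature names `element_count` and `equal_spacing_ratio`, the threshold of 0.8, and the choice of the best-scoring pattern are all hypothetical; the specification does not fix the matching algorithm or the condition.

```python
from typing import Callable, Dict, Optional

# A scorer maps a block's measured features to a matching degree in [0, 1].
Scorer = Callable[[Dict[str, float]], float]

# Hypothetical scorers standing in for two preset key point structure features.
PRESET_PATTERNS: Dict[str, Scorer] = {
    "title": lambda f: 1.0 if f["element_count"] <= 2 else 0.0,
    "list": lambda f: f["equal_spacing_ratio"],
}


def identify(features: Dict[str, float], threshold: float = 0.8) -> Optional[str]:
    """Return the best-scoring pattern whose matching degree reaches the threshold."""
    best_name: Optional[str] = None
    best_score = 0.0
    for name, scorer in PRESET_PATTERNS.items():
        score = scorer(features)
        if score >= threshold and score > best_score:
            best_name, best_score = name, score
    return best_name  # None means no preset pattern met the condition


evenly_spaced = identify({"element_count": 6, "equal_spacing_ratio": 0.9})
irregular = identify({"element_count": 6, "equal_spacing_ratio": 0.3})
single_element = identify({"element_count": 1, "equal_spacing_ratio": 0.0})
```

Combined patterns such as title+list could be added to the same table as further scorers.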
Referring to Fig. 2 and Fig. 3, which are schematic diagrams of an Alipay page before and after recognition, the specific recognition flow is as follows. One of the first-level blocks of the parent page is obtained and matched against the plurality of preset key point structure features. It is determined that the matching degree between the structure feature of this first-level block and the "title+list" key point structure feature meets the preset condition, so the first-level block is determined to be of the "title+list" pattern.
Next, one of the second-level blocks under that first-level block is examined and matched against the plurality of preset key point structure features. It is determined that the matching degree between the structure feature of this second-level block and the "list" key point structure feature meets the preset condition, so the second-level block is determined to be of the "list" pattern.
Next, one of the third-level blocks under that second-level block is examined and matched against the plurality of preset key point structure features. It is determined that the matching degree between the structure feature of this third-level block and the "content" key point structure feature meets the preset condition, so the third-level block is determined to be of the "content" pattern.
The page structure feature, that is, the feature describing how the elements of the page are composed, may include an element type feature, an element count feature, an element size feature, and/or an element layout feature of the elements contained in the page. For example, when a block contains few elements and the layout is that of a single isolated element, the structure feature is determined to be a title feature; when the elements in a block are laid out as multiple elements at equal spacing, the structure feature is determined to be a list feature; and when a block satisfies neither the title feature nor the list feature, it is determined that the block most probably satisfies the content feature.
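The three example rules above (few isolated elements, equally spaced elements, otherwise content) can be written as a small rule-based classifier. This is a minimal sketch under stated assumptions: `BlockFeatures`, the `gaps` representation of layout, and the two tolerance parameters are hypothetical and serve only to make the rules concrete.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockFeatures:
    """Hypothetical summary of a block's structure."""
    element_count: int
    gaps: List[float]  # spacing between consecutive elements, e.g. in pixels


def classify(f: BlockFeatures, max_title_elements: int = 2,
             gap_tolerance: float = 2.0) -> str:
    # Title: the block holds very few elements, laid out in isolation.
    if f.element_count <= max_title_elements:
        return "title"
    # List: several elements arranged at (nearly) equal spacing.
    if f.gaps and max(f.gaps) - min(f.gaps) <= gap_tolerance:
        return "list"
    # Neither rule applies, so the block is treated as content.
    return "content"


heading = classify(BlockFeatures(element_count=1, gaps=[]))
menu = classify(BlockFeatures(element_count=4, gaps=[10.0, 10.0, 11.0]))
article = classify(BlockFeatures(element_count=5, gaps=[4.0, 30.0, 12.0, 8.0]))
```

The title rule is checked first so that a block with only one or two elements is not mistaken for a short list.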
In one practical application scenario, the multi-level blocks of the parent page can be regarded as parent and child nodes in the web page code, and pattern matching is performed recursively on each child node, starting from the root node of the original web page. For example: if the current child node matches the title+list pattern, it is marked as "title+list" and its child nodes are examined next. If the current child node matches the list pattern, it is marked as "list" and its child nodes are examined next. If the current node matches the content region pattern, it is marked as "content" and its child nodes are examined next. This continues until all nodes on the page have been examined.
S103: determine the identified page key points as the page feature of the page.
Further, each identified page key point may be annotated, as a page feature, at the corresponding position of the parent page.
This specification also provides a user behavior collection method based on the above page feature recognition method. Referring to Fig. 4, the method may include the following steps:
S401: after detecting that a user has performed a click, obtain the page element clicked by the user;
S402: search upward from the page element, and determine the first page key point found as the home page key point of the page element, that is, the page key point to which the element belongs;
All elements in the page are contained in the identified page key point structures; that is, every element in the page belongs to one of the identified title, list, or content page key point structures. When the user clicks any page element, its home page key point can therefore be found from that element.
S403: report the page element information and the corresponding home page key point information, and take the reported result as the result of this user behavior collection.
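Steps S401 to S403 can be sketched as an upward search through parent links followed by assembling a report record. In this Python sketch the `Element` class, the `key_point` field (assumed to have been filled in by an earlier recognition pass), the report field names, and the choice to start the search at the clicked element itself are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Element:
    """Hypothetical page element with a link to its parent."""
    name: str
    parent: Optional["Element"] = None
    key_point: Optional[str] = None  # set by a prior recognition pass, if any


def find_home_key_point(clicked: Element) -> Optional[Element]:
    """Walk upward from the clicked element to the first marked key point."""
    node: Optional[Element] = clicked
    while node is not None:
        if node.key_point is not None:
            return node
        node = node.parent
    return None


def build_report(clicked: Element) -> Optional[Dict[str, str]]:
    """Assemble the record to be reported, or None if no key point is found."""
    home = find_home_key_point(clicked)
    if home is None or home.key_point is None:
        return None
    return {
        "element": clicked.name,
        "key_point": home.key_point,
        "key_point_block": home.name,
    }


section = Element("self-service section", key_point="title+list")
account_list = Element("account list", parent=section, key_point="list")
link = Element("activate-account link", parent=account_list)
report = build_report(link)
```

Because the search stops at the first marked ancestor, the link is attributed to the "list" block rather than to the enclosing "title+list" block.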
For example, referring to Fig. 2 and Fig. 3, the "Activates Account" position on the page is an identified "list" key point. After the user clicks an element contained in this "list" block, an upward search starts from the clicked element, the "list" key point corresponding to the element is obtained and determined as the home page key point of the page element clicked by the user, and the page element information and the corresponding home page key point information are reported.
It should be noted that the above process is not executed when the part of the page clicked by the user cannot respond to the click. For example, the "common Self-Service" position in the page is marked as a "title" structure, but this position cannot respond to the user's click operation. Therefore, when the user clicks a page element at the "common Self-Service" position, there is no need to look up the page key point corresponding to that page element.
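The exception for parts of the page that do not respond to clicks can be sketched as a guard placed in front of the lookup. How responsiveness is detected is not stated in this specification; the boolean `responds_to_click` field below is a hypothetical placeholder for that check, and `Element` and `collect_click` are likewise illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Element:
    """Hypothetical page element."""
    name: str
    responds_to_click: bool  # placeholder for a real responsiveness check
    parent: Optional["Element"] = None
    key_point: Optional[str] = None


def collect_click(clicked: Element,
                  reports: List[Tuple[str, Optional[str]]]) -> bool:
    """Record (element, home key point) unless the element ignores clicks."""
    if not clicked.responds_to_click:
        return False  # skip both the upward lookup and the report
    node: Optional[Element] = clicked
    while node is not None and node.key_point is None:
        node = node.parent
    reports.append((clicked.name, node.key_point if node is not None else None))
    return True


reports: List[Tuple[str, Optional[str]]] = []
heading = Element("common self-service heading", responds_to_click=False,
                  key_point="title")
account_list = Element("account list", responds_to_click=False, key_point="list")
link = Element("activate-account link", responds_to_click=True,
               parent=account_list)

skipped = collect_click(heading, reports)
collected = collect_click(link, reports)
```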
Corresponding to the above method embodiment, the embodiments of this specification also provide a page feature recognition device. As shown in Fig. 5, the device may include a parent page obtaining module 510, a structure feature matching module 520, and a page feature determining module 530.
Parent page obtaining module 510: configured to determine a parent page to be identified, the parent page being composed of multi-level blocks, and to traverse the parent page in order from higher-level blocks to lower-level blocks, the following operation being performed for each block:
Structure feature matching module 520: configured to match the page structure feature of the block against a plurality of preset key point structure features using a preset matching algorithm, and, if the matching degree between the structure feature of the block and any preset page key point structure feature meets a preset condition, to identify the block as the corresponding page key point;
Page feature determining module 530: configured to determine the identified page key points as the page feature of the page.
Corresponding to the above method embodiment, the embodiments of this specification also provide a user behavior collection device based on the above page feature recognition device. As shown in Fig. 6, the device may include a user behavior monitoring module 610, a page key point determining module 620, and a user behavior reporting module 630.
User behavior monitoring module 610: configured to obtain the page element clicked by a user after detecting that the user has performed a click;
Page key point determining module 620: configured to obtain the page key point of the block in which the page element is located, and to take that page key point as the home page key point of the page element;
User behavior reporting module 630: configured to report the page element information and the corresponding home page key point information, and to take the reported result as the result of this user behavior collection.
The embodiments of this specification also provide a computer device that includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the aforementioned page feature recognition method, the method including at least:
Determining a parent page to be identified, the parent page being composed of multi-level blocks; traversing the parent page in order from higher-level blocks to lower-level blocks; and performing the following operation for each block:
Matching the page structure feature of the block against a plurality of preset key point structure features using a preset matching algorithm, and, if the matching degree between the structure feature of the block and any preset page key point structure feature meets a preset condition, identifying the block as the corresponding page key point;
Determining the identified page key points as the page feature of the page.
The embodiments of this specification also provide a computer device that includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the aforementioned user behavior collection method based on the page feature recognition method, the method including at least:
After detecting that a user has performed a click, obtaining the page element clicked by the user;
Obtaining the page key point of the block in which the page element is located, and taking that page key point as the home page key point of the page element;
Reporting the page element information and the corresponding home page key point information, and taking the reported result as the result of this user behavior collection.
Fig. 7 shows a more specific schematic diagram of the hardware structure of a computing device provided by the embodiments of this specification. The device may include a processor 1110, a memory 1120, an input/output interface 1130, a communication interface 1140, and a bus 1150. The processor 1110, the memory 1120, the input/output interface 1130, and the communication interface 1140 are communicatively connected to one another within the device through the bus 1150.
The processor 1110 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute the relevant programs so as to implement the technical solution provided by the embodiments of this specification.
The memory 1120 may be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1120 may store an operating system and other application programs. When the technical solution provided by the embodiments of this specification is implemented in software or firmware, the relevant program code is stored in the memory 1120 and is called and executed by the processor 1110.
The input/output interface 1130 is used to connect an input/output module in order to input and output information. The input/output module may be configured in the device as a component (not shown in the figure), or may be external to the device to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, and the like, and output devices may include a display, a loudspeaker, a vibrator, an indicator light, and the like.
The communication interface 1140 is used to connect a communication module (not shown in the figure) so that this device can communicate with other devices. The communication module may communicate in a wired manner (for example, USB or a network cable) or in a wireless manner (for example, a mobile network, Wi-Fi, or Bluetooth).
The bus 1150 includes a path that transfers information between the components of the device (for example, the processor 1110, the memory 1120, the input/output interface 1130, and the communication interface 1140).
It should be noted that although only the processor 1110, the memory 1120, the input/output interface 1130, the communication interface 1140, and the bus 1150 are shown for the above device, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the above device may also include only the components necessary to implement the solution of the embodiments of this specification, and need not include all of the components shown in the figure.
The embodiments of this specification also provide a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the aforementioned page feature recognition method, the method including at least:
Determining a parent page to be identified, the parent page being composed of multi-level blocks; traversing the parent page in order from higher-level blocks to lower-level blocks; and performing the following operation for each block:
Matching the page structure feature of the block against a plurality of preset key point structure features using a preset matching algorithm, and, if the matching degree between the structure feature of the block and any preset page key point structure feature meets a preset condition, identifying the block as the corresponding page key point;
Determining the identified page key points as the page feature of the page.
The embodiments of this specification also provide a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the aforementioned user behavior collection method based on the page feature recognition method, the method including at least:
After detecting that a user has performed a click, obtaining the page element clicked by the user;
Obtaining the page key point of the block in which the page element is located, and taking that page key point as the home page key point of the page element;
Reporting the page element information and the corresponding home page key point information, and taking the reported result as the result of this user behavior collection.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Since the device embodiment basically corresponds to the method embodiment, reference may be made to the relevant parts of the description of the method embodiment. The device embodiment described above is merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this specification. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, those skilled in the art can clearly understand that the embodiments of this specification can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the embodiments of this specification, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of this specification or in certain parts of the embodiments.
The systems, devices, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail sending and receiving device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
The embodiments in this specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the device embodiment is basically similar to the method embodiment, it is described relatively briefly, and reference may be made to the relevant parts of the description of the method embodiment. The device embodiment described above is merely illustrative: the modules described as separate components may or may not be physically separate, and when the solution of the embodiments of this specification is implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
The above are only specific implementations of the embodiments of this specification. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the principles of the embodiments of this specification, and these improvements and modifications shall also be regarded as falling within the protection scope of the embodiments of this specification.