CN110059272A - Page feature recognition method and device - Google Patents

Page feature recognition method and device

Info

Publication number
CN110059272A
Authority
CN
China
Prior art keywords
page
key point
feature
block
structure feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811300896.XA
Other languages
Chinese (zh)
Other versions
CN110059272B (en)
Inventor
饶海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201811300896.XA
Publication of CN110059272A
Application granted
Publication of CN110059272B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a page feature recognition method and device. Structural patterns for several types of page key points are preset, and the structure features of the web page's blocks at each level are then matched in turn against the structure features of the preset page key points, so that the page key points in the web page are identified automatically. When a user clicks any element on the page, the automatically generated page key point corresponding to that element is obtained and reported. Elements clicked by the user are thereby identified and classified without marking page key points or adding tracking points to elements in advance.

Description

Page feature recognition method and device
Technical field
This specification relates to the field of Internet technology, and in particular to a page feature recognition method and device.
Background
In Web user behavior analysis, it is usually necessary to analyze behaviors such as browsing and clicking that a user performs on the page side. When a user's click behavior is analyzed, it is generally necessary to obtain information about the element the user clicked and about the logical block in which that element is located, so that the click behavior can be identified and classified.
Traditional tools for collecting user behavior information generally require the key points of a page to be labeled manually. When a user clicks, the clicked element and the key point corresponding to that element are reported. The traditional scheme therefore depends on tedious manual labeling, which is cumbersome, time-consuming, and imprecise.
Summary of the invention
In view of the above technical problems, the embodiments of this specification provide a page feature recognition method and device. The technical solution is as follows:
According to a first aspect of the embodiments of this specification, a page feature recognition method is provided, the method comprising:
Determining a parent page to be identified, the parent page consisting of multi-level blocks; traversing the parent page in order from higher-level blocks to lower-level blocks; and performing the following operations for each block:
Matching the page structure feature of the block against multiple preset key point structure features using a preset matching algorithm; if the matching degree between the structure feature of the block and any preset page key point structure feature meets a preset condition, identifying the block as the corresponding page key point;
Determining the identified page key points as the page features of the page.
According to a second aspect of the embodiments of this specification, a user behavior collection method based on the page feature recognition method is provided, the method comprising:
After detecting that a user has performed a click, obtaining the page element that the user clicked;
Obtaining the page key point of the block in which the page element is located, and taking that page key point as the owning page key point of the page element;
Reporting the page element information and the corresponding owning page key point information, and taking the reported result as the collection result for this user behavior.
According to a third aspect of the embodiments of this specification, a page feature recognition device is provided, the device comprising:
A parent page acquisition module, configured to determine a parent page to be identified, the parent page consisting of multi-level blocks, to traverse the parent page in order from higher-level blocks to lower-level blocks, and to perform the following operations for each block:
A structure feature matching module, configured to match the page structure feature of the block against multiple preset key point structure features using a preset matching algorithm, and, if the matching degree between the structure feature of the block and any preset page key point structure feature meets a preset condition, to identify the block as the corresponding page key point;
A page feature determination module, configured to determine the identified page key points as the page features of the page.
According to a fourth aspect of the embodiments of this specification, a user behavior collection device based on the page feature recognition device is provided, the device comprising:
A user behavior monitoring module, configured to obtain, after detecting that a user has performed a click, the page element that the user clicked;
A page key point determination module, configured to obtain the page key point of the block in which the page element is located and to take that page key point as the owning page key point of the page element;
A user behavior reporting module, configured to report the page element information and the corresponding owning page key point information, and to take the reported result as the collection result for this user behavior.
According to a fifth aspect of the embodiments of this specification, a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements a page feature recognition method, the method comprising:
Determining a parent page to be identified, the parent page consisting of multi-level blocks; traversing the parent page in order from higher-level blocks to lower-level blocks; and performing the following operations for each block:
Matching the page structure feature of the block against multiple preset key point structure features using a preset matching algorithm; if the matching degree between the structure feature of the block and any preset page key point structure feature meets a preset condition, identifying the block as the corresponding page key point;
Determining the identified page key points as the page features of the page.
According to a sixth aspect of the embodiments of this specification, a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements a user behavior collection method based on the page feature recognition method, the method comprising:
After detecting that a user has performed a click, obtaining the page element that the user clicked;
Obtaining the page key point of the block in which the page element is located, and taking that page key point as the owning page key point of the page element;
Reporting the page element information and the corresponding owning page key point information, and taking the reported result as the collection result for this user behavior.
The technical solution provided by the embodiments of this specification is a page feature recognition method in which structural patterns for several types of page key points are preset, and the structure features of the web page's blocks at each level are then matched in turn against the structure features of the preset page key points, so that the page key points in the web page are identified automatically. When a user clicks any element on the page, the automatically generated page key point corresponding to that element is obtained and reported. Elements clicked by the user are thereby identified and classified without marking page key points or adding tracking points to elements in advance.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the embodiments of this specification.
In addition, no single embodiment of this specification needs to achieve all of the above effects.
Brief description of the drawings
To describe the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments recorded in this specification, and a person of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a flowchart of a page feature recognition method according to an exemplary embodiment of this specification;
Figs. 2-3 are schematic diagrams of a page feature recognition method according to an exemplary embodiment of this specification;
Fig. 4 is a flowchart of a user behavior collection method based on page feature recognition according to an exemplary embodiment of this specification;
Fig. 5 is a schematic diagram of a page feature recognition device according to an exemplary embodiment of this specification;
Fig. 6 is a schematic diagram of a user behavior collection device based on page feature recognition according to an exemplary embodiment of this specification;
Fig. 7 is a schematic structural diagram of a computer device according to an exemplary embodiment of this specification.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings indicate the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification. Rather, they are merely examples of devices and methods consistent with some aspects of this specification, as detailed in the appended claims.
The terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit this specification. The singular forms "a", "said", and "the" used in this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
In view of the above problems, the embodiments of this specification provide a page feature recognition method and a page feature recognition device for performing the method. The page feature recognition method of this embodiment is described in detail below. Referring to Fig. 1, the method may include the following steps:
S101: determine the parent page to be identified. The parent page consists of multi-level blocks and is traversed in order from higher-level blocks to lower-level blocks.
Multi-level blocks means that blocks can be divided into first-level blocks, second-level blocks, third-level blocks, and so on; each higher-level block may contain multiple lower-level blocks.
Structure feature recognition starts from the first-level blocks; this recognition process may also be called pattern matching. After a first-level block has been identified, the second-level blocks it contains are identified in turn, and so on, until all blocks under the parent page have been traversed and identified.
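The traversal order of S101 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Block` class, the `walk` function, and the block names are hypothetical, and a depth-first order (each block before the lower-level blocks it contains) is assumed from the description above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Block:
    """Hypothetical model of a page block that may contain lower-level blocks."""
    name: str
    children: List["Block"] = field(default_factory=list)


def walk(block: Block, level: int = 1) -> Iterator[Tuple[int, Block]]:
    """Yield (level, block) pairs, visiting each block before its lower-level blocks."""
    yield level, block
    for child in block.children:
        yield from walk(child, level + 1)


# A parent page with two first-level blocks; "A" contains lower-level blocks.
page = [Block("A", [Block("A1", [Block("A1a")]), Block("A2")]), Block("B")]
order = [(level, block.name) for top in page for level, block in walk(top)]
```

Here `order` lists block "A" first, then everything nested under it, and only then the next first-level block "B".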
S102: for each block, match the page structure feature of the block against multiple preset key point structure features using a preset matching algorithm; if the matching degree between the structure feature of the block and any preset page key point structure feature meets a preset condition, identify the block as the corresponding page key point.
The preset key point structure features may include a title structure feature, a list structure feature, and a content structure feature, as well as pairwise combinations of these three: a title+list structure feature, a title+content structure feature, and a content+list structure feature.
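The specification does not fix the matching algorithm or the preset condition of S102, so the sketch below makes its own assumptions: each preset key point type is a list of checks on a feature dictionary, the matching degree is the fraction of checks that pass, and the preset condition is a threshold on that fraction. The names `PRESETS`, `matching_degree`, and `identify`, and the feature keys, are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Features = Dict[str, float]
Check = Callable[[Features], bool]

# Hypothetical presets: each key point type is described by checks on block features.
PRESETS: Dict[str, List[Check]] = {
    "title": [lambda f: f["element_count"] <= 2, lambda f: f["isolated"] == 1],
    "list": [lambda f: f["element_count"] >= 3, lambda f: f["evenly_spaced"] == 1],
}


def matching_degree(features: Features, checks: List[Check]) -> float:
    """Fraction of a preset's checks that the block's features satisfy."""
    return sum(1 for check in checks if check(features)) / len(checks)


def identify(features: Features, threshold: float = 1.0) -> Optional[str]:
    """Return the best-matching key point type whose degree meets the threshold."""
    best_type, best_score = None, 0.0
    for kp_type, checks in PRESETS.items():
        score = matching_degree(features, checks)
        if score >= threshold and score > best_score:
            best_type, best_score = kp_type, score
    return best_type


title_like = {"element_count": 1, "isolated": 1, "evenly_spaced": 0}
partial_list = {"element_count": 5, "isolated": 0, "evenly_spaced": 0}
```

With the default threshold, `identify(title_like)` yields a title match while `partial_list` matches nothing; lowering the threshold to 0.5 lets the partial list match, which shows how the preset condition controls strictness.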
Referring to Fig. 2 and Fig. 3, which show an Alipay page before and after identification, the specific identification flow is as follows. One of the first-level blocks of the parent page is obtained and matched against the preset key point structure features; the matching degree between the structure feature of this first-level block and the "title+list" key point structure feature is found to meet the preset condition, so the first-level block is identified as the "title+list" pattern.
One of the second-level blocks under that first-level block is then examined and matched against the preset key point structure features; the matching degree between the structure feature of this second-level block and the "list" key point structure feature is found to meet the preset condition, so the second-level block is identified as the "list" pattern.
One of the third-level blocks under that second-level block is then examined and matched against the preset key point structure features; the matching degree between the structure feature of this third-level block and the "content" key point structure feature is found to meet the preset condition, so the third-level block is identified as the "content" pattern.
The page structure feature is the compositional feature of the elements in the page. It may include the types of elements the page contains, the number of elements, the size of the elements, and/or the layout of the elements. For example, when a block contains few elements and its layout is a single isolated element, the structure feature is determined to be a title feature; when the elements in a block are laid out as multiple equally spaced elements, the structure feature is determined to be a list feature; when a block satisfies neither the title feature nor the list feature, it is determined that the block most probably satisfies the content feature.
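The three rules in the preceding paragraph can be sketched as a small classifier. This is an illustration under stated assumptions, not the method from the specification: the function name `classify_block`, the use of one-dimensional element offsets, the `tolerance` for "equally spaced", and the limit of one element for a title are all hypothetical choices.

```python
from typing import List


def classify_block(
    positions: List[float],
    tolerance: float = 1.0,
    max_title_elements: int = 1,
) -> str:
    """Classify a block from the offsets of its elements along one layout axis.

    Assumed rules: a single isolated element is a title; three or more
    elements with near-equal gaps form a list; anything else is content.
    """
    if 0 < len(positions) <= max_title_elements:
        return "title"
    if len(positions) >= 3:
        ordered = sorted(positions)
        gaps = [b - a for a, b in zip(ordered, ordered[1:])]
        if max(gaps) - min(gaps) <= tolerance:
            return "list"
    return "content"
```

For instance, offsets such as 0, 50, 100, and 150 are equally spaced and would be treated as a list, while offsets 0, 10, and 200 fall through to content.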
In one practical application scenario, the multi-level blocks of the parent page can be treated as parent and child nodes in the web page code, and pattern matching is performed recursively on each child node, starting from the root node of the original web page. For example: if the current child node matches the title+list pattern, it is marked as "title+list", and its child nodes are examined next. If the current child node matches the list pattern, it is marked as "list", and its child nodes are examined next. If the current node matches the content area pattern, it is marked as "content", and its child nodes are examined next. This continues until every node on the page has been examined.
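A minimal sketch of this recursive marking is shown below. The `Node` class, `label_tree`, and especially `toy_match` are hypothetical: `toy_match` decides by tag name only so that the example stays short, whereas the specification matches on structure features.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a node in the web page code."""
    tag: str
    children: List["Node"] = field(default_factory=list)
    key_point: Optional[str] = None  # filled in by label_tree


def label_tree(node: Node, match: Callable[[Node], Optional[str]]) -> None:
    """Mark the node with its matched pattern, then recurse into its child nodes."""
    node.key_point = match(node)
    for child in node.children:
        label_tree(child, match)


def toy_match(node: Node) -> Optional[str]:
    """Toy matcher based on tag names only (an assumption for illustration)."""
    child_tags = [child.tag for child in node.children]
    if node.tag == "section" and child_tags[:1] == ["h2"] and "ul" in child_tags:
        return "title+list"
    if node.tag == "ul":
        return "list"
    if node.tag == "li":
        return "content"
    return None


root = Node("section", [Node("h2"), Node("ul", [Node("li"), Node("li")])])
label_tree(root, toy_match)
```

After the call, the root carries the "title+list" mark, the nested `ul` node carries "list", and each `li` node carries "content"; nodes that match no pattern keep `None`.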
S103: determine the identified page key points as the page features of the page.
Further, each identified page key point may be annotated, as a page feature, at the corresponding position of the parent page.
This specification also provides a user behavior collection method based on the above page feature recognition method. Referring to Fig. 4, the method may include the following steps:
S401: after detecting that a user has performed a click, obtain the page element that the user clicked.
S402: search upward from the page element, and determine the first page key point found as the owning page key point of the page element.
All elements in the page are contained in the identified page key point structures; that is, every element in the page belongs to one of the identified title, list, or content page key point structures. When the user clicks any page element, its owning page key point can therefore always be found from that element.
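The upward search of S402 can be sketched as a walk along parent links. The `Element` class and `owning_key_point` function are hypothetical, and the sketch assumes the search starts at the clicked element itself before moving to its ancestors, which the specification does not state explicitly. In a browser, the standard `Element.closest()` method performs a comparable element-then-ancestors search.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical page element with a link to its parent block."""
    name: str
    parent: Optional["Element"] = None
    key_point: Optional[str] = None  # set when the block was identified as a key point


def owning_key_point(clicked: Element) -> Optional[Element]:
    """Return the first element, from the clicked one upward, marked as a key point."""
    current: Optional[Element] = clicked
    while current is not None:
        if current.key_point is not None:
            return current
        current = current.parent
    return None


section = Element("section", key_point="title+list")
menu = Element("menu", parent=section, key_point="list")
item = Element("item", parent=menu)
link = Element("link", parent=item)
```

Clicking `link` resolves to `menu`, the nearest marked block, rather than to the outer `section`.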
S403: report the page element information and the corresponding owning page key point information, and take the reported result as the collection result for this user behavior.
For example, referring to Fig. 2 and Fig. 3, the "Activate Account" position on the page is an identified "list" key point. After the user clicks an element contained in this "list" block, the search proceeds upward from the clicked element, the "list" key point corresponding to the element is obtained and determined as the owning page key point of the clicked page element, and the page element information and the corresponding owning page key point information are reported.
It should be noted that the above process is not executed when the position the user clicks on the page cannot respond to clicks. For example, the "Common Self-Service" position in the page is marked as a "title" structure, but that position cannot respond to the user's click operation. Therefore, when the user clicks the page element at the "Common Self-Service" position, there is no need to look up the page key point corresponding to that page element.
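Steps S401 to S403, together with the rule for positions that cannot respond to clicks, can be sketched as one handler. Everything named here is an assumption for illustration: the `Element` class, the `responds_to_click` flag, the `collect_click` function, the element names, and the JSON shape of the report are not taken from the specification.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical page element; responds_to_click marks clickable positions."""
    name: str
    parent: Optional["Element"] = None
    key_point: Optional[str] = None
    responds_to_click: bool = True


def collect_click(clicked: Element) -> Optional[str]:
    """Build the report for one click, or return None if the position is inert."""
    if not clicked.responds_to_click:
        return None  # the flow is skipped for positions that cannot respond
    node: Optional[Element] = clicked
    while node is not None and node.key_point is None:
        node = node.parent  # search upward for the owning key point
    report = {
        "element": clicked.name,
        "key_point": node.key_point if node else None,
        "key_point_block": node.name if node else None,
    }
    return json.dumps(report, sort_keys=True)


account_list = Element("account-list", key_point="list")
activate = Element("activate-account", parent=account_list)
heading = Element("common-self-service", key_point="title", responds_to_click=False)
```

Calling `collect_click(activate)` produces a report naming the clicked element and its owning "list" key point, while `collect_click(heading)` produces nothing because that position does not respond to clicks.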
Corresponding to the above method embodiment, the embodiments of this specification also provide a page feature recognition device. Referring to Fig. 5, the device may include a parent page acquisition module 510, a structure feature matching module 520, and a page feature determination module 530.
Parent page acquisition module 510: configured to determine a parent page to be identified, the parent page consisting of multi-level blocks, to traverse the parent page in order from higher-level blocks to lower-level blocks, and to perform the following operations for each block:
Structure feature matching module 520: configured to match the page structure feature of the block against multiple preset key point structure features using a preset matching algorithm, and, if the matching degree between the structure feature of the block and any preset page key point structure feature meets a preset condition, to identify the block as the corresponding page key point;
Page feature determination module 530: configured to determine the identified page key points as the page features of the page.
Corresponding to the above method embodiment, the embodiments of this specification also provide a user behavior collection device based on the above page feature recognition device. Referring to Fig. 6, the device may include a user behavior monitoring module 610, a page key point determination module 620, and a user behavior reporting module 630.
User behavior monitoring module 610: configured to obtain, after detecting that a user has performed a click, the page element that the user clicked;
Page key point determination module 620: configured to obtain the page key point of the block in which the page element is located and to take that page key point as the owning page key point of the page element;
User behavior reporting module 630: configured to report the page element information and the corresponding owning page key point information, and to take the reported result as the collection result for this user behavior.
The embodiments of this specification also provide a computer device that includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the aforementioned page feature recognition method, the method including at least:
Determining a parent page to be identified, the parent page consisting of multi-level blocks; traversing the parent page in order from higher-level blocks to lower-level blocks; and performing the following operations for each block:
Matching the page structure feature of the block against multiple preset key point structure features using a preset matching algorithm; if the matching degree between the structure feature of the block and any preset page key point structure feature meets a preset condition, identifying the block as the corresponding page key point;
Determining the identified page key points as the page features of the page.
The embodiments of this specification also provide a computer device that includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the aforementioned user behavior collection method based on the page feature recognition method, the method including at least:
After detecting that a user has performed a click, obtaining the page element that the user clicked;
Obtaining the page key point of the block in which the page element is located, and taking that page key point as the owning page key point of the page element;
Reporting the page element information and the corresponding owning page key point information, and taking the reported result as the collection result for this user behavior.
Fig. 7 shows a more specific schematic diagram of the hardware structure of a computing device provided by the embodiments of this specification. The device may include a processor 1110, a memory 1120, an input/output interface 1130, a communication interface 1140, and a bus 1150. The processor 1110, memory 1120, input/output interface 1130, and communication interface 1140 are communicatively connected to one another inside the device through the bus 1150.
The processor 1110 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the relevant programs to implement the technical solutions provided by the embodiments of this specification.
The memory 1120 may be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1120 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 1120 and is called and executed by the processor 1110.
The input/output interface 1130 connects input/output modules to realize information input and output. The input/output modules may be configured in the device as components (not shown in the figure) or may be attached externally to the device to provide the corresponding functions. Input devices may include a keyboard, mouse, touch screen, microphone, and various sensors; output devices may include a display, loudspeaker, vibrator, and indicator lights.
The communication interface 1140 connects a communication module (not shown in the figure) to realize communication between this device and other devices. The communication module may communicate in a wired manner (for example USB or a network cable) or in a wireless manner (for example a mobile network, Wi-Fi, or Bluetooth).
The bus 1150 includes a path that transfers information between the components of the device (for example the processor 1110, memory 1120, input/output interface 1130, and communication interface 1140).
It should be noted that although only the processor 1110, memory 1120, input/output interface 1130, communication interface 1140, and bus 1150 are shown for the above device, in a specific implementation the device may also include other components necessary for normal operation. In addition, a person skilled in the art will understand that the above device may also include only the components necessary to implement the solutions of the embodiments of this specification, and need not include all the components shown in the figure.
The embodiments of this specification also provide a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the aforementioned page feature recognition method, the method including at least:
Determining a parent page to be identified, the parent page consisting of multi-level blocks; traversing the parent page in order from higher-level blocks to lower-level blocks; and performing the following operations for each block:
Matching the page structure feature of the block against multiple preset key point structure features using a preset matching algorithm; if the matching degree between the structure feature of the block and any preset page key point structure feature meets a preset condition, identifying the block as the corresponding page key point;
Determining the identified page key points as the page features of the page.
The embodiments of this specification also provide a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the aforementioned user behavior collection method based on the page feature recognition method, the method including at least:
After detecting that a user has performed a click, obtaining the page element that the user clicked;
Obtaining the page key point of the block in which the page element is located, and taking that page key point as the owning page key point of the page element;
Reporting the page element information and the corresponding owning page key point information, and taking the reported result as the collection result for this user behavior.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Because the device embodiments basically correspond to the method embodiments, refer to the description of the method embodiments for the relevant details. The device embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this specification. A person of ordinary skill in the art can understand and implement this without creative effort.
From the description of the above embodiments, those skilled in the art can clearly understand that the embodiments of this specification can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the embodiments of this specification, in essence, or the part of it that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of this specification or in certain parts of the embodiments.
The systems, devices, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, email sending and receiving device, game console, tablet computer, wearable device, or a combination of any of these devices.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the others. In particular, because the device embodiment is substantially similar to the method embodiment, its description is relatively brief; for the relevant parts, refer to the description of the method embodiment. The device embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and when the solution of the embodiments of this specification is implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
The above are only specific implementations of the embodiments of this specification. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the embodiments of this specification, and these improvements and refinements shall also be regarded as falling within the protection scope of the embodiments of this specification.

Claims (15)

1. A page feature recognition method, the method comprising:
Determining a parent page to be identified, the parent page to be identified being composed of multiple levels of blocks; traversing the parent page in order from higher-level blocks to lower-level blocks; and, for any block, performing the following operations:
Matching the page structure feature of the block against a plurality of preset key point structure features using a preset matching algorithm, and, if the matching degree between the structure feature of the block and any preset page key point structure feature satisfies a preset condition, identifying the block as the corresponding page key point;
Determining each identified page key point as a page feature of the page.
2. The method according to claim 1, wherein determining each identified page key point as a page feature of the page comprises:
Labeling each identified page key point, as a page feature, at the corresponding position of the parent page.
3. The method according to claim 1, wherein the page key point structure features comprise: a header structure feature, a list structure feature, and a content structure feature.
4. The method according to claim 1, wherein the key point structure features comprise: a pairwise combination of the header structure feature, the list structure feature, and the content structure feature.
5. The method according to claim 1, wherein the page structure feature comprises an element type feature, an element count feature, an element size feature, and/or an element layout feature of the elements contained in the page.
6. A user behavior collection method based on the page feature recognition method of claim 1, the method comprising:
After detecting that a user has performed a click, obtaining the page element clicked by the user;
Searching upward from the page element, and determining the first page key point found as the page key point to which the page element belongs;
Reporting the page element information and the information of the page key point to which the element belongs, and using the reported result as the result of this user behavior collection.
7. A page feature recognition device, the device comprising:
A parent page obtaining module, configured to determine a parent page to be identified, the parent page to be identified being composed of multiple levels of blocks, to traverse the parent page in order from higher-level blocks to lower-level blocks, and, for any block, to perform the following operations:
A structure feature matching module, configured to match the page structure feature of the block against a plurality of preset key point structure features using a preset matching algorithm, and, if the matching degree between the structure feature of the block and any preset page key point structure feature satisfies a preset condition, to identify the block as the corresponding page key point;
A page feature determining module, configured to determine each identified page key point as a page feature of the page.
8. The device according to claim 7, wherein determining each identified page key point as a page feature of the page comprises:
Labeling each identified page key point, as a page feature, at the corresponding position of the parent page.
9. The device according to claim 7, wherein the page key point structure features comprise: a header structure feature, a list structure feature, and a content structure feature.
10. The device according to claim 7, wherein the key point structure features comprise: a pairwise combination of the header structure feature, the list structure feature, and the content structure feature.
11. The device according to claim 7, wherein the page structure feature comprises an element type feature, an element count feature, an element size feature, and/or an inter-element distance feature of the elements contained in the page.
12. A user behavior collection device based on the page feature recognition device of claim 7, the device comprising:
A user behavior monitoring module, configured to obtain, after detecting that a user has performed a click, the page element clicked by the user;
A page key point determining module, configured to obtain the page key point of the block in which the page element is located, and to use that page key point as the page key point to which the page element belongs;
A user behavior reporting module, configured to report the page element information and the information of the page key point to which the element belongs, and to use the reported result as the result of this user behavior collection.
13. The device according to claim 12, wherein obtaining the page key point of the block in which the page element is located and using that page key point as the page key point to which the page element belongs comprises:
Searching upward from the page element, and determining the first page key point found as the page key point to which the page element belongs.
14. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to claim 1 when executing the program.
15. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to claim 6 when executing the program.
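The recognition flow of claim 1 can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the `Block` class, the preset features in `PRESET_FEATURES`, the `THRESHOLD` value, and the use of histogram intersection as the matching algorithm are hypothetical stand-ins, since the claims do not fix a particular feature encoding, matching algorithm, or preset condition. The sketch also assumes that traversal continues into the children of a block after it has been identified.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical page block holding the types of its direct elements."""
    name: str
    element_types: list[str]
    children: list["Block"] = field(default_factory=list)


# Hypothetical preset key point structure features:
# the expected proportion of each element type in a block.
PRESET_FEATURES = {
    "header": {"h1": 0.5, "nav": 0.5},
    "list": {"li": 1.0},
    "content": {"p": 0.75, "img": 0.25},
}
THRESHOLD = 0.8  # hypothetical preset condition: matching degree >= THRESHOLD


def structure_feature(block: Block) -> dict[str, float]:
    """Element type and count feature, normalized to proportions."""
    total = len(block.element_types)
    counts: dict[str, int] = {}
    for t in block.element_types:
        counts[t] = counts.get(t, 0) + 1
    return {t: c / total for t, c in counts.items()} if total else {}


def matching_degree(a: dict[str, float], b: dict[str, float]) -> float:
    """Histogram intersection in [0, 1]; a stand-in for the preset matching algorithm."""
    return sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in set(a) | set(b))


def recognize(parent_page: Block) -> list[tuple[str, str]]:
    """Traverse from higher-level to lower-level blocks and label matching blocks."""
    found = []
    queue = deque([parent_page])  # breadth-first: higher-level blocks come first
    while queue:
        block = queue.popleft()
        feature = structure_feature(block)
        best_label, best_degree = None, 0.0
        for label, preset in PRESET_FEATURES.items():
            degree = matching_degree(feature, preset)
            if degree > best_degree:
                best_label, best_degree = label, degree
        if best_label is not None and best_degree >= THRESHOLD:
            found.append((block.name, best_label))
        queue.extend(block.children)
    return found


page = Block("page", [], [
    Block("top", ["h1", "nav"]),
    Block("items", ["li", "li", "li", "li"]),
    Block("body", ["p", "p", "p", "img"], [Block("footer_note", ["span"])]),
])
result = recognize(page)
# result == [("top", "header"), ("items", "list"), ("body", "content")]
```

The returned list plays the role of the page feature in the last step of claim 1: each entry pairs an identified block with the key point type it matched.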
CN201811300896.XA 2018-11-02 2018-11-02 Page feature recognition method and device Active CN110059272B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811300896.XA CN110059272B (en) 2018-11-02 2018-11-02 Page feature recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811300896.XA CN110059272B (en) 2018-11-02 2018-11-02 Page feature recognition method and device

Publications (2)

Publication Number Publication Date
CN110059272A CN110059272A (en) 2019-07-26
CN110059272B CN110059272B (en) 2023-08-15

Family

ID=67315522

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811300896.XA Active CN110059272B (en) 2018-11-02 2018-11-02 Page feature recognition method and device

Country Status (1)

Country Link
CN (1) CN110059272B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888810A (en) * 2019-11-19 2020-03-17 广东润联信息技术有限公司 Method and device for automatic identification and marking, computer equipment and storage medium

Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB9914127D0 (en) * 1998-07-06 1999-08-18 Ibm Display screen and window size related web page adaptation system
EP2015225A1 (en) * 2007-07-11 2009-01-14 Ricoh Company, Ltd. Invisible junction feature recognition for document security or annotation
US20090110288A1 (en) * 2007-10-29 2009-04-30 Kabushiki Kaisha Toshiba Document processing apparatus and document processing method
CA2747057A1 (en) * 2008-12-16 2010-07-08 Bodymedia, Inc. Method and apparatus for determining heart rate variability using wavelet transformation
CN101833574A (en) * 2010-04-15 2010-09-15 西安酷派软件科技有限公司 Method and system for locating application programs as well as mobile terminal
CN102314498A (en) * 2011-08-26 2012-01-11 百度在线网络技术(北京)有限公司 Method and equipment for implementing main identification of page
US20120162730A1 (en) * 2010-12-27 2012-06-28 Brother Kogyo Kabushiki Kaisha Image processing apparatus, image processing method and recording medium
CN102598038A (en) * 2009-10-30 2012-07-18 乐天株式会社 Characteristic content determination program, characteristic content determination device, characteristic content determination method, recording medium, content generation device, and related content insertion device
CA2837673A1 (en) * 2011-05-30 2012-12-06 Transcon Securities Pty Ltd Financial management system
CN102981689A (en) * 2011-09-07 2013-03-20 腾讯科技(深圳)有限公司 Method and device and system for achieving default focus positioning
CN103942224A (en) * 2013-01-23 2014-07-23 百度在线网络技术(北京)有限公司 Method and device for acquiring annotation rule of webpage blocks
CN104182424A (en) * 2013-05-28 2014-12-03 中国电信股份有限公司 Webpage processing method suitable for mobile terminal and server
CN105447139A (en) * 2015-11-20 2016-03-30 广州华多网络科技有限公司 Data acquisition statistical method, and system, terminal and service equipment thereof
CN106293765A (en) * 2016-08-23 2017-01-04 乐视控股(北京)有限公司 A kind of layout updates method and device
CN106598421A (en) * 2016-11-01 2017-04-26 乐视控股(北京)有限公司 Intelligent identification method and device for web clicks
CN106708952A (en) * 2016-11-25 2017-05-24 北京神州绿盟信息安全科技股份有限公司 Web page clustering method and device
CN107169007A (en) * 2017-03-31 2017-09-15 北京奇艺世纪科技有限公司 The display interface method to set up and device of a kind of mobile terminal
CN107633019A (en) * 2017-08-24 2018-01-26 阿里巴巴集团控股有限公司 A kind of page events acquisition method and device
CN107729768A (en) * 2017-11-03 2018-02-23 广州视源电子科技股份有限公司 A kind of page display method, device, Intelligent flat and storage medium
CN108021598A (en) * 2016-11-04 2018-05-11 广州市动景计算机科技有限公司 Page extraction template matching process, device and server
CN108683666A (en) * 2018-05-16 2018-10-19 新华三信息安全技术有限公司 A kind of web page identification method and device

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB9914127D0 (en) * 1998-07-06 1999-08-18 Ibm Display screen and window size related web page adaptation system
EP2015225A1 (en) * 2007-07-11 2009-01-14 Ricoh Company, Ltd. Invisible junction feature recognition for document security or annotation
US20090110288A1 (en) * 2007-10-29 2009-04-30 Kabushiki Kaisha Toshiba Document processing apparatus and document processing method
CA2747057A1 (en) * 2008-12-16 2010-07-08 Bodymedia, Inc. Method and apparatus for determining heart rate variability using wavelet transformation
CN102598038A (en) * 2009-10-30 2012-07-18 乐天株式会社 Characteristic content determination program, characteristic content determination device, characteristic content determination method, recording medium, content generation device, and related content insertion device
CN101833574A (en) * 2010-04-15 2010-09-15 西安酷派软件科技有限公司 Method and system for locating application programs as well as mobile terminal
US20120162730A1 (en) * 2010-12-27 2012-06-28 Brother Kogyo Kabushiki Kaisha Image processing apparatus, image processing method and recording medium
CA2837673A1 (en) * 2011-05-30 2012-12-06 Transcon Securities Pty Ltd Financial management system
CN102314498A (en) * 2011-08-26 2012-01-11 百度在线网络技术(北京)有限公司 Method and equipment for implementing main identification of page
CN102981689A (en) * 2011-09-07 2013-03-20 腾讯科技(深圳)有限公司 Method and device and system for achieving default focus positioning
CN103942224A (en) * 2013-01-23 2014-07-23 百度在线网络技术(北京)有限公司 Method and device for acquiring annotation rule of webpage blocks
CN104182424A (en) * 2013-05-28 2014-12-03 中国电信股份有限公司 Webpage processing method suitable for mobile terminal and server
CN105447139A (en) * 2015-11-20 2016-03-30 广州华多网络科技有限公司 Data acquisition statistical method, and system, terminal and service equipment thereof
CN106293765A (en) * 2016-08-23 2017-01-04 乐视控股(北京)有限公司 A kind of layout updates method and device
CN106598421A (en) * 2016-11-01 2017-04-26 乐视控股(北京)有限公司 Intelligent identification method and device for web clicks
CN108021598A (en) * 2016-11-04 2018-05-11 广州市动景计算机科技有限公司 Page extraction template matching process, device and server
CN106708952A (en) * 2016-11-25 2017-05-24 北京神州绿盟信息安全科技股份有限公司 Web page clustering method and device
CN107169007A (en) * 2017-03-31 2017-09-15 北京奇艺世纪科技有限公司 The display interface method to set up and device of a kind of mobile terminal
CN107633019A (en) * 2017-08-24 2018-01-26 阿里巴巴集团控股有限公司 A kind of page events acquisition method and device
CN107729768A (en) * 2017-11-03 2018-02-23 广州视源电子科技股份有限公司 A kind of page display method, device, Intelligent flat and storage medium
CN108683666A (en) * 2018-05-16 2018-10-19 新华三信息安全技术有限公司 A kind of web page identification method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
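X. PAN et al.: "Region Duplication Detection Using Image Feature Matching", 《IN IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY》, vol. 5, no. 4, pages 857 - 867, XP011318933 *
X. PAN et al.: "Region Duplication Detection Using Image Feature Matching", 《IN IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY》, vol. 5, no. 4, pages 857, XP011318933 *
王霞 (Wang Xia): "Design and Implementation of a Text Classification System for Online Education News", 《China Master's Theses Full-text Database, Information Science and Technology》, no. 2012, pages 138 - 2664 *
范意兴 (Fan Yixing) et al.: "A Multi-level Web Page Clustering Method Based on Web Page Block Features", 《Journal of Shandong University (Natural Science)》, vol. 50, no. 7, pages 1 - 8 *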

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110888810A (en) * 2019-11-19 2020-03-17 广东润联信息技术有限公司 Method and device for automatic identification and marking, computer equipment and storage medium
CN110888810B (en) * 2019-11-19 2020-10-30 广东润联信息技术有限公司 Method and device for automatic identification and marking, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110059272B (en) 2023-08-15

Similar Documents

Publication Publication Date Title
CN104123332B (en) The display methods and device of search result
CN107515915B (en) User identification association method based on user behavior data
CN104850489B (en) Mobile solution test system
US9032392B2 (en) Similarity engine for facilitating re-creation of an application collection of a source computing device on a destination computing device
CN106604362B (en) A kind of Wireless Fidelity Wi-Fi scan method and mobile terminal
CN105335409A (en) Target user determination method and device and network server
WO2015081720A1 (en) Instant messaging (im) based information recommendation method, apparatus, and terminal
CN109241403A (en) Item recommendation method, device, machinery equipment and computer readable storage medium
CN102591873B (en) A kind of information recommendation method and equipment
CN110516173B (en) Illegal network station identification method, illegal network station identification device, illegal network station identification equipment and illegal network station identification medium
CN103729362A (en) Method and device for determining navigation content
CN107092609A (en) A kind of information-pushing method and device
CN111814065B (en) Information propagation path analysis method and device, computer equipment and storage medium
US20150302088A1 (en) Method and System for Providing Personalized Content
CN106257448A (en) The methods of exhibiting of a kind of key word and device
CN109118387A (en) The total packet evaluation of enterprises credit of building, block chain and storage medium based on block chain
CN110322281A (en) The method for digging and device of similar users
CN106547870A (en) Point table method and device of data base
CN108121749A (en) Website user's behavior analysis method and device
CN106535102B (en) A kind of mobile terminal locating method and mobile terminal
CN110059272A (en) A kind of page feature recognition methods and device
CN108596412A (en) Cross-cutting methods of marking and Marking apparatus based on user's similarity
CN104301170A (en) Mobile terminal application friendliness evaluation method based on feature classification
CN105447020B (en) A kind of method and device of determining business object keyword
CN114374595A (en) Event node attribution analysis method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant