CN103488652A - Webpage content extraction method and webpage content extraction device - Google Patents

Info

Publication number
CN103488652A
Authority
CN
China
Prior art keywords
user
web page
coordinate
input
zone
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210195729.XA
Other languages
Chinese (zh)
Other versions
CN103488652B (en)
Inventor
吴涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Oak Pacific Interactive Technology Development Co Ltd
Original Assignee
Beijing Oak Pacific Interactive Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Oak Pacific Interactive Technology Development Co Ltd
Priority to CN201210195729.XA
Publication of CN103488652A
Application granted
Publication of CN103488652B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a webpage content extraction device and a webpage content extraction method. The webpage content extraction device comprises a detection unit, a first calculation unit, a second calculation unit, a comparison unit and an extraction unit, wherein the detection unit is used for detecting the input of a user; the first calculation unit is used for calculating the coordinate of a region according to the detected input of the user, and the input of the user is within the region; the second calculation unit is used for calculating the coordinates of webpage contents; the comparison unit is used for comparing the calculated coordinate of the region with the calculated coordinates of the webpage contents; the extraction unit is used for extracting corresponding webpage contents in the region according to a comparison result. According to the device and the method, the user can extract webpage contents conveniently.

Description

Webpage content extraction method and webpage content extraction device
Technical field
The present invention relates to information technology. More specifically, it relates to a webpage content extraction method and a webpage content extraction device.
Background
With the rapid development of the Internet, the amount of information on it grows at an astonishing rate every day. Web pages in HTML format are the main carrier of this information, and the Internet has therefore become one of the main information sources of modern society.
How to extract the desired content from a web page is one of the industry's research hotspots. Many web page content extraction techniques are known at present, for example techniques based on a specific user-designed language, techniques based on natural language, and techniques based on ontology.
Summary of the invention
A brief overview of the invention is given below to provide a basic understanding of some of its aspects. It should be understood that this overview is not exhaustive. It is not intended to identify key or essential parts of the invention, nor to limit its scope. Its only purpose is to present some concepts in simplified form as a prelude to the more detailed description that follows.
According to a first aspect of the invention, a webpage content extraction method is proposed, comprising: detecting a user's input; calculating, according to the detected input, the coordinates of a region, wherein the user's input lies within the region; calculating the coordinates of web page content; comparing the calculated coordinates of the region with the calculated coordinates of the web page content; and extracting, according to the comparison result, the corresponding web page content in the region.
According to a second aspect of the invention, a webpage content extraction device is proposed, comprising: a detection unit for detecting a user's input; a first calculation unit for calculating, according to the detected input, the coordinates of a region, wherein the user's input lies within the region; a second calculation unit for calculating the coordinates of web page content; a comparison unit for comparing the calculated coordinates of the region with the calculated coordinates of the web page content; and an extraction unit for extracting, according to the comparison result, the corresponding web page content in the region.
According to the invention, the user can conveniently extract web page content.
Brief description of the drawings
Other objects and effects of the invention will become clearer and easier to understand from the following description taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows a block diagram of a mobile terminal 10 suitable for implementing embodiments of the invention;
Fig. 2 schematically shows a wireless communication system according to an embodiment of the invention;
Fig. 3 shows a block diagram of a webpage content extraction device according to an embodiment of the invention;
Fig. 4 shows a flowchart of a webpage content extraction method according to an embodiment of the invention.
In all of the above drawings, identical reference numerals denote identical, similar, or corresponding features or functions.
Detailed description
Preferred embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure is thorough and complete and fully conveys its scope to those skilled in the art.
Fig. 1 shows a block diagram of a mobile terminal 10 suitable for implementing embodiments of the invention. It should be understood, however, that the mobile terminal shown and described below is merely an example of a mobile terminal suitable for implementing embodiments of the invention, and should not be used to limit their scope.
Examples of the mobile terminal include a mobile phone, a personal digital assistant (PDA), a tablet computer, a mobile TV, a gaming device, a laptop computer, a camera, a video recorder, and a GPS device.
In addition, it should be understood that non-mobile terminals are also suitable for implementing embodiments of the invention.
Embodiments of the invention are described below mainly in the context of wireless communication applications. It should be understood, however, that the invention is also applicable to wired communication.
Mobile terminal 10 comprises an antenna 12, which can communicate with a transmitter 14 and a receiver 16. Mobile terminal 10 also comprises a controller 20, which provides signals to transmitter 14 and receives signals from receiver 16. The signals include signaling information according to the air-interface standard of the applicable cellular system, and include text, voice, and/or video. Mobile terminal 10 can operate with one or more air-interface standards. By way of example, mobile terminal 10 can operate according to any of the first-generation, second-generation, third-generation, and/or fourth-generation communication protocols. For example, mobile terminal 10 can operate according to the second-generation (2G) wireless communication protocols IS-136 (time division multiple access, TDMA), GSM (Global System for Mobile Communications), or IS-95 (code division multiple access, CDMA); or according to third-generation (3G) wireless communication protocols such as Universal Mobile Telecommunications System (UMTS), CDMA2000, wideband CDMA (WCDMA), or time division-synchronous CDMA (TD-SCDMA); or according to fourth-generation (4G) wireless communication protocols.
It should be understood that controller 20 comprises the means, such as circuitry, required to implement the functions of mobile terminal 10. For example, controller 20 may comprise a digital signal processor, a microprocessor, and various analog-to-digital converters, digital-to-analog converters, and other support circuits. The control and signal processing functions of mobile terminal 10 are allocated among these devices according to their respective capabilities. Controller 20 may also include functionality to convolutionally encode and interleave messages before modulation and transmission. Controller 20 may also comprise an internal voice coder and an internal data modem. In addition, controller 20 may include functionality to operate one or more software programs stored in memory. For example, controller 20 can operate a connectivity program, such as a Web browser. The connectivity program can allow mobile terminal 10 to send and receive Web content according to, for example, the Wireless Application Protocol (WAP), the Hypertext Transfer Protocol (HTTP), etc.
Mobile terminal 10 may also comprise output devices and user input devices, all of which are coupled to controller 20. The output devices include, for example, a conventional earphone or speaker 24, a display 28, etc. The user input devices include a microphone 26 and other devices that allow mobile terminal 10 to receive data, such as a keypad 30, a touch display (not shown), etc. In embodiments that include keypad 30, keypad 30 may include the conventional numeric keys (0-9) and related keys (#, *). Alternatively, keypad 30 may have a conventional QWERTY layout. Additionally or alternatively, mobile terminal 10 may include a user input device such as a joystick.
Mobile terminal 10 also comprises a battery 34, such as a vibrating battery pack, for powering the various circuits required to operate mobile terminal 10. Mobile terminal 10 can also be powered by a supplementary power supply 44. Supplementary power supply 44 can power mobile terminal 10 directly. Supplementary power supply 44 can also power mobile terminal 10 indirectly by charging battery 34, with battery 34 powering mobile terminal 10 directly. Thus, in some embodiments, even when mobile terminal 10 is connected to supplementary power supply 44, mobile terminal 10 does not operate until battery 34 has received sufficient charge from supplementary power supply 44. In addition, supplementary power supply 44 can be removed from mobile terminal 10 to allow unrestricted mobility of mobile terminal 10. When supplementary power supply 44 is physically or electrically removed or disconnected, battery 34 may be the sole power source of mobile terminal 10.
Controller 20 of mobile terminal 10 may include functionality and/or circuitry for detecting the battery level of battery 34. The battery level can be any indication of the remaining energy or remaining time of battery 34. Controller 20 can use the battery level when deciding whether to perform various operations. Controller 20 can also detect whether mobile terminal 10 is connected to supplementary power supply 44, for example directly via an input to controller 20.
Mobile terminal 10 may also comprise a webpage content extraction device 36. Webpage content extraction device 36 can be any apparatus, equipment, or circuit, implemented in hardware, software, or a combination of hardware and software, that can extract web page content on mobile terminal 10. Webpage content extraction device 36 can store an extracted content item in volatile memory 40 or nonvolatile memory 42, or send the extracted content item to a server. A content item can be information or data of any type that is selected by the user and extracted on the mobile terminal. Content items may include, but are not limited to, links, pictures, music, video, text, etc. A content item may comprise information from a single location (for example, a single web page). A content item may also comprise information from multiple locations (for example, multiple web pages).
In the embodiments below, it is assumed that webpage content extraction device 36 is a software application stored in memory and executed by controller 20 of mobile terminal 10. Webpage content extraction device 36 is described further below.
Mobile terminal 10 may also comprise a positioning sensor 37. Positioning sensor 37 can be any type of apparatus, equipment, or circuit for locating mobile terminal 10, such as a Global Positioning System (GPS) module that communicates with controller 20.
Positioning sensor 37 can determine the position of mobile terminal 10, for example its longitude, latitude, and altitude. Information from positioning sensor 37 can be passed to the memory of mobile terminal 10 to be stored as position history or position information. In addition, positioning sensor 37 can use controller 20 to send and receive position information (the position of mobile terminal 10) via transmitter 14 and receiver 16.
Mobile terminal 10 may also comprise a user identity module (UIM) 38. UIM 38 is typically a memory device with an internal processor. UIM 38 may include, for example, a subscriber identity module (SIM), a universal integrated circuit card (UICC), a universal subscriber identity module (USIM), a removable user identity module (R-UIM), etc. UIM 38 typically stores information related to the mobile subscriber. In addition to UIM 38, mobile terminal 10 may be equipped with other memory. For example, mobile terminal 10 may comprise volatile memory 40, such as volatile random access memory (RAM) that includes a cache area for temporary storage of data. Mobile terminal 10 may also comprise nonvolatile memory 42, which can be embedded and/or removable. Nonvolatile memory 42 can be electrically erasable programmable read-only memory (EEPROM), flash memory, etc.
Fig. 2 schematically shows a wireless communication system according to an embodiment of the invention. Referring to Fig. 2, one type of wireless communication system suitable for implementing embodiments of the invention comprises a plurality of devices. As shown in Fig. 2, each mobile terminal 10 comprises an antenna 12 for transmitting signals to a base station (BS) 44 and for receiving signals from base station 44. Base station 44 can be part of a mobile network that comprises the equipment required to operate the network, for example a mobile switching center (MSC) 46. In operation, when mobile terminal 10 makes and receives calls, MSC 46 can route calls to and from mobile terminal 10. It should be noted that although MSC 46 is shown in the wireless communication system of Fig. 2, MSC 46 is only an exemplary mobile network device, and embodiments of the invention are not limited to mobile networks that use an MSC.
MSC 46 can be coupled to a data network, such as a local area network (LAN), a metropolitan area network (MAN), and/or a wide area network (WAN). MSC 46 can be coupled directly to the data network. In one exemplary embodiment, however, MSC 46 is coupled to a gateway device (GTW) 48, and GTW 48 is coupled to a WAN such as the Internet 50. Devices 52 such as personal computers and server computers can thereby be coupled to mobile terminal 10 via the Internet 50.
BS 44 can also be coupled to a signaling GPRS (General Packet Radio Service) support node (SGSN) 56. As is known to those skilled in the art, SGSN 56 can typically perform functions similar to those of MSC 46, for packet-switched services. Like MSC 46, SGSN 56 can be coupled to a data network such as the Internet 50. SGSN 56 can be coupled directly to the data network. In a more typical embodiment, however, SGSN 56 is coupled to a packet-switched core network, such as a GPRS core network 58. The packet-switched core network is coupled to another GTW, such as a gateway GPRS support node (GGSN) 60, and GGSN 60 is coupled to the Internet 50.
By coupling SGSN 56 to GPRS core network 58 and GGSN 60, devices 52 such as personal computers and server computers can be coupled to mobile terminal 10 via the Internet 50, GGSN 60, GPRS core network 58, SGSN 56, and base station 44. Devices 52 can therefore communicate with mobile terminal 10. For example, by connecting mobile terminal 10 and other devices (for example, computer device 52) directly or indirectly to the Internet 50, mobile terminals 10 can communicate with one another and with other devices according to, for example, the Hypertext Transfer Protocol (HTTP) and/or other protocols.
Although not every possible mobile network device is shown and described here, it should be appreciated that mobile terminal 10 can be coupled to such mobile network devices through BS 44. The mobile network can support communication according to any one or more of the first-generation (1G), second-generation (2G), 2.5G, third-generation (3G), 3.9G, and fourth-generation wireless communication protocols. For example, the mobile network can support communication according to the 2G wireless communication protocols IS-136 (TDMA), GSM, or IS-95 (CDMA). As another example, the mobile network can support communication according to the 2.5G wireless communication protocols GPRS or Enhanced Data GSM Environment (EDGE). As a further example, the mobile network can support communication according to 3G wireless communication protocols such as UMTS.
Mobile terminal 10 can also be coupled to one or more wireless access points (AP) 62. AP 62 may comprise an access point configured to communicate with mobile terminal 10 according to technologies such as radio frequency (RF), Bluetooth (BT), infrared (IrDA), or any of a number of wireless networking technologies, including: wireless LAN (WLAN) technologies such as IEEE 802.11 (for example, 802.11a, 802.11b, 802.11g, 802.11n), also known as Wi-Fi; Worldwide Interoperability for Microwave Access (WiMAX) technologies such as IEEE 802.16; and ultra-wideband (UWB) technologies such as IEEE 802.15. AP 62 can be coupled to the Internet 50. Like MSC 46, AP 62 can be coupled directly to the Internet 50. In one embodiment, however, AP 62 is coupled indirectly to the Internet 50 via GTW 48. In addition, in one embodiment, BS 44 can be regarded as another AP 62. It will be appreciated that by connecting mobile terminal 10 and other devices directly or indirectly to the Internet 50, mobile terminals 10 can communicate with one another and with other devices, and thereby carry out the various functions of mobile terminal 10, for example sending data to device 52 and/or receiving data from device 52. The terms "data", "content", and "information" are used interchangeably here to denote data that is sent, received, and/or stored according to embodiments of the invention. Use of any of these terms should therefore not be taken to limit the spirit and scope of embodiments of the invention. Although not shown in Fig. 2, in addition to or instead of coupling mobile terminal 10 to device 52 across the Internet 50, mobile terminal 10 and device 52 can be coupled to each other and communicate according to, for example, RF, BT, IrDA, or any other wired or wireless communication technology.
Fig. 3 shows a block diagram of a webpage content extraction device according to an embodiment of the invention.
As shown in Fig. 3, webpage content extraction device 36 comprises: a detection unit 361 for detecting a user's input; a first calculation unit 362 for calculating, according to the detected input, the coordinates of a region, wherein the user's input lies within the region; a second calculation unit 363 for calculating the coordinates of web page content; a comparison unit 364 for comparing the calculated coordinates of the region with the calculated coordinates of the web page content; and an extraction unit 365 for extracting, according to the comparison result, the corresponding web page content in the region.
For example, webpage content extraction device 36 is a web browser or a web browser plug-in. More specifically, according to an embodiment of the invention, webpage content extraction device 36 is a WebView-based browser, or a plug-in for one, on the Android platform.
As mentioned above, mobile terminal 10 and device 52 can communicate with each other. In the following description it is assumed, without loss of generality, that device 52 is a Web server.
Accordingly, in response to the user's input in the browser, for example a corresponding URI (uniform resource locator) address, mobile terminal 10 can fetch a web page from device 52 and render it so that the page fits the size of the display of mobile terminal 10 and displays well.
After the user sees the web page on the display of mobile terminal 10, the user can select content items on it. More specifically, where the display of mobile terminal 10 is a touch screen, the user can draw a circle by hand on the display around the content item to be extracted from the web page, for example a link, a passage of text, a picture, a piece of music, or a video clip.
For example, if the user sees a very nice picture on the Web page in the middle of the display and wishes to save it locally on the mobile terminal, or to send it to an SNS community to share it with friends, the user can select that picture.
More specifically, the user can draw a circle around the picture on the touch screen with a finger.
Detection unit 361 detects the user's input, for example the circle the user draws on the touch screen.
First calculation unit 362 calculates the coordinates of a region according to the user's input detected by detection unit 361, wherein the user's input lies within this region. That is, first calculation unit 362 calculates the position of the user's input on the display of the mobile terminal.
According to an embodiment of the invention, first calculation unit 362 calculates, according to the user's input detected by detection unit 361, the coordinates of a rectangular region, wherein the user's input lies within the rectangular region and the difference between the rectangular region and the user's input is minimized. For example, if the user's input is approximately an ellipse, the four sides of the rectangular region are tangent to that ellipse.
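As a minimal sketch of this step, the tightest axis-aligned rectangle around a stroke can be taken from the minimum and maximum coordinates of its sampled points. The names `Rect` and `bounding_rect`, and the representation of the stroke as a list of (x, y) points, are assumptions made for illustration; the patent does not specify them.

```python
from typing import Iterable, NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in screen coordinates (hypothetical helper type)."""
    left: float
    top: float
    right: float
    bottom: float


def bounding_rect(points: Iterable[Tuple[float, float]]) -> Rect:
    """Return the smallest axis-aligned rectangle containing every stroke point."""
    pts = list(points)
    if not pts:
        raise ValueError("stroke has no points")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return Rect(min(xs), min(ys), max(xs), max(ys))


# A roughly elliptical stroke, sampled at its four extreme points.
stroke = [(100.0, 50.0), (160.0, 90.0), (100.0, 130.0), (40.0, 90.0)]
region = bounding_rect(stroke)
```

For an elliptical stroke, each side of the resulting rectangle touches the stroke at one extreme point, which matches the tangency described above.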
Second calculation unit 363 calculates the coordinates of the web page content, that is, the position of the web page content on the display of the mobile terminal. Second calculation unit 363 does so by traversing the Document Object Model (DOM) nodes and calculating the coordinates of each DOM node.
For example, when the Web page is loaded on the mobile terminal, the coordinates of each DOM node can be calculated by traversing the DOM nodes.
For example, given the width and height of a node, loop outward from that node through all of its ancestor nodes, computing for each node its vertical offset (top) and horizontal offset (left) relative to its parent node. Summing all the left offsets and all the top offsets gives the coordinates of the node relative to the page. The absolute coordinates of the node relative to the screen are then calculated from the position of the page relative to the screen.
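The offset accumulation described above can be sketched as follows. `Node` is a hypothetical stand-in for a DOM node whose `left` and `top` are measured relative to its parent (in a browser these values would typically be read from properties such as `offsetLeft` and `offsetTop`), and `page_offset` is an assumed pair giving the position of the page origin relative to the screen. None of these names come from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a DOM node; left/top are relative to the parent."""
    left: float
    top: float
    width: float
    height: float
    parent: Optional["Node"] = None


def page_position(node: Node) -> Tuple[float, float]:
    """Sum the left/top offsets of the node and all its ancestors."""
    x, y = 0.0, 0.0
    current: Optional[Node] = node
    while current is not None:
        x += current.left
        y += current.top
        current = current.parent
    return x, y


def screen_rect(node: Node, page_offset: Tuple[float, float]) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) of the node relative to the screen."""
    x, y = page_position(node)
    sx, sy = x + page_offset[0], y + page_offset[1]
    return (sx, sy, sx + node.width, sy + node.height)


body = Node(left=0.0, top=0.0, width=320.0, height=2000.0)
div = Node(left=10.0, top=200.0, width=300.0, height=400.0, parent=body)
img = Node(left=5.0, top=20.0, width=120.0, height=80.0, parent=div)

# The page is assumed to be scrolled 150 px down, so its origin sits at y = -150.
img_on_screen = screen_rect(img, page_offset=(0.0, -150.0))
```

Here the image sits at (15, 220) on the page, and the assumed scroll offset moves it to (15, 70) on the screen.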
Comparison unit 364 compares the calculated coordinates of the region with the calculated coordinates of the web page content.
Extraction unit 365 extracts the corresponding web page content in the region according to the comparison result.
According to an embodiment of the invention, the corresponding web page content extracted from the region is complete. More specifically, if the circle drawn by the user's finger encloses one complete picture and only part of another picture, only the complete picture is extracted.
In addition, according to an embodiment of the invention, the corresponding web page content extracted from the region may be a single item of a single type, multiple items of the same type, or multiple items of different types. This depends entirely on what content is enclosed by the circle the user draws with a finger on the display of the mobile terminal.
The following code listing shows a specific implementation of comparing the calculated coordinates of the region with the calculated coordinates of the web page content and extracting the corresponding web page content in the region:
[Figure BSA00000733698800111: code listing, reproduced only as an image in the original publication]
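The listing referenced above is not reproduced in this text. As a hedged illustration only, and not a reconstruction of the patent's own code, the comparison and extraction can be sketched as a rectangle containment test that keeps only items lying entirely inside the region. `Rect`, `Item`, `contains`, and `extract` are hypothetical names.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


class Item(NamedTuple):
    """Hypothetical content item: its type, a label, and its screen rectangle."""
    kind: str
    name: str
    rect: Rect


def contains(outer: Rect, inner: Rect) -> bool:
    """True if inner lies completely within outer (edges may touch)."""
    return (
        outer.left <= inner.left
        and outer.top <= inner.top
        and inner.right <= outer.right
        and inner.bottom <= outer.bottom
    )


def extract(region: Rect, items: List[Item]) -> List[Item]:
    """Keep only the items that are completely enclosed by the region."""
    return [item for item in items if contains(region, item.rect)]


region = Rect(40.0, 50.0, 160.0, 130.0)
items = [
    Item("picture", "a.png", Rect(50.0, 60.0, 150.0, 120.0)),   # fully inside
    Item("picture", "b.png", Rect(140.0, 60.0, 240.0, 120.0)),  # only partly inside
]
selected = extract(region, items)
```

Requiring full containment is what makes the partially circled picture drop out, consistent with the rule that only complete content is extracted.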
According to an embodiment of the invention, webpage content extraction device 36 further comprises a sharing unit 366 for sharing the corresponding web page content extracted from the region to social network service (SNS) platforms. These SNS platforms are, for example, also coupled to the Internet 50. Of course, those skilled in the art will appreciate that webpage content extraction device 36 can also store the extracted web page content locally on mobile terminal 10, that is, in the memory of mobile terminal 10.
According to another embodiment of the invention, webpage content extraction device 36 further comprises a judging unit 367 for judging whether the user's input is closed. In this embodiment, first calculation unit 362 calculates the coordinates of a region according to the detected input, wherein the user's input lies within the region, only if the user's input is closed. If the user's input is not closed, the input is ignored and a corresponding prompt or notification is given to the user.
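The text does not say how closedness is judged. One possible heuristic, shown here purely as an assumption, is to treat a stroke as closed when its end point returns to within a small distance of its start point. The function name `is_closed` and the 30-pixel default `tolerance` are illustrative choices, not values from the patent.

```python
import math
from typing import Sequence, Tuple


def is_closed(points: Sequence[Tuple[float, float]], tolerance: float = 30.0) -> bool:
    """Assumed heuristic: closed if the stroke ends near where it started."""
    if len(points) < 3:
        return False  # too few points to enclose any area
    (x0, y0), (x1, y1) = points[0], points[-1]
    return math.hypot(x1 - x0, y1 - y0) <= tolerance


# The finger returns almost to the starting point: treated as closed.
circle_like = [(100.0, 50.0), (160.0, 90.0), (100.0, 130.0), (40.0, 90.0), (95.0, 55.0)]
# An arc whose ends are far apart: treated as open and ignored.
arc = [(40.0, 90.0), (100.0, 50.0), (160.0, 90.0)]
```

A caller would compute the region only when `is_closed` returns true, and otherwise prompt the user to redraw.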
Fig. 4 shows a flowchart of the corresponding webpage content extraction method according to an embodiment of the invention.
As shown in Fig. 4, method 400 comprises: step S410, detecting a user's input; step S420, calculating, according to the detected input, the coordinates of a region, wherein the user's input lies within the region; step S430, calculating the coordinates of web page content; step S440, comparing the calculated coordinates of the region with the calculated coordinates of the web page content; and step S450, extracting, according to the comparison result, the corresponding web page content in the region.
According to an embodiment of the invention, the corresponding web page content comprises at least one of the following: a link; a picture; music; a video; and text.
According to an embodiment of the invention, the region is a rectangular region.
According to an embodiment of the invention, the coordinates of the web page content are calculated by traversing the Document Object Model (DOM) nodes and calculating the coordinates of each DOM node.
According to an embodiment of the invention, method 400 further comprises step S460, sharing the corresponding web page content extracted from the region to social network service (SNS) platforms.
According to an embodiment of the invention, the corresponding web page content extracted from the region is complete.
According to an embodiment of the invention, method 400 further comprises step S415, judging whether the user's input is closed. Only if the user's input is closed are the coordinates of a region calculated according to the detected input, wherein the user's input lies within the region. If the user's input is not closed, the input is ignored and a corresponding prompt or notification is given to the user.
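To show how the steps fit together, here is a compact sketch of the order S415, S420, S440, S450, with the content coordinates of S430 assumed to be already computed and passed in. The function name `extract_from_stroke`, the dictionary of named boxes, and the endpoint-distance closedness test are all assumptions for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom) on screen


def extract_from_stroke(
    stroke: Sequence[Point],
    content_boxes: Dict[str, Box],
    tolerance: float = 30.0,
) -> Optional[List[str]]:
    """Return names of fully enclosed content, or None if the stroke is not closed."""
    # S415: ignore input that is not closed (assumed test: endpoints near each other).
    if len(stroke) < 3 or math.dist(stroke[0], stroke[-1]) > tolerance:
        return None
    # S420: bounding rectangle of the stroke.
    xs = [x for x, _ in stroke]
    ys = [y for _, y in stroke]
    left, top, right, bottom = min(xs), min(ys), max(xs), max(ys)
    # S430 is assumed done: content_boxes already holds screen coordinates.
    # S440 and S450: compare and keep only completely enclosed content.
    return [
        name
        for name, (l, t, r, b) in content_boxes.items()
        if left <= l and top <= t and r <= right and b <= bottom
    ]


boxes = {"photo": (50.0, 60.0, 150.0, 120.0), "banner": (140.0, 60.0, 240.0, 120.0)}
closed_stroke = [(100.0, 50.0), (160.0, 90.0), (100.0, 130.0), (40.0, 90.0), (95.0, 55.0)]
open_stroke = [(40.0, 90.0), (100.0, 50.0), (160.0, 90.0)]
```

Returning `None` for an open stroke lets the caller distinguish "input ignored, prompt the user" from "closed stroke that enclosed nothing", which would return an empty list.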
The flowcharts and block diagrams in the drawings show the architecture, functions, and operations of possible implementations of systems, methods, and program products according to several embodiments of the invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that comprises one or more executable instructions for implementing the specified logic function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and program instructions.
Various embodiments of the invention have been described above. The description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used here were chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed here.

Claims (14)

1. A webpage content extraction method, comprising:
detecting a user's input;
calculating the coordinates of a region according to the detected user input, wherein the user's input lies within said region;
calculating the coordinates of webpage content;
comparing the calculated coordinates of the region with the calculated coordinates of the webpage content; and
extracting, according to the comparison result, the corresponding webpage content in said region.
2. The method according to claim 1, wherein said corresponding webpage content comprises at least one of the following:
a link;
a picture;
music;
video; and
text.
3. The method according to claim 1, wherein said region is a rectangular region.
4. The method according to claim 1, wherein the coordinates of the webpage content are calculated by traversing Document Object Model (DOM) nodes and calculating the coordinates of each DOM node.
5. The method according to claim 1, further comprising:
sharing said corresponding webpage content extracted from said region to each social network service (SNS) platform.
6. The method according to claim 1, wherein the corresponding webpage content extracted from said region is complete.
7. The method according to claim 1, further comprising:
determining whether the user's input is closed, and
only if the user's input is closed, calculating the coordinates of a region according to the detected user input, wherein the user's input lies within said region.
8. A webpage content extraction device, comprising:
a detection unit for detecting a user's input;
a first calculation unit for calculating the coordinates of a region according to the detected user input, wherein the user's input lies within said region;
a second calculation unit for calculating the coordinates of webpage content;
a comparison unit for comparing the calculated coordinates of the region with the calculated coordinates of the webpage content; and
an extraction unit for extracting, according to the comparison result, the corresponding webpage content in said region.
9. The device according to claim 8, wherein said corresponding webpage content comprises at least one of the following:
a link;
a picture;
music;
video; and
text.
10. The device according to claim 8, wherein said region is a rectangular region.
11. The device according to claim 8, wherein the second calculation unit calculates the coordinates of the webpage content by traversing Document Object Model (DOM) nodes and calculating the coordinates of each DOM node.
12. The device according to claim 8, further comprising:
a sharing unit for sharing said corresponding webpage content extracted from said region to each social network service (SNS) platform.
13. The device according to claim 8, wherein the corresponding webpage content extracted from said region is complete.
14. The device according to claim 8, further comprising:
a judging unit for determining whether the user's input is closed, and
wherein, only if the user's input is closed, the first calculation unit calculates the coordinates of a region according to the detected user input, wherein the user's input lies within said region.
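The coordinate calculation, comparison, and extraction steps recited in claims 1 and 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: `Node` is a hypothetical stand-in for a DOM node whose position is stored relative to its parent, `CONTENT_KINDS` mirrors the content types listed in claim 2, and the comparison rule (a node is selected when its rectangle overlaps the region, and is then returned whole) is assumed, since the claims do not specify one.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)

# Content types taken from claim 2; the string labels are hypothetical.
CONTENT_KINDS = {"link", "picture", "music", "video", "text"}


@dataclass
class Node:
    """Hypothetical stand-in for a DOM node, positioned relative to its parent."""
    kind: str
    dx: float
    dy: float
    width: float
    height: float
    children: List["Node"] = field(default_factory=list)


def walk(node: Node, ox: float = 0.0, oy: float = 0.0) -> Iterator[Tuple[Node, Rect]]:
    """Traverse the tree depth-first, yielding each node with its absolute rectangle."""
    left, top = ox + node.dx, oy + node.dy
    yield node, (left, top, left + node.width, top + node.height)
    for child in node.children:
        yield from walk(child, left, top)


def overlaps(a: Rect, b: Rect) -> bool:
    """Assumed comparison rule: the two rectangles share some area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def extract(root: Node, region: Rect) -> List[Node]:
    """Return the content nodes whose rectangles overlap the selected region."""
    return [
        node
        for node, rect in walk(root)
        if node.kind in CONTENT_KINDS and overlaps(rect, region)
    ]


# Example page: a picture near the top and a link nested in a lower container.
page = Node("div", 0, 0, 800, 600, [
    Node("picture", 50, 50, 200, 100),
    Node("div", 0, 300, 800, 300, [Node("link", 20, 10, 100, 20)]),
])
selected = extract(page, (40, 40, 150, 120))
```

Returning whole nodes rather than clipping them to the region is one possible reading of the requirement that the extracted content be complete; a stricter rule, such as requiring the node to lie entirely inside the region, would only change `overlaps`.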
CN201210195729.XA 2012-06-08 2012-06-08 Webpage content extracting method and webpage content extraction device Active CN103488652B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210195729.XA CN103488652B (en) 2012-06-08 2012-06-08 Webpage content extracting method and webpage content extraction device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210195729.XA CN103488652B (en) 2012-06-08 2012-06-08 Webpage content extracting method and webpage content extraction device

Publications (2)

Publication Number Publication Date
CN103488652A true CN103488652A (en) 2014-01-01
CN103488652B CN103488652B (en) 2018-11-16

Family

ID=49828890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210195729.XA Active CN103488652B (en) 2012-06-08 2012-06-08 Webpage content extracting method and webpage content extraction device

Country Status (1)

Country Link
CN (1) CN103488652B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107533431A (en) * 2015-09-01 2018-01-02 华为技术有限公司 Method, apparatus, user interface and the storage medium of display operation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1941961A (en) * 2006-10-13 2007-04-04 康佳集团股份有限公司 Method for processing picture in embedded terminal
CN101075172A (en) * 2006-08-23 2007-11-21 腾讯科技(深圳)有限公司 Method for capturing picture, capturer and instant-telecommunication customer terminal
CN101515272A (en) * 2008-02-18 2009-08-26 株式会社理光 Method and device for extracting webpage content
CN101526945A (en) * 2008-03-06 2009-09-09 鸿富锦精密工业(深圳)有限公司 Picture saving system and method
CN102346575A (en) * 2010-08-03 2012-02-08 郑国书 Mouse with window screenshot function
CN102446224A (en) * 2012-01-05 2012-05-09 苏州阔地网络科技有限公司 Multi-level webpage block clipping method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101075172A (en) * 2006-08-23 2007-11-21 腾讯科技(深圳)有限公司 Method for capturing picture, capturer and instant-telecommunication customer terminal
CN1941961A (en) * 2006-10-13 2007-04-04 康佳集团股份有限公司 Method for processing picture in embedded terminal
CN101515272A (en) * 2008-02-18 2009-08-26 株式会社理光 Method and device for extracting webpage content
CN101526945A (en) * 2008-03-06 2009-09-09 鸿富锦精密工业(深圳)有限公司 Picture saving system and method
CN102346575A (en) * 2010-08-03 2012-02-08 郑国书 Mouse with window screenshot function
CN102446224A (en) * 2012-01-05 2012-05-09 苏州阔地网络科技有限公司 Multi-level webpage block clipping method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BINGLIU: "《Web数据挖掘》", 30 April 2009, 清华大学出版社 *

Also Published As

Publication number Publication date
CN103488652B (en) 2018-11-16

Similar Documents

Publication Publication Date Title
CN104956303B (en) Volume control process
CN104580499A (en) Method and device for accurately labeling positions
CN105898809A (en) WiFi connection management device and method, and mobile terminal
CN103139876A (en) Method and device selecting wireless access mode for mobile terminal adaptively
CN103970436A (en) Method and device for displaying on screen of electronic equipment
CN101926148A (en) Method, apparatus and computer program product for providing native broadcast support for hypermedia formats and/or widgets
CN103299342A (en) Method and apparatus for providing a mechanism for gesture recognition
CN105893508A (en) Method, device and system for determining access sequence of native page and H5 page
CN104156054A (en) Method for reducing power consumption of mobile terminal and mobile terminal
CN103870799A (en) Character direction judging method and device
CN103369095A (en) Method and device for type identification of incoming call or text message
CN108901079A (en) Time-out time determines method, apparatus, equipment and storage medium
CN103854019A (en) Method and device for extracting fields in image
CN102325225A (en) Method and device for playing video of mobile phone website
CN102222095B (en) Equipment for converting webpage to be displayed and method thereof
US9825895B2 (en) Method and system for exchanging messages on the basis of current position
CN103037336A (en) Short message transmission method of two-dimensional code
CN105791524B (en) A kind of method and device adjusting page font
JP2007150530A (en) Mobile wireless communication apparatus
CN106303259A (en) A kind of method and apparatus realizing taking pictures
CN104778554A (en) Method and device for identifying multiple matched users
CN106211162A (en) A kind of information processing method and device, terminal
CN103488652A (en) Webpage content extraction method and webpage content extraction device
CN112394863B (en) Picture display method and device, mobile terminal and storage medium
CN112840305B (en) Font switching method and related product

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant