CN107203748A - Content-based method and apparatus for storing, matching, and restoring web page notes - Google Patents

Content-based method and apparatus for storing, matching, and restoring web page notes

Info

Publication number
CN107203748A
Authority
CN
China
Prior art keywords
web page
strokes
group
notes
webpage
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710350594.2A
Other languages
Chinese (zh)
Other versions
CN107203748B (en)
Inventor
贝佳
任桐炜
张衡
杨宇洁
徐强明
佘黎明
蔡浩伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink
    • G06V30/36 Matching; Classification
    • G06V30/387 Matching; Classification using human interaction, e.g. selection of the best displayed recognition candidate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation

Abstract

The invention discloses a content-based method and apparatus for storing, matching, and restoring web page notes. The method works as follows. First, the strokes that the user inputs on the browsed web page are obtained and combined into stroke groups, and the web page elements corresponding to each stroke group are computed. The note information is then stored keyed by web page address. When a web page is displayed, the corresponding note information is retrieved by web page address, and the web page elements corresponding to each stroke group in the retrieved note information are matched against the web page elements in the current web page. Finally, according to the matching result, the corresponding stroke groups are extracted from the retrieved note information and restored. With this method, when the web page content changes, the notes can still be reproduced as long as the content they refer to has not changed, so changes to other web page content are ignored.

Description

Content-based method and apparatus for storing, matching, and restoring web page notes
Technical field
The present invention relates to web page notes.
Background
With the spread of mobile terminals such as tablet computers, touch-screen notebooks, and smartphones, taking notes on a computer has become increasingly convenient. It would greatly help users if they could take notes directly on a web page while browsing it, with the notes stored on the network and reproduced the next time the user visits the page. However, most websites today use dynamic web pages whose structure and content change frequently, so the notes can no longer be kept in consistent correspondence with the page content. This is especially true of pages that carry advertisements, where the advertising content changes on every visit. A change in advertising content does not affect the body content of the page and should not affect the notes either. It is therefore necessary to check the notes against the web page for consistency, and to store, match, and restore web page notes based on content.
Summary of the invention
The problem to be solved by the invention is how to match notes to a web page when the content of the web page changes.
To solve this problem, the invention adopts the following scheme:
A content-based method for storing, matching, and restoring web page notes according to the invention comprises the following steps:
S1: Obtain the strokes that the user inputs on the browsed web page, and combine the strokes into stroke groups;
S2: Compute the web page elements corresponding to each stroke group;
S3: Store note information keyed by web page address; the note information includes a set of stroke snapshots; each stroke snapshot includes a stroke group and the web page elements corresponding to that stroke group;
S4: When a web page is displayed, retrieve the corresponding note information by web page address;
S5: Match the web page elements corresponding to each stroke group in the retrieved note information against the web page elements in the current web page;
S6: According to the matching result, extract the corresponding stroke groups from the retrieved note information and restore them.
Further, in the content-based method for storing, matching, and restoring web page notes according to the invention, the matching result in step S6 is the total matching degree between the web page elements corresponding to each stroke group in the note information and the web page elements in the current web page; the note information also includes a snapshot of the original web page; and step S6 proceeds as follows:
When the total matching degree is below the low threshold, notify the user that the web page has changed too much for the notes to be restored;
When the total matching degree is above the high threshold, restore the stroke groups on the current web page;
When the total matching degree lies between the low threshold and the high threshold, display the original web page snapshot and each stroke group in a separate window, and at the same time restore the stroke groups on the current web page.
Further, in the content-based method for storing, matching, and restoring web page notes according to the invention, the matching result in step S6 is the total matching degree between the web page elements corresponding to each stroke group in the note information and the web page elements in the current web page; and step S5 includes:
S511: Extract the web page elements of each stroke group in the retrieved notes and combine them into web page element set F1;
S512: Map the coordinates of each stroke group in the retrieved notes onto the current web page and, using the method of step S2, determine the web page elements corresponding to each stroke group in the current web page; these form web page element set F2;
S513: Compute the intersection of web page element sets F1 and F2 to obtain web page element set X;
S514: Compute the ratio of the number of elements in X to the number of elements in F1 as the total matching degree.
Further, in the content-based method for storing, matching, and restoring web page notes according to the invention, the note information also includes the height-to-width ratio of the original web page; the matching result in step S6 is the total matching degree between the web page elements corresponding to each stroke group in the note information and the web page elements in the current web page; and step S5 also includes comparing the height-to-width ratio of the current web page with that of the original web page in the note information and judging whether they differ too much; if they do, the total matching degree is set to 0.
Further, in the content-based method for storing, matching, and restoring web page notes according to the invention, when the stroke groups are restored on the current web page, the coordinates of each stroke group are mapped onto the current web page and the web page elements corresponding to each stroke group in the current web page are computed by the method of step S2; the web page elements corresponding to each stroke group in the original web page are extracted; and it is judged whether, for each stroke group, the corresponding web page elements in the current web page and in the original web page match. Matching stroke groups are displayed normally; the others are displayed in a highlighted (prompt) mode.
A content-based apparatus for storing, matching, and restoring web page notes according to the invention comprises the following modules:
M1, configured to obtain the strokes that the user inputs on the browsed web page and combine the strokes into stroke groups;
M2, configured to compute the web page elements corresponding to each stroke group;
M3, configured to store note information keyed by web page address; the note information includes a set of stroke snapshots; each stroke snapshot includes a stroke group and the web page elements corresponding to that stroke group;
M4, configured to retrieve the corresponding note information by web page address when a web page is displayed;
M5, configured to match the web page elements corresponding to each stroke group in the retrieved note information against the web page elements in the current web page;
M6, configured to extract the corresponding stroke groups from the retrieved note information according to the matching result and restore them.
Further, in the content-based apparatus for storing, matching, and restoring web page notes according to the invention, the matching result in module M6 is the total matching degree between the web page elements corresponding to each stroke group in the note information and the web page elements in the current web page; the note information also includes a snapshot of the original web page; and module M6 proceeds as follows:
When the total matching degree is below the low threshold, notify the user that the web page has changed too much for the notes to be restored;
When the total matching degree is above the high threshold, restore the stroke groups on the current web page;
When the total matching degree lies between the low threshold and the high threshold, display the original web page snapshot and each stroke group in a separate window, and at the same time restore the stroke groups on the current web page.
Further, in the content-based apparatus for storing, matching, and restoring web page notes according to the invention, the matching result in module M6 is the total matching degree between the web page elements corresponding to each stroke group in the note information and the web page elements in the current web page; and module M5 includes:
M511, configured to extract the web page elements of each stroke group in the retrieved notes and combine them into web page element set F1;
M512, configured to map the coordinates of each stroke group in the retrieved notes onto the current web page and, through module M2, determine the web page elements corresponding to each stroke group in the current web page; these form web page element set F2;
M513, configured to compute the intersection of web page element sets F1 and F2 to obtain web page element set X;
M514, configured to compute the ratio of the number of elements in X to the number of elements in F1 as the total matching degree.
Further, in the content-based apparatus for storing, matching, and restoring web page notes according to the invention, the note information also includes the height-to-width ratio of the original web page; the matching result in module M6 is the total matching degree between the web page elements corresponding to each stroke group in the note information and the web page elements in the current web page; and module M5 also compares the height-to-width ratio of the current web page with that of the original web page in the note information and judges whether they differ too much; if they do, the total matching degree is set to 0.
Further, in the content-based apparatus for storing, matching, and restoring web page notes according to the invention, when the stroke groups are restored on the current web page, the coordinates of each stroke group are mapped onto the current web page and the web page elements corresponding to each stroke group in the current web page are computed by module M2; the web page elements corresponding to each stroke group in the original web page are extracted; and it is judged whether, for each stroke group, the corresponding web page elements in the current web page and in the original web page match. Matching stroke groups are displayed normally; the others are displayed in a highlighted (prompt) mode.
The technical effect of the invention is as follows. The invention stores the corresponding web page elements when notes are taken; when the notes are restored, the stored web page elements are compared and matched against the current web page content, and the notes are reproduced according to the matching result. With this method, when the web page content changes, the notes can be reproduced as long as the content they refer to has not changed, so changes to other web page content are ignored.
Embodiment
The present invention is described in further details below.
This embodiment involves a client, a cloud storage server, and a web server. The client may be a desktop personal computer or a mobile terminal such as a notebook, a tablet computer, or even a smartphone. A web browser is installed on the client. This embodiment is a note-taking plug-in implemented on the web browser. When the user connects to the web server through the web browser and a web page is displayed, the user can take notes on the page through the plug-in. The plug-in connects to the cloud storage server and saves the web page notes recorded in the client's browser to that server. The plug-in comprises a note editing module, a web page element mapping module, a note storage module, a note retrieval module, a note matching module, and a note restoration module. The note editing module provides the user with a UI for editing web page notes, displays the strokes and stroke groups the user inputs on the current web page, and provides functions for adding, deleting, and modifying stroke groups. The web page element mapping module determines the web page elements corresponding to a stroke group. The note storage module saves the notes, consisting of the stroke groups input by the user and the web page elements corresponding to them, to the cloud storage server. The note retrieval module looks up the corresponding notes on the cloud storage server by the address of the current web page. The note matching module matches the web page elements corresponding to each stroke group in the retrieved note information against the web page elements in the current web page. The note restoration module extracts the corresponding stroke groups from the retrieved note information according to the matching result and restores them.
The note editing module corresponds to step S1 and module M1 above; that is, the strokes and stroke groups "obtained" in step S1 and module M1 are formed by the user editing in the UI with a stylus or mouse. This technology is familiar to those skilled in the art and is not described further here. Note that a "stroke group" is a logical concept determined by the user. For example, a pair of brackets consists of a left bracket and a right bracket, each of which is a stroke. Saving the left and right bracket strokes separately is of little use; only when the two strokes are combined into a pair of brackets do they carry a definite logical meaning. This pair of brackets, made up of two strokes, is a "stroke group".
The web page element mapping module, note storage module, note retrieval module, note matching module, and note restoration module are described in further detail below.
1. Web page element mapping module
The web page element mapping module corresponds to step S2 and module M2 above. A web page element is an object marked up by an HTML tag, usually of text type, and is familiar to those skilled in the art. The web page element corresponding to a stroke group may be a leaf-node web page element or a non-leaf-node web page element.
The simplest way to determine the correspondence between a stroke group and web page elements is to have the user specify it. That is, the user must specify, for each stroke group, the web page elements it corresponds to. After the user creates a stroke group, the note editing module requires the user to specify at least one web page element as the corresponding element; if the user specifies none, creation of the stroke group fails.
The correspondence between a stroke group and web page elements can also be determined semi-automatically. In that embodiment, the user must specify the type of the stroke group while editing it in the note editing module. A locating region is then determined from the type of the stroke group and its coordinates, and the web page elements covered by the locating region are taken as the web page elements corresponding to the stroke group. Determining the locating region, and determining the corresponding web page elements from it, are carried out automatically by a computer program, whereas the type of the stroke group requires user intervention; the approach is therefore semi-automatic.
In this embodiment, the correspondence between a stroke group and web page elements is determined fully automatically, through the following steps:
S21: Determine the type of the stroke group by analyzing its shape;
S22: Determine the locating region from the type of the stroke group and its coordinates;
S23: Take the web page elements covered by the locating region as the web page elements corresponding to the stroke group.
In steps S21, S22, and S23 above, stroke groups are divided into the following types: closed, underline, strikethrough, bracket, quotation mark, connecting line, and text. For a closed stroke group, the locating region may be the region covered by the stroke group, or the region formed by extending that region outward by a certain distance. For an underline stroke group, the locating region is the region covered by extending the stroke upward by a certain distance. For a strikethrough stroke group, the locating region is the region covered by extending the stroke upward and downward by a certain distance. For a bracket stroke group, the locating region is the region between the brackets, bounded by the top horizontal line and the bottom horizontal line. For a quotation mark stroke group, the locating region is the region covered by extending the horizontal line between the quotation marks downward by a certain distance. For a connecting line stroke group, the locating region is a circular region centered on an end point of the connecting line with a certain distance as its radius. For a text stroke group, the locating region is the region covered by extending the text area outward by a certain distance.
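The locating-region rules above can be sketched in code. The following Python sketch is illustrative only and is not taken from the patent: `Rect`, `bounding_box`, `locating_region`, and the default `margin` of 10 are hypothetical names and values. It assumes screen coordinates in which y grows downward, represents a stroke group as a flat list of (x, y) points, and covers only four of the seven types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float


def bounding_box(points):
    """Smallest axis-aligned rectangle containing all (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return Rect(min(xs), min(ys), max(xs), max(ys))


def locating_region(stroke_type, points, margin=10.0):
    """Return the locating region for a stroke group (assumed rules)."""
    box = bounding_box(points)
    if stroke_type == "closed":
        # Region covered by the stroke group itself.
        return box
    if stroke_type == "underline":
        # Extend upward (smaller y) to cover the text above the line.
        return Rect(box.left, box.top - margin, box.right, box.bottom)
    if stroke_type == "strikethrough":
        # Extend both upward and downward around the line.
        return Rect(box.left, box.top - margin, box.right, box.bottom + margin)
    if stroke_type == "text":
        # Extend the handwritten text area outward on all sides.
        return Rect(box.left - margin, box.top - margin,
                    box.right + margin, box.bottom + margin)
    raise ValueError(f"type not covered by this sketch: {stroke_type}")
```

The web page elements whose on-screen rectangles intersect the returned region would then be taken as the elements corresponding to the stroke group (step S23).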
Step S21 above, which determines the type of the stroke group by analyzing its shape, comprises the following steps:
S211: Determine whether the stroke group is of closed type by analyzing whether it forms a closed shape; if it is closed, return;
S212: Determine whether the stroke group is of underline or strikethrough type by checking whether the difference between the maximum and minimum Y coordinates and the difference between the maximum and minimum X coordinates of the strokes exceed their limits; if the limits are not exceeded, decide between underline and strikethrough by analyzing whether the stroke group lies below a web page element;
S213: Determine whether the stroke group is of bracket type by analyzing whether it contains a left-bracket stroke and a right-bracket stroke;
S214: Determine whether the stroke group is of quotation mark type by analyzing whether it contains two double quotation marks;
S215: Determine whether the stroke group is of connecting line type by analyzing whether it contains a line with an arrow;
S216: If the stroke group fits none of the types above, treat it as text type.
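Steps S211 to S216 form an ordered cascade: each detector is tried in turn and text is the fallback. A minimal Python sketch of that control flow follows. The patent does not give the shape tests themselves, so `is_closed`, `is_flat_line`, and all tolerance values here are simplified assumptions, and the bracket, quotation mark, and connecting line detectors are left for the caller to supply.

```python
import math


def is_closed(points, tolerance=15.0):
    """Assumed test: a stroke is closed if it ends near where it started."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    return len(points) > 2 and math.hypot(x1 - x0, y1 - y0) <= tolerance


def is_flat_line(points, max_height=8.0, min_width=30.0):
    """Assumed test: small vertical extent and large horizontal extent."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(ys) - min(ys)) <= max_height and (max(xs) - min(xs)) >= min_width


def classify(points, detectors):
    """Try each (type_name, predicate) in order; fall back to text (S216)."""
    for stroke_type, predicate in detectors:
        if predicate(points):
            return stroke_type
    return "text"


# Order mirrors S211 then S212; further detectors would be appended here.
DETECTORS = [("closed", is_closed), ("line", is_flat_line)]
```

Here the hypothetical type "line" stands for the underline-or-strikethrough result of S212; choosing between the two would additionally need the position of the stroke relative to nearby web page elements.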
Note that the web page elements corresponding to a stroke group form a set; that is, one stroke group can correspond to multiple web page elements.
2. Note storage module
The note storage module corresponds to step S3 and module M3 above. In this embodiment the note information is stored on the cloud storage server. Those skilled in the art will understand that the note information may also be stored locally on the client. It may be stored as files or in a database. Because note information is stored keyed by web page address, it is easy to find at retrieval time. Specifically, with database storage the web page address serves as the key field; with file storage the web page address can serve as the file name. The note information includes web page meta-information, a web page snapshot, and a set of stroke snapshots. The web page meta-information includes the page title, the access time, and the page height-to-width ratio. The web page snapshot may be a screenshot of the page or an HTML document. Because handling large CSS files is cumbersome with an HTML document, this embodiment prefers a screenshot as the web page snapshot. A stroke snapshot includes the stroke group, a timestamp, and the web page elements corresponding to the stroke group.
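The stored record described above can be pictured as a small data structure keyed by URL. The sketch below is an assumption-laden illustration in Python: the class and field names (`StrokeSnapshot`, `NoteInfo`, `NoteStore`, `element_ids`, and so on) are hypothetical, web page elements are represented by string identifiers, and an in-memory dictionary stands in for the cloud storage server, database, or file system.

```python
from dataclasses import dataclass, field


@dataclass
class StrokeSnapshot:
    stroke_group: list   # strokes, each a list of (x, y) points
    timestamp: float     # when the stroke group was created
    element_ids: set     # identifiers of the corresponding web page elements


@dataclass
class NoteInfo:
    title: str                  # web page meta-information
    access_time: float
    height_width_ratio: float
    page_snapshot: bytes        # e.g. a screenshot of the original page
    stroke_snapshots: list = field(default_factory=list)


class NoteStore:
    """In-memory stand-in for storage keyed by web page address."""

    def __init__(self):
        self._by_url = {}

    def save(self, url, note):
        self._by_url[url] = note

    def find(self, url):
        # Returns None when no notes exist for this address.
        return self._by_url.get(url)
```

Keying by address keeps retrieval (step S4) a single lookup, at the cost of treating two addresses that serve the same content as different pages.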
3. Note retrieval module
The note retrieval module corresponds to step S4 and module M4 above. It looks up, by web page address, whether corresponding note information exists. How retrieval is done depends on how the note storage module stores the note information; this is well known to those skilled in the art and is not described further here.
4. Note matching module and note restoration module
The note matching module corresponds to step S5 and module M5 above. The note restoration module corresponds to step S6 and module M6 above. Note restoration depends on the result of note matching; the two are closely related and may be merged into a single step or module, a matching-and-restoration module. The matching-and-restoration module can be implemented in several ways. The simplest is for the note matching module to identify the matching stroke groups directly, so that the matching result is the set of matched stroke groups, which the note restoration module then displays. In this embodiment, the matching result is a total matching degree: the note matching module computes the total matching degree, and the note restoration module extracts the corresponding stroke groups from the retrieved note information according to the total matching degree and displays them. The total matching degree is computed as follows. First, the basic web page information is compared; that is, the basic information of the web page saved in the note information is compared with that of the current web page. Specifically, the height-to-width ratio in the note information is compared with the height-to-width ratio of the current web page. If the ratio of the former to the latter is greater than 1.5 or less than 0.7, the current web page is considered to differ too much from the original web page, the total matching degree is set to 0, and the procedure returns; otherwise the total matching degree calculation below continues.
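The height-to-width pre-check reduces to one division and two comparisons. A minimal Python sketch follows, using the 1.5 and 0.7 limits stated above; the function name `ratio_changed_too_much` and its parameter names are hypothetical, and the guard against non-positive ratios is an added assumption.

```python
def ratio_changed_too_much(stored_ratio, current_ratio, upper=1.5, lower=0.7):
    """True if the stored/current height-to-width quotient is outside [lower, upper]."""
    if stored_ratio <= 0 or current_ratio <= 0:
        raise ValueError("height-to-width ratios must be positive")
    quotient = stored_ratio / current_ratio
    return quotient > upper or quotient < lower
```

When this returns True, the total matching degree would be set to 0 without examining any web page elements.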
The total matching degree can be calculated in any of the following ways. The first is as follows:
S511: Extract the web page elements of each stroke group in the retrieved notes and combine them into web page element set F1;
S512: Map the coordinates of each stroke group in the retrieved notes onto the current web page and, using the method of step S2, determine the web page elements corresponding to each stroke group in the current web page; these form web page element set F2;
S513: Compute the intersection of web page element sets F1 and F2 to obtain web page element set X;
S514: Compute the ratio of the number of elements in X to the number of elements in F1 as the total matching degree.
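Steps S511 to S514 amount to the set computation |F1 ∩ F2| / |F1|. The Python sketch below illustrates it under stated assumptions: web page elements are represented by string identifiers (how elements are identified and compared is not specified here), each argument maps a hypothetical stroke group id to its set of element identifiers, and an empty F1 yields 0.

```python
def total_matching_degree(stored_elements_by_group, current_elements_by_group):
    """|F1 & F2| / |F1|, where F1 and F2 are unions over all stroke groups."""
    # S511: elements recorded with the notes.
    f1 = set().union(*stored_elements_by_group.values())
    # S512: elements found under the mapped stroke groups in the current page.
    f2 = set().union(*current_elements_by_group.values())
    if not f1:
        return 0.0
    # S513 and S514: intersection, then ratio.
    x = f1 & f2
    return len(x) / len(f1)
```

For instance, if the notes recorded elements {p1, p2, h1} and the current page yields {p1, h1, ad}, two of the three recorded elements survive, giving a total matching degree of 2/3.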
The second is as follows:
S521: Count the number Nk of stroke groups in the retrieved notes whose web page elements fully match web page elements in the web page content returned for the web page request;
S522: Compute the ratio of Nk to Nm as the total matching degree, where Nm is the number of stroke groups in the retrieved notes.
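The second calculation counts whole stroke groups rather than individual elements. In the Python sketch below, "fully match" is interpreted as every element of a stroke group being present in the current page; that interpretation, the string identifiers for elements, and the function name `matched_group_ratio` are assumptions.

```python
def matched_group_ratio(stored_elements_by_group, page_elements):
    """Nk / Nm: fraction of stroke groups whose elements all appear in the page."""
    nm = len(stored_elements_by_group)
    if nm == 0:
        return 0.0
    nk = sum(
        1
        for elements in stored_elements_by_group.values()
        if elements and elements <= page_elements  # subset test
    )
    return nk / nm
```

Compared with the first calculation, one missing element costs a whole stroke group here, so this measure is stricter for groups that span many elements.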
The third is as follows:
S531: Extract the web page elements of each stroke group in the retrieved notes and combine them into web page element set F;
S532: Compute the intersection of the web page elements in the current web page with F to obtain web page element set X;
S533: Compute the ratio of the number of elements in X to the number of elements in F as the total matching degree.
This embodiment prefers the first calculation above.
As the calculations above show, the total matching degree is a value between 0 and 1. There are also many ways to extract the corresponding stroke groups from the retrieved note information and display them according to the total matching degree. The simplest is to set a single threshold, for example 0.5, and test whether the total matching degree exceeds it: if so, the stroke groups are displayed on the current web page; otherwise they are not displayed, or the user is notified that the web page has changed too much for the notes to be restored.
This embodiment uses two thresholds, set in advance: a high threshold and a low threshold. When the total matching degree is below the low threshold, the user is notified that the web page has changed too much for the notes to be restored. When the total matching degree is above the high threshold, the stroke groups are displayed on the current web page. When the total matching degree lies between the low threshold and the high threshold, the original web page snapshot and each stroke group are displayed in a separate window, and the stroke groups are also displayed on the current web page. That is, in the intermediate case the stroke groups are shown side by side with the original so that the user can compare them.
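The dual-threshold decision can be sketched as a three-way branch. In the Python sketch below, the threshold values 0.3 and 0.8, the `RestoreAction` enum, and the function name `choose_action` are hypothetical; the text above does not give concrete threshold values.

```python
from enum import Enum


class RestoreAction(Enum):
    REJECT = "notify the user that the page changed too much"
    RESTORE = "show stroke groups on the current page"
    COMPARE = "show original snapshot in a second window and restore on the current page"


def choose_action(total_matching_degree, low=0.3, high=0.8):
    """Map a total matching degree in [0, 1] to one of three actions."""
    if total_matching_degree < low:
        return RestoreAction.REJECT
    if total_matching_degree > high:
        return RestoreAction.RESTORE
    # Between the thresholds (inclusive): side-by-side comparison.
    return RestoreAction.COMPARE
```

Values exactly equal to a threshold fall into the comparison case here, which follows from reading "below" and "above" strictly.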
When restoring stroke groups on the current web page as described above, this embodiment proceeds as follows. The coordinates of each stroke group are mapped onto the current web page, and module M2 computes the web page elements corresponding to each stroke group in the current web page. The web page elements corresponding to each stroke group in the original web page are extracted. It is then judged whether, for each stroke group, the corresponding web page elements in the current web page and in the original web page match. Matching stroke groups are displayed normally; the others are displayed in a highlighted (prompt) mode. For example, stroke groups are drawn in black in normal display and in another color such as grey, red, or yellow in highlighted mode. The user can thus tell whether the web page elements under a stroke group still correspond to those of the original web page.
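The per-group display decision can be sketched as a comparison of two element sets for each stroke group. In the Python sketch below, a "match" is taken to mean that the two sets are equal, which is an assumption, as are the function name `display_modes`, the mode labels, and the use of string identifiers for web page elements.

```python
def display_modes(original_by_group, current_by_group):
    """Return 'normal' or 'highlight' for each stroke group id."""
    modes = {}
    for group_id, original_elements in original_by_group.items():
        current_elements = current_by_group.get(group_id, set())
        # A group keeps normal display only if it still sits on the same elements.
        if original_elements == current_elements:
            modes[group_id] = "normal"      # e.g. drawn in black
        else:
            modes[group_id] = "highlight"   # e.g. drawn in grey, red, or yellow
    return modes
```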

Claims (10)

1. A content-based method for storing, matching, and restoring web page notes, characterized in that it comprises the following steps:
S1: Obtain the strokes that the user inputs on the browsed web page, and combine the strokes into stroke groups;
S2: Compute the web page elements corresponding to each stroke group;
S3: Store note information keyed by web page address; the note information includes a set of stroke snapshots; each stroke snapshot includes a stroke group and the web page elements corresponding to that stroke group;
S4: When a web page is displayed, retrieve the corresponding note information by web page address;
S5: Match the web page elements corresponding to each stroke group in the retrieved note information against the web page elements in the current web page;
S6: According to the matching result, extract the corresponding stroke groups from the retrieved note information and restore them.
2. The content-based method for storing, matching, and restoring web page notes as claimed in claim 1, characterized in that the matching result in step S6 is the total matching degree between the web page elements corresponding to each stroke group in the note information and the web page elements in the current web page; the note information also includes a snapshot of the original web page; and step S6 proceeds as follows:
When the total matching degree is below the low threshold, notify the user that the web page has changed too much for the notes to be restored;
When the total matching degree is above the high threshold, restore the stroke groups on the current web page;
When the total matching degree lies between the low threshold and the high threshold, display the original web page snapshot and each stroke group in a separate window, and at the same time restore the stroke groups on the current web page.
3. The content-based method for storing, matching and restoring web page notes as claimed in claim 1, characterized in that the matching result in step S6 is the total matching degree between the web page elements corresponding to the stroke groups in the note information and the web page elements in the current web page; and step S5 includes:
S511: Extract the web page element of each stroke group in the retrieved notes to form web page element set F1;
S512: Place each stroke group of the retrieved notes onto the current web page after coordinate mapping and, using the method of step S2, determine the web page element in the current web page corresponding to each stroke group, forming web page element set F2;
S513: Compute the intersection of web page element sets F1 and F2 to obtain web page element set X;
S514: Compute the ratio of the number of elements in X to the number of elements in F1 as the total matching degree.
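Steps S513 and S514 amount to the ratio |F1 ∩ F2| / |F1|. A minimal sketch, assuming each web page element is represented by a hashable identifier (the string identifiers below are hypothetical) and assuming that an empty F1 yields a matching degree of 0, a case the claim does not address:

```python
def total_matching_degree(f1, f2):
    """Return |F1 ∩ F2| / |F1| for two collections of element identifiers."""
    f1, f2 = set(f1), set(f2)
    if not f1:
        # Assumption: with no stored elements there is nothing to match.
        return 0.0
    x = f1 & f2  # S513: intersection of F1 and F2
    return len(x) / len(f1)  # S514: ratio of |X| to |F1|


# F1: elements recorded with the notes; F2: elements found on the current page.
degree = total_matching_degree({"p-1", "p-2", "h2-1", "img-3"},
                               {"p-1", "h2-1", "div-7"})
print(degree)
```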
4. The content-based method for storing, matching and restoring web page notes as claimed in claim 1, characterized in that the note information further includes the height-to-width ratio of the original web page; the matching result in step S6 is the total matching degree between the web page elements corresponding to the stroke groups in the note information and the web page elements in the current web page; and step S5 further includes comparing the height-to-width ratio of the current web page with the height-to-width ratio of the original web page stored in the note information and judging whether the two differ excessively; if they do, the total matching degree is set to 0.
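The height-to-width check in claim 4 acts as a gate on the total matching degree. The claim does not define what counts as an excessive difference, so the sketch below assumes a relative difference measured against the original ratio and a hypothetical tolerance `max_relative_diff` of 0.5; the function name is likewise illustrative.

```python
def gate_by_height_width_ratio(total_match, original_ratio, current_ratio,
                               max_relative_diff=0.5):
    """Force the matching degree to 0 when the page shape changed too much.

    original_ratio and current_ratio are height / width of the page.
    The relative-difference test and its tolerance are assumptions.
    """
    if original_ratio <= 0 or current_ratio <= 0:
        raise ValueError("height-to-width ratios must be positive")
    relative_diff = abs(current_ratio - original_ratio) / original_ratio
    if relative_diff > max_relative_diff:
        return 0.0
    return total_match
```

With these assumed settings, a page whose ratio grows from 2.0 to 4.0 is treated as too different, while a change from 2.0 to 2.2 leaves the matching degree untouched.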
5. The content-based method for storing, matching and restoring web page notes as claimed in claim 2, characterized in that, when the stroke groups are restored on the current web page, each stroke group is placed onto the current web page after coordinate mapping, and the web page element in the current web page corresponding to each stroke group is computed using the method of step S2; the web page element in the original web page corresponding to each stroke group is extracted; it is judged whether the web page elements corresponding to each stroke group in the current web page and in the original web page match; stroke groups that match are displayed normally, and the others are displayed in a prompting manner.
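The per-group check in claim 5 can be illustrated with a small sketch. It assumes each stroke group has an identifier and that two web page elements "match" when their identifiers are equal; the claim does not fix the matching criterion, and the names `display_modes`, `"normal"` and `"prompt"` are hypothetical.

```python
def display_modes(original_elements, current_elements):
    """Map each stroke group id to "normal" or "prompt".

    original_elements: group id -> element id stored with the notes.
    current_elements:  group id -> element id found on the current page
                       after coordinate mapping (may lack some groups).
    """
    modes = {}
    for group_id, original in original_elements.items():
        current = current_elements.get(group_id)
        # Assumption: equality of identifiers stands in for "matching".
        matched = current is not None and current == original
        modes[group_id] = "normal" if matched else "prompt"
    return modes


modes = display_modes(
    {"g1": "p-3", "g2": "h2-1", "g3": "img-4"},
    {"g1": "p-3", "g2": "div-9"},
)
```

Here only `g1` is shown normally; `g2` now sits over a different element and `g3` has no element on the current page, so both are flagged.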
6. A content-based device for storing, matching and restoring web page notes, characterized by comprising the following modules:
M1, configured to: acquire the strokes that the user inputs on the web page being browsed, and combine the strokes into stroke groups;
M2, configured to: compute the web page element corresponding to each stroke group;
M3, configured to: store note information by web page address; the note information includes a set of stroke snapshots; each stroke snapshot includes a stroke group and the web page element corresponding to that stroke group;
M4, configured to: when a web page is displayed, retrieve the corresponding note information according to the web page address;
M5, configured to: match the web page elements corresponding to each stroke group in the retrieved note information against the web page elements in the current web page;
M6, configured to: according to the matching result, extract the corresponding stroke groups from the retrieved note information and restore them.
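The storage layout described for modules M3 and M4 (note information keyed by web page address, holding a set of stroke snapshots) might be modelled as below. This is a sketch under assumptions: the class and field names, the in-memory dictionary, and the representation of a stroke as a list of (x, y) points are all illustrative and are not specified by the claim.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


@dataclass
class StrokeSnapshot:
    stroke_group: List[Stroke]  # the strokes combined into one group
    element_id: str             # web page element the group corresponds to


@dataclass
class NoteInfo:
    snapshots: List[StrokeSnapshot] = field(default_factory=list)


class NoteStore:
    """In-memory stand-in for storage keyed by web page address."""

    def __init__(self) -> None:
        self._by_url: Dict[str, NoteInfo] = {}

    def save(self, url: str, snapshot: StrokeSnapshot) -> None:
        # M3: store note information by web page address.
        self._by_url.setdefault(url, NoteInfo()).snapshots.append(snapshot)

    def find(self, url: str) -> Optional[NoteInfo]:
        # M4: retrieve note information for the displayed page, if any.
        return self._by_url.get(url)


store = NoteStore()
store.save("https://example.com/a",
           StrokeSnapshot([[(0.0, 0.0), (1.0, 1.0)]], "p-3"))
```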
7. The content-based device for storing, matching and restoring web page notes as claimed in claim 6, characterized in that the matching result in module M6 is the total matching degree between the web page elements corresponding to the stroke groups in the note information and the web page elements in the current web page; the note information further includes a snapshot of the original web page; and module M6 proceeds as follows:
When the total matching degree is below the low threshold, prompt the user that the web page has changed too much for the notes to be restored;
When the total matching degree is above the high threshold, display the stroke groups on the current web page;
When the total matching degree lies between the low threshold and the high threshold, display the original web page snapshot and each stroke group in a separate window, and synchronously restore the stroke groups on the current web page.
8. The content-based device for storing, matching and restoring web page notes as claimed in claim 6, characterized in that the matching result in module M6 is the total matching degree between the web page elements corresponding to the stroke groups in the note information and the web page elements in the current web page; and module M5 includes:
M511, configured to: extract the web page element of each stroke group in the retrieved notes to form web page element set F1;
M512, configured to: place each stroke group of the retrieved notes onto the current web page after coordinate mapping and, by means of module M2, determine the web page element in the current web page corresponding to each stroke group, forming web page element set F2;
M513, configured to: compute the intersection of web page element sets F1 and F2 to obtain web page element set X;
M514, configured to: compute the ratio of the number of elements in X to the number of elements in F1 as the total matching degree.
9. The content-based device for storing, matching and restoring web page notes as claimed in claim 6, characterized in that the note information further includes the height-to-width ratio of the original web page; the matching result in module M6 is the total matching degree between the web page elements corresponding to the stroke groups in the note information and the web page elements in the current web page; and module M5 further compares the height-to-width ratio of the current web page with the height-to-width ratio of the original web page stored in the note information and judges whether the two differ excessively; if they do, the total matching degree is set to 0.
10. The content-based device for storing, matching and restoring web page notes as claimed in claim 7, characterized in that, when the stroke groups are displayed on the current web page, each stroke group is placed onto the current web page after coordinate mapping, and the web page element in the current web page corresponding to each stroke group is computed by module M2; the web page element in the original web page corresponding to each stroke group is extracted; it is judged whether the web page elements corresponding to each stroke group in the current web page and in the original web page match; stroke groups that match are displayed normally, and the others are displayed in a prompting manner.
CN201710350594.2A 2017-05-18 2017-05-18 Method and device for storing, matching and restoring webpage notes based on content Active CN107203748B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710350594.2A CN107203748B (en) 2017-05-18 2017-05-18 Method and device for storing, matching and restoring webpage notes based on content

Publications (2)

Publication Number Publication Date
CN107203748A (en) 2017-09-26
CN107203748B (en) 2020-12-22

Family

ID=59905719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710350594.2A Active CN107203748B (en) 2017-05-18 2017-05-18 Method and device for storing, matching and restoring webpage notes based on content

Country Status (1)

Country Link
CN (1) CN107203748B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1625741A (en) * 2002-01-31 2005-06-08 西尔弗布鲁克研究有限公司 An electronic filing system searchable by a handwritten search query
CN101441644A (en) * 2007-11-19 2009-05-27 英福达科技股份有限公司 Web page annotation system and method
CN101551800A (en) * 2008-03-31 2009-10-07 富士通株式会社 Marked information generation device, inquiry unit and sharing system
CN102609401A (en) * 2011-12-26 2012-07-25 北京大学 Webpage annotation method
US20140344658A1 (en) * 2013-05-15 2014-11-20 Microsoft Corporation Enhanced links in curation and collaboration applications
CN104615601A (en) * 2013-11-04 2015-05-13 英业达科技有限公司 Webpage based recording system and method thereof
CN104794174A (en) * 2015-04-01 2015-07-22 百度在线网络技术(北京)有限公司 Webpage marking information display method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
朱小辉: "基于教育云的学习笔记跨平台的研究与实现", 《中国优秀硕士学位论文全文数据库》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112486532A (en) * 2020-11-25 2021-03-12 中移(杭州)信息技术有限公司 Method and device for managing configuration file, electronic equipment and storage medium
CN112486532B (en) * 2020-11-25 2024-04-09 中移(杭州)信息技术有限公司 Configuration file management method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN107203748B (en) 2020-12-22

Similar Documents

Publication Publication Date Title
Vishwakarma et al. Detection and veracity analysis of fake news via scrapping and authenticating the web search
US7917514B2 (en) Visual and multi-dimensional search
CN110263180B (en) Intention knowledge graph generation method, intention identification method and device
US6996268B2 (en) System and method for gathering, indexing, and supplying publicly available data charts
JP2005085285A5 (en)
US8312012B1 (en) Automatic determination of whether a document includes an image gallery
CN106709032A (en) Method and device for extracting structured information from spreadsheet document
KR20100044669A (en) Method, system and computer-readable recording medium for providing information on goods based on image matching
CN110457579B (en) Webpage denoising method and system based on cooperative work of template and classifier
WO2022105119A1 (en) Training corpus generation method for intention recognition model, and related device thereof
CN104090904A (en) Method and equipment for providing target search result
CN105930174B (en) A kind of graphical page program comparison in difference method and system
CN104317867B (en) The system that entity cluster is carried out to the Web page picture that search engine returns
CN110232126A (en) Hot spot method for digging and server and computer readable storage medium
CN110020312A (en) The method and apparatus for extracting Web page text
CN103942211A (en) Text page recognition method and device
CN108647312A (en) A kind of user preference analysis method and its device
CN103631796A (en) Website sort management method and electronic device
CN102236713A (en) Digital television interaction service page information extraction method and device
CN108628871A (en) A kind of link De-weight method based on chain feature
CN107203748A (en) A kind of method and apparatus of webpage notes storage, matching and reduction based on content
Cameron et al. Mesogranular structure in a hydrodynamical simulation
CN105550183A (en) Identifying method of identifying information in webpage and electronic device
CN110866170A (en) Importance evaluation method, search method and system for Tor darknet service based on site quality
Li et al. Cleaning web pages for effective web content mining

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant