CN102314492A - Method and equipment for acquiring candidate document sections matched with target document section - Google Patents

Method and equipment for acquiring candidate document sections matched with target document section

Info

Publication number
CN102314492A
Authority
CN
China
Prior art keywords
sections
chapters
information
destination document
candidate documents
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201110243486A
Other languages
Chinese (zh)
Inventor
林帆
洪庚伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201110243486A
Publication of CN102314492A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention aims to provide a method and equipment for acquiring candidate document sections matched with a target document section. The method comprises the following steps of: acquiring a target document section to be matched; determining section identification information of the target document section according to the section title information of the target document section; and performing matching inquiry according to the section identification information to obtain one or more candidate document sections which correspond to the target document section. Compared with the prior art, the invention has the advantages that: the document access efficiency of a user is increased, and the user experience is improved. Furthermore, the candidate document section(s) can be provided for the user according to the matching degree between the obtained candidate document section(s) and the target document section, so that the document access efficiency of the user is increased, and the user experience is improved.

Description

Method and apparatus for obtaining candidate document sections that match a target document section
Technical field
The present invention relates to the field of web search technology, and in particular to a technique for obtaining candidate document sections that match a target document section.
Background art
With the spread of network applications, more and more users rely on the network to read online documents. For example, a user can obtain over the network the section on the "hidden Markov chain model" in "Natural Language Processing".
In practice, however, because Internet distribution is open, a given section of an online document may be reproduced by many websites, and the quality of those reproductions may vary. For example, some websites insert advertising content into the section, which not only increases the user's traffic consumption but also degrades the reading experience. On some websites the page for the section even suffers from problems such as an empty section, a picture section, or a dead link, all of which seriously disrupt the continuity of reading and reduce the user's experience.
Therefore, how to match corresponding candidate document sections to a target document section, so as to improve the user's document access efficiency and the user's experience, has become a problem that those skilled in the art urgently need to solve.
Summary of the invention
The object of the present invention is to provide a method and apparatus for obtaining candidate document sections that match a target document section.
According to one aspect of the present invention, a computer-implemented method for obtaining candidate document sections that match a target document section is provided, wherein the method comprises:
A. obtaining a target document section to be matched;
B. determining section identification information of the target document section according to section title information of the target document section;
C. performing a matching query according to the section identification information, to obtain one or more candidate document sections corresponding to the target document section.
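Steps A to C can be pictured as a small pipeline. The sketch below is illustrative only: the `Section` record, the in-memory `SECTION_BASE` list, the site names, and the function names are assumptions made for this example rather than names taken from the patent, and step B uses the simplest option (the raw title serves as the identification).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """A document section as it appears on one site (hypothetical record)."""
    site: str
    document_title: str
    section_title: str


# Hypothetical stand-in for a store of known sections.
SECTION_BASE = [
    Section("site-a", "Those Things of the Ming Dynasty", "Chapter 6: The Beginning of Rulership"),
    Section("site-b", "Those Things of the Ming Dynasty", "Chapter 6: The Beginning of Rulership"),
    Section("site-c", "Those Things of the Ming Dynasty", "Chapter 7: A Fearful Adversary (Upper)"),
]


def determine_identification(target: Section) -> str:
    """Step B: derive the identification from the section title."""
    return target.section_title.strip()


def match_candidates(target: Section, base: list[Section]) -> list[Section]:
    """Step C: return sections other than the target with the same identification."""
    ident = determine_identification(target)
    return [s for s in base if s != target and s.section_title.strip() == ident]


target = SECTION_BASE[0]  # Step A: the section to be matched
candidates = match_candidates(target, SECTION_BASE)
print([c.site for c in candidates])
```

An exact string comparison is the most conservative matching rule; the preprocessing and partial-match variants described later relax it.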
According to another aspect of the present invention, a section matching equipment for obtaining candidate document sections that match a target document section is also provided, wherein the equipment comprises:
a section acquiring device, configured to obtain a target document section to be matched;
an identification determining device, configured to determine section identification information of the target document section according to section title information of the target document section;
a section matching device, configured to perform a matching query according to the section identification information, to obtain one or more candidate document sections corresponding to the target document section.
Compared with the prior art, the present invention determines the section identification information of the target document section from the section title information of the target document section to be matched, and performs a matching query on that basis to obtain one or more candidate document sections corresponding to the target document section, thereby improving the user's document access efficiency and the user's experience. Further, the present invention can provide the candidate document sections to the user according to the degree to which each obtained candidate document section matches the target document section, which further improves the user's document access efficiency and experience.
Description of drawings
Other features, objects, and advantages of the present invention will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
Fig. 1 is a schematic diagram of an equipment for obtaining candidate document sections that match a target document section, according to one aspect of the present invention;
Fig. 2 is a schematic diagram of an equipment for obtaining candidate document sections that match a target document section, according to a preferred embodiment of the present invention;
Fig. 3 is a flowchart of a method for obtaining candidate document sections that match a target document section, according to another aspect of the present invention;
Fig. 4 is a flowchart of a method for obtaining candidate document sections that match a target document section, according to a preferred embodiment of the present invention.
The same or similar reference numerals denote the same or similar components in the drawings.
Embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an equipment for obtaining candidate document sections that match a target document section, according to one aspect of the present invention. The section matching equipment 1 comprises a section acquiring device 101, an identification determining device 102, and a section matching device 103.
The section matching equipment 1 includes, but is not limited to, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers. Here, a cloud consists of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer made up of a group of loosely coupled computers. Those skilled in the art will understand that the section matching equipment described above is only an example; other existing or future section matching equipment, where applicable to the present invention, also falls within the scope of protection of the present invention and is incorporated herein by reference.
The section acquiring device 101 obtains the target document section to be matched. Specifically, the section acquiring device 101 may, for example, obtain a document section randomly or sequentially from a section information base and use it as the target document section to be matched; or, through interaction with a third-party device such as a search engine, obtain a document section provided by that third-party device and use it as the target document section to be matched; or, by means of a page analysis device or the like, examine each section of a document and pick out problematic sections, such as an empty section, a picture section, or a section whose link is a dead link, as the target document section to be matched. Here, an empty section is, for example, a section whose content is empty or whose amount of effective text is below a predetermined threshold; a picture section is, for example, a section whose content or main content consists of pictures; a dead link is, for example, a link that, when clicked, jumps to a table-of-contents page or another unrelated web page. The section information base stores a large number of document sections together with their mapping relations to documents, section identification information, and the like; it may reside in the section matching equipment 1 or in a third-party device connected to the section matching equipment 1. Those skilled in the art will understand that the ways of obtaining the target document section to be matched described above are only examples; other existing or future ways of obtaining the target document section to be matched, where applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
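The three kinds of problematic section can be told apart with a few simple checks. The following is a rough sketch under stated assumptions: the function name, the regular expressions, and the threshold value of 50 characters are all hypothetical (the text only speaks of a "predetermined threshold"), and a dead link is approximated as a request that ended at a URL other than the expected section page.

```python
import re
from typing import Optional

MIN_EFFECTIVE_CHARS = 50  # assumed value; the source only says "predetermined threshold"


def classify_section(html: str, final_url: str, expected_url: str) -> Optional[str]:
    """Return 'dead link', 'picture section', 'empty section', or None if the page looks fine."""
    # Dead link: following the link ended somewhere other than the section page,
    # for example on the table of contents.
    if final_url != expected_url:
        return "dead link"

    image_count = len(re.findall(r"<img\b", html, flags=re.IGNORECASE))
    # Strip tags and whitespace to approximate the amount of effective text.
    text = re.sub(r"<[^>]+>", "", html)
    effective_chars = len(re.sub(r"\s+", "", text))

    if effective_chars < MIN_EFFECTIVE_CHARS:
        # Little or no text: a picture section if images carry the content,
        # otherwise an empty section.
        return "picture section" if image_count > 0 else "empty section"
    return None
```

A section flagged by any of these checks would then be handed on as a target document section to be matched.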
The identification determining device 102 determines the section identification information of the target document section according to the section title information of the target document section. Specifically, the ways in which the identification determining device 102 determines the section identification information of the target document section include, but are not limited to:
1) From the target document section to be matched obtained by the section acquiring device 101, extracting the section title information and using it as the section identification information of the target document section.
2) From the target document section to be matched obtained by the section acquiring device 101, extracting the section title information; applying preprocessing operations to it, such as removing title order number information, removing title suffix information, and removing mark characters together with the text they enclose; and using the preprocessed section title information as the section identification information of the target document section.
Those skilled in the art will understand that the ways of determining the section identification information described above are only examples; other existing or future ways of determining the section identification information, where applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
The section matching device 103 performs a matching query according to the section identification information, to obtain one or more candidate document sections corresponding to the target document section. Specifically, the ways in which the section matching device 103 obtains the one or more candidate document sections through a matching query include, but are not limited to:
1) According to the section identification information of the target document section determined by the identification determining device 102, performing a matching query in the section information base, or performing an online matching query in the search index library, to obtain the one or more candidate document sections. For example, the section acquiring device 101 obtains Chapter 6 of "Those Things of the Ming Dynasty - ** Online Library" as the target document section to be matched; the identification determining device 102 uses the section title information of this target document section, "Chapter 6: The Beginning of Rulership", as its section identification information; the section matching device 103 then performs a matching query in the section information base, or an online matching query in the search index library, according to this section identification information, and obtains multiple candidate document sections corresponding to the target document section, such as Chapter 6 of "Those Things of the Ming Dynasty, Serialized Reading, ** Net" and Chapter 6 of "Those Things of the Ming Dynasty, History and Culture Reading Channel, ** Net".
2) According to the section identification information of the target document section determined by the identification determining device 102, combined with the document identification information of the target document to which the target document section belongs, performing a matching query in the section information base, or performing an online matching query in the search index library, to obtain the one or more candidate document sections. The document identification information is information that can be used to identify a document, such as the document title, the author name, or a document content mark. For example, suppose the section title information of the target document section obtained by the section acquiring device 101 contains only title order number information: the target document section is Chapter 6 of "Those Things of the Ming Dynasty - ** Online Library", and its section title is simply "Chapter 6". The identification determining device 102 uses this section title information "Chapter 6" as the section identification information of the target document section. The section matching device 103 first uses the document identification information of the target document "Those Things of the Ming Dynasty - ** Online Library", such as the document title "Those Things of the Ming Dynasty" and the author name "Dangnian Mingyue", to perform a matching query in the section information base, or an online matching query in the search index library, and obtains one or more candidate documents corresponding to the target document, such as "Those Things of the Ming Dynasty, Serialized Reading, ** Net" and "Those Things of the Ming Dynasty, History and Culture Reading Channel, ** Net". The section matching device 103 then performs a matching query within these candidate documents according to the section identification information "Chapter 6", and obtains one or more candidate document sections corresponding to the target document section, such as Chapter 6 of "Those Things of the Ming Dynasty, Serialized Reading, ** Net" and Chapter 6 of "Those Things of the Ming Dynasty, History and Culture Reading Channel, ** Net".
3) According to the section identification information of the target document section determined by the identification determining device 102, combined with section auxiliary information of the target document section such as title order number information and title suffix information, performing a matching query in the section information base, or performing an online matching query in the search index library, to obtain the one or more candidate document sections.
4) According to the section identification information of the target document section determined by the identification determining device 102, combined with both the document identification information of the target document to which the target document section belongs and the section auxiliary information of the target document section, performing a matching query in the section information base, or performing an online matching query in the search index library, to obtain the one or more candidate document sections.
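The two-stage query of way 2), which matters when a section title carries nothing but an order number, can be sketched as follows. The `Section` record, its field names, the site names, and the placeholder author names are assumptions made for this illustration; a real system would query a section information base or a search index rather than a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """Hypothetical record for one section on one site."""
    site: str
    document_title: str
    author: str
    section_title: str


def match_with_document_info(target: Section, base: list[Section]) -> list[Section]:
    """Narrow by document identification first, then by section identification."""
    # Stage 1: candidate documents share the target's title and author.
    same_document = [
        s for s in base
        if s.site != target.site
        and s.document_title == target.document_title
        and s.author == target.author
    ]
    # Stage 2: within those documents, match the section identification.
    return [s for s in same_document if s.section_title == target.section_title]


base = [
    Section("site-a", "Those Things of the Ming Dynasty", "Author A", "Chapter 6"),
    Section("site-b", "Those Things of the Ming Dynasty", "Author A", "Chapter 6"),
    Section("site-c", "An Unrelated Novel", "Author B", "Chapter 6"),
]
matches = match_with_document_info(base[0], base)
print([m.site for m in matches])
```

Without the first stage, "Chapter 6" of the unrelated novel on `site-c` would also match, which is why a bare order number is a poor identification on its own.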
Those skilled in the art will understand that the ways of obtaining candidate document sections through a matching query described above are only examples; other existing or future ways of obtaining candidate document sections through a matching query, where applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, the present invention determines the section identification information of the target document section from the section title information of the target document section to be matched, and performs a matching query on that basis to obtain one or more candidate document sections corresponding to the target document section, thereby improving the user's document access efficiency and the user's experience.
Preferably, the devices of the section matching equipment 1 work continuously with one another. Specifically, the section acquiring device 101 continuously obtains target document sections to be matched; the identification determining device 102 continuously determines the section identification information of each target document section according to its section title information; and the section matching device 103 continuously performs matching queries according to the section identification information, to obtain one or more candidate document sections corresponding to each target document section. Those skilled in the art will understand that "continuously" here means that the devices keep performing, respectively, the obtaining of target document sections, the determining of section identification information, and the matching of candidate document sections, until the section matching equipment 1 stops obtaining target document sections to be matched for a long period of time.
Preferably, the identification determining device 102 applies a preprocessing operation to the section title information to obtain the section identification information, wherein the section identification information comprises the section title information after preprocessing, and wherein the preprocessing operation comprises at least one of the following:
- removing title order number information from the section title information;
- removing title suffix information from the section title information;
- removing mark characters, and the text enclosed by the mark characters, from the section title information.
Specifically, from the target document section to be matched obtained by the section acquiring device 101, the identification determining device 102 extracts the section title information and, using techniques such as semantic analysis, word segmentation, or string matching, identifies and deletes from it parts such as the title order number information, the title suffix information, and the mark characters together with the text they enclose. The section title information remaining after this preprocessing is used as the section identification information. Here, the identification determining device 102 identifies the title order number information by recognizing numerals or keywords in the section title information, such as an ordinal prefix, "chapter", "part", "episode", "volume", "section", or "collection"; it identifies the title suffix information by recognizing keywords such as "upper", "middle", "lower", or "continued"; and it identifies mark characters and their enclosed text by recognizing mark characters such as "()", "[]", or "{}" and the text inside them, such as "figure", "new", or "update". For example, the section acquiring device 101 obtains the first part of Chapter 7 of "Those Things of the Ming Dynasty - ** Online Library" as the target document section to be matched. Using techniques such as semantic analysis, word segmentation, or string matching, the identification determining device 102 identifies and deletes the title order number information "Chapter 7" from the section title information "Chapter 7: A Fearful Adversary (Upper)", obtains the preprocessed section title information "A Fearful Adversary (Upper)", and uses it as the section identification information of the target document section. The section matching device 103 then performs a matching query in the section information base, or an online matching query in the search index library, according to this section identification information "A Fearful Adversary (Upper)", and obtains multiple candidate document sections corresponding to the target document section, such as the first part of Chapter 7 of "Those Things of the Ming Dynasty, Serialized Reading, ** Net" and the first part of Chapter 7 of "Those Things of the Ming Dynasty, History and Culture Reading Channel, ** Net". Preferably, the identification determining device 102 applies all three preprocessing operations to the section title information, takes what remains after removing the title order number information, the title suffix information, and the mark characters with their enclosed text as the title trunk information of the section title, and uses this title trunk information as the section identification information on which the subsequent devices operate. Continuing the example, the identification determining device 102 identifies and deletes both the title order number information "Chapter 7" and the title suffix information "(Upper)" from the section title information "Chapter 7: A Fearful Adversary (Upper)", obtains the preprocessed section title information, namely the title trunk information "A Fearful Adversary", and uses it as the section identification information of the target document section. The section matching device 103 then performs a matching query in the section information base, or an online matching query in the search index library, according to this section identification information "A Fearful Adversary", and obtains multiple candidate document sections corresponding to the target document section, such as the upper and lower parts of Chapter 7 of "Those Things of the Ming Dynasty, Serialized Reading, ** Net" and the upper and lower parts of Chapter 7 of "Those Things of the Ming Dynasty, History and Culture Reading Channel, ** Net".
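The three preprocessing operations lend themselves to plain string matching. The sketch below assumes English-language titles and a small hypothetical keyword list; the patent's own examples rely on Chinese keywords, so these regular expressions and function names illustrate the idea rather than reproduce the patented rules.

```python
import re

# Assumed English-language patterns, chosen only for this illustration.
ORDER_NUMBER = re.compile(
    r"^\s*(chapter|part|volume|section|episode)\s+\d+\s*[:.\-]?\s*", re.IGNORECASE
)
SUFFIX = re.compile(
    r"\s*[(\[]?\s*\b(upper|middle|lower|continued)\b\s*[)\]]?\s*$", re.IGNORECASE
)
BRACKETED = re.compile(r"\s*[(\[{][^)\]}]*[)\]}]")


def remove_order_number(title: str) -> str:
    """Drop a leading order number such as 'Chapter 7:'."""
    return ORDER_NUMBER.sub("", title)


def remove_suffix(title: str) -> str:
    """Drop a trailing part marker such as '(Upper)'."""
    return SUFFIX.sub("", title)


def remove_bracketed(title: str) -> str:
    """Drop mark characters together with the text they enclose, such as '[new]'."""
    return BRACKETED.sub("", title)


def title_trunk(title: str) -> str:
    """Apply all three operations and return the remaining title trunk."""
    return remove_bracketed(remove_suffix(remove_order_number(title))).strip()


print(remove_order_number("Chapter 7: A Fearful Adversary (Upper)"))
print(title_trunk("Chapter 7: A Fearful Adversary (Upper)"))
print(title_trunk("Chapter 7 A Fearful Adversary [new]"))
```

Applying only `remove_order_number` keeps the part marker and so matches just the same part on other sites, while `title_trunk` also matches the sibling parts of the same chapter, mirroring the two examples above.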
Those skilled in the art will understand that the preprocessing operations on the section title information described above are only examples; other existing or future preprocessing operations on section title information, where applicable to the present invention, also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, the present invention first preprocesses the section title information of the target document section to obtain the section identification information, and performs the matching query on that basis to obtain one or more candidate document sections corresponding to the target document section. This improves the accuracy of the matching query and, in turn, the user's document access efficiency and experience.
Preferably, the section matching device 103 performs a matching query in the section information base according to the section identification information, to obtain the one or more candidate document sections. Specifically, according to the section identification information of the target document section determined by the identification determining device 102, the section matching device 103 performs a database matching query in the section information base and obtains one or more document section records corresponding to the section identification information, which serve as the one or more candidate document sections corresponding to the target document section. The section title information or section identification information of each of these document section records is fully or partially consistent with the section identification information of the target document section. The section information base stores a large number of document sections together with their mapping relations to documents, section identification information, and the like; it may reside in the section matching equipment 1 or in a third-party device connected to the section matching equipment 1.
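A database query with full or partial consistency might look like the sketch below. It uses Python's standard `sqlite3` module with an in-memory database; the table name `section_base`, its columns, and the example URLs are hypothetical, and "partial consistency" is interpreted here, as an assumption, to mean that the stored identification contains the queried one.

```python
import sqlite3

# In-memory stand-in for a section information base (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE section_base (site TEXT, document_title TEXT, section_id TEXT, url TEXT)"
)
conn.executemany(
    "INSERT INTO section_base VALUES (?, ?, ?, ?)",
    [
        ("site-a", "Those Things of the Ming Dynasty", "A Fearful Adversary", "http://a.example/7"),
        ("site-b", "Those Things of the Ming Dynasty", "A Fearful Adversary (Upper)", "http://b.example/7-1"),
        ("site-c", "Those Things of the Ming Dynasty", "The Beginning of Rulership", "http://c.example/6"),
    ],
)


def query_section_base(section_id: str, exact: bool = False) -> list[str]:
    """Return URLs of records whose identification fully or partly matches."""
    if exact:
        sql = "SELECT url FROM section_base WHERE section_id = ? ORDER BY url"
    else:
        # Partial consistency: the stored identification contains the query.
        sql = "SELECT url FROM section_base WHERE instr(section_id, ?) > 0 ORDER BY url"
    return [row[0] for row in conn.execute(sql, (section_id,))]


print(query_section_base("A Fearful Adversary"))
print(query_section_base("A Fearful Adversary", exact=True))
```

Parameter binding with `?` keeps titles containing quotes or other special characters from breaking the query.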
Preferably, the section matching device 103 performs an online matching query in the search index library according to the section identification information, to obtain the one or more candidate document sections. Specifically, according to the section identification information of the target document section determined by the identification determining device 102, the section matching device 103 performs an online matching query in the search index library and obtains one or more document section pages corresponding to the section identification information, which serve as the one or more candidate document sections corresponding to the target document section. The index keywords of each of these document section pages, such as the section title information or section identification information, are fully or partially consistent with the section identification information of the target document section. The search index library stores web pages that have been indexed; the search engine keeps crawling pages on the network and indexing them, so that the search index library is continually updated. The search index library contains the pages corresponding to document sections and the index keywords corresponding to those pages.
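The relationship between pages and index keywords can be illustrated with a toy inverted index. This is a deliberately minimal sketch: the `SearchIndex` class, its methods, and the example URLs are invented for the illustration, whitespace tokenization is an assumption, and a production search index is far more elaborate.

```python
from collections import defaultdict


class SearchIndex:
    """Toy inverted index from index keywords to page URLs (hypothetical)."""

    def __init__(self) -> None:
        self._pages_by_term: dict[str, set[str]] = defaultdict(set)

    def add_page(self, url: str, index_keywords: str) -> None:
        """Index a crawled page under each term of its keywords."""
        for term in index_keywords.lower().split():
            self._pages_by_term[term].add(url)

    def query(self, section_id: str) -> list[str]:
        """Return pages whose index keywords contain every term of the query."""
        terms = section_id.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self._pages_by_term.get(t, set()) for t in terms))
        return sorted(hits)


index = SearchIndex()
index.add_page("http://a.example/7", "Chapter 7 A Fearful Adversary")
index.add_page("http://b.example/6", "Chapter 6 The Beginning of Rulership")
print(index.query("fearful adversary"))
```

Because the query only requires its own terms to be present, a page indexed under the full title still matches a query made with the shorter title trunk.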
More preferably, the section matching equipment 1 further comprises an updating device (not shown), which establishes or updates the section information base according to the one or more candidate document sections obtained by the online matching query. Specifically, the section matching device 103 performs an online matching query in the search index library according to the section identification information of the target document section and obtains one or more candidate document sections; subsequently, the updating device stores the one or more candidate document sections obtained by the online matching query of the section matching device 103 in the section information base, so as to establish or update the section information base. For example, the section matching equipment 1 first attempts a matching query in the section information base to obtain the one or more candidate document sections; when no candidate document section is obtained from the section information base, it performs an online matching query in the search index library to obtain the one or more candidate document sections, and establishes or updates the section information base according to the candidate document sections so obtained. Preferably, the updating device either establishes a mapping relation between the one or more candidate document sections and the section identification information of the target document section, or determines the section identification information of each candidate document section according to the section title information of that candidate document section, and stores them together in the section information base, so as to establish or update the section information base. Those skilled in the art will understand that the above manner of establishing or updating the section information base is merely an example; other existing or future manners of establishing or updating the section information base, if applicable to the present invention, should also fall within the protection scope of the present invention and are incorporated herein by reference.
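The lookup order described above (local information base first, online query as a fallback, then write-back) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `find_candidates`, the dictionary used as the information base, and the `online_search` callable are all assumptions introduced for the example.

```python
from typing import Callable, Dict, List


def find_candidates(section_id: str,
                    info_base: Dict[str, List[str]],
                    online_search: Callable[[str], List[str]]) -> List[str]:
    """Query the local information base first; fall back to an online query."""
    cached = info_base.get(section_id)
    if cached:
        return cached
    # Nothing stored locally: run the (hypothetical) online matching query.
    found = list(online_search(section_id))
    if found:
        # Write the result back so the next lookup is served locally.
        info_base[section_id] = found
    return found
```

A second call with the same identification information is then answered from the information base without another online query.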
In a preferred embodiment, the section matching equipment 1 further comprises an extraction device (not shown). This preferred embodiment is described below with reference to Fig. 1. The section acquisition device 101 obtains a target document section to be matched; the identification determination device 102 determines the section identification information of the target document section according to the section title information of the target document section; the extraction device performs auxiliary identification extraction processing on the section title information of the target document section to obtain section auxiliary information of the target document section, wherein the section auxiliary information includes, but is not limited to, the title sequence number information and title suffix information corresponding to the section title information; the section matching device 103 performs a matching query according to the section identification information and the section auxiliary information, to obtain the one or more candidate document sections. The detailed operation of the section acquisition device 101 and the identification determination device 102 is the same as that of the section acquisition device 101 and the identification determination device 102 in the embodiment described above with reference to Fig. 1; for brevity, it is incorporated herein by reference and not repeated.
Specifically, for the target document section to be matched obtained by the section acquisition device 101, the extraction device performs semantic analysis, string matching or the like on the section title information of the target document section, and identifies and extracts title sequence number information, title suffix information and the like from the section title information, thereby performing auxiliary identification extraction processing on the section title information; the extracted title sequence number information, title suffix information and the like serve as the section auxiliary information of the target document section. Here, the extraction device identifies the title sequence number information by recognizing numerals contained in the section title information or keywords such as "No.", "chapter", "part", "episode", "volume", "section" and "collection"; it identifies the title suffix information by recognizing keywords such as "upper", "middle", "lower" and "continued". Subsequently, the section matching device 103 performs a matching query according to the section identification information and the section auxiliary information, to obtain the one or more candidate document sections. Here, the process by which the section matching device 103 obtains the candidate document sections is basically the same as the operation of the section matching device 103 in the embodiment of Fig. 1 described above; for brevity, it is not repeated here and is incorporated herein by reference. For example, the section acquisition device 101 obtains, as the target document section to be matched, Chapter 9 of "Those Things of the Ming Dynasty - ** Online Library"; the section title information of the target document section is "Chapter 9 War Is Inevitable (Picture)", where the marker characters and the text they enclose, "(Picture)", indicate that this section is a picture section. Subsequently, the identification determination device 102 preprocesses the section title information by removing the title sequence number information "Chapter 9" and removing the marker characters and enclosed text "(Picture)", obtains the preprocessed section title information, namely the title trunk information "War Is Inevitable", and takes it as the section identification information of the target document section. The extraction device, by semantic analysis or string matching, identifies and extracts the title sequence number information "Chapter 9" from the section title information as the section auxiliary information of the target document section. Then, according to the section identification information "War Is Inevitable" and the section auxiliary information "Chapter 9", the section matching device 103 performs a matching query in the section information base, or an online matching query in the search index library, and obtains a plurality of candidate document sections corresponding to the target document section, such as Chapter 9 of "Those Things of the Ming Dynasty, Serialized Reading, ** Net" and Chapter 9 of "Those Things of the Ming Dynasty, History and Culture Reading Channel, ** Net".
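The title processing in the example above can be illustrated with simple string matching. The sketch below assumes English-style titles and an invented keyword list; the patterns `_SEQUENCE`, `_MARKED_TEXT` and `_SUFFIX`, and the function `parse_title`, are hypothetical names chosen for the illustration. The text leaves the actual analysis technique open (semantic analysis or string matching).

```python
import re
from typing import NamedTuple, Optional


class ParsedTitle(NamedTuple):
    sequence: Optional[str]  # title sequence number information (auxiliary)
    trunk: str               # title trunk, usable as identification information
    suffix: Optional[str]    # title suffix information (auxiliary)


# Assumed keyword lists; a real system would use its own vocabulary.
_SEQUENCE = re.compile(
    r"^\s*((?:chapter|part|volume|section|episode)\s+\d+)\s*", re.IGNORECASE)
_MARKED_TEXT = re.compile(r"\s*[(\[][^)\]]*[)\]]\s*")
_SUFFIX = re.compile(r"\s+(upper|middle|lower|continued)\s*$", re.IGNORECASE)


def parse_title(title: str) -> ParsedTitle:
    # Drop marker characters and the text they enclose, e.g. "(Picture)".
    text = _MARKED_TEXT.sub(" ", title).strip()
    sequence = suffix = None
    m = _SEQUENCE.match(text)
    if m:
        sequence = m.group(1)
        text = text[m.end():]
    m = _SUFFIX.search(text)
    if m:
        suffix = m.group(1)
        text = text[:m.start()]
    return ParsedTitle(sequence, text.strip(), suffix)
```

For the title "Chapter 9 War Is Inevitable (Picture)", this yields the sequence number "Chapter 9" and the trunk "War Is Inevitable", matching the worked example.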
Here, those skilled in the art will understand that the operations performed by the extraction device and the identification determination device have no required temporal order.
Those skilled in the art will also understand that the extraction device and the identification determination device described above are merely examples; in practice, they may be two independent modules or may be integrated into a single module.
Those skilled in the art will understand that the above manner of performing auxiliary identification extraction processing on the section title information is merely an example; other existing or future manners of performing such processing, if applicable to the present invention, should also fall within the protection scope of the present invention and are incorporated herein by reference. Those skilled in the art will also understand that the above section auxiliary information is merely an example; other existing or future section auxiliary information, if applicable to the present invention, should also fall within the protection scope of the present invention and is incorporated herein by reference.
Here, the present invention extracts the section auxiliary information of the target document section and performs the matching query according to both the section auxiliary information and the section identification information, to obtain one or more candidate document sections corresponding to the target document section. This improves the accuracy of the matching query and, further, improves the user's document access efficiency and user experience.
Fig. 2 is a schematic diagram of equipment for obtaining candidate document sections matching a target document section in accordance with a preferred embodiment of the present invention. The section matching equipment 1 further comprises a providing device 204, wherein the identification determination device 202 and the section matching device 203 are respectively the same as or similar to the corresponding devices shown in Fig. 1, and are therefore not described again here but incorporated by reference.
The section acquisition device 201 obtains the target document section corresponding to a user's page access request. Specifically, the user interacts with a user device, entering a web address or clicking a link in a browser, to submit a page access request. The section acquisition device 201 obtains the page access request through dynamic web page technologies such as JSP, ASP or PHP, or by calling an application programming interface (API) of the user device; it then sends the page access request to a third-party device such as a page server, receives the document section page that the third-party device obtains by matching based on the page access request, and takes that document section page as the target document section. Alternatively, the section acquisition device 201 receives the user's page access request as obtained by another device or a third-party device, forwards it to a third-party device such as a page server, receives the document section page that the third-party device obtains by matching based on the page access request, and takes it as the target document section. Alternatively, the section acquisition device 201 directly obtains the document section page corresponding to the user's page access request that a third-party device such as a page server has obtained by matching based on that request, and takes it as the target document section.
The providing device 204 provides the one or more candidate document sections to the user. Specifically, the providing device 204 uses dynamic web page technologies such as JSP, ASP or PHP to provide the one or more candidate document sections obtained by the section matching device 203 to the user, either at random or in a certain order or according to a certain rule. Here, the providing device 204 may provide the section content of the one or more candidate document sections to the user, or may provide the summary information or URLs corresponding to the one or more candidate document sections. Those skilled in the art will understand that the above manner of providing candidate document sections to the user is merely an example; other existing or future manners of providing candidate document sections to the user, if applicable to the present invention, should also fall within the protection scope of the present invention and are incorporated herein by reference.
Here, the present invention is combined with a practical application: it obtains the target document section requested by the user, determines the section identification information from the section title information of the target document section, performs a matching query accordingly to obtain one or more candidate document sections corresponding to the target document section, and provides the one or more candidate document sections to the user, thereby improving the user's document access efficiency and user experience.
Preferably, the section matching equipment 1 further comprises a matching degree acquisition device (not shown), which obtains the matching degree of each candidate document section relative to the target document section; the providing device 204 then provides the one or more candidate document sections to the user according to the matching degree. Specifically, the manners in which the matching degree acquisition device obtains the matching degree include, but are not limited to:
1) for the one or more candidate document sections obtained by the section matching device 203, directly obtaining, from a third-party device such as the section information base, the matching degree of the one or more candidate document sections relative to the target document section;
2) for the one or more candidate document sections obtained by the section matching device 203, extracting section title information from the one or more candidate document sections and comparing it, for example by semantic analysis, with the section title information of the target document section, to obtain the matching degree of each candidate document section relative to the target document section. For example, the matching degree acquisition device determines the matching degree according to the proportion of identical characters in the section title information of the candidate document section and of the target document section: if the section title information is completely identical, the matching degree is 100%; if the identical characters account for 80% of all characters in the section title information of the target document section, the matching degree is 80%. Alternatively, the matching degree acquisition device determines the matching degree according to the title sequence number information, title trunk information and title suffix information in the section title information of the candidate document section and of the target document section: if the title sequence number information, title trunk information and title suffix information are all identical, the matching degree is 100%; if only the title trunk information is identical, the matching degree is 80%.
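The two title-based measures in item 2) can be sketched as below. Both functions are illustrative assumptions: `char_overlap_degree` interprets "identical characters" as a multiset overlap counted against the target title, and `component_degree` returns 0.0 for cases the text does not specify (for instance when the trunks differ).

```python
from collections import Counter
from typing import Optional, Tuple

# (title sequence number, title trunk, title suffix)
Components = Tuple[Optional[str], str, Optional[str]]


def char_overlap_degree(candidate_title: str, target_title: str) -> float:
    """Share of the target title's characters that also occur in the candidate."""
    if not target_title:
        return 0.0
    shared = Counter(candidate_title) & Counter(target_title)
    return sum(shared.values()) / len(target_title)


def component_degree(candidate: Components, target: Components) -> float:
    """1.0 if all three components agree, 0.8 if the trunk agrees."""
    if candidate == target:
        return 1.0
    if candidate[1] == target[1]:
        return 0.8
    return 0.0  # assumption: the source gives no value for this case
```

Under this interpretation, identical titles score 1.0 and a candidate sharing four of a five-character target title scores 0.8.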
Subsequently, according to the matching degree between the one or more candidate document sections and the target document section, the providing device 204 uses dynamic web page technologies such as JSP, ASP or PHP to provide the one or more candidate document sections to the user; for example, it provides candidate document sections with a higher matching degree to the user first, or directly provides to the user the section content of candidate document sections whose matching degree is greater than a predetermined matching degree threshold.
Those skilled in the art will understand that the above manner of determining the matching degree of a candidate document section relative to the target document section is merely an example; other existing or future manners of determining this matching degree, if applicable to the present invention, should also fall within the protection scope of the present invention and are incorporated herein by reference.
More preferably, when the matching degree of at least one of the one or more candidate document sections is greater than a predetermined matching degree threshold, the providing device 204 provides the section content corresponding to the at least one candidate document section to the user; otherwise, the providing device 204 provides the summary information corresponding to the one or more candidate document sections to the user. For example, suppose the predetermined matching degree threshold is 80%, and the matching degree acquisition device finds that the matching degree between Chapter 9 of the candidate "Those Things of the Ming Dynasty, Serialized Reading, ** Net" and the target document section is 90%, which is greater than the threshold, while the matching degree between Chapter 9 of the candidate "Those Things of the Ming Dynasty, History and Culture Reading Channel, ** Net" and the target document section is 70%, which is less than the threshold; the providing device 204, through dynamic web page technologies such as JSP, ASP or PHP, then provides to the user only the section content of Chapter 9 of "Those Things of the Ming Dynasty, Serialized Reading, ** Net". As another example, suppose the predetermined matching degree threshold is 80%, and the matching degrees of those two candidates are 60% and 70% respectively, both less than the threshold; the providing device 204 then provides to the user the summary information corresponding to Chapter 9 of "Those Things of the Ming Dynasty, Serialized Reading, ** Net" and to Chapter 9 of "Those Things of the Ming Dynasty, History and Culture Reading Channel, ** Net". Here, the summary information corresponding to a candidate document section may be obtained by the providing device 204 from a third-party device such as a search engine, or generated by the providing device 204 in real time from the section content of the candidate document section. Here, the predetermined matching degree threshold is a preset threshold for the matching degree between a candidate document section and the target document section, and may be adjusted according to the availability of candidate document sections or the user's settings.
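The threshold rule in the two examples above can be expressed compactly. The sketch below is illustrative only: the dictionary keys (`title`, `degree`, `content`, `summary`) and the function name `select_for_display` are assumptions, and ordering the results by descending matching degree is one possible reading of "preferentially providing" better matches.

```python
from typing import Dict, List, Tuple


def select_for_display(candidates: List[Dict],
                       threshold: float = 0.8) -> List[Tuple[str, str, str]]:
    """Return (kind, title, payload) tuples to show to the user."""
    above = [c for c in candidates if c["degree"] > threshold]
    if above:
        # At least one good match: show full content, best match first.
        ranked = sorted(above, key=lambda c: c["degree"], reverse=True)
        return [("content", c["title"], c["content"]) for c in ranked]
    # No candidate exceeds the threshold: fall back to summaries of all.
    return [("summary", c["title"], c["summary"]) for c in candidates]
```

With matching degrees of 90% and 70% and a threshold of 80%, only the first candidate's content is returned; with 60% and 70%, both summaries are returned.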
Here, the present invention provides the candidate document sections to the user according to their matching degree relative to the target document section, giving the user a more intuitive viewing experience and thereby further improving the user's document access efficiency and user experience.
Preferably, the section acquisition device 201 obtains, according to a preset trigger rule, the section to be accessed that corresponds to the user's page access request, as the target document section; wherein the preset trigger rule takes the section to be accessed as the target document section based on at least one of the following:
- the section to be accessed is an empty section;
- the section to be accessed is a picture section;
- the link corresponding to the section to be accessed is a dead link.
Specifically, the user interacts with a user device, entering a web address or clicking a link in a browser, to submit a page access request. The section acquisition device 201 obtains the page access request through dynamic web page technologies such as JSP, ASP or PHP, or by calling an application programming interface (API) of the user device; it then sends the page access request to a third-party device such as a page server, receives the document section page that the third-party device obtains by matching based on the page access request, and takes that document section page as the section to be accessed. Alternatively, the section acquisition device 201 receives the user's page access request as obtained by another device or a third-party device, forwards it to a third-party device such as a page server, receives the document section page that the third-party device obtains by matching based on the page access request, and takes it as the section to be accessed. Alternatively, the section acquisition device 201 directly obtains the document section page corresponding to the user's page access request that a third-party device such as a page server has obtained by matching based on that request, and takes it as the section to be accessed. When the section to be accessed is an empty section or a picture section, or the link corresponding to the section to be accessed is a dead link, the section acquisition device 201 takes the section to be accessed as the target document section. Here, an empty section is, for example, a section whose content is empty or whose effective text amounts to less than a predetermined threshold; a picture section is, for example, a section whose content, or whose main content, consists of pictures; a dead link is, for example, a link that, when clicked, jumps to a table-of-contents page or another unrelated web page.
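The three trigger conditions can be checked with a small classifier such as the one below. It is a rough sketch under stated assumptions: the inputs (extracted text, number of images, and a flag saying the link led to an unrelated page) are assumed to be available from earlier page analysis, and the 50-character threshold and the function name `trigger_reason` are invented for the illustration.

```python
from typing import Optional


def trigger_reason(text: str, image_count: int, redirected_elsewhere: bool,
                   min_effective_chars: int = 50) -> Optional[str]:
    """Return why a section should be treated as a target, or None."""
    if redirected_elsewhere:
        # The link led to a contents page or an unrelated page.
        return "dead_link"
    effective = "".join(text.split())  # ignore whitespace when counting text
    if len(effective) < min_effective_chars:
        # Little or no text: pictures mean a picture section, else empty.
        return "picture_section" if image_count > 0 else "empty_section"
    return None
```

A section that returns a reason here would be passed on as the target document section; one that returns `None` would be served to the user as-is.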
Those skilled in the art will understand that the above preset trigger rule is merely an example; other existing or future preset trigger rules, if applicable to the present invention, should also fall within the protection scope of the present invention and are incorporated herein by reference.
Fig. 3 is a flow chart of a method for obtaining candidate document sections matching a target document section according to another aspect of the present invention.
The section matching equipment 1 includes, but is not limited to, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers. Here, a cloud consists of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers. Those skilled in the art will understand that the above section matching equipment is merely an example; other existing or future section matching equipment, if applicable to the present invention, should also fall within the protection scope of the present invention and is incorporated herein by reference.
In step S301, the section matching equipment 1 obtains a target document section to be matched. Specifically, in step S301, the section matching equipment 1 obtains document sections from the section information base, for example at random or sequentially, as target document sections to be matched; or, through interaction with a third-party device such as a search engine, obtains document sections provided by the third-party device as target document sections to be matched; or, by means of a page analyzer or the like, examines each document section of a document and detects problematic sections among them, for example empty sections, picture sections, or sections whose corresponding link is a dead link, as target document sections to be matched. Here, an empty section is, for example, a section whose content is empty or whose effective text amounts to less than a predetermined threshold; a picture section is, for example, a section whose content, or whose main content, consists of pictures; a dead link is, for example, a link that, when clicked, jumps to a table-of-contents page or another unrelated web page. Here, the section information base stores a large number of document sections together with their mapping relations to documents, section identification information and the like; the section information base may be located in the section matching equipment 1, or in a third-party device connected to the section matching equipment 1. Those skilled in the art will understand that the above manner of obtaining the target document section to be matched is merely an example; other existing or future manners of obtaining the target document section to be matched, if applicable to the present invention, should also fall within the protection scope of the present invention and are incorporated herein by reference.
In step S302, the section matching equipment 1 determines the section identification information of the target document section according to the section title information of the target document section. Specifically, in step S302, the manners in which the section matching equipment 1 determines the section identification information of the target document section include, but are not limited to:
1) extracting the section title information from the target document section to be matched obtained in step S301, and taking the section title information as the section identification information of the target document section.
2) extracting the section title information from the target document section to be matched obtained in step S301; applying preprocessing operations to the section title information, such as removing the title sequence number information, removing the title suffix information, and removing marker characters together with the text they enclose; and taking the preprocessed section title information as the section identification information of the target document section.
Those skilled in the art will understand that the above manner of determining the section identification information is merely an example; other existing or future manners of determining the section identification information, if applicable to the present invention, should also fall within the protection scope of the present invention and are incorporated herein by reference.
In step S303, the section matching equipment 1 performs a matching query according to the section identification information, to obtain one or more candidate document sections corresponding to the target document section. Specifically, in step S303, the manners in which the section matching equipment 1 obtains the one or more candidate document sections through a matching query include, but are not limited to:
1) according to the section identification information of the target document section determined in step S302, performing a matching query in the section information base, or an online matching query in the search index library, to obtain the one or more candidate document sections. For example, in step S301, the section matching equipment 1 obtains, as the target document section to be matched, Chapter 6 of "Those Things of the Ming Dynasty - ** Online Library"; in step S302, the section matching equipment 1 takes the section title information of the target document section, "Chapter 6 The Beginning of Rulership", as the section identification information of the target document section; in step S303, according to the section identification information "Chapter 6 The Beginning of Rulership", the section matching equipment 1 performs a matching query in the section information base, or an online matching query in the search index library, and obtains a plurality of candidate document sections corresponding to the target document section, such as Chapter 6 of "Those Things of the Ming Dynasty, Serialized Reading, ** Net" and Chapter 6 of "Those Things of the Ming Dynasty, History and Culture Reading Channel, ** Net".
2) According to the section identification information of the target document section determined in step S302, in combination with the document identification information of the target document to which the target document section belongs, performing a matching query in the section information database, or performing an online matching query in the search index database, to obtain the one or more candidate document sections. The document identification information is any information that can be used to identify a document, such as the document title, the author name, or a document content tag. For example, suppose that in step S301 the section title information of the target document section acquired by the section matching device 1 contains only title sequence number information: the device acquires, as the target document section to be matched, Chapter 6 of "Those Things of the Ming Dynasty - ** Online Library", and the section title of this target document section is "Chapter 6"; in step S302, the section matching device 1 takes this section title information "Chapter 6" as the section identification information of the target document section; in step S303, the section matching device 1 first performs, according to the document identification information of the target document "Those Things of the Ming Dynasty - ** Online Library" to which the target document section belongs, such as the document title "Those Things of the Ming Dynasty" and the author name "Dangnian Mingyue", a matching query in the section information database, or an online matching query in the search index database, and obtains one or more candidate documents corresponding to the target document, such as "Those Things of the Ming Dynasty Serialized Reading - ** Net" and "Those Things of the Ming Dynasty - History and Culture - Reading Channel - ** Net"; then, the section matching device 1 performs a matching query within these one or more candidate documents according to the section identification information "Chapter 6", to obtain one or more candidate document sections corresponding to the target document section, such as Chapter 6 of "Those Things of the Ming Dynasty Serialized Reading - ** Net" and Chapter 6 of "Those Things of the Ming Dynasty - History and Culture - Reading Channel - ** Net".
3) According to the section identification information of the target document section determined in step S302, in combination with section auxiliary information of the target document section, such as title sequence number information and title suffix information, performing a matching query in the section information database, or performing an online matching query in the search index database, to obtain the one or more candidate document sections.
4) According to the section identification information of the target document section determined in step S302, in combination with both the document identification information of the target document to which the target document section belongs and the section auxiliary information of the target document section, performing a matching query in the section information database, or performing an online matching query in the search index database, to obtain the one or more candidate document sections.
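The four query modes listed above differ only in which optional filters accompany the section identification information. A minimal sketch of that idea follows, assuming a small in-memory list in place of the section information database; the `SectionRecord` type, the `match_sections` function, and the site labels are hypothetical names introduced for illustration and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SectionRecord:
    """One stored document section (hypothetical record layout)."""
    source: str                   # hypothetical label for the hosting site
    doc_title: str                # document identification information
    section_id: str               # section identification information
    seq_no: Optional[int] = None  # title sequence number (auxiliary information)


def match_sections(records: List[SectionRecord], section_id: str,
                   doc_title: Optional[str] = None,
                   seq_no: Optional[int] = None) -> List[SectionRecord]:
    """Return records matching the identifier; optional filters narrow the query.

    Mode 1: section_id only.       Mode 2: section_id + doc_title.
    Mode 3: section_id + seq_no.   Mode 4: all three.
    """
    hits = []
    for rec in records:
        if rec.section_id != section_id:
            continue
        if doc_title is not None and rec.doc_title != doc_title:
            continue
        if seq_no is not None and rec.seq_no != seq_no:
            continue
        hits.append(rec)
    return hits


# Illustrative data only: a bare "Chapter 6" title is ambiguous across documents.
RECORDS = [
    SectionRecord("site-a", "Those Things of the Ming Dynasty", "Chapter 6", 6),
    SectionRecord("site-b", "Those Things of the Ming Dynasty", "Chapter 6", 6),
    SectionRecord("site-c", "Another Book", "Chapter 6", 6),
]
```

With this data, querying on "Chapter 6" alone returns all three records, while adding the document title as in mode 2 excludes the unrelated book.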
Those skilled in the art should understand that the above ways of obtaining candidate document sections through a matching query are merely examples; other existing or future ways of obtaining candidate document sections through a matching query, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, the present invention determines the section identification information of the target document section from the section title information of the target document section to be matched, and performs a matching query accordingly to obtain one or more candidate document sections corresponding to the target document section, thereby improving the user's document access efficiency and the user experience.
Preferably, the steps of the section matching device 1 operate continuously. Specifically, in step S301, the section matching device 1 continuously acquires target document sections to be matched; in step S302, the section matching device 1 continuously determines the section identification information of the target document section according to the section title information of the target document section; in step S303, the section matching device 1 continuously performs a matching query according to the section identification information, to obtain one or more candidate document sections corresponding to the target document section. Here, those skilled in the art will understand that "continuously" means that the above steps respectively keep performing the acquisition of target document sections, the determination of section identification information, and the matching of candidate document sections, until the section matching device 1 stops acquiring target document sections to be matched for a prolonged period.
Preferably, in step S302, the section matching device 1 performs a preprocessing operation on the section title information to obtain the section identification information, wherein the section identification information comprises the section title information after the preprocessing operation, and wherein the preprocessing operation comprises at least any one of the following:
- removing title sequence number information from the section title information;
- removing title suffix information from the section title information;
- removing marker symbols, and the text enclosed by the marker symbols, from the section title information.
Specifically, in step S302, the section matching device 1 extracts the section title information from the target document section to be matched that was acquired in step S301, and, through techniques such as semantic analysis, word segmentation, or string matching, identifies and deletes from the section title information parts such as the title sequence number information, the title suffix information, and the marker symbols together with the text they enclose, thereby preprocessing the section title information; the preprocessed section title information is then used as the section identification information. Here, in step S302, the section matching device 1 identifies the title sequence number information in the section title information by recognizing numeric numbering or keywords contained in the section title information, such as "第" (ordinal prefix), "章" (chapter), "篇" (part), "回" (episode), "卷" (volume), "节" (section), or "集" (collection); it identifies the title suffix information by recognizing keywords contained in the section title information such as "上" (first part), "中" (middle part), "下" (last part), or "续" (continued); and it identifies the marker symbols and the text they enclose by recognizing marker symbols contained in the section title information, such as "()", "[]", or "{}", together with text enclosed by those symbols, such as "图" (figure), "新" (new), or "更新" (updated). For example, in step S301, the section matching device 1 acquires, as the target document section to be matched, the first part of Chapter 7 of "Those Things of the Ming Dynasty - ** Online Library"; in step S302, the section matching device 1, through techniques such as semantic analysis, word segmentation, or string matching, identifies and deletes the title sequence number information "Chapter 7" from the section title information "Chapter 7 A Fearsome Opponent (Part 1)" of the target document section, obtains the preprocessed section title information "A Fearsome Opponent (Part 1)", and uses it as the section identification information of the target document section; in step S303, the section matching device 1 performs, according to this section identification information "A Fearsome Opponent (Part 1)", a matching query in the section information database, or an online matching query in the search index database, and obtains a plurality of candidate document sections corresponding to the target document section, such as the first part of Chapter 7 of "Those Things of the Ming Dynasty Serialized Reading - ** Net" and the first part of Chapter 7 of "Those Things of the Ming Dynasty - History and Culture - Reading Channel - ** Net".

Preferably, in step S302, the section matching device 1 performs all three of the above preprocessing operations on the section title information, takes the section title information that remains after removing the title sequence number information, the title suffix information, and the marker symbols together with the text they enclose as the title stem information of the section title information, and uses this title stem information as the section identification information for the subsequent operations of the section matching device 1. Continuing the example, in step S302, the section matching device 1, through techniques such as semantic analysis, word segmentation, or string matching, identifies and deletes the title sequence number information "Chapter 7" and the title suffix information "(Part 1)" from the section title information "Chapter 7 A Fearsome Opponent (Part 1)" of the target document section, obtains the preprocessed section title information, namely the title stem information "A Fearsome Opponent", and uses it as the section identification information of the target document section; in step S303, the section matching device 1 performs, according to this section identification information "A Fearsome Opponent", a matching query in the section information database, or an online matching query in the search index database, and obtains a plurality of candidate document sections corresponding to the target document section, such as the first and last parts of Chapter 7 of "Those Things of the Ming Dynasty Serialized Reading - ** Net" and the first and last parts of Chapter 7 of "Those Things of the Ming Dynasty - History and Culture - Reading Channel - ** Net".
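The three preprocessing operations can be illustrated with plain string matching. The sketch below is one possible reading, assuming English-style titles of the form "Chapter N ... (Part N)"; the regular expressions, the tag words, and the `preprocess_title` function are illustrative assumptions rather than the patent's actual rules, and a real system would need patterns suited to the language of its titles.

```python
import re

# All three patterns are assumptions chosen for English-style titles.
SEQ_NO = re.compile(r"^\s*chapter\s+\d+\s*[:.\-]?\s*", re.IGNORECASE)
SUFFIX = re.compile(r"\s*\((?:part\s+\d+|continued)\)\s*$", re.IGNORECASE)
MARKED = re.compile(r"\s*[(\[{](?:figure|new|updated?)[)\]}]", re.IGNORECASE)


def preprocess_title(title: str, drop_seq: bool = True,
                     drop_suffix: bool = True, drop_marked: bool = True) -> str:
    """Apply the selected removal operations and return the remaining title."""
    if drop_seq:
        title = SEQ_NO.sub("", title)      # remove the title sequence number
    if drop_marked:
        title = MARKED.sub("", title)      # remove marker symbols and enclosed text
    if drop_suffix:
        title = SUFFIX.sub("", title)      # remove the title suffix
    return title.strip()


# Removing only the sequence number keeps the suffix:
only_seq = preprocess_title("Chapter 7 A Fearsome Opponent (Part 1)",
                            drop_suffix=False, drop_marked=False)
# Applying all three operations yields the title stem:
stem = preprocess_title("Chapter 7 A Fearsome Opponent (Part 1)")
```

Keeping the three operations behind separate flags mirrors the text: the device may apply any one of them, or all three to obtain the title stem.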
Those skilled in the art should understand that the above preprocessing operations on section title information are merely examples; other existing or future preprocessing operations on section title information, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, the present invention first preprocesses the section title information of the target document section to obtain the section identification information, and performs a matching query accordingly to obtain one or more candidate document sections corresponding to the target document section, which improves the accuracy of the matching query and, in turn, improves the user's document access efficiency and the user experience.
Preferably, in step S303, the section matching device 1 performs a matching query in the section information database according to the section identification information, to obtain the one or more candidate document sections. Specifically, in step S303, the section matching device 1 performs, by means of a database matching query, a matching query in the section information database according to the section identification information of the target document section determined in step S302, to obtain one or more document section records corresponding to the section identification information, as the one or more candidate document sections corresponding to the target document section, wherein the section title information or section identification information of the one or more document section records is fully or partially consistent with the section identification information of the target document section. Here, the section information database stores a large number of document sections together with their mapping relations to documents, section identification information, and the like; the section information database may be located in the section matching device 1, or in a third-party device connected to the section matching device 1.
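As a rough illustration of a database matching query with full or partial consistency, the following sketch uses Python's built-in `sqlite3` module with an in-memory database. The table layout, the column names, the example URLs, and the helper functions are assumptions made for this example only; the patent does not specify a schema.

```python
import sqlite3


def build_section_db(rows):
    """Create an in-memory section information database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sections (doc_title TEXT, section_id TEXT, url TEXT)")
    conn.executemany("INSERT INTO sections VALUES (?, ?, ?)", rows)
    return conn


def query_sections(conn, section_id, allow_partial=True):
    """Return rows whose stored identifier fully or partially matches."""
    if allow_partial:
        # instr() > 0 means the query text occurs somewhere in the stored value.
        sql = ("SELECT doc_title, section_id, url FROM sections "
               "WHERE instr(section_id, ?) > 0 ORDER BY url")
    else:
        sql = ("SELECT doc_title, section_id, url FROM sections "
               "WHERE section_id = ? ORDER BY url")
    return conn.execute(sql, (section_id,)).fetchall()


# Illustrative rows; the URLs are placeholders.
ROWS = [
    ("Those Things of the Ming Dynasty", "A Fearsome Opponent (Part 1)",
     "https://example.com/a/7-1"),
    ("Those Things of the Ming Dynasty", "A Fearsome Opponent (Part 2)",
     "https://example.com/a/7-2"),
    ("Those Things of the Ming Dynasty", "War Is Inevitable",
     "https://example.com/a/9"),
]
conn = build_section_db(ROWS)
partial_hits = query_sections(conn, "A Fearsome Opponent")
exact_hits = query_sections(conn, "A Fearsome Opponent", allow_partial=False)
```

Here the partial query finds both parts of the chapter, while the exact query finds neither, which is why the title stem is a useful identifier when partial consistency is allowed.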
Preferably, in step S303, the section matching device 1 performs an online matching query in the search index database according to the section identification information, to obtain the one or more candidate document sections. Specifically, in step S303, the section matching device 1 performs an online matching query in the search index database according to the section identification information of the target document section determined in step S302, to obtain one or more document section pages corresponding to the section identification information, as the one or more candidate document sections corresponding to the target document section, wherein the index keywords corresponding to the one or more document section pages, such as section title information or section identification information, are fully or partially consistent with the section identification information of the target document section. Here, the search index database stores web pages that have undergone indexing; a search engine continually crawls pages on the network and indexes them, so as to keep the search index database up to date; the search index database contains the pages corresponding to document sections and the index keywords corresponding to those pages.
More preferably, in step S305 (not shown), the section matching device 1 establishes or updates the section information database according to the one or more candidate document sections obtained by the online matching query. Specifically, in step S303, the section matching device 1 performs an online matching query in the search index database according to the section identification information of the target document section, and obtains one or more candidate document sections; subsequently, in step S305, the section matching device 1 stores the one or more candidate document sections obtained by the online matching query in step S303 into the section information database, so as to establish or update the section information database. For example, the section matching device 1 first attempts a matching query in the section information database to obtain the one or more candidate document sections; when no candidate document section is obtained from the section information database, it performs an online matching query in the search index database to obtain the one or more candidate document sections, and establishes or updates the section information database according to the one or more candidate document sections obtained by the online matching query. Preferably, in step S305, the section matching device 1 establishes a mapping relation between the one or more candidate document sections and the section identification information of the target document section, or determines the section identification information of each candidate document section according to the section title information of the one or more candidate document sections, and stores these together into the section information database, so as to establish or update the section information database. Those skilled in the art should understand that the above ways of establishing or updating the section information database are merely examples; other existing or future ways of establishing or updating the section information database, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
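The example in the preceding paragraph, querying the section information database first and falling back to the online query only when nothing is found, resembles a read-through cache. A minimal sketch under that reading follows; the dictionary standing in for the section information database, the `fake_online_search` function standing in for the search index query, and the URLs are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def find_with_fallback(section_id: str,
                       info_db: Dict[str, List[str]],
                       search_online: Callable[[str], Iterable[str]]) -> List[str]:
    """Query the local database first; on a miss, query online and store the result."""
    cached = info_db.get(section_id)
    if cached:
        return cached                      # hit in the section information database
    found = list(search_online(section_id))
    if found:
        info_db[section_id] = found        # step S305: establish or update the database
    return found


# Stand-in for the search index database (placeholder URLs).
FAKE_INDEX = {
    "War Is Inevitable": ["https://example.com/a/9", "https://example.org/b/9"],
}


def fake_online_search(section_id: str) -> List[str]:
    """Hypothetical online query that looks up the fake index."""
    return FAKE_INDEX.get(section_id, [])
```

Storing only non-empty results is a design choice of this sketch: a later online query can still succeed once the search index has been refreshed.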
In a preferred embodiment, the method further comprises step S306 (not shown). The preferred embodiment is described below with reference to Fig. 3: in step S301, the section matching device 1 acquires a target document section to be matched; in step S302, the section matching device 1 determines the section identification information of the target document section according to the section title information of the target document section; in step S306, the section matching device 1 performs auxiliary identifier extraction on the section title information of the target document section, to obtain section auxiliary information of the target document section, wherein the section auxiliary information includes, but is not limited to, the title sequence number information and the title suffix information corresponding to the section title information; in step S303, the section matching device 1 performs a matching query according to the section identification information and the section auxiliary information, to obtain the one or more candidate document sections. The specific processes performed by the section matching device 1 in step S301 and step S302 are the same as the operations of step S301 and step S302 in the embodiment described above with reference to Fig. 3; for brevity, they are incorporated herein by reference and are not repeated.
Specifically, in step S306, the section matching device 1 performs semantic analysis, string matching, or the like on the section title information of the target document section to be matched that was acquired in step S301, identifies and extracts the title sequence number information, the title suffix information, and the like from the section title information, thereby performing auxiliary identifier extraction on the section title information, and uses the extracted title sequence number information, title suffix information, and the like as the section auxiliary information of the target document section. Here, in step S306, the section matching device 1 identifies the title sequence number information in the section title information by recognizing numeric numbering or keywords contained in the section title information, such as "第" (ordinal prefix), "章" (chapter), "篇" (part), "回" (episode), "卷" (volume), "节" (section), or "集" (collection), and identifies the title suffix information by recognizing keywords contained in the section title information such as "上" (first part), "中" (middle part), "下" (last part), or "续" (continued). Subsequently, in step S303, the section matching device 1 performs a matching query according to the section identification information and the section auxiliary information, to obtain the one or more candidate document sections. Here, the operation of the section matching device 1 in step S303 is substantially the same as the process by which the section matching device 1 obtains candidate document sections through a matching query in step S303 of the foregoing embodiment of Fig. 3; for brevity, it is not repeated here and is incorporated herein by reference. For example, in step S301, the section matching device 1 acquires, as the target document section to be matched, Chapter 9 of "Those Things of the Ming Dynasty - ** Online Library", whose section title information is "Chapter 9 War Is Inevitable (figure)"; here, the marker symbols and enclosed text "(figure)" indicate that this section is an illustrated section. Subsequently, in step S302, the section matching device 1 performs on this section title information the preprocessing operations of removing the title sequence number information "Chapter 9" and removing the marker symbols and enclosed text "(figure)", obtains the preprocessed section title information, namely the title stem information "War Is Inevitable", and uses it as the section identification information of the target document section. In step S306, the section matching device 1, through semantic analysis or string matching techniques, identifies and extracts the title sequence number information "Chapter 9" from the section title information, as the section auxiliary information of the target document section. Then, in step S303, the section matching device 1 performs, according to this section identification information "War Is Inevitable" and the section auxiliary information "Chapter 9", a matching query in the section information database, or an online matching query in the search index database, and obtains a plurality of candidate document sections corresponding to the target document section, such as Chapter 9 of "Those Things of the Ming Dynasty Serialized Reading - ** Net" and Chapter 9 of "Those Things of the Ming Dynasty - History and Culture - Reading Channel - ** Net".
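Auxiliary identifier extraction is the counterpart of preprocessing: instead of discarding the sequence number and suffix, it captures them. A minimal sketch using string matching follows, again assuming English-style titles; the patterns and the `extract_auxiliary` function are illustrative assumptions, not the patent's rules.

```python
import re
from typing import Dict, Optional, Union

# Assumed patterns for titles such as "Chapter 9 ... (Part 1)".
SEQ_RE = re.compile(r"^\s*chapter\s+(\d+)", re.IGNORECASE)
SUFFIX_RE = re.compile(r"\((part\s+\d+|continued)\)\s*$", re.IGNORECASE)


def extract_auxiliary(title: str) -> Dict[str, Optional[Union[int, str]]]:
    """Extract the title sequence number and title suffix, if present."""
    seq = SEQ_RE.search(title)
    suffix = SUFFIX_RE.search(title)
    return {
        "seq_no": int(seq.group(1)) if seq else None,
        "suffix": suffix.group(1) if suffix else None,
    }


aux_ch9 = extract_auxiliary("Chapter 9 War Is Inevitable (figure)")
aux_ch7 = extract_auxiliary("Chapter 7 A Fearsome Opponent (Part 1)")
```

Because extraction reads the original title and does not modify it, it can run before, after, or alongside the preprocessing of step S302.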
Here, those skilled in the art should understand that the operations performed by the section matching device 1 in step S306 and step S302 have no required temporal order relative to each other.
Those skilled in the art should understand that the above ways of performing auxiliary identifier extraction on section title information are merely examples; other existing or future ways of performing auxiliary identifier extraction on section title information, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference. Those skilled in the art should also understand that the above section auxiliary information is merely an example; other existing or future section auxiliary information, if applicable to the present invention, should also fall within the scope of protection of the present invention and is incorporated herein by reference.
Here, the present invention extracts the section auxiliary information of the target document section and performs a matching query according to both the section auxiliary information and the section identification information, to obtain one or more candidate document sections corresponding to the target document section, which improves the accuracy of the matching query and, in turn, improves the user's document access efficiency and the user experience.
Fig. 4 shows a flow chart of a method for acquiring candidate document sections matched with a target document section in accordance with a preferred embodiment of the present invention, wherein step S402 and step S403 are respectively the same as or similar to the corresponding steps shown in Fig. 3, and are therefore not repeated here but incorporated herein by reference.
In step S401, the section matching device 1 acquires the target document section corresponding to a page access request of a user. Specifically, the user, by interacting with a user device, enters a web address or clicks a link in a browser to submit a page access request. In step S401, the section matching device 1 obtains this page access request through dynamic web page technologies such as JSP, ASP, or PHP, or by calling an application programming interface (API) of the user device; it then sends the page access request to a third-party device such as a page server, receives the document section page that the third-party device obtained by matching based on the page access request, and takes that document section page as the target document section. Alternatively, in step S401, the section matching device 1 receives the page access request submitted by the user from another product or a third-party device, forwards the page access request to a third-party device such as a page server, and receives the document section page that the third-party device obtained by matching based on the page access request, as the target document section. Alternatively, in step S401, the section matching device 1 directly obtains the document section page corresponding to the page access request that a third-party device such as a page server obtained by matching based on the user's page access request, as the target document section.
In step S404, the section matching device 1 provides the one or more candidate document sections to the user. Specifically, in step S404, the section matching device 1 provides the one or more candidate document sections obtained by matching in step S403 to the user through dynamic web page technologies such as JSP, ASP, or PHP, either in random order or according to a certain order or rule. Here, in step S404, the section matching device 1 may provide the user with the section content of the one or more candidate document sections, or with the summary information or URLs corresponding to the one or more candidate document sections. Those skilled in the art should understand that the above ways of providing candidate document sections to the user are merely examples; other existing or future ways of providing candidate document sections to the user, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Here, the present invention, combined with a practical application, acquires the target document section requested by the user, determines the section identification information from the section title information of the target document section, performs a matching query accordingly to obtain one or more candidate document sections corresponding to the target document section, and provides the one or more candidate document sections to the user, thereby improving the user's document access efficiency and the user experience.
Preferably, in step S407 (not shown), the section matching device 1 obtains the matching degree of the candidate document sections relative to the target document section; in step S404, the section matching device 1 then provides the one or more candidate document sections to the user according to the matching degree. Specifically, in step S407, the ways in which the section matching device 1 obtains the matching degree include, but are not limited to:
1) According to the one or more candidate document sections obtained by matching in step S403, directly obtaining, from a third-party device such as the section information database, the matching degree of the one or more candidate document sections relative to the target document section;
2) According to the one or more candidate document sections obtained by matching in step S403, extracting section title information from the one or more candidate document sections, and comparing, by means such as semantic analysis, that section title information with the section title information of the target document section, to obtain the matching degree of the candidate document sections relative to the target document section. For example, in step S407, the section matching device 1 determines the matching degree according to the proportion of identical characters in the section title information of the candidate document section and of the target document section: if the section title information is entirely identical, the matching degree is 100%; if the identical characters account for 80% of all the characters in the section title information of the target document section, the matching degree is 80%. Alternatively, in step S407, the section matching device 1 determines the matching degree according to the title sequence number information, the title stem information, and the title suffix information in the section title information of the candidate document section and of the target document section: if the title sequence number information, the title stem information, and the title suffix information are all identical, the matching degree is 100%; if only the title stem information is identical, the matching degree is 80%.
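The two example scoring rules in the list above can be sketched as follows. The text does not define precisely how "identical characters" are counted, so the first function adopts one possible interpretation (a multiset overlap of characters); the second function assigns 0.8 whenever the stems agree but the triples differ and 0.0 when the stems differ, which goes beyond the two cases the text specifies. Both function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Tuple

TitleParts = Tuple[Optional[int], str, Optional[str]]  # (seq_no, stem, suffix)


def char_overlap_degree(target_title: str, candidate_title: str) -> float:
    """Share of the target title's characters that also occur in the candidate.

    Assumption: characters are compared as multisets, ignoring their order.
    """
    if not target_title:
        return 0.0
    shared = sum((Counter(target_title) & Counter(candidate_title)).values())
    return shared / len(target_title)


def component_degree(target: TitleParts, candidate: TitleParts) -> float:
    """Score by title components: all identical gives 1.0, same stem gives 0.8."""
    if target[1] != candidate[1]:
        return 0.0   # assumption: a different stem means no match
    if target == candidate:
        return 1.0   # sequence number, stem, and suffix all identical
    return 0.8       # stem identical, some other component differs (assumption)
```

An order-sensitive measure, such as a longest common subsequence ratio, would be an equally plausible reading of the first rule.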
Subsequently, in step S404, the section matching device 1 provides the one or more candidate document sections to the user according to their matching degree relative to the target document section, through dynamic web page technologies such as JSP, ASP, or PHP. For example, candidate document sections with a higher matching degree are provided to the user preferentially, or the section content of candidate document sections whose matching degree is greater than a predetermined matching degree threshold is provided to the user directly.
Those skilled in the art will understand that the above ways of determining the matching degree of a candidate document section relative to the target document section are merely examples; other existing or future ways of determining this matching degree, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
More preferably, when the matching degree of at least one of the one or more candidate document sections is greater than a predetermined matching degree threshold, in step S404 the section matching device 1 provides the user with the section content corresponding to that at least one candidate document section; otherwise, in step S404, the section matching device 1 provides the user with the summary information corresponding to the one or more candidate document sections. For example, suppose the predetermined matching degree threshold is 80%. In step S407, the section matching device 1 obtains a matching degree of 90%, greater than the threshold, between chapter 9 of the candidate document "Those Things of the Ming Dynasty, serialized reading, **net" and the target document section, and a matching degree of 70%, less than the threshold, between chapter 9 of the candidate document "Those Things of the Ming Dynasty, history and culture reading channel, **net" and the target document section. In step S404, the section matching device 1, through dynamic web page technologies such as JSP, ASP, or PHP, provides the user only with the section content of chapter 9 of "Those Things of the Ming Dynasty, serialized reading, **net". As another example, suppose again that the predetermined matching degree threshold is 80%. In step S407, the section matching device 1 obtains a matching degree of 60% for chapter 9 of "Those Things of the Ming Dynasty, serialized reading, **net" and a matching degree of 70% for chapter 9 of "Those Things of the Ming Dynasty, history and culture reading channel, **net", both less than the threshold. In step S404, the section matching device 1, through dynamic web page technologies such as JSP, ASP, or PHP, provides the user with the summary information corresponding to chapter 9 of both candidate documents. Here, the summary information corresponding to a candidate document section may be obtained by the section matching device 1 in step S404 from a third-party device such as a search engine, or may be generated by the section matching device 1 in step S404 in real time from the section content of that candidate document section. Here, the predetermined matching degree threshold is a preset threshold on the matching degree between a candidate document section and the target document section; it can be adjusted according to how the candidate document sections are to be provided or according to the user's settings.
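The threshold rule and the two examples above can be sketched as follows. This is an illustrative sketch only: the `Candidate` structure, the function name `select_for_user`, and the tuple-based return value are assumptions, and ordering the results by descending matching degree follows the preference described for step S404 rather than any stated data format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """Hypothetical record for one candidate document section."""
    title: str
    degree: float   # matching degree relative to the target section, in [0, 1]
    content: str    # section content
    summary: str    # summary information


def select_for_user(candidates: List[Candidate],
                    threshold: float = 0.8) -> List[Tuple[str, str, str]]:
    """Return (kind, title, text) tuples to present to the user.

    If at least one candidate exceeds the threshold, only the section content
    of those candidates is returned; otherwise the summary information of all
    candidates is returned. Results are ordered by descending matching degree.
    """
    ranked = sorted(candidates, key=lambda c: c.degree, reverse=True)
    above = [c for c in ranked if c.degree > threshold]
    if above:
        return [("content", c.title, c.content) for c in above]
    return [("summary", c.title, c.summary) for c in ranked]
```

With a threshold of 0.8, candidates at 0.9 and 0.7 yield only the content of the first, while candidates at 0.6 and 0.7 yield the summaries of both, mirroring the two examples in the description.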
Here, the present invention provides candidate document sections to the user according to the obtained matching degree of the candidate document sections relative to the target document section, giving the user a more intuitive browsing experience and thereby further improving the user's document access efficiency and the user experience.
Preferably, in step S401, the section matching device 1 obtains, according to a preset trigger rule, the to-be-accessed section corresponding to the user's page access request as the target document section; wherein the preset trigger rule obtains the to-be-accessed section as the target document section based on at least one of the following:
- the to-be-accessed section is an empty chapter;
- the to-be-accessed section is a picture chapter;
- the link corresponding to the to-be-accessed section is a dead link.
Specifically, the user submits a page access request by interacting with user equipment, for example by entering a URL or clicking a link in a browser. In step S401, the section matching device 1 obtains this page access request through dynamic web page technologies such as JSP, ASP, or PHP, or by calling an application programming interface (API) of the user equipment; it then sends the page access request to a third-party device such as a page server and receives the document section page that the third-party device obtained by matching on the basis of the request, taking that document section page as the to-be-accessed section. Alternatively, in step S401, the section matching device 1 receives the user's page access request as obtained by another product or third-party device, forwards the request to a third-party device such as a page server, and receives the document section page that the third-party device obtained by matching on the basis of the request, as the to-be-accessed section. Alternatively, in step S401, the section matching device 1 directly obtains the document section page corresponding to the page access request, which a third-party device such as a page server obtained by matching on the basis of the user's page access request, as the to-be-accessed section. When the to-be-accessed section is an empty chapter or a picture chapter, or the link corresponding to the to-be-accessed section is a dead link, in step S401 the section matching device 1 takes the to-be-accessed section as the target document section. Here, an empty chapter is, for example, a section whose content is empty or whose effective text is less than a predetermined threshold; a picture chapter is, for example, a section whose content, or whose main content, consists of pictures; a dead link is, for example, a link that, when clicked, jumps to a catalog page or another unrelated web page.
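The three trigger rules can be sketched as a simple classification of a fetched page. This is a hedged illustration, not the patented implementation: the `FetchedPage` fields, the `MIN_EFFECTIVE_CHARS` value, and the way a dead link is signalled (a flag set by whatever component fetched the page) are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold for "effective text"; the description only says
# "less than a predetermined threshold" without giving a value.
MIN_EFFECTIVE_CHARS = 50


@dataclass
class FetchedPage:
    """Hypothetical result of fetching the to-be-accessed section."""
    text: str                        # extracted section text
    image_count: int                 # number of images in the section body
    landed_on_unrelated_page: bool   # True if the link led to a catalog page
                                     # or another unrelated page (assumed to be
                                     # detected by the fetching component)


def effective_chars(text: str) -> int:
    """Count non-whitespace characters as a stand-in for 'effective text'."""
    return sum(1 for ch in text if not ch.isspace())


def trigger_reason(page: FetchedPage) -> Optional[str]:
    """Return which trigger rule applies, or None if the section looks normal."""
    if page.landed_on_unrelated_page:
        return "dead_link"
    too_little_text = effective_chars(page.text) < MIN_EFFECTIVE_CHARS
    if page.image_count > 0 and too_little_text:
        return "picture_chapter"   # main content is pictures
    if too_little_text:
        return "empty_chapter"     # empty or nearly empty section
    return None
```

A section for which `trigger_reason` returns a value would be taken as the target document section in step S401; a section for which it returns `None` would be served to the user as is.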
Those skilled in the art will understand that the above preset trigger rules are merely examples; other existing or future preset trigger rules, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
To those skilled in the art, it is evident that the invention is not limited to the details of the above exemplary embodiments, and that the invention can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive; the scope of the invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced in the invention. No reference sign in the claims should be construed as limiting the claim concerned. In addition, the word "comprising" evidently does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or means recited in a device claim may also be implemented by a single unit or means through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (20)

1. A computer-implemented method for acquiring candidate document sections matched with a target document section, wherein the method comprises:
a. acquiring a target document section to be matched;
b. determining section identification information of the target document section according to the section title information of the target document section;
c. performing a matching query according to the section identification information, to obtain one or more candidate document sections corresponding to the target document section.
2. The method according to claim 1, wherein step b comprises:
- performing a preprocessing operation on the section title information to obtain the section identification information, wherein the section identification information comprises the section title information after the preprocessing operation;
wherein the preprocessing operation comprises at least one of the following:
- removing title sequence number information from the section title information;
- removing title suffix information from the section title information;
- removing marker characters, and the text contained within the marker characters, from the section title information.
3. The method according to claim 1 or 2, wherein step c comprises:
- performing a matching query in a section information library according to the section identification information, to obtain the one or more candidate document sections.
4. The method according to any one of claims 1 to 3, wherein step c comprises:
- performing an online matching query in a search index library according to the section identification information, to obtain the one or more candidate document sections.
5. The method according to claim 4, wherein claim 4 includes the method according to claim 3, and wherein the method further comprises:
- establishing or updating the section information library according to the one or more candidate document sections obtained by the online matching query.
6. The method according to any one of claims 1 to 5, wherein the method further comprises:
- performing auxiliary identification extraction processing on the section title information of the target document section, to obtain section auxiliary information of the target document section;
wherein step c comprises:
- performing a matching query according to the section identification information and the section auxiliary information, to obtain the one or more candidate document sections;
wherein the section auxiliary information comprises at least one of the following:
- title sequence number information corresponding to the section title information;
- title suffix information corresponding to the section title information.
7. The method according to any one of claims 1 to 6, wherein step a comprises:
- acquiring the target document section corresponding to a user's page access request;
wherein the method further comprises:
x. providing the one or more candidate document sections to the user.
8. The method according to claim 7, wherein the method further comprises:
- acquiring the matching degree of the candidate document sections relative to the target document section;
wherein step x comprises:
- providing the one or more candidate document sections to the user according to the matching degree.
9. The method according to claim 8, wherein step x comprises:
- when the matching degree of at least one of the one or more candidate document sections is greater than a predetermined matching degree threshold, providing the user with the section content corresponding to that at least one candidate document section;
- otherwise, providing the user with the summary information corresponding to the one or more candidate document sections.
10. The method according to any one of claims 7 to 9, wherein step a comprises:
- acquiring, according to a preset trigger rule, the to-be-accessed section corresponding to the user's page access request as the target document section;
wherein the preset trigger rule obtains the to-be-accessed section as the target document section based on at least one of the following:
- the to-be-accessed section is an empty chapter;
- the to-be-accessed section is a picture chapter;
- the link corresponding to the to-be-accessed section is a dead link.
11. A section matching device for acquiring candidate document sections matched with a target document section, wherein the device comprises:
a section acquisition means, configured to acquire a target document section to be matched;
an identification determining means, configured to determine section identification information of the target document section according to the section title information of the target document section;
a section matching means, configured to perform a matching query according to the section identification information, to obtain one or more candidate document sections corresponding to the target document section.
12. The section matching device according to claim 11, wherein the identification determining means is configured to:
- perform a preprocessing operation on the section title information to obtain the section identification information, wherein the section identification information comprises the section title information after the preprocessing operation;
wherein the preprocessing operation comprises at least one of the following:
- removing title sequence number information from the section title information;
- removing title suffix information from the section title information;
- removing marker characters, and the text contained within the marker characters, from the section title information.
13. The section matching device according to claim 11 or 12, wherein the section matching means is configured to:
- perform a matching query in a section information library according to the section identification information, to obtain the one or more candidate document sections.
14. The section matching device according to any one of claims 11 to 13, wherein the section matching means is configured to:
- perform an online matching query in a search index library according to the section identification information, to obtain the one or more candidate document sections.
15. The section matching device according to claim 14, wherein claim 14 includes the section matching device according to claim 13, and wherein the device further comprises:
an updating means, configured to establish or update the section information library according to the one or more candidate document sections obtained by the online matching query.
16. The section matching device according to any one of claims 11 to 15, wherein the device further comprises:
an extraction means, configured to perform auxiliary identification extraction processing on the section title information of the target document section, to obtain section auxiliary information of the target document section;
wherein the section matching means is configured to:
- perform a matching query according to the section identification information and the section auxiliary information, to obtain the one or more candidate document sections;
wherein the section auxiliary information comprises at least one of the following:
- title sequence number information corresponding to the section title information;
- title suffix information corresponding to the section title information.
17. The section matching device according to any one of claims 11 to 16, wherein the section acquisition means is configured to:
- acquire the target document section corresponding to a user's page access request;
wherein the device further comprises:
a providing means, configured to provide the one or more candidate document sections to the user.
18. The section matching device according to claim 17, wherein the device further comprises:
a matching degree acquisition means, configured to acquire the matching degree of the candidate document sections relative to the target document section;
wherein the providing means is configured to:
- provide the one or more candidate document sections to the user according to the matching degree.
19. The section matching device according to claim 18, wherein the providing means is configured to:
- when the matching degree of at least one of the one or more candidate document sections is greater than a predetermined matching degree threshold, provide the user with the section content corresponding to that at least one candidate document section;
- otherwise, provide the user with the summary information corresponding to the one or more candidate document sections.
20. The section matching device according to any one of claims 17 to 19, wherein the section acquisition means is configured to:
- acquire, according to a preset trigger rule, the to-be-accessed section corresponding to the user's page access request as the target document section;
wherein the preset trigger rule obtains the to-be-accessed section as the target document section based on at least one of the following:
- the to-be-accessed section is an empty chapter;
- the to-be-accessed section is a picture chapter;
- the link corresponding to the to-be-accessed section is a dead link.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110243486A CN102314492A (en) 2011-08-22 2011-08-22 Method and equipment for acquiring candidate document sections matched with target document section

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110243486A CN102314492A (en) 2011-08-22 2011-08-22 Method and equipment for acquiring candidate document sections matched with target document section

Publications (1)

Publication Number Publication Date
CN102314492A (en) 2012-01-11

Family

ID=45427657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110243486A Pending CN102314492A (en) 2011-08-22 2011-08-22 Method and equipment for acquiring candidate document sections matched with target document section

Country Status (1)

Country Link
CN (1) CN102314492A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544172A (en) * 2012-07-13 2014-01-29 深圳市世纪光速信息技术有限公司 Method and device for processing chapter catalogs of E-book
CN104346186A (en) * 2013-08-02 2015-02-11 腾讯科技(深圳)有限公司 Method and device for off-line reading of network books
CN104572620A (en) * 2014-12-31 2015-04-29 百度在线网络技术(北京)有限公司 Method and device for displaying chapter content
CN107256209A (en) * 2017-06-16 2017-10-17 江苏经贸职业技术学院 A kind of document exchange method
CN107256211A (en) * 2017-06-16 2017-10-17 江苏经贸职业技术学院 A kind of document exchange method
CN107291670A (en) * 2017-06-16 2017-10-24 江苏经贸职业技术学院 A kind of document exchange method
CN108319688A (en) * 2018-02-01 2018-07-24 上海掌门科技有限公司 A kind of method and apparatus for user read prompting
CN108681603A (en) * 2018-05-22 2018-10-19 福建天泉教育科技有限公司 The method of fast search tree structure data, storage medium in database
CN110781287A (en) * 2019-09-02 2020-02-11 上海连尚网络科技有限公司 Method and equipment for providing electronic books
CN110781269A (en) * 2019-09-29 2020-02-11 上海连尚网络科技有限公司 Method and equipment for searching books in reading application
CN112818111A (en) * 2021-01-28 2021-05-18 北京百度网讯科技有限公司 Document recommendation method and device, electronic equipment and medium
CN114925102A (en) * 2022-06-29 2022-08-19 抖音视界(北京)有限公司 Book content acquisition method and device, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101089843A (en) * 2006-06-15 2007-12-19 王刘忠 Search method only for product or service supply information
CN101441635A (en) * 2007-11-21 2009-05-27 周磊 Abstract method

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544172A (en) * 2012-07-13 2014-01-29 深圳市世纪光速信息技术有限公司 Method and device for processing chapter catalogs of E-book
CN103544172B (en) * 2012-07-13 2019-01-29 深圳市世纪光速信息技术有限公司 A kind of chapters and sections catalogue processing method and processing device of e-book
CN104346186A (en) * 2013-08-02 2015-02-11 腾讯科技(深圳)有限公司 Method and device for off-line reading of network books
CN104572620A (en) * 2014-12-31 2015-04-29 百度在线网络技术(北京)有限公司 Method and device for displaying chapter content
CN107256211A (en) * 2017-06-16 2017-10-17 江苏经贸职业技术学院 A kind of document exchange method
CN107291670A (en) * 2017-06-16 2017-10-24 江苏经贸职业技术学院 A kind of document exchange method
CN107256209A (en) * 2017-06-16 2017-10-17 江苏经贸职业技术学院 A kind of document exchange method
CN108319688A (en) * 2018-02-01 2018-07-24 上海掌门科技有限公司 A kind of method and apparatus for user read prompting
CN108319688B (en) * 2018-02-01 2021-11-23 上海掌门科技有限公司 Method and equipment for reading reminding of user
CN108681603A (en) * 2018-05-22 2018-10-19 福建天泉教育科技有限公司 The method of fast search tree structure data, storage medium in database
CN110781287A (en) * 2019-09-02 2020-02-11 上海连尚网络科技有限公司 Method and equipment for providing electronic books
CN110781287B (en) * 2019-09-02 2022-12-30 上海连尚网络科技有限公司 Method and equipment for providing electronic books
CN110781269A (en) * 2019-09-29 2020-02-11 上海连尚网络科技有限公司 Method and equipment for searching books in reading application
CN112818111A (en) * 2021-01-28 2021-05-18 北京百度网讯科技有限公司 Document recommendation method and device, electronic equipment and medium
CN112818111B (en) * 2021-01-28 2023-07-25 北京百度网讯科技有限公司 Document recommendation method, device, electronic equipment and medium
CN114925102A (en) * 2022-06-29 2022-08-19 抖音视界(北京)有限公司 Book content acquisition method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN102314492A (en) Method and equipment for acquiring candidate document sections matched with target document section
US11797626B2 (en) Search result filters from resource content
CN103491205B (en) The method for pushing of a kind of correlated resources address based on video search and device
CN103544176A (en) Method and device for generating page structure template corresponding to multiple pages
CN102346778B (en) Method and equipment for providing searching result
CN108280114B (en) Deep learning-based user literature reading interest analysis method
CN104699737A (en) Method and system for managing a search
US20130219255A1 (en) Authorized Syndicated Descriptions of Linked Web Content Displayed With Links in User-Generated Content
CN104750754A (en) Website industry classification method and server
CN101986293A (en) Method and equipment for displaying search answer information on search interface
CN102567290B (en) Method, device and equipment for expanding short text to be processed
CN103631794A (en) Method, device and equipment for sorting search results
CN102737021B (en) Search engine and realization method thereof
CN103049495A (en) Method, device and equipment for providing searching advice corresponding to inquiring sequence
US20110208715A1 (en) Automatically mining intents of a group of queries
CN102722498A (en) Search engine and implementation method thereof
CN104221017A (en) Finding data in connected corpuses using examples
CN101986306A (en) Method and equipment for acquiring yellow page information based on query sequence
CN102236710A (en) Method and equipment for displaying news information in query result
CN103207900B (en) Position-based information provides the method and apparatus of inquiry solicited message to targeted customer
CN102855245A (en) Image similarity determining method and image similarity determining equipment
CN109214417A (en) The method for digging and device, computer equipment and readable medium that user is intended to
CN103914488A (en) Document collection, identification, association, search and display system
CN102609539A (en) Search method and search system
US20170235835A1 (en) Information identification and extraction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20120111