Embodiment
The present invention comprises methods and systems for determining the meaning of a document in order to match the document to content. Reference will now be made in detail to exemplary embodiments of the invention as described in the text and illustrated in the accompanying drawings. The same reference numbers are used throughout the drawings and the following description to refer to the same or like parts.
Various systems may be constructed in accordance with the present invention. Fig. 1 is a diagram of an exemplary system in which exemplary embodiments of the present invention may operate. The present invention may also operate in, and be embodied in, other systems.
The system 100 shown in Fig. 1 includes multiple client devices 102a-n, server devices 104, 140, and a network 106. The network 106 shown includes the Internet. In other embodiments, other networks, such as an intranet, may be used. Moreover, methods according to the present invention may operate on a single computer. The client devices 102a-n shown each include a computer-readable medium, such as a random access memory (RAM) 108, which in the embodiment shown is coupled to a processor 110. The processor 110 executes a set of computer-executable program instructions stored in memory 108. Such processors may include a microprocessor, an ASIC, and state machines. Such processors include, or may be in communication with, media (for example, computer-readable media) that store instructions which, when executed by the processor, cause the processor to perform the steps described herein. Embodiments of computer-readable media include, but are not limited to, an electronic, optical, magnetic, or other storage or transmission device capable of providing computer-readable instructions to a processor, such as a processor in communication with a touch-sensitive input device. Other examples of suitable media include, but are not limited to, a floppy disk, CD-ROM, magnetic disk, memory chip, ROM, RAM, an ASIC, a configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read instructions. Also, various other forms of computer-readable media may transmit or carry instructions to a computer, including a router, a private or public network, or another transmission device or channel, both wired and wireless. The instructions may comprise code written in any computer programming language, including, for example, C, C++, C#, Visual Basic, Java, and JavaScript.
Client devices 102a-n may also include a number of external or internal devices, such as a mouse, a CD-ROM, a keyboard, a display, or other input or output devices. Examples of client devices 102a-n are personal computers, digital assistants, personal digital assistants, cellular phones, mobile phones, smart phones, pagers, digital tablets, laptop computers, processor-based devices, and similar types of systems and devices. In general, a client device 102a-n may be any type of processor-based platform that is connected to the network 106 and that interacts with one or more application programs. The client devices 102a-n shown include personal computers executing a browser application program, such as Internet Explorer™ version 6.0 from Microsoft, Netscape Navigator™ version 7.1 from Netscape Communications, and Safari™ version 1.0 from Apple. Through the client devices 102a-n, users 112a-n can communicate over the network 106 with each other and with other systems and devices coupled to the network 106.
As shown in Fig. 1, server devices 104, 140 are also coupled to the network 106. The document server device 104 shown includes a server executing a document engine application program. The content server device 140 shown includes a server executing a content engine application program. The system 100 may also include multiple other server devices. Similar to the client devices 102a-n, the server devices 104, 140 shown each include a processor 116, 142 coupled to a computer-readable memory 118, 144. Each server device 104, 140 is depicted as a single computer system, but may be implemented as a network of computer processors. Examples of server devices 104, 140 are servers, mainframe computers, networked computers, processor-based devices, and similar types of systems and devices. The client processor 110 and the server processors 116, 142 can be any of a number of well-known computer processors, such as processors from Intel Corporation of Santa Clara, California and Motorola, Inc. of Schaumburg, Illinois.
The memory 118 of the document server device 104 contains a document engine application program, also known as the document engine 124. The document engine 124 determines the meaning of a source article and matches the source article to an item, for example, another article or a knowledge item. An item may be content itself or may be associated with content. Source articles may be retrieved from other devices connected to the network 106. Articles include documents, for example, web pages of various formats, such as HTML, XML, and XHTML, Portable Document Format (PDF) files, and word processor, database, and application program document files, as well as audio, video, or any other information of any type made available on a network (such as the Internet), a personal computer, or another computing or storage device. The embodiments described herein are generally described in relation to documents, but embodiments may operate on any type of article. A knowledge item is anything, physical or non-physical, that can be represented by symbols, for example, keywords, nodes, categories, people, concepts, products, phrases, documents, and other units of knowledge. A knowledge item can take any form, for example, a single word, a term, a phrase, a document, or some other structured or unstructured information. The embodiments described herein are generally described in relation to keywords, but embodiments may operate on any type of knowledge item.
The document engine 124 shown includes a preprocessor 134, a meaning processor 136, and a matching processor 137. In the embodiment shown, each comprises computer code residing in the memory 118. The document engine 124 receives a request for content to be placed on a source document. The request may be received from a device connected to the network 106. The content can include documents, such as web pages and advertisements, and knowledge items, such as keywords. The preprocessor 134 receives the source document and analyzes it to determine the concepts contained in the document and the regions within the document. A concept can be defined by a cluster, or set, of words or terms associated with it, where the words or terms may be, for example, synonyms. A concept can also be defined by various other information, for example, relationships to related concepts, the strength of those relationships, part of speech, common usage, frequency of usage, the breadth of the concept, and other statistics about the concept's usage in language. The meaning processor 136 analyzes the concepts and regions to eliminate regions unrelated to the main concepts of the source document. The meaning processor 136 then determines a source meaning for the source document from the remaining regions. The matching processor 137 matches the source meaning of the source document with the meaning of an item from a set of items.
The memory 144 of the content server device 140 contains a content engine application program, also known as the content engine 146. In the embodiment shown, the content engine comprises computer code residing in the memory 144. The content engine 146 receives a matched item from the document server device 104 and places the item, or content associated with the item, on the source document. In one embodiment, the content engine 146 receives a matched keyword from the matching processor 137 and associates a document, such as an advertisement, with it. The advertisement is then sent to the requester's website and placed on the source document, for example in a frame on a web page.
The document server device 104 also provides access to other storage elements, such as a meaning storage element, in the example shown a meaning database 120. The meaning database can be used to store meanings associated with source documents. The content server device 140 also provides access to other storage elements, such as a content storage element, in the example shown a content database 148. The content database can be used to store items and content associated with the items, for example, keywords and associated advertisements. Data storage elements may include any one or a combination of methods for storing data, including, without limitation, arrays, hash tables, lists, and pairs. Other similar types of data storage devices can be accessed by the server devices 104 and 140.
It should be noted that the present invention may comprise systems having an architecture different from that shown in Fig. 1. For example, in some systems according to the present invention, the preprocessor 134 and the meaning processor 136 may not be part of the document engine 124 and may carry out their operations offline. In one embodiment, the meaning of a document is determined periodically, as the document engine crawls documents such as web pages. In another embodiment, the meaning of a document is determined when a request for content to be placed on the document is received. The system 100 shown in Fig. 1 is merely exemplary and is used to explain the exemplary methods shown in Figs. 2-3.
In the exemplary embodiment shown in Fig. 1, a user 112a can access a document on a device connected to the network 106, such as a web page on a website. For example, the user 112a may access a web page on a news website containing a story about fly fishing for salmon in Washington. In this example, the web page contains four regions: a title section, containing the title of the story, the author, and a one-sentence summary of the story; a main story section, containing the text and pictures of the story; a banner advertisement relating to selling automobiles; and a link section, containing links to other web pages on the website, such as national news, weather, and sports. The owner of the news website may wish to sell advertising space on the source web page and therefore sends a request, via the network 106, to the document server 104 to have items, such as advertisements, displayed on the web page.
In order to match the source web page with items, the meaning of the source web page is first determined. The document engine 124 accesses the source web page and may receive the web page. A source meaning for the web page may have been determined previously and stored in the meaning database 120. If the source meaning has been determined previously, the document engine 124 retrieves the source meaning.
If a source meaning for the web page has not yet been determined, the preprocessor 134 first identifies the concepts contained in the web page and the regions contained in the web page. For example, the preprocessor may determine that the web page has four regions, corresponding to the title region, the story region, the banner region, and the link region, and that the web page contains concepts relating to salmon, fly fishing, Washington, automobiles, news, weather, and sports. The regions need not correspond to frames on the web page. The meaning processor 136 then determines the local concepts for each region and ranks all of the local concepts. A variety of weighting factors can be used to rank the concepts, for example, the importance of the region, the importance of the concept, the frequency of the concept, the number of regions in which the concept appears, and the breadth of the concept.
The meaning processor 136 then identifies the regions that are unrelated to the majority of the concepts and removes the local concepts associated with them. In this example, the banner region and the link region do not contain concepts particularly relevant to the story, and the concepts associated with these regions are therefore removed. The meaning processor then determines the source meaning based on the remaining concepts. The meaning can be a vector of weighted concepts. For example, the meaning may be salmon (40%), fly fishing (40%), and Washington (20%).
This meaning can be matched to an item by the matching processor 137. Items can include documents, such as web pages and advertisements, and knowledge items, such as keywords, and can be received from the content server device 140. Items can be stored in the content database 148. For example, if the items are keywords, such as fly fishing, backpacking, CDs, and travel, the matching processor compares the source meaning with the meanings associated with the keywords to determine a match. Biasing factors, such as cost-per-click data associated with each keyword, can be used. For example, if the meaning of the keyword fly fishing is a closer match than the meaning of the keyword travel, but the advertiser that currently has bought the keyword travel has a higher cost per click, the matching processor may match the source meaning with the keyword travel. Content filters can also be used to filter out adult content or sensitive content.
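By way of illustration only, the biased matching described above can be sketched as follows. The text does not specify how semantic closeness and cost per click are combined, so the cosine similarity, the linear blend, the parameter bias_weight, the function names, and the sample keyword meanings and prices are all assumptions made for this sketch.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_keyword(source_meaning, keywords, bias_weight=0.5):
    """Pick the keyword with the highest blended score.

    keywords maps a keyword to (meaning vector, cost per click).
    bias_weight is a hypothetical knob: 0 ignores cost per click,
    1 ignores the semantic match.
    """
    max_cpc = max(cpc for _, cpc in keywords.values()) or 1.0

    def score(item):
        meaning, cpc = item[1]
        closeness = cosine(source_meaning, meaning)
        return (1 - bias_weight) * closeness + bias_weight * cpc / max_cpc

    return max(keywords.items(), key=score)[0]

# Source meaning from the example in the text.
source = {"salmon": 0.4, "fly fishing": 0.4, "Washington": 0.2}
# Keyword meanings and prices are invented for illustration.
keywords = {
    "fly fishing": ({"fly fishing": 1.0}, 0.10),
    "travel": ({"Washington": 0.5, "travel": 0.5}, 1.00),
}
unbiased = best_keyword(source, keywords, bias_weight=0.0)
biased = best_keyword(source, keywords, bias_weight=0.5)
```

With these sample values, the unbiased choice is the semantically closer keyword fly fishing, while the blended score selects travel because of its higher cost per click, mirroring the example in the text.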
The matched keyword can be received by the content server device 140. The content engine 146 associates an advertisement with the matched keyword and displays the advertisement on the source web page. For example, if the keyword travel has been matched, the content engine displays an advertisement associated with the keyword travel on the source web page containing the story about fly fishing for salmon in Washington. If the user 112a points his or her input device at the advertisement and clicks on it, the user can be directed to a web page associated with the advertisement.
Various methods may be carried out in accordance with the present invention. One exemplary method according to the present invention comprises accessing a source article; identifying a plurality of regions in the source article; determining at least one local concept associated with each region; analyzing the local concepts of each region to identify any unrelated regions; removing the local concepts associated with any unrelated regions to determine the relevant concepts; analyzing the relevant concepts to determine a source meaning for the source article; and matching the source meaning with an item meaning, the item meaning being associated with an item from a set of items. Biasing factors can be used in matching the source meaning with the item meaning. The source meaning can be a vector of weighted concepts.
In some embodiments, the method further comprises displaying the matched item on the source article. In these embodiments, the source article can be a web page and the matched item can be a keyword. Alternatively, the source article can be a web page and the matched item can be an advertisement.
In some embodiments, the method further comprises displaying content associated with the matched item on the source article. In these embodiments, the source article can be a web page, the matched item can be a keyword, and the associated content can be an advertisement. Further, the source article can be a first web page, the matched item can be a second web page, and the associated content can be an advertisement. Alternatively, the source article can be a first web page, the matched item can be a second web page, and the associated content can be a link to the second web page.
In some embodiments, determining at least one local concept involves determining a score for each local concept in each region. The local concept with the highest score in each region is the most relevant local concept of that region. Further, identifying unrelated regions involves first determining a revised score for each local concept. Next, a ranked global list containing all of the local concepts is determined based on the revised scores. Local concepts whose combined revised scores contribute less than a predetermined amount of the total score of the global list are removed to produce a resulting list. Unrelated regions, that is, regions whose most relevant local concepts do not appear on the resulting list, are then determined. The local concepts associated with the unrelated regions are then removed from the resulting list to produce a list of relevant concepts. The source meaning is determined by normalizing the revised scores of the relevant concepts.
Another exemplary method according to the present invention comprises accessing a source article; identifying at least a first content region and a second content region in the source article; determining at least a first local concept associated with the first content region and at least a second local concept associated with the second content region; matching the first content region with a first item from a set of items based at least in part on the first local concept; and matching the second content region with a second item from the set of items based at least in part on the second local concept.
Figs. 2 and 3 illustrate in detail an exemplary method 200 according to the present invention. Because there are a variety of ways to carry out methods according to the present invention, the exemplary method is provided here by way of example. The method 200 shown in Fig. 2 can be executed, or otherwise performed, by any of a variety of systems. The method 200 is described below, by way of example, as carried out by the system 100 shown in Fig. 1, and the various elements of the system 100 are referenced in explaining the example method of Figs. 2 and 3. The method 200 shown provides a determination of the meaning of a source document in order to match the source document with an item.
Each block shown in Figs. 2 and 3 represents one or more steps carried out in the exemplary method 200. Referring to Fig. 2, the example method 200 begins in block 202. Block 202 is followed by block 204, in which a document is accessed. The document can, for example, be accessed and received from a device on the network 106 or from another source.
Block 204 is followed by block 206, in which the meaning of the source document is determined. In the embodiment shown, the meaning of the source document is determined by dividing the document into regions, removing unhelpful regions, and analyzing the concepts contained in the remaining regions of the document. For example, in the embodiment shown, the preprocessor 134 initially determines the concepts contained in the source document and determines the regions in the document. The meaning processor 136 ranks the concepts and removes the regions, and the associated concepts, that are unrelated to the majority of the concepts. From the remaining concepts, the meaning processor 136 determines the source meaning of the document.
Fig. 3 illustrates the subroutine 206 for carrying out the method 200 shown in Fig. 2. The subroutine 206 provides a meaning for the received source document. One example of the subroutine is as follows.
The subroutine begins at block 300. At block 300, the source document is preprocessed to determine the concepts contained in the document. This can be accomplished by natural language and text processing that decomposes the document into words and then aligns the words with concepts. In one embodiment, for example, tokens corresponding to words are first determined by natural language and text processing, and these tokens are then matched against the tokens contained in a semantic network of interconnected meanings. From the matched tokens, terms are then determined from the semantic network. Concepts for the determined terms are then assigned and given a probability of being related to the terms.
Block 300 is followed by block 302, in which the regions of the document are identified. The regions of the document can be determined, for example, based on certain heuristics, including formatting information. For example, for a source document that is a web page containing HTML tags, the tags can be used to help identify regions. For example, text within <title>...</title> tags can be marked as text belonging to a title region. Text in a paragraph in which more than 70% of the text lies within <a>...</a> tags can be marked as belonging to a link region. The structure of the text can also be used to help identify regions. For example, text in short paragraphs or table columns that has no sentence structure, for example, no verbs, few words, or no sentence-ending punctuation, can be marked as belonging to a list region. Text in long sentences that have verbs and punctuation can be marked as part of a text region. When the region type changes, a new region can be created beginning with the text marked with the new type. In one embodiment, a text region that takes up more than 20% of the document can be broken up into smaller pieces.
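By way of illustration only, two of the heuristics described above (title tags and the 70% link-text rule) can be sketched with the HTML parser from the Python standard library. The class name RegionParser, the function identify_regions, and the sample page are assumptions of this sketch. It handles only well-formed <title> and <p> blocks with explicit closing tags, and it omits the list-region heuristic and the splitting of large text regions.

```python
from html.parser import HTMLParser

class RegionParser(HTMLParser):
    """Labels <title> and <p> blocks as 'title', 'link', or 'text'."""

    def __init__(self):
        super().__init__()
        self.regions = []       # (region type, text) per block
        self._block = None      # 'title' or 'p' while inside a block
        self._chunks = []       # text pieces of the current block
        self._link_chars = 0    # characters inside <a> in this block
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._block, self._chunks, self._link_chars = tag, [], 0
        elif tag == "a":
            self._in_link = True

    def handle_data(self, data):
        if self._block:
            self._chunks.append(data)
            if self._in_link:
                self._link_chars += len(data)

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == self._block:
            text = "".join(self._chunks).strip()
            total = sum(len(chunk) for chunk in self._chunks)
            if tag == "title":
                kind = "title"
            elif total and self._link_chars / total > 0.7:
                kind = "link"   # more than 70% of the text is link text
            else:
                kind = "text"
            if text:
                self.regions.append((kind, text))
            self._block = None

def identify_regions(html):
    """Merge consecutive blocks of the same type into one region."""
    parser = RegionParser()
    parser.feed(html)
    parser.close()
    merged = []
    for kind, text in parser.regions:
        if merged and merged[-1][0] == kind:
            merged[-1] = (kind, merged[-1][1] + " " + text)
        else:
            merged.append((kind, text))   # type changed: new region
    return merged

page = (
    "<html><head><title>Fly fishing for salmon</title></head><body>"
    "<p>Anglers in Washington caught salmon this week.</p>"
    "<p>The season runs through October.</p>"
    "<p><a href='/n'>National news</a> <a href='/w'>Weather</a></p>"
    "</body></html>"
)
regions = identify_regions(page)
```

For the sample page, the two story paragraphs merge into a single text region, and the paragraph consisting almost entirely of anchors is labeled a link region.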
Block 302 is followed by block 304, in which the relevant concepts for each region are determined. In the embodiment shown, the meaning processor 136 processes the concepts identified for each region to come up with a smaller set of local concepts for each region. The relationships between concepts, the frequency with which a concept appears in a region, and the breadth of a concept can all be used in determining the local concepts.
In one embodiment, for each region, every concept is placed in a list. The concepts are ranked in the list by determining a score for each concept using a variety of factors. For example, if a first concept has strong relationships to other concepts, this can be used to boost the score of the first concept and of its related concepts. This effect is tempered by the frequency with which the first concept occurs and by the focus (or breadth) of the first concept, in order to diminish very common concepts and concepts with a broader meaning. Concepts whose frequency is above a certain threshold can be filtered out. The perceived importance of a concept can also affect its score. For example, the importance of a concept can be determined earlier in processing according to whether the words that gave rise to the concept were marked in bold. After the concepts in each region are ranked, the least relevant concepts are removed. This can be done by selecting a set number of the highest-ranked concepts or by removing the concepts whose ranking score falls below a certain score.
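By way of illustration only, one possible scoring of the concepts within a single region is sketched below. The text names the factors but not the formula, so the multiplicative form, the bold boost of 1.5, the frequency threshold, and every name and sample value here are assumptions. Frequency and breadth are modeled as values between 0 and 1.

```python
from dataclasses import dataclass

@dataclass
class Concept:
    name: str
    frequency: float    # assumed scale: 0 (rare) to 1 (very common)
    breadth: float      # assumed scale: 0 (focused) to 1 (very broad)
    bold: bool = False  # whether the originating words were in bold

def rank_region_concepts(concepts, relations, keep=2, max_frequency=0.9):
    """Score, rank, and prune the concepts of one region.

    relations maps a pair of concept names to a relationship strength.
    """
    names = {c.name for c in concepts}
    scored = []
    for c in concepts:
        if c.frequency > max_frequency:
            continue  # filter out overly common concepts
        # Relationship strength to other concepts in the same region.
        support = sum(
            strength
            for (a, b), strength in relations.items()
            if (a == c.name and b in names) or (b == c.name and a in names)
        )
        # Related concepts boost the score; common and broad ones shrink it.
        score = (1.0 + support) * (1.0 - c.frequency) * (1.0 - c.breadth)
        if c.bold:
            score *= 1.5  # hypothetical boost for perceived importance
        scored.append((c.name, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:keep]  # keep only the highest-ranked concepts

story_concepts = [
    Concept("salmon", frequency=0.2, breadth=0.2, bold=True),
    Concept("fly fishing", frequency=0.1, breadth=0.1),
    Concept("news", frequency=0.95, breadth=0.8),
    Concept("weather", frequency=0.6, breadth=0.7),
]
story_relations = {("salmon", "fly fishing"): 0.8}
ranked = rank_region_concepts(story_concepts, story_relations)
```

In this sample, news is discarded by the frequency threshold, weather scores low because it is both common and broad, and the two mutually related concepts are retained as the local concepts of the region.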
Block 304 is followed by block 306, in which all of the local concepts of each region are combined and analyzed. In the embodiment shown, the meaning processor 136 receives all of the local concepts for each region and creates a ranked global list of all local concepts, ordered, for example, by the score of each local concept. Biasing factors, such as the importance of each region, can be used in determining the scores. The importance of each region can be determined from the type of the region and the size of the region. For example, a title region can be considered more important than a link region, and concepts appearing in the title region can be given more weight than concepts appearing in the link region. Concepts that appear in more than one region can be given additional weight. For example, duplicate concepts can be merged and their scores added together. The global list is then pruned; for example, the trailing concepts that together contribute less than 20% of the total score can be removed, to produce a resulting global list of local concepts.
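By way of illustration only, the merging and pruning described above can be sketched as follows. The region weights, the local scores, and the reading of the 20% rule (trailing concepts are dropped for as long as their combined score stays below 20% of the total) are assumptions of this sketch.

```python
def merge_local_concepts(regions, region_weight, tail_fraction=0.2):
    """Build the resulting global list from per-region local concepts.

    regions is a list of (region type, {concept: local score}).
    region_weight maps a region type to an importance weight.
    """
    merged = {}
    for kind, local in regions:
        weight = region_weight.get(kind, 1.0)
        for concept, score in local.items():
            # Duplicate concepts are merged and their scores added.
            merged[concept] = merged.get(concept, 0.0) + weight * score

    ranked = sorted(merged.items(), key=lambda pair: pair[1], reverse=True)
    total = sum(score for _, score in ranked)

    # Drop trailing concepts while their combined contribution stays
    # below tail_fraction of the total score.
    tail = 0.0
    while ranked and tail + ranked[-1][1] < tail_fraction * total:
        tail += ranked.pop()[1]
    return ranked

# Local scores and weights are invented for illustration.
page_regions = [
    ("title", {"salmon": 1.0, "fly fishing": 1.0, "Washington": 0.5}),
    ("text", {"salmon": 1.0, "fly fishing": 0.8, "Washington": 0.6}),
    ("banner", {"automobiles": 0.5}),
    ("link", {"news": 0.3, "weather": 0.3}),
]
weights = {"title": 2.0, "text": 1.0, "banner": 0.5, "link": 0.5}
global_list = merge_local_concepts(page_regions, weights)
```

With these sample values, the banner and link concepts together contribute well under 20% of the total score and are pruned, leaving salmon, fly fishing, and Washington on the resulting global list.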
Block 306 is followed by block 308, in which the regions whose main concepts are unrelated to the majority of the concepts are removed. In the embodiment shown, the meaning processor 136 determines the unrelated regions, that is, the regions containing concepts that are unrelated to the majority of the concepts, and removes them. It should be understood that "related" and "unrelated" need not be determined by absolute criteria. "Related" is an indication of a relatively high degree of relationship and/or a predetermined degree of relationship. "Unrelated" is an indication of a relatively low degree of relationship and/or a predetermined degree of relationship. By removing the unrelated regions, the associated unrelated concepts are removed. For example, if the source document is a web page made up of various frames, some of the frames will relate to advertisements or to links to other pages of the website and will therefore be unrelated to the main meaning of the web page.
In one embodiment, for example, the resulting global list determined in block 306 can serve as an approximation of the meaning of the document and can be used to remove the regions that are unrelated to that meaning. For each region, the meaning processor 136 can determine whether the most representative local concepts of the region are absent from the resulting global list. If the most representative local concepts of a region are not on the resulting global list, the region can be marked as irrelevant. The most representative local concepts of a region can be, for example, the concepts with the highest scores for the region as determined in block 304.
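By way of illustration only, the test described above can be sketched as follows, taking the single highest-scoring local concept as the most representative concept of each region. The function name and sample data are assumptions of this sketch.

```python
def split_regions(regions, global_concepts):
    """Separate relevant regions from irrelevant ones.

    regions is a list of (region type, {concept: local score}).
    global_concepts is the set of concepts on the resulting global list.
    """
    relevant, irrelevant = [], []
    for kind, local in regions:
        # Most representative local concept: the highest local score.
        top = max(local, key=local.get) if local else None
        if top in global_concepts:
            relevant.append(kind)
        else:
            irrelevant.append(kind)  # top concept missing from the list
    return relevant, irrelevant

# Sample data invented for illustration.
sample_regions = [
    ("title", {"salmon": 1.0, "fly fishing": 1.0, "Washington": 0.5}),
    ("text", {"salmon": 1.0, "fly fishing": 0.8, "Washington": 0.6}),
    ("banner", {"automobiles": 0.5}),
    ("link", {"news": 0.3, "weather": 0.3}),
]
resulting_list = {"salmon", "fly fishing", "Washington"}
kept, dropped = split_regions(sample_regions, resulting_list)
```

Here the banner and link regions are marked as irrelevant because their top local concepts do not appear on the resulting global list.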
Block 308 is followed by block 310, in which the meaning of the source document is determined. In the embodiment shown, the meaning processor 136 recalculates the representativeness of the local concepts of the regions that were not removed, to create a relevant list of concepts. A fixed number of concepts can be selected from the local concepts on the relevant list to provide a meaning list, which is then normalized to provide the source meaning. For example, the meaning list can be created using only the concepts contained in the relevant regions, and all but the 25 highest-scoring concepts can be removed from the new list. The scores of the highest-scoring concepts can be normalized to provide the source meaning. In this example, the source meaning can be a weighted vector of relevant concepts.
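By way of illustration only, the truncation and normalization described above can be sketched as follows. Normalizing so that the weights sum to one is an assumption of this sketch, chosen to agree with the percentages used in the salmon example; the function name and the input scores are likewise hypothetical.

```python
def source_meaning(scores, top_n=25):
    """Keep the top_n highest-scoring concepts and normalize them.

    scores maps each relevant concept to its recalculated score.
    Returns a weighted concept vector whose weights sum to 1.
    """
    top = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:top_n]
    total = sum(score for _, score in top)
    if not total:
        return {}
    return {concept: score / total for concept, score in top}

# Scores invented so that the result matches the example in the text.
meaning = source_meaning({"salmon": 2.0, "fly fishing": 2.0, "Washington": 1.0})
```

For these inputs the resulting vector assigns 40% to salmon, 40% to fly fishing, and 20% to Washington.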
Referring again to Fig. 2, block 206 is followed by block 208, in which a set of items is received. The items can be received, for example, by the matching processor 137 from the content server device 140. The items can include knowledge items, such as keywords, and documents, such as advertisements and web pages. Each item received can have a meaning associated with it. The meaning of a keyword, for example, can be determined using information associated with the keyword, as described in related U.S. Patent Application Serial No. 10/690,328 (Attorney Docket No. 53051/288072), entitled "Methods and Systems for Understanding a Meaning of a Knowledge Item Using Information Associated with the Knowledge Item", which is incorporated herein by reference. The meaning of a document can be determined in the same manner as described with respect to Fig. 3.
Block 208 is followed by block 210, in which the source document is matched with an item. Biasing factors can be used in the matching process. For example, in one embodiment, the source meaning is matched with a keyword meaning associated with a keyword from a set of keywords. The matching processor compares the source meaning with the keyword meanings and uses biasing factors, such as cost-per-click data associated with the keywords, to determine a match. The matched keyword can then be sent to the content server device 140. The content engine 146 can match the matched keyword with its associated advertisement and display the advertisement on the source document. Alternatively, the content engine can display the keyword itself on the source document. In another embodiment, the meaning of an advertisement is matched with the source meaning. In this embodiment, the content engine 146 can cause the matched advertisement to be displayed on the source document. In another embodiment, the meaning of a web page is matched with the source meaning. In this embodiment, the content engine 146 can cause an advertisement associated with the web page to be displayed. Block 210 is followed by block 212, in which the method ends.
In one embodiment, after the source document is accessed, the preprocessor 134 analyzes the source document to determine the content regions of the source document. A content region can be a region containing a substantial amount of text, for example, a text region or a link region, or can be a region of relative importance, for example, a title region. These regions can be determined using the heuristics described above. As described above, the preprocessor 134 can also identify the concepts located in each content region. The meaning processor 136 can use these concepts to determine a meaning for each content region. The matching processor 137 can match the meaning of each content region with a keyword. The content engine 146 can match the matched keyword with its associated advertisement and display the advertisement on the source document. Alternatively, the content engine can display the keyword itself on the source document. In another embodiment, the meaning of an advertisement is matched with the region meaning. In this embodiment, the content engine 146 can cause the matched advertisement to be displayed on the source document. In another embodiment, the meaning of a web page is matched with the meaning of a region. In this embodiment, the content engine 146 can cause an advertisement associated with the web page to be displayed. In one embodiment, the advertisement or keyword is displayed in the content region with which it was matched.
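By way of illustration only, the per-region matching described above can be sketched as follows, with each content region matched independently to the keyword whose meaning overlaps it most. The dot-product overlap measure, the function name, and the sample meanings are assumptions of this sketch; biasing factors are omitted for brevity.

```python
def match_regions(region_meanings, keyword_meanings):
    """Match every content region to its closest keyword.

    Both arguments map a name to a weighted concept vector (dict).
    Returns {region name: best keyword}.
    """
    def overlap(a, b):
        # Dot product of two sparse concept vectors.
        return sum(weight * b.get(concept, 0.0) for concept, weight in a.items())

    return {
        region: max(
            keyword_meanings,
            key=lambda keyword: overlap(meaning, keyword_meanings[keyword]),
        )
        for region, meaning in region_meanings.items()
    }

# Sample meanings invented for illustration.
region_meanings = {
    "text": {"salmon": 0.6, "fly fishing": 0.4},
    "link": {"weather": 0.5, "news": 0.5},
}
keyword_meanings = {
    "fly fishing": {"fly fishing": 1.0},
    "forecast": {"weather": 1.0},
}
placements = match_regions(region_meanings, keyword_meanings)
```

Under these assumptions, the advertisement for the keyword fly fishing would be placed in the text region and the advertisement for the keyword forecast in the link region.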
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention; to those skilled in the art, the invention may be subject to various modifications and variations. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.