CN100470541C - Methods and systems for determining a meaning of a document to match the document to content - Google Patents


Info

Publication number
CN100470541C
Authority
CN
China
Prior art keywords
document
concept
region
meaning
items
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2004800219225A
Other languages
Chinese (zh)
Other versions
CN1829990A (en)
Inventor
Adam J. Weissman
Gilad Israel Elbaz
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC
Publication of CN1829990A
Application granted
Publication of CN100470541C
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Transfer Between Computers (AREA)
  • Machine Translation (AREA)

Abstract

Systems and methods for determining a meaning of a document to match the document to content are described. In one aspect, a source article is accessed, a plurality of regions in the source article are identified, at least one local concept associated with each region is determined, the local concepts of each region are analyzed to identify any unrelated regions, the local concepts associated with any unrelated regions are eliminated to determine relevant concepts, the relevant concepts are analyzed to determine a source meaning for the source article, and the source meaning is matched with an item meaning associated with an item from a set of items.

Description

Methods and systems for determining the meaning of a document in order to match the document with content
Technical field
The present invention relates generally to documents. More specifically, the present invention relates to systems and methods for determining the meaning of a document in order to match the document with content.
Background
Documents, such as web pages, can be matched with other content on the Internet. Documents include, for example, web pages in various formats such as HTML, XML, and XHTML; Portable Document Format (PDF) files; and word processor and application program document files.
One example of matching documents with content is Internet advertising. For example, a website publisher may allow advertisements to be placed on its web pages for a fee. When the publisher wishes to display advertisements to users on a web page, a service provider can supply the advertisements to be displayed. The service provider can select advertisements based on a variety of factors, such as demographic information about the user, the category of the web page (for example, sports or entertainment), or the content of the web page. The service provider can also match the web page content with knowledge items, such as keywords from a keyword list. Advertisements associated with the matched keywords can then be displayed on the web page. A user can operate a mouse or other input device to "click" an advertisement and view a web page on the advertiser's website that offers goods or services for sale.
In another example of Internet advertising, the actual matched keywords are displayed on the publisher's web page in a related-links or similar section. As in the example above, the content of the web page is matched with one or more keywords, which are then displayed, for example, in the related-links section. When the user clicks a particular keyword, the user can be directed to a search results page that may contain a mixture of advertisements and regular search results. Advertisers bid on keywords so that their advertisements appear on such keyword search results pages. A user can operate a mouse or other input device to "click" an advertisement and view a web page on the advertiser's website that offers goods and services for sale.
Advertisers want the content of a web page and their advertisements to be closely related, because a user reading a web page is more likely to click an advertisement and purchase the goods or services offered if the advertisement is highly relevant to what the user is reading. The publisher of the web page also wants the advertisement content to match the web page content, because the publisher is usually paid when a user clicks an advertisement; moreover, where sensitive content is concerned, a mismatch is undesirable to both advertiser and publisher.
A document, such as a web page, can contain a plurality of regions, for example the frames of a web page. Some regions may be unrelated to the main content of the document, so the content of those unrelated regions can dilute the content of the document as a whole with unrelated topics. Therefore, to match a document with content, only the relevant regions of the source document should be analyzed when determining its meaning.
Summary of the invention
Embodiments of the present invention include systems and methods for determining the meaning of a document in order to match the document with content. One aspect of an embodiment of the invention includes: accessing a source article; identifying a plurality of regions in the source article; determining at least one local concept associated with each region; analyzing the local concepts of each region to identify any unrelated regions; eliminating the local concepts associated with any unrelated regions to determine relevant concepts; analyzing the relevant concepts to determine a source meaning for the source article; and matching the source meaning with an item meaning associated with an item from a set of items. The items can themselves be content, or can be associated with content. In one embodiment, the invention further includes displaying the matched item on the source article. In another embodiment, the invention further includes displaying content associated with the item on the source article. Other aspects of the invention are directed to computer systems and computer-readable media having features relating to the foregoing aspect.
Brief description of the drawings
These and other features, aspects, and advantages of the present invention are better understood when the following detailed description is read with reference to the accompanying drawings, in which:
Fig. 1 is a block diagram of a system according to an embodiment of the present invention;
Fig. 2 is a flow diagram of a method according to an embodiment of the present invention; and
Fig. 3 is a flow diagram of a subroutine of the method shown in Fig. 2.
Detailed description
The present invention includes methods and systems for determining the meaning of a document in order to match the document with content. Reference will now be made in detail to exemplary embodiments of the invention, as described in the text and illustrated in the accompanying drawings. The same reference numbers are used throughout the drawings and the following description to refer to the same or like parts.
Various systems can be constructed in accordance with the present invention. Fig. 1 is a diagram of an exemplary system in which exemplary embodiments of the present invention can operate. The present invention can also operate in, and be implemented in, other systems.
The system 100 shown in Fig. 1 includes multiple client devices 102a-n, server devices 104, 140, and a network 106. The network 106 shown includes the Internet. In other embodiments, other networks, such as an intranet, can be used. Moreover, methods according to the present invention can operate on a single computer. The client devices 102a-n shown each include a computer-readable medium, which in the illustrated embodiment is a random access memory (RAM) 108 coupled to a processor 110. The processor 110 executes a set of computer-executable program instructions stored in the memory 108. Such processors can include a microprocessor, an ASIC, and state machines. Such processors include, or can be in communication with, media (for example, computer-readable media) that store instructions which, when executed by the processor, cause the processor to perform the steps described herein. Embodiments of computer-readable media include, but are not limited to, an electronic, optical, magnetic, or other storage or transmission device capable of providing a processor (such as a processor in communication with a touch-sensitive input device) with computer-readable instructions. Other examples of suitable media include, but are not limited to, a floppy disk, CD-ROM, magnetic disk, memory chip, ROM, RAM, an ASIC, a configured processor, all optical media, all magnetic tape or other magnetic media, or any other medium from which a computer processor can read instructions. Also, various other forms of computer-readable media can transmit or carry instructions to a computer, including a router, a private or public network, or another transmission device or channel, both wired and wireless. The instructions can comprise code written in any computer programming language, including, for example, C, C++, C#, Visual Basic, Java, and JavaScript.
The client devices 102a-n can also include a number of external or internal devices, such as a mouse, a CD-ROM drive, a keyboard, a display, or other input or output devices. Examples of client devices 102a-n are personal computers, digital assistants, personal digital assistants, cellular phones, mobile phones, smart phones, pagers, digital tablets, laptop computers, processor-based devices, and similar types of systems and devices. In general, a client device 102a-n can be any type of processor-based platform that is connected to the network 106 and that interacts with one or more application programs. The client devices 102a-n shown include personal computers executing a browser application program such as Microsoft Corporation's Internet Explorer™ version 6.0, Netscape Communications Corporation's Netscape Navigator™ version 7.1, or Apple's Safari™ version 1.0. Through the client devices 102a-n, users 112a-n can communicate over the network 106 with each other and with other systems and devices coupled to the network 106.
As shown in Fig. 1, server devices 104, 140 are also coupled to the network 106. The document server device 104 shown includes a server executing a document engine application program. The content server device 140 shown includes a server executing a content engine application program. The system 100 can also include multiple other server devices. Similar to the client devices 102a-n, the server devices 104, 140 shown each include a processor 116, 142 coupled to a computer-readable memory 118, 144. Each server device 104, 140 is depicted as a single computer system, but may be implemented as a network of computer processors. Examples of server devices 104, 140 are servers, mainframe computers, networked computers, processor-based devices, and similar types of systems and devices. The client processor 110 and the server processors 116, 142 can be any of a number of well-known computer processors, such as processors from Intel Corporation of Santa Clara, California, and Motorola Corporation of Schaumburg, Illinois.
The memory 118 of the document server device 104 contains a document engine application program, also known as the document engine 124. The document engine 124 determines the meaning of a source article and matches the source article with an item, such as another article or a knowledge item. Items can be content themselves or can be associated with content. The source article can be retrieved from other devices connected to the network 106. Articles include documents, for example, web pages of various formats such as HTML, XML, and XHTML; Portable Document Format (PDF) files; word processor, database, and application program document files; and audio, video, or any other type of information available on a network (such as the Internet), a personal computer, or other computing or storage devices. The embodiments described herein relate generally to documents, but embodiments can operate on any type of article. A knowledge item is anything physical or non-physical that can be represented by symbols, for example, keywords, nodes, categories, people, concepts, products, phrases, documents, and other units of knowledge. Knowledge items can take any form, for example, a single word, a term, a phrase, a document, or some other structured or unstructured information. The embodiments described herein relate generally to keywords, but embodiments can operate on any type of knowledge item.
The document engine 124 shown includes a preprocessor 134, a meaning processor 136, and a matching processor 137. In the illustrated embodiment, each comprises computer code residing in the memory 118. The document engine 124 receives a request for content to be placed on a source document. The request can be received from a device connected to the network 106. The content can include documents, such as web pages and advertisements, and knowledge items, such as keywords. The preprocessor 134 receives the source document and analyzes it to determine the concepts contained in the document and the regions of the document. A concept can be defined by the cluster, or set, of words or terms associated with it, where the words or terms can be, for example, synonyms. A concept can also be defined by various other information, such as its relationships to related concepts, the strength of those relationships, part of speech, common usage, frequency of usage, the breadth of the concept, and other statistics about the usage of the concept in language. The meaning processor 136 analyzes the concepts and regions to eliminate regions that are unrelated to the main concepts of the source document. The meaning processor 136 then determines the source meaning of the source document from the remaining regions. The matching processor 137 matches the source meaning of the source document with the meanings of items from a set of items.
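To make the notion of a concept more concrete, the following Python sketch models a concept as a cluster of terms plus a few of the kinds of usage information listed above, and looks up which concepts a given term belongs to. The `Concept` class, its field names, the `concepts_for_term` helper, and the toy lexicon entries are all hypothetical illustrations, not a schema disclosed in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept: a cluster of (often synonymous) terms plus usage statistics.

    Field names are hypothetical; the patent lists kinds of information, not a schema.
    """
    name: str
    terms: frozenset[str]  # the word/term cluster that defines the concept
    related: dict[str, float] = field(default_factory=dict)  # related concept -> relationship strength
    frequency: float = 0.0  # how common the concept is in the language (0..1)
    breadth: float = 0.0  # how broad or general the concept is (0..1)


def concepts_for_term(term: str, lexicon: list[Concept]) -> list[Concept]:
    """Return every concept whose term cluster contains `term` (case-insensitive)."""
    key = term.lower()
    return [c for c in lexicon if key in c.terms]


# Toy lexicon, invented for illustration.
LEXICON = [
    Concept("salmon", frozenset({"salmon", "chinook", "coho"}), related={"fly fishing": 0.8}),
    Concept("fly fishing", frozenset({"fly fishing", "fly-fishing"}), related={"salmon": 0.8}),
]
```

For example, `concepts_for_term("Coho", LEXICON)` returns only the `salmon` concept, because "coho" is one of the terms in its cluster.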
The memory 144 of the content server device 140 contains a content engine application program, also known as the content engine 146. In the illustrated embodiment, the content engine comprises computer code residing in the memory 144. The content engine 146 receives the matched item from the document server device 104 and places the item, or content associated with the item, in the source document. In one embodiment, the content engine 146 receives a matched keyword from the matching processor 137 and associates a document, such as an advertisement, with it. The advertisement is then sent to the requester's website and placed in the source document, for example in a frame on a web page.
The document server device 104 also provides access to other storage elements, such as a meaning storage element, which in this example is the meaning database 120. The meaning database can be used to store meanings associated with source documents. The content server device 140 also provides access to other storage elements, such as a content storage element, which in the illustrated embodiment is the content database 148. The content database can be used to store items and content associated with the items, such as keywords and their associated advertisements. Data storage elements can include any one or combination of methods for storing data, including without limitation arrays, hash tables, lists, and pairs. Other similar types of data storage devices can be accessed by the server devices 104 and 140.
It should be noted that the present invention can comprise systems having an architecture different from that shown in Fig. 1. For example, in some systems according to the present invention, the preprocessor 134 and the meaning processor 136 may not be part of the document engine 124 and may carry out their operations offline. In one embodiment, the meaning of a document is determined periodically as the document engine crawls documents, such as web pages. In another embodiment, the meaning of a document is determined when a request for content to be placed in the document is received. The system 100 shown in Fig. 1 is merely exemplary and is used to explain the exemplary methods shown in Figs. 2-3.
In the exemplary embodiment shown in Fig. 1, user 112a can access a document, such as a web page on a website, on a device connected to the network 106. For example, user 112a can access a web page on a news website that contains a story about fly fishing for salmon in Washington. In this example, the web page contains four regions: a title section containing the story's title, its author, and a one-line summary of the story; a main story section containing the story's text and pictures; a banner advertisement about selling cars; and a links section containing links to other web pages on the website, such as national news, weather, and sports. The owner of the news website may want to sell advertising space on the source web page, and therefore sends a request over the network 106 to the document server 104 for items, such as advertisements, to be displayed on the web page.
To match the source web page with items, the meaning of the source web page is determined first. The document engine 124 accesses the source web page and can receive it. The source meaning of the web page may have been determined previously and stored in the meaning database 120. If the source meaning has been determined previously, the document engine 124 retrieves it.
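A minimal sketch of this retrieve-or-compute behavior, assuming the meaning database 120 can be modeled as a plain Python dict keyed by page URL, and that a meaning is a dict from concept name to weight. `get_source_meaning` and its `compute` callback are hypothetical names; `compute` stands in for the full region and concept analysis described below.

```python
from typing import Callable

Meaning = dict[str, float]  # concept name -> weight


def get_source_meaning(url: str, meaning_db: dict[str, Meaning],
                       compute: Callable[[str], Meaning]) -> Meaning:
    """Return the stored meaning for url, computing and storing it only on a cache miss."""
    if url in meaning_db:  # determined previously: just retrieve it
        return meaning_db[url]
    meaning = compute(url)  # otherwise run the full region/concept analysis
    meaning_db[url] = meaning
    return meaning
```

Storing the result on a miss means later requests for the same page skip the expensive analysis, which matches the offline and on-request variants described for Fig. 1.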
If the source meaning of the web page has not yet been determined, the preprocessor 134 first identifies the concepts contained in the web page and the regions of the web page. For example, the preprocessor can determine that the web page has four regions, corresponding to the title region, the story region, the banner region, and the links region, and that the web page contains concepts about salmon, fly fishing, Washington, cars, news, weather, and sports. The regions need not correspond to frames on the web page. The meaning processor then determines the local concepts of each region and ranks all of the local concepts. A variety of weighting factors can be used to rank these concepts, for example, the importance of the region, the importance of the concept, the frequency of the concept, the number of regions in which the concept appears, and the breadth of the concept.
The meaning processor 136 then identifies the regions that are unrelated to the majority of the concepts and deletes the local concepts associated with them. In this example, the banner region and the links region do not contain concepts especially related to the story, so the concepts belonging to these regions are deleted. The meaning processor then determines the source meaning based on the remaining concepts. The meaning can be a vector of weighted concepts. For example, the meaning can be salmon (40%), fly fishing (40%), and Washington (20%).
This meaning can be matched with items by the matching processor 137. Items can include documents, such as web pages and advertisements, and knowledge items, such as keywords, and can be received from the content server device 140. Items can be stored in the content database 148. For example, if the items are keywords, such as "fly fishing", "backpacking", "CDs", and "travel", the matching processor compares the source meaning with the meanings associated with the keywords to determine a match. A biasing factor can be used, such as cost-per-click data associated with each keyword. For example, if the meaning of the keyword "fly fishing" is a closer match than the meaning of the keyword "travel", but the advertisers who have currently bought the keyword "travel" pay a higher cost per click, the source meaning can be matched with the keyword "travel". Content filters can also be used to filter out adult or sensitive content.
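The biased matching step could be sketched as follows, assuming meanings are sparse concept-to-weight vectors compared with cosine similarity, and that the biasing factor is a normalized cost per click applied multiplicatively. The similarity measure, the bias formula, the `bias_weight` parameter, and all numbers in the example data are assumptions chosen to reproduce the "fly fishing" versus "travel" scenario, not values from the patent.

```python
import math

Meaning = dict[str, float]  # concept name -> weight


def cosine(a: Meaning, b: Meaning) -> float:
    """Cosine similarity between two sparse concept-weight vectors."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_keyword(source: Meaning, keyword_meanings: dict[str, Meaning],
                 cost_per_click: dict[str, float], bias_weight: float = 1.0) -> str:
    """Pick the keyword maximizing similarity * (1 + bias_weight * CPC / max CPC)."""
    max_cpc = max(cost_per_click.values(), default=0.0) or 1.0

    def biased_score(keyword: str) -> float:
        bias = 1.0 + bias_weight * cost_per_click.get(keyword, 0.0) / max_cpc
        return cosine(source, keyword_meanings[keyword]) * bias

    return max(keyword_meanings, key=biased_score)


# Invented example data for the salmon story.
source = {"salmon": 0.4, "fly fishing": 0.4, "washington": 0.2}
keywords = {
    "fly fishing": {"fly fishing": 1.0, "salmon": 0.3},
    "travel": {"washington": 1.0, "fly fishing": 0.2},
}
cpc = {"fly fishing": 0.10, "travel": 2.00}
```

With `bias_weight=0.0`, the closer semantic match, "fly fishing", wins (similarity of roughly 0.83 versus 0.46); with the default bias, the much higher cost per click tips the choice to "travel".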
The matched keyword can be received by the content server device 140. The content engine 146 associates an advertisement with the matched keyword and displays the advertisement on the source web page. For example, if the keyword "travel" was matched, the content engine displays an advertisement associated with the keyword "travel" on the source web page containing the story about fly fishing for salmon in Washington. If user 112a points an input device at the advertisement and clicks it, the user can be directed to a web page associated with the advertisement.
Various methods can be carried out in accordance with the present invention. One exemplary method according to the present invention comprises accessing a source article; identifying a plurality of regions in the source article; determining at least one local concept associated with each region; analyzing the local concepts of each region to identify any unrelated regions; eliminating the local concepts associated with any unrelated regions to determine relevant concepts; analyzing the relevant concepts to determine a source meaning for the source article; and matching the source meaning with an item meaning associated with an item from a set of items. A biasing factor can be used in matching the source meaning with the item meaning. The source meaning can be a vector of weighted concepts.
In some embodiments, the method further comprises displaying the matched item on the source article. In these embodiments, the source article can be a web page and the matched item can be a keyword. Alternatively, the source article can be a web page and the matched item can be an advertisement.
In some embodiments, the method further comprises displaying content associated with the matched item on the source article. In these embodiments, the source article can be a web page, the matched item can be a keyword, and the associated content can be an advertisement. Additionally, the source article can be a first web page, the matched item can be a second web page, and the associated content can be an advertisement. Alternatively, the source article can be a first web page, the matched item can be a second web page, and the associated content can be a link to the second web page.
In some embodiments, determining at least one local concept associated with each region involves determining a score for each local concept in each region; the local concept with the highest score in a region is its most relevant local concept. In addition, identifying unrelated regions involves first determining a modified score for each local concept. Next, a ranked global list containing all of the local concepts is determined based on the modified scores. Local concepts whose combined modified scores contribute less than a predetermined amount of the total score of the global list are deleted to produce a result list. Then, the unrelated regions, which have the fewest relevant local concepts in the result list, are determined. The local concepts associated with the unrelated regions are then deleted from the result list to produce a list of relevant concepts. Finally, the source meaning is determined by normalizing the modified scores of the relevant concepts.
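The steps of this paragraph can be sketched as a single Python function. It assumes modified scores are already available per region, sums each concept's scores across regions to build the global list, keeps the head of the ranked list that covers `keep_fraction` of the total score, treats a region as unrelated when it contributes fewer than `min_support` concepts to that result list, and drops concepts that occur only in unrelated regions before normalizing. The function name, both thresholds, and this reading of "fewest relevant local concepts" are illustrative assumptions; the example scores are invented so that the output reproduces the salmon (40%), fly fishing (40%), Washington (20%) meaning of the worked example.

```python
def source_meaning(region_scores: dict[str, dict[str, float]],
                   keep_fraction: float = 0.8,
                   min_support: int = 2) -> dict[str, float]:
    """Sketch of unrelated-region elimination; thresholds are illustrative assumptions."""
    # Ranked global list: merge each concept's modified scores across regions.
    totals: dict[str, float] = {}
    regions_of: dict[str, set[str]] = {}
    for region, concepts in region_scores.items():
        for concept, score in concepts.items():
            totals[concept] = totals.get(concept, 0.0) + score
            regions_of.setdefault(concept, set()).add(region)
    ranked = sorted(totals, key=totals.get, reverse=True)

    # Result list: keep top concepts until they cover keep_fraction of the total score.
    grand_total = sum(totals.values())
    result, covered = [], 0.0
    for concept in ranked:
        if covered >= keep_fraction * grand_total:
            break
        result.append(concept)
        covered += totals[concept]

    # Unrelated regions: those contributing fewer than min_support concepts to the result list.
    support = {r: sum(r in regions_of[c] for c in result) for r in region_scores}
    unrelated = {r for r, n in support.items() if n < min_support}

    # Relevant concepts: drop concepts that occur only in unrelated regions.
    relevant = [c for c in result if not regions_of[c] <= unrelated]

    # Source meaning: normalize the surviving scores into a weighted concept vector.
    total = sum(totals[c] for c in relevant)
    return {c: totals[c] / total for c in relevant} if total else {}


# Hypothetical modified scores for the four regions of the salmon example.
scores = {
    "title": {"salmon": 3.0, "fly fishing": 3.0, "washington": 2.0},
    "story": {"salmon": 5.0, "fly fishing": 5.0, "washington": 2.0},
    "banner": {"cars": 4.0},
    "links": {"news": 1.0, "weather": 0.5, "sports": 0.5},
}
meaning = source_meaning(scores)
```

Here "cars" survives the score cutoff but is removed afterwards, because the banner region contributes only that one concept to the result list and is therefore treated as unrelated.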
Another exemplary method according to the present invention comprises accessing a source article; identifying at least a first content region and a second content region in the source article; determining at least a first local concept associated with the first content region and at least a second local concept associated with the second content region; matching the first content region with a first item from a set of items based at least in part on the first local concept; and matching the second content region with a second item from the set of items based at least in part on the second local concept.
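A minimal sketch of this per-region matching, assuming each region's local concepts carry weights and each item is tagged with a set of concepts. The overlap score (the sum of the region's weights for concepts the item shares), the `match_regions` name, and the example items are hypothetical; a real system could just as well reuse the vector similarity used for whole-document matching.

```python
from typing import Optional


def match_regions(region_concepts: dict[str, dict[str, float]],
                  items: dict[str, set[str]]) -> dict[str, Optional[str]]:
    """Match each region to the item sharing the most weight with its local concepts."""
    matches: dict[str, Optional[str]] = {}
    for region, concepts in region_concepts.items():
        # Overlap score: sum of the region's weights for concepts the item is tagged with.
        scores = {item: sum(w for c, w in concepts.items() if c in tags)
                  for item, tags in items.items()}
        best = max(scores, key=scores.get, default=None)
        # Leave a region unmatched if no item shares any concept with it.
        matches[region] = best if best is not None and scores[best] > 0 else None
    return matches


matches = match_regions(
    {"story": {"salmon": 0.5, "fly fishing": 0.5},
     "sidebar": {"weather": 1.0},
     "footer": {"copyright": 1.0}},
    {"fly-rod-ad": {"fly fishing", "salmon"}, "rain-jacket-ad": {"weather", "rain"}},
)
```

Each region gets its own item, so the story region and the sidebar can carry different advertisements, while a region with no overlapping concepts stays unmatched.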
Figs. 2-3 show an exemplary method 200 according to the present invention in detail. This exemplary method is provided by way of example, as there are a variety of ways to carry out methods according to the present invention. The method 200 shown in Fig. 2 can be executed or otherwise performed by any of various systems. The method 200 is described below as carried out by the system 100 shown in Fig. 1 by way of example, and various elements of the system 100 are referenced in explaining the example method of Figs. 2-3. The method 200 shown determines the meaning of a source document in order to match the source document with items.
Each block shown in Figs. 2 and 3 represents one or more steps carried out in the exemplary method 200. Referring to Fig. 2, the example method 200 begins at block 202. Block 202 is followed by block 204, in which a document is accessed. For example, the document can be accessed and received from a device on the network 106 or from another source.
Block 204 is followed by block 206, in which the meaning of the source document is determined. In the illustrated embodiment, the meaning of the source document is determined by dividing the document into regions, eliminating unhelpful regions, and analyzing the concepts contained in the remaining regions of the document. For example, in the illustrated embodiment, the preprocessor 134 first determines the concepts contained in the source document and the regions of the document. The meaning processor 136 ranks the concepts and removes the regions that are unrelated to the majority of the concepts, together with their associated concepts. From the remaining concepts, the meaning processor 136 determines the source meaning of the document.
Fig. 3 shows the subroutine 206 for carrying out the method 200 shown in Fig. 2. The subroutine 206 determines the meaning of the received source document. An example of the subroutine is as follows.
The subroutine begins at block 300. At block 300, the source document is preprocessed to determine the concepts contained in the document. This can be done through natural language and text processing, which interprets the document as text and then aligns the text with concepts. In one embodiment, for example, tokens corresponding to the text are first determined through natural language and text processing, and these tokens are then matched with the tokens contained in a semantic network of interconnected meanings. From the matched tokens, terms are then determined from the semantic network. The concepts for the determined terms are then identified, together with a likelihood associated with each term.
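The token-to-concept alignment at block 300 might look like the sketch below, in which a small dictionary stands in for the semantic network and each term maps to candidate concepts with likelihoods. The greedy longest match over word n-grams, the `SEMANTIC_NET` entries, and the likelihood values are assumptions for illustration; a production system would use a far richer tokenizer and network.

```python
import re
from collections import defaultdict

# Toy stand-in for the semantic network: term -> [(concept, likelihood)].
# Entries and likelihoods are invented for illustration.
SEMANTIC_NET = {
    "salmon": [("salmon (fish)", 0.9), ("salmon (color)", 0.1)],
    "fly fishing": [("fly fishing", 1.0)],
    "washington": [("Washington (state)", 0.6), ("Washington, D.C.", 0.4)],
}


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; a crude stand-in for real natural-language processing."""
    return re.findall(r"[a-z]+", text.lower())


def extract_concepts(text: str, net: dict = SEMANTIC_NET, max_len: int = 2) -> dict[str, float]:
    """Align tokens with terms by greedy longest match, then sum each term's concept likelihoods."""
    tokens = tokenize(text)
    concepts: dict[str, float] = defaultdict(float)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):  # try longer terms first
            term = " ".join(tokens[i:i + n])
            if term in net:
                for concept, likelihood in net[term]:
                    concepts[concept] += likelihood
                i += n
                break
        else:  # no term starts at this token
            i += 1
    return dict(concepts)


found = extract_concepts("Fly fishing for salmon in Washington")
```

Preferring the longest match means "fly fishing" is recognized as one term rather than as the separate words "fly" and "fishing", and ambiguous terms such as "Washington" keep every candidate concept with its likelihood for later ranking.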
Block 300 is followed by block 302, in which the regions of the document are identified. For example, if the document contains formatting information, its regions can be determined using specific heuristics. For a source document that is a web page containing HTML tags, the tags can help identify regions. For example, text within <title>...</title> tags can be marked as title-region text. Text in a paragraph in which more than 70% of the text lies within <a>...</a> tags can be marked as links-region text. The structure of the text can also help identify regions. For example, text in short paragraphs or in table columns that lacks sentence structure (for instance, it has no verb, has few words, or has no punctuation ending a sentence) can be marked as belonging to a list region. Text in long sentences with verbs and punctuation can be marked as part of a text region. When the region type changes, a new region can be started with the text marked with the new type. In one embodiment, if a text region grows to more than 20% of the document, it can be divided into smaller pieces.
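These heuristics can be approximated with regular expressions over HTML blocks (for example, paragraphs or table cells) that have already been split out of the page. The 70% link-text and 20% size thresholds come from the text above; the sentence test (at least eight words ending in sentence punctuation, with no verb detection) and the rule of starting a new text region once the current one would exceed 20% of the page are simplifying assumptions, and `classify` and `segment` are hypothetical names. Regular expressions are only a rough stand-in for a real HTML parser.

```python
import re

TAG = re.compile(r"<[^>]+>")
ANCHOR = re.compile(r"<a\b[^>]*>.*?</a>", re.S | re.I)
TITLE = re.compile(r"<title>.*?</title>", re.S | re.I)


def visible_text(html: str) -> str:
    """Strip tags; a regex is only a rough stand-in for an HTML parser."""
    return TAG.sub("", html).strip()


def classify(block: str) -> str:
    """Label one HTML block as title, links, text, or list."""
    if TITLE.search(block):
        return "title"
    text = visible_text(block)
    link_text = "".join(visible_text(a) for a in ANCHOR.findall(block))
    if text and len(link_text) / len(text) > 0.70:  # mostly anchor text
        return "links"
    # Crude sentence test (assumed): enough words and sentence-ending punctuation.
    if len(text.split()) >= 8 and text.endswith((".", "!", "?")):
        return "text"
    return "list"


def segment(blocks: list[str], max_share: float = 0.20) -> list[tuple[str, list[str]]]:
    """Group consecutive same-type blocks into regions, capping text regions at max_share of the page."""
    total = sum(len(visible_text(b)) for b in blocks) or 1
    regions: list[tuple[str, list[str]]] = []
    for block in blocks:
        kind = classify(block)
        if regions and regions[-1][0] == kind:
            current = regions[-1][1]
            size = sum(len(visible_text(b)) for b in current + [block])
            if not (kind == "text" and size / total > max_share):
                current.append(block)
                continue
        regions.append((kind, [block]))  # type changed, or text region is full
    return regions


page = [
    "<title>Fly fishing for salmon in Washington</title>",
    "<p>Anglers in Washington report strong salmon runs, and fly fishing guides are busy this spring.</p>",
    "<p><a href='/news'>National news</a> <a href='/weather'>Weather</a></p>",
]
```

Calling `segment(page)` yields a title region, a text region, and a links region; consecutive blocks of the same non-text type are merged into one region.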
Block 302 is followed by block 304, in which the relevant concepts of each region are determined. In the illustrated embodiment, the meaning processor 136 processes the concepts identified for each region to produce a smaller set of local concepts for each region. The relationships between concepts, the frequency with which a concept appears in the region, and the breadth of the concept can be used in determining the local concepts.
In one embodiment, the concepts of each region are placed in a list and ranked by a score determined for each concept from several factors. For example, if a first concept is strongly related to other concepts, this can be used to raise the score of the first concept and of its related concepts. This effect is moderated by the frequency with which the first concept occurs and by its focus (or breadth), so as to down-weight very general concepts and concepts with broad meanings. Concepts whose frequency exceeds a certain threshold can be rejected. The perceived importance of a concept can also affect its score; for example, importance may have been determined earlier in processing by whether the words making up the concept were marked in bold. After the concepts of each region are ranked, the least relevant concepts are removed. This can be done by keeping a set of top-ranked concepts or by removing concepts whose ranking score falls below a certain threshold.
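The per-region ranking at blocks 304 and the paragraph above can be sketched with a hypothetical scoring formula. The text names the factors (relatedness, frequency, breadth, emphasis such as bold, a frequency cutoff, and top-k or threshold pruning) but not how they combine; the product below, the 1.5 emphasis bonus, and the 0-to-1 breadth scale are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptStats:
    frequency: int            # occurrences of the concept in the region
    breadth: float            # assumed scale: 0.0 = very specific, 1.0 = very general
    emphasized: bool = False  # e.g. its words were marked bold
    related: dict = field(default_factory=dict)  # other concept -> link strength in [0, 1]


def score_local_concepts(stats, max_frequency=50, top_k=10, min_score=0.0):
    """Rank one region's concepts and keep only the most relevant ones.

    Hypothetical scoring: links to other concepts in the same region raise
    the score, breadth lowers it, emphasis adds a bonus, and concepts that
    occur more than max_frequency times are rejected outright.
    """
    scores = {}
    for name, s in stats.items():
        if s.frequency > max_frequency:
            continue  # too common to be informative
        relatedness = sum(
            w for other, w in s.related.items() if other in stats and other != name
        )
        focus = 1.0 - s.breadth
        bonus = 1.5 if s.emphasized else 1.0
        scores[name] = s.frequency * (1.0 + relatedness) * focus * bonus
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Keep the top_k concepts, then drop any at or below min_score.
    return [(c, sc) for c, sc in ranked[:top_k] if sc > min_score]
```

Because relatedness is counted from each concept's own side, a symmetric link boosts both concepts, matching the description of raising "the first concept and its related concepts".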
Block 306 follows block 304. At block 306, the local concepts of all regions are merged and analyzed. In the illustrated embodiment, meaning processor 136 receives the local concepts of every region and creates a ranked global list of all of them, for example by the score of each local concept. Various factors, such as the importance of each region, can be used in determining the scores. The importance of a region can be determined by its type and size. For example, a title region can be considered more important than a link region, so concepts appearing in a title region can be given more weight than concepts appearing in a link region. Additional weight can be given to concepts that appear in more than one region; for example, duplicate copies of a concept can be merged and their scores added together. The global list is then sorted, and, for example, the trailing concepts that together contribute less than 20% of the total score can be removed, producing a resulting global list of local concepts.
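The merge at block 306 can be sketched as below. The 20% trailing cut follows the example in the text; the per-type weights (only "title outweighs link" is stated) and the way region size scales the weight are hypothetical choices for illustration.

```python
from collections import defaultdict

# Hypothetical region-type weights; the text only says title > link.
REGION_TYPE_WEIGHT = {"title": 3.0, "text": 1.0, "list": 0.8, "link": 0.5}


def merge_local_concepts(regions, trailing_share=0.20):
    """Merge per-region concept scores into one ranked global list.

    regions: list of (region_type, region_size, [(concept, score), ...]).
    A concept found in several regions has its weighted scores summed.
    The lowest-ranked concepts whose combined score stays under
    trailing_share of the total are dropped.
    """
    total_size = sum(size for _, size, _ in regions) or 1
    merged = defaultdict(float)
    for region_type, size, concepts in regions:
        # Assumed weighting: type weight scaled up by the region's share of the page.
        weight = REGION_TYPE_WEIGHT.get(region_type, 1.0) * (1.0 + size / total_size)
        for concept, score in concepts:
            merged[concept] += weight * score
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(score for _, score in ranked)
    # Walk up from the bottom, dropping concepts while the dropped tail
    # remains below trailing_share of the total score.
    tail, cut = 0.0, len(ranked)
    while cut > 0 and tail + ranked[cut - 1][1] < trailing_share * total:
        tail += ranked[cut - 1][1]
        cut -= 1
    return ranked[:cut]
```

With a title region and a body region both mentioning "jaguar", the two copies are merged and the combined score dominates, while a lone link-region concept falls into the discarded tail.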
Block 308 follows block 306. At block 308, regions unrelated to the main concepts are removed. In the illustrated embodiment, meaning processor 136 identifies unrelated regions, that is, regions containing concepts unrelated to most of the other concepts, and removes them. It should be understood that "related" and "unrelated" need not be determined by an absolute standard: "related" indicates a relatively high and/or predetermined degree of relationship, and "unrelated" indicates a relatively low and/or predetermined degree of relationship. Removing an unrelated region also removes the unrelated concepts it contains. For example, if the source document is a web page built from several frames, some frames may relate to advertisements or to links to other pages on the site, and will therefore be unrelated to the main meaning of the page.
In one embodiment, for example, the resulting global list determined at block 306 can serve as an approximation of the document's meaning and can be used to remove regions unrelated to that meaning. For each region, meaning processor 136 can determine whether the most representative local concept of that region is absent from the resulting global list. If it is absent, the region can be marked as unrelated. The most representative local concept of a region can be, for example, the concept with the highest score in that region as determined at block 304.
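The test described above reduces to a set-membership check. This sketch assumes each region's local concepts are available as (concept, score) pairs from block 304 and the pruned global list as (concept, score) pairs from block 306; the function name is hypothetical.

```python
def find_unrelated_regions(region_concepts, global_list):
    """Return indices of regions whose top local concept is absent from the global list.

    region_concepts: per region, a list of (concept, score) pairs.
    global_list: the pruned global list of (concept, score) pairs.
    """
    kept = {concept for concept, _ in global_list}
    unrelated = []
    for index, concepts in enumerate(region_concepts):
        if not concepts:
            continue  # an empty region has no concept to test, so it is left alone
        top_concept = max(concepts, key=lambda pair: pair[1])[0]
        if top_concept not in kept:
            unrelated.append(index)
    return unrelated
```

A navigation frame whose strongest concept is "login" is flagged even if it also mentions the page's main topic, because only the most representative concept is tested.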
Block 310 follows block 308. At block 310, the meaning of the source document is determined. In the illustrated embodiment, meaning processor 136 recomputes the representativeness of the local concepts in the regions that were not removed, creating a list of relevant concepts. A fixed number of concepts can be selected from this list to form a meaning list, which is then normalized to provide the source meaning. For example, only the concepts contained in the relevant regions may be used to create the meaning list, and all concepts other than the 25 highest-scoring ones are removed from the new list. The scores of these top-scoring concepts can then be normalized to provide the source meaning. In this example, the source meaning can be a weighted vector of relevant concepts.
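Block 310 can be sketched as re-accumulating scores over the surviving regions, keeping the top 25 concepts, and normalizing. The text does not name a norm, so the L1 normalization here (weights sum to 1) is an assumption, as is simple summation as the "recomputed representativeness".

```python
from collections import defaultdict


def source_meaning(region_concepts, unrelated, top_n=25):
    """Build a weighted concept vector from the regions that were kept.

    region_concepts: per region, a list of (concept, score) pairs.
    unrelated: indices of regions removed at block 308.
    Scores are summed over related regions only (assumed recomputation),
    the top_n concepts are retained, and their scores are L1-normalized.
    """
    skip = set(unrelated)
    totals = defaultdict(float)
    for index, concepts in enumerate(region_concepts):
        if index in skip:
            continue
        for concept, score in concepts:
            totals[concept] += score
    top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    norm = sum(score for _, score in top)
    if norm == 0:
        return {}
    return {concept: score / norm for concept, score in top}
```

The result is a sparse dict of concept to weight, one concrete form of the "weighted vector of relevant concepts".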
Returning to Fig. 2, block 208 follows block 206. At block 208, a set of items is received. For example, matching processor 137 can receive the items from content server device 140. The items can include knowledge items, such as keywords, and documents, such as advertisements and web pages. Each received item can have a meaning associated with it. The meaning of a keyword, for example, can be determined using information associated with the keyword, as described in related U.S. patent application Ser. No. 10/690,328 (attorney docket No. 53051/288072), entitled "Methods and Systems for Understanding a Meaning of a Knowledge Item Using Information Associated with the Knowledge Item," which is incorporated herein by reference. The meaning of a document can be determined in the same manner as described with respect to Fig. 3.
Block 210 follows block 208. At block 210, the source document is matched to items, and various factors can be used in the matching. For example, in one embodiment, the source meaning is matched against the keyword meanings associated with a set of keywords. The matching engine compares the source meaning with the keyword meanings and uses various factors, such as the cost per click associated with each keyword, to determine a match. The matched keywords can then be sent to content server device 140. Content engine 146 can then match the matched keywords to their associated advertisements and display those advertisements on the source document. Alternatively, the content engine can display the keywords themselves on the source document. In another embodiment, the meanings of advertisements are matched with the source meaning, and content engine 146 can cause the matched advertisements to be displayed on the source document. In yet another embodiment, the meanings of web pages are matched with the source meaning, and content engine 146 can cause advertisements associated with those web pages to be displayed. Block 212 follows block 210, and the method ends there.
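Block 210 can be sketched as comparing concept vectors and folding in cost per click. The text does not say how similarity and cost per click are combined; cosine similarity, the minimum-similarity filter, and ranking by similarity times cost per click are hypothetical choices, and all keyword data below is invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def match_keywords(source_meaning, keywords, min_similarity=0.3, limit=3):
    """Rank keywords against a document's source meaning.

    keywords: dict keyword -> (meaning_vector, cost_per_click).
    Hypothetical policy: discard keywords below min_similarity, then
    order the rest by similarity * cost_per_click.
    """
    scored = []
    for keyword, (meaning, cpc) in keywords.items():
        sim = cosine(source_meaning, meaning)
        if sim >= min_similarity:
            scored.append((sim * cpc, keyword))
    scored.sort(reverse=True)
    return [keyword for _, keyword in scored[:limit]]
```

Filtering on similarity before weighting by cost per click keeps an expensive but off-topic keyword (such as a car brand on a wildlife page) from being matched at all.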
In one embodiment, after the source document is accessed, preprocessor 134 analyzes it to determine its content regions. A content region can be a region containing a large amount of text, such as a text region or a link region, or a relatively important region, such as a title region. These regions can be determined using the heuristics described above. As also described above, preprocessor 134 can identify the concepts located in each content region, and meaning processor 136 can use these concepts to determine the meaning of each content region. Matching processor 137 can match the meaning of each content region with keywords. Content engine 146 can then match the matched keywords to their associated advertisements and display those advertisements on the source document. Alternatively, the content engine can display the keywords themselves on the source document. In another embodiment, the meanings of advertisements are matched with the region meanings, and content engine 146 can cause the matched advertisements to be displayed on the source document. In yet another embodiment, the meanings of web pages are matched with the region meanings, and content engine 146 can cause advertisements associated with those web pages to be displayed. In one embodiment, an advertisement or keyword is displayed in the content region to which it was matched.
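Per-region matching, where each content region receives its own advertisement, can be sketched as picking the most similar ad for every region meaning. Cosine similarity and the rule of leaving a region without an ad when nothing overlaps are assumptions; the region and ad identifiers are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def match_regions_to_ads(region_meanings, ads):
    """Pick the best-matching ad for each content region.

    region_meanings: dict region_id -> concept vector.
    ads: dict ad_id -> concept vector.
    Returns dict region_id -> ad_id, or None when no ad shares a concept.
    """
    placement = {}
    for region_id, meaning in region_meanings.items():
        best_ad, best_sim = None, 0.0
        for ad_id, ad_meaning in ads.items():
            sim = cosine(meaning, ad_meaning)
            if sim > best_sim:
                best_ad, best_sim = ad_id, sim
        placement[region_id] = best_ad
    return placement
```

Each returned ad would then be rendered inside the region it was matched to, as in the embodiment where an advertisement is displayed in its matching content region.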
The above describes only preferred embodiments of the present invention and is not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (22)

1. A method for matching an article to items, comprising the steps of:
receiving a source article;
identifying a plurality of regions in the source article;
determining at least one local concept represented in each region;
analyzing the local concepts of each region to identify one or more unrelated concepts;
determining a source meaning of the source article with the unrelated concepts removed; and
matching the source article with an item selected from a set of items based, at least in part, on a relationship between the source meaning and a meaning of the item.
2. The method of claim 1, further comprising displaying the matched item on the source article.
3. The method of claim 2, wherein the source article comprises a web page and the matched item comprises a keyword.
4. The method of claim 2, wherein the source article comprises a web page and the matched item comprises an advertisement.
5. The method of claim 1, further comprising displaying, on the source article, content associated with the matched item.
6. The method of claim 5, wherein the source article comprises a web page, the matched item comprises a keyword, and the associated content comprises an advertisement.
7. The method of claim 5, wherein the source article comprises a first web page, the matched item comprises a second web page, and the associated content comprises an advertisement.
8. The method of claim 5, wherein the source article comprises a first web page, the matched item comprises a second web page, and the associated content comprises a link to the second web page.
9. The method of claim 1, wherein matching the source article with the item comprises using a plurality of factors to match the source article with the item.
10. The method of claim 1, wherein the source meaning comprises a vector of weighted concepts.
11. The method of claim 1, wherein:
determining at least one local concept comprises determining a score for each local concept; and
the local concept having the highest score in each region is the most relevant local concept for that region.
12. The method of claim 11, wherein identifying unrelated regions comprises the steps of:
determining a modified score for each local concept;
determining a ranked global list of all local concepts based on the modified scores;
removing the local concepts whose combined modified scores contribute less than a predetermined amount of the total score of the global list, to produce a result list;
identifying as unrelated the regions whose most relevant local concept is not on the result list; and
removing the local concepts represented in the unrelated regions from the result list, to produce a list of relevant concepts.
13. The method of claim 12, wherein determining the source meaning comprises normalizing the modified scores of the relevant concepts.
14. A method for matching regions of an article to advertisements, comprising the steps of:
accessing a source article;
identifying a first content region and a second content region in the source article;
determining a first local concept represented in the first content region, and determining a second local concept represented in the second content region;
matching the first content region with a first advertisement from a set of advertisements based, at least in part, on the first local concept; and
matching the second content region with a second advertisement from the set of advertisements based, at least in part, on the second local concept.
15. The method of claim 14, wherein displaying the matched advertisements comprises:
displaying the first advertisement in the first content region; and
displaying the second advertisement in the second content region.
16. The method of claim 14, wherein the source article comprises a web page.
17. A method for determining a meaning of a document, comprising:
receiving a document;
dividing the document into different regions;
determining concepts represented in the document;
identifying a first concept represented in a first region, wherein the first concept is unrelated to most of the concepts represented in the document;
removing the first concept from the determination of a source meaning of the document;
determining the source meaning of the document; and
making the source meaning available to a user.
18. The method of claim 17, wherein identifying the different regions in the document comprises identifying frames in a web document.
19. The method of claim 17, wherein identifying the different regions in the document comprises identifying the different regions based on markup in the document.
20. The method of claim 17, wherein determining concepts comprises identifying a set of related words in each of the different regions.
21. The method of claim 17, further comprising:
selecting one or more advertisements based on the concepts represented in each of the different regions; and
making the document and the selected advertisements available for output to the user.
22. The method of claim 17, wherein identifying the first concept comprises:
determining a score for each of the concepts; and
determining, based on the scores, a ranked global list comprising all of the concepts;
and wherein removing the first concept comprises removing the local concepts whose combined scores contribute less than an amount of the total score of the determined global list.
CNB2004800219225A 2003-07-30 2004-07-23 Methods and systems for determining a meaning of a document to match the document to conte Active CN100470541C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US49142203P 2003-07-30 2003-07-30
US60/491,422 2003-07-30
US10/689,903 2003-10-21

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN2009100062933A Division CN101482881B (en) 2003-07-30 2004-07-23 Methods and systems for determining a meaning of a document to match the document to content

Publications (2)

Publication Number Publication Date
CN1829990A CN1829990A (en) 2006-09-06
CN100470541C CN100470541C (en) 2009-03-18

Family

ID=36947555

Family Applications (3)

Application Number Title Priority Date Filing Date
CNA200480021909XA Pending CN1829989A (en) 2003-07-30 2004-07-23 Methods and systems for understanding a meaning of a knowledge item using information associated with the knowledge item
CNB2004800219225A Active CN100470541C (en) 2003-07-30 2004-07-23 Methods and systems for determining a meaning of a document to match the document to conte
CN2009100062933A Active CN101482881B (en) 2003-07-30 2004-07-23 Methods and systems for determining a meaning of a document to match the document to content

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CNA200480021909XA Pending CN1829989A (en) 2003-07-30 2004-07-23 Methods and systems for understanding a meaning of a knowledge item using information associated with the knowledge item

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN2009100062933A Active CN101482881B (en) 2003-07-30 2004-07-23 Methods and systems for determining a meaning of a document to match the document to content

Country Status (2)

Country Link
JP (2) JP4825669B2 (en)
CN (3) CN1829989A (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4740630B2 (en) * 2005-04-14 2011-08-03 株式会社リコー Fashion creation advertising method and program
US7958126B2 (en) * 2006-12-19 2011-06-07 Yahoo! Inc. Techniques for including collection items in search results
CN101004753B (en) * 2007-01-25 2010-08-11 北京搜狗科技发展有限公司 Method and system for recognizing conception type files
CN101404031B (en) * 2008-11-12 2012-05-30 北京搜狗科技发展有限公司 Method and system for recognizing concept type web pages
JP2010237779A (en) * 2009-03-30 2010-10-21 Mitsubishi Space Software Kk Advertisement selection server, advertisement selection method, and program
JP2010250827A (en) 2009-04-16 2010-11-04 Accenture Global Services Gmbh Touchpoint customization system
US9177057B2 (en) 2010-06-08 2015-11-03 Microsoft Technology Licensing, Llc Re-ranking search results based on lexical and ontological concepts
US9779385B2 (en) * 2011-06-24 2017-10-03 Facebook, Inc. Inferring topics from social networking system communications
CN105335163A (en) * 2015-11-30 2016-02-17 上海斐讯数据通信技术有限公司 Software code reading method and system
CN108363696A (en) * 2018-02-24 2018-08-03 李小明 A kind of processing method and processing device of text message
CN111507813B (en) * 2020-04-21 2023-05-12 江西省机电设备招标有限公司 Bidder identity identification method and bidding method

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US5960383A (en) * 1997-02-25 1999-09-28 Digital Equipment Corporation Extraction of key sections from texts using automatic indexing techniques
US6473730B1 (en) * 1999-04-12 2002-10-29 The Trustees Of Columbia University In The City Of New York Method and system for topical segmentation, segment significance and segment function
JP2001337984A (en) * 2000-05-30 2001-12-07 Sony Communication Network Corp Advertisement system, advertisement device and advertisement method
JP4489994B2 (en) * 2001-05-11 2010-06-23 富士通株式会社 Topic extraction apparatus, method, program, and recording medium for recording the program
JP4729736B2 (en) * 2001-07-27 2011-07-20 学校法人日本大学 Internet search result modification apparatus and program
CN1185595C (en) * 2001-09-05 2005-01-19 联想(北京)有限公司 Jamproof theme word extracting method

Also Published As

Publication number Publication date
JP2007500899A (en) 2007-01-18
JP4825669B2 (en) 2011-11-30
CN1829990A (en) 2006-09-06
JP2007500900A (en) 2007-01-18
CN101482881A (en) 2009-07-15
CN101482881B (en) 2013-12-11
JP4829789B2 (en) 2011-12-07
CN1829989A (en) 2006-09-06

Similar Documents

Publication Publication Date Title
AU2010241249B2 (en) Methods and systems for determining a meaning of a document to match the document to content
US8635107B2 (en) Automatic expansion of an advertisement offer inventory
TWI544352B (en) System and method to facilitate matching of content to advertising information in a network
CN100517304C (en) Method sorting result page
US8219577B2 (en) Apparatus and method product for presenting recommended information
US9189562B2 (en) Apparatus, method and program product for classifying web browsing purposes
JP4750814B2 (en) Advertising method and system for exposing contextual advertising information
US20100017390A1 (en) Apparatus, method and program product for presenting next search keyword
JP5442401B2 (en) Behavior information extraction system and extraction method
CN100470541C (en) Methods and systems for determining a meaning of a document to match the document to conte
KR100902674B1 (en) Method and system for serving document exploration service
Coste et al. Advances in clickbait and fake news detection using new language-independent strategies
US20090198552A1 (en) System and process for identifying users for which cooperative electronic advertising is relevant
WO2009099842A2 (en) System and process for generating a user model for use in providing personalized advertisements to retail customers
JP2020057188A (en) Providing apparatus, providing method and providing program
WO2009097362A1 (en) System and process for selecting personalized non-competitive electronic advertising
US20090199233A1 (en) System and process for generating a selection model for use in personalized non-competitive advertising
KR102451020B1 (en) A method of company-customized intelligent content curation using web crawling function
KR102602936B1 (en) Electronic device for generating short form based on collected data through artificial intelligence and method using the same
US20090198551A1 (en) System and process for selecting personalized non-competitive electronic advertising for electronic display
US20090198554A1 (en) System and process for identifying users for which non-competitive advertisements is relevant
WO2024074760A1 (en) Content management arrangement
US20090198555A1 (en) System and process for providing cooperative electronic advertising
AU2011235994A1 (en) Methods and systems for determining a meaning of a document to match the document to content
Zhou et al. A keyword extraction based model for web advertisement

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CI02 Correction of invention patent application

Correction item: Priority

Correct: 2003.10.21 US 10/689,903

False: Second priority claim missing

Number: 36

Page: Title page

Volume: 22

COR Change of bibliographic data

Free format text: CORRECT: PRIORITY; FROM: MISSING THE SECOND PRIORITY CLAIM TO: 2003.10.21 US 10/689,903

C14 Grant of patent or utility model
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: California, USA

Patentee after: Google LLC

Address before: California, USA

Patentee before: Google Inc.