US7519588B2 - Keyword characterization and application - Google Patents
Keyword characterization and application
- Publication number
- US7519588B2 (application US11/452,709)
- Authority
- US
- United States
- Prior art keywords
- keyword
- objects
- documents
- collection
- characterizations
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active - Reinstated, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99932—Access augmentation or optimizing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99934—Query formulation, input preparation, or translation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99935—Query augmenting and refining, e.g. inexact access
Definitions
- the present invention relates to the field of data processing, in particular, to methods and apparatuses for keyword characterization, having particular application to advertising associated with information search using a search engine.
- search engines exist to make information accessible. Among the kinds of information promulgated by search engines is advertising.
- the display of advertisements (“ads”) is often mediated by a bidding system—an advertiser bids on a keyword and the placement of his ad on the search result page for that keyword depends on, possibly among other factors, his bid.
- the click-through rate on the ad is a function of its placement.
- FIG. 1 illustrates an overview of various embodiments of the present invention, processing documents and/or objects determined to be potentially relevant to a keyword to extract keyword characterizations for use as proxies for the keyword;
- FIG. 2 illustrates a flow chart view of selected operations of the methods of various embodiments of the present invention, to extract keyword characterizations from documents and/or objects determined to be relevant to the keyword;
- FIG. 3 illustrates a block diagram depicting a method of processing web page results comprising a collection of documents and/or objects to extract one or more keyword characterizations for use as proxies for the keyword, in accordance with various embodiments;
- FIG. 4 is a block diagram illustrating an example computing device suitable for use to practice the present invention, in accordance with various embodiments.
- Illustrative embodiments of the present invention include, but are not limited to, methods and apparatuses for receiving a collection of documents and/or objects determined to be potentially relevant to a keyword, and processing the collection of documents and/or objects to extract one or more keyword characterizations for use as proxies for the keyword.
- the one or more keyword characterizations may be used to compute a measure of keyword similarity for the keyword, facilitate keyword behavior modeling of the keyword, and/or find one or more advertisements.
- As used herein, “keyword” may refer to any word, string, token, phrase, or set of words (which may or may not be ordered), strings, tokens, or linguistic constructs that may be searched upon by a user. “Keyword” may also refer to non-linguistic constructs, such as a partial image that may be used in an image search.
- the phrase “in one embodiment” is used repeatedly. The phrase generally does not refer to the same embodiment; however, it may.
- the terms “comprising,” “having,” and “including” are synonymous, unless the context dictates otherwise.
- the phrase “A/B” means “A or B”.
- the phrase “A and/or B” means “(A), (B), or (A and B)”.
- the phrase “at least one of A, B and C” means “(A), (B), (C), (A and B), (A and C), (B and C) or (A, B and C)”.
- the phrase “(A) B” means “(B) or (A B)”, that is, A is optional.
- FIG. 1 illustrates an overview of various embodiments of the present invention, processing documents and/or objects determined to be potentially relevant to a keyword to extract keyword characterizations for use as proxies for the keyword.
- Search results 108, comprising a collection of documents and/or objects determined to be potentially relevant to a keyword 102, may be received and utilized by a search results characterization process 110.
- the search results 108 may be received from a search engine 104 , which may take a keyword 102 as input and search a keyword relational database 106 or some other electronic information corpus (based on the keyword 102 ), obtaining a collection of documents and/or objects as search results 108 .
- the search results characterization process 110 may process the search results 108 to extract keyword characterizations for use as proxies for the keyword 102 , and the keyword characterizations, in some embodiments, may then serve as inputs to one or more other processes, such as keyword behavior modeling process 112 or keyword similarity measurement process 114 .
- In various embodiments, search engine 104, search results characterization process 110, keyword behavioral modeling process 112, and keyword similarity measurement process 114 may be separate processes of a computer system. In other embodiments, they may be sub-processes of one or more processes of the computer system. In yet other embodiments, processes 104, 110, 112, and 114 may be modules of the computer system. For ease of understanding, however, processes 104, 110, 112, and 114 will be described as separate processes of a computer system throughout the following description of FIG. 1; this shall not be read as limiting the scope of the invention.
- the various processes and data illustrated by FIG. 1 may be processes and data of a computer system (not shown), such as the exemplary computer system illustrated by FIG. 4 , which is described in greater detail below.
- Apart from keyword 102, search engine 104, database 106, search results 108, search results characterization process 110, keyword behavioral modeling process 112, keyword similarity measurement process 114, and other processes utilizing the extracted keyword characterizations (discussed below), the computer system may be any single- or multi-processor computing system known in the art, such as a personal computer (PC), a workstation, a server, a router, a mainframe, a personal digital assistant (PDA), an entertainment center, a set-top box, or a mobile device.
- PC: personal computer
- PDA: personal digital assistant
- the computer system may additionally comprise one or more networking interfaces (not shown) connecting the computer system to a networking fabric (not shown), facilitating a web browser of the computer system in interacting with a search engine 104 to search an electronic information corpus, such as the World Wide Web.
- The networking interfaces may be of any sort known in the art, such as Ethernet, Bluetooth, WiFi (802.11), or 3G interfaces, providing connectivity to a wired or wireless networking fabric.
- In other embodiments, the processes and data illustrated by FIG. 1 may instead be a series of distributed processes of a plurality of computer systems connected by a networking fabric.
- the keyword relational database 106 is located on a database server and the search engine 104 on a web application server, both servers separate from the computer system or systems having the other data and processes illustrated by FIG. 1 .
- For ease of understanding, however, reference will be made throughout the following description to one computer system possessing each of the data and processes depicted in FIG. 1.
- the computer system described above may be connected to a networking fabric (not shown) which, in some embodiments, may provide access to the World Wide Web and/or some other electronic information corpus, including access to a search engine 104 , which may be a web application provided by a remote web application server.
- The networking fabric may be a local area network, a wide area network, or the Internet. Further, the connections between the various computer systems of the networking fabric may be of any sort known in the art, such as transmission control protocol/Internet protocol (TCP/IP) connections or asynchronous transfer mode (ATM) virtual connections.
- TCP/IP: transmission control protocol/Internet protocol
- ATM: asynchronous transfer mode
- the computer system of FIG. 1 may receive or generate a keyword 102 .
- a plurality of keywords may, instead, be received or generated.
- the keyword 102 may be any word, string, token, phrase, or set of words (which may or may not be ordered), strings, tokens, or linguistic constructs that may be searched upon by a user.
- the keyword 102 may also refer to non-linguistic constructs, such as a partial image that may be used in an image search.
- keyword 102 may be a word, a set of words, or phrase that is used by a consumer to search for a specific product or service, and is thus of interest to merchants of that product or service.
- The keywords may be generated by a keyword generator generating keywords at random or based upon a set of criteria provided by a merchant or some other user, or by a keyword generation method, such as the method disclosed in “Keyword Generation Method and Apparatus,” co-pending patent application Ser. No. 11/371,267, filed on Mar. 8, 2006.
- the keyword 102 may actually be a keyword characterization extracted by search results characterization process 110 that may then be searched upon by search engine 104 as a keyword 102 in its own right.
- keyword 102 may be received by the computer system rather than generated. The keyword 102 may be received via an input device, a networking fabric or through a storage medium, and may have been previously generated by any of the above methods.
- the keyword 102 may be input to a search engine 104 , which may search the database 106 , an electronic information corpus, or the World Wide Web based on the keyword 102 .
- search engine 104 may be a Google or Yahoo! web search engine.
- the search engine 104 may be accessed via a web browser (not shown) of the computer system, such as the Internet Explorer web browser produced by Microsoft of Redmond, Wash., or the Firefox web browser of Mozilla Foundation of Mountain View, Calif.
- the search engine 104 may be a process of the computer system rather than a web application accessible via a web browser.
- the computer system may possess a plurality of search engines 104 , including a first accessible via a web browser (e.g., Google) and a second executing on the computer system as a search engine 104 process.
- The first search engine 104 may be used to search the World Wide Web and/or an electronic corpus of web pages and data objects.
- the second search engine 104 may be used to search the keyword relational database 106 .
- either search engine 104 may search one or all of database 106 , the World Wide Web, and an electronic corpus.
- The search engine 104 may possess a user interface, such as a graphical user interface, to facilitate a user in conducting the search. In some embodiments, however, no user need be involved in the search, and the search engine 104 process may perform its functions automatically, at the request of another process.
- keyword 102 may also be input to search engine 104 to perform a passive search.
- a passive search is a search initiated by a process to inform a user, attempting to answer user queries before they are submitted by predicting the queries and returning the results.
- a user's cell phone may keep track of its location via GPS technology and may use the location information as keyword 102 to search for and retrieve geographically close destinations that may be of interest to the user, such as a coffee house or restroom.
- the computer system of FIG. 1 may comprise a keyword relational database 106 , which may be any sort of relational database capable of organizing data into entities and representing the relationships between those entities.
- database 106 may be another sort of database, which may or may not store normalized data.
- database 106 may store a number of web pages and data objects.
- the search engine 104 may perform a lookup function in database 106 , based on the keyword 102 , to produce the search results 108 . These search results may be web pages and/or data objects that the search engine 104 determines may be relevant to the keyword 102 .
- the search engine 104 may search an electronic information corpus or the World Wide Web and receive a search results page indicating the most relevant web pages and/or data objects. In such embodiments, the search engine 104 or some related process may then retrieve and collect the web pages and/or data objects, forming the search results 108 . In other embodiments, the search engine 104 may receive the web pages and/or data objects themselves rather than a results page with links, as described above. The web pages and/or data objects may be retrieved from one or more remote computer systems connected to the computer system via a networking fabric.
- the search results 108 may comprise a collection of documents and/or objects determined to be potentially relevant to the keyword 102 .
- the search results 108 may be web pages in which keyword 102 appears or web pages in which the constituent words of keyword 102 appear.
- the web pages may be documents of any format known in the art and used to display web pages, including HTML format, HTM format, and PDF format, among many others.
- The documents of search results 108 need not be web pages, however; rather, they may be any sort of document containing the keyword 102 or constituent words of the keyword 102. Such documents may be found in some electronic corpus rather than on the World Wide Web.
- Search results 108 may also include data objects, which may be annotated with keywords.
- the search engine 104 may find data objects annotated with keyword 102 , with keywords identical to words of keyword 102 , or with keywords semantically similar to keyword 102 or a word of keyword 102 .
- Where keyword 102 is an image, search engine 104 may also find non-textual data objects that have not been annotated with keywords.
- Exemplary data objects may or may not be annotated with keywords and may be textual, partially textual, or non-textual in nature.
- Some types of data objects are: images, video files, programs, files of any type, and even items such as companies, descriptions of molecules, etc.
- the data objects may be made searchable, possibly through a keyword driven interface, such as search engine 104 .
- The data objects may be made searchable on the basis of the text composing them (in the case of documents), the text associated with them (such as annotations on a photograph, or commentary, reviews, or scripts associated with a movie or TV show), the chemical constituents of a molecule, or some other feature set derived directly or indirectly from the data objects.
- Corpus-based methods may produce results such that even searches that have no terms in common with the annotation keywords may match them.
- the number of documents and/or objects comprising search results 108 may be limited to a pre-determined threshold number of the most relevant web pages and/or objects produced by the search engine 104 .
- the search results characterization process 110 may process the collection of documents and/or objects comprising search results 108 to extract one or more keyword characterizations for use as proxies for keyword 102 .
- the search results characterization process 110 may be incorporated into a keyword search engine, such as, for example, search engine 104 .
- Processing the search results 108 , by search results characterization process 110 may comprise at least one of: generating a spectrum of n-grams; extracting and aggregating noun phrases, proper nouns, and/or named entities; determining links to and/or from a document of search results 108 ; calculating a distance from a document of search results 108 to a set of websites or data resources; determining a distance from keyword 102 to a range of core word senses; and determining a web page of the search results 108 .
- processing search results 108 may involve generating a spectrum of n-grams, the spectrum of n-grams constituting keyword characterizations that may be used as proxies for keyword 102 .
- n-grams may be generated by obtaining search results 108 and extracting from those results one or more sequences of a number (n) of contiguous words found within the documents and/or annotated object descriptions returned by the search.
- Unigrams may be individual words; bigrams may be pairs of adjacent words, etc.
- this type of characterization of keyword 102 may result in a spectrum of n-grams, where n is typically a small positive integer equal to or greater than 1.
- An exemplary spectrum of n-grams is illustrated by FIG. 3 .
- the n-gram generating performed by search results characterization process 110 may further involve calculating the frequencies of one or more of the n-grams, where the frequencies are absolute or relative to some base-line corpus, such as search results 108 .
- the frequencies may constitute additional keyword characterizations.
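As a minimal sketch of the n-gram spectrum and its frequencies, the Python below assumes the search results have already been reduced to plain text and that tokenization is lowercase whitespace splitting (both assumptions; no tokenizer is specified above). The relative frequency shown divides an n-gram's share of the results by its add-one-smoothed share of a baseline corpus, which is only one way to read "relative to some base-line corpus"; all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

NGram = Tuple[str, ...]


def tokenize(text: str) -> List[str]:
    # Assumed tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


def ngram_spectrum(docs: Iterable[str], max_n: int = 3) -> Counter:
    """Absolute counts of every n-gram with 1 <= n <= max_n across all documents."""
    counts: Counter = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def relative_frequencies(spectrum: Counter, baseline: Counter,
                         smoothing: float = 1.0) -> Dict[NGram, float]:
    """Ratio of each n-gram's share of the results to its smoothed share of the baseline.

    smoothing must be > 0, or n-grams absent from the baseline divide by zero.
    """
    total = sum(spectrum.values())
    base_total = sum(baseline.values())
    vocab_size = len(set(spectrum) | set(baseline))
    rel: Dict[NGram, float] = {}
    for gram, count in spectrum.items():
        p_results = count / total
        p_base = (baseline[gram] + smoothing) / (base_total + smoothing * vocab_size)
        rel[gram] = p_results / p_base
    return rel


results = ["bay area fencing lessons", "fencing lessons for kids"]
spectrum = ngram_spectrum(results, max_n=2)
baseline = ngram_spectrum(["the lessons the kids"], max_n=2)
rel = relative_frequencies(spectrum, baseline)
```

Here both ("fencing",) and ("fencing", "lessons") occur twice in the toy results, and "fencing", which never appears in the toy baseline, gets twice the relative frequency of "lessons".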
- processing the search results 108 may also, or instead, involve extracting noun phrases, proper nouns, or named entities from the documents and/or annotated object descriptions and aggregating them in some way.
- ontologies may be employed to make generalizations over nouns, keywords, or noun phrases associated with keyword 102 .
- Such noun phrases, proper nouns, named entities, and/or aggregations of one or all may also comprise part or all of a keyword characterization.
- processing the search results 108 by the search results characterization process 110 may also involve determining links to and/or from a document.
- a document could be a web page of the collection of documents and/or objects.
- Keyword characterizations extracted by such processing could comprise the links to and/or from a search result page of search results 108 , or the links to and/or from a web page of search results 108 .
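Extracting the links from a result page needs only the page itself; the links to it would require an external link index, which is outside this sketch. The Python below is a minimal, assumed approach that uses the standard library's html.parser to collect anchor href values and resolve them to absolute URLs; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags, resolved against the page's URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def outbound_links(html: str, base_url: str) -> List[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = ('<p><a href="/about">About</a> <a href="https://example.org/x">X</a>'
        '<a name="top">no href</a></p>')
links = outbound_links(page, "https://example.com/page")
```

Relative targets such as "/about" are resolved against the page URL, and anchors without an href are skipped.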
- processing the search results 108 could involve calculating a distance to, or association with, some core set of websites or data resources.
- Such a distance, which may constitute a feature in a keyword characterization, could be the number of link traversals required to get between the search result page of search results 108, or a document and/or object of search results 108, and a core website or data resource.
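Counting link traversals is a shortest-path problem on a directed link graph, so breadth-first search fits naturally. The Python sketch below assumes the links have already been crawled into a hypothetical adjacency dictionary that maps each page to the pages it links to. It returns the fewest traversals from a starting page to any core site, or None if no core site is reachable.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional


def link_distance(graph: Dict[str, List[str]], start: str,
                  core: Iterable[str]) -> Optional[int]:
    """Fewest link traversals from start to any core site, or None if unreachable."""
    targets = set(core)
    if start in targets:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt in targets:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical crawled link graph: result page -> blog -> wiki (a core site).
graph = {"result": ["blog", "forum"], "blog": ["wiki"], "forum": [], "wiki": []}
```

With this graph, the result page is two traversals from the core site "wiki", while "forum" cannot reach it at all.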
- processing the search results 108 may further involve determining a distance metric from a word of keyword 102 to representations of a range of core word senses, the representations in some embodiments extracted from the search results 108 .
- The keyword 102 may be a set of words or a phrase composed of words that support ambiguous interpretations, or may be a word that itself supports ambiguous interpretations and, thus, a plurality of possible characterizations. For example, in “bay area fencing,” the word “fencing” may support an ambiguous interpretation, potentially referring either to the sport of fencing or to the construction material.
- determining a distance from a word of keyword 102 to representations of a range of core word senses may facilitate automatically disambiguating the keyword 102 , and such a distance metric may constitute at least a part of a keyword characterization for use as a proxy for the keyword 102 .
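One assumed way to realize such a distance metric is to represent the keyword's context (for example, words drawn from its search results) and each core word sense as bag-of-words vectors, then take the cosine distance from the context to each sense. Cosine distance is a swapped-in choice rather than one specified above, and the sense vocabularies below are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity; 1.0 when the vectors share no terms (or one is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def sense_distances(context: Counter, senses: Dict[str, Counter]) -> Dict[str, float]:
    return {name: cosine_distance(context, vec) for name, vec in senses.items()}


# Hypothetical sense representations for "fencing".
senses = {
    "sport": Counter("sword foil epee lessons club".split()),
    "material": Counter("fence post wood chain link install".split()),
}
context = Counter("fencing club lessons foil bay area".split())
distances = sense_distances(context, senses)
```

For this toy context, the "sport" sense is closer, which is the kind of signal that could help disambiguate "bay area fencing".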
- processing the search results 108 by search results characterization process 110 may also, or instead, involve determining a document or documents, such as a web page returned by the search engine 104 for keyword 102 (rather than, for example, the pages linked to or on that page).
- processing search results 108 to extract one or more keyword characterizations may involve a number of other calculations/determinations, such as the per month frequency of searches of keyword 102 on search engine 104 .
- the one or more characterizations may be used as proxies for the keyword 102 .
- the keyword characterizations may facilitate keyword behavioral modeling of keyword 102 by the keyword behavioral modeling process 112 of the computer system.
- Keyword behavioral models may include, but are not limited to, models of keyword 102's click-through rate and models of the revenue-generating properties of search ads linked to keyword 102.
- Such a model may include a neural network or a backward propagation system, and the input keyword characterizations may include one binary or real-valued feature for each n-gram in some subset of the n-grams associated with the keyword 102.
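The sketch below shows the binary-feature input, assuming a fixed, hypothetical vocabulary of n-grams serves as the "subset". In place of the neural network mentioned above, it trains a single logistic unit (the smallest such network) by stochastic gradient descent on log loss. The target rates are toy values, not real click-through data.

```python
import math
from typing import List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def binary_features(keyword_ngrams: Set[NGram], vocabulary: Sequence[NGram]) -> List[float]:
    """One binary feature per n-gram in the chosen subset (vocabulary)."""
    return [1.0 if gram in keyword_ngrams else 0.0 for gram in vocabulary]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: List[float], x: List[float]) -> float:
    # The last weight is the bias; zip stops at len(x), so the bias is added separately.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + weights[-1])


def train(xs: List[List[float]], ys: List[float],
          epochs: int = 500, lr: float = 0.5) -> List[float]:
    """Stochastic gradient descent on log loss for a single logistic unit."""
    weights = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = predict(weights, x) - y  # d(log loss)/d(pre-activation)
            for i, xi in enumerate(x):
                weights[i] -= lr * err * xi
            weights[-1] -= lr * err
    return weights


vocabulary = [("cheap",), ("flights",), ("history",)]
xs = [binary_features({("cheap",), ("flights",)}, vocabulary),
      binary_features({("history",)}, vocabulary)]
ys = [0.8, 0.1]  # toy target rates, not real click-through data
weights = train(xs, ys)
```

A multi-layer network trained by backpropagation would consume the same feature vectors; only the model changes.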
- the keyword similarity measurement process 114 of the computer system may also use the keyword characterizations extracted by the search results characterization process 110 for use as proxies for the keyword 102 , computing a measure of keyword similarity for the keyword 102 .
- The n-grams may facilitate the computation of keyword similarity measures by computing the probability of each n-gram, taking the dot product of those probabilities, and weighting each n-gram according to its inverse frequency in some broad corpus.
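The similarity computation just described can be sketched as follows. The sketch assumes each keyword's characterization is an n-gram count spectrum and that a broad corpus supplies document frequencies. The smoothed log inverse-document-frequency weight is one assumed concrete choice of "inverse frequency", and all names and numbers are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Hashable


def probabilities(spectrum: Counter) -> Dict[Hashable, float]:
    total = sum(spectrum.values())
    return {gram: count / total for gram, count in spectrum.items()}


def weighted_similarity(spec_a: Counter, spec_b: Counter,
                        doc_freq: Dict[Hashable, int], corpus_size: int) -> float:
    """IDF-weighted dot product of two n-gram probability vectors."""
    pa, pb = probabilities(spec_a), probabilities(spec_b)
    score = 0.0
    for gram in pa.keys() & pb.keys():  # only shared n-grams contribute
        # Smoothed inverse frequency: an n-gram found in every document gets weight 0.
        idf = math.log((1 + corpus_size) / (1 + doc_freq.get(gram, 0)))
        score += pa[gram] * pb[gram] * idf
    return score


# Hypothetical document frequencies in a 1000-document corpus.
doc_freq = {"the": 1000, "fencing": 5, "lessons": 50, "sword": 10, "fence": 20, "post": 100}
a = Counter({"fencing": 2, "lessons": 1, "the": 3})
b = Counter({"fencing": 1, "sword": 1, "the": 3})
c = Counter({"fence": 1, "post": 1, "the": 3})
```

Because "the" appears in every corpus document, it contributes nothing. As a result, a and c score 0.0, while a and b score above zero on the rarer shared n-gram "fencing".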
- similarity measures may be computed by the keyword similarity measurement process 114 using a Bayesian classifier.
- The Naïve Bayes algorithm, as it is generally used for document classification, may be used by treating the keyword 102 as a document and another keyword, or one of the keyword characterizations, as a category.
- The resulting similarity measure may be, e.g., an asymmetric one.
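The following is a minimal multinomial Naive Bayes sketch of this idea. It assumes the keyword "document" is the list of tokens from its characterization and that each category keyword is represented by token counts from its own search results. It also assumes a uniform prior and add-one (Laplace) smoothing; the example data is hypothetical. Swapping which keyword plays the document and which plays the category generally changes the score, which is one way the measure can be asymmetric.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def log_likelihood(doc: List[str], category: Counter, vocab_size: int,
                   alpha: float = 1.0) -> float:
    """log P(doc | category) under a multinomial model with additive smoothing."""
    total = sum(category.values())
    return sum(math.log((category[t] + alpha) / (total + alpha * vocab_size))
               for t in doc)


def classify(doc: List[str], categories: Dict[str, Counter],
             alpha: float = 1.0) -> Tuple[str, Dict[str, float]]:
    vocab = set(doc)
    for counts in categories.values():
        vocab |= set(counts)
    # Uniform prior assumed, so log P(category) is a shared constant and is dropped.
    scores = {name: log_likelihood(doc, counts, len(vocab), alpha)
              for name, counts in categories.items()}
    return max(scores, key=scores.get), scores


categories = {
    "fencing lessons": Counter("sword foil club lessons coach".split()),
    "fence installation": Counter("fence post wood install contractor".split()),
}
best, scores = classify("foil club coach".split(), categories)
```

The per-category log-likelihoods in scores can serve directly as similarity values between the keyword and each category keyword.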
- Keyword similarity measures may be useful for classifying keywords, finding keywords that may be relevant to a merchant, and finding keywords that may be relevant to some other keyword. Techniques of these and similar embodiments may be used for keywords having no, or limited, other data associated with them (that is, data other than the extracted keyword characterization(s)). For example, a keyword 102 might not be associated with any click-through data. Thus, keywords relevant to a given topic may be produced in accordance with a generate-and-test methodology.
- the keyword characterizations may be used to filter a plurality of other generated keywords by the computer system.
- a method of keyword generation may produce a larger number of results than desired, and the keyword characterizations may be used to produce a subset of the generated keywords, such as a subset determined to be more optimal for a given merchant.
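As an assumed illustration of such filtering, the sketch below keeps generated keywords whose n-gram characterization is cosine-similar to a merchant's seed characterization at or above a threshold. It returns at most `limit` of them, most similar first. Cosine similarity, the threshold, and the limit are illustrative choices rather than values taken from the text.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 0.0 if norm == 0 else dot / norm


def filter_keywords(candidates: Dict[str, Counter], seed: Counter,
                    threshold: float, limit: int) -> List[str]:
    """Generated keywords at or above the similarity threshold, most similar first."""
    scored = [(cosine(spec, seed), kw) for kw, spec in candidates.items()]
    kept = sorted((pair for pair in scored if pair[0] >= threshold),
                  key=lambda pair: (-pair[0], pair[1]))
    return [kw for _, kw in kept[:limit]]


# Hypothetical merchant seed and generated candidates with their characterizations.
seed = Counter("fencing foil lessons club".split())
candidates = {
    "fencing club": Counter("fencing club foil".split()),
    "foil wrap": Counter("aluminum foil kitchen".split()),
    "fence repair": Counter("fence post repair".split()),
}
```

Raising the threshold trims weak matches such as "foil wrap", which shares only one ambiguous word with the seed.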
- the keyword characterizations may be used to filter the results from other methods of keyword generation in other contexts.
- the keyword characterizations may be used to find one or more advertisements for keyword 102 .
- a keyword 102 when searched upon in a search engine 104 , might not return any advertisements.
- Keyword characterizations for use as proxies for that keyword 102, such as the distance metric to a related keyword mentioned above, may be used to find the keyword most similar to keyword 102, so that an ad can be associated with keyword 102.
- the search engine 104 may be adapted to find advertisements for keyword 102 only if the keyword most similar to the keyword 102 reaches some predetermined threshold of keyword similarity.
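This fallback can be sketched as a nearest-neighbor lookup restricted to keywords that do have ads. It returns nothing unless the best match clears the similarity threshold. The Python below assumes cosine similarity over n-gram characterizations and a hypothetical in-memory ad inventory; neither is specified above.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 0.0 if norm == 0 else dot / norm


def find_ads(query: Counter, inventory: Dict[str, List[str]],
             spectra: Dict[str, Counter], threshold: float) -> List[str]:
    """Ads of the most similar keyword that has ads, if it meets the threshold."""
    best_kw: Optional[str] = None
    best_score = -1.0
    for kw, ads in inventory.items():
        if not ads or kw not in spectra:
            continue  # only keywords with ads and a characterization are candidates
        score = cosine(query, spectra[kw])
        if score > best_score:
            best_kw, best_score = kw, score
    if best_kw is None or best_score < threshold:
        return []
    return inventory[best_kw]


# Hypothetical ad inventory and keyword characterizations.
inventory = {
    "fencing lessons": ["Bay Area fencing club ad"],
    "fence installation": ["Cedar fence builder ad"],
    "kayak rental": [],
}
spectra = {
    "fencing lessons": Counter("fencing foil club lessons".split()),
    "fence installation": Counter("fence post cedar install".split()),
    "kayak rental": Counter("kayak paddle rental".split()),
}
query = Counter("bay area fencing foil club".split())
```

With a threshold of 0.5, the query borrows the "fencing lessons" ad. With a threshold of 0.9, no keyword is similar enough, and no ad is returned.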
- the advertisements found may be relevant to a domain name. This may take the form of advertisements one might wish to surface on an otherwise empty website (including, for example, when only the domain name is known).
- Keyword characterization may be used in advertising contexts other than ads in search engines. For example, based on a transcript of a television show, ads may appear in the margins of the television screen. Such other advertising contexts may include print, radio, television, etc.
- use in various advertising contexts may include samples of text associated with each data object, including, for example, the script of an ad, reviews or an abstract of a television show, and so forth.
- FIG. 2 illustrates a flow chart view of selected operations of the methods of various embodiments of the present invention, to extract keyword characterizations from documents and/or objects determined to be potentially relevant to the keyword.
- a computer system may receive and/or generate a keyword, input the keyword into a search engine which may search based upon the keyword, and receive as search results a collection of documents and/or objects, blocks 202 - 206 .
- a computer system performing some or all of the operations illustrated by FIG. 2 may generate a keyword itself, or may receive a keyword generated by another computer system.
- the keyword may be a word, a set of words, or a phrase that is used by a consumer to search for a specific product or service, and is thus of interest to merchants of that product or service.
- The keyword may be a keyword characterization previously extracted by the computer system, block 210.
- the computer system may then input the keyword into a search engine, block 202 , and search based upon the keyword, block 204 .
- the search engine may search a keyword relational database, an electronic information corpus, or the World Wide Web. Based upon the search, the search engine may receive a collection of documents and/or objects, block 206 , which may comprise web pages, documents from an electronic information corpus, and/or data objects, such as audio and video files, that are determined to be potentially relevant to the keyword.
- the computer system may process the collection of documents and/or objects, extracting one or more keyword characterizations for use as proxies for the keyword, blocks 208 - 210 .
- the processing of the collection of documents and/or objects, block 208 may comprise at least one of: generating a spectrum of n-grams; extracting and aggregating noun phrases, proper nouns, and/or named entities; determining links to and/or from a document of the collection of documents and/or objects; calculating a distance from a document of the collection of documents and/or objects to a set of websites or data resources; determining a distance from the keyword to a range of core word senses; and determining a web page of the collection of documents and/or objects.
- the keyword characterizations extracted, block 210 by processing the collection of documents or objects may include n-grams, aggregations of noun phrases, proper nouns, or named entities, links, distance metrics, and web pages, all described in greater detail above.
- the computer system may optionally utilize the keyword characterizations in one or more of the following operations: computing a similarity measure, facilitating behavioral modeling, filtering keywords, and finding advertisements, blocks 212 - 218 .
- Computations of keyword similarity measurement, block 212, may involve, for example, taking a dot product of the spectrum of n-grams (where the keyword characterizations are a spectrum of n-grams) and weighting each n-gram based on an inverse frequency of that n-gram.
- computations of keyword similarity measurement, block 212 may involve Bayesian classification methods, discussed in greater detail above.
- Facilitating keyword behavioral modeling, block 214, may involve inputting the keyword characterizations into models of keyword click-through and revenue-generating properties, and/or may involve neural networks and/or backward propagation systems. Also, the keyword characterizations may be used to filter a plurality of generated keywords, block 216, where more keywords have been generated than desired. Further, the keyword characterizations may be used to find advertisements where the keyword has no advertisements associated with it, block 218. Such advertisements may be advertisements that are relevant to a domain name.
- the computer system may determine if more keywords have been received or generated, block 220 . If more keywords have been generated or received, blocks 202 - 220 may be repeated.
- FIG. 3 illustrates a block diagram depicting a method of processing web page results comprising a collection of documents and/or objects to extract one or more keyword characterizations for use as proxies for the keyword, in accordance with various embodiments.
- a keyword search process 302 may generate a number of web page results 304 .
- An n-gram spectrum generation process 306 may then accept the web page results 304 as input and generate pluralities of unigrams 308 , bigrams 310 , and trigrams 312 for use to characterize the keyword that was input to the keyword search process 302 .
- the keyword search process 302 may receive one or more keywords as input to a search engine which may search a database, electronic corpus, or the World Wide Web to obtain web page results 304 .
- Such a keyword search process 302 is described in greater detail above with reference to keyword 102, search engine 104, and keyword relational database 106 of FIG. 1.
- Web page results 304 are also discussed in greater detail above with reference to search results 108 of FIG. 1; like search results 108, web page results 304 may comprise a collection of documents and/or objects.
- The web page results 304 may then be input to the n-gram spectrum generation process 306.
- The n-grams may be generated by obtaining the web page results 304 and extracting from them one or more sequences of n contiguous words found within the web pages returned by the search.
- Unigrams may be individual words; bigrams may be pairs of adjacent words; trigrams may be runs of three adjacent words; and so on.
- This type of characterization of a keyword may result in a spectrum of n-grams, where n is typically a small positive integer.
- For example, the words aaa, bbb, ccc, ddd, and eee may be those contained in a hypothetical set of web page results 304 returned by a search engine in response to a particular keyword.
- In that case, the unigrams 308 may be individual listings of those words, the bigrams 310 may be pairs of adjacent words, and the trigrams 312 may be groups of three contiguous words.
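Using the hypothetical words above, the following Python sketch extracts unigrams, bigrams, and trigrams from the text of the web page results. It assumes the pages have already been reduced to plain text and that the words appear in the order aaa bbb ccc ddd eee. Real pages would also need HTML stripping, punctuation handling, and case folding, none of which the source specifies. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def ngrams(words: List[str], n: int) -> List[str]:
    """Every run of n contiguous words, joined by single spaces, in order."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


# Hypothetical stand-in for n-gram spectrum generation process 306.
def ngram_spectrum(pages: Iterable[str], max_n: int = 3) -> Dict[int, Counter]:
    """Count unigrams up to max_n-grams across the text of several result pages."""
    spectrum: Dict[int, Counter] = {n: Counter() for n in range(1, max_n + 1)}
    for page in pages:
        words = page.split()  # naive whitespace tokenization (assumption)
        for n, counts in spectrum.items():
            counts.update(ngrams(words, n))  # n-grams never span two pages
    return spectrum


spectrum = ngram_spectrum(["aaa bbb ccc ddd eee"])
print(list(spectrum[2]))  # ['aaa bbb', 'bbb ccc', 'ccc ddd', 'ddd eee']
print(list(spectrum[3]))  # ['aaa bbb ccc', 'bbb ccc ddd', 'ccc ddd eee']
```

Each page is tokenized separately so that no n-gram joins the last words of one page to the first words of the next.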
- Statistics about the n-grams may also be maintained, such as their frequencies, either absolute or relative to some baseline corpus.
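Absolute frequencies are simply the counts produced by n-gram extraction. For the relative case, the source does not define a specific measure. One possible choice, assumed here, divides an n-gram's share of the result text by its share of a baseline corpus, with optional add-k smoothing for n-grams that never appear in the baseline. The function name and sample counts are hypothetical.

```python
from collections import Counter
from typing import Dict, Mapping


# Hypothetical helper; "relative to a baseline corpus" is interpreted as a share ratio.
def relative_frequencies(counts: Mapping[str, int],
                         baseline: Mapping[str, int],
                         smoothing: float = 1.0) -> Dict[str, float]:
    """Ratio of each n-gram's share of the results to its share of a baseline corpus.

    A value above 1.0 means the n-gram is over-represented in the search results
    compared with the baseline. With smoothing > 0, n-grams that never occur in
    the baseline do not cause a division by zero.
    """
    total = sum(counts.values())
    vocabulary = set(counts) | set(baseline)
    baseline_total = sum(baseline.values()) + smoothing * len(vocabulary)
    ratios: Dict[str, float] = {}
    for ngram, count in counts.items():
        result_share = count / total  # absolute count -> share of the results
        baseline_share = (baseline.get(ngram, 0) + smoothing) / baseline_total
        ratios[ngram] = result_share / baseline_share
    return ratios


results = Counter({"cheap flights": 2, "the": 2})
corpus = Counter({"cheap flights": 1, "the": 3})
print(relative_frequencies(results, corpus, smoothing=0.0))
# {'cheap flights': 2.0, 'the': 0.6666666666666666}
```

Here "cheap flights" is twice as common in the results as in the baseline, while the common word "the" is under-represented.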
- FIG. 4 is a block diagram illustrating an example computing device suitable for practicing the present invention, in accordance with various embodiments.
- As shown, computing system/device 400 includes one or more processors 402 and system memory 404.
- Additionally, computing system/device 400 includes mass storage devices 406 (such as diskette, hard drive, CD-ROM, and so forth), input/output devices 408 (such as keyboard, cursor control, and so forth), and communication interfaces 410 (such as network interface cards, modems, and so forth).
- The elements are coupled to each other via system bus 412, which represents one or more buses. In the case of multiple buses, they are bridged by one or more bus bridges (not shown).
- System memory 404 and mass storage 406 may be employed to store a working copy and a permanent copy of the programming instructions implementing selected ones or all of the various components of embodiments of the present invention, such as the processes illustrated by FIG. 1, herein collectively denoted as 422.
- The various components may be implemented by assembler instructions supported by processor(s) 402 or by high-level languages, such as C, that can be compiled into such instructions.
- The permanent copy of the programming instructions may be placed into mass storage 406 in the factory or in the field, through, for example, a distribution medium (not shown) or through communication interface 410 (from a distribution server (not shown)).
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- General Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Economics (AREA)
- Game Theory and Decision Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (31)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/452,709 US7519588B2 (en) | 2005-06-20 | 2006-06-13 | Keyword characterization and application |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US69242105P | 2005-06-20 | 2005-06-20 | |
US75533305P | 2005-12-29 | 2005-12-29 | |
US11/452,709 US7519588B2 (en) | 2005-06-20 | 2006-06-13 | Keyword characterization and application |
Publications (2)
Publication Number | Publication Date |
---|---|
US20060287988A1 (en) | 2006-12-21 |
US7519588B2 (en) | 2009-04-14 |
Family
ID: 37574594
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/452,709 (US7519588B2; Active - Reinstated, expires 2027-04-12) | 2005-06-20 | 2006-06-13 | Keyword characterization and application |
Country Status (1)
Country | Link |
---|---|
US (1) | US7519588B2 (en) |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7680760B2 (en) * | 2005-10-28 | 2010-03-16 | Yahoo! Inc. | System and method for labeling a document |
US7627559B2 (en) * | 2005-12-15 | 2009-12-01 | Microsoft Corporation | Context-based key phrase discovery and similarity measurement utilizing search engine query logs |
US7716229B1 (en) * | 2006-03-31 | 2010-05-11 | Microsoft Corporation | Generating misspells from query log context usage |
US7620551B2 (en) * | 2006-07-20 | 2009-11-17 | Mspot, Inc. | Method and apparatus for providing search capability and targeted advertising for audio, image, and video content over the internet |
US20080039966A1 (en) * | 2006-08-09 | 2008-02-14 | Newsnet Ltd | System and method for rediffusion of audio data |
US8156112B2 (en) * | 2006-11-07 | 2012-04-10 | At&T Intellectual Property I, L.P. | Determining sort order by distance |
CA2674294C (en) * | 2006-12-29 | 2017-03-07 | Thomson Reuters Global Resources | Information-retrieval systems, methods, and software with concept-based searching and ranking |
US8041662B2 (en) * | 2007-08-10 | 2011-10-18 | Microsoft Corporation | Domain name geometrical classification using character-based n-grams |
US8005782B2 (en) * | 2007-08-10 | 2011-08-23 | Microsoft Corporation | Domain name statistical classification using character-based N-grams |
US8793265B2 (en) * | 2007-09-12 | 2014-07-29 | Samsung Electronics Co., Ltd. | Method and system for selecting personalized search engines for accessing information |
US7853597B2 (en) * | 2008-04-28 | 2010-12-14 | Microsoft Corporation | Product line extraction |
US8290946B2 (en) * | 2008-06-24 | 2012-10-16 | Microsoft Corporation | Consistent phrase relevance measures |
FR2935498B1 (en) * | 2008-08-27 | 2010-10-15 | Eads Europ Aeronautic Defence | METHOD FOR IDENTIFYING AN OBJECT IN A VIDEO ARCHIVE |
US8234274B2 (en) * | 2008-12-18 | 2012-07-31 | Nec Laboratories America, Inc. | Systems and methods for characterizing linked documents using a latent topic model |
US20100161406A1 (en) * | 2008-12-23 | 2010-06-24 | Motorola, Inc. | Method and Apparatus for Managing Classes and Keywords and for Retrieving Advertisements |
US8849649B2 (en) * | 2009-12-24 | 2014-09-30 | Metavana, Inc. | System and method for determining sentiment expressed in documents |
US9201863B2 (en) * | 2009-12-24 | 2015-12-01 | Woodwire, Inc. | Sentiment analysis from social media content |
US8417650B2 (en) * | 2010-01-27 | 2013-04-09 | Microsoft Corporation | Event prediction in dynamic environments |
EP2635965A4 (en) | 2010-11-05 | 2016-08-10 | Rakuten Inc | Systems and methods regarding keyword extraction |
US10409873B2 (en) * | 2014-11-26 | 2019-09-10 | Facebook, Inc. | Searching for content by key-authors on online social networks |
US10380195B1 (en) * | 2017-01-13 | 2019-08-13 | Parallels International Gmbh | Grouping documents by content similarity |
CN108304365A (en) * | 2017-02-23 | 2018-07-20 | 腾讯科技(深圳)有限公司 | keyword extracting method and device |
US11222057B2 (en) * | 2019-08-07 | 2022-01-11 | International Business Machines Corporation | Methods and systems for generating descriptions utilizing extracted entity descriptors |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001003041A1 (en) | 1999-07-02 | 2001-01-11 | Aguayo Erwin Jr | System and method for short notice advertising placement |
US6470307B1 (en) * | 1997-06-23 | 2002-10-22 | National Research Council Of Canada | Method and apparatus for automatically identifying keywords within a document |
US20030220866A1 (en) | 2001-12-28 | 2003-11-27 | Findwhat.Com | System and method for pay for performand advertising in general media |
US6778979B2 (en) * | 2001-08-13 | 2004-08-17 | Xerox Corporation | System for automatically generating queries |
US20040225562A1 (en) | 2003-05-09 | 2004-11-11 | Aquantive, Inc. | Method of maximizing revenue from performance-based internet advertising agreements |
US20050144068A1 (en) | 2003-12-19 | 2005-06-30 | Palo Alto Research Center Incorporated | Secondary market for keyword advertising |
US20050216516A1 (en) | 2000-05-02 | 2005-09-29 | Textwise Llc | Advertisement placement method and system using semantic analysis |
US7124129B2 (en) * | 1998-03-03 | 2006-10-17 | A9.Com, Inc. | Identifying the items most relevant to a current query based on items selected in connection with similar queries |
US7321892B2 (en) * | 2005-08-11 | 2008-01-22 | Amazon Technologies, Inc. | Identifying alternative spellings of search strings by analyzing self-corrective searching behaviors of users |
- 2006-06-13: US application US11/452,709 filed (patent US7519588B2); status: Active - Reinstated
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040172319A1 (en) * | 1997-01-06 | 2004-09-02 | Eder Jeff Scott | Value chain system |
US20080228720A1 (en) * | 2007-03-14 | 2008-09-18 | Yahoo! Inc. | Implicit name searching |
US7917489B2 (en) * | 2007-03-14 | 2011-03-29 | Yahoo! Inc. | Implicit name searching |
US9128981B1 (en) | 2008-07-29 | 2015-09-08 | James L. Geer | Phone assisted ‘photographic memory’ |
US9792361B1 (en) | 2008-07-29 | 2017-10-17 | James L. Geer | Photographic memory |
US11086929B1 (en) | 2008-07-29 | 2021-08-10 | Mimzi LLC | Photographic memory |
US11308156B1 (en) | 2008-07-29 | 2022-04-19 | Mimzi, Llc | Photographic memory |
US11782975B1 (en) | 2008-07-29 | 2023-10-10 | Mimzi, Llc | Photographic memory |
US20130060761A1 (en) * | 2011-09-02 | 2013-03-07 | Microsoft Corporation | Using domain intent to provide more search results that correspond to a domain |
US8504561B2 (en) * | 2011-09-02 | 2013-08-06 | Microsoft Corporation | Using domain intent to provide more search results that correspond to a domain |
US10445376B2 (en) | 2015-09-11 | 2019-10-15 | Microsoft Technology Licensing, Llc | Rewriting keyword information using search engine results |
US10990630B2 (en) | 2018-02-27 | 2021-04-27 | International Business Machines Corporation | Generating search results based on non-linguistic tokens |
Also Published As
Publication number | Publication date |
---|---|
US20060287988A1 (en) | 2006-12-21 |
Similar Documents
Publication | Title |
---|---|
US7519588B2 (en) | Keyword characterization and application |
KR101721338B1 (en) | Search engine and implementation method thereof |
US9135308B2 (en) | Topic relevant abbreviations |
CN109885773B (en) | Personalized article recommendation method, system, medium and equipment |
US9262532B2 (en) | Ranking entity facets using user-click feedback |
US20090248661A1 (en) | Identifying relevant information sources from user activity |
Pu et al. | Subject categorization of query terms for exploring Web users' search interests |
US9262509B2 (en) | Method and system for semantic distance measurement |
US8782037B1 (en) | System and method for mark-up language document rank analysis |
CN107784092A (en) | A kind of method, server and computer-readable medium for recommending hot word |
US7849081B1 (en) | Document analyzer and metadata generation and use |
US10755179B2 (en) | Methods and apparatus for identifying concepts corresponding to input information |
US20120317088A1 (en) | Associating Search Queries and Entities |
US20070136256A1 (en) | Method and apparatus for representing text using search engine, document collection, and hierarchal taxonomy |
US20070214133A1 (en) | Methods for filtering data and filling in missing data using nonlinear inference |
US20060155751A1 (en) | System and method for document analysis, processing and information extraction |
US20110125791A1 (en) | Query classification using search result tag ratios |
JP2013516022A (en) | Cluster and present search suggestions |
WO2010014082A1 (en) | Method and apparatus for relating datasets by using semantic vectors and keyword analyses |
Lubis et al. | A framework of utilizing big data of social media to find out the habits of users using keyword |
KR100954842B1 (en) | Method and System of classifying web page using category tag information and Recording medium using by the same |
JP5427694B2 (en) | Related content presentation apparatus and program |
CN107665442B (en) | Method and device for acquiring target user |
Han et al. | Folksonomy-based ontological user interest profile modeling and its application in personalized search |
Preetha et al. | Personalized search engines on mining user preferences using clickthrough data |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: EFFICIENT FRONTIER, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MASON, ZACHARY;REEL/FRAME:017974/0719. Effective date: 20060607 |
AS | Assignment | Owner name: ADOBE SYSTEMS INCORPORATED, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:EFFICIENT FRONTIER, INC.;REEL/FRAME:027702/0156. Effective date: 20120210 |
REMI | Maintenance fee reminder mailed | |
FEPP | Fee payment procedure | Free format text: PETITION RELATED TO MAINTENANCE FEES FILED (ORIGINAL EVENT CODE: PMFP); PETITION RELATED TO MAINTENANCE FEES GRANTED (ORIGINAL EVENT CODE: PMFG); PAT HOLDER NO LONGER CLAIMS SMALL ENTITY STATUS, ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: STOL). Entity status of patent owner: LARGE ENTITY |
LAPS | Lapse for failure to pay maintenance fees | |
REIN | Reinstatement after maintenance fee payment confirmed | |
FP | Lapsed due to failure to pay maintenance fee | Effective date: 20130414 |
PRDP | Patent reinstated due to the acceptance of a late maintenance fee | Effective date: 20130719 |
FPAY | Fee payment | Year of fee payment: 4 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
SULP | Surcharge for late payment | |
FPAY | Fee payment | Year of fee payment: 8 |
AS | Assignment | Owner name: ADOBE INC., CALIFORNIA. Free format text: CHANGE OF NAME;ASSIGNOR:ADOBE SYSTEMS INCORPORATED;REEL/FRAME:048525/0042. Effective date: 20181008 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553). Entity status of patent owner: LARGE ENTITY. Year of fee payment: 12 |