US20200265491A1 - Dynamic determination of data facets - Google Patents

Dynamic determination of data facets

Info

Publication number
US20200265491A1
Authority
US
United States
Prior art keywords
fields
subset
items
dataset
product categories
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US16/738,746
Inventor
Jonathan H. Young
Sid Probstein
Rik Tamm-Daniels
William K. Johnson, III
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ServiceNow Inc
Original Assignee
ServiceNow Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ServiceNow Inc
Priority to US16/738,746
Publication of US20200265491A1
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2428 Query predicate definition using graphical user interfaces, including menus and forms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/45 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0603 Catalogue ordering

Definitions

  • the invention relates generally to techniques for data search and retrieval, and more specifically, to the determination of parameters for classification, organization, presentation and retrieval of data.
  • While facet-based searching provides a significant improvement over conventional query/result methods, it is not without its drawbacks.
  • current techniques for implementing facet-based search require a significant amount of work to determine the facets long before a website is implemented.
  • Embodiments of the invention provide methods and technical implementations of systems for gathering and assessing large amounts of data to identify data facets that can be used to classify data and help users narrow search queries. Assessing the data may include reviewing structured and/or semi-structured data that is typically tagged with a property, as well as reviewing query logs (both submitted queries and query results). Probabilistic techniques are preferably used to select “good” facets (e.g., those that segment the data into a well-distributed set of groups) while maintaining a balance between the number of facets and the number of strata within each facet.
  • the facets may include discrete values, continuous numeric values (either evenly distributed or skewed) and/or hierarchical values
  • embodiments of the invention also facilitate determining optimal ranges and groupings of the facets.
  • White lists and black lists may also be used to ensure that a particular field is either used or avoided.
  • a method for dynamically determining data facets includes receiving a dataset of information that is organized into a plurality of fields (which may be structured, semi-structured, and/or unstructured). Each field has values associated with it for each information element, and the information is analyzed to determine distribution statistics for the fields. Based on the statistics, fields are selected as data facets that may be used to categorize the dataset and facilitate execution of search queries against the dataset. For example, the facets can improve the user experience by being presented as links labeled with a specific term to limit (or refine) the search, or a link (labeled, e.g., “(remove Price restriction)”) that broadens the search parameters.
  • query logs generated in response to queries submitted against the dataset may be incorporated into the analysis such that the distribution statistics reflect these previous queries.
  • a subset of the dataset may be identified and used to represent the dataset as a whole, in which case the analysis is limited to the information contained in the subset.
  • the information may be stored in a document repository, database, search repository or other form of physical and/or virtual storage.
  • the method may also include the processing of a search query, in which information is retrieved from the dataset based on a correlation between components of the search query and the data facets.
  • the facets may then be ranked. The ranking may be based on the distribution statistics, query logs, or other attributes of the fields and used to present information from the dataset such that the information is grouped and ordered by the ranked facets.
  • the data facets may be incorporated into source code (e.g., XML, HTML or other structured markup language) of an application, and the presentation of that application then changes based on the newly identified or modified data facets.
  • the data facets may also include item groupings, which may be other facets or, in some cases, sub-categories. The groupings may depend, for example, on a statistical distribution of documents and/or data. The groupings may be linear (e.g., of equal range), logarithmic, or, in some cases, based on data clusters.
  • a system for dynamically determining data facets includes a data repository for storing information to be searched, wherein the information is organized into fields and the fields have associated values.
  • the system also includes a facet recommendation engine for (i) analyzing the information to determine distribution statistics for the fields, and (ii) based on the statistics, selecting fields as data facets to be used to categorize the dataset.
  • the instructions implement and/or perform the methods described in the preceding paragraphs.
  • a computer-readable medium such as, but not limited to, a floppy disk, a hard disk, an optical disk, a magnetic tape, a PROM, an EPROM, CD-ROM, or DVD-ROM or downloaded from a server.
  • the functionality of the techniques may be embedded on the computer-readable medium in any number of computer-readable instructions, or languages such as, for example, FORTRAN, PASCAL, C, C++, Java, PERL, C#, Tcl, BASIC and assembly language.
  • the computer-readable instructions may, for example, be written in a script, macro, or functionally embedded in commercially available software (such as, e.g., EXCEL or VISUAL BASIC).
  • FIG. 1 is a screen-shot from a web-based storefront illustrating the use of data facets to group products, which may be implemented using various embodiments of the invention.
  • FIG. 2 is a flow chart depicting a process for automatically discovering data facets in accordance with an embodiment of the invention.
  • FIG. 3 illustrates an example of an XML listing of facets and facet values.
  • FIG. 4 schematically depicts a system for automatically discovering data facets in accordance with an embodiment of the invention.
  • a web page 100 presents a collection of laptop computers available at a retail store, organized by five data facets 110.
  • the available laptops are grouped according to processor type, screen size, proposed use, price and manufacturer, such that visitors to the webpage can narrow their searches for a particular product meeting certain criteria.
  • each data facet includes not only values (e.g., groupings) 120 for each facet 110, but also the number of items 130 that meet that particular criterion. Visitors select a grouping 120 based on individual preferences, and the selection is sent to an underlying search engine, which applies the values as filter criteria in a subsequent query that returns a list of products matching the criteria.
  • embodiments of the invention facilitate the analysis of a set of documents and/or data records (from a database, search engine index, and/or any other document/data repository, collectively referred to as a “dataset”) and the automated selection and recommendation of a set of fields to be used as data facets.
  • the technique can be implemented in two different ways—either offline, where the dataset is analyzed but the results do not immediately affect the application supported by the dataset, or online, where the results automatically influence the presentation of data facets in real time.
  • a facet-finder recommendation engine provides the data fields to a website configuration tool, thereby creating an updated user interface that presents the products and categories to users.
  • a feedback loop provides query results and usage statistics based on previously selected facets as input into the analysis step for continuous improvement.
  • FIG. 2 illustrates a process 200 for determining optimal or near-optimal facets based on an underlying dataset.
  • a query or set of queries is identified (STEP 210) that represents a particular user group or selection criteria to narrow the set of documents being identified.
  • a subset of the dataset may be identified and used, for purposes of facet identification, as a representation of the entire dataset.
  • Certain documents of the dataset may, for example, be identified as more important (based on date submitted, length, author, frequency of use, user feedback, etc.) and used either exclusively during the analysis step, or weighted such that they have a greater influence on term distributions and frequencies. In such cases, the processing resources needed to analyze the dataset and determine appropriate facets may be significantly reduced. In some embodiments, however, this step may be skipped, and the entire dataset may be used.
  • the dataset and/or its constituent documents may be structured (e.g., pulled from a relational database), semi-structured (fielded with values), or unstructured.
  • Representative examples include a music catalog in which songs and albums have attributes such as artist, title, length, genre, and release date; a recipe collection in which recipes are associated with a type of cuisine, main ingredients, cooking style and/or a holiday; travel information that may be organized by destinations, prices, and include articles by authors or publications; regulatory documents that include product and part codes, machine types, expiration dates, filing dates and submission data; and images that may be tagged with the name of an artist, date, style, type of image, artistic movement, major colors, theme, etc.
  • a facet recommendation engine analyzes the data (or, in some cases, a defined subset of the data) and computes term frequencies and/or distributions (STEP 230) to determine potential facets that may be used to cluster the data or documents into meaningful classifications. Based on the computed frequencies, one or more fields or data values are selected as data facets.
  • a field is a good candidate for use as a facet if a large percentage (e.g., >95%) of the documents include a value for the field, since using fields with a lower population percentage can result in a significant amount of data being ignored, and therefore not included in the search results.
  • the field should contain a relatively small number of terms (as compared to the total number of documents or records being indexed). As an example, a field having between 10 and 50 values for a dataset containing 1000-5000 items is a good candidate for a data facet.
  • the values in the dataset need not have been distributed evenly or according to a predictable pattern, and in fact if a small number of the values represent a high percentage of the items, the field may be an even stronger candidate for use as a facet. Furthermore, it is preferable for the fields used as facets to be mutually exclusive—e.g., they do not overlap and are not correlated with each other—and the values of one field should not be discernible or predictable from values in other fields.
  • date fields and numeric fields are good candidates for facets.
  • fields that contain highly unique numbers (e.g., SKUs, ISBNs, SSNs, etc.)
  • the system may, in some cases, identify and reject those facets that represent ID-type data having only one or two items in each (or some high percentage, e.g., >95%) of the groupings.
  • the grouping of numeric values need not be linear. In some cases, especially with products exhibiting high price variability, other classifications may be more beneficial. For example, when presenting memorabilia in an online auction, the prices may range from a few dollars for common merchandise (e.g., pins or hats that were produced in great numbers) to tens of thousands for one-of-a-kind, autographed, limited-edition paraphernalia (e.g., signed artwork, mint-condition coins, etc.). In these cases, the ranges may be logarithmic (or some combination of linear and logarithmic) such that the price facet is presented as $0-$10; $10-$100; $100-$1,000; $1,000-$5,000; and >$5,000.
  • numeric ranges may be logarithmic (e.g., 1-10, 10-100, and 100-1000) or linear
  • other implementations compute ranges by dividing the actual population of data values into “bins” of equal (or approximately equal) numbers of items. This approach is especially useful when values “clump” together.
  • Using this method allows for the detection of outliers by statistical testing or simply by detecting empty bins adjacent to the top and/or bottom bin, where the extreme bin is smaller than a configurable percentage (e.g., 5%) of the total values.
  • bins corresponding to “less than 500”, “500-510”, ..., “590-600”, and “greater than 600” may be created.
  • each field may be scored and/or ranked according to the number of values in the field, the distribution of the values across the entire dataset, the frequency the field is included in a search, etc.
  • recommended facets can be determined for an entire repository and the ranked fields then displayed to a system designer or programmer using a facet-recommendation user interface, thus facilitating the selection of facets from the set of recommended facets.
  • the user interface also allows the designer to reorder facets if, for example, the ordering and/or placement is deemed to be important.
  • the facet recommendation process may also be performed for a single query (or set of queries) such that the presentation of the search interface includes only those facets deemed highly relevant to the current (or fairly recent) search log.
  • the fields may be dynamically ranked using the dataset as a reference.
  • a subset of the dataset e.g., the 500 most requested documents, the 100 most recently added documents, etc.
  • the recommended facets are then supplied to a web-design application using, for example, XML format (or other markup language) with each facet being represented using a unique tag.
  • web forms may be created automatically, using the dynamically created facets as categories for documents as presented to the users (STEP 250).
  • the facets may be determined based on an analysis of the current query (by, for example, parsing the query string into component terms and searching the data store for the terms) or recent queries.
  • An example of an XML listing of facets and facet values is illustrated in FIG. 3.
  • query logs may be included in the analysis to capture user interactions with the dataset. For example, if users continually submit queries based on processor speed or operating system (which may not be immediately apparent as important distinguishing factors), these fields may be added as facets, even though the initial analysis indicated they would be poor facets. As a result, fields that otherwise would be overlooked can become important data facets for subsequent searching and retrieval. In this manner, a feedback loop may be used to capture the ongoing performance of the current facet set.
  • Facet performance may be measured based on, for example, the frequency with which queries are submitted using the facet, the percentage of total queries using the facet, the percentage of queries submitted using the facet that are not immediately followed by other queries (i.e., it is likely that the result included the document or product the user was looking for) as well as other factors. Facets may then be added or removed based on the feedback.
  • one or more subsets of query logs may also be identified and used to select (or help select) data facets. For example, more recent queries (e.g., those submitted in the past week) may provide greater insight into current search trends, and therefore be used exclusively. In other cases, statistical samplings may be used from different time periods, days of the week, seasons, etc. to obtain an accurate representation of how users interact with the dataset. A large set of search queries submitted in the weeks leading up to Christmas may be selected to identify data facets relating to toys, for example.
  • a system 410 for implementing the techniques described above includes a facet recommendation engine 415 and a data-storage module 420.
  • the system includes an interface-generation module.
  • the facet recommendation engine 415 provides the application processing component for determining desirable data facets as described above.
  • the facet recommendation engine 415 includes programming instructions for evaluating large amounts of data and documents, calculating field and value distributions and ranges and recommending which fields to use as data facets.
  • the engine is preferably implemented on one or more server class computers that have sufficient memory, data storage, and processing power and that run a server class operating system (e.g.
  • the server may be part of a server farm or server network, which is a logical group of one or more servers.
  • application software can be implemented in components, with different components running on different server computers, on the same server, or some combination.
  • the data-storage module 420 stores the data and/or documents being analyzed by the facet recommendation engine 415 and subsequently searched.
  • the data repository may store information relating to products, documents, people, and/or transactions against which users submit search queries. Examples of databases that may be used to implement this functionality include the MySQL Database Server by Sun Microsystems, the PostgreSQL Database Server by the PostgreSQL Global Development Group of Berkeley, Calif., and the ORACLE Database Server offered by ORACLE Corp. of Redwood Shores, Calif.
  • an interface-generation module 430 generates the structured, tagged source code for integration into the application(s) operating on an application server 440.
  • one or more clients 460 may be used to access the application server 440 via a web server 450.
  • Such implementations may include a design interface for providing the recommended facets to a web design application for implementing the recommendations.
  • the clients 460 are preferably implemented using software running on a personal or professional grade computer workstation (e.g., a PC with an INTEL processor or an APPLE MACINTOSH) capable of running such operating systems as the MICROSOFT WINDOWS family of operating systems from Microsoft Corporation of Redmond, Wash., the MACINTOSH OSX operating system from Apple Computer of Cupertino, Calif., and various varieties of Unix, such as SUN SOLARIS from SUN MICROSYSTEMS, and GNU/Linux from RED HAT, INC. of Durham, N.C. (and others).
  • the client 460 can also be implemented on such hardware as a smart or dumb terminal, network computer, wireless device, personal data assistant, information appliance, workstation, minicomputer, mainframe computer, or other computing device, that is operated as a general purpose computer or a special purpose hardware device solely used for serving as a client in the system.
  • the client 460 may include client interface software for facilitating the review and selection of data facets as determined by the facet recommendation engine 415, and may be implemented in various forms, for example, in the form of a Java applet that is downloaded to the client and runs in conjunction with a web browser.
  • the client software may be in the form of a standalone application, implemented in a language such as Java, C++, C#, Visual Basic or in native processor-executable code.
  • if executing on the client, the client software opens a network connection to the server over a communications network and communicates via that connection to the server.
  • a communications network 470 connects the clients 460 with the server(s) 450, 440.
  • the communication may take place via any media such as standard telephone lines, LAN or WAN links (e.g., T1, T3, 56kb, X.25), broadband connections (ISDN, Frame Relay, ATM), wireless links, and so on.
  • the network can carry TCP/IP protocol communications, and HTTP/HTTPS requests made by the client software, as well as the connection between the client software and the server, can be communicated over such TCP/IP networks.
  • the type of network is not a limitation, however, and any suitable network may be used.
  • Typical examples of networks that can serve as the communications network include a wireless or wired Ethernet-based intranet, a local or wide-area network (LAN or WAN), and/or the global communications network known as the Internet, which may accommodate many different communications media and protocols.
  • modules described throughout the specification can be implemented in whole or in part as a software program (or programs) operating on one or more processors using any suitable programming language or languages (C++, C#, Java, Visual Basic, LISP, BASIC, PERL, etc.) and/or as a hardware device (e.g., ASIC, FPGA, processor, memory, storage and the like).
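
The field-screening heuristics listed above (a field is populated in more than roughly 95% of documents, it has a small number of distinct values relative to the number of items, and ID-type fields whose groupings hold only one or two items are rejected) can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation described in the specification: the function name `recommend_facets`, the list-of-dicts record layout, the parameter names, and the default thresholds are all hypothetical, with the defaults chosen to mirror the example figures in the text (95% coverage, and 50 values per 1000 items as the upper ratio).

```python
from collections import Counter


def recommend_facets(records, min_coverage=0.95, max_distinct_ratio=0.05,
                     id_group_size=2, id_share=0.95):
    """Screen fields of `records` (a list of dicts) as facet candidates.

    Returns a list of (field, coverage, distinct_value_count) tuples for
    the fields that pass every check. All names and thresholds here are
    illustrative assumptions.
    """
    total = len(records)
    if total == 0:
        return []

    fields = sorted({name for rec in records for name in rec})
    recommended = []
    for field in fields:
        values = [rec[field] for rec in records if rec.get(field) is not None]

        # Coverage: share of records that carry a value for this field.
        coverage = len(values) / total
        if coverage < min_coverage:
            continue

        counts = Counter(values)

        # Assumption: a field with a single value cannot narrow a search.
        if len(counts) < 2:
            continue

        # ID-type data: nearly all groupings contain only one or two items.
        small_groups = sum(1 for c in counts.values() if c <= id_group_size)
        if small_groups / len(counts) >= id_share:
            continue

        # Cardinality: few distinct values relative to the number of items.
        if len(counts) / total > max_distinct_ratio:
            continue

        recommended.append((field, coverage, len(counts)))
    return recommended
```

With 1000 records in which a `sku` field is unique per record, a `brand` field takes 20 values, and a `color` field is present on only half the records, this sketch keeps `brand` and drops the other two. White lists, black lists, and query-log weighting mentioned in the text are deliberately left out.
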

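The two numeric grouping strategies mentioned in the Definitions, logarithmic ranges and "bins" holding approximately equal numbers of items, can be contrasted with a short Python sketch. The function names `log_range_edges` and `equal_population_bins` and their signatures are assumptions made for illustration; the specification does not name or prescribe them, and the outlier detection it mentions is not shown.

```python
import math


def log_range_edges(min_value, max_value):
    """Return powers-of-ten edges covering [min_value, max_value].

    Hypothetical helper: both arguments must be positive.
    """
    if min_value <= 0 or max_value < min_value:
        raise ValueError("expected 0 < min_value <= max_value")
    low = math.floor(math.log10(min_value))
    high = max(math.ceil(math.log10(max_value)), low + 1)
    return [10 ** exponent for exponent in range(low, high + 1)]


def equal_population_bins(values, n_bins):
    """Split values into n_bins groups of (approximately) equal size.

    Returns (lowest_value, highest_value, item_count) per non-empty bin.
    Hypothetical helper: equal values may straddle two adjacent bins.
    """
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    ordered = sorted(values)
    n = len(ordered)
    bins = []
    for i in range(n_bins):
        # Integer arithmetic spreads any remainder across the bins.
        chunk = ordered[i * n // n_bins:(i + 1) * n // n_bins]
        if chunk:
            bins.append((chunk[0], chunk[-1], len(chunk)))
    return bins
```

For values spanning 1 to 1000, `log_range_edges(1, 1000)` yields the edges 1, 10, 100 and 1000, matching the "1-10, 10-100, and 100-1000" example. Equal-population bins instead follow the data, so a cluster of similar prices is divided into several narrow ranges while sparse extremes share one wide range.
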
Abstract

Documents and data are analyzed to determine one or more data facets. The documents, data and other information contained therein may be presented according to statistically-determined groupings based on the data facets.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of U.S. patent application Ser. No. 12/353,565 filed Jan. 14, 2009, which claims priority to and the benefits of U.S. provisional patent application Ser. No. 61/022,001, entitled “Dynamic Determination Of Data Facets” and filed Jan. 18, 2008, the entire disclosure of which is incorporated herein by reference.
  • FIELD OF THE INVENTION
  • The invention relates generally to techniques for data search and retrieval, and more specifically, to the determination of parameters for classification, organization, presentation and retrieval of data.
  • BACKGROUND
  • Data presented to consumers of electronic information is often provided in pure “list” form—that is, as a one-dimensional listing in response to a query. Although much effort goes into determining the contents of the list, the ordering of the results and even the visual presentation of individual items, the consumer must still have some knowledge of the subject matter being searched to make the results meaningful. Attempts to provide general classifications (early implementations of search engines such as Yahoo!, for example) often become outdated, overly burdensome or, even worse, irrelevant.
  • In an attempt to help consumers direct their searches, many websites (typically those selling electronics, automobiles, books, etc.) categorize their products and associate each product with one or more of these categories. As a result, the data is semi-structured, meaning there are certain data elements that are common to all the products, and the values of these elements can be used to classify and select subsets of the products. One example can be seen on many consumer-electronics websites that sell computers. It is common to classify computers as either notebooks or desktops, by price (e.g., <$1000, between $1000 and $2000, and >$2000), screen size, processing power, weight and/or projected use (business, personal, gaming, graphical design). Each of these categories is referred to as a “facet” or “dimension” that can be used to assist the consumer in narrowing down his search using known data elements prior to presenting the results of a search query.
  • While facet-based searching provides a significant improvement over conventional query/result methods, it is not without its drawbacks. In particular, current techniques for implementing facet-based search require a significant amount of work to determine the facets long before a website is implemented. Likewise, it is difficult to change the facets as the underlying data and queries evolve without disrupting or modifying the functionality of the search application that acts on the data.
  • What is needed, therefore, is a method and supporting systems for analyzing data and automatically determining data facets for use as search categories.
  • SUMMARY OF THE INVENTION
  • Embodiments of the invention provide methods and technical implementations of systems for gathering and assessing large amounts of data to identify data facets that can be used to classify data and help users narrow search queries. Assessing the data may include reviewing structured and/or semi-structured data that is typically tagged with a property, as well as reviewing query logs (both submitted queries and query results). Probabilistic techniques are preferably used to select “good” facets (e.g., those that segment the data into a well-distributed set of groups) while maintaining a balance between the number of facets and the number of strata within each facet. Because the facets may include discrete values, continuous numeric values (either evenly distributed or skewed) and/or hierarchical values, embodiments of the invention also facilitate determining optimal ranges and groupings of the facets. White lists and black lists may also be used to ensure that a particular field is either used or avoided.
  • In one aspect, a method for dynamically determining data facets includes receiving a dataset of information that is organized into a plurality of fields (which may be structured, semi-structured, and/or unstructured). Each field has values associated with it for each information element, and the information is analyzed to determine distribution statistics for the fields. Based on the statistics, fields are selected as data facets that may be used to categorize the dataset and facilitate execution of search queries against the dataset. For example, the facets can improve the user experience by being presented as links labeled with a specific term to limit (or refine) the search, or a link (labeled, e.g., “(remove Price restriction)”) that broadens the search parameters.
  • In some embodiments, query logs generated in response to queries submitted against the dataset may be incorporated into the analysis such that the distribution statistics reflect these previous queries. A subset of the dataset may be identified and used to represent the dataset as a whole, in which case the analysis is limited to the information contained in the subset. The information may be stored in a document repository, database, search repository or other form of physical and/or virtual storage. The method may also include the processing of a search query, in which information is retrieved from the dataset based on a correlation between components of the search query and the data facets. In some cases, the facets may then be ranked. The ranking may be based on the distribution statistics, query logs, or other attributes of the fields and used to present information from the dataset such that the information is grouped and ordered by the ranked facets.
  • In some implementations, the data facets may be incorporated into source code (e.g., XML, HTML or other structured markup language) of an application, and the presentation of that application then changes based on the newly identified or modified data facets. The data facets may also include item groupings, which may be other facets or, in some cases, sub-categories. The groupings may depend, for example, on a statistical distribution of documents and/or data. The groupings may be linear (e.g., of equal range), logarithmic, or, in some cases, based on data clusters.
  • In another aspect, a system for dynamically determining data facets includes a data repository for storing information to be searched, wherein the information is organized into fields and the fields have associated values. The system also includes a facet recommendation engine for (i) analyzing the information to determine distribution statistics for the fields, and (ii) based on the statistics, selecting fields as data facets to be used to categorize the dataset.
  • In another aspect, computer-readable instructions implement and/or perform the methods described in the preceding paragraphs. In particular, the functionality of a method of the present invention may be embedded on a computer-readable medium, such as, but not limited to, a floppy disk, a hard disk, an optical disk, a magnetic tape, a PROM, an EPROM, a CD-ROM, or a DVD-ROM, or may be downloaded from a server. The functionality of the techniques may be embedded on the computer-readable medium in any number of computer-readable instructions, or languages such as, for example, FORTRAN, PASCAL, C, C++, Java, PERL, C#, Tcl, BASIC and assembly language. Further, the computer-readable instructions may, for example, be written in a script or macro, or functionally embedded in commercially available software (such as, e.g., EXCEL or VISUAL BASIC).
  • Other aspects and advantages of the invention will become apparent from the following drawings, detailed description, and claims, all of which illustrate the principles of the invention, by way of example only.
  • DESCRIPTION OF THE DRAWINGS
  • In the drawings, like reference characters generally refer to the same parts throughout the different views. Also, the drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention.
  • FIG. 1 is a screen-shot from a web-based storefront illustrating the use of data facets to group products, which may be implemented using various embodiments of the invention.
  • FIG. 2 is a flow chart depicting a process for automatically discovering data facets in accordance with an embodiment of the invention.
  • FIG. 3 illustrates an example of an XML listing of facets and facet values.
  • FIG. 4 schematically depicts a system for automatically discovering data facets in accordance with an embodiment of the invention.
  • DETAILED DESCRIPTION
  • Consumer websites often facilitate product searching using data facets. Conventional methods of implementing data-faceted search require manual analysis and selection of the data facets based on known, fielded properties that exist in the data. Referring to FIG. 1, for example, a web page 100 presents a collection of laptop computers available at a retail store and organized by five data facets 110. Specifically, the available laptops are grouped according to processor type, screen size, proposed use, price and manufacturer, such that visitors to the webpage can narrow their searches for a particular product meeting certain criteria. Furthermore, each data facet includes not only values (e.g., groupings) 120 for each facet 110, but the number of items 130 that meet that particular criterion. Visitors select a grouping 120 based on individual preferences, and the selection is sent to an underlying search engine which applies the values as filter criteria in a subsequent query that returns a list of products matching the criteria.
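  • The counting and filtering behavior described for FIG. 1 can be sketched roughly as follows. This is a minimal illustration, not the implementation behind the figure: the catalog `LAPTOPS`, its field names, and the helper names `facet_counts` and `apply_filters` are all assumptions introduced here.

```python
from collections import Counter

# Hypothetical catalog; field names and values are illustrative only.
LAPTOPS = [
    {"manufacturer": "Acme", "screen_size": "15 in", "price": 899},
    {"manufacturer": "Acme", "screen_size": "13 in", "price": 1199},
    {"manufacturer": "Globex", "screen_size": "15 in", "price": 649},
]


def facet_counts(items, field):
    """Count how many items fall under each value (grouping) of a facet field."""
    return Counter(item[field] for item in items if field in item)


def apply_filters(items, selections):
    """Keep only the items whose fields match every selected facet value."""
    return [
        item
        for item in items
        if all(item.get(field) == value for field, value in selections.items())
    ]
```

  • Calling `facet_counts(LAPTOPS, "manufacturer")` yields the per-grouping item counts shown next to each value, and `apply_filters(LAPTOPS, {"screen_size": "15 in"})` plays the role of the follow-up query issued when a visitor selects a grouping.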
  • Conventional techniques for determining and presenting facets of structured or unstructured data involve significant manual effort. Typically, technicians review product descriptions, metadata, search logs and other information to select a handful of aggregate groupings to use as selection facets. In practice, once these data facets are selected, they are hard-coded into web page designs and data structures, which makes expansion and/or modification of the data facets difficult and time consuming. In certain environments (e.g., real-time news stories, sports, financial markets, etc.) the data facets may change many times throughout a day, in which case manual changes to facet lists are futile.
  • In contrast to the manual approach described above, embodiments of the invention facilitate the analysis of a set of documents and/or data records (from a database, search engine index, and/or any other document/data repository, collectively referred to as a “dataset”) and the automated selection and recommendation of a set of fields to be used as data facets. Generally, the technique can be implemented in two different ways—either offline, where the dataset is analyzed but the results do not immediately affect the application supported by the dataset, or online, where the results automatically influence the presentation of data facets in real time. In the latter implementation, a facet-finder recommendation engine provides the data fields to a website configuration tool, thereby creating an updated user interface that presents the products and categories to users. In some embodiments, a feedback loop provides query results and usage statistics based on previously selected facets as input into the analysis step for continuous improvement.
  • FIG. 2 illustrates a process 200 for determining optimal or near-optimal facets based on an underlying dataset. Initially, a query or set of queries is identified (STEP 210) that represents a particular user group or selection criteria to narrow the set of documents being identified. In some implementations, a subset of the dataset may be identified and used, for purposes of facet identification, as a representation of the entire dataset. Certain documents of the dataset may, for example, be identified as more important (based on date submitted, length, author, frequency of use, user feedback, etc.) and used either exclusively during the analysis step, or weighted such that they have a greater influence on term distributions and frequencies. In such cases, the processing resources needed to analyze the dataset and determine appropriate facets may be significantly reduced. In some embodiments, however, this step may be skipped, and the entire dataset may be used.
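  • One way to let more important documents carry greater influence on the distributions is to weight each document's contribution. The sketch below assumes a hypothetical `weight_of` callback (for example, frequency of use); the function name and the normalization to shares are illustrative choices, not details taken from the specification.

```python
from collections import defaultdict


def weighted_value_distribution(items, field, weight_of=lambda item: 1.0):
    """Return each value's share of the total weight for one field.

    With the default weight of 1.0 per item this is a plain frequency
    distribution; a custom weight_of lets important items count for more.
    """
    totals = defaultdict(float)
    for item in items:
        if field in item:
            totals[item[field]] += weight_of(item)
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {value: weight / grand_total for value, weight in totals.items()}
```

  • Restricting `items` to a chosen subset corresponds to using important documents exclusively, while passing a non-uniform `weight_of` corresponds to the weighted variant.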
  • The dataset and/or its constituent documents may be structured (e.g., pulled from a relational database), semi-structured (fielded with values), or unstructured. Representative examples include a music catalog in which songs and albums have attributes such as artist, title, length, genre, and release date; a recipe collection in which recipes are associated with a type of cuisine, main ingredients, cooking style and/or a holiday; travel information that may be organized by destinations, prices, and include articles by authors or publications; regulatory documents that include product and part codes, machine types, expiration dates, filing dates and submission data; and images that may be tagged with the name of an artist, date, style, type of image, artistic movement, major colors, theme, etc. In each case, a facet recommendation engine analyzes the data (or, in some cases a defined subset of the data) and computes term frequencies and/or distributions (STEP 230) to determine potential facets that may be used to cluster the data or documents into meaningful classifications. Based on the computed frequencies, one or more fields or data values are selected as data facets.
  • In general, a field is a good candidate for use as a facet if a large percentage (e.g., >95%) of the documents include a value for the field, since using fields with a lower population percentage can result in a significant amount of data being ignored, and therefore not included in the search results. Furthermore, the field should contain a relatively small number of terms (as compared to the total number of documents or records being indexed). As an example, a field having between 10 and 50 values for a dataset containing 1000-5000 items is a good candidate for a data facet. The values in the dataset need not have been distributed evenly or according to a predictable pattern, and in fact if a small number of the values represent a high percentage of the items, the field may be an even stronger candidate for use as a facet. Furthermore, it is preferable for the fields used as facets to be mutually exclusive—e.g., they do not overlap and are not correlated with each other—and the values of one field should not be discernible or predictable from values in other fields.
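  • The two screening criteria above (high population percentage and a small number of distinct values relative to the dataset size) might be checked as in the following sketch. The 95% coverage threshold comes from the example in the text; the 5% distinct-value ratio is an assumption loosely derived from the "10 to 50 values for 1000-5000 items" example, and the function names are hypothetical.

```python
def field_stats(items, field):
    """Return (coverage, distinct_values) for one field across all items."""
    values = [item[field] for item in items if item.get(field) not in (None, "")]
    coverage = len(values) / len(items) if items else 0.0
    return coverage, len(set(values))


def is_facet_candidate(items, field, min_coverage=0.95, max_distinct_ratio=0.05):
    """Apply the coverage and cardinality heuristics to a single field.

    Both thresholds are tunable assumptions, not values fixed by the text.
    """
    coverage, distinct = field_stats(items, field)
    return coverage >= min_coverage and 1 < distinct <= max_distinct_ratio * len(items)
```

  • The sketch deliberately ignores the mutual-exclusivity criterion, which would require comparing fields against one another rather than examining each field in isolation.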
  • Typically, date fields and numeric fields (e.g., prices) are good candidates for facets. However, fields that contain highly unique numbers (e.g., SKUs, ISBNs, SSNs, etc.) are usually poor candidates for facets, as there is no discernible logic for grouping products, documents or records based on these numbers. As such, the system may, in some cases, identify and reject those facets that represent ID-type data having only one or two items in each (or some high percentage, e.g., >95%) of the groupings.
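  • A rejection test for ID-type fields could look like the sketch below, which flags a field when more than a given share of its groupings contain at most two items. The function name and parameter names are assumptions; the defaults mirror the "one or two items" and ">95%" figures in the text.

```python
from collections import Counter


def looks_like_id_field(items, field, max_group_size=2, share=0.95):
    """True when more than `share` of the field's groupings hold at most
    `max_group_size` items, which suggests SKU-like or ISBN-like data."""
    counts = Counter(item[field] for item in items if field in item)
    if not counts:
        return False
    tiny_groups = sum(1 for n in counts.values() if n <= max_group_size)
    return tiny_groups / len(counts) > share
```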
  • The grouping of numeric values need not be linear. In some cases, especially with products exhibiting high price variability, other classifications may be more beneficial. For example, when presenting memorabilia in an online auction, the prices may range from a few dollars for common merchandise (e.g., pins or hats that were produced in great numbers) to tens of thousands for one-of-a-kind, autographed, limited-edition paraphernalia (e.g., signed artwork, mint-condition coins, etc.). In these cases, the ranges may be logarithmic (or some combination of linear and logarithmic) such that the price facet is presented as $0-$10; $10-$100; $100-$1,000; $1,000-$5,000; and >$5,000.
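  • Assigning prices to the mixed logarithmic and linear ranges of this example can be sketched with a sorted list of bucket edges. The edges are taken from the ranges listed above; treating each range as half-open (so that a price of exactly $10 falls in the $10-$100 bucket) is an assumption, because the text does not say how boundary values are handled, and the names `PRICE_EDGES` and `bucket_label` are hypothetical.

```python
from bisect import bisect_right

# Bucket edges from the memorabilia example: $0-$10, $10-$100, $100-$1,000,
# $1,000-$5,000, and an open-ended top bucket.
PRICE_EDGES = [0, 10, 100, 1000, 5000]


def bucket_label(price, edges):
    """Label the half-open range [low, high) that contains price.

    Prices at or above the last edge fall into an open-ended top bucket.
    Boundary handling is an assumption of this sketch.
    """
    index = bisect_right(edges, price)
    if index == 0:
        raise ValueError("price is below the first bucket edge")
    if index == len(edges):
        return f">=${edges[-1]:,}"
    return f"${edges[index - 1]:,}-${edges[index]:,}"
```

  • Counting labels over all prices (for example with `collections.Counter`) then gives the item count displayed next to each price range.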
  • While many facet groupings that include numeric ranges may be logarithmic (e.g., 1-10, 10-100, and 100-1000) or linear, other implementations compute ranges by dividing the actual population of data values into “bins” of equal (or approximately equal) numbers of items. This approach is especially useful when values “clump” together. Using this method allows for the detection of outliers by statistical testing or simply by detecting empty bins adjacent to the top and/or bottom bin, where the extreme bin is smaller than a configurable percentage (e.g., 5%) of the total values. For example, if the linearly spaced bins from 0 to 1000 contain 1, 0, 0, 0, 100, 0, 0, 0, 0, and 1 values, and the middle bin contains values ranging from 500 to 600 (min and max), bins corresponding to “less than 500”, “500-510”, . . . , “590-600”, and “greater than 600” may be created.
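  • Both ideas in this paragraph can be sketched briefly: splitting values into bins of roughly equal population, and locating the populated core of a linear histogram by skipping sparse extreme bins that sit next to an empty bin. The function names are hypothetical, and the 5% default is the example percentage from the text rather than a required value.

```python
def equal_population_bins(values, n_bins):
    """Split values into n_bins groups of (nearly) equal size.

    Returns one (min, max, count) tuple per non-empty bin.
    """
    ordered = sorted(values)
    n = len(ordered)
    bins = []
    for k in range(n_bins):
        chunk = ordered[k * n // n_bins:(k + 1) * n // n_bins]
        if chunk:
            bins.append((chunk[0], chunk[-1], len(chunk)))
    return bins


def core_bin_range(counts, max_share=0.05):
    """Return (first, last) indices of the bins that remain after dropping
    a sparse extreme bin (under max_share of all values) that is separated
    from the rest by at least one empty neighbouring bin."""
    total = sum(counts)
    first, last = 0, len(counts) - 1
    if last > first and counts[first] < max_share * total and counts[first + 1] == 0:
        first += 1
        while first < last and counts[first] == 0:
            first += 1
    if last > first and counts[last] < max_share * total and counts[last - 1] == 0:
        last -= 1
        while last > first and counts[last] == 0:
            last -= 1
    return first, last
```

  • For the bin counts in the example (1, 0, 0, 0, 100, 0, 0, 0, 0, 1), `core_bin_range` isolates the single populated bin, whose values could then be re-binned at a finer resolution with catch-all bins on either side.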
  • Once a set of fields has been identified as potential facets, the above criteria are used to determine which fields are “good” facets (STEP 240). In one example, each field may be scored and/or ranked according to the number of values in the field, the distribution of the values across the entire dataset, the frequency the field is included in a search, etc. If the analysis is performed offline, recommended facets can be determined for an entire repository and the ranked fields then displayed to a system designer or programmer using a facet-recommendation user interface, thus facilitating the selection of facets from the set of recommended facets. In some instances, the user interface also allows the designer to reorder facets if, for example, the ordering and/or placement is deemed to be important. The facet recommendation process may also be performed for a single query (or set of queries) such that the presentation of the search interface includes only those facets deemed highly relevant to the current (or fairly recent) search log.
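  • A simple way to score and rank fields is a weighted combination of the signals mentioned above. The sketch below is only one possible scoring scheme: the weights, the "compactness" measure, and the names `score_field` and `rank_fields` are assumptions, since the text does not prescribe a formula.

```python
def score_field(coverage, distinct_values, total_items, query_share,
                weights=(0.5, 0.3, 0.2)):
    """Combine coverage, compactness and query usage into a score in [0, 1].

    compactness approaches 1 when the field has few distinct values
    relative to the number of items. The weights are illustrative.
    """
    if total_items:
        compactness = 1.0 - min(distinct_values / total_items, 1.0)
    else:
        compactness = 0.0
    w_coverage, w_compactness, w_query = weights
    return w_coverage * coverage + w_compactness * compactness + w_query * query_share


def rank_fields(stats, total_items):
    """Rank field names best-first.

    stats maps field name -> (coverage, distinct_values, query_share).
    """
    scores = {
        field: score_field(coverage, distinct, total_items, query_share)
        for field, (coverage, distinct, query_share) in stats.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

  • In an offline workflow, the ranked list returned by `rank_fields` is what a facet-recommendation user interface would display for a designer to accept or reorder.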
  • If the process is implemented online, the fields may be dynamically ranked using the dataset as a reference. In some cases, a subset of the dataset (e.g., the 500 most requested documents, the 100 most recently added documents, etc.) may be used to determine the potential facets. The recommended facets are then supplied to a web-design application using, for example, XML format (or other markup language) with each facet being represented using a unique tag. As a result, web forms may be created automatically, using the dynamically created facets as categories for documents as presented to the users (STEP 250). Similar to offline mode, the facets may be determined based on an analysis of the current query (by, for example, parsing the query string into component terms and searching the data store for the terms) or recent queries. An example of an XML listing of facets and facet values is illustrated in FIG. 3.
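  • Emitting recommended facets in a markup format can be sketched with Python's standard `xml.etree.ElementTree` module. FIG. 3 is not reproduced in this text, so the element and attribute names used here (`facets`, `facet`, `value`, `name`, `count`) are hypothetical and are not the tags of the actual listing.

```python
import xml.etree.ElementTree as ET


def facets_to_xml(facets):
    """Serialize facets to an XML string.

    facets maps facet name -> {value label: item count}. The tag names
    are assumptions made for this sketch.
    """
    root = ET.Element("facets")
    for name, values in facets.items():
        facet_el = ET.SubElement(root, "facet", name=name)
        for label, count in values.items():
            value_el = ET.SubElement(facet_el, "value", count=str(count))
            value_el.text = label
    return ET.tostring(root, encoding="unicode")
```

  • A web-design application could parse this output and generate one form control per `facet` element, which is the kind of automatic form creation the paragraph describes.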
  • In each implementation, query logs may be included in the analysis to capture user interactions with the dataset. For example, if users continually submit queries based on processor speed or operating system (which may not be immediately apparent as important distinguishing factors), these fields may be added as facets, even though the initial analysis indicated they would be poor facets. As a result, fields that otherwise would be overlooked can become important data facets for subsequent searching and retrieval. In this manner, a feedback loop may be used to capture the ongoing performance of the current facet set. Facet performance may be measured based on, for example, the frequency with which queries are submitted using the facet, the percentage of total queries using the facet, the percentage of queries submitted using the facet that are not immediately followed by other queries (i.e., it is likely that the result included the document or product the user was looking for) as well as other factors. Facets may then be added or removed based on the feedback.
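  • The performance measures listed above can be computed from a query log along the following lines. The log layout (a chronological list of entries with a `session` identifier and the `facets` used) is an assumption of this sketch, and a query counts as "not immediately followed by other queries" when it is the last entry in its session.

```python
from collections import defaultdict


def facet_performance(log):
    """Compute per-facet usage metrics from a chronological query log.

    log is a list of {"session": id, "facets": [names]} entries.
    Returns, per facet: number of queries using it, its share of all
    queries, and the share of its uses that ended a session.
    """
    last_index = {}
    for i, entry in enumerate(log):
        last_index[entry["session"]] = i

    uses = defaultdict(int)
    final_uses = defaultdict(int)
    for i, entry in enumerate(log):
        for facet in set(entry["facets"]):
            uses[facet] += 1
            if last_index[entry["session"]] == i:
                final_uses[facet] += 1

    total = len(log)
    return {
        facet: {
            "uses": n,
            "query_share": n / total,
            "final_share": final_uses[facet] / n,
        }
        for facet, n in uses.items()
    }
```

  • Facets with a low `query_share` are candidates for removal, while fields that appear frequently in free-text queries but are not yet facets would need a separate count to become candidates for addition.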
  • Similar to limiting the sample of documents used to determine the facets, one or more subsets of query logs may also be identified and used to select (or help select) data facets. For example, more recent queries (e.g., those submitted in the past week) may provide greater insight into current search trends, and therefore be used exclusively. In other cases, statistical samplings may be used from different time periods, days of the week, seasons, etc. to obtain an accurate representation of how users interact with the dataset. A large set of search queries submitted in the weeks leading up to Christmas may be selected to identify data facets relating to toys, for example.
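  • Selecting subsets of a query log might be sketched as follows: one helper keeps only recent entries, and another draws an equal-sized sample from each day of the week. The entry layout (a `time` key holding a `datetime`), the helper names, and the fixed random seed are assumptions made for illustration.

```python
import random
from collections import defaultdict
from datetime import datetime, timedelta


def recent_queries(log, now, window=timedelta(days=7)):
    """Return the entries submitted within `window` before `now`."""
    return [entry for entry in log if now - window <= entry["time"] <= now]


def sample_per_weekday(log, per_day, seed=0):
    """Draw up to `per_day` entries for each weekday so that no single
    day of the week dominates the sample."""
    by_day = defaultdict(list)
    for entry in log:
        by_day[entry["time"].weekday()].append(entry)
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = []
    for day in sorted(by_day):
        entries = by_day[day]
        sample.extend(rng.sample(entries, min(per_day, len(entries))))
    return sample
```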
  • Referring to FIG. 4, a system 410 for implementing the techniques described above includes a facet recommendation engine 415 and a data-storage module 420. In some embodiments, the system includes an interface-generation module. The facet recommendation engine 415 provides the application processing component for determining desirable data facets as described above. In one embodiment, the facet recommendation engine 415 includes programming instructions for evaluating large amounts of data and documents, calculating field and value distributions and ranges and recommending which fields to use as data facets. The engine is preferably implemented on one or more server class computers that have sufficient memory, data storage, and processing power and that run a server class operating system (e.g. SUN Solaris, GNU/Linux, MICROSOFT WINDOWS 2000, and later versions, or other such operating system). Other types of system hardware and software can also be used, depending on the capacity of the device, the number of users and the amount of data received. For example, the server may be part of a server farm or server network, which is a logical group of one or more servers. As another example, there may be multiple servers associated with or connected to each other, or multiple servers may operate independently but with shared data. As is typical in large-scale systems, application software can be implemented in components, with different components running on different server computers, on the same server, or some combination.
  • The data-storage module 420 (or modules) stores the data and/or documents being analyzed by the facet recommendation engine 415 and subsequently searched. For instance, the data repository may store information relating to products, documents, people, and/or transactions against which users submit search queries. Examples of databases that may be used to implement this functionality include the MySQL Database Server by Sun Microsystems, the PostgreSQL Database Server by the PostgreSQL Global Development Group of Berkeley, Calif., and the ORACLE Database Server offered by ORACLE Corp. of Redwood Shores, Calif.
  • In embodiments in which the facets are automatically incorporated into the web pages, an interface-generation module 430 generates the structured, tagged source code for integration into the application(s) operating on an application server 440. In implementations in which users can manually modify webpage source code to implement newly discovered or modified facets, one or more clients 460 may be used to access the application server 440 via a web server 450. Such implementations may include a design interface for providing the recommended facets to a web design application for implementing the recommendations. The clients 460 are preferably implemented using software running on a personal or professional grade computer workstation (e.g., a PC with an INTEL processor or an APPLE MACINTOSH) capable of running such operating systems as the MICROSOFT WINDOWS family of operating systems from Microsoft Corporation of Redmond, Wash., the MACINTOSH OSX operating system from Apple Computer of Cupertino, Calif., and various varieties of Unix, such as SUN SOLARIS from SUN MICROSYSTEMS, and GNU/Linux from RED HAT, INC. of Durham, N.C. (and others). The client 460 can also be implemented on such hardware as a smart or dumb terminal, network computer, wireless device, personal data assistant, information appliance, workstation, minicomputer, mainframe computer, or other computing device, that is operated as a general purpose computer or a special purpose hardware device solely used for serving as a client in the system.
  • The client 460 may include client interface software for facilitating the review and selection of data facets as determined by the facet recommendation engine 415, and may be implemented in various forms, for example, in the form of a Java applet that is downloaded to the client and runs in conjunction with a web browser. Alternatively, the client software may be in the form of a standalone application, implemented in a language such as Java, C++, C#, Visual Basic or in native processor-executable code. In one embodiment, if executing on the client, the client software opens a network connection to the server over a communications network and communicates via that connection to the server.
  • A communications network 470 connects the clients 460 with the server(s) 450, 440. The communication may take place via any media such as standard telephone lines, LAN or WAN links (e.g., T1, T3, 56kb, X.25), broadband connections (ISDN, Frame Relay, ATM), wireless links, and so on. Preferably, the network can carry TCP/IP protocol communications, and HTTP/HTTPS requests made by the client software and the connection between the client software and the server can be communicated over such TCP/IP networks. The type of network is not a limitation, however, and any suitable network may be used. Typical examples of networks that can serve as the communications network include a wireless or wired Ethernet-based intranet, a local or wide-area network (LAN or WAN), and/or the global communications network known as the Internet, which may accommodate many different communications media and protocols.
  • The modules described throughout the specification can be implemented in whole or in part as a software program (or programs) operating on one or more processors using any suitable programming language or languages (C++, C#, Java, Visual Basic, LISP, BASIC, PERL, etc.) and/or as a hardware device (e.g., ASIC, FPGA, processor, memory, storage and the like).
  • The invention can be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting on the invention described herein.

Claims (21)

1-20. (canceled)
21. A method comprising:
receiving a dataset comprising a plurality of catalog items, each having a plurality of fields, each of the plurality of fields associated with a respective product category value, wherein each product category value identifies a respective catalog item of the plurality of catalog items as belonging to a particular product category of a plurality of product categories;
analyzing the dataset to determine, for each field of the plurality of fields, a quantity of the plurality of catalog items belonging to each of the plurality of product categories and a distribution of the plurality of catalog items across the plurality of product categories;
selecting a subset of fields from the plurality of fields based on the quantity of the plurality of catalog items belonging to each of the plurality of product categories and the distribution of the plurality of catalog items; and
displaying, on a user interface, for each field of the subset of fields, a respective subset of product categories of the plurality of product categories associated with a respective field, wherein each respective product category of the respective subset of product categories indicates a number of catalog items of the plurality of catalog items belonging to the respective product category of the respective subset of product categories.
22. The method of claim 21, comprising:
receiving, via the user interface, a user selection of a selected product category of the respective subset of product categories; and
displaying, on the user interface, corresponding catalog items of the plurality of catalog items belonging to the selected product category.
23. The method of claim 21, comprising ranking the plurality of fields based on the quantity of the plurality of product categories belonging to each of the plurality of product categories and the distribution of the plurality of catalog items, wherein selecting the subset of fields is based on ranking the plurality of fields.
24. The method of claim 21, comprising receiving a query indicative of a subset of the dataset, wherein analyzing the dataset comprises analyzing the subset of the dataset to determine the quantity of the plurality of catalog items belonging to each of the plurality of product categories and the distribution of the plurality of catalog items across the plurality of product categories.
25. The method of claim 21, wherein displaying the respective subset of product categories comprises displaying a respective data facet for each field of the subset of fields.
26. The method of claim 25, comprising generating source code in response to selecting the subset of fields, wherein the source code comprises the respective data facet for display on the user interface.
27. The method of claim 21, comprising receiving a query log generated in response to a query associated with the dataset, wherein selecting the subset of fields is based on the quantity of the plurality of category items belonging to each of the plurality of product categories, the distribution of the plurality of catalog items, and the query log.
28. A system comprising:
data storage configured to store a dataset comprising a plurality of items and a plurality of fields, wherein each field of the plurality of fields is associated with a plurality of values, and each value of the plurality of values is associated with a subset of items of the plurality of items; and
a facet recommendation engine communicatively coupled to the data storage, wherein the facet recommendation engine is configured to:
retrieve the dataset stored in the data storage;
analyze the dataset to determine, for at least one field of the plurality of fields of the dataset, a quantity of the plurality of values and a distribution of the plurality of items across the plurality of values;
select a subset of fields from the plurality of fields based on the plurality of values and the distribution of the plurality of items; and
cause a plurality of data facets to be displayed on a user interface based on the subset of fields, wherein the plurality of data facets categorizes the plurality of items.
29. The system of claim 28, comprising an interface-generation module communicatively coupled to the facet recommendation engine, wherein the interface-generation module is configured to generate source code based on the subset of fields selected by the facet recommendation engine for integration into an application server.
30. The system of claim 29, comprising the application server, wherein the application server is communicatively coupled to the user interface, and the application server displays the plurality of data facets based on the source code.
31. The system of claim 28, wherein the data storage is configured to store a query log associated with a search of the dataset, and the facet recommendation engine is configured to select the subset of fields from the plurality of fields based on the plurality of values, the distribution of the plurality of items, and the query log.
32. The system of claim 28, wherein each data facet of the plurality of data facets is associated with a corresponding plurality of values, each corresponding plurality of values is associated with a corresponding subset of items of the plurality of items, and the facet recommendation engine is configured to:
cause the corresponding plurality of values to be displayed on the user interface;
receive a user selection of a selected value of the corresponding plurality of values; and
cause the corresponding subset of items associated with the selected value to be displayed on the user interface.
33. The system of claim 28, wherein each item of the plurality of items comprises a product searchable via the user interface.
34. The system of claim 28, wherein the facet recommendation engine is configured to cause a corresponding plurality of values associated with each of the plurality of data facets to be displayed on the user interface, wherein the corresponding plurality of values is linearly organized, logarithmically organized, or organized based on data clusters.
35. A system comprising:
one or more processors; and
a memory comprising instructions that, when executed by the one or more processors, are configured to cause the one or more processors to perform operations comprising:
identifying a dataset comprising a plurality of catalog items, each having a plurality of fields, each of the plurality of fields associated with a respective product category value, wherein each product category value identifies a respective catalog item of the plurality of catalog items as belonging to a particular product category of a plurality of product categories;
analyzing the dataset to determine, for each field of the plurality of fields, a quantity of the plurality of catalog items belonging to each of the plurality of product categories and a distribution of the plurality of catalog items across the plurality of product categories;
selecting a subset of fields from the plurality of fields based on the quantity of the plurality of catalog items belonging to each of the plurality of product categories and the distribution of the plurality of catalog items; and
displaying, on a user interface, for each field of the subset of fields, a respective subset of product categories of the plurality of product categories associated with a respective field, wherein each respective product category of the respective subset of product categories indicates a number of catalog items of the plurality of catalog items belonging to the respective product category of the respective subset of product categories.
36. The system of claim 35, wherein the instructions, when executed by the one or more processors, are configured to display each field of the subset of fields based on a numeric value or based on data clusters.
37. The system of claim 35, wherein the instructions, when executed by the one or more processors, are configured to cause the one or more processors to perform operations comprising:
receiving a search query; and
retrieving information from the dataset based on a correlation between the search query and the subset of fields.
38. The system of claim 35, wherein the instructions, when executed by the one or more processors, are configured to cause the one or more processors to perform operations comprising:
identifying a subset of the dataset; and
analyzing the subset of the dataset to determine the quantity of the plurality of category items belonging to each of the plurality of product categories and the distribution of the plurality of catalog items across the plurality of product categories for each field of the plurality of fields associated with the subset of the dataset.
39. The system of claim 35, wherein the instructions, when executed by the one or more processors, are configured to cause the one or more processors to perform operations comprising receiving a query and identifying the dataset based on the query.
40. The system of claim 35, wherein the instructions, when executed by the one or more processors, are configured to cause the one or more processors to perform operations comprising:
receiving a plurality of query logs generated in response to previous queries submitted against the dataset; and
selecting the subset of fields based on the quantity of the plurality of catalog items belonging to each of the plurality of product categories and the distribution of the plurality of catalog items to reflect the previous queries.
US16/738,746 2008-01-18 2020-01-09 Dynamic determination of data facets Abandoned US20200265491A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/738,746 US20200265491A1 (en) 2008-01-18 2020-01-09 Dynamic determination of data facets

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US2200108P 2008-01-18 2008-01-18
US12/353,565 US10585931B1 (en) 2008-01-18 2009-01-14 Dynamic determination of data facets
US16/738,746 US20200265491A1 (en) 2008-01-18 2020-01-09 Dynamic determination of data facets

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US12/353,565 Continuation US10585931B1 (en) 2008-01-18 2009-01-14 Dynamic determination of data facets

Publications (1)

Publication Number Publication Date
US20200265491A1 true US20200265491A1 (en) 2020-08-20

Family

ID=69723433

Family Applications (2)

Application Number Title Priority Date Filing Date
US12/353,565 Active 2030-05-07 US10585931B1 (en) 2008-01-18 2009-01-14 Dynamic determination of data facets
US16/738,746 Abandoned US20200265491A1 (en) 2008-01-18 2020-01-09 Dynamic determination of data facets

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US12/353,565 Active 2030-05-07 US10585931B1 (en) 2008-01-18 2009-01-14 Dynamic determination of data facets

Country Status (1)

Country Link
US (2) US10585931B1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200226666A1 (en) * 2019-01-14 2020-07-16 Walmart Apollo, Llc Methods and apparatus for facilitating electronic webpage purchases
US11423460B1 (en) * 2021-03-31 2022-08-23 Coupang Corp. Electronic apparatus and information providing method thereof
US11940996B2 (en) 2020-12-26 2024-03-26 International Business Machines Corporation Unsupervised discriminative facet generation for dynamic faceted search

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10853863B2 (en) * 2016-01-30 2020-12-01 Walmart Apollo, Llc Systems and methods for browse facet ranking
US10838994B2 (en) * 2017-08-31 2020-11-17 International Business Machines Corporation Document ranking by progressively increasing faceted query
US10984069B2 (en) * 2019-01-23 2021-04-20 Adobe Inc. Generating user experience interfaces by integrating analytics data together with product data and audience data in a single design tool
US20220012236A1 (en) * 2020-07-10 2022-01-13 Salesforce.Com, Inc. Performing intelligent affinity-based field updates

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080082518A1 (en) * 2006-09-29 2008-04-03 Loftesness David E Strategy for Providing Query Results Based on Analysis of User Intent
US7765227B1 (en) * 2007-03-30 2010-07-27 A9.Com, Inc. Selection of search criteria order based on relevance information
US8135718B1 (en) * 2007-02-16 2012-03-13 Google Inc. Collaborative filtering

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6834278B2 (en) * 2001-04-05 2004-12-21 Thothe Technologies Private Limited Transformation-based method for indexing high-dimensional data for nearest neighbour queries
US6920459B2 (en) * 2002-05-07 2005-07-19 Zycus Infotech Pvt Ltd. System and method for context based searching of electronic catalog database, aided with graphical feedback to the user
US8214345B2 (en) * 2006-10-05 2012-07-03 International Business Machines Corporation Custom constraints for faceted exploration

Also Published As

Publication number Publication date
US10585931B1 (en) 2020-03-10

Similar Documents

Publication Publication Date Title
US20200265491A1 (en) Dynamic determination of data facets
US11347963B2 (en) Systems and methods for identifying semantically and visually related content
US11314822B2 (en) Interface for a universal search
US7743059B2 (en) Cluster-based management of collections of items
US8560545B2 (en) Item recommendation system which considers user ratings of item clusters
US7689457B2 (en) Cluster-based assessment of user interests
US10354308B2 (en) Distinguishing accessories from products for ranking search results
US7966225B2 (en) Method, system, and medium for cluster-based categorization and presentation of item recommendations
US8019766B2 (en) Processes for calculating item distances and performing item clustering
US8893011B2 (en) Chronology display and feature for online presentations and webpages
US8019650B2 (en) Method and system for producing item comparisons
JP5501373B2 (en) System and method for collecting and ranking data from multiple websites
US10204121B1 (en) System and method for providing query recommendations based on search activity of a user base
US20060173753A1 (en) Method and system for online shopping
US20020099581A1 (en) Computer-implemented dimension engine
US9330071B1 (en) Tag merging
WO2008121872A1 (en) Cluster-based assessment of user interests
US8121970B1 (en) Method for identifying primary product objects
Bhatia et al. Machine Learning with R Cookbook: Analyze data and build predictive models
US20130185315A1 (en) Identification of Events of Interest
JP2001229171A (en) Article retrieval system
dos Santos et al. Building comparison-shopping brokers on the web

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED
STPP Information on status: patent application and granting procedure in general Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION
STPP Information on status: patent application and granting procedure in general Free format text: NON FINAL ACTION MAILED
STPP Information on status: patent application and granting procedure in general Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER
STPP Information on status: patent application and granting procedure in general Free format text: FINAL REJECTION MAILED
STCB Information on status: application discontinuation Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION