US20180239781A1 - Automatically obtaining real-time, geographically-relevant product information from heterogeneus sources - Google Patents


Info

Publication number
US20180239781A1
Authority
US
United States
Prior art keywords
crawler
application
inventory information
web
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US15/961,120
Inventor
Jack Phillip Abraham
Aaron Adelson
Matthew Barto
Theodore James Dziuba
John Evans
Neville Newey
Justin Van Winkle
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
eBay Inc
Original Assignee
eBay Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US201161439724P
Priority to US13/366,962 (US9977790B2)
Application filed by eBay Inc
Priority to US15/961,120 (US20180239781A1)
Assigned to EBAY INC. (assignment of assignors interest, see document for details). Assignors: BARTO, MATTHEW, DZIUBA, THEODORE JAMES, ADELSON, AARON, ABRAHAM, JACK PHILLIP, VAN WINKLE, Justin, EVANS, JOHN, NEWEY, NEVILLE
Publication of US20180239781A1
Legal status: Pending

Classifications

    • G06F17/30241
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F17/30864

Abstract

Techniques for obtaining geographically-relevant product inventory information, in real-time, from heterogeneous data sources are described. Product inventory information, including the volume of available products in specific geographical locations, is obtained from at least three different sources. First, one or more data feeds may be received. Second, a data obtaining module uses one or more APIs to obtain product inventory information from one or more third-party inventory management systems. Finally, a structured data mining module uses a web crawler, at the direction of a crawler configuration, to systematically obtain product inventory information from various third-party websites. Accordingly, a user's search query is processed to provide geographically relevant product inventory information in near real time.

Description

    RELATED APPLICATIONS
  • The present application is a continuation of U.S. patent application Ser. No. 13/366,962, filed Feb. 6, 2012, which claims the benefit of priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application Ser. No. 61/439,724, entitled “Methods and Systems for Automatically Obtaining Real-Time, Geographically-Relevant Product Information From Heterogeneous Sources, and Enhancing and Presenting the Product Information”, filed on Feb. 4, 2011, which is incorporated herein by reference in its entirety.
  • TECHNICAL FIELD
  • The present disclosure generally relates to data processing techniques for obtaining from disparate and heterogeneous sources, real-time, geographically-relevant information concerning products and their availability.
  • BACKGROUND
  • The Internet and the World Wide Web have given rise to a wide variety of on-line retailers that operate virtual stores from which consumers can purchase products (i.e., merchandise, or goods) as well as services. Although the popularity of these on-line retail sites is clearly evidenced by their increasing sales, for a variety of reasons, some consumers may still prefer to purchase products and services in a more conventional manner—i.e., via a brick-and-mortar store.
  • DESCRIPTION OF THE DRAWINGS
  • Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings:
  • FIG. 1 is an example of a web page from which various elements of information are retrieved by a crawler operating in conjunction with a crawler configuration generated with a crawler configuration application, according to some embodiments of the invention;
  • FIG. 2 is a block diagram illustrating an example of a cache key in the form of a three-tuple with a zip code, offer or product code, and offer variant, consistent with some embodiments of the invention;
  • FIG. 3 is a block diagram illustrating the data source and data flows that occur for populating a database with product inventory information, according to some embodiments of the invention; and
  • FIG. 4 is a block diagram of a machine in the form of a computing device within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed.
  • DETAILED DESCRIPTION
  • The present disclosure describes data processing techniques for obtaining from disparate and heterogeneous sources, real-time, geographically-relevant information concerning products and their availability. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the various aspects of different embodiments of the present invention. It will be evident, however, to one skilled in the art, that the present invention may be practiced without all of the specific details.
  • Embodiments of the present invention involve a set of sophisticated and computer-implemented automated tools and processes for obtaining current data about products and their availability from a wide variety of data sources, such as web sites, network-connected databases, inventory systems, and so forth. In particular, the systems and methods described herein facilitate obtaining and presenting in near real-time, geographically-relevant data concerning products and their availability, such that a potential consumer can perform a web-based search to locate a product, with its current inventory information, at a retail store in a particular geographical area. For example, an automated process (e.g., a crawler) can be configured to obtain product information from a variety of web sites. Alternatively, an external database may be accessed via an application programming interface (API). In any case, once the data is obtained, this data is enhanced and stored in a local database. The data can then be presented to potential consumers in response to a consumer browsing or searching for relevant products and specifying a particular location. As there are many stages involved in the overall process of obtaining, enhancing and presenting this product data, the following description of the inventive subject matter is presented in sections, which loosely correlate with the various stages.
  • Data Acquisition—Structured Data Mining
  • Consistent with some embodiments of the inventive subject matter, data from a wide variety of sources is obtained via a system and method of structured data mining. The system and related automated processes that facilitate the structured data mining consist primarily of two components. The first is a web-based application (referred to herein as the crawler construction kit, or CCK) used to configure one or more proprietary crawlers. A crawler (sometimes referred to as a web crawler, or bot) is an automated computer program that browses the Internet or World Wide Web in a methodical manner, gathering data in an orderly fashion. The CCK allows its user to browse a retailer's web site and quickly establish a crawler configuration (e.g., a set of automated steps) required to obtain some item of information (e.g., the color, price, quantity available, etc.) about a particular product offered by a particular retailer. Accordingly, using the CCK, a user can create a crawler configuration (e.g., a set of interpretable, or executable, instructions), which is then used to direct a crawler to perform a particular set of operations to obtain a particular set or item of data, and thereby populate a database with product inventory information obtained automatically from various websites. This type of technique is generally referred to as web scraping.
  • With some embodiments, the CCK provides a user with a web-based set of tools for selecting and tagging various elements of a web page that correspond with elements of product inventory information that can be automatically extracted by an automated crawler. For instance, with some embodiments, the CCK application enables a user to manipulate a cursor with a pointing device to interact with elements on a web page, for example, by clicking, selecting, dragging, etc. When a particular item or element of information displayed on the web page has been selected, the source document underlying the web page is analyzed to identify information that a crawler might use to extract or obtain that element of information. This information is then automatically populated in a crawler configuration (e.g., a configuration file) for a particular crawler that will later be used to periodically obtain the set or item of information. In some cases, the CCK application may prompt the user to select various options or settings for use in obtaining a particular element of information. Additionally, as discussed briefly below, the user may opt to open a separate window, pane or similar user interface element in which the user can directly edit a snippet of code for inclusion with the crawler configuration for the specified crawler. For instance, in certain scenarios, a user may need to customize a crawler configuration to direct a crawler to perform some specialized operation(s) required to obtain a particular element of information.
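  • One way to picture the analysis step, in which the element a user selects is located in the page's source document and turned into a crawler configuration entry, is the minimal Python sketch below. It assumes the page is well-formed XHTML (so the standard-library `xml.etree.ElementTree` can parse it), that the selection arrives as the element's visible text, and that a configuration is a JSON object; `build_config`, `path_to`, and the `crawler`/`selectors`/`field`/`path` keys are hypothetical names, not the format the CCK actually uses.

```python
import json
import xml.etree.ElementTree as ET


def path_to(root, target):
    """Build an ElementTree path such as './body[1]/div[1]/span[2]' from root to target."""
    # ElementTree nodes do not know their parents, so map child -> parent first.
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parents[node]
        siblings = [c for c in parent if c.tag == node.tag]
        # Positional predicates are 1-based, as in XPath.
        steps.append(f"{node.tag}[{siblings.index(node) + 1}]")
        node = parent
    return "./" + "/".join(reversed(steps))


def build_config(source, selected_text, field_name, crawler="default"):
    """Locate the element whose text matches the user's selection and emit a config entry."""
    root = ET.fromstring(source)
    for element in root.iter():
        if (element.text or "").strip() == selected_text:
            selector = {"field": field_name, "path": path_to(root, element)}
            return json.dumps({"crawler": crawler, "selectors": [selector]})
    raise ValueError(f"no element with text {selected_text!r}")
```

The emitted path can be passed straight back to `Element.find()`, which is what a crawler would do when it later revisits the page.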
  • Once extracted, the data may be manipulated or enhanced, then inserted into a database and used in processing users' queries, presenting search results, etc. With some embodiments, the information may be enhanced by normalizing it so that common characteristics can be compared using a common nomenclature. Additionally, with some embodiments, specific products may be categorized and classified into a proprietary hierarchy. Similarly, with some embodiments, products may be assigned proprietary product identifiers where common, publicly available SKUs (or other identifiers) are not used.
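  • Normalization to a common nomenclature could be as simple as an alias table that maps merchant-specific attribute names onto canonical ones. The sketch below assumes that approach; `ATTRIBUTE_ALIASES` and `normalize_attributes` are hypothetical, and a production system would likely need far richer rules.

```python
# Hypothetical alias table mapping merchant-specific attribute names
# onto a common nomenclature.
ATTRIBUTE_ALIASES = {
    "colour": "color",
    "color": "color",
    "shade": "color",
    "size": "size",
    "sz": "size",
    "capacity": "storage",
    "memory": "storage",
}


def normalize_attributes(raw):
    """Rename known attributes to canonical names and tidy their values.

    Unknown attributes are kept under their lower-cased original name so that
    no data is silently dropped.
    """
    normalized = {}
    for name, value in raw.items():
        key = name.strip().lower()
        canonical = ATTRIBUTE_ALIASES.get(key, key)
        # Collapse runs of whitespace and lower-case the value for comparison.
        normalized[canonical] = " ".join(str(value).split()).lower()
    return normalized
```

With this in place, "Colour: Navy" from one merchant and "color: navy" from another compare equal.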
  • As illustrated in FIG. 1, an example user interface for a merchant website is displayed. Using the CCK tool, a user can select various elements of information presented in the web page, and specify configuration information for use by a crawler in obtaining the elements of information. For instance, any of the following elements of information may be selected with the CCK tool for purposes of customizing or configuring the crawler: the descriptive name of the product 10, the users' ratings and reviews 12, the text within the details tab 14, the information presented within the fit tab 16, the shipping information 18, the color information 20, the size information 22, the picture 24, the pricing information 26, and the item number 28. In addition, with some embodiments, the crawler may identify a volume or amount of a particular product that is available within a particular geographical location.
  • The second component that is part of the structured data mining system is a suite of crawlers that are configured to use a crawler configuration created by the web-based CCK application. In contrast to conventional web crawlers, crawlers consistent with embodiments of the invention are configured to be driven by the crawler configurations that are created by the CCK, which can be quite complex. As a result, the crawlers can be configured to crawl web sites and obtain data that many conventional automated crawlers would have no way of accessing. As there may be many different crawlers in the suite of crawlers, the crawler configuration may specify the particular crawler for which the configuration is to be used.
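  • Because a configuration may name the crawler it is meant for, the suite can be organized as a registry keyed by crawler name. The Python sketch below assumes that design; the crawler names and the `register`/`run` helpers are hypothetical, and the crawler bodies are placeholders rather than real fetching logic.

```python
from typing import Callable, Dict

# Hypothetical suite: crawler name -> function that runs a configuration.
CRAWLERS: Dict[str, Callable[[dict], dict]] = {}


def register(name):
    """Decorator that adds a crawler implementation to the suite under `name`."""
    def wrap(func):
        CRAWLERS[name] = func
        return func
    return wrap


@register("static_html")
def static_html_crawler(config):
    # Placeholder: a real crawler would fetch config["url"] and apply its selectors.
    return {"crawler": "static_html", "url": config["url"]}


@register("script_rendering")
def script_rendering_crawler(config):
    # Placeholder for a crawler that executes page scripts before extracting data.
    return {"crawler": "script_rendering", "url": config["url"]}


def run(config):
    """Dispatch a configuration to the crawler it names."""
    crawler = CRAWLERS.get(config.get("crawler"))
    if crawler is None:
        raise ValueError(f"unknown crawler {config.get('crawler')!r}")
    return crawler(config)
```

A registry keeps the dispatch logic independent of how many crawlers the suite contains; adding one is a single decorated function.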
  • Consistent with some embodiments of the invention, the web-based CCK application enables an approach to describing how to select desired information from various sources (HTML, XML, JSON, JavaScript, etc.). Fundamentally, the web-based CCK application defines sets of what are referred to herein as selectors, where each selector describes how to extract a single item of information (e.g., product title, retail price, product image URL, description, etc.). Each selector is in essence a set of steps, or a pipeline, that describes a series of operations to be performed in order to request and then extract the desired information from a web server, for insertion into a database.
  • To establish a pipeline, the following steps or stages are followed.
      • 1. Select elements from data source
      • 2. Apply filters to the selected elements
      • 3. Apply filters to the values of the selected elements
      • 4. Apply custom treatments to values.
  • Each of the first three stages has several built-in mechanisms, but in most cases the user can, if necessary, fall back to writing code (e.g., Python code) directly in the user interface of the web-based CCK application in order to define custom behaviors. For instance, the web-based CCK application includes a code editing module that enables a user to define a script or section of executable code, which can be executed to perform a customized operation that is not easily definable by the automated tools of the web-based CCK application. This code can be arbitrarily complex; for example, it can open new network resources, download additional web pages or assets, use third-party libraries, and so on. Accordingly, the web-based CCK application enables a user to very quickly automate a crawler to retrieve an item of information from a web site by generating a crawler configuration and, if necessary, customizing the behavior of the crawler to perform more complex operations. The custom treatments in stage four (4) are all built-in optimizations for common cases that are frequently encountered in this problem domain (e.g., handling of currency in prices).
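  • The four stages above map naturally onto a small pipeline object. The following sketch is one possible shape, assuming pages parsed with `xml.etree.ElementTree`, arbitrary Python callables in stages 2 and 3 standing in for the user-written code the CCK allows, and a currency parser as an example of a built-in stage 4 treatment; `Selector` and `parse_price` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Any, Callable, List


def parse_price(text: str) -> Decimal:
    """Example built-in treatment: '$1,299.00' -> Decimal('1299.00')."""
    return Decimal(re.sub(r"[^\d.]", "", text))


@dataclass
class Selector:
    """Extracts one item of information via a four-stage pipeline."""
    name: str
    path: str  # stage 1: which elements to select (ElementTree path syntax)
    element_filters: List[Callable[[ET.Element], bool]] = field(default_factory=list)  # stage 2
    value_filters: List[Callable[[str], bool]] = field(default_factory=list)  # stage 3
    treatments: List[Callable[[str], Any]] = field(default_factory=list)  # stage 4

    def extract(self, root: ET.Element) -> list:
        # Stages 1 and 2: select elements, keep those passing every element filter.
        elements = [e for e in root.findall(self.path)
                    if all(keep(e) for keep in self.element_filters)]
        # Stage 3: filter on the elements' text values.
        values = [(e.text or "").strip() for e in elements]
        values = [v for v in values if all(keep(v) for keep in self.value_filters)]
        # Stage 4: apply treatments in order, each to the previous output.
        for treat in self.treatments:
            values = [treat(v) for v in values]
        return values
```

For example, a price selector can pick `.//span` elements, keep only those whose `class` is exactly `price`, and then run `parse_price` to obtain a `Decimal`.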
  • In addition to configuration, the web-based CCK application also supports live testing of configurations as well as automated validation of crawler configurations. Accordingly, a user attempting to generate a crawler configuration to obtain a particular item of information about a product can select to test the crawler configuration in real-time, and observe how the crawler, controlled by the crawler configuration, performs the operations. This allows the user to tweak or modify the configuration to obtain the required data item.
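  • Automated validation might combine a structural check of the configuration with a live test against a sample page. The sketch below assumes a hypothetical configuration layout with a `crawler` name and a list of `selectors`, each carrying a `field` name and an ElementTree `path`; `validate_config` and `REQUIRED_KEYS` are illustrative, not part of the described system.

```python
import xml.etree.ElementTree as ET

REQUIRED_KEYS = ("crawler", "selectors")  # hypothetical schema


def validate_config(config, sample_page):
    """Return a list of problems; an empty list means the configuration passed.

    Checks the (hypothetical) schema first, then performs a live test by
    applying each selector's path to a sample page and flagging empty results.
    """
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in config]
    if problems:
        return problems
    root = ET.fromstring(sample_page)
    for i, selector in enumerate(config["selectors"]):
        name = selector.get("field", f"selector #{i}")
        path = selector.get("path")
        if not path:
            problems.append(f"{name}: no path")
            continue
        try:
            matches = root.findall(path)
        except SyntaxError as exc:  # ElementTree raises SyntaxError for malformed paths
            problems.append(f"{name}: invalid path ({exc})")
            continue
        if not any((m.text or "").strip() for m in matches):
            problems.append(f"{name}: selected nothing on sample page")
    return problems
```

Returning a list of human-readable problems, rather than raising on the first one, lets a user fix every issue in a single round of edits.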
  • Real-Time Product Availability Lookup (RTPAL)
  • In addition to using a crawler to obtain information, with some embodiments, a more formal or dedicated process might also be used. Whereas a crawler can obtain information from websites whose operators do not provide a publicly available API, the Real-Time Product Availability Lookup (RTPAL) system generally relies on the existence of formal, publicly accessible inventory systems to obtain product inventory information. For instance, with some embodiments, the RTPAL system is used to query external inventory systems. The RTPAL system consists primarily of three components. The first component is a framework for retrieving and caching information from individual merchant inventory systems. The second component is a suite of components that makes it easy to build clients for individual inventory systems. The third component is a set of individual clients (built using these components to run inside the framework) for accessing specific merchant inventory systems (e.g., individual big-box retailers such as Target and Best Buy, aggregate sources such as Volusion or MerchantOS, and small-merchant sources such as QuickBooks or Microsoft Dynamics).
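  • The division between a framework and per-merchant clients can be sketched as an abstract client interface plus a dispatcher that fans a lookup out to every registered client. Everything here is an assumption: `InventoryClient`, `RTPALFramework`, and the in-memory `FakeMerchantClient` (which stands in for a client that would call a merchant's real inventory API) are hypothetical, and caching is left out for brevity.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class InventoryClient(ABC):
    """Interface each merchant-specific client implements (hypothetical)."""

    @abstractmethod
    def available_quantity(self, zip_code: str, offer_id: str, variation_id: str) -> int:
        """Return units in stock near zip_code for the given offer and variation."""


class FakeMerchantClient(InventoryClient):
    """In-memory stand-in for a client that would call a merchant's inventory API."""

    def __init__(self, stock: Dict[Tuple[str, str, str], int]):
        self.stock = stock

    def available_quantity(self, zip_code, offer_id, variation_id):
        return self.stock.get((zip_code, offer_id, variation_id), 0)


class RTPALFramework:
    """Holds registered clients and fans each lookup out to all of them."""

    def __init__(self):
        self.clients: Dict[str, InventoryClient] = {}

    def register(self, merchant: str, client: InventoryClient) -> None:
        self.clients[merchant] = client

    def lookup(self, zip_code, offer_id, variation_id) -> Dict[str, int]:
        """Return {merchant: quantity} for every merchant reporting stock."""
        results = {}
        for merchant, client in self.clients.items():
            quantity = client.available_quantity(zip_code, offer_id, variation_id)
            if quantity > 0:
                results[merchant] = quantity
        return results
```

Keeping the merchant-specific code behind one small interface is what lets new clients be added without touching the framework.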
  • The RTPAL system has a cache system that uses ZVOTs (Zip-Code, Variation, Offer tuples) as cache keys. Specifically, the cache key used in querying the cache includes three components: a zip code relevant to the query, an offer identifier corresponding with the specific offer or product, and a variation identifier specifying the particular variant of the product or offering. The offer identifier is essentially synonymous with a product identifier, and uniquely identifies, at a top level, a particular product or item that is being offered for sale. A variation is a set of product-specific characteristics. For example, for clothing, the variation may specify characteristics such as size and color. Other products have other variations; for instance, with a tablet computer, a variation may specify the amount of memory (16 GB, 32 GB, 64 GB, etc.) included with the computer. The zip code specifies the location of relevance to the search. For instance, if a user is looking for a product in a particular zip code, the specified zip code can be used to query the cache and ensure that only relevant cached information is returned. In other embodiments, the system might implement fuzzy geographic-based caching in order to drastically increase the cache hit rate and to support significantly higher traffic volumes.
  • As illustrated in FIG. 2, an example of a three-tuple (e.g., ZVOT) cache key for querying a cache is shown. For example, when the cache key with reference number 30 is used to query the cache entries on the right (reference number 34), it produces a cache hit on the entry with reference number 32. By including the zip code in the cache key, those cache entries that are geographically relevant to a particular user's product query can be returned and presented with minimal processing delay.
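  • A ZVOT-keyed cache can be sketched with a named tuple as the key. The text specifies only the three key components; the time-to-live expiry and the fuzzy mode, which here simply truncates zip codes to a leading prefix as a crude proxy for geographic proximity, are assumptions, as are the names `ZVOT` and `ZVOTCache`.

```python
import time
from typing import Callable, Dict, NamedTuple, Optional, Tuple


class ZVOT(NamedTuple):
    """Cache key: zip code, variation, offer (the order the ZVOT acronym suggests)."""
    zip_code: str
    variation_id: str
    offer_id: str


class ZVOTCache:
    """In-memory inventory cache keyed by ZVOT.

    ttl_seconds: assumed expiry so stock levels do not go stale.
    fuzzy_digits: if set, zip codes are truncated to this many leading digits,
    so nearby zip codes share entries (an assumed proximity rule).
    """

    def __init__(self, ttl_seconds: float = 300.0, fuzzy_digits: Optional[int] = None,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.fuzzy_digits = fuzzy_digits
        self.clock = clock
        self._entries: Dict[ZVOT, Tuple[float, int]] = {}

    def _normalize(self, key: ZVOT) -> ZVOT:
        if self.fuzzy_digits is None:
            return key
        return key._replace(zip_code=key.zip_code[: self.fuzzy_digits])

    def put(self, key: ZVOT, quantity: int) -> None:
        self._entries[self._normalize(key)] = (self.clock(), quantity)

    def get(self, key: ZVOT) -> Optional[int]:
        """Return the cached quantity, or None on a miss or an expired entry."""
        key = self._normalize(key)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, quantity = entry
        if self.clock() - stored_at > self.ttl:
            del self._entries[key]
            return None
        return quantity
```

In exact mode a query for a neighbouring zip code misses; in fuzzy mode it hits, which is the trade-off between precision and hit rate the text alludes to.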
  • In addition to using the RTPAL and the structured data mining techniques described above, with some embodiments, product inventory information is received from third-party sources via a simple data feed. Accordingly, FIG. 3 illustrates a block diagram of the various data sources from which product inventory information can be obtained. For instance, the data sources 40 include data feeds 42, application programming interfaces (APIs) 44 for accessing merchant-specific product inventory systems, and structured data mining of third-party websites 46. As illustrated in FIG. 3, the data obtaining module 48 facilitates receiving the product inventory information from the data feeds 42 and the RTPAL-based APIs 44, while the structured data mining module 50 facilitates the real-time receipt of information from third-party websites 46. As described in greater detail below, once obtained, the data is stored in a product inventory database 52 and enhanced, for example, by the product offering matching module 54.
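  • Because all three sources ultimately write to the product inventory database 52, ingestion can share a single upsert path. The sketch below assumes an SQLite table, a CSV format for data feeds, and the column names shown; none of these details come from the text, and `open_db`, `ingest` and `ingest_csv_feed` are hypothetical helpers.

```python
import csv
import io
import sqlite3

# Hypothetical schema for the product inventory database 52.
SCHEMA = """
CREATE TABLE IF NOT EXISTS inventory (
    source    TEXT NOT NULL,   -- 'feed', 'api' or 'crawl'
    merchant  TEXT NOT NULL,
    zip_code  TEXT NOT NULL,
    offer_id  TEXT NOT NULL,
    variation TEXT NOT NULL,
    quantity  INTEGER NOT NULL,
    PRIMARY KEY (merchant, zip_code, offer_id, variation)
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def ingest(conn, source, records):
    """Upsert records from any source; a newer row for the same key replaces the older one."""
    rows = [
        (source, r["merchant"], r["zip_code"], r["offer_id"], r["variation"], int(r["quantity"]))
        for r in records
    ]
    conn.executemany("INSERT OR REPLACE INTO inventory VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def ingest_csv_feed(conn, feed_text):
    """Treat a data feed as CSV text with a header row (an assumed format)."""
    ingest(conn, "feed", csv.DictReader(io.StringIO(feed_text)))
```

Records from the RTPAL clients or from the crawler would go through the same `ingest` call with a different `source` label.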
  • Automated Product Matching
  • With some embodiments, automated product matching is performed by a product offering matching module 54 (FIG. 3). The goal of product matching is to aggregate offers for the same product to enhance the user experience. When data is collected, every attempt is made to capture unique product identifiers such as UPC, EAN, ASIN, ISBN, SKU and Model Number. When one or more of these identifiers are available, a rule-based algorithm is invoked to determine whether the offer matches an existing product. If a match is found, the product or offer is assigned to the matching product. If none of these identifiers are available, other attributes of the offer, for example title, description, brand and specifications, are used to determine a similarity score with respect to one or more existing products. If a match is found, the matching product is assigned; if not, a new product is created.
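  • The two-pass matching logic might look like the following. The identifier pass follows the text; the similarity pass uses Jaccard similarity over title, brand and description tokens, which is a swapped-in technique, and the 0.6 threshold is a hypothetical value. The sketch also falls back to similarity when identifiers are present but match nothing, a case the text leaves open.

```python
import re

IDENTIFIER_FIELDS = ("upc", "ean", "asin", "isbn", "sku", "model_number")
SIMILARITY_THRESHOLD = 0.6  # hypothetical; the text gives no value


def tokens(item):
    """Lower-cased word tokens from an item's title, brand and description."""
    text = " ".join(item.get(k, "") for k in ("title", "brand", "description"))
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """|A & B| / |A | B|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_offer(offer, products):
    """Return the id of the matching product, or None to signal 'create a new product'."""
    # Pass 1 (rule-based): any shared unique identifier is treated as a match.
    for product in products:
        for name in IDENTIFIER_FIELDS:
            value = offer.get(name)
            if value and value == product.get(name):
                return product["id"]
    # Pass 2 (swapped-in technique): best Jaccard similarity above a threshold.
    offer_tokens = tokens(offer)
    best_id, best_score = None, 0.0
    for product in products:
        score = jaccard(offer_tokens, tokens(product))
        if score > best_score:
            best_id, best_score = product["id"], score
    return best_id if best_score >= SIMILARITY_THRESHOLD else None
```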
  • Automated Product Categorization
  • With some embodiments of the invention, a product type taxonomy is used. For instance, with some embodiments, the taxonomy may consist of approximately three-thousand (3000) unique categories and sub-categories, arranged as nodes of a tree-like hierarchical structure. Approximately twenty-six hundred (2600) of these unique categories may be leaf nodes. An example of a leaf node would be: Vehicle GPS Units. The aim of categorization is to ensure that every product offer has at least one category node assigned to it.
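  • A tree-like taxonomy with identifiable leaf nodes can be sketched as follows. Only "Vehicle GPS Units" comes from the text; the intermediate category names and the `CategoryNode` class and `leaves` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # identity equality; avoids recursive comparison through parent links
class CategoryNode:
    name: str
    parent: Optional["CategoryNode"] = field(default=None, repr=False)
    children: List["CategoryNode"] = field(default_factory=list)

    def add(self, name: str) -> "CategoryNode":
        """Create and return a child category."""
        child = CategoryNode(name, parent=self)
        self.children.append(child)
        return child

    @property
    def is_leaf(self) -> bool:
        return not self.children

    def path(self) -> str:
        """Full path from the root, e.g. 'All > Electronics > GPS'."""
        names, node = [], self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return " > ".join(reversed(names))


def leaves(node: CategoryNode) -> Iterator[CategoryNode]:
    """Yield every leaf category under node, depth-first."""
    if node.is_leaf:
        yield node
    for child in node.children:
        yield from leaves(child)
```

Keeping a parent pointer makes it cheap to report an offer's full category path once a leaf has been assigned.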
  • With some embodiments, labelled offers are collected. These labelled offers are used as training data in a machine learning algorithm, which then classifies the remaining unlabeled offers. The classification algorithm is a hybrid of variations on several different classic algorithms: Naive Bayes, Rocchio, and kNN. With some embodiments, precisions vary by category and are typically upwards of 0.9. Overall precision may be upwards of 0.96. With some embodiments, approximately 80% of active offers can be classified with the automated categorization system.
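  • The text describes a hybrid of Naive Bayes, Rocchio and kNN without giving details, so the sketch below shows only one ingredient: a multinomial Naive Bayes classifier with add-one smoothing, trained on labelled offers. It is not the hybrid itself, and the class name and training examples are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesCategorizer:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, offers, labels):
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter(labels)        # category -> number of labelled offers
        self.vocab = set()
        for text, label in zip(offers, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        """Return the category with the highest log posterior for an unlabelled offer."""
        tokens = tokenize(text)
        best, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n_docs / self.total_docs)  # log prior
            for t in tokens:
                score += math.log((counts[t] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = label, score
        return best
```

Working in log space avoids the numeric underflow that multiplying many small probabilities would cause on long offer descriptions.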
  • The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules or objects that operate to perform one or more operations or functions. The modules and objects referred to herein may, in some example embodiments, comprise processor-implemented modules and/or objects.
  • Similarly, the methods described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented modules. The performance of certain operations may be distributed among the one or more processors, not only residing within a single machine or computer, but deployed across a number of machines or computers. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment or at a server farm), while in other embodiments the processors may be distributed across a number of locations.
  • The one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or within the context of “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., Application Program Interfaces (APIs)).
  • FIG. 4 is a block diagram of a machine in the form of a computer system within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client machine in a client-server network environment, or as a peer machine in peer-to-peer (or distributed) network environment. In a preferred embodiment, the machine will be a server computer, however, in alternative embodiments, the machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a mobile telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
  • The example computer system 1500 includes a processor 1502 (e.g., a central processing unit (CPU), a graphics processing unit (GPU) or both), a main memory 1501 and a static memory 1506, which communicate with each other via a bus 1508. The computer system 1500 may further include a display unit 1510, an alphanumeric input device 1517 (e.g., a keyboard), and a user interface (UI) navigation device 1511 (e.g., a mouse). In one embodiment, the display, input device and cursor control device are a touch screen display. The computer system 1500 may additionally include a storage device 1516 (e.g., drive unit), a signal generation device 1518 (e.g., a speaker), a network interface device 1520, and one or more sensors 1521, such as a global positioning system sensor, compass, accelerometer, or other sensor.
  • The drive unit 1516 includes a machine-readable medium 1522 on which is stored one or more sets of instructions and data structures (e.g., software 1523) embodying or utilized by any one or more of the methodologies or functions described herein. The software 1523 may also reside, completely or at least partially, within the main memory 1501 and/or within the processor 1502 during execution thereof by the computer system 1500, the main memory 1501 and the processor 1502 also constituting machine-readable media.
  • While the machine-readable medium 1522 is illustrated in an example embodiment to be a single medium, the term “machine-readable medium” may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more instructions. The term “machine-readable medium” shall also be taken to include any tangible medium that is capable of storing, encoding or carrying instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention, or that is capable of storing, encoding or carrying data structures utilized by or associated with such instructions. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. Specific examples of machine-readable media include non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
  • The software 1523 may further be transmitted or received over a communications network 1526 using a transmission medium via the network interface device 1520 utilizing any one of a number of well-known transfer protocols (e.g., HTTP). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), the Internet, mobile telephone networks, Plain Old Telephone (POTS) networks, and wireless data networks (e.g., Wi-Fi® and WiMax® networks). The term “transmission medium” shall be taken to include any intangible medium that is capable of storing, encoding or carrying instructions for execution by the machine, and includes digital or analog communications signals or other intangible medium to facilitate communication of such software.
  • Although an embodiment has been described with reference to specific example embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense. The accompanying drawings that form a part hereof, show by way of illustration, and not of limitation, specific embodiments in which the subject matter may be practiced. The embodiments illustrated are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed herein. Other embodiments may be utilized and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. This Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.

Claims (20)

What is claimed is:
1. A method comprising:
providing a web-based crawler configuration application to a client device;
receiving, via the web-based crawler configuration application, a selection of an element of a web page displayed at the client device;
identifying, in a source document of the web page, an element of product inventory information corresponding to the selected element of the web page; and
generating a crawler configuration file based on the identified element of product inventory information, the crawler configuration file configuring an operation of a crawler application.
2. The method of claim 1, further comprising:
executing a first set of instructions representing an instance of the crawler application, the instance of the crawler application configured to perform a set of operations specified in the crawler configuration file, the set of operations resulting in a retrieval of product inventory information for one or more products hosted at one or more web servers;
executing a second set of instructions representing an instance of a real-time product availability lookup (RTPAL) application, the RTPAL application to use one or more application programming interfaces (APIs) to request and receive product inventory information from one or more third-party network-connected inventory management systems;
enhancing the product inventory information from the crawler application with the product inventory information from the RTPAL application; and
storing the enhanced product inventory information in the database.
3. The method of claim 2, further comprising:
subsequent to receiving product inventory information as a result of executing either of the crawler application and the RTPAL application, determining that received product inventory information for a particular product does not specify a unique product identifier;
performing a matching operation by comparing various elements of information concerning the particular product with corresponding information from one or more known products to determine a product with which the received product inventory information for the particular product not specifying the unique product identifier best matches; and
storing the received product inventory information for the particular product not specifying the unique product identifier in the database.
4. The method of claim 2, further comprising:
storing product inventory information for a particular product received via the crawler application or the RTPAL application in a data cache with each cache entry having a cache key based on a zipcode for a location at which the particular product received via the crawler application or the RTPAL application is available, a product identifier, and a product variation identifier.
5. The method of claim 2, further comprising:
assigning to each product identified in the received product inventory information as a result of executing either of the crawler application and the RTPAL application one or more category identifiers for a particular category in a hierarchical category.
6. The method of claim 1, further comprising:
receiving a search query from a user, the search query including information identifying a desired geographical area; and
processing the search query to provide product inventory information for a particular product satisfying the search query, the product inventory information for the particular product satisfying the search query specifying one or more merchant stores in a geographical location satisfying the desired geographical area identified in the query, and indicating a quantity of the particular product satisfying the search query available at each of the one or more merchant stores.
7. The method of claim 1, wherein the web-based crawler configuration application enables the user to specify one of a suite of web crawlers for use with a particular crawler configuration file generated by the web crawler configuration application.
8. The method of claim 1, wherein the crawler configuration identifies one crawler from a plurality of crawlers to be used with the crawler configuration.
9. The method of claim 1, wherein the web-based crawler configuration application enables the user to invoke a code editing application to specify customized code for obtaining a particular element of product inventory information, the customized code for inclusion in a crawler configuration file for use with a particular crawler.
10. The method of claim 1, wherein the web-based crawler configuration application defines a set of selectors, each selector describing how to extract a single item of information for insertion into the database.
11. A server comprising:
one or more processors for executing one or more sets of instructions stored in a memory, the one or more sets of instructions comprising:
providing a web-based crawler configuration application to a client device;
receiving, via the web-based crawler configuration application, a selection of an element of a web page displayed at the client device;
identifying, in a source document of the web page, an element of product inventory information corresponding to the selected element of the web page; and
generating a crawler configuration file based on the identified element of product inventory information, the crawler configuration file configuring an operation of a crawler application.
12. The server of claim 11, wherein the one or more sets of instructions further comprise:
executing a first set of instructions representing an instance of the crawler application, the instance of the crawler application configured to perform a set of operations specified in the crawler configuration file, the set of operations resulting in a retrieval of product inventory information for one or more products hosted at one or more web servers;
executing a second set of instructions representing an instance of a real-time product availability lookup (RTPAL) application, the RTPAL application to use one or more application programming interfaces (APIs) to request and receive product inventory information from one or more third-party network-connected inventory management systems;
enhancing the product inventory information from the crawler application with the product inventory information from the RTPAL application; and
storing the enhanced product inventory information in the database.
13. The server of claim 12, wherein the one or more sets of instructions further comprise:
subsequent to receiving product inventory information as a result of executing either of the crawler application and the RTPAL application, determining that received product inventory information for a particular product does not specify a unique product identifier;
performing a matching operation by comparing various elements of information concerning the particular product with corresponding information from one or more known products to determine a product with which the received product inventory information for the particular product not specifying the unique product identifier best matches; and
storing the received product inventory information for the particular product not specifying the unique product identifier in the database.
14. The server of claim 12, wherein the one or more sets of instructions further comprise:
storing product inventory information for a particular product received via the crawler application or the RTPAL application in a data cache with each cache entry having a cache key based on a zipcode for a location at which the particular product received via the crawler application or the RTPAL application is available, a product identifier, and a product variation identifier.
15. The server of claim 11, wherein the one or more sets of instructions further comprise:
receiving a search query from a user, the search query including information identifying a desired geographical area; and
processing the search query to provide product inventory information for a particular product satisfying the search query, the product inventory information for the particular product satisfying the search query specifying one or more merchant stores in a geographical location satisfying the desired geographical area identified in the query, and indicating a quantity of the particular product satisfying the search query available at each of the one or more merchant stores.
16. The server of claim 11, wherein the web-based crawler configuration application enables the user to specify one of a suite of web crawlers for use with a particular crawler configuration file generated by the web crawler configuration application.
17. The server of claim 11, wherein the crawler configuration identifies one crawler from a plurality of crawlers to be used with the crawler configuration.
18. The server of claim 11, wherein the web-based crawler configuration application enables the user to invoke a code editing application to specify customized code for obtaining a particular element of product inventory information, the customized code for inclusion in a crawler configuration file for use with a particular crawler.
19. The server of claim 11, wherein the web-based crawler configuration application defines a set of selectors, each selector describing how to extract a single item of information for insertion into the database.
20. A machine-readable storage medium storing instructions thereon, which, when executed by a processor of a server, will cause the server to perform a set of operations comprising:
providing a web-based crawler configuration application to a client device;
receiving, via the web-based crawler configuration application, a selection of an element of a web page displayed at the client device;
identifying, in a source document of the web page, an element of product inventory information corresponding to the selected element of the web page; and
generating a crawler configuration file based on the identified element of product inventory information, the crawler configuration file configuring an operation of a crawler application.
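Claims 11 and 20 cover two steps. First, a user's selection of an element on a rendered product page is mapped back to the page source. Second, a crawler configuration file is generated from it. Claims 10 and 19 describe that file as a set of selectors, each extracting a single item of information.

The patent does not publish a file format, so the Python sketch below assumes one. It locates the selected text in the page source with the standard-library `html.parser` and records a CSS `:nth-of-type` path to it. It then emits JSON with one selector per field. The names `build_crawler_config` and `_PathFinder`, the JSON keys, and the default crawler name are all hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# HTML elements that never have a closing tag, so they are counted but not pushed.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _PathFinder(HTMLParser):
    """Finds the first element whose text equals `target` and records its path."""

    def __init__(self, target: str) -> None:
        super().__init__()
        self.target = target.strip()
        self.stack: List[Tuple[str, int]] = []          # open elements: (tag, nth-of-type)
        self.child_counts: List[Dict[str, int]] = [{}]  # per-depth counters of child tags
        self.found: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        if tag in VOID_TAGS:
            return
        self.stack.append((tag, counts[tag]))
        self.child_counts.append({})

    def handle_endtag(self, tag):
        if all(open_tag != tag for open_tag, _ in self.stack):
            return  # stray end tag, or the end half of a self-closing void tag
        while self.stack:  # also pops children that were closed implicitly
            open_tag, _ = self.stack.pop()
            self.child_counts.pop()
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.found is None and self.stack and data.strip() == self.target:
            self.found = " > ".join(f"{t}:nth-of-type({n})" for t, n in self.stack)


def build_crawler_config(source_html: str, selections: Dict[str, str],
                         crawler: str = "static-html") -> str:
    """Turn {field name: text the user selected} into a JSON crawler configuration."""
    selectors = {}
    for field, selected_text in selections.items():
        finder = _PathFinder(selected_text)
        finder.feed(source_html)
        finder.close()
        if finder.found is None:
            raise ValueError(f"could not locate {field!r} in the page source")
        # One selector per field: each extracts a single item of information.
        selectors[field] = {"css": finder.found, "extract": "text"}
    return json.dumps({"crawler": crawler, "selectors": selectors},
                      indent=2, sort_keys=True)
```

Matching on exact text is the simplest heuristic available. A browser-side tool would more likely send the selected DOM node itself, which avoids ambiguity when the same text appears twice on a page.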
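Claims 7, 8, 16 and 17 let a crawler configuration name one crawler from a suite. A registry keyed by name is one plausible way to dispatch on that field. In the sketch below, the crawler names and the `crawler` and `start_url` keys are illustrative assumptions, and the crawler bodies are stubs.

```python
import json
from typing import Callable, Dict

# Hypothetical crawler suite: names and behaviour are placeholders.
CRAWLERS: Dict[str, Callable[[dict], str]] = {}


def register(name: str):
    """Decorator that adds a crawler implementation to the suite under `name`."""
    def decorator(fn):
        CRAWLERS[name] = fn
        return fn
    return decorator


@register("static-html")
def static_html_crawler(config: dict) -> str:
    return f"static-html crawling {config['start_url']}"


@register("js-rendering")
def js_rendering_crawler(config: dict) -> str:
    return f"js-rendering crawling {config['start_url']}"


def run_from_config(config_json: str) -> str:
    """Dispatch a crawler configuration file to the crawler it names."""
    config = json.loads(config_json)
    name = config.get("crawler", "static-html")
    if name not in CRAWLERS:
        raise ValueError(f"unknown crawler {name!r}; choose one of {sorted(CRAWLERS)}")
    return CRAWLERS[name](config)
```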
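Claims 2 and 12 enhance crawled inventory information with information from the RTPAL application, but they do not say how conflicts are resolved. This sketch assumes records are keyed by (store, product). It also assumes that any real-time field actually returned overrides the crawled value, since RTPAL data comes from the merchant's inventory system. `InventoryRecord` and `enhance` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class InventoryRecord:
    store_id: str
    product_id: str
    price: Optional[float] = None
    quantity: Optional[int] = None
    source: str = "crawler"


def enhance(crawled: Iterable[InventoryRecord],
            realtime: Iterable[InventoryRecord]) -> Dict[Tuple[str, str], InventoryRecord]:
    """Overlay RTPAL records on crawled ones, keyed by (store_id, product_id)."""
    merged = {(r.store_id, r.product_id): r for r in crawled}
    for rt in realtime:
        key = (rt.store_id, rt.product_id)
        base = merged.get(key)
        if base is None:
            merged[key] = rt  # only the real-time source knows this store/product pair
            continue
        merged[key] = replace(
            base,
            # Real-time values win whenever the API actually returned them.
            price=rt.price if rt.price is not None else base.price,
            quantity=rt.quantity if rt.quantity is not None else base.quantity,
            source="crawler+rtpal",
        )
    return merged
```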
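Claims 3 and 13 handle a record that lacks a unique product identifier. It is matched against known products by comparing several elements of product information. The patent does not name a similarity measure, so the sketch substitutes a weighted average of `difflib.SequenceMatcher` ratios. The field weights and the 0.6 acceptance threshold are made up, and the function returns `None` when no known product is close enough.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional

# Hypothetical field weights; they sum to 1.0 so scores stay in [0, 1].
DEFAULT_WEIGHTS = {"title": 0.6, "brand": 0.3, "model": 0.1}


def _similarity(a: str, b: str) -> float:
    """Case- and whitespace-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def best_match(candidate: Dict[str, str], known: List[Dict[str, str]],
               weights: Optional[Dict[str, float]] = None,
               threshold: float = 0.6) -> Optional[str]:
    """Return the id of the known product most similar to `candidate`, or None."""
    weights = weights or DEFAULT_WEIGHTS
    best_id, best_score = None, 0.0
    for product in known:
        # Fields missing on either side simply contribute nothing.
        score = sum(w * _similarity(candidate[f], product[f])
                    for f, w in weights.items()
                    if f in candidate and f in product)
        if score > best_score:
            best_id, best_score = product["id"], score
    return best_id if best_score >= threshold else None
```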
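Claims 4 and 14 key each cache entry on three fields: the ZIP code of the location where the product is available, the product identifier, and the product variation identifier. A minimal in-memory version follows.

The `inv:` key prefix and the expiry time are assumptions. The claims do not mention expiry, though real-time stock data would plausibly need it. The clock is injectable so that expiry can be tested.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


def cache_key(zipcode: str, product_id: str, variation_id: str) -> str:
    """Compose the cache key from the three fields named in the claim."""
    return f"inv:{zipcode.strip()}:{product_id}:{variation_id}"


class InventoryCache:
    """In-memory inventory cache with per-entry expiry (expiry is an assumption)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def put(self, zipcode: str, product_id: str, variation_id: str, value: Any) -> None:
        key = cache_key(zipcode, product_id, variation_id)
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, zipcode: str, product_id: str, variation_id: str) -> Optional[Any]:
        key = cache_key(zipcode, product_id, variation_id)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # stale stock data: evict and report a miss
            return None
        return value
```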
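Claim 5 assigns each product one or more category identifiers within a hierarchy. One reading, assumed here, is that a product filed under a leaf category also receives every ancestor's identifier, so it can be found when browsing at any level. The keyword rule used to choose the leaf is a deliberately naive placeholder, and the category names are illustrative.

```python
from typing import Dict, List, Optional


def category_path(leaf_id: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return [root, ..., leaf] ids; `parent` maps each id to its parent (root -> None)."""
    path: List[str] = []
    node: Optional[str] = leaf_id
    while node is not None:
        if node in path:
            raise ValueError(f"cycle in category hierarchy at {node!r}")
        path.append(node)
        node = parent[node]  # KeyError signals an unknown category id
    return path[::-1]


def assign_categories(title: str, keyword_to_leaf: Dict[str, str],
                      parent: Dict[str, Optional[str]]) -> List[str]:
    """Naive placeholder: the first keyword found in the title picks the leaf."""
    lowered = title.lower()
    for keyword, leaf in keyword_to_leaf.items():
        if keyword in lowered:
            return category_path(leaf, parent)
    return []
```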
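Claims 6 and 15 answer a search query for a geographical area. The result lists the merchant stores in that area and the quantity of the product each holds. This sketch assumes the area has already been resolved to a center point and radius, for example from a ZIP code. With that in place, a haversine great-circle distance filter is enough to sketch the lookup. `StoreStock` and `search_nearby` are hypothetical, and stores with no stock are left out by choice.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


@dataclass
class StoreStock:
    store_name: str
    lat: float
    lon: float
    product_id: str
    quantity: int


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def search_nearby(stock: Iterable[StoreStock], product_id: str,
                  center: Tuple[float, float],
                  radius_miles: float) -> List[Tuple[str, int, float]]:
    """Return (store, quantity, distance in miles) for in-stock stores, nearest first."""
    hits = []
    for s in stock:
        if s.product_id != product_id or s.quantity <= 0:
            continue
        d = haversine_miles(center[0], center[1], s.lat, s.lon)
        if d <= radius_miles:
            hits.append((s.store_name, s.quantity, round(d, 1)))
    return sorted(hits, key=lambda h: h[2])
```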
US15/961,120 2011-02-04 2018-04-24 Automatically obtaining real-time, geographically-relevant product information from heterogeneus sources Pending US20180239781A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US201161439724P 2011-02-04 2011-02-04
US13/366,962 US9977790B2 (en) 2011-02-04 2012-02-06 Automatically obtaining real-time, geographically-relevant product information from heterogeneus sources
US15/961,120 US20180239781A1 (en) 2011-02-04 2018-04-24 Automatically obtaining real-time, geographically-relevant product information from heterogeneus sources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US15/961,120 US20180239781A1 (en) 2011-02-04 2018-04-24 Automatically obtaining real-time, geographically-relevant product information from heterogeneus sources

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/366,962 Continuation US9977790B2 (en) 2011-02-04 2012-02-06 Automatically obtaining real-time, geographically-relevant product information from heterogeneus sources

Publications (1)

Publication Number Publication Date
US20180239781A1 (en) 2018-08-23

Family

ID=46601382

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/366,962 Active 2034-11-14 US9977790B2 (en) 2011-02-04 2012-02-06 Automatically obtaining real-time, geographically-relevant product information from heterogeneus sources
US15/961,120 Pending US20180239781A1 (en) 2011-02-04 2018-04-24 Automatically obtaining real-time, geographically-relevant product information from heterogeneus sources

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US13/366,962 Active 2034-11-14 US9977790B2 (en) 2011-02-04 2012-02-06 Automatically obtaining real-time, geographically-relevant product information from heterogeneus sources

Country Status (1)

Country Link
US (2) US9977790B2 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10169308B1 (en) 2010-03-19 2019-01-01 Google Llc Method and system for creating an online store
US8600153B2 (en) 2012-02-07 2013-12-03 Zencolor Corporation System and method for normalization and codification of colors for dynamic analysis
US9607404B2 (en) 2012-02-07 2017-03-28 Zencolor Corporation System for normalizing, codifying and categorizing color-based product and data based on a universal digital color system
US9047633B2 (en) 2012-02-07 2015-06-02 Zencolor Corporation System and method for identifying, searching and matching products based on color
US10460475B2 (en) 2012-02-07 2019-10-29 Zencolor Global, Llc Normalization of color from a captured image into a universal digital color system for specification and matching
US9436704B2 (en) 2012-02-07 2016-09-06 Zencolor Corporation System for normalizing, codifying and categorizing color-based product and data based on a universal digital color language
US9087357B2 (en) 2013-10-16 2015-07-21 Zencolor Corporation System for normalizing, codifying and categorizing color-based product and data based on a universal digital color language
US10664534B2 (en) * 2012-11-14 2020-05-26 Home Depot Product Authority, Llc System and method for automatic product matching
US9928515B2 (en) 2012-11-15 2018-03-27 Home Depot Product Authority, Llc System and method for competitive product assortment
US10504127B2 (en) 2012-11-15 2019-12-10 Home Depot Product Authority, Llc System and method for classifying relevant competitors
US10290012B2 (en) 2012-11-28 2019-05-14 Home Depot Product Authority, Llc System and method for price testing and optimization
US20140172558A1 (en) * 2012-12-13 2014-06-19 Christopher Kenneth Harris Purchase transaction content display
US9836775B2 (en) * 2013-05-24 2017-12-05 Ficstar Software, Inc. System and method for synchronized web scraping
US10482512B2 (en) 2013-05-31 2019-11-19 Michele Meek Systems and methods for facilitating the retail shopping experience online
CN104090965A (en) * 2014-07-15 2014-10-08 百度在线网络技术(北京)有限公司 Showing method and device for search results
CN104615792A (en) * 2015-03-12 2015-05-13 浪潮集团有限公司 One household type querying and showing method for enterprise internet data
US20180211206A1 (en) 2017-01-23 2018-07-26 Tête-à-Tête, Inc. Systems, apparatuses, and methods for managing inventory operations
US20180293234A1 (en) * 2017-04-10 2018-10-11 Bdna Corporation Curating objects
US20200114394A1 (en) * 2018-10-13 2020-04-16 Waste Repurposing International, Inc. Waste Classification Systems and Methods

Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020111880A1 (en) * 2001-02-14 2002-08-15 Buy And Sell Fast, Inc. Method of facilitating electronic commerce over a computer network
US20030028451A1 (en) * 2001-08-03 2003-02-06 Ananian John Allen Personalized interactive digital catalog profiling
US20030204448A1 (en) * 2002-04-26 2003-10-30 Vishik Claire S. System and method for creating electronic marketplaces
US20050075926A1 (en) * 2000-07-25 2005-04-07 Informlink, Inc. On-line promotion server
US20050116033A1 (en) * 1996-01-02 2005-06-02 Moore Steven J. Apparatus and method for purchased product security
US20060059424A1 (en) * 2004-09-15 2006-03-16 Petri Jonah W Real-time data localization
US7139747B1 (en) * 2000-11-03 2006-11-21 Hewlett-Packard Development Company, L.P. System and method for distributed web crawling
US20070112831A1 (en) * 2005-11-15 2007-05-17 Microsoft Corporation User interface for specifying desired configurations
US20070124721A1 (en) * 2005-11-15 2007-05-31 Enpresence, Inc. Proximity-aware virtual agents for use with wireless mobile devices
US20070282693A1 (en) * 2006-05-23 2007-12-06 Stb Enterprises, Inc. Method for dynamically building documents based on observed internet activity
US20070294230A1 (en) * 2006-05-31 2007-12-20 Joshua Sinel Dynamic content analysis of collected online discussions
US20080195507A1 (en) * 2007-01-01 2008-08-14 Nitesh Ratnakar Virtual Online Store
US20080195949A1 (en) * 2007-02-12 2008-08-14 Geoffrey King Baum Rendition of a content editor
US20080313165A1 (en) * 2007-06-15 2008-12-18 Microsoft Corporation Scalable model-based product matching
US20090063781A1 (en) * 2007-08-31 2009-03-05 Ebersole Steven cache access mechanism
US7536389B1 (en) * 2005-02-22 2009-05-19 Yahoo! Inc. Techniques for crawling dynamic web content
US20090265251A1 (en) * 2007-11-30 2009-10-22 Nearbynow Systems and Methods for Searching a Defined Area
US20090300641A1 (en) * 2008-05-30 2009-12-03 Novell, Inc. System and method for supporting a virtual appliance
US20100086192A1 (en) * 2008-10-02 2010-04-08 International Business Machines Corporation Product identification using image analysis and user interaction
US20100114957A1 (en) * 2006-04-05 2010-05-06 Glenbrook Associates, Inc. System and method for collecting and accessing product information in a database
US20100125497A1 (en) * 2008-12-16 2010-05-20 Dale Junior Arguello Electronic coupon distribution and redemption method for electronic devices
US20120072409A1 (en) * 2005-09-28 2012-03-22 Bradley John Perry Method and system for identifying targeted data on a web page
US20120191719A1 (en) * 2000-05-09 2012-07-26 Cbs Interactive Inc. Content aggregation method and apparatus for on-line purchasing system
US20120246022A1 (en) * 1997-06-19 2012-09-27 Jerome Dale Johnson Inventory sales system and method

Also Published As

Publication number Publication date
US9977790B2 (en) 2018-05-22
US20120203760A1 (en) 2012-08-09

Similar Documents

Publication Publication Date Title
US10657161B2 (en) Intelligent navigation of a category system
US10757202B2 (en) Systems and methods for contextual recommendations
US9712588B1 (en) Generating a stream of content for a channel
US10528637B2 (en) Systems and methods for recommended content platform
US20170169054A1 (en) System and method for dynamically retrieving data specific to a region of a layer
US9766861B2 (en) State-specific external functionality for software developers
US9633082B2 (en) Search result ranking method and system
TWI636416B (en) Method and system for multi-phase ranking for content personalization
US20160328483A1 (en) Generating content for topics based on user demand
US10489842B2 (en) Large-scale recommendations for a dynamic inventory
US9836774B2 (en) Comparative shopping tool
US10354309B2 (en) Methods and systems for selecting an optimized scoring function for use in ranking item listings presented in search results
US8566173B2 (en) Using application market log data to identify applications of interest
US20200027143A1 (en) System and method allowing social fashion selection in an electronic marketplace
US9881332B2 (en) Systems and methods for customizing search results and recommendations
CN103827863B (en) Dynamic image display area and image display within web search results
US10789626B2 (en) Deep-linking system, method and computer program product for online advertisement and e-commerce
CN104412265B (en) Update for promoting the search of application searches to index
US10146887B2 (en) Providing separate views for items
TWI424369B (en) Activity based users' interests modeling for determining content relevance
US20160042427A1 (en) Mining For Product Classification Structures For Internet-Based Product Searching
US20140207767A1 (en) Information repository search system
US10635711B2 (en) Methods and systems for determining a product category
US8005832B2 (en) Search document generation and use to provide recommendations
US8359237B2 (en) System and method for context and community based customization for a user experience

Legal Events

Date Code Title Description
AS Assignment

Owner name: EBAY INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ABRAHAM, JACK PHILLIP;ADELSON, AARON;BARTO, MATTHEW;AND OTHERS;SIGNING DATES FROM 20120311 TO 20120404;REEL/FRAME:045623/0172

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STCB Information on status: application discontinuation

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STCB Information on status: application discontinuation

Free format text: FINAL REJECTION MAILED