US20140095463A1 - Product Search Engine - Google Patents
- Publication number
- US20140095463A1 (application US13/911,049)
- Authority
- US
- United States
- Prior art keywords
- data
- image
- product
- data field
- template
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G06F17/30864—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
Definitions
- the product information is also being extracted from the paragraphs, headings, breadcrumb and menu links and tables containing product name, description, category, specifications, retailer name, etc.
- the second method includes clicking on the web browser device icon in the browser address bar at any remote site. The user then selects an image to send a message to the first remote site's image lookup server. The message contains the name of the remote site and the image URL.
- Matching images on a product information site to a product record facilitates the serving of ads on the social catalog site, brand analytics on the social catalog site, and conversion of links on the social catalog site to affiliate marketing links for commission-based programs. When the user clicks on the link to the page at the original site that contains the image, a cookie is set on the user's computer, and if the user buys something at the site, the store pays a commission to the referring site. Additional advantages include adding meta-information about the product to the visible text on the page to give the viewer additional information about the product. Another advantage of the system is setting keywords in meta-tags and descriptions for search engines to index. Other SEO and SEM advantages of adding keywords to pages are not described here but are well understood in the Internet community.
- Computer system 800 also includes an I/O device 814 for coupling computer system 800 with external entities.
- I/O device 814 is a modem for enabling wired or wireless communications between computer system 800 and an external network such as, but not limited to, the Internet.
- an operating system 802 , applications 803 , modules 804 , and data 805 are shown as typically residing in one or some combination of computer usable volatile memory 806 , e.g. random access memory (RAM), and data storage unit 807 .
- operating system 802 may be stored in another location such as on a network or on a flash drive.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention facilitates product searches on a personal computer, mobile or other device from remote sites via widget lookup using a computed image signature and optional product information extracted using a template in order to retrieve a list of the same or similar products available at other sites. The search starts with a widget lookup process, followed by the submission of the product image URL, optional product information extracted using a site specific product information template and information from HTML attributes to a server. The image signature is computed, a lookup based on the image signature and product information is executed and a product list with an image, price and link to each retailer where the product can be found is returned. The list is reduced based on the submitted image, optional product template and attribute information. The server sends the product list to the user's browser for display.
Description
- The present application claims the benefit of U.S. Provisional Application No. 61/656,502, filed Jun. 6, 2012, by Derek Edwin Pappas and titled “Structured and Social Data Aggregator”, incorporated by reference herein and for which benefit of the priority date is hereby claimed.
- Not applicable.
- Not applicable.
- The present invention relates to web search, image processing, on-line shopping and social networking. Specifically, it relates to techniques of web search and image processing that aid users who view, compare and buy products on-line, or share their product findings and preferences via social networks.
- Currently, users search for products on retailer, manufacturer, shopping engine, social network and blog websites. When users find a product image that they like, they often want to know how much the product costs, where they can buy it and other details and attributes about the product. In addition, users may want the specifications for the product and to know the manufacturer, model number, and product name. Currently users cannot search for products by image at remote websites. Users can search by image for similar images using services such as Google image search, but Google image search does not return the structured data associated with image search results.
- Shopping search engines do not de-duplicate or normalize the shopping data for all products. Typically the same image will appear in many different records in a shopping search engine result.
- Socially curated image sites which are used for curating images from other sites typically can capture the image and the page title. However, the meta-data (i.e., the structured data record in the page which was generated from a database) is not extracted automatically. The image could have been copied to a blog by another user.
- Users can save a product image found on the 3rd party website, go to Google image search, upload the image and search for it. Google image search will find the images that contain the same features (i.e., are similar in terms of shape and structure). Google image search will also return images that contain keywords which match the keywords associated with the matching images. A list of images is presented in the search results. The list will contain the original image, images that are from the same manufacturer and other images. The user then needs to click on each image and visit each site to see the price and the product information. The user then needs to note the product information on each of the different product pages in order to compare the different offers for the same product. The Google search results do not include the product information or any related products.
- The Google Data Highlighter tool performs structured data extraction using a template made by the user (the web page owner). The user tags the data field values with data field names on the web page using the tool. The Data Highlighter then finds the other pages on the site that contain the same HTML markup and structure. The extracted data is presented as a rich snippet in the search results. Currently, the Data Highlighter can extract only events-related data records, which contain a time, date, place and person. The identification of semantic information is the most difficult part of structured data extraction from web pages.
- “Intelligent data search engine”, U.S. Pat. No. 8,190,556, automatically identifies pages with similar structure from the same site, finds the intersection between the page structures (i.e., the XPATH and semantic type), automatically generates an extraction template, crawls each page on the site and checks whether the page matches the structure of the template. If there is a match, the structured data on the page is extracted and stored in a data store.
- Pinterest is a social information catalog that is curated by users. Users navigate to pages on remote sites which contain images and then press the “Pin it!” button embedded in the web page or use the “Pin it” bookmarklet to upload or add a pin on the Pinterest web site. A set of images from the web page appears and the user clicks on one of the images, adds a description, selects an existing pin board or creates a new pin board and then presses submit. The image, page title and user description are added to the user's Pinterest pin board. Currently Pinterest does not support product information extraction via templates, nor does it allow the user to perform a remote product information search via its widget. Pinterest does identify URLs that belong to stores, looks up the price in a database that is created from a retailer data feed (not extracted) and displays the price in the page.
- TheFind is a conventional shopping engine. A user searches for a product by brand, store or category, or can use a limited set of specifications to narrow down the search results. The search results are presented to the user. The results often contain duplicate products from the same and from different stores. The results do not group the stores that carry the same product.
- Currently neither Google nor Pinterest extracts product information from a single product page using templates. Normally, TheFind and other shopping engines do not de-duplicate or group the same product together and display a canonical record for the product. Moreover, the shopping engines do not present all of the data from all of the stores on the Internet. Shopping engines use inverted indexes generated by Apache SOLR or internal tools that index the fields in the product records. Based on the search results presented to the user, limited attempts are made to group the same product from different stores.
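To make the field indexing mentioned above concrete, the following is a minimal sketch of an inverted index over product record fields. It is only an illustration of the general technique, not the indexing used by Apache SOLR or by any particular shopping engine. The function names, the whitespace tokenization and the sample records are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(records):
    """Map each (field, token) pair to the set of record ids containing it."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for field, value in fields.items():
            # Assumption: lowercase whitespace tokenization is sufficient here.
            for token in str(value).lower().split():
                index[(field, token)].add(record_id)
    return index


def search(index, field, query):
    """Return ids of records whose field contains every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    postings = [index.get((field, token), set()) for token in tokens]
    return set.intersection(*postings)


# Hypothetical product records, invented for illustration only.
records = {
    1: {"brand": "Acme", "name": "Trail Running Shoe"},
    2: {"brand": "Acme", "name": "Road Running Shoe"},
    3: {"brand": "Zenith", "name": "Trail Backpack"},
}
index = build_inverted_index(records)
```

In this sketch, records 1 and 2 both match a search for "running shoe" in the name field but remain separate results, which mirrors the duplicate-result problem described above: a field index alone does not group identical products.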
- In accordance with the present invention, there is provided a method and system for implementing a shopping engine that users take with them when they browse the Internet, providing a centralized search service that is connected to content on remote sites and provides a remote lookup system for the Internet's products. The invention facilitates web search, image processing, on-line shopping and social networking. Specifically, the invention's unique methods of web search and image processing, when employed, aid users that view, compare and buy products on-line, or share their product findings and preferences via social networks.
- The invention facilitates users' searches for products on retailer, manufacturer, shopping engine, social network, blog, and other types of websites. When users find a product they are interested in, the invention provides the information users want to know, such as product costs, where they can buy it, model numbers, product names, product specifications, the name of the manufacturer, and various other details.
- Another aspect of the invention is that the lookup system provides the users viewing product information on product information sites with the following information: other stores that sell the same product; the best store to buy from for non-price reasons (i.e., support, store, warranty, returns, and customer service); the historical pricing for the product, similar products and aggregated information about social brand messages.
- The product recognition process consists of the following eleven steps. First, execute a web browser program on a computer device with a screen, a microprocessor, volatile memory and persistent storage such as a hard disk drive or flash memory. Second, log into a first remote site incorporating our invention which contains our web browser device. Install the web browser device. Third, navigate the Internet via web browser to find a product page by searching, browsing or directly typing in a known URL.
- In the fourth step, the advanced search method looks up the site URL and, if the site template exists, sends it from the server to the client browser. The template is created by the user in the current or a previous session, on the same or a different page at the same site, by selecting each data field value (DFV) in the HTML rendered web page in the web browser, associating the DFV with a data field name (DFN), and extracting the XPATH to the DFV. The web browser device then uses the XPATHs in the template to extract the DFVs from the current page and associate them with their respective DFNs. The product record DFNs include but are not limited to the manufacturer name (MN), model number (M#), retailer and manufacturer logos, product name (PN), product image, ratings, breadcrumb (product category), price, sales price, the rich attributes (specifications, colors, and features), and product identification codes such as Universal Product Codes (UPCs) and ISBNs. If a template does not exist, then the user is prompted to identify the parts of the page which are associated with each of the DFNs. The product information is extracted from different places in the HTML code of the product page using the XPATHs associated with each DFN/DFV. The places include the “alt” attribute of the “img” tag, the URL and the title. The product information is also extracted from the paragraphs, headings, breadcrumb and menu links and tables containing product name, description, category, specifications, retailer name, etc. The second method includes clicking on the web browser device icon in the browser address bar at any remote site. The user then selects an image to send a message to the first remote site's image lookup server. The message contains the name of the remote site and the image URL.
- Fifth, the first remote site's image lookup server downloads the image from the image URL. The lookup server automatically performs the image signature computation, which converts the image to a vector of numbers and creates the image signature. Sixth, the software converts the image signature into a list of product IDs. The image signature is looked up in the image signature database via the product index, which finds a list of product records for the same or similar products with matching image signatures (a range check is performed to allow for image artifacts such as noise). Seventh, the list of product records is sent back to the user, who is waiting at the remote site. The displayed list of product records shows all stores where the user can buy the same or similar products. Eighth, the image signature and product information lookup results are combined to allow further refining of the combined search results by checking for the same or similar products in the combined list, using such checks as a range check on the price and similar categories. In the event that two similar products have the same signature, the product information is used to verify that the combined results contain the same product. If the user requested that similar products be returned, then a combined result including similar products is returned to the user. Ninth, the resulting product list is sent from the first remote site via a JSON file over the World Wide Web and displayed in the user's browser, which is executing on the client computing device. Tenth, the user selects retailer sites to visit by clicking on the links in the returned search results. And eleventh, the user can optionally add the product(s) in the search results in the web browser executing on the client computing device to the user's collection on the first remote site.
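The lookup and refinement steps above (sixth through ninth) can be sketched as follows. This is a hedged illustration under stated assumptions, not the patented algorithm: the per-component tolerance used for the range check, the 25% price window, the use of exact category equality as a stand-in for "similar categories", and all names and sample data are hypothetical.

```python
import json


def signatures_match(sig_a, sig_b, tolerance):
    """Range check: every vector component may differ by at most `tolerance`."""
    return len(sig_a) == len(sig_b) and all(
        abs(a - b) <= tolerance for a, b in zip(sig_a, sig_b)
    )


def lookup_by_signature(query_sig, signature_db, tolerance=2):
    """Return ids of products whose stored signature is within tolerance."""
    return [
        product_id
        for product_id, sig in signature_db.items()
        if signatures_match(query_sig, sig, tolerance)
    ]


def refine(candidates, products, page_info, price_ratio=0.25):
    """Keep candidates in the same category whose price is within range."""
    low = page_info["price"] * (1 - price_ratio)
    high = page_info["price"] * (1 + price_ratio)
    return [
        product_id
        for product_id in candidates
        if products[product_id]["category"] == page_info["category"]
        and low <= products[product_id]["price"] <= high
    ]


def to_json(product_ids, products):
    """Serialize the product list that is sent back to the browser."""
    return json.dumps([{"id": pid, **products[pid]} for pid in product_ids])


# Hypothetical signature database and product records (illustration only).
signature_db = {
    "p1": [10, 200, 35, 90],
    "p2": [11, 199, 36, 91],
    "p3": [80, 20, 140, 5],
    "p4": [10, 201, 34, 90],
}
products = {
    "p1": {"store": "Store A", "category": "shoes", "price": 59.99,
           "url": "https://store-a.example.com/item/1"},
    "p2": {"store": "Store B", "category": "shoes", "price": 64.50,
           "url": "https://store-b.example.com/item/7"},
    "p3": {"store": "Store C", "category": "bags", "price": 30.00,
           "url": "https://store-c.example.com/item/3"},
    "p4": {"store": "Store D", "category": "toys", "price": 9.99,
           "url": "https://store-d.example.com/item/9"},
}

query_signature = [10, 200, 36, 90]
page_info = {"category": "shoes", "price": 60.00}

candidates = lookup_by_signature(query_signature, signature_db)
refined = refine(candidates, products, page_info)
payload = to_json(refined, products)
```

Here product p4 has a signature within tolerance of the query but belongs to a different category and price range, so the product information check removes it. This corresponds to the case described above in which two products share a signature and the product information is used to verify the match.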
- A complete understanding of the present invention may be obtained by reference to the accompanying drawings, when considered in conjunction with the subsequent, detailed description, in which:
- FIG. 1 is a block diagram of various functional components of a system.
- FIG. 2 is a block diagram of various functional components of a system.
- FIG. 3 is a block diagram of various functional components of a system.
- FIG. 4 is a flow chart of the image signature computation.
- FIG. 5A is a block diagram of a step of the example image signature computation.
- FIG. 5B is a block diagram of a step of the example image signature computation.
- FIG. 5C is a block diagram of a step of the example image signature computation.
- FIG. 5D is a block diagram of a step of the example image signature computation.
- FIG. 5E is a block diagram of a step of the example image signature computation.
- FIG. 5F is a block diagram of a step of the example image signature computation.
- FIG. 5G is a block diagram of a step of the example image signature computation.
- FIG. 5H is a block diagram of a step of the example image signature computation.
- FIG. 5I is a block diagram of a step of the example image signature computation.
- FIG. 5J is a block diagram of a step of the example image signature computation.
- FIG. 6A is a flow chart illustrating use of the web browser device.
- FIG. 6B is a flow chart illustrating use of the web browser device.
- FIG. 6C is a flow chart illustrating use of the web browser device.
- FIG. 6D is a flow chart illustrating use of the web browser device.
- FIG. 6E is a flow chart illustrating use of the web browser device.
- FIG. 6F is a flow chart illustrating use of the web browser device.
- FIG. 6G is a flow chart illustrating use of the web browser device.
- FIG. 6H is a flow chart illustrating use of the web browser device.
- FIG. 6I is a flow chart illustrating use of the web browser device.
- FIG. 7 is a flow chart of product recognition based on an image.
- FIG. 8 is a block diagram of an example computing system.
- Before the invention is described in further detail, it is to be understood that the invention is not limited to the particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.
- Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
- Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, a limited number of the exemplary methods and materials are described herein.
- It must be noted that as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise.
- All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, if dates of publication are provided, they may be different from the actual publication dates and may need to be confirmed independently.
- Embodiments of the present invention include methods and apparatus for product searches on a personal computer, mobile or other device that provides means for a user to gather the information the user needs with minimal effort and in a straightforward way. One advantage is that users are able to bring the shopping engine with them when they browse the internet because the centralized service provides the databases for the Internet's products.
- In some embodiments, methods invoke a browser extension in the form of a widget placed on the browser tool bar. A user can navigate on the Internet to the web browser device installation website using a browser. The web browser device installation website contains a web browser device. The web browser device is installed in the web browser tool bar. The user can then navigate to any remote site where desired information is to be searched for. The web browser device has two buttons: Remote Search (for images) and Advanced Search (for data records). When Advanced Search is clicked, the remote URL is sent to the template server. The template server looks up the root URL. If a template is found, then it is returned to the web browser along with a JavaScript extractor, which then extracts the data (product record) from the (template) page and sends it to the cleaner server, which performs additional extraction and cleaning. The extracted product record is then looked up in a product database, and all stores which carry the product, along with additional information, are returned to the browser, which displays the information in a popup or another browser tab. Additional information can also be sent to the popup, such as similar products or, in the case where the image signature results in a number of product matches from the same or different categories, the list of categories or canonical product records, from which the user then chooses to get a list of products.
- If a template is not returned by the template server, then the user is prompted to complete the search form in the web browser device by right-clicking on the data field value elements in the page, as per the steps described above. The search process described above is then performed and the products are looked up. When the Remote Search button is clicked, the web browser device analyses the page, creates a template describing the page, and sends the information to the template server.
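The Advanced Search flow described above (look up a template by root URL, then apply its XPATHs to the page to build a product record) can be sketched as follows. This is a minimal sketch under several assumptions. The source describes a JavaScript extractor running against the browser DOM, whereas this example uses Python's standard library and a well-formed markup snippet. Treating the root URL as the host name, the in-memory template store, the data field names and the sample page are all hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical template store keyed by root URL. Each template is a list of
# (data field name, XPATH) tuples; the semantic type is omitted for brevity.
TEMPLATES = {
    "shop.example.com": [
        ("product_name", ".//h1[@id='title']"),
        ("price", ".//span[@class='price']"),
        ("model_number", ".//td[@class='model']"),
    ],
}


def root_url(page_url):
    """Assumption: the root URL is the lowercased host name of the page URL."""
    return urlsplit(page_url).netloc.lower()


def find_template(page_url):
    """Return the site template for the page, or None if none is stored."""
    return TEMPLATES.get(root_url(page_url))


def extract_record(page_markup, template):
    """Apply each XPATH in the template and collect the data field values."""
    root = ET.fromstring(page_markup)
    record = {}
    for field_name, xpath in template:
        node = root.find(xpath)
        if node is not None and node.text:
            record[field_name] = node.text.strip()
    return record


# Hypothetical, well-formed product page used only for illustration.
PAGE = (
    "<html><body>"
    "<h1 id='title'>Trail Running Shoe</h1>"
    "<table><tr><td class='model'>TR-100</td></tr></table>"
    "<span class='price'> $59.99 </span>"
    "</body></html>"
)

template = find_template("https://shop.example.com/items/42?ref=home")
record = extract_record(PAGE, template) if template else {}
```

When `find_template` returns None, the flow falls back to the manual path described above, in which the user selects the data field values in the page. Note that `xml.etree.ElementTree` supports only a subset of XPath and requires well-formed markup, so a real page would need a proper HTML parser.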
-
FIG. 1 shows the abstract of the system. The product information can be extracted from a remoteproduct information site 101 by the automaticproduct information extraction 102 and the user generated template semi-automaticproduct information extraction 103. The extracted product information which is normalized, grouped, de-duped and classified is stored in theproduct data store 109. The product images which are first processed in theimage processing service 111 are stored in theimage data store 110. The user can then perform a lookup using theweb browser device 104. Thelookup 108 queries the product data store and the image data store and through theweb services 105returns results 114 displayed in theweb browser 100. Advertisements stored in theads data base 107 can also be looked up 106. The advertisements that were returned from the lookup are displayed 113 in the browser. Thesocial network 112 is communicating with theweb services 105 and contains records pointing to the remoteproduct information sites 101. -
FIG. 2 shows the system operation when the user presses the web browser device (bookmarklet or button or extension) 202 to extract visible or hidden data on the remote site web page. The user will register at the shopping engine or socially curated shopping site orsearch engine 201. The user installs the web browser device for product lookup. The user can then go to a remotethird party site 208 generated by aremote web service 207 which contains products stored in a structured data format generated from a remote product or other structureddata website database 205 and a remoteweb site template 206. Theremote web page 209 contains theproduct record 210 and an image URL and/orimage bytes 226. - Then the user clicks on the
web browser device 202 containing JavaScript code, which can be an embedded button, widget, extension or toolbar button, in thebrowser 200. When the widget, extension or button is pressed, theproduct record 210 and theimage URL 226 embedded in theweb page 209 are extracted. Theimage 226 is downloaded 230 and processed as described later in the patent and sent to theweb service controller 214. When the user presses theweb browser device 202, the JavaScript is executed by thebrowser 200. The web browserdevice JavaScript code 202 creates anHTML script tag 213 in the page which points to aserver side script 236 that will be created on theweb service server 203. TheHTML script tag 213 passes aURL 204 from theaddress bar 235 to theserver side script 236 as an argument. Theweb service server 203 will extract the root URL from the sentURL 204 and look up the retrieved extraction template(s) 216. Theserver side script 236 is created on theweb service server 203 which contains the merged site extraction template(s) 216 for the root URL associated with theURL 204, web browser device paneluser interface code 218, and theJavaScript extractor 217. The modifiedHTML page 209 that contains the injectedHTML script tag 237 is converted to theDOM representation 212 by thebrowser 200. The browser then executes theserver side script 236 creates the following elements in 213 in the browser: the webbrowser device panel 218 which appears in the product page tab, theJavaScript extractor 217, and the merged site extraction template(s) 216. If a template was retrieved then the XPATH in each tuple is looked up in the DOM and theproduct record 219 is extracted and inserted into the web browserdevice panel UI 218 data fields. The extracted data field values will be highlighted in the web page and tagged with the corresponding data field name. - If no template was returned by the
web service server 203 or thepage 209 has changed or there is missing information in the page then the user selects that product information in the web page. The information that the user selects in the web page is checked for semantic errors, string too long errors, other types of checks and the data is cleaned by the product record checker/cleaner 225. After the user selects the product information in the web page and populates the panel the user presses the panel submitbutton 221 the web browser device sends thesubmission container 220 with the submittedproduct record 224, thenew extraction template 222 which contains the list of tuples (data field name, data field value, XPATH, semantic type)url 226 in a post key/value form to theweb service controller 214. If the user selected a price alert option for the product in the web browser device panel, then the set price alert message is sent to the price alert and history server which then stores the price alert in the price history database. - The user can press the
find button 223 to search for products in the product database 241 and in the image database 227. The selected product record 224 is processed in the image/data processing pipeline 240. The index 242 is generated from the image and the product database. The lookup 243 will generate the search results 238 sent to the web service controller 214. The browser 200 will display the search results 238 containing the list of stores with prices 211 and the product list 238 with the product that can be selected 232. When the user selects a product 232 from the search results 238, the selected product is looked up 231. - The web service will send a new template record which contains the URL of the
page 204, the new extraction template 222, if the user created one, from the web browser device and the submitted product record 224 from the web page to the product record cleaner. The cleaner will clean the product record and send a cleaned product record. - The web service performs the following operations: (1) the server generates a unique identifier. The
product page URL 204 is hashed to a 256-bit UUID by the web service 214; (2) the web service sends the unique identifier and the user collection identifier to the user database 228; and (4) the server sends the unique identifier and the extraction template in JSON form 222 to the extraction template database 215. Templates from the template database are checked by the template checker 233. Template widget stat server 234 communicates with the template database 215. The XPATH and the semantic type are used to extract data field values from pages on the site and associate them with data field names. Pages on the site are constructed from the same remote template 206. The new extraction template 222 contains the list of tuples (a tuple consists of the following: data field name, data field value, XPATH, semantic type). - If the user submitted a product using a web browser device, the user and others can see the selected data record that was inserted into the collection specified by a collection id on their profile page on the socially curated website. Periodically a job is run to generate a
new index 242 from the product database 241 to make it easier to search for the products in user collections. - Search engines index words and phrases. Attempts to extract structured data in web pages have been made by search engines using special markup in the web pages such as RDF, GoodRelations, microformats and rich snippets. The web designer inserts the industry standard structured data formats into the web page to create data records in the web pages. The search engine crawls the site and examines the web pages for the presence of industry standard structured data formats. The industry standard structured data formats identify the data field values using a set of data field names. A method for extraction of structured data from a page containing a visible and invisible data record at a site using an identifiable invisible data and layout format is shown when a web browser device button is pressed on the web page. The data record is located in a set of HTML tag(s) with corresponding data field names. An aspect of the present invention provides that a 3rd party predefined set of data field names is used to enclose the data field values on the page. 3rd party data field names are placed in attributes next to the data field values in the HTML tags.
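As a rough illustration of the attribute-based markup described above, the following sketch collects data field name/value pairs from HTML tags that carry a data field name in an attribute. The choice of the `itemprop` attribute (from HTML microdata), the `content` attribute for invisible values, and the names `FieldExtractor` and `extract_record` are assumptions made for this example; the text does not name a specific attribute or API.

```python
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collects data field name -> data field value pairs from tags
    that carry an (assumed) itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # field name whose visible text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        if "content" in attrs:
            # Invisible value: stored in an attribute, not in the page text.
            self.record[name] = attrs["content"]
        else:
            # Visible value: the text inside the tag.
            self._current = name
            self.record.setdefault(name, "")

    def handle_data(self, data):
        if self._current is not None:
            self.record[self._current] += data

    def handle_endtag(self, tag):
        if self._current is not None:
            self.record[self._current] = self.record[self._current].strip()
            self._current = None


def extract_record(html):
    """Return the data record found in the given HTML string."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return parser.record
```

This simple version stops reading a visible value at the first closing tag, so it suits flat markup; nested tags inside a value would need a depth counter.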
- Turning now to
FIG. 3, the product record information in the online store database 324 at the affiliate marketing FTP website 325 is accessed by the FTP downloader 326, which fetches the product record data feed 327. The downloaded product records are then sent to the data processing pipeline. A product information web site 302 is connected to the remote web service 329 that reads remote template(s) 328 containing the data field name variables, and the remote online store database 324, to generate the online store site 302. The page downloader or crawler 306 reads a list of sites or pages from the online store URL list 305 and downloads the product pages 307. - The downloaded pages are then used in conjunction with the selected corresponding
site template 336 from the template database 303 by the automatic extractor 308, which extracts the product records from all pages matching the site template. A site may have more than one site template. The product pages are processed by the automatic extractor, which sends the root URL of each page that it is processing to the extraction template database 303 and retrieves the web browser device extraction template. The web browser device extraction template is converted to an automatic extraction template. The automatic extractor extracts the structured data record from each product information page using the automatic extraction template and creates a product record 309. - The affiliate downloaded
product records 327 and automatically extracted product records 309 are each read by the cleaner 310. The cleaner analyses each downloaded product record and produces a cleaned product record 311. The cleaner moves data field values and partial data field values from one data field to another, removes extraneous text, verifies the correctness of the data field values, calculates statistics on the number of good/bad data field values using semantic checking, and stores the statistics in the product record. Cleaned product records are then classified by the product classifier 312. The product classifier matches data records to one or more product classification tuples from the product classification tuple list using words from the data record which are product classification base or synonym words. The classified data records 313 are normalized and grouped by the normalizer 314. The normalizer will de-duplicate the product record stream, group together records which are the same record found at different sources (e.g. stores, shopping engines, socially curated sites, blogs, and manufacturer sites), and refine the classification of a group of the same product records from different sources using methods such as voting. Further normalization steps can also be performed. The automatic extraction, cleaner, product classifier, normalizer and grouper stages communicate with the dictionary database 304. The dictionary looks up token(s) and returns semantic type information. Synonyms are converted to base words. The dictionary information is used by each pipe stage to process the data record. The resulting cleaned, classified and normalized product records 315 are saved 316 in the affiliate product database 319 or in the extracted product database 318, depending on the source of the product record. - The user runs the
web browser device 345 in a web browser 300 and creates a new extraction template 333 and a product record 331 from a product information web page 334, which is inserted into the extraction template database 303. The web browser device new extraction template is converted to an automatic structured data extraction process template, which is used to do the structured data extraction 308 of all pages matching the page layout at the site that the web browser device extraction template was created from. All pages are downloaded from the site. Each web page from the same site is tested to see if it matches the structured data extraction template(s). If there is a match, the data record is extracted from the matching pages. The extracted record is cleaned, classified, normalized, and stored in a database or index. - The image URL from the cleaned
product record 311 is used to download the image 337. The downloaded image 338 is then processed in the image signature computation flow 339 and the image record 340 is generated. The image record contains the product id 341, the image URL 342 and the computed image signature 343. The image records are stored in the image database 344. The web browser device extracted product records database 317, the extracted product database 318, the affiliate product database 319 and the image database 344 are merged by the database merger 320, and a merged and normalized product database 321 is created. The merged product database is then indexed by the indexer 322 and an index 323 is created. - The
user 348 can optionally search for a product using the web browser device 345. The web browser device panel 330 sends the product image (image URL and/or the image bytes) 342 and/or the product record 347 from the current web page 334 to the web service 332. The web service queries the index. The product search index 323 is looked up 301 and the search results are returned 349. The product search result is displayed in the browser 300. The user can then select a specific product by clicking the URL, navigating to the remote URL and then viewing the remote product information. The advantage of this aspect of the invention is that the user can search for product information on remote product information web sites without leaving the product information web page, i.e. the user does not have to cut information from the product page and paste it into the search box at Google and/or a shopping search engine. - The user or a previous user identifies the data field values (DFV) on the web page and associates each DFV with a data field name (DFN); these associations are converted into an extraction template. If the template server contains a template, the template is downloaded to the browser. The template contains downloaded JavaScript used to extract the data record from the HTML page and send the information to the template server. If the template server does not contain the extraction template for the web page, then the user will be prompted to specify the data field values (DFVs). The DFVs are then used in the product record search on the server. In either case, after the information in the page is extracted to the web browser device panel, the user presses search and the product server looks up the product record information and returns the list of stores that carry the item and their prices. Additional information can be returned as well, such as specifications and other rich attributes and similar products.
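The template-driven path described above can be sketched as follows, assuming a well-formed (XHTML-like) page and a template given as a list of (data field name, XPATH, semantic type) tuples. The function name `extract_record`, the three-element tuple shape and the sample semantic type labels are hypothetical; the sketch uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree`, not a full XPATH engine. Fields the template cannot locate are returned separately, which corresponds to the case where the user is prompted to supply the missing data field values.

```python
import xml.etree.ElementTree as ET


def extract_record(page_xml, template):
    """Apply an extraction template to a well-formed page.

    template: list of (data_field_name, xpath, semantic_type) tuples.
    Returns (record, missing), where record maps data field names to the
    extracted values and missing lists the names that were not found.
    """
    root = ET.fromstring(page_xml)
    record = {}
    missing = []
    for field_name, xpath, _semantic_type in template:
        node = root.find(xpath)
        text = (node.text or "").strip() if node is not None else ""
        if text:
            record[field_name] = text
        else:
            # No value at this location: the user would be asked to select it.
            missing.append(field_name)
    return record, missing
```

For example, a template with entries for a name, a price and a brand applied to a page that has no brand element yields a two-field record and `["brand"]` as the list of fields still to be selected by hand.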
-
FIG. 4 describes the image signature computation flow 402. An image 401 is transmitted to the image processing service and prepared for processing. Image preprocessing 403 scales the image to a predefined size (the scaled image) and creates a gray-scale copy of the scaled image. Certain parts of the algorithm use the gray-scale model. The background type (solid color, gradient or transparent) is detected in step 404. The filter selection 405 for the object boundary detection is determined by the background type. If the background is transparent and the pixel is transparent, then the pixel is part of the background. Otherwise, it is part of the object. In the case of a solid background, the edge between the background and the object is detected. If the background is a gradient, the boundary between the gradient and the object is detected. Various industry standard edge detection algorithms can be used to detect the boundary (minimum bounding box). Then, the binary search lookup 406 is performed along the rays to define the intersection between the background and the edge of the object. Using the bounding box, the image in original color space and the gray-scale image are then cropped to the bounding box edges and prepared for the further processing 407. - The external
image signature creation 408 projects lines from each corner of the gray-scale cropped image, angled at 45°, to the minimum bounding box which intersects the line. Rays bisecting the image edges are projected and the intersection with the minimum bounding box is detected using the same method as the 45° intersection. Traversal lines are also projected from other characteristic points on the edges, perpendicular to the edge they are on. Then, the first intersection with the object on each traversal line, starting from the line origin, is found. Next, the lengths from the line origin at the edge of the image to the object intersection on each line are found. The x, y coordinates of the intersection point are equal for 45° lines. A single value (x or y) is used in the image signature for each of the 45° and 90° lines in the implementation. The number of lines can be increased for accuracy. - The first phase in the internal
image signature computation 409 is taking eight traversal lines starting from the center of the image (cropped, in original color space) in eight directions. The first line is perpendicular to and directed at the top edge, and each subsequent line is angled at 45° to the previous one in the clockwise direction. Then the first color change with a large difference in intensity (exceeding a certain threshold) is found on each traversal line, along the line direction. The next step is to calculate the length from the line start to the color change point on each line. -
Color histogram 410: in the cropped image in original color space, several pixel samples in characteristic positions relative to the image are taken. Then, color value intervals of equal length are made for each sample and the occurrences of values from each interval are counted. The following is an example of the color histogram. For each pixel the RGB pixel values are converted to the LUV color space. -
R   G   B   L   U   V
FF  10  20  6A  D7  AD
Then the occurrences of each (L, U, V) value in each set of 3 lines are counted, -
L   U   V   OCC
0   0   0   152
C8  5B  8A  295
8A  5B  8A  198
60  48  2A  90
3C  5A  70  65
6A  D7  AD  170
and finally the first three colors by occurrence are selected, in order of occurrence. -
L   U   V   OCC
C8  5B  8A  295
8A  5B  8A  198
6A  D7  AD  170
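The counting and selection steps shown in the tables above can be sketched as follows. This is a minimal illustration that assumes the pixel samples have already been converted to integer (L, U, V) triples; the interval length of 16 and the names `quantize` and `top_colors` are assumptions made for the example, not values taken from the text.

```python
from collections import Counter


def quantize(luv, interval=16):
    """Map each channel to the lower bound of its equal-length interval."""
    return tuple((channel // interval) * interval for channel in luv)


def top_colors(samples, interval=16, k=3):
    """Count the occurrences of each quantized (L, U, V) triple and return
    the k most frequent ones with their counts, most frequent first."""
    counts = Counter(quantize(sample, interval) for sample in samples)
    return counts.most_common(k)
```

Running `top_colors` once per sampling direction and collecting the results gives one row of top colors per direction, which is the shape of the table the text describes next.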
Next, a table is made containing the top three colors by occurrence for all four sampling directions. That makes the internal color signature. - The
external image signature 408, internal image signature 409 and color histogram 410, along with the bounding box dimensions, are passed to the image signature generator 411, which produces the image signature 412. The image signature can also be computed using traditional feature detection algorithms, such as BRISK. Those skilled in the art of image feature detection are familiar with the BRISK algorithm and its computational efficiency. BRISK was created to match images with a high level of detail and has a configurable (but large by default) number of keypoints that are used in comparison. Hence, in the use case of product images with a lower level of detail, fewer keypoints are needed for comparison and the computation time is almost proportionally lower. Another performance enhancement may be made by using only the image part within the cropped and scaled image 407. Then, the scale-space calculation phase in the BRISK algorithm can be omitted, as the scale dimension is invariant. - Consequently, the product detection in images, besides being performed by the proposed image signature algorithm, can also be done by some other familiar algorithms in the field, in conjunction with or as a replacement for the image signature algorithm, whilst satisfying conditions for more efficient utilization than for a regular use case for the algorithms as shown in
FIGS. 5A-5J. Shown are the original image 501 and the scaled, gray-scale image 502. After the background type is detected 503, the bounding box is found in the analyzed image 504. The analyzed image is cropped to the bounding box edges 505. 506 shows the scaled bounding box. Two image signatures are computed: the external image signature 507 and the internal image signature 508. 509 shows the ray color sample and 510 shows the color sample. - Turning now to
FIGS. 6A-6I, the user navigates to a product information site web page containing the product image, product information and additional images 602, in browser 601. The previously installed web browser device will be displayed in the browser address bar as an icon 603. When a user presses the icon, the panel with images found on the product information web page will be displayed 604. The user can then, in browser 605, select an image 606 to look up. The selected image will be highlighted 607. Pressing the “Done” button will send the product information and product image URL (optionally the image bytes will be sent as well) to the web service. The web browser device icon will be updated 608 to show the current status of the lookup 609. When the number of found results appears 610, the user can click 611 on an icon to see the lookup results. The lookup results list 613 in browser 612 can contain the same and/or similar products found on different store pages. - In the case where the product database is normalized and the same products from different retailers are grouped together, the user can be presented with a list of single products which might match the product and/or the image on the page that is being searched for. The user then selects one of the normalized products and is then presented with a list of the stores that carry that single product. If the user is interested in similar products, the user can also indicate that they want to see similar products. This search is facilitated by preprocessing the products and grouping similar products by image characteristics and product classification, and the same products by product record and image signature. Brands make products in certain categories, so it is possible to group different manufacturers' products by category.
- This direct image search and product information lookup from remote shopping engine, retailer, manufacturer and other shopping related pages provides an efficient method for shoppers to find out competing prices, additional product information, and other locations where the product can be purchased. The user has an
option 614 to select a lookup result from the result list 615, and the store page containing the selected product will be shown in a popup window 617 on the browser 616. -
FIG. 7 represents the product identification by an image. Product record and image URL and/or image bytes 702 are sent from the remote product information web site 701 to the widget extraction flow 703. The image is then processed in the image processing service 704, which produces the image record 705 containing the computed image signature 706. Product and image records are stored in the product data store and image data store 707. Other social bookmarking widgets 708 can be used on the same remote sites to extract the image from the remote product information website 709 and save it on the social bookmarking website 710. Users that come to the social bookmarking website 710 can use the web browser device to perform a remote lookup 711 on the selected image. Image URL and/or image bytes 712 are used to run the lookup 713, which will query the image and the product record data store 707. The data store 707 will return the product information and/or images 714. The search results 715 can be displayed on the remote page 716, or the advertising platform 717 or the services for external social bookmarking 718 can be built. - The product recognition consists of the following steps. First, logging into a first remote site which contains a web browser device and installing the web browser device. Second, executing a web browser program on a computer device with a screen, a microprocessor, volatile memory and persistent storage such as a hard disk drive or flash memory. Navigating the Internet via web browser to find a second remote site to find a URL containing a single product page by searching, browsing or directly typing in a known URL. 
The URL is sent to a remote computer server, which contains a microprocessor, volatile memory and persistent storage such as a hard disk drive or flash memory, over a network connection to the Internet to retrieve the single product page (the web page contents—the HTML) and send it over a network connection to the Internet and rendering the HTML for the second remote site in the browser. The user can optionally indicate that similar products be included in the search results. Third, pressing the web browser device button. Pressing the “find” button in the web browser device panel. Selecting a product image from the multi-image view to lookup. Fourth, sending the product image signature and optional product information from the client computer to the first remote site's server. Fifth, the first remote site's server performs the image signature computation which produces an image signature. Sixth, sending the image signature to the image signature lookup which finds a list of product records for the same or similar products with matching (a range check is performed to allow for image artifacts such as noise) image signatures. Seventh, performing the product information lookup which finds a list of product records matching the client side product information. Eighth, optionally combining the image signature and product lookup results. Further refining the combined search results by checking for the same or similar products in the combined list, using such checks as a range check on the price, similar categories. In the event that two similar products have the same signature the product information is used to verify that the combined results contain the same product. If the user requested that similar products be returned then a combined result including similar products is returned to the user. 
Ninth, sending the resulting product list from the first remote site via a JSON file over the world wide web and displaying it in the user's browser which is executing on their client computing device. Tenth, optionally adding the product(s) in the search results in the web browser executing on the client computing device to the user's collection on the first remote site. And eleventh, the user selects retailer sites to visit by clicking on the links in the returned search results.
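The range check mentioned in the sixth step can be illustrated with a short sketch. It assumes an image signature is represented as a fixed-length list of numbers (for example the line lengths computed earlier) and that the image database maps product ids to stored signatures; the tolerance value and the names `signatures_match` and `lookup` are hypothetical and not taken from the text.

```python
def signatures_match(a, b, tolerance=2):
    """Return True when two equal-length signatures differ by at most
    `tolerance` in every component (the range check for image noise)."""
    return len(a) == len(b) and all(abs(x - y) <= tolerance for x, y in zip(a, b))


def lookup(query, image_db, tolerance=2):
    """Return the product ids whose stored signature passes the range
    check against the query signature, in database order."""
    return [
        product_id
        for product_id, signature in image_db.items()
        if signatures_match(query, signature, tolerance)
    ]
```

A linear scan like this is only meant to show the comparison itself; a production lookup would presumably use an index over the signature components.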
- The MN, M#, PN, UPC product information is extracted from different places in the HTML code of the product page. The places include “alt” attribute of the “img” tag, URL and title. The product information is also being extracted from the paragraphs, headings, breadcrumb and menu links and tables containing product name, description, category, specifications, retailer name, etc.
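As one possible illustration of pulling an identifier out of these places, the sketch below scans the page URL, the `title` element and the `alt` attributes of `img` tags for 12-digit UPC-A codes and keeps only those with a valid check digit. Only the UPC case is shown because its format is standardized; model and part number formats vary by manufacturer. The names `TextSources`, `valid_upc` and `find_upcs` are assumptions made for this example.

```python
import re
from html.parser import HTMLParser

UPC_RE = re.compile(r"\b\d{12}\b")  # a UPC-A code is 12 digits


def valid_upc(code):
    """Check the UPC-A check digit: 3 x (sum of digits in odd positions)
    plus the sum of digits in even positions must be a multiple of 10."""
    digits = [int(c) for c in code]
    total = 3 * sum(digits[0:11:2]) + sum(digits[1:11:2])
    return (10 - total % 10) % 10 == digits[11]


class TextSources(HTMLParser):
    """Collects the page title text and the alt text of img tags."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.texts.append(alt)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.texts.append(data)


def find_upcs(html, url=""):
    """Return the valid UPC-A codes found in the URL, title and alt text."""
    parser = TextSources()
    parser.feed(html)
    parser.close()
    found = []
    for text in [url] + parser.texts:
        for code in UPC_RE.findall(text):
            if valid_upc(code) and code not in found:
                found.append(code)
    return found
```

The check digit test filters out 12-digit numbers that merely look like a UPC, such as order numbers embedded in a URL.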
- The information extracted from product information site web pages is used to create clusters of different images of the same product. The textual information is used to find potentially similar product records. The images in the similar product records are then analyzed by the image processing service to join existing clusters and/or add products to clusters and/or create new clusters. Comparison of image signatures can thus be used in conjunction with limited, semi, and/or complete product record information to identify products in product information sites (i.e., manufacturer, retailer sites, blogs and social catalog).
- Matching images on a product information site to a product record facilitates the serving of ads on the social catalog site, brand analytics on the social catalog site, and the conversion of links on the social catalog site to affiliate marketing links for commission-based programs. When the user clicks on the link to the page at the original site that contains the image, a cookie is set on the user's computer, and if the user buys something at the site, the store pays a commission to the referring site. Additional advantages include adding meta-information about the product to the visible text on the page to give the viewer additional information about the product. Another advantage of the system is setting keywords in meta-tags and descriptions for search engines to index. Other SEO and SEM advantages of adding keywords to pages are not described here but are well understood in the Internet community.
- Furthermore, the merging of structured data and social networking information greatly increases the accuracy of search results where qualitative results are desired. The probability of finding useful information in response to search keywords is significantly greater. Moreover, because the database contains more complete information, such as numeric attribute information which describe the database elements (e.g., the size of an object) and qualitative information (e.g., an expert's opinion of the durability of an object), searches can be conducted using general descriptions of the objects (e.g., search for a digital SLR which is within a certain dimension range and longevity) or searches can be conducted using the category, brand, store, and social rating of the former. Conventional search engines, by contrast, return results that require the user to manually validate, sort, and filter the search results. In the case of conventional search engines that return links based on popularity, the user must search through the list of links to find relevant web pages and manually search social networking services to find corresponding qualitative data.
- With reference now to
FIG. 8, portions of the technology are composed of computer-readable and computer-executable instructions that reside, for example, in or on computer-usable media of a computer system. That is, FIG. 8 illustrates one example of a type of computer that can be used to implement one embodiment of the present technology. - Although
computer system 800 of FIG. 8 is an example of one embodiment, the present technology is well suited for operation on or with a number of different computer systems including general purpose networked computer systems, embedded computer systems, routers, switches, server devices, user devices, various intermediate devices/artifacts, standalone computer systems, mobile phones, personal data assistants, and the like. - In one embodiment,
computer system 800 of FIG. 8 includes peripheral computer readable media 801 such as, for example, a floppy disk, a compact disc, and the like coupled thereto. -
Computer system 800 of FIG. 8 also includes an address/data bus 810 for communicating information, and a processor 8091 coupled to bus 810 for processing information and instructions. In one embodiment, computer system 800 includes a multi-processor environment in which a plurality of processors computer system 800 is also well suited to having a single processor such as, for example, processor 8091. Processors Computer system 800 also includes data storage features such as a computer usable volatile memory 806, e.g. random access memory (RAM), coupled to bus 810 for storing information and instructions for processors -
Computer system 800 also includes computer usable non-volatile memory 808, e.g. read only memory (ROM), coupled to bus 810 for storing static information and instructions for processors computer system 800 is a data storage unit 807 (e.g., a magnetic or optical disk and disk drive) coupled to bus 810 for storing information and instructions. Computer system 800 also includes an optional alpha-numeric input device 812 including alpha-numeric and function keys coupled to bus 810 for communicating information and command selections to processor Computer system 800 also includes an optional cursor control device 813 coupled to bus 810 for communicating user input information and command selections to processor 8091 or processors optional display device 811 is coupled to bus 810 for displaying information. - Referring still to
FIG. 8, optional display device 811 of FIG. 8 may be a liquid crystal device, cathode ray tube, plasma display device or other display device suitable for creating graphic images and alpha-numeric characters recognizable to a user. Optional cursor control device 813 allows the computer user to dynamically signal the movement of a visible symbol (cursor) on a display screen of display device 811. Implementations of cursor control device 813 include a trackball, mouse, touch pad, joystick or special keys on alpha-numeric input device 812 capable of signaling movement of a given direction or manner of displacement. Alternatively, in one embodiment, the cursor can be directed and/or activated via input from alpha-numeric input device 812 using special keys and key sequence commands or other means such as, for example, voice commands. -
Computer system 800 also includes an I/O device 814 for coupling computer system 800 with external entities. In one embodiment, I/O device 814 is a modem for enabling wired or wireless communications between computer system 800 and an external network such as, but not limited to, the Internet. Referring still to FIG. 8, various other components are depicted for computer system 800. Specifically, when present, an operating system 802, applications 803, modules 804, and data 805 are shown as typically residing in one or some combination of computer usable volatile memory 806, e.g. random access memory (RAM), and data storage unit 807. However, in an alternate embodiment, operating system 802 may be stored in another location such as on a network or on a flash drive. Further, operating system 802 may be accessed from a remote location via, for example, a coupling to the Internet. In one embodiment, the present technology is stored as an application 803 or module 804 in memory locations within RAM 806 and memory areas within data storage unit 807. - The present technology may be described in the general context of computer-executable instructions stored on a computer readable medium that may be executed by a computer. However, one embodiment of the present technology may also utilize a distributed computing environment where tasks are performed remotely by devices linked through a communications network.
- It should be further understood that the examples and embodiments pertaining to the systems and methods disclosed herein are not meant to limit the possible implementations of the present technology. Further, although the subject matter has been described in a language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the Claims.
- Since other modifications and changes varied to fit particular operating requirements and environments will be apparent to those skilled in the art, the invention is not considered limited to the example chosen for purposes of disclosure, and covers all changes and modifications which do not constitute departures from the true spirit and scope of this invention.
Claims (34)
1. A method for extracting a data record from a web page, said method comprising:
a. accessing said web page with a web browser;
b. activating a web browser device in said web page;
c. associating an extraction template with a data record type on said web page;
d. extracting the data record associated with said data record type;
e. downloading an image associated with an image url;
f. creating an image signature for said image;
g. associating said image signature with said data record;
h. storing said image signature in a third data store;
i. storing the association between said image signature, said image and said data record in the fourth data store; and
j. storing said data record in a first data store wherein there is an association between a first data field name in said data record in said first data store and a second data field name in said extraction template in said second data store.
2. The method of claim 1 wherein said data record is a hidden data record.
3. The method of claim 1 wherein said data record is a visible data record on said web page, further comprising extracting said data record associated with said data record type by:
i. selecting a data field value on said web page;
ii. associating said first data field name with said data field value;
iii. displaying a visible rectangle around said data field value and displaying said first data field name;
iv. calculating an XPATH value of said data field value on said web page,
wherein said extraction template is created utilizing said first data field name and said XPATH value using said web browser device; and
v. storing said extraction template in said first data store.
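As an illustration of claim 3, the following is a minimal sketch of how an extraction template that pairs data field names with XPATH values might be applied to a page. The page markup, the field names and the `extract_record` helper are assumptions made for this example and do not come from the specification; the sketch uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product page (a real page would need an HTML parser).
PAGE = """<html><body>
  <div id="product">
    <h1 class="title">Trail Running Shoe</h1>
    <span class="price">$89.99</span>
  </div>
</body></html>"""

# Hypothetical extraction template: data field name -> XPath of the field value.
TEMPLATE = {
    "name": "./body/div[@id='product']/h1",
    "price": "./body/div[@id='product']/span[@class='price']",
}


def extract_record(page_source, template):
    """Apply each XPath in the template and return a data record as a dict."""
    root = ET.fromstring(page_source)
    record = {}
    for field_name, xpath in template.items():
        node = root.find(xpath)
        # A field that cannot be located is stored as None so that the gap
        # remains visible to whoever reviews the template.
        record[field_name] = (
            node.text.strip() if node is not None and node.text else None
        )
    return record


record = extract_record(PAGE, TEMPLATE)
print(record)
```

Keeping the template as plain data (field name plus path) is what would allow it to be stored once and reused on other pages with the same layout.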
4. The method of claim 1 further comprising automatically retrieving said extraction template for said web page.
5. The method of claim 2 further comprising storing said hidden data record in an industry standard format and associating said hidden data record with a hidden data record template and XPATH location of the hidden data record and is associated with a root URL for a web site associated with said web page.
6. The method of claim 1 further comprising automatically displaying said data record in said web browser device panel, accepting a description and a collection from a user, and submitting said data record, said description and said collection to said first data store.
7. The method of claim 2 further comprising checking validity of said extraction template by re-extracting a current data field value and comparing to said data field value and finding any data field names present in the web page which are missing in the extraction template or the hidden data record.
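The validity check of claim 7 can be pictured as a comparison between a stored record and a fresh re-extraction. The sketch below is one possible reading under stated assumptions: records are plain dictionaries, and the function name `check_template` and the report keys are hypothetical. A changed value does not by itself prove the template is broken (a price may simply have changed), so the result is best treated as a list of fields to review.

```python
def check_template(stored_record, current_record, page_field_names):
    """Compare a stored data record with a fresh re-extraction of the page."""
    # Fields whose re-extracted value differs from the stored value,
    # including fields that could no longer be extracted at all.
    changed = sorted(
        name
        for name, value in stored_record.items()
        if current_record.get(name) != value
    )
    # Field names present on the page that the template does not cover.
    missing = sorted(set(page_field_names) - set(stored_record))
    return {"changed": changed, "missing_from_template": missing}


report = check_template(
    stored_record={"name": "Trail Running Shoe", "price": "$89.99"},
    current_record={"name": "Trail Running Shoe", "price": "$79.99"},
    page_field_names=["name", "price", "color"],
)
print(report)
```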
8. A method for displaying errors and missing template elements in a data record from a web page, said method comprising:
a. logging in as an administrator;
b. accessing said web page with a web browser;
c. activating a web browser device in said web page;
d. associating an extraction template with a data record type on the web page;
e. accessing the error report from the template error report server for the data record type in said web page;
f. highlighting the errors and missing elements in said web page;
g. highlighting data field name/data field value pairs in said web browser device panel that contain errors or are missing from the template or should not be in the template;
h. correcting the web page template errors by
i. associating a data field name with said data field value or
ii. removing said data field values;
i. creating an extraction template comprising said data field name and an XPATH value using said web browser device; and
j. storing said extraction template in a first data store;
k. extracting said data record associated with said data record type; and
l. storing said data record in a second data store wherein there is an association between said data field name in said data record in said first data store and a second data field name in said extraction template in said second data store.
9. The method of claim 1 further comprising computing the image signature using one of the following methods:
a. compute an image signature from a standard manufacturer image used by stores;
i. using an external and internal image signature and color histogram;
ii. using an industry standard signature such as BRISK for the entire image;
b. compute an image signature from a random image which displays the product from different angles using an industry standard signature such as BRISK for the entire image.
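Claim 9 names a color histogram as one component of the image signature without specifying how it is computed. The following is a minimal sketch of one common approach, assuming RGB pixels given as tuples and a coarse quantization of each channel; the function name, the bin count and the sample pixels are assumptions for this example only. BRISK itself is not shown, since it would require an external computer-vision library.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Return the fraction of pixels that fall in each quantized RGB bucket."""
    if not pixels:
        return {}
    step = 256 // bins_per_channel  # width of one bucket on each channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bucket: count / total for bucket, count in counts.items()}


# Hypothetical four-pixel image: two reddish and two bluish pixels.
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255), (0, 0, 250)]
histogram = color_histogram(pixels)
print(histogram)
```

Normalizing by the pixel count makes histograms comparable across images of different sizes.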
10. The method of claim 9 further comprising computing an external image signature by finding a minimum bounding box around a product in a manufacturer or retailer image by projecting rays from the edge of the image and finding the intersection of the ray with the edge of the product in the image.
11. The method of claim 10 further comprising computing an image signature by finding a minimal bounding box around a product in a manufacturer or retailer image using a binary search to find the closest point from the product object to the edge of each of the four sides of an image.
12. The method of claim 11 further comprising creating said image signature from points indicating the intersection between the rays and the minimum bounding box.
13. The method of claim 12 further comprising finding an internal image signature by finding the center of said minimal bounding box and projecting rays from the center to the edges terminating the rays at the boundary between two different colors/features.
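Claims 10 through 13 describe the ray-based signatures in prose only. The sketch below is a simplified illustration under stated assumptions: the image is a small grid of integer labels, `0` is taken to be the background, rays are scanned linearly rather than with the binary search of claim 11, and only four internal rays (up, right, down, left) are cast. All names and the sample image are hypothetical.

```python
BACKGROUND = 0  # assumed background label


def first_hit(values):
    """Index at which a ray first meets a non-background pixel, or None."""
    for index, value in enumerate(values):
        if value != BACKGROUND:
            return index
    return None


def bounding_box(image):
    """Return (top, left, bottom, right), inclusive, of the product region."""
    height, width = len(image), len(image[0])
    columns = [[image[y][x] for y in range(height)] for x in range(width)]
    rows_hit = [row for row in image if first_hit(row) is not None]
    cols_hit = [col for col in columns if first_hit(col) is not None]
    if not rows_hit:
        return None  # the image contains only background
    # Rays from the left and right edges travel along rows; rays from the
    # top and bottom edges travel along columns.
    left = min(first_hit(row) for row in rows_hit)
    right = max(width - 1 - first_hit(row[::-1]) for row in rows_hit)
    top = min(first_hit(col) for col in cols_hit)
    bottom = max(height - 1 - first_hit(col[::-1]) for col in cols_hit)
    return (top, left, bottom, right)


def internal_signature(image, box):
    """Ray lengths from the box center to the first color change."""
    top, left, bottom, right = box
    cy, cx = (top + bottom) // 2, (left + right) // 2
    start = image[cy][cx]
    signature = []
    for dy, dx in ((-1, 0), (0, 1), (1, 0), (0, -1)):  # up, right, down, left
        y, x, steps = cy, cx, 0
        while (top <= y + dy <= bottom and left <= x + dx <= right
               and image[y + dy][x + dx] == start):
            y, x, steps = y + dy, x + dx, steps + 1
        signature.append(steps)
    return tuple(signature)


IMAGE = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 2, 2, 2, 1, 0],
    [0, 1, 2, 2, 2, 1, 0],
    [0, 1, 2, 2, 2, 2, 0],
    [0, 1, 2, 2, 2, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
box = bounding_box(IMAGE)
signature = internal_signature(IMAGE, box)
print(box, signature)
```

A fuller signature would likely cast rays at many angles; four axis-aligned rays are used here only to keep the example short.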
14. The method of claim 10 further comprising accepting from a user an indication that said data field value is a constant wherein said constant becomes part of said extraction template or hidden data record template, and said constant is displayed in subsequent extraction processes.
15. The method of claim 1 further comprising storing said data field value with said data field name, said XPATH value and associating a root URL name in an extraction template in said first data store.
16. The method of claim 1 further comprising classifying said data field value using a product classifier and assigning a product classification to said data field value.
17. The method of claim 1 further comprising aggregating a plurality of said data field names and said data field values in said second data store into user defined collections.
18. The method of claim 8 further comprising associating a plurality of said extraction templates with a user for measuring the quality and quantity of extraction templates generated by said user.
19. The method of claim 1 further comprising allowing a second user accessing said web page from which the data record was extracted or said extraction template was created or retrieved to extract a current data field value from said web page.
20. The method of claim 1 further comprising extracting all of the elements of a list associated with said data field value using a repeating structured pattern associated with said data field name and said XPATH value.
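For claim 20, a repeating structured pattern can be expressed as a single XPATH value that matches every item of a list. The following is a minimal sketch assuming a well-formed page and Python's standard `xml.etree.ElementTree`; the markup, the `extract_list` helper and the path are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment containing a list of available colors.
PAGE = """<html><body>
  <ul id="colors">
    <li>Red</li>
    <li>Blue</li>
    <li>Green</li>
  </ul>
</body></html>"""


def extract_list(page_source, item_xpath):
    """Return the text of every element matched by the repeating XPath."""
    root = ET.fromstring(page_source)
    return [node.text.strip() for node in root.findall(item_xpath) if node.text]


colors = extract_list(PAGE, "./body/ul[@id='colors']/li")
print(colors)
```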
21. The method of claim 1 further comprising selecting said data field value using a predefined extraction template retrieved from said first data store.
22. The method of claim 1 further comprising selecting said data field value extracted from the hidden data record.
23. The method of claim 1 further comprising selecting said data field value by searching for a predefined data field name on said web page.
24. The method of claim 1 further comprising converting said extraction template from said first data store into an automatic data extraction template to extract current data field values from all web pages at the root web site which matches said template.
25. The method of claim 1 further comprising converting said hidden record data template from said first data store into an automatic data extraction template to extract current data field values from all web pages at the root web site which matches said template.
26. The method of claim 1 further comprising cleaning said data field value, classifying said data field value, normalizing said data field value, storing said data field value and indexing said data field value.
27. The method of claim 1 further comprising adding date and purchase location information associated with said data field value to said second data store.
28. The method of claim 1 further comprising comparing a plurality of data field values from said second data store by a user in a social network or a shopping engine and storing the comparison for viewing by said user or other social network members.
29. A method for implementing a browser based information transmission method comprising:
a. extracting a data record from a web page;
b. adding said data record to a user profile on a social network; and
c. sharing said data record with a plurality of users wherein each of said users can comment, copy, compare, vote on, or access the web page.
30. The method of claim 29 further comprising combining said data record with a plurality of other extracted data records to form a collection.
31. The method of claim 29 further comprising storing said collection in a searchable index.
32. The method of claim 29 further comprising finding a product search result from a product image on a web page by
a. accessing the web page with a web browser;
b. activating a web browser device on the web page in a web browser;
c. automatically finding, using the image identifier associated with the web browser device, the images greater than a certain size;
d. showing the selected images in a pop-up;
e. accepting from the user a selection of a single image in the pop-up and a press of “done”;
f. extracting the image url/bytes from the web page;
g. transmitting the extracted image url/bytes to the web service controller;
h. querying an image data store and associating the image with a product search result;
i. returning the product search result from the web service controller to the web browser device; and
j. displaying the product search result in the web browser device.
33. The method of claim 29 wherein the data record is a visible data record, further comprising: transmitting a root URL from the web browser device to a web service controller; associating the root URL with an extraction template; returning the extraction template from the web service controller to the web browser device; and extracting a product record from the web page using the extraction template.
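The lookup in claim 33 keys extraction templates by root URL. The sketch below shows one way the root URL might be derived from a page URL and used to retrieve a template, assuming an in-memory dictionary in place of the web service controller; the store contents, the example domain and the function names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical template store keyed by root URL (stands in for the
# web service controller's data store).
TEMPLATES_BY_ROOT = {
    "https://shop.example.com": {
        "name": "./body/div[@id='product']/h1",
        "price": "./body/div[@id='product']/span[@class='price']",
    },
}


def root_url(page_url):
    """Reduce a page URL to scheme plus host, dropping path and query."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}"


def lookup_template(page_url, store):
    """Return the extraction template registered for the page's root URL."""
    return store.get(root_url(page_url))


template = lookup_template(
    "https://shop.example.com/shoes/123?ref=home", TEMPLATES_BY_ROOT
)
print(template)
```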
34. The method of claim 29 wherein JavaScript is inserted into the web page such that, when said user navigates to said web page, said JavaScript is activated and identifies the hidden data records, product images or product words, and transmits the information to the web server, which looks up the information and returns product information about the image, including the list of stores at which the product can be purchased, similar brands, other products from the brand, other products from the store, and products from the same category, with affiliate links or pay-per-click links that, when activated by a user, result in a commission being paid to the end user and to the service provider.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/911,049 US20140095463A1 (en) | 2012-06-06 | 2013-06-05 | Product Search Engine |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261656502P | 2012-06-06 | 2012-06-06 | |
US13/911,049 US20140095463A1 (en) | 2012-06-06 | 2013-06-05 | Product Search Engine |
Publications (1)
Publication Number | Publication Date |
---|---|
US20140095463A1 (en) | 2014-04-03 |
Family
ID=49716133
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/911,049 Abandoned US20140095463A1 (en) | 2012-06-06 | 2013-06-05 | Product Search Engine |
US13/911,056 Active - Reinstated 2034-09-23 US9672283B2 (en) | 2012-06-06 | 2013-06-05 | Structured and social data aggregator |
Country Status (1)
Country | Link |
---|---|
US (2) | US20140095463A1 (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140195893A1 (en) * | 2013-01-07 | 2014-07-10 | Alibaba Group Holding Limited | Method and Apparatus for Generating Webpage Content |
US20140337165A1 (en) * | 2013-05-10 | 2014-11-13 | Dell Products L.P. | Mobile application enabling product discovery and obtaining feedback from network |
US20150169607A1 (en) * | 2013-12-17 | 2015-06-18 | Ebay Inc. | Systems and methods to present images representative of searched items |
US20160162582A1 (en) * | 2014-12-09 | 2016-06-09 | Moodwire, Inc. | Method and system for conducting an opinion search engine and a display thereof |
US20170301009A1 (en) * | 2016-04-16 | 2017-10-19 | Boris Sheykhetov | Philatelic Search Service System and Method |
US9805408B2 (en) | 2013-06-17 | 2017-10-31 | Dell Products L.P. | Automated creation of collages from a collection of assets |
CN107463696A (en) * | 2017-08-15 | 2017-12-12 | 中译语通科技(北京)有限公司 | A kind of method of Webpage largest block extraction |
CN107967324A (en) * | 2017-11-24 | 2018-04-27 | 广州明动软件股份有限公司 | Intellectual data conversion storage and Rapid input system and method |
US20180123888A1 (en) * | 2016-11-03 | 2018-05-03 | Allied Telesis Holdings Kabushiki Kaisha | Dynamic management of network environments |
US9965792B2 (en) | 2013-05-10 | 2018-05-08 | Dell Products L.P. | Picks API which facilitates dynamically injecting content onto a web page for search engines |
US10445377B2 (en) | 2015-10-15 | 2019-10-15 | Go Daddy Operating Company, LLC | Automatically generating a website specific to an industry |
US10614118B2 (en) | 2018-02-28 | 2020-04-07 | Microsoft Technology Licensing, Llc | Increasing inclusiveness of search result generation through tuned mapping of text and images into the same high-dimensional space |
US20200320247A1 (en) * | 2014-04-29 | 2020-10-08 | Wix.Com Ltd. | System and method for the creation and use of visually-diverse high-quality dynamic layouts |
US10949907B1 (en) | 2020-06-23 | 2021-03-16 | Price Technologies Inc. | Systems and methods for deep learning model based product matching using multi modal data |
US11048707B2 (en) * | 2017-06-28 | 2021-06-29 | Researchgate Gmbh | Identifying a product in a document |
US11120362B2 (en) | 2017-06-28 | 2021-09-14 | Researchgate Gmbh | Identifying a product in a document |
US11204975B1 (en) * | 2020-08-10 | 2021-12-21 | Coupang Corp. | Program interface remote management and provisioning |
US11250204B2 (en) * | 2017-12-05 | 2022-02-15 | International Business Machines Corporation | Context-aware knowledge base system |
US20230325456A1 (en) * | 2022-04-08 | 2023-10-12 | Content Square SAS | Insights Interface for Hidden Products |
US11816143B2 (en) | 2017-07-18 | 2023-11-14 | Ebay Inc. | Integrated image system based on image search feature |
US11816176B2 (en) * | 2021-07-27 | 2023-11-14 | Locker 2.0, Inc. | Systems and methods for enhancing online shopping experience |
US11907282B2 (en) | 2021-04-01 | 2024-02-20 | Find My, LLC | Method, apparatus, system, and non-transitory computer readable medium for performing image search verification using an online platform |
US11995843B2 (en) * | 2017-12-08 | 2024-05-28 | Ebay Inc. | Object identification in digital images |
Families Citing this family (59)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8612270B2 (en) | 2004-06-12 | 2013-12-17 | James K. Hazy | System and method to simulate the impact of leadership activity |
US10083421B2 (en) | 2004-06-12 | 2018-09-25 | James K. Hazy | System and method for the augmentation of emotional and social intelligence in technology mediated communication |
US20130254181A1 (en) * | 2012-01-03 | 2013-09-26 | Be Labs, Llc | Aggregation and Categorization |
CA2789909C (en) | 2012-09-14 | 2019-09-10 | Ibm Canada Limited - Ibm Canada Limitee | Synchronizing http requests with respective html context |
US9852223B2 (en) | 2012-12-20 | 2017-12-26 | Ebay Inc. | Enhancing search results with social networking data |
US20140207787A1 (en) * | 2013-01-23 | 2014-07-24 | Nubean LLC | Multi-tenant system for consolidating, searching and sharing distributed user-specific digital content |
CN103971244B (en) * | 2013-01-30 | 2018-08-17 | 阿里巴巴集团控股有限公司 | A kind of publication of merchandise news and browsing method, apparatus and system |
US20140250196A1 (en) * | 2013-03-01 | 2014-09-04 | Raymond Anthony Joao | Apparatus and method for providing and/or for processing information regarding, relating to, or involving, defamatory, derogatory, harrassing, bullying, or other negative or offensive, comments, statements, or postings |
US20140258267A1 (en) * | 2013-03-08 | 2014-09-11 | Microsoft Corporation | Aggregating and Searching Social Network Images |
US10262029B1 (en) | 2013-05-15 | 2019-04-16 | Google Llc | Providing content to followers of entity feeds |
US9686348B2 (en) * | 2013-10-28 | 2017-06-20 | Salesforce.Com, Inc. | Inter-entity connection maps |
WO2015069924A1 (en) | 2013-11-06 | 2015-05-14 | Yahoo! Inc. | Client-side scout and companion in a real-time bidding advertisement system |
CN104679769B (en) * | 2013-11-29 | 2018-04-06 | 国际商业机器公司 | The method and device classified to the usage scenario of product |
US10719562B2 (en) * | 2013-12-13 | 2020-07-21 | BloomReach Inc. | Distributed and fast data storage layer for large scale web data services |
US10013483B2 (en) * | 2014-01-30 | 2018-07-03 | Microsoft Technology Licensing, Llc | System and method for identifying trending topics in a social network |
US11074293B2 (en) | 2014-04-22 | 2021-07-27 | Microsoft Technology Licensing, Llc | Generating probabilistic transition data |
US11481424B2 (en) * | 2014-05-16 | 2022-10-25 | RCRDCLUB Corporation | Systems and methods of media selection based on criteria thresholds |
US10902497B1 (en) * | 2014-08-25 | 2021-01-26 | Twitter, Inc. | Method and system for processing requests in a messaging platform |
US9396483B2 (en) | 2014-08-28 | 2016-07-19 | Jehan Hamedi | Systems and methods for determining recommended aspects of future content, actions, or behavior |
US9129027B1 (en) | 2014-08-28 | 2015-09-08 | Jehan Hamedi | Quantifying social audience activation through search and comparison of custom author groupings |
KR101594835B1 (en) * | 2014-11-05 | 2016-02-17 | 현대자동차주식회사 | Vehicle and head unit having voice recognizing function, and method for voice recognizning therefor |
US10242107B2 (en) | 2015-01-11 | 2019-03-26 | Microsoft Technology Licensing, Llc | Extraction of quantitative data from online content |
US10127293B2 (en) | 2015-03-30 | 2018-11-13 | International Business Machines Corporation | Collaborative data intelligence between data warehouse models and big data stores |
US20160314513A1 (en) * | 2015-04-24 | 2016-10-27 | Ebay Inc. | Automatic negotiation using real time messaging |
KR101620980B1 (en) * | 2015-08-27 | 2016-05-16 | 지방근 | Method and system for managing relation and society between members |
US11468368B2 (en) | 2015-10-28 | 2022-10-11 | Qomplx, Inc. | Parametric modeling and simulation of complex systems using large datasets and heterogeneous data structures |
US10210255B2 (en) * | 2015-12-31 | 2019-02-19 | Fractal Industries, Inc. | Distributed system for large volume deep web data extraction |
US11074652B2 (en) | 2015-10-28 | 2021-07-27 | Qomplx, Inc. | System and method for model-based prediction using a distributed computational graph workflow |
US10079738B1 (en) * | 2015-11-19 | 2018-09-18 | Amazon Technologies, Inc. | Using a network crawler to test objects of a network document |
US20170193569A1 (en) * | 2015-12-07 | 2017-07-06 | Brandon Nedelman | Three dimensional web crawler |
US10580024B2 (en) * | 2015-12-15 | 2020-03-03 | Adobe Inc. | Consumer influence analytics with consumer profile enhancement |
US10936675B2 (en) * | 2015-12-17 | 2021-03-02 | Walmart Apollo, Llc | Developing an item data model for an item |
US9530023B1 (en) * | 2015-12-21 | 2016-12-27 | Vinyl Development LLC | Reach objects |
EP3398088A4 (en) * | 2015-12-28 | 2019-08-21 | Sixgill Ltd. | Dark web monitoring, analysis and alert system and method |
US20170213138A1 (en) * | 2016-01-27 | 2017-07-27 | Machine Zone, Inc. | Determining user sentiment in chat data |
US10042924B2 (en) * | 2016-02-09 | 2018-08-07 | Oath Inc. | Scalable and effective document summarization framework |
WO2017149540A1 (en) * | 2016-03-02 | 2017-09-08 | Feelter Sales Tools Ltd | Sentiment rating system and method |
WO2017176944A1 (en) * | 2016-04-05 | 2017-10-12 | Fractal Industries, Inc. | System for fully integrated capture, and analysis of business information resulting in predictive decision making and simulation |
US10067965B2 (en) * | 2016-09-26 | 2018-09-04 | Twiggle Ltd. | Hierarchic model and natural language analyzer |
US20180089316A1 (en) | 2016-09-26 | 2018-03-29 | Twiggle Ltd. | Seamless integration of modules for search enhancement |
CN108228629A (en) * | 2016-12-15 | 2018-06-29 | 北大方正集团有限公司 | Data pick-up method and device |
US10536551B2 (en) | 2017-01-06 | 2020-01-14 | Microsoft Technology Licensing, Llc | Context and social distance aware fast live people cards |
US10599771B2 (en) * | 2017-04-10 | 2020-03-24 | International Business Machines Corporation | Negation scope analysis for negation detection |
CN107688594B (en) * | 2017-05-05 | 2019-07-16 | 平安科技(深圳)有限公司 | The identifying system and method for risk case based on social information |
WO2019038588A1 (en) | 2017-07-24 | 2019-02-28 | Wix. Com Ltd. | Editing a database during preview of a virtual web page |
US20190155946A1 (en) * | 2017-11-20 | 2019-05-23 | Colossio, Inc. | N-gram classification in social media messages |
CN109977393B (en) * | 2017-12-28 | 2021-09-03 | 中国科学院计算技术研究所 | Popular news prediction method and system based on content disputeness |
CN108829680A (en) * | 2018-06-22 | 2018-11-16 | 北京百悟科技有限公司 | A kind of violation publicity detection method and device, computer readable storage medium |
US10963173B2 (en) | 2019-02-05 | 2021-03-30 | Bank Of America Corporation | System for smart contract dependent resource transfer |
US11080091B2 (en) | 2019-02-05 | 2021-08-03 | Bank Of America Corporation | System for real time provisioning of resources based on condition monitoring |
US10831548B2 (en) | 2019-02-05 | 2020-11-10 | Bank Of America Corporation | System for assessing and prioritizing real time resource requirements |
US10810040B2 (en) | 2019-02-05 | 2020-10-20 | Bank Of America Corporation | System for real-time transmission of data associated with trigger events |
US10635506B1 (en) | 2019-02-05 | 2020-04-28 | Bank Of America Corporation | System for resource requirements aggregation and categorization |
US10937038B2 (en) | 2019-02-05 | 2021-03-02 | Bank Of America Corporation | Navigation system for managing utilization of resources |
US11507966B2 (en) * | 2019-02-07 | 2022-11-22 | Dell Products L.P. | Multi-region document revision model with correction factor |
US11392657B2 (en) | 2020-02-13 | 2022-07-19 | Microsoft Technology Licensing, Llc | Intelligent selection and presentation of people highlights on a computing device |
CN114461930B (en) * | 2022-04-13 | 2022-06-24 | 四川大学 | Social network data acquisition method and device and storage medium |
US20230394168A1 (en) * | 2022-06-01 | 2023-12-07 | Microsoft Technology Licensing, Llc | Detecting personally identifiable information in data associated with a cloud computing system |
CN116701810B (en) * | 2023-07-29 | 2023-12-15 | 北京长亭科技有限公司 | Website operation playback method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7373313B1 (en) * | 2000-04-25 | 2008-05-13 | Alexa Internet | Service for enabling users to share information regarding products represented on web pages |
US7584194B2 (en) * | 2004-11-22 | 2009-09-01 | Truveo, Inc. | Method and apparatus for an application crawler |
US8478052B1 (en) * | 2009-07-17 | 2013-07-02 | Google Inc. | Image classification |
US8589366B1 (en) * | 2007-11-01 | 2013-11-19 | Google Inc. | Data extraction using templates |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6446035B1 (en) * | 1999-05-05 | 2002-09-03 | Xerox Corporation | Finding groups of people based on linguistically analyzable content of resources accessed |
US20110099507A1 (en) * | 2009-10-28 | 2011-04-28 | Google Inc. | Displaying a collection of interactive elements that trigger actions directed to an item |
US20110282734A1 (en) * | 2010-04-07 | 2011-11-17 | Mark Zurada | Systems and methods used for publishing and aggregating real world and online purchases via standardized product information |
US20110251973A1 (en) * | 2010-04-08 | 2011-10-13 | Microsoft Corporation | Deriving statement from product or service reviews |
WO2012009832A1 (en) * | 2010-07-23 | 2012-01-26 | Ebay Inc. | Instant messaging robot to provide product information |
US20120246029A1 (en) * | 2011-03-25 | 2012-09-27 | Ventrone Mark D | Product comparison and selection system and method |
WO2013059290A1 (en) * | 2011-10-17 | 2013-04-25 | Metavana, Inc. | Sentiment and influence analysis of twitter tweets |
US20130311875A1 (en) * | 2012-04-23 | 2013-11-21 | Derek Edwin Pappas | Web browser embedded button for structured data extraction and sharing via a social network |
Non-Patent Citations (2)
Title |
---|
Mingqiang Yang et al. “A Survey of Shape Feature Extraction Techniques”, Pattern Recognition IN-TEX, pp. 43-90, 2008 * |
Yang Mingqiang et al. “Shape Matching and Object Recognition Using Chord Contexts”, International Conference on Visualization, IEEE, 2008 * |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140195893A1 (en) * | 2013-01-07 | 2014-07-10 | Alibaba Group Holding Limited | Method and Apparatus for Generating Webpage Content |
US9965792B2 (en) | 2013-05-10 | 2018-05-08 | Dell Products L.P. | Picks API which facilitates dynamically injecting content onto a web page for search engines |
US20140337165A1 (en) * | 2013-05-10 | 2014-11-13 | Dell Products L.P. | Mobile application enabling product discovery and obtaining feedback from network |
US10354310B2 (en) * | 2013-05-10 | 2019-07-16 | Dell Products L.P. | Mobile application enabling product discovery and obtaining feedback from network |
US9805408B2 (en) | 2013-06-17 | 2017-10-31 | Dell Products L.P. | Automated creation of collages from a collection of assets |
US20150169607A1 (en) * | 2013-12-17 | 2015-06-18 | Ebay Inc. | Systems and methods to present images representative of searched items |
US11544442B2 (en) * | 2014-04-29 | 2023-01-03 | Wix.Com Ltd. | System and method for the creation and use of visually-diverse high-quality dynamic layouts |
US20200320247A1 (en) * | 2014-04-29 | 2020-10-08 | Wix.Com Ltd. | System and method for the creation and use of visually-diverse high-quality dynamic layouts |
US20160162582A1 (en) * | 2014-12-09 | 2016-06-09 | Moodwire, Inc. | Method and system for conducting an opinion search engine and a display thereof |
US11372935B2 (en) * | 2015-10-15 | 2022-06-28 | Go Daddy Operating Company, LLC | Automatically generating a website specific to an industry |
US10445377B2 (en) | 2015-10-15 | 2019-10-15 | Go Daddy Operating Company, LLC | Automatically generating a website specific to an industry |
US10482528B2 (en) * | 2016-04-16 | 2019-11-19 | Boris Sheykhetov | Philatelic search service system and method |
US20170301009A1 (en) * | 2016-04-16 | 2017-10-19 | Boris Sheykhetov | Philatelic Search Service System and Method |
US20180123888A1 (en) * | 2016-11-03 | 2018-05-03 | Allied Telesis Holdings Kabushiki Kaisha | Dynamic management of network environments |
US10972351B2 (en) * | 2016-11-03 | 2021-04-06 | Allied Telesis Holdings Kabushiki Kaisha | Dynamic management of network environments |
US11048707B2 (en) * | 2017-06-28 | 2021-06-29 | Researchgate Gmbh | Identifying a product in a document |
US11120362B2 (en) | 2017-06-28 | 2021-09-14 | Researchgate Gmbh | Identifying a product in a document |
US11816143B2 (en) | 2017-07-18 | 2023-11-14 | Ebay Inc. | Integrated image system based on image search feature |
CN107463696A (en) * | 2017-08-15 | 2017-12-12 | 中译语通科技(北京)有限公司 | A kind of method of Webpage largest block extraction |
CN107967324A (en) * | 2017-11-24 | 2018-04-27 | 广州明动软件股份有限公司 | Intellectual data conversion storage and Rapid input system and method |
US11250204B2 (en) * | 2017-12-05 | 2022-02-15 | International Business Machines Corporation | Context-aware knowledge base system |
US11995843B2 (en) * | 2017-12-08 | 2024-05-28 | Ebay Inc. | Object identification in digital images |
US10614118B2 (en) | 2018-02-28 | 2020-04-07 | Microsoft Technology Licensing, Llc | Increasing inclusiveness of search result generation through tuned mapping of text and images into the same high-dimensional space |
US10949907B1 (en) | 2020-06-23 | 2021-03-16 | Price Technologies Inc. | Systems and methods for deep learning model based product matching using multi modal data |
US11978106B2 (en) | 2020-06-23 | 2024-05-07 | Price Technologies Inc. | Method and non-transitory, computer-readable storage medium for deep learning model based product matching using multi modal data |
US11204975B1 (en) * | 2020-08-10 | 2021-12-21 | Coupang Corp. | Program interface remote management and provisioning |
TWI787706B (en) * | 2020-08-10 | 2022-12-21 | 南韓商韓領有限公司 | System for provisioning computing interfaces and system and method for assigning reference to target computing interface |
US11907282B2 (en) | 2021-04-01 | 2024-02-20 | Find My, LLC | Method, apparatus, system, and non-transitory computer readable medium for performing image search verification using an online platform |
US11816176B2 (en) * | 2021-07-27 | 2023-11-14 | Locker 2.0, Inc. | Systems and methods for enhancing online shopping experience |
US20230325456A1 (en) * | 2022-04-08 | 2023-10-12 | Content Square SAS | Insights Interface for Hidden Products |
Also Published As
Publication number | Publication date |
---|---|
US20130332460A1 (en) | 2013-12-12 |
US9672283B2 (en) | 2017-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20140095463A1 (en) | Product Search Engine | |
US9606970B2 (en) | Web browser device for structured data extraction and sharing via a social network | |
US10904117B1 (en) | Insights for web service providers | |
US10262342B2 (en) | Deep-linking system, method and computer program product for online advertisement and E-commerce | |
US20130311875A1 (en) | Web browser embedded button for structured data extraction and sharing via a social network | |
US8707167B2 (en) | High precision data extraction | |
US9613008B2 (en) | Dynamic aggregation and display of contextually relevant content | |
US8355997B2 (en) | Method and system for developing a classification tool | |
US8793239B2 (en) | Method and system for form-filling crawl and associating rich keywords | |
JP5779187B2 (en) | Contextual support for publish-subscribe systems | |
US20160042427A1 (en) | Mining For Product Classification Structures For Internet-Based Product Searching | |
US20090125529A1 (en) | Extracting information based on document structure and characteristics of attributes | |
EP3563240B1 (en) | Systems and methods for harvesting data associated with fraudulent content in a networked environment | |
US20150287047A1 (en) | Extracting Information from Chain-Store Websites | |
US9652543B2 (en) | Task-oriented presentation of auxiliary content to increase user interaction performance | |
US20150186739A1 (en) | Method and system of identifying an entity from a digital image of a physical text | |
US9390446B2 (en) | Consumer centric online product research | |
US20160103913A1 (en) | Method and system for calculating a degree of linkage for webpages | |
US9256805B2 (en) | Method and system of identifying an entity from a digital image of a physical text | |
US20150302090A1 (en) | Method and System for the Structural Analysis of Websites | |
US20140143172A1 (en) | System, method, software arrangement and computer-accessible medium for a mobile-commerce store generator that automatically extracts and converts data from an electronic-commerce store | |
US20120254158A1 (en) | Aggregating product review information for electronic product catalogs | |
US20160070794A1 (en) | Method and system for masking and filtering web contents and computer program product | |
US10824606B1 (en) | Standardizing values of a dataset | |
US11256703B1 (en) | Systems and methods for determining long term relevance with query chains |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |