WO2010089736A2 - A method and means for identifying items in a printed document associated with media objects - Google Patents


Info

Publication number
WO2010089736A2
Authority
WO
WIPO (PCT)
Prior art keywords
strings
printed document
media objects
items
string
Prior art date
Application number
PCT/IL2010/000089
Other languages
French (fr)
Other versions
WO2010089736A3 (en)
Inventor
Efrat Rotem
Arnon Rotem-Gal-Oz
Original Assignee
Xsights Media Ltd.
Priority date
Filing date
Publication date
Application filed by Xsights Media Ltd.
Publication of WO2010089736A2
Publication of WO2010089736A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5854 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using shape and object relationship
    • G06F16/5846 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Definitions

  • the present invention relates in general to the field of communications, and in particular to a device and method for allowing retrieval of information based upon content included in a printed item.
  • the various prior art solutions are based upon determining which media object is requested by processing the transmitted picture in order to retrieve a marked object therefrom, and thereafter proceeding, based upon the retrieved information, to provide the user with the requested media object.
  • the prior art solutions do not address the problem of how to provide a reader with the opportunity to retrieve media objects associated with items that have not been pre-selected by the editor of the printed document as items that could lead the reader to such media objects, nor the problem of how to select items that can safely be marked within the printed document, knowing that when the user takes a picture of a portion of the printed document that comprises the marked item, the service provider will be able to establish which media object has been requested.
  • the latter problem may be regarded as a two-part problem.
  • the present invention seeks to provide a method and device to overcome this problem.
  • a method for enabling the provisioning of one or more media objects associated with a printed document to a user of a mobile communication device comprises the steps of: providing one or more digital representations of at least one portion of a printed document, where each of the representations may be, for example, in the form of an image, text and the like, or any combination thereof; retrieving from the one or more digital representations a plurality of strings, each comprising at least one word, where the retrieval may be carried out by applying a process such as an OCR process on a received image, or directly from the received text; identifying at least one of the plurality of strings which can be associated with one or more retrievable media objects, wherein the at least one identified string had not been marked as being a string associated with one or more retrievable media objects in the one or more digital representations provided; and enabling the provisioning of at least one media object associated with one or more items comprised in the at least one portion of the printed document, to the user, wherein the one or more items correspond to one or more strings of the at least one identified string.
  • the term “item” as well as the term “printed object” refer to printed matter such as, for example, a combination of words, a title of an article, an advertisement, a headline in a daily newspaper and the like, so that by invoking the method of the present invention, the user is eventually able to receive predefined video signals, links to mobile internet pages, a link to a mobile internet browser with a selected word, etc., that are associated/linked with the item/printed object, which in turn is associated with the printed portion whose image is captured by the user's camera.
  • the term “word” as used herein throughout the specification and claims should be understood to encompass also a combination of characters, one or more parts of words, as well as combinations of words.
  • the term “image” should be understood to encompass text, a figure, a logo, a symbol and the like, or any combination thereof as the case may be, converted into a digital representation.
  • the term “media object” as used herein should be understood to encompass a unit of image data (optionally accompanied by other media data, e.g. sound or an Internet link), and encompasses individual static images as well as sequences of video images or video frames (such sequences are referred to herein as media objects).
  • Media objects may alternatively or additionally comprise animation data defining animations.
  • animation as used herein encompasses any form of animation, including, for example, frame-based animation, vector animation or procedural animation.
  • the term “media object” should be considered to encompass also any such animations.
  • image data preferably refers to image data of static images or of video/animation frames in media objects, or to image data associated with animations.
  • the number of items does not necessarily have to match the number of media objects, as to some of the items there could be more than one media object associated therewith.
  • the method provided further comprises a step of receiving a captured image of at least one portion of the printed document taken by the user of the mobile communication device, identifying the at least one portion of the printed document whose image was captured, identifying the one or more corresponding digital representations, and retrieving the one or more items belonging to the one or more corresponding digital representations, e.g. to enable the user to select a media object associated with one of these one or more items.
  • a digital representation of the respective portion would be the digital representation of the one or more articles to which the portion whose image was taken belongs.
  • the step of retrieving the plurality of strings comprises identifying, from among a plurality of printed portions included in the printed document, the one or more printed portions whose image appears in the captured image.
  • the method comprises upon identifying the one or more items, inserting corresponding identification marks into the digital representation of the printed document.
  • These identification marks would be used to provide the user (the reader) with easier identification of items within the printed document, thereby enabling him/her to realize which are the items that appear in the printed document that are associated with one or more media objects which he/she may retrieve.
  • the step of identifying at least one string which can be associated with one or more retrievable media objects comprises matching the plurality of strings with strings comprised in a database.
  • the step of identifying at least one string which can be associated with one or more retrievable media objects comprises matching the plurality of strings with strings associated with media objects that have been found to be of interest to a plurality of users, e.g. Internet users.
  • the media objects that have been found to be of interest to the plurality of users are media objects that had been frequently visited by Internet users (e.g. as determined by a search engine statistical tool).
  • the step of identifying at least one string which can be associated with one or more retrievable media objects comprises matching the plurality of strings with strings associated with pre-defined media objects (e.g. paid advertisements, paid links to certain websites, etc.).
  • the identification of the captured portion of the printed document comprises applying an OCR process or an image matching process on the captured image, and once the portion of the printed document has been identified, the one or more associated digital representations are recognized, and this recognition subsequently leads to the provisioning of a list of items (words) that had already been associated with that one or more digital representations (e.g. of the article).
  • the plurality of strings (words) may be directly associated with one or more articles which correspond to the one or more digital representations.
  • the step of enabling the provisioning of at least one media object to the user comprises communicating the at least one media object associated with the one or more items comprised in the captured image of the at least one portion of the printed document, to the user of the mobile communication device.
  • the step of enabling the provisioning of at least one media object comprises providing identification of one or more items in the printed document which correspond to one or more strings of the at least one identified string.
  • the editor may have an option to determine which of the available occurrences will be identified for the user as being items with which media objects may be retrieved.
  • the method provided further comprises a step of determining at least one portion of the printed document from among a plurality of other portions associated with that printed document for marking the one or more identified items that correspond to one or more strings of the at least one identified string, so that a reader of the printed document is able to recognize the one or more associated items for which media objects may be retrieved.
  • the step of identifying at least one portion of the printed document comprises:
  • step (v): if the similarity factor of the first portion is lower than a pre-defined value, allowing the association of one or more identification marks with the one or more items comprised within said first portion. Still preferably, if the similarity factor of the first portion is higher than the pre-defined value, applying one or more of the following steps: a) adding an identification mark to said selected first portion and repeating steps (iii) to (v); b) selecting another item which appears in that first portion and could be associated with an identification mark and repeating steps (iii) to (v); c) selecting another portion that comprises at least one item included in said first portion and repeating steps (iii) to (v); d) associating an identification mark with another portion which would belong to the same article in the printed document and repeating steps (iii) to (v) for said another portion; or e) adding one or more unique symbols from a pre-defined library to said selected first portion and repeating steps (iii) to (v) for said other portion.
  • the method provided further comprises repeating steps (ii) to (v) for each of the plurality of portions, thereby establishing in which portions out of said plurality of portions it would be possible to associate the one or more identification marks.
  • an apparatus for enabling the provisioning of one or more media objects associated with at least one portion of a printed document, to a user of a mobile communication device.
  • the apparatus comprises the following: receiving means adapted to receive one or more digital representations of the at least one portion of a printed document; processing means adapted to: retrieve from the one or more digital representations, a plurality of strings each comprising at least one word; identify at least one of the plurality of strings which can be associated with one or more retrievable media objects, and wherein the at least one identified string has not been marked as being a string associated with any retrievable media object in the one or more digital representations received; and transmission means operative to enable provisioning at least one media object associated with one or more items comprised in the at least one portion of the printed document, wherein the one or more items correspond to one or more strings of the at least one identified string.
  • the receiving means is adapted to receive a captured image of the at least one portion of the printed document taken by the user of the mobile communication device, and wherein the processing means is adapted to identify the at least one portion of the printed document whose image was captured, and to retrieve one or more items associated with a respective portion of the printed document.
  • the transmission means is operative to forward indications to enable recognizing the one or more items in the printed document which correspond to one or more strings of the at least one identified string.
  • the apparatus further comprises storage means for storing: (a) at least one electronic file comprising one or more digital representations of a printed document, wherein the printed document comprises a plurality of printed portions and a plurality of printed objects associated with at least some of the printed portions; (b) information related to the direct and/or indirect association of each of the plurality of portions comprised in the printed document with respective printed objects of the plurality of printed objects; and (c) a plurality of media objects associated with said plurality of printed objects.
  • the processing means is operative to identify the at least one of the plurality of strings by establishing that it is associated with one or more retrievable media objects, i.e. that it matches one or more corresponding strings comprised in a database.
  • the processing means is operative to identify the at least one of said plurality of strings by establishing that the at least one of the plurality of strings is associated with one or more media objects that have been found to be of interest to Internet users.
  • the one or more media objects that have been found to be of interest to Internet users are one or more media objects that had frequently been visited by Internet users.
  • the at least one of the plurality of strings that has been identified as being associated with one or more media objects is selected from among a pre-defined plurality of media objects.
  • the transmission means is operative to forward to a service provider one or more indications associated with the at least one identified string to enable the provisioning of at least one corresponding media object.
  • the transmission means is operative to forward to the user of the mobile communication device the at least one media object associated with the one or more items comprised in a captured image of the at least one portion of the printed document.
  • a system for enabling the provisioning of one or more media objects associated with at least one portion of a printed document to a user of a mobile communication device comprising: receiving means adapted to receive one or more digital representations of the at least one portion of a printed document; one or more processors each adapted to carry out one or more of the following operations: retrieve from the one or more digital representations, a plurality of strings each comprising at least one word; identify at least one of the plurality of strings which can be associated with one or more retrievable media objects, and wherein the at least one identified string has not been marked as a string associated with one or more retrievable media objects in the one or more digital representations received; and transmission means operative to enable the provisioning of at least one media object associated with one or more items comprised in the at least one portion of the printed document, wherein the one or more items correspond to one or more strings of the at least one identified string.
  • the system comprises receiving means adapted to receive an image, captured by a user of the mobile communication device, of at least one portion of the printed document, and the one or more processors are adapted to identify the at least one portion of the printed document whose image was captured, and to retrieve one or more items associated with a respective portion of the printed document.
  • the one or more processors are adapted to provide identification of one or more items in the printed document which correspond to one or more strings of the at least one identified string.
  • the one or more processors are adapted to insert marking of one or more of the items which correspond to one or more strings of the at least one identified string in the digital representation of the printed document.
  • the one or more processors are adapted to match the plurality of strings with strings comprised in a database.
  • the one or more processors are adapted to match the plurality of strings with strings associated with media objects that have been found to be of interest to Internet users.
  • the media objects that have been found to be of interest to Internet users are media objects that had been frequently visited by Internet users.
  • the one or more processors are adapted to match the plurality of strings with strings associated with pre-defined media objects.
  • the one or more processors are adapted to determine at least one portion of the printed document from among a plurality of other portions associated with the printed document, in which to insert marking of the one or more of the items that correspond to one or more strings of the at least one identified string, for a reader of the printed document to be able to recognize the one or more identified items for which media objects may be retrieved.
  • the determination of the at least one portion of the printed document by the one or more processors comprises: (i) providing a plurality of portions comprised in the printed document;
  • the similarity factor of the first portion is higher than the pre-defined value, associating another item which appears in the first portion with an identification mark and repeating steps (iii) to (v) .
  • if the similarity factor of the first portion is higher than the pre-defined value, applying one or more of the following: a) adding an identification mark to the selected first portion and repeating steps (iii) to (v); b) selecting another item which appears in that first printed portion and could be associated with an identification mark and repeating steps (iii) to (v); c) associating an identification mark with another portion which belongs to the same article of the printed document and repeating steps (iii) to (v) for said another portion; or d) adding one or more unique symbols from a pre-defined library to said selected first portion and repeating steps (iii) to (v) for said another portion.
  • a computer program product encoding a computer program stored on a non-transitory computer readable storage medium for executing a set of instructions by a computer system comprising one or more computer processors for carrying out a process for enabling the provisioning of one or more media objects associated with a printed document to a user of a mobile communication device, wherein the process comprises the steps of: retrieving a plurality of strings from one or more digital representations provided, wherein each of the plurality of strings comprises at least one word; identifying at least one of the plurality of strings which can be associated with one or more retrievable media objects, and wherein the at least one identified string had not been marked as a string associated with one or more retrievable media objects in the one or more digital representations provided; and enabling the provisioning of at least one media object associated with one or more items comprised in the at least one portion of the printed document, to the user, wherein the one or more items correspond to one or more strings of the at least one identified string.
  • the process further comprises a step of matching the one or more strings with strings associated with media objects that had been found to be of interest to a plurality of users, such as, for example, Internet users.
  • FIG. 1 - presents an example of carrying out the method according to an embodiment of the present invention
  • FIG. 2 - presents another example of carrying out the method according to another embodiment of the present invention
  • FIG. 3 - exemplifies a complementary embodiment to the one illustrated in FIG. 2.
  • a reader may retrieve media objects associated with an article he/she is reading or has just read; alternatively, an editor of a printed document may decide where to insert identification marks so that a reader of that printed document is able to send an image of the portion of the printed document that comprises such an identification mark (which is associated with a media object), and so that this portion is preferably relatively easily recognizable from among the rest of that printed document.
  • the received electronic file is processed by the service provider using any method known in the art per se such as OCR, word processing and the like, and a plurality of strings is derived therefrom (step 20).
  • Each such string may contain a word, a combination of words, a part of a word or any combination thereof.
  • a string may contain an image.
  • the strings according to this example are matched against a database of an Internet search engine in order to establish which of the plurality of strings are included in the database. Although it is likely that most, if not all, strings will be found in the database, a screening step may preferably be carried out to select only those strings that meet a pre-defined criterion (or a number of criteria).
  • Such a criterion could be a minimum number of entries to a website, the number of times people initiated searches for a specific string, the number of times people initiated searches for a specific string within a pre-defined period of time (e.g. within the last week), etc. (step 30).
  • upon determining which strings would be associated with media objects in the article, the service provider associates these strings with items that appear in the electronic file of the newspaper, so that the article becomes associated with selected items that are in turn associated with retrievable media objects (step 40).
  • the selected strings associated with a certain article are in fact identical to the items associated with that article, for example in the case that the strings are individual words, and so are the respective items.
  • the information that would be received in the form of media object(s) could be a video clip of the goals scored, or textual information which, if the user so chooses, would provide him/her with additional information to that which already appears in the sport section.
  • the reader takes a picture of the article describing the soccer game by using his/her cellular telephone (step 60) and transmits the captured image to a pre-defined address (e.g. a telephone number of a subscribers' service provider) (step 70).
  • the captured image is then forwarded for analysis (step 80). If the image comprises extractable text that has a high probability of being recognized by applying any process known in the art per se (such as OCR), or by pattern matching of an image, a search is conducted for the unique text/image in the printed-media file of the corresponding daily newspaper. Once the unique image/text, and subsequently the corresponding digital file, have been identified, they can be used to obtain the list of items associated with that digital file, from which the user may choose the media objects he/she wishes to retrieve.
  • a navigational algorithm is then applied (step 90). This algorithm allows identifying the location of the captured image within the printed document (e.g. on which page, and in which part of the page, the printed object is located) and/or within an electronic file representing that printed document (e.g. where the electronic signal representing the printed object(s) is positioned within the electronic signal representing the combination of all of the newspaper's articles).
  • This algorithm preferably allows easy identification of a partial image within a large frame. Once the image is identified, its location
  • upon identifying the article, the service provider becomes aware of the items that have already been associated with that article and may retrieve for the user the media objects associated with the items included in that article (step 110). Next, the service provider may send to the reader either the media objects associated with the selected items, or a list of the selected items for the user to choose from, or use any other applicable way that would allow the user to receive the requested media objects.
  • Sending the media objects to the user could either be to the same cellular phone number from which the captured image was sent, or, once the user has been identified (e.g. through his/her telephone number), to any pre-defined communication address of the user's choice, such as his/her e-mail address.
  • another option of utilizing the present invention is to provide a tool, for example for an editor of a printed document (whether a newspaper, a magazine, a book, etc.), that enables the editor to insert identification marks which readers of the printed document can use to realize that they can retrieve, on their mobile communication device, one or more media objects that are relevant to the items in the printed document that are marked by the identification mark.
  • the service provider identifies the location of the portion whose image was taken, and by using prior knowledge of the items associated with that portion of the printed document, he may establish which are the media objects that are of interest to the reader.
  • these media objects are retrieved from the service provider database, and transmitted to the mobile communication device of the reader.
  • One of the concerns associated with this type of operation is the problem of figuring out where the identification marks should be inserted in the document to be printed, while ensuring that the marked items are indeed associated with retrievable media objects, and preferably with media objects that are likely to be of interest to the reader.
  • Let us now consider the method illustrated in FIG. 2, in which we assume that the editor is working on an electronic file of tomorrow's daily newspaper and wishes to establish where to insert the identification marks.
  • the editor sends the electronic file, which is the digital representation of tomorrow's daily newspaper (or a part thereof), to a service provider in Word format, PDF format, text format or any other applicable format (step 210).
  • each such string may contain a word, a combination of words, a part of a word or any combination thereof.
  • a string may contain an image.
  • the strings according to this example are matched against a database of an Internet search engine in order to establish which of the plurality of strings are included in the database. Although it is likely that most, if not all, of the strings will be found in the database, a screening process may preferably be carried out, leaving only those strings that meet a pre-defined criterion as discussed in the previous example (step 230).
  • upon determining the strings that would be associated with media objects, the service provider associates these strings with items that appear in the electronic file (step 240) and inserts identification marks in the electronic file, thereby marking these items so as to allow a reader of the newspaper to easily recognize the items for which media objects may be retrieved (step 250).
  • the electronic file with the marked items is then returned to the editor and can be used for printing the newspaper (step 260) .
  • Fig. 3 illustrates a further embodiment of the present invention. As previously explained, there is another concern associated with this type of operation, namely the problems that might be faced when trying to figure out where the identification of an item should be made.
  • the paragraph comprises an item associated with a string to which a media object is to be linked.
  • the paragraph relates to a certain soccer game
  • the goals scored may be presented in the media object, or the media object may comprise textual information which, if the user so chooses, would provide him/her with additional information to that which would appear in the printed paragraph.
  • the first choice of the editor is to mark (e.g. to underline) the name of the player who scored a goal.
  • a portion that contains the name of the player is selected (step 310) .
  • the selection can be made manually, e.g. by the editor determining one or more corners of the portion, or automatically, e.g. by selecting a portion in which the underlined name appears at the center.
  • the selected portion is compared directly or indirectly with other portions that will be printed as part of tomorrow's newspaper (step 320) .
  • the comparison is preferably made with other portions that are stored in a database, with portions obtained by dividing the electronic file of the newspaper ad hoc into a plurality of such printed portions, or with printed portions obtained by any other suitable method.
  • the comparison is made by using any method known in the art per se for matching/recognizing an image by comparing it to other images, for example SIFT (Scale-Invariant Feature Transform).
  • the approximation of the similarity factor of a portion $i$ may be written as follows: $1 - P(i \neq j) \cdot P(i \neq k) \cdots$, where $P(i \neq j)$ denotes the probability that portion $i$ is not confused with portion $j$.
  • the result of the analysis for a given portion indicates the likelihood that, if a reader takes a picture of that portion, it will be confused with one or more other portions comprised in that newspaper (step 330).
  • in this example, the indication received for the selected portion is that the probability of confusion is too high (step 340).
  • once the editor receives that indication, he will review the digital representation of the newspaper, select another portion that contains the selected item (step 370), say the title of the article related to that soccer game, and mark it as before in step 310.
  • a reader who gets the printed document and is interested in receiving the video clip of the goal will view this mark and will use the phone camera to take a picture of that page of the newspaper with the title.
  • the picture will be forwarded to the service provider, who in turn will readily identify which paragraph was photographed by the reader and will provide the reader with the requested media object (step 380).
  • the database used to determine which of the words that appear in the article would be considered selected items, for which the user may retrieve media objects, may be comprised in the service provider's server or, in the alternative, may be uploaded to the mobile device. It should be understood that any such shifting of functionality from the mobile device to the server and vice versa is a matter of simple selection and can be done without departing from the scope of the present invention.
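The approximation of the similarity factor and the threshold check of steps 330 to 340 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the pairwise non-confusion probabilities P(i ≠ j) have already been estimated by an image matcher and are independent of one another, and the threshold value 0.2 is purely hypothetical.

```python
from math import prod


def similarity_factor(p_not_confused: list[float]) -> float:
    """Approximate 1 - P(i != j) * P(i != k) * ... for a portion i.

    Each entry is the (assumed independent) probability that portion i is
    NOT confused with one other portion of the newspaper.
    """
    return 1.0 - prod(p_not_confused)


def safe_to_mark(p_not_confused: list[float], threshold: float = 0.2) -> bool:
    """Steps 330-340: the portion may be marked only if the risk of confusion
    is below a pre-defined value (0.2 here is a hypothetical choice)."""
    return similarity_factor(p_not_confused) < threshold


# Hypothetical probabilities for the portion containing the player's name,
# which resembles two other portions of the newspaper
name_portion = [0.70, 0.80]
# Hypothetical probabilities for the portion containing the article title
title_portion = [0.99, 0.95, 0.90]

print(safe_to_mark(name_portion))                   # False -> step 340
print(round(similarity_factor(title_portion), 5))   # 0.15355
print(safe_to_mark(title_portion))                  # True -> step 370 choice
```

Multiplying the non-confusion probabilities gives the chance that portion i is mistaken for none of the others, so one minus that product is the chance of confusion with at least one of them.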

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method is provided for enabling provisioning to a user of a mobile communication device, of one or more media objects associated with a printed document. The method comprises the steps of: providing digital representations of a number of portions of a printed document; retrieving from the digital representations a plurality of strings, each comprising at least one word; identifying at least one of the plurality of strings which may be associated with one or more retrievable media objects, and wherein the at least one identified string had not been marked as being a string associated with retrievable media objects in the digital representations provided; and enabling the provisioning to the user of the media object(s) associated with items comprised in at least one portion of the printed document, wherein the items correspond to one or more strings of the at least one identified string.

Description

A METHOD AND MEANS FOR IDENTIFYING ITEMS IN A PRINTED DOCUMENT ASSOCIATED WITH MEDIA OBJECTS
Field of the Invention
The present invention relates in general to the field of communications, and in particular to a device and method for allowing retrieval of information based upon content included in a printed item.
Background of the Invention
In our co-pending applications published under US 20090046320, WO 2009/104193, WO 2009/147675 and WO 2010/001389, new methods are described for enabling the provisioning, to a user of a mobile communication device, of media objects that are associated with a printed document, such as a newspaper. The methods described rely on the fact that there are one or more marked objects comprised in the printed documents, which allow the user to recognize the items for which associated media object(s) may be retrieved. To allow such retrieval, the user takes a picture of a printed portion of the printed document which comprises the marked object; the picture is then transmitted to a service provider, who processes the received picture and eventually sends back to the user the media object(s) of interest associated with the marked object.
The various prior art solutions are based upon determining which media object is requested by processing the transmitted picture in order to retrieve a marked object therefrom, and thereafter using the retrieved information to provide the user with the requested media object.
Typically, the prior art solutions give little attention to the problem of how to provide a reader with the opportunity to retrieve media objects associated with items that have not been pre-selected by the editor of the printed document as items that could lead the reader to such media objects, or to the problem of how to select items that can safely be marked within the printed document, knowing that when the user takes a picture of a portion of the printed document that comprises the marked item, the service provider will be able to establish which media object has been requested. The latter may be regarded as a two-part problem: first, where the marking should be placed in order to ensure a high probability of successfully identifying the portion of the printed document whose image would be taken by the user; and second, how to allow easy selection of the objects to be marked while ensuring that there is indeed a media object associated therewith that can be retrieved and forwarded to the user.
The first part of this problem was discussed in our co-pending application published under WO 2009/147675, which provides means to improve the probability of identifying, out of the whole printed document, the part of the printed document whose image has been captured, even if the image capturing device is a low-quality device, such as the low-resolution cameras, with or without auto-focus, integrated in cellular telephones.
However, no adequate solution has yet been suggested to the question of how to select the items to be marked so that, once they are identified as marked objects, there will be a retrievable media object associated with each such item. The prior art solutions that tried to address this problem proposed either to use pre-determined items, such as a trademark in a printed ad that would eventually lead to a presentation of the product or of the company whose trademark is the marked object, or to rely on the editor of the printed document to insert the marking. However, relying on the editor involves a tedious and cumbersome process, in which the editor (or his co-workers) must go through each of the items in the printed document, decide which of them are likely to be of interest to the readers, check whether retrievable media objects can be found that can be associated with these items, and then decide where the marks should be placed. The present invention seeks to provide a method and device to overcome this problem.
Summary of the Invention
It is an object of the present invention to provide a device, a system and a method to allow a reader of a printed document to retrieve media objects associated with items of interest that appear unmarked in a printed document.
It is another object of the present invention to provide a device, a system and a method to allow identifying items within a printed document which are associated with retrievable media objects.
It is another object of the present invention to provide a method and a device for selecting objects for marking in a printed document.
It is still another object of the present invention to provide a method and a device to enable selection of objects for marking in a printed document, where the selection is based, at least partially, upon the interest of Internet users in the selected objects.
Other objects of the invention will become apparent as the description of the invention proceeds.
According to a first embodiment of the present invention there is provided a method for enabling provisioning to a user of a mobile communication device, of one or more media objects associated with a printed document. The method comprises the steps of: providing one or more digital representations of at least one portion of a printed document, where each of the representations may be, for example, in the form of an image, text and the like, or any combination thereof; retrieving from the one or more digital representations a plurality of strings, each comprising at least one word, where the retrieval may be carried out by applying a process such as OCR to a received image, or directly from the received text; identifying at least one of the plurality of strings which can be associated with one or more retrievable media objects, wherein the at least one identified string had not been marked as being a string associated with one or more retrievable media objects in the one or more digital representations provided; and enabling the provisioning of at least one media object associated with one or more items comprised in the at least one portion of the printed document, to the user, wherein the one or more items correspond to one or more strings of the at least one identified string.
The term "printed document" as used herein and throughout the specification and claims should be understood to encompass a newspaper, a magazine, a periodical, a brochure, a book, etc.
The term "item" as well as the term "printed object" refer to printed matter such as, for example, a combination of words, a title of an article, an advertisement, a headline in a daily newspaper, and the like, so that by invoking the method of the present invention, the user is eventually able to receive predefined video signals, links to mobile Internet pages, a link to a mobile Internet browser with a selected word, etc., that are associated/linked with the item/printed object, which in turn is associated with the printed portion whose image is captured by the user's camera. The term "word" as used herein throughout the specification and claims should be understood to encompass also a combination of characters, one or more parts of words, as well as combinations of words.
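The string-retrieval and string-identification steps of this embodiment can be sketched in Python. This is a minimal sketch under stated assumptions: the digital representation is already plain text (e.g. after OCR), strings are single lower-cased words, and `media_index` is a hypothetical lookup from strings to retrievable media objects, standing in for whatever database the service provider actually uses.

```python
import re


def retrieve_strings(text: str) -> list[str]:
    """Split a text representation into lower-cased word strings."""
    return [w.lower() for w in re.findall(r"[A-Za-z0-9']+", text)]


def identify_strings(text: str,
                     already_marked: set[str],
                     media_index: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return each string that (a) was not already marked in the digital
    representation and (b) has at least one retrievable media object."""
    identified: dict[str, list[str]] = {}
    for s in retrieve_strings(text):
        if s in already_marked or s in identified:
            continue  # skip pre-marked strings and repeated occurrences
        media = media_index.get(s)
        if media:
            identified[s] = media
    return identified


# Hypothetical article text and media index
article = "Cohen scored twice as United beat City"
index = {"cohen": ["goal_clip.mp4"], "united": ["club_page"], "city": ["club_page_2"]}
print(identify_strings(article, already_marked={"city"}, media_index=index))
# {'cohen': ['goal_clip.mp4'], 'united': ['club_page']}
```

Excluding `already_marked` strings reflects the requirement that the identified string had not already been marked as associated with a media object in the representation provided.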
The term "image" as used herein should be understood to encompass text, a figure, a logo, a symbol and the like, or any combination thereof as the case may be, converted into a digital representation.
The term "media object" as used herein should be understood to encompass a unit of image data (optionally accompanied by other media data, e.g. sound or an Internet link), and encompasses individual static images as well as sequences of video images or video frames (such sequences are referred to herein as media objects). Media objects may alternatively or additionally comprise animation data defining animations. The term "animation" as used herein encompasses any form of animation, including, for example, frame-based animation, vector animation or procedural animation. The term "media object" should be considered to encompass also any such animations. The term "image data" preferably refers to image data of static images or of video/animation frames in media objects, or to image data associated with animations.
As will be appreciated by those skilled in the art, the number of items (printed objects) does not necessarily have to match the number of media objects, as some of the items may have more than one media object associated therewith. According to another embodiment of the invention, the method provided further comprises the steps of receiving a captured image of at least one portion of the printed document taken by the user of the mobile communication device, identifying the at least one portion of the printed document whose image was captured, identifying the one or more corresponding digital representations, and retrieving the one or more items belonging to the one or more corresponding digital representations, e.g. to enable the user to select a media object associated with one of these one or more items. Preferably, the digital representation of the respective portion is the digital representation of the one or more articles to which the photographed portion belongs.
By another preferred embodiment, the step of retrieving the plurality of strings comprises identifying, from among a plurality of printed portions included in the printed document, the one or more printed portions whose image appears in the captured image.
In the case where the method provided by the present invention is implemented in a process for determining where to insert identification marks in a document to be printed, the method preferably comprises, upon identifying the one or more items, inserting corresponding identification marks into the digital representation of the printed document. These identification marks provide the user (the reader) with easier identification of items within the printed document, enabling him/her to realize which of the items that appear in the printed document are associated with one or more media objects which he/she may retrieve. According to another embodiment of the present invention, the step of identifying at least one string which can be associated with one or more retrievable media objects comprises matching the plurality of strings with strings comprised in a database. In addition or in the alternative, this step comprises matching the plurality of strings with strings associated with media objects that have been found to be of interest to a plurality of users, e.g. Internet users. Preferably, the media objects that have been found to be of interest to the plurality of users are media objects that have been frequently visited by Internet users (e.g. as determined by a search engine statistical tool).
In accordance with another embodiment of the invention, the step of identifying at least one string which can be associated with one or more retrievable media objects comprises matching the plurality of strings with strings associated with pre-defined media objects (e.g. paid advertisements, paid links to certain websites, etc.).
By another embodiment of the present invention, in the case where an image of a portion of the printed document is taken by the mobile phone camera, the identification of the captured portion of the printed document comprises applying an OCR process or an image matching process to the captured image. Once the portion of the printed document has been identified, the one or more associated digital representations are recognized, and this recognition subsequently leads to the provisioning of a list of items (words) that had already been associated with the one or more digital representations (e.g. of the article). Where the items are associated with one or more digital representations that are text representations of articles included in the printed document and are already in Word format, the plurality of strings (words) may be directly associated with the one or more articles which correspond to the one or more digital representations.
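Inserting identification marks for the identified items can be illustrated with a short sketch. It assumes, purely for illustration, that the digital representation is plain text and that underlining is expressed with hypothetical `<u>`/`</u>` tags; only the first whole-word occurrence of each item is marked, reflecting the option of marking a single occurrence of an item that appears more than once.

```python
import re


def insert_marks(text: str, items: list[str],
                 open_tag: str = "<u>", close_tag: str = "</u>") -> str:
    """Wrap the first whole-word, case-insensitive occurrence of each item.

    Simplification: items are assumed not to collide with the tag text itself.
    """
    for item in items:
        pattern = re.compile(r"\b" + re.escape(item) + r"\b", re.IGNORECASE)
        # count=1 marks only the first occurrence; original casing is preserved
        text = pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", text, count=1)
    return text


print(insert_marks("Cohen scored for United. Cohen then left.", ["cohen", "united"]))
# <u>Cohen</u> scored for <u>United</u>. Cohen then left.
```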
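The OCR-based identification of the captured portion, followed by retrieval of its pre-associated items, can be sketched as follows. This assumes the OCR step has already produced text from the user's picture (the OCR engine itself is not shown) and uses word-set Jaccard overlap as a hypothetical scoring rule; the portion identifiers and item lists are illustrative only.

```python
import re


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def identify_portion(ocr_text: str, portions: dict[str, str]) -> str:
    """Return the id of the portion whose words best overlap the OCR output."""
    captured = _words(ocr_text)

    def jaccard(portion_id: str) -> float:
        words = _words(portions[portion_id])
        union = captured | words
        return len(captured & words) / len(union) if union else 0.0

    return max(portions, key=jaccard)


def items_for_capture(ocr_text: str, portions: dict[str, str],
                      items_by_portion: dict[str, list[str]]) -> tuple[str, list[str]]:
    """Identify the photographed portion and list its pre-associated items."""
    portion_id = identify_portion(ocr_text, portions)
    return portion_id, items_by_portion.get(portion_id, [])


portions = {
    "sport-1": "Cohen scored twice as United beat City",
    "news-4": "Parliament approved the new budget",
}
items = {"sport-1": ["cohen", "united"]}
# OCR output from a low-quality phone picture, with one misread word
print(items_for_capture("cohen scored twice as unitd", portions, items))
# ('sport-1', ['cohen', 'united'])
```

An overlap score rather than exact matching tolerates the OCR errors typical of low-resolution phone cameras; an image-matching process could replace `identify_portion` without changing the rest of the flow.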
According to another preferred embodiment of the invention, the step of enabling the provisioning of at least one media object to the user, comprises communicating the at least one media object associated with the one or more items comprised in the captured image of the at least one portion of the printed document, to the user of the mobile communication device.
By another preferred embodiment of the invention, the step of enabling the provisioning of at least one media object, comprises providing identification of one or more items in the printed document which correspond to one or more strings of the at least one identified string.
As previously explained, apart from determining which of the items included in the printed document may be associated with a media object as discussed above, the method may preferably also involve a step of determining a preferred location within the printed document at which to insert the identification mark of an item. For example, when an item appears in the printed document more than once, the editor may have the option to determine which of the available occurrences will be identified to the user as an item for which media objects may be retrieved.
Thus, according to another preferred embodiment of the invention, the method provided further comprises a step of determining at least one portion of the printed document from among a plurality of other portions associated with that printed document for marking the one or more identified items that correspond to one or more strings of the at least one identified string, so that a reader of the printed document is able to recognize the one or more associated items for which media objects may be retrieved.
Preferably, the step of identifying at least one portion of the printed document comprises:
(i) providing a plurality of portions (e.g. each being one or more sections or parts of sections) comprised in the one or more digital representations;
(ii) selecting a first portion from among said plurality of portions;
(iii) carrying out a comparative analysis between said first portion and at least one other portion selected from among said plurality of portions;
(iv) based on the analysis results, determining a similarity factor for the selected first portion; and
(v) if the similarity factor of the first portion is lower than a pre-defined value, allowing the association of one or more identification marks with the one or more items comprised within said first portion.
Still preferably, if the similarity factor of the first portion is higher than the pre-defined value, applying one or more of the following steps:
a) adding an identification mark to said selected first portion and repeating steps (iii) to (v);
b) selecting another item which appears in that first portion and could be associated with an identification mark, and repeating steps (iii) to (v);
c) selecting another portion that comprises at least one item included in said first portion and repeating steps (iii) to (v);
d) associating an identification mark with another portion which belongs to the same article in the printed document and repeating steps (iii) to (v) for said other portion; or
e) adding one or more unique symbols from a pre-defined library to said selected first portion and repeating steps (iii) to (v) for said other portion.
By yet another embodiment, the method provided further comprises repeating steps (ii) to (v) for each of the plurality of portions, thereby establishing in which of said plurality of portions it would be possible to associate the one or more identification marks.
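Steps (i) to (v), repeated for every portion, together with fallback c) (choosing another portion that contains the item), can be sketched as below. The word-overlap measure is only a toy stand-in for an image-matching method such as SIFT, treating pairwise overlap as a confusion probability is an assumption, and the threshold of 0.5 is hypothetical.

```python
import re
from math import prod
from typing import Optional


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def pairwise_confusion(a: str, b: str) -> float:
    """Toy stand-in for an image matcher: Jaccard overlap of the word sets."""
    wa, wb = _words(a), _words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def similarity_factor(pid: str, portions: dict[str, str]) -> float:
    """Steps (iii)-(iv): 1 - product of P(not confused) over all other portions."""
    return 1.0 - prod(1.0 - pairwise_confusion(portions[pid], portions[other])
                      for other in portions if other != pid)


def markable_portions(portions: dict[str, str], threshold: float = 0.5) -> list[str]:
    """Steps (ii)-(v) repeated for each portion: keep those below the threshold."""
    return [pid for pid in portions if similarity_factor(pid, portions) < threshold]


def choose_portion_for_item(item: str, portions: dict[str, str],
                            threshold: float = 0.5) -> Optional[str]:
    """Fallback c): pick the first markable portion that contains the item."""
    for pid in markable_portions(portions, threshold):
        if item.lower() in _words(portions[pid]):
            return pid
    return None


# Hypothetical portions: two near-identical mentions of the player, one title
portions = {
    "name-par": "Cohen scores twice in derby",
    "name-caption": "Cohen scores twice in final",
    "title": "Cohen stuns derby rivals with late double",
}
print(markable_portions(portions))                   # ['title']
print(choose_portion_for_item("Cohen", portions))    # title
```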
According to another aspect of the invention there is provided an apparatus (e.g. a server) for enabling the provisioning of one or more media objects associated with at least one portion of a printed document, to a user of a mobile communication device. The apparatus comprises the following: receiving means adapted to receive one or more digital representations of the at least one portion of a printed document; processing means adapted to: retrieve from the one or more digital representations, a plurality of strings each comprising at least one word; identify at least one of the plurality of strings which can be associated with one or more retrievable media objects, and wherein the at least one identified string has not been marked as being a string associated with any retrievable media object in the one or more digital representations received; and transmission means operative to enable provisioning at least one media object associated with one or more items comprised in the at least one portion of the printed document, wherein the one or more items correspond to one or more strings of the at least one identified string.
In accordance with an embodiment of this aspect of the invention, the receiving means is adapted to receive a captured image of the at least one portion of the printed document taken by the user of the mobile communication device, and wherein the processing means is adapted to identify the at least one portion of the printed document whose image was captured, and to retrieve one or more items associated with a respective portion of the printed document.
By another preferred embodiment, the transmission means is operative to forward indications to enable recognizing the one or more items in the printed document which correspond to one or more strings of the at least one identified string.
Preferably, the apparatus provided further comprises storage means for storing:
o at least one electronic file comprising one or more digital representations of a printed document, wherein the printed document comprises a plurality of printed portions and a plurality of printed objects associated with at least some of the printed portions;
o information related to the direct and/or indirect association of each of the plurality of portions comprised in the printed document with respective printed objects of the plurality of printed objects; and
o a plurality of media objects associated with said plurality of printed objects.
According to still another embodiment, the processing means is operative to identify the at least one of the plurality of strings by establishing that it matches one or more corresponding strings comprised in a database and is associated with one or more retrievable media objects.
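One possible in-memory layout for the stored information is sketched below with Python dataclasses. The class and field names are hypothetical, and only direct portion-to-object associations are modelled; indirect associations (e.g. via an article) would need an additional mapping.

```python
from dataclasses import dataclass, field


@dataclass
class PrintedObject:
    """An item/printed object and the media objects linked to it."""
    object_id: str
    text: str
    media_objects: list[str] = field(default_factory=list)  # e.g. clip or page URLs


@dataclass
class Portion:
    """A printed portion and the printed objects directly associated with it."""
    portion_id: str
    object_ids: list[str] = field(default_factory=list)


@dataclass
class ElectronicFile:
    """Digital representation of one printed document."""
    document_id: str
    portions: dict[str, Portion] = field(default_factory=dict)
    objects: dict[str, PrintedObject] = field(default_factory=dict)

    def media_for_portion(self, portion_id: str) -> list[str]:
        """Collect the media objects reachable from a given portion."""
        portion = self.portions.get(portion_id)
        if portion is None:
            return []
        media: list[str] = []
        for oid in portion.object_ids:
            media.extend(self.objects[oid].media_objects)
        return media


edition = ElectronicFile(
    document_id="daily-edition",
    portions={"sport-1": Portion("sport-1", ["o1", "o2"])},
    objects={
        "o1": PrintedObject("o1", "Cohen", ["goal_clip.mp4"]),
        "o2": PrintedObject("o2", "United", ["club_page"]),
    },
)
print(edition.media_for_portion("sport-1"))  # ['goal_clip.mp4', 'club_page']
```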
According to yet another preferred embodiment of the present invention the processing means is operative to identify the at least one of said plurality of strings by establishing that the at least one of the plurality of strings is associated with one or more media objects that have been found to be of interest to Internet users.
Preferably, the one or more media objects that have been found to be of interest to Internet users are one or more media objects that have frequently been visited by Internet users. In addition or in the alternative, the at least one of the plurality of strings that has been identified as being associated with one or more media objects is associated with media objects selected from among a pre-defined plurality of media objects.
According to another preferred embodiment, the transmission means is operative to forward to a service provider one or more indications associated with the at least one identified string to enable the provisioning of at least one corresponding media object.
In accordance with still another preferred embodiment, the transmission means is operative to forward to the user of the mobile communication device the at least one media object associated with the one or more items comprised in a captured image of the at least one portion of the printed document.
In accordance with still another aspect of the present invention, there is provided a system for enabling the provisioning of one or more media objects associated with at least one portion of a printed document to a user of a mobile communication device, wherein the system comprises: receiving means adapted to receive one or more digital representations of the at least one portion of a printed document; one or more processors each adapted to carry out one or more of the following operations: retrieve from the one or more digital representations, a plurality of strings each comprising at least one word; identify at least one of the plurality of strings which can be associated with one or more retrievable media objects, and wherein the at least one identified string has not been marked as a string associated with one or more retrievable media objects in the one or more digital representations received; and transmission means operative to enable the provisioning of at least one media object associated with one or more items comprised in the at least one portion of the printed document, wherein the one or more items correspond to one or more strings of the at least one identified string.
According to yet another embodiment of the invention, the system provided comprises receiving means that are adapted to receive an image, captured by a user of the mobile communication device, of at least one portion of the printed document, and the one or more processors are adapted to identify the at least one portion of the printed document whose image was captured, and to retrieve one or more items associated with a respective portion of the printed document.
In accordance with another preferred embodiment of the invention, the one or more processors are adapted to provide identification of one or more items in the printed document which correspond to one or more strings of the at least one identified string.
According to still another embodiment of the invention the one or more processors are adapted to insert marking of one or more of the items which correspond to one or more strings of the at least one identified string in the digital representation of the printed document.
According to yet another preferred embodiment, the one or more processors are adapted to match the plurality of strings with strings comprised in a database. In addition or in the alternative, the one or more processors are adapted to match the plurality of strings with strings associated with media objects that have been found to be of interest to Internet users. Preferably, the media objects that have been found to be of interest to Internet users are media objects that had been frequently visited by Internet users. In addition or in another alternative, the one or more processors are adapted to match the plurality of strings with strings associated with pre-defined media objects.
By still another embodiment, the one or more processors are adapted to determine at least one portion of the printed document, from among a plurality of other portions associated with the printed document, in which to insert marking of the one or more of the items that correspond to one or more strings of the at least one identified string, so that a reader of the printed document is able to recognize the one or more identified items for which media objects may be retrieved.
Preferably, the determination of the at least one portion of the printed document by the one or more processors comprises:
(i) providing a plurality of portions comprised in the printed document;
(ii) selecting a first portion associated with the printed document;
(iii) carrying out a comparative analysis between the first portion and at least one other portion selected from among the plurality of portions;
(iv) based on the analysis results, determining a similarity factor for the selected first portion; and
(v) if the similarity factor of the first portion is lower than a pre-defined value, allowing the association of one or more identification marks with the one or more items comprised within the first portion.
Preferably, if the similarity factor of the first portion is higher than the pre-defined value, associating another item which appears in the first portion with an identification mark and repeating steps (iii) to (v) .
In the alternative, if the similarity factor of the first portion is higher than the pre-defined value, applying one or more of the following:
a) adding an identification mark to the selected first portion and repeating steps (iii) to (v);
b) selecting another item which appears in that first printed portion and could be associated with an identification mark, and repeating steps (iii) to (v);
c) associating an identification mark with another portion which belongs to the same article of the printed document and repeating steps (iii) to (v) for said other portion; or
d) adding one or more unique symbols from a pre-defined library to said selected first portion and repeating steps (iii) to (v) for said other portion.
According to another aspect of the invention there is provided a computer program product encoding a computer program stored on a non-transitory computer readable storage medium for executing a set of instructions by a computer system comprising one or more computer processors for carrying out a process for enabling the provisioning of one or more media objects associated with a printed document to a user of a mobile communication device, wherein the process comprises the steps of: retrieving a plurality of strings from one or more digital representations provided, wherein each of the plurality of strings comprises at least one word; identifying at least one of the plurality of strings which can be associated with one or more retrievable media objects, and wherein the at least one identified string had not been marked as a string associated with one or more retrievable media objects in the one or more digital representations provided; and enabling the provisioning of at least one media object associated with one or more items comprised in the at least one portion of the printed document, to the user, wherein the one or more items correspond to one or more strings of the at least one identified string.
According to another embodiment of this aspect of the invention, the process further comprises a step of matching the one or more strings with strings associated with media objects that have been found to be of interest to a plurality of users, such as Internet users.
Brief Description of the Drawing
For a more complete understanding of the present invention, reference is made to the following detailed description taken in conjunction with the accompanying drawing, wherein:
FIG. 1 presents an example of carrying out the method according to an embodiment of the present invention;
FIG. 2 presents another example of carrying out the method according to another embodiment of the present invention; and
FIG. 3 exemplifies a complementary embodiment to the one illustrated in FIG. 2.
Detailed Description of the Invention
The following are examples demonstrating certain ways of carrying out embodiments of the present invention, by which a reader may retrieve media objects associated with an article he is reading or has just read, or for an editor of a printed document to take a decision of where to insert identification marks so that when a reader of that printed document will be able to send an image of the portion of the printed document that comprises such an identification mark, which is associated with a media object, so that portion is may preferably be relatively easily recognizable from among the rest of that printed document.
Let us now consider a first example, in which a reader of a newspaper wishes to retrieve a media object associated with an article he has just read. Prior to distributing the newspaper to the readers, an electronic file, which is the digital representation of the newspaper (or a part thereof) to be printed, is sent to a service provider in Word format, PDF format, text format or any other applicable format (step 10).
The received electronic file is processed by the service provider using any method known in the art per se, such as OCR, word processing and the like, and a plurality of strings is derived therefrom (step 20). Each such string may contain a word, a combination of words, a part of a word or any combination thereof. In addition, a string may contain an image. In this example, the strings are matched against a database of an Internet search engine in order to establish which of the plurality of strings are included in the database. Although it is likely that most, if not all, strings will be found in the database, a screening step is preferably carried out to select only those strings that meet a pre-defined criterion (or a number of criteria). Such a criterion could be a minimum number of entries to a website, the number of times people initiated searches for a specific string, the number of times people initiated searches for a specific string within a pre-defined period of time (e.g. within the last week), etc. (step 30).
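Steps 20 and 30 can be sketched in Python as follows. This is a minimal illustration under assumptions that are not part of the description: strings are taken to be lower-cased word n-grams of one or two words (images and word parts are ignored), and the Internet search engine's statistics are replaced by a hypothetical in-memory table, `search_counts`, giving the number of searches for each string in the pre-defined period. A real deployment would query an actual search service, which the description does not name.

```python
import re
from typing import Iterable


def derive_strings(text: str, max_words: int = 2) -> list[str]:
    """Step 20 (sketch): derive candidate strings as word n-grams of 1..max_words words."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    strings: list[str] = []
    seen: set[str] = set()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            candidate = " ".join(words[i:i + n]).lower()
            if candidate not in seen:  # keep the first occurrence only, preserving order
                seen.add(candidate)
                strings.append(candidate)
    return strings


def screen_strings(strings: Iterable[str], search_counts: dict[str, int],
                   min_searches: int) -> list[str]:
    """Step 30 (sketch): keep strings whose count in the hypothetical search-statistics
    table meets the pre-defined criterion; strings absent from the table count as 0."""
    return [s for s in strings if search_counts.get(s, 0) >= min_searches]


# Usage with made-up statistics:
text = "Smith scored twice as United beat City"
counts = {"smith": 12000, "united": 8000, "scored": 50}
print(screen_strings(derive_strings(text), counts, min_searches=1000))  # prints ['smith', 'united']
```

Keeping the criterion as a plain threshold makes it easy to substitute any of the other criteria mentioned above, such as a minimum number of website entries.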
Upon determining which strings are to be associated with media objects in the article, the service provider associates these strings with items that appear in the electronic file of the newspaper, so that the article becomes associated with selected items which are in turn associated with retrievable media objects (step 40). As will be appreciated, some or all of the selected strings associated with a certain article may in fact be identical to the items associated with that article, for example when the strings are individual words and so are the respective items. Now, let us turn to a reader who has just read the sports section of a newspaper and wishes to receive further information (step 50) associated with a certain soccer game described in that section. The information, received in the form of one or more media objects, could be a video clip of the goals scored, or textual information that would provide the user, if he/she so chooses, with information in addition to that which already appears in the sports section.
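The association of step 40 might be sketched as follows: every whole-word occurrence of a selected string in the article text becomes an item that carries the identifiers of its media objects. The `Item` record, the case-insensitive matching rule and the `media_index` mapping are hypothetical choices made for illustration; the description leaves the data model open.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """An occurrence of a selected string in an article (hypothetical record layout)."""
    article_id: str
    text: str                   # the item as printed in the article
    start: int                  # character offset of the item in the article text
    end: int                    # exclusive end offset
    media_ids: tuple[str, ...]  # identifiers of the retrievable media objects


def associate_items(article_id: str, article_text: str, selected: list[str],
                    media_index: dict[str, list[str]]) -> list[Item]:
    """Step 40 (sketch): link every case-insensitive, whole-word occurrence of each
    selected string to the media objects indexed under that string."""
    items: list[Item] = []
    for s in selected:
        pattern = re.compile(r"\b" + re.escape(s) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(article_text):
            items.append(Item(article_id, m.group(0), m.start(), m.end(),
                              tuple(media_index.get(s, []))))
    return sorted(items, key=lambda item: item.start)


items = associate_items("sport-1", "Smith scored twice. Smith was named man of the match.",
                        ["smith"], {"smith": ["clip-42"]})
for item in items:
    print(item.text, item.start, item.media_ids)
```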
Next, the reader takes a picture of the article describing the soccer game using his/her cellular telephone (step 60) and transmits the captured image to a pre-defined address (e.g. a telephone number of a subscribers' service provider) (step 70).
The captured image is then forwarded for analysis (step 80). If the image comprises extractable text that has a high probability of being recognized by any process known in the art per se (such as OCR), or if it can be recognized by pattern matching of an image, a search is conducted for the unique text/image in the printed-media file of the corresponding daily newspaper. Once the unique text/image is identified, and subsequently the digital file that corresponds to it, that file can be used to obtain the list of items associated with it, from which the user may choose the media objects he/she wishes to retrieve.
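For the case in which the captured image yields readable text, the search for a unique text fragment can be sketched as below. It assumes that OCR has already produced the `fragment` string (the OCR engine itself is not shown) and that the edition is available as a hypothetical mapping from article identifiers to article text. Case and whitespace are normalized because OCR output rarely preserves line breaks, and a fragment that occurs in more than one article is treated as not unique.

```python
import re
from typing import Optional


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so OCR output can be compared with the source file."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_article(fragment: str, edition: dict[str, str]) -> Optional[str]:
    """Return the id of the only article whose text contains the OCR'd fragment.

    Returns None when the fragment is empty, absent, or occurs in more than one
    article; in those cases the text is not unique and another method is needed."""
    needle = normalize(fragment)
    if not needle:
        return None
    hits = [aid for aid, body in edition.items() if needle in normalize(body)]
    return hits[0] if len(hits) == 1 else None


edition = {
    "sport-1": "Smith scored twice\nas United beat City.",
    "news-3": "City council meets today.",
}
print(find_article("SMITH  scored twice as", edition))  # unique match in one article
print(find_article("city", edition))                     # ambiguous, so no article
```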
Typically, however, the image will be of low quality for one or more reasons, such as a low-resolution camera, lack of auto-focus, reflections, backlight, an uneven surface or any other mechanism affecting image quality. In that case a navigational algorithm is initiated. The navigational algorithm identifies the location of the captured image within the printed document, e.g. on which page and in which part of the page the printed object is located, and/or its location within an electronic file representing that printed document, e.g. where the electronic signal representing the printed object(s) is positioned within the electronic signal representing the combination of all the newspaper's articles (step 90). This algorithm preferably allows easy identification of a partial image within a large frame. Once the image is identified, its location (even that of a partial printed object) within the printed document and/or within the electronic file representing it is retrieved, so that the article in which the reader was interested is identified based upon its relative location in the printed newspaper (step 100). Upon identifying the article, the service provider becomes aware of the items that have already been associated with that article and may retrieve for the user the media objects associated with the items included in that article (step 110). Next, the service provider may send the reader the media objects associated with the selected items, a list of the selected items for the user to choose from, or anything else that would allow the user to receive the requested media objects. The media objects may be sent to the same cellular phone number from which the captured image was sent or, once the user has been identified (e.g. through his/her telephone number), to any pre-defined communication address of the user's choice, such as his/her e-mail address.
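The description does not specify the navigational algorithm. As a stand-in, the sketch below uses exhaustive normalized cross-correlation (a swapped-in technique, not the one of the invention) to find where a grey-level photo of part of a page best fits within the rendered page, and then maps that position to an article through a hypothetical layout table of bounding boxes (top, left, bottom, right), as in step 100. Normalized cross-correlation tolerates uniform changes in brightness and contrast but not changes in scale, rotation or perspective. The page raster, the layout and the article identifiers are invented for the example, and numpy is assumed to be available.

```python
from typing import Optional

import numpy as np


def locate_patch(page: np.ndarray, patch: np.ndarray) -> tuple[int, int]:
    """Exhaustive normalized cross-correlation; returns the (row, col) of the top-left
    corner of the window of `page` that best matches `patch`."""
    ph, pw = patch.shape
    p = patch - patch.mean()
    p_norm = np.linalg.norm(p)
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(page.shape[0] - ph + 1):
        for c in range(page.shape[1] - pw + 1):
            w = page[r:r + ph, c:c + pw]
            w = w - w.mean()
            denom = np.linalg.norm(w) * p_norm
            if denom == 0:
                continue  # flat window or flat patch: correlation undefined
            score = float((w * p).sum() / denom)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos


def article_at(row: int, col: int,
               layout: dict[str, tuple[int, int, int, int]]) -> Optional[str]:
    """Return the article whose box (top, left, bottom, right; bottom/right exclusive)
    contains the point, or None if the point lies outside every article."""
    for article_id, (top, left, bottom, right) in layout.items():
        if top <= row < bottom and left <= col < right:
            return article_id
    return None


rng = np.random.default_rng(0)
page = rng.random((30, 40))                # stand-in for a rendered newspaper page
photo = page[5:11, 7:13] * 0.5 + 0.2       # a dimmer "photo" of part of that page
layout = {"sport-1": (0, 0, 30, 20), "news-3": (0, 20, 30, 40)}
r, c = locate_patch(page, photo)
print(article_at(r + photo.shape[0] // 2, c + photo.shape[1] // 2, layout))
```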
As previously discussed, another option of utilizing the present invention is to provide a tool, for example for an editor of a printed document such as a newspaper, a magazine or a book, that enables the editor to insert identification marks. These marks indicate to readers of the printed document that they can retrieve, on their mobile communication device, one or more media objects relevant to the marked items. In this implementation, a reader interested in receiving the media objects simply takes a picture of that part of the printed document (preferably including the identification mark) with his/her camera telephone and has it transmitted to the service provider. Once the service provider receives the image, it identifies the location of the portion whose image was taken and, using prior knowledge of the items associated with that portion of the printed document, establishes which media objects are of interest to the reader. These media objects are then retrieved from the service provider's database and transmitted to the reader's mobile communication device. One concern with this type of operation is how to determine where the identification marks should be inserted in the document to be printed, while ensuring that the marked items are indeed associated with retrievable media objects and, preferably, with media objects that are likely to be of interest to the reader.
Let us now consider the method illustrated in FIG. 2, in which we assume that the editor is working on an electronic file of tomorrow's daily newspaper and wishes to establish where to insert the identification marks. The editor sends the electronic file, which is the digital representation of tomorrow's daily newspaper (or a part thereof), to a service provider in Word format, PDF format, text format or any other applicable format (step 210).
Once the electronic file is received, it is reviewed by the service provider using any method known in the art per se, such as OCR, word processing and the like, and a plurality of strings is derived therefrom (step 220). Each such string may contain a word, a combination of words, a part of a word or any combination thereof. In addition, a string may contain an image. In this example, the strings are matched against a database of an Internet search engine in order to establish which of the plurality of strings are included in the database. Although it is likely that most, if not all, of the strings will be found in the database, a screening process is preferably carried out, leaving only those strings that meet a pre-defined criterion as discussed in the previous example (step 230).
Upon determining the strings that are to be associated with media objects, the service provider associates these strings with items that appear in the electronic file (step 240) and inserts identification marks in the electronic file, thereby marking these items so as to allow a reader of the newspaper to easily recognize the items for which media objects may be retrieved (step 250). The electronic file with the marked items is then returned to the editor and can be used for printing the newspaper (step 260).
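Inserting the identification marks (step 250) can be illustrated on plain text, assuming the item positions are known as character offsets. The `[[` and `]]` markers are hypothetical stand-ins for whatever visual mark (e.g. underlining) the layout format uses; the description does not prescribe one.

```python
def insert_marks(text: str, spans: list[tuple[int, int]],
                 open_mark: str = "[[", close_mark: str = "]]") -> str:
    """Step 250 (sketch): wrap each (start, end) item span in identification markers.

    Spans are applied from right to left so that inserting markers does not shift
    the offsets of spans that have not been processed yet."""
    ordered = sorted(spans)
    for (s1, e1), (s2, _) in zip(ordered, ordered[1:]):
        if s2 < e1:
            raise ValueError(f"items overlap: {s1}-{e1} and one starting at {s2}")
    marked = text
    for start, end in reversed(ordered):
        marked = marked[:start] + open_mark + marked[start:end] + close_mark + marked[end:]
    return marked


print(insert_marks("Smith scored twice as United beat City.", [(22, 28), (0, 5)]))
# prints [[Smith]] scored twice as [[United]] beat City.
```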
FIG. 3 illustrates a further embodiment of the present invention. As previously explained, another concern associated with this type of operation is where the identification of an item should be made (provided, of course, that there are several possible locations for marking the same item) so that the portion of the printed document photographed by the reader can be identified, even when a poor-quality camera is used, under harsh environmental conditions, etc. To minimize the difficulty the service provider might have in recognizing the portion of the printed document included in the image, this embodiment selects, to the extent possible, proper locations within the printed document for inserting the identification marks while the article is being edited and before it is printed.
Let us revert to the steps described in FIG. 2. After receiving the screened list of strings (step 240), one needs to determine where to insert the identification mark for a certain item in a certain paragraph of the newspaper, that paragraph comprising an item associated with a string to which a media object is to be linked. For example, if the paragraph relates to a certain soccer game, the media object may present the goals scored, or it may comprise textual information that would provide the user, if he/she so chooses, with information in addition to that which appears in the printed paragraph. In the above example of the soccer game, let us assume that the editor's first choice is to mark (e.g. to underline) the name of the player who scored a goal.
Next, a portion that contains the name of the player is selected (step 310). The selection can be made manually, e.g. by the editor determining one or more corners of the portion, or automatically, e.g. by selecting a portion in which the underlined name appears at the center. The selected portion is then compared, directly or indirectly, with other portions that will be printed as part of tomorrow's newspaper (step 320). The comparison is preferably made with other portions that are stored in a database, with portions obtained by dividing the electronic file of the newspaper ad hoc into a plurality of printed portions, or with printed portions obtained by any other suitable method. The comparison may use any method known in the art per se for matching or recognizing an image by comparing it with other images, for example SIFT (as described in "Object Recognition from Local Scale-Invariant Features", David G. Lowe, Proc. of the International Conference on Computer Vision (ICCV), Corfu, 1999) or SURF (as described in "SURF: Speeded Up Robust Features", Herbert Bay, Tinne Tuytelaars, Luc Van Gool, Proceedings of the Ninth European Conference on Computer Vision, May 2006). Using such a matching algorithm, the portion of the newspaper to be checked is compared with the other portions comprised in the database, and each comparison yields a grade derived from the algorithm's internal parameters, e.g. the number of good corresponding features. Some matching algorithms are robust to projective transformations of the image (such as zoom, rotation and viewing angle), while others are robust to changes in intensity and to blur. For matching/recognition algorithms that are not robust to the expected distortions, a preprocessing step may be required, such as creating image pyramids in size and/or blur.
One way of calculating the similarity (uniqueness) factor is as follows. Each comparison between the selected portion i and another portion j comprised in the database yields a grade. This grade can be converted, by any appropriate mathematical method known in the art, into an approximation of the probability P(i≠j) that the selected portion is not the checked portion from the database. Assuming that the results of the individual comparisons are independent, the similarity factor of portion i may then be approximated as:
$$S_i = 1 - P(i \neq j)\,P(i \neq k)\cdots = 1 - \prod_{j \neq i} P(i \neq j)$$
The result of this analysis for a given portion indicates the likelihood that a picture of that portion taken by a reader would be confused with one or more other portions of the newspaper (step 330). Let us assume that the indication received for the selected portion is that the probability of confusion is too high (step 340).
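The similarity-factor calculation can be sketched as follows. The description leaves the conversion from a grade to P(i≠j) to "any appropriate mathematical method"; the logistic curve used here, with its `midpoint` and `scale` parameters, is a hypothetical calibration chosen only for illustration. The product then follows S_i = 1 - ∏ P(i≠j) under the same independence assumption.

```python
import math
from typing import Iterable


def grade_to_p_not_same(grade: float, midpoint: float, scale: float) -> float:
    """Hypothetical logistic calibration of a comparison grade (e.g. the number of good
    corresponding features) into P(i != j). Grades well above `midpoint` mean the two
    portions probably show the same content, so P(i != j) approaches 0."""
    z = (grade - midpoint) / scale
    if z >= 0:
        p_same = 1.0 / (1.0 + math.exp(-z))
    else:  # algebraically equal form that avoids overflow for very low grades
        e = math.exp(z)
        p_same = e / (1.0 + e)
    return 1.0 - p_same


def similarity_factor(p_not_same: Iterable[float]) -> float:
    """S_i = 1 - prod_j P(i != j), assuming the individual comparisons are independent."""
    return 1.0 - math.prod(p_not_same)


grades = [3, 5, 38]  # made-up grades of portion i against three other portions
s_i = similarity_factor(grade_to_p_not_same(g, midpoint=40, scale=5) for g in grades)
print(f"similarity factor: {s_i:.3f}")
```

A portion that closely resembles even one other portion drives its factor toward 1, which is the behaviour the comparison step is meant to flag.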
Once the editor receives that indication, he/she reviews the digital representation of the newspaper, selects another portion that contains the selected item (step 370), say the title of the article related to that soccer game, and marks it as before in step 310. The process described above is repeated and the probability of confusion is determined again. One way of continuing the process is for the editor to use a map of the printed document in which all printed portions that are unique when compared with the rest of the printed document's portions are indicated as portions where the editor may insert the identification mark(s). Once the editor chooses one of these printed portions, its similarity factor will be below the predetermined threshold. The editor then confirms his/her choice of the printed portion where the identification mark will be inserted, thereby ensuring that the title will be marked accordingly in tomorrow's newspaper (step 370). Thereafter, a reader who gets the printed document and is interested in receiving the video clip of the goal will see this mark and use the phone camera to take a picture of the part of the newspaper page that includes the title. The picture is forwarded to the service provider, who can then readily identify which paragraph was photographed and provide the reader with the requested media object (step 380).
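The map of unique portions and the editor's fallback to another candidate portion can be sketched as below. The sketch assumes the pairwise probabilities P(i≠j) have already been estimated for every ordered pair of portions; the dictionary layout, the portion names, the example probabilities and the threshold value are all hypothetical.

```python
import math
from typing import Optional


def uniqueness_map(p_not_same: dict[tuple[str, str], float], portions: list[str],
                   threshold: float) -> dict[str, bool]:
    """Flag each portion i as usable for an identification mark when its similarity
    factor S_i = 1 - prod_{j != i} P(i != j) is below the threshold."""
    usable: dict[str, bool] = {}
    for i in portions:
        s_i = 1.0 - math.prod(p_not_same[(i, j)] for j in portions if j != i)
        usable[i] = s_i < threshold
    return usable


def choose_portion(candidates: list[str], usable: dict[str, bool]) -> Optional[str]:
    """Return the first candidate, in the editor's order of preference, that the map
    marks as usable, or None if every candidate is too easily confused."""
    for portion in candidates:
        if usable.get(portion, False):
            return portion
    return None


# Made-up probabilities: the portion with the player's name resembles another portion,
# while the portion with the article title is distinctive.
p = {("name", "title"): 0.95, ("name", "other"): 0.4,
     ("title", "name"): 0.95, ("title", "other"): 0.99,
     ("other", "name"): 0.4, ("other", "title"): 0.99}
usable = uniqueness_map(p, ["name", "title", "other"], threshold=0.1)
print(choose_portion(["name", "title"], usable))  # the name is rejected, the title accepted
```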
It is to be understood that the above description only includes some embodiments of the invention and serves for its illustration. Numerous other ways of carrying out the methods provided by the present invention may be devised by a person skilled in the art without departing from the scope of the invention, and are thus encompassed by the present invention. For example, it should be clear to any person skilled in the art that the functionalities required to carry out the present invention may be divided differently between the mobile device and the server. To name but a few, the processing of the image in order to derive therefrom the items associated with the article whose picture was taken may be carried out by the mobile device or, alternatively, the captured image can be sent to the server where the processing takes place. Similarly, the database used to determine which of the words that appear in the article are considered selected items, for which the user may retrieve the media objects, may reside on the service provider's server or, alternatively, may be uploaded to the mobile device. Any such shifting of functionality from the mobile device to the server and vice versa is a matter of simple selection and can be done without departing from the scope of the present invention.

Claims

1. A method to enable provisioning to a user of a mobile communication device, of one or more media objects associated with a printed document, wherein said method comprises the steps of: providing one or more digital representations of at least one portion of a printed document; retrieving from said one or more digital representations a plurality of strings each comprising at least one word; identifying at least one of said plurality of strings which can be associated with one or more retrievable media objects, and wherein the at least one identified string had not been previously marked as being a string associated with one or more retrievable media objects in the one or more digital representations provided; and enabling the provisioning of at least one media object associated with one or more items comprised in the at least one portion of the printed document, wherein said one or more items correspond to one or more strings of said at least one identified string.
2. The method according to claim 1, further comprising: receiving a captured image of at least one portion of the printed document, identifying the at least one portion of the printed document whose image was captured, identifying the one or more corresponding digital representations, and retrieving said one or more items associated with the one or more corresponding digital representations.
3. The method according to claim 1, wherein the step of identifying at least one string which can be associated with one or more retrievable media objects, comprises matching said plurality of strings with strings comprised in a database.
4. The method according to claim 1, wherein the step of identifying at least one string which can be associated with one or more retrievable media objects, comprises matching said plurality of strings with strings associated with media objects that have been found to be of interest to a plurality of Internet users.
5. The method according to claim 4, wherein said media objects that have been found to be of interest to Internet users are media objects that had been frequently visited by a plurality of Internet users.
6. The method according to claim 1, wherein the step of identifying at least one string which can be associated with one or more retrievable media objects, comprises matching said plurality of strings with strings associated with pre-defined media objects.
7. The method according to claim 2, wherein the step of enabling the provisioning to said user of at least one media object, comprises communicating the at least one media object associated with said one or more items comprised in the captured image of the at least one portion of the printed document, to said user of the mobile communication device.
8. The method according to claim 1, wherein the step of enabling the provisioning to said user of at least one media object, comprises providing identification of one or more items in the printed document which correspond to one or more strings of the at least one identified string.
9. The method according to claim 8, further comprising a step of determining at least one portion of the printed document from among a plurality of other portions associated with said printed document, as being suitable for inserting an identification mark of the one or more items that correspond to one or more strings of the at least one identified string.
10. The method according to claim 9, wherein the step of determining the at least one portion of the printed document comprises: (i) providing a plurality of portions comprised in the one or more digital representations;
(ii) selecting a first portion from among said plurality of portions;
(iii) carrying out a comparative analysis between said first portion and at least one other portion selected from among said plurality of portions;
(iv) based on the analysis results, determining a similarity factor for the selected first portion; and
(v) if the similarity factor of the first portion is lower than a pre-defined value, allowing the association of one or more identification marks with the one or more items comprised within said first portion.
11. The method according to claim 10, wherein if said similarity factor of the first portion is higher than said pre-defined value, applying one or more of the following steps: a) adding an identification mark to said selected first portion and repeating steps (iii) to (v); b) selecting another item which appears in that first portion and could be associated with an identification mark and repeating steps (iii) to (v); c) selecting another portion that comprises at least one item included in said first portion and repeating steps (iii) to (v); d) associating an identification mark with another portion which would belong to the same article in the printed document and repeating steps (iii) to (v) for said another portion; or e) adding one or more unique symbols from a pre-defined library to said selected first portion and repeating steps (iii) to (v) for said other portion.
12. The method according to claim 10, further comprising repeating steps (ii) to (v) for each of said plurality of portions, thereby establishing in which portions out of said plurality of portions, it would be possible to associate said one or more identification marks.
13. A communication apparatus adapted to enable provisioning of one or more media objects associated with at least one portion of a printed document to a user of a mobile communication device, wherein said communication apparatus comprises: receiving means adapted to receive one or more digital representations of the at least one portion of a printed document; processing means adapted to: retrieve from said one or more digital representations, a plurality of strings each comprising at least one word; identify at least one of said plurality of strings which can be associated with one or more retrievable media objects, and wherein the at least one identified string has not been marked as a string associated with one or more retrievable media objects in the one or more digital representations received; and transmission means operative to enable the provisioning to said user of at least one media object associated with one or more items comprised in the at least one portion of the printed document, wherein the one or more items correspond to one or more strings of the at least one identified string.
14. The apparatus according to claim 13, wherein said receiving means is adapted to receive an image captured by said user of the mobile communication device, of the at least one portion of the printed document, and wherein said processing means is adapted to identify the at least one portion of the printed document whose image was captured, and to identify one or more items associated with a respective portion of the printed document.
15. The apparatus according to claim 13, wherein said processing means is operative to identify the at least one of said plurality of strings by establishing that said at least one of said plurality of strings is associated with one or more retrievable media objects that match one or more corresponding strings comprised in a database.
16. The apparatus according to claim 13, wherein said processing means is operative to identify the at least one of said plurality of strings by establishing that said at least one of said plurality of strings is associated with one or more media objects that have been found to be of interest to Internet users.
17. The apparatus according to claim 16, wherein the one or more media objects that have been found to be of interest to Internet users are one or more media objects that had been frequently visited by Internet users.
18. The apparatus according to claim 13, wherein said processing means is operative to identify the at least one of said plurality of strings by establishing that said at least one of said plurality of strings is associated with one or more media objects selected from among a pre-defined plurality of media objects.
19. The apparatus according to claim 13, wherein said transmission means is operative to forward indications to enable recognizing the one or more items in the printed document wherein said one or more items correspond to one or more strings of the at least one identified string.
20. The apparatus according to claim 13, wherein said transmission means is operative to forward to a service provider one or more indications identifying the at least one identified string to enable the provisioning of at least one corresponding media object.
21. The apparatus according to claim 13, wherein said transmission means is operative to forward to said user of the mobile communication device the at least one media object associated with said one or more items comprised in a captured image of the at least one portion of the printed document.
22. A system adapted to enable provisioning of one or more media objects associated with at least one portion of a printed document to a user of a mobile communication device, wherein said system comprises: receiving means adapted to receive one or more digital representations of the at least one portion of a printed document; one or more processors each adapted to carry out one or more of the following operations: retrieve from said one or more digital representations, a plurality of strings each comprising at least one word; identify at least one of said plurality of strings which can be associated with one or more retrievable media objects, and wherein the at least one identified string has not been marked as a string associated with one or more retrievable media objects in the one or more digital representations received; and transmission means operative to enable the provisioning to said user of at least one media object associated with one or more items comprised in the at least one portion of the printed document, wherein the one or more items correspond to one or more strings of the at least one identified string.
23. The system according to claim 22, wherein said one or more processors are adapted to match said plurality of strings with strings comprised in a database.
24. The system according to claim 22, wherein said one or more processors are adapted to match said plurality of strings with strings associated with media objects that had been found to be of interest to Internet users.
25. The system according to claim 24, wherein said media objects that have been found to be of interest to Internet users are media objects that had been frequently visited by Internet users.
26. The system according to claim 22, wherein said one or more processors are adapted to match said plurality of strings with strings associated with pre-defined media objects.
27. The system according to claim 22, wherein said one or more processors are adapted to provide identification of one or more items in the printed document which correspond to one or more strings of the at least one identified string.
28. The system according to claim 22, wherein said one or more processors are adapted to insert marking of one or more of the items which correspond to one or more strings of the at least one identified string in the digital representation of the printed document.
29. The system according to claim 22, which comprises receiving means that are adapted to receive an image captured by a user of the mobile communication device of at least one portion of the printed document, and wherein said one or more processors are adapted to identify the at least one portion of the printed document whose image was captured, and to retrieve one or more digital items associated with a respective portion of the printed document.
30. A computer program product encoding a computer program stored on a non-transitory computer readable storage medium for executing a set of instructions by a computer system comprising one or more computer processors for carrying out a process for enabling the provisioning of one or more media objects associated with a printed document to a user of a mobile communication device, wherein said process comprises the steps of: retrieving a plurality of strings from one or more digital representations provided, wherein each of said plurality of strings comprises at least one word; identifying at least one of said plurality of strings which can be associated with one or more retrievable media objects, and wherein the at least one identified string had not been marked as a string associated with one or more retrievable media objects in the one or more digital representations provided; and enabling the provisioning of at least one media object associated with one or more items comprised in the at least one portion of the printed document, to the user, wherein the one or more items correspond to one or more strings of the at least one identified string.
31. The computer program product according to claim 30, wherein said process further comprises a step of matching said one or more strings with strings associated with media objects that had been found to be of interest to a plurality of Internet users.
PCT/IL2010/000089 2009-02-04 2010-02-02 A method and means for identifying items in a printed document associated with media objects WO2010089736A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IL196899 2009-02-04
IL196899A IL196899A0 (en) 2009-02-04 2009-02-04 Method, apparatus and system for identifying items in a printed document associated with media objects

Publications (2)

Publication Number Publication Date
WO2010089736A2 true WO2010089736A2 (en) 2010-08-12
WO2010089736A3 WO2010089736A3 (en) 2010-11-11

Family

ID=42113526

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2010/000089 WO2010089736A2 (en) 2009-02-04 2010-02-02 A method and means for identifying items in a printed document associated with media objects

Country Status (2)

Country Link
IL (1) IL196899A0 (en)
WO (1) WO2010089736A2 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090046320A1 (en) 2007-08-19 2009-02-19 Xsights Media Ltd. Method and apparatus for forwarding media...
WO2009104193A1 (en) 2008-02-24 2009-08-27 Xsights Media Ltd. Provisioning of media objects associated with printed documents
WO2009147675A1 (en) 2008-06-05 2009-12-10 Xsights Media Ltd. Method and device for inserting identification marks in a printed document
WO2010001389A1 (en) 2008-07-02 2010-01-07 Xsights Media Ltd. A method and a system for identifying a printed object

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7239747B2 (en) * 2002-01-24 2007-07-03 Chatterbox Systems, Inc. Method and system for locating position in printed texts and delivering multimedia information
US20070226321A1 (en) * 2006-03-23 2007-09-27 R R Donnelley & Sons Company Image based document access and related systems, methods, and devices

Cited By (2)

Publication number Priority date Publication date Assignee Title
FR3019669A1 * 2014-04-08 2015-10-09 Yves Roubinet METHOD AND DEVICE FOR ACCESSING LOCAL INFORMATION BY PLAN RECOGNITION
WO2015155478A1 * 2014-04-08 2015-10-15 Roubinet Yves Method and device for accessing local information by means of map recognition

Also Published As

Publication number Publication date
WO2010089736A3 2010-11-11
IL196899A0 2009-12-24

Similar Documents

Publication Publication Date Title
US9075779B2 (en) Performing actions based on capturing information from rendered documents, such as documents under copyright
US8418055B2 (en) Identifying a document by performing spectral analysis on the contents of the document
EP2409269B1 (en) Associating rendered advertisements with digital content
US7920759B2 (en) Triggering applications for distributed action execution and use of mixed media recognition as a control input
US8315465B1 (en) Effective feature classification in images
US20110044512A1 (en) Automatic Image Tagging
US20140254942A1 (en) Systems and methods for obtaining information based on an image
CN102855298B (en) Image search method and system
US20170132225A1 (en) Storing and retrieving associated information with a digital image
WO2010105244A2 (en) Performing actions based on capturing information from rendered documents, such as documents under copyright
CN107111618B (en) Linking thumbnails of images to web pages
US20070239848A1 (en) Uniform resource locator vectors
CN109756760A (en) Generation method, device and the server of video tab
US20150186739A1 (en) Method and system of identifying an entity from a digital image of a physical text
US9256805B2 (en) Method and system of identifying an entity from a digital image of a physical text
EP2028588A2 (en) Method and apparatus for forwarding media objects to a cellular telephone user
US9081801B2 (en) Metadata supersets for matching images
US20210019511A1 (en) Systems and methods for extracting data from an image
US20060167899A1 (en) Meta-data generating apparatus
US7925121B2 (en) Theme-based batch processing of a collection of images
CN108921193B (en) Picture input method, server and computer storage medium
US7286722B2 (en) Memo image managing apparatus, memo image managing system and memo image managing method
CN108230220A (en) Watermark adding method and device
US20200186668A1 (en) Method and device for recommending watermark for electronic terminal
CN111079777B (en) Page positioning-based click-to-read method and electronic equipment

Legal Events

Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 10712164; country of ref document: EP; kind code of ref document: A2

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: PCT application non-entry in European phase
Ref document number: 10712164; country of ref document: EP; kind code of ref document: A2