WO2013138846A1 - Method and system of interacting with content disposed on substrates


Info

Publication number
WO2013138846A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
page
user
viewing device
images
Prior art date
Application number
PCT/AU2013/000256
Other languages
French (fr)
Inventor
Kia Silverbrook
Jonathon Leigh Napper
Timothy Merrick Long
Robert Dugald Gates
Christopher Wooldridge
Christopher David OWEN
Jane Lesley CHILDS
Dimitrios Koubaroulis
Quentin Barrie GOLDFINCH
Jason Wayne SANKEY
Peter Malcolm Roberts
Original Assignee
Silverbrook Research Pty Ltd
Priority date
Filing date
Publication date
Application filed by Silverbrook Research Pty Ltd filed Critical Silverbrook Research Pty Ltd
Publication of WO2013138846A1 publication Critical patent/WO2013138846A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G06V30/1448 Selective acquisition, locating or processing of specific regions based on markings or identifiers characterising the document or the area
    • G06V30/1452 Selective acquisition, locating or processing of specific regions based on positionally close symbols, e.g. amount sign or URL-specific characters

Definitions

  • the present disclosure relates generally to a system, method, and device for interacting with printed content using a viewing device. More specifically, the present disclosure relates to a system, method, and device for allowing users and printed content to interact in a dynamic manner such that the printed content is no longer considered static and unchanging by the user.
  • Figs. 1, 2 and 3 illustrate a Netpage viewer 50 described in detail in US 6,788,293.
  • the Netpage viewer 50 has an image sensor 51 positioned on its lower side for capturing images of the encoded pattern printed on a Netpage 1.
  • a display screen 52 on its upper side displays content 6 to the user.
  • the coding pattern consists of Netpage tags 4 tiled across the Netpage 1.
  • Fig. 3 shows the Netpage viewer 50 placed on the Netpage 1 in order to sense the Netpage tags 4 shown in Figure 4.
  • the image sensor 51 senses one or more of the tags 4, decodes the coded information and transmits this decoded information to the Netpage system via a transceiver (not shown).
  • the Netpage system retrieves a page description corresponding to the page ID encoded in the sensed tag 4 and sends the page description (or corresponding display content 5) to the Netpage viewer 50 for display on the screen 52.
  • the Netpage 1 has human readable text and/or graphics 5, and the Netpage viewer 50 provides the user with the experience of virtual transparency.
  • the content 5 displayed on the display screen 52 of the Netpage viewer 50 matches with the underlying printed content, so that the display screen 52 appears to be transparent from a user's perspective. Since the content 5 displayed on the Netpage viewer 50 is rendered from downloaded digital data, this information may be enhanced with additional interactive functionality. For example, hyperlinks may be displayed which are interactive via touchscreen interactions. Other functions, such as magnification, translation, playing video, playing audio, filling-in forms etc. are all described in US 6,788,293.
  • Since each Netpage tag 4 of the coding pattern incorporates data identifying the page ID and its own location on the Netpage 1, the Netpage system can determine the location of the Netpage viewer 50 relative to the Netpage 1 and so can extract information corresponding to that position. Additionally, the Netpage tags 4 include information which enables the Netpage viewer 50 to derive its orientation relative to the Netpage 1. This enables the displayed content 6 to be rotated relative to the Netpage viewer 50 so as to match the orientation of the text 5. Thus, information displayed by the Netpage viewer 50 may be aligned with content 6 printed on the Netpage 1, as shown in Fig. 3, irrespective of the orientation of the Netpage viewer 50.
  • As the Netpage viewer 50 is moved across the Netpage 1, the image sensor 51 images the same or different tags 4. This enables the Netpage viewer 50 and/or system to update the viewer's relative position on the Netpage 1 and to scroll the display 52 as the Netpage viewer 50 moves.
  • the position of the Netpage viewer 50 relative to the Netpage 1 can easily be determined from the image of a single Netpage tag 4; as the Netpage viewer 50 moves the image of the Netpage tag 4 changes, and from this change in image, the position relative to the Netpage tag 4 can be determined.
  • the Netpage viewer 50 provides users with a richer experience of printed substrates.
  • the Netpage viewer 50 typically relies on detection of Netpage tags 4 for identifying a Netpage 1 identity and position in order to provide the functionality described above.
  • In order for the Netpage coding pattern to be invisible (or at least nearly invisible), it is necessary to print the coding pattern with customized invisible IR inks, such as those described by the present Applicant in US 7,148,345.
  • the Applicant has recognized the desirability of providing at least some functionality of the Netpage viewer 50 without the requirement for a customized Netpage tag-reading device.
  • Smartphones are now ubiquitous and have many of the attributes required for providing users with the experience of virtual reality via imaging of printed substrates. These attributes include a high-resolution camera, fast processing speeds, a color touchscreen display and high-speed internet connectivity.
  • the Applicant described a microscope attachment 61 for a smartphone.
  • the microscope attachment may be in the form of a sleeve 60 incorporating microscope optics 62.
  • the sleeve additionally serves as a protective cover for the smartphone.
  • a conventional smartphone may be used as a Netpage viewer 50 when placed in contact with a surface of a page having a Netpage coding pattern printed thereon.
  • the microscope optics 62 may include an IR phosphor placed in front of the smartphone's flash so as to illuminate Netpage tags 4 printed in IR ink.
  • With the smartphone suitably configured for decoding the Netpage coding pattern and rendering received digital display data in real-time, the smartphone can effectively have the same functionality as the customized Netpage viewer 50.
  • the user simply requires the microscope attachment 61 and suitable software in order to read Netpage tags 4 and provide an experience of virtual reality with printed substrates.
  • Although microscope attachments for smartphones enable similar functionality to the Netpage viewer 50, it would be desirable to provide this functionality without the requirement for printed Netpage tags 4.
  • Magazine publishers, for example, are typically reluctant to incorporate special coding patterns into each page of a magazine due to the additional cost of IR ink and the visual impact of the coding pattern, even if it is virtually imperceptible to the human eye.
  • Page fragment recognition uses a server-side index of rotationally-invariant fragment features, a client- or server-side extraction of features from captured images and a multi-dimensional index lookup.
  • a well-known algorithm for performing page fragment recognition using rotationally-invariant fragment features is known as 'SIFT' (Scale-Invariant Feature Transform; see US 6,711,293, the contents of which are herein incorporated by reference).
  • a method of interacting with a substrate comprises the steps of: capturing an image of content disposed on the substrate; identifying the substrate using the captured image; retrieving digital content corresponding to the identified substrate; and displaying the digital content on a display screen of a viewing device, wherein the displayed digital content is a virtual reality and/or augmented reality view, referred to herein as a 'digital twin' (described below) of the imaged content, the digital twin having at least one interactive element for user interaction.
  • the interactive elements may be hyperlinks, video/audio playback options etc.
  • a method of disambiguating a plurality of possible publications identified using an image-matching process comprises the steps of: acquiring contextual and/or other non-image information relating to an interaction between a viewing device and a page of a viewed publication; and disambiguating the plurality of possible publications using the contextual and/or other non-image information to provide a most likely publication corresponding to the viewed publication.
  • the contextual information relates to a user and is identified via a viewing device ID or user ID.
  • the viewing device ID or user ID is retrieved using at least one of: a viewing device ID stored in the viewing device; an NFC tag or token associated with the user; facial recognition (e.g. via a user-facing camera of the viewing device); iris recognition (e.g. via a user-facing camera of the viewing device); voice recognition (e.g. via a microphone of the viewing device); a password entered via a user interface of the viewing device; a signature (e.g. a signature captured as digital ink); a fingerprint (e.g. via a high-resolution touchscreen of the viewing device or a custom biometric sensor).
  • the contextual information comprises at least one of: visual continuity in respect of the interaction between the viewing device and the page; a time period elapsed between a previous interaction and the current interaction; a favorites list generated by the user; a browsing history of the user; user subscription information; a preferred language of the user; publications associated with the user's demographics; publications associated with the user's geographical location; publications having a circulation exceeding a predetermined number; most popular viewed pages; publications published within a predetermined period relative to a current date; a manual indication from the user; and a type of viewing app installed in the viewing device.
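As an illustration of how the contextual signals listed above might be combined, the following sketch ranks candidate publications by boosting an image-match score with contextual evidence. The weights, field names and the Context/Candidate structures are hypothetical and are not taken from the specification.

```python
# Illustrative sketch only: one way contextual signals could disambiguate
# candidate publications returned by the image-matching stage.
from dataclasses import dataclass, field

@dataclass
class Context:
    recently_viewed: set = field(default_factory=set)   # publication IDs seen recently
    favourites: set = field(default_factory=set)        # user's favourites list
    preferred_language: str = "en"
    user_region: str = "AU"

@dataclass
class Candidate:
    pub_id: str
    language: str
    region: str
    circulation: int
    image_match_score: float   # similarity score from the image-matching stage

def disambiguate(candidates, ctx, min_circulation=10_000):
    """Return the most likely publication by boosting image-match scores
    with contextual evidence (continuity, favourites, language, etc.)."""
    def score(c):
        s = c.image_match_score
        if c.pub_id in ctx.recently_viewed:
            s += 2.0        # visual/temporal continuity with a recent interaction
        if c.pub_id in ctx.favourites:
            s += 1.0
        if c.language == ctx.preferred_language:
            s += 0.5
        if c.region == ctx.user_region:
            s += 0.5
        if c.circulation >= min_circulation:
            s += 0.25       # prefer widely circulated titles when otherwise tied
        return s
    return max(candidates, key=score)
```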
  • a method of optimizing an image-matching process comprises the steps of: acquiring contextual and/or other non-image information relating to an interaction between a viewing device and a page of a viewed publication; using the contextual and/or other non-image information to identify one or more possible publications; and searching the identified possible publications for an image match in respect of the page of the viewed publication.
  • a method of optimizing an image-matching process comprises the steps of: displaying to a user one or more publication identifiers; the user selecting one of said publication identifiers, the selected publication identifier corresponding to a publication for viewing by the user; capturing an image of a page from the publication; searching a database corresponding to the publication for an image match in respect of the imaged page, thereby to identify the imaged page.
  • the publication identifiers displayed to the user are identified using contextual information.
  • the contextual information comprises at least one of: a favorites list generated by the user; a browsing history of the user; publications associated with the user's demographics; publications associated with the user's geographical location; publications having a circulation exceeding a predetermined number; and publications published within a predetermined period relative to a current date.
  • the publication identifiers identify individual bound issues or generic publication titles.
  • In another aspect, there is provided a cover match approach for optimizing an image-matching process.
  • the method comprises the steps of: capturing a first image of a cover of a publication; identifying the publication by matching features of the captured first image with a reference image contained in a first database of cover pages; capturing a second image of content disposed on a page within the publication; identifying the page by matching features of the captured second image with a reference image contained in a second database corresponding to the identified publication.
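A minimal sketch of this two-stage cover-match lookup is shown below. The database layout (a cover database plus a per-publication page database) and the toy feature matcher are assumptions made for illustration only.

```python
# Two-stage lookup: identify the publication from its cover, then search only
# that publication's (much smaller) page database for the inner page.
def match_features(query_features, reference_db):
    """Return (best_reference_id, score) for the query against a database.
    Placeholder for a real descriptor-index lookup; features here are sets of
    quantized descriptors and the score is a toy overlap count."""
    best_id, best_score = None, 0
    for ref_id, ref_features in reference_db.items():
        score = len(query_features & ref_features)
        if score > best_score:
            best_id, best_score = ref_id, score
    return best_id, best_score

def identify_page(cover_features, page_features, cover_db, page_dbs, min_score=10):
    # Stage 1: identify the publication from the captured cover image.
    pub_id, score = match_features(cover_features, cover_db)
    if pub_id is None or score < min_score:
        return None
    # Stage 2: identify the page within that publication only.
    page_id, score = match_features(page_features, page_dbs[pub_id])
    return (pub_id, page_id) if score >= min_score else (pub_id, None)
```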
  • a method of providing a virtual reality display to a user comprises the steps of: displaying a live video image of a viewed page on a display screen of the viewing device; identifying the viewed page; retrieving digital data corresponding to the viewed page; and rendering digital content to the display screen based on the digital data, wherein the rendered digital content is cross-faded with the live video image such that the user experiences an apparently seamless transition between the live video image and the rendered digital content.
  • a hybrid method of providing a virtual reality and/or augmented reality view of a printed page of a publication comprises the steps of: identifying the publication via a technique which does not rely on image-matching in respect of printed content; retrieving a set of reference images corresponding to the identified publication; capturing an image of printed content on the page using a camera of a viewing device; using the captured camera image to identify at least one of: the imaged page from the set of reference images; a location of the viewing device relative to the page; an orientation of the viewing device relative to the page; and a projection transform identifying a pose of the viewing device relative to the imaged page; retrieving digital data corresponding to the imaged page; and rendering digital content to the display screen based on the retrieved digital data, wherein the digital content is displayed as a virtual reality and/or augmented reality view of the printed page.
  • the technique for identifying the publication and/or page is selected from the group consisting of: reading a barcode associated with the publication and/or page, reading a two dimensional encoded tag associated with the publication and/or page, reading an RFID tag associated with the publication and/or page, reading an NFC tag associated with the publication and/or page, reading a steganographic code associated with the publication and/or page, manual or oral selection of the publication by a user.
  • In one embodiment, a publication (e.g. a magazine) is identified, a set of reference images corresponding to individual pages of the publication is identified, and the captured image is used for page recognition using the set of reference images.
  • a method of attaching a media object to a page is provided.
  • the method comprises the steps of: capturing an image of content disposed on a printed page; identifying the printed page using the captured image; retrieving digital content corresponding to the identified page; displaying the digital content as a virtual and/or augmented reality display on a display screen of a viewing device; identifying the media object for attachment; a user interacting with the media object to indicate attachment to the displayed digital content; and updating a digital description corresponding to the printed page, wherein the attached media object is retrievable via the updated digital description of the printed page.
  • the media object is selected from a group consisting of: photo, video, audio, text, drawings, digital ink, hyperlinks, files, apps, ratings.
  • the updated digital description containing the attached media object is retrievable by a third party unknown to the user.
  • an optically imaging pen comprising: a first camera for capturing a first image of printed content on a page when the pen is held at a height above the page; and a second camera for capturing second images of the printed content when a nib of the pen is in contact with the page, wherein the first image is used to identify the page using image-matching techniques, and the second images are used for generating digital ink representing movement of the pen relative to the page.
  • the pen comprises a nib force sensor for sensing when the nib is in contact with the page.
  • the second camera is activated when the nib force sensor indicates that the nib is in contact with the page.
  • the first and second cameras are the same camera, which is reconfigured when the nib force sensor indicates that the nib is in contact with the page.
  • a method of capturing user interactions with content disposed on a page comprises the steps of: capturing, using a view-facing camera, an image of printed content disposed on the page; identifying the printed page using the captured image; retrieving digital content corresponding to the identified page; displaying the digital content as a virtual and/or augmented reality display on a display screen of a viewing device; interacting with the printed content disposed on the page, for example, using a finger; capturing the interaction using the view-facing camera; and interpreting the interaction as it relates to the displayed digital content.
  • the interaction with the printed content initiates an interactive function, such as hyperlinking.
  • the interaction with the printed content is interpreted as a gesture, wherein different gestures initiate different interactive functions.
  • the interaction is interpreted as digital ink.
  • a method of interacting with content disposed on a page comprises the steps of: capturing, using a view-facing camera of a viewing device, an image of printed content disposed on a page; identifying the printed page using the captured image; retrieving digital content corresponding to the identified page; displaying the digital content as a virtual and/or augmented reality display on a display screen of a viewing device; capturing a user intent in respect of the displayed digital content using at least one of: a user-facing camera, a view-facing camera, a microphone, an internal motion sensor, a touchscreen and manual input device, such as a keyboard; and interpreting the user intent as it relates to the displayed digital content.
  • the user intent is captured via a user-facing camera and the user intent is indicated by at least one of: a gesture (e.g. hand gestures), eye movement, facial expression or a movement of the viewing device relative to the user.
  • the user intent is captured via a microphone and the user intent is indicated by a spoken command which is interpreted using voice recognition techniques.
  • the user intent is captured via a view-facing camera and the user intent is indicated by a hand movement or a movement of the viewing device relative to the printed page.
  • the user intent is captured via an internal motion sensor of the viewing device and the user intent is indicated by motion of the viewing device (e.g. tilt, shake etc).
  • a method of interacting with content disposed on a page and recording user interactivity comprises the steps of: capturing, using a view-facing camera of a viewing device, an image of printed content disposed on a page; identifying the printed page using the captured image; retrieving digital content corresponding to the identified page; displaying the digital content as a virtual and/or augmented reality display on a display screen of a viewing device; a user interacting with the displayed digital content; and recording interactivity data in respect of the user's interactions.
  • the recorded interactivity data includes at least one of: number of page views; page dwell time; article dwell time; click-throughs; searches; clippings extracted from the digital content; purchases; eye track data.
  • At least some of the recorded interactivity data is added to a user profile so as to provide contextual information for future image-matching.
  • a method of clipping digital content comprises the steps of: capturing, using a view-facing camera of a viewing device, an image of printed content disposed on a page; identifying the printed page using the captured image; retrieving digital content corresponding to the identified page; displaying the digital content as a virtual and/or augmented reality display on a display screen of a viewing device; a user interacting with the displayed content to select a clipping; and extracting the clipping from a digital description of the displayed content.
  • the user interaction with the displayed content indicates an extent of the clipping from the digital description.
  • the extent of the clipping is indicated by a representation of a torn edge of a paper page on the displayed content such that the clipping appears to have been ripped from the page.
  • the displayed content has pre-defined clipping regions and the user interaction selects one of the pre-defined clipping regions. In one embodiment, the pre-defined clipping regions overlap. In one embodiment, the pre-defined clipping regions are nested. In one embodiment, the displayed content indicates one of the predefined clipping regions as a target clipping depending on a current position of the image within the page. In one embodiment, the displayed content indicates the extent of the target clipping.
  • the clipping preserves any interactivity associated with a part of the displayed content defined by the clipping.
  • the clipping is shared with a third party such that the third party can experience at least some of the interactivity experienced by the initial user.
  • sharing of the clipping and/or any associated interactivity is subject to digital rights management.
  • In a fourteenth aspect, there is provided a one-pass method for calculating image descriptors.
  • the method comprises the steps of: defining a local image patch; defining a first set of radial zones within the local image patch; sampling pixels in each zone of the first set to determine an orientation of zone boundaries in the first set relative to an orientation of the local image patch; defining a second set of radial zones within the local image patch, each radial zone in the second set comprising a merged plurality of radial zones from the first set; calculating image descriptors using a gradient strength and orientation for each pixel contained in the second set of radial zones, wherein image descriptors derived from differently oriented views of the same image patch are similar.
  • the radial zones in the second set are defined by the intersections of annuli and segments.
  • the second set is defined by 2 annuli and 3 segments to provide 6 radial zones.
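The zone geometry described above (two annuli intersected with three segments, giving six radial zones) can be illustrated as follows. The bin counts, the zone radii and the way the dominant patch orientation is estimated are assumptions; only the six-zone layout and the gradient-strength-weighted accumulation follow the text.

```python
# Hedged illustration of a "two annuli x three segments = six radial zones"
# descriptor for a local image patch.
import numpy as np

def six_zone_descriptor(gmag, gdir, n_orient_bins=8):
    """gmag, gdir: square arrays of gradient magnitude and direction (radians)
    for a local image patch. Returns a 6 x n_orient_bins descriptor, flattened."""
    h, w = gmag.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = np.hypot(ys - cy, xs - cx)
    theta = np.arctan2(ys - cy, xs - cx)

    # Estimate a dominant patch orientation so the zone boundaries (and the
    # orientation bins) rotate with the patch, giving rotation tolerance.
    dominant = np.arctan2(np.sum(gmag * np.sin(gdir)), np.sum(gmag * np.cos(gdir)))

    r_max = min(cy, cx)
    annulus = np.where(r < r_max / 2.0, 0, 1)                        # 2 annuli
    segment = ((theta - dominant) % (2 * np.pi)) // (2 * np.pi / 3)  # 3 segments
    zone = (annulus * 3 + segment).astype(int)                       # 6 zones

    desc = np.zeros((6, n_orient_bins))
    rel_dir = (gdir - dominant) % (2 * np.pi)
    obin = (rel_dir / (2 * np.pi / n_orient_bins)).astype(int) % n_orient_bins
    inside = r <= r_max
    for z, b, m in zip(zone[inside], obin[inside], gmag[inside]):
        desc[z, b] += m                 # gradient-strength-weighted histogram
    desc = desc.flatten()
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc
```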
  • a method for determining a projection transform which employs a greedy algorithm, wherein only a part of a query image is processed to estimate an initial projection transform.
  • the method comprises the steps of: providing an estimated projection transform based on a local cluster of features in one zone of the camera image; using the estimated projection transform to transform other features extracted from the camera image; using the estimated projection transform to transform features of a reference image stored in a database; comparing the transformed features from the camera image with the transformed features from the reference image; filtering results of the comparing step to include only candidate features having a close image descriptor correspondence with features from the reference image; combining the candidate features with the cluster of features used to determine the estimated projection transform; determining if the combined list of features exceeds a predetermined number; and if the predetermined number is exceeded, then assuming that the initial estimated projection transform is correct; or otherwise if the predetermined number is not exceeded, then re-estimating the projection transform using a larger set of features extracted from the camera image.
  • the method is performed on a handheld viewing device having a stored set of reference images.
  • the method is used to validate page recognition in respect of the reference image. In another embodiment, the method is repeated using the re-estimated projection transform if the predetermined number is not exceeded.
  • a new reference image is sought if all estimated projection transforms fail to produce a combined list of features exceeding the predetermined number.
  • the close image descriptor correspondence is determined by a Euclidean distance between image descriptors.
  • the features are transformed by the projection transform to determine at least one of: scale, orientation, X coordinate and Y coordinate in camera space.
  • the projection transform infers a pose of a viewing device relative to the page being viewed.
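A simplified sketch of the greedy verification step is given below: a transform estimated from one local cluster is accepted only if enough additional features agree with it, both spatially and in descriptor distance. The thresholds and helper signatures are illustrative assumptions rather than the claimed implementation.

```python
# Greedy check of an initial projection transform estimated from one cluster.
import numpy as np

def apply_homography(H, pts):
    """Apply a 3x3 projective transform H to an (N, 2) array of points."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def greedy_verify(H_est, cam_feats, ref_feats, ref_index,
                  max_desc_dist=0.3, max_pix_err=5.0, min_inliers=20):
    """cam_feats / ref_feats: lists of (xy, descriptor) pairs, where xy is a
    length-2 sequence and descriptor is a 1-D numpy array.
    ref_index: maps a camera feature index to its candidate reference feature.
    Returns True if the estimated transform is supported by enough features."""
    cam_xy = np.array([f[0] for f in cam_feats], dtype=float)
    proj = apply_homography(H_est, cam_xy)        # camera features into ref space
    inliers = 0
    for i, (_, desc) in enumerate(cam_feats):
        ref_xy, ref_desc = ref_feats[ref_index[i]]
        close_in_space = np.linalg.norm(proj[i] - np.asarray(ref_xy)) < max_pix_err
        close_in_desc = np.linalg.norm(desc - ref_desc) < max_desc_dist
        if close_in_space and close_in_desc:
            inliers += 1                          # candidate supports H_est
    return inliers >= min_inliers                 # else re-estimate with more features
```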
  • a method for calculating a projection transform using only three pairs of extracted correspondence points, each pair of extracted correspondence points comprising a feature extracted from a camera image and a corresponding feature extracted from a reference image stored in a database.
  • the method assumes that the location of an optical axis of a camera is in the centre of the camera image.
  • the projection transform is calculated using the 3 extracted correspondence points and a fourth correspondence point which is calculated based on the assumption regarding the optical axis of the camera. In another embodiment, only some of the extracted correspondence points are true correspondences. In another embodiment, the method is performed on a handheld viewing device.
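A projective transform normally requires four point correspondences; the aspect above supplies three measured pairs and synthesizes a fourth from the assumption that the optical axis passes through the centre of the camera image. The derivation of that fourth pair is not reproduced here, but the standard four-pair direct linear transform it would feed into looks roughly like this (numpy only, for illustration):

```python
import numpy as np

def homography_from_four_pairs(src, dst):
    """src, dst: (4, 2) arrays of corresponding points. Returns a 3x3 matrix H
    such that dst ~ H @ src in homogeneous coordinates."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A (smallest singular value).
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```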
  • a method of page recognition comprises the steps of: providing a database of image features derived from a set of reference images, each image feature being tagged with a corresponding page identifier, wherein similar image features derived from the same reference image are absent from the database; matching an image feature derived from a camera image of a page with the database of image features; and determining one or more candidate reference images from the database, each candidate reference image containing an image feature corresponding to the page imaged by the camera.
  • the method uses contextual and/or other non-image information to reduce the number of candidate reference images (e.g. reducing to one candidate reference image by selecting a reference image contained in a known recently viewed publication)
  • the database is generated by the steps of: clustering all similar image features into clusters; identifying image features having the same page identity in the same cluster; and discarding all except one of the identified image features from each cluster.
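One way the de-duplication step could be realized is sketched below: within each cluster of similar descriptors, only one feature per page is kept, since duplicates from the same page add no discriminative value. The clustering function itself is a placeholder assumption.

```python
# Prune features so that similar descriptors from the same page appear only once.
def deduplicate_features(features, cluster_fn):
    """features: list of dicts like {"desc": ..., "page_id": ...}.
    cluster_fn: assigns a cluster label to each descriptor (e.g. via k-means or
    an approximate nearest-neighbour grouping). Returns the pruned feature list."""
    labels = cluster_fn([f["desc"] for f in features])
    seen = set()           # (cluster_label, page_id) pairs already represented
    kept = []
    for f, label in zip(features, labels):
        key = (label, f["page_id"])
        if key not in seen:
            seen.add(key)
            kept.append(f)  # keep one representative per page within a cluster
    return kept
```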
  • a method of determining a pose of a viewing device relative to a page comprises the steps of: calculating a projection transform by comparing features in a camera image of the page with corresponding features in a reference image of the page; determining a first pose of the viewing device relative to the page using the calculated projection transform; moving the viewing device relative to the page; and determining a second pose of the viewing device relative to the page using the first pose and data from one or more internal motion sensors in the viewing device.
  • the one or more internal motion sensors are selected from the group consisting of: an accelerometer (e.g. a pair of orthogonal accelerometers) and a gyroscope.
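Purely as an illustration of the sensor-assisted step, the sketch below propagates the last image-derived pose using gyroscope and accelerometer readings between camera fixes. The two-dimensional pose representation and the crude integration are assumptions, not the claimed method.

```python
import math
from dataclasses import dataclass

@dataclass
class Pose2D:
    x: float        # position of the device over the page (page units)
    y: float
    heading: float  # rotation about the page normal, in radians

def propagate_pose(pose, gyro_z, accel_xy, dt):
    """Advance the last image-derived pose by one sensor interval.
    gyro_z: angular rate about the page normal (rad/s).
    accel_xy: (ax, ay) acceleration in device axes (units/s^2).
    Velocity is not tracked here; a real tracker would integrate it as well."""
    heading = pose.heading + gyro_z * dt
    # Crude position increment from acceleration, rotated into page axes.
    dx = 0.5 * accel_xy[0] * dt * dt
    dy = 0.5 * accel_xy[1] * dt * dt
    return Pose2D(pose.x + dx * math.cos(heading) - dy * math.sin(heading),
                  pose.y + dx * math.sin(heading) + dy * math.cos(heading),
                  heading)
```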
  • a method of tracking a location and/or orientation of a viewing device relative to a page comprises the steps of: imaging the page using a camera of the viewing device; attempting to calculate a projection transform by comparing features in a camera image of the page with corresponding features in a presumed reference image of the page; if the attempted calculation of the projection transform succeeds, then determining an absolute location and/or orientation of the viewing device relative to the page using the calculated projection transform; and if the attempted calculation of the projection transform fails, then tracking a relative location and/or orientation of the viewing device relative to the page using an optical flow method.
  • the presumed reference image is stored on the viewing device, and the projection transform calculation and optical flow method are performed on the viewing device.
  • the viewing device has a stored cache of potential reference images, and wherein the page recognition process initially searches the stored cache for an image match.
  • the stored cache comprises reference images for one or more of: pages from a publication manually selected by the user; recently viewed pages; pages surrounding a current page (e.g. adjacent page(s) in a publication); an opposite page in a spread if one page from the spread is currently viewed; pages from an article if the article being viewed is contained in a plurality of pages; pages based on a user history; pages based on user demographic information; and pages based on viewing behaviors of other users in relation to the viewed page.
  • the page recognition method is performed in a page server, which is remote from the viewing device.
  • the page recognition process is employed only if a camera image of sufficient quality can be identified.
  • the page recognition method employs contextual information and/or other non-image information to facilitate the recognition process (e.g. searching reference images from the most recently viewed publication first).
  • a method of determining a pose of a viewing device relative to a page and rendering digital content in perspective comprises the steps of: imaging the page using a camera of the viewing device; identifying a reference image corresponding to said page using image-matching techniques; retrieving digital content corresponding to the reference image; calculating a projection transform by comparing features in a camera image of the page with corresponding features in the reference image of the page; using the projection transform to determine a pose of the viewing device relative to the page; and rendering at least some of the retrieved digital content to a display screen of the viewing device, wherein the rendered digital content is displayed in perspective in accordance with the determined pose of the viewing device relative to the page, and wherein the rendered digital content is displayed as a virtual reality and/or augmented reality display.
  • the viewing device has a stored cache of potential reference images, and wherein the page recognition process initially searches the stored cache for an image match.
  • a method of performing page recognition via text recognition comprises the steps of: imaging a portion of text disposed on a page, said portion containing a plurality of lines of text; extracting an off-axis text signature from the portion of text, the off-axis text signature comprising a string of text read in a line which is not parallel to the reading direction of the lines of text; and identifying the page by looking up the text signature in an inverted index of text signatures.
  • the text signature is sent to a remote database comprising the inverted index.
  • the text signature uniquely identifies the page from a plurality of other pages.
  • the method additionally employs contextual information and/or other non-image information to disambiguate pages containing the same or similar text signatures.
  • the text signature is read in a line which is orthogonal to a reading direction of the lines of text on the page.
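The off-axis signature idea can be illustrated with a toy example: one character is taken from each successive OCR line at roughly the same horizontal position, and the resulting vertical string is looked up in an inverted index. The data structures are assumptions for illustration.

```python
# Read "down the page", orthogonally to the lines of text.
def off_axis_signature(lines, column):
    """lines: OCR output, one string per printed line (top to bottom).
    column: character column at which to read down the page."""
    chars = [line[column] for line in lines if len(line) > column]
    return "".join(chars)

def identify_page(signature, inverted_index):
    """inverted_index: dict mapping signature strings to lists of page IDs.
    Several pages may share a signature; contextual information (see above)
    would then be used to disambiguate."""
    return inverted_index.get(signature, [])

# Example:
# lines = ["The quick brown fox", "jumps over the lazy", "dog while the cat"]
# off_axis_signature(lines, 4) -> "qsw"
```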
  • a method of optimizing the quality of camera images used for image-matching comprises: capturing images of a page via a camera of a viewing device; and displaying live camera images in the viewing device at a zoom level which encourages the user to hold the viewing device further away from the page.
  • the method comprises the step of switching from the live camera images to a virtual reality display of a digital twin, wherein the digital twin is displayed at the same zoom level as the live camera images.
  • the live camera images are displayed at a greater zoom level (i.e. more magnified) than a typical camera preview image used for photography.
  • a method of providing a virtual reality experience to a user via a viewing device comprises the steps of: capturing images of a page via a camera of the viewing device; retrieving display data using the captured camera images; and displaying rendered digital content to a user in real-time, such that the user experiences the page as a virtual reality experience, wherein, if the captured camera images are of insufficient quality to determine appropriate digital content for virtual reality, then a live camera image is displayed to the user until such time that camera images of sufficient quality are captured.
  • camera images of insufficient quality are caused by rapid movement of the viewing device relative to the page so that the user is displayed the live camera images when the viewing device is moving rapidly.
  • the display seamlessly switches between the digital content and the live camera image such that the user maintains an apparent virtual reality experience, even when the viewing device is being moved rapidly.
  • a method of switching between display modes in a viewing device configured for providing a virtual reality display is provided.
  • the method comprises the steps of: capturing images of a page via a camera of the viewing device; sampling camera images for image-matching and retrieving display data using the captured camera images; displaying rendered digital content to a user in real-time as a dynamic display, such that the user experiences the page as a virtual reality experience; detecting if the viewing device is lying flat against the page using the captured images; and displaying the rendered digital content to the user as a static display if it is detected that the viewing device is lying flat against the page, wherein the static display enables the user to navigate the rendered digital content using on-screen gestures, such as panning and zooming/unzooming gestures.
  • interactive functions associated with the reference digital content are the same in the dynamic display and the static display.
  • the detection of the viewing device lying flat against the page is via a blackness detector, which detects black camera images.
  • the display reverts to the dynamic display (i.e. virtual reality display), when the camera captures recognizable page images.
  • a processor in the viewing device samples camera images for potential image-matches at a relatively lower rate when the blackness detector indicates that the viewing device is lying flat against the page.
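A minimal sketch of the blackness-detector behaviour, with illustrative thresholds, might look like this:

```python
# When the device lies flat on the page the camera sees an essentially black
# frame, so the viewer drops into a static, gesture-navigable display and
# samples frames for image-matching less often. Thresholds are assumptions.
import numpy as np

def is_black_frame(gray_frame, mean_thresh=12, std_thresh=6):
    """gray_frame: 2-D uint8 array from the view-facing camera."""
    return gray_frame.mean() < mean_thresh and gray_frame.std() < std_thresh

def choose_mode(gray_frame):
    """Return (display_mode, match_sampling_hz) for the current frame."""
    if is_black_frame(gray_frame):
        return "static", 2      # flat on page: static display, slow sampling
    return "dynamic", 15        # off the page: live virtual-reality display
```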
  • a method of displaying a user interface of a viewing device in an appropriate orientation without relying on internal accelerometers.
  • the method comprises the steps of: imaging a page using a camera of the viewing device; identifying a reference image corresponding to the page using image-matching techniques; determining an orientation of the viewing device relative to the page by comparing features in the camera image with features in the reference image; rendering digital content to a display screen of the viewing device as a virtual reality display in real-time irrespective of the orientation of the viewing device; and arranging the user interface (including buttons, header bar etc) in accordance with the determined orientation relative to the page.
  • the user interface is displayed in the appropriate orientation, even when the plane of the viewing device is substantially perpendicular to a gravitational force.
  • the user interface comprises a header bar which appears at the top of the user interface in both portrait and landscape orientations.
  • a method of changing an augmented reality display depending on a zoom level comprises the steps of: imaging a page using a camera of a viewing device; and displaying rendered digital content to a user in real-time as a dynamic display, such that the user experiences the page as a virtual reality experience, wherein the rendered digital content contains augmented reality interactive features (such as buttons); determining a zoom level of the display; and changing the displayed augmented reality interactive features depending on the determined zoom level.
  • a first interactive button at a first zoom level is changed to a plurality of second interactive buttons at a second zoom level, wherein the second zoom level is greater than the first zoom level.
  • a total number of interactive features displayed to the user does not exceed a predetermined number irrespective of the zoom level.
  • a method of scheduling download of data to a viewing device for a virtual reality display of a page comprises the steps of: downloading display image data (e.g. pdf image data) for the page; downloading page tracking data enabling the viewing device to track its position relative to the page; downloading definitions for interactive features appearing in the virtual reality display; and downloading a word index for the page, the word index enabling text extraction from the displayed page, wherein a degree of interactivity with the displayed page increases as the download schedule progresses.
  • the download schedule is in the order defined above.
  • a thumbnail image is downloaded prior to the display image data.
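The staged schedule could be expressed as a simple ordered fetch, as sketched below; the resource names and the fetch() callback are placeholders.

```python
# Staged download: thumbnail first, then the items listed above, so that
# interactivity grows as the schedule progresses.
DOWNLOAD_ORDER = [
    "thumbnail",        # optional low-resolution placeholder, fetched first
    "display_image",    # e.g. PDF-derived page image for the virtual-reality view
    "tracking_data",    # lets the device track its position relative to the page
    "interactivity",    # definitions of buttons, hyperlinks, media, etc.
    "word_index",       # enables text extraction / selection from the page
]

def schedule_page_download(page_id, fetch):
    """fetch(page_id, resource_name) retrieves one resource; results are yielded
    in order so the viewer can enable features progressively."""
    for resource in DOWNLOAD_ORDER:
        yield resource, fetch(page_id, resource)
```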
  • a method of scheduling download of data to a viewing device for a virtual reality display of a page spread having a primary page and a secondary page comprises the steps of: downloading all requisite data in respect of the primary page; downloading page tracking data for the secondary page; and downloading display image data for the secondary page, wherein the page tracking data for the secondary page is downloaded prior to the display image data for the secondary page so as to enable tracking between the primary and secondary pages before the display image data for the secondary page has downloaded.
  • the secondary page is represented by a placeholder before the image data for the secondary page has downloaded.
  • image data for the placeholder is downloaded with the tracking data.
  • a method of allocating processor resources in a viewing device providing a virtual reality display to a user comprises the steps of: imaging a page using a camera of the viewing device; displaying rendered digital content to a user in real-time as a dynamic display, such that the user experiences the page as a virtual reality experience; determining processor resources required for display of the digital content; and sampling camera images so as to track movement of the viewing device relative to the page and maintain the virtual reality experience via the displayed digital content, wherein a rate of sampling is dependent on the processor resources required for display of the digital content.
  • a relatively lower sampling rate is used for digital content having a relatively high demand on processor resources (e.g. animations, games etc), and a higher sampling rate is used for digital content having a relatively low demand on processor resources (e.g. plain text, monochrome graphics etc).
  • display data retrieved by the viewing device includes an indication of the processor resources required for display of graphical content.
  • a viewing system comprising a viewing device and/or a computer system which is configured for performing any of the methods described above.
  • the viewing device is typically equipped with a display screen, a view-facing camera, a memory, a processing system and a communications system, such as a digital transceiver.
  • the viewing device may further comprise a user-facing camera.
  • the viewing device is configured by means of a suitable app which runs in the viewing device.
  • the computer system may be located remotely from the viewing device (e.g. a remote server, a local device etc) or it may be integrated with the viewing device.
  • a viewing device which is configured for performing, at least partially, any of the methods described above.
  • a computer system which is configured for performing, at least partially, any of the methods described above.
  • a substrate for viewing by a viewing device comprising: content for viewing by the viewing device; and a readable storage device containing a digital description of the content and any interactivity associated with the content.
  • the storage device may be selected from electronic devices, such as an RFID tag, an NFC tag, a miniature computer system etc.
  • the substrate is selected from: magazines, journals, newspapers, books, catalogues, brochures, flyers, restaurant menus, point-of-sale material, posters, billboards, product labels, product packaging, album artwork (e.g. CD or DVD artwork), desktop printed documents, printed webpages, printed e-mails, business cards, tickets, clothing, stickers, electronic billboards, monitors, e-readers, TVs, computers, phones, projector screens, building, vehicle, product item etc.
  • a method of calculating an image signature for a query image to be subject to an image-matching technique comprising the steps of: using the query image as a base image to generate a series of scale images that are successively blurred versions of the base image; using the series of scale images to produce a set of gradient images consisting of gradient vectors at each pixel location in the set of scale images; producing a set of squared, normalized, gradient difference images from the set of gradient images by arranging the set of gradient images into adjacent pairs of gradient images and subtracting the gradient vectors in one gradient image of the pair from the gradient vectors at corresponding pixel locations in the other gradient image of the pair, calculating a squared magnitude of the gradient vector difference at each pixel location, and normalizing the squared magnitude of the gradient difference at each pixel location to generate the set of squared, normalized, gradient difference images; comparing each pixel in the set of squared, normalized, gradient difference images to pixels surrounding said pixel to identify local maxima; using the local maxima to provide a set of feature points that is characteristic of the base image; and deriving an image descriptor for each of the feature points, the image descriptors providing the image signature for the query image.
  • the series of scale images is a Gaussian scale space in which a Gaussian filter is used to produce increasingly blurred versions of the base image.
  • the series of scale images is subject to a Scharr filter to produce the set of gradient images.
  • the squared gradient magnitude at each pixel location of the set of gradient images is normalized by multiplying by the square of the sigma of the Gaussian blur at that level within the set of scale images.
  • the local maxima are identified by comparing each pixel in the set of squared, normalized, gradient difference images to pixels immediately surrounding said pixel such that said pixel is compared to the eight adjacent pixels in the squared, normalized, gradient difference image in which said pixel is positioned, as well as the nine pixels in the pixel locations corresponding to the said pixel and the nine adjacent pixels in the squared, normalized, gradient difference images on either side of said squared, normalized, gradient difference image in which said pixel is positioned.
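A condensed sketch of this interest-point stage is given below, using SciPy for the Gaussian scale space, a Scharr filter for the gradients, and a 3x3x3 maximum filter for the local-maxima test. The particular sigma values, and the choice of which level's sigma is used for normalization, are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, convolve, maximum_filter

SCHARR_X = np.array([[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]], dtype=float) / 32.0
SCHARR_Y = SCHARR_X.T

def gradient_difference_maxima(image, sigmas=(1.0, 1.6, 2.56, 4.1)):
    image = image.astype(float)
    # 1. Gaussian scale space: successively blurred versions of the base image.
    scales = [gaussian_filter(image, s) for s in sigmas]
    # 2. Gradient images (one gradient vector per pixel, via a Scharr filter).
    grads = [(convolve(s, SCHARR_X), convolve(s, SCHARR_Y)) for s in scales]
    # 3. Squared, sigma-normalized gradient differences between adjacent scales.
    diffs = []
    for (gx0, gy0), (gx1, gy1), sigma in zip(grads[:-1], grads[1:], sigmas[:-1]):
        dx, dy = gx1 - gx0, gy1 - gy0
        diffs.append((dx * dx + dy * dy) * sigma * sigma)
    stack = np.stack(diffs)            # shape: (levels, H, W)
    # 4. Local maxima over the 3x3x3 neighbourhood in scale and space.
    local_max = stack == maximum_filter(stack, size=(3, 3, 3))
    levels, ys, xs = np.nonzero(local_max & (stack > 0))
    return list(zip(levels, ys, xs))   # candidate feature points
```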
  • a method of determining a pose of a viewing device relative to a page and rendering digital content in an altered perspective comprises the steps of: imaging the page using a camera of the viewing device; identifying a reference image corresponding to said page using image-matching techniques; retrieving digital content corresponding to the reference image; calculating a projection transform by comparing features in a camera image of the page with corresponding features in the reference image of the page; using the projection transform to determine a pose of the viewing device relative to the page; and rendering at least some of the retrieved digital content to a display screen of the viewing device, wherein the rendered digital content is displayed in a perspective that differs from a perspective that corresponds with the determined pose of the viewing device relative to the page.
  • the rendered digital content is displayed as if the page is parallel to the display screen of the viewing device. In one embodiment, the rendered digital content is displayed as a virtual reality and/or augmented reality display.
  • the viewing device has a stored cache of potential reference images, and wherein the page recognition process initially searches the stored cache for an image match.
  • a method of interacting with content disposed on a page comprising the steps of: capturing, using a view-facing camera of a viewing device, an image of printed content disposed on a page; identifying the printed page using the captured image; retrieving digital content corresponding to the identified page; and displaying the digital content as a virtual and/or augmented reality display on a display screen of a viewing device, wherein a user interface element is shown on the display screen between the capturing of the image and displaying the digital content.
  • the user interface element is configured to encourage the user to hold the viewing device steady with the camera capturing the image.
  • the user interface element is the image captured by the camera displayed on the display screen, overlaid with a representation of a static reticule that is visually similar to a camera reticule.
  • the user interface element includes an animation on the display screen showing a sliding bar reciprocating across the static reticule to give an impression that the captured image is being scanned.
  • the viewing device sends the captured image to a remote page recognition server which identifies the page, and transmits the page identity to the viewing device.
  • in response to receiving the page identity, the viewing device requests resources from the server for the user interaction with the content.
  • the resources include data required to track across the page and content augmentation specifications.
  • a method of interacting with content disposed on a page comprises the steps of: capturing an image of the content using a viewing device that has a view facing camera and a display screen; identifying a reference image corresponding to the page using image- matching techniques; retrieving digital content corresponding to the reference image; displaying the digital content as an augmented reality display on the display screen of the viewing device, the augmented reality display including an overlay augmentation, wherein the overlay augmentation has a fixed position on the display screen regardless of movement of the viewing device relative to the page.
  • the page has a plurality of regions, one or more of the regions being associated with the overlay augmentation.
  • the overlay augmentation appears on the display screen when the view facing camera lingers on said one or more regions.
  • the overlay augmentation includes at least one of: a static element; an animated element; a video element; and an interactive element.
  • the viewing device characterizes lingering when: the augmented reality display center is within the region for more than a predetermined threshold time; and there is a low rate of movement of the viewing device relative to the page.
  • the overlay augmentation appears on the display screen when the viewing device, after first lingering on said one or more regions, is rapidly removed from the page such that the page is no longer viewed by the view facing camera.
  • the overlay augmentation is preceded by an animation augmentation.
  • the animation augmentation is a fade in of the overlay augmentation.
  • the augmented reality display has a video element which transitions into the overlay augmentation in response to the display screen lingering on the one or more regions.
  • the rate of movement of the viewing device relative to the page is gauged by a rate of view change, the view change being determined using a difference between a prior image of the content captured by the view facing camera and a later image of the content captured by the view facing camera, the prior image and the later image being timestamped to provide the rate of view change.
  • the later view is timestamped between 0.05 seconds and 1.0 seconds after the prior view. Typically, the later view is timestamped 0.1 seconds after the prior view.
  • the display screen is rectangular and the low rate of movement of the viewing device relative to the page is characterized as the change of view between the prior image and the later image being less than half the length of the long side of the display screen.
  • a high rate of movement is characterized as the change of view between the prior image and the later image being more than twice the length of the long side of the display screen.
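The linger test can be illustrated as follows, using the fractions of the display's long side given above as thresholds; the tracked view positions and the dwell threshold are assumptions.

```python
# Classify the rate of view change between two frames ~0.1 s apart and combine
# it with a dwell test to decide whether the device is lingering on a region.
def classify_movement(prior_xy, later_xy, long_side_px):
    """prior_xy, later_xy: tracked view positions (screen pixels) at the two
    timestamps. Returns 'low', 'medium' or 'high'."""
    dx = later_xy[0] - prior_xy[0]
    dy = later_xy[1] - prior_xy[1]
    shift = (dx * dx + dy * dy) ** 0.5
    if shift < 0.5 * long_side_px:
        return "low"     # less than half the long side: candidate for lingering
    if shift > 2.0 * long_side_px:
        return "high"    # more than twice the long side: rapid removal
    return "medium"

def is_lingering(centre_in_region_since, now, movement, dwell_threshold_s=0.5):
    """Linger = display centre inside the region long enough AND low movement."""
    return movement == "low" and (now - centre_in_region_since) >= dwell_threshold_s
```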
  • the present invention provides a system for user interaction with printed content on a substrate, the system comprising: a sensing device for capturing an image of the printed content disposed on the substrate; a server with a database of reference images; and a viewing device with a display screen, the viewing device configured to transmit a match request to the server, the match request including the image captured of the printed content; wherein the server is configured to use an image-matching technique to match the image to a reference image corresponding to the substrate in response to the match request, and to transmit a match response to the viewing device, the match response including digital content corresponding to the substrate identified by the image-matching technique; and the viewing device is configured to display the digital content on the screen, such that the screen displays a digital twin of the captured image, the digital twin having at least one interactive element for user interaction.
  • the sensing device is a camera incorporated into the viewing device and the screen is touch sensitive to enable user interaction with the at least one interactive element.
  • the viewing device is a head-mounted display (HMD) worn on the user's head.
  • the sensing device is a digital camera positioned to capture digital video of the user's field of view, and the screen for displaying the digital content is positioned in at least part of the user's field of view.
  • the at least one interactive element is one or more of: a hyperlink; a button to initiate an action; and, video and/or audio playback options.
  • the server is configured to calculate an image signature that characterizes the image captured of the substrate, such that the image-matching technique compares the image signature to similarly calculated image signatures respectively characterizing the database of reference images.
  • the server is configured to calculate the image signature of the image by: using the image as a base image to generate a series of scale images that are successively blurred versions of the base image; using the series of scale images to produce a set of gradient images consisting of gradient vectors at each pixel location in the set of scale images; producing a set of squared, normalized, gradient difference images from the set of gradient images by arranging the set of gradient images into adjacent pairs of gradient images and subtracting the gradient vectors in one gradient image of the pair from the gradient vectors at corresponding pixel locations in the other gradient image of the pair, calculating a squared magnitude of the gradient vector difference at each pixel location, and normalizing the squared magnitude of the gradient difference at each pixel location to generate the set of squared, normalized, gradient difference images; comparing each pixel in the set of squared, normalized, gradient difference images to pixels surrounding said pixel to identify local maxima; using the local maxima to provide a set of feature points that is characteristic of the base image; and deriving an image descriptor for each of the feature points, the image descriptors providing the image signature of the image.
  • Figure 1 is a front perspective view of a Netpage viewer device described in US 6,788,293;
  • Figure 2 is a rear perspective view of the Netpage viewer device
  • Figure 3 shows the Netpage viewer in contact with a Netpage
  • Figure 4 is a magnified view of a surface having printed text and the Netpage coding pattern
  • Figure 5 is a perspective view of a microscope sleeve attachment for a smartphone
  • Figure 6 shows a substrate suitable for viewing by a viewing device according to the invention
  • Figure 7 shows a smartphone viewing the substrate shown in Figure 6;
  • Figure 8 shows communication between a smartphone and a page server
  • Figure 9 shows a virtual reality display of a substrate on a smartphone display screen
  • Figure 10 is a diagrammatic representation of the interactive viewing system and a viewed substrate
  • Figure 11 is a flowchart illustrating the page recognition system
  • Figure 12 is a flowchart illustrating the operation of the conductor module
  • Figure 13 is a schematic diagram of a view finder bundle
  • Figure 14 is a flowchart of the creation of a view finder bundle by the view finder analysis module;
  • Figure 15 diagrammatically illustrates the generation of a view finder bundle;
  • Figure 16 schematically illustrates a fixed radius patch about a corner feature for calculating a local image feature in the scale pyramid level in which the corner feature is detected;
  • Figure 17 schematically shows how the local image patch is processed to generate an image feature descriptor;
  • Figure 18 shows the histogram providing the image feature descriptor generated from the local image patch
  • Figure 19 diagrammatically shows the generation of a descriptor-to-feature index and a position-to-feature index
  • Figure 20 schematically illustrates the repeated down-sampling of an input camera image to form the scale pyramid
  • Figure 21 is a flowchart of the processing steps followed by the page recognition and projection determination system in the view finder module;
  • Figure 22 is a flowchart of matching each local descriptor to a reference feature descriptor derived from a local feature in a reference page;
  • Figure 23 is a flowchart for the process of checking for a consistent projection solution
  • Figure 24 diagrammatically illustrates the optical system of the camera
  • Figure 25 diagrammatically illustrates a projection transform mapping a partial view of a substrate to a projected partial view of a corresponding reference image
  • Figure 26 is a flowchart of the projection refinement process
  • Figure 27 is a flowchart showing the search for an improved projection transform from a list of candidate image feature correspondences
  • Figure 28 is a flowchart of the least squares fit process applied to the filtered candidate correspondence list
  • Figure 29 is a flowchart of producing a composite page recognition pack from a set of view finder bundles
  • Figure 30 is a flowchart of searching for a local projection transform from a page recognition index
  • Figure 31 diagrammatically shows the process of setting up or updating an initial page server database
  • Figure 32 is a diagram of the duplicate page image detection method
  • Figure 33A diagrammatically illustrates the system used to identify image features in a series of scale images
  • Figure 33B diagrammatically illustrates a gradient vector at a pixel location in a gradient image
  • Figure 34 is a flowchart for generating an n-dimensional vector for each of the image features
  • Figure 35 is a flowchart of the image match process within the page server
  • Figure 36 shows a text signature generation technique to improve page recognition
  • Figures 37A and 37B show a viewing device displaying a video associated with an interactive element of a page;
  • Figure 38 shows a viewing device displaying an image gallery interaction associated with an interactive element of a page;
•   Figure 39 shows a tablet computer capturing an image of a user's face via a user-facing camera
  • Figure 40 shows a sensing device incorporated into a wireless mouse connected to a laptop
  • Figure 41 shows a sensing device incorporated into a wireless mouse connected to a desktop computer
  • Figure 42 shows a sensing device incorporated into a wireless mouse connected to a TV display
•   Figure 43A is a perspective view showing a user viewing a page through a head-mounted display (HMD);
  • Figure 43B is a schematic plan view of the user observing the field of view through the HMD;
  • Figure 43C is the user's field of view including the display screen;
  • Figure 44 shows a dedicated document viewer
  • Figure 45 shows a handheld games console for use as a viewing device
  • Figure 46 shows a media player for use as a viewing device
  • Figure 47 shows a projector for viewing the substrate shown in Figure 6 via a wirelessly connected sensing device and notebook computer;
•   Figure 48 shows a viewing device displaying rendered digital content to the user as an augmented reality display of a live video image
  • Figure 49 shows a viewing device viewing a substrate in perspective
  • Figures 50A and 50B show a viewing device displaying an overlay augmentation during interaction with a substrate
  • Figure 51 illustrates a method of measuring changes in the rate of change in camera view point
  • Figure 52A shows an optically imaging pen performing page recognition of a substrate
  • Figure 52B shows the optically imaging pen of Figure 52A generating digital ink during handwritten input on the substrate
  • Figure 53 shows a viewing device equipped with an electronic tag reader and a camera interacting with a substrate
•   Figure 54 shows a user's finger interacting with a zone of a substrate within a viewing camera's field of view
  • Figure 55 illustrates the flow of information through the interactive viewing system during clipping operations
•   Figure 56 shows a viewing device with a typical UI for viewing a digital twin having a "Clip & Share" button and a header bar;
  • Figure 57 shows the typical clipping disposition options presented to the user as onscreen touch buttons
  • Figure 58 shows an alternative scheme for clipping regions of a page defined in the digital twin
  • Figure 59 shows a viewing device displaying a "smart clipping"
  • Figure 60 shows a viewing device presenting the user with an options menu
  • Figure 61 shows a viewing device displaying the user's saved clippings as large thumbnail images
  • Figure 62 shows a viewing device displaying the user's saved clippings as a list containing a small thumbnail image together with clipping information
  • Figure 63 shows a viewing device displaying the user's saved clippings as a list organized in accordance with magazine title
  • Figure 64 shows a viewing device displaying a clipping with ragged edges simulating a torn page; and, Figure 65 shows a viewing device displaying a clipping with straight, clean edges.
  • Figure 8 is a basic sketch of an interactive viewing system 2 according to the invention that provides user interactivity with a printed substrate 10 via a viewing device 100 and a page server 20.
  • a sensing device 808 senses data from the substrate 10 which is used to generate interaction data 101 transmitted to the page server 20.
•   the sensing device 808 may be integrated with the rest of the viewing device 100 or physically separate.
  • the system 2 recognizes the substrate 10, usually a printed page.
•   the server 20 normally attends to page recognition, but recognition can be performed by both the server and the viewing device, and in some cases just the viewing device 100.
  • display data 103 is returned to the viewing device 100 which includes digital content defining a "digital twin" of the page 10 (or at least part thereof) and interactive content.
  • the digital twin is displayed on the screen 105 to enable use of the interactive content.
  • the digital twin 107 (see Figure 9) is described in much greater detail below, but is essentially a virtual or augmented reality view of the substrate which incorporates at least one interactive element for user interaction. It is broadly understood that a virtual reality view aims to exactly mirror an actual view whereas an augmented reality view substantially corresponds to the actual view but introduces changes, or augmentations.
  • An input device 814 allows user interactions with the digital twin described in greater detail below. Like the sensing device, the input device 814 may be physically separate from the rest of the viewing device 100 or an integrally formed component. When the viewing device 100 is a smartphone, using the touch sensitive screen 105 as the input device 814 is most convenient.
  • Figures 6 and 7 show a viewing device 100 with integrated sensing device 808 interacting with a substrate 10.
•   the substrate 10 is, for example, a page of a magazine containing printed content 13 such as text 11 and other graphics 12.
  • a user holds the viewing device 100 above the substrate 10 with a viewing application running on the viewing device 100.
  • the sensing device 808 in the form of the viewing device's camera 102 images an area of the substrate within its field of view 14 containing part of the printed content 13.
•   the captured image is used by the viewing device 100, and/or a remote server 20, to determine the identity of the substrate 10 such as the page of the magazine.
  • the captured image may also be used to determine the location of the viewing device 100 relative to the substrate 10.
•   the page server 20 may in fact be several servers, which act together to perform page recognition processing. In the interests of clarity and brevity, many of the embodiments described herein will simply refer to the server 20 in the singular. However, the skilled worker will appreciate that references to the "server 20" or "page server 20" will also encompass a server system of multiple interconnected servers and/or databases.
  • Figure 10 schematically illustrates the interaction between the substrate 10, the viewing device 100 and the page server 20 in greater detail.
•   the viewing device 100, controlled by the viewer application 190, sends interaction data 101 (see Fig. 8) to the page server 20 via the Internet using a viewer network interface 120.
  • the interaction data 101 may be raw image data captured by the sensing device 808 (in this case the inbuilt camera 102) and may include an image match request 260 and a content request 290.
  • the interaction data 101 has been processed by the viewer processor 106 from the raw image data.
  • the interaction data 101 may be raw image data that has been compressed by the viewer processor 106 into a format suitable for sending over the Internet.
  • some feature extraction from the raw image data may be performed on the viewing device 100, in the page server 20, or shared between the two.
  • the page server 20 interprets the interaction data 101 and returns display data 103 to the viewing device 100 via the server network interface 121.
  • the page server 20 uses the interaction data 101 to determine the identity of the substrate 10 and a projection transform mapping the camera view to a reference image 210 of the substrate 10.
  • the display data 103 returned to the viewing device 100 has a match response 280 and a content response 300 which provide the identity and projection respectively.
  • the viewing device 100 transmits a content request 290 for corresponding content response 300 to the page server 20.
  • the page server 20 generates or retrieves the display data 103 and transmits it to the viewing device 100.
  • the page server 20 transmits the display data 103 directly after identifying the substrate 10 without requiring a request from the viewing device 100.
  • the display data 103 sent to the viewing device 100 includes a 'digital twin' 107 (see Fig. 9) of the substrate 10.
  • the digital twin 107 is a digital description of the printed content, together with 'augmentation data' 220 and additional page recognition information (page recognition bundle 230 and view finder bundle 240 described below).
  • Augmentation data 220 is a description of any interactivity associated with the digital twin 107 and pre-calculated image feature information 240 used in local (on-device) image matching.
  • the page server 20 (and/or viewing device 100) employs a variety of page recognition techniques, optionally in combination with one or more other parameters, to achieve high page recognition accuracy whilst minimizing processing time on the page server 20.
  • the user interface of the viewer application 190 running on the viewing device 100 in one embodiment provides options or instructions to the user to assist in page recognition and maximizing accuracy.
  • the display data 103 usually includes data corresponding to one or more pages.
  • the page server 20 may send display data 103 corresponding to the viewed page as well as adjacent page(s) in the magazine. If a magazine cover is recognized, then the page server 20 may send (or at least "expect" to send) any or all pages from that magazine.
  • Various caching strategies may be employed to minimize processing times and provide a seamless transition from a camera preview of a page to a digital display of the page. The caching strategies may further be used to optimize communication between the page server 20 and the viewing device 100, thereby improving the user's overall experience.
•   the display data 103 received by the viewing device 100 is rendered in real-time to display digital content on the touchscreen 105 corresponding to the printed content 13 on the substrate 10 viewed by the camera 102.
  • the opaque screen 105 of the viewing device 100 in one embodiment has real-time virtual transparency with respect to the substrate 10. That is, the displayed digital content matches and aligns with the camera's preview image of the printed content 13.
•   the screen 105, like the sensing device 808, need not be integrated with the remainder of the viewing device 100. In some forms, the screen 105 viewed by the user is physically separate.
  • the location and orientation of the viewing device 100 with respect to the printed content 13 must be determined. The location and orientation is initially determined from feature matching and/or keystoning of matched features.
  • the relative location and/or orientation of the viewing device 100 must be updated in order to maintain the effect of virtual transparency.
•   the relative location of the viewing device 100 may be determined by, for example, comparing features of the printed content 13 in the camera's field of view 14 (see Fig. 7) with features in the digital twin 107 cached in the memory 131.
  • the orientation of the viewing device 100 relative to the substrate 10 may be determined, for example, by comparing features of the printed content 13 in the camera's field of view 14 with features in the digital twin 107 cached in the memory 131.
  • the location and/or orientation of the viewing device 100 may also be updated relative to a previously determined location and/or orientation by comparing a plurality of frames of imaged content in the camera's field of view 14.
  • keystoning of features in the printed content 13 may be used to determine, via a projection transform, a 3D orientation of the viewing device 100 relative to the substrate 10.
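For illustration, assuming OpenCV is available and an earlier feature-matching step has already produced corresponding point sets (the function and variable names here are hypothetical), the keystoned correspondences can be fitted to a planar projection transform as follows:

```python
import cv2
import numpy as np

# page_pts: Nx2 feature locations in reference page (digital twin) coordinates
# cam_pts:  Nx2 locations of the same features in the camera frame
# Both arrays are assumed to come from a prior feature-matching step.
def camera_page_projection(page_pts, cam_pts):
    # RANSAC-fitted homography: a 3x3 projection transform mapping camera
    # pixels onto the reference page, from which the 3D pose can be derived.
    H, inliers = cv2.findHomography(np.float32(cam_pts), np.float32(page_pts),
                                    cv2.RANSAC, ransacReprojThreshold=3.0)
    return H
```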
•   a position of the user's eyes P e (see Fig. 49) relative to the viewing device 100 may either be estimated or determined via a user-facing camera 108 (see Fig. 39) of the viewing device 100.
  • the digital content 116 (see Fig. 9) rendered on the display screen 105 using the received display data 103 may include embedded interactive elements (e.g. hyperlink 117 and video play button 104) to provide the user with a richer experience of the printed substrate 10.
  • This embedded interactivity is authored into the digital content 116 by, for example, a magazine or newspaper publisher, an advertiser and so on. Not only does this interactivity add value to, for example, printed advertisements, it also provides advertisers with valuable information on the effectiveness of advertising campaigns. For example, the number of user interactions with a particular printed (or displayed) advertisement can be monitored and provided to the advertiser. Hitherto, such information was unavailable to advertisers who traditionally relied on relatively inaccurate market research to assess the effectiveness of printed advertising campaigns.
•   Figure 9 illustrates the printed content 13 on the viewed substrate 10 of Fig. 6 rendered on the viewing device 100 as digital content 116 comprising an interactive element in the form of a hyperlink 117 corresponding to "200+ Race Cars".
•   the on-screen hyperlink 117 corresponding to "200+ Race Cars" is shown in a different colour and/or underlined in the rendered digital content 116 to indicate to the user that this is an interactive icon.
  • the user is able to link to a webpage corresponding to "200+ Race Cars" by touching that region of the touchscreen 105.
  • the viewing device 100 provides the user with the option of playing a video corresponding to the graphic 12 on the printed substrate 10. Playback of the video may be initiated automatically by hovering or dwelling over a particular region of the substrate. Alternatively, and as shown in Figure 9, the user may be prompted with an on-screen playback control button 104 which augments the displayed graphic 12. It will be appreciated that a plethora of multimedia and interactive options are available via the rendered digital content 116.
  • the user interface of the viewer application 190 may be configured to optimize the user's overall experience of the viewing system 2. From a partial view 15 (see Fig. 7) of the substrate 10 as imaged by the viewing device 100, a corresponding reference page 210 stored in a reference database 250 is identified. A projection transform is also determined, which maps the partial view 15 to the corresponding reference page 210.
  • the reference page 210 comprises a digital description of the printed content 13, including any associated interactivity.
•   the page number (index) of each reference page 210 within the associated publication may advantageously be included as well. Workers in the print media industry will understand that a "bind edition" is a particular issue of a magazine that may have different print configurations with varying content.
  • a bind edition is a specific complete print configuration of a given issue of a magazine. For example, the variations may cater to specific needs based on geographic region, such as advertisements containing addresses of local offices.
  • Figure 25 diagrammatically illustrates a projection transform 18 mapping a partial view 15 of a substrate 10 to a projected partial view 21 of a corresponding reference image 210.
•   where the reference image 210 has already been identified, only the projection transform 18 is determined, typically to establish the exact location of the partial view 15 and the pose of the viewing device 100 with regard to the printed substrate 10. For example, if the printed substrate 10 is a board game map, then the viewing device 100 will only interact with a reference image database 250 (see Fig. 10) having a single reference image 210.
•   the location of the partial view 15 on the game board substrate 10, and the orientation of the viewing device 100, determine the relevant display data 103 shown on the touchscreen 105 (see Fig. 8). Where the projection transform 18 is known or irrelevant, only the identity of the reference image 210 needs to be determined; for example, the printed substrate 10 may be a magazine page that is shown as a whole at a fixed pose on the touchscreen 105 of a viewing device 100, irrespective of the actual pose of the viewing device 100 with respect to the magazine page. In applications of the system 2 to viewed substrates 10 other than pages from a magazine, the system recognizes pages from other types of printed publications, or any substrate, by their appearance (image). The system 2 is configured to recognize commonly used page spreads such as typical two-page spreads, multiple page gatefolds or other connected page combinations.
  • the reference image database 250 is a reference page database 250 and the reference images 210 are reference pages 210.
  • the creation of a digital clipping is initiated by a user interaction while a digital twin 107 (or part thereof) is being shown on the touchscreen 105.
  • the user interaction causes a clip request 310 to be transmitted to the server 20.
  • the clip request 310 contains details of the issue, page, and sub-region being clipped.
  • the clip request 310 adds a record to the shared clippings database 330.
  • Shared clippings may be accessed by other devices 340.
  • Other types of interactivity require digital media such as video to be supplied to the viewing device 100.
  • digital media is referenced by page augmentations 220 and stored in a media database 350.
  • the viewing device 100 retrieves the media via a content request 290 and a content response 300.
  • the page reference database 250 is a database of all known documents accessible via a network or other wired or wireless communications link. Where practical, the database 250, or a part thereof, is mirrored on the viewing device 100 to minimize access time. Recently used reference pages are also preferably cached on the viewing device 100.
  • the page server reference database 250 has digital images of reference pages 210, publication details (including bind edition information) and metadata such as image signatures, lookup indexes, page recognition data 230, and page tracking data 240.
  • the page recognition data 230 is collectively referred to as a page recognition bundle (PRB). There is a PRB for each recognizable page.
  • the page tracking data 240 is packaged in data structures known as view finder bundles (VFB, and also known as a view finder pack).
  • the digital images of the pages, publication details, metadata, lookup indexes, page recognition bundles, view finder bundles, and the like stored in the page reference database 250 are referred to as "reference" data entities to distinguish from the data generated by the viewing device 100. Therefore, a camera image of a printed page captured by the camera 102 of the viewing device 100 is referred to as a match request 260.
  • the reference image or reference page 210 corresponding to the printed page 10 is stored in the page reference database 250 as a PDF file together with associated metadata. Also associated with each of the reference pages 210 are the augmentations 220 that are incorporated into the reference page 210 to provide the digital twin that is (at least partially) displayed on the touchscreen 105.
•   the server 20 has a page recognition module 110 that attempts to recognize a page from a match request 260 using reference pages 210, page recognition bundles 230 and a page recognition feature index 200.
  • the reference data accessed by the page recognition module 110 may be a full reference repository (the universal set of reference pages 210 and associated metadata) or a selected subset of the full reference repository, such as all cover pages and all pages of the last viewed magazine.
•   a cover page recognition feature index 270 is used by an over-the-network page recognition system which attempts to recognize a match request 260 using only reference data for cover pages of publications known to the system 2. If successful, matching against this subset of the full reference repository is more time efficient.
•   the view finder module 130 is a program used by the viewing device 100 to recognize a camera image using reference document data 131 available to the processor 106 of the viewing device 100.
  • Reference document data 131 is data associated with previous usage, such as view finder bundles 240 of reference pages 210 already viewed by the viewer system 2.
•   a blackness detector module 140 is provided to determine whether a camera image contains non-document content, such as when the camera 102 of the viewing device 100 is pointing to the sky or face-down on a flat surface.
  • An optical flow tracker module 150 is provided, using an optical flow technique to determine whether the camera image is a displaced version of a previously captured camera image.
  • the conductor module 160 is a decision making component which combines input from various recognition and tracking modules to determine the identity of a page 10 being viewed by the camera 102 and the pose of the viewing device 100 relative to the viewed page 10.
  • the page recognition analysis module (PR Analysis) 170 analyses a reference page 210 rendered from a page PDF and produces a page recognition bundle (PRB) 230 for the reference page 210 which is stored in the page server reference database 250.
  • the view finder analysis module (VF Analysis) 180 analyses a reference page 210 rendered from a page PDF and produces a view finder bundle (VFB) 240 for the reference page 210 which is stored in the page server reference database 250.
  • FIG 11 is a flowchart 365 showing an overview of the operation of the page recognition system 2 of Figure 10. While the flowchart 365 shows the interaction of the main components, a more detailed description of other operations and components is given in later sections.
  • the viewer application 190 (see Figure 10) operates the viewing device 100 in two main processing flows; one commencing at step 271 when a display refresh is required and the other at step 275 when there is a change to the state of frame information.
•   the flow commencing at step 271 is executed frequently, usually at the display refresh rate, which is typically 60Hz.
•   the conductor module 160 determines if the page 10 viewed by the camera 102 is being successfully tracked with respect to a reference page 210. If so, the partial view 15 of the reference page 210 (see Figure 25) and projection 18 onto the digital twin 107 are determined by the conductor module 160, in step 273. In step 274, the various overlapping components of the digital content 116 (see Figure 9) forming the view presented to the user are rendered together.
•   the video view directly from the camera 102 is the most-obscured (lowest priority in terms of 2D drawing order) and the remaining items are placed at progressively higher priority (that is, overlaid on the video view in the digital content 116 shown on the touchscreen 105).
  • the digital twin 107 may completely or partially obscure the video view.
•   the second flow executes when the frame state from the camera 102 has updated to a new frame state. New frames arrive from the camera 102 at a frequency of 15Hz. However, the frame state may also change when a match response 280 (see Figure 10) is received from the server 20, when the view finder module 130 (the on-device page recognition facility) identifies a reference page 210, or when the optical flow module 150 determines that the camera image is a displaced version of a previously captured image.
  • the operation of the server 20, the camera 102, the view finder module 130 and the optical flow module 150 is asynchronous. Analysis by these asynchronous modules is controlled by the conductor module 160 and shown as steps 275, 276, 277, 278, 279, 281 and 282 of the flowchart 365. This forms the second main data processing flow.
  • Figure 12 is a flowchart 366 illustrating the operation of the conductor module 160.
•   the conductor module 160 retains information that has been gathered and/or computed about recent camera frames and related environmental inputs.
  • the viewer application 190 uses the conductor module 160 to make two types of determination based on this retained information. The two determinations are:
•   in step 284: what page is being viewed by the user through the camera 102, including what projection transform 18 (see Fig. 25) should be used to represent the user's view.
•   in step 287: what tracking-related operation should happen next, balancing the need to answer requests for the current view with the availability and cost of input sampling, network, and computation resources.
  • the conductor module 160 receives notifications 296 of information and the results of requested computations asynchronously from the viewer application 190.
  • notifications include:
  • New frame notification 288 that the camera 102 has captured a new frame including a unique identifier for the frame and a time stamp of when the capture occurred.
  • Match response receipt notification 291 that a match response 280 has been received including the frame identifier, the success or failure of the recognition, the reference page 210 that was recognized, and the projected partial view 21 of the reference page 210 that was determined.
•   View finder recognition notification 292 that a view finder module 130 recognition analysis request has completed, including the frame identifier, the success or failure of the recognition, the reference page 210 that was recognized, and the partial projected view 21 that was determined.
  • Accelerometer notification 295 that an accelerometer reading request has completed including a three axis accelerometer measurement.
•   the conductor module 160 retains a list of recently received frames 286, with a record associated with each. Each record can potentially include the following information, although actual computation of the fields will depend upon circumstances: a unique frame identifier;
  • the results of the viewer device 100 recognition analysis of the viewed frame by the view finder module 130 including the success or failure of the recognition, the reference page 210 recognized, and the projection transform 18 of the recognition.
  • the result of optical flow module 150 determination between the present frame and a prior frame including the success or failure of a frame displacement determination, the identifier of the prior frame, and the apparent frame-to-frame projection transform 18 between the prior frame and this frame.
  • the conductor module 160 is called upon to identify the page 10 being viewed by the user through the camera 102 and what projection transform 18 should be used to represent the user's view. To make this determination, the conductor module 160 must resolve absolute recognition results (from the server 20 and the view finder module 130) and relative movement results (from optical flow module 150) that have arrived after different delays for previous frames. It is also advantageous to eliminate spurious false positive recognitions.
  • the conductor module 160 scans frames from the oldest that are currently being held to the most recent, propagating forward absolute results by using relative increments where they are available. In doing this scan, certain absolute recognition results are required to have multiple confirmations before they are regarded as being legitimate.
  • a match response 280 indicating a reference page 210 from a different publication is being viewed must be confirmed by a second successful match of the different publication, and a local page recognition result that indicates a different page is being viewed must be confirmed by a second successful match of the different page.
  • step 284 after performing the scan, if a recent frame is determined to have a known reference page 210 and projected view 21 onto the page, and no more recent frames have a failed optical flow determination, that frame, and the timestamp of when the frame was captured by the camera 102, is taken as the "raw location".
  • the raw location is input to a filtering process that uses retained state (that is, the pre-existing state of the filter that is progressively updated when each new raw location is processed) to determine the view that should be used to represent the user's view at the current time.
  • a simple element-wise Infinite Impulse Response Filter of the projection transform 18 is used to smooth the view.
  • a Kalman filter is used to forward estimate what projection transform 18 should be used at the current time.
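A minimal sketch of the simple element-wise IIR smoothing option mentioned above (the class name and smoothing coefficient are illustrative assumptions, not values from the disclosure):

```python
import numpy as np

class ProjectionSmoother:
    """Element-wise IIR (exponential) smoothing of successive 3x3 projection transforms."""
    def __init__(self, alpha=0.3):        # alpha is an illustrative smoothing factor
        self.alpha = alpha
        self.state = None                 # retained filter state

    def update(self, raw_projection):
        raw = np.asarray(raw_projection, dtype=float)
        if self.state is None:
            self.state = raw              # first raw location initializes the filter
        else:
            # Each element of the transform is blended with the retained state.
            self.state = self.alpha * raw + (1.0 - self.alpha) * self.state
        return self.state                 # smoothed transform used to render the view
```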
  • the projection transform 18 determined at step 285 is returned from the conductor module 160 and used to determine the projected partial view 21 shown on the touchscreen 105.
•   the user is presented with an on-screen view that is not the projected partial view 21 of the digital twin 107 corresponding to the camera's 102 view of the page 10. Instead, a flat, or non-perspective, view (or partial view) of the digital twin 107 is shown on the screen 105.
  • the 'flattened' onscreen view is a rotated, scaled and translated version of the digital twin 107 that approximates the projection transform 18.
•   the conductor module 160 determines this Rotation Scale Translation (RST) transform from the projection transform 18.
•   a preferred embodiment considers the projection of two short vectors (one or less screen pixels in length) orthogonal to each other at the centre 17 of the camera's partial view 15 of the page 10 (see Figure 25). These vectors are transformed through the projection transform 18 to find where they map to in the projected partial view 21. The average of the lengths of these projected vectors divided by their unprojected lengths is the scale. The average of the rotation from the original to the projected vectors is the rotation, and the shift from the coordinates of the centre 17 of the camera's partial view 15 to the coordinates of the corresponding point in the projected partial view 21 is the translation.
  • This RST transform is returned to the system 2 as the result of the determination at step 285.
  • This RST view gives the user a more natural view of the printed content 13 of the page 10 (see Figure 25), while still allowing them to track over the printed page 10 to find augmented features.
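The RST approximation just described can be sketched as follows (NumPy; the helper names are illustrative and the one-pixel vector length follows the description above):

```python
import numpy as np

def project(H, p):
    """Apply a 3x3 projection transform H to a 2D point p."""
    x, y, w = H @ np.array([p[0], p[1], 1.0])
    return np.array([x / w, y / w])

def rst_from_projection(H, centre, eps=1.0):
    """Approximate the projection H near `centre` by a rotation-scale-translation."""
    c = np.asarray(centre, dtype=float)
    pc = project(H, c)
    vx = project(H, c + [eps, 0.0]) - pc        # projected short x vector
    vy = project(H, c + [0.0, eps]) - pc        # projected short y vector

    # Scale: average projected length divided by the unprojected length.
    scale = (np.linalg.norm(vx) + np.linalg.norm(vy)) / (2.0 * eps)
    # Rotation: average rotation of the two projected vectors.
    rot = (np.arctan2(vx[1], vx[0]) + (np.arctan2(vy[1], vy[0]) - np.pi / 2.0)) / 2.0
    # Translation: shift from the centre of the partial view to its projection.
    translation = pc - c
    return rot, scale, translation
```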
  • the system 2 does not seek to analyze the most recent view 14, but what the user was recently hovering over (that is, what was held in the camera's field of view 14) in a stable manner.
  • This operating mode is used to overcome unintended movement of the viewing device 100 while the user touches an interface element, or if the user suddenly moves the viewing device 100 away from the page 10 and the interface wishes to keep a stable view and enter "static mode" with this view as described below.
  • the conductor module 160 determines a measure of the displacement of successively captured frames, the percentage of each frame that projects off the viewed page 10, and how old the captured frame is. The conductor module 160 then uses a linear combination of these values to score each frame and selects the frame with the minimum value as the stable view to use.
  • the "frame” is a rectangular array of pixels within the camera's field of view 14 (which is not typically rectangular).
  • the pixel values in the array define a static image captured by the camera.
  • Frame-to-frame motion is measured by determining where the corners of one frame would project to in the space of a second frame, and then taking the Euclidean distance between the 8 coordinates as a whole.
  • the long edge of a frame is normalized to 100 units for this exercise.
  • the linear combination the conductor module 160 uses to score each frame is the frame-to-frame motion value, plus 20 times the age in seconds of the frame, plus 200 times the fraction of the frame that projects off the page 10.
  • the frame with the minimum score becomes the stable view.
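A sketch of the stable-view selection (the frame fields `flow_transform`, `timestamp` and `off_page_fraction` are assumed names for data the conductor module retains; the weights and the 100-unit normalization follow the description above):

```python
import numpy as np

def frame_motion(H, width, height):
    """Frame-to-frame motion: project the four frame corners through the
    frame-to-frame transform H and take the Euclidean distance between the
    8 coordinates as a whole, with the long edge normalized to 100 units."""
    corners = np.array([[0, 0], [width, 0], [width, height], [0, height]], float)
    proj = np.hstack([corners, np.ones((4, 1))]) @ H.T
    proj = proj[:, :2] / proj[:, 2:3]
    unit = 100.0 / max(width, height)
    return np.linalg.norm((proj - corners).ravel() * unit)

def select_stable_frame(frames, now):
    """Score = motion + 20 * age in seconds + 200 * fraction projecting off the page;
    the frame with the minimum score becomes the stable view."""
    def score(f):
        return (frame_motion(f.flow_transform, f.width, f.height)
                + 20.0 * (now - f.timestamp)
                + 200.0 * f.off_page_fraction)
    return min(frames, key=score)
```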
  • the conductor module 160 is called upon to determine what action 308 the viewer application 190 (see Figure 10) should take next; balancing the need to answer requests for the current view, with the availability and cost of input sampling, network, and computation resources.
  • the conductor module 160 considers a number of possible actions 308 in the priority order to be described here.
  • system states that preclude the action being taken are always considered and will cause further consideration of that action to be skipped.
•   Such system states include needing a single-threaded module (i.e. a module that can only service one request at a time) that is currently busy.
•   Page server match request 260: when the viewing device 100 captures a continuous image frame sequence, some of the captured frames may have appearances which indicate that recognition on such frames will most likely fail.
•   Recently captured frames are therefore scanned to provide a "last seen stable camera image" in case the most current frame is unsuitable. This can be used to select the best frame, in terms of its image quality, to bundle in a page server match request 260 as described below.
  • Parked mode 297 is appropriate when the viewing device 100 is resting face down with the camera 102 obscured.
  • Parked mode 297 triggers if recent frames have all been blank or black.
  • Exit parked mode 298 triggers if the viewing device 100 is in parked mode 297 and the recent frames are not blank or black.
  • An optical flow analysis 303 between the second most recent frame and the most recent frame will initiate when the viewer application 190 seeks to economize on processing power, and the second most recent frame has a successful recognition result from the view finder module 130.
•   Resume tracking mode 306 (i.e. exit non-tracking mode 305) triggers if there is a recent successful recognition result and that result projects onto at least 60% of a viewed page 10.
  • Transmit a no action 307 indication to the viewer application 190 if none of actions 297 to 306 are taken.
•   the viewer application does not query the conductor module 160 for new actions until at least one notification 296 has been received. If one of the above actions is returned to the viewer application 190, the application initiates it and immediately queries the conductor module 160 for further actions.
•   The Optical Flow Module 150
  • the optical flow module 150 (see Figure 10) is used to determine the approximate relative motion that has occurred between two frames captured by the camera 102.
  • the conductor module 160 typically uses this to estimate what portion of a substrate, such as a printed page 10, is being viewed when image recognition fails on a frame, but a recent previous frame had known tracking position. However, it is also applied between pairs of recent camera frames where neither has a known position, as an absolute position determination process for the earlier frame may be executing asynchronously and will return a result in the future. Because the relative shift between the pair of frames has been calculated in advance, it will be possible to immediately forward estimate a more recent position once the position of an earlier frame becomes known. The optical flow module 150 determines an approximate image shift between pairs of frames.
•   A central box (that is, a central rectangular array of pixels) is taken from each frame of the pair and the two boxes are compared using phase correlation to discover a shift amount.
  • the phase correlation may result in a too-weak or ill-conditioned peak, in which case the optical flow module 150 indicates failure for this pair of frames.
•   the optical flow module 150 can also be used to provide estimates of scale, rotation and other transforms by performing phase correlation on sub-sections (such as quadrants of the pixel array that makes up the frame) of the pair of camera frames.
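A minimal sketch of the central-box phase correlation (using OpenCV's phaseCorrelate; the box size and response threshold are assumed values, not from the disclosure):

```python
import cv2
import numpy as np

def frame_shift(prev_frame, curr_frame, box=128, min_response=0.15):
    """Estimate the approximate shift between two frames by phase-correlating
    a central rectangular array of pixels from each frame."""
    h, w = prev_frame.shape
    y0, x0 = (h - box) // 2, (w - box) // 2
    a = np.float32(prev_frame[y0:y0 + box, x0:x0 + box])
    b = np.float32(curr_frame[y0:y0 + box, x0:x0 + box])
    (dx, dy), response = cv2.phaseCorrelate(a, b)
    if response < min_response:       # too-weak or ill-conditioned peak
        return None                    # indicate failure for this pair of frames
    return dx, dy
```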
•   The Blackness or Blankness Check Module 140
•   the blackness or blankness check module 140 constructs a simple image histogram (x-axis for pixel intensity values 0 to 255, y-axis for pixel frequency or count) for each frame being analyzed. To determine if the image is "black", the 98 percentile confidence limits of the image histogram are computed. If the pixel intensity value (x-axis value) corresponding to the maximum confidence limit and the dynamic range (pixel intensity values corresponding to the maximum and minimum confidence limits of the cumulative distribution) of the histogram are below some estimated threshold values, then the image is considered to be "dark" or "black".
  • the maximum confidence limit will fall below a level (such as 20) on the scale of 0 to 255.
  • An image with all pixel intensities equal to 0 would be considered totally black, while an image with all pixel intensities equal to 255 would be considered white.
•   the first condition (maximum confidence limit) is likely to be satisfied but the image can also have a large number of brighter pixels, leading to a "brighter" or non-dark image that has a wider distribution of pixel intensities rather than just a narrow peak.
  • An alternative implementation only uses the dynamic range check for deciding if the image is "featureless" or "blank". This could indicate that the camera 102 is facing the sky or the ceiling.
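An illustrative sketch of the blackness/blankness test (the confidence percentiles and threshold values are assumptions consistent with, but not specified by, the description above):

```python
import numpy as np

def is_black_or_blank(gray, dark_limit=20, min_dynamic_range=30):
    """Blackness/blankness check sketch: threshold values are illustrative."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    cdf = np.cumsum(hist) / hist.sum()
    lo = int(np.searchsorted(cdf, 0.01))   # minimum 98% confidence limit
    hi = int(np.searchsorted(cdf, 0.99))   # maximum 98% confidence limit
    dynamic_range = hi - lo
    # "Dark"/"black": maximum limit and dynamic range both below thresholds.
    is_dark = hi < dark_limit and dynamic_range < min_dynamic_range
    # Alternative "blank"/"featureless" test: dynamic range check only.
    is_blank = dynamic_range < min_dynamic_range
    return is_dark, is_blank
```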
•   the system of on-device page recognition provided by the view finder module 130 and the view finder bundles (VFB's) 240 is fast, precise, and consumes little memory. It is particularly suitable for low-latency applications such as those running on the viewing device 100.
  • the view finder module 130 uses an image matching technique that differs from SIFT in many respects, the main ones being: a) The view finder system uses a different method for selecting interest points.
  • the view finder system selects corners detected by the FAST9 corner detector (described in more detail below) over down-sampled images from the scale space pyramid using a custom weak corner suppression technique (also described in more detail below).
  • the view finder system computes a full projection transform from three image feature correspondences, unlike SIFT which computes an RST (Rotation Scale and Translation) transform using the same number of correspondences.
  • the view finder system is designed to be greedy in its search for a solution so that it completes with minimum computation in cases where the camera 102 is viewing a true reference page 210. This differs from SIFT, as only a subset of camera image corners is typically detected in the view finder system when the camera image is indeed a view of a reference page 210.
  • the view finder system uses a one-pass approach when computing both the orientation and orientation-dependent descriptor from a local image patch, processing each pixel only once.
•   the view finder system computes a coarser pyramid of scales for the camera image than for the reference image 210. This asymmetry improves the efficiency of the system by minimizing the amount of processing done on the viewing device 100 with a marginal loss of scale recognition accuracy. Differences between the view finder system and other image-based recognition methods are: a) The view finder system clusters image descriptors to achieve multiple image descriptor matches by performing only a single match calculation.
  • the view finder system combines various sources of non-image contextual information to achieve recognition and projection transform estimation. For example, information from the device hardware sensor(s) such as accelerometer(s) and/or gyroscope(s) or application specific information.
  • FIG 10 shows the two main components of the view finder system.
  • the view finder module 130 is the software in the viewer device processor 106, and the VF analysis module 180 is the software in the server 20.
  • a page image based on a reference page 210 is input to the VF analysis module 180 which produces a corresponding view finder bundle (VFB) 240.
•   a small set of view finder bundles 240 is used to configure the view finder module 130 so that it can perform recognition of camera images and estimate the camera's field of view 14 (see Figure 7) of the substrate 10.
  • This sub-set of images 210 and corresponding VFB's 240 is stored on-device as viewer page information 131.
•   the view finder module 130 uses a projection determination system to determine which reference page 210 a free-pose camera image has captured, or partially captured, and what specific projection transform 18 (see Fig. 25) maps the camera image onto the determined reference page 210.
  • the reference pages 210 are typically rendered or captured images of pages of a publication.
  • the camera 102 typically captures a partial view 15 of a printed page 10 corresponding to the reference page 210 which may be subject to distortions and degradations such as lighting variation, de-focus, motion blur, specular reflection, partial occlusion, physical page distortion (including bending), as well as the perspective distortion from the (unknown) camera pose.
  • Part of the projection determination system typically operates in a continuous low-latency environment where determination results are presented visually to the user as the user moves the camera 102 over pages 10; typically in a mobile computing device with a screen 105 and camera 102. This part of the system requires high efficiency and low latency algorithms.
  • the projection determination system in the view finder module 130 further includes a reference image analysis module which produces a view finder bundle (a collection of data derived from the image and used as input to the recognition process) from a reference image 210.
  • the reference image analysis module typically operates off-line, applied to all potentially recognizable page images, thus producing a universal set of view finder bundles.
  • the projection determination system also includes a recognition and projection determination module which has as input a selected sub-set of view finder bundles 240, and a camera image, and produces as output a reference page indicator, a projection transform, and a confidence measure.
•   a view finder bundle 240 is derived data based on a reference page 210. Given a view finder bundle 240 and the reference page 210 of a viewed page 10, the viewing device 100 can recognize and track a viewed page 10 using the view finder module 130.
•   the view finder bundle 240 is typically small in data size to facilitate quick communication to the viewing device 100 separately from the much larger PDF file of the reference page 210. This, of course, is not restrictive and in some cases, the PDF of the reference page 210 may be included in the view finder bundle 240.
  • a schematic diagram of a view finder bundle 240 is shown in Figure 13, and comprises one or more of the following: a) A list of local image features 368 from a reference page 210. During system operation, camera image features are matched against these local image features to establish feature correspondences.
•   b) A thumbnail page image 370 to aid, for example, in precise alignment of coarse projection transform estimations. Additionally, the thumbnail page image(s) 370 may be used in the user interface, avoiding the need to resample the full page image on the viewing device 100 when a page thumbnail is needed. Examples of user interface elements that might use a page thumbnail are dialogs, display headers and footers, scroll down lists of pages or publications, other graphics incorporating a thumbnail of a page such as a cover page and so on.
  • FIG 14 is a flowchart 374 illustrating the creation of a view finder bundle 240 by the view finder analysis module 180 (see Figure 10) - one of the software modules in the server 20.
  • view finder bundle generation is also diagrammatically represented in Figure 15
  • a reference page 210 is input to the view finder analysis module 180.
  • the reference page 210 is a single-channel gray-scale image at a resolution of 96 pixels per inch.
  • the view finder analysis module 180 produces a series of images known as a scale pyramid 396 by successively down-sampling the reference page 210 by a constant scale factor.
  • a typical scale factor is the inverse of the cube root of 2, thus giving three down-sampled images for each full factor of two reduction in the image size.
  • a set of steps in the scale pyramid 396 that form a full factor-of-2 reduction is often referred to as an octave of the scale pyramid 396.
  • the choice of the process for down-sampling the image to produce the scale pyramid 396 is not crucial for the recognition system.
•   the original image (the PDF of the reference page 210) is filtered by a Gaussian filter with a small sigma corresponding to the next scale in the pyramid and subsequently sampled using bilinear interpolation to produce the lower-resolution scale image 394.
•   the produced lower-resolution image is then processed in the same manner (adjusting the sigma of the Gaussian filter appropriately) to produce an even lower resolution scale image, and so on.
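A sketch of the reference-image scale pyramid construction (OpenCV; the Gaussian sigma and number of levels are illustrative assumptions, while the inverse-cube-root-of-2 step follows the description above):

```python
import cv2

def scale_pyramid(image, levels=9, scale_step=2 ** (-1.0 / 3.0), sigma=0.8):
    """Build a scale pyramid by Gaussian filtering with a small sigma and
    bilinear down-sampling by a constant factor (three steps per octave)."""
    pyramid = [image]
    current = image
    for _ in range(levels - 1):
        blurred = cv2.GaussianBlur(current, ksize=(0, 0), sigmaX=sigma)
        h, w = current.shape[:2]
        current = cv2.resize(blurred, (int(w * scale_step), int(h * scale_step)),
                             interpolation=cv2.INTER_LINEAR)
        pyramid.append(current)
    return pyramid
```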
  • Each scale image 394 of the scale pyramid 396 is raster-scanned for corner features using a corner detector such as the FAST9 corner detector 380 as is known in the art.
  • This detector registers a candidate pixel as the centre of a corner feature if it is surrounded by a specific configuration of dark and light grey values with reference to the central candidate pixel.
  • the detector also determines a corner score measure for each candidate corner.
  • Other corner detection methods may be employed as alternatives. All detected corners are considered in order of their score measure in step 382.
  • the strongest corner is selected and moved to a selected corner list (noting its row and column position and the scale pyramid level) at step 386, and then all other corners in the candidate list within a specific radius (typically 9 pixels) are discarded by marking the area in an exclusion mask as shown in step 388.
  • the operation continues (step 384) until there are no corners in the candidate list.
  • a dispersed collection of strong corner features is achieved in step 390.
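A sketch of the corner selection with weak-corner suppression (OpenCV's FAST detector stands in here for the FAST9 corner detector 380; the 9-pixel suppression radius follows the description, and a square exclusion area is used for simplicity):

```python
import cv2
import numpy as np

def dispersed_corners(scale_image, suppression_radius=9):
    """Keep the strongest corners, suppressing weaker corners nearby via an exclusion mask."""
    detector = cv2.FastFeatureDetector_create()
    # Consider all detected corners in order of their score measure (strongest first).
    keypoints = sorted(detector.detect(scale_image, None),
                       key=lambda kp: kp.response, reverse=True)

    h, w = scale_image.shape[:2]
    excluded = np.zeros((h, w), dtype=bool)
    selected = []
    for kp in keypoints:
        col, row = int(round(kp.pt[0])), int(round(kp.pt[1]))
        if excluded[row, col]:
            continue                               # a stronger corner is already nearby
        selected.append((row, col, kp.response))   # note row, column (and pyramid level)
        # Mark the exclusion area around the selected corner.
        r0, r1 = max(0, row - suppression_radius), min(h, row + suppression_radius + 1)
        c0, c1 = max(0, col - suppression_radius), min(w, col + suppression_radius + 1)
        excluded[r0:r1, c0:c1] = True
    return selected
```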
  • a local image feature is calculated for each selected corner using data from a fixed radius patch in the scale pyramid level it was detected in (step 390).
  • An example of such a patch 398 is shown in Figure 16.
  • the local image feature is the combination of:
  • a local patch image descriptor (a vector of numeric values derived from the local image patch 398).
  • the local image patch 398 considered around a corner 400 detected at a high (more down-sampled) image 394 of the scale pyramid 396 will represent a larger area of the original reference image 210 than one detected at, say, the base resolution of the scale pyramid 396.
  • the size of the local image patch 398 is typically a disc of radius 9 pixels.
  • the local image patch 398 orientation is the peak image gradient orientation discovered by making a histogram of all gradient orientations from all pixels in the local image patch 398, weighted by the gradient magnitude at each pixel.
•   the image descriptor part of a detected corner 400 is a value vector 416 built from the concatenation of six histograms 420 to 430 calculated for each of a set of six zones 402 to 412 of the local patch 398.
•   the six zones 402 to 412 are based on breaking the local image patch 398 into the intersections of rings (annuli) and pie-slices (segments) such that each has approximately equal pixel area.
  • the embodiment shown in Figures 17 and 18 uses six zones but other methods of zone delineation are possible. Each pixel in the local image patch 398 could be considered to fall exactly into one of the zones 420 to 430. However soft-binning is used instead.
  • Histogram information derived from each pixel 418 contributes to the four zones it is nearest to (zones 412, 406, 408 and 402 in the case of pixel 418), weighted bilinearly to the distance to the centre of each zone.
•   a pixel exactly in the centre of a zone, in terms of angular position and distance from the centre of the patch, contributes only to that zone.
  • a pixel near the edge of a zone will contribute proportionally to the neighbor zones.
•   To build the histogram 420 to 430 of each zone 402 to 412, two values are calculated for each pixel as shown in Figure 17.
  • the first is the gradient strength at the pixel 418.
  • the gradient orientation is determined and rotated 90 degrees to form an orientation vector 416 for a local pixel intensity edge 414.
  • the distance p of closest approach of this vector 416 to the centre (detected corner 400) of the local image patch 398 is determined.
  • Distance p is signed, depending on which side of the vector the centre of the patch falls on.
•   the p value of the pixel determines which bin of a zone's histogram is contributed to, and the gradient strength weights its contribution. Again, soft-binning is used to spread the contribution between two neighboring histogram entries.
•   the histograms 420 to 430 each have the property that strong straight edges in the image patch 398 will form peaks in the histogram independent of their orientation. In one embodiment the histograms each have six bins. As will be seen later in the matching process, it is important that image descriptors derived from differently oriented views of the same detected corner 400 produce similar descriptors. As a result of this, the angular alignment of the segment boundaries that contribute to the zone determination must be the same. To achieve this, the radial segment boundaries are determined relative to the overall orientation of the image patch 398.
  • a one pass calculation method is used that visits each pixel in the local patch 398 just once.
  • Local image feature extraction is also used in the computationally sensitive camera image processing described below.
  • the zone radial segments are over-sampled by some factor. For example four times as many radial segments as the final target number of radial segments might be used initially.
  • a dense histogram of orientations where each histogram bin corresponds to one of the narrow segments is created during processing of each pixel.
  • the construction of the image descriptor 432 from zone histograms 420 to 430 is shown in Figure 18.
•   the final image descriptor 432 is the concatenation of each of the zone histograms 420 to 430 as a single vector of values and normalized to be of Euclidean length 256 (i.e. 8-bit, or 2⁸), then stored as a byte array.
  • three segments, two annuli and six histogram bins are used, giving a total descriptor 432 size of 36 bytes.
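A simplified sketch of the zone-histogram descriptor (hard binning only and two nested loops rather than the one-pass, soft-binned calculation described above; the patch is assumed to lie fully inside the image, while the 36-value size and the length-256 normalization follow the description):

```python
import numpy as np

def patch_descriptor(gray, centre, patch_orientation, radius=9,
                     annuli=2, segments=3, bins=6):
    """Zone-histogram descriptor sketch: 2 annuli x 3 segments = 6 zones, 6 bins each."""
    cy, cx = centre
    hist = np.zeros((annuli * segments, bins))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r = np.hypot(dx, dy)
            if r > radius or r == 0:
                continue
            y, x = cy + dy, cx + dx
            gy = float(gray[y + 1, x]) - float(gray[y - 1, x])
            gx = float(gray[y, x + 1]) - float(gray[y, x - 1])
            strength = np.hypot(gx, gy)             # gradient strength at the pixel
            if strength == 0:
                continue
            # Zone: annulus by radius, segment by angle relative to the patch orientation.
            annulus = min(int(r / (radius / annuli)), annuli - 1)
            angle = (np.arctan2(dy, dx) - patch_orientation) % (2 * np.pi)
            segment = min(int(angle / (2 * np.pi / segments)), segments - 1)
            zone = annulus * segments + segment
            # Edge vector = gradient rotated 90 degrees; p = signed distance of
            # closest approach of that edge line to the patch centre.
            ex, ey = -gy, gx
            p = (dx * ey - dy * ex) / np.hypot(ex, ey)
            bin_idx = min(max(int((p + radius) / (2.0 * radius / bins)), 0), bins - 1)
            hist[zone, bin_idx] += strength
    vec = hist.ravel()
    vec = vec / (np.linalg.norm(vec) + 1e-12) * 256.0   # Euclidean length 256
    return np.clip(np.round(vec), 0, 255).astype(np.uint8)  # 36-byte descriptor
```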
  • An optional step in the building of the view finder bundle 240 is to scan the features for features with similar image descriptors 432 and discard all but one of such duplicates. This can aid in preventing useless clutter in patterned or highly repetitive image areas.
  • the View Finder Module 130 (see Figure 10) is configured with an on-device set 434 of view finder bundles 240.
  • the set 434 will change from time to time in response to the recently recognized reference pages 210 the viewer application 190 selects as candidates for local recognition.
•   when the view finder module 130 is configured with a set 434 of view finder bundles 240, the features contained in all the VFBs 240 of the selected set 434 are tagged with their reference page 210 of origin and moved to a pool of all image descriptors 436. Two working indexes are then built:
  • a descriptor-to-feature index (or map) 438 implemented using a k-d tree provides an efficient approximate nearest neighbor lookup.
  • the produced index can be used to lookup a novel descriptor and obtain one or more features from the pooled reference image indexes that are approximate nearest neighbors of the novel descriptor in terms of the Euclidean distance between descriptor vectors.
  • a position-to-feature index (or map) 440 (in which the term "position” means a combination of page indicator, scale, orientation, row and column) implemented by a hash table using quantized values of the scale, orientation, row and column, and the exact value of the page indicator as a key.
  • the quantization factors used are deliberately broad as will be described below.
  • indexes are only rebuilt when the set 434 of view finder bundles changes; that is, only when the set of reference pages 210 that are candidates for recognition changes.
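An illustrative sketch of the two working indexes (SciPy's exact k-d tree stands in for the approximate nearest-neighbor structure; the feature field names and quantization steps are assumptions, the description only requiring them to be deliberately broad):

```python
import numpy as np
from scipy.spatial import cKDTree

def build_indexes(features):
    """Build the descriptor-to-feature and position-to-feature indexes from the
    pooled features. Each feature is assumed to carry: descriptor (36-value
    vector), page, scale, orientation, row, col."""
    descriptors = np.array([f["descriptor"] for f in features], dtype=float)
    descriptor_to_feature = cKDTree(descriptors)       # nearest-neighbor descriptor lookup

    position_to_feature = {}
    for i, f in enumerate(features):
        key = (f["page"],                              # page indicator used exactly
               int(f["scale"]),                        # coarse scale quantization
               int(f["orientation"] / (np.pi / 4)),    # 45-degree orientation bins
               int(f["row"] / 64), int(f["col"] / 64)) # coarse row/column bins
        position_to_feature.setdefault(key, []).append(i)
    return descriptor_to_feature, position_to_feature

# Lookup of a novel camera descriptor: nearest reference features by Euclidean distance.
# dist, idx = descriptor_to_feature.query(novel_descriptor, k=2)
```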
  • Camera image processing typically operates in a continuous low-latency environment where determination results are presented visually to the user as the user moves the camera over pages.
  • methods and algorithms in this processing section of the system employ a number of features to improve the efficiency of the system.
  • Figure 20 shows the repeated down-sampling of an input camera image 442 to form the scale pyramid 396 (see Figure 15).
  • the input camera image 442 is supplied to the page recognition and projection determination system in the view finder module 130.
  • the camera image 442 is typically a gray-scale image of approximately 320 by 240 pixels. It may be a frame from a video sequence.
  • the recognition and projection determination system makes very few assumptions about the view based on previously determined views.
  • the camera image 442 is repeatedly down-sampled by factors of two to form a scale pyramid 396. Note that this is typically not the same scale step as that used in the reference image analysis. By using whole octave steps in the scale pyramid 396, processing load is reduced.
  • corner discovery in the camera image 442 proceeds by testing tiles 444, of the scale image pyramid 396 in pseudo random order. Typically 15 x 10 tiles 444 are used at each scale level 394. Each tile 444 is scanned for its strongest corner using a corner detection technique 380 such as the FAST9 method used in the reference image analysis (see Figures 15 and 16). The corner detection technique 380 becomes increasingly strict as all but the best corner discovered in the tile 444 is eliminated at step 446. If a corner is discovered in a tile 444, a local image feature descriptor 432 (see Figure 18) is extracted at step 448 in the same manner as described in the reference image analysis. In addition to being immediately applied to the following processing steps, the image feature descriptor 432 is added to a pool of image descriptors 450.
  • a corner detection technique 380 such as the FAST9 method used in the reference image analysis (see Figures 15 and 16).
  • the corner detection technique 380 becomes increasingly strict as all but the best corner discovered in the tile 444 is eliminated at step 446.
  • FIG 21 is a flowchart 452 of the processing steps followed by the page recognition and projection determination system in the view finder module 130 (see Figure 10).
  • the process begins at step 454 by inputting a camera image 442 to the page recognition and projection determination system in the view finder module 130.
  • the camera image 442 is down-sampled to generate a scale pyramid 396 (see Figure 20).
  • the camera image 442 and the scaled images 394 in the pyramid 396 are divided into tiles 444 at step 456.
  • An unprocessed tile 444 is processed for image features (such as FAST9 corner detection) at step 460.
  • a feature descriptor 432 (see Figure 18) is generated for any features detected in the tile 444.
•   the next unprocessed tile 444 is selected at step 456 for processing at step 460.
  • a match failure (step 458) is reported to the viewer application 190 (see Figure 10).
  • When a descriptor 432 is found in step 462, it is looked up in the descriptor-to-feature index (k-d tree) 438 at step 464 to discover the nearest neighbor reference image feature based on the descriptor value.
  • the probability that an extracted camera feature corresponds to, and finds, a true corresponding feature in the pooled reference image features can, in practice, be fairly low. In situations with high image self-similarity, large feature pool size, poor focus, motion blur, and significant perspective distortion, the true correspondence rate at this stage of the processing can be as low as 1 in 50 (the k-d tree lookup is also approximate, so it may return the wrong match).
  • Steps 464 through 468 of flowchart 452 will now be described in more detail with reference to Figure 22.
  • Each local descriptor 432 is matched to a reference feature descriptor in step 482 derived from a local feature in reference page 210. This corresponds to steps 464 and 466 of Figure 21.
  • a comparison of the locations, scale pyramid levels, and orientation values of the two matched features implies that a specific rotation, scale and translation transform (an RST transform) would be needed to transform this local camera image patch to the reference image patch if this were a true correspondence.
  • the reference feature 482 will have a reference page indicator associated with it.
  • the general transform will be a projection transform.
  • An RST transform is only a local approximation which can have significantly different values over the surface of the camera image under extreme perspective distortion.
  • any clusters with similar RST and page values are mapped into a sparse array, hash table 484. This is achieved by soft-binning each candidate correspondence 486 into the hash table 484.
  • soft-binning is used in several parts of this description. This term refers to the creation of a histogram in which the sample values being accumulated in the histogram are real-valued. In a common form of creating a histogram, histogram bins have central values and samples accumulated into the histogram are rounded to the nearest bin's central value, then a unit weight is added to that bin.
  • in soft-binning, instead, a sample value's vote is split into two values (summing to one) in proportion to how close the real-valued sample is to the centers of the two closest bins. The partial values are then accumulated in each of the bins. For example, if bin centers are at 0, 1, and 2 then a value of 1.75 will cause an accumulation of the value 0.25 into the bin centered at 1 and an accumulation of the value 0.75 into the bin centered at 2.
  • Soft-binning may be applied in multiple dimensions.
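  • A minimal sketch of one-dimensional soft-binning as described above; the dictionary-based histogram and the bin_width parameter are illustrative choices, not taken from the specification.

      import math
      from collections import defaultdict

      def soft_bin(histogram, value, bin_width=1.0):
          """Split a real-valued sample's unit vote between its two nearest bins.

          The two partial weights sum to one and are proportional to how close
          the sample lies to each bin centre (bin centres at integer multiples
          of bin_width).
          """
          position = value / bin_width
          lower = math.floor(position)
          upper_weight = position - lower        # closeness to the upper bin
          histogram[lower] += 1.0 - upper_weight
          histogram[lower + 1] += upper_weight

      hist = defaultdict(float)
      soft_bin(hist, 1.75)   # adds 0.25 to the bin centred at 1 and 0.75 to the bin centred at 2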
  • the hash table 484 is a 5 dimensional table indexed by range-reduced values of rotation (10 bins per full circle), scale (one bin per octave), x-translation (one bin per 125 reference image pixels), y-translation (one bin per 125 reference image pixels) and the page (used exactly).
  • the soft-binning means that each candidate correspondence 486 contributes a weight to the 16 bins formed by the 2 nearest bins in each of the 4 real-valued dimensions of the RST.
  • the reference page indicator is exact and forms the 5th dimension without soft-binning. The exact bin closest to each candidate correspondence 486 also holds a reference to that candidate correspondence 486.
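  • A sketch of how a candidate correspondence might be soft-binned into the five-dimensional sparse array, assuming the quantization factors quoted above; the helper names and the use of a Python dictionary as the sparse hash table are illustrative.

      import math
      from collections import defaultdict
      from itertools import product

      # Quantization factors quoted above: 10 rotation bins per full circle, one
      # scale bin per octave, one translation bin per 125 reference-image pixels.
      ROT_STEP = 2.0 * math.pi / 10.0
      TX_STEP = TY_STEP = 125.0

      rst_bins = defaultdict(float)        # sparse 5-D weight accumulator
      bin_members = defaultdict(list)      # exact-bin references to correspondences

      def add_correspondence(page, rotation, log2_scale, tx, ty, corr, weight=1.0):
          """Soft-bin one candidate correspondence over the four real-valued RST
          dimensions; the page indicator is used exactly, with no soft-binning."""
          axes = []
          for value, step in ((rotation, ROT_STEP), (log2_scale, 1.0),
                              (tx, TX_STEP), (ty, TY_STEP)):
              pos = value / step
              lo = math.floor(pos)
              frac = pos - lo
              axes.append(((lo, 1.0 - frac), (lo + 1, frac)))
          # Two choices per real-valued dimension: 16 bins share the weight.
          for combo in product(*axes):
              key = (page,) + tuple(idx for idx, _ in combo)
              rst_bins[key] += weight * math.prod(w for _, w in combo)
          # The single nearest bin also keeps a reference to the correspondence.
          nearest = (page,) + tuple(int(round(v / s)) for v, s in
                                    ((rotation, ROT_STEP), (log2_scale, 1.0),
                                     (tx, TX_STEP), (ty, TY_STEP)))
          bin_members[nearest].append(corr)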
  • Candidate correspondences 486 in the table 484 may also be weighted by a factor related to the likelihood of each reference page 210, and RST, being a current true match.
  • An example of a property from which such a weighting can be derived is the similarity of the RST of a candidate correspondence to the RST of a previously fully matched frame.
  • a local bin (shown as cell C) of the hash table 484 accumulates a weight that indicates a cluster of similar RST and reference page indicators has been discovered (say 5 correspondences).
  • a check for a consistent projection solution is performed in step 490.
  • Figure 23 shows flowchart 492 for the process of checking for a consistent projection solution.
  • in step 494, the weighted centroid of the bins in the local area of the peak weight is determined. Then the range-reduction is reversed to derive an RST transform at step 496.
  • in step 498, the reference features in the candidate correspondences 486 in the local patch of bins are back-projected through the RST transform to give a camera view location. These are then filtered in step 500 to eliminate any reference feature location that back-projects to a camera view location that is greater than 15% of the camera view size from the supposed corresponding point.
  • in step 502, the page recognition and projection determination system checks that a sufficient number of candidate correspondences 486 remain after step 500.
  • the projection check registers a failure at step 506.
  • the broad tolerance is necessary because the RST is only a local approximation to what is really a projection transform.
  • a more expensive projection transform search begins at step 504 to identify a consistent projection transform.
  • a number (for example 20) of pseudo-random sub-sets of the candidate correspondences 486 are selected (see step 472 of Figure 21). An exact projection transform is solved for each sub-set.
  • the remaining candidate correspondences 486 in the list are then tested by back-projecting their reference image location into the camera view space via the projection transform and comparing it with the actual feature location in the camera image space (see step 474 of Figure 21). In this case, a tolerance of 1% of the camera view size is used. If sufficient candidate correspondences 486 (typically seven) are consistent with a given projection transform within the given tolerance, this projection transform will be accepted as a candidate solution. If all checks fail, the view finder module 130 will continue with corner detection and soft-binning until a higher weight in this local RST bin area is achieved, or some other peak weight appears.
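  • A minimal sketch of the back-projection consistency test described above, assuming the candidate projection transform is available as a 3 x 3 homography matrix mapping reference coordinates to camera coordinates, and a tolerance of 1% of the camera view size; the function and parameter names are illustrative.

      import numpy as np

      def count_consistent(correspondences, H, view_size, tol_fraction=0.01):
          """Count candidate correspondences consistent with a candidate projection.

          Each correspondence pairs a reference-image point with a camera-image
          point. The reference point is back-projected into camera space through
          the 3x3 transform H and accepted if it lands within the tolerance of
          the observed camera point.
          """
          tol = tol_fraction * view_size
          consistent = 0
          for (rx, ry), (cx, cy) in correspondences:
              x, y, w = H @ np.array([rx, ry, 1.0])
              if abs(w) < 1e-12:
                  continue
              if np.hypot(x / w - cx, y / w - cy) <= tol:
                  consistent += 1
          return consistent

      # A candidate transform would be accepted once, say, seven or more
      # correspondences are consistent with it.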
  • a significant aspect of the viewer application 190 is the selection of the sub-set of candidate correspondences 486 that are used to solve for candidate projection transforms.
  • four candidate correspondences 486 are required to solve for a projection transform. Given not all candidate correspondences 486 are true correspondences, this number has implications for the number of tests the view finder module 130 is expected to perform to find the true projection.
  • if the probability of a candidate correspondence 486 being a true correspondence at this stage of the process is one in four, then only one in 256 sets of four selected at random will solve to a true projection.
  • the geometry of a camera view 14 (see Figure 7) is restricted and the location of the optical axis of the camera view is known to be in the centre of the camera image.
  • a method can be derived to determine the projection transform from a three point subset rather than the normal four point method. Continuing the illustration, the hit rate is improved to 1 in 64.
  • Figure 24 diagrammatically illustrates the optical system 508 of the camera 102 (see Figure 7).
  • the focus of the optical system 508 is located at the point F.
  • the focal length of the optical system is denoted f.
  • the line 512 leaving the focus F and perpendicular to the virtual sensor plane 510 passes through the virtual sensor plane 510 before intersecting the object plane 516 being photographed, and thus the object plane 516 and the focus F are on opposite sides of the virtual sensor plane 510.
  • the virtual sensor plane 510 is coordinatized by placing its origin 514 at the point of intersection of the line 512.
  • the points Si, Sj, and Sk represent the three features of interest in the camera image.
  • the angles α, β, and γ are the angles made at the focus F by the lines from Si to F and Sj to F, from Sj to F and Sk to F, and from Sk to F and Si to F respectively.
  • the distance from Si to Sj, Sj to Sk, and Sk to Si is indicated by the notation eij, ejk, and eki respectively. These are obtained from the equations given in Equation 2.
  • Pi, Pj, and Pk are the actual locations of the features that correspond to Si, Sj and Sk respectively, in space on the object plane 516 being photographed.
  • the object plane 516 is not typically parallel to the virtual sensor plane 510 (containing Si, Sj, and Sk). The transform between these two planes needs to be derived.
  • the notation Uij, Ujk and Uki is used for the distance from Pi to Pj, Pj to Pk and Pk to Pi respectively, and explicit equations are given in Equation 4.
  • the distance from the focus F to Qj is denoted as v.
  • the distance from the focus F to Qk is denoted as w.
  • the lengths of the sides of the triangle with corners Qi, Qj, and Qk are denoted as gij, gjk, and gki for the distances from Qi to Qj, Qj to Qk, and from Qk to Qi respectively.
  • none of v, w, gij, gjk, or gki are known explicitly.
  • the geometry of the tetrahedron with corners F, Qi, Qj, and Qk allows the derivation of relations between them and the known cosines of the angles α, β and γ, as given by the equations in Equation 5.
  • the triangle with corners Pi, Pj, and Pk is similar to the triangle with corners Qi, Qj, Qk, since they are projections from the same point (namely F) on to parallel planes 510 and 518 respectively.
  • the ratios of the side lengths of these triangles are equal.
  • the side lengths Uij, Ujk, and Uki of the former triangle are known, and hence these ratios can be computed.
  • these ratios are dimensionless, even though only Uij, Ujk and Uki in reference space are known, and are unaffected by the uniform linear scaling to the object plane 516 being photographed.
  • the ratios of gij to gjk, and gki to gjk can be computed by the equations in Equation 6.
  • combining the known ratios from the equations of Equation 6 with the equations for v and w of Equation 5, and substituting the equations of Equation 8 to rewrite Uij and Uki, yields the two equations given in Equation 9.
  • the w² term can be linearly eliminated from these two equations to obtain an equation for w in terms of v and v² and known constants, as given in Equation 11.
  • this can then be substituted into the first of the equations from Equation 10 to obtain a quartic equation for v, which is given in Equation 12.
  • the quartic in Equation 12 can be numerically solved to obtain v, and the result substituted into the equation of Equation 11 to obtain w.
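  • A minimal sketch of the numerical solve for v, assuming the quartic coefficients have already been assembled per Equation 12 (which is not reproduced in this excerpt); the coefficient names c4..c0 are hypothetical.

      import numpy as np

      def solve_quartic_for_v(c4, c3, c2, c1, c0):
          """Numerically solve the quartic of Equation 12 for v.

          Only real, positive roots are physically meaningful, since v is a
          distance from the focus F; each surviving root is then substituted
          into Equation 11 to obtain the corresponding w.
          """
          roots = np.roots([c4, c3, c2, c1, c0])
          return sorted(r.real for r in roots
                        if abs(r.imag) < 1e-9 and r.real > 0.0)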
  • knowledge of v and w is knowledge of the geometry of the tetrahedron with corners F, Qi, Qj, and Qk, and thus of the geometry of the tetrahedron with corners F, Pi, Pj, and Pk by similarity.
  • the z axis passes through the focus F at coordinates (0, 0, f).
  • the equations of the points lying on the rays through the focus F and the points Si, Sj, and Sk, parameterized by length l along these lines, are given in Equation 13.
  • since the distances to Qi, Qj, and Qk are known, the coordinates of these points can be computed, obtaining the equations in Equation 14.
  • the point 526 where the z axis intersects the object plane 520 containing Pi, Pj, and Pk must be the same linear combination of the analogous non-orthogonal coordinatization of the object plane 516 containing Pi, Pj, and Pk with origin at Pi, and unit axis vectors from Pi to Pj and from Pi to Pk. Further, this linear combination is unaffected by the uniform scaling induced by the choice of units for Pi, Pj, and Pk or any other uniform scale difference between the object plane 516 being photographed and the reference image 210. Thus, this intersection point, in the coordinate system of the reference page 210, is as given by the equations in Equation 16.
  • Figure 26 is a flowchart 528 of the projection refinement process.
  • the first step 530 in the refinement process is to re-consider features that were extracted from the camera image, and were binned in the RST binning process but did not contribute to the local RST cluster that gave rise to the projection transform (see step 500 of flowchart 492, Figure 23). From the process discussed above in relation to Figure 24, an approximate transform is now known for each of these features. In light of this, the feature's camera space scale, orientation, X coordinate and Y coordinate can now be considered, and by transforming, at step 532, through the projection, a corresponding reference feature's scale, orientation, X coordinate and Y coordinate can be estimated.
  • this estimation is looked up in the previously constructed position to feature index 440 (see Figure 19). Again, because of the approximate nature of the transform, wide margins for error must be accounted for. To achieve this, the four real values of scale, orientation, X and Y are divided by quantization factors, and the two integer values that each of the four real values is closest to noted. Quantization factors are typically 1/10th of a circle for the orientation, one octave for the scale, and 160 reference image pixel steps for the coordinates. Every combination of the four pairs of integer values (16 of them) are produced, and combined with the page indicator and used as a key in the position to feature index 440.
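  • A sketch of how the 16 lookup keys might be produced for one estimated feature, using the quantization factors quoted above; the function name and the representation of keys as tuples are illustrative.

      import math
      from itertools import product

      # Quantization factors quoted above: 1/10th of a circle for orientation,
      # one octave for scale, 160 reference-image pixels for each coordinate.
      QUANTIZERS = (2.0 * math.pi / 10.0, 1.0, 160.0, 160.0)

      def index_keys(page, orientation, log2_scale, x, y):
          """Produce the 16 keys used to query the position-to-feature index.

          Each of the four real values is divided by its quantization factor and
          the two nearest integers noted; every combination of the four integer
          pairs, together with the exact page indicator, forms one key.
          """
          pairs = []
          for value, q in zip((orientation, log2_scale, x, y), QUANTIZERS):
              lo = math.floor(value / q)
              pairs.append((lo, lo + 1))
          return [(page,) + combo for combo in product(*pairs)]

      keys = index_keys(page=12, orientation=0.3, log2_scale=2.1, x=410.0, y=250.0)
      # Each key is looked up in the index; hits are then filtered by descriptor
      # distance as described in the next step.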
  • in step 536, features found by this lookup in the position to feature index 440 are then filtered to those that have a close descriptor correspondence (in terms of descriptor Euclidean distance). This will produce a list of new candidate correspondences that have reasonable positional and descriptor matches (within broad error margins) to the approximate projection. These are combined in step 540 with the correspondences that originally formed the projection approximation. At step 540, the algorithm checks for sufficient candidate correspondences (for example, more than 30).
  • if there are insufficient candidate correspondences, the algorithm checks for another unprocessed tile 444 from the camera image 442 (see Figure 20) at step 542. Feature detection is performed on the unprocessed tile 444 at step 546 and these features are similarly processed until sufficient are found or there are no more unprocessed tiles 444 in the camera image 442. With sufficient features, or a lack of unprocessed tiles 444, the algorithm proceeds to validate the projection at step 544.
  • the algorithm will no longer perform feature detection in the camera image 442.
  • the algorithm is designed to be greedy in its search for a solution so that it completes with minimum computation in cases where the camera is viewing a true reference page 210. In good cases only a small fraction (for example, 25%) of pixels from the camera image 442 are considered by the feature detector and fewer than 50 features extracted.
  • An improved projection is now searched for based on the list of candidate correspondences 486 (see Figure 22), which may still contain a percentage of false correspondences.
  • This process corresponds to step 476 of the flowchart 452, Figure 21 and is expanded here with reference to flowchart 548 shown in Figure 27.
  • the algorithm performs a fixed number of iterations (typically fifty) on the list of candidate correspondences (step 550). The use of this predetermined number of iterations in step 568 is described below.
  • in step 552, a random subset of three candidate correspondences 486 is selected.
  • a candidate projection transform is computed in step 554 using the method shown in Figure 26.
  • the value t²/(d² + t²) is calculated for each candidate correspondence and summed in step 556 to produce a fitness measure.
  • this fitness measure is compared with the best fitness measure so far. If it is greater, this candidate projection transform is accepted as the current best projection transform at step 560.
  • the modified algorithm re-evaluates the tolerance whenever the best projection transform is updated (see step 562).
  • the tolerance is updated to be the maximum projection error of any point that has an error less than the current tolerance.
  • in step 564, if at any iteration the tolerance is greater than a linear interpolation from a start tolerance to a target finish tolerance (over the number of iterations), it is lowered to the current interpolated value and the best fitness measure updated in step 566.
  • the current best projection transform in the above method is based solely on the direct solution of a small number (for example, three) of candidate correspondences.
  • in a conventional approach, a model is fitted over all the points that have error less than the tolerance for a test model (typically a least squares fit) and this forms the current best model. Here, this step is forgone due to the processing expense of the model fit.
  • the candidate correspondence list is filtered at step 570 to those candidate correspondences that fit the best projection transform. Then, a least squares fit occurs in step 572 to produce a final best projection transform.
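  • A minimal sketch of the scoring used inside this refinement loop, assuming the per-correspondence fitness contribution has the soft-inlier form t²/(d² + t²) with d the back-projection error and t the current tolerance, and assuming a linear tolerance schedule over the iterations; the candidate projection solver itself is left abstract.

      import numpy as np

      def backprojection_error(H, ref_pt, cam_pt):
          """Distance between a camera point and its reference point back-projected
          through the candidate 3x3 projection transform H."""
          x, y, w = H @ np.array([ref_pt[0], ref_pt[1], 1.0])
          return float(np.hypot(x / w - cam_pt[0], y / w - cam_pt[1]))

      def fitness(H, correspondences, tol):
          """Sum of soft-inlier scores t^2 / (d^2 + t^2) over all correspondences."""
          return sum(tol * tol / (backprojection_error(H, r, c) ** 2 + tol * tol)
                     for r, c in correspondences)

      def shrink_tolerance(H, correspondences, tol):
          """When a new best transform is accepted, shrink the tolerance to the
          largest error among the points currently inside it."""
          errors = [backprojection_error(H, r, c) for r, c in correspondences]
          inside = [e for e in errors if e < tol]
          return max(inside) if inside else tol

      def scheduled_tolerance(iteration, iterations, start_tol, finish_tol):
          """Linear interpolation from the start tolerance to the target tolerance,
          used as an upper bound on the tolerance at each iteration."""
          frac = iteration / max(iterations - 1, 1)
          return start_tol + frac * (finish_tol - start_tol)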
  • the alignment process in step 478 is illustrated in greater detail in flowchart 574 shown in Figure 28.
  • the list of candidate correspondences output from step 570 form the starting point 576 for the alignment process.
  • a reliable seed alignment point is determined from the centroid of the current correspondence list.
  • a list of all the corners detected by the original corner detection process is used to provide a sub-set (typically 20) of corners. The corners in this sub-set are ordered by distance from that seed alignment point for individual processing in that order.
  • the next unprocessed point in the ordered list is chosen.
  • a small patch, typically 11 x 11 pixels, centred on the point is extracted from the camera image at step 584.
  • a corresponding patch is resampled through the current projection transform from the alignment reference image stored in the view finder bundle for this reference image.
  • a Mipmap, as is known in the art, is built to improve the efficiency of this resampling.
  • each of these corner patches is compared by normalized cross correlation of the patch extracted from the camera image and corresponding patch from the reference alignment image. The position of the peak in each cross correlation reveals a shift required to align the patches. This shift is determined in step 590 and stored to memory in step 592. The shift is used to align feature correspondence at this position (in step 594).
  • the shift distances are accumulated at step 596 and each time the sum of shift distances passes a threshold (step 598), a new projection transform is produced in step 600 by a least squares fit of the aligned correspondences. Step 602 iterates back to step 582 if there are corner points yet to be processed.
  • a confidence measure is computed as the area of the convex hull of successfully aligned correspondences expressed as a fraction of the area of the camera image, multiplied by the number of successfully aligned correspondences.
  • the final projection, the confidence measure and the identified reference image (page) are returned as the result of the recognition and projection determination module's processing for this camera image.
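  • A sketch of the normalized cross-correlation used in the patch alignment above (steps 584 to 594), assuming the camera patch is compared against a slightly larger region resampled from the reference alignment image; the brute-force peak search shown here is for clarity only.

      import numpy as np

      def ncc(a, b):
          """Normalized cross-correlation of two equally sized patches."""
          a = a - a.mean()
          b = b - b.mean()
          denom = np.sqrt((a * a).sum() * (b * b).sum())
          return float((a * b).sum() / denom) if denom > 0 else 0.0

      def patch_shift(camera_patch, reference_region):
          """Find the (dy, dx) shift that best aligns a small camera patch within a
          larger patch resampled from the reference alignment image."""
          ph, pw = camera_patch.shape
          rh, rw = reference_region.shape
          best, best_shift = -2.0, (0, 0)
          for dy in range(rh - ph + 1):
              for dx in range(rw - pw + 1):
                  score = ncc(camera_patch, reference_region[dy:dy + ph, dx:dx + pw])
                  if score > best:
                      best, best_shift = score, (dy, dx)
          # Express the shift relative to the centred position of the patch.
          cy, cx = (rh - ph) // 2, (rw - pw) // 2
          return best_shift[0] - cy, best_shift[1] - cx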
  • Calculation of the final projection can be further refined using non-image information such as data from hardware sensors on the viewing device 100 including accelerometer(s) and/or gyroscope(s).
  • the relative movement of the viewing device 100 with respect to a previously established projection from a previous image frame may be provided by the accelerometer(s) and/or gyroscope(s) and used to compute, validate or refine the projection transform computed by view finder module 130 (see Figure 10).
  • the projection determination system described above is optimized to determine a precise projection onto one of a small set of active pages, typically less than 20 pages. However there is also a need to recognize when a page outside this active set is being viewed, and what page it is, without significant extra computational load. To achieve this, a composite page recognition pack may be produced from a set of view finder bundles 240; for example, all the pages of a single magazine. This process is illustrated in flowchart 606 of Figure 29.
  • the input 608 to the page recognition pack construction module is a set of view finder bundles, each tagged with a page identifier. All the features from the view finder bundles are tagged with their page identifier and pooled in step 610. Optionally, features from high resolution layers of the scale pyramid, and/or a pseudo random subset of all features are punctured (removed) to reduce the total number of features.
  • the pooled page recognition features are arranged into clusters with similar descriptors at step 612.
  • the clustering algorithm is parameterized by a maximum cluster radius. For each feature, all other features within this maximum cluster radius are listed against it. From this pool, repeatedly, the feature with the most other features listed against it is moved to a cluster list, and the other features listed against it are removed from the pool. The process stops when the pool is empty.
  • the cluster list now contains features and other features listed against each that are considered to be in the same cluster. For each cluster, if there are multiple features in the cluster with the same page identifier, all but one are discarded in step 614. Next the features are formed into a list with features belonging to the same cluster placed adjacent to each other. Each feature is "painted” with a cluster "color” which alternates from “black” to “white” each time there is a transition from one cluster to another in the list (the cluster color is simply implemented as a bit). The resulting list forms a page recognition pack at step 616.
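  • A sketch of the greedy clustering described above, assuming feature descriptors are fixed-length vectors compared by Euclidean distance; the brute-force neighbour listing is shown for clarity and would be replaced by an index for large pools.

      import numpy as np

      def greedy_cluster(descriptors, max_radius):
          """Greedy clustering by maximum cluster radius.

          For each feature, every other feature within max_radius is listed
          against it; the feature with the most listed neighbours is repeatedly
          promoted to a cluster (taking its neighbours with it and removing them
          from the pool) until the pool is empty. Returns lists of feature indices.
          """
          d = np.asarray(descriptors, dtype=np.float32)
          dists = np.linalg.norm(d[:, None, :] - d[None, :, :], axis=2)
          neighbours = {i: set(np.flatnonzero(dists[i] <= max_radius).tolist()) - {i}
                        for i in range(len(d))}
          pool = set(range(len(d)))
          clusters = []
          while pool:
              centre = max(pool, key=lambda i: len(neighbours[i] & pool))
              members = (neighbours[centre] & pool) | {centre}
              clusters.append(sorted(members))
              pool -= members
          return clusters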
  • the view finder module 130 described above may also be supplied a page recognition pack as input. If such a pack is provided, a page recognition descriptor to feature index is built in the same manner as described above (based on a k-d tree).
  • page recognition is performed according to the flowchart 618 illustrated in Figure 30.
  • the page recognition index is accessed as the input 620 to the process.
  • the RST bin sparse array (implemented as hash table 484 - see Figure 22) is cleared.
  • the features previously extracted in the view recognition process are looked up again in the page recognition descriptor to feature index 438 in step 624.
  • camera image features are compared to the features in the page recognition descriptor to feature index 438 for correspondence with any feature that is a member of a cluster. In this case, a correspondence is added to the RST bins of the hash table 484 for each combination of the camera feature and all potentially corresponding features in the cluster of reference page features (step 630).
  • in step 632, detection of peaks in the RST bins, and the discovery of a locally compatible projection (in step 634), proceeds as previously described. However, as soon as an acceptable local projection is discovered (in step 636), the process terminates with a page recognition and an approximate projection returned to the view finder module 130 (in step 640). The projection is not refined.
  • This process is designed to add minimal additional computation while achieving page recognition from a much larger set of candidate pages, and only incurring this extra computation when projection determination from the active page set fails.
  • Computation costs are reduced by re-using the corner detection and features extraction already performed, and matching features across multiple pages from a single k-d tree lookup.
  • contextual information is used to either select a subset of pages to test for RST transform or assign a priority weight to all the images which have at least one matched feature with a feature from the camera image. For example, pages from popular publications are selected or prioritized over other images during the construction of the RST bin array in the hash table 484.
  • the server system 20 holds the page server reference database 250.
  • the process of setting up the page server reference database 250 is preferably initiated offline, before any interaction with the viewing device 100 occurs. Subsequent updates to the page server reference database 250 may occur while the server system 20 is in operation.
  • one or more page recognition feature indexes, 200, and cover page recognition feature indexes, 270 are constructed.
  • the server system 20 processes two types of request. A match request 260 (i.e. query image recognition request) and content requests 290.
  • Figure 31 is a diagram showing the process of setting up or updating an initial page server database 250.
  • each magazine publisher provides data such as digital images of each page of a magazine and the magazine's bind edition details which may include title, issue date and region of circulation.
  • the data provided by the magazine publisher is then added to the page server reference database 250.
  • multiple images of each magazine page, at different known fixed poses or resolutions, may be stored in the page server reference database 250; however, only one image is typically stored.
  • Interactivity data (such as augmentation data 220, Figure 10) may also be provided by the publisher for some or all of the magazine's pages.
  • the flows beginning from steps 646 and 652 operate independently and may execute in parallel at any time after magazine data has been added to the page server reference database 250.
  • a view finder bundle 240 is generated for each reference magazine page 210 by the view finder image analysis module 180 (described in more detail above with reference to Figure 10). View finder bundles 240 allow fast and precise page tracking in the view finder module 130. At step 654, the view finder bundle 240 is saved to the page server reference database 250.
  • a duplicate detection process is executed on the reference pages 210.
  • the duplicate detection process is described in more detail below. This process detects and marks in the page server reference database 250, reference pages 210 that are extremely close matches to other reference pages 210. Only one exemplary version from each set of duplicate pages is preserved for the next stage of processing.
  • page recognition bundles 230 are calculated from the unique pages that resulted from the duplicate detection, and stored in the server database at step 650.
  • the page recognition analysis module 170 (see Figure 10) is described in more detail below.
  • a page recognition features index 200 is built at step 656.
  • This index is built from data stored in the page recognition bundles 230, and is retained in server memory for fast access. As they are only stored in memory, these indexes are rebuilt as necessary on server re-starts.
  • Each index is a k-d tree, as is known in the art, of all the included image descriptors, with references to their feature information. These indexes allow approximate nearest neighbor lookup of an image descriptor from a query image to discover closely matching image descriptors from the reference images 210; this occurs in the matching process, described in more detail below.
  • the reference database 250 may be deployed in different ways such as compact (on a single server) or distributed (on multiple servers), on the viewing device 100 itself, on the local network which the viewing device 100 is connected to, and on the Internet.
  • various indexing methods are employed. Database organization is improved using application domain information. For example, quicker operation is achieved overall by prioritizing access to popular magazines. If the page is the cover page of a magazine, a separate cover page recognition feature index 270 may be built to allow fast lookups restricted to the set of cover pages in the page server reference database 250.
  • each database entry (corresponding to a magazine page) comprises a page recognition bundle 230, a view finder bundle 240, the digital page image 210, the magazine publication details and other metadata such as the number of the page 235 in the specific publication 225, the placement of the page in a page spread or gatefold, and the like.
  • view finder reference image analysis is not performed and therefore view finder bundles 240 need not be stored in the reference database 250.
  • the viewing device 100 is preferably set up to interact with the page server reference database 250 through the installation of a viewer application 190 on the viewing device 100.
  • a custom viewing device 100 may be used which is pre-configured to interact with one or more specific reference databases 250.
  • Duplicate Page Detection: It is desirable to detect which pages of a large set of magazine page images 210 are near duplicates. Near-duplicate magazine page images frequently occur due to ads being run in multiple publications. A recognition process that views one of these ads may misattribute the ad to a publication other than the actual publication being viewed if no knowledge of potential duplication is available. Further, vote-splitting in the server page matching process (described below) will degrade matching performance of frames where multiple copies of the same reference image 210 are indexed.
  • a special duplicate detection module 360 is used to detect duplicates.
  • while the general image matching process used for match requests 260 could be used for this purpose, due to the restricted problem domain (very low rotation, scale and shift differences and very high similarity) a more efficient process can be used.
  • the duplicate page image detection method maintains a set of unique pages 658. That is, pages that are all known to be different from each other.
  • An index of descriptors 660 of these pages is maintained, in memory, during the duplicate detection process.
  • This index 660 allows the lookup of an unknown descriptor from a match request 260, along with its approximate location on the page, and returns closely matching descriptors from the previously indexed pages 662.
  • This index 660 is implemented as a set of kd-trees 664 (as is known in the art), each kd-tree 664 indexes descriptors within a limited tile 666 of page 662.
  • the tiles 666 are typically 30 by 30 pixels.
  • the duplicate page image detection method takes a new page image 668 of unknown status.
  • the new page image 668 is analyzed to discover stable feature points at step 670. This is accomplished with a simplified Harris corner detector as known in the art. This can be simplified to use small box filter approximations rather than the exact filter.
  • the new page image 668 is tessellated into 256 by 256 pixel tiles, and the twenty strongest feature detections in each 256 by 256 pixel tile are used.
  • a local image descriptor is extracted in step 674.
  • the local image descriptor is computed from sparsely sampled gradient strengths expressed as X and Y strengths. A 30 by 30 patch of pixels around the detected feature centre is considered. Fifty local image gradient vectors are sampled in a sparse checkerboard pattern within this area. The resulting 100 values are normalized to have length 256 by dividing all elements by the square root of the sum of the squares of the elements divided by 256. This results in a 100 byte local image descriptor.
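  • A sketch of the descriptor construction just described, assuming X and Y gradient images are available as NumPy arrays; the exact checkerboard sampling lattice used here is illustrative, but the normalization to length 256 follows the description above.

      import numpy as np

      def local_descriptor(grad_x, grad_y, cy, cx):
          """Build the 100-element local descriptor around a detected feature at (cy, cx)."""
          samples = []
          for dy in range(-15, 15, 3):
              for dx in range(-15, 15, 3):
                  if ((dy // 3) + (dx // 3)) % 2 == 0:   # sparse checkerboard: 50 sample points
                      samples.append(grad_x[cy + dy, cx + dx])
                      samples.append(grad_y[cy + dy, cx + dx])
          vec = np.array(samples, dtype=np.float32)      # 50 samples x (X, Y) = 100 values
          norm = np.sqrt((vec * vec).sum())
          if norm > 0:
              vec *= 256.0 / norm                        # i.e. divide by (||vec|| / 256)
          return vec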
  • Step 676 checks if the new page image 668 is a duplicate of an existing page 210 in the page server reference database 250.
  • Each descriptor in the new page image 668 causes a nearest neighbor search in the index of unique pages 660.
  • a nearest neighbor search is performed in the four kd-trees 664 closest to the descriptor centre.
  • each descriptor in the new page image 668 is paired with a candidate correspondence 486 (see Figure 22) in the page server reference database 250.
  • each new page image 668 with more than twenty candidate correspondences 486 is selected for a RANSAC fit, as known in the art, performed in step 680. This determines a projection transform between the new page image 668 and the candidate reference page 210. The best such fit, if any, is regarded as a candidate duplicate page.
  • in step 682, the original page image 662 corresponding to each candidate duplicate page is retrieved from the set of unique page images 658.
  • the new page image 668, subject to the discovered projection transform, is compared to the original page image 662 in step 684.
  • the mean and variance of the difference is determined, and if it is within acceptable tolerances, the candidate duplicate page is declared a duplicate.
  • otherwise, the page is merged into the unique page index 660 and the set of unique page images 658.
  • a camera image frame is compared (matched) with a reference image by comparing (matching) their respective image signatures.
  • the process of calculating an image signature is a common sub-operation that is shared between the PR analysis module 170 and the PageReco module 110 in the server 20 (see Figure 10).
  • An image signature is a plurality of image features each of which is computed from a unique interest point from the plurality of interest points detected in the image.
  • Each image feature comprises an n-dimensional vector (the image descriptor) plus image location, scale information and local orientation information related to the interest point used to compute it.
  • Such signatures can be computed using a variety of image signature computation methods.
  • One way to compute an image signature is using a variation of the SIFT (Scale Invariant Feature Transform) algorithm known in the art and described in US 6,711,293 to Lowe (hereinafter Lowe), the contents of which are incorporated herein by reference.
  • Given an image one can compute interest point locations in a scale pyramid, and descriptors at those interest point locations.
  • the system described here identifies feature points differently to the known SIFT. Referring to Figure 33 A, candidate feature points are identified by generating a scale pyramid 396 similar to that used in the corner detection process described above with reference to Figures 14 and 15.
  • the scale pyramid 396 is a series of scale images 394 that are successively blurred versions of a base image 688.
  • the base image 688 is the query image 109 (see Figure 7) incorporated into the match request 260 to the page server 20 (see Figure 10).
  • the scale pyramid will be referred to as a Gaussian Scale Space (GSS) 396 and each scale image in the stack as a level 394 in the GSS 396. At this point the method significantly diverges from Lowe.
  • Each level 394 in the GSS 396 is processed 694 to produce a set of gradient images 690 with a corresponding number of levels 394.
  • the gradient image at each level 394 consists of gradient vectors 916 that resolve into X and Y gradient strengths at each pixel location 692.
  • the processing 694 is a Scharr filter.
  • a set of squared, normalized, gradient difference images 700 are produced by processing 698 adjacent pairs of gradient images 690.
  • the processing 698 is an element-wise subtraction of the associated pairs of gradient images to obtain a gradient difference (DX, DY) at each point 692, calculating the squared magnitude of this gradient difference (DX² + DY²) at each point, normalizing this gradient difference magnitude by multiplying by the square of the sigma of the Gaussian blur at this level, and storing the resulting value at the corresponding point 702 in the set of squared, normalized, gradient difference images 700.
  • the set of squared, normalized, gradient difference images 700 are a stack of images in which each pixel is a squared normalized gradient difference magnitude. All the pixels in each of the images 700 are individually compared to pixels immediately surrounding it to search for maxima.
  • the pixel 702 in the squared, normalized, gradient difference image 704 is not only compared to the eight adjacent pixels 712 in the same squared, normalized, gradient difference image 704, but also the nine pixels 714 at the corresponding location in the squared, normalized, gradient difference image 706 beneath, and the nine pixels 710 in the squared, normalized, gradient difference image 708 above the image 704. If pixel 702 is greater than all twenty-six surrounding pixels 710, 712 and 714, it is a maxima.
  • Each maxima so identified is a candidate feature location.
  • the location is refined by fitting a local curve 696 in the same manner as Lowe, giving a location 692 in both GSS 396 and the Gaussian gradient space 690.
  • a set of feature points 716 in scale space are established that are characteristic of the base image 688.
  • This method requires only maxima to be identified, rather than maxima and minima as is described by Lowe. The skilled worker will appreciate the processing efficiencies this yields.
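  • A minimal sketch of the 26-neighbour maxima search over the stack of squared, normalized, gradient difference images, using SciPy's maximum filter; the stack layout (levels x height x width) is an assumption.

      import numpy as np
      from scipy.ndimage import maximum_filter

      def find_scale_space_maxima(diff_stack):
          """Return (level, y, x) locations that exceed all 26 neighbours in the
          3x3x3 neighbourhood spanning the level below, the same level and the
          level above."""
          stack = np.asarray(diff_stack, dtype=np.float32)
          footprint = np.ones((3, 3, 3), dtype=bool)
          footprint[1, 1, 1] = False                  # exclude the centre pixel itself
          neighbourhood_max = maximum_filter(stack, footprint=footprint,
                                             mode='constant', cval=-np.inf)
          maxima = stack > neighbourhood_max
          maxima[0, :, :] = False                     # outermost levels lack a full
          maxima[-1, :, :] = False                    # neighbourhood above and below
          return np.argwhere(maxima)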
  • Figure 34 is a flowchart 730 for generating a feature point descriptor in the form of an n-dimensional vector for each of the image features 716:
  • an n-dimensional histogram of normalized gradient orientations is created (where n is typically, but not necessarily, 128).
  • the histogram is normalized in magnitude for illumination invariance to form the image descriptor.
  • step 726 the image location and scale corresponding to the interest point 692 are appended to form the image feature.
  • the complete set of features 716 (each with descriptor, location, scale, angle and substrate reference) form the image signature and their stored representation is the page recognition bundle 230 for that reference page 210 (see Figure 10).
  • the PageReco Module 110
  • a server image match is performed upon receipt of a match request 260 from a viewing device 100 (see Figure 10).
  • Figure 35 is a flowchart 750 of the image match process within the server 20.
  • the match request 260 includes a query image 109 from within the camera field of view 14 (see Figure 7).
  • an image signature for the query image 109 is computed in the manner described above. This gives a list of query image features.
  • each query image descriptor is looked up in the page recognition feature index 200 (see Figure 10) to find a small (typically three) set of approximate nearest neighbor descriptors from the population of reference page descriptors.
  • the reference feature index is a k-d tree as is known in the art.
  • Each matched descriptor corresponds with a feature in the Gaussian scale space 396 (see Figure 33A) of a reference page 210, which gives an orientation (angle), scale and location.
  • each reference feature of each match is binned into a histogram with dimensions of page, angle, and scale.
  • in step 738, the best N (typically the top ten) peaks in the histogram are discovered. Each of these corresponds to a potential image match.
  • in step 740, for each potential image match, the set of feature matches that are compatible with the page, angle and scale is selected.
  • in step 742, a projection transform is fitted to each selected set of feature matches using RANSAC, as is known in the art.
  • each fit is scored. Many scoring mechanisms are possible. One used is the area of the convex hull of the features compatible with the projection transform fit (that is, the points selected by RANSAC) times the number of features.
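  • A minimal sketch of this convex-hull scoring, using SciPy's ConvexHull; the input is assumed to be the 2-D locations of the features selected as inliers by the RANSAC fit.

      import numpy as np
      from scipy.spatial import ConvexHull

      def score_fit(inlier_points):
          """Score a projection fit as the convex hull area of the compatible
          feature locations multiplied by the number of such features."""
          pts = np.asarray(inlier_points, dtype=np.float64)
          if len(pts) < 3:
              return 0.0
          hull_area = ConvexHull(pts).volume   # for 2-D points, .volume is the area
          return hull_area * len(pts)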
  • in step 746, all the fits so found are compared, and the best, or a small set of the best, is selected. It is possible that none is selected, which is returned as a failure to recognize. If more than one is selected, the result is returned as an ambiguous recognition; that is, the system cannot distinguish which of several reference pages this query image is viewing, possibly due to reference image similarity.
  • reference pages 210 that are the exemplary versions of sets of duplicate pages are expanded to the complete set of duplicates. That is, these are automatically ambiguous recognitions.
  • the page server 20 sends a content response 300 to the viewer device 100 in response to both match requests 260 and content requests 290.
  • Initiation of content response 300 may be made by the viewer device 100 or the server 20.
  • initiation of content response 300 is made by the viewer device 100 making a content request 290 to the server 20, or, an implied content request 290 with a match request 260.
  • Caching schemes as are known in the art, are employed in the network path from the viewing device 100 to the server 20 to reduce network traffic and speed delivery of content.
  • the terminology "sent to the viewing device” is shorthand for "requested by the viewing device and returned to the viewing device by the server” as used in the preferred embodiment.
  • direct server initiated transmission can also be used.
  • the match response 280 is communicated to the viewing device 100.
  • Recognition results may include the ID of the recognized reference page(s) 210 and associated publication(s) and the publisher(s), the view finder bundle 240 and the full image of the recognized reference page(s), as well as other data (listed below). If recognition is unsuccessful, notification is sent to the viewing device 100 to update its display accordingly. Typically, upon recognition failure, the user is asked to point to a page or cover page to initiate a new match.
  • in addition to the view finder bundle 240 and full images of the recognized page(s), the following may also be sent to the viewing device 100 as a result of a successful page server match: a) Interactivity definitions, each comprising one or more page extents describing the location of interactive features relative to the recognized page image, as well as all the interactive content for that feature including a graphical thumbnail of the associated interaction type.
  • An interactive feature is, for example, a hyperlink, a video, a form field etc.
  • the interactivity definition may include click-through menu options, smart clippings, etc.
  • a word index to support word-based operations such as dictionary or web search of a selected word associated with the interactive feature of the containing page or publication.
  • Network connection session data including page recognition module 110 status details, last page server match time duration, authentication tokens, etc. Such data may be used for quality, security and monitoring purposes.
  • Information regarding related pages, such as adjacent pages in a page spread or gatefold. Such data may be used to make decisions about pre-fetching nearby view finder bundles 240 to facilitate page tracking. Such information may be in the form of a set 434 of view finder bundles or a list of references to related pages, publications or specific interactive features.
  • view finder bundles 434 of pages likely to be viewed after the currently viewed page are preferably pre-fetched and sent together with the view finder bundle 240 of the recognized reference page 210 to the viewing device 100. If the page server match is a cover match, only the cover page's view finder bundle is sent to the viewing device 100. If the page server match is a page match, two or more view finder bundles are sent to the viewing device 100 corresponding to the publication page two-page spread or multi-page gatefold which includes the page recognized by the server match.
  • view finder bundles 434 of other pages may be sent to the viewing device 100 such as the N previous and M next pages of the publication relative to the currently viewed reference page 210, where M, N are integer numbers.
  • View finder bundles 240 of pages likely to be viewed in the future may also be selected and sent to the viewing device 100.
  • full page images of such nearby pages may also be pre-fetched, for example using the logic described above.
  • View finder bundles and full page images may be bundled together or sent separately for efficiency as the full page images are typically larger in data size and may take longer to transmit over a network.
  • the on-device availability of a set 434 of view finder bundles allows a viewing device 100 to track corresponding pages without having to wait for their full page (digital twin) images to be downloaded from the server 20.
  • the viewing device 100 may display a placeholder graphic or a magnified thumbnail when tracking pages for which a full page image is not yet available and only display a full resolution digital twin image when that becomes available.
  • a cache of most recently recognized reference pages 210 on the viewing device 100 may also be used to improve page recognition efficiency.
  • the viewing device 100 builds a small reference document database of documents already recognized. This is a speed optimization only, not required in order to achieve page recognition or page tracking.
  • the local (on-device) reference database is looked up as well. If a good match is obtained earlier it is used instead of the page server match for immediate feedback. The local match is validated or discarded when the result of the page server match is available to the viewing device 100.
  • the efficiency and accuracy of a magazine page recognition system can be improved by restricting the set of reference magazine pages 210 searched.
  • One method of restricting the reference page search space is by asking the user to bring the cover of a magazine into the field of view of the viewing device 100, and making a server match request 260 that uses the restricted cover page recognition feature index 270 (see Figure 10), thus initiating a page server match which, assuming it is successful, returns a cover page match response 280 which the user may be asked to confirm.
  • subsequent page server match requests 260 may search a subset of the page server reference database 250 comprised of the magazine pages of that particular magazine and all page covers.
  • the reference search space remains the same until a different cover page is recognized or if recognition fails.
  • the reference set may be made progressively larger until a good page match is found or the complete universal reference set has been exhausted. Page recognition accuracy can be further improved when recognition results from multiple image frames are combined.
  • image frames may be views of the same magazine page, different magazine pages or background images as the viewing device 100 moves. Examples of image frames that can be used in addition to a given image frame X include: a) The N image frames captured before frame X.
  • Recognition results of such supplementary images may be used to compute the match response 280 for frame X or to validate a recognition result for frame X.
  • Page recognition accuracy can also be improved in the case where two adjacent pages in the same publication are visible in the same image frame. Recognizing the existence of at least one of the two pages in the image is important information which assists recognition or validation of the recognition result of the other page. This observation extends to cases where more than two pages appear on the same image, for instance when a few magazine covers are imaged stacked onto each other, each of them partially visible. Another example is when a page corner is folded, revealing page content from the next page as well as the page after that. Multiple page recognition may be invoked during the recognition process (allowing for more than one page result per page) or post-recognition (allowing a single page result per page) to disambiguate a previous recognition result. In the latter case, recognizing features from an adjacent page helps to increase confidence in the single page result returned, or decrease it otherwise.
  • the system may use contextual information such as the history of past user interaction with the system, the environment of the user such as application preferences, the user's location, the user's personal details, application domain specific restrictions such as knowledge that some known documents may or may not exist at a particular time or place.
  • the involvement of such non-image based information may happen at different stages of the recognition process - for example, before matching, during query formation, during matching, affecting matching of image-based measurements, and after matching, to pick one of a set of possible recognition results and to "learn" knowledge for future operation of the system.
  • Recognizing a document via image recognition as described above may use other luminance based features in addition to, or instead of, the features described earlier.
  • Such other luminance features can be but are not limited to luminance measurements of the colour information in the document image, e.g. related to the colour content at a specified distance from other computed localized features (such as detected image corner features) in specified directions or min-max colour points of an image of a document page.
  • Barcodes: barcodes printed or otherwise displayed on the document, such as one-dimensional barcodes or two-dimensional QR codes, may be scanned by the viewing device 100 to identify the document.
  • Netpage tagging (for example as disclosed in US 8,028,925, the contents of which are herein incorporated by reference) or other continuous tagging covering part, or all, the document page 10 (see Figure 10) and decoded by the viewing device 100 may be used to uniquely identify a document.
  • Text uniquely identifying the document can be detected in an image of a part of a document page and matched against a database such as an inverted database of text in all documents known in an application context.
  • document recognition accuracy may be improved and latency reduced by performing off-axis text signature recognition and matching against an inverted database of text signatures.
  • An example of an off-axis signature of page text spanning multiple text lines is the first letter of the words of all text lines closest to an assumed vertical axis 752 across the viewed page 10. That is, the first letter of the left hand side word 754 closest the vertical axis 752, and the first letter of the right hand side word 756 closest to the axis 752.
  • Printed documents may be recognized by detecting a steganographic code such as a watermark added to the document in a user visible or invisible manner (see, for example, US 5,748,783 assigned to Digimarc).
  • the equivalent steganographic code may be implemented in ways such as a small section with a flashing code or a temporal code displayed interleaved between video frames.
  • Print or displayed documents may be recognized by detecting a characteristic image such as a logo or trademark added to the document.
  • a special graphic may be present at a particular place of each page of a publication with a special appearance which is used to identify the page, the publication or its publisher or other relevant information.
  • a viewing application trained to recognize such special graphics can then be used to retrieve the page or publication or publisher information thus achieving high confidence page recognition or at least reducing the reference search space to a small subset relevant to the recognized graphic.
  • the viewed page 10 may be recognized by detecting a unique physical property or fingerprint associated with the page.
  • Known methods for identifying pages using a physical property or fingerprint include: identifying a unique paper texture defining a unique fingerprint (see, for example, US 8,078,875);
  • a viewed substrate 10 such as a publication or page may be recognized using an electronic tag, such as an NFC (near field communication) tag or RFID (radio frequency identification) tag, associated with the substrate.
  • Electronic tags of this type advantageously identify substrates with 100% confidence, although they require a suitable tag reader and modification of the substrate.
  • Colour features may also be used for page recognition.
  • a major issue with using colour features in general purpose application domains is that of image (pixel) colour dependence on the colour temperature of the illumination sources, the ambient illumination such as shading from nearby objects, and the settings of the imaging device.
  • for image-based pages, one can generally assume that at least part of the typically white page background will be detectable in some of the images belonging to the same physical publication page. Given that assumption, white-balance normalization of the query publication page image can be performed before computing the query image descriptor, thus reducing or completely discarding the effect of the illumination source on the query image colours.
  • Measurements of colour information in a document page image can also enhance the distinctive power of locally computed image descriptors usable in a printed publication page recognition system, such as: a) the colour gamut of the page or a page region established around an interest point; b) the minimum and maximum colour of the page or a page region established around an interest point.
  • a typical interest point may, for example, be the corner of a page.
  • colour features may be used as image descriptors such as locally computed colour histograms, locally computed colour moments and measurements from uniformly coloured regions or edges in the image.
  • Such measurements can be appended to, or otherwise combined with grayscale luminance features in a recognition system to produce a richer description of the image content of the query and increase matching accuracy.
  • Colour information can be incorporated into image signatures by computing gradient orientation histograms at detected interest points where the gradient is computed in a specially selected colour space instead of the gray intensity.
  • the advantage of such colour features is that they can be easily combined with grayscale features in a recognition system to enhance recognition performance (an improvement of 7-8% is reported in some experiments).
  • the disadvantage is increased computation (three luminance bands instead of one) and storage cost (as the number of local features per page is generally higher than the number of grayscale features from the same image).
  • Colour image information in a printed publication page recognition system can also be used to detect the regions corresponding to text, where such regions will have two or in any case few distinct colour modes or clusters in a colour space.
  • the colour density function in page regions with sparkling graphics will generally have a more complex modal structure.
  • Colour information can further be used to order (rank) a set of possible matches determined by the page recognition algorithm in order to help select matches most similar to the query.
  • pages with the most similar colour content would appear more highly ranked than other pages.
  • Colour information can further prune the search space of colorful pages such as publication cover pages to a subset of pages with similar colour content.
  • a crude, whole- image, colour matching algorithm such as a colour histogram match or similar reduces the candidate match set significantly.
  • the match algorithm is as follows: a) compute colour descriptor (e.g. histogram) of all pages of interest and store it in a database
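  • Only the first step of this match algorithm survives in the text above; the sketch below fills in a plausible whole-image comparison using a normalized joint RGB histogram and histogram intersection, which are illustrative choices rather than the specified method.

      import numpy as np

      def colour_histogram(image_rgb, bins_per_channel=8):
          """Whole-image colour descriptor: a normalized joint RGB histogram."""
          pixels = image_rgb.reshape(-1, 3)
          hist, _ = np.histogramdd(pixels, bins=(bins_per_channel,) * 3,
                                   range=((0, 256),) * 3)
          return hist.ravel() / max(hist.sum(), 1.0)

      def prune_candidates(query_hist, page_histograms, keep=50):
          """Rank reference pages by histogram intersection with the query and
          keep only the most similar ones as the reduced candidate set."""
          scores = {page: float(np.minimum(query_hist, h).sum())
                    for page, h in page_histograms.items()}
          ranked = sorted(scores, key=scores.get, reverse=True)
          return ranked[:keep]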
  • Page Recognition Optimization: The above-described methods of discovering the viewed page 10 (see Figure 10) which best matches the image captured on the viewing device 100 can be potentially slow if the number of reference documents or reference pages 210 in the page server reference database 250 is large. In order to speed this process up, various optimization strategies can be used, either separately or in combination. Searching the most popularly referenced magazines first will tend to cause good matches to be found earlier for most searches. The determination of the most popular magazines can be done across the entire magazine database, or it can be refined by characteristics of the user, such as location, age, gender, interests, affiliations and so on.
  • the viewing device 100 may use such information to optimize searching.
  • the categories of magazines which have been purchased or read in the past by the particular user may also be used to determine the more likely magazines for the future. For example, if a user is known to have purchased or read one or more women's fashion magazine titles, other women's fashion magazines may be sorted for early searching. Similarly, if a user has accessed web sites or is a member of on-line groups related to science, then magazines focused on science can be sorted for early searching.
  • association rule learning can be used to predict the likelihood of the user referencing the latest edition of that title, and allow sorting of the search sequence to optimize the likelihood of a quick search result.
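  • A minimal sketch of such search ordering follows. The Title and UserProfile structures and the scoring weights are assumptions used only to illustrate sorting candidate publications by popularity, reading history and interests before the page match is attempted.

```python
from dataclasses import dataclass, field

@dataclass
class Title:
    name: str
    category: str            # e.g. "womens-fashion", "science"

@dataclass
class UserProfile:
    read_categories: set = field(default_factory=set)
    interests: set = field(default_factory=set)

def order_titles_for_search(titles, global_popularity, user: UserProfile):
    """Sort candidate magazine titles so the most likely ones are matched first.

    Weights are illustrative; a deployed system might learn them, e.g. via
    association rule mining over purchase and reading histories.
    """
    def score(title: Title) -> float:
        s = global_popularity.get(title.name, 0.0)     # overall popularity prior
        if title.category in user.read_categories:
            s += 2.0                                   # previously read categories
        if title.category in user.interests:
            s += 1.0                                   # declared/inferred interests
        return s
    return sorted(titles, key=score, reverse=True)
```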
  • page identity and/or relative pose of the viewing device 100 can be determined and tracked with the following techniques:
  • Tracking User Movement
  • A system where an image of the user's head (and/or other environmental features) is captured with a user-facing camera on the viewing device, and the image is used to extract image features which are in turn tracked to estimate the relative pose of the viewing device between two frames. Such images are typically consecutive or near consecutive frames from a captured video sequence. By estimating the relative viewing device pose change with respect to the largely static scene, an estimate can be made of the change in the camera view of a printed page viewed through a separate page-facing camera, thus improving tracking reliability (a sketch of this frame-to-frame estimation is given after the examples below).
  • the viewing device is a smartphone 100.
  • the user-facing camera tracks the user's head while the page-facing camera of the smartphone images a printed page 10.
  • the user's eye movement can be tracked for the same reason.
  • Other examples of image features that can be tracked are the visible edges and/or corners of the viewed page 10.
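  • The sketch below illustrates the frame-to-frame estimation described above for the user-facing camera: features of the largely static scene (the user's head and surroundings) are tracked between consecutive frames and a 2-D similarity transform is recovered as a coarse proxy for the relative pose change. OpenCV is assumed purely for illustration; it is not named by the description above.

```python
import cv2
import numpy as np

def relative_motion(prev_gray: np.ndarray, curr_gray: np.ndarray):
    """Estimate apparent scene motion between two consecutive user-facing frames.

    The returned 2x3 similarity transform (rotation, scale, translation) is a
    coarse proxy for the viewing-device pose change and can be used to predict
    how the view of the printed page through the page-facing camera has moved.
    Inputs are consecutive 8-bit grayscale frames.
    """
    corners = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                      qualityLevel=0.01, minDistance=8)
    if corners is None:
        return None
    moved, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                   corners, None)
    good_old = corners[status.ravel() == 1]
    good_new = moved[status.ravel() == 1]
    if len(good_new) < 4:
        return None
    transform, _inliers = cv2.estimateAffinePartial2D(good_old, good_new,
                                                      method=cv2.RANSAC)
    return transform
```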
  • Tracking Viewing Device Movement using Hardware Sensors
  • a viewing device 100 with an accelerometer, gyroscope or other similar hardware sensor provides motion data used to identify relative camera motion with respect to a given reference pose. For example, in one implementation the view finder module 130 described above is used to estimate an initial page identification and projection transform. The viewing device 100 position relative to the page 10 is then tracked using measurements from the device's hardware sensors until the view finder module 130 is again required to identify a new page 10 and projection transform. Information from such sensors may also be incorporated into the operation of the view finder module 130 and conductor module 160 as described previously in relation to Figure 12.
  • Tracking Substrate Properties
  • a substrate with a 2-D pattern of Netpage tags (machine readable encoded data), or other encoded tagging system, can be visually tracked by the viewing device 100 to establish the relative position and location of a viewed page 10.
  • Other printed tracking artifacts, for instance on personal documents or blank notebooks, can be tracked in the same manner.
  • unique page (paper and/or additive) texture can be tracked to establish a particular location within the document.
  • the server 20 shown in Figure 10 must provide data and services. As discussed above, the server 20 need not be a single server.
  • a server system 20 with several interconnected servers can provide the necessary data and services in a number of different locations and forms. Some embodiments with specific server and database separations are described below.
  • the server system can have the various server software modules in separate servers, such as having the view finder analysis module 180 and page recognition analysis module 170 in different servers.
  • the server system can divide the workload for each match request across several servers. For example, a search of 40,000 reference pages 210 can be performed by four separate servers that consider 10,000 reference pages each.
  • the server system 20 can employ both the alternatives above by distributing the server software modules across separate servers and sharing the processing workload between different sets of the servers.
  • Page match processing, performed by the page recognition module 110 as described above, is preferably performed by multiple servers. Multiple servers are used for two types of parallelism.
  • sub-sets of the page server reference database 250 are assigned, by rows, to multiple servers. Each such server creates a page recognition feature index 200 for its assigned row sub-set.
  • when a match request 260 is made to the server system, the request is broadcast to all page match servers, each of which attempts a match as described above and returns a set of potential matches as described above.
  • a single control server pools the returned matches and, by comparing their confidence and other match parameters, selects a set of match responses 280 to return to the viewing device 100 (a sketch of this fan-out and pooling is given below).
  • the size of the page recognition feature index 200 (which as described above resides in server memory) can be limited to each server's available memory.
  • each of the above page servers configured with a sub-set of page server reference database 250 rows is instantiated as multiple load-balancing servers as is known in the art.
  • Each page match request 260 is assigned to one instance of the pool of load balancing servers.
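  • The fan-out and pooling described above might be realised along the following lines; the shard interface (a match() method returning (page_id, confidence) pairs) is an assumption made only for the sake of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def match_page(match_request, shard_servers, max_responses=5):
    """Broadcast a match request 260 to every page-match server (each holding a
    row sub-set of the page server reference database 250), pool the candidates
    they return and keep the highest-confidence match responses 280."""
    with ThreadPoolExecutor(max_workers=max(1, len(shard_servers))) as pool:
        result_sets = pool.map(lambda shard: shard.match(match_request),
                               shard_servers)
    pooled = [candidate for results in result_sets for candidate in results]
    pooled.sort(key=lambda candidate: candidate[1], reverse=True)  # by confidence
    return pooled[:max_responses]
```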
  • Content requests 290 supply the viewer device 100 and other devices 340 with content from the page server reference database 250, the shared clippings database 330 and the media database 350.
  • Each of these data sources may be provided by a different server, and multiple load-balancing servers may be used for each of them.
  • Clipping requests 310 may be processed by a separate server.
  • one or more of the above servers may be located on the view device 100 itself.
  • a server may also be located in an electronic device embedded in the viewed substrate 10.
  • the viewed substrate 10 may be printed media that is self-describing by supplying row information of the page server reference database 250, or a sub-set of the row information.
  • the servers may also be located locally, thus not requiring Internet connection.
  • a combination of local and remote servers may be used. For example, some computers may be accessed via the Internet and others may be local, with these differently located computers working together to provide a single service function.
  • Access to any server computer is by simple network protocols, and may employ network tunneling to achieve better security.
  • augmentation 220 is information associated with a digital twin 107 that informs the viewer application 190 to enable interactive features relating to the tracking of that digital twin.
  • the viewing device 100 presents a view to the user which is based on the content 13
  • When a substrate 10 is matched to a reference page 210 by the server system 20, the viewer application 190 receives a copy of the reference page 210, shown here as a PDF file, and stores it in the viewer page information 131. At this time the viewer application 190 may examine the digital twin 107 and directly synthesize augmentation information based on the content.
  • One case of such automatically derived augmentation is a clip-able image region based on images within a PDF file.
  • Text may also include recognizable strings which cause augmentation such as hyperlinks to be synthesized. Numbers which are preceded by key strings such as "ring", "phone", "tel", etc. are identified as telephone numbers.
  • the telephone number to be called is augmented with supplied country or area codes which are not part of the visual content, but which become part of the detail of the related interactive content. They are highlighted on the display 105 (see Figure 9) of the viewing device 100 to indicate to the user that they are interactive content. When the user touches them, a message is displayed allowing the user to confirm their intention to initiate a telephone call to that number.
  • the viewer application 190 may automatically scan the content to identify URLs. These can be identified as text of the form "xxx.companyname.com" or "m.companyname.com", or similar strings with other country domain names, or strings commencing with the characters "http://" or "ftp://" or "mail:" etc. (a sketch of such detection is given below).
  • Some document formats which may be input to the authoring tool support the inclusion of URLs. For example, PDF documents can contain hyperlinks. Hyperlinks gathered by both of these methods are highlighted on the display of the viewing device 100 to indicate to the user that they are interactive content. When the user touches them the related web page is opened in a browser on the viewing device 100.
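  • The text scanning described above could be approximated with simple pattern matching, as in the sketch below. The regular expressions are illustrative assumptions rather than the actual patterns used; a production system would need a fuller URL grammar and locale-aware telephone parsing.

```python
import re

URL_PATTERN = re.compile(
    r'(?:https?://|ftp://)\S+'            # explicit scheme
    r'|(?:www|m)\.[\w.-]+\.[a-z]{2,}',    # bare hosts such as m.companyname.com
    re.IGNORECASE)

PHONE_PATTERN = re.compile(
    r'(?:ring|phone|tel)[:.\s]+([+\d][\d\s().-]{6,})', re.IGNORECASE)

def synthesize_hyperlinks(page_text: str):
    """Return (character_span, kind, target) triples that the viewer application
    could highlight as interactive content on the digital twin."""
    links = [(m.span(), 'url', m.group(0)) for m in URL_PATTERN.finditer(page_text)]
    links += [(m.span(1), 'tel', m.group(1).strip())
              for m in PHONE_PATTERN.finditer(page_text)]
    return links
```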
  • All of the augmentation information described above as being performed by the view device 100 may also be synthesized offline in a preparation step and stored in the augmentation 220 information in the page server reference database 250 and delivered to the viewer device 100 in response to a successful match request 260. This is a preferred method as it allows for a verification of the automatically synthesized augmentation 220 and relieves the viewer device 100 of the processing load.
  • Interactivity is typically defined in web pages using HTML or similar formats, or electronic documents using PDF and other file formats.
  • the interactivity in a digital twin 107 is defined as hyperlinks and other clickable elements with associated actions which are associated with graphic elements of the graphic content.
  • comments added to a PDF document may contain the information required to define the interactivity associated with graphic objects in the PDF file.
  • Specific formatting of visible graphic elements and/or specific content of visible graphic elements may be interpreted as an indication of an interactive action associated with the element or elements.
  • these graphic elements are interpreted and interactive content is created and associated with them.
  • a link to the Publisher's subscription site's URL can be added as interactive content associated with that icon.
  • All the above described augmentation 220 and more, as described below, may also be explicitly authored using an authoring tool and uploaded to the server 20.
  • Authoring tools provide a means for a content author to create interactive content and associate it with graphic data for use by the interactive viewing system 2.
  • the interactivity authoring tool in one embodiment, is a stand-alone tool which adds interactive content to an already prepared graphic layout.
  • the authoring tool may be either web or PC based.
  • the interactive content authoring function in another embodiment, is integrated into the general document authoring system which is used to prepare the graphic content of the document.
  • the interactive content is added as part of the process of combining the various sources of graphic content.
  • a Netpage pen (described in greater detail below) may be used to add graphic information, or control the operation of the process.
  • the interactive content is defined using one or more of the following methods:
  • the document edit history may be interpreted as instructions to the authoring tool to add interactive content.
  • the instructions to the authoring tool can be typed into the document and then deleted as a last step prior to processing, so that they no longer appear in the visible content but remain in the edit history where the authoring tool can find them.
  • when the user places the viewing device 100 over an interactive element 758 or page location that has been flagged as a video during page authoring, the video 760 will immediately start playing on the viewing device 100's display screen 105 at the location of that interactive element 758.
  • the authoring tool provides options to select the automatic playing of the video to be silent or with sound.
  • the interactive element 758 will be a still image 762 from the video 760, but any other identifiable graphic can be used. If the user moves the viewing device 100 the playing video 760 shifts on the screen 105 to follow the interactive element 758 within the view.
  • the position of the playing video 760 remains static with respect to the remainder of the digital twin 107, and appears static with respect to the viewed page 10, despite movement of the viewing device 100 over the page. If the interactive element 758 moves out of view, the playing video 760 moves off the screen 105.
  • the playback may continue, reset or pause, depending on user and publisher preferences, such that if the video interactive element 758 again comes into view, the time sequence of the video 760 may be observed to have continued, restarted or paused while the video was out of view.
  • if the user touches the playing video 760, the video will restart and switch to playing full screen with sound enabled. If the user touches the display screen 105 while the video 760 is playing full-screen, volume controls and time controls appear, allowing the adjustment of the playback volume and scrubbing to any place in the video. While in full screen playback mode, if the user moves the viewing device 100 away from over the still image 762 to a different location on the publication, the video 760 is interrupted and the camera video view or another digital twin 107 is again displayed. If the viewing device 100 is pointed away from the publication, the video 760 is not stopped, but can be stopped by a 'Done' button next to the volume control. On viewing devices 100 with adequate processing power, all interactive elements 758 flagged as videos shown on the display screen 105 play simultaneously. On viewing devices 100 with inadequate processors, only the video 760 associated with the interactive element 758 closest to the center of the display screen 105 is played.
  • a gallery interactive action is specifically associated with some interactive element 758 on the page 10 by the authoring tool to directly permit display of a gallery of images 764.
  • This enables display of a series of related photos, for example multiple views of the same object, scene or product, or views of related objects, scenes or products.
  • this interactive content is indicated by a graphic element, which may appear in the digital twin 107 only on the screen 105 of the viewing device 100, or may appear similarly (although not necessarily the same graphic element) on the viewed page 10 and the digital twin 107.
  • the viewing device 100 is moved over this interactive element 758, the viewing device 100 immediately starts to show in that location a 'slide show' of the gallery of images 764.
  • in response to a sliding gesture, the gallery of images 764 is temporarily advanced or backtracked by one or more images, with the physical length of the sliding gesture relating to the number of images skipped or backtracked.
  • the slide show 764 becomes full screen, using all the functionality of a full-function interactive picture gallery application.
  • the authoring tool provides options to permit the content author to select whether the slide show 764 automatically starts to play as the viewing device 100 is moved over the related interactive element 758, or whether the user must touch the interactive element on the digital twin 107 to start the playing.
  • One or more of the images in the gallery of images 764 may be a video or animated image, typically quite short, which will automatically play when that "image" is being shown.
  • An RSS (rich site summary) interactive action is specifically associated with some location or graphic element on the page by the authoring tool to directly permit display of an RSS feed.
  • This interactive content is indicated by a graphic element, which may appear only on the screen 105 of the viewing device 100, or may appear similarly (although not necessarily the same) on both the viewed page 10 and the digital twin 107.
  • the viewing device 100 is moved over this interactive element 758, the viewing device 100 immediately starts to show the RSS feed contents in that location in the digital twin 107.
  • the display of the RSS feed becomes full screen, using all the functionality of an RSS feed-capable browser.
  • the authoring tool provides options to permit the content author to select whether the RSS feed automatically starts to play as the viewing device 100 is moved over the related interactive element, or whether the user must touch the location on the screen 105 to start the playing.
  • a music interactive action is specifically associated with some location or graphic interactive element on the page by the authoring tool to directly permit playing of music or sound.
  • the content author can elect to have the associated sounds start to play automatically when the page location comes into the camera view, or a graphic interactive element can be shown, either on the digital twin 107, or on both the viewed page 10 and the digital twin 107 on the display screen 105 which the user must touch to play the sounds.
  • a volume controller appears to permit the user to control the replay volume, or a volume controller is added to the applications settings menu to permit volume control.
  • the sounds may cease to play when the location or graphic interactive element is no longer shown on the touchscreen 105 or it may continue to its conclusion unless interrupted by the user.
  • An animation interactive action is specifically associated with some location or graphic interactive element on the page by the authoring tool to directly permit display of an animation.
  • An animation consists of a set of visual objects (sprites) which follow motion paths.
  • the motion paths specify the object position, size, rotation, transparency and other characteristics as they change over time.
  • the motion paths can be preprogrammed, or can respond to user input.
  • Motion paths can also repeat, and be grouped hierarchically. This enables the definition of motions such as walk cycles.
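  • One possible data structure for such motion paths is sketched below: keyframed position, scale, rotation and opacity, with optional repetition and hierarchical grouping. The field names are illustrative assumptions only.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Keyframe:
    time: float          # seconds from the start of the path
    x: float
    y: float
    scale: float = 1.0
    rotation: float = 0.0
    opacity: float = 1.0

@dataclass
class MotionPath:
    keyframes: List[Keyframe]
    repeat: bool = False
    children: List["MotionPath"] = field(default_factory=list)  # e.g. limbs of a walk cycle

    def sample(self, t: float) -> Keyframe:
        """Linearly interpolate the sprite state at time t, wrapping if repeating."""
        duration = self.keyframes[-1].time
        if self.repeat and duration > 0:
            t %= duration
        t = min(max(t, self.keyframes[0].time), duration)
        for a, b in zip(self.keyframes, self.keyframes[1:]):
            if a.time <= t <= b.time:
                f = (t - a.time) / (b.time - a.time) if b.time > a.time else 0.0
                mix = lambda p, q: p + (q - p) * f
                return Keyframe(t, mix(a.x, b.x), mix(a.y, b.y), mix(a.scale, b.scale),
                                mix(a.rotation, b.rotation), mix(a.opacity, b.opacity))
        return self.keyframes[-1]
```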
  • the content author can use the authoring tool to select whether the animation will start automatically when the location or graphic interactive element comes into view on the screen 105, or whether this animation is associated with a graphic interactive element, which may appear only on the digital twin 107, or may appear similarly (although not necessarily the same) on both the viewed page 10 and the digital twin 107 shown on the screen 105 which must be touched by the user to start the animation.
  • the authoring tool supports position information of the animation which is relative to the page 10 or which is relative to the screen 105. When relative to the page 10, the viewing device 100 can be moved to make the animation disappear off the edge of the screen 105. When relative to the screen 105, the viewing application 190 (see Figure 10) instead provides methods for the user to terminate the animation.
  • the authoring tool provides an extendable library of 'clip animations' which can be accessed easily by the content author, and which can be added to from third party or other animation suppliers.
  • a reading interactive action is specifically associated with an interactive block or region of text by the authoring tool to directly permit automated reading and simultaneous highlight of the words in the related text as they are being read.
  • the spoken text is downloaded from the page server 20 as it is played. Simultaneously identification of the word being read at each point in time is also downloaded.
  • the viewing application uses this information to highlight the word as it is read, where the highlighting method is selected by the content author using the authoring tool.
  • the content author can use the authoring tool to select whether the reading will start automatically when the interactive block or region of text comes onto view on the screen 105, or whether the block of text must be touched by the user to start the reading. In this case, the author is expected to add a standard graphic to indicate to the user that the content is interactive in this way.
  • This type of augmented content is intended to be provided using a recording of the text read by a human reader. The authoring tool therefore uses voice recognition functionality to identify the spoken words, match them to the individual words in the related text and create the word timing data (a sketch of how this timing data drives the highlighting is given below). Alternatively, when no recorded voice is available, the authoring tool uses voice generation functionality to generate the voice track and the word timing information, but this is not the intended mode of operation (as it overlaps functionality provided within the viewing application 190).
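  • A minimal sketch of how the downloaded word-timing data could drive the highlighting follows; the WordTiming structure is an assumption about the form of that data.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class WordTiming:
    word: str
    start: float   # seconds into the voice track
    end: float

def word_to_highlight(timings: List[WordTiming],
                      playback_time: float) -> Optional[int]:
    """Index of the word to highlight at the current playback position, or None
    between words or once the reading has finished. The viewing application
    would call this on each display refresh while the voice track plays."""
    for index, timing in enumerate(timings):
        if timing.start <= playback_time < timing.end:
            return index
    return None
```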
  • a coupon interactive action is specifically associated with some location or graphic interactive element on the page 10 by the authoring tool to directly permit the transfer of stored value, or virtual coupon, from the publication's publisher to the user.
  • this interactive content is indicated by a graphic interactive element, which may appear only on the screen 105 of the viewer device 100, or may appear similarly (although not necessarily the same) on both the viewed page 10 and the display screen 105.
  • the page server 20 sends to the viewing device 100 a package of data which the page server 20 can subsequently recognize if it is returned by the viewing device 100 to the page server 20 during a shopping interaction.
  • the page server 20 may optionally request authorizing information such as a user's name, age, password, account details or any other information. If the information is already stored on the viewing device 100, the information is transmitted automatically. If it is not stored, the viewing application 190 will request it from the user. If the information does not meet the requirements of the page server 20, it may choose to refuse to supply the coupon.
  • the coupon may contain any information, encoded or plain, which the page server 20 can subsequently recognize. Each coupon has a name, which is viewable even for encrypted coupons, and an optional numeric value.
  • the viewing application 190 can display, under user control, a list or summary of stored coupons, including a total of their associated numeric values.
  • a shopping interactive action is specifically associated with an interactive element in the form of some location or graphic element on the page 10 by the authoring tool to directly permit efficient purchase operations to be initiated by the user.
  • this interactive content is indicated by a graphic element, which may appear only on the digital twin 107 shown on the display screen 105, or may appear similarly (although not necessarily the same) on both the viewed page 10 and the digital twin 107 shown on the display screen 105.
  • the viewing device 100 either activates a shopping application, which may be specific to the publication or may be generic, or opens a web page in a web browsing application.
  • information encoded in the specific interactive action is sent with the activation command to cause the shopping application or web page to optimize the purchase process.
  • This information may be the product code, a special offer code, the color or style variant of the specific product, or some other information.
  • information relating to the user or the user's viewing device 100 can also be sent. This information may be the user's name, location, brand or model of viewing device 100, subscription status for the publication, pre-selected general or specific preferences, prior purchases, account names or numbers, "certificates" for special offers or discounts, details of previously stored coupons, or any other commercially meaningful information which can be stored on the viewing device 100 or accessed by it. If a page server 20 accepts a coupon, it is the responsibility of the page server 20 to ensure that it is valid, and where appropriate, has not been already used. The page server 20 may instruct the viewer application 190 to destroy the coupon.
  • a download interactive action is specifically associated with a location or graphic element on the page 10 by the authoring tool to facilitate downloading of data to the user's viewing device 100.
  • the viewing device 100 starts download functionality to receive the data from the page server 20.
  • a forms interactive action is specifically associated with some location or graphic element on the page by the authoring tool to permit the user to fill in answers to on-screen questions which are then uploaded to the page server 20.
  • When the user touches the location or graphic element shown in the digital twin 107 on the screen 105, an on-screen keyboard with a layout appropriate to the type of data required by the question appears.
  • When the user completes the answer, or group of answers, the data is uploaded to the page server 20 where it is either stored or acted on.
  • the authoring tool permits the form to appear similarly on both the viewed page 10 and the display screen 105 such that the on-screen keyboard appears when the user touches the answer box for the form, or to appear only on the viewing device's 100 display.
  • the authoring tool also supports the supplied data being formatted as an email for transmission to a given email address. Depending upon previously saved user settings, user data may be pre-filled automatically.
  • a vote interactive action is specifically associated with an interactive element in the form of some location or graphic element on the page by the authoring tool to permit the user to quickly answer a multiple choice question and have all users' answers efficiently counted and displayed.
  • the authoring tool permits the author to separately design the format of the multiple choice question and the format of the counted answers which is displayed after the user answers the question.
  • a comment interactive action is specifically associated with an interactive element in the form of some location or graphic element on the page 10 by the authoring tool to permit the user to quickly add a short text comment to a comment stream which is maintained by the page server 20.
  • the authoring tool allows the content author to format the display of the comment stream on the viewing device 100.
  • the comment stream may be always visible, or its display in the digital twin 107 on the screen 105 may be activated by the user touching some graphic element on the screen 105 that has been inserted and enabled by the author using the authoring tool.
  • the authoring tool also supports creation of multi-layer content.
  • the multi-layers are alternate views of the viewed page 10 and provide alternate content.
  • the basic printed, viewed page 10 and first displayed image in the digital twin 107 may contain a Sudoku puzzle.
  • the next alternate layer in the digital twin 107 may contain extra numbers filled in on the puzzle.
  • Subsequent layers may contain progressively more filled in numbers, until the final layer displays the completed puzzle. This can be used for providing hints and answers to crosswords, quiz answers, children's puzzles such as findwords, find the difference, and so on.
  • the user touches a graphic element on the screen 105 to display the next layer, for example a "Hint" button in the case of games.
  • the display of subsequent layers can occur automatically after a selected time.
  • Subsequent layers can replace the underlying content, or can be "transparent" and overlay the existing content.
  • Selection of multiple layers can also be controlled based on other information available in the viewing device 100, such as the user's location, demographic parameters or subscription status.
  • the content author can elect to display the selected information automatically, or in place of the original layer.
  • These display options are selected by the content author using the authoring tool. This enables things such as geographically based advertising, for example showing the reader's nearest retail outlet.
  • Any layer can contain any of the types of interactive content described in this section. This contained interactive content can be the same or different from preceding layers.
  • the authoring tool also supports the creation of augmented reality scenes based on the view of the page 10, typically a magazine. It also allows launching of functions such as augmented reality based on other views from the view facing camera 102 (see Figure 7) or the user facing camera 108 of the viewing device 100 or other sources. Other functions launched by viewing a page 10 may include single user games, puzzles and multi-user games.
  • Reference images 210 of pages 10 are loaded onto the page server 20 (see Figure 10) using one or more of the following means.
  • Reference images 210 of pages 10 can be uploaded to the page server 20 directly using a computer network, or copied via a removable computer storage medium.
  • Printing or publishing companies generally use a pipeline of printing operations to create the print-ready images for publications from input supplied by graphic artists or other content suppliers.
  • reference images 210 of the pages 10 to be printed are diverted and uploaded to the page server 20, either automatically, semi-automatically or with manual intervention.
  • Reference images 210 are uploaded directly from a device driver which provides the interface of a printer driver, but which writes its output to the page server 20 either directly, over a network protocol, or other indirect method.
  • Reference images 210 are uploaded to the page server 20 under control of a web-based application.
  • Reference images 210 are uploaded to the page server 20 by a smart printer when that printer receives them for printing. They are either directly uploaded, or they are loaded onto some form of removable media on the printer and then transferred to the page server 20 using that removable media.
  • Reference images 210 are uploaded to the page server 20 by a smart copier when that copier scans them for printing by the copier, or when they are scanned for a specific upload function performed on the copier. They are either uploaded directly or they are loaded onto some form of removable media on the copier and then transferred to the page server 20 using that removable media.
  • Pre-printed documents or legacy documents without existing digital versions are scanned.
  • the scanned images become reference images 210 subsequently uploaded to the page server 20.
  • Typically, they have optical character recognition (OCR) processing performed on them before uploading, to assist with later operations on these images.
  • When the viewing device 100 is positioned such that interactive content becomes active, there are many options for indicating to the user that interactive content is available. Among these are: the appearance of an icon on the digital twin 107; visually enhancing the text of a hyperlink 117 (see Figure 9), for example by underlining, changing its color, changing its font, font style or font size, changing its text background, making it flash, cycling its appearance through a number of the above options, or changing some other aspect of its typesetting; in the case of a video link, playing the video in the location of the link in the viewing device 100; displaying an option in the menu bar of the viewing device 100, or other menu function of the viewing device 100; highlighting the interactive object or region with a static highlight, or with a dynamic highlight; playing an animation near, within, covering or surrounding the interactive object or region; playing a voice informing the user of sufficient detail of the interactive content to allow them to activate it, that voice being either synthesized or pre-recorded; or playing a voice recording of a question asking the user if they wish to activate the interactive content.
  • a user activates some interactive content it is usual, but not required, to give the user some feedback that the content has been activated.
  • This can be achieved in many ways, including: changing the execution context or visual context of the viewing device 100 to the newly activated context; playing an audible sound, such as a click or beep; enhancing the color or brightness of a region on the viewing device 100, for example, making it brighter, where that region may or may not be directly related to the interactive content; muting the color of non-selected items on the viewing device 100; in the case of a text hyperlink, modifying the text by changing its color, changing its background, changing its font or font size, making it flash, cycling its appearance through a number of the above options, or changing some other aspect of its typesetting; applying a geometric enhancement to a region of interest on the viewing device 100, for example, enlarging it, making it appear to move or shake, rotating it either in the plane of the view or through the plane of the view, or other transformation; or using a vibration or other haptic feedback.
  • interactive content includes specifications of regions within a digital twin 107 that have features associated with them or are pre-defined clipping regions (described below). Within a single page or a multi-page spread, these regions may overlap or be nested. While the user tracks their view over the publication, the application determines at most one region which would be the target for a pre-defined clip at each point in time, depending on the current view, and may give the user on-screen feedback of the bounds of this region.
  • One method of determining which region, based on the view, is to determine all regions that intersect the middle of the user's view. If there are multiple such regions, for each region, the minimum of the percentage of the region that is visible, and the percentage of the screen the region covers is calculated. The region with the largest value (i.e. the largest minimum) is chosen.
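  • The region selection rule just described translates directly into code; the sketch below assumes the regions and the current view are axis-aligned rectangles in the same page coordinate space.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def area(self) -> float:
        return max(0.0, self.right - self.left) * max(0.0, self.bottom - self.top)

    def intersection(self, other: "Rect") -> "Rect":
        return Rect(max(self.left, other.left), max(self.top, other.top),
                    min(self.right, other.right), min(self.bottom, other.bottom))

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom

def choose_clip_region(regions: List[Rect], view: Rect) -> Optional[Rect]:
    """Pick the single pre-defined clipping region that is the current clip target.

    Only regions containing the centre of the view are considered. Each is scored
    by the minimum of (fraction of the region that is visible) and (fraction of
    the view it covers); the region with the largest score is chosen.
    """
    centre_x = (view.left + view.right) / 2
    centre_y = (view.top + view.bottom) / 2
    best, best_score = None, 0.0
    for region in regions:
        if region.area() == 0 or not region.contains(centre_x, centre_y):
            continue
        overlap = region.intersection(view).area()
        score = min(overlap / region.area(), overlap / view.area())
        if score > best_score:
            best, best_score = region, score
    return best
```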
  • a bounding outline of the region is animated into position as an overlay graphic over the tracked digital twin 107.
  • the introduction of this overlay may be delayed until the user's view is seen to have low motion, that is, they are lingering over a part of the content.
  • the viewing device 100 for the disclosed interactive viewing system 2 is required to display interactive digital content 103 to the user.
  • the viewing device 100 usually enables user-interactivity with displayed digital content and may, for example, comprise a touchscreen 105 or control buttons for interacting with the displayed digital content.
  • the viewing device 100 performs the three functions of (i) sensing the substrate 10 via the camera 102; (ii) retrieving corresponding display data 103; and (iii) displaying the interactive digital content 116 via the smartphone's touchscreen 105.
  • the viewing device 100 may integrate these three functions. Alternatively, the viewing device 100 may only perform the function of displaying the digital content 116, optionally in combination with the function of retrieving the display data 103. If the viewing device 100 is not configured for sensing the viewed substrate 10 (typically the case for non-handheld viewing devices), then the viewing device 100 may be connected to a suitable device for sensing the substrate 10. The sensing device may be connected to the viewing device 100 via a wired or wireless connection.
  • this function may be performed by a separate device.
  • an internet- enabled computer connected to the viewing device 100 may receive image data from the sensing device and retrieve the display data which is communicated to the viewing device 100.
  • the sensing device may perform the dual functions of sensing the substrate 10 and retrieving the display data.
  • the smartphone 100 shown in Figure 7 is a type of handheld viewing device 100 comprising a mobile telephone transceiver, a view-facing camera 102, a processor and a display, which is usually a touchscreen 105.
  • Smartphones may run one or more applications which configure the smartphone to operate in a particular way.
  • Many apps are currently available for smartphones which turn the user's smartphone into a tool far beyond a simple communications device.
  • the interactive viewer system 2 described herein is intended to be accessible to users having a corresponding viewer application 190 running on their smartphones (see Figure 10).
  • Smartphones may incorporate one or more devices such as a GPS receiver, a device for connecting to a local area network (e.g. Wi-Fi device), a device for connecting to a personal area network (e.g. Bluetooth device), a near-field sensor, RFID sensor, barcode reader, a user-facing camera, accelerometer(s), gyroscope(s), temperature sensor, magnetic field sensor, pressure sensor, chemical sensor etc.
  • a tablet computer is an alternative type of viewing device 100, which may be used in the interactive viewer system 2 described herein.
  • a tablet computer is functionally equivalent to the smartphone and may have the same or similar features. However, a tablet computer typically has a larger screen.
  • Figure 39 shows an interaction between a tablet computer 766 and the substrate 10. The tablet computer 766 displays rendered digital content to the user in a similar way to the smartphone 100.
  • a notebook or laptop computer is an alternative type of viewing device 100, which may be used in the interactive viewing system described herein.
  • a laptop computer has a hinged display screen and keyboard.
  • Many laptop computers comprise an integral user-facing camera.
  • a sensing device may be connected to the laptop computer via a wired or wireless connection.
  • a sensing device 808 in the form of a mouse or a puck 768, which is in wireless communication with a notebook computer 111.
  • Other types of suitable sensing devices will be described in more detail herein.
  • the mouse 768 comprises a high-resolution camera 102 for imaging the substrate 10 when held at a suitable height above the substrate. Hence, the mouse 768 can capture an image 14 of the substrate 10 in the same way as the smartphone 100.
  • the mouse 768 communicates the captured image data to the notebook computer 111, which is itself in communication with the page server 20 via the Internet. Accordingly, the notebook computer 111 retrieves digital display data corresponding to the printed content of the substrate.
  • the retrieved display data is rendered as digital content on a display screen of the notebook computer 111.
  • the rendered digital content may be, for example, a virtual or augmented reality view of the printed content with interactive options available to the user.
  • a desktop computer 112 with monitor 114 is an alternative type of viewing device 100, which may be used in the interactive viewing system described herein.
  • the desktop computer with monitor performs an equivalent function to the notebook computer 111 described above.
  • the desktop computer 112 receives image data from a sensing device 808 in the form of a wired or wireless mouse 768 and sends corresponding interaction data to the page server 20 via the Internet. Display data retrieved by the desktop computer 112 may be rendered to the monitor 114 as digital content with optional interactive functions, as described above.
  • a TV display 115 is an alternative type of viewing device 100, which may be used in the interactive viewer system 2 described herein.
  • the TV display 115 preferably has Internet connectivity enabling communication with the page server 20.
  • the Internet connectivity may be integrated into the TV display or the TV display may be connected to a suitable computer system via a wired or wireless connection.
  • the TV display 115 functions similarly to the notebook computer 111 described above.
  • the internet-enabled TV display 115 receives image data from a sensing device 808 in the form of a wired or wireless mouse 768 and sends corresponding interaction data to the page server 20 via the Internet. Retrieved display data is rendered to the TV display 115 as digital content with optional interactive functions, as described above.
  • a head-mounted display is an alternative type of viewing device 100, which may be used in the interactive viewer system described herein.
  • Figure 43A is a perspective view showing a user 902 viewing a page 10 through a head-mounted display (HMD) 113.
  • Figure 43B is a schematic plan view of the user 902 observing the field of view 910 through the HMD 113.
  • Figure 43C shows the user's field of view 910, including the display screen 105.
  • the HMD usually takes the form of a helmet or goggles worn by the user 902.
  • a centrally mounted camera 102 provides the sensing device 808 for the HMD 113.
  • the camera 102 captures a video feed of the user's field of view 910 (which includes the page 10).
  • a small display screen 105 is placed in front of either or both of the user's eyes and displays a computer-generated, virtual reality view or augmented reality view 912 to the user 902.
  • Virtual reality substitutes the user's view of the physical environment with an artificial, or virtual view 912.
  • the user 902 may experience augmented reality 912 by adding virtual imagery 914 to the user's actual view 910 of the physical environment.
  • Augmented reality typically relies on either a see-through HMD or a video-based HMD.
  • a video-based HMD 113 uses video of the user's field of view 910, augments it with virtual imagery 914, and redisplays it for the user's eyes.
  • a see-through HMD 113 optically combines virtual imagery 914 with the user's actual field of view 910.
  • a HMD 113 may be used as a virtual reality viewing device or an augmented reality viewing device 100.
  • the user views the substrate 10 via a see-through HMD 113.
  • the HMD 113 has an integral image sensor 102 which captures images of the user's field of view 910.
  • the network interface 120 communicates corresponding interaction data 101 to the page server 20 via the Internet.
  • the network interface 120 need not be integrated into the goggles 113.
  • a communication link 904 to a mobile device 900 such as a notebook computer or mobile phone acting as a relay can provide a more powerful processor 106 for the viewer application 190.
  • the HMD 113 then retrieves display data 103 and displays digital content 116 to the user.
  • This rendered digital content 116 replaces the user's actual field of view 910 so that the user sees only the virtual imagery 914. Replacement of the user's actual field of view 910 with the rendered digital content 116 may be imperceptible to the user.
  • the user views the substrate 10 via an opaque HMD 113 which displays images captured by the camera 102.
  • the opaque HMD 113 renders and displays digital content 116 corresponding to the content in the camera field of view 14. Again, replacement of the camera field of view 14 with the rendered digital content 116 may be imperceptible to the user.
  • Figure 43C shows an augmented reality experience.
  • the user views the substrate 10 via a see-through HMD 113.
  • the user's field of view is then augmented with virtual imagery 914 displayed on the see-through HMD.
  • This virtual imagery 914 comprises rendered digital content 116 augmenting the user's actual field of view 910 to provide an augmented reality view 912.
  • This virtual imagery 914 augmenting the user's field of view is generated from display data 103 retrieved from the page server 20 by the viewing device 100, as described above in connection with Figure 8.
  • the user views the substrate 10 via an opaque HMD 113 which displays images captured by the camera 102.
  • the opaque HMD 113 renders and displays digital content, which augments the content in the camera field of view 14 displayed to the user.
  • This virtual imagery 914 augmenting the camera field of view 14 is generated from display data 103 retrieved by the viewing device 100, as described above in connection with Figure 8.
  • the digital twin 107 is displayed in response to recognition of the page 10 in the field of view 910.
  • the digital twin 107 remains anchored to the display screen 105 as long as the page is in view (that is, it is sensitive to the presence of the page 10 in the field of view 910 but does not track its position in detail).
  • a region 906 described in the augmentation 220 of the digital twin 107 may be selected and shown with a highlight, and become the subject of further commands 828 from the user.
  • Methods of selecting and changing the selected region include the user bringing the page area to the centre of augmented reality view 912, moving a pointer on the screen by touching a pointing stick (isometric joystick) on the frame of the HMD, and moving a finger over the surface of the mobile device 900.
  • Further commands such as clip and share 828, or select a different region can be invoked by the user by methods including voice recognition of simple commands and moving a pointer as described above.
  • the viewing device 100 may be in the form of a dedicated document viewer 118 as shown in Figure 44.
  • the user has a device 118 specifically adapted for interacting with substrates 10 (see Figure 8).
  • the device 118 may even be specifically adapted for interaction with a certain type of substrate, for example a particular magazine, newspaper, board game etc.
  • Such a device has the disadvantage that users are required to own a separate device which is not a ubiquitous smartphone.
  • a magazine publisher may offer a dedicated document viewer as part of a magazine subscription or as a free give-away.
  • the dedicated document viewer 118 may then serve as an incentive for users to purchase a particular magazine if it provides the user with interactivity not offered by competitor magazines.
  • the memory in the dedicated document viewer 118 can have an index corresponding to a number of magazines published by the same magazine publisher. This index may be updated by the user (e.g. via an Internet download) when a new magazine is published. Therefore, the dedicated document viewer 118 may not be reliant on a high-speed Internet connection for page recognition. This reduces latency and potentially provides an improved user experience of interactivity.
  • the dedicated document viewer 118 may communicate with a remote page server 20 for page recognition.
  • the accuracy of page recognition is optimized because the document viewer 118 may be configured to communicate with one specific database known to the document viewer.
  • the dedicated document viewer 118 may have other features common to smartphones. However, it will be appreciated that the marketing potential of a dedicated document viewer 118 is leveraged by keeping its cost to a minimum. Therefore, the dedicated document viewer 118 may have a minimum number of components necessary for interactive viewing, e.g. a display screen 105 (typically a touchscreen), a processing system, a sensing device and a memory.
  • the dedicated document viewer 118 may be branded 770 to show its association with a particular magazine title.
  • Handheld Game Console
  • the viewing device 100 may be in the form of a handheld game console 135 as shown in Figure 45.
  • a handheld games console usually comprises a display screen 105 and various control keys 772 specifically configured for playing games.
  • some handheld game consoles have incorporated many of the features of smartphones, such as wireless internet connectivity, high-resolution camera, mobile phone transceiver and so on. Such hybrid devices are well suited for use as the viewing device 100 in the interactive viewer system 2 described herein.
  • the handheld game console 135 shown in Figure 45 may be used in place of the smartphone shown in Figure 7.
  • the handheld game console 135 is typically equipped with a suitable sensing device 808 such as a high-resolution camera 102 for interacting with various substrates 10.
  • a particularly advantageous use of handheld game consoles 135 is the ability to download games thereto after recognition of a suitable substrate 10.
  • a games merchant may advertise a new game in a magazine advertisement 774 and a user may be provided with access to this new game by interacting with the advertisement 774 in a manner described above in relation to Figures 6 to 9.
  • the game downloaded to the game console 135 is simply another type of display data 103 which is retrievable by the user's viewing device 100, in this case, the game console 135.
  • the user may be provided with, for example, temporary access to the game, access to a trial version of the game or access to the game only whilst the game console 135 has the advertisement 774 within its field of view 14 (e.g. as an augmented or virtual reality experience of the game in the context of a magazine page 10).
  • the user may be provided with a "Buy Now" hyperlink 778 so that the game can be immediately purchased via a download to the game console.
  • any content may enable users to play games via the viewing device 100.
  • An advertiser may provide games to users as a means for appealing to a particular demographic.
  • a soft drinks advertiser may have printed advertisements with interactive games that are accessible to handheld games consoles 135.
  • a user may be able to play a game connected with the advertiser's product, and this potentially serves as a powerful advertising tool.
  • the other viewing devices 100 described herein, such as smartphones and tablet computers, may enable similar game-playing functionality in connection with a printed substrate 10.
  • the viewing device 100 may be in the form of a media player 136 as shown in Figure 46.
  • the term media player encompasses devices such as smartphones and tablet computers which have media playback capabilities.
  • the viewing device 100 may be any form of media player 136 provided that it has a suitable display screen 105 (e.g. touchscreen) for rendering the digital content 103 (see Figure 8) to the user.
  • Some examples of media players include MP3 players, digital photo frames, portable video consoles, e-book readers etc.
  • media players are equipped with a high-resolution camera 102 and wireless connectivity so that they are highly suited for use as the viewing device 100 in the interactive viewer system 2 described herein.
  • printed advertisements 774 may be tailored to users of certain media players 136, as well as more ubiquitous tablet and smartphone users. For example, when an advertisement 774 for a new film on a magazine page 10 is in the camera view 14, the user may view a film clip on the screen 105 with the option of "Buy Film" as a hyperlink 778. Likewise, a printed advertisement for a music album may enable a user to listen to a music clip with the option of "Buy Album" as a hyperlink 782 displayed. Or a printed advertisement for a new book may enable a user to preview pages from the book with the option of "Buy Book" displayed as a hyperlink 780.
  • the media player 136 is typically equipped with a touchscreen 105 and wireless connectivity.
  • the inbuilt sensing device 808 is a high-resolution camera 102 for interacting with substrates 10 in an analogous manner to the smartphone 100 shown in Figure 7.
  • Projection Display
  • the viewing device 100 may use a projection display rather than an electronic display screen as previously described.
  • a sensing device 808 in the form of the mouse 768 with image sensor 102.
  • the mouse 768 is in wireless communication with a notebook computer 111, which receives image data, retrieves corresponding display data 103 from a remote page server 20 and renders digital content (e.g. an augmented or virtual reality display) to the user via the integral display screen of the notebook computer 111.
  • the notebook computer 111 may be connected to a projector 145, which displays the rendered digital content onto a passive surface (not shown). Accordingly, a projected virtual or augmented reality display is viewable by the user or a wider audience.
  • the projector 145 may incorporate suitable means for receiving the image data from the imaging mouse 768 and retrieving the display data without the need for a separate computer, such as the notebook computer 111.
  • the displayed digital content rendered from retrieved display data may be displayed in a number of different ways.
  • the viewing device 100 displays rendered digital content to the user as an augmented reality display of a live video image.
  • the user views a live video image 784 of the substrate 10 augmented with computer-generated content 786, based on retrieved display data 103 (see Figure 8).
  • the computer- generated digital content 786 is graphics (still graphics or video graphics 760) rendered to the display screen 105 of the viewing device 100 based on the display data 103.
  • the digital content 786 may alternatively or additionally include other media, such as audio.
  • the digital content 786 providing the experience of augmented reality is associated with a particular location or page element 758 of the substrate 10. Therefore, the digital content 786 may change as the user moves the sensing device (typically integrated with the viewing device 100) across the substrate 10.
  • the live video image of the page may be augmented with readers' comments or a box indicating a number of readers' comments associated with a particular article. These comments may be derived from an online version of the same article, previous users viewing the printed article via their viewing devices 100 or both. Since the comments box is associated with a particular article, the box will appear and disappear from the display screen 105 depending on whether the sensing device 100 has a particular article within its field of view.
  • a video may play when the sensing device 100 has a particular zone within its field of view.
  • Video playback may be initiated automatically by the viewing device 100 or in response to user interaction with a video playback icon 104 (see Figure 9) displayed to the user.
  • a user may be able to play a game via the digital content augmenting the live video image.
  • the digital content providing augmented reality is typically interactive via the viewing device 100, for example via the touchscreen 105 of a tablet computer 766 (see Figure 39), via the control buttons of a handheld games console and so on.
  • the user may have the option of reading comments associated with a particular article by tapping on the comments box. Further, the user may be provided with the option of leaving his or her comment so that it is then associated with the printed article (and/or corresponding online article) on subsequent viewing.
  • the interactive viewing system 2 can provide a printed publication with a similar degree of interactivity to an online publication via digital content augmenting a live video image.
  • the viewing device 100 displays rendered digital content 786 to the user as a virtual reality display in real-time.
  • Virtual reality display is reliant on accurate page tracking.
  • the user first views a live video image of a page 10 via the display screen of the viewing device 100, which may be the smartphone 100 incorporating the camera 102.
  • the displayed live video image 784 is replaced with retrieved digital content 786 providing a virtual reality display of the portion of the page 10 within the camera's field of view. Movement of the viewing device 100 relative to the page 10 is tracked via the captured live video image 784 so that the displayed digital content maintains the experience of virtual reality for the user.
  • the virtual reality display may be an orthogonal virtual reality (described below) or a perspective virtual reality.
  • Orthogonal virtual reality displays render digital content to the user based on the assumption that an optical axis of the sensing device 100 is orthogonal to the plane of the viewed page 10.
  • the appearance of the digital content displayed on the screen 105 of the viewing device 100 does not change even when the camera 102 or other sensing device is tilted relative to the page 10 - the rendered digital content 116 always appears in the same plane as the display screen 105 (see Figure 9).
• Orthogonal virtual reality has the advantage of being computationally relatively simple to implement, although it has the disadvantage of being less realistic from the user's perspective when the viewing device 100 is tilted relative to a viewed substrate 10 such as a printed page.
  • the user interface of the viewing device 100 may prompt or encourage the user to hold the viewing device 100 so that the plane of the display screen 105 is parallel with the plane of the page 10. Not only does this assist in capturing higher quality images for page recognition and/or page tracking, but it also maintains a more realistic experience for the user when orthogonal virtual reality is implemented.
  • Perspective virtual reality displays rendered digital content 116 to the user taking into account the pose of the viewing device 100 relative to the viewed page 10, as shown in Figure 49.
  • the rendered digital content 116 is displayed to the user as a keystoned projection depending on the relative pose of the viewing device 100.
• This provides a more realistic virtual reality experience to the user, because the display screen 105 appears, from the user's perspective, to act as a transparent viewport onto the viewed page 10 regardless of its pose.
• In order to calculate the displayed projection of the digital content, the viewing device 100 must determine (or at least estimate) both the page-device pose and the user-device pose.
• the page-device pose may be determined by comparing features in the live video image with features in the reference image 210 of the viewed page 10 (see Figure 10) - keystoned features in the live video enable a projection transform and the device-page pose to be calculated.
• perspective virtual reality assumes that the user's eyes Pe are positioned on a normal N to the display screen 105 in order to estimate the user-device pose.
• a user-facing camera of the viewing device 100 may be used to determine the position of the user's eyes Pe (or head) relative to the display screen 105.
• In this way, a more accurate user-device pose may be available to calculate the displayed projection of the digital content 116.
  • the viewing device 100 is tilted relative to the substrate 10 and a printed graphic element 788 on the substrate 10 is rendered to the display screen 105 as a projected image 790 in accordance with the estimated device-page and user-device poses.
  • US Publication No. 2011/0292198 describes perspective virtual reality projections in more detail when using a smartphone as the viewing device 100.
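By way of illustration only, the sketch below (not taken from the referenced publication, and not the patented implementation) shows how a device-page projection transform might be estimated from matched reference and live image points, and how rendered digital content could be warped into the keystoned projection described above. It assumes OpenCV and NumPy are available; all function and parameter names are illustrative.

```python
# Illustrative sketch only: estimating a device-page projection transform from
# matched feature points and warping rendered digital content so that it appears
# to lie in the plane of the viewed page. Assumes OpenCV and NumPy.
import numpy as np
import cv2

def estimate_device_page_transform(ref_points, live_points):
    # ref_points: feature locations on the reference page image;
    # live_points: their matched locations in the live video frame (N x 2, N >= 4).
    H, _ = cv2.findHomography(np.float32(ref_points), np.float32(live_points), cv2.RANSAC)
    return H

def render_keystoned(content_image, H, screen_width, screen_height):
    # Warp the rendered digital content into the keystoned projection implied by
    # the estimated device-page pose, for display on the viewing device screen.
    return cv2.warpPerspective(content_image, H, (screen_width, screen_height))
```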
• the viewing device 100 may provide a seamless transition between a display of the live video image and a virtual reality display of the digital content 116. This may be achieved by gradually fading out the live video image display 784 (see Figure 48) and gradually fading in the virtual reality display so that the change is virtually imperceptible to the user. From the user's perspective, the displayed content on the viewing device 100 appears to be identical regardless of whether the viewing device 100 is displaying the live video image 784 (having no interactive functionality) or the rendered digital content 116 (having interactive functionality). The user may receive an audible or visual indication via the viewing device 100 to indicate that the interactive digital content 116 is being displayed, without an obvious 'jerky' transition between the two. This smooth experience for the user is achievable via the accurate page tracking methods, which include an accurate determination of the orientation or pose of the viewing device 100 relative to the page 10.
  • Figure 9 shows an example of a virtual reality display augmented with interactive content.
• a simple virtual reality display of rendered digital content 116 inherently provides the user with a richer experience of a printed page 10 or other substrate compared with simply reading the printed page or viewing a live video image of the page 10.
• the user has the option of magnifying the digital content 116 via a simple touchscreen interaction; sharing the displayed digital content or at least a portion thereof; performing a search (e.g. Google, Wikipedia etc.) on one or more keywords in the displayed digital content, etc. All of these options, as well as many others, are facilitated by viewing rendered digital content 116 on the viewing device 100.
  • the interactive viewing system 2 typically has greatest value to the user when the virtual reality display is combined with augmented reality.
  • the virtual reality display is augmented with additional interactive content.
• the virtual reality display of the term "200+ Race Cars" is augmented with a hyperlink 117, and the graphic of the race car is augmented with a video playback icon 104 inviting the user to play a video relating to this graphic.
  • the combination of virtual and augmented reality provides the user with a plethora of interactive options.
  • the viewer application 190 is also configured to display another type of content augmentation.
  • a region of a substrate 10 may be associated with an overlay augmentation 119 as shown in Figures 50A and 50B.
• An overlay augmentation 119 has a stable location with respect to the viewing device screen 105, but its presence or absence depends on the region the user is viewing and their viewing behavior.
• This overlay augmentation 119 may include simple static elements, dynamic elements (animation or video) and/or interactive elements.
• an overlay augmentation 119 is implemented as HTML graphical elements.
  • the augmentation is animated onto the device's screen 105.
  • the overlay augmentation 119 obscures only a fraction of the other displayed element (such as the tracked digital twin 107 with its augmentation) and may be partially transparent.
• “Lingering” is typically determined by the centre of the user's view (i.e. the screen 105) remaining within the region for at least a third of a second while the movement of the viewing device 100 relative to the page 10 is low.
• Other methods of triggering an overlay augmentation 119 are possible. In particular, sudden removal of the viewing device 100 from viewing of the substrate 10 may trigger or maintain an overlay augmentation 119.
• the overlay augmentation 119 remains static with respect to the device screen 105, even though the user may be moving the view within the sensitive region and the digital twin 107 is tracking this movement (see Figure 50B).
  • the animation introducing an augmentation overlay 119 may be a brief fade-in, although other methods can be used.
• the overlay augmentation 119 may animate as a transition from specific elements in the digital twin 107 into an overlay augmentation 119 with stable screen position.
  • an invitation to buy a product may have a button in the digital twin 107. Once it is detected that the user is lingering over the region, the button may graphically transition from being part of the digital twin 107 augmentation, to an overlay augmentation 119, and the effect may reverse when the user moves the viewing device 100 in such a way that the augmentation overlay 119 is not displayed.
• a second example is a picture that has a video augmentation. Initially the video may play in-situ, tracking the substrate 10; then, if the viewing device 100 lingers, the video will "lift off" the page and orient in a stable position with respect to the device screen 105. After playing, it returns to its original position in the digital twin 107 of the page and tracking continues.
  • the viewing device 100 may display any mixture of:
  • the live video 784 (including background view outside the recognized substrate).
  • a digital twin 107 which may be displayed in either perspective mode or flat (i.e. orthogonal) mode and which tracks the recognized page 10.
• Augmented reality elements, e.g. the hyperlink 117.
• Overlay augmentation 119 that is fixed to the device screen 105, but whose presence is dependent on viewing location and behavior.
Measuring View Rate of Change Including Lingering and Sudden Motion
  • View rate of change is determined by comparing where a prior view 792 from a previous time is positioned in the space of a later view 798.
  • the prior view 792 and the later view 798 can be related because they are both projection transforms onto the viewed substrate 10 (typically a page).
• the corners of the prior view 792 can be projected to discover the area 794 of the substrate 10 viewed by the prior view 792.
  • These coordinates can then be inverse projected by the projection transform associated with the later view 798. This gives a projection 796 of the prior view 792 in the space of the later view 798.
• Where a rate of view change 802 is being determined, this measure is divided by the difference in time, in seconds, between when the video frames used to determine the prior 792 and later views 798 were sampled.
• the video capture rate is typically 10 images per second (0.1 seconds between successively captured images), but capture rates of 20 images per second to 1 image per second are not unusual (0.05 sec to 1.0 sec between images).
  • a low rate of movement is characterized as being below 50 units difference in the view change 802 from the prior image 792 to the later image 798 (i.e. less than half the length of the long side of the display screen 105).
  • a high rate of movement is characterized as over 200 units difference in the view change 802 (i.e. more than twice the length of the display screen long side). Any value between these two is considered a moderate rate of view change 802.
  • a lingering condition is determined by both a low rate of movement, and the centre of the camera view always being within the surface region under consideration over a third of a second period (0.333sec).
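A minimal sketch of the view-rate-of-change measure described above is given below, assuming OpenCV/NumPy and that each view's projection transform maps screen coordinates onto page coordinates. The normalisation (long side of the screen = 100 units), the 50/200-unit thresholds and the lingering condition follow the description above; everything else, including the function names, is an illustrative assumption rather than the system's defined implementation.

```python
# Illustrative sketch of measuring view rate of change and detecting lingering.
# prior_H and later_H map screen coordinates to page coordinates for the prior
# and later video frames respectively.
import numpy as np
import cv2

def view_change_units(prior_H, later_H, screen_w, screen_h):
    corners = np.float32([[0, 0], [screen_w, 0], [screen_w, screen_h], [0, screen_h]]).reshape(-1, 1, 2)
    on_page = cv2.perspectiveTransform(corners, prior_H)                  # area of the page seen by the prior view
    in_later = cv2.perspectiveTransform(on_page, np.linalg.inv(later_H))  # prior view projected into the later view's space
    pixel_diff = float(np.linalg.norm(in_later - corners, axis=2).sum())
    return pixel_diff * 100.0 / max(screen_w, screen_h)                   # long side of the screen = 100 units

def view_rate(change_units, dt_seconds):
    # Rate of view change: the measure divided by the time between sampled frames.
    return change_units / dt_seconds

def classify_movement(change_units):
    if change_units < 50:
        return "low"
    if change_units > 200:
        return "high"
    return "moderate"

def is_lingering(recent_samples):
    # recent_samples: list of (change_units, centre_inside_region) covering at least 0.333 s.
    return all(inside for _, inside in recent_samples) and \
           all(classify_movement(c) == "low" for c, _ in recent_samples)
```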
  • the sensing device 100 used for imaging the substrate 10 has a high-resolution camera 102, built into the viewing device 100.
  • the camera 102 captures an image of a portion of the substrate 10 and the viewing device 100 sends corresponding interaction data to the page server 20 for page recognition.
  • Page recognition typically relies on extracting features from the captured image and comparing those features with an inverted index contained in a database (such as the page server reference database 250 described above). Once a match is found, then the page 10 is recognized, albeit within certain confidence limits.
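Purely as a sketch of the kind of inverted-index lookup just described (not the page server's actual implementation), quantised image features can "vote" for candidate pages, with the best-scoring page accepted only above a confidence threshold. The class, field names and threshold below are illustrative assumptions.

```python
# Illustrative sketch of page recognition via an inverted index of quantised
# image features: each query feature votes for the pages that contain it.
from collections import Counter, defaultdict

class PageIndex:
    def __init__(self):
        self.index = defaultdict(set)            # feature word -> set of page identifiers

    def add_page(self, page_id, feature_words):
        for word in feature_words:
            self.index[word].add(page_id)

    def match(self, query_words, min_votes=10):
        votes = Counter()
        for word in query_words:
            for page_id in self.index.get(word, ()):
                votes[page_id] += 1
        if not votes:
            return None, 0
        page_id, score = votes.most_common(1)[0]
        # Accept the match only within certain confidence limits (illustrative threshold).
        return (page_id, score) if score >= min_votes else (None, score)
```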
  • the camera 102 captures images using visible wavelengths.
• With visible wavelengths, there is no need for any special modification of a conventional smartphone camera.
  • publishers are not required to modify their publications with special inks or devices which can be sensed at different wavelengths. Therefore, the use of visible wavelengths for the sensing device is particularly advantageous.
  • the images may be captured in color or monochrome (e.g. black and white).
  • the interaction data 101 (see Figure 8) sent to the page server 20 may be monochrome image data derived from color image data captured by the sensing device 100. If the interaction data 101 sent to the page server 20 contains monochrome image data (as opposed to color image data), this reduced amount of data helps to reduce overall latency in the system. Therefore, a processor associated with the sensing device 100 (e.g. the smartphone processor) may be configured to convert color image data into monochrome image data, depending on how page recognition is being performed in the page server 20. Likewise, the processor may compress the image data defining the query image 109 (see Figure 7) before sending the interaction data 101 to the page server 20.
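A minimal sketch of this data-reduction step, assuming OpenCV is available, is shown below; the JPEG quality setting and function name are illustrative only.

```python
# Illustrative sketch: reduce the interaction data before upload by converting
# the captured colour frame to monochrome and JPEG-compressing it.
import cv2

def prepare_interaction_data(color_frame, jpeg_quality=70):
    gray = cv2.cvtColor(color_frame, cv2.COLOR_BGR2GRAY)
    ok, payload = cv2.imencode(".jpg", gray, [cv2.IMWRITE_JPEG_QUALITY, jpeg_quality])
    return payload.tobytes() if ok else None
```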
  • a sensing device which images infrared wavelengths may be used to detect the substrate, which is typically a substrate having an IR-absorbing coding pattern disposed thereon.
• the sensing device may be a Netpage image sensor, as described in US 6,870,966 and US 6,788,293 (the contents of which are herein incorporated by reference), used for detecting a coding pattern printed using IR-absorbing ink.
• the sensing device incorporates a complementary infrared light source to illuminate substrates carrying IR-absorbing coding patterns.
  • Sensing IR-absorbing coding patterns using an IR sensing device advantageously ensures excellent page recognition and page tracking, because the coding pattern will typically uniquely identify a page and a plurality of coordinate locations on the page. Furthermore, if the coding pattern is invisible to the human eye it does not obscure or have a visual impact on other text or graphics printed on the substrate. However, sensing using infrared wavelengths requires customized sensing devices, or at least customized modifications to conventional sensing devices. Therefore, visible wavelengths may be preferred in many scenarios.
  • the sensing device may sense the substrate using ultraviolet wavelengths.
  • a coding pattern may be disposed on a substrate using a UV-absorbing ink.
  • the sensing device may sense the substrate 10 using radio wavelengths.
  • Examples of sensing devices which detect using radio wavelengths are RFID readers and near field communication (NFC) sensors.
  • the substrate is tagged with, for example, an RFID tag encoding a unique identity for that substrate.
  • the RFID tag has no battery and is powered up by radio waves emitted from the RFID reader.
  • RFID tags are relatively cheap and unobtrusive and may be incorporated into a variety of different substrates. Clothing tags are one example of a substrate suitable for incorporating an RFID tag.
  • the sensing device may employ a combination of different wavelengths for sensing the substrate. For example, it may be convenient to employ radio, infrared or ultraviolet wavelengths for determining the identity of the substrate ("page recognition"), and then use visible wavelengths for determining a location relative to the substrate ("page tracking"). For example, a sensing device may incorporate an NFC sensor for page recognition and a camera for page tracking.
  • a hybrid sensing device may employ different types of sensor employing the same or similar wavelengths.
  • a sensing device may employ one type of camera for page recognition and another type of camera for page tracking - each camera may be configured differently with different focal points for optimum page recognition and tracking.
  • hybrid sensing devices employing one or more of visible, infrared, ultraviolet and radio wavelength detection are within the ambit of the present disclosure.
• the sensing device 808, in the form of a camera 102, may be integrated into the viewing device 100.
  • the camera 102 captures images from an opposite face to the face incorporating the display screen 105.
• Other examples of viewing devices which may have an integrated sensing device are head-mounted displays 113 (Figure 43), dedicated document viewers 118 (Figure 44), handheld video game consoles 135 (Figure 45) and media players 136 (Figure 46).
• With the image sensor mounted on an opposite face of the viewing device 100 to the opaque display screen 105, the viewing device 100 enables the user to experience virtual reality and/or augmented reality.
  • the digital content 116 rendered to the display screen 105 appears in real-time as if it were actually printed on the viewed substrate 10.
• the sensing device 808 is typically integrated with the viewing device 100 in handheld viewing devices.
  • the sensing device 808 may be incorporated into a wired or wireless attachment, such as a handheld pen, puck or mouse 768. Such an arrangement is particularly suitable for use with viewing devices which are not normally handheld.
  • the sensing device 808 may capture an image of the substrate, which is used for page recognition via feature extraction.
  • the captured image may contain a barcode which can be decoded to identify the substrate with greater confidence.
  • the sensing device 808 is an imaging pen 151 equipped with a high resolution camera 152.
  • the Netpage system described in the above referenced patents incorporates a pen capable of handwriting capture by sensing a customized position-coding pattern covering the page.
  • the imaging pen 151 is able to capture handwriting without a coding pattern across the page.
  • the imaging pen 151 is held at a height above the substrate 10.
  • a conventional high-resolution camera 152 positioned in the pen housing captures one or more images of the substrate 10 within its field of view 14.
  • the pen then sends interaction data representing the captured image(s) to the page server 20, optionally via a relay device, such as the smartphone 100.
  • the page server 20 identifies the substrate 10 using a suitable page recognition technique as described herein.
  • the pen 151 typically sends a pen identifier to the page server 20 with the interaction data.
  • a digital description corresponding to the substrate 10 is retrieved by the page server 20, which can then associate a path of the pen with the digital description during subsequent handwritten input.
  • Successful page recognition may be communicated back to the pen so that the user receives feedback that the substrate 10 (typically a page) has been recognized.
• the path of the pen is recorded as a line on the digital description, which the Applicant defines as "digital ink". Digital ink is described in detail in the above referenced Netpage patents incorporated herein by reference.
  • the pen 151 is used to enter handwriting 157 (or other markings, drawings etc.) on the substrate 10.
• a second image sensor 154 is used to track the path of the nib 155 across the substrate 10.
• the second image sensor 154 is specifically configured for close-range imaging so that features within its field of view can be imaged and tracked while the nib 155 is in contact with the substrate 10.
  • the second image sensor 154 is configured for relative motion sensing using, for example, an optical mouse technique assuming that an initial absolute location on the substrate 10 can be determined or estimated.
  • the second image sensor 154 may be activated by detecting a force applied to the nib 155 via a force sensor
  • the camera 152 may be reconfigured to perform page tracking upon detection of a nib force, thereby obviating the requirement for the second image sensor 154.
  • the sequence of images captured by the pen 151 during handwritten input is communicated to the page server together with the pen identifier.
• This updated digital description of the reference page, including the digital ink 158, may be displayed to the user.
  • the smartphone 100 may display rendered digital content 116 corresponding to the page together with the digital ink 158.
  • the screen 105 may be updated in real-time as the user writes on the substrate 10 using the pen 151. Editing, sharing and other options regarding the displayed digital content 116 may be provided to the user via the touchscreen 105 of the smartphone 100. Accordingly, it will be appreciated that such a system provides similar functionality to the Netpage system described in the above referenced patents to the present Applicant.
  • the sensing device may be in the form of a microscope.
  • a microscope accessory 61 for a smartphone is described in detail in the Applicant's US Publication No. 2011/0292198, the contents of which are herein incorporated by reference.
• Such an accessory may be used to modify the smartphone's imaging optics 62 so that they are configured for reading the Netpage position-coding pattern (or a similar type of coding pattern) when the smartphone is placed over a substrate.
• the microscope accessory 61 may comprise an IR phosphor for illuminating coding patterns printed in IR ink using the smartphone's internal flash.
  • a substrate 10 has an electronic tag 810 such as an RFID tag or a Near-Field Communications (NFC) tag.
  • An RFID tag or an NFC tag can uniquely identify a substrate 10 and thereby provide page recognition with 100% confidence.
  • Electronic tags of this type may be particularly suitable for non-paper substrates, where the tag 810 does not have a significant impact on the usability or appearance of the substrate 10. Indeed, high-value substrates such as clothing may already incorporate such identifying tags for product tracking through the supply chain or to minimize counterfeiting.
  • the sensing device 100 has an electronic tag reader for reading an RFID tag, NFC tag or other electronic tag 810.
  • the electronic tag reader may be a standalone device in communication with the viewing device 100, or it may be integrated into the viewing device 100.
• Many smartphones are equipped with NFC readers and this functionality may be usefully leveraged in the context of the interactive viewing system 2 (see Figure 8) described herein.
  • Figure 53 shows an interaction between the smartphone 100 and the viewed substrate 10.
  • the substrate 10 carries an electronic tag 810 in the form of an NFC tag.
  • the smartphone 100 functions as an electronic tag reader which sends a radio signal to the NFC tag 810 when within range of the tag.
• the radio signal powers up the NFC tag 810 such that the tag communicates a unique identity associated with the substrate to the smartphone 100.
  • This unique identity enables the smartphone 100 to retrieve corresponding display data with 100% confidence via a simple look-up index, which may be either stored remotely in a page server 20 (see Figure 8) or stored in the phone's memory.
• the camera 102 may be used to track the position of the smartphone relative to the page 10 in order to determine what digital content 116 is rendered to the smartphone's display screen 105.
  • the digital content 116 will mirror part of the printed content in the camera field of view 14. As described above, this provides real-time virtual reality to the user.
  • Page tracking and determination of a projection transform may be performed by comparing the captured images with the reference page 210 (see Figure 10) of the page content in the same way as if the page recognition had been performed using feature extraction from captured image(s) of the page 10 (described above). While page recognition performed by features extracted from captured images has a high degree of accuracy, the use of an electronic tag 810 effectively eliminates any errors in page recognition.
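As an illustrative sketch only (the store, field names and remote call are assumptions), the tag-based look-up described above can be as simple as a dictionary keyed by the tag's unique identity, consulted locally first and then via the page server.

```python
# Illustrative sketch: resolve an NFC/RFID tag identity to a display-data
# reference via a simple look-up index, local cache first, remote fallback second.
LOCAL_INDEX = {}          # tag UID -> reference page / display data identifier

def resolve_tag(tag_uid, fetch_remote=None):
    if tag_uid in LOCAL_INDEX:
        return LOCAL_INDEX[tag_uid]
    if fetch_remote is not None:
        ref = fetch_remote(tag_uid)       # e.g. a request to the page server
        LOCAL_INDEX[tag_uid] = ref        # cache for subsequent viewings
        return ref
    return None
```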
  • the electronic tag 810 may identify one page or a plurality of pages.
• page recognition may be performed by detection of a barcode 812 (either a 1D or 2D barcode) printed on the substrate 10.
  • Barcodes may be imaged using a smartphone's camera 102 and some barcodes 812 (e.g. QR codes) have been optimized for reading via a conventional camera.
  • smartphone cameras are generally not optimal for reading barcodes and it may be preferable in some instances to employ a dedicated barcode reader.
  • a dedicated barcode reader comprises a laser and photodetector arrangement.
  • the laser beam sweeps back and forth across the barcode and a photodiode measures the intensity of light reflected back from the barcode. It will be appreciated that barcode readers may be employed as the sensing device in the interactive viewing system described herein.
Device for Sensing User Intent
  • user interaction with displayed digital content 116 occurs via an input device 814 associated with the viewing device 100.
  • This interaction may be used, for example, to request further information; play a video clip; hyperlink to an Internet resource; indicate the success or otherwise of page recognition, etc.
  • the user's intent in respect of the displayed digital content 116 may be sensed by various means.
  • the touchscreen 105 of a smartphone or tablet computer is well suited for sensing user intent in the interactive viewing system 2 described herein.
• the touchscreen display 105 of the tablet 100 captures user input requesting a particular action with respect to the displayed digital content 116. Interaction with the touchscreen 105 is typically via a user's finger or a passive stylus.
  • Touchscreens are usually of a capacitive or resistive type, although other types of touchscreen will be well known to the person skilled in the art.
  • a user may touch a displayed item of interest in order to initiate an interactive experience.
  • a touch on the playback button 104 will initiate playback of a video relating to the displayed race car.
  • a touch on the term "200+ Race Cars" will initiate hyperlinking to a corresponding Internet resource.
  • the user may be presented with soft key options relating to an interactive icon following, for example, a touch-and-hold interaction, a gestural interaction or touch-gesture combination.
  • soft key options include: share link, copy link, share video, copy video, skip, rewind, fast forward, pause and so on.
  • the soft key options presented to the user may be controlled by the author of the interactive content and/or copyright restrictions. For example, playback, copying and/or sharing of videos may be controlled based on a particular geographic region (determined via GPS or phone network), publication (determined via page recognition including confidence refinements), and user subscription (determined via user identity).
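A hedged sketch of such gating follows; the rule structure, region codes and option names are illustrative assumptions rather than the system's defined behaviour.

```python
# Illustrative sketch: filter the soft-key options offered to a user based on
# region, publication and subscription status.
def allowed_options(all_options, rights, region, publication, is_subscriber):
    # rights: dict mapping an option name to a predicate over the viewing context;
    # options with no rule are allowed by default.
    ctx = {"region": region, "publication": publication, "subscriber": is_subscriber}
    return [opt for opt in all_options if rights.get(opt, lambda c: True)(ctx)]

# Example: sharing a video only permitted for subscribers in a licensed region.
rights = {"share video": lambda c: c["subscriber"] and c["region"] in {"AU", "NZ"}}
print(allowed_options(["pause", "share video"], rights, "US", "ExampleMag", False))  # ['pause']
```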
  • the interactive viewing system 2 described herein may be used for clipping (and sharing) interactive digital content 116.
  • a user may indicate a clipped item via, for example, a touch trace (e.g. lasso) relating to the item of interest.
  • the clipped content may or may not preserve interactivity depending on, for example, a user preference, predetermined rights management relating to the clipped content, the location of the sender and/or recipient, and the user identity. Clippings may be stored in the user's viewing device 100 for later viewing and interactivity.
  • Viral marketing is becoming a powerful tool for advertisers and the ability to clip and share rendered digital content whilst preserving its interactivity enables advertisers to link printed advertisements with viral marketing campaigns.
  • a user may share a video clip or an interactive game with other users by clipping a portion of the displayed digital content and sending the clipped content to a friend.
  • the content may be sent directly to a friend's smartphone or indirectly via a social networking website.
  • the inherent value of the printed advertisement is increased by enabling the printed advertisement to contribute to viral marketing.
  • other users experience interactivity of (originally) printed content which potentially fuels the uptake of the viewing system by encouraging these users to download the requisite viewing app onto their smartphones or tablet computers.
  • the interactive viewing system allows any printed content to be clipped via a gestural interaction with displayed digital content 116 and then shared, whilst usefully preserving at least some of the interactivity contained in the digital content. For example, a newspaper or magazine article of interest may be clipped and shared with a friend. By preserving the interactivity in the shared content, another user has the option of, for example, posting a comment relating to the originally viewed printed article without actually ever viewing the original printed article.
  • the ability to clip digital content via a touchscreen gesture, as well as preserving interactivity when the clipped content is shared, provides the viewing system with powerful functionality which goes beyond a simple virtual reality or augmented reality system.
  • Multi-touch and/or gesture combinations may be used to indicate user input via the touchscreen.
  • a common example of a multi-touch combination is a "double-click”.
  • Another example of gesture-touch combination is a lasso gesture followed by touch to indicate “select” (lasso) and "copy” (touch) commands.
  • touchscreens provide a versatile means for receiving user input in respect of displayed digital content 116. A variety of different commands are accessible from various combinations of touch and gestural inputs via the touchscreen 105.
  • a touchscreen 105 may facilitate user input via a virtual keyboard or buttons.
• For example, if the displayed digital content 116 corresponds to a printed form having a plurality of different form fields, a finger or stylus touch on each form field may cause a virtual on-screen keyboard to appear, enabling the digital form field to be filled in by the user via the virtual keyboard.
• Similarly, the touchscreen 105 may display virtual playback control buttons (e.g. skip, rewind, fast forward, pause) for controlling video playback.
  • Many viewing devices 100 such as smartphones and tablet computers 766, are equipped with a user-facing camera 108 in addition to a view-facing camera 102, as shown in Figure 39.
  • the user-facing camera 108 provides a useful means for sensing a user's intent in respect of displayed digital content 116.
  • the user-facing camera 108 may be used as a means for capturing hand gestures or other gestures from the user in respect of the displayed digital content, which are interpreted by the processor in the viewing device 100.
• Hand gestures may be used, for example, to control video playback e.g. a palm-facing gesture for stopping or pausing video playback; a rightwards motion for forward-skipping or fast-forwarding video playback; a leftwards motion for backward-skipping or rewinding video playback.
• hand gestures may be used for playing the videogame, for example, by controlling the movement of an animated on-screen icon, character or other gamepiece.
Facial Expression and Eye-tracking Recognition by User Facing Camera
  • the user-facing camera 108 may be used for capturing facial expressions 816, which can be interpreted by the processor in the viewing device 100.
  • a frown or other quizzical expression 816 may be interpreted as an indication that the displayed digital content 116 does not correspond to the imaged printed content 13. This may prompt the viewing system to find an alternative match via the page recognition process and display alternative digital content to the user.
• Another facial expression (e.g. a smile) may conversely be interpreted as an indication that the displayed digital content 116 does correspond to the imaged printed content 13.
  • Eye-tracking may be captured via the user-facing camera and interpreted by the processor 106 (see Figure 10) of the viewing device 100 in order to initiate various interactive functions.
  • Tracking the user's eye 818 is particularly useful for viewing devices 100 having a relatively large display screen 105, such as tablet computers 766.
  • eye-tracking may be monitored and then an interactive function initiated after a predetermined dwell time on a particular displayed object. For example, dwelling on a playback icon or a graphic having associated video content may initiate video playback.
  • the processor in the viewing device 100 may monitor dwell times on displayed objects and record dwell time data in a remote page server. Such dwell time data is valuable for publishers and advertisers who may wish to assess the impact of a particular article, advertisement or the like.
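A minimal sketch of such dwell-time monitoring is shown below; the threshold, trigger callback and logging hook are illustrative assumptions, not the viewing device's actual interface.

```python
# Illustrative sketch: accumulate gaze dwell time on a displayed object, fire an
# action once a threshold is reached, and log dwell data when the gaze moves on.
class DwellTrigger:
    def __init__(self, threshold_s=1.0):
        self.threshold = threshold_s
        self.current = None
        self.accumulated = 0.0

    def update(self, gazed_object, dt, on_trigger, log=None):
        if gazed_object != self.current:
            if log and self.current is not None:
                log(self.current, self.accumulated)   # e.g. report dwell data to a remote page server
            self.current, self.accumulated = gazed_object, 0.0
        elif gazed_object is not None:
            self.accumulated += dt
            if self.accumulated >= self.threshold:
                on_trigger(gazed_object)              # e.g. start video playback
                self.accumulated = 0.0
```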
  • Voice recognition software in the viewing device 100 may be used to interpret spoken commands captured by a microphone. Spoken commands may be used to initiate interactive functions in respect of the displayed digital content 1 16 (e.g. "play video", “show comments", “link to X” etc). Alternatively or additionally, spoken commands may be used to improve the confidence of page recognition. By way of example, the user may say the name of a magazine before, during or after capturing an image of the printed content 13. The voice recognition data in combination with the page recognition process vastly reduces the scope of the search required to identify the particular page being viewed.
• One or more conventional mechanical input devices 814 may be used to indicate a user's intent in respect of displayed digital content 116.
  • mechanical input devices include keyboards, push buttons, joysticks and computer mice.
  • a notebook computer (see Figure 40) will typically employ a keyboard 820 and/or a mouse 768 for receiving user input; a handheld games console 135 (see Figure 45) will typically employ control keys 772 and/or a joystick for receiving user input.
  • a handheld viewing device 100 may be equipped with one or more motion sensors, such as accelerometer(s) and/or gyroscope(s) 295 (see Figure 12) which can be used to detect user gestures.
  • a shake or tilt of a smartphone may be used to indicate that incorrect page recognition has occurred, enabling the viewing system to search for another possible match.
  • a shake or some other gestural movement may be used to indicate that the user wishes to change between different types of display or different options presented to the user.
  • a shake or some other gestural movement may be used to indicate that the user has switched to a different publication, prompting the viewing system 2 (see Figure 10) to initiate page recognition.
  • More complex gestural movements such as motioning the form of a letter or mark (e.g. "y”, “n”, tick, cross etc.), may be detected in order to interpret the user's intent in respect of the displayed digital content 116.
• Although the view-facing camera of the viewing device 100 is typically used primarily for page recognition and page tracking, it may also be used to detect a user's intent in respect of the displayed digital content.
  • the view-facing camera may be used to detect gestural movements of the viewing device 100 relative to the imaged substrate 10.
• the gestural inputs described above may be captured by the view-facing camera 102 as an alternative to, or in addition to, the internal motion sensor(s) 295.
  • the view-facing camera 102 is a more accurate means for sensing relative motion than internal motion sensors, and potentially this enables more complex gestures to be captured.
  • movement of the viewing device 100 may be used for capturing handwriting or drawing input.
  • users may find this a more convenient method of capturing handwriting or drawings, especially on viewing devices 100 with relatively small display screens 105 where touchscreen input of handwriting or drawings is difficult.
  • the captured handwriting or drawings may be associated with the digital description of the printed page so that they are viewable on subsequent interactions with the page 10.
  • user interactions 822 with a printed page 10 may be captured via a view-facing camera 102 of the viewing device 100 and interpreted with respect to displayed digital content 116.
  • the viewing device 100 is held at a height above the page 10 with a corresponding virtual or augmented reality display as described herein.
• the user interaction 822 with the physical printed page 10 usually involves indicating with a finger 824 or a stylus to identify different regions of the page.
• the position and movement of the user's finger 824 may be captured by the view-facing camera 102 and translated into a corresponding position and movement of a pointer on the display screen 105.
  • the user's finger 824 effectively becomes a mouse for the display screen 105.
• Simple gestures, such as tapping, may be interpreted by the viewing device 100 to initiate actions, such as hyperlinking or video playback.
  • the user's finger 824 dwelling on a particular zone 826 of the printed page 10 may be used as an indication for the viewing device 100 to initiate an action.
  • the interactive options available to the user may be displayed on the display screen 105 of the viewing device 100 as described above.
  • digital ink 158 may be generated using the view-facing camera 102.
  • the movement of a stylus, finger or pen with respect to the printed page 10 is used to generate digital ink, which can be recorded and, optionally, associated with the digital description of the page.
  • This method of generating digital ink has the advantage that the digital ink includes temporal stroke information, which can be used to assist in text recognition (see, for example, US Patent No. 7,359,551, the contents of which are incorporated herein by reference).
  • digital ink may be generated by writing on the printed page using a marking pen and imaging the handwritten input using the view-facing camera. Subtracting the reference image 210 (see Figure 10) content from the imaged page 10 yields the handwritten input as digital ink 158.
  • the digital ink may be subjected to text recognition using standard methods known in the art. This technique has the disadvantage that the generated digital ink lacks temporal stroke information, which is useful for text recognition. However, it has the advantage of greater accuracy in generating the digital ink, without relying on movement interpretation.
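A minimal sketch of the subtraction step is given below, assuming OpenCV and assuming the captured page image has already been registered to the reference image 210 via the page-tracking transform; the threshold value is illustrative.

```python
# Illustrative sketch: recover handwritten input as digital ink by subtracting
# the reference page image from a registered capture of the written-on page.
import cv2

def extract_handwriting(registered_capture_gray, reference_gray, thresh=40):
    diff = cv2.absdiff(registered_capture_gray, reference_gray)
    _, ink_mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    return ink_mask            # white pixels approximate the handwritten strokes
```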
  • a printed form may be filled in by hand using an ordinary pen or pencil.
  • the user's handwritten text may then be converted into computer text and entered into the corresponding digital description of the form.
  • the digitally filled-in form can then be sent to a designated recipient via the user's smartphone or tablet computer.
  • a user has the option of performing final editing of the form on the viewing device 100 prior to sending the filled-in form.
  • the digital form may retain the user's original handwriting without any text conversion, if desired.
  • the viewing system 2 provides a facile means for capturing digital ink in respect of a printed page 10 without any requirement for special pens or coded paper, such as Netpage.
• Although the printed page, when imaged and recognized by the viewing system 2, may be non-unique (in contrast with Netpages), the corresponding reference page 210 can acquire a unique identity once it is tagged with the user's identity (or alias identity) retrieved from the user's viewing device 100.
  • Temporal tagging may also be used to distinguish between copies of pages from which the same viewing device 100 has generated digital ink. Since each reference page 10 effectively acquires a unique identity in the system, many of the advantages of the Netpage system can be realized without uniquely coding individually printed pages.
  • the interactive viewing system 2 operates optimally when the system can identify the user, although the viewing system is, of course, still operable without this information.
  • the user identity need not be an actual identity; for most requirements, an alias identity associated with the user or the user's viewing device 100 is sufficient.
  • the user identity may be used by the viewing system in a number of different ways.
  • a profile associated with the user identity may be used for increasing the confidence of page recognition.
  • the user profile may comprise information, such as preferred language, demographic information (e.g. age, gender, occupation, income bracket etc.), browse history and magazine subscriptions. This information contained in the user profile can valuably contribute to page recognition, either before, during or after a matching process.
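Purely for illustration (the weighting scheme, field names and bonuses are assumptions, not the system's defined behaviour), profile information might contribute to page recognition by re-ranking candidate matches as follows.

```python
# Illustrative sketch: bias page-recognition results using a user profile, giving
# a small score bonus to candidates from subscribed publications or in the
# user's preferred language.
def rerank(candidates, profile):
    # candidates: list of (page_id, publication, language, score) tuples.
    def boosted(candidate):
        page_id, publication, language, score = candidate
        bonus = 0.0
        if publication in profile.get("subscriptions", ()):
            bonus += 0.2
        if language == profile.get("preferred_language"):
            bonus += 0.1
        return score * (1.0 + bonus)
    return sorted(candidates, key=boosted, reverse=True)
```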
  • the user identity may be used to determine the interactive data 101 (see Figure 8) displayed to the user.
  • some interactive elements 758 may only be visible (or audible) or interactive to magazine subscribers or users who have paid a fee, and the requisite user status information may be accessible via the user identity.
  • the user identity may also be used for tagging any data submitted by the user to the interactive viewing system 2.
  • a digital form filled in by the user may be tagged with the user's identity so that it acquires a unique identity in the system.
  • the user may be identified by a variety of different means.
  • the viewing device 100 contains a unique identity.
  • smartphones and tablet computers typically contain a unique identity in the form of a removable SIM card which is transferable between devices.
  • a transferable identity is advantageous because the user's established profile is not necessarily lost when the user upgrades to a new viewing device 100.
  • the identity of the viewing device 100 may alternatively be stored in a non-removable memory.
  • a user profile associated with the phone identity may be built up based on usage data collected by the viewing system 2.
  • each user may have a user account accessed via a user login.
  • the sensing device 808 is integrated with the viewing device 100. However, where the sensing device is separate (e.g. an optically imaging pen 151 shown in Figure 52A, puck or mouse 768 shown in Figure 42), then the sensing device 808 may contain a unique identity, which can be used to identify the user.
Electronic Token
• An electronic token such as an NFC tag or an RFID tag may be used to identify the user.
  • the user may carry the electronic token on his or her person and the viewing device 100 has a suitable sensor for sensing the token, which identifies the user.
  • electronic tokens may be sensed by a variety of different viewing devices 100 so that the same user identity is associated with each viewing device 100. In the case where, for example, a user uses different smartphones and different tablet computers for viewing substrates, the electronic token ensures that a consistent user identity is associated with each viewing device 100.
  • a user-facing camera 108 of a viewing device 100 such as a smartphone or tablet computer 766 (see Figure 39), may be used to identify the user via facial recognition technology or iris recognition technology.
  • a microphone of a viewing device 100 may be used to identify the user via voice recognition technology.
  • the user may be identified via a password.
• the user may receive an on-screen prompt to enter his or her unique password when the viewer application 190 (see Figure 10) is selected.
  • the user's password then identifies the user to the viewing system.
  • the user's signature may be used as a means for identifying the user.
  • the Netpage system (described in detail in US 7,106,888, the contents of which are herein incorporated by reference) includes a pen capable of handwriting capture by sensing a customized position-coding pattern covering the page.
  • a Netpage pen in wireless communication with a smartphone may be used to write a signature on Netpage coded paper.
• the digital ink representing the user's signature is transmitted to the smartphone and then to a server, which matches the user's signature against a database of known signatures.
  • Temporal stroke data, nib force data and/or a pen identity may be used to assist further with signature verification.
  • the viewing device 100 may be equipped with a customized fingerprint sensor or other biometric sensor for identifying the user.
  • a fingerprint sensor may be in the form of a high-resolution touchscreen 105 (see Figure 10) which is able to resolve a fingerprint pattern as well as display content to the user.
  • the user can initiate other actions which are not defined in the digital twin. These actions are available at any time when the digital twin 107 is being viewed on the viewing device 100, and do not require any enabling action from the publisher of the corresponding printed substrate.
  • the user will move the viewing device 100 over the substrate and the digital twin 107 will track the location.
  • This mode of operation is typically referred to as Dynamic Mode. In this mode, any unsteadiness of the user's hand is reflected in the movement of the image on the screen 105.
  • the user may use a gesture, such as a two finger pinch, to initiate a zoom of the digital twin display on the viewing device 100. That zoom can be either to increase or decrease the scale of the display.
  • the direction of scale change is typically controlled by the inward or outward sense of the finger pinch.
• The viewing device 100 may be switched into an alternate state of operation in which the tracking of the substrate 10 ceases. This mode is typically referred to as Static Mode.
• gestures are also supported to change the view during Static Mode. Typically, these gestures will also switch the viewing device 100 into Static Mode. They include dragging the digital content 116 on the screen 105, in its zoomed-in state, un-zoomed state, or zoomed-out state, to pan the view on the screen 105 over the digital twin 107.
• the user may use a gesture, such as a two finger rotate gesture, to rotate the view of the digital twin 107 on the screen 105, in its zoomed-in state, un-zoomed state, or zoomed-out state.
  • the user may use a gesture, such as a double tap, to cause the screen view of the digital twin 107 to rotate to align its vertical direction with the vertical direction of the viewing device 100.
• Static Mode is entered automatically if the user moves the camera away from the printed publication being tracked.
  • the detection of this event is described above in subsection entitled "The Conductor Module 160", as well as the method of determining the zoom, pan and rotate that is used as the initial position of the static mode view.
  • the control of all or some of the pan, zoom and rotation functions may be done by use of touch buttons on the screen 105.
  • control of some or all of pan, zoom and rotation functions may be done using a menu and/or sub-menus.
  • control of some or all of pan, zoom and rotation functions may be done by physical buttons on the viewing device 100 dedicated to these functions.
  • control of some or all of pan, zoom and rotation functions may be done by physical buttons on the viewing device 100 that are used for multiple functions, with an indication on the screen 105 that this is the current function of the buttons.
Clipping
User Action
• While using the interactive viewing system 2, users will often see content that particularly interests them or that they believe may interest others. They may wish to save this content for their later reference, or for sharing with their friends or associates, or for sharing on public sharing sites on the Internet.
  • the content of interest to the user is referred to as a "clipping" in light of its conceptual similarity and virtual resemblance to cutting or tearing a clipping from a page in a newspaper or other publication.
• the clipping interactivity enabled by the viewing system 2 will be described in detail below with reference to Figures 55 to 65.
Creating a Clipping
  • the user can send a clip request 310 of the current view to the page server 20 (see Figure 10). This marks that view in preparation for a later action.
  • the clip request 310 is typically made via a touch button 828 shown on the screen 105 (see Figure 56).
  • the clip request 310 can alternatively be made via a menu or sub-menu function, or using a dedicated physical button on the viewing device 100.
  • the clip request 310 is made using a physical button on the viewing device 100 that is used for multiple functions, with an indication on the screen 105 that this is the current function of the button.
• After the clip action is selected, the viewing device 100 typically displays the created clipping 830 along with a selection of options 832 for disposition of the clipping (see Figure 57). Typically, this is accompanied by an animation 836 in which the view of the clipping 830 is changed to indicate that it now represents a clipping. This may include presentation of the clipped region with ragged edges 834 (see Figure 64) to simulate a torn page, or other visual indication that the clipping 830 is now taken from the publication 838. Other methods of indicating the success of the operation are also possible, including presentation of a text message, presentation of a graphic icon, and so on. It is also possible for no indication to be made, other than the presentation of the disposition options 832.
  • the disposition options 832 are typically presented to the user as on-screen touch buttons as shown in Figure 57. However, it will be appreciated that any of the other user interface methods described earlier could also be used to accept the user's intention. Once the user selects his or her preferred disposition option and completes any subsequent processing related to that option, the viewing device 100 typically returns to its previous viewing mode.
• the user can enhance the clipping contents by adding their own content, such as marker-pen-style highlights of specific words or phrases.
  • Other styles of highlight include rings around specific content, optionally in a hand-drawn style, or dynamic highlights, such as flashing image components or image components which change shape or colour. These may be changes to the existing image components or additional image components.
• Other enhancements can include addition of links to existing URLs, or addition of videos, photos, photo galleries, text, sound recordings, files and the user's rating of the content or the publication. Any such enhancements remain specific to the clipping 830 and are stored with it.
  • Still further enhancements can include selection of specific text within the clipping 830 which is used to search sites such as Amazon or Wikipedia and a link to that information is associated with that text in the clipping. These enhancements can be made using the viewing device 100 to the digital twin 107 of the publication 838 prior to the clipping being created, or may be made to the clipping 830 after it is created.
  • publications 838 may include portions which the publisher expects or encourages users to take as clippings 830. Examples include coupons and vouchers for businesses and products.
  • the interactive content authoring tool allows definition of regions within the publication areas as pre-defined clipping regions 840.
• When the user creates a clipping 830 which is within such a region 840, or is a close match to such a region, it is ambiguous exactly what the user's intent is. The user may wish to make a clipping 830 that is what they see in the tracked digital twin 107, framed the way they have framed it, or they may be interested in capturing the pre-defined clipping region 840. To resolve this ambiguity, both options are presented for user selection.
• This selection is presented as two visual representations of the clipping 830, along with the text "Editor's clipping" and "Your clipping."
  • the user's framed clipping is presented with the appearance of torn edges 834 (see Figure 64), and the pre-defined clipping region is presented with clean edges (see Figure 65).
  • a pre-defined clipping region 840 may also have associated additional content. For example it may be part of a larger flow of text, or an article several pages long. In this case visual or textual information informs the user that this predefined clipping 830 has such additional content.
  • the user selects which clipping 830 they wish to proceed with. After selecting one type of clipping, user interaction moves to selection of one of the disposition options 832.
  • the page server 20 can record specific statistics relating to the number of times each pre-defined clipping region has been clipped by all users. Similarly, when that clipping 830 is presented at a later time to the page server 20, or some other business system associated with the publisher, it can be automatically recognized and processed. This is especially useful in the case where the clipping 830 is a voucher which the publisher wishes to specifically recognize when it is subsequently presented.
  • pre-defined clipping regions 840 are shown without torn edges 834 to indicate to the user that they are exact clipped regions and special in the way they operate.
  • an animation is used to link the image on the tracked digital twin 107 to the proposed user's framed clipping (not the pre-defined clipping).
  • This animation begins with the clipped area image being identical and coincident with the live-tracked digital twin 107. That is, there is no visual difference at this stage.
• Next, the torn edge 834 is applied, the digital twin 107 is removed outside the borders of the torn edge 834, and the clipped area scales, rotates and translates into the position where it resides in either the clipping selection stage or the clipping disposition stage (as appropriate). In this manner a clear link is established in the user's mind between what they were seeing in the live tracked digital twin 107, and the clipped fragment of the page 10 (see Figure 10).
  • Figure 55 illustrates the flow of information through the interactive viewing system 2 during clipping operations.
  • the viewing application 190 sends the image captured by the camera 102 to the page server 20 (arrow 842), which uses that image to find the matching reference page 210, passing that reference page 210 back to the viewing device 100 (arrow 844). Therefore, the viewing device 100 has a reference to the reference page 210 as known by the page server 20.
  • the viewing application 190 sends the publication identifier and reference page identifier of the page from which the clipping 830 was made to the interactivity server 846 (arrow 860), along with the page coordinates of the clipped region 840.
  • the page server 20 and interactivity server 846 need not be separate servers.
• the page server 20 has been previously defined as encompassing a server system of multiple interconnected servers, or a single server. In this case, the page server and interactivity server are described as separate servers to better illustrate the information flows during the clipping process.
  • the interactivity server 846 stores the publication identifier and reference page identifier of the page in a clipping information structure 850. This effectively creates a link to the original data.
  • the interactivity server 846 names the information structure 850 uniquely and returns a unique URL to the viewing application 190 (arrow 852).
• the interactivity server 846 will store the identity of the user who requested creation of the clipping 830, the date and time of its creation, and the geographic location of the user who requested its creation in the clipping information structure 850.
• additional information 858 may be displayed with the clipping, such as publication details, the date and time the clipping was created, identity information of the user who created the clipping, location information of the user who created the clipping and/or other similar data.
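A hedged sketch of such a clipping information structure and its unique URL is shown below; the field names, URL scheme and host are illustrative assumptions, not the interactivity server's actual interface.

```python
# Illustrative sketch: a clipping information structure keyed by a unique
# identifier, with a shareable URL returned to the viewing application.
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ClippingInfo:
    publication_id: str
    reference_page_id: str
    page_coords: tuple          # (x, y, width, height) of the clipped region in page coordinates
    user_id: str
    created: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    location: str = ""

CLIPPINGS = {}                  # stands in for the interactivity server's store

def create_clipping(info: ClippingInfo) -> str:
    clip_id = uuid.uuid4().hex
    CLIPPINGS[clip_id] = info
    return f"https://interactivity.example.com/clip/{clip_id}"   # unique URL for sharing
```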
• When display of the URL is requested, the interactivity server 846 additionally activates the image as a link which, when clicked, will show a view with other options associated with the clipping 830. These options may include viewing details of the publication 838 from which the clipping is taken, options to display or suppress display of user generated content, options to display or suppress display of interactive content, options to zoom out or pan to other parts of the page from which the clipping was taken, options to access the article from which the clipping was taken and options to access the publication from which the clipping was taken.
  • the image of the clipping 830 is shown with a pattern on the edge which simulates and represents the torn edge 834 of a physical clipping torn from a paper magazine, similar to the way the clipping was displayed when it was created.
  • the appearance of the torn edge 834 is generally only applied to those edges which do not correspond to the physical edges of the page 10 or edges of the digital twin 107. Thus, when an entire page is clipped, only the binding edge will show a torn edge 834. Similarly, an article clipped from the edge of the digital twin 107 will not be shown as "torn".
  • the application of this torn edge appearance may be applied either by the web client, such as the viewing application 190 (see Figure 55) when the clipping 830 is displayed, or by the interactivity server 846, when the URL is referenced.
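As a minimal sketch of the rule just described (the rectangle representation and tolerance are illustrative assumptions), only the clipping edges that do not coincide with the page's physical edges are marked as torn.

```python
# Illustrative sketch: decide which edges of a clipping receive the "torn"
# appearance; edges coinciding with the physical page edges stay clean.
def torn_edges(clip_rect, page_rect, tol=2):
    # Rects are (left, top, right, bottom) in page coordinates.
    cl, ct, cr, cb = clip_rect
    pl, pt, pr, pb = page_rect
    return {
        "left":   abs(cl - pl) > tol,
        "top":    abs(ct - pt) > tol,
        "right":  abs(cr - pr) > tol,
        "bottom": abs(cb - pb) > tol,
    }
```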
  • the interactivity server 846 supports appending parameters to the clipping's unique URL.
  • the interactivity server 846 and page server 20 are used to support the operation of the clipping URL. It will be clear to those familiar with WWW operation that the URL can reference the interactivity server 846, the page server 20, or some other server which may or may not be dedicated to this task. It will also be clear that the required information from the page server 20 may be copied onto the server which supports this URL, or copied to some other location from which it is then referenced.
  • when the interactivity server 846 is instructed to delete 864 the clipping's information structure 850, by use of a specific parameter string appended to the URL, the clipping 830 will be removed by the interactivity server 846 and will then become unavailable to everyone who attempts to access the URL.
  • this facility is used when a clipping 830 has been created in error, and before it is shared or stored. However, it may also be used to delete 864 a clipping 830 even after the URL has been shared.
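  • Purely as a sketch of how parameters appended to the clipping's unique URL might be interpreted by the interactivity server 846; the parameter names here are assumptions, not part of the disclosed system.

```python
from urllib.parse import urlparse, parse_qs

clip_store = {}  # clip_id -> clipping record; an in-memory stand-in for the server's storage

def handle_clip_url(url):
    """Resolve a clipping URL; a 'delete' parameter removes the information structure 850."""
    parsed = urlparse(url)
    clip_id = parsed.path.rstrip("/").split("/")[-1]
    params = parse_qs(parsed.query)

    if "delete" in params:                 # e.g. .../clip/<id>?delete=1
        clip_store.pop(clip_id, None)      # clipping becomes unavailable to everyone
        return None
    clip = clip_store.get(clip_id)
    if clip and params.get("torn", ["1"])[0] == "0":
        pass  # hypothetical parameter: suppress the torn-edge rendering when displayed
    return clip
```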
  • when the user creates a clipping 830 using the viewing application 190, he or she is presented with a set of disposition options 832 for the clipping just created.
  • the disposition options 832 presented to the user may include Facebook and Twitter. If the user selects Facebook, the viewing application 190 will pass the clipping's unique URL to Facebook to be saved into the user's Facebook account. Typically, the user's Facebook account details have been entered as parameters to the viewing application 190 or the Facebook authorization process has been performed in advance, and so the sharing can proceed without further user interaction. However, it is also possible for the viewing application 190 to request these details or to initiate the Facebook authorization process at the time the sharing of the clipping 830 is requested. It will be appreciated that any accessible sharing Internet site can be used in this way, including sites such as Facebook, Twitter, Qzone, Habbo, Bebo, VWallete, Orkut, LinkedIn, Myspace, Friendster, and others.
  • clippings 830 can be shared via email by creating an email containing the clipping's unique URL using a similar user interface.
  • the selection of the Internet sharing sites and sharing methods presented to the user as the disposition options can be defined using a parameter, or multiple parameters, to the viewing application 190. Where the number of sharing sites in that selection exceeds the number of options that the menuing or selection process in the viewing application 190 can support, an "Other" option can be presented to allow a subsequent sub-selection process to be performed by the viewing application to select the sharing method requested by the user.
  • the actions of clipping and sharing to a preconfigured sharing site or using a preconfigured sharing method can be initiated by the user with a single touch.
  • the user may save the clipping 830 in a way that allows access to it later, either on the viewing device 100 or on some other computing device.
  • one of the disposition options is the option to save the clipping 866 on the viewing device 100.
  • the clipping's unique URL is saved on the viewing device 100 by writing the clipping's unique URL to the list of bookmarks for an Internet browser installed on the viewing device 100. This permits the user, when subsequently using the Internet browser, to access the clipping 830 from the browser's bookmark list.
  • the user can also use any of the browser's bookmark management functions to manage their list of clippings.
  • the clipping's unique URL can be added to a file of clipping URLs 868 on the viewing device 100.
  • this file of clipping URLs 868 is accessed by the viewing application 190 (see Figure 55) to permit the user to see their list of clippings 870 or to display the clippings by referencing the URL 872.
  • Other functions are also supported such as sorting the clippings, or deleting one or more clippings. Sorting functions can operate based on the clipping content by accessing the clipping's URL and retrieving information such as the clipping's creation date 874 or the clipping's source publication 876. To improve speed of operation, some or all of this information may be cached on the viewing device 100.
  • aspects of the clipping's history 878 can also be saved, including how and when it has been shared.
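  • A minimal sketch of the file of clipping URLs 868 with locally cached metadata (creation date 874 and source publication 876) used for sorting; the file format shown is an assumption.

```python
import json

def load_clippings(path="clippings.json"):
    # Each entry caches metadata so sorting does not require fetching every URL.
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return []

def save_clipping(url, created_at, publication, path="clippings.json"):
    clips = load_clippings(path)
    clips.append({"url": url, "created_at": created_at, "publication": publication,
                  "history": []})           # sharing history 878 can be appended here
    with open(path, "w") as f:
        json.dump(clips, f, indent=2)

def sort_by_date(clips):
    # Sort on the cached creation date rather than dereferencing each clipping URL.
    return sorted(clips, key=lambda c: c["created_at"], reverse=True)
```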
  • the file of clipping URLs 868 can be used and managed as in the previous aspect, but by use of a separate application on the viewing device 100.
  • the clipping URLs are automatically saved in such a file 868, or in a browser bookmark list, even if the clipping 830 is also shared or disposed of in some other way. This permits a user to view their clipping history 878.
Saving Clippings to the Interactivity Server
  • the interactivity server 846 supports separate accounts for individual users. These accounts permit storage of the clipping URLs as well as other information about the user. This information may be collected automatically by the viewing system 2 (see Figure 10), such as the user's viewing history, frequency of use and so on, or it may be information added by the user, such as personal information.
  • the user's account is created by the user.
  • the user provides an account name and password.
  • the account name and password are also stored by the viewing application 190 (see Figure 55) so that the account can be accessed transparently by the viewing application 190 to record clipping URLs and other information about the user's viewing behaviour.
  • the account can also be accessed using a web browser from any other internet connected computer.
  • the web interface provides facilities to view, manage, and share clippings.
  • the account is created automatically when a new viewing device 100 is first detected by the interactivity server 846.
  • the account is based on the unique identifiers collected from the user's viewing device 100.
  • when the user subsequently supplies an account name and password, these are associated with the existing account which is based on the viewing device's identifier. This then permits the types of access described in the previous paragraph to the clipping and viewing history (868 and 878) collected prior to the user supplying the account name and password.
  • the viewing device 100 does not require its own account name and password, but instead uses the authentication process of another site, such as the Facebook authentication process. In this way, the user perceives the clipping and viewing history (868 and 878) to be available whenever they are logged into Facebook.
  • the clipping URL is saved to the viewing device 100 and also saved to the interactivity server 846.
  • the viewing application 190 synchronizes the local and interactivity server 846 information. This synchronization is typically transparent to the user, so that the user perceives that their history and information is widely available and always up to date. Notwithstanding this, the user can also specifically request synchronization of this information at any time.
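  • As an illustrative sketch only, the transparent synchronization of the locally held clipping list with the interactivity server 846 could be as simple as merging the two sets of URLs; the function names are assumptions.

```python
def synchronize(local_clips, fetch_server_clips, push_to_server):
    """Merge local and server clipping lists keyed by URL.

    local_clips: list of local clipping records (dicts containing a 'url' key).
    fetch_server_clips: callable returning the server-side list of clipping records.
    push_to_server: callable that uploads one clipping record to the server.
    """
    server_clips = {c["url"]: c for c in fetch_server_clips()}
    local_by_url = {c["url"]: c for c in local_clips}

    for url, clip in local_by_url.items():
        if url not in server_clips:
            push_to_server(clip)           # upload clippings created while offline
    merged = {**server_clips, **local_by_url}
    return list(merged.values())           # becomes the new local list
```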
Clippings Viewed by Others
  • One of the most common reasons for users to make clippings 830 based on the publications 838 they view is to share those clippings with their friends and associates. Referring to Figure 55, users can share their clippings by sharing the URLs which define the clippings (arrow 856). The interactive viewing system 2 and viewing application 190 facilitate this sharing.
  • clippings are URLs which in turn contain references back to the original publication (arrow 848). Therefore, the original user or another person who views the clip 830 can reference other parts and aspects of the publication 838. For example, they can zoom out and pan to see the reference page 210 and spread from which the clip 830 was taken and they can turn pages to see the remainder of the publication 838, or the remainder of the article, at full print quality.
  • Clippings 830 are dynamic, and at the time they are referenced, show the content of the publication 838 at the time the clipping 830 is being viewed, even if the changes or additions to the digital content 116 (see Figure 48) were not made at the time the clipping was first created.
  • the creator of the clipping 830 can indicate whether the other users with whom the clipping is shared can add content to the clipping.
  • the publisher can use the functions of the content authoring tool to indicate whether the users with whom the clipping 830 is shared are permitted to add content to clippings made from that publication 838 or parts of that publication.
  • a clip 830 of part of a reference page 210 has a torn edge 834 applied when viewed to reflect the idea of a part torn from a physical publication page.
  • a clip 830 of a reference page 210 has a torn edge 834 along the side of the reference page 210 applied when viewed to reflect the idea of a whole page torn from a publication 838.
  • Clippings 830 are effectively unique URLs, and so can be used with social websites as any other URL. In particular, they can be "shared" and "liked". Social networking websites typically offer facilities to find out how many times a URL has been shared or liked by users of the website, so it is possible to find out how much a particular publication or a particular advertisement or article in a publication has been "shared" or "liked". This becomes a measure of that publication's, article's or advertisement's popularity among users of that social networking website.
  • the content author marks all regions 840 of the publication 838 with an indication of the nature of those regions.
  • regions 840 can be marked as advertisements, advertorial, editorial, stories, photographs and so on.
  • This information is used when the clipping 830 is created to manage how much information is included in the clipping 830 so that copyright and digital rights management restrictions are complied with. In particular, this is used to prevent sharing of complete publications 838 and to force compliance with fair use provisions.
  • in the case of advertisements, the advertiser and publisher can typically choose to allow that advertisement to be shared without restriction when clipped.
  • in the case of stories and photographs, the publisher may elect not to permit referencing of the whole story or image from that clipping 830.
  • the entire publication 838 may be in a single category as a default setting, and any region 840 which varies from this category is specifically set using the content authoring tool.
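  • A sketch, under assumed category names, of how the per-region markings and a publication-wide default could be combined to decide how much of a clipped region 840 may be shared:

```python
DEFAULT_CATEGORY = "editorial"             # publication-wide default set in the authoring tool

# Assumed policy table: how much of each category may be included in a shared clipping.
SHARE_POLICY = {
    "advertisement": "full",               # advertiser/publisher allow unrestricted sharing
    "editorial":     "excerpt",            # fair-use excerpt only
    "story":         "excerpt",
    "photograph":    "restricted",         # publisher may bar referencing the whole image
}

def share_policy_for(region_category=None):
    # Regions without an explicit marking fall back to the publication-wide default.
    category = region_category or DEFAULT_CATEGORY
    return SHARE_POLICY.get(category, "excerpt")
```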
  • the publisher of the publication 838 can elect to have some content included with all clippings 830 created by users from that publication 838.
  • Examples of such content include a link to an Internet site for subscription to the publication 838, instructions on how to subscribe to the publication, a link to an Internet site for accessing and downloading the viewing application 190, information on the viewing application 190 and how to access it, information about the publication 838, issue, article and page number from which the clipping 830 was taken, links to advertisers' web pages, advertisements for products, in the case when an advertisement is clipped links to the specific advertiser's web pages, and information about the publisher and the publisher's other publications.
Addition of User Generated Content
  • Additional information may be attached to the digital twin 107 (see Figure 10) by a user. That attached information is uploaded to the page server 20 and associated with the specific place on the publication 838 so that it is also accessible by other readers of the publication. This feature can be used to do things such as make submissions to competitions advertised in the publication, to allow users to comment on articles or other content in the publication, to allow users to give feedback to the publishers, or to allow users to share feedback amongst themselves.
  • information about or identifying the uploading user can be associated with the information.
  • the information attached to the digital twin 107 can be of a number of types, including text, photo, photo gallery, video, hyperlink, live video feed, audio, live audio feed, drawings, files, applications and user ratings.
  • the user can also start a conversation thread associated with a location in the publication.
  • Other users who have access to the user added content can add to the content already uploaded by other users. Videos and photos can be recorded on the viewing device's camera, either view facing 102 or user facing 108, and directly attached as user added content.
  • User options are provided on the viewing device 100 to permit users, both the submitting user and other users, to select if and how they view user added content. These options include displaying the user added content directly over the digital twin 107, displaying an icon representing the attachment on the digital twin at the location the information was attached, displaying a notification icon, word or option elsewhere on the viewing device display screen 105, or not displaying any information at all.
  • User options are provided on the viewing device 100 to permit users to display user attached content based on characteristics of that content, including content from all users, from other users in the current user's geographic region, from users with similar demographic characteristics as themselves, from users with a specific demographic characteristic, content which does not have user information or lacks specific user information, content uploaded on a specific date or uploaded within a selected date or time range, information of a particular type (e.g. all videos or all text comments) or combinations of these selection methods. Most publishers will not want users placing negative user added content on their publication's digital twins 107, and so options are provided to moderate content.
  • user added content can appear immediately to all users, it can appear after review and approval by a representative of the publisher, or it can be available only to the publisher or representatives of the publisher.
  • users can report other user added content to the publisher as inappropriate, incorrect or offensive, to alert the publisher to a potential problem. Where the number of user reports exceeds a threshold value set by the publisher, the content is automatically disabled until the publisher has had an opportunity to review it.
  • the user can select text in the digital twin 107 on the viewing device 100 and request operations to be performed using that selected text.
  • the selection of text can be performed by the user touching the display screen 105 at the position of the text, by the user's initial touch selected text being updated by dragging start and end markers displayed on the viewing device 100, by the user speaking the text such that a microphone on the viewing device 100 receives the sound and the viewing device 100 interprets that speech and identifies the corresponding text on the display screen 105, by the user typing sufficient characters from the start and end of the text and having the viewing device 100 search the displayed image and select the text, by the user touching and dragging across the text on the printed page 10 and having the viewing device 100 analyze the images from the view facing camera 102 which is viewing the publication to identify the text over which the user dragged their finger, or by any other text selection method supported on the viewing device 100.
  • the operations that can be performed on that selected text include submitting the text as a search string to a search engine.
  • the search results can be stored on the viewing device 100 for later access, can be stored remotely for later access, or can be displayed immediately on the viewing device 100.
  • the selected text can also be submitted to a translation function for translation into the default language of the viewing device 100 or some other nominated language.
  • the translated text is either saved on the viewing device 100 for later access, saved remotely for later access, immediately displayed on the viewing device 100 in a layout which is independent of the digital twin 107 on which the text appears, or the translated text is placed over the same location as the selected text on the digital twin 107.
  • the translated text can be a different length from the original text, so the font size is adjusted to ensure that the translated text fits into the same space as the original text, thus avoiding obliterating other elements of the digital twin 107.
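  • The font-size adjustment described above can be approximated by scaling the font so the translated string occupies no more width than the original; a rough sketch, assuming width scales roughly linearly with font size and character count:

```python
def fit_font_size(original_text, translated_text, original_font_size, min_size=6.0):
    """Shrink (or keep) the font so the translated text fits the original text box."""
    if not translated_text:
        return original_font_size
    scale = len(original_text) / len(translated_text)
    return max(min_size, min(original_font_size, original_font_size * scale))

# Example: a translation about a third longer than the original is rendered at
# roughly 75% of the original font size so it stays inside the same region.
print(fit_font_size("red shoes", "rote Schuhe", 12.0))
```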
  • the text can also be submitted to a speech generation engine so that the speech is sounded on the viewing device 100 or headphones attached to the viewing device 100.
  • the speech engine local to the viewing device 100 or a speech engine remote from the viewing device 100 can be used to generate the sound.
  • Drawings may be added to clippings 830 or to a digital twin 107 as user generated content.
  • the user can also add content such as hand-drawn shapes or annotations in the form of digital ink 158 (see Figure 52A).
  • the lines of required digital ink are recorded as sequences of 2D coordinates.
  • the user can directly draw the lines they require to be included as digital ink over the required location on the clipping 830 or digital twin 107.
  • the user can draw the shapes or text they require to be included as digital ink 158. After completion of the drawn lines, the shape is displayed over the clipping 830 or digital twin 107 and, if considered necessary by the user, it can be moved to the required location relative to the digital twin 107 or clipping 830 using arrow keys, or by dragging using a finger on a touchscreen 105.
  • the user may trace out the desired line shapes using the hand, finger or a stylus in the air within the view of the view facing camera 102 or user facing camera 108.
  • the camera images are interpreted to find the shape traced, and the shape is recorded as a sequence of 2D coordinates which define the line shapes of the digital ink required.
  • the resulting shape is displayed over the clipping 830 or digital twin 107. If considered necessary by the user, it can be moved to the required location relative to the digital twin 107 or clipping 830 using arrow keys, or by dragging using a finger on a touchscreen 105.
  • the user can draw the lines and shapes they require to be included as digital ink in the location on the digital twin 107 at which they require the shapes to appear.
  • the view facing camera 102 is used for this purpose. The camera images are interpreted to find the shape traced, and the shape is recorded as a sequence of 2D coordinates which define the line shapes of the digital ink 158 required.
  • when the digital ink 158 is required to be text, the user can type or speak that text. In the case of spoken text, the voice recognition system of the viewing device 100, or some other voice recognition system, converts it to text. The text is rendered as a line font, typically in a style similar to hand-written text, and that output is treated as digital ink.
  • alternatively, the user can trace the required shape by moving the viewing device 100 in the air; the viewing device's gyroscope and/or accelerometer determines that shape, flattens it to 2D by projecting it onto the best fit plane, and records it as the required digital ink.
  • the viewing device's camera 102 records images while the user moves the viewing device 100 in the air. The optical flow of the images is used to determine the movement.
  • both the gyroscope/accelerometer and camera methods described here are used, and the two outputs are used together to create a better quality representation of the user's movements.
  • the required shapes can also be drawn and recorded using an ultrasonic pen recording system. If the ultrasonic pen recording system is not registered to give sufficient absolute positional accuracy in relationship to the viewing device's display screen 105, then after completion of the drawn lines, the resulting shape is displayed over the clipping 830 or digital twin 107 and, if considered necessary by the user, it can be moved to the required location relative to the digital twin 107 or clipping 830 using arrow keys, or by dragging using a finger on a touchscreen 105. In each of the above recording methods, only a line is recorded. If the user wishes to apply any colour, line weight or line style other than the default, the viewer application 190 (see Figure 10) provides a method of selecting it.
  • Another method of recording digital ink 158 is for the user to draw the required lines with a conventional pen directly on the viewed page 10, or a copy of the page 10.
  • the marked up page is then scanned and the scanned image matched to the image of the original reference page 210 and differenced from it.
  • the difference image must be filtered, typically by ignoring relatively small differences, to remove variation in colour resulting from lighting and scanning variations.
  • the remaining differences are treated as digital ink 158. They can be recorded as a raster image, or can be converted to a line image using raster to vector conversion techniques. This technique permits the colour and line weight of the pen used to be recorded as the colour and line weight of the digital ink, and permits the user to draw the exact line style that they require. Alternatively, the colour and line weight of the pen can be overwritten by a colour and line weight in a subsequent step.
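  • A minimal sketch of the scan-and-difference approach using NumPy; the threshold value is an assumption, and a real implementation would also register the two images before differencing.

```python
import numpy as np

def extract_digital_ink(reference_rgb, scanned_rgb, threshold=40):
    """Return a boolean mask of pixels where the marked-up scan differs from the
    reference page 210; small differences from lighting/scanning are filtered out."""
    ref = reference_rgb.astype(np.int16)
    scan = scanned_rgb.astype(np.int16)
    diff = np.abs(scan - ref).max(axis=2)      # per-pixel maximum channel difference
    ink_mask = diff > threshold                # ignore relatively small differences
    return ink_mask                            # stored as a raster, or later vectorized
```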
  • the interactive viewing system 2 (see Figure 8) provides for users to make use of clippings 830 and user generated content by one or more of the following methods: saving the user generated content to their viewing device 100 or another computing device.
  • An income stream reliant on the interactive viewing system 2 may be generated by way of the provision of a service to publishers, where publishers pay for their publications to be supported by the interactive viewing system. A fee per page and/or a fee per specific augmentation added to the publication's digital twin 107 may be charged.
  • publishers may be charged per view of their reference page 210. That is, each time the page server 20 records that a user has viewed a reference page, a charge is levied on the publisher of the viewed publication. Alternatively, each time a user accesses a specific augmentation 220, a charge is levied on the publisher. Alternatively, a levy is charged each time a user clicks through on an advertisement.
  • a levy is charged per month (or other time period) on the publisher for each page which is currently supported on the page server 20.
  • Advertisements displayed by the viewing application 190 can be directly charged for at, say, start up.
  • advertisements displayed to a user can also be directly charged for as they use the website for managing their clippings 830, history 878 and personal information (see Figure 60).
  • a "subscribe" link is placed in the viewing application 190, which allows the user to directly subscribe to the publication 838. Each time this link is used, a charge can be levied on the publisher. Similarly, whenever a clipping 830 is viewed, a "subscribe” link is shown, which permits the viewer of the clipping to subscribe to the magazine from which the clipping was taken. Each time this link is used, a charge can be levied on the publisher.
  • when the viewing application 190 identifies specific components of the publication as relevant to the user in some way, such as the name of a song or book, it shows a link to purchase this from an appropriate sales website. When the user clicks through on that link, the operator of the sales website typically will pay a small fee to the redirecting website. The accumulation of these charges provides a further income stream.
User Interface (UI)
  • references to a smartphone are by way of example only.
  • features of the UI described below may be applicable to any viewing device 100, as previously described.
  • when the user opens the viewing application 190 on their smartphone 100, the user is presented with a live video image having the appearance of a typical camera preview image.
  • a text prompt appears on the display screen 105 instructing the user to point the view facing camera 102 at a page 10 (for example, a magazine page).
  • the text prompt instructs the user to point the camera 102 at a cover page of a magazine.
  • the user may be instructed to point the camera 102 at any page of a magazine.
  • the viewing application 190 sends a match request 260 of a captured video frame to the page server 20 through the respective network interfaces 120 and 121.
  • the page server 20 recognizes the reference page 210 corresponding to the frame in the match request 260, and returns a match response 280 to the viewing device, typically a smartphone 100.
  • the smartphone 100 then sends a content request 290 for required resources to the page server 20. These resources normally include data required to track the page, the digital twin 107 and content augmentation 220. These resources are transferred to the smartphone via a content response 300.
  • the viewer application 190 configures the conductor module 160 to track the page 10 (described in detail above in "The Conductor Module” sub-section).
  • the conductor module 160 then prepares a rendered version of the digital twin 107 suitable for display on the touchscreen 105. Only after all these steps have completed can the viewer application 190 track the page 10 and show the digital twin 107 with augmentation 220.
  • the steps from capturing the camera frame until display of the tracked digital twin can suffer considerable delay in terms of normal user interface interaction. Delays in the range of 1 to 10 seconds are possible. In practice, users often move the smartphone 100 away from the page 10 they initially viewed during this delay, resulting in failure to track the page once the above steps are complete.
  • to keep the user engaged during this delay, a user interface element is displayed which consists of the live video images from the camera 102, overlaid with a static reticule of similar visual appearance to a traditional camera reticule.
  • a sliding bar is dynamically shown moving up and down within the reticule with a period of a few seconds to give the impression that the video is being scanned or analyzed. This visual element is displayed in the interval from when the smartphone 100 receives a successful match response 280 from the page server 20, until the tracked digital twin 107 is displayed, although other active periods could also be used.
  • the user receives immediate feedback that page recognition has occurred.
  • This feedback may be a displayed message (e.g. "Page Recognized”).
  • the user may be presented with a thumbnail image, such as a thumbnail image of the cover page of the magazine.
  • Immediate feedback regarding page recognition is important in order to keep the user's attention while more data (typically the view finder bundle 240) is downloaded to the user's smartphone 100.
  • a progress bar or similar may be displayed to the user during downloading of the view finder bundle(s) 240 for the view finder module 130.
  • the user is presented with a dynamic display of the digital twin 107 - that is, a virtual reality display of rendered digital content 116 corresponding to the viewed page 10, which is updated in real-time as the smartphone 100 is moved.
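  • The recognition-and-download sequence described above can be thought of as a small UI state machine; the sketch below is illustrative only, with assumed state and event names.

```python
from enum import Enum, auto

class UiState(Enum):
    LIVE_PREVIEW = auto()      # live camera image with reticule overlay
    MATCH_PENDING = auto()     # match request 260 sent, awaiting match response 280
    RECOGNIZED = auto()        # "Page Recognized" feedback and scanning-bar animation,
                               # while the view finder bundle 240 downloads (progress bar)
    TRACKING = auto()          # tracked digital twin 107 displayed

def next_state(state, event):
    # Transition table for the happy path plus the two failure/recovery cases above.
    transitions = {
        (UiState.LIVE_PREVIEW, "frame_sent"):      UiState.MATCH_PENDING,
        (UiState.MATCH_PENDING, "match_response"): UiState.RECOGNIZED,
        (UiState.MATCH_PENDING, "match_failed"):   UiState.LIVE_PREVIEW,
        (UiState.RECOGNIZED, "bundle_downloaded"): UiState.TRACKING,
        (UiState.TRACKING, "tracking_lost"):       UiState.LIVE_PREVIEW,
    }
    return transitions.get((state, event), state)
```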
  • the user may be presented with thumbnail images of a plurality of magazine covers and prompted to select the correct magazine.
  • the thumbnail images presented to the user may be selected using contextual information such as browsing history 878 (see Figure 60) known for that user or smartphone 100.
  • the viewing application 190 is configured to display both the live camera image and the 'live' digital twin 107 at a zoom level which gives the appearance to the user that the smartphone 100 is closer to the viewed page 10 than it is in reality.
  • the viewer application 190 displays live camera images and the live digital twin 107 at a deliberately zoomed-in level compared to the smartphone's usual camera preview images displayed when the viewer application 190 is not running. Effectively, the viewer application tricks the user into holding the smartphone 100 further away from the page 10, at an optimum distance, without explicit prompting.
  • the extra zoom amount is normally 4% to 10%. This UI feature is effective at improving the overall performance of the interactive viewing system 2.
Sequential Download of Data
  • In many cases, users expect to interact with displayed content as soon as it appears on the screen of their smartphone 100. For this reason, it may be advantageous not to display the digital twin 107 until all the necessary data contained in the view finder bundle 240 has been downloaded. This avoids the user having a potentially negative experience of, for example, seeing the digital twin 107 but being unable to perform any interactions therewith. However, a disadvantage of this approach is that users have to wait longer until something tangibly useful appears on their screen.
  • a download schedule for a single reference page 210 may be in the order of: (1) optional thumbnail image; (2) page image; (3) tracking data for the view finder module 130 (i.e. the set of image descriptors in the view finder bundle 240); (4) augmentation 220 defining the interactive functions on the reference page 210; (5) word index enabling word searches.
  • the scheduled download approach has the advantage that the user quickly receives something tangible (i.e. a PDF or other format image) without having to wait for the full range of interactive options in relation to the displayed digital twin 107.
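  • The single-page download schedule above amounts to a fixed priority ordering; a sketch follows, in which the resource names are assumptions.

```python
# Priority order for downloading resources of a single reference page 210:
# lower number = downloaded earlier.
PAGE_DOWNLOAD_SCHEDULE = [
    (1, "thumbnail"),        # optional thumbnail image
    (2, "page_image"),       # the page image itself (e.g. PDF or raster)
    (3, "tracking_data"),    # image descriptors in the view finder bundle 240
    (4, "augmentation"),     # interactive functions defined on the reference page
    (5, "word_index"),       # enables word searches
]

def download_page(fetch, page_id):
    """Download resources in schedule order; 'fetch' is any callable that retrieves one resource."""
    for _, resource in sorted(PAGE_DOWNLOAD_SCHEDULE):
        fetch(page_id, resource)
```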
  • a download schedule may differ for each page of the spread in order to provide the user with the smoothest possible virtual reality experience.
  • the user initially views a first page of the spread, which is the primary page.
  • the download schedule may be in the order described above enabling the user to view and interact with the primary page.
  • display data for the secondary page may be downloaded, either automatically in accordance with a typical caching strategy or when the smartphone 100 is moved over the secondary page.
  • the download schedule for the secondary page may be in the following order: (1) a placeholder; (2) tracking data for the view finder module 130; (3) reference page 210; (4) augmentation 220 defining the interactive functions; (5) word index.
  • the secondary page is initially represented by a placeholder (i.e. a frame for a blank page) and tracking data for the secondary page is prioritized over the reference page image 210. This enables the user to smoothly track over the primary page and secondary page (initially represented by a placeholder) even before the secondary page image is visible. From the user's perspective, the different download schedule for the secondary page, compared to the primary page of the spread, provides a much smoother virtual reality experience.
Display of Digital Twin
  • the live video image is replaced with the digital twin 107.
  • Replacement of the live video image with the digital twin 107 may be performed by cross-fading so that the appearance of the digital twin 107 is not jerky and 'feels' as if the live video image is still being viewed. This cross-fading increases the sense of virtual reality. In this scenario, the user is presented with relatively subtle onscreen hints to indicate that the digital twin is being viewed as opposed to the live video image.
  • the on-screen hints may comprise: the appearance of a utility button such as the "Clip & Share” button 828 (see Figure 58) and/or a change in the visual display on the screen 105 such as the appearance of a header 880 (see Figure 59) containing the magazine title 838 and an optional thumbnail 882 corresponding to that magazine.
  • Figure 56 shows a typical UI for viewing a digital twin 107 having a "Clip & Share" button 828 and a header bar.
  • the live video image may be replaced with the digital twin 107 in a more impactful manner to provide a stronger indication that the user is now viewing the digital twin 107. For example, an explosion or unfolding effect may be used when replacing the live video image with the digital twin.
Clipping and Sharing
  • The ability for users to clip and share printed content via their smartphones is one of the key aspects of the viewer application 190. Therefore, a "Clip & Share" button 828 is usually a prominent component of the User Interface.
  • FIG. 56 there is shown a typical UI when the smartphone 100 is dynamically displaying the digital twin 107.
  • the main part 500 of the UI is dedicated to displaying the digital twin 107, and a "Clip & Share” button 828 appears at the bottom of the screen 105.
  • the header 880 may comprise an "Options” button 884, the magazine title 838 (with optional information regarding month, year, volume number, issue number, bind edition etc) and a thumbnail image 882 of the relevant magazine cover.
  • tapping the "Clip & Share” button 828 is accompanied by audible feedback, such as a camera shutter sound.
  • the UI displays the clipping 830 to the user with various options for interacting with the clipping 830, such as the clip disposition options 832 (for example, via Facebook, Twitter, e-mail, SMS etc) or a save clipping button 866 to store it locally on the smartphone 100.
  • the various sharing options available may be defined by user preferences.
  • the displayed clipping 830 may have a torn edge 834 (see Figure 64) appearance to indicate that it is a clipping 830 derived from an image framed by the user.
  • Figure 58 shows an alternative scheme for clipping regions 840 of a page defined in the digital twin 107.
  • the user is not required to frame the desired portion of the digital twin and tap the "Clip & Share" button 828. Instead, tapping on a position contained by a defined region 840 of the digital twin 107 automatically clips that defined region 840 (or alternatively clips the entire reference page 210).
  • the clipping 830 is displayed to the user with various options 886 relating to the displayed clipping such as visit website, show on map, phone call, connect on Facebook, or share 888 (taking the user to the clipping sharing screen shown in Figure 57).
  • the type of clipping 830 shown in Figure 59 is a "smart clipping", because it is predetermined by the interactive viewing system and will be the same for all users who tap within the corresponding defined region 840 of the digital twin 107. Accordingly, the options 886 appearing in relation to the "smart clipping" can be tailored to the content of that clipping 830.
  • the options 886 appearing in Figure 59 may be suitable for a restaurant advertisement. Different options 886 may appear for a smart clipping containing a coupon, such as save coupon, share coupon, explore products and so on.
  • Clippings 830 may be saved locally on the smartphone 100 and accessed later by the user, even when the camera 102 is not facing the viewed page 10 from which the clipping was derived. Equally, a user may receive a clipping 830 from another user and save this locally on their viewing device 100. In order to access saved clippings, the user taps the "Options" button 884 in the UI shown in Figure 56. Referring to Figure 60, the user is presented with an options menu from which "My Clips" 868 may be selected.
  • FIG. 61 shows clippings in the list 862 organized in accordance with magazine title 838, so that all clippings 830 derived from the same magazine title 838 are stored under the same heading together with the number of clippings 892 associated with that magazine title.
  • the user may switch between the thumbnail view of Figure 61 and the list view of Figure 62 via the clips view button 894 and the list view button 896.
  • the user may tap to display a desired clipping 830 (see Figure 64).
  • the user is then presented with a display of the clipping together with various disposition options 832 or other options relevant to that clipping, including a delete button 864.
  • with the clipping 830 displayed, the user can display the full clipping 830 in static mode (described above), enabling panning, zooming and, optionally, access to the range of interactive functions associated with the clipping.
  • the user may be provided with an access button 898 to the reference page 210 or publication from which the clipping 830 was derived via the static mode.
  • the range of content available to the user via clippings may be limited by Digital Rights Management (also discussed above). For example, publishers may elect to not provide full access to magazine content via clippings so as to prevent excessive harvesting of digital magazine content via the interactive viewing system 2.
  • the display screen 105 of the viewing device in the form of a smartphone 100 is interactive and responsive to a variety of conditions, as follows:
  • the smartphone 100 employs a combination of the view finder module 130 and optical flow module 150 to determine its position and orientation relative to a viewed page 10 (see Figure 10). This enables a 'live' virtual reality display which is smoothly updated as the smartphone 100 moves relative to the page 10.
  • the user may move the smartphone 100 so quickly that view finder module 160 and optical flow module 150 both fail due to the blurriness of camera images.
  • the UI switches to a display of a live camera image until such time that the view finder module 160 is able to find the correct position and orientation of the smartphone 100 relative to the page 10.
  • the smartphone 100 reverts to page recognition in the page server 20 via a match request 260 in the usual manner.
  • after a predetermined period of time has elapsed without successful page recognition, the smartphone 100 enters a hibernate mode, typically accompanied by a brief vibration. In the hibernate mode, the smartphone 100 is no longer attempting page recognition in order to save the resources of the processor 106 and conserve battery power. The user may be presented with a "Try Again" prompt whilst in hibernate mode. When the user taps this prompt, the smartphone 100 will attempt page recognition again using camera images.
Blackness Detection
  • the viewing application 190 has a black check module 140 which can detect when the smartphone 100 is lying flat against a page or other surface. Once it is detected that the smartphone 100 is lying flat against the viewed page 10, the page server 20 does not attempt page recognition and the view finder module 160 samples camera images less frequently. Both measures save the resources of the processor 106 in the smartphone 100 and conserve battery power.
  • the smartphone 100 automatically switches to static mode.
  • static mode the most recently recognized digital twin 107 is displayed.
  • the user can navigate around the reference page 210 corresponding to the digital twin 107 in static mode via conventional pan and zoom interactions.
  • all interactivity associated with the digital twin 107 is typically preserved in static mode.
  • a 'sleeping' icon appears on the touchscreen 105 together with the displayed digital twin 107.
  • the viewer application 190 typically enters static mode when the smartphone 100 is lying flat against a page 10 or other surface. Indeed, this is one means by which the user can enter static mode.
  • the user may enter static mode via a gestural interaction with the smartphone's touchscreen 105 or by tapping an onscreen button. For example, a tap and hold gesture, a pan gesture or a pinching gesture may all be used by the user to enter static mode. If the user requests static mode via one of these gestures, then the dynamic virtual reality display of the digital twin 107 is rolled or folded into a corner of the display screen 105 and replaced with the static digital twin 107, which is navigable via on-screen gestural interactions. Meanwhile, the corner of the display screen 105 continues to show the dynamic virtual reality display of the digital twin 107, which is updated as the smartphone 100 moves relative to the viewed page 10.
  • the user can exit static mode and re-enter dynamic mode whereby the dynamic virtual reality display of the digital twin 107 returns full screen (typically via an unfolding or unrolling animation). Whilst in static mode, areas beyond the reference page 210 edges of the digital twin
  • Static mode is useful from the user's perspective, because panning and zooming interactions have become a natural way for many users to navigate content displayed on smartphones 100. Moreover, fine interactions with the displayed digital content 116, such as the selection of text for searching, are usually easier to perform in static mode when there is no movement of the digital twin 107 due to camera shake and so on.
  • Most smartphones 100 contain an accelerometer for sensing an orientation of the phone and adjusting the orientation of the display on the touchscreen 105 (portrait or landscape) depending on the sensed orientation.
  • the positions of buttons and other visual features in the screen 105 are usually changed when switching between landscape and portrait orientations.
  • because the accelerometer relies on gravity, the display on the screen 105 does not rearrange when the phone is lying horizontally. When lying flat, the display does not rearrange as the smartphone 100 is rotated, which can be frustrating for users.
  • the view finder module 160 determines the smartphone's orientation relative to a viewed page 10. Unlike the smartphone's internal accelerometer, this orientation determination has no dependency on gravity. Therefore, the determined orientation of the smartphone 100 relative to a viewed page 10 can be used to rearrange visual elements displayed on the screen 105, irrespective of whether or not the phone is being held horizontally.
  • Buttons and other on-screen visual features will rearrange depending on the phone's orientation (portrait or landscape) relative to the viewed page 10.
  • a consequential advantage of the viewer application 190 is that this on-screen visual feature rearrangement functions consistently, even when the phone is being held horizontally relative to the viewed page 10.
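  • As a sketch only, the page-relative orientation reported by page tracking (for example an in-plane rotation angle) can drive the portrait/landscape decision independently of gravity; the angle convention used here is an assumption.

```python
def ui_orientation_from_page(rotation_degrees):
    """Map the phone's in-plane rotation relative to the viewed page 10 to a UI layout.
    Works even when the phone is held horizontally, where the accelerometer cannot help."""
    angle = rotation_degrees % 360
    if 45 <= angle < 135:
        return "landscape_left"
    if 135 <= angle < 225:
        return "portrait_upside_down"
    if 225 <= angle < 315:
        return "landscape_right"
    return "portrait"
```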
  • the interactive viewing system 2 has the ability for users to interact with augmented reality features of the digital twin 107, such as buttons overlaying the virtual reality display. However, it is equally important that these interactive buttons do not unduly clutter the display and are at the same time discoverable by the user.
  • the number of interactive buttons (or other interactive features) viewable via the digital twin 107 may change depending on the zoom of the page 10. This is applicable both in dynamic mode (displaying a 'live' digital twin) and static mode. For example, when viewing a whole page in dynamic mode, the user may be able to see, for example, a first interactive button in the digital twin 107. The user may tap the first interactive button to display a plurality of second buttons or menu options.
  • the second buttons may automatically appear in the digital twin 107 at this zoomed-in display, providing the user with direct access to the interactivity associated with these second buttons via the digital twin 107.
  • This zoom-dependency of the number of buttons displayed in the digital twin 107 avoids clutter in cases where a small region of a page has a number of associated interactive buttons.
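  • The zoom-dependency described above can be expressed as a simple visibility rule per button; a sketch with assumed threshold values:

```python
def visible_buttons(buttons, zoom_level):
    """Return only the interactive buttons whose minimum zoom has been reached.
    'buttons' is a list of dicts such as {"id": "b2", "min_zoom": 2.0}."""
    return [b for b in buttons if zoom_level >= b.get("min_zoom", 1.0)]

page_buttons = [
    {"id": "first_button",    "min_zoom": 1.0},  # visible when the whole page is in view
    {"id": "second_button_a", "min_zoom": 2.5},  # appear only when zoomed into the region
    {"id": "second_button_b", "min_zoom": 2.5},
]
print([b["id"] for b in visible_buttons(page_buttons, zoom_level=1.2)])  # -> ['first_button']
```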
  • a user may wish to discover if any part of that digital twin contains interactive features. This may be achieved in a number of different ways, such as: providing a miniature map of the digital twin 107 as part of the display; a glowing border region of the display in the direction of interactive features; displayed arrows pointing in the direction of interactive features; a display of interactive button(s) "squashed" into a border region in the direction of these buttons; an on-screen text prompt etc.
  • interactive content is accessed by the user via tapping on a button displayed as part of an augmented reality view.
  • video content may be accessed by tapping a video playback icon 104 (see Figure 9) which appears in the digital twin 107.
  • video content may be initiated automatically when the user views the relevant part of the digital twin 107.
  • automatic video playback may be initiated after a predetermined dwell time in the relevant part of the digital twin 107.
  • dwell times may be used to control what interactive options are available to the user.
  • the number of interactive buttons appearing in the digital twin 107 may be dependent on dwell time in a similar manner to the zoom-dependency described above. With a longer dwell in a particular region of the digital twin, more interactive buttons may appear in the digital twin displayed to the user.
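  • A sketch, with assumed dwell thresholds, of how dwell time in a region of the digital twin 107 could gate automatic video playback and progressively reveal more buttons:

```python
def interactivity_for_dwell(dwell_seconds, autoplay_after=2.0, extra_buttons_after=4.0):
    """Return which dwell-dependent behaviours are active for the current region."""
    return {
        "autoplay_video": dwell_seconds >= autoplay_after,
        "show_extra_buttons": dwell_seconds >= extra_buttons_after,
    }

print(interactivity_for_dwell(3.0))  # video autoplays, extra buttons not yet shown
```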
  • the type of display experienced by the user may be determined by a publisher. In some cases, the user may simply view a plain image of the printed page in the digital twin. However, in other cases it may be desirable to provide greater impact to the user via the digital twin 107 without necessarily detracting from the virtual reality experience of the viewed page 10.
  • Cinemagraph animations provide an excellent means for augmenting the digital twin 107 without detracting from the virtual reality experience. Rather than viewing a plain image of a printed page, the user is presented with a digital twin containing subtle animated features, which enhance the richness of the viewed content. For example, a photo of a face may be augmented with periodically blinking eyes; a liquid may periodically drip from a container; hair may be periodically blown and so on.
  • Interactive buttons in the digital twin 107 may be subtly enhanced with animation so that they are more appealing or enticing for users.
  • an interactive button may be provided with a sparkling or twinkling effect.
  • Another type of animation comprises the use of nested still content - that is, periodically switching between still photos relating to different variants of a product.
  • a product For example, an ice cream advertisement may be nested with stills of different flavours; a laptop advertisement may be nested with stills of different colors.
  • Some displayed digital twins will consume more resources in the processor 106 (see Figure 10) than others.
  • a digital twin 107 containing a cinemagraph animation or an interactive game will place higher demands on the processor 106 of the smartphone 100 than the display of a plain PDF page. It is important to provide the option of more complex graphics for publishers, whilst at the same time ensuring that an acceptable virtual reality experience is maintained. Moreover, it is essential to avoid the worst-case scenario of a system crash caused by overloading the smartphone's processor 106.
  • the data downloaded to the smartphone with the digital twin may contain an instruction for the viewer application 190 to pare back the view finder module 160 sampling rate in connection with that page.
  • the number of times per second that the view finder module 160 attempts to match features in camera images may be reduced if the graphics demands of the displayed digital twin 107 are high.
  • the reduction in the view finder module 160 sampling rate places fewer demands on the processor 106, which frees up more processing resources for displaying complex graphics content. Usually, this reduced sampling rate will go unnoticed by the user.
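  • A sketch of how a per-page hint downloaded with the digital twin might pare back the tracking sample rate when graphics demands are high; the rates and the field name are assumptions.

```python
DEFAULT_SAMPLE_HZ = 15      # assumed normal matching attempts per second
REDUCED_SAMPLE_HZ = 5       # assumed rate when complex graphics are displayed

def tracking_sample_rate(page_metadata):
    """page_metadata is the data downloaded with the digital twin 107; a hypothetical
    'high_graphics_load' flag instructs the viewer application to reduce sampling."""
    if page_metadata.get("high_graphics_load", False):
        return REDUCED_SAMPLE_HZ
    return DEFAULT_SAMPLE_HZ
```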
  • in certain circumstances, it is necessary for the touchscreen 105 to display text prompts to the user.
  • the user may forget to point their smartphone camera at the cover page of a new magazine when switching between different magazines.
  • the user may receive a text prompt such as "Point Camera at Cover Page".
  • the viewer application 190 addresses this problem by adjusting the live video display (either the live camera image or the 'live' digital twin) whenever a text prompt is displayed.
  • the adjustment may be by means of, for example, desaturating the live video display, fading or reducing the luminance of the live video display, displaying the live video display in black and white, or a combination of these measures.
  • the regions beyond page edges of the digital twin 107 typically display a live camera image.
  • this live camera image may be adjusted.
  • the live camera image may be desaturated, faded, displayed in black and white, or similar.
  • the viewing system may record data relating to the user's interactivity with the displayed digital content. This data may be useful to, for example, advertisers and publishers, enabling them to assess the impact of a particular printed advertisement or article. In some instances, recorded user interactivity data may be used to determine an amount paid by an advertiser to, for example, a magazine publisher.
  • user interactivity data may be used to enable the interactive viewing system 2 to build up a personalized profile associated with each user identity.
  • Each personalized profile may reflect the user's previous interactivity. This not only assists in improving the accuracy of subsequent page recognition, but also provides valuable information about each user which may be used for future direct marketing.
  • user interactivity data will be recorded after a viewing session has finished to maximize the confidence of recorded data. This enables the accuracy of page recognition to be refined before corresponding user interactivity data is recorded.
  • the page recognition process may initially determine that the user is viewing publication A with 60% confidence, perhaps based on publication A being the more popular publication in the absence of other data to improve the confidence of page recognition. However, the page recognition process may subsequently determine that the user was, in fact, viewing publication B with 95% confidence, perhaps based on a view of an adjacent page of the same publication, which increases the confidence that the user was viewing publication B.
  • the user interactivity data records that the user viewed the printed advertisement from publication B, even though the page recognition process initially determined that the user was viewing publication A. It is important that advertisers receive as accurate information as possible regarding user interactivity with printed advertisements and the flexibility to refine user interactivity data in this way maximizes the accuracy of recorded interactivity data. In some cases, it may be useful to record (and report) interactivity data with a corresponding confidence parameter. For example, an advertiser may only be prepared to pay a fee to a particular publisher if user interactivity with a printed advertisement is recorded above a predetermined confidence level (e.g. 90%).
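  • A sketch of deferring the write of interactivity data until after the viewing session, recording the refined publication identification together with its confidence, and reporting only events above an advertiser's threshold; the field names follow the example above and the structure itself is an assumption.

```python
def finalize_session(pending_events, refined_match, report_threshold=0.90):
    """pending_events: interactions buffered during the session.
    refined_match: {"publication": ..., "confidence": ...} after post-session refinement."""
    recorded, reportable = [], []
    for event in pending_events:
        event = dict(event,
                     publication=refined_match["publication"],
                     confidence=refined_match["confidence"])
        recorded.append(event)                       # always stored with its confidence
        if event["confidence"] >= report_threshold:
            reportable.append(event)                 # e.g. billable advertisement views
    return recorded, reportable
```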
  • Refinements of user interactivity data may be made immediately after a viewing session as described above or at a later date. In some cases, refinements of a user profile based on an accumulation of user interactivity data may cause refinements to previously recorded user interactivity data.
  • the number of views received by a particular publication, page, article, advertisement etc. may be recorded by the interactive viewing system 2.
  • the number of views received by a particular publication may be useful for a publisher to determine, for example, a proportion of its readership that is engaging in interactive viewing.
  • the number of views received by a particular printed article may be useful for a publisher to determine the popularity of that article and, potentially, affect future editorial decisions.
  • the number of views received by a particular printed advertisement may be useful for an advertiser to assess the impact of that printed advertisement and/or determine an amount payable by the advertiser to a publisher.
  • the viewing history of a user may be used to update that user's profile. This may be used to improve the accuracy of subsequent page recognition as well as to gather statistics on a user's viewing habits for other uses.
Dwell Times
  • the period of time a user dwells on a particular publication, page, article, advertisement and so on may be recorded by the system. This is potentially useful information for a publisher, because it reveals the viewing habits of its readership.
  • Dwell times may be helpful in distinguishing between intentional interactions with a printed advertisement or article and accidental or coincidental views of the printed advertisement or article. Again, advertisement dwell times may be useful for determining an amount payable by the advertiser to a magazine publisher.
  • Viewing devices 100 such as smartphones and tablet computers, may be equipped with a user-facing camera 108 (see Figure 39).
  • the user-facing camera may be used to track the user's eye 818 movements in respect of displayed digital content 116.
  • Eye-tracking data may be used to provide more accurate information on, for example, dwell times in respect of a particular article or advertisement. Additionally, eye-tracking data may also help to distinguish between intentional views and accidental or coincidental views of a particular advertisement or article. It will be appreciated that recorded eye-tracking data potentially provides very specific information on users' viewing habits.
Click-Throughs
  • Click-throughs initiated by interactively viewing a particular publication, page, article, advertisement or other page element may be recorded by the interactive viewing system 2. Click-throughs demonstrate a strong degree of engagement between the user and the printed content, and this is potentially valuable information for publishers, advertisers and the like. In the present context, a click-through broadly encompasses any type of user engagement with displayed digital content 116, which is other than simply viewing the digital content. In the case of a smartphone 100 or tablet computer 766, a click-through is typically initiated by tapping on a zone of a touchscreen 105.
  • a click-through may result in the user navigating to an internet resource e.g. a user tapping an on-screen "Buy Now” icon 119 (see Figure 50A) and navigating to a suitable merchant webpage, such as Amazon.
  • a click-through may simply result in an augmented reality display changing e.g. a user tapping an on-screen "Reveal Comments” icon.
  • a click-through may result in initiation of a download e.g. a user tapping an on-screen "Download Coupon” icon.
  • a click-through (e.g. from viewing a printed advertisement) may ultimately result in a purchase, and this transaction may be recorded by the system.
  • a recorded purchase may automatically initiate a payment to be made from the advertiser to a publisher in which the printed advertisement was placed. The payment may be a predetermined percentage of the purchase price.
  • a user profile may contain an indicator of the likelihood of that user to make a purchase via a viewing interaction, based on a number and/or value of previous purchases.
Searches
  • the displayed digital content 116 may provide the user with an option of performing searches in respect of keywords or other graphics displayed to the user.
  • a search may be initiated directly using a predetermined gestural interaction with displayed content.
  • a search may be initiated indirectly by clipping a word or phrase and pasting into a suitable search engine.
  • the number and/or type of searches initiated from a particular publication, page, article, advertisement etc. may be recorded by the system. Searching indicates a strong degree of engagement between the user and the digital content and this information may be used for determining popular or trending topics of interest to users.
  • the displayed digital content 116 may provide the user with an option of clipping and sharing at least part of the digital content.
  • the clipped content may be shared directly with other users or indirectly by posting to a social networking website.
  • the number and/or type of clippings shared from a particular publication, page, article, advertisement etc. may be recorded by the system.
  • Clipping and sharing indicates a strong degree of engagement between the user and the digital content, and this information may be used for determining popular or trending topics of interest to users.
  • the viewing device 100 may be equipped with, or in communication with, specific hardware, such as a printer.
  • data related to usage of that hardware may be recorded. For example, a number of printouts initiated from a viewed publication, article, advertisement etc. may be recorded by the system.
  • a user may be invited to print out a redeemable barcoded coupon. Accordingly, the number of such coupons in circulation can be determined.
  • the date/time of viewing a printed page may be recorded by the system. This data may be determined from the page server 20 used for page recognition or the viewing device 100 itself.
  • the geographical location of the viewing device 100 when viewing a viewed page 10 may be recorded by the interactive viewing system 2. This information is typically determined from the viewing device 100 using, for example, GPS data, a mobile network location or similar.
  • the subscribed publication's digital content 116 can be downloaded to the user's viewing device 100 in advance of the physical delivery of the publication.
  • the advance downloading of the digital content 116 is triggered by the page server 20 when the page information and related subscriber list are loaded. This preferably occurs as a background operation on the viewing device 100 so that the user is not impacted.
  • the downloading may be performed when the viewing device 100 is connected to a faster and cheaper network option, such as WiFi rather than the mobile telephone network.
  • an automatic or manual (user requested) download may be initiated when the viewing device 100 is connected to a host computer, when a sync operation is initiated between the viewing device 100 and the host computer, or at any other time while the viewing device 100 remains connected to a host computer. Downloads may happen progressively, a few pages at a time, for efficiency or for other reasons, such as the user temporarily disconnecting or switching off the viewing device 100. Manual downloads may be initiated by a special user interface element, such as a "sync" or "update" button.
Page Pre-Fetching Strategies
  • Each page download may include a digital image of the corresponding page(s), page tracking data and publication details.
  • a view finder bundle 240 (see Figure 10) may be downloaded.
  • a number of strategies for selecting which pages to pre-emptively download are listed below; an illustrative sketch of one such strategy follows the list.
  • the choice of which other parts of the publication to download can be made based on the user's demographic information. For example, users in a certain geographic region can be expected to prefer information related to that region, and users in a certain age range may prefer information targeted at that age range.
  • some, or all, of the pre-emptively cached data can be restricted in size by downloading text and/or low-resolution images in preference to full resolution data. This permits the cached data to have a larger coverage. When a hit is found on the cached data, the higher resolution images can be downloaded.
  • the cached data restricted in size by this method is preferably that for which the expectation of use is lower.
  • only page tracking data is downloaded for the pre-fetched pages, in order to reduce cache size. Such page tracking data may be view finder bundles 240.
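The sketch below illustrates, for example purposes only, how the above strategies might be combined: candidate pages are ranked by a simple demographic relevance score, the highest-ranked pages are pre-fetched at full resolution, and the remainder are cached in a size-restricted form. All class names, fields and weights are illustrative assumptions.

```python
# A hedged sketch of one pre-fetching strategy: rank candidate pages by a
# simple demographic relevance score, then plan full-resolution downloads for
# the highest-ranked pages and size-restricted data for the rest.

from dataclasses import dataclass

@dataclass
class PageMeta:
    page_id: str
    region: str        # geographic region the content targets
    age_range: range   # age range the content targets

@dataclass
class UserProfile:
    region: str
    age: int

def relevance(page: PageMeta, user: UserProfile) -> int:
    score = 0
    if page.region == user.region:
        score += 2
    if user.age in page.age_range:
        score += 1
    return score

def plan_prefetch(pages, user, full_res_budget=4):
    ranked = sorted(pages, key=lambda p: relevance(p, user), reverse=True)
    plan = {}
    for i, page in enumerate(ranked):
        # Highest-expectation pages get full data; the remainder are cached in a
        # size-restricted form (tracking data and/or low-resolution images only).
        plan[page.page_id] = "full" if i < full_res_budget else "low_res_or_tracking_only"
    return plan

pages = [PageMeta("p1", "AU", range(18, 35)), PageMeta("p2", "UK", range(35, 60))]
print(plan_prefetch(pages, UserProfile(region="AU", age=30), full_res_budget=1))
```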
Digital Rights Management (DRM)
  • the interactive viewing system 2 provides a convenient platform for digital rights management (DRM) in respect of the digital content 116 provided by the viewing device 100.
  • the digital content 116 available for viewing (or listening) by the user may be controlled in accordance with one or more parameters. Therefore, copyright holders, such as magazine publishers, are able to control access to digital content in accordance with predetermined parameters. In some instances, it is desirable that no restrictions are placed on the digital content 116 viewable by users. However, in many instances it is desirable to control what digital content 116 is made available to particular users.
  • Permission to access digital content 116 may be granted on the basis of the user's identity and/or the identity of the viewing device 100.
  • the user identity or viewing device 100 identity may be determined using any of the means described previously. Once the user identity or viewing device 100 identity has been determined, then associated permissions may be determined as well. For example, the user identity may have an associated user profile indicating whether, for example, the user has paid for access to certain content, whether the user has permission to make clippings, whether the user has permission to share clippings.
  • Permission to access digital content may be granted via a password issued to a user of the viewing device 100.
  • the user may view a virtual reality view of that page and then be prompted to enter a password in order to view or listen to augmented reality digital content.
  • the digital content accessible after entering the correct password may be, for example, an audio or video clip associated with a particular article, a redeemable token or coupon, a download, a game and so on.
  • Passwords may be distributed to certain users as part of a magazine subscription, in response to an online registration, in response to a payment, as part of an e-mail or social network marketing campaign. Different passwords may be associated with different permissions and thereby provide different levels of access to interactive digital content.
  • Permission to access digital content may be granted via a payment, such as a payment to a copyright holder.
  • a payment may grant access to one or more pieces of copyrighted material.
  • the user may be allowed to listen to an audio clip when viewing a particular printed graphic, but may only be granted permission to download the audio clip in full after paying a fee, such as in response to an on-screen prompt.
  • a magazine publisher may grant permission to view interactive digital content in respect of a particular magazine in response to a payment.
  • Payment may result in issuance of a password, which is applicable for use with, for example, one article, a magazine issue, a range of magazine issues, a magazine title or a range of magazine titles.
  • Permission to access digital content may be controlled by a user subscription or registration status, which can be checked automatically via the user identity and/or the viewing device 100. Once a user's subscription status is confirmed as valid, then the user may be provided with access to a range of interactive digital content which is unavailable to non-subscribers. This serves as a potential inducement for users to subscribe to particular magazines, journals and so on.
  • the geographic region of the user may be determined via the viewing device 100.
  • a mobile phone network, an internet service provider or a GPS signal may be used to determine in which country a user is located.
  • the geographic region information may then be used to control the digital content accessible by the user.
  • a printed advertisement has associated video content
  • the available video content may be dependent on the country in which the printed advertisement is being viewed.
  • the same printed advertisement may have different associated video content depending on where the printed advertisement is being viewed by the viewing device 100.
  • the language of any audio content may be different.
  • a copyright holder may restrict access to digital content depending on a user's geographic location.
  • BBC Top Gear magazine may have associated BBC video content available to UK users (who pay an annual BBC license fee), but not available to other users outside the UK. Even if a user outside the UK has a copy of the relevant magazine title, that user may be restricted from viewing at least some of the associated digital content.
  • the language of digital content 116 available for viewing (or listening) by the user may be controlled based on, for example, the geographic location of the viewing device 100, the language of the printed publication being viewed, or a user preference set in the viewing device 100.
  • the language of a publication is taken to be the default language for digital content 116, but it may be desirable to provide users with alternative language options for the digital content, to the extent that the same content is available in other languages. In some cases, copyright holders may wish to provide access to digital content 116 only in a language corresponding to a particular country or region in which the viewing device 100 is located.
  • the position of the sensing device 808 and/or viewing device 100 relative to the substrate 10 may be used to control the digital content 116 available to users. For example, one video clip may be playable only when the sensing device 808 has a corresponding printed graphic within its field of view 14, whereas another video clip may still be playable after the user moves the viewing device 100 away from the page 10 which initiated the video clip.
  • content available via clippings 830 may be dependent on whether the user has viewed a corresponding printed page 10. For example, shared clippings may not have the same accessible content as original clippings.
Temporary Access
  • it may be desirable for publishers to provide users with temporary access to digital content 116. For example, an individual user may be granted access to digital content 116 for a predetermined period of time or for a predetermined number of views. Once the user's temporary access is revoked, the user may be requested to pay a fee in order to continue viewing the digital content. Temporary access serves as an inducement for users and, further, provides a potential revenue stream once temporary access is revoked.
  • Clipping and sharing of viewed content may also be controlled.
  • a user may be granted permission to view digital content via the viewing device 100, but not be granted permission to clip and share the same content.
  • permissions relating to viewing and sharing of digital content may be different, depending, for example, on what restrictions are imposed by the copyright holder. These restrictions help to minimize excessive harvesting of printed content onto users' viewing devices 100.
  • the recipient of clipped digital content may only have permission to view (or listen to) the received content in accordance with certain parameters defined by the copyright holder. For example, if the recipient is located in a different country, that recipient may not have permission to view the same clipped content as the sender. Or if the recipient is not a subscriber of a particular publication, then that recipient may not have access to the same digital content as a sender who is a subscriber.
  • Clipping and sharing of viewed digital content is considered to be a powerful means of encouraging uptake of the interactive viewing system among potential users. From that standpoint, it is advantageous to grant users permission to clip and share digital content with other users.
  • a clipping sent to another user may be tracked in order to provide useful information to, for example, publishers and advertisers. For example, tracking information may provide a publisher with useful data on the impact of a particular article via the number of times the article is shared.
  • clippings may be tracked to determine the identity of recipients with whom the clipped content has been shared. Accordingly, a publisher or advertiser may send related content to those recipients as part of a direct marketing campaign.
  • Digital content accessible via a click-through, such as by tapping on an on-screen hyperlink, may also be controlled.
  • the content available via a particular on-screen hyperlink may vary depending on certain permissions granted to a user.
  • the digital content available by tapping a hyperlink may be in a language corresponding to the geographic location of the viewing device 100.
  • the digital content available via a hyperlink may vary depending on the user's subscription or registration status.
  • the user may be invited to modify the viewable digital content.
  • the user may be able to attach an object to the displayed digital content.
  • the attached object includes a photo, a video, a 'like' identifier, a rating, a comment, or digital ink. This attachment may then be viewable by other users of the interactive viewing system when they access this digital content.
  • the user may have an option of filling in an on-screen form.
  • Permission to modify digital content may also be controlled. For example, only subscribers to a magazine may be granted permission to leave a comment on or rate an article; only users in a certain geographical location may be granted permission to leave a comment; only users who are viewing the printed page (as opposed to a shared clipping) may be granted permission to leave a comment; and so on. An illustrative sketch of such a permission check follows below.
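The sketch below shows, in simplified form, how the permission checks discussed above (geographic region, subscription or payment status, temporary access, and action-specific permissions such as clipping or sharing) might be resolved for a given user and piece of digital content. The field names and policy shown are assumptions for the example, not a definitive implementation.

```python
# A hedged sketch of resolving whether a user may view, clip, share or comment
# on a piece of digital content, under an assumed profile/content schema.

import time

def may_access(action, user, content):
    """user and content are plain dicts; all keys are hypothetical."""
    # Geographic restriction (e.g. region-locked video content).
    allowed_regions = content.get("allowed_regions")
    if allowed_regions and user["region"] not in allowed_regions:
        return False
    # Subscription, payment or temporary-access check, if the content requires it.
    expires = user.get("temporary_access_until")
    if content.get("requires_subscription"):
        subscribed = content["publication"] in user.get("subscriptions", [])
        paid = content["content_id"] in user.get("purchased", [])
        temporary = expires is not None and time.time() < expires
        if not (subscribed or paid or temporary):
            return False
    # Clipping/sharing/commenting may be restricted even when viewing is allowed.
    return action in content.get("permitted_actions", {"view"})

user = {"region": "UK", "subscriptions": ["Top Gear"], "purchased": []}
content = {"content_id": "video-7", "publication": "Top Gear",
           "requires_subscription": True, "allowed_regions": {"UK"},
           "permitted_actions": {"view", "clip"}}
print(may_access("view", user, content))   # True
print(may_access("share", user, content))  # False
```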
  • Some potential application areas of the interactive viewing system described herein include: children's books, news updates, stock prices, stock portfolio, form completion, catalogues, magazines, newspapers, real estate, medical records, gaming and gambling, puzzles, location finding, geo-tagging, games, travel translations, travel locations, entertainment, movie previews, game previews, document mark-up, plans mark-up, zoom for far-sighted users, textbook annotation, textbook updates, questionnaires, paper auctions, on-line auctions, reverse auctions, social networking, children's drawings, personal scrapbooks, Pictionary, doco-tagging (similar to geo-tagging games, but in published documents rather than physical locations), garment tags, cards, post-it notes, stickers, smart photo albums, product packaging.

Abstract

A system and method of interacting with content disposed on a substrate is provided. An image of the content disposed on the substrate is captured and used to identify the substrate via an image-matching technique. Digital content corresponding to the identified substrate is retrieved and displayed on a screen of a viewing device. The displayed digital content is a virtual reality and/or augmented reality view, referred to herein as a 'digital twin', of the imaged content. The digital twin has at least one interactive element for user interaction.

Description

METHOD AND SYSTEM OF INTERACTING WITH CONTENT DISPOSED ON
SUBSTRATES
TECHNICAL FIELD
The present disclosure relates generally to a system, method, and device for interacting with printed content using a viewing device. More specifically, the present disclosure relates to a system, method, and device for allowing users and printed content to interact in a dynamic manner such that the printed content is no longer considered static and unchanging by the user.
BACKGROUND
The Applicant has developed a system known as Netpage™ in which substrate surfaces carry an encoded pattern that is visible to an optical sensor. The data encoded in the pattern typically provides a unique identity for the substrate (or Netpage as it is known) and the position of the encoded data on the Netpage. The Netpage system is comprehensively described in the cross-referenced publications listed above. One of these references (US 6,788,293 the contents of which are incorporated herein) is specifically directed to devices used to view the Netpage. Figs. 1, 2 and 3 illustrate a Netpage viewer 50 described in detail in US 6,788,293. The Netpage viewer 50 has an image sensor 51 positioned on its lower side for capturing images of the encoded pattern printed on a Netpage 1. A display screen 52 on its upper side displays content 6 to the user. Referring to Figure 4, the coding pattern consists of Netpage tags 4 tiled across the
Netpage 1. Figure 3 shows the Netpage viewer 50 placed on the Netpage 1 in order to sense the Netpage tags 4 shown in Figure 4. The image sensor 51 senses one or more of the tags 4, decodes the coded information and transmits this decoded information to the Netpage system via a transceiver (not shown). The Netpage system retrieves a page description corresponding to the page ID encoded in the sensed tag 4 and sends the page description (or corresponding display content 5) to the Netpage viewer 50 for display on the screen 52. Typically, the Netpage 1 has human readable text and/or graphics 5, and the Netpage viewer 50 provides the user with the experience of virtual transparency. In other words, the content 5 displayed on the display screen 52 of the Netpage viewer 50 matches with the underlying printed content, so that the display screen 52 appears to be transparent from a user's perspective. Since the content 5 displayed on the Netpage viewer 50 is rendered from downloaded digital data, this information may be enhanced with additional interactive functionality. For example, hyperlinks may be displayed which are interactive via touchscreen interactions. Other functions, such as magnification, translation, playing video, playing audio, filling-in forms etc. are all described in US 6,788,293.
Since each Netpage tag 4 of the coding pattern incorporates data identifying the page ID and its own location on the Netpage 1, the Netpage system can determine the location of the Netpage viewer 50 relative to the Netpage 1 and so can extract information corresponding to that position. Additionally the Netpage tags 4 include information which enables the Netpage viewer 50 to derive its orientation relative to the Netpage 1. This enables the displayed content 6 to be rotated relative to the Netpage viewer 50 so as to match the orientation of the text 5. Thus, information displayed by the Netpage viewer 50 may be aligned with content 6 printed on the Netpage 1, as shown in Fig. 3, irrespective of the orientation of the Netpage viewer 50.
As the Netpage viewer 50 is moved, the image sensor 51 images the same or different tags 4. This enables the Netpage viewer 50 and/or system to update the viewer's relative position on the Netpage 1 and to scroll the display 52 as the Netpage viewer 50 moves. The position of the Netpage viewer 50 relative to the Netpage 1 can easily be determined from the image of a single Netpage tag 4; as the Netpage viewer 50 moves the image of the Netpage tag 4 changes, and from this change in image, the position relative to the Netpage tag 4 can be determined.
It will be appreciated that the Netpage viewer 50 provides users with a richer experience of printed substrates. However, the Netpage viewer 50 typically relies on detection of Netpage tags 4 for identifying a Netpage 1 identity and position in order to provide the functionality described above. Further, in order for the Netpage coding pattern to be invisible (or at least nearly invisible), it is necessary to print the coding pattern with customized invisible IR inks, such as those described by the present Applicant in US 7,148,345.
The Applicant has recognized the desirability of providing at least some functionality of the Netpage viewer 50 without the requirement for a customized Netpage tag-reading device. Smartphones are now ubiquitous and have many of the attributes required for providing users with the experience of virtual reality via imaging of printed substrates. These attributes include a high-resolution camera, fast processing speeds, a color touchscreen display and high-speed internet connectivity.
In US Publication No. 2011/0292198, the contents of which are incorporated herein by reference, the Applicant described a microscope attachment 61 for a smartphone. Referring to Figure 5, the microscope attachment may be in the form of a sleeve 60 incorporating microscope optics 62. The sleeve additionally serves as a protective cover for the smartphone.
With the microscope attachment 61 in place, it becomes possible to view objects and surfaces in close-up using the smartphone's camera. Accordingly, with the microscope attachment, a conventional smartphone may be used as a Netpage viewer 50 when placed in contact with a surface of a page having a Netpage coding pattern printed thereon. The microscope optics 62 may include an IR phosphor placed in front of the smartphone's flash so as to illuminate Netpage tags 4 printed in IR ink.
With the smartphone suitably configured for decoding the Netpage coding pattern and rendering received digital display data in real-time, the smartphone can effectively have the same functionality as the customized Netpage viewer 50. The user simply requires the microscope attachment 61 and suitable software in order to read Netpage tags 4 and provide an experience of virtual reality with printed substrates.
Although microscope attachments for smartphones enable similar functionality to the Netpage viewer 50, it would be desirable to provide this functionality without the requirement for printed Netpage tags 4. Magazine publishers, for example, are typically reluctant to incorporate special coding patterns into each page of a magazine due to the additional cost of IR ink and the visual impact of the coding pattern, even if it is virtually imperceptible to the human eye.
Existing applications for smartphones enable decoding of barcodes and recognition of page content, typically via OCR and/or recognition of page fragments. Page fragment recognition uses a server-side index of rotationally-invariant fragment features, a client- or server-side extraction of features from captured images and a multi-dimensional index lookup. A well-known algorithm for performing page fragment recognition using rotationally-invariant fragment features is known as 'SIFT' (Scale-Invariant Feature Transform; see US 6,711,293, the contents of which are herein incorporated by reference). Existing recognition applications (e.g. Google Goggles) make use of the smartphone camera without modification of the smartphone or the imaged content. Therefore, they are potentially attractive from the point of view of publishers and app developers. However, these applications are somewhat brittle due to the poor focusing of the smartphone camera and errors in page fragment recognition techniques. With poor recognition accuracy, the user's experience is correspondingly poor, meaning that the uptake of apps for interacting with printed (or displayed) content has been very slow to date.
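For illustration, page fragment recognition of the kind described above can be approximated with off-the-shelf tooling. The sketch below uses OpenCV's SIFT implementation to match features between a captured camera frame and a stored reference page image, with Lowe's ratio test filtering ambiguous matches; the file names are placeholders, and a real system would index many reference pages rather than compare against one.

```python
# A minimal sketch of feature-based page fragment matching using OpenCV SIFT.
import cv2

def match_score(query_path: str, reference_path: str) -> int:
    query = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    reference = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    _, query_desc = sift.detectAndCompute(query, None)
    _, ref_desc = sift.detectAndCompute(reference, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(query_desc, ref_desc, k=2)

    # Lowe's ratio test discards ambiguous matches.
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    return len(good)

# The reference page with the highest score would be reported as the match.
print(match_score("camera_frame.png", "reference_page.png"))
```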
It would be desirable to provide an improved user experience of interacting with printed (or viewed) content via smartphones or other viewing devices, without necessarily requiring a special coding pattern.
SUMMARY OF THE INVENTION
In a first aspect, there is provided a method of interacting with a substrate. The method comprises the steps of: capturing an image of content disposed on the substrate; identifying the substrate using the captured image; retrieving digital content corresponding to the identified substrate; and displaying the digital content on a display screen of a viewing device, wherein the displayed digital content is a virtual reality and/or augmented reality view, referred to herein as a 'digital twin' (described below) of the imaged content, the digital twin having at least one interactive element for user interaction.
In one embodiment, the interactive elements may be hyperlinks, video/audio playback options etc. In a second aspect, there is provided a method of disambiguating a plurality of possible publications identified using an image-matching process. The method comprises the steps of: acquiring contextual and/or other non-image information relating to an interaction between a viewing device and a page of a viewed publication; and disambiguating the plurality of possible publications using the contextual and/or other non-image information to provide a most likely publication corresponding to the viewed publication.
In one embodiment, the contextual information relates to a user and is identified via a viewing device ID or user ID. In another embodiment, the viewing device ID or user ID is retrieved using at least one of: a viewing device ID stored in the viewing device; a NFC tag or token associated with the user; facial recognition (e.g. via a user-facing camera of the viewing device); iris recognition (e.g. via a user-facing camera of the viewing device); voice recognition (e.g. via a microphone of the viewing device); a password entered via a user interface of the viewing device; a signature (e.g. a signature captured as digital ink); a fingerprint (e.g. via a high-resolution touchscreen of the viewing device or a custom biometric sensor).
In another embodiment, the contextual information comprises at least one of: visual continuity in respect of the interaction between the viewing device and the page; a time period elapsed between a previous interaction and the current interaction; a favorites list generated by the user; a browsing history of the user; user subscription information; a preferred language of the user; publications associated with the user's demographics; publications associated with the user's geographical location; publications having a circulation exceeding a predetermined number; most popular viewed pages; publications published within a predetermined period relative to a current date; a manual indication from the user; and a type of viewing app installed in the viewing device.
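A minimal sketch of such disambiguation, assuming illustrative signal names and weights, is given below: each candidate publication returned by image matching is scored against the available contextual information and the highest-scoring candidate is selected.

```python
# A hedged sketch of contextual disambiguation of image-matching candidates.
def disambiguate(candidates, context):
    def score(pub):
        s = 0.0
        if pub == context.get("currently_viewed"):        # visual continuity
            s += 5.0
        if pub in context.get("favorites", []):            # user favorites list
            s += 2.0
        if pub in context.get("browsing_history", []):     # recently viewed publications
            s += 2.0
        if pub in context.get("subscriptions", []):        # subscription information
            s += 3.0
        # Larger-circulation titles get a small prior boost.
        s += context.get("circulation", {}).get(pub, 0) / 1_000_000.0
        return s
    return max(candidates, key=score)

candidates = ["Vogue March 2013", "Vogue April 2013"]
context = {"subscriptions": ["Vogue April 2013"],
           "circulation": {"Vogue March 2013": 300_000}}
print(disambiguate(candidates, context))   # Vogue April 2013
```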
In a third aspect, there is provided a method of optimizing an image-matching process. The method comprises the steps of: acquiring contextual and/or other non-image information relating to an interaction between a viewing device and a page of a viewed publication; using the contextual and/or other non-image information to identify one or more possible publications; and searching the identified possible publications for an image match in respect of the page of the viewed publication.
In a fourth aspect, there is provided a method of optimizing an image-matching process. The method comprises the steps of: displaying to a user one or more publication identifiers; the user selecting one of said publication identifiers, the selected publication identifier corresponding to a publication for viewing by the user; capturing an image of a page from the publication; searching a database corresponding to the publication for an image match in respect of the imaged page, thereby to identify the imaged page.
In one embodiment, the publication identifiers displayed to the user are identified using contextual information. In another embodiment, the contextual information comprises at least one of: a favorites list generated by the user; a browsing history of the user; publications associated with the user's demographics; publications associated with the user's geographical location; publications having a circulation exceeding a predetermined number; and publications published within a predetermined period relative to a current date.
In another embodiment, the publication identifiers identify individual bind issues or generic publication titles.
In a fifth aspect, there is provided a cover match approach for optimizing an image-matching process. The method comprises the steps of: capturing a first image of a cover of a publication; identifying the publication by matching features of the captured first image with a reference image contained in a first database of cover pages; capturing a second image of content disposed on a page within the publication; identifying the page by matching features of the captured second image with a reference image contained in a second database corresponding to the identified publication. In a sixth aspect, there is provided a method of providing a virtual reality display to a user, comprising the steps of: displaying a live video image of a viewed page on a display screen of the viewing device; identifying the viewed page; retrieving digital data corresponding to the viewed page; and rendering digital content to the display screen based on the digital data, wherein the rendered digital content is cross-faded with the live video image such that the user experiences an apparently seamless transition between the live video image and the rendered digital content.
In a seventh aspect, there is provided a hybrid method of providing a virtual reality and/or augmented reality view of a printed page of a publication. The method comprises the steps of: identifying the publication via a technique which does not rely on image-matching in respect of printed content; retrieving a set of reference images corresponding to the identified publication; capturing an image of printed content on the page using a camera of a viewing device; using the captured camera image to identify at least one of: the imaged page from the set of reference images; a location of the viewing device relative to the page; an orientation of the viewing device relative to the page; and a projection transform identifying a pose of the viewing device relative to the imaged page; retrieving digital data corresponding to the imaged page; and rendering digital content to the display screen based on the retrieved digital data, wherein the digital content is displayed as a virtual reality and/or augmented reality view of the printed page. In one embodiment, the technique for identifying the publication and/or page is selected from the group consisting of: reading a barcode associated with the publication and/or page, reading a two-dimensional encoded tag associated with the publication and/or page, reading an RFID tag associated with the publication and/or page, reading an NFC tag associated with the publication and/or page, reading a steganographic code associated with the publication and/or page, and manual or oral selection of the publication by a user.
In another embodiment, a publication (e.g. magazine) is identified using a non-image based technique, a set of reference images corresponding to individual pages of the publication is identified, and the captured image is used for page recognition using the set of reference images. In an eighth aspect, there is provided a method of attaching a media object to a page.
The method comprises the steps of: capturing an image of content disposed on a printed page; identifying the printed page using the captured image; retrieving digital content corresponding to the identified page; displaying the digital content as a virtual and/or augmented reality display on a display screen of a viewing device; identifying the media object for attachment; a user interacting with the media object to indicate attachment to the displayed digital content; and updating a digital description corresponding to the printed page, wherein the attached media object is retrievable via the updated digital description of the printed page.
In one embodiment, the media object is selected from a group consisting of: photo, video, audio, text, drawings, digital ink, hyperlinks, files, apps, ratings. In another embodiment, the updated digital description containing the attached media object is retrievable by a third party unknown to the user.
In a ninth aspect, there is provided an optically imaging pen comprising: a first camera for capturing a first image of printed content on a page when the pen is held at a height above the page; and a second camera for capturing second images of the printed content when a nib of the pen is in contact with the page, wherein the first image is used to identify the page using image-matching techniques, and the second images are used for generating digital ink representing movement of the pen relative to the page.
In one embodiment, the pen comprises a nib force sensor for sensing when the nib is in contact with the page.
In another embodiment, the second camera is activated when the nib force sensor indicates that the nib is in contact with the page.
In another embodiment, the first and second cameras are the same camera, which is reconfigured when the nib force sensor indicates that the nib is in contact with the page. In a tenth aspect, there is provided a method of capturing user interactions with content disposed on a page. The method comprises the steps of: capturing, using a view-facing camera, an image of printed content disposed on the page; identifying the printed page using the captured image; retrieving digital content corresponding to the identified page; displaying the digital content as a virtual and/or augmented reality display on a display screen of a viewing device; interacting with the printed content disposed on the page, for example, using a finger; capturing the interaction using the view-facing camera; and interpreting the interaction as it relates to the displayed digital content.
In one embodiment, the interaction with the printed content initiates an interactive function, such as hyperlinking. In another embodiment, the interaction with the printed content is interpreted as a gesture, wherein different gestures initiate different interactive functions.
In another embodiment, the interaction is interpreted as digital ink. In an eleventh aspect, there is provided a method of interacting with content disposed on a page. The method comprises the steps of: capturing, using a view-facing camera of a viewing device, an image of printed content disposed on a page; identifying the printed page using the captured image; retrieving digital content corresponding to the identified page; displaying the digital content as a virtual and/or augmented reality display on a display screen of a viewing device; capturing a user intent in respect of the displayed digital content using at least one of: a user-facing camera, a view-facing camera, a microphone, an internal motion sensor, a touchscreen and manual input device, such as a keyboard; and interpreting the user intent as it relates to the displayed digital content. In one embodiment, the user intent is captured via a user-facing camera and the user intent is indicated by at least one of: a gesture (e.g. hand gestures), eye movement, facial expression or a movement of the viewing device relative to the user.
In another embodiment, the user intent is captured via a microphone and the user intent is indicated by a spoken command which is interpreted using voice recognition techniques.
In another embodiment, the user intent is captured via a view-facing camera and the user intent is indicated by a hand movement or a movement of the viewing device relative to the printed page.
In another embodiment, the user intent is captured via an internal motion sensor of the viewing device and the user intent is indicated by motion of the viewing device (e.g. tilt, shake etc).
In a twelfth aspect, there is provided a method of interacting with content disposed on a page and recording user interactivity. The method comprises the steps of: capturing, using a view-facing camera of a viewing device, an image of printed content disposed on a page; identifying the printed page using the captured image; retrieving digital content corresponding to the identified page; displaying the digital content as a virtual and/or augmented reality display on a display screen of a viewing device; a user interacting with the displayed digital content; and recording interactivity data in respect of the user's interactions. In one embodiment, the recorded interactivity data includes at least one of: number of page views; page dwell time; article dwell time; click-throughs; searches; clippings extracted from the digital content; purchases; eye track data.
In another embodiment, at least some of the recorded interactivity data is added to a user profile so as to provide contextual information for future image-matching.
In a thirteenth aspect, there is provided a method of clipping digital content. The method comprises the steps of: capturing, using a view-facing camera of a viewing device, an image of printed content disposed on a page; identifying the printed page using the captured image; retrieving digital content corresponding to the identified page; displaying the digital content as a virtual and/or augmented reality display on a display screen of a viewing device; a user interacting with the displayed content to select a clipping; and extracting the clipping from a digital description of the displayed content.
In one embodiment, the user interaction with the displayed content indicates an extent of the clipping from the digital description. In one embodiment, the extent of the clipping is indicated by a representation of a torn edge of a paper page on the displayed content such that the clipping appears to have been ripped from the page.
In another embodiment, the displayed content has pre-defined clipping regions and the user interaction selects one of the pre-defined clipping regions. In one embodiment, the pre-defined clipping regions overlap. In one embodiment, the pre-defined clipping regions are nested. In one embodiment, the displayed content indicates one of the predefined clipping regions as a target clipping depending on a current position of the image within the page. In one embodiment, the displayed content indicates the extent of the target clipping.
In another embodiment, the clipping preserves any interactivity associated with a part of the displayed content defined by the clipping.
In another embodiment, the clipping is shared with a third party such that the third party can experience at least some of the interactivity experienced by the initial user.
In another embodiment, sharing of the clipping and/or any associated interactivity is subject to digital rights management.
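A small sketch of selecting a target clipping from pre-defined (possibly nested or overlapping) clipping regions is given below: the smallest region containing the current position within the page is offered as the target. The rectangle representation and sample regions are assumptions for the example.

```python
# A minimal sketch of target-clipping selection among nested pre-defined regions.
def pick_target_region(regions, x, y):
    """regions: list of (name, (x0, y0, x1, y1)) rectangles in page coordinates."""
    containing = [(name, rect) for name, rect in regions
                  if rect[0] <= x <= rect[2] and rect[1] <= y <= rect[3]]
    if not containing:
        return None
    # Prefer the smallest (innermost) region when regions are nested or overlap.
    def area(rect):
        return (rect[2] - rect[0]) * (rect[3] - rect[1])
    return min(containing, key=lambda item: area(item[1]))[0]

regions = [("full article", (0, 0, 600, 800)),
           ("photo", (50, 100, 300, 350)),
           ("caption", (50, 360, 300, 400))]
print(pick_target_region(regions, 120, 200))   # photo
```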
In a fourteenth aspect, there is provided a one-pass method for calculating image descriptors. The method comprises the steps of: defining a local image patch; defining a first set of radial zones within the local image patch; sampling pixels in each zone of the first set to determine an orientation of zone boundaries in the first set relative to an orientation of the local image patch; defining a second set of radial zones within the local image patch, each radial zone in the second set comprising a merged plurality of radial zones from the first set; calculating image descriptors using a gradient strength and orientation for each pixel contained in the second set of radial zones, wherein image descriptors derived from differently oriented views of the same image patch are similar.
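The sketch below is a simplified, hedged interpretation of such a descriptor: the patch's dominant gradient orientation fixes the zone boundaries, and a gradient-orientation histogram is accumulated for each of the 2 x 3 = 6 radial zones (the annulus and segment counts of the embodiment described after this sketch). The single-pass orientation estimate and the histogram binning are illustrative simplifications rather than the method as claimed.

```python
# A hedged NumPy sketch of a rotation-tolerant radial-zone patch descriptor.
import numpy as np

def radial_zone_descriptor(patch, annuli=2, segments=3, bins=8):
    patch = patch.astype(np.float32)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)

    # Dominant gradient orientation of the patch (stands in for the first-pass
    # sampling of the finer zone set used to orient the zone boundaries).
    dominant = np.arctan2(np.sum(mag * np.sin(ang)), np.sum(mag * np.cos(ang)))

    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radius = np.hypot(xx - cx, yy - cy)
    theta = (np.arctan2(yy - cy, xx - cx) - dominant) % (2 * np.pi)

    # Assign each pixel to an annulus, a segment and an orientation bin.
    r_edges = np.linspace(0, radius.max() + 1e-6, annuli + 1)
    a_idx = np.clip(np.digitize(radius, r_edges) - 1, 0, annuli - 1)
    s_idx = np.minimum((theta / (2 * np.pi) * segments).astype(int), segments - 1)
    b_idx = np.minimum((((ang - dominant) % (2 * np.pi)) / (2 * np.pi) * bins).astype(int), bins - 1)

    desc = np.zeros((annuli, segments, bins), dtype=np.float32)
    np.add.at(desc, (a_idx, s_idx, b_idx), mag)   # accumulate gradient strength

    desc = desc.ravel()
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc

patch = np.random.rand(32, 32)
print(radial_zone_descriptor(patch).shape)   # (48,) = 2 annuli * 3 segments * 8 bins
```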
In one embodiment, the radial zones in the second set are defined by the intersections of annuli and segments.
In another embodiment, the second set is defined by 2 annuli and 3 segments to provide 6 radial zones. In a fifteenth aspect, there is provided a method for determining a projection transform which employs a greedy algorithm, wherein only a part of a query image is processed to estimate an initial projection transform. The method comprises the steps of: providing an estimated projection transform based on a local cluster of features in one zone of the camera image; using the estimated projection transform to transform other features extracted from the camera image; using the estimated projection transform to transform features of a reference image stored in a database; comparing the transformed features from the camera image with the transformed features from the reference image; filtering results of the comparing step to include only candidate features having a close image descriptor correspondence with features from the reference image; combining the candidate features with the cluster of features used to determine the estimated projection transform; determining if the combined list of features exceeds a predetermined number; and if the predetermined number is exceeded, then assuming that the initial estimated projection transform is correct; or otherwise if the predetermined number is not exceeded, then re-estimating the projection transform using a larger set of features extracted from the camera image. In one embodiment, the method is performed on a handheld viewing device having a stored set of reference images.
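A hedged sketch of the greedy verification step is given below, using OpenCV and NumPy: an initial homography estimated from a local cluster of correspondences is used to project the remaining camera features, features that agree in both position and descriptor distance are counted, and the estimate is accepted only if the combined feature count reaches a threshold. The thresholds and data layout are assumptions for the example.

```python
# A hedged sketch of greedy projection-transform estimation and verification.
import numpy as np
import cv2

def verify_estimate(cluster_cam, cluster_ref, candidates,
                    accept_count=25, pixel_tol=5.0, desc_tol=0.4):
    """cluster_cam/cluster_ref: Nx2 corresponding points (N >= 4) from one zone
    of the camera image and the reference image.
    candidates: list of (cam_pt, cam_desc, ref_pt, ref_desc) for features outside
    the cluster that already have a tentative reference correspondence."""
    # Initial estimate from the local cluster only (part of the query image).
    H, _ = cv2.findHomography(np.float32(cluster_cam), np.float32(cluster_ref))
    if H is None:
        return None

    consistent = 0
    for cam_pt, cam_desc, ref_pt, ref_desc in candidates:
        # Project the camera feature into reference space with the estimate.
        proj = cv2.perspectiveTransform(np.float32([[cam_pt]]), H)[0, 0]
        close_in_space = np.linalg.norm(proj - np.float32(ref_pt)) < pixel_tol
        close_in_descriptor = np.linalg.norm(
            np.float32(cam_desc) - np.float32(ref_desc)) < desc_tol
        if close_in_space and close_in_descriptor:
            consistent += 1

    # Combine with the cluster that produced the estimate; accept if large enough,
    # otherwise the caller re-estimates from a larger set of camera features.
    return H if consistent + len(cluster_cam) >= accept_count else None
```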
In another embodiment, the method is used to validate page recognition in respect of the reference image. In another embodiment, the method is repeated using the re-estimated projection transform if the predetermined number is not exceeded.
In another embodiment, a new reference image is sought if all estimated projection transforms fail to produce a combined list of features exceeding the predetermined number.
In another embodiment, the close image descriptor correspondence is determined by a Euclidean distance between image descriptors.
In another embodiment, the features are transformed by the projection transform to determine at least one of: scale, orientation, X coordinate and Y coordinate in camera space.
In another embodiment, the projection transform infers a pose of a viewing device relative to the page being viewed.
In a sixteenth aspect, there is provided a method for calculating a projection transform using only three pairs of extracted correspondence points, each pair of extracted correspondence points comprising a feature extracted from a camera image and a corresponding feature extracted from a reference image stored in a database. In one embodiment, the method assumes that the location of an optical axis of a camera is in the centre of the camera image.
In another embodiment, the projection transform is calculated using the 3 extracted correspondence points and a fourth correspondence point which is calculated based on the assumption regarding the optical axis of the camera. In another embodiment, only some of the extracted correspondence points are true correspondences. In another embodiment, the method is performed on a handheld viewing device.
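For illustration, once four correspondence pairs are available (the three extracted pairs plus the fourth pair derived from the optical-axis assumption, the derivation of which is not reproduced here), the projection transform can be recovered with a standard direct linear transformation, as in the sketch below.

```python
# A minimal DLT sketch: recover a 3x3 projective transform from 4 point pairs.
import numpy as np

def homography_from_4(src, dst):
    """src, dst: 4x2 arrays of corresponding points (the three extracted pairs
    plus the fourth pair assumed to come from the optical-axis derivation)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.asarray(A, dtype=np.float64)
    # Null-space solution via SVD; reshape into the 3x3 transform.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

src = np.array([[0, 0], [100, 0], [100, 150], [0, 150]], dtype=float)
dst = np.array([[10, 5], [120, 12], [115, 160], [3, 148]], dtype=float)
print(homography_from_4(src, dst))
```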
In a seventeenth aspect, there is provided a method of page recognition. The method comprises the steps of: providing a database of image features derived from a set of reference images, each image feature being tagged with a corresponding page identifier, wherein similar image features derived from the same reference image are absent from the database; matching an image feature derived from a camera image of a page with the database of image features; and determining one or more candidate reference images from the database, each candidate reference image containing an image feature corresponding to the page imaged by the camera. In one embodiment, the method uses contextual and/or other non-image information to reduce the number of candidate reference images (e.g. reducing to one candidate reference image by selecting a reference image contained in a known recently viewed publication).
In another embodiment, the database is generated by the steps of: clustering all similar image features into clusters; identifying image features having the same page identity in the same cluster; and discarding all except one of the identified image features from each cluster.
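A hedged sketch of this database pruning is given below: descriptors are grouped by a simple greedy distance-threshold clustering (standing in for whatever clustering the index actually uses), and within each cluster all but one feature carrying the same page identifier are discarded.

```python
# A hedged sketch of pruning near-duplicate features from a page-feature database.
import numpy as np

def prune_features(features, cluster_radius=0.3):
    """features: list of (page_id, descriptor ndarray). Returns the pruned list."""
    clusters = []          # list of (centroid, kept_features)
    for page_id, desc in features:
        for centroid, kept in clusters:
            if np.linalg.norm(desc - centroid) < cluster_radius:
                # Same cluster: keep the feature only if its page is not represented yet.
                if all(pid != page_id for pid, _ in kept):
                    kept.append((page_id, desc))
                break
        else:
            clusters.append((desc, [(page_id, desc)]))
    return [f for _, kept in clusters for f in kept]

rng = np.random.default_rng(0)
feats = [("page1", rng.random(8)), ("page1", rng.random(8))]
feats.append(("page1", feats[0][1] + 0.01))       # near-duplicate on the same page
print(len(prune_features(feats)))                  # 2: the near-duplicate is dropped
```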
In an eighteenth aspect, there is provided a method of determining a pose of a viewing device relative to a page. The method comprises the steps of: calculating a projection transform by comparing features in a camera image of the page with corresponding features in a reference image of the page; determining a first pose of the viewing device relative to the page using the calculated projection transform; moving the viewing device relative to the page; and determining a second pose of the viewing device relative to the page using the first pose and data from one or more internal motion sensors in the viewing device.
In one embodiment, the one or more internal motion sensors are selected from the group consisting of: an accelerometer (e.g. a pair of orthogonal accelerometers) and a gyroscope.
In a nineteenth aspect, there is provided a method of tracking a location and/or orientation of a viewing device relative to a page. The method comprises the steps of: imaging the page using a camera of the viewing device; attempting to calculate a projection transform by comparing features in a camera image of the page with corresponding features in a presumed reference image of the page; if the attempted calculation of the projection transform succeeds, then determining an absolute location and/or orientation of the viewing device relative to the page using the calculated projection transform; if the attempted calculation of the projection transform fails (e.g. due to excessive motion blur, defocus etc), then using an optical flow method to estimate a shift between pairs of captured camera images and thereby estimate a location and/or orientation of the viewing device relative to the last known absolute location and/or orientation; and if the attempted projection transform calculation and optical flow method both fail, then reverting to a page recognition process to identify a new reference image from a plurality of reference images stored in a database.
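A minimal control-flow sketch of this fallback chain is given below. The three callables (pose estimation, optical-flow shift estimation and page recognition) are hypothetical stand-ins for the steps named above, and poses are reduced to 2-D offsets purely for brevity.

```python
# A control-flow sketch of absolute tracking with optical-flow and page-recognition fallbacks.
def track(frame, prev_frame, state, estimate_pose, optical_flow_shift, recognise_page):
    """state: dict with 'reference' and 'last_pose'. Each callable returns None on failure."""
    pose = estimate_pose(frame, state["reference"])
    if pose is not None:                       # absolute pose from the projection transform
        state["last_pose"] = pose
        return pose

    shift = optical_flow_shift(prev_frame, frame)
    if shift is not None and state["last_pose"] is not None:
        # Blur or defocus defeated the transform: dead-reckon from the last known
        # absolute pose using the estimated inter-frame shift.
        dx, dy = shift
        px, py = state["last_pose"]
        state["last_pose"] = (px + dx, py + dy)
        return state["last_pose"]

    # Both methods failed: revert to page recognition for a new reference image.
    state["reference"] = recognise_page(frame)
    state["last_pose"] = None
    return None

# Toy usage with stand-in callables.
state = {"reference": "page-12", "last_pose": (0.0, 0.0)}
print(track("frame2", "frame1", state,
            estimate_pose=lambda f, r: None,
            optical_flow_shift=lambda a, b: (1.5, -0.5),
            recognise_page=lambda f: "page-13"))
```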
In one embodiment, the presumed reference image is stored on the viewing device, and the projection transform calculation and optical flow method are performed on the viewing device.
In another embodiment, the viewing device has a stored cache of potential reference images, and wherein the page recognition process initially searches the stored cache for an image match. In another embodiment, the stored cache comprises reference images for one or more of: pages from a publication manually selected by the user; recently viewed pages; pages surrounding a current page (e.g. adjacent page(s) in a publication); an opposite page in a spread if one page from the spread is currently viewed; pages from an article if the article being viewed is contained in a plurality of pages; pages based on a user history; pages based on user demographic information; and pages based on viewing behaviors of other users in relation to the viewed page.
In another embodiment, the page recognition method is performed in a page server, which is remote from the viewing device.
In another embodiment, the page recognition process is employed only if a camera image of sufficient quality can be identified. In another embodiment, the page recognition method employs contextual information and/or other non-image information to facilitate the recognition process (e.g. searching reference images from the most recently viewed publication first).
In a twentieth aspect, there is provided a method of determining a pose of a viewing device relative to a page and rendering digital content in perspective. The method comprises the steps of: imaging the page using a camera of the viewing device; identifying a reference image corresponding to said page using image-matching techniques; retrieving digital content corresponding to the reference image; calculating a projection transform by comparing features in a camera image of the page with corresponding features in the reference image of the page; using the projection transform to determine a pose of the viewing device relative to the page; and rendering at least some of the retrieved digital content to a display screen of the viewing device, wherein the rendered digital content is displayed in perspective in accordance with the determined pose of the viewing device relative to the page, and wherein the rendered digital content is displayed as a virtual reality and/or augmented reality display.
In one embodiment, the viewing device has a stored cache of potential reference images, and wherein the page recognition process initially searches the stored cache for an image match.
In a twenty-first aspect, there is provided a method of performing page recognition via text recognition. The method comprises the steps of: imaging a portion of text disposed on a page, said portion containing a plurality of lines of text; extracting an off-axis text signature from the portion of text, the off-axis text signature comprising a string of text read in a line which is not parallel to the reading direction of the lines of text; and identifying the page by looking up the text signature in an inverted index of text signatures. In one embodiment, the text signature is sent to a remote database comprising the inverted index.
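A simplified sketch of the off-axis signature is given below: the character found at a fixed horizontal offset on each successive OCR'd line is concatenated into an orthogonal signature string, which is then looked up in an inverted index mapping signatures to page identifiers. A real implementation would work from character bounding boxes rather than string indices; the column offset and index contents are illustrative.

```python
# A hedged sketch of off-axis (orthogonal) text-signature extraction and lookup.
def off_axis_signature(lines, column=10):
    chars = []
    for line in lines:
        # Read the character at a fixed offset on each line, skipping whitespace.
        if len(line) > column and not line[column].isspace():
            chars.append(line[column])
    return "".join(chars)

def identify_page(lines, inverted_index, column=10):
    return inverted_index.get(off_axis_signature(lines, column))

lines = ["The quick brown fox jumps over",
         "a lazy dog while the reader is",
         "holding the viewing device up."]
index = {off_axis_signature(lines): "magazine-42/page-17"}
print(identify_page(lines, index))   # magazine-42/page-17
```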
In another embodiment, the text signature uniquely identifies the page from a plurality of other pages.
In another embodiment, the method additionally employs contextual information and/or other non-image information to disambiguate pages containing the same or similar text signatures.
In another embodiment, the text signature is read in a line which is orthogonal to a reading direction of the lines of text on the page. In a twenty-second aspect, there is provided a method of optimizing the quality of camera images used for image-matching. The method comprises: capturing images of a page via a camera of a viewing device; and displaying live camera images in the viewing device at a zoom level which encourages the user to hold the viewing device further away from the page. In one embodiment, the method comprises the step of switching from the live camera images to a virtual reality display of a digital twin, wherein the digital twin is displayed at the same zoom level as the live camera images.
In another embodiment, the live camera images are displayed at a greater zoom level (i.e. more magnified) than a typical camera preview image used for photography. In a twenty-third aspect, there is provided a method of providing a virtual reality experience to a user via a viewing device. The method comprises the steps of: capturing images of a page via a camera of the viewing device; retrieving display data using the captured camera images; and displaying rendered digital content to a user in real-time, such that the user experiences the page as a virtual reality experience, wherein, if the captured camera images are of insufficient quality to determine appropriate digital content for virtual reality, then a live camera image is displayed to the user until such time that camera images of sufficient quality are captured.
In one embodiment, camera images of insufficient quality are caused by rapid movement of the viewing device relative to the page so that the user is displayed the live camera images when the viewing device is moving rapidly.
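One possible quality gate, sketched below, uses the variance of the Laplacian as a focus/blur measure: a frame scoring below a threshold (for example while the device is moving rapidly) is shown as the live camera image, and the rendered digital twin is shown otherwise. The threshold and the choice of this particular blur measure are assumptions for the example.

```python
# A hedged sketch of a blur-based quality gate for live-versus-rendered display.
import cv2
import numpy as np

def choose_display(frame_bgr, rendered_twin, blur_threshold=100.0):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    # Too blurred to match reliably: keep showing the live camera image.
    return frame_bgr if sharpness < blur_threshold else rendered_twin

frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)   # noisy => "sharp"
twin = np.zeros_like(frame)
out = choose_display(frame, twin)
print("showing live frame" if out is frame else "showing digital twin")
```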
In another embodiment, the display seamlessly switches between the digital content and the live camera image such that the user maintains an apparent virtual reality experience, even when the viewing device is being moved rapidly. In a twenty-fourth aspect, there is provided a method of switching between display modes in a viewing device configured for providing a virtual reality display. The method comprises the steps of: capturing images of a page via a camera of the viewing device; sampling camera images for image-matching and retrieving display data using the captured camera images; displaying rendered digital content to a user in real-time as a dynamic display, such that the user experiences the page as a virtual reality experience; detecting if the viewing device is lying flat against the page using the captured images; and displaying the rendered digital content to the user as a static display if it is detected that the viewing device is lying flat against the page, wherein the static display enables the user to navigate the rendered digital content using on-screen gestures, such as panning and zooming/unzooming gestures.
In one embodiment, interactive functions associated with the reference digital content are the same in the dynamic display and the static display.
In another embodiment, the detection of the viewing device lying flat against the page is via a blackness detector, which detects black camera images.
In another embodiment, the display reverts to the dynamic display (i.e. virtual reality display), when the camera captures recognizable page images.
In another embodiment, a processor in the viewing device samples camera images for potential image-matches at a relatively lower rate when the blackness detector indicates that the viewing device is lying flat against the page.
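A minimal sketch of the blackness detector and the associated sampling-rate reduction is given below; the luminance threshold and sampling intervals are illustrative assumptions.

```python
# A minimal sketch of a blackness detector and flat-on-page sampling policy.
import numpy as np

def is_lying_flat(gray_frame: np.ndarray, black_threshold: float = 10.0) -> bool:
    # Frames captured while the device is pressed flat against the page are
    # essentially black, so a low mean luminance indicates the flat condition.
    return float(gray_frame.mean()) < black_threshold

def sampling_interval_ms(gray_frame: np.ndarray) -> int:
    # Sample camera frames for image-matching less often while flat on the page.
    return 1000 if is_lying_flat(gray_frame) else 100

dark = np.full((480, 640), 3, dtype=np.uint8)
print(is_lying_flat(dark), sampling_interval_ms(dark))   # True 1000
```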
In a twenty-fifth aspect, there is provided a method of displaying a user interface of a viewing device in an appropriate orientation (i.e. portrait or landscape) without relying on internal accelerometers. The method comprises the steps of: imaging a page using a camera of the viewing device; identifying a reference image corresponding to the page using image-matching techniques; determining an orientation of the viewing device relative to the page by comparing features in the camera image with features in the reference image; rendering digital content to a display screen of the viewing device as a virtual reality display in real-time irrespective of the orientation of the viewing device; and arranging the user interface (including buttons, header bar etc) in accordance with the determined orientation relative to the page. In one embodiment, the user interface is displayed in the appropriate orientation, even when the plane of the viewing device is substantially perpendicular to a gravitational force.
In another embodiment, the user interface comprises a header bar which appears at the top of the user interface in both portrait and landscape orientations.
In a twenty-sixth aspect, there is provided a method of changing an augmented reality display depending on a zoom level. The method comprises the steps of: imaging a page using a camera of a viewing device; and displaying rendered digital content to a user in real-time as a dynamic display, such that the user experiences the page as a virtual reality experience, wherein the rendered digital content contains augmented reality interactive features (such as buttons); determining a zoom level of the display; and changing the displayed augmented reality interactive features depending on the determined zoom level.
In one embodiment, a first interactive button at a first zoom level is changed to a plurality of second interactive buttons at a second zoom level, wherein the second zoom level is greater than the first zoom level.
In another embodiment, a total number of interactive features displayed to the user does not exceed a predetermined number irrespective of the zoom level.
In a twenty-seventh aspect, there is provided a method of scheduling download of data to a viewing device for a virtual reality display of a page. The method comprises the steps of: downloading display image data (e.g. pdf image data) for the page; downloading page tracking data enabling the viewing device to track its position relative to the page; downloading definitions for interactive features appearing in the virtual reality display; and downloading a word index for the page, the word index enabling text extraction from the displayed page, wherein a degree of interactivity with the displayed page increases as the download schedule progresses.
In one embodiment, the download schedule is in the order defined above.
In another embodiment, a thumbnail image is downloaded prior to the display image data.
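The sketch below illustrates the download schedule as an ordered list of stages, each of which unlocks a further level of interactivity as it completes; the stage names and the fetch/enable callables are placeholders for the example.

```python
# A hedged sketch of staged page download with progressively increasing interactivity.
DOWNLOAD_SCHEDULE = [
    ("thumbnail", "show low-resolution preview"),
    ("display_image_data", "show full-resolution digital twin"),
    ("page_tracking_data", "track device position relative to the page"),
    ("interactive_feature_definitions", "enable buttons, hyperlinks, media"),
    ("word_index", "enable text selection and search"),
]

def download_page(page_id, fetch, enable):
    """fetch(page_id, stage) retrieves one stage; enable(capability) turns it on."""
    for stage, capability in DOWNLOAD_SCHEDULE:
        fetch(page_id, stage)
        enable(capability)   # interactivity increases as the schedule progresses

download_page("page-7",
              fetch=lambda pid, stage: print(f"fetched {stage} for {pid}"),
              enable=lambda cap: print(f"  -> {cap}"))
```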
In a twenty-eighth aspect, there is provided a method of scheduling download of data to a viewing device for a virtual reality display of a page spread having a primary page and a secondary page. The method comprises the steps of: downloading all requisite data in respect of the primary page; downloading page tracking data for the secondary page; and downloading display image data for the secondary page, wherein the page tracking data for the secondary page is downloaded prior to the display image data for the secondary page so as to enable tracking between the primary and secondary pages before the display image data for the secondary page has downloaded. In one embodiment, the secondary page is represented by a placeholder before the image data for the secondary page has downloaded. In one embodiment, image data for the placeholder is downloaded with the tracking data.
In a twenty-ninth aspect, there is provided a method of allocating processor resources in a viewing device providing a virtual reality display to a user. The method comprises the steps of: imaging a page using a camera of the viewing device; displaying rendered digital content to a user in real-time as a dynamic display, such that the user experiences the page as a virtual reality experience; determining processor resources required for display of the digital content; and sampling camera images so as to track movement of the viewing device relative to the page and maintain the virtual reality experience via the displayed digital content, wherein a rate of sampling is dependent on the processor resources required for display of the digital content.
In one embodiment, a relatively lower sampling rate is used for digital content having a relatively high demand on processor resources (e.g. animations, games etc), and a higher sampling rate is used for digital content having a relatively low demand on processor resources (e.g. plain text, monochrome graphics etc).
In another embodiment, display data retrieved by the viewing device includes an indication of processor resources required for display of graphical content.
In a thirtieth aspect, there is provided a viewing system comprising a viewing device and/or a computer system which is configured for performing any of the methods described above. The viewing device is typically equipped with a display screen, a view-facing camera, a memory, a processing system and a communications system, such as a digital transceiver. The viewing device may further comprise a user-facing camera. Typically, the viewing device is configured by means of a suitable app which runs in the viewing device. The computer system may be located remotely from the viewing device (e.g. a remote server, a local device etc) or it may be integrated with the viewing device.
In a thirty-first aspect, there is provided a viewing device which is configured for performing, at least partially, any of the methods described above.
In a thirty-second aspect, there is provided a computer system which is configured for performing, at least partially, any of the methods described above.
In a thirty-third aspect, there is provided a substrate for viewing by a viewing device, wherein the substrate comprises: content for viewing by the viewing device; and a readable storage device containing a digital description of the content and any interactivity associated with the content. In one embodiment, the storage device may be selected from electronic devices, such as an RFID tag, an NFC tag, a miniature computer system etc.
In another embodiment, the substrate is selected from: magazines, journals, newspapers, books, catalogues, brochures, flyers, restaurant menus, point-of-sale material, posters, billboards, product labels, product packaging, album artwork (e.g. CD or DVD artwork), desktop printed documents, printed webpages, printed e-mails, business cards, tickets, clothing, stickers, electronic billboards, monitors, e-readers, TVs, computers, phones, projector screens, buildings, vehicles, product items etc.
In a thirty-fourth aspect, there is provided a method of calculating an image signature for a query image to be subject to an image-matching technique, the method comprising the steps of: using the query image as a base image to generate a series of scale images that are successively blurred versions of the base image; using the series of scale images to produce a set of gradient images consisting of gradient vectors at each pixel location in the set of scale images; producing a set of squared, normalized, gradient difference images from the set of gradient images by arranging the set of gradient images into adjacent pairs of gradient images and subtracting the gradient vectors in one gradient image of the pair from the gradient vectors at corresponding pixel locations in the other gradient image of the pair, calculating a squared magnitude of the gradient vector difference at each pixel location, and normalizing the squared magnitude of the gradient difference at each pixel location to generate the set of squared, normalized, gradient difference images; comparing each pixel in the set of squared, normalized, gradient difference images to pixels surrounding said pixel to identify local maxima; using the local maxima to provide a set of feature points that is characteristic of the base image; deriving an image descriptor for each feature point in the set of feature points; and, using the image descriptors to form an image signature characterizing the query image.
Optionally, the series of scale images is a Gaussian scale space in which the scale images are blurred by a Gaussian filter to produce increasingly blurred versions of the base image.
Optionally, the series of scale images is subject to a Scharr filter to produce the set of gradient images. Optionally, the squared gradient magnitudes at each pixel location of the set of gradient images are normalized by multiplying by the square of the sigma of the Gaussian blur at that level within the set of scale images.
Optionally, the local maxima are identified by comparing each pixel in the set of squared, normalized, gradient difference images to pixels immediately surrounding said pixel, such that said pixel is compared to the eight adjacent pixels in the squared, normalized, gradient difference image in which said pixel is positioned, as well as the nine pixels at the pixel locations corresponding to said pixel and its eight adjacent pixels in the squared, normalized, gradient difference images on either side of the squared, normalized, gradient difference image in which said pixel is positioned.
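The feature-point stage of this image-signature calculation might be sketched as follows. This is an illustrative approximation only: it assumes OpenCV and SciPy are available, uses illustrative blur sigmas and a simple choice of normalization sigma for each difference level, and omits the descriptor derivation and signature-formation steps.

```python
import cv2
import numpy as np
from scipy.ndimage import maximum_filter


def feature_points(base: np.ndarray, sigmas=(1.6, 2.3, 3.2, 4.5, 6.4)):
    """Sketch of the feature-point stage of the image signature.

    base   : greyscale query image as float32
    sigmas : blur levels of the Gaussian scale space (illustrative values)
    """
    # 1. Scale space: successively blurred versions of the base image.
    scales = [cv2.GaussianBlur(base, (0, 0), s) for s in sigmas]

    # 2. Gradient images: a gradient vector (gx, gy) at every pixel,
    #    here produced with a Scharr filter.
    grads = [np.dstack((cv2.Scharr(s, cv2.CV_32F, 1, 0),
                        cv2.Scharr(s, cv2.CV_32F, 0, 1))) for s in scales]

    # 3. Squared, normalized gradient difference for each adjacent pair,
    #    normalized by the square of a blur sigma for that level.
    diffs = []
    for i in range(len(grads) - 1):
        d = grads[i + 1] - grads[i]                 # gradient vector difference
        mag2 = d[..., 0] ** 2 + d[..., 1] ** 2      # squared magnitude
        diffs.append(mag2 * sigmas[i] ** 2)         # normalization (assumed)
    stack = np.stack(diffs)                         # shape: (levels, H, W)

    # 4. Local maxima over the 3x3x3 neighbourhood (eight neighbours in the
    #    same level plus nine in each adjacent level) become feature points.
    is_max = (stack == maximum_filter(stack, size=(3, 3, 3))) & (stack > 0)
    level, y, x = np.nonzero(is_max)
    return list(zip(level, y, x))
```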
In a thirty-fifth aspect, there is provided a method of determining a pose of a viewing device relative to a page and rendering digital content in an altered perspective. The method comprises the steps of: imaging the page using a camera of the viewing device; identifying a reference image corresponding to said page using image-matching techniques; retrieving digital content corresponding to the reference image; calculating a projection transform by comparing features in a camera image of the page with corresponding features in the reference image of the page; using the projection transform to determine a pose of the viewing device relative to the page; and rendering at least some of the retrieved digital content to a display screen of the viewing device, wherein the rendered digital content is displayed in a perspective that differs from a perspective that corresponds with the determined pose of the viewing device relative to the page.
In one embodiment, the rendered digital content is displayed as if the page is parallel to the display screen of the viewing device. In one embodiment, the rendered digital content is displayed as a virtual reality and/or augmented reality display.
In another embodiment, the viewing device has a stored cache of potential reference images, and wherein the page recognition process initially searches the stored cache for an image match. In a thirty-sixth aspect, there is provided a method of interacting with content disposed on a page. The method comprises the steps of: capturing, using a view-facing camera of a viewing device, an image of printed content disposed on a page; identifying the printed page using the captured image; retrieving digital content corresponding to the identified page; and displaying the digital content as a virtual and/or augmented reality display on a display screen of a viewing device, wherein a user interface element is shown on the display screen between the capturing of the image and displaying the digital content.
In one embodiment, the user interface element is configured to encourage the user to hold the viewing device steady with the camera capturing the image. In one embodiment, the user interface element is the image captured by the camera displayed on the display screen, overlaid with a representation of a static reticule that is visually similar to a camera reticule.
In another embodiment, the user interface element includes an animation on the display screen showing a sliding bar reciprocating across the static reticule to give an impression that the captured image is being scanned. In another embodiment, the viewing device sends the captured image to a remote page recognition server which identifies the page, and transmits the page identity to the viewing device.
In another embodiment, in response to receiving the page identity, the viewing device requests resources from the server for the user interaction with the content. In another embodiment, the resources include data required to track across the page and content augmentation specifications.
In a thirty-seventh aspect, there is provided a method of interacting with content disposed on a page. The method comprises the steps of: capturing an image of the content using a viewing device that has a view facing camera and a display screen; identifying a reference image corresponding to the page using image- matching techniques; retrieving digital content corresponding to the reference image; displaying the digital content as an augmented reality display on the display screen of the viewing device, the augmented reality display including an overlay augmentation, wherein the overlay augmentation has a fixed position on the display screen regardless of movement of the viewing device relative to the page.
In one embodiment, the page has a plurality of regions, one or more of the regions being associated with the overlay augmentation. In one embodiment, the overlay augmentation appears on the display screen when the view facing camera lingers on said one or more regions. In another embodiment, the overlay augmentation includes at least one of: a static element; an animated element; a video element; and an interactive element.
In another embodiment, the viewing device characterizes lingering when: the augmented reality display center remains within the region for more than a predetermined threshold time; and the rate of movement of the viewing device relative to the page is low.
In another embodiment, the overlay augmentation appears on the display screen when the viewing device, after first lingering on said one or more regions, is rapidly removed from the page such that the page is no longer viewed by the view facing camera. In another embodiment, the overlay augmentation is preceded by an animation augmentation. In one form, the animation augmentation is a fade in of the overlay augmentation.
In another embodiment, the augmented reality display has a video element which transitions into the overlay augmentation in response to the display screen lingering on the one or more regions.
In another embodiment, the rate of movement of the viewing device relative to the page is gauged by a rate of view change, the view change being determined using a difference between a prior image of the content captured by the view facing camera and a later image of the content captured by the view facing camera, the prior image and the later image being timestamped to provide the rate of view change.
In another embodiment, the later view is timestamped between 0.05 seconds and 1.0 seconds after the prior view. Typically, the later view is timestamped 0.1 seconds after the prior view.
In another embodiment, the display screen is rectangular and the low rate of movement of the viewing device relative to the page is characterized as a change of view between the prior image and the later image of less than half the length of the long side of the display screen.
In another embodiment, a high rate of movement is characterized as a change of view between the prior image and the later image of more than twice the length of the long side of the display screen.
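A minimal sketch of this low/high rate-of-movement test follows. The Frame record, and the convention of measuring the change of view as the displacement of the mapped display centre in display-pixel units, are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    timestamp: float             # capture time in seconds
    view_centre: tuple           # where the display centre maps onto the page,
                                 # in display-pixel units (assumed convention)


def classify_motion(prior: Frame, later: Frame, long_side_px: int) -> str:
    """Classify the rate of movement between two timestamped frames.

    The thresholds follow the description above: below half the long side of
    the display is 'low' (a candidate for lingering), above twice the long
    side is 'high' (e.g. the page rapidly removed from view).
    """
    dx = later.view_centre[0] - prior.view_centre[0]
    dy = later.view_centre[1] - prior.view_centre[1]
    change = (dx * dx + dy * dy) ** 0.5
    dt = later.timestamp - prior.timestamp      # typically around 0.1 s
    if dt <= 0:
        raise ValueError("later frame must be timestamped after prior frame")
    if change < 0.5 * long_side_px:
        return "low"
    if change > 2.0 * long_side_px:
        return "high"
    return "intermediate"
```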
According to a thirty-eighth aspect, the present invention provides a system for user interaction with printed content on a substrate, the system comprising: a sensing device for capturing an image of the printed content disposed on the substrate; a server with a database of reference images; a viewing device with a display screen, the viewing device configured to transmit a match request to the server, the match request including the image captured of the printed content; wherein, the server is configured to use an image-matching technique to match the image to a reference image corresponding to the substrate in response to the match request, and transmit a match response to the viewing device, the match response including digital content corresponding to the substrate identified by the image-matching technique and, the viewing device is configured to display the digital content on the screen, such that the screen displays a digital twin of the image captured, the digital twin having at least one interactive element for user interaction.
Optionally, the sensing device is a camera incorporated into the viewing device and the screen is touch sensitive to enable user interaction with the at least one interactive element.
Optionally, the viewing device is a head-mounted display (HMD) worn on the user's head, the sensing device is a digital camera positioned to capture digital video of the user's field of view, and the screen for displaying the digital content is positioned in at least part of the user's field of view.
Optionally, the at least one interactive element is one or more of: a hyperlink; a button to initiate an action; and, video and/or audio playback options. Optionally, the server is configured to calculate an image signature that characterizes the image captured of the substrate, such that the image-matching technique compares the image signature to similarly calculated image signatures respectively characterizing the database of reference images.
Optionally, the server is configured to calculate the image signature of the image by: using the image as a base image to generate a series of scale images that are successively blurred versions of the base image; using the series of scale images to produce a set of gradient images consisting of gradient vectors at each pixel location in the set of scale images; producing a set of squared, normalized, gradient difference images from the set of gradient images by arranging the set of gradient images into adjacent pairs of gradient images and subtracting the gradient vectors in one gradient image of the pair from the gradient vectors at corresponding pixel locations in the other gradient image of the pair, calculating a squared magnitude of the gradient vector difference at each pixel location, and normalizing the squared magnitude of the gradient difference at each pixel location to generate the set of squared, normalized, gradient difference images; comparing each pixel in the set of squared, normalized, gradient difference images to pixels surrounding said pixel to identify local maxima; using the local maxima to provide a set of feature points that is characteristic of the base image; deriving an image descriptor for each feature point in the set of feature points; and, using the image descriptors to form an image signature characterizing the image.
A reliable system enabling user interaction with a variety of printed (or displayed) content provides the springboard for widespread adoption of smartphone apps linking users to the printed world. This is clearly attractive from both users' and publishers' perspectives, as well as advertisers who wish to maximize the value of printed advertisements by maximizing the degree of engagement with those advertisements by users. Further advantages of the interactive viewing system will be readily apparent from the following description.
BRIEF DESCRIPTION OF DRAWINGS
Specific embodiments of the viewing system will now be described, by way of non- limiting example only, with reference to the accompanying drawings, in which:
Figure 1 is a front perspective view of a Netpage viewer device described in US 6,788,293;
Figure 2 is a rear perspective view of the Netpage viewer device;
Figure 3 shows the Netpage viewer in contact with a Netpage;
Figure 4 is a magnified view of a surface having printed text and the Netpage coding pattern;
Figure 5 is a perspective view of a microscope sleeve attachment for a smartphone;
Figure 6 shows a substrate suitable for viewing by a viewing device according to the invention;
Figure 7 shows a smartphone viewing the substrate shown in Figure 6;
Figure 8 shows communication between a smartphone and a page server;
Figure 9 shows a virtual reality display of a substrate on a smartphone display screen;
Figure 10 is a diagrammatic representation of the interactive viewing system and a viewed substrate;
Figure 11 is a flowchart illustrating the page recognition system;
Figure 12 is a flowchart illustrating the operation of the conductor module;
Figure 13 is a schematic diagram of a view finder bundle;
Figure 14 is a flowchart of the creation of a view finder bundle by the view finder analysis module;
Figure 15 diagrammatically illustrates the generation of a view finder bundle;
Figure 16 schematically illustrates a fixed radius patch about a corner feature for calculating a local image feature in the scale pyramid level in which the corner feature is detected;
Figure 17 schematically shows how the local image patch is processed to generate an image feature descriptor;
Figure 18 shows the histogram providing the image feature descriptor generated from the local image patch;
Figure 19 diagrammatically shows the generation of a descriptor-to-feature index and a position-to-feature index;
Figure 20 schematically illustrates the repeated down-sampling of an input camera image to form the scale pyramid;
Figure 21 is a flowchart of the processing steps followed by the page recognition and projection determination system in the view finder module;
Figure 22 is a flowchart of matching each local descriptor to a reference feature descriptor derived from a local feature in a reference page;
Figure 23 is a flowchart for the process of checking for a consistent projection solution;
Figure 24 diagrammatically illustrates the optical system of the camera;
Figure 25 diagrammatically illustrates a projection transform mapping a partial view of a substrate to a projected partial view of a corresponding reference image;
Figure 26 is a flowchart of the projection refinement process;
Figure 27 is a flowchart showing the search for an improved projection transform from a list of candidate image feature correspondences;
Figure 28 is a flowchart of the least squares fit process applied to the filtered candidate correspondence list;
Figure 29 is a flowchart of producing a composite page recognition pack from a set of view finder bundles;
Figure 30 is a flowchart of searching for a local projection transform from a page recognition index;
Figure 31 diagrammatically shows the process of setting up or updating an initial page server database;
Figure 32 is a diagram of the duplicate page image detection method;
Figure 33A diagrammatically illustrates the system used to identify image features in a series of scale images;
Figure 33B diagrammatically illustrates a gradient vector at a pixel location in a gradient image;
Figure 34 is a flowchart for generating an n-dimensional vector for each of the image features;
Figure 35 is a flowchart of the image match process within the page server;
Figure 36 shows a text signature generation technique to improve page recognition;
Figures 37A and 37B show a viewing device displaying a video associated with an interactive element of a page;
Figure 38 shows a viewing device displaying an image gallery interaction associated with an interactive element of a page;
Figure 39 shows a tablet computer capturing an image of a user's face via a user-facing camera;
Figure 40 shows a sensing device incorporated into a wireless mouse connected to a laptop;
Figure 41 shows a sensing device incorporated into a wireless mouse connected to a desktop computer;
Figure 42 shows a sensing device incorporated into a wireless mouse connected to a TV display;
Figure 43A is a perspective view showing a user viewing a page through a head-mounted display (HMD);
Figure 43B is a schematic plan view of the user observing the field of view through the HMD;
Figure 43C is the user's field of view including the display screen;
Figure 44 shows a dedicated document viewer;
Figure 45 shows a handheld games console for use as a viewing device;
Figure 46 shows a media player for use as a viewing device;
Figure 47 shows a projector for viewing the substrate shown in Figure 6 via a wirelessly connected sensing device and notebook computer;
Figure 48 shows a viewing device displaying rendered digital content to the user as an augmented reality display of a live video image;
Figure 49 shows a viewing device viewing a substrate in perspective;
Figures 50A and 50B show a viewing device displaying an overlay augmentation during interaction with a substrate;
Figure 51 illustrates a method of measuring changes in the rate of change in camera view point;
Figure 52A shows an optically imaging pen performing page recognition of a substrate;
Figure 52B shows the optically imaging pen of Figure 52A generating digital ink during handwritten input on the substrate;
Figure 53 shows a viewing device equipped with an electronic tag reader and a camera interacting with a substrate;
Figure 54 shows a user's finger interacting with a zone of a substrate within a view-facing camera's field of view;
Figure 55 illustrates the flow of information through the interactive viewing system during clipping operations;
Figure 56 shows a viewing device with a typical UI for viewing a digital twin having a "Clip & Share" button and a header bar;
Figure 57 shows the typical clipping disposition options presented to the user as onscreen touch buttons;
Figure 58 shows an alternative scheme for clipping regions of a page defined in the digital twin;
Figure 59 shows a viewing device displaying a "smart clipping";
Figure 60 shows a viewing device presenting the user with an options menu;
Figure 61 shows a viewing device displaying the user's saved clippings as large thumbnail images;
Figure 62 shows a viewing device displaying the user's saved clippings as a list containing a small thumbnail image together with clipping information;
Figure 63 shows a viewing device displaying the user's saved clippings as a list organized in accordance with magazine title;
Figure 64 shows a viewing device displaying a clipping with ragged edges simulating a torn page; and,
Figure 65 shows a viewing device displaying a clipping with straight, clean edges.
DETAILED DESCRIPTION
Interactive viewing system
Figure 8 is a basic sketch of an interactive viewing system 2 according to the invention that provides user interactivity with a printed substrate 10 via a viewing device 100 and a page server 20. A sensing device 808 senses data from the substrate 10 which is used to generate interaction data 101 transmitted to the page server 20. The sensing device 808 may be integrated with the rest of the viewing device 100 or may be physically separate.
From the interaction data 101, the system 2 recognizes the substrate 10, usually a printed page. The server 20 normally attends to page recognition, but recognition can be performed by both the server and the viewing device, and in some cases by the viewing device 100 alone. Upon recognizing the page 10, display data 103 is returned to the viewing device 100 which includes digital content defining a "digital twin" of the page 10 (or at least part thereof) and interactive content. The digital twin is displayed on the screen 105 to enable use of the interactive content. The digital twin 107 (see Figure 9) is described in much greater detail below, but is essentially a virtual or augmented reality view of the substrate which incorporates at least one interactive element for user interaction. It is broadly understood that a virtual reality view aims to exactly mirror an actual view, whereas an augmented reality view substantially corresponds to the actual view but introduces changes, or augmentations.
An input device 814 allows user interactions with the digital twin described in greater detail below. Like the sensing device, the input device 814 may be physically separate from the rest of the viewing device 100 or an integrally formed component. When the viewing device 100 is a smartphone, using the touch sensitive screen 105 as the input device 814 is most convenient.
Figures 6 and 7 show a viewing device 100 with an integrated sensing device 808 interacting with a substrate 10. The substrate 10 is, for example, a page of a magazine containing printed content 13 such as text 11 and other graphics 12. In order to interact with the substrate 10, a user holds the viewing device 100 above the substrate 10 with a viewing application running on the viewing device 100. The sensing device 808 in the form of the viewing device's camera 102 images an area of the substrate within its field of view 14 containing part of the printed content 13. The captured image is used by the viewing device 100, and/or a remote server 20, to determine the identity of the substrate 10, such as the page of the magazine. The captured image may also be used to determine the location of the viewing device 100 relative to the substrate 10.
The page server 20 may in fact be several servers, which act together to perform page recognition processing. In the interests of clarity and brevity, many of the embodiments described herein will simply refer to the server 20 in the singular. However, the skilled worker will appreciate that references to the "server 20" or "page server 20" will also encompass a server system of multiple interconnected servers and/or databases.
Figure 10 schematically illustrates the interaction between the substrate 10, the viewing device 100 and the page server 20 in greater detail. The viewing device 100, controlled by the viewer application 190, sends interaction data 101 (see Fig. 8) to the page server 20 via the Internet using a viewer network interface 120. The interaction data 101 may be raw image data captured by the sensing device 808 (in this case the inbuilt camera 102) and may include an image match request 260 and a content request 290. Alternatively, the interaction data 101 has been processed by the viewer processor 106 from the raw image data. For example, the interaction data 101 may be raw image data that has been compressed by the viewer processor 106 into a format suitable for sending over the Internet. Alternatively, some feature extraction from the raw image data may be performed on the viewing device 100, in the page server 20, or shared between the two.
The page server 20 interprets the interaction data 101 and returns display data 103 to the viewing device 100 via the server network interface 121. In particular, the page server 20 uses the interaction data 101 to determine the identity of the substrate 10 and a projection transform mapping the camera view to a reference image 210 of the substrate 10. The display data 103 returned to the viewing device 100 has a match response 280 and a content response 300 which provide the identity and projection respectively. In one mode of operation the viewing device 100 transmits a content request 290 for corresponding content response 300 to the page server 20. The page server 20 generates or retrieves the display data 103 and transmits it to the viewing device 100. In another mode of operation, the page server 20 transmits the display data 103 directly after identifying the substrate 10 without requiring a request from the viewing device 100. The display data 103 sent to the viewing device 100 includes a 'digital twin' 107 (see Fig. 9) of the substrate 10. The digital twin 107 is a digital description of the printed content, together with 'augmentation data' 220 and additional page recognition information (page recognition bundle 230 and view finder bundle 240 described below). Augmentation data 220 is a description of any interactivity associated with the digital twin 107 and pre-calculated image feature information 240 used in local (on-device) image matching.
The page server 20 (and/or viewing device 100) employs a variety of page recognition techniques, optionally in combination with one or more other parameters, to achieve high page recognition accuracy whilst minimizing processing time on the page server 20. The user interface of the viewer application 190 running on the viewing device 100 in one embodiment provides options or instructions to the user to assist in page recognition and maximizing accuracy.
The display data 103 usually includes data corresponding to one or more pages. For example, if the viewed substrate 10 is a page of a magazine, the page server 20 may send display data 103 corresponding to the viewed page as well as adjacent page(s) in the magazine. If a magazine cover is recognized, then the page server 20 may send (or at least "expect" to send) any or all pages from that magazine. Various caching strategies may be employed to minimize processing times and provide a seamless transition from a camera preview of a page to a digital display of the page. The caching strategies may further be used to optimize communication between the page server 20 and the viewing device 100, thereby improving the user's overall experience.
The display data 103 received by the viewing device 100 is rendered in real-time to display digital content on the touchscreen 105 corresponding to the printed content 13 on the substrate 10 viewed by the camera 102. The opaque screen 105 of the viewing device 100 in one embodiment has real-time virtual transparency with respect to the substrate 10. That is, the displayed digital content matches and aligns with the camera's preview image of the printed content 13.
Of course, the screen 105, like the sensing device 808, need not be integrated with the remainder of the viewing device 100. In some forms, the screen 105 viewed by the user is physically separate. In order to provide real-time virtual transparency, the location and orientation of the viewing device 100 with respect to the printed content 13 must be determined. The location and orientation is initially determined from feature matching and/or keystoning of matched features. When the viewing device 100 is moved relative to the printed content 13, the relative location and/or orientation of the viewing device 100 must be updated in order to maintain the effect of virtual transparency. The relative location of the viewing device 100 may be determined by, for example, comparing features of the printed content 13 in the camera's field of view 14 (see Fig. 7) with features in the digital twin 107 cached in the viewing device's memory 131 (see Fig. 10). Likewise, the orientation of the viewing device 100 relative to the substrate 10 may be determined, for example, by comparing features of the printed content 13 in the camera's field of view 14 with features in the digital twin 107 cached in the memory 131. The location and/or orientation of the viewing device 100 may also be updated relative to a previously determined location and/or orientation by comparing a plurality of frames of imaged content in the camera's field of view 14.
If the viewing device 100 is tilted so that the optical axis of the camera 102 is not perpendicular to the plane of the substrate 10, then keystoning of features in the printed content 13 may be used to determine, via a projection transform, a 3D orientation of the viewing device 100 relative to the substrate 10. A position of the user's eyes Pe (see Fig. 49) relative to the viewing device 100 may either be estimated or determined via a user-facing camera 108 (see Fig. 39) of the viewing device 100. Techniques for determining an orientation of a viewing device 100 relative to a substrate 10 and relative to a user are described in further detail in US Publication No. 2011/0292198.
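For illustration only, a pose estimate of this kind can be sketched from matched feature points using standard computer-vision routines. The use of OpenCV's findHomography and decomposeHomographyMat below is an assumption rather than a statement of the disclosed implementation, and the camera intrinsic matrix K is assumed to be known from calibration.

```python
import cv2
import numpy as np


def estimate_pose(camera_pts: np.ndarray, twin_pts: np.ndarray,
                  K: np.ndarray):
    """Sketch of estimating the viewing device's pose from matched features.

    camera_pts : Nx2 float32 feature locations in the live camera frame
    twin_pts   : Nx2 float32 corresponding locations in the cached digital twin
    K          : 3x3 camera intrinsic matrix (assumed known)
    """
    # Projection (homography) transform mapping the camera view onto the
    # reference page; RANSAC discards spurious correspondences.
    H, inliers = cv2.findHomography(camera_pts, twin_pts, cv2.RANSAC, 5.0)

    # The keystoning encoded in H can be decomposed into candidate 3D
    # rotations and translations of the camera relative to the page plane.
    _, rotations, translations, normals = cv2.decomposeHomographyMat(H, K)
    return H, rotations, translations
```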
The digital content 116 (see Fig. 9) rendered on the display screen 105 using the received display data 103 may include embedded interactive elements (e.g. hyperlink 117 and video play button 104) to provide the user with a richer experience of the printed substrate 10. This embedded interactivity is authored into the digital content 116 by, for example, a magazine or newspaper publisher, an advertiser and so on. Not only does this interactivity add value to, for example, printed advertisements, it also provides advertisers with valuable information on the effectiveness of advertising campaigns. For example, the number of user interactions with a particular printed (or displayed) advertisement can be monitored and provided to the advertiser. Hitherto, such information was unavailable to advertisers who traditionally relied on relatively inaccurate market research to assess the effectiveness of printed advertising campaigns.
Figure 9 illustrates the printed content 13 on the viewed substrate 10 of Fig. 6 rendered on the viewing device 100 as digital content 116 comprising an interactive element in the form of a hyperlink 117 corresponding to "200+ Race Cars". The on-screen hyperlink 117 corresponding to "200+ Race Cars" is shown in a different colour and/or underlined in the rendered digital content 116 to indicate to the user that this is an interactive icon. The user is able to link to a webpage corresponding to "200+ Race Cars" by touching that region of the touchscreen 105.
In another example of an interactive element, the viewing device 100 provides the user with the option of playing a video corresponding to the graphic 12 on the printed substrate 10. Playback of the video may be initiated automatically by hovering or dwelling over a particular region of the substrate. Alternatively, and as shown in Figure 9, the user may be prompted with an on-screen playback control button 104 which augments the displayed graphic 12. It will be appreciated that a plethora of multimedia and interactive options are available via the rendered digital content 116. Moreover, the user interface of the viewer application 190 (see Fig. 10) may be configured to optimize the user's overall experience of the viewing system 2. From a partial view 15 (see Fig. 7) of the substrate 10 as imaged by the viewing device 100, a corresponding reference page 210 stored in a reference database 250 is identified. A projection transform is also determined, which maps the partial view 15 to the corresponding reference page 210.
The reference page 210 comprises a digital description of the printed content 13, including any associated interactivity. In addition, it is advantageous for the reference page 210 to be accompanied by information about a publication, such as the publication's title, bind edition, issue date, region of circulation etc. Additionally, the page number (index) of each reference page 210 within the associated publication may advantageously be included as well. Workers in the print media industry will understand that a "bind edition" is a particular issue of a magazine that may have different print configurations with varying advertisements, editorial content, and inserts. A bind edition is a specific complete print configuration of a given issue of a magazine. For example, the variations may cater to specific needs based on geographic region, such as advertisements containing addresses of local offices. There might be a "Sydney Newsstand" bind edition whose cover contains a barcode suitable for point of sale scanning, and in which the editor's contact details are a Sydney local phone number, whilst an "International Subscriber" bind edition may have no barcode on the cover, and may have the editor's contact details printed as an international phone number with country code.
Given the identified reference page 210 and projection transform determined by the viewer application 190, the location, scale and orientation of the partial view 15 (and hence the viewing device 100) may be used to map the digital twin onto the viewing device screen 105. Figure 25 diagrammatically illustrates a projection transform 18 mapping a partial view 15 of a substrate 10 to a projected partial view 21 of a corresponding reference image 210. Where the reference image 210 has already been identified, only the projection transform 18 is determined, typically to establish the exact location of the partial view 15 and the pose of the viewing device 100 with regard to the printed substrate 10. For example, if the printed substrate 10 is a board game map, then the viewing device 100 will only interact with a reference image database 250 (see Fig. 10) having a single reference image 210. The location of the partial view 15 on the game board substrate 10, and the orientation of the viewing device 100, determines the relevant display data 103 shown on the touchscreen 105 (see Fig. 8). Where the projection transform 18 is known or irrelevant, only the identity of the reference image 210 needs to be determined. This is the case, for example, if the printed substrate 10 is a magazine page that is shown as a whole at a fixed pose on the touchscreen 105 of a viewing device 100, irrespective of the actual pose of the viewing device 100 with respect to the magazine page. In applications of the system 2 to viewed substrates 10 other than pages from a magazine, the system recognizes pages from other types of printed publications, or any substrate, by their appearance (image). The system 2 is configured to recognize commonly used page spreads such as typical two-page spreads, multiple page gatefolds or other connected page combinations.
One type of interactivity the system 2 provides is digitally "clipping" an item of interest from an actual magazine or other publication. The clipping interaction will now be described with reference to an embodiment of the system 2 in which all the viewed substrates 10 are pages of publications. Accordingly, the reference image database 250 is a reference page database 250 and the reference images 210 are reference pages 210. Referring again to Figure 10, the creation of a digital clipping is initiated by a user interaction while a digital twin 107 (or part thereof) is being shown on the touchscreen 105. The user interaction causes a clip request 310 to be transmitted to the server 20. The clip request 310 contains details of the issue, page, and sub-region being clipped. The clip request 310 adds a record to the shared clippings database 330. Shared clippings may be accessed by other devices 340. Other types of interactivity require digital media such as video to be supplied to the viewing device 100. Such digital media is referenced by page augmentations 220 and stored in a media database 350. When a user interaction with a digital twin initiates a request for a digital media asset, the viewing device 100 retrieves the media via a content request 290 and a content response 300. The page reference database 250 is a database of all known documents accessible via a network or other wired or wireless communications link. Where practical, the database 250, or a part thereof, is mirrored on the viewing device 100 to minimize access time. Recently used reference pages are also preferably cached on the viewing device 100.
The page server reference database 250 has digital images of reference pages 210, publication details (including bind edition information) and metadata such as image signatures, lookup indexes, page recognition data 230, and page tracking data 240.
The page recognition data 230 is collectively referred to as a page recognition bundle (PRB). There is a PRB for each recognizable page.
The page tracking data 240 is packaged in data structures known as view finder bundles (VFB, and also known as a view finder pack).
The digital images of the pages, publication details, metadata, lookup indexes, page recognition bundles, view finder bundles, and the like stored in the page reference database 250, are referred to as "reference" data entities to distinguish from the data generated by the viewing device 100. Therefore, a camera image of a printed page captured by the camera 102 of the viewing device 100 is referred to as a match request 260. The reference image or reference page 210 corresponding to the printed page 10 is stored in the page reference database 250 as a PDF file together with associated metadata. Also associated with each of the reference pages 210 are the augmentations 220 that are incorporated into the reference page 210 to provide the digital twin that is (at least partially) displayed on the touchscreen 105.
The server 20 has a page recognition module 110 that attempts to recognize a page from a match request 260 using reference pages 210, page recognition bundles 230 and a page recognition feature index 200. The reference data accessed by the page recognition module 110 may be a full reference repository (the universal set of reference pages 210 and associated metadata) or a selected subset of the full reference repository, such as all cover pages and all pages of the last viewed magazine.
A cover page recognition feature index 270 is an over-the-network page recognition system which attempts to recognize a match request 260 using only reference data for cover pages of publications known to the system 2. If successful, matching against a subset of the full reference repository is more time efficient.
The view finder module 130 is a program used by the viewing device 100 to recognize a camera image using reference document data 131 available to the processor of the viewing device 100. Reference document data 131 is data associated with previous usage, such as view finder bundles 240 of reference pages 210 already viewed by the viewer system 2.
A blackness detector module 140 is provided to determine whether a camera image contains non-document content, such as when the camera 102 of the viewing device 100 is pointing to the sky or lying face-down on a flat surface. An optical flow tracker module 150 is provided, using an optical flow technique to determine whether the camera image is a displaced version of a previously captured camera image.
The conductor module 160 is a decision making component which combines input from various recognition and tracking modules to determine the identity of a page 10 being viewed by the camera 102 and the pose of the viewing device 100 relative to the viewed page 10.
The page recognition analysis module (PR Analysis) 170 analyses a reference page 210 rendered from a page PDF and produces a page recognition bundle (PRB) 230 for the reference page 210 which is stored in the page server reference database 250.
The view finder analysis module (VF Analysis) 180 analyses a reference page 210 rendered from a page PDF and produces a view finder bundle (VFB) 240 for the reference page 210 which is stored in the page server reference database 250.
Figure 11 is a flowchart 365 showing an overview of the operation of the page recognition system 2 of Figure 10. While the flowchart 365 shows the interaction of the main components, a more detailed description of other operations and components is given in later sections.
The viewer application 190 (see Figure 10) operates the viewing device 100 in two main processing flows; one commencing at step 271 when a display refresh is required and the other at step 275 when there is a change to the state of frame information.
The flow commencing at step 271 is executed frequently, usually at the display refresh rate, which is typically 60Hz. In step 272, the conductor module 160 determines if the page 10 viewed by the camera 102 is being successfully tracked with respect to a reference page 210. If so, the partial view 15 of the reference page 210 (see Figure 25) and projection 18 onto the digital twin 107 are determined by the conductor module 160, in step 273. In step 274, the various overlapping components of the digital content 116 (see Figure 9) forming the view presented to the user are rendered together. The video view directly from the camera 102 is the most-obscured (lowest priority in terms of 2D drawing order) and the remaining items are placed at progressively higher priority (that is, overlaid on the video view in the digital content 116 shown on the touchscreen 105). In typical usage, if a projected partial view 21 (see Figure 25 again) has been determined, the digital twin 107 may completely or partially obscure the video view.
At step 275, the frame state from the camera 102 has updated to a new frame state. New frames arrive from the camera 102 at a frequency of 15Hz. However, the frame state may also change when a match response 280 (see Figure 10) is received from the server 20, when the view finder module 130 (the on-device page recognition facility) identifies a reference page 210, or when the optical flow module 150 determines that the camera image is a displaced version of a previously captured image. The operation of the server 20, the camera 102, the view finder module 130 and the optical flow module 150 is asynchronous. Analysis by these asynchronous modules is controlled by the conductor module 160 and shown as steps 275, 276, 277, 278, 279, 281 and 282 of the flowchart 365. This forms the second main data processing flow.
These processing flows attempt to maintain up-to-date recognition and tracking of the page 10 viewed by the camera 102 with respect to reference pages 210 stored on the page server 20, and downloaded to the viewer application 190 as necessary.
As discussed above, the conductor module 160 governs the second processing flow through steps 275 to 282. Figure 12 is a flowchart 366 illustrating the operation of the conductor module 160. The conductor module 160 retains information that has been gathered and/or computed about recent camera frames and related environmental inputs. The viewer application 190 uses the conductor module 160 to make two types of determination based on this retained information. The two determinations are:
• In step 284, what page is being viewed by the user through the camera 102, including what projection transform 18 (see Fig. 25) should be used to represent the user's view.
• In step 287, what tracking related operation should happen next while balancing the need to answer requests for the current view, with the availability and cost of input sampling, network, and computation resources.
The conductor module 160 receives notifications 296 of information and the results of requested computations asynchronously from the viewer application 190. Such notifications include:
• New frame notification 288 that the camera 102 has captured a new frame, including a unique identifier for the frame and a time stamp of when the capture occurred.
• Page recognition request transmission notification 289 that a page recognition request 260 has been transmitted from the viewer device 100.
• Match response receipt notification 291 that a match response 280 has been received, including the frame identifier, the success or failure of the recognition, the reference page 210 that was recognized, and the projected partial view 21 of the reference page 210 that was determined.
• View finder recognition notification 292 that the view finder module 130 recognition analysis request has completed, including the frame identifier, the success or failure of the recognition, the reference page 210 that was recognized, and the partial projected view 21 that was determined.
• Optical flow notification 293 that a frame displacement analysis from the optical flow module 150 has completed, including the frame identifiers, the success or failure status and the projection transform 18 change from one frame to the other.
• Black check notification 294 from the black check module 140 that the analysis for a frame being blank, or black, has completed, including the frame identifier, and the analysis result.
• Accelerometer notification 295 that an accelerometer reading request has completed, including a three axis accelerometer measurement.
The conductor module 160 retains a list of recently received frames 286, with a record associated with each. Each record can potentially include the following information, although actual computation of the fields will depend upon circumstances.
• A unique frame identifier.
• A timestamp when the frame was sampled from the camera 102.
• The match response 280 from the server 20, including the success or failure of the recognition, the reference page 210, and the projection transform 18 of the recognition.
• The results of the viewer device 100 recognition analysis of the viewed frame by the view finder module 130, including the success or failure of the recognition, the reference page 210 recognized, and the projection transform 18 of the recognition.
• The result of optical flow module 150 determination between the present frame and a prior frame, including the success or failure of a frame displacement determination, the identifier of the prior frame, and the apparent frame-to-frame projection transform 18 between the prior frame and this frame.
• The results of the black check module 140 for the frame being "blank" or "black".
• A flag indicating if pixel data of the frame from the camera 102 is still available.
In addition to the above per-frame state, some overall state is retained. This includes:
• When commands 277, 279 and 282 (see Fig. 11) to start asynchronous tasks were started.
• The reference page 210 and projection transform 18 last reported, and when it was reported.
As discussed above, the conductor module 160 is called upon to identify the page 10 being viewed by the user through the camera 102 and what projection transform 18 should be used to represent the user's view. To make this determination, the conductor module 160 must resolve absolute recognition results (from the server 20 and the view finder module 130) and relative movement results (from optical flow module 150) that have arrived after different delays for previous frames. It is also advantageous to eliminate spurious false positive recognitions.
To achieve this, in the step of updating the frame data 275 (see Figs 11 and 12), the conductor module 160 scans frames from the oldest that are currently being held to the most recent, propagating forward absolute results by using relative increments where they are available. In doing this scan, certain absolute recognition results are required to have multiple confirmations before they are regarded as being legitimate. In particular, a match response 280 indicating that a reference page 210 from a different publication is being viewed must be confirmed by a second successful match of the different publication, and a local page recognition result that indicates a different page is being viewed must be confirmed by a second successful match of the different page.
In step 284 (see Figure 12), after performing the scan, if a recent frame is determined to have a known reference page 210 and projected view 21 onto the page, and no more recent frames have a failed optical flow determination, that frame, and the timestamp of when the frame was captured by the camera 102, is taken as the "raw location".
At step 285, the raw location is input to a filtering process that uses retained state (that is, the pre-existing state of the filter that is progressively updated when each new raw location is processed) to determine the view that should be used to represent the user's view at the current time. In one embodiment a simple element-wise Infinite Impulse Response Filter of the projection transform 18 is used to smooth the view. Optionally, a Kalman filter is used to forward estimate what projection transform 18 should be used at the current time.
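An element-wise IIR smoothing of the 3x3 projection transform could be sketched as below. The smoothing factor and the normalization of the homography's overall scale are illustrative choices, not values taken from the disclosure.

```python
import numpy as np


class ProjectionSmoother:
    """Element-wise first-order IIR filter over the 3x3 projection transform.

    A minimal sketch: each new raw projection (from a 'raw location') is
    blended with the retained filter state. The factor alpha is illustrative.
    """

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha
        self.state = None            # last smoothed 3x3 matrix

    def update(self, raw_projection: np.ndarray) -> np.ndarray:
        # Homographies are defined up to scale, so fix the scale before blending.
        raw = raw_projection / raw_projection[2, 2]
        if self.state is None:
            self.state = raw
        else:
            self.state = self.alpha * raw + (1.0 - self.alpha) * self.state
        return self.state
```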
In some operating modes, the projection transform 18 determined at step 285 is returned from the conductor module 160 and used to determine the projected partial view 21 shown on the touchscreen 105. However, in some operating modes the user is presented with an on-screen view that is not the projected partial view 21 of the digital twin 107 corresponding to the camera's 102 view of the page 10. Instead, a flat, or non-perspective, view (or partial view) of the digital twin 107 is shown on the screen 105. The 'flattened' onscreen view is a rotated, scaled and translated version of the digital twin 107 that approximates the projection transform 18. The conductor module 160 determines this Rotation Scale Translation (RST) transform from the projection transform 18 that was output from the filtering process in step 285. There are a variety of methods of "downgrading" a projection transform 18 to an RST transform. A preferred embodiment considers the projection of two short vectors (one or less screen pixels in length) orthogonal to each other at the centre 17 of the camera's partial view 15 of the page 10 (see Figure 25). These vectors are transformed through the projection transform 18 to find where they map to in the projected partial view 21. The average of the lengths of these projected vectors divided by their unprojected lengths is the scale. The average of the rotation from the original to the projected vectors is the rotation, and the shift from the coordinates of the centre 17 of the camera's partial view 15 to the coordinate of the centre 19 of the projected view 21 is the translation. This RST transform is returned to the system 2 as the result of the determination at step 285. This RST view gives the user a more natural view of the printed content 13 of the page 10 (see Figure 25), while still allowing them to track over the printed page 10 to find augmented features.
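The downgrade of a projection transform to an RST transform described above might be sketched as follows; the one-pixel vector length and the NumPy-based implementation are illustrative.

```python
import numpy as np


def project(H: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Apply a 3x3 projection transform to a 2D point."""
    v = H @ np.array([p[0], p[1], 1.0])
    return v[:2] / v[2]


def downgrade_to_rst(H: np.ndarray, centre: np.ndarray):
    """Reduce a projection transform to rotation, scale and translation.

    Two short orthogonal vectors at the centre of the camera's partial view
    are pushed through the transform; scale is the average ratio of projected
    to unprojected lengths, rotation is the average rotation of the two
    vectors, and translation is the shift of the projected centre.
    """
    c_proj = project(H, centre)
    rotations, scales = [], []
    for v in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):   # orthogonal unit vectors
        v_proj = project(H, centre + v) - c_proj
        scales.append(np.linalg.norm(v_proj) / np.linalg.norm(v))
        rotations.append(np.arctan2(v_proj[1], v_proj[0]) -
                         np.arctan2(v[1], v[0]))
    scale = float(np.mean(scales))
    rotation = float(np.mean(np.unwrap(rotations)))   # guard against wrap-around
    translation = c_proj - centre
    return rotation, scale, translation
```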
In some operating modes, the system 2 does not seek to analyze the most recent view 14, but what the user was recently hovering over (that is, what was held in the camera's field of view 14) in a stable manner. This operating mode is used to overcome unintended movement of the viewing device 100 while the user touches an interface element, or if the user suddenly moves the viewing device 100 away from the page 10 and the interface wishes to keep a stable view and enter "static mode" with this view as described below. To determine this stable view, the conductor module 160 determines a measure of the displacement of successively captured frames, the percentage of each frame that projects off the viewed page 10, and how old the captured frame is. The conductor module 160 then uses a linear combination of these values to score each frame and selects the frame with the minimum value as the stable view to use.
The skilled worker in this field will understand that the "frame" is a rectangular array of pixels within the camera's field of view 14 (which is not typically rectangular). The pixel values in the array define a static image captured by the camera. Frame-to-frame motion is measured by determining where the corners of one frame would project to in the space of a second frame, and then taking the Euclidean distance between the 8 coordinates as a whole. The long edge of a frame is normalized to 100 units for this exercise. The linear combination the conductor module 160 uses to score each frame (see above) is the frame-to-frame motion value, plus 20 times the age in seconds of the frame, plus 200 times the fraction of the frame that projects off the page 10. The frame with the minimum score becomes the stable view.
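The stable-view scoring just described (frame-to-frame motion, plus 20 times the age in seconds, plus 200 times the off-page fraction, with the minimum score winning) might be sketched as below; the frame record fields are assumed names used only for illustration.

```python
import numpy as np


def frame_motion(prev_corners: np.ndarray, curr_corners: np.ndarray,
                 long_edge: float) -> float:
    """Frame-to-frame motion: Euclidean distance between the 8 coordinates of
    the projected frame corners taken as a whole, with the long edge of the
    frame normalized to 100 units. Corner arrays are 4x2 in a common space."""
    scale = 100.0 / long_edge
    return float(np.linalg.norm((curr_corners - prev_corners).ravel() * scale))


def select_stable_view(frames, now: float):
    """Pick the stable view as the frame with the minimum linear-combination
    score. 'frames' is assumed to hold records with attributes: motion,
    timestamp (capture time in seconds) and off_page_fraction."""
    def score(f):
        age = now - f.timestamp
        return f.motion + 20.0 * age + 200.0 * f.off_page_fraction
    return min(frames, key=score)
```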
Referring back to Figure 12, in step 287, the conductor module 160 is called upon to determine what action 308 the viewer application 190 (see Figure 10) should take next, balancing the need to answer requests for the current view with the availability and cost of input sampling, network, and computation resources. When called upon to make this determination, the conductor module 160 considers a number of possible actions 308 in the priority order described below. In considering each possible action 308, system states that preclude the action being taken are always considered and will cause further consideration of that action to be skipped. Such system states include needing a single-threaded module (i.e. a module without the ability to parallel process) that is currently in use, the pixels of a required frame no longer being available to the viewer application 190, the viewer application 190 specifying that it does not wish to consider that action now, and the result of an analysis operation already being known. Other conditions relevant to each case also apply.
When the viewing device 100 captures a continuous image frame sequence, some of the captured frames may have appearances which indicate that recognition on such frames will most likely fail. To avoid initiating a relatively expensive (in processing terms) page server match request 260, recently captured frames are scanned to provide a "last seen stable camera image" in case the most current frame is unsuitable. This can be used to select the best frame, in terms of its image quality, to bundle in a page server match request 260 as described below.
The possible actions 308, and their trigger conditions, are as follows (a minimal sketch of this priority-ordered selection is given after the list):
• Parked mode 297 is appropriate when the viewing device 100 is resting face down with the camera 102 obscured. Parked mode 297 triggers if recent frames have all been blank or black.
• Exit parked mode 298 triggers if the viewing device 100 is in parked mode 297 and the recent frames are not blank or black.
• Check for black or blankness 299 triggers on a recent frame if earlier recent frames have been deemed black or blank by the blackness or blankness detector 140.
• Send a server match request 301 if a recent frame has low relative motion, and it has been more than 0.5 seconds since the last match request 280 to the server 20, and recent frames have successful optical flow, and there is no traceable view from a successful recognition result (i.e. no recognition result can be propagated forward using optical flow results to get a recent position), and either the camera image is not completely stationary or it has been over 5 seconds since the last match request 280.
• Send a server match request 301 to identify a frame if the most recent recognition result is more than 5 seconds old, yet optical flow is still working.
• Send a server match request 301 to identify a frame if the most recent view finder match request 302 is a failure, and optical flow indicates the frame has low motion, and it has been over 5 seconds since the last server match request 301.
• An optical flow analysis 303 between the second most recent frame and the most recent frame will initiate when the viewer application 190 seeks to economize on processing power, and the second most recent frame has a successful recognition result from the view finder module 130.
• Initiate a view finder match request 302 for local analysis of the most recent frame if basic conditions are met and either the viewer application 190 is not seeking to economize on processing power, or there is more than one recent frame with an unknown recognition result.
• Initiate a request for optical flow 303 between a pair of frames if the most recent view finder match request failed, and there is a prior frame whose position can be estimated by forward propagating optical flows from a recognition result, and an immediately later frame with an unknown optical flow result.
• Initiate a bar code check 304 of a frame for the presence of a barcode (such as a QR code) if recent frames have no traceable view and it has been more than 3 seconds since the last barcode check 304.
• Initiate non-tracking mode 305 if recent view finder recognition requests 302 have failed and there is a recent stable view and either the view has tracked off the viewed page 10, or recent accelerometer readings 295 indicate the viewing device 100 has been subject to high acceleration, or optical flow is failing, and recent accelerometer readings indicate the viewing device is not stationary.
• Resume tracking mode 306 (i.e. exit non-tracking mode 305) if there is a recent successful recognition result and that result projects onto at least 60% of a viewed page 10.
• Transmit a no action 307 indication to the viewer application 190 if none of actions 297 to 306 are taken. In response, the viewer application does not query the conductor module for new actions until at least one notification 296 has been received. If one of the above actions is indicated to the viewer application, the viewer application initiates it and immediately queries the conductor module 160 for further actions.
The Optical Flow Module 150
The optical flow module 150 (see Figure 10) is used to determine the approximate relative motion that has occurred between two frames captured by the camera 102. The conductor module 160 typically uses this to estimate what portion of a substrate, such as a printed page 10, is being viewed when image recognition fails on a frame, but a recent previous frame had known tracking position. However, it is also applied between pairs of recent camera frames where neither has a known position, as an absolute position determination process for the earlier frame may be executing asynchronously and will return a result in the future. Because the relative shift between the pair of frames has been calculated in advance, it will be possible to immediately forward estimate a more recent position once the position of an earlier frame becomes known. The optical flow module 150 determines an approximate image shift between pairs of frames. This is performed by first down-sampling a central box (that is, a central rectangular array of pixels) within each frame to a low resolution, such as 64 by 64, and then applying phase-correlation to discover a shift amount. This will be an approximate shift, as the real change in view, assuming a planar target is being viewed, will be a projection transform. The phase correlation may result in a too-weak or ill-conditioned peak, in which case the optical flow module 150 indicates failure for this pair of frames.
Additionally, the optical flow module 150 can be used to provide an estimate of scale, as well as estimates of rotation and other transforms, by performing phase correlation on sub-sections (such as quadrants of the pixel array that makes up the frame) of the pair of camera frames.
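A minimal sketch of the basic translation-only estimate described above is given below. It is illustrative only: the 64 x 64 box size follows the text, but the crude striding down-sampler, the peak threshold and the function names are assumptions rather than the module's actual implementation.

```python
import numpy as np

def phase_correlation_shift(frame_a, frame_b, size=64, min_peak=0.05):
    """Estimate the approximate (dy, dx) shift between two frames by
    phase-correlating a down-sampled central box. Returns None when the
    correlation peak is too weak (assumed threshold)."""
    def central_box(img):
        h, w = img.shape
        s = min(h, w)
        box = img[(h - s) // 2:(h + s) // 2, (w - s) // 2:(w + s) // 2].astype(np.float32)
        step = max(1, s // size)
        # Crude down-sampling by striding; a real implementation would
        # low-pass filter before sub-sampling.
        return box[::step, ::step][:size, :size]

    a, b = central_box(frame_a), central_box(frame_b)
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = Fa * np.conj(Fb)
    cross /= np.abs(cross) + 1e-9                 # keep phase information only
    corr = np.real(np.fft.ifft2(cross))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    if corr[peak] < min_peak:                     # too weak or ill-conditioned
        return None
    dy, dx = int(peak[0]), int(peak[1])
    # Shifts larger than half the box wrap around to negative displacements.
    if dy > corr.shape[0] // 2:
        dy -= corr.shape[0]
    if dx > corr.shape[1] // 2:
        dx -= corr.shape[1]
    return dy, dx
```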
The Blackness or Blankness Check Module 140
One indication that recognition of one or more frames will likely fail is the blackness or blankness of the frames. Given an 8-bit greyscale image, the blackness or blankness check module 140 constructs a simple image histogram (x-axis for pixel intensity values 0 to 255, y-axis for pixel frequency or count) for each frame being analyzed. To determine if the image is "black", the 98% confidence limits of the image histogram are computed. If the pixel intensity value (x-axis value) corresponding to the maximum confidence limit, and the dynamic range (the difference between the pixel intensity values corresponding to the maximum and minimum confidence limits of the cumulative distribution), are both below estimated threshold values, then the image is considered to be "dark" or "black". For example, if the image is almost all black, then the maximum confidence limit will fall below a level (such as 20) on the scale of 0 to 255. An image with all pixel intensities equal to 0 would be considered totally black, while an image with all pixel intensities equal to 255 would be considered white. The first condition (on the maximum confidence limit) may be satisfied while the image still has a large number of brighter pixels, giving a "brighter" or non-dark image with a wider distribution of pixel intensities rather than just a narrow peak; hence the need to also check the dynamic range. An alternative implementation uses only the dynamic range check to decide whether the image is "featureless" or "blank", which could indicate that the camera 102 is facing the sky or the ceiling.
By detecting blackness and blankness, frames that will probably be unrecognizable are disregarded, and a processor-expensive server match request 260 is avoided.
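A sketch of such a check is shown below, assuming the 98% confidence limits are taken as the 1st and 99th percentiles of the pixel intensities; the dark_limit and min_dynamic_range thresholds are illustrative values, not figures from the specification.

```python
import numpy as np

def black_or_blank(gray, dark_limit=20, min_dynamic_range=30):
    """Decide whether an 8-bit greyscale frame is 'black' or 'blank'.
    gray: 2-D numpy array of pixel intensities in the range 0 to 255."""
    pixels = gray.ravel()
    lo, hi = np.percentile(pixels, [1.0, 99.0])   # 98% confidence limits
    dynamic_range = hi - lo
    # "Black": the upper confidence limit is dark and the dynamic range is narrow.
    is_black = (hi < dark_limit) and (dynamic_range < min_dynamic_range)
    # "Blank"/featureless: narrow dynamic range alone (e.g. sky or ceiling).
    is_blank = dynamic_range < min_dynamic_range
    return is_black, is_blank
```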
The View Finder Module 130 and View Finder Bundles 240
The system of on-device page recognition provided by the view finder module 130 and the view finder bundles (VFBs) 240 is fast, precise, and has a low memory footprint. It is particularly suitable for low-latency applications such as those running on the viewing device 100.
Certain aspects of the image matching process used by the view finder module 130 are similar to the SIFT image matching and affine transform estimation method. The SIFT (Scale-Invariant Feature Transform) image matching technique is described in detail in US 6,711,293 to Lowe (previously referred to in the Background section and incorporated herein in its entirety). However, the view finder module 130 uses an image matching technique that differs from SIFT in many respects, the main ones being: a) The view finder system uses a different method for selecting interest points. Unlike SIFT, which selects local extrema of corners detected over Difference-of-Gaussian images, the view finder system selects corners detected by the FAST9 corner detector (described in more detail below) over down-sampled images from the scale space pyramid using a custom weak corner suppression technique (also described in more detail below).
b) The view finder system computes a full projection transform from three image feature correspondences, unlike SIFT which computes an RST (Rotation Scale and Translation) transform using the same number of correspondences.
c) The view finder system is designed to be greedy in its search for a solution so that it completes with minimum computation in cases where the camera 102 is viewing a true reference page 210. This differs from SIFT, as only a subset of camera image corners is typically detected in the view finder system when the camera image is indeed a view of a reference page 210.
d) Unlike SIFT, the view finder system uses a one-pass approach when computing both the orientation and orientation-dependent descriptor from a local image patch, processing each pixel only once.
e) The view finder system computes a coarser pyramid of scales for the camera image than for the reference image 210. This asymmetry improves the efficiency of the system by minimizing the amount of processing done on the viewing device 100 with a marginal loss of scale recognition accuracy.
Differences between the view finder system and other image-based recognition methods are: a) The view finder system clusters image descriptors to achieve multiple image descriptor matches by performing only a single match calculation.
b) The view finder system combines various sources of non-image contextual information to achieve recognition and projection transform estimation, for example information from device hardware sensor(s), such as accelerometer(s) and/or gyroscope(s), or application-specific information.
Figure 10 shows the two main components of the view finder system. The view finder module 130 is the software in the viewer device processor 106, and the VF analysis module 180 is the software in the server 20. A page image based on a reference page 210 is input to the VF analysis module 180, which produces a corresponding view finder bundle (VFB) 240. This is stored on the page server reference database 250, and delivered to the viewing device 100 via a content response 300. A small set of view finder bundles 240 is used to configure the view finder module 130 so that it can perform recognition of camera images and estimate the camera's field of view 14 (see Figure 7) of the substrate 10. This sub-set of images 210 and corresponding VFBs 240 is stored on-device as viewer page information 131.
The view finder module 130 uses a projection determination system to determine which reference page 210 a free-pose camera image has captured, or partially captured, and what specific projection transform 18 (see Figure 25) maps the camera image onto the determined reference page 210. The reference pages 210 are typically rendered or captured images of pages of a publication. The camera 102 typically captures a partial view 15 of a printed page 10 corresponding to the reference page 210, which may be subject to distortions and degradations such as lighting variation, de-focus, motion blur, specular reflection, partial occlusion, physical page distortion (including bending), as well as the perspective distortion from the (unknown) camera pose. Part of the projection determination system typically operates in a continuous low-latency environment where determination results are presented visually to the user as the user moves the camera 102 over pages 10, typically in a mobile computing device with a screen 105 and camera 102. This part of the system requires high efficiency and low latency algorithms.
The projection determination system in the view finder module 130 further includes a reference image analysis module which produces a view finder bundle (a collection of data derived from the image and used as input to the recognition process) from a reference image 210. The reference image analysis module typically operates off-line, applied to all potentially recognizable page images, thus producing a universal set of view finder bundles.
The projection determination system also includes a recognition and projection determination module which has as input a selected sub-set of view finder bundles 240, and a camera image, and produces as output a reference page indicator, a projection transform, and a confidence measure.
View Finder Bundles
A view finder bundle 240 is derived data based on a reference page 210. Given a view finder bundle 240 and the reference page 210 of a viewed page 10, the viewing device 100 can recognize and track a viewed page 10 using the view finder module 130. The view finder bundle 240 is typically small in data size to facilitate quick communication to the viewing device 100 separately from the much larger pdf file of the reference page 210. This, of course, is not restrictive and in some cases the pdf of the reference page 210 may be included in the view finder bundle 240.
A schematic diagram of a view finder bundle 240 is shown in Figure 13, and comprises one or more of the following: a) A list of local image features 368 from a reference page 210. During system operation, camera image features are matched against these local image features to establish feature correspondences.
b) A thumbnail page image 370 to aid, for example, in precise alignment of coarse projection transform estimations. Additionally, the thumbnail page image(s) 370 may be used in the user interface, avoiding the need to resample the full page image on the viewing device 100 when a page thumbnail is needed. Examples of user interface elements that might use a page thumbnail are dialogs, display headers and footers, scroll down lists of pages or publications, other graphics incorporating a thumbnail of a page such as a cover page and so on.
c) Other data 372, such as references to the page and publication from which the view finder bundle 240 was computed.
The VF Analysis Module 180
Figure 14 is a flowchart 374 illustrating the creation of a view finder bundle 240 by the view finder analysis module 180 (see Figure 10) - one of the software modules in the server 20. For clarity, view finder bundle generation is also diagrammatically represented in Figure 15.
At 376, a reference page 210 is input to the view finder analysis module 180. Typically, the reference page 210 is a single-channel gray-scale image at a resolution of 96 pixels per inch. The view finder analysis module 180 then produces a series of images known as a scale pyramid 396 by successively down-sampling the reference page 210 by a constant scale factor. A typical scale factor is the inverse of the cube root of 2, thus giving three down-sampled images for each full factor-of-two reduction in the image size. A set of steps in the scale pyramid 396 that forms a full factor-of-2 reduction is often referred to as an octave of the scale pyramid 396.
The choice of the process for down-sampling the image to produce the scale pyramid 396 is not crucial for the recognition system. In the preferred implementation, the original image (the pdf of the reference page 210) is filtered by a Gaussian filter with a small sigma corresponding to the next scale in the pyramid and subsequently sampled using bilinear interpolation to produce the lower-resolution scale image 394. The produced lower-resolution image is then processed in the same manner (adjusting the sigma of the Gaussian filter appropriately) to produce an even lower resolution scale image, and so on.
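The construction can be sketched as follows, assuming SciPy's Gaussian filter and bilinear zoom. The scale step of 2^(-1/3) follows the text; the fixed sigma and the number of levels are illustrative assumptions (the preferred implementation adjusts sigma for each scale step).

```python
import numpy as np
from scipy import ndimage

def build_scale_pyramid(image, levels=9, scale_step=2.0 ** (-1.0 / 3.0), sigma=0.8):
    """Build a scale pyramid by repeated Gaussian filtering and bilinear
    resampling, giving three levels per octave for the default scale step."""
    pyramid = [image.astype(np.float32)]
    for _ in range(levels - 1):
        blurred = ndimage.gaussian_filter(pyramid[-1], sigma)
        # order=1 selects bilinear interpolation for the resampling.
        pyramid.append(ndimage.zoom(blurred, scale_step, order=1))
    return pyramid
```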
Each scale image 394 of the scale pyramid 396 is raster-scanned for corner features using a corner detector such as the FAST9 corner detector 380 as is known in the art. This detector registers a candidate pixel as the centre of a corner feature if it is surrounded by a specific configuration of dark and light grey values with reference to the central candidate pixel. The detector also determines a corner score measure for each candidate corner. Other corner detection methods may be employed as alternatives. All detected corners are considered in order of their score measure in step 382. The strongest corner is selected and moved to a selected corner list (noting its row and column position and the scale pyramid level) at step 386, and then all other corners in the candidate list within a specific radius (typically 9 pixels) are discarded by marking the area in an exclusion mask as shown in step 388. The operation continues (step 384) until there are no corners in the candidate list. Thus a dispersed collection of strong corner features is achieved in step 390.
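The weak corner suppression just described amounts to a greedy selection with an exclusion radius. The sketch below uses a direct distance test in place of the exclusion mask, treats each pyramid level independently, and assumes a simple tuple layout for the candidate corners; none of these details are taken from the specification.

```python
def select_dispersed_corners(corners, suppression_radius=9):
    """corners: list of (score, row, col, level) tuples (assumed layout).
    Corners are taken in descending score order; once a corner is selected,
    any weaker corner within the suppression radius on the same level is
    discarded, yielding a dispersed collection of strong corners."""
    selected = []
    r2 = suppression_radius ** 2
    for score, row, col, level in sorted(corners, reverse=True):
        if all(not (lvl == level and (row - r) ** 2 + (col - c) ** 2 < r2)
               for _, r, c, lvl in selected):
            selected.append((score, row, col, level))
    return selected
```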
Next, a local image feature is calculated for each selected corner using data from a fixed radius patch in the scale pyramid level it was detected in (step 390). An example of such a patch 398 is shown in Figure 16. The local image feature is the combination of:
• a level 394 in the scale pyramid 396;
• a location (image row and column of the corner 400 of the local image patch 398);
• a local orientation (an angle derived from the local image patch 398); and
• a local patch image descriptor (a vector of numeric values derived from the local image patch 398).
It should be noted that the local image patch 398 considered around a corner 400 detected at a high (more down-sampled) image 394 of the scale pyramid 396 will represent a larger area of the original reference image 210 than one detected at, say, the base resolution of the scale pyramid 396. The size of the local image patch 398 is typically a disc of radius 9 pixels. The local image patch 398 orientation is the peak image gradient orientation discovered by making a histogram of all gradient orientations from all pixels in the local image patch 398, weighted by the gradient magnitude at each pixel. As shown in Figures 17 and 18, the image descriptor part of a detected corner 400 is a value vector 432 built from the concatenation of six histograms 420 to 430, one calculated for each of a set of six zones 402 to 412 of the local patch 398. The six zones 402 to 412 are based on breaking the local image patch 398 into the intersections of rings (annuli) and pie-slices (segments) such that each has approximately equal pixel area. The embodiment shown in Figures 17 and 18 uses six zones, but other methods of zone delineation are possible. Each pixel in the local image patch 398 could be considered to fall exactly into one of the zones 402 to 412. However, soft-binning is used instead. Histogram information derived from each pixel 418 contributes to the four zones it is nearest to (zones 412, 406, 408 and 402 in the case of pixel 418), weighted bilinearly by the distance to the centre of each zone. Thus a pixel exactly in the centre of a zone (in terms of angular position and distance from the centre of the patch) will contribute to just the one zone. However, a pixel near the edge of a zone will contribute proportionally to the neighboring zones.
To accumulate the histogram 420 to 430 of each zone 402 to 412, two values are calculated for each pixel, as shown in Figure 17. The first is the gradient strength at the pixel 418. For the second, the gradient orientation is determined and rotated 90 degrees to form an orientation vector 416 for a local pixel intensity edge 414. The distance p of closest approach of this vector 416 to the centre (detected corner 400) of the local image patch 398 is determined. Distance p is signed, depending on which side of the vector the centre of the patch falls on. The p value of the pixel determines which bin of a zone's histogram is contributed to, and the gradient strength weights its contribution. Again, soft-binning is used to spread the contribution between two neighboring histogram entries.
Gradient strengths and orientations at each pixel are calculated with a Scharr operator, although other methods such as a Sobel operator could also be used.
The histograms 420 to 430 each have the property that strong straight edges in the image patch 398 will form peaks in the histogram independent of their orientation. In one embodiment the histograms each have six bins. As will be seen later in the matching process, it is important that image descriptors derived from differently oriented views of the same detected corner 400 produce similar descriptors. As a result, the angular alignment of the segment boundaries that contribute to the zone determination must be the same. To achieve this, the radial segment boundaries are determined relative to the overall orientation of the image patch 398.
To improve the computational efficiency of the local feature determination, a one-pass calculation method is used that visits each pixel in the local patch 398 just once. Local image feature extraction is also used in the computationally sensitive camera image processing described below. To overcome the dependency of the zone geometry on the global patch orientation calculation (which is only determined after all pixels have been considered), the zone radial segments are over-sampled by some factor. For example, four times as many radial segments as the final target number of radial segments might be used initially. A dense histogram of orientations, where each histogram bin corresponds to one of the narrow segments, is created during processing of each pixel. Once the single pass over the pixels is complete, the global patch orientation is determined and the zones are re-sampled into the smaller number of target zones oriented relative to the global patch orientation. In the implementation, only the histogram bins need to be rearranged given the new zone re-sampling. The construction of the image descriptor 432 from zone histograms 420 to 430 is shown in Figure 18. The final image descriptor 432 is the concatenation of each of the zone histograms 420 to 430 as a single vector of values, normalized to be of Euclidean length 256 (i.e. 8-bit, or 2^8), and then stored as a byte array. In Figures 17 and 18, three segments, two annuli and six histogram bins are used, giving a total descriptor 432 size of 36 bytes.
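The final assembly of the descriptor can be sketched as below; the function and argument names are assumptions, while the 36-value layout and the normalization to Euclidean length 256 follow the text.

```python
import numpy as np

def build_descriptor(zone_histograms, target_length=256.0):
    """Concatenate the zone histograms (e.g. 6 zones x 6 bins = 36 values),
    normalize the result to Euclidean length 256 (2^8), and store it as a
    byte array."""
    vec = np.concatenate([np.asarray(h, dtype=np.float32) for h in zone_histograms])
    norm = np.linalg.norm(vec)
    if norm > 0:
        vec = vec * (target_length / norm)
    return np.clip(np.round(vec), 0, 255).astype(np.uint8)
```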
An optional step in the building of the view finder bundle 240 is to scan the features for features with similar image descriptors 432 and discard all but one of such duplicates. This can aid in preventing useless clutter in patterned or highly repetitive image areas.
The View Finder Module 130
The view finder module 130 (see Figure 10) is configured with an on-device set 434 of view finder bundles 240. The set 434 will change from time to time in response to the recently recognized reference pages 210 the viewer application 190 selects as candidates for local recognition.
With reference to Figure 19, when the view finder module 130 is configured with a set 434 of view finder bundles 240, the features contained in all the VFBs 240 of the selected set 434 are tagged with their reference page 210 of origin and moved to a pool of all image descriptors 436. Two working indexes are then built:
1) A descriptor-to-feature index (or map) 438, implemented using a k-d tree, provides an efficient approximate nearest neighbor lookup. The resulting index can be used to look up a novel descriptor and obtain one or more features from the pooled reference image indexes that are approximate nearest neighbors of the novel descriptor in terms of the Euclidean distance between descriptor vectors.
2) A position-to-feature index (or map) 440 (in which the term "position" means a combination of page indicator, scale, orientation, row and column) implemented by a hash table using quantized values of the scale, orientation, row and column, and the exact value of the page indicator as a key. The quantization factors used are deliberately broad as will be described below.
Typically, many recognition and projection determinations are made with the same set 434 of view finder bundles. As the processing cost of building these indices (438, 440) is significant (particularly the k-d tree), the indexes are only rebuilt when the set 434 of view finder bundles changes; that is, only when the set of reference pages 210 that are candidates for recognition changes.
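The position-to-feature index can be sketched as a plain dictionary keyed on quantized values. The quantization steps shown (one octave in scale, 1/10 of a circle in orientation, 160 reference-image pixels in row and column) are those given later for the lookup; the key layout, the function name and the representation of orientation as a fraction of a full circle are assumptions for illustration.

```python
def position_key(page_id, scale, orientation, row, col,
                 scale_q=1.0, orient_q=0.1, coord_q=160.0):
    """Build a hash key for the position-to-feature index from a page
    indicator and quantized scale, orientation, row and column values."""
    return (page_id,
            int(round(scale / scale_q)),
            int(round(orientation / orient_q)),
            int(round(row / coord_q)),
            int(round(col / coord_q)))

# index: dict mapping key -> list of features, e.g.
# index.setdefault(position_key(pid, s, o, r, c), []).append(feature)
```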
Camera image processing typically operates in a continuous low-latency environment where determination results are presented visually to the user as the user moves the camera over pages. Thus methods and algorithms in this processing section of the system employ a number of features to improve the efficiency of the system.
Figure 20 shows the repeated down-sampling of an input camera image 442 to form the scale pyramid 396 (see Figure 15). The input camera image 442 is supplied to the page recognition and projection determination system in the view finder module 130. The camera image 442 is typically a gray-scale image of approximately 320 by 240 pixels. It may be a frame from a video sequence. However, the recognition and projection determination system makes very few assumptions about the view based on previously determined views.
The camera image 442 is repeatedly down-sampled by factors of two to form a scale pyramid 396. Note that this is typically not the same scale step as that used in the reference image analysis. By using whole octave steps in the scale pyramid 396, processing load is reduced.
Next, a process of corner discovery starts. Unlike the reference image analysis, corner discovery in the camera image 442 proceeds by testing tiles 444 of the scale image pyramid 396 in pseudo-random order. Typically 15 x 10 tiles 444 are used at each scale level 394. Each tile 444 is scanned for its strongest corner using a corner detection technique 380 such as the FAST9 method used in the reference image analysis (see Figures 15 and 16). The corner detection technique 380 becomes increasingly strict as all but the best corner discovered in the tile 444 is eliminated at step 446. If a corner is discovered in a tile 444, a local image feature descriptor 432 (see Figure 18) is extracted at step 448 in the same manner as described in the reference image analysis. In addition to being immediately applied to the following processing steps, the image feature descriptor 432 is added to a pool of image descriptors 450.
Figure 21 is a flowchart 452 of the processing steps followed by the page recognition and projection determination system in the view finder module 130 (see Figure 10). As described above, the process begins at step 454 by inputting a camera image 442 to the page recognition and projection determination system in the view finder module 130. The camera image 442 is down-sampled to generate a scale pyramid 396 (see Figure 20). The camera image 442 and the scaled images 394 in the pyramid 396 are divided into tiles 444 at step 456. An unprocessed tile 444 is processed for image features (such as FAST9 corner detection) at step 460. A feature descriptor 432 (see Figure 18) is generated for any features detected in the tile 444. If no image features are detected in the tile 444 at step 462, the next unprocessed tile 444 is selected at step 456 for processing at step 460. In the case where all tiles 444 are processed at step 460, and no features found at step 462, a match failure (step 458) is reported to the viewer application 190 (see Figure 10).
When a descriptor 432 is found in step 462, it is looked up in the descriptor-to-feature index (k-d tree) 438 at step 464 to discover the nearest neighbor reference image feature based on the descriptor value. The probability that an extracted camera feature corresponds to, and finds, a true corresponding feature in the pooled reference image features can, in practice, be fairly low. In situations with high image self-similarity, large feature pool size, poor focus, motion blur, and significant perspective distortion, the true correspondence rate at this stage of the processing can be as low as 1 in 50 (the k-d tree lookup is also approximate, so it may return the wrong match).
Steps 464 through 468 of flowchart 452 will now be described in more detail with reference to Figure 22. Each local descriptor 432 is matched to a reference feature descriptor 482 derived from a local feature in a reference page 210. This corresponds to steps 464 and 466 of Figure 21. A comparison of the locations, scale pyramid levels, and orientation values of the two matched features implies that a specific rotation, scale and translation transform (an RST transform) would be needed to transform this local camera image patch to the reference image patch if this were a true correspondence. Note that the reference feature 482 will have a reference page indicator associated with it. Also note that the general transform will be a projection transform. An RST transform is only a local approximation, which can have significantly different values over the surface of the camera image under extreme perspective distortion.
As candidate correspondences 486, or 'votes', between camera image feature descriptors 432 and reference feature descriptors 482 are determined, any clusters with similar RST and page values are mapped into a sparse array implemented as a hash table 484. This is achieved by soft-binning each candidate correspondence 486 into the hash table 484.
The term soft-binning is used in several parts of this description. This term refers to the creation of a histogram in which the sample values being accumulated in the histogram are real-valued. In a common form of creating a histogram, histogram bins have central values and samples accumulated into the histogram are rounded to the nearest bin's central value, then a unit weight is added to that bin. When the term soft-binning is used, instead a sample value's vote is split into two values (summing to one) proportionally to how close the real-valued sample is to the centers of the two closest bins. The partial values are then accumulated in each of the bins. For example, if bin centers are at 0, 1, and 2, then a value of 1.75 will cause an accumulation of the value 0.25 into the bin centered at 1 and an accumulation of the value 0.75 into the bin centered at 2. Soft-binning may be applied in multiple dimensions.
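A one-dimensional sketch of this accumulation is shown below (the function name and signature are assumptions); it reproduces the worked example above, adding 0.25 of the vote to the bin centered at 1 and 0.75 to the bin centered at 2 for a sample value of 1.75.

```python
import numpy as np

def soft_bin(value, weight, histogram, bin_width=1.0):
    """Accumulate a real-valued sample into a histogram by splitting its vote
    between the two nearest bins, proportionally to proximity. Bin centers are
    assumed to lie at 0, bin_width, 2*bin_width, ..."""
    pos = value / bin_width
    lower = int(np.floor(pos))
    frac = pos - lower
    if 0 <= lower < len(histogram):
        histogram[lower] += weight * (1.0 - frac)     # nearer bin share
    if 0 <= lower + 1 < len(histogram):
        histogram[lower + 1] += weight * frac         # neighboring bin share
    return histogram

# Example from the text: soft_bin(1.75, 1.0, [0.0, 0.0, 0.0]) -> [0.0, 0.25, 0.75]
```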
The hash table 484 is a 5-dimensional table indexed by range-reduced values of rotation (10 bins per full circle), scale (one bin per octave), x-translation (one bin per 125 reference image pixels), y-translation (one bin per 125 reference image pixels) and the page (used exactly). The soft-binning means that each candidate correspondence 486 contributes a weight to the 16 bins formed by the 2 nearest bins in each of the 4 real-valued dimensions of the RST. The reference page indicator is exact and forms the 5th dimension without soft-binning. The exact bin closest to each candidate correspondence 486 also holds a reference to that candidate correspondence 486.
Candidate correspondences 486 in the table 484 may also be weighted by a factor related to the likelihood of each reference page 210, and RST, being a current true match. An example of a property from which such a weighting can be derived is the similarity of the RST of a candidate correspondence to the RST of a previously fully matched frame. At step 488 of Figure 22, a local bin (shown as cell C) of the hash table 484 accumulates a weight that indicates a cluster of similar RST and reference page indicators has been discovered (say 5 correspondences). In response, a check for a consistent projection solution is performed in step 490.
Figure 23 shows flowchart 492 for the process of checking for a consistent projection solution. In step 494, the weighted centroid of the bins in the local area of the peak weight is determined. Then the range-reduction is reversed to derive an RST transform at step 496. At step 498, the reference features in the candidate correspondences 486 in the local patch of bins are back-projected through the RST transform to give a camera view location. These are then filtered in step 500 to eliminate any reference feature location that back-projects to a camera view location that is greater than 15% of the camera view size from the supposed corresponding point. In step 502, the page recognition and projection determination system checks that a sufficient number of candidate correspondences 486 remain after step 500. If there are insufficient candidate correspondences 486, the projection check registers a failure at step 506. The broad tolerance is necessary because the RST is only a local approximation to what is really a projection transform. If there are sufficient candidate correspondences 486 remaining after the elimination process of step 500, a more expensive projection transform search begins at step 504 to identify a consistent projection transform. To discover if there is a consistent projection transform within the remaining candidate correspondences 486, a number (for example 20) of pseudo-random sub-sets of the candidate correspondences 486 are selected (see step 472 of Figure 21). An exact projection transform is solved for each sub-set. The remaining candidate correspondences 486 in the list are then tested by back-projecting their reference image location into the camera view space via the projection transform and comparing it with the actual feature location in the camera image space (see step 474 of Figure 21). In this case, a tolerance of 1% of the camera view size is used. If sufficient candidate correspondences 486 (typically seven) are consistent with a given projection transform within the given tolerance, this projection transform will be accepted as a candidate solution. If all checks fail, the view finder module 130 will continue with corner detection and soft-binning until a higher weight in this local RST bin area is achieved, or some other peak weight appears.
A significant aspect of the viewer application 190 is the selection of the sub-set of candidate correspondences 486 that are used to solve for candidate projection transforms. In general, four candidate correspondences 486 are required to solve for a projection transform. Given that not all candidate correspondences 486 are true correspondences, this number has implications for the number of tests the view finder module 130 is expected to perform to find the true projection. By way of illustration, if the probability of a candidate correspondence 486 being a true correspondence at this stage of the process is one in four, then only about one in 256 sets of four selected at random will consist entirely of true correspondences and so solve to a true projection.
The geometry of a camera view 14 (see Figure 7) is restricted and the location of the optical axis of the camera view is known to be in the centre of the camera image. By using this restricted geometry, a method can be derived to determine the projection transform from a three point subset rather than the normal four point method. Continuing the illustration, the hit rate is improved to 1 in 64.
The method for determining a projection transform from three candidate correspondences 486 is now described in greater detail.
Figure 24 diagrammatically illustrates the optical system 508 of the camera 102 (see Figure 7). The focus of the optical system 508 is located at the point F. The focal length of the optical system is denoted f. There are two planes parallel to the image sensor in the camera and a distance f from the focus. One is the plane containing the image sensor (not shown) and the other is the "virtual sensor plane" 510. The line 512 leaving the focus F and perpendicular to the virtual sensor plane 510 passes through the virtual sensor plane 510 before intersecting the object plane 516 being photographed, and thus the object plane 516 and the focus F are on opposite sides of the virtual sensor plane 510.
The virtual sensor plane 510 is coordinatized by placing its origin 514 at the point of intersection of the line 512. The points Si, Sj, and Sk represent the three features of interest in the camera image. The angles α, β, and γ are the angles made at the focus F by the lines from Si to F and Sj to F, from Sj to F and Sk to F, and from Sk to F and Si to F, respectively.
The coordinates of each of Si = (xi, yi), Sj = (xj, yj) and Sk = (xk, yk) are known explicitly (that is, in physical units, for example millimetres) from the location of the features in the camera image and the size of the camera pixels. Hence the edge lengths of the tetrahedron with corners F, Si, Sj, and Sk can be computed explicitly, and so too the cosines of the angles at the focus F, namely α, β, and γ. The distances from F to Si, F to Sj, and F to Sk are indicated by di, dj and dk respectively. These are determined by the equations given in Equation 1.
di = √(xi² + yi² + f²)
dj = √(xj² + yj² + f²)
dk = √(xk² + yk² + f²)
(equation 1)
The distances from Si to Sj, Sj to Sk, and Sk to Si are indicated by the notation eij, ejk, and eki respectively. These are obtained from the equations given in Equation 2.
eij = √((xi − xj)² + (yi − yj)²)
ejk = √((xj − xk)² + (yj − yk)²)
eki = √((xk − xi)² + (yk − yi)²)
(equation 2)
Finally, the computed cosines are given by the equations in Equation 3.
cos α = (di² + dj² − eij²) / (2 di dj)
cos β = (dj² + dk² − ejk²) / (2 dj dk)
cos γ = (dk² + di² − eki²) / (2 dk di)
(equation 3)
Pi, Pj, and Pk are the actual locations of the features that correspond to Si, Sj and Sk respectively, in space on the object plane 516 being photographed. The object plane 516 is not typically parallel to the virtual sensor plane 510 (containing Si, Sj, and Sk). The transform between these two planes needs to be derived.
The coordinates of Pi = (si, ti), Pj = (sj, tj) and Pk = (sk, tk) are known in the coordinate system of the reference image 210 (from the page server reference database 250). This allows us to compute the edge lengths of the triangle with corners Pi, Pj and Pk, again in reference image units. The notation uij, ujk and uki is used for the distance from Pi to Pj, Pj to Pk and Pk to Pi respectively, and explicit equations are given in Equation 4.
uij = √((si − sj)² + (ti − tj)²)
ujk = √((sj − sk)² + (tj − tk)²)
uki = √((sk − si)² + (tk − ti)²)
(equation 4)
Conversion to physical units cannot be done without knowing the scaling from the reference image 210 to the object plane 516 being photographed. While this may be known, this method does not rely on it, and ensures the algorithm is correct even if the object plane 516 being photographed was printed with a uniform linear scaling other than 1. Now consider the plane 518 parallel to the object plane 516 and on the same side of the virtual sensor plane 510 as that plane, but intersecting the lines 520, 522 and 524 from the focus F to Pi, F to Pj and F to Pk at points Qi, Qj and Qk respectively, such that the distance from F to Qi is 1. There is a unique such plane.
The distance from the focus F to Qj is denoted as v. The distance from the focus F to Qk is denoted as w. The lengths of the sides of the triangle with corners Qi, Qj, and Qk are denoted as gij, gjk, and gki for the distances from Qi to Qj, Qj to Qk, and from Qk to Qi respectively. At this stage, none of v, w, gij, gjk, or gki are known explicitly. However, the geometry of the tetrahedron with corners F, Qi, Qj, and Qk allows the derivation of relations between them and the known cosines of the angles α, β and γ, as given by the equations in Equation 5.
gij² = 1 + v² − 2v cos α
gjk² = v² + w² − 2vw cos β
gki² = w² + 1 − 2w cos γ
(equation 5)
In addition, the triangle with corners Pi, Pj, and Pk is similar to the triangle with corners Qi, Qj, and Qk, since they are projections from the same point (namely F) on to parallel planes 516 and 518 respectively. Thus the ratios of the side lengths of these triangles are equal. Now the side lengths uij, ujk, and uki of the former triangle are known, and hence these ratios can be computed. Importantly, these ratios are dimensionless, even though uij, ujk and uki are known only in reference space, and are unaffected by the uniform linear scaling to the object plane 516 being photographed. Thus, the ratios of gij to gjk, and gki to gjk, can be computed by the equations in Equation 6.
gij / gjk = uij / ujk
gki / gjk = uki / ujk
(equation 6)
At this point, it simplifies matters to introduce the auxiliary values ρ and σ, formed from the ratios of the equations in Equation 6, as shown in Equation 7.
ρ = uij² / ujk²
σ = uki² / ujk²
(equation 7)
Values ρ and σ are dimensionless ratios whose values are known. These can be rearranged to give expressions for the squares of uij and uki, as shown in the equations of Equation 8.
uij² = ρ ujk²
uki² = σ ujk²
(equation 8)
Combining the known ratios from the equations of Equation 6 with the equations for v and w of Equation 5, and substituting the equations of Equation 8 to rewrite uij and uki, yields the two equations given in Equation 9.
ujk² (1 + v² − 2v cos α) = ρ ujk² (v² + w² − 2vw cos β)
ujk² (w² + 1 − 2w cos γ) = σ ujk² (v² + w² − 2vw cos β)
(equation 9)
Dividing through to clear ujk, and expanding and rearranging, yields the equations given in Equation 10.
(1 − ρ)v² + 2ρ(cos β)vw − ρw² − 2(cos α)v + 1 = 0
−σv² + 2σ(cos β)vw + (1 − σ)w² − 2(cos γ)w + 1 = 0
(equation 10)
The w² term can be linearly eliminated from these two equations to obtain an equation for w in terms of v and v² and known constants, as given in Equation 11.
w = ((ρ + σ − 1)v² + 2(1 − σ)(cos α)v + (σ − ρ − 1)) / (2ρ((cos β)v − cos γ))
(equation 11)
This can then be substituted into the first of the equations from Equation 10 to obtain a quartic equation for v, which is given in Equation 12.
a4 v⁴ + a3 v³ + a2 v² + a1 v + a0 = 0
(equation 12)
where the coefficients a0 to a4 are known values, each being an explicit polynomial expression in ρ, σ, cos α, cos β and cos γ produced by the substitution.
The quartic in Equation 12 can be numerically solved to obtain v, and this is substituted into the equation of Equation 11 to obtain w. Knowledge of v and w is knowledge of the geometry of the tetrahedron with corners F, Qi, Qj, and Qk, and thus of the geometry of the tetrahedron with corners F, Pi, Pj, and Pk by similarity.
At this point a sanity check is made on the values of v and w, and any solution where either v or w is small (say less than 0.3) or large (say greater than 3) is discarded. Such solutions represent extreme perspective beyond the requirements of the system to match.
It is desired to derive the transform from the virtual sensor plane 510 containing Si, Sj, and Sk to the object plane 516 containing Pi, Pj, and Pk. A projective transform can be easily computed given four corresponding points. Having three corresponding points already, knowledge of the geometry is used to derive a fourth corresponding point, specifically, the location of the point of intersection of the line 512 through the focus F and perpendicular to the virtual sensor plane 510 with the object plane 516 containing Pi, Pj, and Pk.
The space defined by the x, y and z axes originating at point 514 in the virtual sensor plane 510 makes the line 512 co-linear with the z axis. The z axis passes through the focus F at coordinates (0, 0, f). The equations of the points lying on the rays through the focus F and the points Si, Sj, and Sk, parameterized by the length l along these lines, are given by the equations in Equation 13.
(l xi / di, l yi / di, f − l f / di)
(l xj / dj, l yj / dj, f − l f / dj)
(l xk / dk, l yk / dk, f − l f / dk)
(equation 13)
Since the distances to Qi, Qj, and Qk are known, the coordinates of these points can be computed, obtaining the equations in Equation 14.
Qi = (xi / di, yi / di, f − f / di)
Qj = (v xj / dj, v yj / dj, f − v f / dj)
Qk = (w xk / dk, w yk / dk, f − w f / dk)
(equation 14)
The non-orthogonal coordinatization of the plane containing Qi, Qj, and Qk with origin at Qi, and unit axis vectors from Qi to Qj and from Qi to Qk is considered. There is a need to determine the coordinate of the intersection of the z axis (that is, the line 512 through the focus F and perpendicular to the virtual sensor plane 510) in this coordinate system. To compute this, a linear combination of the unit axis vectors which sum to the displacement from the origin Qi to the desired intersection point is found. In the coordinatization of space used above, this is the pair (a,b) which solve the system of equations shown in Equation 15.
a (v xj / dj − xi / di) + b (w xk / dk − xi / di) = −xi / di
a (v yj / dj − yi / di) + b (w yk / dk − yi / di) = −yi / di
(equation 15)
All the coefficients in these equations are known, either from the feature locations in the virtual sensor plane as given, from the distances computed in Equation 1, or from v and w as computed above. Hence this system can be solved directly by inverting the coefficient matrix and multiplying by the constant vector to obtain a and b.
Finally, by the similarity of geometry from Qi, Qj, and Qk to Pi, Pj, and Pk, the point 526 where the z axis intersects the object plane 516 containing Pi, Pj, and Pk must be the same linear combination of the analogous non-orthogonal coordinatization of the object plane 516, with origin at Pi and unit axis vectors from Pi to Pj and from Pi to Pk. Further, this linear combination is unaffected by the uniform scaling induced by the choice of units for Pi, Pj, and Pk or any other uniform scale difference between the object plane 516 being photographed and the reference image 210. Thus, this intersection point, in the coordinate system of the reference page 210, is as given by the equations in Equation 16.
(si + a(sj − si) + b(sk − si), ti + a(tj − ti) + b(tk − ti))
(equation 16)
This intersection point is none other than the projection of the origin of sensor space to the object plane 516 being photographed, which thus provides the fourth point for input into the usual algorithm for computation of a projective transform.
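Under the reconstruction of Equations 13 to 16 given above, the computation of this fourth corresponding point can be sketched as follows. The sketch is illustrative only: the argument layout and function name are assumptions, and it presumes v and w have already been obtained from the quartic.

```python
import numpy as np

def fourth_correspondence(S, P, f, v, w):
    """S: the three sensor-plane points [Si, Sj, Sk] as (x, y) pairs in
    physical units; P: their reference-page correspondences [Pi, Pj, Pk] as
    (s, t) pairs; f: focal length; v, w: solved ray lengths (F to Qj and
    F to Qk, with F to Qi normalised to 1). Returns the reference-page point
    corresponding to the sensor-plane origin."""
    S = np.asarray(S, dtype=float)
    P = np.asarray(P, dtype=float)
    d = np.sqrt(S[:, 0] ** 2 + S[:, 1] ** 2 + f ** 2)     # equation 1
    # Equation 14: x,y coordinates of Qi, Qj, Qk (the z component is not needed).
    Q = S * (np.array([1.0, v, w]) / d)[:, None]
    # Equation 15: a*(Qj - Qi) + b*(Qk - Qi) = (0, 0) - Qi in the x,y components.
    A = np.column_stack((Q[1] - Q[0], Q[2] - Q[0]))
    a, b = np.linalg.solve(A, -Q[0])
    # Equation 16: the same affine combination in the reference-page plane.
    return P[0] + a * (P[1] - P[0]) + b * (P[2] - P[0])
```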
Once a page 10 and projection 18 (see Figure 25) derived from a local RST peak has been accepted, a process of refinement commences. It should be noted that at this stage, the projection is an approximation only. Because an RST transform is only an approximation to a projection transform locally, it is typical for the corresponding features that gave rise to the projection to be a local cluster in one area of the camera image 15 (see Figure 25). While the projection transform 18 may correspond reasonably in this local area, it often has large errors in more distant parts of the camera image 15.
Figure 26 is a flowchart 528 of the projection refinement process. The first step 530 in the refinement process is to re-consider features that were extracted from the camera image and were binned in the RST binning process, but did not contribute to the local RST cluster that gave rise to the projection transform (see step 500 of flowchart 492, Figure 23). From the process discussed above in relation to Figure 24, an approximate transform is now known for each of these features. In light of this, the feature's camera space scale, orientation, X coordinate and Y coordinate can now be considered, and by transforming, at step 532, through the projection, a corresponding reference feature's scale, orientation, X coordinate and Y coordinate can be estimated. In step 534, this estimation is looked up in the previously constructed position-to-feature index 440 (see Figure 19). Again, because of the approximate nature of the transform, wide margins for error must be accounted for. To achieve this, the four real values of scale, orientation, X and Y are divided by quantization factors, and the two integer values that each of the four real values is closest to are noted. Quantization factors are typically 1/10th of a circle for the orientation, one octave for the scale, and 160 reference image pixel steps for the coordinates. Every combination of the four pairs of integer values (16 of them) is produced, combined with the page indicator, and used as a key in the position-to-feature index 440.
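A sketch of generating the 16 lookup keys is given below; the same quantization must of course be used when the index is built. The function name, the ordering of the key fields and the representation of orientation as a fraction of a full circle are assumptions for illustration.

```python
import itertools
import math

def candidate_position_keys(page_id, scale, orientation, row, col,
                            scale_q=1.0, orient_q=0.1, coord_q=160.0):
    """Generate the 16 keys used to look up the position-to-feature index:
    each of the four real values (scale, orientation, row, column) is divided
    by its quantization factor and both neighbouring integers are kept; the
    page indicator is used exactly."""
    def two_nearest(x):
        lo = math.floor(x)
        return (lo, lo + 1)

    dims = [two_nearest(scale / scale_q),
            two_nearest(orientation / orient_q),
            two_nearest(row / coord_q),
            two_nearest(col / coord_q)]
    return [(page_id,) + combo for combo in itertools.product(*dims)]
```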
In step 536, features found by this lookup in the position-to-feature index 440 are then filtered to those that have a close descriptor correspondence (in terms of descriptor Euclidean distance). This produces a list of new candidate correspondences that have reasonable positional and descriptor matches (within broad error margins) to the approximate projection. These are combined with the correspondences that originally formed the projection approximation. At step 540, the algorithm checks for sufficient candidate correspondences (for example, more than 30).
If there are insufficient candidate correspondences, the algorithm checks for another unprocessed tile 444 from the camera image 442 (see Figure 20) at step 542. Feature detection is performed on the unprocessed tile 444 at step 546, and these features are similarly processed until sufficient are found or there are no more unprocessed tiles 444 in the camera image 442. With sufficient features, or a lack of unprocessed tiles 444, the algorithm proceeds to validate the projection at step 544.
It should be noted that once sufficient correspondences are found, the algorithm will no longer perform feature detection in the camera image 442. The algorithm is designed to be greedy in its search for a solution so that it completes with minimum computation in cases where the camera is viewing a true reference page 210. In good cases only a small fraction (for example, 25%) of pixels from the camera image 442 are considered by the feature detector and fewer than 50 features extracted.
An improved projection is now searched for based on the list of candidate correspondences 486 (see Figure 22), which may still contain a percentage of false correspondences. This process corresponds to step 476 of the flowchart 452, Figure 21, and is expanded here with reference to flowchart 548 shown in Figure 27. A method broadly similar to the RANSAC algorithm (RANdom SAmple Consensus), as it is known in the art, is used, however with some significant differences. The algorithm performs a fixed number of iterations (typically fifty) on the list of candidate correspondences (step 550). The use of the predetermined number of iterations in step 568 is described below. In step 552, a random subset of three candidate correspondences 486 is selected. A candidate projection transform is computed in step 554 using the three-correspondence method described above. Traditional RANSAC uses a fixed tolerance to evaluate the worth of this candidate model (i.e. projection transform) by counting how many candidate correspondences fit within the tolerance (i.e. projection error). The modified algorithm used in step 556 of this embodiment uses a continuous-valued variation that allows for better discrimination between similar models.
For a given tolerance t and back-projection error d (the Euclidean distance from a reference feature location back-projected by the model to the camera space location of the correspondence), the value t² / (d² + t²) is calculated for each candidate correspondence, and these values are summed in step 556 to produce a fitness measure. In step 558, this fitness measure is compared with the best fitness measure so far. If it is greater, this candidate projection transform is accepted as the current best projection transform at step 560.
While traditional RANSAC maintains a fixed tolerance, the modified algorithm re-evaluates the tolerance whenever the best projection transform is updated (see step 562). The tolerance is updated to be the maximum projection error of any point that has an error less than the current tolerance. Further, at step 564, if at any iteration the tolerance is greater than a linear interpolation from a start tolerance to a target finish tolerance (over the number of iterations), it is lowered to the current interpolated value and the best fitness measure is updated in step 566.
Note also that the current best projection transform in the above method is based solely on the direct solution of a small number (for example, three) of candidate correspondences. In a traditional RANSAC, a model is fitted over all the points that have error less than the tolerance for a test model (typically a least squares fit) and this forms the current best model. This step is forgone due to the processing expense of the model fit.
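The loop can be sketched as below. This is an illustrative reading of the steps above, not the specification's code: the start and finish tolerances are assumed values, the solver and error functions are supplied by the caller, and the per-correspondence weight t²/(d² + t²) is one plausible form of the continuous-valued fitness; the original expression may differ.

```python
import random

def fitness(model, correspondences, projection_error, t):
    # Continuous-valued inlier measure: each correspondence contributes
    # t^2 / (d^2 + t^2), where d is its back-projection error (assumed form).
    return sum(t * t / (projection_error(model, c) ** 2 + t * t)
               for c in correspondences)

def modified_ransac(correspondences, solve_three_point, projection_error,
                    iterations=50, start_tolerance=8.0, finish_tolerance=2.0):
    """solve_three_point: maps a list of 3 correspondences to a candidate
    projection (or None); projection_error: back-projection distance of one
    correspondence under a model. Returns the best model found."""
    best_model, best_fitness = None, float('-inf')
    tolerance = start_tolerance
    for i in range(iterations):
        # The tolerance is never allowed above a linear interpolation from the
        # start tolerance to the finish tolerance over the iterations.
        interpolated = start_tolerance + (finish_tolerance - start_tolerance) * i / iterations
        if tolerance > interpolated:
            tolerance = interpolated
            if best_model is not None:
                best_fitness = fitness(best_model, correspondences,
                                       projection_error, tolerance)
        model = solve_three_point(random.sample(correspondences, 3))
        if model is None:
            continue
        score = fitness(model, correspondences, projection_error, tolerance)
        if score > best_fitness:
            best_model, best_fitness = model, score
            # Re-evaluate the tolerance: the largest error still inside it.
            errors = [projection_error(model, c) for c in correspondences]
            inside = [d for d in errors if d < tolerance]
            if inside:
                tolerance = max(inside)
    return best_model
```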
Once the iterations are complete, the candidate correspondence list is filtered at step 570 to those candidate correspondences that fit the best projection transform. Then, a least squares fit occurs in step 572 to produce a final best projection transform.
The projection transform produced by the above fitting process will still have small errors. As the target environment provides continuous visual feedback to a user of successive camera frames that are independently recognized, even small errors (of a few pixels) in any part of the image are very noticeable as jitter. To overcome this a final alignment phase 478, in Figure 21, is performed.
The alignment process in step 478 is illustrated in greater detail in flowchart 574 shown in Figure 28. The list of candidate correspondences output from step 570 forms the starting point 576 for the alignment process. At step 578, a reliable seed alignment point is determined from the centroid of the current correspondence list. Then at step 580, a list of all the corners detected by the original corner detection process (see Figure 20) is used to provide a sub-set (typically 20) of corners. The corners in this sub-set are ordered by distance from that seed alignment point for individual processing in that order. At step 582 the next unprocessed point in the ordered list is chosen. Next, a small patch (typically 11 x 11 pixels) centred on the point is extracted from the camera image at step 584. Then at step 586, a corresponding patch is resampled through the current projection transform from the alignment reference image stored in the view finder bundle for this reference image. A mipmap, as is known in the art, is built to improve the efficiency of this resampling. At step 588, each of these corner patches is compared by normalized cross correlation of the patch extracted from the camera image and the corresponding patch from the reference alignment image. The position of the peak in each cross correlation reveals a shift required to align the patches. This shift is determined in step 590 and stored to memory in step 592. The shift is used to align the feature correspondence at this position (in step 594). The shift distances are accumulated at step 596 and, each time the sum of shift distances passes a threshold (step 598), a new projection transform is produced in step 600 by least squares fit of the aligned correspondences. Step 602 iterates back to step 582 if there are corner points yet to be processed.
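The per-patch shift estimate can be sketched as a small search over normalized cross-correlation scores; the search radius and function signature below are assumptions, and a production implementation would first resample the reference patch through the current projection (via the mipmap) before the comparison.

```python
import numpy as np

def patch_shift(camera_patch, reference_patch, max_shift=3):
    """Estimate the (dy, dx) shift aligning two equally sized patches by
    searching for the peak of the normalized cross-correlation over a small
    window of trial shifts. Returns the best shift and its score."""
    h, w = camera_patch.shape
    best, best_score = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Overlapping regions of the two patches for this trial shift.
            a = camera_patch[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
            b = reference_patch[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
            a = a - a.mean()
            b = b - b.mean()
            denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-9
            score = float((a * b).sum()) / denom
            if score > best_score:
                best, best_score = (dy, dx), score
    return best, best_score
```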
Finally, at step 604, a confidence measure is computed as the area of the convex hull of successfully aligned correspondences expressed as a fraction of the area of the camera image, multiplied by the number of successfully aligned correspondences.
The final projection, the confidence measure and the identified reference image (page) are returned as the result of the recognition and projection determination module's processing for this camera image.
Calculation of the final projection can be further refined using non-image information such as data from hardware sensors on the viewing device 100 including accelerometer(s) and/or gyroscope(s). For example the relative movement of the viewing device 100 with respect to a previously established projection from a previous image frame may be provided by the accelerometer(s) and/or gyroscope(s) and used to compute, validate or refine the projection transform computed by view finder module 130 (see Figure 10).
Viewing Device Page Recognition with Reuse of View Finding Calculations
The projection determination system described above is optimized to determine a precise projection onto one of a small set of active pages, typically less than 20 pages. However there is also a need to recognize when a page outside this active set is being viewed, and what page it is, without significant extra computational load. To achieve this, a composite page recognition pack may be produced from a set of view finder bundles 240; for example, all the pages of a single magazine. This process is illustrated in flowchart 606 of Figure 29.
The input 608 to the page recognition pack construction module is a set of view finder bundles, each tagged with a page identifier. All the features from the view finder bundles are tagged with their page identifier and pooled in step 610. Optionally, features from high resolution layers of the scale pyramid, and/or a pseudo-random subset of all features, are punctured (removed) to reduce the total number of features.
Next the pooled page recognition features are arranged into clusters with similar descriptors at step 612. The clustering algorithm is parameterized by a maximum cluster radius. For each feature, all other features within this maximum cluster radius are listed against it. From this pool, repeatedly, the feature with the most other features listed against it is moved to a cluster list, and the other features listed against it are removed from the pool. The process stops when the pool is empty.
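A sketch of this greedy clustering is shown below; the descriptor_distance function and the data layout are assumptions for illustration.

```python
def cluster_features(features, descriptor_distance, max_radius):
    """Greedy clustering used when building a page recognition pack.
    For each feature, list all other features within max_radius in descriptor
    distance; repeatedly promote the feature with the most remaining
    neighbours to a new cluster (taking those neighbours with it) until the
    pool is empty. Returns a list of clusters of feature indices."""
    n = len(features)
    neighbours = {
        i: [j for j in range(n)
            if j != i and descriptor_distance(features[i], features[j]) <= max_radius]
        for i in range(n)
    }
    pool = set(range(n))
    clusters = []
    while pool:
        # Feature with the most still-available neighbours.
        centre = max(pool, key=lambda i: sum(1 for j in neighbours[i] if j in pool))
        members = [centre] + [j for j in neighbours[centre] if j in pool]
        clusters.append(members)
        pool -= set(members)
    return clusters
```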
The cluster list now contains features and other features listed against each that are considered to be in the same cluster. For each cluster, if there are multiple features in the cluster with the same page identifier, all but one are discarded in step 614. Next the features are formed into a list with features belonging to the same cluster placed adjacent to each other. Each feature is "painted" with a cluster "color" which alternates from "black" to "white" each time there is a transition from one cluster to another in the list (the cluster color is simply implemented as a bit). The resulting list forms a page recognition pack at step 616. The view finder module 130 described above may also be supplied a page recognition pack as input. If such a pack is provided, a page recognition descriptor to feature index is built in the same manner as described above (based on a k-d tree).
During the projection search steps described above, if no solution is found, and a page recognition index is available, page recognition is performed according to the flowchart 618 illustrated in Figure 30.
Firstly, the page recognition index is accessed as the input 620 to the process. At step 622, the RST bin sparse array (implemented as hash table 484 - see Figure 22) is cleared. The features previously extracted in the view recognition process are looked up again in the page recognition descriptor to feature index 438 in step 624. In step 626, camera image features are compared to the features in the page recognition descriptor to feature index 438 for correspondence with any feature that is a member of a cluster. In this case, a correspondence is added to the RST bins of the hash table 484 for each combination of the camera feature and all potentially corresponding features in the cluster of reference page features (step 630). Thus, by way of illustration, if a camera views a local image feature that is similar to local image features in 20 different pages out of a universal set of 1000 pages, all twenty potential RST transforms will be accumulated in the RST bins of the hash table 484 from the single k-d tree descriptor lookup.
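The sketch below illustrates this one-lookup, many-votes idea under simplifying assumptions: `cluster_lookup` is a hypothetical helper returning the cluster members matched by a descriptor, features are dictionaries with angle/scale/page fields, and translation bins are omitted for brevity even though a full RST bin would include them.

```python
import math
from collections import defaultdict

def vote_rst(camera_features, cluster_lookup, rot_bin_deg=15.0, scale_bin=0.5):
    """Sketch of steps 624-630: every camera feature that matches a clustered
    reference descriptor adds one vote per cluster member, keyed by that
    member's page identifier plus quantized rotation and scale differences."""
    rst_bins = defaultdict(int)              # stands in for hash table 484
    for cam in camera_features:
        members = cluster_lookup(cam['descriptor'])   # hypothetical index lookup
        for ref in members:                  # one vote per page represented in the cluster
            d_rot = (cam['angle'] - ref['angle']) % 360.0
            d_scale = math.log2(cam['scale'] / ref['scale'])
            key = (ref['page_id'],
                   int(d_rot // rot_bin_deg),
                   int(round(d_scale / scale_bin)))
            rst_bins[key] += 1
    return rst_bins
```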
If a camera image feature does not correspond to any of the clustered features in the page recognition descriptor to feature index 438, a correspondence failure is reported at step 628.
In step 632, detection of peaks in the RST bins, and the discovery of a locally compatible projection (in step 634) proceeds as previously described. However, as soon as an acceptable local projection is discovered (in step 636), the process terminates with a page recognition and an approximate projection returned to the view finder module 130 (in step 640). The projection is not refined.
This process is designed to add minimal additional computation while achieving page recognition from a much larger set of candidate pages, and only incurring this extra computation when projection determination from the active page set fails. Computation costs are reduced by re-using the corner detection and feature extraction already performed, and matching features across multiple pages from a single k-d tree lookup.
Using Feature Clustering and Contextual Information
In another implementation of the system, during the recognition process, contextual information is used to either select a subset of pages to test for RST transform or assign a priority weight to all the images which have at least one matched feature with a feature from the camera image. For example, pages from popular publications are selected or prioritized over other images during the construction of the RST bin array in the hash table 484.
Overview of the Server System Operation
With reference to Figure 10, the server system 20 holds the page server reference database 250. The process of setting up the page server reference database 250 is preferably initiated offline, before any interaction with the viewing device 100 occurs. Subsequent updates to the page server reference database 250 may occur while the server system 20 is in operation. In addition, one or more page recognition feature indexes 200 and cover page recognition feature indexes 270 are constructed. The server system 20 processes two types of request: match requests 260 (i.e. query image recognition requests) and content requests 290.
Setting Up the Page Server Reference Database and Descriptor Indexes
Figure 31 is a diagram showing the process of setting up or updating an initial page server database 250.
At step 642, each magazine publisher provides data such as digital images of each page of a magazine and the magazine's bind edition details which may include title, issue date and region of circulation. In step 644, the data provided by the magazine publisher is then added to the page server reference database 250. Multiple images at different known fixed poses or resolutions of each magazine page may be stored in the page server reference database 250, however only one image is typically stored. Interactivity data (such as augmentation data 220, Figure 10) may also be provided by the publisher for some or all of the magazine's pages. The flows beginning from steps 646 and 652 operate independently and may execute in parallel at any time after magazine data has been added to the page server reference database 250.
At step 652, a view finder bundle 240 is generated for each reference magazine page 210 by the view finder image analysis module 180 (described in more detail above with reference to Figure 10). View finder bundles 240 allow fast and precise page tracking in the view finder module 130. At step 654, the view finder bundle 240 is saved to the page server reference database 250.
At step 646 a duplicate detection process is executed on the reference pages 210. The duplicate detection process is described in more detail below. This process detects reference pages 210 that are extremely close matches to other reference pages 210 and marks them in the page server reference database 250. Only one exemplary version from each set of duplicate pages is preserved for the next stage of processing.
At step 648, page recognition bundles 230 are calculated from the unique pages that resulted from the duplicate detection, and stored in the server database at step 650. The page recognition analysis module 170 (see Figure 10) is described in more detail below.
Once all page recognition bundles 230 have been produced, and before the page server reference database 250 is made active for processing image match requests 260, a page recognition features index 200 is built at step 656. This index is built from data stored in the page recognition bundles 230, and is retained in server memory for fast access. As they are only stored in memory, these indexes are rebuilt as necessary on server re-starts. Each index is a kd-tree, as is known in the art, of all the image descriptors, each with a reference to its feature information. These indexes allow approximate nearest neighbor lookup of an image descriptor from a query image to discover closely matching image descriptors from the reference images 210, which occurs in the matching process described in more detail below.
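As a rough sketch of such an in-memory descriptor index, the snippet below uses scipy's cKDTree as a stand-in for the kd-tree described in the text; the feature record layout and class name are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

class PageRecognitionIndex:
    """In-memory descriptor index (sketch).  `features` is assumed to be a list
    of dicts carrying a 'descriptor' vector plus page id, location, scale and
    angle, mirroring the page recognition bundle contents described above."""
    def __init__(self, features):
        self.features = features
        self.tree = cKDTree(np.array([f['descriptor'] for f in features]))

    def lookup(self, query_descriptor, k=3):
        # Approximate nearest neighbours of one query-image descriptor.
        _, idx = self.tree.query(query_descriptor, k=k)
        return [self.features[i] for i in np.atleast_1d(idx)]
```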
Depending on the number of entries and therefore required storage size, the reference database 250 may be deployed in different ways such as compact (on a single server) or distributed (on multiple servers), on the viewing device 100 itself, on the local network which the viewing device 100 is connected to, and on the Internet. To optimize look up in the page server reference database 250, various indexing methods are employed. Database organization is improved using application domain information. For example, quicker operation is achieved overall by prioritizing access to popular magazines. If the page is the cover page of a magazine, a separate cover page recognition feature index 270 may be built to allow fast lookups restricted to the set of cover pages in the page server reference database 250.
After database setup is complete, each database entry (corresponding to a magazine page) comprises a page recognition bundle 230, a view finder bundle 240, the digital page image 210, the magazine publication details and other metadata such as the number of the page 235 in the specific publication 225, the placement of the page in a page spread or gatefold, and the like.
In applications where page tracking is not required, view finder reference image analysis is not performed and therefore view finder bundles 240 need not be stored in the reference database 250. The viewing device 100 is preferably set up to interact with the page server reference database 250 through the installation of a viewer application 190 on the viewing device 100. Alternatively, a custom viewing device 100 may be used which is pre-configured to interact with one or more specific reference databases 250.
Duplicate Page Detection
It is desirable to detect which pages of a large set of magazine page images 210 are near duplicates. Near duplicate magazine page images frequently occur due to ads being run in multiple publications. A recognition process that views one of these ads may misattribute the ad to a publication other than the actual publication being viewed if no knowledge of potential duplication is available. Further, vote-splitting in the server page matching process (described below) will degrade matching performance of frames where multiple copies of the same reference image 210 are indexed.
As briefly discussed above in the "Setting Up the Page Server Reference Database and Descriptor Indexes" section, a special duplicate detection module 360 is used to detect duplicates. Although the general image matching process used for match requests 260 (see Figure 10) could be used for this purpose, due to the restricted problem domain (very low rotation, scale and shift differences and very high similarity) a more efficient process can be used.
With reference to Figure 32, the duplicate page image detection method maintains a set of unique pages 658. That is, pages that are all known to be different from each other. An index of descriptors 660 of these pages is maintained, in memory, during the duplicate detection process. This index 660 allows the lookup of an unknown descriptor from a match request 260, along with its approximate location on the page, and returns closely matching descriptors from the previously indexed pages 662. This index 660 is implemented as a set of kd-trees 664 (as is known in the art), each kd-tree 664 indexes descriptors within a limited tile 666 of page 662. The tiles 666 are typically 30 by 30 pixels.
The duplicate page image detection method takes a new page image 668 of unknown status. The new page image 668 is analyzed to discover stable feature points at step 670. This is accomplished with a Harris corner detector, as known in the art, which can be simplified to use small box filter approximations rather than the exact filter. In step 672, the new page image 668 is tessellated into 256 by 256 pixel tiles, and the twenty strongest feature detections in each 256 by 256 pixel tile are used.
For each detected feature a local image descriptor is extracted in step 674. The local image descriptor is computed from sparsely sampled gradient strengths expressed as X and Y strengths. A 30 by 30 patch of pixels around the detected feature centre is considered. Fifty local image gradient vectors are sampled in a sparse checkerboard pattern within this area. The resulting 100 values are normalized to have length 256 by dividing all elements by the square root of the sum of the squares of the elements divided by 256. This results in a 100 byte local image descriptor. Step 676 checks if the new page image 668 is a duplicate of an existing page 210 in the page server reference database 250. Each descriptor in the new page image 668 causes a nearest neighbor search in the index of unique pages 660. A nearest neighbor search is performed in the four kd-trees 664 closest to the descriptor centre. Thus each descriptor in the new page image 668 is paired with a candidate correspondence 486 (see Figure 22) in the page server reference database 250. In step 678, each new page image 668 with more than twenty candidate correspondences 486 is selected for a RANSAC fit, as known in the art, performed in step 680. This determines a projection transform between the new page image 668 and the candidate reference page 210. The best such fit, if any, is regarded as a candidate duplicate page.
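The following sketch illustrates the descriptor extraction and length-256 normalization of step 674. The exact checkerboard lattice is not specified in the text, so the 3-pixel lattice used here (which happens to yield exactly 50 sample points) is an assumption, as is the requirement that the feature centre lie at least 15 pixels from the image border.

```python
import numpy as np

def duplicate_descriptor(gx, gy, cx, cy):
    """Sketch of step 674: sample 50 (X, Y) gradient pairs on a sparse
    checkerboard inside a 30x30 window around (cx, cy), then scale the
    100-element vector so its Euclidean length is 256.
    gx, gy are gradient-strength images; (cx, cy) must be >= 15 px from edges."""
    ys, xs = np.mgrid[-15:15, -15:15]
    # Illustrative sparse checkerboard on a 3-pixel lattice: exactly 50 points.
    mask = (xs % 3 == 0) & (ys % 3 == 0) & (((xs // 3 + ys // 3) % 2) == 0)
    sel_y = (cy + ys[mask]).astype(int)
    sel_x = (cx + xs[mask]).astype(int)
    d = np.concatenate([gx[sel_y, sel_x], gy[sel_y, sel_x]]).astype(np.float64)
    norm = np.sqrt((d * d).sum()) / 256.0    # divide by ||d|| / 256 => length 256
    return d / (norm + 1e-12)
```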
In step 682, the original page image 662 corresponding to each candidate duplicate page is retrieved from the set of unique page images 658. The new page image 668, subject to the discovered projection transform, is compared to the original page image 662 in step 684. The mean and variance of the difference are determined, and if they are within acceptable tolerances, the candidate duplicate page is declared a duplicate.
If it is not a duplicate, the page is merged into the unique page index 660 and the set of unique page images 658.
Image Signatures and their Calculation
A camera image frame is compared (matched) with a reference image by comparing (matching) their respective image signatures. The process of calculating an image signature is a common sub-operation that is shared between the PR analysis module 170 and the PageReco module 110 in the server 20 (see Figure 10).
An image signature is a plurality of image features each of which is computed from a unique interest point from the plurality of interest points detected in the image. Each image feature comprises an n-dimensional vector (the image descriptor) plus image location, scale information and local orientation information related to the interest point used to compute it.
Such signatures can be computed using a variety of image signature computation methods. One way to compute an image signature is using a variation of the SIFT (Scale Invariant Feature Transform) algorithm known in the art and described in US 6,711,293 to Lowe (hereinafter Lowe), the contents of which are incorporated herein by reference. Given an image, one can compute interest point locations in a scale pyramid, and descriptors at those interest point locations. The system described here identifies feature points differently to the known SIFT. Referring to Figure 33 A, candidate feature points are identified by generating a scale pyramid 396 similar to that used in the corner detection process described above with reference to Figures 14 and 15. The scale pyramid 396 is a series of scale images 394 that are successively blurred versions of a base image 688. In the context of the interactive viewing system 2, the base image 688 is the query image 109 (see Figure 7) incorporated into the match request 260 to the page server 20 (see Figure 10). The scale pyramid will be referred to as a Gaussian Scale Space (GSS) 396 and each scale image in the stack as a level 394 in the GSS 396. At this point the method significantly diverges from Lowe. Each level 394 in the GSS 396 is processed 694 to produce a set of gradient images 690 with a corresponding number of levels 394.
Referring to Figure 33B, the gradient image at each level 394 consists of gradient vectors 916 that resolve into X and Y gradient strengths at each pixel location 692. Various methods may be used to calculate a gradient image. In the present embodiment, the processing 694 is a Scharr filter. Next, a set of squared, normalized, gradient difference images 700 is produced by processing 698 adjacent pairs of gradient images 690. The processing 698 is an element-wise subtraction of the associated pairs of gradient images to obtain a gradient difference (DX, DY) at each point 692, calculating the squared magnitude of this gradient difference (DX² + DY²) at each point, normalizing this gradient difference magnitude by multiplying by the square of the sigma of the Gaussian blur at this level and storing the resulting value at the corresponding point 702 in the set of squared, normalized, gradient difference images 700.
The set of squared, normalized, gradient difference images 700 are a stack of images in which each pixel is a squared normalized gradient difference magnitude. All the pixels in each of the images 700 are individually compared to the pixels immediately surrounding them to search for maxima. For example, the pixel 702 in the squared, normalized, gradient difference image 704 is not only compared to the eight adjacent pixels 712 in the same squared, normalized, gradient difference image 704, but also the nine pixels 714 at the corresponding location in the squared, normalized, gradient difference image 706 beneath, and the nine pixels 710 in the squared, normalized, gradient difference image 708 above the image 704. If pixel 702 is greater than all twenty-six surrounding pixels 710, 712 and 714, it is a maximum. Each maximum so identified is a candidate feature location. The location is refined by fitting a local curve 696 in the same manner as Lowe, giving a location 692 in both GSS 396 and the Gaussian gradient space 690. By this process a set of feature points 716 in scale space is established that is characteristic of the base image 688. This method requires only maxima to be identified, rather than maxima and minima as is described by Lowe. The skilled worker will appreciate the processing efficiencies this yields.
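A simple (deliberately unoptimized) sketch of this 26-neighbour maximum search is shown below, assuming the squared, normalized, gradient difference images are stacked into a single numpy array indexed by (level, y, x); all names are illustrative.

```python
import numpy as np

def find_maxima(gradient_diff_stack):
    """Sketch of the extremum search over the images 700: a pixel is a candidate
    feature if it strictly exceeds its 8 in-level neighbours and the 9 pixels in
    the levels above and below (26 comparisons).  Only maxima are kept."""
    maxima = []
    levels, h, w = gradient_diff_stack.shape
    for s in range(1, levels - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = gradient_diff_stack[s, y, x]
                block = gradient_diff_stack[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                # v is the unique maximum of its 3x3x3 neighbourhood.
                if v >= block.max() and np.count_nonzero(block == v) == 1:
                    maxima.append((s, y, x))
    return maxima
```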
Once candidate feature points 716 are discovered, image descriptors 432 (see Figures 16 and 17) in the locality of each identified feature point are extracted. Figure 34 is a flowchart 730 for generating a feature point descriptor in the form of an n-dimensional vector for each of the image features 716 (a code sketch of these steps follows the list):
• At step 718, values interpolated between the closest two levels of the gradient scale space 690 of the image region around an interest point 692 established previously are considered to form a region of interest (ROI).
• At step 720, gradient orientations with reference to the interest point 692 within the ROI are computed and the peak value used to normalize for rotation invariance.
• At step 722, an n-dimensional histogram of normalized gradient orientations is created (where n is typically but not necessarily 128).
• At step 724, the histogram is normalized in magnitude for illumination invariance to form the image descriptor.
• At step 726, the image location and scale corresponding to the interest point 692 are appended to form the image feature.
• If the signature of a reference page 210 (as opposed to a query image) is being computed, at step 728, a reference to the substrate/magazine page whose image is being processed is appended to the image feature.
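The sketch below condenses steps 718-728 for a single region of interest, assuming the interpolated gradient patches are available as arrays and collapsing the descriptor to a pure orientation histogram; the bin counts and function name are assumptions rather than the patented layout.

```python
import numpy as np

def build_descriptor(gx, gy, n_bins=128):
    """Sketch of steps 718-728: histogram the gradient orientations of the ROI
    (rotated so the dominant orientation is zero) and normalize the histogram
    magnitude for illumination invariance."""
    angles = np.arctan2(gy, gx).ravel()
    weights = np.hypot(gx, gy).ravel()
    # Step 720: normalize for rotation using the peak orientation.
    coarse, edges = np.histogram(angles, bins=36, range=(-np.pi, np.pi), weights=weights)
    peak = 0.5 * (edges[np.argmax(coarse)] + edges[np.argmax(coarse) + 1])
    rel = (angles - peak + np.pi) % (2 * np.pi) - np.pi
    # Step 722: n-dimensional histogram of normalized orientations (n typically 128).
    hist, _ = np.histogram(rel, bins=n_bins, range=(-np.pi, np.pi), weights=weights)
    # Step 724: magnitude normalization for illumination invariance.
    return hist / (np.linalg.norm(hist) + 1e-12)
```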
For a reference page 210, the complete set of features 716 (each with descriptor, location, scale, angle and substrate reference) forms the image signature, and their stored representation is the page recognition bundle 230 for that reference page 210 (see Figure 10).
The PageReco Module 110
A server image match is performed upon receipt of a match request 260 from a viewing device 100 (see Figure 10). Figure 35 is a flowchart 750 of the image match process within the server 20. The match request 260 includes a query image 109 from within the camera field of view 14 (see Figure 7). In step 732, an image signature for the query image 109 is computed in the manner described above. This gives a list of query image features.
In step 734, each query image descriptor is looked up in the page recognition feature index 200 (see Figure 10) to find a small (typically three) set of approximate nearest neighbor descriptors from the population of reference page descriptors. In a typical embodiment the reference feature index is a k-d tree as is known in the art. Each matched descriptor corresponds with a feature in the Gaussian scale space 396 (see Figure 33A) of a reference page 210, which gives an orientation (angle), scale and location.
In step 736, each reference feature of each match is binned into a histogram with dimensions of page, angle, and scale.
In step 738, the best N (typically top ten) peaks in the histogram are discovered. Each of these corresponds to a potential image match.
In step 740, for each potential image match, the set of feature matches that are compatible with the page, angle and scale are selected. In step 742, a projection transform is fitted to each selected set of feature matches using RANSAC, as is known in the art.
In step 744, each fit is scored. Many scoring mechanisms are possible. One used is the area of the convex hull of the features compatible with the projection transform fit (that is, the points selected by RANSAC) times the number of features. In step 746, all the fits so found are compared, and the best, or a small set of bests, is selected. It is possible none are selected, which is returned as a failure to recognize. If more than one is selected, it is returned as an ambiguous recognition, that is, the system cannot distinguish which of several reference pages this query image is viewing, possibly due to reference image similarity.
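A minimal sketch of this hull-area-times-count score (the confidence measure of step 604 uses the same idea, expressed as a fraction of the camera image area) might look as follows; degenerate, collinear point sets are not handled here.

```python
import numpy as np
from scipy.spatial import ConvexHull

def score_fit(inlier_points):
    """Sketch of the step 744 scoring rule: area of the convex hull of the
    RANSAC-compatible feature locations multiplied by how many there are.
    (For 2-D point sets scipy's ConvexHull.volume is the enclosed area.)"""
    pts = np.asarray(inlier_points, dtype=float)
    if len(pts) < 3:
        return 0.0
    return ConvexHull(pts).volume * len(pts)
```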
In step 748, reference pages 210 that are exemplary representatives of duplicate page sets are expanded to the complete set of duplicates. That is, these are automatically ambiguous recognitions.
Page Server and Viewer Content Management
With reference to Figure 10, the page server 20 sends a content response 300 to the viewer device 100 in response to both match requests 260 and content requests 290. Initiation of content response 300 may be made by the viewer device 100 or the server 20. In the preferred embodiment, initiation of content response 300 is made by the viewer device 100 making a content request 290 to the server 20, or, an implied content request 290 with a match request 260. Caching schemes, as are known in the art, are employed in the network path from the viewing device 100 to the server 20 to reduce network traffic and speed delivery of content. In the following embodiment, the terminology "sent to the viewing device" is shorthand for "requested by the viewing device and returned to the viewing device by the server" as used in the preferred embodiment. However, as discussed above, direct server initiated transmission can also be used.
Upon page server match completion, the match response 280 is communicated to the viewing device 100. Recognition results may include the ID of the recognized reference page(s) 210 and associated publication(s) and the publisher(s), the view finder bundle 240 and the full image of the recognized reference page(s), as well as other data (listed below). If recognition is unsuccessful, notification is sent to the viewing device 100 to update its display accordingly. Typically, upon recognition failure, the user is asked to point to a page or cover page to initiate a new match. Apart from the page ID, the view finder bundle 240 and full images of the recognized page(s), one or more of the following may be also sent to the viewing device 100 as a result of a successful page server match: a) Interactivity definitions, each comprising one or more page extents describing the location of interactive features relative to the recognized page image, as well as all the interactive content for that feature including a graphical thumbnail of the associated interaction type. An interactive feature is, for example, a hyperlink, a video, a form field etc. For button-type interactive features, the interactivity definition may include click-through menu options, smart clippings, etc.
b) A word index to support word-based operations such as dictionary or web search of a selected word associated with the interactive feature of the containing page or publication.
c) Network connection session data including page recognition module 110 status details, last page server match time duration, authentication tokens, etc. Such data may be used for quality, security and monitoring purposes.
d) Information regarding related pages such as adjacent pages in a page spread or gatefold. Such data may be used to make decisions about pre-fetching nearby view finder bundles 240 to facilitate page tracking. Such information may be in the form of a set 434 of view finder bundles or a list of references to related pages, publications or specific interactive features.
For efficiency reasons, upon a successful page server match, view finder bundles 434 of pages likely to be viewed after the currently viewed page are preferably pre-fetched and sent together with the view finder bundle 240 of the recognized reference page 210 to the viewing device 100. If the page server match is a cover match, only the cover page's view finder bundle is sent to the viewing device 100. If the page server match is a page match, two or more view finder bundles are sent to the viewing device 100 corresponding to the publication's two-page spread or multi-page gatefold which includes the page recognized by the server match.
Additionally, view finder bundles 434 of other pages may be sent to the viewing device 100 such as the N previous and M next pages of the publication relative to the currently viewed reference page 210, where M, N are integer numbers. View finder bundles 240 of pages likely to be viewed in the future (according to application defined criteria) may also be selected and sent to the viewing device 100.
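One plausible pre-fetch selection, sketched below under the assumption that the publication's pages are available as an ordered list of page ids, combines the matched page's spread companions with the N previous and M next pages; names and defaults are illustrative.

```python
def prefetch_page_ids(matched_page, publication_pages, spread_pages, n_prev=1, m_next=2):
    """Sketch of the pre-fetch logic described above: spread/gatefold companions
    plus the N previous and M next pages of the publication, duplicates removed."""
    i = publication_pages.index(matched_page)
    window = publication_pages[max(0, i - n_prev): i + m_next + 1]
    ordered = []
    for page in list(spread_pages) + list(window):
        if page not in ordered:
            ordered.append(page)        # keep first occurrence, preserve order
    return ordered
```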
Depending on the application, full page images of such nearby pages may also be pre-fetched, for example using the logic described above. View finder bundles and full page images may be bundled together or sent separately for efficiency as the full page images are typically larger in data size and may take longer to transmit over a network. The 'on-device' availability of a set 434 of view finder bundles allows a viewing device 100 to track corresponding pages without having to wait for their full page (digital twin) images to be downloaded from the server 20. The viewing device 100 may display a placeholder graphic or a magnified thumbnail when tracking pages for which a full page image is not yet available and only display a full resolution digital twin image when that becomes available.
A cache of most recently recognized reference pages 210 on the viewing device 100 may also be used to improve page recognition efficiency. In this implementation, during operation, the viewing device 100 builds a small reference document database of documents already recognized. This is a speed optimization only, not required in order to achieve page recognition or page tracking. When a page server match is initiated, the local (on-device) reference database is looked up as well. If a good match is obtained earlier it is used instead of the page server match for immediate feedback. The local match is validated or discarded when the result of the page server match is available to the viewing device 100.
Method of Improving Match Performance
The efficiency and accuracy of a magazine page recognition system can be improved by restricting the set of reference magazine pages 210 searched. One method of restricting the reference page search space is by asking the user to bring the cover of a magazine into the field of view of the viewing device 100, and making a server match request 260 that uses the restricted cover page recognition feature index 270 (see Figure 10), thus initiating a page server match which, assuming it is successful, returns a cover page match response 280 which the user may be asked to confirm. Once confirmed by the user, subsequent page server match requests 260 may search a subset of the page server reference database 250 comprised of the magazine pages of that particular magazine and all page covers. The reference search space remains the same until a different cover page is recognized or recognition fails. If recognition over the restricted reference set fails to produce a good page match, the reference set may be made progressively larger until a good page match is found or the complete universal reference set has been exhausted. Page recognition accuracy can be further improved when recognition results from multiple image frames are combined. Such image frames may be views of the same magazine page, different magazine pages or background images as the viewing device 100 moves. Examples of image frames that can be used in addition to a given image frame X include:
a) The N image frames captured before frame X.
b) The M image frames captured after frame X.
c) Both N previous and M later frames relative to X.
Recognition results of such supplementary images may be used to compute the match response 280 for frame X or to validate a recognition result for frame X.
Page recognition accuracy can also be improved in the case where two adjacent pages in the same publication are visible in the same image frame. Recognizing the existence of at least one of the two pages in the image is important information which assists recognition or validation of the recognition result of the other page. This observation extends to cases where more than two pages appear on the same image. For instance, a few magazine covers may be imaged stacked on top of each other, each of them partially visible. Another example is when a page corner is folded revealing page content from the next page as well as the page after that. Multiple page recognition may be invoked during the recognition process (allowing more than one page result per image) or post-recognition (allowing a single page result per image) to disambiguate a previous recognition result. In the latter case, recognizing features from an adjacent page helps to increase confidence in the single page result returned, or decrease it otherwise.
In addition to matching the query image 109 for making a decision regarding the identity of the document corresponding to the query, the system may use contextual information such as the history of past user interaction with the system, the environment of the user such as application preferences, the user's location, the user's personal details, application domain specific restrictions such as knowledge that some known documents may or may not exist at a particular time or place.
The involvement of such non-image based information may happen at different stages of the recognition process - for example, before matching (during query formation), during matching (affecting the matching of image-based measurements), and after matching (to pick one of a set of possible recognition results and to "learn" knowledge for future operation of the system).
Other Methods of Matching
The following methods may also be used instead of, or in combination with, the aforementioned system.
Other Luminance Features
Recognizing a document via image recognition as described above may use other luminance based features in addition to, or instead of, the features described earlier. Such other luminance features can be, but are not limited to, luminance measurements of the colour information in the document image, e.g. measurements related to the colour content at a specified distance and in specified directions from other computed localized features (such as detected image corner features), or min-max colour points of an image of a document page.
Barcodes
Barcodes printed or otherwise displayed on the document, such as one-dimensional barcodes or two-dimensional QR codes, may be scanned by the viewing device 100 to identify the document.
Netpage Tags
Netpage tagging (for example as disclosed in US 8,028,925, the contents of which are herein incorporated by reference) or other continuous tagging covering part, or all, of the document page 10 (see Figure 10) and decoded by the viewing device 100 may be used to uniquely identify a document.
Text Recognition (OCR)
Text uniquely identifying the document can be detected in an image of a part of a document page and matched against a database such as an inverted database of text in all documents known in an application context. Referring to Figure 36, document recognition accuracy may be improved and latency reduced by performing off-axis text signature recognition and matching against an inverted database of text signatures. An example of an off-axis signature of page text spanning multiple text lines is the first letter of the words of all text lines closest to an assumed vertical axis 752 across the viewed page 10. That is, the first letter of the left hand side word 754 closest to the vertical axis 752, and the first letter of the right hand side word 756 closest to the axis 752.
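A small sketch of computing such an off-axis signature follows, assuming OCR has already produced, for each text line, a list of (word, x-centre) pairs; the data layout and function name are assumptions.

```python
def off_axis_signature(text_lines, axis_x):
    """Sketch of the Figure 36 signature: for every text line, take the first
    letter of the word just left of and just right of an assumed vertical axis.
    text_lines is a list of lines, each a list of (word, x_centre) pairs."""
    signature = []
    for words in text_lines:
        left = [w for w in words if w[1] <= axis_x]
        right = [w for w in words if w[1] > axis_x]
        if left:
            signature.append(max(left, key=lambda w: w[1])[0][0])   # closest word on the left
        if right:
            signature.append(min(right, key=lambda w: w[1])[0][0])  # closest word on the right
    return "".join(signature)
```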
Printed or Displayed Steganographic Code
Printed documents may be recognized by detecting a steganographic code such as a watermark added to the document in a user visible or invisible manner (see, for example, US 5,748,783 assigned to Digimarc). For documents displayed on active screens the equivalent steganographic code may be implemented in ways such as a small section with a flashing code or a temporal code displayed interleaved between video frames.
Characteristic Image Recognition
Printed or displayed documents may be recognized by detecting a characteristic image such as a logo or trademark added to the document. For example, a special graphic may be present at a particular place of each page of a publication with a special appearance which is used to identify the page, the publication or its publisher or other relevant information. A viewing application trained to recognize such special graphics can then be used to retrieve the page or publication or publisher information thus achieving high confidence page recognition or at least reducing the reference search space to a small subset relevant to the recognized graphic.
Substrate Fingerprinting
The viewed page 10 may be recognized by detecting a unique physical property or fingerprint associated with the page. Known methods for identifying pages using a physical property or fingerprint include:
• identifying a unique paper texture defining a unique fingerprint (see, for example, US 8,078,875);
• detecting a randomly-dispersed ink taggant defining a unique fingerprint (see, for example, US2005/0045055); and
• identifying random ink splatters around the edges of printed features (see, for example, B. Zhu et al., "Print signatures for document authentication," in Proc. 10th ACM Conference on Computer and Communications Security, 2003, pp. 145-154).
Electronic Tags
A viewed substrate 10 (Figure 10) such as a publication or page may be recognized using an electronic tag, such as an NFC (near field communication) tag or RFID (radio frequency identification) tag, associated with the substrate. Electronic tags of this type advantageously identify substrates with 100% confidence, although they require a suitable tag reader and modification of the substrate.
Colour Features
Colour features may also be used for page recognition. A major issue with using colour features in general purpose application domains is that of image (pixel) colour dependence on the colour temperature of the illumination sources, the ambient illumination such as shading from nearby objects, and the settings of the imaging device. In recognizing image-based pages one can generally assume that at least part of the typically white page background will be detectable in some of the images belonging to the same physical publication page. Given that assumption, white-balance normalization of the query publication page image can be performed before computing the query image descriptor, thus reducing or completely discarding the effect of the illumination source on the query image colours.
Measurements of colour information in a document page image can also enhance the distinctive power of locally computed image descriptors usable in a printed publication page recognition system, such as:
a) the colour gamut of the page or a page region established around an interest point
b) the minimum and maximum colour of the page or a page region established around an interest point
c) measurements of the distribution or other statistics of colours of the page or a page region established around an interest point
d) colour gradient measurements of the page or a page region established around an interest point
e) colour measurements at specific locations relative to the page or a page region established around an interest point
A typical interest point may, for example, be the corner of a page.
Additionally, a number of colour features may be used as image descriptors such as locally computed colour histograms, locally computed colour moments and measurements from uniformly coloured regions or edges in the image.
Such measurements can be appended to, or otherwise combined with grayscale luminance features in a recognition system to produce a richer description of the image content of the query and increase matching accuracy. Colour information can be incorporated into image signatures by computing gradient orientation histograms at detected interest points where the gradient is computed in a specially selected colour space instead of the gray intensity. The advantage of such colour features is that they can be easily combined with grayscale features in a recognition system to enhance recognition performance (an improvement of 7-8% is reported in some experiments).
The disadvantage is increased computation (three luminance bands instead of one) and storage cost (as the number of local features per page is generally higher than the number of grayscale features from the same image).
Colour image information in a printed publication page recognition system can also be used to detect the regions corresponding to text, where such regions will have two or in any case few distinct colour modes or clusters in a colour space. In contrast, the colour density function in page regions with colourful graphics will generally have a more complex modal structure.
Colour information can further be used to order (rank) a set of possible matches determined by the page recognition algorithm in order to help select matches most similar to the query. In a user interface, pages with the most similar colour content would appear more highly ranked than other pages. Colour information can further prune the search space of colourful pages such as publication cover pages to a subset of pages with similar colour content. A crude, whole-image colour matching algorithm such as a colour histogram match or similar reduces the candidate match set significantly. The match algorithm is as follows (a sketch in code follows this list):
a) compute colour descriptor (e.g. histogram) of all pages of interest and store it in a database
b) compute colour descriptor (e.g. histogram) of query page
c) match query colour descriptor with database descriptors assigning a match score to each query - database comparison pair
d) select the most similar (in terms of a defined similarity measure) database descriptors given a suitable similarity metric threshold (depends on the application)
e) the documents corresponding to the selected colour descriptors form the pruned candidate document set
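The sketch below illustrates steps (a)-(e) using a joint RGB histogram as the colour descriptor and histogram intersection as the similarity measure; both choices, along with the bin count and threshold, are assumptions standing in for whatever descriptor and metric the application would actually use.

```python
import numpy as np

def colour_histogram(image_rgb, bins=8):
    """Joint RGB histogram used as a crude whole-page colour descriptor (sketch)."""
    hist, _ = np.histogramdd(image_rgb.reshape(-1, 3),
                             bins=(bins, bins, bins), range=[(0, 256)] * 3)
    hist = hist.ravel()
    return hist / (hist.sum() + 1e-12)

def prune_candidates(query_hist, database_hists, threshold=0.5):
    """Steps (c)-(e): score each reference page by histogram intersection with
    the query and keep those above an application-defined threshold."""
    keep = []
    for page_id, ref_hist in database_hists.items():
        score = np.minimum(query_hist, ref_hist).sum()   # intersection score in [0, 1]
        if score >= threshold:
            keep.append((page_id, score))
    return sorted(keep, key=lambda p: p[1], reverse=True)
```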
Page Recognition Optimization
The above described methods of discovering the viewed page 10 (see Figure 10) which best matches the image captured on the viewing device 100 can be potentially slow if the number of reference documents or reference pages 210 in the page server reference database 250 is large. In order to speed this process up, various optimization strategies can be used, either separately or in combination. Searching the most popularly referenced magazines first will tend to cause good matches to be found earlier for most searches. The determination of the most popular magazines can be done across the entire magazine database, or it can be refined by characteristics of the user, such as location, age, gender, interests, affiliations and so on.
Where a server stores a user's browsing history, or has access to data from other sources about magazine purchases, or the user's reading patterns, the viewing device 100 may use such information to optimize searching. The categories of magazines which have been purchased or read in the past by the particular user may also be used to determine the more likely magazines for the future. For example, if a user is known to have purchased or read one or more women's fashion magazine titles, other women's fashion magazines may be sorted for early searching. Similarly, if a user has accessed web sites or is a member of on-line groups related to science, then magazines focused on science can be sorted for early searching.
Where a server stores a user's detailed browse history, and that history shows the user has accessed a particular magazine title over several issues, association rule learning can be used to predict the likelihood of the user referencing the latest edition of that title, and allow sorting of the search sequence to optimize the likelihood of a quick search result.
These same strategies can be applied to page searches, for example by searching the most popular pages first. Similarly, the user demographic or other information mentioned above can refine the search ordering of the list of pages. All of the above "extra information" can disambiguate the choice using the methods described above. Heuristic methods based on experience can be used to weight the importance of the different criteria to determine an overall search priority.
Searching is quicker when spread across multiple servers and done in parallel. This can be achieved if the magazines which are most likely to be accessed by particular groups of users are spread across multiple servers. For example, page data for fashion magazines, magazines commonly referenced in a specific geographic location, or magazines which appeal to specific age groups should be spread across multiple servers. Similarly, pages from individual magazines can be spread across multiple servers to improve the probability of a quick hit.
Alternative Methods for Page Recognition and Tracking
Alternatively or in addition to the view finder module 130 (see Figure 10) described above, page identity and/or relative pose of the viewing device 100 (such as a projection transform) can be determined and tracked with the following techniques:
Tracking User Movement
In one technique, an image of the user's head (and/or other environmental features) is captured with a user-facing camera on the viewing device, and the image is used to extract image features which are in turn tracked to estimate the relative pose of the viewing device between two frames. Such images are typically consecutive or near consecutive frames from a captured video sequence. By estimating relative viewing device pose change with respect to the largely static scene, an estimate can be made for the change in the camera view of a printed page viewed through a separate page-facing camera, thus improving tracking reliability. In one embodiment, the viewing device is a smartphone 100. The user-facing camera tracks the user's head while the page-facing camera of the smartphone images a printed page 10. Alternatively or in addition to the above, the user's eye movement can be tracked for the same reason. Other examples of image features that can be tracked are the visible edges and/or corners of the viewed page 10.
Tracking Viewing Device Movement using Hardware Sensors
A viewing device 100 with an accelerometer or gyroscope or other similar hardware sensor provides motion data used to identify relative camera motion with respect to a given reference pose. For example, in one implementation the view finder module 130 described above is used to estimate an initial page identification and projection transform. The viewing device 100 position relative to the page 10 is then tracked using measurements from the device's hardware sensors until the view finder module 130 is again required to identify a new page 10 and projection transform. Information from such sensors may also be incorporated into the operation of the view finder module 130 and conductor module 160 as described previously in relation to Figure 12.
Tracking Substrate Properties
A substrate with a 2-D pattern of Netpage tags (machine readable encoded data), or other encoded tagging system, can be visually tracked by the viewing device 100 to establish the relative position and location of a viewed page 10. Other printed tracking artifacts (for instance on personal documents or blank notebooks) can also be visually tracked to that effect. Additionally, unique page (paper and/or additive) texture can be tracked to establish a particular location within the document.
Network Location of Servers
The server 20 shown in Figure 10 must provide data and services. As discussed above, the server 20 need not be a single server. A server system 20 with several interconnected servers can provide the necessary data and services in a number of different locations and forms. Some embodiments with specific server and database separations are described below. The server system can have the various server software modules in separate servers, such as having the view finder analysis module 180 and page recognition analysis module 170 in different servers. Furthermore, the server system can divide the workload for each match request across several servers. For example, a search of 40,000 reference pages 210 can be performed by four separate servers that consider 10,000 reference pages each. Optionally, the server system 20 can employ both the alternatives above by distributing the server software modules across separate servers and sharing the processing workload between different sets of the servers.
Page match processing, performed by the page recognition module 110 as described above, is preferably performed by multiple servers. Multiple servers are used for two types of parallelism.
In the first type of page match parallelism, sub-sets of the page server reference database 250 are assigned, by rows, to multiple servers. Each such server creates a page recognition feature index 200 for its assigned row sub-set. When a match request 260 is made to the server system, the request is broadcast to all page match servers which each attempt a match as described above and each will return a set of potential matches as described above. A single control server pools the returned matches and by comparing their confidence and other match parameters selects a set of match responses 280 to return to the viewing device 100. By this mechanism the size of the page recognition feature index 200 (which as described above resides in server memory) can be limited to each server's available memory.
In the second type of page match parallelism, each of the above page servers configured with a sub-set of page server reference database 250 rows is instantiated as multiple load-balancing servers as is known in the art. Each page match request 260 is assigned to one instance of the pool of load balancing servers.
Content requests 290 supply the viewer device 100 and other devices 340 with content from the page server reference database 250, the shared clippings database 330 and the media database 350. Each of these data sources may be provided by a different server, and multiple load balancing servers may be used within this.
Clipping requests 310 may be processed by a separate server. In some embodiments, one or more of the above servers may be located on the viewing device 100 itself. A server may also be located in an electronic device embedded in the viewed substrate 10. In particular, the viewed substrate 10 may be printed media that is self-describing by supplying row information of the page server reference database 250, or a sub-set of the row information. The servers may also be located locally, thus not requiring an Internet connection. A combination of local and remote servers may be used. For example, some computers may be accessed via the Internet and others may be local, with these differently located computers working together to provide a single service function.
Access to any server computer is by simple network protocols, and may employ network tunneling to achieve better security.
Origins of Augmentation Information
Referring again to Figure 10, augmentation 220 is information associated with a digital twin 107 that informs the viewer application 190 to enable interactive features relating to the tracking of that digital twin. The viewing device 100 presents a view to the user which is based on the content 13
(see Figure 6) on the viewed substrate 10, but includes options for interaction between the user and the content 13. The interaction is provided by the viewer application 190 on the basis of augmentation 220 paired with the digital twin 107 and therefore also with the viewed substrate 10. The origin of the augmentation 220 information will now be outlined.
Augmentation Automatically Derived from the Digital Twin by the Viewer
When a substrate 10 is matched to a reference page 210 by the server system 20, the viewer application 190 receives a copy of the reference page 210, shown here as a PDF file, and stores it in the viewer page information 131. At this time the viewer application 190 may examine the digital twin 107 and directly synthesize augmentation information based on the content. One case of such automatically derived augmentation is a clip-able image region based on images within a PDF file. Similarly for a clip-able text block. Text may also include recognizable strings which cause augmentation such as hyperlinks to be synthesized. Numbers which are preceded by key strings such as "ring", "phone", "tel", etc. are particularly identified as telephone numbers. Also depending on options selected, the telephone number to be called is augmented with supplied country or area codes which are not part of the visual content, but which become part of the detail of the related interactive content. They are highlighted on the display 105 (see Figure 9) of the viewing device 100 to indicate to the user that they are interactive content. When the user touches them, a message is displayed allowing the user to confirm their intention to initiate a telephone call to that number.
Similarly, the viewer application 190 may automatically scan the content to identify URLs. These can be identified as text of the form "xxx.companyname.com", or "m.companyname.com", or similar strings with other country domain names, or strings commencing with the characters "http://" or "ftp://" or "mail:" etc. Some document formats which may be input to the authoring tool support the inclusion of URLs. For example, PDF documents can contain hyperlinks. The hyperlinks gathered from both of these methods are highlighted on the display of the viewing device 100 to indicate to the user that they are interactive content. When the user touches them the related web page is opened in a browser on the viewing device 100.
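By way of illustration, such scanning could be approximated with simple pattern matching as sketched below; the regular expressions are deliberately loose assumptions and production rules would be locale- and publisher-specific.

```python
import re

# Illustrative patterns only, not the patented rules.
URL_RE = re.compile(r'\b(?:https?://|ftp://|www\.)\S+|\b[\w.-]+\.(?:com|net|org)\b', re.I)
PHONE_RE = re.compile(r'\b(?:ring|phone|ph\.?|tel|call|contact)\s*:?\s*(\+?[\d\s()-]{7,})', re.I)

def derive_augmentation(page_text):
    """Sketch of automatic augmentation: returns (kind, matched text) pairs that
    a viewer could highlight as hyperlinks or tap-to-call telephone numbers."""
    links = [('url', m.group(0)) for m in URL_RE.finditer(page_text)]
    phones = [('phone', m.group(1).strip()) for m in PHONE_RE.finditer(page_text)]
    return links + phones
```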
Augmentation Automatically Derived from the Digital Twin During Offline Preparation
All of the augmentation information described above as being performed by the view device 100 may also be synthesized offline in a preparation step and stored in the augmentation 220 information in the page server reference database 250 and delivered to the viewer device 100 in response to a successful match request 260. This is a preferred method as it allows for a verification of the automatically synthesized augmentation 220 and relieves the viewer device 100 of the processing load. Interactivity is typically defined in web pages using HTML or similar formats, or electronic documents using PDF and other file formats. The interactivity in a digital twin 107 is defined as hyperlinks and other clickable elements with associated actions which are associated with graphic elements of the graphic content. Where these web pages, PDF files or other file formats are uploaded to a page server 20, the same files are processed and all, or selected, interactive content is collected and uploaded to the server 20. Offline preparation also allows algorithms to be applied for the synthesis of augmentation that may be impractical or at least less desirable to perform by the viewing device 100. In one case, as the digital twin 107 is prepared and depending on options selected, the authoring tool automatically scans all text for matches to Wikipedia topic text. When a match is found, a hyperlink to the Wikipedia page for this text is inserted as interactive content. These hyperlinks are highlighted on the display of the viewing device 100 to indicate to the user that they are interactive content. The authoring tool provides a method for displaying these hyperlinks differently from hyperlinks mentioned in the previous paragraph so that the user understands where the link will take them. When the user touches them the related Wikipedia page is opened in a browser on the viewing device 100.
All or some of the interactive content can be created by automatic processes by using methods which include the following:
• Metadata which is added to a document and subsequently interpreted as interactive data, collected and written to the server 20. By way of example, comments added to a PDF document may contain the information required to define the interactivity associated with graphic objects in the PDF file.
• Specific formatting of visible graphic elements and/or specific content of visible graphic elements may be interpreted as an indication of an interactive action associated with the element or elements. During a processing step after document creation, these graphic elements are interpreted and interactive content is created and associated with them. By way of example, whenever a magazine subscription icon is found in the source document, a link to the Publisher's subscription site's URL can be added as interactive content associated with that icon.
• Scanning the document for words and phrases which also appear as the title of an article in Wikipedia. These words and phrases are enabled as interactive content which triggers Wikipedia lookup.
• Scanning the document for groups of digits which conform to patterns for telephone numbers. Where these groups of numbers are preceded by words such as "phone", "ph.", "call", "contact", "ring", etc. this adds confidence to this choice. These numbers are tagged as phone numbers and are enabled as interactive content which activates the viewing device 100's "call function" on this number.
• Scanning the document for strings of characters which conform to patterns for web addresses. These strings are tagged as a web addresses and are enabled as interactive content which hyperlinks to the web address.
• Scanning the document for strings of characters which conform to patterns for email addresses. These strings are tagged as email addresses and are enabled as interactive content which launches an email composition and sending function either locally on the viewing device 100 or using web mail.
• Scanning the document for images which match the logos of companies. These images are tagged as company logos and are enabled as interactive content which hyperlinks to the related company's web site.
• Scanning the document for specific images or icons and modifying the meaning of text co-located with that image or icon according to the image or icon. The icon and/or text are then converted to interactive content based on both the icon and text. For example, if a specific icon is defined to indicate that the associated text represents the name of a specific company's product, then the icon and text can be converted to a hyperlink which directs the user to that company's product information web page for the product, possibly including a means of directly purchasing that product.
Automatic interaction generated based on standard content of the publication does not require additional effort from the publisher to create it. Instead, these functions are always available.
Explicitly Authored Augmentation using Authoring Tools
All the above described augmentation 220 and more, as described below, may also be explicitly authored using an authoring tool and uploaded to the server 20.
Authoring tools provide a means for a content author to create interactive content and associate it with graphic data for use by the interactive viewing system 2. The interactivity authoring tool, in one embodiment, is a stand-alone tool which adds interactive content to an already prepared graphic layout. The authoring tool may be either web or PC based.
The interactive content authoring function, in another embodiment, is integrated into the general document authoring system which is used to prepare the graphic content of the document.
Where the final graphic content is produced from the combination of multiple sources of graphic content, such as in a commercial print-ready graphic preparation system, the interactive content is added as part of the process of combining the various sources of graphic content. In any of the above variations of the authoring tool, a Netpage pen (described in greater detail below) may be used to add graphic information, or control the operation of the process.
The interactive content is defined using one or more of the following methods:
• Provided interactively by the content author as they operate the authoring tool.
• Provided in a document or file which is separate from the main content file, but which references the main content file, references locations on the main content file, and references objects within the main content file.
• The document edit history may be interpreted as instructions to the authoring tool to add interactive content. For example, the instructions to the authoring tool can be added to the document, but deleted as a last step prior to processing by the authoring tool, so that it can find these instructions in the edit history.
Types of Augmentation and Associated Interactivity
Referring to Figures 37A and 37B, when the user places the viewing device 100 over an interactive element 758 or page location that has been flagged as a video during page authoring, the video 760 will immediately start playing in the location on the viewing device 100's display screen 105 of that interactive element 758. The authoring tool provides options to select the automatic playing of the video to be silent or with sound. Typically, the interactive element 758 will be a still image 762 from the video 760, but any other identifiable graphic can be used. If the user moves the viewing device 100 the playing video 760 shifts on the screen 105 to follow the interactive element 758 within the view. That is, the position of the playing video 760 remains static with respect to the remainder of the digital twin 107, and appears static with respect to the viewed page 10, despite movement of the viewing device 100 over the page. If the interactive element 758 moves out of view, the playing video 760 moves off the screen 105. Optionally, the playback may continue, reset or pause, depending on user and publisher preferences, such that if the video interactive element 758 again comes into view, the time sequence of the video 760 may be observed to have continued, restarted or paused while the video was out of view.
If the user touches the display screen 105 in the location where the video 760 is playing, the video will restart and switch to playing full screen with sound enabled. If the user touches the display screen 105 while the video 760 is playing full-screen, volume controls and time controls appear, allowing the adjustment of the playback volume and scrubbing to any place in the video. While in full screen playback mode, if the user moves the viewing device 100 away from over the still image 762 to a different location on the publication, the video 760 is interrupted and the camera video view or another digital twin 107 is again displayed. If the viewing device 100 is pointed away from the publication, the video 760 is not stopped, but can be stopped by a 'Done' button next to the volume control. On viewing devices 100 with adequate processing power, all interactive elements 758 flagged as videos shown on the display screen 105 play simultaneously. On viewing devices 100 with inadequate processors, only the video 760 associated with the interactive element 758 closest to the center of the display screen 105 is played.
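The rule for viewing devices 100 with limited processing power might be implemented along the lines of the following sketch, which simply selects the video interactive element whose bounding box centre is nearest the centre of the display screen 105. The data structure and function names are assumptions made for illustration.

```python
import math

def video_to_play(video_elements, screen_w, screen_h):
    """Pick the single video element closest to the screen centre.

    video_elements: list of dicts with each element's on-screen bounding box,
    e.g. {"id": "vid1", "x": 10, "y": 20, "w": 100, "h": 80} (illustrative).
    Returns the id of the element to play, or None if nothing is visible.
    """
    cx, cy = screen_w / 2.0, screen_h / 2.0
    best_id, best_dist = None, float("inf")
    for el in video_elements:
        ex = el["x"] + el["w"] / 2.0
        ey = el["y"] + el["h"] / 2.0
        dist = math.hypot(ex - cx, ey - cy)
        if dist < best_dist:
            best_id, best_dist = el["id"], dist
    return best_id
```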
Referring to Figure 38, a gallery interactive action is specifically associated with some interactive element 758 on the page 10 by the authoring tool to directly permit display of a gallery of images 764. This enables display of a series of related photos, for example multiple views of the same object, scene or product, or views of related objects, scenes or products. Typically, this interactive content is indicated by a graphic element, which may appear in the digital twin 107 only on the screen 105 of the viewing device 100, or may appear similarly (although not necessarily the same graphic element) on the viewed page 10 and the digital twin 107. When the viewing device 100 is moved over this interactive element 758, the viewing device 100 immediately starts to show in that location a 'slide show' of the gallery of images 764.
When the user makes a sliding gesture over the interactive element 758, the gallery of images 764 is advanced or reversed by one or more images, with the physical length of the sliding gesture determining the number of images skipped or backtracked. When the user touches (without sliding) this location, the slide show 764 becomes full screen, using all the functionality of a full-function interactive picture gallery application. The authoring tool provides options to permit the content author to select whether the slide show 764 automatically starts to play as the viewing device 100 is moved over the related interactive element 758, or whether the user must touch the interactive element on the digital twin 107 to start the playing. One or more of the images in the gallery of images 764 may be a video or animated image, typically quite short, which will automatically play when that "image" is being shown.
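A minimal sketch of how the sliding gesture might be mapped to a number of images, assuming a simple proportional rule; the tuning constant is an illustrative assumption, not a value specified by the system.

```python
def images_to_skip(drag_dx_pixels, screen_width_pixels, images_per_screen_width=5):
    """Map the physical length of a horizontal sliding gesture to a number of
    gallery images to advance (positive) or backtrack (negative).

    images_per_screen_width is an illustrative tuning constant: a drag across
    the full screen width skips this many images.
    """
    fraction = drag_dx_pixels / float(screen_width_pixels)
    return round(fraction * images_per_screen_width)

# A drag of a quarter of the screen width to the left backtracks one image.
print(images_to_skip(-180, 720))  # -1
```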
An RSS (rich site summary) interactive action is specifically associated with some location or graphic element on the page by the authoring tool to directly permit display of an RSS feed. This enables display of dynamically changing text, such as news updates or stock market updates. Typically, this interactive content is indicated by a graphic element, which may appear only on the screen 105 of the viewing device 100, or may appear similarly (although not necessarily the same) on both the viewed page 10 and the digital twin 107. When the viewing device 100 is moved over this interactive element 758, the viewing device 100 immediately starts to show the RSS feed contents in that location in the digital twin 107. When the user touches this location on the screen 105, the display of the RSS feed becomes full screen, using all the functionality of an RSS feed-capable browser. The authoring tool provides options to permit the content author to select whether the RSS feed automatically starts to play as the viewing device 100 is moved over the related interactive element, or whether the user must touch the location on the screen 105 to start the playing.
A music interactive action is specifically associated with some location or graphic interactive element on the page by the authoring tool to directly permit playing of music or sound. By selecting appropriate options when using the authoring tool, the content author can elect to have the associated sounds start to play automatically when the page location comes into the camera view, or a graphic interactive element can be shown, either on the digital twin 107, or on both the viewed page 10 and the digital twin 107 on the display screen 105, which the user must touch to play the sounds. When playing sound, a volume controller appears to permit the user to control the replay volume, or a volume controller is added to the application's settings menu to permit volume control. Depending on options selected when the content is authored, the sounds may cease to play when the location or graphic interactive element is no longer shown on the touchscreen 105, or may continue to their conclusion unless interrupted by the user.
An animation interactive action is specifically associated with some location or graphic interactive element on the page by the authoring tool to directly permit display of an animation. An animation consists of a set of visual objects (sprites) which follow motion paths. The motion paths specify the object position, size, rotation, transparency and other characteristics as they change over time. The motion paths can be preprogrammed, or can respond to user input. Motion paths can also repeat, and be grouped hierarchically. This enables the definition of motions such as walk cycles. The content author can use the authoring tool to select whether the animation will start automatically when the location or graphic interactive element comes into view on the screen 105, or whether this animation is associated with a graphic interactive element, which may appear only on the digital twin 107, or may appear similarly (although not necessarily the same) on both the viewed page 10 and the digital twin 107 shown on the screen 105 which must be touched by the user to start the animation. The authoring tool supports position information of the animation which is relative to the page 10 or which is relative to the screen 105. When relative to the page 10, the viewing device 100 can be moved to make the animation disappear off the edge of the screen 105. In this latter case, the viewing application 190 (see Figure 10) provides methods for the user to terminate the animation. The authoring tool provides an extendable library of 'clip animations' which can be accessed easily by the content author, and which can be added to from third party or other animation suppliers.
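The keyframe representation below is one possible way to express a motion path of the kind described above; it is a sketch only, and the field names and linear interpolation are assumptions rather than the system's actual animation format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Keyframe:
    t: float          # seconds from the start of the path
    x: float
    y: float
    scale: float
    rotation: float   # degrees
    opacity: float    # 0.0 (transparent) to 1.0 (opaque)

def sample_path(keyframes: List[Keyframe], t: float, repeat: bool = True) -> Keyframe:
    """Linearly interpolate a sprite's state along a motion path at time t.

    A repeating path (repeat=True) wraps time around the path duration,
    which is how a looping motion such as a walk cycle can be expressed.
    """
    duration = keyframes[-1].t
    if repeat and duration > 0:
        t = t % duration
    t = min(max(t, keyframes[0].t), duration)
    for a, b in zip(keyframes, keyframes[1:]):
        if a.t <= t <= b.t:
            f = 0.0 if b.t == a.t else (t - a.t) / (b.t - a.t)
            lerp = lambda p, q: p + (q - p) * f
            return Keyframe(t, lerp(a.x, b.x), lerp(a.y, b.y),
                            lerp(a.scale, b.scale),
                            lerp(a.rotation, b.rotation),
                            lerp(a.opacity, b.opacity))
    return keyframes[-1]

path = [Keyframe(0, 0, 0, 1.0, 0, 1.0), Keyframe(2, 100, 50, 1.5, 90, 0.5)]
print(sample_path(path, 1.0))  # the sprite's state halfway along the path
```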
A reading interactive action is specifically associated with an interactive block or region of text by the authoring tool to directly permit automated reading and simultaneous highlighting of the words in the related text as they are being read. The spoken text is downloaded from the page server 20 as it is played. Simultaneously, identification of the word being read at each point in time is also downloaded. The viewing application uses this information to highlight the word as it is read, where the highlighting method is selected by the content author using the authoring tool. The content author can use the authoring tool to select whether the reading will start automatically when the interactive block or region of text comes into view on the screen 105, or whether the block of text must be touched by the user to start the reading. In this case, the author is expected to add a standard graphic to indicate to the user that the content is interactive in this way. This type of augmented content is intended to permit the text to be read aloud by a human reader. Therefore, the authoring tool uses voice recognition functionality to identify the spoken words, match them to the individual words in the related text and create the word timing data. Alternatively, when no recorded voice is available, the authoring tool uses voice generation functionality to generate the voice track and the word timing information, but this is not the intended mode of operation (as it overlaps functionality provided within the viewing application 190).
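A minimal sketch of how the downloaded word timing data might drive the highlighting, assuming the timing data is a sorted list of (start time, word index) pairs; that format is an assumption for illustration.

```python
import bisect

def word_at_time(word_timings, playback_time):
    """Return the index of the word to highlight at the current playback time.

    word_timings: sorted list of (start_time_seconds, word_index) pairs,
    an assumed format for the word timing data described above.
    """
    starts = [t for t, _ in word_timings]
    i = bisect.bisect_right(starts, playback_time) - 1
    return word_timings[i][1] if i >= 0 else None

timings = [(0.0, 0), (0.4, 1), (0.9, 2), (1.3, 3)]
print(word_at_time(timings, 1.0))  # highlights word index 2
```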
A coupon interactive action is specifically associated with some location or graphic interactive element on the page 10 by the authoring tool to directly permit the transfer of stored value, or virtual coupon, from the publication's publisher to the user. Typically, this interactive content is indicated by a graphic interactive element, which may appear only on the screen 105 of the viewing device 100, or may appear similarly (although not necessarily the same) on both the viewed page 10 and the display screen 105. When the user touches this location on the screen 105, the page server 20 sends to the viewing device 100 a package of data which the page server 20 can subsequently recognize if it is returned by the viewing device 100 to the page server 20 during a shopping interaction. Prior to supplying the coupon, the page server 20 may optionally request authorizing information such as a user's name, age, password, account details or any other information. If the information is already stored on the viewing device 100, the information is transmitted automatically. If it is not stored, the viewing application 190 will request it from the user. If the information does not meet the requirements of the page server 20, it may choose to refuse to supply the coupon. The coupon may contain any information, encoded or plain, which the page server 20 can subsequently recognize. Each coupon has a name, which is viewable even for encrypted coupons, and an optional numeric value. The viewing application 190 can display, under user control, a list or summary of stored coupons, including a total of their associated numeric values.
A shopping interactive action is specifically associated with an interactive element in the form of some location or graphic element on the page 10 by the authoring tool to directly permit efficient purchase operations to be initiated by the user. Typically, this interactive content is indicated by a graphic element, which may appear only on the digital twin 107 shown on the display screen 105, or may appear similarly (although not necessarily the same) on both the viewed page 10 and the digital twin 107 shown on the display screen 105. When the user activates this interactive content, the viewing device 100 either activates a shopping application, which may be specific to the publication or may be generic, or opens a web page in a web browsing application. When the shopping application or web page is activated, information encoded in the specific interactive action is sent with the activation command to cause the shopping application or web page to optimize the purchase process. This information may be the product code, a special offer code, the color or style variant of the specific product, or some other information. In addition, information relating to the user or the user's viewing device 100 can also be sent. This information may be the user's name, location, brand or model of viewing device 100, subscription status for the publication, pre-selected general or specific preferences, prior purchases, account names or numbers, "certificates" for special offers or discounts, details of previously stored coupons, or any other commercially meaningful information which can be stored on the viewing device 100 or accessed by it. If a page server 20 accepts a coupon, it is the responsibility of the page server 20 to ensure that it is valid and, where appropriate, has not already been used. The page server 20 may instruct the viewer application 190 to destroy the coupon.
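One way a page server 20 might issue and later recognize a coupon is sketched below, using an HMAC signature so that a returned coupon can be checked for validity and single use. The signing scheme, field names and key handling are illustrative assumptions, not the patent's prescribed mechanism.

```python
import hashlib
import hmac
import json

SERVER_SECRET = b"example-secret"  # assumption: a server-held signing key

def issue_coupon(name, value, user_id):
    """Server side: build a coupon package the page server can later recognize.
    The payload format and HMAC signing are illustrative assumptions."""
    payload = json.dumps({"name": name, "value": value, "user": user_id}, sort_keys=True)
    tag = hmac.new(SERVER_SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return {"name": name, "value": value, "payload": payload, "tag": tag}

def redeem_coupon(coupon, used_tags):
    """Server side: accept a returned coupon only if it is genuine and unused."""
    expected = hmac.new(SERVER_SECRET, coupon["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, coupon["tag"]):
        return False          # not issued by this server
    if coupon["tag"] in used_tags:
        return False          # already redeemed
    used_tags.add(coupon["tag"])
    return True

used = set()
c = issue_coupon("10% off", 10, "user-42")
print(redeem_coupon(c, used), redeem_coupon(c, used))  # True False
```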
A download interactive action is specifically associated with a location or graphic element on the page 10 by the authoring tool to facilitate downloading of data to the user's viewing device 100. When the user touches the location or graphic element shown in the digital twin 107 on the screen 105, the viewing device 100 starts download functionality to receive the data from the page server 20.
A forms interactive action is specifically associated with some location or graphic element on the page by the authoring tool to permit the user to fill in answers to on-screen questions which are then uploaded to the page server 20. When the user touches the location or graphic element shown in the digital twin 107 on the screen 105, an on-screen keyboard with a layout appropriate to the type of data required by the question appears. When the user completes the answer, or group of answers, the data is uploaded to the page server 20 where it is either stored or acted on. It is not a requirement, but the authoring tool permits the form either to appear similarly on both the viewed page 10 and the display screen 105, such that the on-screen keyboard appears when the user touches the answer box for the form, or to appear only on the viewing device 100's display. The authoring tool also supports the supplied data being formatted as an email for transmission to a given email address. Depending upon previously saved user settings, user data may be pre-filled automatically.
A vote interactive action is specifically associated with an interactive element in the form of some location or graphic element on the page by the authoring tool to permit the user to quickly answer a multiple choice question and have all users' answers efficiently counted and displayed. The authoring tool permits the author to separately design the format of the multiple choice question and the format of the counted answers which is displayed after the user answers the question.
A comment interactive action is specifically associated with an interactive element in the form of some location or graphic element on the page 10 by the authoring tool to permit the user to quickly add a short text comment to a comment stream which is maintained by the page server 20. The authoring tool allows the content author to format the display of the comment stream on the viewing device 100. The comment stream may be always visible, or its display in the digital twin 107 on the screen 105 may be activated by the user touching a graphic element on the screen 105 which has been inserted and enabled by the author using the authoring tool.
The authoring tool also supports creation of multi-layer content. The multi-layers are alternate views of the viewed page 10 and provide alternate content. For example, the basic printed, viewed page 10 and first displayed image in the digital twin 107 may contain a Sudoku puzzle. The next alternate layer in the digital twin 107 may contain extra numbers filled in on the puzzle. Subsequent layers may contain progressively more filled in numbers, until the final layer displays the completed puzzle. This can be used for providing hints and answers to crosswords, quiz answers, children's puzzles such as findwords, find the difference, and so on. According to the parameters selected by the content author using the authoring tool, the user touches a graphic element on the screen 105 to display the next layer, for example a "Hint" button in the case of games. Optionally, the display of subsequent layers can occur automatically after a selected time. Subsequent layers can replace the underlying content, or can be "transparent" and overlay the existing content. Selection of multiple layers can also be controlled based on other information available in the viewing device 100, such as the user's location, demographic parameters or subscription status. In such cases, the content author can elect to display the selected information automatically, or in place of the original layer. These display options are selected by the content author using the authoring tool. This enables features such as geographically based advertising, for example showing the reader's nearest retail outlet. Any layer can contain any of the types of interactive content described in this section. This contained interactive content can be the same or different from preceding layers.
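A sketch of how layer selection might work for the hint-style example above, assuming each layer records whether it is "transparent" (overlaying the content below) or opaque (replacing it); the data structure is an assumption for illustration.

```python
def visible_layers(layers, hints_revealed):
    """Return the layers to composite, bottom to top, after the user has
    revealed a given number of hint layers.

    layers: list of dicts, e.g. {"name": "puzzle", "transparent": False}
    (an assumed structure). A transparent layer overlays what is below it;
    an opaque layer replaces everything below it.
    """
    shown = layers[:1 + hints_revealed]       # base layer plus revealed hints
    # Walk from the top down to the last opaque layer; only that layer and
    # the transparent layers above it need to be drawn.
    for i in range(len(shown) - 1, -1, -1):
        if not shown[i].get("transparent", False):
            return shown[i:]
    return shown

sudoku = [
    {"name": "printed puzzle", "transparent": False},
    {"name": "hint 1", "transparent": True},
    {"name": "full solution", "transparent": False},
]
print([l["name"] for l in visible_layers(sudoku, 1)])  # puzzle + hint 1
print([l["name"] for l in visible_layers(sudoku, 2)])  # full solution only
```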
The authoring tool also supports the creation of augmented reality scenes based on the view of the page 10, typically a magazine. It also allows launching of functions such as augmented reality based on other views from the view facing camera 102 (see Figure 7) or the user facing camera 108 of the viewing device 100 or other sources. Other functions launched by viewing a page 10 may include single user games, puzzles and multi-user games.
Three types of automatically generated content, Call, Hyperlink and Wikipedia, can also be explicitly generated and tailored by the user of the authoring tool. This permits the content author to tailor the graphic for each case within a viewed publication or page 10 based on artistic considerations.
Transferring of Images of Printed Pages to the Page Server
Reference images 210 of pages 10 are loaded onto the page server 20 (see Figure 10) using one or more of the following means.
• Reference images 210 of pages 10 can be uploaded to the page server 20 directly using a computer network, or copied via a removable computer storage medium.
• Printing or publishing companies generally use a pipeline of printing operations to create the print-ready images for publications from input supplied by graphic artists or other content suppliers. At an appropriate point in this pipeline, reference images 210 of the pages 10 to be printed are diverted and uploaded to the page server 20, either automatically, semi-automatically or with manual intervention.
• Reference images 210 are uploaded directly from a device driver which provides the interface of a printer driver, but which writes its output to the page server 20 either directly, over a network protocol, or other indirect method.
• Reference images 210 are uploaded to the page server 20 under control of a web-based application.
• Reference images 210 are uploaded to the page server 20 by a smart printer when that printer receives them for printing. They are either directly uploaded, or they are loaded onto some form of removable media on the printer and then transferred to the page server 20 using that removable media.
• Reference images 210 are uploaded to the page server 20 by a smart copier when that copier scans them for printing by the copier, or when they are scanned for a specific upload function performed on the copier. They are either uploaded directly or they are loaded onto some form of removable media on the copier and then transferred to the page server 20 using that removable media.
• Pre-printed documents or legacy documents without existing digital versions are scanned. The scanned images become reference images 210 subsequently uploaded to the page server 20. Typically, they have optical character recognition (OCR) processing performed on them before uploading to assist with later operations on these images.
Displaying Interactive Content and Receiving User Feedback
Means of Displaying Interactive Content Options
When the viewing device 100 is positioned such that interactive content becomes active, there are many options for indicating to the user that interactive content is available. Among these are:
• the appearance of an icon on the digital twin 107;
• visually enhancing the text of a hyperlink 117 (see Figure 9), for example by underlining, changing its color, changing its font, font style or font size, changing its text background, making it flash, cycling its appearance through a number of the above options, or changing some other aspect of its typesetting;
• in the case of a video link, playing the video in the location of the link in the viewing device 100;
• displaying an option in the menu bar of the viewing device 100, or other menu function of the viewing device 100;
• highlighting the interactive object or region with a static highlight, or with a dynamic highlight;
• playing an animation near, within, covering or surrounding the interactive object or region;
• playing a voice informing the user of sufficient detail of the interactive content to allow them to activate it, that voice being either synthesized or pre-recorded; or
• playing a voice recording of a question asking the user if they wish to activate the interactive content.
User Feedback on Decisions Relating to Interactive Content
When a user activates some interactive content it is usual, but not required, to give the user some feedback that the content has been activated. This can be achieved in many ways, including:
• changing the execution context or visual context of the viewing device 100 to the newly activated context;
• playing an audible sound, such as a click or beep;
• enhancing the color or brightness of a region on the viewing device 100, for example, making it brighter, where that region may or may not be directly related to the interactive content;
• muting the color of non-selected items on the viewing device 100;
• in the case of a text hyperlink, modifying the text by changing its color, changing its background, changing its font or font size, making it flash, cycling its appearance through a number of the above options, or changing some other aspect of its typesetting;
• applying a geometric enhancement to a region of interest on the viewing device 100, for example, enlarging it, making it appear to move or shake, rotating it either in the plane of the view or through the plane of the view, or other transformation;
• using a vibration or other haptic feedback on the viewing device 100;
• playing a voice message, either synthesized or pre-recorded;
• when the viewing device 100 is a 3D device, bringing the region of interest forward in the view;
• playing an animation on the viewing device 100, either overlaying the region of interest or elsewhere;
• popping up a menu in the viewing device 100, where that menu gives access to one or more options;
• displaying an option in the menu bar of the viewing device 100, or other menu function of the viewing device 100; or
• where the presence of the interactive content has been displayed in a menu, removing the menu item from the menu.
In the preferred embodiment, interactive content includes specifications of regions within a digital twin 107 that have features associated with them or are pre-defined clipping regions (described below). Within a single page or a multi-page spread, these regions may overlap or be nested. While the user tracks their view over the publication, the application determines at most one region which would be the target for a pre-defined clip at each point in time, depending on the current view, and may give the user on-screen feedback of the bounds of this region.
One method of determining which region is the target, based on the view, is to find all regions that intersect the middle of the user's view. If there are multiple such regions, then for each region the minimum of the percentage of the region that is visible and the percentage of the screen the region covers is calculated. The region with the largest value (i.e. the largest minimum) is chosen.
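A minimal sketch of this selection rule, assuming regions and the current view are axis-aligned rectangles in page coordinates; that representation is an assumption for illustration.

```python
def choose_region(regions, view):
    """Pick the single clip-target region for the current view.

    regions and view are axis-aligned rectangles in page coordinates,
    given as (x, y, width, height) tuples (an assumed representation).
    """
    def area(r):
        return r[2] * r[3]

    def intersection(a, b):
        x = max(a[0], b[0])
        y = max(a[1], b[1])
        w = min(a[0] + a[2], b[0] + b[2]) - x
        h = min(a[1] + a[3], b[1] + b[3]) - y
        return (x, y, w, h) if w > 0 and h > 0 else None

    def contains(r, px, py):
        return r[0] <= px <= r[0] + r[2] and r[1] <= py <= r[1] + r[3]

    centre = (view[0] + view[2] / 2.0, view[1] + view[3] / 2.0)
    best, best_score = None, -1.0
    for region in regions:
        if not contains(region, *centre):
            continue                       # only regions under the view centre
        overlap = intersection(region, view)
        if overlap is None:
            continue
        visible_fraction = area(overlap) / area(region)   # how much of the region is in view
        coverage_fraction = area(overlap) / area(view)    # how much of the screen it covers
        score = min(visible_fraction, coverage_fraction)
        if score > best_score:
            best, best_score = region, score
    return best
```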
Having established a region in this manner, a bounding outline of the region is animated into position as an overlay graphic over the tracked digital twin 107. The introduction of this overlay may be delayed until the user's view is seen to have low motion, that is, they are lingering over a part of the content.
Types of Viewing Devices
Referring to Figures 7, 8 and 9, the viewing device 100 for the disclosed interactive viewing system 2 is required to display interactive digital content 103 to the user. The viewing device 100 usually enables user-interactivity with displayed digital content and may, for example, comprise a touchscreen 105 or control buttons for interacting with the displayed digital content.
In the case where the viewing device 100 is a smartphone, the viewing device 100 performs the three functions of (i) sensing the substrate 10 via the camera 102; (ii) retrieving corresponding display data 103; and (iii) displaying the interactive digital content 116 via the smartphone's touchscreen 105. The viewing device 100 may integrate these three functions. Alternatively, the viewing device 100 may only perform the function of displaying the digital content 116, optionally in combination with the function of retrieving the display data 103. If the viewing device 100 is not configured for sensing the viewed substrate 10 (typically the case for non-handheld viewing devices), then the viewing device 100 may be connected to a suitable device for sensing the substrate 10. The sensing device may be connected to the viewing device 100 via a wired or wireless connection.
Likewise, if the viewing device 100 is not configured for retrieving the display data 103, then this function may be performed by a separate device. For example, an internet-enabled computer connected to the viewing device 100 may receive image data from the sensing device and retrieve the display data which is communicated to the viewing device 100. Alternatively, the sensing device may perform the dual functions of sensing the substrate 10 and retrieving the display data.
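The division of responsibilities described above can be pictured as three roles that may be combined in one device or split across several. The sketch below is purely illustrative; the interface names are assumptions.

```python
from abc import ABC, abstractmethod

class Sensor(ABC):
    """Captures images of the substrate (e.g. a camera or imaging mouse)."""
    @abstractmethod
    def capture_image(self) -> bytes: ...

class Retriever(ABC):
    """Sends interaction data to the page server and retrieves display data."""
    @abstractmethod
    def retrieve_display_data(self, image: bytes) -> dict: ...

class Display(ABC):
    """Renders the retrieved interactive digital content to the user."""
    @abstractmethod
    def render(self, display_data: dict) -> None: ...

class ViewingDevice:
    """A viewing device composed of the three roles; a smartphone integrates
    all three, while a TV display might delegate sensing to an imaging mouse
    and retrieval to a connected computer."""
    def __init__(self, sensor: Sensor, retriever: Retriever, display: Display):
        self.sensor, self.retriever, self.display = sensor, retriever, display

    def view(self) -> None:
        image = self.sensor.capture_image()
        display_data = self.retriever.retrieve_display_data(image)
        self.display.render(display_data)
```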
Some examples of different types of viewing device 100 are described below:
Smartphone
The smartphone 100 shown in Figure 7 is a type of handheld viewing device 100 comprising a mobile telephone transceiver, a view-facing camera 102, a processor and a display, which is usually a touchscreen 105. Smartphones may run one or more applications which configure the smartphone to operate in a particular way. A plethora of apps is currently available for smartphones, turning the user's smartphone into a tool far beyond a simple communications device. The interactive viewer system 2 described herein is intended to be accessible to users having a corresponding viewer application 190 running on their smartphones (see Figure 10).
Smartphones may incorporate one or more devices such as a GPS receiver, a device for connecting to a local area network (e.g. Wi-Fi device), a device for connecting to a personal area network (e.g. Bluetooth device), a near-field sensor, RFID sensor, barcode reader, a user-facing camera, accelerometer(s), gyroscope(s), temperature sensor, magnetic field sensor, pressure sensor, chemical sensor etc. Many of these devices are already standard features of existing smartphones. It is anticipated that more of these devices will become standard features of smartphones in the future. Furthermore, processing speeds and internet connection speeds are anticipated to improve dramatically with the demand for more sophisticated apps. Likewise, camera resolution and speech-recognition capabilities of smartphones are anticipated to improve in the future. The scope of this disclosure is not intended to be limited to features of smartphones currently on the market.
Tablet Computer
A tablet computer is an alternative type of viewing device 100, which may be used in the interactive viewer system 2 described herein. A tablet computer is functionally equivalent to the smartphone and may have the same or similar features. However, a tablet computer typically has a larger screen. Figure 39 shows an interaction between a tablet computer 766 and the substrate 10. The tablet computer 766 displays rendered digital content to the user in a similar way to the smartphone 100.
Notebook Computer
A notebook or laptop computer is an alternative type of viewing device 100, which may be used in the interactive viewing system described herein. Typically, a laptop computer has a hinged display screen and keyboard. Many laptop computers comprise an integral user-facing camera. In the context of the present disclosure, when a laptop computer is used as the viewing device 100, it typically operates in combination with a separate device for sensing the substrate. Such a sensing device may be connected to the laptop computer via a wired or wireless connection. Referring to Figure 40, there is shown a sensing device 808 in the form of a mouse or a puck 768, which is in wireless communication with a notebook computer 111. Other types of suitable sensing devices will be described in more detail herein.
The mouse 768 comprises a high-resolution camera 102 for imaging the substrate 10 when held at a suitable height above the substrate. Hence, the mouse 768 can capture an image 14 of the substrate 10 in the same way as the smartphone 100. The mouse 768 communicates the captured image data to the notebook computer 111, which is itself in communication with the page server 20 via the Internet. Accordingly, the notebook computer 111 retrieves digital display data corresponding to the printed content of the substrate. The retrieved display data is rendered as digital content on a display screen of the notebook computer 111. The rendered digital content may be, for example, a virtual or augmented reality view of the printed content with interactive options available to the user.
Desktop Computer
Referring to Figure 41, a desktop computer 112 with monitor 114 is an alternative type of viewing device 100, which may be used in the interactive viewing system described herein. The desktop computer with monitor performs an equivalent function to the notebook computer 111 described above. Typically, the desktop computer 112 receives image data from a sensing device 808 in the form of a wired or wireless mouse 768 and sends corresponding interaction data to the page server 20 via the Internet. Display data retrieved by the desktop computer 112 may be rendered to the monitor 114 as digital content with optional interactive functions, as described above.
TV Display
Referring to Figure 42, a TV display 115 is an alternative type of viewing device 100, which may be used in the interactive viewer system 2 described herein. The TV display 115 preferably has Internet connectivity enabling communication with the page server 20. The Internet connectivity may be integrated into the TV display or the TV display may be connected to a suitable computer system via a wired or wireless connection. The TV display 115 functions similarly to the notebook computer 111 described above. Typically, the internet-enabled TV display 115 receives image data from a sensing device 808 in the form of a wired or wireless mouse 768 and sends corresponding interaction data to the page server 20 via the Internet. Retrieved display data is rendered to the TV display 115 as digital content with optional interactive functions, as described above.
Head-Mounted Display
A head-mounted display is an alternative type of viewing device 100, which may be used in the interactive viewer system described herein. Figure 43A is a perspective view showing a user 902 viewing a page 10 through a head-mounted display (HMD) 113. Figure 43B is a schematic plan view of the user 902 observing the field of view 910 through the HMD 113. Figure 43C shows the user's field of view 910 including the display screen 105.
The HMD usually takes the form of a helmet or goggles worn by the user 902. A centrally mounted camera 102 provides the sensing device 808 for the HMD 113. The camera 102 captures a video feed of the user's field of view 910 (which includes the page 10). A small display screen 105 is placed in front of either or both of the user's eyes and displays a computer-generated, virtual reality view or augmented reality view 912 to the user 902. Virtual reality substitutes the user's view of the physical environment with an artificial, or virtual view 912. Alternatively, the user 902 may experience augmented reality 912 by adding virtual imagery 914 to the user's actual view 910 of the physical environment.
Augmented reality typically relies on either a see-through HMD or a video-based HMD. A video-based HMD 113 uses video of the user's field of view 910, augments it with virtual imagery 914, and redisplays it for the user's eyes. A see-through HMD 113 optically combines virtual imagery 914 with the user's actual field of view 910.
In the context of the present interactive viewing system 2 (see Figure 8), an HMD 113 may be used as a virtual reality viewing device or an augmented reality viewing device 100. In one example of a virtual reality experience, the user views the substrate 10 via a see-through HMD 113. The HMD 113 has an integral image sensor 102 which captures images of the user's field of view 910. The network interface 120 communicates corresponding interaction data 101 to the page server 20 via the Internet. The network interface 120 need not be integrated into the goggles 113. A communication link 904 to a mobile device 900 such as a notebook computer or mobile phone acting as a relay can provide a more powerful processor 106 for the viewer application 190. The HMD 113 then retrieves display data 103 and displays digital content 116 to the user. This rendered digital content 116 replaces the user's actual field of view 910 so that the user sees only the virtual imagery 914. Replacement of the user's actual field of view 910 with the rendered digital content 116 may be imperceptible to the user.
In another example of a virtual reality experience, the user views the substrate 10 via an opaque HMD 113 which displays images captured by the camera 102. As described above, the opaque HMD 113 renders and displays digital content 116 corresponding to the content in the camera field of view 14. Again, replacement of the camera field of view 14 with the rendered digital content 116 may be imperceptible to the user.
Figure 43C shows an augmented reality experience. The user views the substrate 10 via a see-through HMD 113. The user's field of view is then augmented with virtual imagery 914 displayed on the see-through HMD. This virtual imagery 914 comprises rendered digital content 116 augmenting the user's actual field of view 910 to provide an augmented reality view 912. This virtual imagery 914 augmenting the user's field of view is generated from display data 103 retrieved from the page server 20 by the viewing device 100, as described above in connection with Figure 8.
In another example of an augmented reality experience, the user views the substrate 10 via an opaque HMD 113 which displays images captured by the camera 102. The opaque HMD 113 renders and displays digital content, which augments the content in the camera field of view 14 displayed to the user. This virtual imagery 914 augmenting the camera field of view 14 is generated from display data 103 retrieved by the viewing device 100, as described above in connection with Figure 8.
The digital twin 107 is displayed in response to recognition of the page 10 in the field of view 910. In this mode of operation, the digital twin 107 remains anchored to the display screen 105 as long as the page is in view (that is, it is sensitive to the presence of the page 10 in the field of view 910 but does not track its position in detail). A region 906 described in the augmentation 220 of the digital twin 107 may be selected, shown with a highlight, and become the subject of further commands 828 from the user. Methods of selecting and changing the selected region include the user bringing the page area to the centre of the augmented reality view 912, moving a pointer on the screen by touching a pointing stick (isometric joystick) on the frame of the HMD, and moving a finger over the surface of the mobile device 900. Further commands such as clip and share 828, or select a different region, can be invoked by the user by methods including voice recognition of simple commands and moving a pointer as described above.
Dedicated Document Viewer
In another alternative, the viewing device 100 may be in the form of a dedicated document viewer 118 as shown in Figure 44. In this scenario, the user has a device 118 specifically adapted for interacting with substrates 10 (see Figure 8). The device 118 may even be specifically adapted for interaction with a certain type of substrate, for example a particular magazine, newspaper, board game etc. Such a device has the disadvantage that users are required to own a separate device which is not a ubiquitous smartphone. However, there are several advantages to dedicated document viewers. For example, a magazine publisher may offer a dedicated document viewer as part of a magazine subscription or as a free give-away. The dedicated document viewer 118 may then serve as an incentive for users to purchase a particular magazine if it provides the user with interactivity not offered by competitor magazines.
Furthermore, the memory in the dedicated document viewer 118 can have an index corresponding to a number of magazines published by the same magazine publisher. This index may be updated by the user (e.g. via an Internet download) when a new magazine is published. Therefore, the dedicated document viewer 118 may not be reliant on a high-speed Internet connection for page recognition. This reduces latency and potentially provides an improved user experience of interactivity.
Of course, the dedicated document viewer 118 may communicate with a remote page server 20 for page recognition. In this scenario, the accuracy of page recognition is optimized because the document viewer 118 may be configured to communicate with one specific database known to the document viewer.
The dedicated document viewer 118 may have other features common to smartphones. However, it will be appreciated that the marketing potential of a dedicated document viewer 118 is leveraged by keeping its cost to a minimum. Therefore, the dedicated document viewer 118 may have a minimum number of components necessary for interactive viewing, e.g. a display screen 105 (typically a touchscreen), a processing system, a sensing device and a memory.
As shown in Figure 44, the dedicated document viewer 130 may be branded 770 to show its association with a particular magazine title.
Handheld Game Console
In another alternative, the viewing device 100 may be in the form of a handheld game console 135 as shown in Figure 45. A handheld games console usually comprises a display screen 105 and various control keys 772 specifically configured for playing games. More recently, some handheld game consoles have incorporated many of the features of smartphones, such as wireless internet connectivity, high-resolution camera, mobile phone transceiver and so on. Such hybrid devices are well suited for use as the viewing device 100 in the interactive viewer system 2 described herein.
It will be appreciated that the handheld game console 135 shown in Figure 45 may be used in place of the smartphone shown in Figure 7. The handheld game console 135 is typically equipped with a suitable sensing device 808 such as a high-resolution camera 102 for interacting with various substrates 10.
A particularly advantageous use of handheld game consoles 135 is the ability to download games thereto after recognition of a suitable substrate 10. For example, a games merchant may advertise a new game in a magazine advertisement 774 and a user may be provided with access to this new game by interacting with the advertisement 774 in a manner described above in relation to Figures 6 to 9. The game downloaded to the game console 135 is simply another type of display data 103 which is retrievable by the user's viewing device 100, in this case, the game console 135.
For marketing purposes, the user may be provided with, for example, temporary access to the game, access to a trial version of the game or access to the game only whilst the game console 135 has the advertisement 774 within its field of view 14 (e.g. as an augmented or virtual reality experience of the game in the context of a magazine page 10). In this way, advertisements for games become a powerful means by which users can trial games and thereby encourage users to purchase those games. As part of a trial version of a game, the user may be provided with a "Buy Now" hyperlink 778 so that the game can be immediately purchased via a download to the game console. It will be appreciated that the handheld game console 135, when used as a viewing device 100 in the interactive viewer system 2 described herein, significantly improves the advertising and distribution of games to users.
Of course, any content may enable users to play games via the viewing device 100. An advertiser may provide games to users as a means for appealing to a particular demographic. For example, a soft drinks advertiser may have printed advertisements with interactive games that are accessible to handheld games consoles 135. A user may be able to play a game connected with the advertiser's product, and this potentially serves as a powerful advertising tool. Of course, the other viewing devices 100 described herein, such as smartphones and tablet computers, may enable similar game-playing functionality in connection with a printed substrate 10.
Media Player
The viewing device 100 may be in the form of a media player 136 as shown in Figure 46. In its broadest sense, a media player encompasses devices such as smartphones and tablet computers which have media playback capabilities. However, the viewing device 100 may be any form of media player 136 provided that it has a suitable display screen 105 (e.g. touchscreen) for rendering the digital content 103 (see Figure 8) to the user. Some examples of media players include MP3 players, digital photo frames, portable video consoles, e-book readers etc.
Increasingly, media players are equipped with a high-resolution camera 102 and wireless connectivity so that they are highly suited for use as the viewing device 100 in the interactive viewer system 2 described herein.
As described above in connection with handheld games consoles 135, printed advertisements 774 may be tailored to users of certain media players 136, as well as more ubiquitous tablet and smartphone users. For example, when an advertisement 774 for a new film on a magazine page 10 is in the camera view 14, the user may view a film clip on the screen 105 with the option of "Buy Film" as a hyperlink 778. Likewise, a printed advertisement for a music album may enable a user to listen to a music clip with the option of "Buy Album" as a hyperlink 782 displayed. Or a printed advertisement for a new book may enable a user to preview pages from the book with the option of "Buy Book" displayed as a hyperlink 780.
The media player 136 is typically equipped with a touchscreen 105 and wireless connectivity. The inbuilt sensing device 808 is a high-resolution camera 102 for interacting with substrates 10 in an analogous manner to the smartphone 100 shown in Figure 7.
Projection Display
The viewing device 100 may use a projection display rather than an electronic display screen as previously described. Referring again to Figure 40, there was shown a sensing device 808 in the form of the mouse 768 with image sensor 102. The mouse 768 is in wireless communication with a notebook computer 111, which receives image data, retrieves corresponding display data 103 from a remote page server 20 and renders digital content (e.g. an augmented or virtual reality display) to the user via the integral display screen of the notebook computer 111. Referring now to Figure 47, the notebook computer 111 may be connected to a projector 145, which displays the rendered digital content onto a passive surface (not shown). Accordingly, a projected virtual or augmented reality display is viewable by the user or a wider audience.
Of course, it will be appreciated that the projector 145 may incorporate suitable means for receiving the image data from the imaging mouse 768 and retrieving the display data without the need for a separate computer, such as the notebook computer 111.
Visual Display of Content
The displayed digital content rendered from retrieved display data may be displayed in a number of different ways.
Augmented Reality Display of Live Video Image
Referring to Figure 48, the viewing device 100 displays rendered digital content to the user as an augmented reality display of a live video image. In this scenario, the user views a live video image 784 of the substrate 10 augmented with computer-generated content 786, based on retrieved display data 103 (see Figure 8). Typically, the computer-generated digital content 786 is graphics (still graphics or video graphics 760) rendered to the display screen 105 of the viewing device 100 based on the display data 103. However, the digital content 786 may alternatively or additionally include other media, such as audio.
Typically, the digital content 786 providing the experience of augmented reality is associated with a particular location or page element 758 of the substrate 10. Therefore, the digital content 786 may change as the user moves the sensing device (typically integrated with the viewing device 100) across the substrate 10. By way of example, when a user views a newspaper page containing different articles, the live video image of the page may be augmented with readers' comments or a box indicating a number of readers' comments associated with a particular article. These comments may be derived from an online version of the same article, previous users viewing the printed article via their viewing devices 100 or both. Since the comments box is associated with a particular article, the box will appear and disappear from the display screen 105 depending on whether the sensing device 100 has a particular article within its field of view.
In another example, a video may play when the sensing device 100 has a particular zone within its field of view. Video playback may be initiated automatically by the viewing device 100 or in response to user interaction with a video playback icon 104 (see Figure 9) displayed to the user. In another example, a user may be able to play a game via the digital content augmenting the live video image.
As will be appreciated, the digital content providing augmented reality is typically interactive via the viewing device 100, for example via the touchscreen 105 of a tablet computer 766 (see Figure 39), via the control buttons of a handheld games console and so on. In the newspaper example above, the user may have the option of reading comments associated with a particular article by tapping on the comments box. Further, the user may be provided with the option of leaving his or her comment so that it is then associated with the printed article (and/or corresponding online article) on subsequent viewing. In this way, it will be appreciated that the interactive viewing system 2 can provide a printed publication with a similar degree of interactivity to an online publication via digital content augmenting a live video image.
Virtual Reality Display
In another form, the viewing device 100 displays rendered digital content 786 to the user as a virtual reality display in real-time. Virtual reality display is reliant on accurate page tracking. Typically, the user first views a live video image of a page 10 via the display screen of the viewing device 100, which may be the smartphone 100 incorporating the camera 102. Once the page 10 is recognized and the relative location of the viewing device 100 has been determined, the displayed live video image 784 is replaced with retrieved digital content 786 providing a virtual reality display of the portion of the page 10 within the camera's field of view. Movement of the viewing device 100 relative to the page 10 is tracked via the captured live video image 784 so that the displayed digital content maintains the experience of virtual reality for the user. The virtual reality display may be an orthogonal virtual reality (described below) or a perspective virtual reality.
Orthogonal Virtual Reality Display
Orthogonal virtual reality displays render digital content to the user based on the assumption that an optical axis of the sensing device 100 is orthogonal to the plane of the viewed page 10. In other words, the appearance of the digital content displayed on the screen 105 of the viewing device 100 does not change even when the camera 102 or other sensing device is tilted relative to the page 10 - the rendered digital content 116 always appears in the same plane as the display screen 105 (see Figure 9). Orthogonal virtual reality has the advantage of being computationally relatively simple to implement, although it has the disadvantage of being less realistic from the user's perspective when the viewing device 100 is tilted relative to a viewed substrate 10 such as a printed page. The user interface of the viewing device 100 may prompt or encourage the user to hold the viewing device 100 so that the plane of the display screen 105 is parallel with the plane of the page 10. Not only does this assist in capturing higher quality images for page recognition and/or page tracking, but it also maintains a more realistic experience for the user when orthogonal virtual reality is implemented.
Perspective Virtual Reality Display
Perspective virtual reality displays rendered digital content 116 to the user taking into account the pose of the viewing device 100 relative to the viewed page 10, as shown in Figure 49. In other words, the rendered digital content 116 is displayed to the user as a keystoned projection depending on the relative pose of the viewing device 100. This provides a more realistic virtual reality experience to the user, because the display screen 105 appears, from the user's perspective, to act as a transparent viewport onto the viewed page 10 regardless of its pose. In order to calculate the displayed projection of the digital content, the viewing device 100 must determine (or at least estimate) both the page-device pose and the user-device pose. The page-device pose may be determined by comparing features in the live video image with features in the reference image 210 of the viewed page 10 (see Figure 10) - keystoned features in the live video enable a projection transform and the device-page pose to be calculated. Typically, perspective virtual reality assumes that the user's eyes Pe are positioned on a normal N to the display screen 105 in order to estimate the user-device pose. Alternatively, a user-facing camera of the viewing device 100 may be used to determine the position of the user's eyes Pe (or head) relative to the display screen 105. Thus, a more accurate user-device pose may be available to calculate the displayed projection of the digital content 116.
In Figure 49, the viewing device 100 is tilted relative to the substrate 10 and a printed graphic element 788 on the substrate 10 is rendered to the display screen 105 as a projected image 790 in accordance with the estimated device-page and user-device poses. US Publication No. 2011/0292198 describes perspective virtual reality projections in more detail when using a smartphone as the viewing device 100.
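A sketch of how the projection transform might be estimated from the live video and the reference image 210, using feature matching and a RANSAC homography fit. OpenCV and the specific detector (ORB) are illustrative choices; the patent does not prescribe a particular feature detector or solver.

```python
import cv2
import numpy as np

def estimate_page_homography(reference_img, live_frame):
    """Estimate the projection transform (homography) between the reference
    image of the page and the current camera frame, from which a page-device
    pose can be derived. A sketch only, not the system's actual pipeline.
    """
    orb = cv2.ORB_create(1000)
    kp_ref, des_ref = orb.detectAndCompute(reference_img, None)
    kp_live, des_live = orb.detectAndCompute(live_frame, None)
    if des_ref is None or des_live is None:
        return None

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_ref, des_live), key=lambda m: m.distance)
    if len(matches) < 4:
        return None

    src = np.float32([kp_ref[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_live[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    homography, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return homography
```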
Cross-Fading Virtual Reality Display
Advantageously, the viewing device 100 may provide a seamless transition between a display of the live video image and a virtual reality display of the digital content 116. This may be achieved by gradually fading out the live video image display 784 (see Figure 48) and gradually fading in the virtual reality display so that the change is virtually imperceptible to the user. From the user's perspective, the displayed content on the viewing device 100 appears to be identical regardless of whether the viewing device 100 is displaying the live video image 784 (having no interactive functionality) or the rendered digital content 116 (having interactive functionality). The user may receive an audible or visual indication via the viewing device 100 to indicate that the interactive digital content 116 is being displayed, without an obvious 'jerky' transition between the two. This smooth experience for the user is achievable via the accurate page tracking methods, which include an accurate determination of the orientation or pose of the viewing device 100 relative to the page 10.
Combined Augmented Reality and Virtual Reality Display
Figure 9 shows an example of a virtual reality display augmented with interactive content. A simple virtual reality display of rendered digital content 116 inherently provides the user with a richer experience of a printed page 10 or other substrate compared with simply reading the printed page or viewing a live video image of the page 10. For example, the user has the option of magnifying the digital content 116 via a simple touchscreen interaction; sharing the displayed digital content or at least a portion thereof; performing a search (e.g. Google, Wikipedia, etc.) on one or more keywords in the displayed digital content, and so on. All of these options as well as many others are facilitated by viewing rendered digital content 116 on the viewing device 100. However, as will be appreciated, the interactive viewing system 2 typically has greatest value to the user when the virtual reality display is combined with augmented reality. In other words, the virtual reality display is augmented with additional interactive content. In this case the virtual reality display of the term "200+ Race Cars" is augmented with a hyperlink 117, and the graphic of the race car is augmented with a video playback icon 104 inviting the user to play a video relating to this graphic. It will of course be appreciated that the combination of virtual and augmented reality provides the user with a plethora of interactive options.
Overlay Augmentation
In addition to showing live video 784, augmented reality that tracks the substrate, virtual reality display of the digital twin 107 that also tracks the substrate, and augmentation of the digital twin, the viewer application 190 (see Figure 10) is also configured to display another type of content augmentation. A region of a substrate 10 may be associated with an overlay augmentation 119 as shown in Figures 50A and 50B. An overlay augmentation 119 has a stable location with respect to the viewing device screen 105, but its presence and absence is dependent on the region the user is viewing and their viewing behavior. This overlay augmentation 119 may include simple static elements, dynamic elements (animation or video) and/or interactive elements.
Typically an overlay augmentation 119 is implemented as HTML graphical elements. When a user's viewing behavior is classified as "lingering" over a region with an associated overlay augmentation 119, the augmentation is animated onto the device's screen 105. Typically the overlay augmentation 119 obscures only a fraction of the other displayed element (such as the tracked digital twin 107 with its augmentation) and may be partially transparent. "Lingering" is typically determined by the centre of the user's view (i.e. the screen 105) being within the region for at least a third of a second, and by the movement of the viewing device 100 relative to the page 10 being low. Other methods of triggering an overlay augmentation 119 are possible. In particular, sudden removal of the viewing device 100 from viewing of the substrate 10 may trigger or maintain an overlay augmentation 119.
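A minimal sketch of the lingering test described above; the class and parameter names are assumptions, and the view-change measure it consumes is the one described in the later section on measuring the view rate of change.

```python
LINGER_SECONDS = 1.0 / 3.0   # the "at least a third of a second" threshold
LOW_MOTION_UNITS = 50        # "low" view change between frames (see below)

class LingerDetector:
    """Decide when to animate an overlay augmentation onto the screen.
    A minimal sketch; the structure is an assumption for illustration."""

    def __init__(self):
        self.time_in_region = 0.0

    def update(self, centre_in_region, view_change, dt):
        """Call once per tracked frame.

        centre_in_region: True if the centre of the screen (the user's view)
        falls inside the sensitive region; view_change: normalized view-change
        units between the previous and current frames; dt: seconds since the
        previous frame.
        """
        if centre_in_region and view_change < LOW_MOTION_UNITS:
            self.time_in_region += dt
        else:
            self.time_in_region = 0.0
        return self.time_in_region >= LINGER_SECONDS
```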
While activated, the overlay augmentation 119 remains static with respect to the device screen 105, even though the user may be moving the view within the sensitive region and the digital twin 107 is tracking this movement (see Figure 50B).
The animation introducing an overlay augmentation 119 may be a brief fade-in, although other methods can be used. In particular, the overlay augmentation 119 may animate as a transition from specific elements in the digital twin 107 into an overlay augmentation 119 with a stable screen position. For example, an invitation to buy a product may have a button in the digital twin 107. Once it is detected that the user is lingering over the region, the button may graphically transition from being part of the digital twin 107 augmentation to an overlay augmentation 119, and the effect may reverse when the user moves the viewing device 100 in such a way that the overlay augmentation 119 is not displayed.
A second example is a picture that has a video augmentation. Initially, the video may play in situ, tracking the substrate 10; then, if the viewing device 100 lingers, the video will "lift off" the page and orient in a stable position with respect to the device screen 105. After playing, it returns to its original position in the digital twin 107 of the page and tracking continues.
At any one time the viewing device 100 may display any mixture of:
• The live video 784 (including background view outside the recognized substrate).
• A digital twin 107, which may be displayed in either perspective mode or flat (i.e. orthogonal) mode and which tracks the recognized page 10.
• Augmented reality elements (e.g. hyperlink 117) that track the recognized substrate.
• Overlay augmentation 119 that is fixed to the device screen 105, but whose presence is dependent on viewing location and behavior.
Measuring View Rate of Change Including Lingering and Sudden Motion
The primary method of measuring the rate of change of the camera view point 802 is shown in Figure 51. View rate of change is determined by comparing where a prior view 792 from a previous time is positioned in the space of a later view 798. The prior view 792 and the later view 798 can be related because they are both projection transforms onto the viewed substrate 10 (typically a page). Thus, the corners of the prior view 792 can be projected to discover the area 794 of the substrate 10 viewed by the prior view 792. These coordinates can then be inverse projected by the projection transform associated with the later view 798. This gives a projection 796 of the prior view 792 in the space of the later view 798.
This results in two sets 804 and 806 of four corner coordinates in the space of the later view 798. Each set is taken as a single vector of 8 values, and the Euclidean distance 800 between the two vectors is calculated. For the purpose of this calculation, all coordinates are normalized so that the long edge of the later view is 100 units long. This overcomes differences between device screen sizes.
If a rate of view change 802 is being determined, this measure is divided by the difference in time, in seconds, between when the video frames used to determine the prior view 792 and later view 798 were sampled. The video capture rate is typically 10 images per second (0.1 seconds between successively captured images), but capture rates of 20 images per second to 1 image per second are not unusual (0.05 sec to 1.0 sec between images).
A low rate of movement is characterized as being below 50 units difference in the view change 802 from the prior image 792 to the later image 798 (i.e. less than half the length of the long side of the display screen 105). A high rate of movement is characterized as over 200 units difference in the view change 802 (i.e. more than twice the length of the display screen long side). Any value between these two is considered a moderate rate of view change 802.
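By way of illustration only, the following Python sketch computes this view-change measure. It is a minimal, hypothetical implementation: the function name, the helper routine and the homography conventions (one transform mapping prior-frame pixels onto page coordinates, another mapping page coordinates into later-frame pixels) are assumptions, and both frames are assumed to share the same pixel dimensions; the corner projection, the 100-unit normalization of the long edge, the 8-value Euclidean distance, the per-second rate and the 50/200-unit thresholds follow the description above.

import numpy as np

def view_change(prior_to_page, page_to_later, frame_w, frame_h, dt=None):
    """Measure view change between two frames (sketch; see assumptions above).

    prior_to_page : 3x3 homography mapping prior-frame pixels onto page coordinates.
    page_to_later : 3x3 homography mapping page coordinates into later-frame pixels.
    frame_w, frame_h : frame dimensions in pixels (assumed equal for both frames).
    dt : seconds between the two frames; if given, a rate in units/second is returned too.
    """
    def apply(h, pts):
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])   # homogeneous coordinates
        out = (h @ pts_h.T).T
        return out[:, :2] / out[:, 2:3]                    # perspective divide

    # Corners of the prior view, expressed in its own pixel space.
    corners = np.array([[0, 0], [frame_w, 0], [frame_w, frame_h], [0, frame_h]], float)

    # Project the prior-view corners onto the page, then into the later view's space.
    on_page = apply(prior_to_page, corners)
    in_later = apply(page_to_later, on_page)

    # Normalize so the long edge of the later view is 100 units long.
    scale = 100.0 / max(frame_w, frame_h)
    a = (corners * scale).ravel()    # the later view's own corners, as an 8-value vector
    b = (in_later * scale).ravel()   # the projected prior-view corners, as an 8-value vector

    change = float(np.linalg.norm(a - b))   # Euclidean distance between the two 8-vectors

    if change < 50:
        category = "low"       # less than half the long edge of the screen
    elif change > 200:
        category = "high"      # more than twice the long edge of the screen
    else:
        category = "moderate"

    rate = change / dt if dt else None
    return change, category, rate

Dividing the distance by the inter-frame interval (the dt argument) yields the rate of view change 802; a low-movement result, combined with the centre of the view remaining within the region for a third of a second, gives the lingering condition described below.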
A lingering condition is determined by both a low rate of movement, and the centre of the camera view always being within the surface region under consideration over a third of a second period (0.333 sec).
Wavelength of Sensing Device
Visible
In this case, the sensing device used for imaging the substrate 10 is a high-resolution camera 102 built into the viewing device 100. Typically, the camera 102 captures an image of a portion of the substrate 10 and the viewing device 100 sends corresponding interaction data to the page server 20 for page recognition. Page recognition typically relies on extracting features from the captured image and comparing those features with an inverted index contained in a database (such as the page server reference database 250 described above). Once a match is found, the page 10 is recognized, albeit within certain confidence limits.
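By way of illustration only, the following Python sketch shows one plausible form of such an inverted-index lookup. It assumes that feature extraction and quantization of features into integer "visual word" identifiers happen elsewhere; the class name, the voting scheme and the confidence measure are illustrative assumptions rather than the specific matching method used by the page server 20.

from collections import defaultdict, Counter

class InvertedPageIndex:
    """Minimal sketch: page recognition by voting over an inverted index that
    maps quantized image features ("visual words") to candidate page identifiers."""

    def __init__(self):
        self.index = defaultdict(set)          # word id -> set of page ids

    def add_page(self, page_id, word_ids):
        """Index a reference page by the visual words extracted from it."""
        for word in set(word_ids):
            self.index[word].add(page_id)

    def match(self, query_word_ids, min_votes=10):
        """Return (page_id, confidence) for the best-voted page, or (None, confidence)
        if no page reaches the minimum vote count."""
        votes = Counter()
        for word in set(query_word_ids):
            for page_id in self.index.get(word, ()):
                votes[page_id] += 1
        if not votes:
            return None, 0.0
        page_id, count = votes.most_common(1)[0]
        confidence = count / max(len(set(query_word_ids)), 1)
        return (page_id, confidence) if count >= min_votes else (None, confidence)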
The camera 102 captures images using visible wavelengths. By using visible wavelengths, there is no need for any special modifications of a conventional smartphone camera. Moreover, publishers are not required to modify their publications with special inks or devices which can be sensed at different wavelengths. Therefore, the use of visible wavelengths for the sensing device is particularly advantageous.
Since feature extraction and page recognition from captured images may be performed without reference to color, the images may be captured in color or monochrome (e.g. black and white). If page recognition does not make use of color in the captured images, the interaction data 101 (see Figure 8) sent to the page server 20 may be monochrome image data derived from color image data captured by the sensing device 100. If the interaction data 101 sent to the page server 20 contains monochrome image data (as opposed to color image data), this reduced amount of data helps to reduce overall latency in the system. Therefore, a processor associated with the sensing device 100 (e.g. the smartphone processor) may be configured to convert color image data into monochrome image data, depending on how page recognition is being performed in the page server 20. Likewise, the processor may compress the image data defining the query image 109 (see Figure 7) before sending the interaction data 101 to the page server 20.
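As a minimal sketch of this client-side preparation, the following Python fragment (using OpenCV) converts a captured colour frame to monochrome and JPEG-compresses it before upload. The function name and the quality setting are illustrative assumptions, not a prescribed configuration.

import cv2

def prepare_query_image(color_frame, jpeg_quality=70):
    """Convert a captured colour frame (BGR array) to monochrome and compress it,
    reducing the interaction data payload and hence overall latency."""
    gray = cv2.cvtColor(color_frame, cv2.COLOR_BGR2GRAY)
    ok, encoded = cv2.imencode(".jpg", gray, [int(cv2.IMWRITE_JPEG_QUALITY), jpeg_quality])
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    return encoded.tobytes()    # bytes suitable for sending to the page server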
Infrared
Although visible wavelengths advantageously provide interactive viewing via conventional smartphones and the like, other types of sensing device employing different wavelengths are within the ambit of the present disclosure. A sensing device which images infrared wavelengths may be used to detect the substrate, which is typically a substrate having an IR-absorbing coding pattern disposed thereon. By way of example, a Netpage image sensor, as described in US 6,870,966 and US 6,788,293 (the contents of which are herein incorporated by reference), may be used for detecting a coding pattern printed using IR-absorbing ink. Usually, the sensing device incorporates a complementary infrared light source to illuminate substrates carrying IR-absorbing coding patterns. Sensing IR-absorbing coding patterns using an IR sensing device advantageously ensures excellent page recognition and page tracking, because the coding pattern will typically uniquely identify a page and a plurality of coordinate locations on the page. Furthermore, if the coding pattern is invisible to the human eye, it does not obscure or have a visual impact on other text or graphics printed on the substrate. However, sensing using infrared wavelengths requires customized sensing devices, or at least customized modifications to conventional sensing devices. Therefore, visible wavelengths may be preferred in many scenarios.
Ultraviolet
The sensing device may sense the substrate using ultraviolet wavelengths. Many documents, for example, carry UV watermarks which may be detected by the sensing device and used for page recognition. Likewise, a coding pattern may be disposed on a substrate using a UV-absorbing ink.
Radio
The sensing device may sense the substrate 10 using radio wavelengths. Examples of sensing devices which detect using radio wavelengths are RFID readers and near field communication (NFC) sensors. In this scenario, the substrate is tagged with, for example, an RFID tag encoding a unique identity for that substrate. Usually, the RFID tag has no battery and is powered up by radio waves emitted from the RFID reader. RFID tags are relatively cheap and unobtrusive and may be incorporated into a variety of different substrates. Clothing tags are one example of a substrate suitable for incorporating an RFID tag.
Hybrid Sensing Device
The sensing device may employ a combination of different wavelengths for sensing the substrate. For example, it may be convenient to employ radio, infrared or ultraviolet wavelengths for determining the identity of the substrate ("page recognition"), and then use visible wavelengths for determining a location relative to the substrate ("page tracking"). For example, a sensing device may incorporate an NFC sensor for page recognition and a camera for page tracking.
Likewise, a hybrid sensing device may employ different types of sensor employing the same or similar wavelengths. For example, a sensing device may employ one type of camera for page recognition and another type of camera for page tracking - each camera may be configured differently with different focal points for optimum page recognition and tracking.
It will be appreciated that hybrid sensing devices employing one or more of visible, infrared, ultraviolet and radio wavelength detection are within the ambit of the present disclosure.
Device for Sensing Substrate
Sensing Device Integrated with Viewing Device
As described in connection with Figure 7, the sensing device 808, in the form of a camera 102, may be integrated into the viewing device 100. In the case of the smartphone 100 or tablet computer 107, the camera 102 captures images from an opposite face to the face incorporating the display screen 105. Other examples of viewing devices which may have an integrated sensing device are head-mounted displays 113 (Figure 43), dedicated document viewers 118 (Figure 44), handheld video game consoles 135 (Figure 45) and media players 136 (Figure 46). With the image sensor mounted on an opposite face of the viewing device 100 to the opaque display screen 105, the viewing device 100 enables the user to experience virtual reality and/or augmented reality. In other words, the digital content 116 rendered to the display screen 105 appears in real-time as if it were actually printed on the viewed substrate 10.
In general, it is convenient to integrate the sensing device 808 with the viewing device 100 in handheld viewing devices.
Sensing Device in Wired or Wireless Attachment
As shown in Figures 40 and 41, the sensing device 808 may be incorporated into a wired or wireless attachment, such as a handheld pen, puck or mouse 768. Such an arrangement is particularly suitable for use with viewing devices which are not normally handheld. For example, the sensing device 808 may capture an image of the substrate, which is used for page recognition via feature extraction. Alternatively, the captured image may contain a barcode which can be decoded to identify the substrate with greater confidence.
Referring to Figures 52A and 52B, the sensing device 808 is an imaging pen 151 equipped with a high resolution camera 152. The Netpage system described in the above referenced patents incorporates a pen capable of handwriting capture by sensing a customized position-coding pattern covering the page. By contrast, the imaging pen 151 is able to capture handwriting without requiring a coding pattern across the page.
In Figure 52A, the imaging pen 151 is held at a height above the substrate 10. A conventional high-resolution camera 152 positioned in the pen housing captures one or more images of the substrate 10 within its field of view 14. The pen then sends interaction data representing the captured image(s) to the page server 20, optionally via a relay device, such as the smartphone 100. Having received the interaction data, the page server 20 identifies the substrate 10 using a suitable page recognition technique as described herein. The pen 151 typically sends a pen identifier to the page server 20 with the interaction data. A digital description corresponding to the substrate 10 is retrieved by the page server 20, which can then associate a path of the pen with the digital description during subsequent handwritten input. Successful page recognition may be communicated back to the pen so that the user receives feedback that the substrate 10 (typically a page) has been recognized. The path of the pen is recorded as a line on the digital description, which the Applicant defines as "digital ink". Digital ink is described in detail in the above referenced Netpage patents incorporated herein by reference.
Referring now to Figure 52B, once the page 10 has been recognized, the pen 151 is used to enter handwriting 157 (or other markings, drawings etc.) on the substrate 10. During handwritten input, a second image sensor 154 is used to track the path of the nib 155 on the page by imaging features of the page at close range. The second image sensor 154 is specifically configured for close-range imaging so that features in its field of view 153 can be matched with corresponding features in the digital description of the page. Alternatively or additionally, the second image sensor 154 is configured for relative motion sensing using, for example, an optical mouse technique, assuming that an initial absolute location on the substrate 10 can be determined or estimated. The second image sensor 154 may be activated by detecting a force applied to the nib 155 via a force sensor 156 in the pen 151. Alternatively, the camera 152 may be reconfigured to perform page tracking upon detection of a nib force, thereby obviating the requirement for the second image sensor 154.
The sequence of images captured by the pen 151 during handwritten input is communicated to the page server together with the pen identifier. This enables the page server 20 to associate the digital ink 158 (i.e. the path of the pen's movement) with the corresponding digital description of the substrate 10. This updated digital description of the reference page, including the digital ink 158, may be displayed to the user. For example, the smartphone 100 may display rendered digital content 116 corresponding to the page together with the digital ink 158. The screen 105 may be updated in real-time as the user writes on the substrate 10 using the pen 151. Editing, sharing and other options regarding the displayed digital content 116 may be provided to the user via the touchscreen 105 of the smartphone 100. Accordingly, it will be appreciated that such a system provides similar functionality to the Netpage system described in the above referenced patents to the present Applicant.
Microscope
Referring to Figure 5, the sensing device may be in the form of a microscope. A microscope accessory 61 for a smartphone is described in detail in the Applicant's US Publication No. 2011/0292198, the contents of which are herein incorporated by reference. Such an accessory may be used to modify the smartphone's imaging optics 62 so that they are configured for reading the Netpage position-coding pattern (or a similar type of coding pattern) when the smartphone is placed over a substrate. The microscope accessory 61 may comprise an IR phosphor for illuminating coding patterns printed in IR ink using the smartphone's internal flash.
Electronic Tag Readers
Referring to Figure 53, a substrate 10 has an electronic tag 810 such as an RFID tag or a Near-Field Communications (NFC) tag. An RFID tag or an NFC tag can uniquely identify a substrate 10 and thereby provide page recognition with 100% confidence. Electronic tags of this type may be particularly suitable for non-paper substrates, where the tag 810 does not have a significant impact on the usability or appearance of the substrate 10. Indeed, high-value substrates such as clothing may already incorporate such identifying tags for product tracking through the supply chain or to minimize counterfeiting.
The sensing device 100 has an electronic tag reader for reading an RFID tag, NFC tag or other electronic tag 810. The electronic tag reader may be a standalone device in communication with the viewing device 100, or it may be integrated into the viewing device 100. Increasingly, smartphones are equipped with NFC-readers and this functionality may be usefully leveraged in the context of the interactive viewing system 2 (see Figure 8) described herein.
Figure 53 shows an interaction between the smartphone 100 and the viewed substrate 10. The substrate 10 carries an electronic tag 810 in the form of an NFC tag. In this scenario, the smartphone 100 functions as an electronic tag reader which sends a radio signal to the NFC tag 810 when within range of the tag. The radio signal powers up the NFC tag 810, such that the tag communicates a unique identity associated with the substrate to the smartphone 100. This unique identity enables the smartphone 100 to retrieve corresponding display data with 100% confidence via a simple look-up index, which may be either stored remotely in a page server 20 (see Figure 8) or stored in the phone's memory. Once the smartphone 100 has retrieved display data 103 (see Figure 8) corresponding to the page 10, optical imaging of the page 10 via the smartphone's camera 102 may be used to track the position of the smartphone relative to the page 10 in order to determine what digital content 116 is rendered to the smartphone's display screen 105.
Typically, the digital content 116 will mirror part of the printed content in the camera field of view 14. As described above, this provides real-time virtual reality to the user. Page tracking and determination of a projection transform may be performed by comparing the captured images with the reference page 210 (see Figure 10) of the page content in the same way as if the page recognition had been performed using feature extraction from captured image(s) of the page 10 (described above). While page recognition performed by features extracted from captured images has a high degree of accuracy, the use of an electronic tag 810 effectively eliminates any errors in page recognition. The electronic tag 810 may identify one page or a plurality of pages.
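As an illustrative sketch of this tracking step, the following Python fragment (using OpenCV) estimates the projection transform between a reference page image and a camera frame by matching local features. The feature type (ORB), the matcher and the thresholds are assumptions chosen for brevity, not the specific tracking method described herein.

import cv2
import numpy as np

def estimate_page_transform(reference_img, camera_frame, min_matches=15):
    """Estimate the homography mapping reference-page points into the camera frame,
    or return None if the page cannot be reliably tracked in this frame."""
    orb = cv2.ORB_create(nfeatures=1000)
    kp_ref, des_ref = orb.detectAndCompute(reference_img, None)
    kp_cam, des_cam = orb.detectAndCompute(camera_frame, None)
    if des_ref is None or des_cam is None:
        return None

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_ref, des_cam), key=lambda m: m.distance)
    if len(matches) < min_matches:
        return None

    src = np.float32([kp_ref[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_cam[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    homography, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return homography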
Laser Scanner with Photodetector
Referring again to Figure 53, page recognition may be performed by detection of a barcode 812 (either a 1D or 2D barcode) printed on the substrate 10. Barcodes may be imaged using a smartphone's camera 102 and some barcodes 812 (e.g. QR codes) have been optimized for reading via a conventional camera. However, smartphone cameras are generally not optimal for reading barcodes and it may be preferable in some instances to employ a dedicated barcode reader. Typically, such a reader comprises a laser and photodetector arrangement. In some barcode readers, the laser beam sweeps back and forth across the barcode and a photodiode measures the intensity of light reflected back from the barcode. It will be appreciated that barcode readers may be employed as the sensing device in the interactive viewing system described herein.
Device for Sensing User Intent
Referring to Figures 8 and 9, user interaction with displayed digital content 116 occurs via an input device 814 associated with the viewing device 100. This interaction may be used, for example, to request further information; play a video clip; hyperlink to an Internet resource; indicate the success or otherwise of page recognition, etc. The user's intent in respect of the displayed digital content 116 may be sensed by various means.
Touchscreen
The touchscreen 105 of a smartphone or tablet computer is well suited for sensing user intent in the interactive viewing system 2 described herein. In Figure 9, the touchscreen display 105 of the tablet 100 captures user input requesting a particular action with respect to the displayed digital content 116. Interaction with the touchscreen 105 is typically via a user's finger or a passive stylus. Touchscreens are usually of a capacitive or resistive type, although other types of touchscreen will be well known to the person skilled in the art. A user may touch a displayed item of interest in order to initiate an interactive experience. A touch on the playback button 104 will initiate playback of a video relating to the displayed race car. Likewise, a touch on the term "200+ Race Cars" will initiate hyperlinking to a corresponding Internet resource. The user may be presented with soft key options relating to an interactive icon following, for example, a touch-and-hold interaction, a gestural interaction or a touch-gesture combination. Examples of soft key options include: share link, copy link, share video, copy video, skip, rewind, fast forward, pause and so on. The soft key options presented to the user may be controlled by the author of the interactive content and/or copyright restrictions. For example, playback, copying and/or sharing of videos may be controlled based on a particular geographic region (determined via GPS or phone network), publication (determined via page recognition including confidence refinements), and user subscription (determined via user identity).
One-touch interactions with on-screen icons are ubiquitous in smartphones and tablet computers. Touchscreens may be used for gestural inputs. Pinching and expanding gestures are familiar types of gestural input, which are generally used to indicate "zoom-out" and "zoom-in" requests in smartphones and tablet computers.
The interactive viewing system 2 described herein may be used for clipping (and sharing) interactive digital content 116. In this scenario, a user may indicate a clipped item via, for example, a touch trace (e.g. lasso) relating to the item of interest. The clipped content may or may not preserve interactivity depending on, for example, a user preference, predetermined rights management relating to the clipped content, the location of the sender and/or recipient, and the user identity. Clippings may be stored in the user's viewing device 100 for later viewing and interactivity.
Viral marketing is becoming a powerful tool for advertisers and the ability to clip and share rendered digital content whilst preserving its interactivity enables advertisers to link printed advertisements with viral marketing campaigns. For example, a user may share a video clip or an interactive game with other users by clipping a portion of the displayed digital content and sending the clipped content to a friend. The content may be sent directly to a friend's smartphone or indirectly via a social networking website. In this way, the inherent value of the printed advertisement is increased by enabling the printed advertisement to contribute to viral marketing. Moreover, other users experience interactivity of (originally) printed content which potentially fuels the uptake of the viewing system by encouraging these users to download the requisite viewing app onto their smartphones or tablet computers.
Of course, clipping and sharing is not restricted to viral marketing campaigns. The interactive viewing system allows any printed content to be clipped via a gestural interaction with displayed digital content 116 and then shared, whilst usefully preserving at least some of the interactivity contained in the digital content. For example, a newspaper or magazine article of interest may be clipped and shared with a friend. By preserving the interactivity in the shared content, another user has the option of, for example, posting a comment relating to the originally viewed printed article without actually ever viewing the original printed article.
The ability to clip digital content via a touchscreen gesture, as well as preserving interactivity when the clipped content is shared, provides the viewing system with powerful functionality which goes beyond a simple virtual reality or augmented reality system.
Multi-touch and/or gesture combinations may be used to indicate user input via the touchscreen. A common example of a multi-touch combination is a "double-click". Another example of gesture-touch combination is a lasso gesture followed by touch to indicate "select" (lasso) and "copy" (touch) commands. It will be appreciated that touchscreens provide a versatile means for receiving user input in respect of displayed digital content 116. A variety of different commands are accessible from various combinations of touch and gestural inputs via the touchscreen 105.
A touchscreen 105 may facilitate user input via a virtual keyboard or buttons. For example, if the displayed digital content 116 corresponds to a printed form having a plurality of different form fields, a finger or stylus touch on each form field may cause a virtual on-screen keyboard to appear, enabling the digital form field to be filled in by the user via the virtual keyboard. Likewise, if the displayed digital content 116 corresponds to a video, then virtual playback control buttons (e.g. skip, rewind, fast forward, pause) may appear on the display screen during video playback.
User Facing Camera
Many viewing devices 100, such as smartphones and tablet computers 766, are equipped with a user-facing camera 108 in addition to a view-facing camera 102, as shown in Figure 39. The user-facing camera 108 provides a useful means for sensing a user's intent in respect of displayed digital content 116.
Gesture Recognition by User Facing Camera
The user-facing camera 108 may be used as a means for capturing hand gestures or other gestures from the user in respect of the displayed digital content, which are interpreted by the processor in the viewing device 100. Hand gestures may be used, for example, to control video playback, e.g. a palm-facing gesture for stopping or pausing video playback; a rightwards motion for forward-skipping or fast-forwarding video playback; a leftwards motion for backward-skipping or rewinding video playback.
In the case where the displayed digital content 116 comprises a videogame, hand gestures may be used for playing the videogame, for example, by controlling the movement of an animated on-screen icon, character or other gamepiece.
Facial Expression and Eye-tracking Recognition by User Facing Camera
Referring again to Figure 39, the user-facing camera 108 may be used for capturing facial expressions 816, which can be interpreted by the processor in the viewing device 100. For example, a frown or other quizzical expression 816 may be interpreted as an indication that the displayed digital content 116 does not correspond to the imaged printed content 13. This may prompt the viewing system to find an alternative match via the page recognition process and display alternative digital content to the user. Another facial expression (e.g. a smile) may be interpreted as an indication that a correct match has been obtained. Eye-tracking may be captured via the user-facing camera and interpreted by the processor 106 (see Figure 10) of the viewing device 100 in order to initiate various interactive functions. Tracking the user's eye 818 is particularly useful for viewing devices 100 having a relatively large display screen 105, such as tablet computers 766. In one simple form, eye-tracking may be monitored and then an interactive function initiated after a predetermined dwell time on a particular displayed object. For example, dwelling on a playback icon or a graphic having associated video content may initiate video playback. The processor in the viewing device 100 may monitor dwell times on displayed objects and record dwell time data in a remote page server. Such dwell time data is valuable for publishers and advertisers who may wish to assess the impact of a particular article, advertisement or the like.
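A minimal sketch of the dwell-time logic is shown below in Python. The class name, the one-second threshold and the assumption that some upstream gaze estimator supplies the identifier of the object currently under the user's gaze are all illustrative; the sketch simply fires an action once per sustained dwell.

import time

class DwellTrigger:
    """Fire an action when the gaze stays on the same displayed object for a
    predetermined dwell time (sketch; thresholds are illustrative)."""

    def __init__(self, dwell_seconds=1.0):
        self.dwell_seconds = dwell_seconds
        self.current_object = None
        self.since = None

    def update(self, object_under_gaze, now=None):
        """Call once per gaze sample; returns the object id when its dwell time
        has been reached, otherwise None."""
        now = time.monotonic() if now is None else now
        if object_under_gaze != self.current_object:
            self.current_object, self.since = object_under_gaze, now
            return None
        if object_under_gaze is not None and now - self.since >= self.dwell_seconds:
            self.since = float("inf")    # fire only once per dwell
            return object_under_gaze
        return None

The same update loop could accumulate per-object dwell times for later reporting to a remote page server, as noted above.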
Microphone
Voice recognition software in the viewing device 100 may be used to interpret spoken commands captured by a microphone. Spoken commands may be used to initiate interactive functions in respect of the displayed digital content 116 (e.g. "play video", "show comments", "link to X" etc.). Alternatively or additionally, spoken commands may be used to improve the confidence of page recognition. By way of example, the user may say the name of a magazine before, during or after capturing an image of the printed content 13. The voice recognition data in combination with the page recognition process vastly reduces the scope of the search required to identify the particular page being viewed.
Mechanical Input Device
One or more conventional mechanical input devices 814 may be used to indicate a user's intent in respect of displayed digital content 116. Examples of mechanical input devices include keyboards, push buttons, joysticks and computer mice. A notebook computer (see Figure 40) will typically employ a keyboard 820 and/or a mouse 768 for receiving user input; a handheld games console 135 (see Figure 45) will typically employ control keys 772 and/or a joystick for receiving user input. These and other mechanical input devices will be well known to the person skilled in the art.
Internal Motion Sensors
A handheld viewing device 100 may be equipped with one or more motion sensors, such as accelerometer(s) and/or gyroscope(s) 295 (see Figure 12) which can be used to detect user gestures. For example, a shake or tilt of a smartphone may be used to indicate that incorrect page recognition has occurred, enabling the viewing system to search for another possible match. Alternatively, a shake or some other gestural movement may be used to indicate that the user wishes to change between different types of display or different options presented to the user. Alternatively, a shake or some other gestural movement may be used to indicate that the user has switched to a different publication, prompting the viewing system 2 (see Figure 10) to initiate page recognition.
More complex gestural movements, such as motioning the form of a letter or mark (e.g. "y", "n", tick, cross etc.), may be detected in order to interpret the user's intent in respect of the displayed digital content 116.
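For the simple shake case mentioned above, a minimal detection sketch in Python might threshold the accelerometer magnitude over a short window; the function name, threshold and peak count are illustrative assumptions, and recognizing the more complex motioned letters or marks would require a proper gesture recognizer.

import math

def is_shake(accel_samples, threshold_g=2.5, min_peaks=3):
    """Return True if the acceleration magnitude (samples of (x, y, z) in g)
    exceeds the threshold at least min_peaks times within the window."""
    peaks = sum(1 for (x, y, z) in accel_samples
                if math.sqrt(x * x + y * y + z * z) > threshold_g)
    return peaks >= min_peaks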
View Facing Camera
Although the view-facing camera of the viewing device 100 is typically used primarily for page recognition and page tracking, it may also be used to detect a user's intent in respect of the displayed digital content.
Gestural Movement of Viewing Device Relative to Page by View Facing Camera
In one form, the view-facing camera may be used to detect gestural movements of the viewing device 100 relative to the imaged substrate 10. Thus, the gestural inputs described above may be captured by the view-facing camera 102 as an alternative to, or in addition to, the internal motion sensor(s) 295.
The view-facing camera 102 is a more accurate means for sensing relative motion than internal motion sensors, and potentially this enables more complex gestures to be captured. For example, movement of the viewing device 100 may be used for capturing handwriting or drawing input. With sufficient practice, users may find this a more convenient method of capturing handwriting or drawings, especially on viewing devices 100 with relatively small display screens 105 where touchscreen input of handwriting or drawings is difficult. The captured handwriting or drawings may be associated with the digital description of the printed page so that they are viewable on subsequent interactions with the page 10.
Interactions With Printed Substrate by View Facing Camera
As shown in Figure 54, user interactions 822 with a printed page 10 may be captured via a view-facing camera 102 of the viewing device 100 and interpreted with respect to displayed digital content 116. In one scenario, the viewing device 100 is held at a height above the page 10 with a corresponding virtual or augmented reality display as described herein. The user interaction 822 with the physical printed page 10 usually involves pointing with a finger 824 or a stylus to identify different regions of the page. The position and movement of the user's finger 824 may be captured by the view-facing camera 102 and translated into a corresponding position and movement of a pointer on the display screen 105. In this way, the user's finger 824 effectively becomes a mouse for the display screen 105. Simple gestures, such as tapping, may be interpreted by the viewing device 100 and initiate actions, such as hyperlinking or video playback. Alternatively, the user's finger 824 dwelling on a particular zone 826 of the printed page 10 may be used as an indication for the viewing device 100 to initiate an action. Of course, the interactive options available to the user may be displayed on the display screen 105 of the viewing device 100 as described above. Similarly, it will be appreciated that digital ink 158 (see Figure 52A) may be generated using the view-facing camera 102. In one scenario, the movement of a stylus, finger or pen with respect to the printed page 10 is used to generate digital ink, which can be recorded and, optionally, associated with the digital description of the page. This method of generating digital ink has the advantage that the digital ink includes temporal stroke information, which can be used to assist in text recognition (see, for example, US Patent No. 7,359,551, the contents of which are incorporated herein by reference).
Alternatively, digital ink may be generated by writing on the printed page using a marking pen and imaging the handwritten input using the view-facing camera. Subtracting the reference image 210 (see Figure 10) content from the imaged page 10 yields the handwritten input as digital ink 158. The digital ink may be subjected to text recognition using standard methods known in the art. This technique has the disadvantage that the generated digital ink lacks temporal stroke information, which is useful for text recognition. However, it has the advantage of greater accuracy in generating the digital ink, without relying on movement interpretation.
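A minimal sketch of this subtraction approach, using OpenCV in Python, is shown below. It assumes the captured image has already been registered (warped) onto the reference page coordinates using the projection transform from page tracking; the function name, threshold and morphological cleanup are illustrative assumptions.

import cv2

def extract_digital_ink(reference_page, captured_page, threshold=40):
    """Recover handwritten input as a binary ink mask by subtracting the
    reference page content from the registered captured page image."""
    ref = cv2.cvtColor(reference_page, cv2.COLOR_BGR2GRAY)
    cap = cv2.cvtColor(captured_page, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(ref, cap)                 # pre-printed content cancels out
    _, ink_mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
    # Light morphological opening to suppress sensor noise (kernel size illustrative).
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    return cv2.morphologyEx(ink_mask, cv2.MORPH_OPEN, kernel)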
By way of example, using the techniques described above, a printed form may be filled in by hand using an ordinary pen or pencil. The user's handwritten text may then be converted into computer text and entered into the corresponding digital description of the form. The digitally filled-in form can then be sent to a designated recipient via the user's smartphone or tablet computer. A user has the option of performing final editing of the form on the viewing device 100 prior to sending the filled-in form. Of course, the digital form may retain the user's original handwriting without any text conversion, if desired.
Accordingly, the viewing system 2 provides a facile means for capturing digital ink in respect of a printed page 10 without any requirement for special pens or coded paper, such as Netpage. Although the printed page, when imaged and recognized by the viewing system 2 may be non-unique (in contrast with Netpages), the corresponding reference page 210 can acquire a unique identity once it is tagged with the user's identity (or alias identity) retrieved from the user's viewing device 100. Temporal tagging may also be used to distinguish between copies of pages from which the same viewing device 100 has generated digital ink. Since each reference page 10 effectively acquires a unique identity in the system, many of the advantages of the Netpage system can be realized without uniquely coding individually printed pages.
User Identification
The interactive viewing system 2 operates optimally when the system can identify the user, although the viewing system is, of course, still operable without this information. The user identity need not be an actual identity; for most requirements, an alias identity associated with the user or the user's viewing device 100 is sufficient.
The user identity may be used by the viewing system in a number of different ways. For example, a profile associated with the user identity may be used for increasing the confidence of page recognition. The user profile may comprise information, such as preferred language, demographic information (e.g. age, gender, occupation, income bracket etc.), browse history and magazine subscriptions. This information contained in the user profile can valuably contribute to page recognition, either before, during or after a matching process.
In some circumstances, the user identity may be used to determine the interactive data 101 (see Figure 8) displayed to the user. For example, some interactive elements 758 (see Figure 37A) may only be visible (or audible) or interactive to magazine subscribers or users who have paid a fee, and the requisite user status information may be accessible via the user identity.
The user identity may also be used for tagging any data submitted by the user to the interactive viewing system 2. For example, a digital form filled in by the user may be tagged with the user's identity so that it acquires a unique identity in the system. As described below, the user may be identified by a variety of different means.
Identity of Viewing Device
Typically, the viewing device 100 contains a unique identity. For example, smartphones and tablet computers typically contain a unique identity in the form of a removable SIM card which is transferable between devices. A transferable identity is advantageous because the user's established profile is not necessarily lost when the user upgrades to a new viewing device 100. The identity of the viewing device 100 may alternatively be stored in a non-removable memory.
If the user decides voluntarily to register information with, for example, a phone identity then this information may be used to build up a user profile. Otherwise, a user profile associated with the phone identity may be built up based on usage data collected by the viewing system 2.
For shared viewing devices 100, each user may have a user account accessed via a user login.
Identity of Sensing Device
In many instances the sensing device 808 is integrated with the viewing device 100. However, where the sensing device is separate (e.g. an optically imaging pen 151 shown in Figure 52A, or a puck or mouse 768 shown in Figure 42), then the sensing device 808 may contain a unique identity, which can be used to identify the user.
Electronic Token
An electronic token, such as an NFC tag or an RFID tag, may be used to identify the user. In this scenario, the user may carry the electronic token on his or her person and the viewing device 100 has a suitable sensor for sensing the token, which identifies the user. Advantageously, electronic tokens may be sensed by a variety of different viewing devices 100 so that the same user identity is associated with each viewing device 100. In the case where, for example, a user uses different smartphones and different tablet computers for viewing substrates, the electronic token ensures that a consistent user identity is associated with each viewing device 100.
Facial or Iris Recognition
A user-facing camera 108 of a viewing device 100, such as a smartphone or tablet computer 766 (see Figure 39), may be used to identify the user via facial recognition technology or iris recognition technology.
Voice Recognition
A microphone of a viewing device 100, such as a smartphone, may be used to identify the user via voice recognition technology.
Password
The user may be identified via a password. For example, the user may receive an on-screen prompt to enter his or her unique password when the viewer application 190 (see Figure 10) is selected. The user's password then identifies the user to the viewing system.
Signature
In viewing systems which are able to receive and interpret digital ink, the user's signature may be used as a means for identifying the user. The Netpage system (described in detail in US 7,106,888, the contents of which are herein incorporated by reference) includes a pen capable of handwriting capture by sensing a customized position-coding pattern covering the page. A Netpage pen in wireless communication with a smartphone may be used to write a signature on Netpage coded paper. The digital ink representing the user's signature is transmitted to the smartphone and then a server, which matches the user's signature against a database of known signatures. Temporal stroke data, nib force data and/or a pen identity may be used to assist further with signature verification.
Fingerprint
The viewing device 100 (and/or sensing device 808) may be equipped with a customized fingerprint sensor or other biometric sensor for identifying the user. Alternatively, a fingerprint sensor may be in the form of a high-resolution touchscreen 105 (see Figure 10) which is able to resolve a fingerprint pattern as well as display content to the user.
User Initiated Actions
In addition to interactions defined in the digital twin 107 (see Figure 38) being viewed, the user can initiate other actions which are not defined in the digital twin. These actions are available at any time when the digital twin 107 is being viewed on the viewing device 100, and do not require any enabling action from the publisher of the corresponding printed substrate.
Zoom, Pan and Rotate View User Action
Typically, the user will move the viewing device 100 over the substrate and the digital twin 107 will track the location. This mode of operation is typically referred to as Dynamic Mode. In this mode, any unsteadiness of the user's hand is reflected in the movement of the image on the screen 105. At any time, in order to see a steady image of the digital twin or to see more detail of the digital twin, the user may use a gesture, such as a two finger pinch, to initiate a zoom of the digital twin display on the viewing device 100. That zoom can be either to increase or decrease the scale of the display. The direction of scale change is typically controlled by the inward or outward sense of the finger pinch.
Using a gesture such as just described switches the viewing device 100 into an alternate state of operation in which the tracking of the substrate 10 ceases. This mode is typically referred to as Static Mode.
Other gestures are also supported to change the view during Static Mode. Typically, these gestures will also switch the viewing device 100 into Static Mode. They include dragging the digital content 116 on the screen 105, in its zoomed-in state, un-zoomed state, or zoomed-out state, to pan the view on the screen 105 over the digital twin 107. Similarly, at any time the user may use a gesture, such as a two finger rotate gesture, to rotate the view of the digital twin 107 on the screen 105, in its zoomed-in state, un-zoomed state, or zoomed-out state. Also at any time, the user may use a gesture, such as a double tap, to cause the screen view of the digital twin 107 to rotate to align its vertical direction with the vertical direction of the viewing device 100.
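The interplay between Dynamic Mode and Static Mode can be summarized in a small state sketch, shown below in Python. The class, gesture handler names and transform state are illustrative assumptions; the point is simply that any explicit view gesture leaves substrate tracking and switches the view into Static Mode.

from enum import Enum

class ViewMode(Enum):
    DYNAMIC = "dynamic"   # digital twin tracks the physical page
    STATIC = "static"     # tracking ceases; pan/zoom/rotate apply to the twin

class TwinView:
    """Sketch of gesture-driven switching between Dynamic and Static Modes."""

    def __init__(self):
        self.mode = ViewMode.DYNAMIC
        self.scale, self.rotation, self.pan = 1.0, 0.0, (0.0, 0.0)

    def on_pinch(self, scale_factor):
        self.mode = ViewMode.STATIC        # any view gesture ends tracking
        self.scale *= scale_factor         # outward pinch zooms in, inward zooms out

    def on_drag(self, dx, dy):
        self.mode = ViewMode.STATIC
        self.pan = (self.pan[0] + dx, self.pan[1] + dy)

    def on_two_finger_rotate(self, degrees):
        self.mode = ViewMode.STATIC
        self.rotation += degrees

    def on_double_tap(self, device_vertical_degrees):
        # Align the twin's vertical direction with the device's vertical direction.
        self.mode = ViewMode.STATIC
        self.rotation = device_vertical_degrees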
In one mode, static mode is entered automatically if the user moves the camera away from the printed publication being tracked. The detection of this event is described above in subsection entitled "The Conductor Module 160", as well as the method of determining the zoom, pan and rotate that is used as the initial position of the static mode view. In another aspect, the control of all or some of the pan, zoom and rotation functions may be done by use of touch buttons on the screen 105.
In another aspect, the control of some or all of pan, zoom and rotation functions may be done using a menu and/or sub-menus.
In another aspect, the control of some or all of pan, zoom and rotation functions may be done by physical buttons on the viewing device 100 dedicated to these functions.
In another aspect, the control of some or all of pan, zoom and rotation functions may be done by physical buttons on the viewing device 100 that are used for multiple functions, with an indication on the screen 105 that this is the current function of the buttons.
Clipping User Action
While using the interactive viewing system 2, users will often see content that particularly interests them or that they believe may interest others. They may wish to save this content for their later reference, or for sharing with their friends or associates, or for sharing on public sharing sites on the Internet. For the purposes of the present specification, the content of interest to the user is referred to as a "clipping" in light of its conceptual similarity and virtual resemblance to cutting or tearing a clipping from a page in a newspaper or other publication. The clipping interactivity enabled by the viewing system 2 will be described in detail below with reference to Figures 55 to 65.
Creating a Clipping
At any time when the screen 105 is displaying the live video image 784 of the substrate 10, or the digital twin 107 (see Figure 48), whether in Dynamic or Static Modes of operation, and with any part of the substrate 10 in the camera field of view 14, the user can send a clip request 310 of the current view to the page server 20 (see Figure 10). This marks that view in preparation for a later action. The clip request 310 is typically made via a touch button 828 shown on the screen 105 (see Figure 56).
The clip request 310 can alternatively be made via a menu or sub-menu function, or using a dedicated physical button on the viewing device 100.
In another aspect, the clip request 310 is made using a physical button on the viewing device 100 that is used for multiple functions, with an indication on the screen 105 that this is the current function of the button.
After the clip action is selected, the viewing device 100 typically displays the created clipping 830 along with a selection of options 832 for disposition of the clipping (see Figure 57). Typically, this is accompanied by an animation 836 in which the view of the clipping 830 is changed to indicate that it now represents a clipping. This may include presentation of the clipped region with ragged edges 834 (see Figure 64) to simulate a torn page, or some other visual indication that the clipping 830 is now taken from the publication 838. Other methods of indicating the success of the operation are also possible, including presentation of a text message, presentation of a graphic icon, and so on. It is also possible for no indication to be made, other than the presentation of the disposition options 832.
The disposition options 832 are typically presented to the user as on-screen touch buttons as shown in Figure 57. However, it will be appreciated that any of the other user interface methods described earlier could also be used to accept the user's intention. Once the user selects his or her preferred disposition option and completes any subsequent processing related to that option, the viewing device 100 typically returns to its previous viewing mode.
The user can enhance the clipping contents by adding their own content, such as marker-pen-style highlights of specific words or phrases. Other styles of highlight include rings around specific content, optionally in a hand-drawn style, or dynamic highlights, such as flashing image components or image components which change shape or colour. These may be changes to the existing image components or additional image components. Other enhancements can include addition of links to existing URLs, or addition of videos, photos, photo galleries, text, sound recordings, files and the user's rating of the content or the publication. Any such enhancements remain specific to the clipping 830 and are stored with it. Still further enhancements can include selection of specific text within the clipping 830, which is used to search sites such as Amazon or Wikipedia, with a link to that information then associated with that text in the clipping. These enhancements can be made using the viewing device 100 to the digital twin 107 of the publication 838 prior to the clipping being created, or may be made to the clipping 830 after it is created.
Clipping Pre-Defined Regions
Referring to Figure 58, publications 838 may include portions which the publisher expects or encourages users to take as clippings 830. Examples include coupons and vouchers for businesses and products. The interactive content authoring tool allows definition of regions within the publication areas as pre-defined clipping regions 840. When the user creates a clipping 830 which is within such a region 840, or is a close match to such a region, it is ambiguous exactly what their intent is. They may wish to make a clipping 830 that is what they see in the tracked digital twin 107, framed the way they have framed it, or they may be interested in capturing the pre-defined clipping region 840. To resolve this ambiguity, both options are presented for user selection. This selection is presented as two visual representations of the clipping 830, along with the text "Editor's clipping" and "Your clipping." The user's framed clipping is presented with the appearance of torn edges 834 (see Figure 64), and the pre-defined clipping region is presented with clean edges (see Figure 65). A pre-defined clipping region 840 may also have associated additional content. For example, it may be part of a larger flow of text, or an article several pages long. In this case, visual or textual information informs the user that this pre-defined clipping 830 has such additional content. In the preferred embodiment the user selects which clipping 830 they wish to proceed with. After selecting one type of clipping, user interaction moves to selection of one of the disposition options 832. When such pre-defined clipping regions 840 are clipped, the page server 20 can record specific statistics relating to the number of times each pre-defined clipping region has been clipped by all users. Similarly, when that clipping 830 is presented at a later time to the page server 20, or some other business system associated with the publisher, it can be automatically recognized and processed. This is especially useful in the case where the clipping 830 is a voucher which the publisher wishes to specifically recognize when it is subsequently presented.
Typically, pre-defined clipping regions 840 are shown without torn edges 834 to indicate to the user that they are exact clipped regions and special in the way they operate.
When the user creates a clipping 830, whether moving to the selection stage between the user's framed clipping and the pre-defined clipping, or moving straight to the clipping disposition stage, an animation is used to link the image on the tracked digital twin 107 to the proposed user's framed clipping (not the pre-defined clipping). This animation begins with the clipped area image being identical and coincident with the live-tracked digital twin 107. That is, there is no visual difference at this stage. Then, proceeding by incremental steps over a period of approximately 2 seconds, the torn edge 834 is applied, the digital twin 107 is removed outside the borders of the torn edge 834, and the clipped area scales, rotates and translates into the position where it resides in either the clipping selection stage or the clipping disposition stage (as appropriate). In this manner a clear link is established in the user's mind between what they were seeing in the live tracked digital twin 107, and the clipped fragment of the page 10 (see Figure 10).
Figure 55 illustrates the flow of information through the interactive viewing system 2 during clipping operations. The view of the substrate or digital twin on the viewing device 100 is limited by characteristics of the viewing device's display screen 105, such as its resolution and color depth. In order that the clipping 830 is not limited by these restrictions, the clip action does not copy the current view, but creates a link to a packet of data which defines the clipped region 840 by reference to the original publication 838. In one implementation, the viewing application 190 sends the image captured by the camera 102 to the page server 20 (arrow 842), which uses that image to find the matching reference page 210, passing that reference page 210 back to the viewing device 100 (arrow 844). Therefore, the viewing device 100 has a reference to the reference page 210 as known by the page server 20. When the user identifies a region 840 (see Figure 58) on the reference page 210, and initiates a clip operation, the viewing application 190 sends the publication identifier and reference page identifier of the page from which the clipping 830 was made to the interactivity server 846 (arrow 860), along with the page coordinates of the clipped region 840. As previously discussed, the page server 20 and interactivity server 846 need not be separate servers. To reflect this, the page server 20 has been previously defined as encompassing a server system of multiple interconnected servers, or a single server. In this case, the page server and interactivity server are shown as separate servers to better illustrate the information flows during the clipping process. The interactivity server 846 stores the publication identifier and reference page identifier of the page in a clipping information structure 850. This effectively creates a link to the original data. The interactivity server 846 names the information structure 850 uniquely and returns a unique URL to the viewing application 190 (arrow 852). Typically, the interactivity server 846 will also store the identity of the user who requested creation of the clipping 830, the date and time of its creation, and the geographic location of the user who requested its creation, in the clipping information structure 850.
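The clipping information structure 850 and the unique URL it is served under can be sketched as follows in Python. The field names, URL scheme and host are illustrative assumptions; the essential point, as described above, is that the server stores a reference back to the publication and page rather than a copy of the clipped image.

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ClippingRecord:
    """Sketch of the clipping information structure 850: a link back to the
    source page plus metadata, not a copy of the clipped pixels."""
    publication_id: str
    reference_page_id: str
    region: tuple                     # (left, top, right, bottom) in page coordinates
    user_id: str
    location: Optional[str] = None    # geographic location of the requesting user
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    clipping_id: str = field(default_factory=lambda: uuid.uuid4().hex)

class InteractivityServerSketch:
    def __init__(self, base_url="https://interactivity.example.com/clip/"):
        self.base_url = base_url      # hypothetical host, not a real endpoint
        self.clippings = {}

    def create_clipping(self, record: ClippingRecord) -> str:
        """Store the record and return the unique URL handed back to the viewer app."""
        self.clippings[record.clipping_id] = record
        return self.base_url + record.clipping_id

    def resolve(self, clipping_id: str) -> ClippingRecord:
        """Later accesses of the URL resolve to the stored reference, from which
        the page image and associated interactive content can be fetched."""
        return self.clippings[clipping_id]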
When that unique URL is passed to another PC 854 or computer device (arrow 856), it remains valid and will still reference the same clipping information (arrow 858). Displaying that unique URL in a browser shows an image of the clipping 830. Typically, the interactivity server 846 does not copy the image of the clipping 830, but when the URL is accessed, it in turn accesses the image from the page server 20 using its stored reference (arrow 848).
In this way, not only can a high quality version of the image which was shown on the viewing device 100 at the time the clipping was created be accessed using the clipping, but interactive content, user generated content and other content associated with that region 840 in the publication 838 (see Figure 58) also becomes accessible via the clipping URL. It is also possible at any time to add further information to the clipping information structure 850 by passing it to the interactivity server 846 using optional parameters on the unique URL which are supported by the interactivity server.
Referring to Figure 64, when the sharing user places his or her mouse over the displayed clipping 830 as it has been displayed in a browser, additional information 858 is displayed, such as publication details, the date and time the clipping was created, identity information of the user who created the clipping, location information of the user who created the clipping and/or other similar data.
When display of the URL is requested, the interactivity server 846 additionally activates the image as a link which, when clicked, will show a view with other options associated with the clipping 830. These options may include viewing details of the publication 838 from which the clipping is taken, options to display or suppress display of user generated content, options to display or suppress display of interactive content, options to zoom out or pan to other parts of the page from which the clipping was taken, options to access the article from which the clipping was taken and options to access the publication from which the clipping was taken.
Typically, when the URL is accessed and the clipping is displayed, the image of the clipping 830 is shown with a pattern on the edge which simulates and represents the torn edge 834 of a physical clipping torn from a paper magazine, similar to the way the clipping was displayed when it was created. The appearance of the torn edge 834 is generally only applied to those edges which do not correspond to the physical edges of the page 10 or edges of the digital twin 107. Thus, when an entire page is clipped, only the binding edge will show a torn edge 834. Similarly, an article clipped from the edge of the digital twin 107 will not be shown as "torn". The application of this torn edge appearance may be applied either by the web client, such as the viewing application 190 (see Figure 55) when the clipping 830 is displayed, or by the interactivity server 846, when the URL is referenced.
In order to support displaying the clipping 830 in a variety of different ways, the interactivity server 846 supports appending parameters to the clipping's unique URL. By way of example, appending the parameter string "?page=full,show=publication&publisher,format=browser" to the unique URL may be used to cause the interactivity server 846 to show an image of the full page from which the clipping 830 was taken, including information 862 about the publication and its publisher, in a format suitable for a standard web browser, as shown in Figure 65. It will be clear to those familiar with WWW operation that all of the options mentioned in this section can be controlled in a manner similar to this.
The interactivity server 846 and page server 20 are used to support the operation of the clipping URL. It will be clear to those familiar with WWW operation that the URL can reference the interactivity server 846, the page server 20, or some other server which may or may not be dedicated to this task. It will also be clear that the required information from the page server 20 may be copied onto the server which supports this URL, or copied to some other location from which it is then referenced.
Referring to Figure 64, if the interactivity server 846 is instructed to delete 864 the clipping's information structure 850, by use of a specific parameter string appended to the URL, the clipping 830 will be removed by the interactivity server 846 and will then become unavailable to everyone who attempts to access the URL. Typically, this facility is used when a clipping 830 has been created in error, and before it is shared or stored. However, it may also be used to delete 864 a clipping 830 even after the URL has been shared.
Sharing Clippings
Referring back to Figure 57, when the user creates a clipping 830 using the viewing application 190, he or she is presented with a set of disposition options 832 for the clipping just created. By way of example, the disposition options 832 presented to the user may include Facebook and Twitter. If the user selects Facebook, the viewing application 190 will pass the clipping's unique URL to Facebook to be saved into the user's Facebook account. Typically, the user's Facebook account details have been entered as parameters to the viewing application 190 or the Facebook authorization process has been performed in advance, and so the sharing can proceed without further user interaction. However, it is also possible for the viewing application 190 to request these details or to initiate the Facebook authorization process at the time the sharing of the clipping 830 is requested. It will be appreciated that any accessible sharing Internet site can be used in this way, including sites such as Facebook, Twitter, Qzone, Habbo, Bebo, Vkontakte, Orkut, LinkedIn, Myspace, Friendster, and others.
It will also be appreciated that clippings 830 can be shared via email by creating an email containing the clipping's unique URL using a similar user interface. The selection of the Internet sharing sites and sharing methods presented to the user as the disposition options can be defined using a parameter, or multiple parameters, to the viewing application 190. Where the number of sharing sites in that selection exceeds the number of options that the menuing or selection process in the viewing application 190 can support, an "Other" option can be presented to allow a subsequent sub-selection process to be performed by the viewing application to select the sharing method requested by the user.
In an alternative implementation, the actions of clipping and sharing to a preconfigured sharing site or using a preconfigured sharing method can be initiated by the user with a single touch.
Saving Clippings on the Viewing Device
Users may at times want to create clippings intended for their own later reference. Referring to Figures 57 and 60, the user may save the clipping 830 in a way that allows access to it later, either on the viewing device 100 or on some other computing device.
Typically, one of the disposition options is the option to save the clipping 866 on the viewing device 100. When this option is selected by the user, the clipping's unique URL is saved on the viewing device 100 by writing the clipping's unique URL to the list of bookmarks for an Internet browser installed on the viewing device 100. This permits the user, when subsequently using the Internet browser, to access the clipping 830 from the browser's bookmark list. The user can also use any of the browser's bookmark management functions to manage their list of clippings.
Referring to Figures 60, 61 and 62, the clipping's unique URL can be added to a file of clipping URLs 868 on the viewing device 100. Typically, this file of clipping URLs 868 is accessed by the viewing application 190 (see Figure 55) to permit the user to see their list of clippings 870 or to display the clippings by referencing the URL 872. Other functions are also supported such as sorting the clippings, or deleting one or more clippings. Sorting functions can operate based on the clipping content by accessing the clipping's URL and retrieving information such as the clipping's creation date 874 or the clipping's source publication 876. To improve speed of operation, some or all of this information may be cached on the viewing device 100. It will be clear to those skilled in the art that all or most of the functions typically available in list management applications apply to this circumstance as well. Optionally, aspects of the clipping's history 878 can also be saved, including how and when it has been shared. In yet another aspect, the file of clipping URLs 868 can be used and managed as in the previous aspect, but by use of a separate application on the viewing device 100.
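A minimal Python sketch of how such a local file of clipping URLs 868 might be kept, sorted and pruned on the viewing device follows. The JSON file format and field names are assumptions for illustration; the actual on-device representation is not specified here.

# Minimal sketch of a local clipping list (assumed JSON file format and
# field names; the real on-device representation is not specified here).
import json
from datetime import datetime

CLIPPINGS_FILE = "clippings.json"   # hypothetical file of clipping URLs (868)

def load_clippings(path=CLIPPINGS_FILE):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return []

def save_clippings(clippings, path=CLIPPINGS_FILE):
    with open(path, "w") as f:
        json.dump(clippings, f, indent=2)

def add_clipping(clippings, url, publication, created=None):
    clippings.append({
        "url": url,
        "publication": publication,                       # cached source publication (876)
        "created": created or datetime.now().isoformat()  # cached creation date (874)
    })

def sort_by_date(clippings):
    return sorted(clippings, key=lambda c: c["created"], reverse=True)

def delete_clipping(clippings, url):
    return [c for c in clippings if c["url"] != url]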
In another aspect, the clipping URLs are automatically saved in such a file 868, or in a browser bookmark list, even if the clipping 830 is also shared or disposed of in some other way. This permits a user to view their clipping history 878.
Saving Clippings to the Interactivity Server
When a clipping is requested to be saved, its unique URL is recorded in a file of clipping URLs 868 which is stored in an account kept for the user on the interactivity server 846. This effectively saves the clipping 830 to cloud storage.
In this case, the interactivity server 846 supports separate accounts for individual users. These accounts permit storage of the clipping URLs as well as other information about the user. This information may be collected automatically by the viewing system 2 (see Figure 10), such as the user's viewing history, frequency of use and so on, or it may be information added by the user, such as personal information.
Optionally, the user's account is created by the user. On account creation, the user provides an account name and password. Typically, the account name and password are also stored by the viewing application 190 (see Figure 55) so that the account can be accessed transparently by the viewing application 190 to record clipping URLs and other information about the user's viewing behaviour. The account can also be accessed using a web browser from any other internet connected computer. The web interface provides facilities to view, manage, and share clippings.
In another implementation, the account is created automatically when a new viewing device 100 is first detected by the interactivity server 846. The account is based on the unique identifiers collected from the user's viewing device 100. At a subsequent time, when the user provides an account name and password via the viewing application, these are associated with the existing account which is based on the viewing device's identifier. This then permits the types of access described in the previous paragraph to the clipping and viewing history (868 and 878) collected prior to the user supplying the account name and password. In another implementation, the viewing device 100 does not require its own account name and password, but instead uses the authentication process of another site, such as the Facebook authentication process. In this way, the user perceives the clipping and viewing history (868 and 878) to be available whenever they are logged into Facebook.
Synchronising Saved Clippings
The functionalities described above may also be combined. In this case, whenever the user creates a clipping 830, the clipping URL is saved to the viewing device 100 and also saved to the interactivity server 846. From time to time, usually whenever the viewing application 190 starts, and then at a regular time interval, the viewing application 190 synchronizes the local information with that held on the interactivity server 846. This synchronization is typically transparent to the user, so that the user perceives that their history and information is widely available and always up to date. Notwithstanding this, the user can also specifically request synchronization of this information at any time.
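The synchronization described above can be as simple as merging the locally saved clipping URLs with those held in the user's account so that both sides end up with the union, as in the following Python sketch. The fetch, upload and save callbacks stand in for an unspecified interactivity server interface.

# Sketch of a simple two-way merge of clipping URLs between the viewing
# device and the user's account on the interactivity server. The
# fetch/upload/save functions are placeholders for an unspecified API.

def synchronise(local_urls, fetch_remote_urls, upload_urls, save_local):
    """Merge local and server clipping lists so both hold the union."""
    remote_urls = set(fetch_remote_urls())
    local = set(local_urls)

    missing_on_server = local - remote_urls

    if missing_on_server:
        upload_urls(sorted(missing_on_server))
    merged = sorted(local | remote_urls)
    save_local(merged)
    return merged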
Clippings Viewed by Others
One of the most common reasons for users to make clippings 830 based on the publications 838 they view is to share those clippings with their friends and associates. Referring to Figure 55, users can share their clippings by sharing the URLs which define the clippings (arrow 856). The interactive viewing system 2 and viewing application 190 facilitate this sharing.
As previously described, clippings are URLs which in turn contain references back to the original publication (arrow 848). Therefore, the original user or another person who views the clip 830 can reference other parts and aspects of the publication 838. For example, they can zoom out and pan to see the reference page 210 and spread from which the clip 830 was taken and they can turn pages to see the remainder of the publication 838, or the remainder of the article, at full print quality.
Clippings 830 are dynamic, and at the time they are referenced, show the content of the publication 838 at the time the clipping 830 is being viewed, even if the changes or additions to the digital content 116 (see Figure 48) were not made at the time the clipping was first created.
In addition, other users can also add content to the clipping 830 in the same manner as the original user can add content.
The creator of the clipping 830 can indicate whether the other users with whom the clipping is shared can add content to the clipping.
The publisher can use the functions of the content authoring tool to indicate whether the users with whom the clipping 830 is shared are permitted to add content to clippings made from that publication 838 or parts of that publication.
As when it is viewed by the original user, the display of the clip 830 is enhanced, both for visual impact and to reflect the operations that the user can perform on that clip as described above. For example, a clip 830 of part of a reference page 210 has a torn edge 834 applied when viewed to reflect the idea of a part torn from a physical publication page. A clip 830 of a reference page 210 has a torn edge 834 applied along the side of the reference page 210 when viewed to reflect the idea of a whole page torn from a publication 838.
Liking Clippings
Clippings 830 are effectively unique URLs, and so can be used with social websites as any other URL. In particular, they can be "shared" and "liked". Social networking websites typically offer facilities to find out how many times a URL has been shared or liked by users of the website, so it is possible to find out how much a particular publication or a particular advertisement or article in a publication has been "shared" or "liked". This becomes a measure of that publication's, article's or advertisement's popularity among users of that social networking website.
Access Rights to Clippings
When the publication 838 is prepared for viewing using the interactive viewing system 2, the content author marks all regions 840 of the publication 838 with an indication of the nature of those regions. For example, regions 840 can be marked as advertisements, advertorial, editorial, stories, photographs and so on. This information is used when the clipping 830 is created to manage how much information is included in the clipping 830 so that copyright and digital rights management restrictions are complied with. In particular, this is used to prevent sharing of complete publications 838 and to force compliance with fair use provisions. For some parts of the publication 838, typically advertisements, the advertiser and publisher can choose to allow that advertisement to be shared without restriction when clipped. For other parts of the publication 838, typically stories and photographs, the publisher may elect not to permit referencing of the whole story or image from that clipping 830.
The entire publication 838 may be in a single category as a default setting, and any region 840 which varies from this category is specifically set using the content authoring tool.
The publisher of the publication 838 can elect to have some content included with all clippings 830 created by users from that publication 838. Examples of such content include a link to an Internet site for subscription to the publication 838, instructions on how to subscribe to the publication, a link to an Internet site for accessing and downloading the viewing application 190, information on the viewing application 190 and how to access it, information about the publication 838, issue, article and page number from which the clipping 830 was taken, links to advertisers' web pages, advertisements for products, links to the specific advertiser's web pages in the case where an advertisement is clipped, and information about the publisher and the publisher's other publications.
Addition of User Generated Content
Additional information may be attached to the digital twin 107 (see Figure 10) by a user. That attached information is uploaded to the page server 20 and associated with the specific place on the publication 838 so that it is also accessible by other readers of the publication. This feature can be used to do things such as make submissions to competitions advertised in the publication, to allow users to comment on articles or other content in the publication, to allow users to give feedback to the publishers, or to allow users to share feedback amongst themselves. Optionally, information about or identifying the uploading user can be associated with the information. The information attached to the digital twin 107 can be of a number of types, including text, photo, photo gallery, video, hyperlink, live video feed, audio, live audio feed, drawings, files, applications and user ratings. The user can also start a conversation thread associated with a location in the publication. Other users who have access to the user added content can add to the content already uploaded by other users. Videos and photos can be recorded on the viewing device's camera, either view facing 102 or user facing 108, and directly attached as user added content.
User options are provided on the viewing device 100 to permit users, both the submitting user and other users, to select if and how they view user added content. These options include displaying the user added content directly over the digital twin 107, displaying an icon representing the attachment on the digital twin at the location the information was attached, displaying a notification icon, word or option elsewhere on the viewing device display screen 105, or not displaying any information at all.
User options are provided on the viewing device 100 to permit users to display user attached content based on characteristics of that content, including content from all users, from other users in the current user's geographic region, from users with similar demographic characteristics as themselves, from users with a specific demographic characteristic, content which does not have user information or lacks specific user information, content uploaded on a specific date or uploaded within a selected date or time range, information of a particular type (e.g. all videos or all text comments), or combinations of these selection methods. Most publishers will not want users placing negative user added content on their publication's digital twins 107, and so options are provided to moderate content. Depending on settings selected by the publisher, user added content can appear immediately to all users, it can appear after review and approval by a representative of the publisher, or it can be available only to the publisher or representatives of the publisher. In addition, users can report other user added content to the publisher as inappropriate, incorrect or offensive, to alert the publisher to a potential problem. Where the number of user reports exceeds a threshold value set by the publisher, the content is automatically disabled until the publisher has had an opportunity to review it.
Text Search, Translate, and Speak
At any time the user can select text in the digital twin 107 on the viewing device 100 and request operations to be performed using that selected text. The selection of text can be performed by the user touching the display screen 105 at the position of the text, by the user's initial touch selection being refined by dragging start and end markers displayed on the viewing device 100, by the user speaking the text such that a microphone on the viewing device 100 receives the sound and the viewing device 100 interprets that speech and identifies the corresponding text on the display screen 105, by the user typing sufficient characters from the start and end of the text and having the viewing device 100 search the displayed image and select the text, by the user touching and dragging across the text on the printed page 10 and having the viewing device 100 analyze the images from the view facing camera 102 which is viewing the publication to identify the text over which the user dragged their finger, or by any other text selection method supported on the viewing device 100.
Once selected, the operations that can be performed on that selected text include submitting the text as a search string to a search engine. The search results can be stored on the viewing device 100 for later access, can be stored remotely for later access, or can be displayed immediately on the viewing device 100.
The selected text can also be submitted to a translation function for translation into the default language of the viewing device 100 or some other nominated language. The translated text is either saved on the viewing device 100 for later access, saved remotely for later access, immediately displayed on the viewing device 100 in a layout which is independent of the digital twin 107 on which the text appears, or the translated text is placed over the same location as the selected text on the digital twin 107. The translated text can be a different length from the original text, so the font size is adjusted to ensure that the translated text fits into the same space as the original text, thus avoiding obliterating other elements of the digital twin 107.
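One way to keep the translated text within the space occupied by the original text is to reduce the font size until the rendered text fits the original bounding box, as in the following Python sketch. The text-measurement callback and the sizing heuristic are assumptions for illustration only.

# Sketch: shrink the font size so translated text fits the space occupied
# by the original text. measure_width is assumed to return the rendered
# width of a string at a given font size using the device's font metrics.

def fit_font_size(translated, box_width, box_height,
                  base_size, line_height_ratio, measure_width,
                  min_size=6.0):
    """Return a font size at which the translated text fits the original box."""
    size = base_size
    while size > min_size:
        line_height = size * line_height_ratio
        max_lines = max(1, int(box_height // line_height))
        # Crude estimate: the total rendered width must fit within the
        # number of lines available in the original box.
        if measure_width(translated, size) <= box_width * max_lines:
            return size
        size -= 0.5
    return min_size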
The text can also be submitted to a speech generation engine so that the speech is sounded on the viewing device 100 or headphones attached to the viewing device 100. Either a speech engine local to the viewing device 100 or a speech engine remote from the viewing device 100 can be used to generate the sound.
Drawings and Digital Ink
Drawings may be added to clippings 830 or to a digital twin 107 as user generated content. In addition to adding content which already exists in the user's viewing device 100, the user can also add content such as hand-drawn shapes or annotations in the form of digital ink 158 (see Figure 52A). There are many options available to users to create such digital ink content, including those described in the following paragraphs. In these cases, the lines of required digital ink are recorded as sequences of 2D coordinates.
By tracing their finger on the touchscreen 105 of the viewing device 100, or by using a stylus on the touchscreen of the viewing device 100, the user can directly draw the lines they require to be included as digital ink over the required location on the clipping 830 or digital twin 107.
By using a digitizing tablet, either with or without an overlaid piece of paper, or by the use of a mouse, the user can draw the shapes or text they require to be included as digital ink 158. After completion of the drawn lines, the shape is displayed over the clipping 830 or digital twin 107 and, if considered necessary by the user, it can be moved to the required location relative to the digital twin 107 or clipping 830 using arrow keys, or by dragging using a finger on a touchscreen 105.
By using a sensing pen on coded paper, such as with the Netpage pen system (described in the above referenced Netpage patents incorporated herein by reference), or by using a sensing pen on non-coded paper, the user can draw the shapes or text they require to be included as digital ink 158. After completion of the drawn lines, the shape is displayed over the clipping 830 or digital twin 107 and, if considered necessary by the user, it can be moved to the required location relative to the digital twin 107 or clipping 830 using arrow keys, or by dragging using a finger on a touchscreen 105. The user may trace out the desired line shapes using the hand, finger or a stylus in the air within the view of the view facing camera 102 or user facing camera 108. The camera images are interpreted to find the shape traced, and the shape is recorded as a sequence of 2D coordinates which define the line shapes of the digital ink required. After completion of the drawn lines, the resulting shape is displayed over the clipping 830 or digital twin 107. If considered necessary by the user, it can be moved to the required location relative to the digital twin 107 or clipping 830 using arrow keys, or by dragging using a finger on a touchscreen 105.
By moving their finger or a stylus over the viewed page 10 (see Figure 10) in the shape required, the user can draw the lines and shapes they require to be included as digital ink in the location on the digital twin 107 at which they require the shapes to appear. Typically the view facing camera 102 is used for this purpose. The camera images are interpreted to find the shape traced, and the shape is recorded as a sequence of 2D coordinates which define the line shapes of the digital ink 158 required.
When the digital ink 158 is required to be text the user can type or speak that text. In the case of spoken text, the voice recognition system of the viewing device 100, or some other voice recognition system, converts it to text. The text is rendered as a line font, typically in a style similar to hand-written text, and that output is treated as digital ink.
If the user moves their viewing device 100 in the air in the shape of the digital ink 158 to be recorded, the gyroscope and/or accelerometer of the viewing device 100 determines that shape, flattens it to 2D by projecting it onto the best fit plane, and records it as the required digital ink. In an alternative implementation, the viewing device's camera 102 records images while the user moves the viewing device 100 in the air. The optical flow of the images is used to determine the movement. In another implementation, both the gyroscope/accelerometer and camera methods described here are used, and the two outputs are used together to create a better quality representation of the user's movements.
By using a pen, such as an ultrasonic pen which contains an ultrasonic transducer or emitter that is tracked by multiple microphones either attached to the viewing device 100 or some other device, the required shapes can be drawn and recorded. If the ultrasonic pen recording system is not registered to give sufficient absolute positional accuracy in relationship to the viewing device's display screen 105, then after completion of the drawn lines, the resulting shape is displayed over the clipping 830 or digital twin 107 and, if considered necessary by the user, it can be moved to the required location relative to the digital twin 107 or clipping 830 using arrow keys, or by dragging using a finger on a touchscreen 105.
In each of the above recording methods, only a line is recorded. If the user wishes to apply any colour, line weight or line style other than the default, the viewer application 190 (see Figure 10) provides a method of selecting it.
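Flattening a recorded 3D motion trace onto its best-fit plane, as described above for the gyroscope/accelerometer method, can be done with a least-squares (principal component) fit of the recorded points. The following Python sketch assumes NumPy is available and is illustrative only.

# Sketch: project a 3D motion trace (from gyroscope/accelerometer
# integration or camera optical flow) onto its best-fit plane to obtain
# the 2D digital ink coordinates.
import numpy as np

def flatten_to_plane(points_3d):
    """points_3d: (N, 3) array of positions. Returns (N, 2) coordinates
    in the least-squares best-fit plane, found via SVD."""
    pts = np.asarray(points_3d, dtype=float)
    centroid = pts.mean(axis=0)
    centred = pts - centroid
    # The first two right singular vectors span the best-fit plane;
    # the third is its normal.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    plane_axes = vt[:2]            # (2, 3)
    return centred @ plane_axes.T  # (N, 2) digital ink coordinates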
Another method of recording digital ink 158 is for the user to draw the required lines with a conventional pen directly on the viewed page 10, or a copy of the page 10. The marked up page is then scanned and the scanned image matched to the image of the original reference page 210 and differenced from it. The difference image must be filtered, typically by ignoring relatively small differences, to remove variation in colour resulting from lighting and scanning variations. The remaining differences are treated as digital ink 158. They can be recorded as a raster image, or can be converted to a line image using raster to vector conversion techniques. This technique permits the colour and line weight of the pen used to be recorded as the colour and line weight of the digital ink, and permits the user to draw the exact line style that they require. Alternatively, the colour and line weight of the pen can be overwritten by a colour and line weight in a subsequent step.
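A minimal Python sketch of the differencing step follows, assuming the scanned image has already been registered (aligned) to the reference page image and that both are available as arrays of equal size. The threshold used to ignore small lighting and scanning variations is an arbitrary illustrative value.

# Sketch: extract hand-drawn marks as a raster difference image between a
# registered scan of the marked-up page and the original reference page.
# Assumes aligned, same-size, uint8 RGB NumPy arrays; the threshold that
# filters out lighting/scanning variation is illustrative.
import numpy as np

def extract_digital_ink(scanned, reference, threshold=40):
    diff = np.abs(scanned.astype(int) - reference.astype(int)).max(axis=2)
    ink_mask = diff > threshold          # ignore relatively small differences
    ink = np.zeros_like(scanned)
    ink[ink_mask] = scanned[ink_mask]    # keep the pen's own colour and weight
    return ink, ink_mask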
Extent and Style of User Generated Content Distribution
The interactive viewing system 2 (see Figure 8) provides for users to use clippings 830, user generated content, digital twins 107, digital twins with other users' added content and/or clippings, or a user's own added content and/or clippings, either as files containing the content or URLs pointing to the content on the page server 20 or some other server. The interactive viewing system 2 provides for this by one or more of the following methods:
• Saving the user generated content to their viewing device 100 or another computing device.
• Saving the user generated content to a personal scrapbook website.
• Saving the user generated content to a blog.
• Attaching user generated content to an email for sending.
• Saving or uploading user generated content to a social book-marking site such as del.icio.us.
• Uploading image files of user generated content to photo sharing websites such as Flickr.
In each of the above cases, DRM (Digital Rights Management) and copyright must be adhered to, so the interactive viewing system 2 provides restrictions on the amount of a publication that can be copied to the user's viewing device 100, limiting it to that amount which is considered fair use. In the case where the information is shared through use of a URL to the interactivity server 846 or the publisher's site, that site must implement appropriate restrictions.
Business Models
An income stream reliant on the interactive viewing system 2 (See Figure 10) may be generated by way of the provision of a service to publishers, where publishers pay for their publications to be supported by the interactive viewing system. A fee per page and/or a fee per specific augmentation added to the publication's digital twin 107 may be charged.
Alternatively, publishers may be charged per view of their reference page 210. That is, each time the page server 20 records that a user has viewed a reference page, a charge is levied on the publisher of the viewed publication. Alternatively, each time a user accesses a specific augmentation 220, a charge is levied on the publisher. Alternatively, a levy is charged each time a user clicks through on an advertisement.
Alternatively, a levy is charged per month (or other time period) on the publisher for each page which is currently supported on the page server 20.
It is also possible to charge for supplying analysis of usage characteristics of a publication, for example, to provide a geographic or demographic breakdown of the usage of the interactive viewing system 2, or to provide a detailed breakdown of the viewing habits of users, such as the order in which they view the publication, the number of times it is viewed by users, the number of times specific articles or advertisements are viewed, and the distribution of use of the publication over time.
It is also possible to charge for supplying analysis of "sharing" and "liking" information. This includes detailed information on the number of times particular parts of the publication were shared as clippings, and the number of times these shared clippings were subsequently "shared" or "liked" on specific social networking websites, such as Facebook "sharing" and "liking" counts. It is noted that the use of pre-defined clipping regions 840 (see Figure 58) facilitates this information gathering process on some social networking websites, as it permits the clipping 830 to have a common unique URL for all times it is shared.
It is also possible to derive income from directing searches towards search sites, such as Google, which will pay a small amount each time a Google search is initiated via the interactive viewing system 2.
Advertisements displayed by the viewing application 190 can be directly charged for at, say, start-up. In addition, advertisements displayed to a user can also be directly charged for as they use the website for managing their clippings 830, history 878 and personal information (see Figure 60).
A "subscribe" link is placed in the viewing application 190, which allows the user to directly subscribe to the publication 838. Each time this link is used, a charge can be levied on the publisher. Similarly, whenever a clipping 830 is viewed, a "subscribe" link is shown, which permits the viewer of the clipping to subscribe to the magazine from which the clipping was taken. Each time this link is used, a charge can be levied on the publisher.
When the viewing application 190 identifies specific components of the publication as relevant to the user in some way, such as the name of a song or book, it shows a link to purchase this item from an appropriate sales website. When the user clicks through on that link, the operator of the sales website typically will pay a small fee to the redirecting website. The accumulation of these charges provides a further income stream.
User Interface Features
The following User Interface (UI) features are described in relation to a typical smartphone display screen 105. However, it will be appreciated that references to a smartphone are by way of example only. In general, features of the UI described below may be applicable to any viewing device 100, as previously described.
Serving the Digital Twin to the User
When the user opens the viewing application 190 on their smartphone 100, the user is presented with a live video image having the appearance of a typical camera preview image. A text prompt appears on the display screen 105 instructing the user to point the view facing camera 102 at a page 10 (for example, a magazine page). In the cover match implementation, the text prompt instructs the user to point the camera 102 at a cover page of a magazine. In other implementations, the user may be instructed to point the camera 102 at any page of a magazine.
An important user interface element relating to the process of recognition and tracking the digital twin 107 will now be described with reference to Figure 10. As described, in normal flow, the viewing application 190 sends a match request 260 of a captured video frame to the page server 20 through the respective network interfaces 120 and 121. The page server 20 recognizes the reference page 210 corresponding to the frame in the match request 260, and returns a match response 280 to the viewing device, typically a smartphone 100. The smartphone 100 then sends a content request 290 for required resources to the page server 20. These resources normally include data required to track the page, the digital twin 107 and content augmentation 220. These resources are transferred to the smartphone via a content response 300. Once the resources are available locally on the smartphone 100, the viewer application 190 configures the conductor module 160 to track the page 10 (described in detail above in "The Conductor Module" sub-section). The conductor module 160 then prepares a rendered version of the digital twin 107 suitable for display on the touchscreen 105. Only after all these steps have completed can the viewer application 190 track the page 10 and show the digital twin 107 with augmentation 220. The steps from capturing the camera frame until display of the tracked digital twin can suffer considerable delay in terms of normal user interface interaction. Delays in the range of 1 to 10 seconds are possible. In practice, users often move the smartphone 100 away from the page 10 they initially viewed during this delay, resulting in failure to track the page once the above steps are complete. To encourage the user to maintain the camera view of the page 10 during this delay period, a deliberately misleading user interface element is used. A user interface element that leads the user to believe that the smartphone 100 is still analyzing the video image is presented to encourage them to hold the smartphone still, so that it is in a suitable position for live digital twin 107 tracking to commence once the above steps are complete. In one embodiment, this user interface element consists of the live video images from the camera 102, overlaid with a static reticule of similar visual appearance to a traditional camera reticule. In addition, a sliding bar is dynamically shown moving up and down within the reticule with a period of a few seconds to give the impression that the video is being scanned or analyzed. This visual element is displayed in the interval from when the smartphone 100 receives a successful match response 280 from the page server 20, until the tracked digital twin 107 is displayed, although other active periods could also be used.
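The client-side sequence described above, including the "scanning" reticule shown during the recognition delay, might look roughly like the following Python sketch. Every function and object named here is a placeholder for behaviour described in this document rather than an actual API.

# Sketch of the client-side flow from camera frame to tracked digital twin,
# with the "scanning" reticule shown during the recognition delay. All
# names below are placeholders for the components described in the text.

def recognise_and_track(camera, page_server, conductor, ui):
    frame = camera.capture_frame()
    match = page_server.match_request(frame)           # match request 260 / response 280
    if not match.success:
        ui.show_live_video(camera)
        return False

    # Keep the user pointing at the page while resources download:
    # live video overlaid with a static reticule and a moving scan bar.
    ui.show_scanning_reticule(camera)

    content = page_server.content_request(match.page_id)   # content request 290 / response 300
    conductor.configure(tracking_data=content.tracking_data)
    twin = conductor.render_digital_twin(content.digital_twin,
                                         content.augmentation)
    ui.hide_scanning_reticule()
    ui.show_tracked_twin(twin)
    return True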
Once the page server 20 has recognized the viewed page 10, the user receives immediate feedback that page recognition has occurred. This feedback may be a displayed message (e.g. "Page Recognized"). Alternatively or additionally, the user may be presented with a thumbnail image, such as a thumbnail image of the cover page of the magazine. Immediate feedback regarding page recognition is important in order to keep the user's attention while more data (typically the view finder bundle 240) is downloaded to the user's smartphone 100. A progress bar or similar may be displayed to the user during downloading of the view finder bundle(s) 240 for the view finder module 130. Once all the relevant data has been received by the smartphone 100, the user is presented with a dynamic display of the digital twin 107 - that is, a virtual reality display of rendered digital content 116 corresponding to the viewed page 10, which is updated in real-time as the smartphone 100 is moved. In the event that the viewed page 10 is not recognized or the page server 20 has been unable to disambiguate between pages from different magazines, the user may be presented with thumbnail images of a plurality of magazine covers and prompted to select the correct magazine. The thumbnail images presented to the user may be selected using contextual information such as browsing history 878 (see Figure 60) known for that user or smartphone 100.
Encouraging Optimal Positioning of Smartphone
While holding a smartphone 100 to track a page 10 there is a marked tendency for users to position their smartphones 100 too close to the page 10. This is disadvantageous, because a shorter focusing distance generates relatively poor quality camera images. With poor quality camera images, both the view finder module 130 and server-side page recognition cannot perform optimally, potentially causing failures in page recognition. Further, users typically do not respond to on-screen text prompts, such as "Poor image quality. Hold phone further away from page". Many users find text prompts irritating and either do not read the prompt or do not respond to the instructions in the prompt.
In order to address this problem, the viewing application 190 is configured to display both the live camera image and the 'live' digital twin 107 at a zoom level which gives the appearance to the user that the smartphone 100 is closer to the viewed page 10 than it is in reality. In other words, the viewer application 190 displays live camera images and the live digital twin 107 at a deliberately zoomed-in level compared to the smartphone's usual camera preview images displayed when the viewer application 190 is not running. Effectively, the viewer application tricks the user into holding the smartphone 100 further away from the page 10 at an optimum distance, without explicit prompting. The extra zoom amount is normally 4% to 10%. This UI feature is effective at improving the overall performance of the interactive viewing system 2.
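By way of illustration, the deliberate zoom-in can be achieved by centrally cropping each camera frame by the extra zoom factor before it is displayed (the display layer then scales the crop to fill the screen). The following Python sketch assumes NumPy image arrays and uses an illustrative 6% factor from the 4% to 10% range mentioned above.

# Sketch: crop the central portion of the camera frame so the preview
# appears slightly more zoomed-in than the raw image, encouraging the user
# to hold the phone further from the page. Assumes a NumPy image array;
# the cropped frame is scaled back up to screen size by the display layer.

def zoomed_preview(frame, extra_zoom=0.06):
    """Return a centrally cropped frame simulating extra_zoom (e.g. 6%)."""
    h, w = frame.shape[:2]
    crop_h = int(h / (1.0 + extra_zoom))
    crop_w = int(w / (1.0 + extra_zoom))
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return frame[top:top + crop_h, left:left + crop_w]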
Sequential Download of Data
In many cases, users expect to interact with displayed content as soon as it appears on the screen of their smartphone 100. For this reason, it may be advantageous not to display the digital twin 107 until all the necessary data contained in the view finder bundle 240 has been downloaded. This avoids the user having a potentially negative experience of, for example, seeing the digital twin 107 but being unable to perform any interactions therewith. However, a disadvantage of this approach is that users have to wait longer until something tangibly useful appears on their screen.
In an alternative approach, downloads may be scheduled so that different packets of data are received sequentially. In this approach, the user is provided with increasing levels of interactivity during the download sequence until full interactivity is experienced. Referring again to Figure 10, a download schedule for a single reference page 210 may be in the order of: (1) optional thumbnail image; (2) page image; (3) tracking data for the view finder module 130 (i.e. the set of image descriptors in the view finder bundle 240); (4) augmentation 220 defining the interactive functions on the reference page 210; (5) word index enabling word searches. The scheduled download approach has the advantage that the user quickly receives something tangible (i.e. a PDF or other format image) without having to wait for the full range of interactive options in relation to the displayed digital twin 107.
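The staged download for a single reference page can be expressed as an ordered list of resource fetches, each of which unlocks a further level of interactivity, as in the following Python sketch. The resource names follow the order listed above; the fetch and notification callbacks are placeholders.

# Sketch: sequential download schedule for a single reference page, in the
# order listed above. fetch(resource) and on_ready(resource, data) are
# placeholders for the actual transfer and UI-update mechanisms.

PAGE_DOWNLOAD_SCHEDULE = [
    "thumbnail",        # (1) optional thumbnail image
    "page_image",       # (2) page image - something tangible appears early
    "tracking_data",    # (3) image descriptors for the view finder module
    "augmentation",     # (4) interactive functions on the reference page
    "word_index",       # (5) enables word searches
]

def download_page_resources(fetch, on_ready, schedule=PAGE_DOWNLOAD_SCHEDULE):
    for resource in schedule:
        data = fetch(resource)
        on_ready(resource, data)   # each stage unlocks more interactivity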
When the viewed substrate 10 is a page spread containing two (or more) pages, a download schedule may differ for each page of the spread in order to provide the user with the smoothest possible virtual reality experience. The user initially views a first page of the spread, which is the primary page. For this primary page, the download schedule may be in the order described above enabling the user to view and interact with the primary page.
Once the relevant display data for the primary page has been downloaded, then display data for the secondary page may be downloaded, either automatically in accordance with a typical caching strategy or when the smartphone 100 is moved over the secondary page. The download schedule for the secondary page may be in the following order: (1) a placeholder; (2) tracking data for the view finder module 130; (3) reference page 210; (4) augmentation 220 defining the interactive functions; (5) word index. Significantly, the secondary page is initially represented by a placeholder (i.e. a frame for a blank page) and tracking data for the secondary page is prioritized over the reference page image 210. This enables the user to smoothly track over the primary page and secondary page (initially represented by a placeholder) even before the secondary page image is visible. From the user's perspective, the different download schedule for the secondary page, compared to the primary page of the spread, provides a much smoother virtual reality experience.
Display of Digital Twin
Once the view finder bundle 240 has been downloaded and the relative position and orientation of the smartphone 100 determined, the live video image is replaced with the digital twin 107. Replacement of the live video image with the digital twin 107 may be performed by cross-fading so that the appearance of the digital twin 107 is not jerky and 'feels' as if the live video image is still being viewed. This cross-fading increases the sense of virtual reality. In this scenario, the user is presented with relatively subtle on-screen hints to indicate that the digital twin is being viewed as opposed to the live video image. The on-screen hints may comprise: the appearance of a utility button such as the "Clip & Share" button 828 (see Figure 58) and/or a change in the visual display on the screen 105 such as the appearance of a header 880 (see Figure 59) containing the magazine title 838 and an optional thumbnail 882 corresponding to that magazine. Figure 56 shows a typical UI for viewing a digital twin 107 having a "Clip & Share" button 828 and a header bar. The live video image may be replaced with the digital twin 107 in a more impactful manner to provide a stronger indication that the user is now viewing the digital twin 107. For example, an explosion or unfolding effect may be used when replacing the live video image with the digital twin.
Clipping and Sharing
The ability for users to clip and share printed content via their smartphones is one of the key aspects of the viewer application 190. Therefore, a "Clip & Share" button 828 is usually a prominent component of the User Interface.
User-Generated Clippings
Referring to Figure 56, there is shown a typical UI when the smartphone 100 is dynamically displaying the digital twin 107. The main part 500 of the UI is dedicated to displaying the digital twin 107, and a "Clip & Share" button 828 appears at the bottom of the screen 105. The header 880 may comprise an "Options" button 884, the magazine title 838 (with optional information regarding month, year, volume number, issue number, bind edition etc) and a thumbnail image 882 of the relevant magazine cover. Once the user has a desired portion of the digital twin 107 framed in the main part of the touchscreen 105, the user can tap the "Clip & Share" button 828 to clip the framed image. Usually, tapping the "Clip & Share" button 828 is accompanied by audible feedback, such as a camera shutter sound. Referring to Figure 57, the UI then displays the clipping 830 to the user with various options for interacting with the clipping 830, such as the clip disposition options 832 (for example, via Facebook, Twitter, e-mail, SMS etc) or the save clipping button 866 to store it locally on the smartphone 100. The various sharing options available may be defined by user preferences. The displayed clipping 830 may have a torn edge 834 (see Figure 64) appearance to indicate that it is a clipping 830 derived from an image framed by the user.
Smart Clippings
Figure 58 shows an alternative scheme for clipping regions 840 of a page defined in the digital twin 107. In this scenario, the user is not required to frame the desired portion of the digital twin and tap the "Clip & Share" button 828. Instead, tapping on a position contained by a defined region 840 of the digital twin 107 automatically clips that defined region 840 (or alternatively clips the entire reference page 210). Referring to Figure 59, the clipping 830 is displayed to the user with various options 886 relating to the displayed clipping, such as visit website, show on map, phone call, connect on Facebook, or share 888 (taking the user to the clipping sharing screen shown in Figure 57). The type of clipping 830 shown in Figure 59 is a "smart clipping", because it is predetermined by the interactive viewing system and will be the same for all users who tap within the corresponding defined region 840 of the digital twin 107. Accordingly, the options 886 appearing in relation to the "smart clipping" can be tailored to the content of that clipping 830. The options 886 appearing in Figure 59 may be suitable for a restaurant advertisement. Different options 886 may appear for a smart clipping containing a coupon, such as save coupon, share coupon, explore products and so on.
Accessing Saved Clippings
Clippings 830 may be saved locally on the smartphone 100 and accessed later by the user, even when the camera 102 is not facing the viewed page 10 from which the clipping was derived. Equally, a user may receive a clipping 830 from another user and save this locally on their viewing device 100. In order to access saved clippings, the user taps the "Options" button 884 in the UI shown in Figure 56. Referring to Figure 60, the user is presented with an options menu from which "My Clips" 868 may be selected. When the user selects "My Clips" 868, the user's saved clippings 830 are displayed as large thumbnail images 870 as shown in Figure 61 or as a list 862 as shown in Figure 62 containing a small thumbnail image 890 together with clipping information, such as magazine title and issue date 876, and date of clipping 874. Figure 63 shows clippings in the list 862 organized in accordance with magazine title 838, so that all clippings 830 derived from the same magazine title 838 are stored under the same heading together with the number of clippings 892 associated with that magazine title. The user may switch between the thumbnail view of Figure 61 and the list view of Figure 62 via the clips view button 894 and the list view button 896.
From any one of the clippings display pages, the user may tap to display a desired clipping 830 (see Figure 64). The user is then presented with a display of the clipping together with various disposition options 832 or other options relevant to that clipping, including a delete button 864. With the clipping 830 displayed, the user can display the full clipping 830 in static mode (described above) enabling panning, zooming and, optionally, access to the range of interactive functions associated with the clipping. The user may be provided with an access button 898 to the reference page 10 or publication from which the clipping 830 was derived via the static mode. However, the range of content available to the user via clippings may be limited by Digital Rights Management (also discussed above). For example, publishers may elect to not provide full access to magazine content via clippings so as to prevent excessive harvesting of digital magazine content via the interactive viewing system 2.
Responsive Interface Display Screen
The display screen 105 of the viewing device in the form of a smartphone 100 is interactive and responsive to a variety of conditions, as follows:
Rapid Movement of Smartphone
In normal usage, the smartphone 100 employs a combination of the view finder module 130 and optical flow module 150 to determine its position and orientation relative to a viewed page 10 (see Figure 10). This enables a 'live' virtual reality display which is smoothly updated as the smartphone 100 moves relative to the page 10. Sometimes, the user may move the smartphone 100 so quickly that view finder module 160 and optical flow module 150 both fail due to the blurriness of camera images. In this scenario, the UI switches to a display of a live camera image until such time that the view finder module 160 is able to find the correct position and orientation of the smartphone 100 relative to the page 10. Of course, if the view finder module 160 still fails with a high quality camera image, then the smartphone 100 reverts to page recognition in the page server 20 via a match request 260 in the usual manner.
Seamlessly switching to a live camera image when the view finder module 160 and optical flow module 150 both fail ensures that the digital twin 107 does not become stuck in one position and jerkily move to another position once the view finder module 160 succeeds again. From the user's perspective, the illusion of virtual reality is maintained even though the user is not actually viewing the digital twin 107 during a rapid movement of the smartphone 100.
Page Moves Outside Field of View
Referring to Figures 7 and 10, if the user moves the smartphone 100 so that the camera 102 no longer has a viewed page 10 within its field of view 14, then page recognition by the view finder module 160 and page server 20 will fail. Once this happens, the display reverts to the live video image and a 'throbber' icon is displayed. The throbber icon indicates to the user that the viewer application 190 is still attempting page recognition.
After a predetermined period of time has elapsed without successful page recognition, the smartphone 100 enters a hibernate mode, typically accompanied by a brief vibration. In the hibernate mode, the smartphone 100 is no longer attempting page recognition in order to save the resources of the processor 106 and conserve battery power. The user may be presented with a "Try Again" prompt whilst in hibernate mode. When the user taps this prompt, the smartphone 100 will attempt page recognition again using camera images.
Blackness Detection
The viewing application 190 has a black check module 140 which can detect when the smartphone 100 is lying flat against a page or other surface. Once it is detected that the smartphone 100 is lying flat against the viewed page 10, the page server 20 does not attempt page recognition and the view finder module 160 samples camera images less frequently. Both measures save the resources of the processor 106 in the smartphone 100 and conserve battery power.
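By way of illustration, the black check can be as simple as testing the mean luminance of the most recent camera frames against a threshold, as in the following Python sketch. The threshold and frame count are illustrative assumptions, not the actual parameters of the black check module 140.

# Sketch: detect that the phone is lying flat on the page by checking that
# recent camera frames are almost entirely dark. Assumes greyscale NumPy
# frames with values 0-255; threshold values are illustrative.
import numpy as np

def is_covered(recent_frames, luminance_threshold=12, required_frames=5):
    """Return True if the last few frames are all essentially black."""
    if len(recent_frames) < required_frames:
        return False
    return all(np.mean(f) < luminance_threshold
               for f in recent_frames[-required_frames:])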
Instead of merely displaying a black screen 105 when the camera 102 is flat against the page 10, the smartphone 100 automatically switches to static mode. In static mode, the most recently recognized digital twin 107 is displayed. The user can navigate around the reference page 210 corresponding to the digital twin 107 in static mode via conventional pan and zoom interactions. Furthermore, all interactivity associated with the digital twin 107 is typically preserved in static mode.
When the smartphone 100 is in static mode initiated by the black check module 140, a 'sleeping' icon appears on the touchscreen 105 together with the displayed digital twin 107.
Static Mode
The viewer application 190 typically enters static mode when the smartphone 100 is lying flat against a page 10 or other surface. Indeed, this is one means by which the user can enter static mode.
Alternatively, the user may enter static mode via a gestural interaction with the smartphone's touchscreen 105 or by tapping an onscreen button. For example, a tap and hold gesture, a pan gesture or a pinching gesture may all be used by the user to enter static mode. If the user requests static mode via one of these gestures, then the dynamic virtual reality display of the digital twin 107 is rolled or folded into a corner of the display screen 105 and replaced with the static digital twin 107, which is navigable via on-screen gestural interactions. Meanwhile, the corner of the display screen 105 continues to show the dynamic virtual reality display of the digital twin 107, which is updated as the smartphone 100 moves relative to the viewed page 10. By tapping in this corner, the user can exit static mode and re-enter dynamic mode whereby the dynamic virtual reality display of the digital twin 107 returns full screen (typically via an unfolding or unrolling animation). Whilst in static mode, areas beyond the reference page 210 edges of the digital twin
107 are filled with suitable background wallpaper.
Static mode is useful from the user's perspective, because panning and zooming interactions have become a natural way for many users to navigate content displayed on smartphones 100. Moreover, fine interactions with the displayed digital content 116, such as the selection of text for searching, are usually easier to perform in static mode when there is no movement of the digital twin 107 due to camera shake and so on.
Orientation of Digital Twin
Most smartphones 100 contain an accelerometer for sensing an orientation of the phone and adjusting the orientation of the display on the touchscreen 105 (portrait or landscape) depending on the sensed orientation. The positions of buttons and other visual features in the screen 105 are usually changed when switching between landscape and portrait orientations. Although this is a convenient feature for smartphone users, it has significant shortcomings. Since the accelerometer relies on gravity, the display on the screen 105 does not rearrange when the phone is lying horizontally. When lying flat, the display does not rearrange when the smartphone 100 is rotated, which can be frustrating for users.
The view finder module 160 determines the smartphone's orientation relative to a viewed page 10. Unlike the smartphone's internal accelerometer, this orientation determination has no dependency on gravity. Therefore, the determined orientation of the smartphone 100 relative to a viewed page 10 can be used to rearrange visual elements displayed on the screen 105, irrespective of whether or not the phone is being held horizontally.
Buttons and other on-screen visual features will rearrange depending on the phone's orientation (portrait or landscape) relative to the viewed page 10. Thus, a consequential advantage of the viewer application 190 is that this on-screen visual feature rearrangement functions consistently, even when the phone is being held horizontally relative to the viewed page 10.
Display of Interactive Buttons
The interactive viewing system 2 enables users to interact with augmented reality features of the digital twin 107, such as buttons overlaying the virtual reality display. However, it is equally important that these interactive buttons do not unduly clutter the display and are at the same time discoverable by the user.
Zoom-Dependent Display of Buttons
In order to avoid clutter in the relatively small display screen of a smartphone, the number of interactive buttons (or other interactive features) viewable via the digital twin 107 may change depending on the zoom of the page 10. This is applicable both in dynamic mode (displaying a 'live' digital twin) and static mode. For example, when viewing a whole page in dynamic mode, the user may be able to see only a first interactive button in the digital twin 107. The user may tap the first interactive button to display a plurality of second buttons or menu options. However, if the smartphone 100 is held closer to the page 10 in the region of the first button, then the second buttons may automatically appear in the digital twin 107 at this zoomed-in display, providing the user with direct access to the interactivity associated with these second buttons via the digital twin 107. This zoom-dependency of the number of buttons displayed in the digital twin 107 avoids clutter in cases where a small region of a page has a number of associated interactive buttons.
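A Python sketch of this zoom-dependent filtering follows, assuming each interactive button in the augmentation carries a minimum zoom level at which it becomes visible. This per-button attribute is an assumed representation for illustration, not the system's actual augmentation format.

# Sketch: show only the buttons whose minimum zoom level has been reached.
# The 'min_zoom' attribute per button is an assumed representation.

def visible_buttons(buttons, current_zoom):
    """buttons: iterable of dicts like {'id': 'play', 'min_zoom': 2.0}."""
    return [b for b in buttons if current_zoom >= b.get("min_zoom", 0.0)]

# Example: at whole-page zoom only the first-level button is shown; when
# the user zooms in, its second-level buttons appear automatically.
page_buttons = [
    {"id": "offers",        "min_zoom": 1.0},
    {"id": "offers.video",  "min_zoom": 2.5},
    {"id": "offers.coupon", "min_zoom": 2.5},
]
print([b["id"] for b in visible_buttons(page_buttons, 1.0)])   # ['offers']
print([b["id"] for b in visible_buttons(page_buttons, 3.0)])   # all three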
Discovery of Interactive Buttons
In cases where a user is viewing part of a digital twin 107 having no interactive features, the user may wish to discover if any part of that digital twin contains interactive features. This may be achieved in a number of different ways, such as: providing a miniature map of the digital twin 107 as part of the display; a glowing border region of the display in the direction of interactive features; displayed arrows pointing in the direction of interactive features; a display of interactive button(s) "squashed" into a border region in the direction of these buttons; an on-screen text prompt, etc.
Predetermined Dwell Time Initiating Interactive Features
Typically, interactive content is accessed by the user via tapping on a button displayed as part of an augmented reality view. For example, video content may be accessed by tapping a video playback icon 104 (see Figure 9) which appears in the digital twin 107. Alternatively, video content may be initiated automatically when the user views the relevant part of the digital twin 107. In some embodiments, automatic video playback may be initiated after a predetermined dwell time in the relevant part of the digital twin 107. In general, dwell times may be used to control what interactive options are available to the user. For example, the number of interactive buttons appearing in the digital twin 107 may be dependent on dwell time in a similar manner to the zoom-dependency described above. With a longer dwell in a particular region of the digital twin, more interactive buttons may appear in the digital twin displayed to the user.
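Dwell-time triggering can be implemented by accumulating the time the same region of the digital twin remains in view and firing the associated action once a threshold is crossed, as in the following Python sketch. The region identifiers, the two-second threshold and the callback are illustrative assumptions.

# Sketch: trigger an interactive action (e.g. automatic video playback)
# after the user has dwelt on a region of the digital twin for a set time.
import time

class DwellTracker:
    def __init__(self, threshold_s=2.0):
        self.threshold_s = threshold_s
        self.current_region = None
        self.entered_at = None
        self.triggered = set()

    def update(self, region_id, on_dwell):
        """Call once per frame with the region currently in view (or None)."""
        now = time.monotonic()
        if region_id != self.current_region:
            self.current_region = region_id
            self.entered_at = now
            return
        if (region_id is not None and region_id not in self.triggered
                and now - self.entered_at >= self.threshold_s):
            self.triggered.add(region_id)
            on_dwell(region_id)       # e.g. start video playback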
Use of Animation
The type of display experienced by the user may be determined by a publisher. In some cases, the user may simply view a plain image of the printed page in the digital twin. However, in other cases it may be desirable to provide greater impact to the user via the digital twin 107 without necessarily detracting from the virtual reality experience of the viewed page 10.
Cinemagraph Animations
Cinemagraph animations provide an excellent means for augmenting the digital twin 107 without detracting from the virtual reality experience. Rather than viewing a plain image of a printed page, the user is presented with a digital twin containing subtle animated features, which enhance the richness of the viewed content. For example, a photo of a face may be augmented with periodically blinking eyes; a liquid may periodically drip from a container; hair may be periodically blown and so on.
Enhanced Button Displays
Interactive buttons in the digital twin 107 may be subtly enhanced with animation so that they are more appealing or enticing for users. For example, an interactive button may be provided with a sparkling or twinkling effect.
Nested Content
Another type of animation comprises the use of nested still content - that is, periodically switching between still photos relating to different variants of a product. For example, an ice cream advertisement may be nested with stills of different flavours; a laptop advertisement may be nested with stills of different colors.
CPU Usage
Some displayed digital twins will consume more resources in the processor 106 (see Figure 10) than others. For example, a digital twin 107 containing a cinemagraph animation or an interactive game will place higher demands on the processor 106 of the smartphone 100 than the display of a plain PDF page. It is important to provide publishers with the option of more complex graphics, whilst at the same time ensuring that an acceptable virtual reality experience is maintained. Moreover, it is essential to avoid the worst-case scenario of a system crash caused by overloading the smartphone's processor 106.
Accordingly, the data downloaded to the smartphone with the digital twin may contain an instruction for the viewer application 190 to pare back the view finder module 160 sampling rate in connection with that page. Thus, the number of times per second that the view finder module 160 attempts to match features in camera images may be reduced if the graphics demands of the displayed digital twin 107 are high. The reduction in the view finder module 160 sampling rate places fewer demands on the processor 106, which frees up more processing resources for displaying complex graphics content. Usually, this reduced sampling rate will go unnoticed by the user.
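By way of illustration, such throttling might look like the sketch below; the per-page "low_sample_rate" flag, the specific rates and the ViewFinder class are assumed names for this example, not the actual interfaces of the viewer application 190 or the view finder module 160.

# Illustrative sketch only (Python); flag name and rates are assumed.
import time

class ViewFinder:
    NORMAL_HZ = 15.0   # match attempts per second for plain pages (assumed value)
    REDUCED_HZ = 5.0   # reduced rate for graphics-heavy digital twins (assumed value)

    def __init__(self):
        self._last_attempt = 0.0
        self._rate_hz = self.NORMAL_HZ

    def apply_page_settings(self, page_bundle):
        """page_bundle: dict-like data downloaded with the digital twin."""
        heavy = page_bundle.get("low_sample_rate", False)
        self._rate_hz = self.REDUCED_HZ if heavy else self.NORMAL_HZ

    def should_attempt_match(self):
        """Called for each camera frame; True when features should be matched now."""
        now = time.monotonic()
        if now - self._last_attempt >= 1.0 / self._rate_hz:
            self._last_attempt = now
            return True
        return False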
Text Prompts to Emphasize Displayed Content
In some instances, it is necessary for the touchscreen 105 to display text prompts to the user. For example, the user may forget to point their smartphone camera at the cover page of a new magazine when switching between different magazines. In this case, the user may receive a text prompt such as "Point Camera at Cover Page". Part of the challenge of app development is overcoming most users' natural disinclination to read or respond to text prompts overlaid on a relatively distracting video display. Equally, it is frustrating for users to be presented with a blank screen containing only a text prompt. From the user's perspective, this 'feels' like the app has stopped working.
The viewer application 190 addresses this problem by adjusting the live video display (either the live camera image or the 'live' digital twin) whenever a text prompt is displayed. The adjustment may be made by, for example, desaturating the live video display, fading or reducing its luminance, displaying it in black and white, or a combination of these measures.
This subtle adjustment of the live video display focuses the user's attention on the text prompt, but maintains the feeling that the viewer application 190 is still functioning properly.
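A minimal sketch of such an adjustment is shown below, assuming RGB frames held as NumPy uint8 arrays; the blend factors are arbitrary illustrative values and the function is not taken from the viewer application 190.

# Illustrative sketch only (Python/NumPy); frame format and factors are assumed.
import numpy as np

def dim_for_prompt(frame, desaturate=0.7, fade=0.6):
    """Return a desaturated, darkened copy of an RGB frame (uint8, HxWx3)."""
    f = frame.astype(np.float32)
    gray = f.mean(axis=2, keepdims=True)                # simple luminance proxy
    mixed = (1.0 - desaturate) * f + desaturate * gray  # pull colours toward grey
    return np.clip(mixed * fade, 0, 255).astype(np.uint8)

# frame = camera.latest_frame()        # hypothetical camera source
# show(dim_for_prompt(frame))          # then draw "Point Camera at Cover Page" on top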
Regions Beyond Page Edges of Digital Twin
When the user is viewing the digital twin 107 in dynamic mode, it is desirable to provide the user with the experience of virtual reality whilst at the same time making it clear when the user is viewing the digital twin. In order to maintain a virtual reality experience, the regions beyond page edges of the digital twin 107 typically display a live camera image. However, in order to emphasize the extent of the digital twin 107, this live camera image may be adjusted. For example, the live camera image may be desaturated, faded, displayed in black and white, or similar.
Recording User Interactivity Data
As well as providing the user with digital content via a virtual reality or augmented reality display, the viewing system may record data relating to the user's interactivity with the displayed digital content. This data may be useful to, for example, advertisers and publishers, enabling them to assess the impact of a particular printed advertisement or article. In some instances, recorded user interactivity data may be used to determine an amount paid by an advertiser to, for example, a magazine publisher.
Additionally or alternatively, user interactivity data may be used to enable the interactive viewing system 2 to build up a personalized profile associated with each user identity. Each personalized profile may reflect the user's previous interactivity. This not only assists in improving the accuracy of subsequent page recognition, but also provides valuable information about each user which may be used for future direct marketing.
Typically, user interactivity data will be recorded after a viewing session has finished, in order to maximize the confidence of the recorded data. This enables the accuracy of page recognition to be refined before the corresponding user interactivity data is recorded. For example, in the case where the same printed advertisement appears in publications A and B, the page recognition process may initially determine that the user is viewing publication A with 60% confidence, perhaps based on publication A being the more popular publication in the absence of other data to improve the confidence of page recognition. However, the page recognition process may subsequently determine that the user was, in fact, viewing publication B with 95% confidence, perhaps based on a view of an adjacent page of the same publication, which increases the confidence that the user was viewing publication B. In this scenario, the user interactivity data records that the user viewed the printed advertisement from publication B, even though the page recognition process initially determined that the user was viewing publication A. It is important that advertisers receive information that is as accurate as possible regarding user interactivity with printed advertisements, and the flexibility to refine user interactivity data in this way maximizes the accuracy of the recorded interactivity data. In some cases, it may be useful to record (and report) interactivity data with a corresponding confidence parameter. For example, an advertiser may only be prepared to pay a fee to a particular publisher if user interactivity with a printed advertisement is recorded above a predetermined confidence level (e.g. 90%).
Refinements of user interactivity data may be made immediately after a viewing session as described above or at a later date. In some cases, refinements of a user profile based on an accumulation of user interactivity data may cause refinements to previously recorded user interactivity data.
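The deferred, confidence-based recording described above might be sketched as follows; the ViewEvent/Session model, the 90% billing threshold and the refinement rule (keep the highest-confidence identification of each advertisement) are illustrative assumptions rather than the system's actual data structures.

# Illustrative sketch only (Python); data model and threshold are assumed.
from dataclasses import dataclass, field

@dataclass
class ViewEvent:
    ad_id: str
    publication: str
    confidence: float

@dataclass
class Session:
    events: list = field(default_factory=list)

    def observe(self, ad_id, publication, confidence):
        self.events.append(ViewEvent(ad_id, publication, confidence))

    def finalize(self, billing_threshold=0.9):
        """After the session, keep the best identification per advertisement."""
        best = {}
        for e in self.events:
            if e.ad_id not in best or e.confidence > best[e.ad_id].confidence:
                best[e.ad_id] = e
        # Record everything; bill only views identified above the threshold.
        billable = [e for e in best.values() if e.confidence >= billing_threshold]
        return list(best.values()), billable

s = Session()
s.observe("ad-123", "publication A", 0.60)   # initial guess
s.observe("ad-123", "publication B", 0.95)   # refined via an adjacent page view
recorded, billable = s.finalize()
print(recorded[0].publication, len(billable))  # publication B 1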
Various types of user interactivity data may be recorded by the interactive viewing system 2 as described below.
Views
The number of views received by a particular publication, page, article, advertisement etc. may be recorded by the interactive viewing system 2. The number of views received by a particular publication may be useful for a publisher to determine, for example, a proportion of its readership that is engaging in interactive viewing. The number of views received by a particular printed article may be useful for a publisher to determine the popularity of that article and, potentially, affect future editorial decisions. The number of views received by a particular printed advertisement may be useful for an advertiser to assess the impact of that printed advertisement and/or determine an amount payable by the advertiser to a publisher.
Likewise, the viewing history of a user may be used to update that user's profile. This may be used to improve the accuracy of subsequent page recognition, as well as to gather statistics on a user's viewing habits for other uses.
Dwell Times
The period of time a user dwells on a particular publication, page, article, advertisement and so on may be recorded by the system. This is potentially useful information for a publisher, because it reveals the viewing habits of its readership.
Dwell times may be helpful in distinguishing between intentional interactions with a printed advertisement or article and accidental or coincidental views of the printed advertisement or article. Again, advertisement dwell times may be useful for determining an amount payable by the advertiser to a magazine publisher.
Eye-Tracking Data
Viewing devices 100, such as smartphones and tablet computers, may be equipped with a user-facing camera 108 (see Figure 39). The user-facing camera may be used to track the movements of the user's eye 818 in respect of displayed digital content 116. Eye-tracking data may be used to provide more accurate information on, for example, dwell times in respect of a particular article or advertisement. Additionally, eye-tracking data may also help to distinguish between intentional views and accidental or coincidental views of a particular advertisement or article. It will be appreciated that recorded eye-tracking data potentially provides very specific information on users' viewing habits.
Click-Throughs
Click-throughs initiated by interactively viewing a particular publication, page, article, advertisement or other page element may be recorded by the interactive viewing system 2. Click-throughs demonstrate a strong degree of engagement between the user and the printed content, and this is potentially valuable information for publishers, advertisers and the like. In the present context, a click-through broadly encompasses any type of user engagement with the displayed digital content 116 other than simply viewing that content. In the case of a smartphone 100 or tablet computer 766, a click-through is typically initiated by tapping on a zone of a touchscreen 105.
A click-through may result in the user navigating to an internet resource, e.g. a user tapping an on-screen "Buy Now" icon 119 (see Figure 50A) and navigating to a suitable merchant webpage, such as Amazon. In other cases, a click-through may simply result in an augmented reality display changing, e.g. a user tapping an on-screen "Reveal Comments" icon. In other cases, a click-through may result in initiation of a download, e.g. a user tapping an on-screen "Download Coupon" icon. These and other types of click-through will be readily apparent from the present disclosure.
Purchases
A click-through (e.g. from viewing a printed advertisement) may ultimately result in a purchase, and this transaction may be recorded by the system. In one business model, a recorded purchase may automatically initiate a payment to be made from the advertiser to a publisher in which the printed advertisement was placed. The payment may be a predetermined percentage of the purchase price.
When a user makes a purchase, this information may be used to update the user's profile. For example, a user profile may contain an indicator of the likelihood of that user to make a purchase via a viewing interaction, based on a number and/or value of previous purchases.
Searches
The displayed digital content 116 may provide the user with an option of performing searches in respect of keywords or other graphics displayed to the user. A search may be initiated directly using a predetermined gestural interaction with displayed content. Alternatively, a search may be initiated indirectly by clipping a word or phrase and pasting it into a suitable search engine.
The number and/or type of searches initiated from a particular publication, page, article, advertisement etc. may be recorded by the system. Searching indicates a strong degree of engagement between the user and the digital content and this information may be used for determining popular or trending topics of interest to users.
Clippings/Shares
The displayed digital content 116 may provide the user with an option of clipping and sharing at least part of the digital content. The clipped content may be shared directly with other users or indirectly by posting to a social networking website. The number and/or type of clippings shared from a particular publication, page, article, advertisement etc. may be recorded by the system. Clipping and sharing indicates a strong degree of engagement between the user and the digital content, and this information may be used for determining popular or trending topics of interest to users.
Printing Information
The viewing device 100 may be equipped with, or in communication with, specific hardware, such as a printer. In these cases, data related to usage of that hardware may be recorded. For example, a number of printouts initiated from a viewed publication, article, advertisement etc. may be recorded by the system. In one scenario, a user may be invited to print out a redeemable barcoded coupon. Accordingly, the number of such coupons in circulation can be determined.
Date and Geographical Location Information
The date/time of viewing a printed page may be recorded by the system. This data may be determined from the page server 20 used for page recognition or the viewing device 100 itself.
Likewise, the geographical location of the viewing device 100 when viewing a viewed page 10 may be recorded by the interactive viewing system 2. This information is typically determined from the viewing device 100 using, for example, GPS data, a mobile network location or similar.
Caching Strategies for Improving Performance of the System
In order to make the viewing device 100 more responsive to the user's requests, it is possible to download content before it is requested by the viewer application 190. In order to make this strategy workable, it is necessary to carefully select the data which is to be downloaded so that the optimal performance is achieved. One or more of the following strategies can be used to improve responsiveness.
Downloading Digital Content Before Digital Media is Available
Where a user has a subscription for a publication, the subscribed publication's digital content 116 can be downloaded to the user's viewing device 100 in advance of the physical delivery of the publication. The advance downloading of the digital content 116 is triggered by the page server 20 when the page information and related subscriber list are loaded. This preferably occurs as a background operation on the viewing device 100 so that the user is not impacted. In some systems, the downloading is done when the user is connected to faster and cheaper network options, such as via WiFi rather than via the mobile telephone network. Additionally, an automatic or manual (user-requested) download may be initiated at the time when the viewing device 100 is connected to a host computer, when a sync operation is initiated between the viewing device 100 and the host computer, or at any other time while the viewing device 100 remains connected to the host computer. Downloads may happen progressively, a few pages at a time, for efficiency or for other reasons, such as the user disconnecting or switching off the viewing device 100 temporarily. Manual downloads may be initiated by a special user interface element such as a "sync" or "update" button.
Page Pre-Fetching Strategies
Strategies for page pre-fetching may be implemented individually or in various combinations. Each page download may include a digital image of the corresponding page(s), page tracking data and publication details. For example, a view finder bundle 240 (see Figure 10) may be downloaded. A number of strategies for selecting which pages to pre-emptively download are listed below.
• When a page or region of a publication is viewed, pages or regions which are physically close in the publication are downloaded.
• When a page of a publication is viewed, the facing page(s) of the corresponding page spread or page gatefold are downloaded.
• When part of an article of a publication is viewed, the following parts of that article are downloaded.
• When part of a publication is viewed, the following few pages are downloaded.
• When part of a publication is viewed, the parts of the magazine which the user habitually reads are downloaded. For example, when a user has a history of viewing crossword puzzles or comic strips, the crossword puzzle or comic strips from the publication are downloaded.
• When part of a publication is downloaded, the choice of which other parts of the publication to download can be made based on the user's demographic information. For example, users in a certain geographic region can be expected to have a preference to access information related to that region, or users in a certain age range may have a preference to access information targeted at that age range.
• Once a page has been visited by the user, it is likely that they will revisit that page, so the tracking and page data can be saved in the same manner as pre-emptively cached data and used to speed up subsequent accesses of that data.
For each of the above strategies, some, or all, of the pre-emptively cached data can be restricted in size by downloading text and/or low-resolution images in preference to full resolution data. This permits the cached data to have a larger coverage. When a hit is found on the cached data, the higher resolution images can be downloaded. The cached data which is restricted in size by this method is that for which the expectation of use is lower. In some systems, page tracking data only is downloaded for the pre-fetched pages in order to reduce cache size. Such page tracking data may be view finder bundles 240.
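By way of illustration only, several of the listed strategies can be combined in a simple pre-fetch selector such as the sketch below; the publication helpers, the user-habit list and the low-resolution-first fetch call are hypothetical names introduced here, not part of this disclosure.

# Illustrative sketch only (Python); helpers and data model are assumed.
def pages_to_prefetch(publication, viewed_page, user_habits, lookahead=3):
    candidates = []
    # facing page of the spread (assuming even page numbers sit on the left)
    facing = viewed_page + 1 if viewed_page % 2 == 0 else viewed_page - 1
    candidates.append(facing)
    # the following few pages
    candidates.extend(range(viewed_page + 1, viewed_page + 1 + lookahead))
    # remainder of the article containing the viewed page (hypothetical helper)
    candidates.extend(p for p in publication.article_containing(viewed_page)
                      if p > viewed_page)
    # sections the user habitually reads, e.g. crossword or comic strips
    for section in user_habits:
        candidates.extend(publication.pages_in_section(section))  # hypothetical helper
    seen, ordered = set(), []
    for p in candidates:
        if 1 <= p <= publication.page_count and p != viewed_page and p not in seen:
            seen.add(p)
            ordered.append(p)
    return ordered

# for page in pages_to_prefetch(pub, viewed_page=12, user_habits=["crossword"]):
#     cache.fetch(page, resolution="low")   # upgrade to full resolution on a cache hit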
Digital Rights Management
The interactive viewing system 2 provides a convenient platform for digital rights management (DRM) in respect of the digital content 116 provided by the viewing device 100.
Control of Accessible Digital Content
The digital content 116 available for viewing (or listening) by the user may be controlled in accordance with one or more parameters. Therefore, copyright holders, such as magazine publishers, are able to control access to digital content in accordance with predetermined parameters. In some instances, it is desirable that no restrictions are placed on the digital content 116 viewable by users. However, in many instances it is desirable to control what digital content 116 is made available to particular users.
User Identity or Viewing Device Identity
Permission to access digital content 116 may be granted on the basis of the user's identity and/or the identity of the viewing device 100. The user identity or viewing device 100 identity may be determined using any of the means described previously. Once the user identity or viewing device 100 identity has been determined, then associated permissions may be determined as well. For example, the user identity may have an associated user profile indicating whether, for example, the user has paid for access to certain content, whether the user has permission to make clippings, and whether the user has permission to share clippings.
Password Control
Permission to access digital content may be granted via a password issued to a user of the viewing device 100. For example, when the viewing device 100 is used to view a particular page of a magazine, the user may view a virtual reality view of that page and then be prompted to enter a password in order to view or listen to augmented reality digital content. The digital content accessible after entering the correct password may be, for example, an audio or video clip associated with a particular article, a redeemable token or coupon, a download, a game and so on.
Passwords may be distributed to certain users as part of a magazine subscription, in response to an online registration, in response to a payment, or as part of an e-mail or social network marketing campaign. Different passwords may be associated with different permissions and thereby provide different levels of access to interactive digital content.
Payment
Permission to access digital content may be granted via a payment, such as a payment to a copyright holder. A payment may grant access to one or more pieces of copyrighted material. For example, the user may be allowed to listen to an audio clip when viewing a particular printed graphic, but is only granted permission to download the audio clip in full after paying a fee, such as in response to an on-screen prompt. In another scenario, a magazine publisher may grant permission to view interactive digital content in respect of a particular magazine in response to a payment. Payment may result in issuance of a password, which is applicable for use with, for example, one article, a magazine issue, a range of magazine issues, a magazine title or a range of magazine titles.
User Subscription or Registration
Permission to access digital content may be controlled by a user subscription or registration status, which can be checked automatically via the user identity and/or the viewing device 100. Once a user's subscription status is confirmed as valid, then the user may be provided with access to a range of interactive digital content which is unavailable to non-subscribers. This serves as a potential inducement for users to subscribe to particular magazines, journals and so on.
Geographic Region
The geographic region of the user may be determined via the viewing device 100. For example, a mobile phone network, an internet service provider or a GPS signal may be used to determine in which country a user is located. The geographic region information may then be used to control the digital content accessible by the user.
For example, advertisers typically have different marketing campaigns in different countries. If a printed advertisement has associated video content, then the available video content may be dependent on the country in which the printed advertisement is being viewed. Thus, the same printed advertisement may have different associated video content depending on where the printed advertisement is being viewed by the viewing device 100. Trivially, the language of any audio content may be different.
In another scenario, a copyright holder may restrict access to digital content depending on a user's geographic location. For example, BBC Top Gear magazine may have associated BBC video content available to UK users (who pay an annual BBC license fee), but not available to other users outside the UK. Even if a user outside the UK has a copy of the relevant magazine title, that user may be restricted from viewing at least some of the associated digital content.
Language
The language of digital content 116 available for viewing (or listening) by the user may be controlled based on, for example, the geographic location of the viewing device 100, the language of the printed publication being viewed, or a user preference set in the viewing device 100.
Usually, the language of a publication is taken to be the default language for digital content 116, but it may be desirable to provide users with alternative language options for the digital content, to the extent that the same content is available in other languages. In some cases, copyright holders may wish to provide access to digital content 116 only in a language corresponding to a particular country or region in which the viewing device 100 is located.
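A minimal sketch of an access check combining the parameters discussed above (geographic region, subscription, payment and language) is given below; the ContentPolicy/User model and its field names are assumptions made for illustration only.

# Illustrative sketch only (Python); policy model and field names are assumed.
from dataclasses import dataclass, field

@dataclass
class ContentPolicy:
    allowed_regions: set = field(default_factory=set)  # empty set means no restriction
    requires_subscription: bool = False
    requires_payment: bool = False
    variants: dict = field(default_factory=dict)        # region -> content variant

@dataclass
class User:
    region: str
    subscriber: bool = False
    paid_items: set = field(default_factory=set)

def resolve_content(policy, user, content_id):
    """Return the content variant the user may access, or None if access is denied."""
    if policy.allowed_regions and user.region not in policy.allowed_regions:
        return None
    if policy.requires_subscription and not user.subscriber:
        return None
    if policy.requires_payment and content_id not in user.paid_items:
        return None
    # fall back to a default-language variant when no region-specific one exists
    return policy.variants.get(user.region, policy.variants.get("default"))

policy = ContentPolicy(allowed_regions={"UK"}, variants={"UK": "video_clip_en_uk"})
print(resolve_content(policy, User(region="UK"), "clip-1"))  # video_clip_en_uk
print(resolve_content(policy, User(region="AU"), "clip-1"))  # None (outside the UK)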
Position of Sensing Device or Viewing Device
The position of the sensing device 808 and/or viewing device 100 relative to the substrate 10 may be used to control the digital content 116 available to users. For example, one video clip may be playable only when the sensing device 808 has a corresponding printed graphic within its field of view 14, whereas another video clip may still be playable after the user moves the viewing device 100 away from the page 10 which initiated the video clip. Likewise, content available via clippings 830 may be dependent on whether the user has viewed a corresponding printed page 10. For example, shared clippings may not have the same accessible content as original clippings.
Temporary Access
In some instances, it may be desirable for publishers to provide users with temporary access to digital content 116. For example, an individual user may be granted access to digital content 116 for a predetermined period of time or for a predetermined number of views. Once the user's temporary access is revoked, the user may be requested to pay a fee in order to continue viewing the digital content. Temporary access serves as an inducement for users and, further, provides a potential revenue stream once temporary access is revoked.
Control and Tracking of Clippings
Clipping and sharing of viewed content may also be controlled. For example, a user may be granted permission to view digital content via the viewing device 100, but not be granted permission to clip and share the same content. In other words, permissions relating to viewing and sharing of digital content may be different, depending, for example, on what restrictions are imposed by the copyright holder. These restrictions help to minimize excessive harvesting of printed content onto users' viewing devices 100.
Likewise, the recipient of clipped digital content may only have permission to view (or listen to) the received content in accordance with certain parameters defined by the copyright holder. For example, if the recipient is located in a different country, that recipient may not have permission to view the same clipped content as the sender. Or, if the recipient is not a subscriber of a particular publication, then that recipient may not have access to the same digital content as a sender who is a subscriber.
Clipping and sharing of viewed digital content is considered to be a powerful means of encouraging uptake of the interactive viewing system among potential users. From that standpoint, it is advantageous to grant users permission to clip and share digital content with other users. A clipping sent to another user may be tracked in order to provide useful information to, for example, publishers and advertisers. For example, tracking information may provide a publisher with useful data on the impact of a particular article via the number of times the article is shared.
In addition, clippings may be tracked to determine the identity of recipients with whom the clipped content has been shared. Accordingly, a publisher or advertiser may send related content to those recipients as part of a direct marketing campaign.
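Tracking of shared clippings might be recorded with a structure such as the following sketch; the ClippingRegistry name and its fields are hypothetical, the description above stating only that share counts and recipient identities may be tracked.

# Illustrative sketch only (Python); registry name and fields are assumed.
from collections import defaultdict

class ClippingRegistry:
    def __init__(self):
        self.shares = defaultdict(list)   # clipping_id -> list of (sender, recipient)

    def record_share(self, clipping_id, sender_id, recipient_id):
        self.shares[clipping_id].append((sender_id, recipient_id))

    def share_count(self, clipping_id):
        return len(self.shares[clipping_id])

    def recipients(self, clipping_id):
        """Identities that received the clipping, e.g. for follow-up marketing."""
        return {recipient for _, recipient in self.shares[clipping_id]}

reg = ClippingRegistry()
reg.record_share("article-42-clip", "user-a", "user-b")
reg.record_share("article-42-clip", "user-b", "user-c")
print(reg.share_count("article-42-clip"))          # 2
print(sorted(reg.recipients("article-42-clip")))   # ['user-b', 'user-c']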
Control of Click-Through Content
Digital content accessible via a click-through, such as by tapping on an on-screen hyperlink, may also be controlled.
The content available via a particular on-screen hyperlink may vary depending on certain permissions granted to a user. The digital content available by tapping a hyperlink may be in a language corresponding to the geographic location of the viewing device 100. Alternatively, the digital content available via a hyperlink may vary depending on the user's subscription or registration status.
Control of User-Modifications of Digital Content
In some cases, the user may be invited to modify the viewable digital content. For example, the user may be able to attach an object to the displayed digital content. Typically, the attached object includes a photo, a video, a 'like' identifier, a rating, a comment, or digital ink. This attachment may then be viewable by other users of the interactive viewing system when they access this digital content. In another example, the user may have an option of filling in an on-screen form.
Permission to modify digital content may also be controlled. For example, only subscribers to a magazine may be granted permission to leave a comment on or rate an article; only users in a certain geographical location may be granted permission to leave a comment; only users who are viewing the printed page (as opposed to a shared clipping) may be granted permission to leave a comment; and so on.
Application Areas
Some potential application areas of the interactive viewing system described herein include: children's books, news updates, stock prices, stock portfolio, form completion, catalogues, magazines, newspapers, real estate, medical records, gaming and gambling, puzzles, location finding, geo-tagging, games, travel translations, travel locations, entertainment, movie previews, game previews, document mark-up, plans mark-up, zoom for far-sighted users, textbook annotation, textbook updates, questionnaires, paper auctions, on-line auctions, reverse auctions, social networking, children's drawings, personal scrapbooks, Pictionary, doco-tagging (similar to geo-tagging games, but in published documents rather than physical locations), garment tags, cards, post-it notes, stickers, smart photo albums, product packaging.
The embodiments described here are purely for the purposes of illustration. Skilled workers in this field will readily recognize many variations and modifications which do not depart from the spirit and scope of the broad inventive concepts.

Claims

I/We Claim:
1. A method of interacting with a substrate, the method comprising the steps of: capturing an image of content disposed on the substrate; identifying the substrate using the captured image and an image-matching technique; retrieving digital content corresponding to the identified substrate; and, displaying the digital content on a screen of a viewing device; wherein, the digital content displayed on the screen is a digital twin of the image captured of the content on the substrate, the digital twin having at least one interactive element for user interaction.
2. The method according to claim 1 wherein capturing the image of the content on the substrate is performed by a head-mounted display (HMD) worn on a user's head, the HMD being fitted with a digital camera for capturing digital video of the user's field of view.
3. The method according to claim 1 wherein the at least one interactive element is one or more of: a hyperlink; a button to initiate an action; and, video and/or audio playback options.
4. The method according to claim 1 further comprising the step of clipping digital content by a user interacting with the digital twin to select a clipping, and extracting the clipping from the digital twin.
5. The method according to claim 4 wherein the viewing device has a digital camera to capture the image, the camera having a field of view that at least partially encompasses the substrate, and the clipping from the digital twin has an extent that is dependent on the field of view when the clipping is initiated.
6. The method according to claim 4 further comprising the step of transferring the clipping to a server where the clipping is accessible to a plurality of users.
7. The method according to claim 6 wherein the extent of the clipping is at least partially indicated by a representation of a torn edge of a paper page when extracted and displayed on the screen.
8. The method according to claim 5 wherein the digital twin has pre-defined clipping regions and the user interaction selects one of the pre-defined clipping regions.
9. The method according to claim 8 wherein the clipping selected by the user encompasses the at least one interactive element such that the at least one interactive element retains at least some interactivity when displayed in the clipping extracted by the user.
10. The method according to claim 5 further comprising augmenting the digital twin to highlight one of the pre-defined clipping regions, the augmentation selecting said one of the pre-defined clipping regions depending on the position of the image captured on the substrate.
11. The method according to claim 1 further comprising the step of calculating an image signature that characterizes the image captured of the substrate, and the image-matching technique compares the image signature to similarly calculated image signatures respectively characterizing a database of reference images, each of the reference images forming a basis for a corresponding digital twin to be displayed on the screen in response to successful completion of the image-matching technique.
12. The method according to claim 11 wherein calculating the image signature of the image comprises the steps of: using the image as a base image to generate a series of scale images that are successively blurred versions of the base image; using the series of scale images to produce a set of gradient images consisting of gradient vectors at each pixel location in the set of scale images; producing a set of squared, normalized, gradient difference images from the set of gradient images by arranging the set of gradient images into adjacent pairs of gradient images and subtracting the gradient vectors in one gradient image of the pair from the gradient vectors at corresponding pixel locations in the other gradient image of the pair, calculating a squared magnitude of the gradient vector difference at each pixel location, and normalizing the squared magnitude of the gradient difference at each pixel location to generate the set of squared, normalized, gradient difference images; comparing each pixel in the set of squared, normalized, gradient difference images to pixels surrounding said pixel to identify local maxima; using the local maxima to provide a set of feature points that is characteristic of the base image; deriving an image descriptor for each feature point in the set of feature points; and, using the image descriptors to form an image signature characterizing the image.
13. The method according to claim 12 wherein the substrate is a page in a publication, the reference images in the database are images of reference pages from a plurality of publications including the publication being viewed by the user, and the user captures the image of the page with a viewing device having a camera, such that the viewing device provides contextual information for prioritizing the reference pages from the publication being viewed when comparing the image signature to the image signatures of the reference pages.
14. The method according to claim 13 wherein the contextual information includes at least one of: a viewing history of the user; a list of favorites specified by the user; an identity of the user; an identity of the viewing device; demographic data applicable to the user; geographic location of the user; a list of publications to which the user subscribes; publications with circulation exceeding a threshold; publications published within a predetermined preceding period; and, an indication of the publication input by the user.
15. A system for user interaction with printed content on a substrate, the system comprising: a sensing device for capturing an image of the printed content disposed on the substrate; a server with a database of reference images; a viewing device with a display screen, the viewing device configured to transmit a match request to the server, the match request including the image captured of the printed content; wherein, the server is configured to use an image-matching technique to match the image to a reference image corresponding to the substrate in response to the match request, and transmit a match response to the viewing device, the match response including digital content corresponding to the substrate identified by the image-matching technique; and, the viewing device is configured to display the digital content on the screen, such that the screen displays a digital twin of the image captured, the digital twin having at least one interactive element for user interaction.
16. The system according to claim 15 wherein the sensing device is a camera incorporated into the viewing device and the screen is touch sensitive to enable user interaction with the at least one interactive element.
17. The system according to claim 15 wherein the viewing device is a head-mounted display (HMD) worn on the user's head, the sensing device is a digital camera positioned to capture digital video of the user's field of view, and the screen for displaying the digital content is positioned in at least part of the user's field of view.
18. The system according to claim 15 wherein the at least one interactive element is one or more of: a hyperlink; a button to initiate an action; and, video and/or audio playback options.
19. The system according to claim 15 wherein the server is configured to calculate an image signature that characterizes the image captured of the substrate, such that the image-matching technique compares the image signature to similarly calculated image signatures respectively characterizing the database of reference images.
20. The system according to claim 19 wherein the server is configured to calculate the image signature of the image by: using the image as a base image to generate a series of scale images that are successively blurred versions of the base image; using the series of scale images to produce a set of gradient images consisting of gradient vectors at each pixel location in the set of scale images; producing a set of squared, normalized, gradient difference images from the set of gradient images by arranging the set of gradient images into adjacent pairs of gradient images and subtracting the gradient vectors in one gradient image of the pair from the gradient vectors at corresponding pixel locations in the other gradient image of the pair, calculating a squared magnitude of the gradient vector difference at each pixel location, and normalizing the squared magnitude of the gradient difference at each pixel location to generate the set of squared, normalized, gradient difference images; comparing each pixel in the set of squared, normalized, gradient difference images to pixels surrounding said pixel to identify local maxima; using the local maxima to provide a set of feature points that is characteristic of the base image; deriving an image descriptor for each feature point in the set of feature points; and, using the image descriptors to form an image signature characterizing the image.
PCT/AU2013/000256 2012-03-22 2013-03-15 Method and system of interacting with content disposed on substrates WO2013138846A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201261614451P 2012-03-22 2012-03-22
US61/614,451 2012-03-22
US201261711156P 2012-10-08 2012-10-08
US61/711,156 2012-10-08
US201261721016P 2012-10-31 2012-10-31
US61/721,016 2012-10-31

Publications (1)

Publication Number Publication Date
WO2013138846A1 true WO2013138846A1 (en) 2013-09-26

Family

ID=49221691

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2013/000256 WO2013138846A1 (en) 2012-03-22 2013-03-15 Method and system of interacting with content disposed on substrates

Country Status (1)

Country Link
WO (1) WO2013138846A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6788293B1 (en) * 1999-12-01 2004-09-07 Silverbrook Research Pty Ltd Viewer with code sensor
US20110018903A1 (en) * 2004-08-03 2011-01-27 Silverbrook Research Pty Ltd Augmented reality device for presenting virtual imagery registered to a viewed surface

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102014202712A1 (en) * 2014-02-14 2015-08-20 Bayerische Motoren Werke Aktiengesellschaft Image adjustment for contact-analogous representations on data glasses
US11210550B2 (en) 2014-05-06 2021-12-28 Nant Holdings Ip, Llc Image-based feature detection using edge vectors
JP2017519277A (en) * 2014-05-06 2017-07-13 ナント・ホールデイングス・アイ・ピー・エル・エル・シー Image feature detection using edge vectors
US10679093B2 (en) 2014-05-06 2020-06-09 Nant Holdings Ip, Llc Image-based feature detection using edge vectors
US10229342B2 (en) 2014-05-06 2019-03-12 Nant Holdings Ip, Llc Image-based feature detection using edge vectors
US10671796B2 (en) 2015-06-07 2020-06-02 Apple Inc. Article authoring, distribution and rendering architecture
US10346019B2 (en) 2016-01-09 2019-07-09 Apple Inc. Graphical user interface for providing video in a document reader application
US10181208B2 (en) 2016-02-10 2019-01-15 Microsoft Technology Licensing, Llc Custom heatmaps
US11393200B2 (en) 2017-04-20 2022-07-19 Digimarc Corporation Hybrid feature point/watermark-based augmented reality
US10740613B1 (en) 2017-04-20 2020-08-11 Digimarc Corporation Hybrid feature point/watermark-based augmented reality
CN108107841B (en) * 2017-12-26 2020-12-18 山东大学 Numerical twin modeling method of numerical control machine tool
CN108107841A (en) * 2017-12-26 2018-06-01 山东大学 A kind of twin modeling method of numerically-controlled machine tool number
CN111401026B (en) * 2018-12-29 2024-04-19 方正国际软件(北京)有限公司 Title generation method, electronic device and storage medium
CN111401026A (en) * 2018-12-29 2020-07-10 方正国际软件(北京)有限公司 Title generation method, electronic device, and storage medium
CN109709391A (en) * 2019-01-14 2019-05-03 江苏盛德电子仪表有限公司 A kind of intelligent electric meter and its communication system with high speed carrier communication module
CN110262284A (en) * 2019-06-24 2019-09-20 江苏科瑞德智控自动化科技有限公司 A kind of lithium battery pole slice milling train control method and system based on the twin technology of number
US11580733B2 (en) 2019-09-09 2023-02-14 Ar, Llc Augmented reality content selection and display based on printed objects having security features
US11961294B2 (en) 2019-09-09 2024-04-16 Techinvest Company Limited Augmented, virtual and mixed-reality content selection and display
WO2021048763A1 (en) * 2019-09-09 2021-03-18 Ar, Llc Augmented, virtual and mixed-reality content selection & display
US10997418B2 (en) 2019-09-09 2021-05-04 Ar, Llc Augmented, virtual and mixed-reality content selection and display
US10997419B2 (en) 2019-09-09 2021-05-04 Ar, Llc Augmented reality content selection and display based on printed objects having security features
US11574472B2 (en) 2019-09-09 2023-02-07 Ar, Llc Augmented, virtual and mixed-reality content selection and display
CN110704213A (en) * 2019-10-10 2020-01-17 北京航空航天大学 Virtual and real efficient real-time interaction method and system for digital twin data
CN110704213B (en) * 2019-10-10 2020-07-10 北京航空航天大学 Virtual and real efficient real-time interaction method and system for digital twin data
CN111596987B (en) * 2020-04-24 2024-03-26 北京字节跳动网络技术有限公司 Page display method and device and electronic equipment
CN111596987A (en) * 2020-04-24 2020-08-28 北京字节跳动网络技术有限公司 Page display method and device and electronic equipment
CN111797163A (en) * 2020-06-24 2020-10-20 西北工业大学 Virtual-real synchronization system based on digital twins and implementation method
CN111797163B (en) * 2020-06-24 2023-04-07 西北工业大学 Virtual-real synchronization system based on digital twins and implementation method
CN114363705A (en) * 2021-05-17 2022-04-15 海信视像科技股份有限公司 Augmented reality equipment and interaction enhancement method
CN114396944A (en) * 2022-01-18 2022-04-26 西安塔力科技有限公司 Autonomous positioning error correction method based on digital twinning
CN114396944B (en) * 2022-01-18 2024-03-22 西安塔力科技有限公司 Autonomous positioning error correction method based on digital twinning
CN114915819A (en) * 2022-03-30 2022-08-16 卡莱特云科技股份有限公司 Data interaction method, device and system based on interactive screen
CN114915819B (en) * 2022-03-30 2023-09-15 卡莱特云科技股份有限公司 Data interaction method, device and system based on interactive screen
CN115061770A (en) * 2022-08-10 2022-09-16 荣耀终端有限公司 Method and electronic device for displaying dynamic wallpaper
CN115205564A (en) * 2022-09-16 2022-10-18 山东辰升科技有限公司 Unmanned aerial vehicle-based hull maintenance inspection method

Similar Documents

Publication Publication Date Title
WO2013138846A1 (en) Method and system of interacting with content disposed on substrates
KR101328766B1 (en) System, and method for identifying a rendered documents
US11140324B2 (en) Method of displaying wide-angle image, image display system, and information processing apparatus
JP5529082B2 (en) Acquiring data from rendered documents using handheld devices
KR101443404B1 (en) Capture and display of annotations in paper and electronic documents
KR101796008B1 (en) Sensor-based mobile search, related methods and systems
CN102369724B (en) Automatically capturing information, for example, use document awareness apparatus capturing information
JP5879637B2 (en) Intuitive computing method and system
US20140111542A1 (en) Platform for recognising text using mobile devices with a built-in device video camera and automatically retrieving associated content based on the recognised text
JP2017108401A (en) Smartphone-based method and system
US20120046071A1 (en) Smartphone-based user interfaces, such as for browsing print media
JP2010536188A6 (en) Acquiring data from rendered documents using handheld devices
DE202010018557U1 (en) Linking rendered ads to digital content
WO2007023994A1 (en) System and methods for creation and use of a mixed media environment
US20210042809A1 (en) System and method for intuitive content browsing
US8468148B2 (en) Searching by use of machine-readable code content
EP2857982A1 (en) Method for digitizing paper documents by using transparent display or device having air gesture function and beam screen function and system therefor
US20090031015A1 (en) System and method for arranging and outputting data elements at a network site
Liu et al. Embedded media markers: marks on paper that signify associated media
Liu et al. Paperui
KR101178302B1 (en) Data capture from rendered documents using handheld device
KR101495492B1 (en) Method and system for electronicizing and utilizing paper documents through a transparent display.
KR20130107040A (en) System and method for information providing service using contents combined with 2-dimensional code
US20140372469A1 (en) Searching by use of machine-readable code content
US20020126918A1 (en) Apparatus and method for interacting with printed materials

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13764885

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13764885

Country of ref document: EP

Kind code of ref document: A1